Are you a data novice? Do you find yourself nodding along but not really understanding what people are saying? Not to worry – here are all the data-related terms you need to know.
As we gradually move (or are dragged kicking and screaming) into Industry 4.0, we can no longer afford to ‘play dumb’ when it comes to data.
Much like we taught our parents how to use Facebook, so too must you teach yourself about encryption, cloud computing and the internet of things.
But where do you start? How do you tell your algorithms from your zip drives? How do you begin to wade through the mass of data terminology that is available at the click of a button?
Let’s begin with the basics.
Data is information that has been converted into another form to be processed or analysed.
Big data refers to the vast amounts of structured and unstructured data that can come from a myriad of sources. It incorporates the ‘three Vs’: volume, variety and velocity, and can be measured in petabytes or exabytes (a hell of a lot of information, in other words). Small data can be managed more easily, tying in with the idea presented by Allen Bonde that “big data is for machines; small data is for people”.
Open data is content that can be freely accessed, used, edited and distributed anywhere, by anyone, at any time. The Open Definition was introduced in 2005 and promotes the spirit of interoperability, where no technical or legal barriers to this data exist.
As the name suggests, a data warehouse is a digital repository where businesses store their data. A hashing system may be used to make data easily searchable, so that different company departments can access each other’s content. Data warehousing is the process of this storage, which is used in everyday applications such as booking flights and withdrawing cash from an ATM.
A subset of the data warehouse, a data mart is a store of data used by a particular group within a company, such as the sales team. In contrast to a central archive, data marts target a specific need or purpose. Data virtualisation is the management of such data.
In data mining, companies sift through the information gathered from raw data and analyse it to better inform future business decisions. This requires complex database software such as Microsoft SQL Server to form predictive analytics. If this seems like jargon to you, a simple example lies in supermarkets, where information is garnered from customer loyalty cards to define a target market for future products.
A data centre is a facility containing a large number of networked computers used for storing, processing and distributing large amounts of data. It houses IT equipment such as servers, routers and firewalls, as well as necessary infrastructure for the building such as power supplies, backup generators and ventilation systems. As the focal point of critical IT operations, data centres are the beating heart of a business.
As easy as A to Z
An algorithm is a procedure, or set of rules, for solving a particular problem
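To make that less abstract, here is a minimal sketch in Python of one classic algorithm: Euclid's method for finding the greatest common divisor of two whole numbers. The function name `greatest_common_divisor` is our own choice for illustration, not something from a particular library.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Return the largest whole number that divides both a and b."""
    a, b = abs(a), abs(b)
    # The rule: swap the pair (a, b) for (b, a mod b) until b reaches 0.
    while b != 0:
        a, b = b, a % b
    return a


print(greatest_common_divisor(48, 18))  # prints 6
```

The same handful of rules works for any pair of whole numbers, which is exactly what makes it an algorithm rather than a one-off calculation.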
Analytics is the use of maths, statistics and computer programming to discover relevant patterns in recorded information
Application programming interface – a set of rules that lets one piece of software access the features or data of another, commonly a web-based application
Artificial intelligence – the creation of computing machines that can simulate human intelligence
Bigtable is a Google data storage system that manages the company’s core services, such as Search and Maps
Biometrics is the statistical analysis of human characteristics, both physiological and behavioural
A cache is a temporary store of data, used in web browsers to save frequently accessed web pages
A carrier is a company that offers telecommunication services, such as Vodafone or BT
In telecoms, this is the part of the network through which data passes between two points.
‘The cloud’ is also a buzzword for the internet, referring to the software and services that can be accessed online, rather than just from your computer
Cloud computing is the delivery of hosted services over the internet, which falls under three categories:
Public: Online services delivered to the general public
Private: Services made available only to a single organisation
Hybrid: A mixture of private and public cloud services for greater flexibility
Colocation is the practice of renting space in a data centre to house privately owned servers
Cooling is the provision of proper ventilation to ensure data equipment and processes remain at the optimum temperature
A strategic plan that enables a business to retain or resume critical functions after a negative incident has occurred, such as a cyberattack
A distributed denial-of-service attack is the flooding of a website with traffic, potentially causing it to crash or shut down
Distributed file system – an application that allows clients to remotely access data stored on a server
Encryption is the conversion of data into code to prevent unauthorised access. This practice has made the news in recent months, due to WhatsApp policies
Giga is derived from the Greek for giant, which is apt as a gigabyte equals 1bn bytes of computer data storage. A gigabit is 1bn bits of information, a unit usually used to describe telecoms technology
General Data Protection Regulation – a European Union privacy regulation that will come into effect on 25 May 2018, imposing harsher penalties for non-compliance with data protection standards
Hadoop is a free, Java-based software framework from the Apache Software Foundation that allows for the processing of large data sets across a distributed computer network
With IP standing for Internet Protocol, an IP address is a number assigned to a piece of hardware, such as a computer, which identifies the sender or receiver of online information
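For the curious, Python's standard `ipaddress` module shows that the familiar dotted format really is just a number underneath. The address used here, 192.0.2.1, comes from a range reserved for documentation, so it is a placeholder rather than a real machine.

```python
import ipaddress

# 192.0.2.1 is an address reserved for documentation and examples.
address = ipaddress.ip_address("192.0.2.1")

print(address.version)  # prints 4 (an IPv4 address)
print(int(address))     # the same address written as a single number
print(address in ipaddress.ip_network("192.0.2.0/24"))  # prints True
```

The last line checks whether the address belongs to a given block of addresses, which is how networks decide where traffic should go.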
The internet of things is the interconnected system of computer devices; everyday objects that transfer data via the internet. The industrial internet of things (IIoT) is the use of this technology in the manufacturing industry
Internet service provider – exactly what it says on the tin
Java is a popular programming language used by developers to create web content and smartphone applications
You might have guessed this one – a delay in the transfer of data. Also known as that buffering symbol that turns you into a gigantic ball of rage
One megabyte equals 8 megabits. Megabytes refer to computer storage and memory, whereas megabits are used to describe internet connection speed
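The difference matters when you work out how long a download should take. The sketch below is a rough estimate only: the function name `download_seconds` and the example figures are our own, and it ignores latency and other real-world overheads.

```python
BITS_PER_BYTE = 8


def download_seconds(file_size_megabytes: float, speed_megabits_per_second: float) -> float:
    """Estimate download time, ignoring latency and protocol overhead."""
    file_size_megabits = file_size_megabytes * BITS_PER_BYTE
    return file_size_megabits / speed_megabits_per_second


# A 100 megabyte file on a 50 megabit-per-second connection.
print(download_seconds(100, 50))  # prints 16.0
```

So a "50 meg" connection moves a 100 megabyte file in roughly 16 seconds, not two.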
Metadata is data that describes other data. This information is used by search engines to filter through documents and generate appropriate matches
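As a toy illustration of that filtering, the sketch below attaches a few descriptive fields to some documents and searches on them. Every title, author and keyword is invented for the example, and real search engines are vastly more sophisticated than a single list lookup.

```python
# Each record pairs a document with metadata describing it.
# All titles, authors and keywords below are made up for illustration.
documents = [
    {"title": "Cooling a data centre", "author": "A. Writer", "keywords": {"cooling", "data centre"}},
    {"title": "Big data for beginners", "author": "B. Author", "keywords": {"big data", "analytics"}},
    {"title": "Analytics in retail", "author": "C. Scribe", "keywords": {"analytics", "loyalty cards"}},
]


def find_titles(keyword: str) -> list[str]:
    """Return titles whose keyword metadata contains the search term."""
    return [doc["title"] for doc in documents if keyword in doc["keywords"]]


print(find_titles("analytics"))  # prints ['Big data for beginners', 'Analytics in retail']
```

Note that the search never reads the documents themselves; it only looks at the data describing them.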
Open Compute Project – a Facebook-led, community-based organisation that shares designs of data centre products with other members of the IT industry in a bid to improve infrastructure and boost innovation
A computer program with a source code that can be modified to suit specific needs. Open source software promotes collaborative efforts, encouraging programmers to make their own work freely available
Platform-as-a-service – a cloud computing model that allows developers to manage online applications
Power usage effectiveness – a ratio to measure the energy efficiency of a data centre
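The ratio is commonly calculated as the total energy a facility uses divided by the energy used by its IT equipment alone, so a score closer to 1 means less power is going on extras such as cooling and lighting. Here is a minimal sketch of that sum; the function name and the kilowatt-hour figures are hypothetical.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return total facility energy divided by IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


# Made-up example: 1,500 kWh used by the whole site, 1,000 kWh of it by IT kit.
print(power_usage_effectiveness(1500.0, 1000.0))  # prints 1.5
```

In this made-up case, every unit of energy that reaches the servers costs another half a unit in overheads.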
Software-as-a-service – a software distribution model that allows a service provider to deliver applications to a customer via the internet
Software-defined networking is a network technology that enables engineers to manage network behaviour through open interfaces, controlling data traffic without touching individual switches
Source code is the core component of a computer program that is readable by humans
An electromagnetic archive. Data storage devices can be removable, such as a USB stick, and connected to the computer via an input/output port
Made famous by Netflix, streaming is a technique for transferring data that supports a steady, uninterrupted stream of content, allowing for superior visual or audio quality
Transmission Control Protocol/Internet Protocol – a set of rules to govern communications on the internet
Heading into monster territory, a terabyte is 1trn bytes of computer storage capacity. Used in data communications, a terabit is 1trn binary digits
An internet service provider that is the sole operator of its own network, with a direct connection to the internet and other network services
Virtualisation is the creation of a virtual version of a network, server, storage device or operating system
A portable device used to back up computer files. Coming a long way from the birth of the floppy disk, the world’s highest capacity USB flash drive was recently revealed at CES 2017
Updated, 11.25am, 15 February 2018: This article was updated to attribute a quote about big data to Allen Bonde.