Data for Dummies: The essential data glossary

15 Feb 2017

Image: DONOT6_STUDIO/Shutterstock

Are you a data novice? Do you find yourself nodding along but not really understanding what people are saying? Not to worry – here are all the data-related terms you need to know.

As we gradually move (or are dragged kicking and screaming) into Industry 4.0, we can no longer afford to ‘play dumb’ when it comes to data.

Much like we taught our parents how to use Facebook, so too must you teach yourself about encryption, cloud computing and the internet of things.

But where do you start? How do you tell your algorithms from your zip drives? How do you begin to wade through the mass of data terminology that is available at the click of a button?

Let’s begin with the basics.


Data

This is information that has been converted into another form to be processed or analysed.

Big data

This refers to the vast amounts of structured and unstructured data that can come from a myriad of sources. It incorporates the ‘three Vs’: volume, variety and velocity, and can be measured in petabytes or exabytes (a hell of a lot of information, in other words). Small data can be managed more easily, tying in with the idea presented by Allen Bonde that “big data is for machines; small data is for people”.

Open data

This is content that can be freely accessed, used, edited and distributed anywhere, by anyone, at any time. The Open Definition was introduced in 2005 and promotes the spirit of interoperability, where no technical or legal barriers to this data exist.

Data warehouse

As the name suggests, this is a digital repository where businesses store their data. A hashing system may be used to make data easily searchable, so that different company departments can access each other’s content. Data warehousing is the process of storing data in this way, and it underpins everyday applications such as booking flights and withdrawing cash from an ATM.

Data mart

A subset of the data warehouse, this is a store of data used by a particular group within a company, such as the sales team. In contrast to a central archive, data marts target a specific need or purpose. Data virtualisation is the management of such data.

Data mining

Companies can mine the information gathered from raw data and analyse it to better inform future business decisions. This requires complex database software such as Microsoft SQL Server to form predictive analytics. If this seems like jargon to you, a simple example lies in supermarkets, where information is garnered from customer loyalty cards to define a target market for future products.

Data centre

This is a facility containing a large number of networked computers used for storing, processing and distributing large amounts of data. It houses IT equipment such as servers, routers and firewalls, as well as necessary infrastructure for the building such as power supplies, backup generators and ventilation systems. As the focal point of critical IT operations, data centres are the beating heart of a business.

As easy as A to Z




Algorithm

A procedure, or set of rules, for solving a particular problem
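To make that less abstract, here is one classic algorithm, Euclid's method for finding the greatest common divisor of two numbers, as a minimal Python sketch. The function name is our own choice for illustration; Python's standard library already offers the same result via `math.gcd`.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm: keep replacing (a, b) with (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return abs(a)


print(greatest_common_divisor(48, 18))  # prints 6
```

The same fixed set of steps works for any pair of whole numbers, which is what makes it an algorithm rather than a one-off calculation.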


Analytics

The use of maths, statistics and computer programming to discover relevant patterns in recorded information


API

Application programming interface – a set of instructions on how to access and build web-based software applications


AI

Artificial intelligence – the creation of computing machines that can simulate human intelligence


Bigtable

A Google data storage system that manages the company’s core services, such as Search and Maps


Biometrics

The statistical analysis of human characteristics, both physiological and behavioural


Cache

A temporary store of data, used in web browsers to save frequently accessed web pages


Carrier

A company that offers telecommunication services, such as Vodafone or BT




Cloud

In telecoms, this is the part of the network through which data passes between two points.

‘The cloud’ is also a buzzword for the internet, referring to the software and services that can be accessed online, rather than just from your computer 

Cloud computing

The delivery of hosted services over the internet, which falls under three categories:

Public: Online services delivered to the general public

Private: Services made available only to a single organisation

Hybrid: A mixture of private and public cloud services for greater flexibility


Colocation

The practice of renting space in a data centre to house privately owned servers


Cooling

The provision of proper ventilation to ensure data equipment and processes remain at the optimum temperature

Disaster recovery

A strategic plan that enables a business to retain or resume critical functions after a negative incident has occurred, such as a cyberattack


DDoS

A distributed denial-of-service attack is the flooding of a website with traffic, potentially causing it to crash or shut down

Distributed file system

An application to allow clients to remotely access data stored on the server


Encryption

The conversion of data into code to prevent unauthorised access. This practice has made the news in recent months because of WhatsApp’s policies




Gigabyte/gigabit

Giga is derived from the Greek for giant, which is apt as a gigabyte equals 1bn bytes of computer data storage. A gigabit is 1bn bits of information, a unit usually used to describe telecoms technology


GDPR

General Data Protection Regulation – a European Union privacy regulation that will come into effect on 25 May 2018, imposing harsher penalties for non-compliance with data protection standards


Hadoop

A free, Java-based framework from the Apache Software Foundation that allows for the processing of large data sets across a distributed computer network

IP address

Standing for Internet Protocol, this is a number assigned to a piece of hardware, such as a computer, which identifies the sender or receiver of online information
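For the curious, Python's built-in `ipaddress` module can inspect one of these numbers. The address below is an arbitrary example from a range reserved for home and office networks, not one taken from this article.

```python
import ipaddress

# An example address from the private 192.168.x.x range.
address = ipaddress.ip_address("192.168.1.10")

print(address.version)     # prints 4 (this is an IPv4 address)
print(address.is_private)  # prints True (reserved for private networks)
print(int(address))        # prints 3232235786 (the same address as one plain number)
```

The last line shows that the familiar dotted form is simply a readable way of writing a single large number.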


IoT

The internet of things is the interconnected system of computer devices; everyday objects that transfer data via the internet. The industrial internet of things (IIoT) is the use of this technology in the manufacturing industry


ISP

Internet service provider – exactly what it says on the tin


A popular programming language used by developers to create web content and smartphone applications


Latency

You might have guessed this one – a delay in the transfer of data. Also known as that buffering symbol that turns you into a gigantic ball of rage


Megabyte/megabit

One megabyte equals 8 megabits. Megabytes are used to describe computer storage and memory, whereas megabits are used to describe internet connection speed
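The difference matters when you estimate how long a download will take, because file sizes are quoted in megabytes and connection speeds in megabits per second. A rough sketch in Python, with a function name and figures of our own choosing, and ignoring real-world overheads:

```python
BITS_PER_BYTE = 8


def download_seconds(file_megabytes: float, speed_megabits_per_second: float) -> float:
    """Estimate download time by converting the file size to megabits first."""
    file_megabits = file_megabytes * BITS_PER_BYTE
    return file_megabits / speed_megabits_per_second


# A 100-megabyte file over a 50-megabit-per-second connection.
print(download_seconds(100, 50))  # prints 16.0
```

So a "50 meg" connection moves a 100-megabyte file in about 16 seconds, not two.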


Metadata

Data that describes other data. This information is used by search engines to filter through documents and generate appropriate matches



Open Compute Project

A Facebook-led initiative, this is a community-based organisation that shares designs of data centre products with other members of the IT industry in a bid to improve infrastructure and boost innovation

Open source

A computer program with a source code that can be modified to suit specific needs. Open source software promotes collaborative efforts, encouraging programmers to make their own work freely available 


PaaS

Platform-as-a-service – a cloud computing model that allows developers to manage online applications


PUE

Power usage effectiveness – a ratio to measure the energy efficiency of a data centre
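The ratio is the total energy a facility uses divided by the energy that reaches the IT equipment itself, so a score of 1.0 would mean nothing is spent on cooling, lighting or other overheads. A minimal Python sketch, using made-up figures and a function name of our own:

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / energy used by IT equipment."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


# Illustrative figures only: 1,500 kWh drawn overall, 1,000 kWh of it by servers.
print(power_usage_effectiveness(1500, 1000))  # prints 1.5
```

The closer the result is to 1.0, the more efficient the data centre.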


SaaS

Software-as-a-service – a software distribution model that allows a service provider to deliver applications to a customer via the internet

Software-defined network

Network technology that enables engineers to manage network behaviour through open interfaces, controlling data traffic without touching individual switches 

Source code

The core component of a computer program that is readable by humans


Storage

An electromagnetic archive. Data storage devices can be removable and connected to the computer via an input/output port, as a USB stick is




Streaming

Made famous by Netflix, this is a technique for transferring data that supports a steady, uninterrupted stream of content, allowing for superior visual or audio quality


TCP/IP

Transmission Control Protocol/Internet Protocol – a set of rules to govern communications on the internet


Terabyte/terabit

Heading into monster territory, a terabyte is 1trn bytes of computer storage capacity. Used in data communications, a terabit is 1trn binary digits

Tier-one carrier

An internet service provider that is the sole operator of its own network, with a direct connection to the internet and other network services


Virtualisation

The creation of a virtual model of a network, server, storage device or operating system

Zip drive

A portable device used to back up computer files. Portable storage has come a long way since the birth of the floppy disk: the world’s highest-capacity USB flash drive was recently revealed at CES 2017

Updated, 11.25am, 15 February 2018: This article was updated to attribute a quote about big data to Allen Bonde.

Shelly Madden was sub-editor of Silicon Republic