GDPR compliance is an ongoing priority and metadata management is a crucial aspect.
Complying with GDPR and ensuring the safety of data is a major priority for all organisations, but metadata is an often overlooked element of this process.
Siliconrepublic.com spoke to Amnon Drori, CEO and co-founder of metadata management and data lineage firm Octopai, about how to manage this issue.
What kind of role does metadata play in GDPR compliance?
Under the new GDPR regulations, any EU customer data that can identify consumers must be anonymised or deleted completely and, in order to do this, companies must now have a level of awareness about their data that previously would have been unheard of.
Organisations must know their data backwards and forwards, inside and out: where it originated, who had access to it, which changes it underwent and when. They must know where it resides throughout the organisation’s multiple business intelligence (BI) systems – such as Power BI, Cognos, Informatica and DataStage – and how it affects other data items, in order to ensure that the data mandated by GDPR is both identified and handled properly.
To gain full clarity of the data an organisation possesses, it must have access to the associated metadata, because only metadata can give us the insights we need to ensure compliance. Metadata ultimately tells you where data comes from, where it resides in all the different systems, how it’s being used and by whom. It is key to governing your data, and if you can’t answer all of those questions, you’re simply not going to be GDPR-compliant.
What are the risks when metadata is not examined?
Without a robust metadata management strategy, organisations are likely to make business decisions based on incorrect data. A multivendor/multilingual BI infrastructure is likely full of data elements that carry the same meaning but are named differently (eg DOB, date of birth, D-O-B, birth date, birthday) and without the ability to correlate these data elements, it is very difficult to find and understand one’s data.
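The correlation problem described above can be sketched in a few lines: map each system's field labels to a canonical name and match on that. The variant list and field names below are invented for illustration, not taken from any real BI system.

```python
# Hypothetical sketch: correlating data elements that carry the same meaning
# but are named differently across BI systems. The canonical mapping and the
# harvested field names are invented for this example.

CANONICAL = {
    "date of birth": {"dob", "date of birth", "d-o-b", "birth date", "birthday"},
}

def canonical_name(field):
    """Return the canonical name for a field label, or None if unknown."""
    label = field.strip().lower().replace("_", " ")
    for canonical, variants in CANONICAL.items():
        if label in variants:
            return canonical
    return None

# Field labels harvested from two (invented) reporting systems:
fields = ["DOB", "Birth_Date", "customer_id", "birthday"]
matches = [f for f in fields if canonical_name(f) is not None]
# 'DOB', 'Birth_Date' and 'birthday' all resolve to 'date of birth'
```

In practice the mapping would be far larger and usually maintained in a business glossary rather than hard-coded, but the principle is the same: without some correlation layer, each label variant looks like a different data element.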
Metadata management is critical for ensuring data quality: it is what keeps data consistent and accurate across an organisation’s various reporting systems.
What are people missing when it comes to metadata?
Many people don’t realise just how critical metadata management is to data management as a whole. Whether an organisation is focused on data governance or on data quality (being able to trust its data to make accurate, data-driven business decisions), if metadata is not managed well, it is very difficult to find data and to understand accurately how it moves through the organisation.
What is also often missed is that metadata is the only way to track where data resides within, and flows through, an organisation’s various multivendor BI systems. Many metadata management tools on the market today are vendor-specific, so they cannot provide a complete view of the data journey and still require manual searching and tracing to discover how data flows through the entire BI infrastructure.
Collecting the metadata and documenting process flows and data lineage manually requires significant time and resources, and is prone to error, especially in larger organisations with many reports and analyses.
What kind of data can be missed or difficult to locate?
Often, different definitions or metadata are attributed to the same thing, which can make certain data very difficult to locate. For example, if a BI analyst needs to identify and encrypt a phone number, how can they know every single place where it resides?
Different systems have different metadata labelling. Sometimes, the field could be labelled ‘phone number’, while other times it could be ‘mobile phone number’ and other times as ‘telephone number’.
A seemingly simple task instantly becomes hugely complicated and time-consuming. Imagine the BI group having to go through everything manually, searching for every unique variant of the term ‘phone number’. For a bit of perspective, it would be like opening 500 Excel sheets and trying to find every single location of a certain piece of data. You might be able to use filters here and there, but the process would take months and months of manual searching.
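As a rough sketch of what automating that search looks like, the snippet below scans schema listings for every known variant of ‘phone number’. The systems, field names and label list are all hypothetical.

```python
# Hypothetical sketch: locating every field that might hold a phone number
# across several systems' schemas. The schemas and label variants are invented.

PHONE_LABELS = {"phone number", "mobile phone number", "telephone number"}

schemas = {
    "crm":       ["customer_name", "phone_number", "email"],
    "billing":   ["account_id", "telephone_number"],
    "warehouse": ["mobile_phone_number", "churn_rate"],
}

def find_phone_fields(schemas):
    """Yield (system, field) for every field whose label matches a known variant."""
    for system, fields in schemas.items():
        for field in fields:
            if field.lower().replace("_", " ") in PHONE_LABELS:
                yield system, field

hits = list(find_phone_fields(schemas))
# finds the phone field in all three systems despite the differing labels
```

A real metadata management tool would pull these schemas automatically from each system rather than having them typed in, but even this toy version shows why a machine-readable inventory of field metadata beats opening 500 spreadsheets by hand.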
Also, data that is not explicitly named can be difficult to locate. For example, data called ‘credit card’ is easy to understand, but data called ‘churn rate’, which is a calculation of several data elements, is very difficult to find because it involves many other elements being calculated somewhere in the BI infrastructure.
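Finding the elements behind a derived field like ‘churn rate’ is essentially a walk upstream through a lineage graph: each derived element points back at the elements it was calculated from. The graph below is invented for illustration.

```python
# Hypothetical sketch of data lineage for a derived field. Each key maps a
# data element to the elements it is calculated from; the graph is invented.

LINEAGE = {
    "churn_rate": ["customers_lost", "customers_start"],
    "customers_lost": ["subscription_events"],
    "customers_start": ["subscription_events"],
}

def upstream_sources(field, lineage):
    """Return every element, direct or indirect, that feeds into `field`."""
    sources = set()
    stack = [field]
    while stack:
        for parent in lineage.get(stack.pop(), []):
            if parent not in sources:
                sources.add(parent)
                stack.append(parent)
    return sources

upstream_sources("churn_rate", LINEAGE)
# contains customers_lost, customers_start and subscription_events
```

This is the kind of traversal an automated lineage tool performs across the whole BI infrastructure: once the graph is captured from metadata, finding everything that contributes to ‘churn rate’, or everything affected by a change to a source field, becomes a lookup instead of a manual investigation.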
What rules do you recommend for ongoing compliance?
Manually mapping out our data and trying to glean insights and understand cross-connections (especially for regulatory purposes) is a thing of the past given how many platforms the average enterprise uses, how many connection points there are and how much data each enterprise produces on a daily basis.
The risk of inaccuracy is simply too great, so we have to give our BI teams the support they need and automate the process of mapping data lineage so they can spend more of their time doing what they are actually paid to do: business intelligence.