Could homomorphic encryption be the solution to big data’s problem?


14 Oct 2020

Helical founder Eric Hess discusses how homomorphic encryption could change the way data is transferred and processed securely.

While advances in data analytics have enabled businesses to gain expanded insight into large structured and unstructured datasets, these advances have come with increased privacy and misappropriation risks.

Tighter control over the life cycle of data, together with confidentiality agreements, has mitigated these risks, but outsourcing sensitive or regulated components of data processing to third parties is still widely viewed as fraught with risk.

If all sensitive data or data processes and algorithms could be shared with or processed by any third party (including competitors) subject to the provider’s controls, however, it would open up unimagined avenues of enterprise collaboration, specialisation and integration.

Homomorphic encryption addresses this significant gap and, while commercial viability is still a challenge, compelling use cases are emerging. In the coming years, any organisation endeavouring to become a centre of excellence in big data analytics will have no choice but to embrace homomorphic encryption.

Encryption and its limitations

Encryption is a digital safe where information is secured while locked inside. Plaintext data is converted to ciphertext using an algorithm that is sufficiently complicated to make the data unreadable without a decryption key. It can be stored and transmitted in this format and recipients can decrypt it, provided they have the key. Once encrypted data is needed for analysis, compliance or any other use case, it must be converted back to plaintext, which can sacrifice security.

Homomorphic encryption addresses this core weakness by allowing analysis on data in its ciphertext form. Craig Gentry, an early homomorphic encryption innovator, described the process as manipulating the contents of a locked box through gloves that are accessed through ports on the outside of the box.

One party places and locks contents in the box for a third party to manipulate without seeing what they are working on. The box is returned to the controller when the processor has completed the assigned task and custody is never surrendered.
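The locked-box idea can be illustrated with a deliberately simple sketch. It assumes unpadded 'textbook' RSA with tiny primes, which happens to be multiplicatively homomorphic; this is only a partially homomorphic toy, not Gentry's fully homomorphic scheme, and it is not secure for real use. The function names and key values are hypothetical and chosen purely for illustration.

```python
# Toy illustration only: unpadded RSA with tiny primes is insecure.
# It is used here because multiplying two ciphertexts yields a ciphertext
# of the product of the two plaintexts.

def make_keys(p=61, q=53, e=17):
    """Return (public_key, private_key) for toy RSA (assumed parameters)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, requires Python 3.8+
    return (n, e), (n, d)

def encrypt(public_key, m):
    n, e = public_key
    return pow(m, e, n)

def decrypt(private_key, c):
    n, d = private_key
    return pow(c, d, n)

def multiply_encrypted(public_key, c1, c2):
    """The 'gloves': the processor works on ciphertexts without any key."""
    n, _ = public_key
    return (c1 * c2) % n

public_key, private_key = make_keys()

# The controller locks two values in the box.
c1 = encrypt(public_key, 7)
c2 = encrypt(public_key, 6)

# The processor manipulates the contents without seeing them.
c_product = multiply_encrypted(public_key, c1, c2)

# Only the controller can open the box and read the result.
print(decrypt(private_key, c_product))  # 42
```

The processor only ever handles `c1`, `c2` and the public key, so the plaintext values never leave the controller's custody. The result is only correct while the product stays below the modulus `n`.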

Gentry’s dissertation showed that homomorphic encryption was attainable, but left one major barrier: computational overhead. Processing ciphertext is expensive because the calculations are performed bit by bit. IBM says it has reduced this overhead, claiming it now runs 75 times faster than before, and a wide range of alternative schemes have further improved processing speeds.

Real-life applications of homomorphic encryption

Spurred by the collaborative models being deployed in connection with potential Covid-19 vaccines and treatments, homomorphic encryption will likely experience the highest relative rates of adoption and innovation in clinical research.

Homomorphic encryption can provide a mechanism for the life sciences industry to continue protecting intellectual property while carrying the collaborative benefits seen with Covid-19 over into other medical research.

Use cases will also be compelling for financial services, where data analytics defines the success or failure of algorithms and is becoming increasingly important as relative high-frequency trading advantages become more elusive. National security and critical infrastructure also provide early compelling use cases.

New opportunities will be created for data controllers (those with custody of data) to engage with data processors, as well as collaborative opportunities where the parties are both controllers and processors of data. Collaborative opportunities offer not only the benefits of specialisation but also the promise of ‘data collectives’, whose members will be able to define the terms of use and the outputs disclosed among themselves.

Data collectives are not a new concept to securities markets. For example, in 2005 the US Securities and Exchange Commission required regulated securities markets to act jointly to disseminate consolidated information on quotations and transactions in securities markets.

Now, homomorphic encryption could empower competitive financial firms to not only provide alternatives to these sources, but innovate collectively to create their own proprietary market data products.

Machine learning and the cloud

For all the promise of machine learning, the process of training and tuning machine learning applications requires big datasets.

Industry collectives could aggregate encrypted data and assign processes to collective members or vendors. Not only would this permit greater specialisation, but the collective dataset would accelerate machine learning in a way that additional computing power or PhDs cannot.

A recent IBM case study leveraging machine learning on a homomorphically encrypted database sought to predict whether bank customers would likely need a loan in the near future. A machine learning algorithm selected the most relevant variables for predicting loan status. The algorithm was trained on both encrypted and unencrypted data to measure accuracy and efficiency. The result was a near identical rate of accuracy and a manageable level of slowdown – a persuasive positive indicator for the arrival of homomorphic encryption’s commercial viability.

Homomorphic encryption will also accelerate the movement of big data analytics to cloud environments. Organisations leveraging big data have been wary of the cloud on security grounds, yet downloading big datasets from the cloud for local processing can be impractical.

On the other hand, performing data processing for their most sensitive data in the cloud also requires storing the data encryption key in the cloud, making an organisation’s security only as strong as the cloud environment. With homomorphic encryption, processing can occur in ciphertext form in the cloud with encryption keys stored offline.
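A minimal sketch of that split, with the key kept offline and the cloud seeing only ciphertext, might look like the following. It assumes the Paillier cryptosystem, which is additively homomorphic, with toy-sized primes; it is a partially homomorphic stand-in rather than a fully homomorphic scheme, and `KeyHolder` and `cloud_weighted_sum` are hypothetical names used only for illustration.

```python
import math
import secrets


class KeyHolder:
    """Stays offline: the only party that ever holds the private key."""

    def __init__(self, p=1009, q=1013):  # toy primes, far too small for real use
        self.n = p * q  # public modulus, safe to share with the cloud
        self._lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self._mu = pow(self._lam, -1, self.n)  # requires Python 3.8+

    def encrypt(self, m):
        """Paillier encryption with generator g = n + 1 and fresh randomness."""
        n2 = self.n * self.n
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return (pow(self.n + 1, m, n2) * pow(r, self.n, n2)) % n2

    def decrypt(self, c):
        n2 = self.n * self.n
        return ((pow(c, self._lam, n2) - 1) // self.n * self._mu) % self.n


def cloud_weighted_sum(n, ciphertexts, weights):
    """Runs in the cloud: needs only the public modulus and ciphertexts.

    Multiplying ciphertexts adds the hidden plaintexts; raising a ciphertext
    to a non-negative integer weight multiplies its hidden plaintext by it.
    """
    n2 = n * n
    total = 1  # 1 is a valid (trivial) encryption of zero
    for c, w in zip(ciphertexts, weights):
        total = (total * pow(c, w, n2)) % n2
    return total


owner = KeyHolder()
encrypted_values = [owner.encrypt(v) for v in [3, 5, 10]]

# The cloud computes 2*3 + 4*5 + 1*10 without ever seeing 3, 5 or 10.
encrypted_result = cloud_weighted_sum(owner.n, encrypted_values, [2, 4, 1])

print(owner.decrypt(encrypted_result))  # 36
```

Only `owner.n` and the ciphertexts cross into the cloud function, so a compromise of the cloud environment would not expose the decryption key. Results are only meaningful while the true weighted sum stays below `n`.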

Privacy needs to be re-examined

Many initiatives endeavouring to harness the power of big data have struggled with resource limitations, current technologies and regulations. Take, for example, financial regulators who struggle with the burdens of monitoring financial audit trails across multiple markets, asset types and participants.

Aggregating and disseminating this data to regulators is critical for surveillance, but creates a treasure trove of highly sensitive, unencrypted data while it is processed, and this occurs across multiple regulators.

This big data problem, and the risk that this information will be used to engage in manipulative trading or even destabilise financial markets, will only continue to grow unless encryption is deployed throughout the data’s life cycle. In fact, regulators only require the audit trails related to red flags that their surveillance algorithms identify, and that identification can all be done in a fully encrypted format.

The tension between privacy regulation and the value of data analytics is also an issue that the healthcare industry has struggled with.

Fragmentation of health information is compounded by privacy concerns, which are a significant roadblock to data sharing and have prevented the integration of health data that could facilitate better health outcomes. The utility of digital health information systems could be greatly enhanced by the deployment of homomorphic encryption.

Encrypted processing will create new opportunities, applications and even industries by greatly minimising intellectual property and regulatory concerns. It may even turn competitors into collaborators.

Homomorphic encryption will also force a re-examination of baseline assumptions related to confidentiality and security. How will restrictions on disclosure apply to encrypted processing by third parties? What are appropriate access controls where the entire life cycle of data is encrypted? What is ‘reasonable security’ for processors of such data?

Privacy regulation will need to be re-examined in light of personal information being mined in an encrypted format. If an organisation is prohibited from sharing or selling data, what are the legal implications of its sharing and processing encrypted data that is never exposed?

Lastly and importantly, how will we know that the technologies we deploy, both for homomorphic encryption itself and for homomorphically encrypted processes, comply with the applicable laws, standards and obligations? Solutions will need to be auditable by design.

Homomorphic encryption is about more than big data. It is about solving for trust with tools that have never been available before and for which no similar workaround existed.

By Eric Hess

Eric Hess is the founder of Hess Legal Counsel and Helical. Hess Legal advises securities and digital asset firms on contract, security and privacy, governance, technology licensing and financing issues. Helical offers a cybersecurity-as-a-service platform.