Platform from Harvard and MIT could make it easier to control your data

21 Feb 2019

MIT and Harvard researchers have developed Riverbed, a platform that ensures web services comply with individual personal data restrictions.

When we use the internet on mobile or desktop, our personal data is generally stored on remote servers. Everything from photos and fitness data from wearables to social media profiles is held on servers spanning the globe.

Many technology services aggregate the datasets across the servers to gather insights about user shopping habits, or share data with advertisers. In general, users who sign up to these services have little control over how their data is processed. This may soon change.

More control over personal data

A new platform developed by researchers at MIT and Harvard University, called Riverbed, essentially forces servers to use personal data only in ways that users have explicitly approved. While it is still at the proof-of-concept stage, it could offer a promising way for people to manage their data permissions.

In a paper scheduled to be presented at the USENIX Networked Systems Design and Implementation conference, researchers outlined how it would operate.

“Users give a lot of data to web apps for services, but lose control of how the data is used or where it’s going,” said first author of the paper, Frank Wang, a PhD graduate of the Department of Electrical Engineering and Computer Science, and the Computer Science and Artificial Intelligence Laboratory at MIT.

“We give users control to tell web apps – ‘This is exactly how you can use my data.’” Wang was joined on the paper by PhD student Ronny Ko and associate professor of computer science James Mickens, both of Harvard.

How does it work?

With Riverbed, your web browser or smartphone app does not communicate directly with the cloud. Instead, a Riverbed proxy runs on your device and mediates the exchange.

When a service such as Facebook or Twitter tries to upload user data to a remote location, the proxy tags the data with a set of permissible uses – this is called a ‘policy’.

A user can choose any number of predefined restrictions, such as ‘do not store my data on persistent storage’ or ‘my data may only be shared with the external service [domain name]’. The Riverbed proxy tags all this data with the selected policy.
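The tagging step can be pictured with a short Python sketch. This is only an illustration and not Riverbed's actual code: the `Policy` and `TaggedUpload` classes, the `tag_upload` function and the `partner.example` domain are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: the uses of the data that the user permits."""
    allow_persistent_storage: bool = True
    # Assumption for this sketch: an empty set means no external sharing.
    shareable_domains: frozenset = frozenset()


@dataclass
class TaggedUpload:
    """Data bundled with the policy that travels with it."""
    payload: bytes
    policy: Policy


def tag_upload(payload: bytes, policy: Policy) -> TaggedUpload:
    """Attach the user's chosen policy before the data leaves the device."""
    return TaggedUpload(payload=payload, policy=policy)


# A user who forbids disk writes and allows sharing with one named service.
user_policy = Policy(
    allow_persistent_storage=False,
    shareable_domains=frozenset({"partner.example"}),
)
upload = tag_upload(b"step count: 8042", user_policy)
```

The point of the sketch is that the policy is attached on the device, so every piece of data reaching the server already carries its permitted uses.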

On the data centre side, Riverbed assigns the uploaded data to an isolated cluster of software components. Each cluster processes only data tagged with the same policy.

One cluster could process data that cannot be shared with other services, while another could manage data that the user has asked not to be written to a disk. Riverbed monitors the server-side code to make sure it complies with the user policies – if it does not, Riverbed terminates the service.
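As a rough illustration of that enforcement step, the Python sketch below checks each sensitive operation against the cluster's policy and shuts the cluster down on the first violation. The `Cluster` class, its method names and the `PolicyViolation` exception are assumptions made for this example; the article does not describe how Riverbed implements its checks.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when code in a cluster attempts a use the policy forbids."""


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy, mirroring the two example restrictions above."""
    allow_persistent_storage: bool = True
    shareable_domains: frozenset = frozenset()


class Cluster:
    """Hypothetical isolated cluster bound to a single policy."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.terminated = False

    def _require(self, allowed: bool, reason: str) -> None:
        if self.terminated:
            raise PolicyViolation("cluster already terminated")
        if not allowed:
            # First violation: mark the cluster as terminated, then refuse.
            self.terminated = True
            raise PolicyViolation(reason)

    def write_to_disk(self, data: bytes) -> int:
        self._require(self.policy.allow_persistent_storage,
                      "persistent storage not permitted")
        return len(data)  # stand-in for a real disk write

    def send_to(self, domain: str, data: bytes) -> str:
        self._require(domain in self.policy.shareable_domains,
                      f"sharing with {domain} not permitted")
        return domain  # stand-in for a real network send


# A cluster for users who asked that their data never be written to disk.
no_disk = Cluster(Policy(allow_persistent_storage=False))
try:
    no_disk.write_to_disk(b"photo bytes")
except PolicyViolation as err:
    violation = str(err)
```

After the failed write, `no_disk.terminated` is `True` and any further call on that cluster is refused, while clusters bound to other policies are unaffected.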

The aim is to enforce user data preferences, while maintaining advantages of cloud computing, such as performing large-scale computations on outsourced servers.

GDPR compliance

Since GDPR came into force, many web developers have said there is not enough guidance for writing apps that are sophisticated enough to leverage user data while still complying with the regulation. Computer scientists have designed information flow control (IFC) systems, which label program variables with data policies, but these have been difficult to implement at scale.

With Riverbed, an app’s server-side code runs on top of a special ‘monitor’ program that tracks, regulates and verifies how other programs manipulate your personal data. This monitor creates a separate copy of the app’s code for each unique data policy – each copy is called a ‘universe’.

The monitor ensures that users who share the same policy have their data uploaded to, and manipulated by, the same universe. This method enables the monitor to terminate a universe’s code if that code attempts to violate the universe’s data policy.
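The one-universe-per-policy idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers' implementation: the `Monitor` and `Universe` classes are hypothetical, and a policy is modelled here as a plain `frozenset` of restriction strings so that it can serve as a dictionary key.

```python
class Universe:
    """Hypothetical universe: one copy of the app tied to one policy."""

    def __init__(self, policy: frozenset) -> None:
        self.policy = policy
        self.users = []
        self.terminated = False


class Monitor:
    """Hypothetical monitor that keeps exactly one universe per policy."""

    def __init__(self) -> None:
        self.universes = {}

    def admit(self, user: str, policy: frozenset) -> Universe:
        # Create the universe lazily the first time a policy is seen.
        if policy not in self.universes:
            self.universes[policy] = Universe(policy)
        universe = self.universes[policy]
        universe.users.append(user)
        return universe

    def terminate(self, policy: frozenset) -> None:
        # Only the offending universe stops; the others keep running.
        self.universes[policy].terminated = True


NO_DISK = frozenset({"no-persistent-storage"})
SHARE_ONLY = frozenset({"share-only:partner.example"})

monitor = Monitor()
a = monitor.admit("alice", NO_DISK)
b = monitor.admit("bob", NO_DISK)       # same policy, same universe as alice
c = monitor.admit("carol", SHARE_ONLY)  # different policy, separate universe
monitor.terminate(NO_DISK)
```

Because every user in a universe shares one policy, a violation can be handled by stopping that universe alone, which is the isolation property the article describes.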

Wang said that, down the line, Riverbed could make compliance with regulations such as GDPR easier for online services. Although the added computation would slow a service down by about 10pc, many organisations could sacrifice that speed to avoid data-related issues.

Wang added: “All users in each universe have the same policies, so you can do all your operations and not worry about what data is put into an algorithm, because everyone has the same policy on data in that universe.”