How on Earth did 540m Facebook users’ details end up on public servers?

4 Apr 2019


This massive data leak could pose one of the most serious existential crises for Facebook. Can the social network put this data genie back in the bottle?

It has emerged that more than 540m Facebook user records were exposed on public servers. Researchers from cybersecurity firm UpGuard discovered two separate sets of user data on Amazon cloud servers.

One set, linked to Mexican media company Cultura Colectiva, contained more than 540m records including comments, likes, reactions, account names, Facebook IDs and more.


A second set was linked to a no-longer functioning app called At the Pool and contained plaintext passwords belonging to 22,000 users.

Hold on a minute, 540m users is a big chunk of Facebook’s 2.3bn global audience. Surely this data was protected?

It is now, but it wasn’t until yesterday. The datasets were not even password-protected and the data was available for anyone to access. It all stems from the kind of access that Facebook used to grant third-party app developers but subsequently clamped down on. The dangerous reality of this kind of access emerged when the Cambridge Analytica scandal boiled over more than a year ago.

What’s Facebook doing about it?

The social network has alerted Amazon to take the user data off its servers. UpGuard said it alerted Cultura Colectiva to the problem in January but received no response. By the end of January, the researchers had alerted Amazon, which in turn alerted the media company again, but the database wasn’t secured until yesterday (3 April).

According to UpGuard, each of the datasets was stored in its own Amazon S3 bucket and allowed public download of files.
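
UpGuard has not published the exact settings on these buckets, so the following is only a minimal Python sketch of one common way an S3 bucket ends up publicly readable: an access control list (ACL) that grants permissions to Amazon's predefined "everyone" groups. It assumes the ACL is a dictionary shaped like the response of S3's GetBucketAcl operation. The function name `public_grants` and both sample ACLs are hypothetical and are not taken from the exposed buckets.

```python
# Predefined S3 group URIs meaning "anyone on the internet" and
# "anyone with an AWS account".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return the sorted permissions an ACL gives to public groups.

    `acl` is assumed to look like an S3 GetBucketAcl response:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    found = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        is_group = grantee.get("Type") == "Group"
        if is_group and grantee.get("URI") in PUBLIC_GROUP_URIS:
            found.add(grant.get("Permission", "UNKNOWN"))
    return sorted(found)


# Hypothetical ACLs for illustration only.
open_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}
private_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
            "Permission": "FULL_CONTROL",
        }
    ]
}

print(public_grants(open_acl))     # ['READ']
print(public_grants(private_acl))  # []
```

An empty result here does not prove a bucket is private. Bucket policies and per-object ACLs can also open up access, so a real audit would need to check those as well.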

“Facebook’s policies prohibit storing Facebook information in a public database. Once alerted to the issue, we worked with Amazon to take down the databases,” Facebook said in a statement. “We are committed to working with the developers on our platform to protect people’s data.”

How much data do these buckets contain?

Redacted example of data from the exposed Cultura Colectiva dataset. Image: UpGuard

UpGuard said that the datasets vary in terms of when they were last updated, the data points present and the unique individuals in each.

“What ties them together is that they both contain data about Facebook users – describing their interests, relationships and interactions – that were available to third-party developers. As Facebook faces scrutiny over its data stewardship practices, [it has] made efforts to reduce third-party access.

“But, as these exposures show, the data genie cannot be put back in the bottle. Data about Facebook users has been spread far beyond the bounds of what Facebook can control today. Combine that plenitude of personal data with storage technologies that are often misconfigured for public access, and the result is a long tail of data about Facebook users that continues to leak.”

What will this mean for the future of Facebook?

Well, it’s hard to say. If you value your privacy, should you continue to use the social network? As UpGuard says, the data genie cannot be put back in the bottle.

“These two situations speak to the inherent problem of mass information collection: the data doesn’t naturally go away, and a derelict storage location may or may not be given the attention it requires,” UpGuard said.

“For app developers on Facebook, part of the platform’s appeal is access to some slice of the data generated by and about Facebook users. For Cultura Colectiva, data on responses to each post allows them to tune an algorithm for predicting which future content will generate the most traffic. The data exposed in each of these sets would not exist without Facebook, yet these datasets are no longer under Facebook’s control.

“In each case, the Facebook platform facilitated the collection of data about individuals and its transfer to third parties, who became responsible for its security. The surface area for protecting the data of Facebook users is thus vast and heterogeneous, and the responsibility for securing it lies with millions of app developers who have built on its platform,” UpGuard said.

The issue highlights how Facebook shared this kind of information freely with third-party developers for years before cracking down. It harks back to the Cambridge Analytica affair and the levels of access to its data that the social network granted third-party interests such as app developers and advertisers.

In the case of Cambridge Analytica, researchers sold the data knowing that it violated Facebook’s terms and conditions. But this latest saga shows that app developers who fail to secure data are as dangerous as any malcontent or hacker, and pose a serious threat to an unsuspecting user’s privacy.

The latest situation adds up to another debacle besetting the social media giant and, no matter how many charm offensives its leadership embarks upon, Facebook will never be able to put this genie back in the bottle.

It is also a punch in the eye for proponents of what many detractors call the ‘surveillance economy’, where advertising and e-commerce are predicated on intelligence about users’ every move and desire.

A different future may actually be in the hands of Facebook CEO Mark Zuckerberg, who recently suggested a new type of privacy-focused social network where all messages are encrypted.

“We already see that private messaging, ephemeral stories and small groups are by far the fastest-growing areas of online communication,” he said in early March.

Time to get to work, Zuck.

John Kennedy is a journalist who served as editor of Silicon Republic for 17 years