California computer scientists develop simple data breach detection tool

13 Dec 2017


With data breaches on the increase, this new tool could be useful for companies and organisations.

Computer scientists at the University of California (UC) San Diego have built and successfully tested a tool designed to detect when websites have fallen victim to a data breach, by monitoring the activity of email accounts associated with them.

During the 18-month study period, the researchers found that close to 1pc of the websites they tested had suffered a data breach, regardless of audience size or reach.

Although 1pc doesn’t seem like much at first glance, when you consider that there are more than 1bn sites on the internet, this translates to roughly 10m websites that could be affected by a data breach annually.

A data breach is a case of when, not if

Alex C Snoeren, a professor of computer science at the Jacobs School of Engineering at UC San Diego and the paper’s senior author, said: “No one is above this – companies or nation states – it’s going to happen, it’s just a question of when.”

Researchers found that popular sites were just as likely to be hacked as unpopular ones, which translates to 10 out of the top 1,000 most visited sites on the internet potentially falling victim to a data breach.

Joe DeBlasio, the first author of the paper, said: “1pc of the really big shops getting owned is terrifying.”

The detection tool was presented in November at the ACM Internet Measurement Conference in London. The concept behind the tool is called TripWire.

DeBlasio created a bot that registers and creates accounts on a large number of websites (approximately 2,300 were included in this study). Each account is associated with a unique email address.

The tool was designed to use the same password for both the email account and the website account associated with each email. Researchers then bided their time to see if an outside party used the password to access the email account, which would indicate the website’s account information had been leaked.
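The detection logic described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the class names, the example addresses and the idea of filtering out the researchers' own IP addresses are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoneyAccount:
    """One decoy registration: a unique email address tied to a single website."""
    email: str
    website: str
    password: str  # deliberately reused for the email account and the site account


@dataclass(frozen=True)
class LoginEvent:
    """A successful login to an email account (hypothetical provider log entry)."""
    email: str
    source_ip: str


def detect_breached_sites(accounts, logins, researcher_ips):
    """Return the websites whose decoy email account was accessed by an outsider."""
    site_by_email = {account.email: account.website for account in accounts}
    breached = set()
    for event in logins:
        if event.source_ip in researcher_ips:
            continue  # the researchers' own access is not evidence of a leak
        website = site_by_email.get(event.email)
        if website is not None:
            breached.add(website)
    return breached


# Illustrative data only; none of these names come from the study.
accounts = [
    HoneyAccount("decoy-001@mail.example", "shop.example", "Website1"),
    HoneyAccount("decoy-002@mail.example", "forum.example", "Monitor7"),
]
logins = [
    LoginEvent("decoy-001@mail.example", "203.0.113.9"),   # unknown party
    LoginEvent("decoy-002@mail.example", "198.51.100.4"),  # researchers' own check
]
print(detect_breached_sites(accounts, logins, researcher_ips={"198.51.100.4"}))
```

Because each email address is registered on exactly one website, a single unexpected login is enough to point to the site that leaked the password.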

Researchers also had to ensure that any breach stemmed from a hacked website rather than from the email provider or their own infrastructure. To do so, they set up a control group of more than 100,000 email accounts created with the same provider used in the study. These addresses weren’t used to register on websites, and none of them were found to have been accessed by hackers.

In total, 19 websites were determined to have been hacked, including a well-known US start-up with more than 45m active customers. Once the breaches had been detected, the researchers warned the security teams of the affected sites, and emails and phone calls were exchanged.
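The figures quoted in this article can be checked with some quick arithmetic. The short Python sketch below uses only numbers reported here (19 breached sites out of approximately 2,300 tested, more than 1bn sites on the internet); the variable names are illustrative, and the extrapolation assumes the rounded 1pc rate applies uniformly.

```python
breached_sites = 19
tested_sites = 2_300  # approximate figure given in the article

# Observed breach rate in the study, a little under 1pc.
observed_rate = breached_sites / tested_sites
print(f"Observed breach rate: {observed_rate:.2%}")

# Extrapolations using the rounded 1pc figure.
sites_on_internet = 1_000_000_000
top_sites = 1_000
print(f"1pc of all sites: {sites_on_internet // 100:,}")
print(f"1pc of the top 1,000: {top_sites // 100}")
```

The observed rate works out at roughly 0.83pc, which the article rounds to "close to 1pc".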

Snoeren said he was “heartened” by the serious response from the large sites that had been affected, but was surprised that none of the affected sites acted on the results of the study by disclosing their respective breaches to customers.

He continued: “The reality is that these companies didn’t volunteer to be part of this study.

“By doing this, we’ve opened them up to huge financial and legal exposure. So, we decided to put the onus on them to disclose.”

Breaches often used for data harvesting

Very few of the breached accounts were used to send spam. Instead, hackers mostly monitored email traffic, which researchers speculated was in order to harvest valuable information such as credit card or banking details.

Researchers then took things up a notch, creating at least two accounts per website. One account had an easy password: a seven-character word with a capitalised first letter and a single digit at the end. The other account had a hard password: a 10-character string of numbers and letters in both upper and lower case.
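As a rough illustration of the two password formats, the Python sketch below generates one of each. The word list is hypothetical (the study's dictionary is not given here), and it assumes the digit is appended to the seven-letter word, giving eight characters in total.

```python
import secrets
import string

# Hypothetical word list for illustration; not taken from the study.
SEVEN_LETTER_WORDS = ["website", "monitor", "journal", "network", "capture"]


def easy_password():
    """Seven-letter word, capitalised first letter, single digit at the end."""
    word = secrets.choice(SEVEN_LETTER_WORDS)
    return word.capitalize() + secrets.choice(string.digits)


def hard_password(length=10):
    """Random string of digits and letters containing both upper and lower case."""
    alphabet = string.ascii_letters + string.digits
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until the string contains a lower-case letter, an upper-case
        # letter and a digit, so it matches the description of a hard password.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)):
            return candidate


print(easy_password())
print(hard_password())
```

An easy password of this shape falls quickly to a dictionary attack, while the hard one is impractical to guess, which is what makes the comparison between the two accounts informative.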

Seeing which of the two accounts was breached allowed researchers to make an educated guess about how websites store passwords. If both the easy and hard passwords were hacked, the website likely just stores them in plain text, in violation of best practice. If only the account using the easy password was breached, the site likely uses a more sophisticated method of password storage: an algorithm that turns passwords into a random-looking string of data, with random information added (a technique known as salted hashing).
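That reasoning amounts to a simple decision rule, sketched below in Python. The function name and the wording of the results are illustrative assumptions; the article does not say how the researchers treated the case where only the hard password was used, so the sketch marks it as inconclusive.

```python
def infer_password_storage(easy_breached: bool, hard_breached: bool) -> str:
    """Educated guess about a site's password storage from which accounts were accessed."""
    if easy_breached and hard_breached:
        # Even the hard-to-guess password was recovered.
        return "likely plain text"
    if easy_breached:
        # Only the guessable password was recovered, e.g. by a dictionary attack.
        return "likely hashed"
    if hard_breached:
        # Not covered by the article; treated as inconclusive in this sketch.
        return "inconclusive"
    return "no breach observed"


print(infer_password_storage(easy_breached=True, hard_breached=True))
print(infer_password_storage(easy_breached=True, hard_breached=False))
```

The result is only a likelihood: an attacker who holds both passwords might simply not have tried the second account.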

The researchers had some advice for users: don’t reuse passwords, use a password manager and question how much you really need to disclose online.

Snoeren asked: “Why do they need to know your mother’s real maiden name and the name of your dog?”

The researchers hope that companies avail of the tool themselves, and said any major email provider could provide the service. By using the tool, an organisation could be better armed against a data breach.