Twitter a ‘playground’ for research ahead of first Twitter data-mining conference

21 Apr 2015

The world’s first global conference on mining Twitter data for the benefit of research starts tomorrow (22 April) at the EMLYON Business School in France.

We’re all part of a giant social experiment, and you might not even know it. With an estimated 288m monthly active users, Twitter is fast establishing itself as a place where news and reactions to news are shared, and its influence is now being realised not just by businesses, but by academic researchers too.

The term ‘big data’ is bandied about frequently, with Twitter given as one of the prime examples: the information people share there about their psyche, interests or preferences in consumer goods can feed market research studies that would have been almost impossible to carry out before the site existed.

Even Twitter itself is well aware of the potential of tapping into the 500m-plus tweets sent daily, having revealed last year that US$70m of its US$1.2bn in revenues during that period was derived from selling customer data.

While its benefits are obvious from a marketing standpoint, academic researchers, including Dr Clément Levallois, organiser of the upcoming conference, see the micro-blogging site as offering an almost truer reflection of society, a view that would otherwise be limited by geographical boundaries.

“It’s very tempting and very fascinating for scientists to construct [Twitter] networks: who follows who, who mentions who, in order to discover social structures,” he said, speaking to Siliconrepublic.com.

“Then you can draw from it interesting conclusions, including political movements or intellectual trends or how the physical geography is actually quite different to the geography of the social networks.”

Breaking artificial boundaries between academic fields

The conference will take place over three days, bringing together researchers from a number of countries, as well as two of Twitter’s top data miners: data scientist Dr Jeff Kolb and Romain Huet, a Twitter developer advocate who campaigns for solutions that let developers interface Twitter with their apps.

Topics due to be discussed include ‘Twitter data for urban policy making’, ‘Sentiment-based spam tweets detection’ and ‘The brand or a spokesperson, who should tweet? When brand conversation leads to brand humanisation’.

“I myself as a researcher who travels between different fields felt I would be very comfortable in a conference where the main topic would be Twitter, without artificial boundaries put by the fact that person comes from biology or that person comes from computer science; all on an equal footing with a shared interest in Twitter and the opportunities it opens for research,” Dr Levallois said of his reasoning for establishing the conference.

Describing Twitter as a ‘textual treasure’, he said that Twitter is now pushing researchers’ work ‘to the next stage’, but he resists the temptation to see a future where traditional academic research dies off in favour of mass data mining. “It’s a new playground basically, and a huge one,” he said.

Dr Clément Levallois, assistant professor at EMLYON Business School

Privacy: Is it right to harvest data en masse from users?

Of course, mining tweets en masse is a privacy minefield: people may be unwittingly participating in a survey that could reveal personal details when compiled not just from their tweets, but from where they’re located and anything else they may happen to reveal on the site.

If researchers are only looking to discover patterns across vast amounts of data, how do they ensure that individual users remain anonymous in the data they gather?

After all, participants in a standard study are given the utmost assurance that their participation remains unknown to anyone reading the material. That is why researchers, including those from the Virginia Bioinformatics Institute of Virginia Tech, have proposed guidelines for researchers to follow, including making “objectives, methodologies, and data handling practices transparent and easily accessible”.

Agreeing with the need for privacy, Levallois said that it’s an issue not just for researchers, but companies as well: “Big data sets collected by any major company now open the same kind of issues [among researchers] such as anonymity and how thoroughly can data be anonymised?

“Even if a tweet has been anonymised, you might be able to reconstruct the identity of the person who tweeted if you can reconstruct their social networks. These are the issues that you encounter in 2015, not just on Twitter specifically.”

Twitter is a primordial soup, waiting to be tapped

Given its interest in harnessing computing power and big data, EMLYON Business School announced earlier this year that it had signed a deal with IBM to create partnership links between the company and the school’s researchers to better tap into large datasets, including those provided by Twitter, with the help of IBM’s supercomputer, Watson.

All of this potential appears to excite Dr Levallois as a researcher, not just in terms of where this extreme processing power can take big data through online networks like Twitter, but in terms of where the technology itself can go.

“Sometimes these things are so great that we simply don’t know what kind of business models or what kind of new hardware or what kind of new meaningful artificial intelligence will arise from these building blocks that we have now,” he said.

“Like the primordial soup, we have all these ingredients and we will see what kind of life will arise from it. But my take on it is that I’m very much trying to be in the middle of it so that I can catch the emergent forms as soon as they arrive.”

Lyon image via Damien/Flickr

Colm Gorey was a senior journalist with Silicon Republic
