Scaling up and down: The R&D behind data science in Ireland

30 Sep 2015

The explosion in data science reaches the end user last: invariably, industry and R&D bodies have been shaping this moving beast from the earliest possible stage. But what is happening behind the scenes in Ireland?

Research into data science, be it big or small, can be found across pretty much every educational institution in the country.

Be it software architecture, social studies, or one of the many ways to extract personalised extras from the mounds of data we produce as individuals every day, it looks like research into how data is created, collated and consumed will continue for eternity.

Look at the numbers

Just look at the numbers and you can already see that Ireland has the edge when it comes to capturing a slice of the most significant transition in business and technology history: the onset of the data-driven economy.

We’re home to some of the biggest IT companies in the world, with data centres, EMEA headquarters and, notably, data HQs for the likes of Facebook, Apple, Google and more based here.

But behind the curtain, what’s going on? Every year or so the business model behind how Google or Facebook use their immense reserves of user data seems to alter. Why?

Talk to the experts

Well, we spoke to one or two researchers around the country to find out where this is all heading, and we found it was a two-way street.

“Scale is the big target,” explained the Insight Centre for Data Analytics’ Professor Nial Friel, whose own research project has the catchy title of Advances for the probabilistic analysis of network data.

Through a four-year SFI-funded investigator project, Friel and his colleagues are looking at developing statistical models and inferential techniques for large networks.

The reason behind the project, he said again, is scale.

Creaking architecture

“The starting point for us was that statistical methods for network data usually only apply to small networks — like friendship ties in a social network.

“That can generate complicated and highly structured networks, and the statistical methods don’t scale very well to larger networks, which is what we want. We want to develop next-generation statistical models for larger data.”

If you have ‘n’ people in a network, then the total number of possible connections in the network scales as ‘n²’.

“And the number of possible networks scales as two to the power of n², a huge number!”*
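To put rough figures on that arithmetic, here is a minimal sketch in Python. It assumes an undirected network with no self-ties, so ‘n’ people allow n(n-1)/2 possible ties (which grows like n²) and two to that power possible networks. The function names are our own illustration and are not taken from Friel’s project.

```python
def possible_ties(n: int) -> int:
    """Distinct pairs among n people in an undirected network: n(n-1)/2."""
    return n * (n - 1) // 2


def possible_networks(n: int) -> int:
    """Each possible tie is either present or absent: 2 ** (n(n-1)/2)."""
    return 2 ** possible_ties(n)


if __name__ == "__main__":
    for n in (5, 10, 50):
        ties = possible_ties(n)
        # Python integers are arbitrary precision, so the count is exact;
        # we report only its number of decimal digits to keep output short.
        digits = len(str(possible_networks(n)))
        print(f"n={n}: {ties} possible ties, network count has {digits} digits")
```

For just five people that is already 10 possible ties and 1,024 possible networks, which is why methods built for small friendship networks struggle as ‘n’ grows.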

Hot topic

Scale, scale, scale. Data, data, data.

“It’s a massively hot topic, not just for network data but for big data generally,” he said.

Will this stretch into areas like AI? “Quite possibly,” he explained. “We’re not thinking in terms of AI but it is one of the big success stories of statistics and machine learning.”

Google Translate is one example Friel gives where AI and its extensions have proved remarkably successful. “Google’s speech recognition technology as well,” he said. “The amount of data they’re working on is frightening.”

What direction are you headed?

Of course, scaling up is only one of the two routes researchers are taking, with a colleague of Friel’s heading in the opposite direction.

Human analytics appears to be at the opposite end of the spectrum: big data on a personal, personalised level.

Most big data applications are simple, you see. To describe an expansive area of the digital economy in deliberately basic terms: you take data, slice it and dice it, combine it with another reading and tell a story.

Different meanings

“But the ‘big’ in big data doesn’t necessarily mean size,” explained Insight’s Alan Smeaton. “It can mean complexity. Scaling down is important. How the data is sliced, diced, contextualised, for the user, is key.”

Human analytics is essentially what Smeaton is pressing home. If you drill down, each of us produces data every day.

We carry phones, watches, wearables and even implants such as cochlear implants and pacemakers that are buzzing away and generating our own, personal data.

The healthy approach

Think health and wellness, something many people think is simply going to the doctor when they feel sick. However, researchers are looking at the bigger picture, at preventative tools.

“I monitor my sleep, and my daily activity,” said Smeaton, who previously relied on a Fitbit before moving over to an Apple Watch he claims he’s still “trying to make sense of”.

By recording our activities through sensors, we’re creating digital diaries, he explained, with simple, off-the-shelf devices that are maybe not being utilised to the fullest.

“Think of grazing your knuckles,” he said. “That can reveal the interstitial fluid in between skin and blood. At Insight we’re looking at that, drilling into it to learn its composition. We want to analyse it.”

pH, diet and context

When you perspire, what is the pH level of your sweat? That, on its own, means nothing, but add some context like daily diet, activity or sleep and you can establish hydration levels and health indicators.

“Your house is valued at a certain price on Daft,” he said. “But that means nothing without looking at the area, the amenities, etc. You add context and all of a sudden this data, this human analytics, makes sense. That’s what’s really exciting here.”

And it’s personalisation that we got first-hand experience of last week when we messed around with a new digital footprint app called BigFoot, developed by the ADAPT Centre for Digital Content Technology at TCD and AI start-up Aylien.

BigFoot, a real eye-opener

BigFoot is quite simple in its concept, retrieving your social media activity across Twitter, Facebook and Instagram and combining it into a visualisation that shows just how marketing companies may be able to work you out.

“Data science is more than just big data… there are individuals involved,” ADAPT’s Professor Owen Conlan told us.

“Personal analytics are things that people want to get involved in. BigFoot is just one example of that, but it’s more – it’s about learning about trends and what you are about.”

Personalisation and AI

Personalisation is Conlan’s primary concern: when people learn about their own interactions, it leads to what he calls ‘reflection’.

If you are shown particular content that you don’t like, you might minimise the window on a computer screen. Data science should develop tools that understand that, explained Conlan, so that you react to content and it reacts to you.

“A really good example is Petri,” he said of a project he is working on in ADAPT. “Petri allows people who run forums to identify the outstanding individuals that are helping to answer questions and shape the narrative.”

If you think about customer care forums, for example, this could mean companies learn who is performing best within their labour force, and how they are doing it.

Extrapolating this out, which is what research bodies such as ADAPT, Insight, CeADAR and the like are doing right now, is the future.

A simple train trip

“I’m on the phone right now, travelling on the train. Something is tracing this route for me,” said Conlan just before we ended our chat.

“I work on my route. With analytics I know I was active on Word or answering emails when I went in to work today.

“So it was a productive trip, rather than a dead trip. Using data right can be something you use to prove your productivity and it can play into your work-life balance.

“I think this is all so important. So much of our lives are all about data.”

*UPDATE: This article was updated at 10.15 to amend a quote attributed to Professor Nial Friel.

Siliconrepublic.com’s Data Science Week brings you special coverage of this rapidly growing field from 28 September to 2 October 2015.

Main globe image and data centre image via Shutterstock

Gordon Hunt was a journalist with Silicon Republic
