‘In Ireland, we cannot produce data scientists fast enough’

20 Jun 2018

Andrew Parnell, Hamilton professor at Maynooth University. Image: SFI

Prof Andrew Parnell of Maynooth University is aiming to see the data from the mathematical trees to make predictions that could not only save time, but lives as well.

Trying to make sense of the reams of data available to the average researcher is challenging at the best of times, but new tools are constantly being developed to help them get to grips with it and possibly make major breakthroughs in the process.

One of those researchers is Andrew Parnell, a Hamilton professor at Maynooth University and the deputy director of machine learning and statistics at the Science Foundation Ireland-funded Insight Centre for Data Analytics.

After obtaining a bachelor’s degree in mathematics and management science, Parnell went on to complete a master’s degree in statistics from the University of Kent in 2000.

In 2008, he became a senior lecturer in statistics at University College Dublin (UCD) and in 2017 co-founded the NovaUCD start-up Prolego Scientific, now acting as its chief scientific officer.

What inspired you to become a researcher?

I started writing computer code in BASIC at home, on a Commodore VIC-20 that my brother bought when I was in my early teens. I still remember the first time I wrote a simple program that printed text on the screen, and the feeling of creation and reward that gave me.

Much later at school and then at university, I learned about the ways you could use the tools of maths to make these programs faster, more elegant and visually impressive. In The Blind Watchmaker, Richard Dawkins showed me how to use these tools to make wonderfully complex shapes and patterns from very simple rules.

To this day, I still love using those tools to discover patterns in datasets that are not visible to the naked eye, but easy to uncover using maths, statistics and computer code. I find it wonderfully liberating that anybody can develop these methods using a €50 computer without any need for expensive lab equipment.

Can you tell us about the research you’re currently working on?

My current research interest is in mathematical trees. A mathematical tree is made up of branches and leaves just like an ordinary tree, but the mathematical trees I develop are used for making decisions.

In the early 2000s, a famous statistician called Leo Breiman developed a method for using multiple decision trees to create predictions. He called the method ‘random forests’ and it has become one of the foundational methods in machine learning, outperforming neural networks and deep learning in many scenarios.

Unfortunately, one of the downsides of the random forests method is that it doesn’t do a very good job of making predictions with uncertainty. In certain circumstances, such as predicting the next movie you will watch on Netflix, uncertainty doesn’t matter much.

But, in other circumstances, such as predicting the size of a tumour, you really want to know how uncertain that prediction is. My team develops new versions of random forests that make predictions with uncertainties. We want to take these new techniques and apply them to new areas that are important for society and the economy.
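To make the idea of combining many trees concrete, here is a minimal sketch in Python. It is not Parnell's method and not Breiman's full algorithm: it only bags tiny one-split trees on made-up one-dimensional data, and it uses the spread of the individual tree predictions as a rough indicator of disagreement rather than a properly calibrated uncertainty. All function names and data values are assumptions chosen for illustration.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a "stump") by minimising squared error."""
    best = None
    # Try a split after every distinct x value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in this resample was identical: predict the overall mean.
        mean = statistics.fmean(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    """Return the stump's prediction for a single input x."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=200, seed=0):
    """Fit each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_with_spread(forest, x):
    """Average the trees' predictions and report how much they disagree."""
    preds = [predict_stump(stump, x) for stump in forest]
    return statistics.fmean(preds), statistics.stdev(preds)


# Made-up toy data: the response jumps from about 1 to about 3 at x = 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.1, 2.9, 3.2, 3.0]

forest = fit_forest(xs, ys)
for x in (1.0, 4.5, 8.0):
    mean, spread = predict_with_spread(forest, x)
    print(f"x={x}: prediction {mean:.2f}, spread across trees {spread:.2f}")
```

The averaged prediction is what a plain ensemble reports. The spread across trees hints at where the trees disagree, but it is only a crude stand-in for the kind of uncertainty Parnell describes.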

What commercial applications do you foresee for your research?

The huge advantage of working in the areas of mathematics, statistics and computer science is that the work we do here can apply to any area.

For this research project, I have chosen to focus on four areas of key benefit to the Irish economy. They are:

  • improving health and wellbeing of racehorses with the equine genomics company Plusvital
  • better classification of products in online shopping websites with Clavis Insight (recently acquired by Ascential)
  • reduced downtime of websites through better web monitoring with Littledata
  • improved decision-making in closed-loop control systems (eg air traffic control) with IBM

The methods lying at the heart of these improvements can be applied anywhere and may have far wider use and benefit to Irish society than the problems listed above.

What are some of the biggest challenges you face as a researcher in your field?

In Ireland, we cannot produce data scientists fast enough.

At Maynooth University, we have an MSc in data science and analytics, which is in high demand every year. These students are snapped up by the data science companies in Ireland before they have even finished their degree.

By contrast, the rewards for studying for a PhD, while great, are not immediately obvious to a student being offered €70,000-plus straight out of university. Attracting good PhD students is key to keeping this field at the state of the art in Ireland.

Are there any common misconceptions about this area of research?

There is a weird fashion right now for the terms deep learning, artificial intelligence and machine learning, which people associate with computer science, while subjects like statistics and mathematics get left behind.

What most people don’t realise is that the people who developed these methods – such as Geoff Hinton who developed deep learning – have very strong backgrounds in statistics and maths.

You need to study these technical, currently unfashionable subjects to be good at the fashionable ones.

What are some of the areas of research you’d like to see tackled in the years ahead?

With recent developments, we now have the ability to analyse and create predictions on very large datasets. However, we still don’t have the ability to create predictions on datasets of that size when we also want uncertainty.

There is a method for doing this on smaller datasets, known as Bayesian inference, named after an 18th-century Presbyterian minister called Thomas Bayes.

The challenge is to get big data and Bayes together. This research challenge – ‘Big Bayes’ if you like – is what I and many other scientists will be working on for the next few years.
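For readers unfamiliar with Bayesian inference, the small-data case can be illustrated with a textbook example: estimating an unknown success rate from a handful of yes/no outcomes. The sketch below uses the standard Beta-Binomial update, in which a Beta(a, b) prior combined with observed successes and failures gives a Beta posterior. The counts and function names are invented for illustration and are not drawn from Parnell's research.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Combine a Beta(prior_a, prior_b) prior with observed counts."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: the point estimate of the rate."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution: the uncertainty."""
    return ((a * b) / ((a + b) ** 2 * (a + b + 1))) ** 0.5


# Start from a uniform prior, Beta(1, 1), meaning no initial preference.
# Hypothetical small study: 7 successes and 3 failures.
small_a, small_b = beta_binomial_update(1, 1, successes=7, failures=3)

# Hypothetical larger study with the same observed proportion.
large_a, large_b = beta_binomial_update(1, 1, successes=700, failures=300)

for label, a, b in (("small", small_a, small_b), ("large", large_a, large_b)):
    print(f"{label}: estimate {beta_mean(a, b):.3f}, uncertainty {beta_sd(a, b):.3f}")
```

Both studies give a similar estimate, but the posterior for the larger one is much narrower. This simple update is cheap at any size; the difficulty behind ‘Big Bayes’ arises with richer models, where no such closed-form shortcut exists.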