Big data can solve big problems, but only if computing can keep up 

24 Mar 2016

Joshua New from the Center for Data Innovation asks if enough is being done to ensure computing power keeps up with the amount of data the world is producing.

The scientific community, particularly in fields such as particle physics and genetic sequencing, is using vast quantities of data to tackle some of the most challenging problems known to science. But analysing massive amounts of data requires an enormous amount of computing power, and researchers cannot do this without high-performance computing (HPC), also known as supercomputing.

The US government has been at the forefront of funding some of the world’s most advanced HPC systems and, as of November 2015, five of the top 10 most powerful supercomputers in the world were located in the US.

But as bigger data allows scientists to solve bigger problems, countries that wish to lead the data economy should ensure that computing power keeps pace, so that it does not become a technical bottleneck for scientific progress.

Mind-boggling data

Scientists rely on HPC to solve problems in a wide variety of data-intensive disciplines.

For example, the European Organisation for Nuclear Research’s (CERN) Large Hadron Collider produces millions of particle collisions a second, each of which generates about a megabyte of data. Even after keeping only the one percent of data they need, scientists are left with 30 petabytes (30m gigabytes) to analyse.
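To make the scale concrete, the short sketch below converts the article’s 30-petabyte figure into gigabytes. It assumes decimal (SI) units, where one petabyte is a million gigabytes, which is how the article’s “30m gigabytes” works out; the record count at the end simply divides that volume by the article’s one-megabyte-per-collision figure and is an illustration, not a reported CERN statistic.

```python
# Decimal (SI) storage units, as the article's "30m gigabytes" implies.
GB_PER_PB = 10**6
MB_PER_GB = 10**3


def petabytes_to_gigabytes(pb: int) -> int:
    """Convert petabytes to gigabytes using decimal prefixes."""
    return pb * GB_PER_PB


retained_pb = 30  # petabytes left to analyse, per the article
retained_gb = petabytes_to_gigabytes(retained_pb)

# Illustration only: how many one-megabyte records that volume could hold,
# using the article's rough "a megabyte per collision" figure.
records_at_1mb = retained_gb * MB_PER_GB

print(f"{retained_pb} PB = {retained_gb:,} GB")
print(f"Room for about {records_at_1mb:,} one-megabyte records")
```

Integer arithmetic is used throughout so the conversions stay exact.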

To help make sense of such a large amount of information, researchers at the US Department of Energy’s (DOE) Argonne National Laboratory are using an HPC system dubbed Mira to simulate and analyse these collisions, thanks to Mira’s 10-petaflops peak computing power, or roughly ten quadrillion (10^16) calculations per second.


For perspective, Mira can perform as many calculations in a day as an average personal computer could in 20 years. Other projects, like the Partnership for Advanced Computing in Europe, a €400m initiative funded by Spain, Italy, Germany, and France, have provided access to HPC for scientists and researchers across Europe too.
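The day-versus-20-years comparison can be checked with a back-of-envelope calculation. This sketch takes Mira’s 10-petaflops peak at face value and works out what speed the “average personal computer” would need for the comparison to hold. That PC figure is implied by the article’s comparison, not a measured value, and peak flops overstate what real workloads sustain; the 365.25-day year is an assumption made to average in leap days.

```python
# Figures from the article: Mira peaks at 10 petaflops, and one day of Mira
# work is said to equal about 20 years on an average personal computer.
PETA = 10**15
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days

mira_flops = 10 * PETA                     # 10 petaflops = 10**16 ops/second
ops_per_day = mira_flops * SECONDS_PER_DAY  # total operations in one day

# Implied PC speed: spread one Mira-day of work over 20 years of PC time.
pc_seconds = 20 * DAYS_PER_YEAR * SECONDS_PER_DAY
implied_pc_flops = ops_per_day / pc_seconds

print(f"Mira: {mira_flops:.0e} ops/s, {ops_per_day:.2e} ops/day")
print(f"Implied PC speed: {implied_pc_flops / 10**12:.2f} teraflops")
```

Under these assumptions the comparison corresponds to a PC sustaining on the order of a teraflop or so.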

The amount of genomic data available to researchers, which helps them develop new insights into genetic diseases, personalise treatments to individual patients, and potentially even develop cures for cancer, has skyrocketed in recent years. This is thanks to advances in sequencing technology that have made it possible to sequence a whole human genome for as little as $1,000 and in as little as 26 hours.

Because a comprehensive sequence of a whole human genome is approximately 200GB of information, working with this data also requires massive amounts of computing power. Researchers at the University of Chicago are using a different Argonne National Laboratory supercomputer named Beagle to analyse 240 full human genomes in just 50 hours.
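A quick calculation shows what the Beagle figures imply in throughput. This is a sketch that assumes every one of the 240 genomes is the full 200GB quoted above, so the data volume is an upper-bound estimate rather than a reported number.

```python
GENOME_GB = 200  # approximate size of one whole-genome sequence (article)
GENOMES = 240    # genomes analysed on Beagle (article)
HOURS = 50       # reported wall-clock time (article)

# Assumes each genome is the full ~200GB, so this is an upper-bound estimate.
total_gb = GENOME_GB * GENOMES
genomes_per_hour = GENOMES / HOURS
gb_per_hour = total_gb / HOURS

print(f"Total data: {total_gb:,} GB ({total_gb / 1000:.0f} TB)")
print(f"Throughput: {genomes_per_hour:.1f} genomes/hour, {gb_per_hour:,.0f} GB/hour")
```

On those assumptions, Beagle would be working through close to a terabyte of sequence data every hour.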

Good energy

Many private sector applications also rely on massive data sets, so by investing in HPC systems, the US government helps more than just publicly funded research.

In February 2016, the DOE announced 10 projects that allow companies to tap into HPC systems at DOE-managed national labs, systems few companies could afford to build themselves, to improve efficiency and product development.

For example, General Electric will use advanced HPC particle physics simulations to improve the efficiency and lifespan of its aircraft engines. And an initiative called HPC4Mfg (HPC for manufacturing) gives manufacturing companies in energy-intensive sectors, such as chemicals and food processing, an opportunity to use HPC systems to develop strategies to boost energy efficiency.

Government support is key

Robust government support for HPC systems will be crucial to scientific advancement in key research priorities, such as using advanced modelling to build resilience to climate change.

Policymakers should recognise that increasing access to HPC resources can offer important economic benefits, such as improved manufacturing techniques, as well as social benefits, such as reduced energy consumption.

The US has strived to be on the ‘bleeding edge’ of HPC: in July 2015, President Barack Obama issued an executive order launching the National Strategic Computing Initiative, which aims to develop a supercomputer 30 times more powerful than any existing HPC system.

But simply building the most powerful system will not be enough to maximise the benefits HPC can offer to scientific discovery or to deliver secondary benefits to the private sector.

For example, there is already a shortage of workers skilled in both data-intensive sciences and HPC, and demand for such workers will only increase in the coming years.

And as new technological breakthroughs give rise to entirely new types of HPC systems, such as quantum computing or even biological computing, governments should lead the way in fully exploring the potential of these approaches.

Joshua New

Joshua New is a policy analyst at the Center for Data Innovation, a think tank studying the intersection of data, technology, and public policy. Follow Josh on Twitter @Josh_A_New.
