How data science on a cosmic scale is being used to search for ET

14 Oct 2020

Image: © Cecilia Lim/

Exotica is an effort to catalogue every single object in the known universe that could be a sign of alien life – and with that come challenges.

Data science and machine learning are often brought in to do the massive jobs that would take humans years, if not decades, to do. This can be seen in efforts to find treatments or a potential vaccine for Covid-19, or in aerospace design.

However, in the search for potential signs of alien life, one research project is aiming to build a database that eclipses nearly all others in size; one that contains every known object or phenomenon in the entire universe. The Exotica project is the work of Breakthrough Listen, an international effort to search for evidence of technological life in the universe in a variety of ways.

In June, the catalogue of more than 700 distinct targets with “one of everything” in the observed universe was released, ranging from comets to galaxies and from mundane objects to the most rare and violent celestial phenomena.

By establishing a dataset that Exotica’s creators call a “survey breadth”, astronomers may find it easier to identify signs of technology developed by extraterrestrial intelligence – referred to as technosignatures – and rule out the possibility that any phenomena widely considered natural are in fact artificial.

Lead author of the catalogue, Dr Brian Lacki of the Institute for Advanced Study, explained how Exotica was built on the work of a number of “indispensable” data tools.

Among the most important, he said, is the Astrophysical Data System (ADS), which contains much of the world’s astronomical literature and links each paper to the works it cites and to other relevant papers.

“Suppose you run into a paper written back in the 1980s saying that some particular star is anomalous,” Lacki said. “You haven’t heard anything about it. Should that star be regarded as a forgotten mystery, or did someone come up with a good explanation 20 years ago and everyone’s moved on? Tools like ADS allow you to follow the ‘conversation’ in the literature and see how researchers grappled with these objects.”

An artist’s impression of ‘Oumuamua approaching the sun. Image: ESA/Hubble, NASA, ESO, M Kornmesser

The numbers

First conceived in 2018, Exotica has gathered much of the existing literature on known objects by following the web of references listed by ADS.

According to Lacki, the catalogue so far consists of around 10pc of data collected on known objects, which amounts to approximately 1 petabyte of data, or 1m gigabytes. While not directly involved in managing this incredible amount of data, Lacki said the process has been “quite a challenge”.

“In addition to purchasing a great many hard disks, the team has been dealing with cloud storage,” he said.

“Yet, the telescopes themselves actually produce about 10GBps of voltage data during each observation, which comes out to about a petabyte a day. To avoid overflowing our capacity, the backends of our instruments take that data and essentially produce detailed summaries in the form of spectra.”
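As a rough check on the figures Lacki quotes, the sketch below converts the 10GBps rate into a daily volume. It assumes decimal units (1 petabyte = 1m gigabytes) and a hypothetical instrument recording around the clock; the function names are illustrative and are not part of any Breakthrough Listen software.

```python
# Back-of-envelope check of the quoted data volumes, using decimal units.
GB_PER_PB = 1_000_000            # 1 petabyte = 1 million gigabytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def daily_volume_pb(rate_gb_per_s: float) -> float:
    """Petabytes produced by 24 hours of continuous recording (an assumption)."""
    return rate_gb_per_s * SECONDS_PER_DAY / GB_PER_PB


def hours_to_fill(capacity_pb: float, rate_gb_per_s: float) -> float:
    """Hours of raw recording needed to fill a store of the given size."""
    return capacity_pb * GB_PER_PB / rate_gb_per_s / 3600


rate = 10  # GB per second of raw voltage data, the figure quoted in the article

print(f"Daily volume: {daily_volume_pb(rate):.3f} PB")
print(f"Hours to fill 1 PB: {hours_to_fill(1, rate):.1f}")
```

At that rate, a single day of raw voltages would be comparable in size to the roughly 1 petabyte the catalogue data occupies so far, which is why the instruments reduce the data to spectra rather than keeping everything.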

While searching for signs of potential alien life is its main focus, the catalogue may also benefit astronomy in the long run. That’s because it can help others to perform ‘treasure surveys’, where a variety of targets are observed and the data is stored so that astronomers can come back to it over the coming years and decades.

For Lacki, the catalogue could help “close any gaps” that arise in the planning of astronomical surveys when a type of object is not “fashionable” at the time.

“The catalogue is supposed to contain ‘one of everything’,” he said. “That includes both the things we find exciting, but also some relatively obscure and seemingly mundane objects. In the past, some things that have seemed ordinary have turned out to lead to great discoveries.”

Looking to the future, Lacki hopes that, rather than just containing one of everything, the catalogue could grow to include “10 of everything, then a hundred of everything, and so on”.

AI and ET

While the obvious solution to help speed this work along is gaining access to more storage space, machine learning is where the Breakthrough Listen team sees real changes in the years to come. Instead of a human stumbling on a new discovery while working late into the night, new mysteries might be discovered thanks to AI. Not only that, but it could help prevent a discovery from slipping through the cracks.

“For about 20 years, there’s been talk about how one might take a vast database and visualise the properties of objects as ‘clouds’ in some parameter space,” Lacki said. “We might be able to look for natural divisions between different types of targets by looking for distinct clusters in a database, and machine learning can aid that.”
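To illustrate the idea of looking for distinct clusters in a parameter space, here is a minimal sketch using k-means. Lacki does not name an algorithm, so k-means is our choice for illustration only. The two ‘clouds’ are randomly generated toy points in an unnamed two-dimensional space, not measurements from the Exotica catalogue.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


# Toy data: two well-separated clouds standing in for two types of object.
rng = random.Random(42)
cloud_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
cloud_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]
points = cloud_a + cloud_b

centroids, labels = kmeans(points, k=2)
for centre in centroids:
    print(tuple(round(x, 2) for x in centre))
```

In this toy case the algorithm recovers the two clouds without being told which point belongs where. Real catalogue data would have many more dimensions, uneven sampling and overlapping classes, so a production approach would likely look quite different.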

So will one of the largest data science projects of its kind really one day help confirm the potential existence of extraterrestrial life? For Lacki, compiling such a large list of objects can only further astronomy, by potentially revealing something unexpected that we might have missed.

While it’s almost impossible to categorise what intelligent extraterrestrial life could look like, Lacki feels endeavours such as this enormous data science project can at least give us a starting point.

“Maybe there are things out there which will appear bizarre and uncanny, but will turn out to be just as interesting as a planet of beings like ourselves,” he said. “That kind of discovery, I think, is most likely to happen if we look at the more extreme or off-beat objects, like those found in the Exotica catalogue.”

Colm Gorey was a senior journalist with Silicon Republic