How to take data to a ‘new analytic dimension’

9 Jul 2024

Image: Alba González–Cebrián

A research career was always a given for Dr Alba González–Cebrián, but her path was laid with a public education and family support, she says.

Dr Alba González–Cebrián describes her academic journey as a weaving of “diverse paths”. Having completed a degree in biomedical engineering, she found herself drawn to data science and decided to pursue a master’s degree in data analysis, before undertaking a PhD in statistics.

She describes herself as fortunate to have had the guidance of Prof Alberto Ferrer during her doctoral studies, and she was able to apply statistical machine learning to biomedical engineering challenges, “thereby maintaining a connection with my roots”, she says.

She has since transitioned into more data-centric research and has found “a nice fit” in her current role as a research associate in the National College of Ireland (NCI).

The key for González–Cebrián when engaging the public with her research is “earning people’s trust” by “being honest about limitations and showing consideration when offered constructive criticism”.

Here, she gives an insight into her current research focus.

Tell us about your current research.

At NCI, I have been tackling the development of data versioning algorithms. It all started with the aim of creating the Smardy research data marketplace by offering value-added services. Smardy is a Eureka R&I project that NCI is involved with as a consortium partner.

Having experience in data pre-processing and exploratory analysis, I recognised these often overlooked steps as critical to efficient data usage. This led me to envision a data versioning service that did more than merely timestamp new versions; it would also highlight changes in terms of information.

Prof Horacio González-Vélez saw the potential in this innovative concept and, together, we transformed this ‘fanciful’ idea into a series of more concrete, technical research questions that intersect with other trending topics in data science, such as the FAIR principles. These aim to ensure that scientific data and resources are findable, accessible, interoperable and reusable, promoting effective data management and sharing. There are still many limitations and research questions, but our first steps have been promising.

In your opinion, why is your research important?

Well, research is generally important because it resonates with a deep and beautiful human thirst for knowledge, and it is particularly important for me because it excites me about my work, which is a huge privilege.

This work at NCI is exciting because it tackles a significant data management and analysis challenge. We are talking about automatic data versioning using machine learning, specifically unsupervised models. Traditionally, data versioning has mostly meant tracking changes without understanding their significance. With machine learning, we can automate this process and really get into the nitty-gritty of what these changes mean.

This is relevant because automating data versioning can reduce human error and speed up processes, which is essential for fast-changing environments working with dynamic data. Our framework could enhance data integrity by providing a new analytical dimension of changes, helping users make more informed decisions, and increasing data transparency, accessibility and reusability.

With more informative and reliable data, companies and institutions can streamline workflows, which can significantly improve reproducibility, foster innovation through better insights, and strengthen best practices in data management and analysis.

Hopefully, it will also promote better collaboration across different actors, eg, research groups or industries, ensuring everyone uses the most accurate data available.

What inspired you to become a researcher?

I have always been annoyingly curious – just ask my parents! Research gives me space for both my rational, critical thinking and my more creative side, which is about imagination and curiosity. I can have fun but also grow, evolve and challenge myself. So, the spark of research, chasing the truth through unanswered questions, was a given to me.

The conscious choice of career was more complex. Even if research was an obvious path for me, I probably would not be working in it today without the luck of having had access to publicly funded education and services and of having wonderful people in my life (family, friends, supervisors and colleagues), who have supported and grounded me.

What are some of the biggest challenges or misconceptions you face as a researcher in your field?

There has long been a romantic idea of the perfect researcher as some isolated genius who prioritises career above all else, working on weekends and having eureka moments at 4am after too much caffeine. I am deeply committed to my work (and coffee), but I think this romanticisation of researchers is a big mistake.

Rome was not built in a day; individual passion and vocation do a lot, but structural support is essential. Research is carried out by people who require access to education, project funding, work-life balance, fair salaries and connection with the world. And I find engaging with colleagues in stimulating conversations incredibly important. You are not supposed to always think alone!

I have heard crazy stories about toxic, competitive environments, and I think they have a lot to do with insufficient research funding. Even if I can empathise, I am not a fan of that extreme individualism; I prefer collaboration and camaraderie. Mutual support is better for us and for the research.

Do you think public engagement with science and data has changed in recent years?

Maybe, and that is indeed positive because, until recently, scientific research might have been seen as totally separate from everyday life, which is not the case. However, I think this engagement with science and data-driven research has come with (again) a romanticisation spurred by the belief that it will save us all from everything.

Of course, I agree that science and data-driven technology are incredibly useful. Still, my impression is that there is a lot that could be done from a less technological perspective to improve social conditions and tackle existing challenges, such as the climate emergency. That is an area tangential to science and data, and it would be interesting if we could have unpolarised, even contradictory, conversations about this without idealising anything.
