The computer scientist giving data a voice


18 Jun 2024

Prof Declan O'Sullivan. Image: Chris Bellew/Fennell Photography

Prof Declan O’Sullivan’s work is all about creating open, standardised systems that give a bird’s-eye view of complex datasets.

Earlier this year, Trinity College Dublin (TCD) launched the Voices project to uncover the hidden voices of women in early modern Ireland, with the help of AI and digital technologies.

Prof Declan O’Sullivan, from TCD’s School of Computer Science and Statistics, is leading the digital aspect of the €2.5m European Research Council-funded project.

O’Sullivan began his computer science journey in the 1980s, by completing undergraduate and master’s degrees at TCD. He then worked in industry for a number of years before returning to TCD in 2001 to study for a PhD. His research focuses on data and its relationship with machine learning and AI. He is a principal investigator at the Adapt SFI Research Centre for AI-driven Digital Content Technology. In 2019, he was elected a fellow of TCD in recognition of his research and contributions to the college.


For the Voices project, which brings together historians, literary scholars, data analysts and computer scientists, O’Sullivan and his team will use AI-driven text recognition tools and explore the use of generative AI to give the wider team the ability to access, search and analyse historical documents.

The aim is to document women’s experiences of social upheaval, bloody civil war and extreme trauma; to organise the data into an open-access, searchable tool called a knowledge graph to drive further research; and to establish a process that can be replicated to recover other hidden and marginalised voices from historical records.

O’Sullivan took the time to explain his research and why he thinks it’s important. He believes that post-Covid, people are more interested in getting insights into the processes of research.

“There is a lot more curiosity amongst the public about how problems are solved, not just that problems are solved.”

Tell us about your current research.

My team and I are focusing on the challenge of how information from different systems can seamlessly interact without much human help – a challenge I have been passionate about since the 1980s. Despite all the advances over the decades, this remains a persistent issue, as evidenced by the amount of human effort and cost involved in integrating systems, even within the same organisation.

Over the last decade, my team and I have focused on developing techniques and approaches that allow experts in various fields to use semantic web and knowledge graph technologies to aid such data integration.

In your opinion, why is your research important?

The use of open source, standards-based (W3C) knowledge graph technologies brings significant benefits to those seeking to undertake data integration. It supports data in any format (CSV, XML, relational databases etc) and is built on standard, open internet technologies, removing the need for proprietary solutions. The data does not need to be physically transported to another site for the integration to work – in other words, it is a naturally federated approach. Data from many different sources and organisations can be connected together in the graph, avoiding unnecessary duplication. And it is a technology that copes with the evolution of data requirements, transforming how industries operate and collaborate.
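To make the approach concrete, here is a minimal Python sketch – not code from O’Sullivan’s projects – that maps a small, invented CSV table onto an RDF knowledge graph using the open-source rdflib library and queries it with standard SPARQL. The file contents, namespace and column names are all illustrative assumptions.

```python
import csv
import io

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

# Invented sample data, standing in for any tabular source
# (a CSV export, a relational database dump, etc).
CSV_DATA = """id,name,year
p1,Eleanor Butler,1642
p2,Mary O'Neill,1653
"""

EX = Namespace("http://example.org/voices/")  # hypothetical namespace
g = Graph()
g.bind("foaf", FOAF)

# Map each row onto W3C-standard RDF triples in the graph.
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    person = EX[row["id"]]
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal(row["name"])))
    g.add((person, EX.recordedIn, Literal(int(row["year"]))))

# Once in the graph, the data can be queried with standard SPARQL,
# alongside triples drawn from any number of other sources.
for name, in g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?p a foaf:Person ; foaf:name ?name . }
"""):
    print(name)
```

Because every source ends up as triples in the same standard model, the same query works regardless of the format the data originally arrived in.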

What inspired you to become a researcher?

The spark for my becoming a researcher can certainly be traced back to working with Prof Jane Grimson during my MSc on the problem of semantic interoperability and the potential of a federated database approach, which was being researched at the time.

It was clear to me from the project that people and organisations always design their databases with a particular application or perspective in mind, but that over time the need to bring this data together with other data, or into use by other applications, was inevitable – leading to a potentially persistent semantic interoperability problem.

Trying to design solutions to cope with the problem taught me that there is a significant, ongoing need to adapt and innovate in data integration, which has profound implications across various sectors.

What are some of the biggest challenges or misconceptions you face as a researcher in your field?

A common misconception in my field is that semantic interoperability issues are a thing of the past, or that they can only be solved through extensive consultancy or human intervention. My work aims to reduce the human effort required to undertake data integration and to develop adaptable technologies that cope with the constant evolution of organisational needs.

How do you encourage engagement with your work?

My team and I are committed to projects that serve the public good and we try to involve ourselves in significant data integration projects in that space. This typically involves working with subject matter experts/domain experts from other disciplines. For example, we are currently closely collaborating with historians to build knowledge graphs in the Virtual Record Treasury of Ireland and Voices projects, and with clinicians in the federation of rare disease data in the FAIRVASC project, which aims to build a single European dataset to open the door for new research into rare diseases.
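As a loose, self-contained illustration of how such projects connect records held by independent sources without duplicating them – the URIs and vocabulary below are invented, and this is not code from FAIRVASC or Voices – the sketch links two descriptions of the same person with a single owl:sameAs statement.

```python
from rdflib import Graph

# Two invented source datasets, each described in its own vocabulary.
ARCHIVE_A = """
@prefix arcA: <http://example.org/archive-a/> .
arcA:person42 arcA:name "Eleanor Butler" ; arcA:role "petitioner" .
"""
ARCHIVE_B = """
@prefix arcB: <http://example.org/archive-b/> .
arcB:rec7 arcB:fullName "Eleanor Butler" ; arcB:county "Kilkenny" .
"""
# One owl:sameAs triple connects the two descriptions without copying
# or transforming either source record.
LINK = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://example.org/archive-a/person42>
    owl:sameAs <http://example.org/archive-b/rec7> .
"""

g = Graph()
for ttl in (ARCHIVE_A, ARCHIVE_B, LINK):
    g.parse(data=ttl, format="turtle")

# Both sources' statements are now navigable as one connected graph.
print(g.serialize(format="turtle"))
```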
