The ethical challenges of banking genetic data

6 Jul 2023

A colourful illustration of a DNA test infographic against a black background.

Image: © Tartila/Stock.adobe.com

Purdue University’s Dr Robbee Wedow discusses the challenges of ensuring informed consent among research participants when it’s becoming standard practice to collect data in genetic depositories to be available for future research.

A version of this article was originally published by The Conversation (CC BY-ND 4.0)

Imagine you agreed to be part of a new and exciting long-term research study to better understand human health and behaviour. For the past few years, you’ve been visiting a collection site where you fill out some questionnaires about your health and daily activities. Research assistants measure your height, weight and some other physical characteristics. Because you agreed to contribute your genetic data to the study, you also provided a saliva sample during your first visit.

Later, you see a news article reporting that researchers analysing data from the study you’re participating in have found genetic variants that predict the likelihood of someone completing college.

You remember reading a long form when you consented to giving your data, but you can’t quite remember all the details. You know the study was about health, but what do these findings about genes and education have to do with health? Did they analyse your data specifically? What did they find?

What are biobanks?

Many scientific research studies collect data meant to answer a specific research question. For example, to study the genetics of diabetes, researchers might collect data on your blood pressure and lipid levels in addition to genetic data.

But increasingly, scientists are collecting large amounts of data to be kept in biobanks – repositories that store genetic data and other biospecimens like blood, urine or tumour tissue to be used in a wide range of future studies.

Some biobanks, such as the UK Biobank, link biospecimen data to other collected data, such as sexual behaviour, medical history, weight, diet and lifestyle. Private companies like 23andMe also obtain consent from their customers to have their data used in research efforts.

As a researcher interested in the intersection between social behaviours and genetics, I frequently have conversations with people who weren’t aware of how their genetic data is being used. They’re often surprised that the genetic data they agreed to share for research, whether by using a private company’s DNA testing kit or by contributing to a biobank while visiting their local clinic, might be used to study the genetics of same-sex sexual behaviour or risk-taking.

In our newly published research, my colleagues and I found that, if genetic data is available, even choosing not to respond to survey questions can reveal information about the population: not responding to survey questions is correlated with a person’s education, health and income levels.

Genetic data and informed consent

The research that can be done with biobank data might sound scary, but it shouldn’t be. Genetic data, like the data used in our study, is de-identified. This means that it cannot be linked back to individual research participants, who remain anonymous.

Further, genetic data for these sorts of genetic studies is used at the aggregate level, meaning it isn’t used to predict or evaluate any one particular individual’s responses or behaviours.

Researchers aren’t using genetic data to target individuals with certain genetic profiles. Almost all genetic research is used to better understand how health behaviours and other factors affect health and to figure out ways to improve outcomes. This goal is why most research participants agree to contribute their data to research in the first place: to help the world through science.

The problem is whether research participants really understand how their data can be used. Many of the original ideas behind the informed consent process and Institutional Review Boards, or IRBs, which are intended to protect research participants from direct harm or privacy violations, were based on the expectation that research studies would address particular questions about a single subject, such as cardiovascular disease or lung cancer. These protections were designed to avoid repeating unethical research atrocities like the infamous Tuskegee Syphilis Study, where researchers did not tell participants, who were all black men, that they had syphilis and withheld treatment that was already widely available and known to be highly effective.

But since genetic data is de-identified, it is often considered exempt from full IRB review, which is a US protocol to ensure studies meet ethical standards and institutional policies. And the broad range of research questions that can be explored with biobanks, along with the amount and types of data collected, has made these original protections insufficient to ensure truly informed consent.

Improving informed consent

To be clear, biobanks are enormously important for public health research. They allow researchers to link many different outcomes and variables together to paint a critical overall picture of human health and behaviour. And in contrast with the personally identifiable online or phone data that companies collect to show you targeted ads, biobanks collect de-identified data that is evaluated in aggregate.

In the age of vast data collection, making participants aware of how their data can and cannot be used is necessary to ensure that biobanks are a transparent tool for global good.

Biobanks can’t predict how a participant’s data will be used in the future, so it can be difficult for researchers and ethicists to bring back the “informed” part of “informed consent.” Even so, more needs to be done to earn the trust of valuable research participants who contribute the data to improve science and society.

By Dr Robbee Wedow

Dr Robbee Wedow is an assistant professor of sociology and data science at Purdue University. His main research interest is in sociogenomics, which lies at the intersection of sociology, demography, and statistical and computational genetics.
