How this machine learning researcher brings art and data together

27 Jul 2022

Image: Caroline Sinders

Artist and researcher Caroline Sinders is using research-based art projects to examine data and technology’s impact on society.

While data analytics has become one of the most valuable assets in a company’s arsenal, it’s not without flaws.

A major issue is that societal biases such as sexism can appear in datasets, and in turn in the AI algorithms built on that data. One woman who is trying to combat sexist data is Caroline Sinders, a machine learning design researcher and artist.

Speaking to SiliconRepublic.com, Sinders said she likes to use art as a mechanism for critique.

“Art allows me to visualise current urgencies, or current imaginaries or potential speculative solutions. It also allows me to really play.”

Sinders has worked on a number of projects using data science, machine learning and art. One of these is Feminist Data Set, a multiyear research-based art project that interrogates every step of the AI process, including data collection, data labelling, data training, selecting an algorithm to use, and checking the algorithmic model for bias.

She said one of the reasons she wanted to bring art into the project was to engage community members in the process and let people ask questions around how to generate an algorithmic model in a feminist way.

“If I were doing this in a much more controlled environment, like in a lab for example, this would have been a much shorter project and we would probably have way less participants,” she said.

“What I like about it is by making it an art project, by elongating it, it allows me to do things over and over and over again, and I can change things, I can adjust things, I can move things around. But also, it allows me to follow provocations that participants give. Instead of saying, ‘Oh, that’s a great idea, but it’s not relevant,’ it allows me to actually say like, ‘Oh, actually, let’s follow that thread for a second.’”

‘Datasets should be thought of as organic entities that will expire one day’
– CAROLINE SINDERS

She added that making it as strict as a full research project would also constrain the kind of text that participants could submit. Currently, participants can submit any kind of text, including poetry, blogposts and song lyrics, to form a text model.

She said this text model is going to be “misshapen” because of the different kinds of text being used. She also said that, unlike with typical natural language processing, she’s interested in manual annotation of data “to then try to imbue the sort of extra narrative within it”.

“That is an artistic choice as well. That becomes like a form of poetry, that also becomes a form of text itself that can fold back into the text, but that’s not how you would actually generate an NLP model. And I think that that’s OK though, because it’s still an illustrative step because that is like a kind of maybe analysis.”

Within the Feminist Data Set project, Sinders also created Technically Responsible Knowledge (TRK), which is a tool and advocacy initiative spotlighting unjust labour in the machine learning pipeline.

It includes an open-source data labelling and training tool and a wage calculator, and was created so that it could be used by non-coders.

“I wanted to include this data sheets aspect of, well, what’s a summary someone would add about this? Who made it and what is it about and where did it come from? Why does this exist? And that becomes the way to sign a dataset,” she explained.

“One of the things I’m really interested in is this idea that maybe datasets should be thought of as organic entities that they will expire one day. So then what is the lifecycle or the lifetime of a dataset? And then a dataset needs a label. It needs to have the day it was made or the day it was finished, and who worked on it and where are they from. So those were other things I was including within that as well.”
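As a rough illustration of the kind of label Sinders describes, the sketch below records a summary, a purpose, a source, a completion date and the people who worked on a dataset, and treats the dataset as something that expires. It is not taken from the TRK tool: the class names, field names and the one-year default lifetime are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Contributor:
    """Someone who worked on the dataset (hypothetical structure)."""
    name: str
    location: str  # where the contributor is from


@dataclass
class DatasetLabel:
    """A 'signature' for a dataset: what it is, who made it and when."""
    summary: str          # what the dataset is about
    purpose: str          # why the dataset exists
    source: str           # where the data came from
    completed_on: date    # the day the dataset was finished
    contributors: List[Contributor] = field(default_factory=list)
    lifetime_days: int = 365  # assumed lifetime; the article gives no figure

    def expires_on(self) -> date:
        """Date after which the dataset should be treated as expired."""
        return self.completed_on + timedelta(days=self.lifetime_days)

    def is_expired(self, today: date) -> bool:
        """True once the assumed lifetime has run out."""
        return today >= self.expires_on()


# Placeholder values only, not real project data.
label = DatasetLabel(
    summary="Community-submitted texts",
    purpose="Illustrative example of a dataset label",
    source="Workshop submissions",
    completed_on=date(2022, 1, 1),
    contributors=[Contributor(name="Example contributor", location="Example city")],
)

print(label.expires_on())
print(label.is_expired(date(2022, 7, 27)))
```

Storing the completion date and a lifetime, rather than a fixed expiry date, keeps the question of how long a dataset should live visible and easy to revisit.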

AI, machine learning and the public good

Outside of her Feminist Data Set project, Sinders is also extremely passionate about designing for the public good and has noticed plenty of examples of how machine learning can be helpful for society.

She saw AI used for societal good while she was a writing fellow with Google’s People and AI Research (PAIR) team, where she looked at how different cities used artificial intelligence.

One example was Amsterdam using AI alongside humans to sort non-emergency phone calls, such as reports of fallen trees or illegal parking.

“They apparently had a lot of great success in using that. It has helped them create different buckets, and then it helps the humans sort faster for the most part.

“One of the reasons they wanted to do that is they recognised that a phone tree that they design is probably really confusing for a regular constituent or consumer. They know which department has to handle fallen trees, but a consumer may not know that.”

Sinders also said machine learning has a huge role to play when it comes to the climate crisis. When she was an artist embedded with the European Commission, researchers explained how they used machine learning, along with other tools such as heat mapping, to analyse changes in thousands of images of coastlines and monitor erosion.

“Machine learning is just able to sort those images so much faster than a person could. And it’s also been providing these different levels of analysis as to how things have changed. So then machine learning becomes this extra extension of the researcher in a way and is able to provide this really useful analysis,” she said.

“I think there is a lot of interesting movement in the climate change space of companies using machine learning to help analyse aspects of climate change already, but then also project and create simulations of what is a future if we change different parts of our present,” she added. “I think that is a really great use of machine learning.”
