What’s DALL-E Mini, the AI image generator taking the internet by storm?

13 Jun 2022

AI-generated image of a black cat in the style of Monet. Image: DALL-E Mini

The publicly available DALL-E Mini was created to reproduce the results of OpenAI’s text-to-image model with a much smaller architecture.

Images created by a text-to-image AI model called DALL-E Mini have been popping up across the internet recently.

Despite the similar name, this AI tool is not connected to the DALL-E model developed by OpenAI.

In 2021, OpenAI created DALL-E, an AI model that can generate images from simple text descriptions. A second version called DALL-E 2 was unveiled in April, which OpenAI said can generate more realistic and accurate images “with four times greater resolution”.

While DALL-E was getting attention, another project set out to reproduce its results with a smaller architecture.

Created by machine learning engineer Boris Dayma, DALL-E Mini is an open-source AI model inspired by OpenAI’s tech that can create images from text prompts.

The model was trained on millions of images from the internet and their associated captions. Over time, it learned how to draw an image from a text prompt.

While this is similar to how OpenAI’s models were trained, there are significant differences in both quality and scale.

Dayma said the first DALL-E Mini model was 27 times smaller than the original DALL-E. DALL-E was also trained on 250m image-text pairs, while DALL-E Mini used only 15m.

Unlike OpenAI’s model, however, DALL-E Mini is available to the public. This has led to a wave of comedic and strange images spreading across the internet.

Concerns around bias and offensive imagery

OpenAI has said its text-to-image model is not yet open to the public as it is testing the limitations and capabilities of the model to “develop and deploy AI responsibly”.

Last month, Google Research also revealed a competitor to DALL-E, called Imagen. The Google team behind the model said it had an “unprecedented degree of photorealism” and a deep level of language understanding.

But the team added that a preliminary analysis suggested the model encodes a range of “social and cultural biases” when generating images of activities, events and objects.

Concerns have also been raised that this sort of technology could help people spread disinformation online through the use of authentic-looking fake images.

While the images generated by DALL-E Mini are not nearly as realistic as the more powerful AI models, there is still the risk of biased and offensive imagery being created.

“While the capabilities of image-generation models are impressive, they may also reinforce or exacerbate societal biases,” DALL-E Mini says on its Hugging Face page.

“While the extent and nature of the biases of the DALL-E Mini model have yet to be fully documented, given the fact that the model was trained on unfiltered data from the internet, it may generate images that contain stereotypes against minority groups.

“Work to analyse the nature and extent of these limitations is ongoing and will be documented in more detail in the DALL-E Mini model card.”


Leigh Mc Gowran is a journalist with Silicon Republic