Meta to make a scene with its latest text-to-image AI tool

15 Jul 2022

Images generated by Meta AI tool Make-A-Scene.

Image: Meta

Following OpenAI’s DALL-E and Google’s Imagen, Meta has now entered the space with Make-A-Scene – taking the tech a notch higher.

Text-to-image tools are getting increasingly popular these days, and Meta is the latest player with a new AI tool it is developing called Make-A-Scene.

Capable of generating an image from text prompts, Meta’s latest research project takes the technology a step further by accepting rough sketches from the user to direct the AI before the final image is created.

The free-form sketches, which can be anything from a lone cactus in a desert at night to a zebra riding a bike, will accompany text prompts to help the AI determine how the user visualises the finished product.

Showcasing Make-A-Scene on its website yesterday (14 July), Meta gave the example of asking a text-only tool for a painting of a zebra riding a bike.

“[The outcome] might not reflect exactly what you imagined; the bicycle might be facing sideways, or the zebra could be too large or small,” it wrote.

“With Make-A-Scene, this is no longer the case. It demonstrates how people can use both text and simple drawings to convey their visions with greater specificity using a variety of elements.”

The text-to-image craze

Text-to-image AI technology has been growing in popularity, particularly since the open-source model DALL-E mini took the internet by storm in recent months. It was inspired by OpenAI's original DALL-E model, although the two projects are not affiliated.

OpenAI created DALL-E in 2021 as an AI model that can generate images based on simple text descriptions. A second version called DALL-E 2 was unveiled in April, which OpenAI said can generate more realistic and accurate images “with four times greater resolution”.

Google also slid into the scene with its own text-to-image model in May. The search giant claims its Imagen AI model has an “unprecedented degree of photorealism” and a deep level of language understanding.

It shared examples of images the AI model had created, ranging from a cute corgi in a house made from sushi to an alien octopus reading a newspaper.

Intended for adult artists and children alike, Meta's Make-A-Scene is trying to stand out in an increasingly crowded space by claiming more 'nuanced' results guided by the user's sketches. However, users can also choose to generate images from text prompts alone.

“The model focuses on learning key aspects of the imagery that are more likely to be important to the creator, like objects or animals,” Meta said.

Meta has been focusing a great deal on AI lately, as it prepares to develop technologies to accompany its foray into the metaverse. It has been developing concepts such as universal speech translation, AI that can learn like a human and a more conversational AI assistant.

In December 2021, the company revealed it had developed technology that can animate human-like figures in children's drawings, in the hope of building AI that can "understand the world from a human point of view".
