DeepMind’s new AI, Gato, can play video games and control robots

16 May 2022

Image: © Игорь Головнёв/Stock.adobe.com

Gato is DeepMind’s latest AI creation, pitched as a jack of all trades.

Researchers have long wanted to build a machine capable of thinking and acting like a human. While we’re not quite there yet, a new AI system developed by DeepMind might have brought us one step closer.

Gato, DeepMind’s latest AI, can perform more than 600 different tasks, such as playing video games, captioning images and moving real-world robotic arms.

The idea behind Gato is to create a ‘generalist’ AI system that can perform many different tasks that humans can do, without carving a niche for itself as an expert on one task. Essentially, as far as artificial intelligence goes, it is a jack of all trades and master of none.

“Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs,” DeepMind, which is Alphabet’s AI subsidiary, wrote in a blog post last week.

Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: https://t.co/9Q7WsRBmIC

Paper: https://t.co/ecHZqzCSAm 1/ pic.twitter.com/cC8ukhw4at

— DeepMind (@DeepMind) May 12, 2022

UK-based DeepMind describes Gato as a “multi-modal, multi-task, multi-embodiment generalist policy”. The company said Gato can play Atari, caption images, stack blocks with a real robot arm and much more, deciding in each case “whether to output text, joint torques, button presses, or other tokens”.

It is similar to the GPT-3 text generator from OpenAI in that it accepts user input and performs tasks, drawing on what it learned from billions of prior inputs. But Gato is smaller than GPT-3 in terms of parameters, and is pitched as distinct from other systems based on the wide range of tasks it can perform.

“It sounds exciting that the AI is able to do all of these tasks that sound very different, because to us it sounds like writing text is very different to controlling a robot,” Mike Cook, a member of the Knives & Paintbrushes research collective, told TechCrunch last week.

“But in reality, this isn’t all too different from GPT-3 understanding the difference between ordinary English text and Python code.”

An illustration of some of the different tasks Gato can perform. Image: DeepMind

Cook explained that Gato receives specific training data for these tasks “just like any other AI of its type” and learns how patterns in the data relate to one another, including learning to associate certain kinds of inputs with certain kinds of outputs.

“This isn’t to say this is easy, but to the outside observer this might sound like the AI can also make a cup of tea or easily learn another 10 or 50 other tasks, and it can’t do that. I think it’s a nice bit of work, but it doesn’t strike me as a major stepping stone on the path to anything.”

DeepMind says that Gato is trained on a large number of datasets comprising “agent experience in both simulated and real-world environments”, in addition to a variety of natural language and image datasets.

In a research paper published last week, DeepMind also claims that for many of the 600-odd tasks that a pre-trained Gato model can do, it can outperform humans.

Earlier this year, DeepMind created a new AI-powered system called AlphaCode, which it said can write computer programs “at a competitive level”.
