Meta said its AI research team has been working for years on a supercomputer which it believes will play a pivotal role in the metaverse.
Meta, the company formerly known as Facebook, is introducing a supercomputer for AI research.
The team behind the AI Research SuperCluster (RSC) said they believed it would be the world’s “largest and fastest” supercomputer when fully built out in mid-2022.
Meta’s AI researchers have spent the past two years working on the project, which will play an important role in the company’s ‘metaverse’, the term Meta CEO Mark Zuckerberg has popularised to describe a shared virtual and augmented reality (AR) ‘universe’.
According to a post published by Kevin Lee and Shubho Sengupta on Meta’s AI blog, the next generation of advanced AI will require powerful new computers capable of performing quintillions of operations (exaflops) per second. The RSC is Meta’s way of addressing this requirement.
Sengupta and Lee claimed that the RSC will help Meta’s AI researchers “build new and better AI models” that can work across hundreds of different languages, analyse text, images and video together, and help develop new augmented reality tools.
“We hope RSC will help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together. Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform – the metaverse, where AI-driven applications and products will play an important role,” they wrote.
Meta has been investing in long-term AI research since 2013, when the Facebook AI Research (FAIR) lab was established. It has already made advances in areas such as self-supervised learning and transformer models.
These advances depended on high-performance computing infrastructure, which Meta’s researchers have been building and refining for years.
The first generation of the supercomputer, designed in 2017, had 22,000 Nvidia V100 Tensor Core GPUs in a single cluster that performed 35,000 training jobs a day.
In early 2020, the team decided to accelerate its progress on supercomputer research to design “a new computing infrastructure from a clean slate” to “take advantage of new GPU and network fabric technology”.
The team clarified that although the RSC supercomputer was now “up and running”, its development was not complete. Once the RSC is fully built, the team expects it will be one of the fastest supercomputers in the world, performing at 5 exaflops of mixed-precision compute.
Throughout 2022, the team plans to increase the number of GPUs from 6,080 to 16,000, which will boost the supercomputer’s AI training performance by more than two-and-a-half times. The RSC’s storage system will have a target delivery bandwidth of 16TB/s and exabyte-scale capacity to meet increased demand.
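The reported figures can be sanity-checked with a quick calculation. The following sketch assumes, purely for illustration, that training throughput scales roughly linearly with GPU count; the actual speed-up will depend on network fabric and workload:

```python
# Figures reported in Meta's blog post about the RSC.
gpus_phase1 = 6_080
gpus_final = 16_000

# Assuming roughly linear scaling with GPU count, the projected
# speed-up is simply the ratio of the two GPU counts.
speedup = gpus_final / gpus_phase1
print(f"Projected speed-up: {speedup:.2f}x")  # ~2.63x, i.e. "more than two-and-a-half times"

# 5 exaflops of mixed-precision compute, written out in operations
# per second: one exaflop is 10**18 (a quintillion) operations/second.
exaflops = 5
ops_per_second = exaflops * 10**18
print(f"{ops_per_second:.1e} operations per second")
```

The 16,000/6,080 ratio comes out at about 2.63, consistent with the “more than two-and-a-half” claim in the blog post.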
The researchers at Meta worked with Penguin Computing, its architecture and managed services partner, on hardware integration for the RSC. Nvidia provided the team with its AI computing technologies, including software stack components such as NCCL, the Nvidia Collective Communications Library, for the cluster.
The team concluded that the RSC will enable them “not only to create more accurate AI models for our existing services, but also to enable completely new user experiences, especially in the metaverse”.
“Our long-term investments in self-supervised learning and in building next-generation AI infrastructure with RSC are helping us create the foundational technologies that will power the metaverse and advance the broader AI community as well,” they said.