Cerebras breaks record for largest AI models trained on a single device

23 Jun 2022


Cerebras achieved this result thanks to the WSE-2 processor, which the company said is the largest processor ever built. Image: Cerebras Systems

Cerebras said it can reduce the engineering time to run large NLP models from months to minutes, making it more cost-effective and accessible.

Cerebras Systems claims it has achieved a new feat by training an AI model with up to 20bn parameters on a single system.

The AI company said using a single CS-2 system can reduce the engineering time and effort necessary when training natural language processing (NLP) models from months to minutes.

NLP is the area of AI that focuses on enabling computers to analyse and understand human language from either text or voice data.

Cerebras CEO Andrew Feldman said bigger NLP models are shown to be more accurate, but this means only select companies have the resources and expertise to do the “painstaking work” required.

“As a result, only very few companies could train large NLP models – it was too expensive, time-consuming and inaccessible for the rest of the industry,” Feldman added.

Cerebras said its latest result will help eliminate one of the “most painful aspects” of training large NLP models, which usually involves spreading the model across hundreds or thousands of different GPUs.

The company added that the way a model is partitioned across GPUs is unique to each pairing of neural network and compute cluster, so the work can’t be ported to different clusters or across neural networks.
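To give a rough sense of why models of this size are usually spread across many GPUs, the short Python sketch below estimates the memory needed just to hold a model's training state. The figures are assumptions for illustration and do not come from Cerebras: 16 bytes per parameter is a commonly cited rule of thumb for mixed-precision training with the Adam optimiser, the 80GB device memory is a hypothetical GPU capacity, and activation memory is ignored.

```python
import math

# Assumed bytes per parameter for mixed-precision Adam training (illustrative only):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4)
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def training_state_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate the memory (in GB) for weights, gradients and optimiser state."""
    return num_params * bytes_per_param / 1e9


def min_devices(num_params: int, device_memory_gb: float) -> int:
    """Lower bound on devices needed to hold the training state alone."""
    return math.ceil(training_state_gb(num_params) / device_memory_gb)


if __name__ == "__main__":
    params = 20_000_000_000  # the 20bn-parameter model size mentioned in the article
    print(f"Training state: {training_state_gb(params):.0f} GB")
    print(f"Minimum 80GB devices: {min_devices(params, 80.0)}")
```

Under these assumptions, a 20bn-parameter model needs about 320GB for its training state, which is more than a single hypothetical 80GB device can hold. In practice, the number of GPUs used is typically far higher than this lower bound because of activation memory and the need to train in a reasonable time.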

The ability to train a large model on a single device was made possible by the Cerebras WSE-2 processor, which the company said is the largest processor ever built. It is 56 times larger than the largest GPU, with 2.55trn more transistors and 100 times as many compute cores.

Last year, Cerebras said it was using the WSE-2 processor to power a new chip cluster that could “unlock brain-scale neural networks”.

With this processor, the AI company said a single CS-2 could support models with hundreds of billions or even trillions of parameters.

Dan Olds, chief research officer at market intelligence firm Intersect360 Research, said this could give organisations an “easy and inexpensive on-ramp to major league NLP”.

“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI,” Olds said. “It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.”
