Are AI companies shifting to small language models?

20 May 2024

Image: © sergray/

Both Microsoft and Google have revealed lightweight AI options for customers this year, while some experts are pushing for the efficiency and lower cost of small language models.

While recent advances in AI have created many exciting prospects for the tech sector, they have also created a growing problem in terms of both cost and energy use.

The capabilities of large language models – the foundation of many generative AI services – have surged in recent years. But the resources required to make these systems a reality have grown at a similarly exponential rate.

This year’s AI Index – an independent initiative at Stanford University – claimed OpenAI’s GPT-4 used an estimated $78m worth of compute to train, while Google’s Gemini Ultra cost $191m in compute. This marks a dramatic increase on previous years – the report estimates it cost Google only $12m to train its PaLM model in 2022.

Meanwhile, there are reports that AI models, cryptocurrency and data centres together are expected to use as much energy as a small country in the next few years. Combine these issues with a more competitive market and it makes sense that leading AI companies are making a push towards ‘small’ language models.

The rise of smaller models

Google and Microsoft – two of the biggest players in the AI sector currently – have both made moves towards more lightweight AI options for their customers this year.

Microsoft recently unveiled Phi-3, its series of small language models that are designed to offer similar functions as large language models but in a more compact format and with less training data.

The company claims that large language models serve an important function but that they require significant computing resources to operate. It added that small language models are designed to perform simpler tasks, to be more accessible for smaller organisations and can be “fine-tuned” to meet specific needs.

“What we’re going to start to see is not a shift from large to small, but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario,” said Sonali Yadav, Microsoft’s principal product manager for generative AI.

In February, Google revealed a series of lightweight AI models called Gemma and claimed these models can run on laptops and desktops, while surpassing the capabilities of some larger models.

At the time, CTO Victor Botev said Gemma was a sign of the “fast-growing capabilities” of smaller language models and that practical application is more important than massive parameter counts – “especially when considering the huge costs involved with many large language models”.

Will this trend continue?

As more companies and users begin to adopt AI – and as tech giants push AI into their various offerings – it seems inevitable that the cost of AI will continue to climb.

Simon Bain, the CEO and founder of the data platform OmniIndex, claims generative AI is an “unrefined and chaotic beast” that is forcing data centres to find ways to “cope with the extraordinary and explosive demands of today’s AI boom”.

“AI developers without Google’s resources behind them will find themselves with a mountain to climb in a matter of weeks,” Bain said. “As the technology matures, we will hopefully begin to see more tamed versions of AI which do not require the energy consumption of 1,000 US homes to train and that offer users a more precise, efficient and useful service.”

Bain argues that small language models will offer a “more accurate result at a much lower cost” for businesses and that they will be trained on “much more precise and controlled data sets” to improve accuracy – an issue that exists in various AI models.


Leigh Mc Gowran is a journalist with Silicon Republic