Researchers theorise that large language models are able to create and train smaller versions of themselves to learn new tasks.
A new study aims to understand how certain large language models are able to learn new tasks from only a few examples.
These AI models – such as GPT-3 and the popular chatbot ChatGPT – are natural language processing systems that are trained on massive amounts of data.
With this vast amount of data, large language models are able to take a small piece of text and predict what comes next, allowing them to create human-like texts and answers to questions.
However, some researchers are exploring a phenomenon where large language models learn to accomplish a task after seeing only a few examples, despite not being trained for the task. This is known as “in-context learning”.
Normally, machine-learning models such as GPT-3 would need to be retrained with new data and updated parameters to tackle a new task. But with in-context learning, the model can handle the new task without updating its parameters.
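As a rough illustration of the setup described above, the sketch below assembles a few-shot prompt: a handful of worked examples followed by a new input, all passed to the model as ordinary text. The function name `build_few_shot_prompt`, the "Input/Output" layout and the translation task are assumptions made for this example, not details from the study, and no real model or API is called.

```python
def build_few_shot_prompt(examples, query):
    """Format worked examples plus one unanswered query as a single text prompt."""
    # Each demonstration is an input/output pair written out as plain text.
    blocks = [f"Input: {source}\nOutput: {target}" for source, target in examples]
    # The final block leaves the output blank for the model to complete.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Five hypothetical demonstrations of an English-to-French task.
examples = [
    ("cheese", "fromage"),
    ("dog", "chien"),
    ("house", "maison"),
    ("book", "livre"),
    ("water", "eau"),
]

prompt = build_few_shot_prompt(examples, "cat")
print(prompt)
```

The point of the sketch is that the task is specified entirely in the prompt text. Nothing in it touches the model's parameters, which is what separates in-context learning from retraining.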
In a new study, scientists from MIT, Google Research and Stanford University have studied similar large language models to try to understand this process.
Ekin Akyürek, lead author of the paper, said a better understanding of in-context learning could lead to improved AI models that don’t require costly retraining.
“Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering,” Akyürek said. “But now we can just feed it an input, five examples, and it accomplishes what we want.
“So, in-context learning is an unreasonably efficient learning phenomenon that needs to be understood.”
An AI model within a model
Some scientists theorise that large language models can perform in-context learning because they are trained on such massive amounts of data, meaning they have likely seen similar examples before.
But Akyürek and his team believe these AI models create smaller machine-learning models inside themselves, which the model then trains to complete a new task.
To test this hypothesis, the researchers used a neural network model that has the same architecture as GPT-3, but is trained for in-context learning.
The team’s experiments showed that these models can theoretically simulate and train smaller versions of themselves.
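To make the "model within a model" idea more concrete, here is a toy sketch of what training a small inner model on in-context examples could look like. The article does not say what form the inner model takes, so the one-variable linear model, the gradient descent procedure and every name below are assumptions chosen purely for illustration. They are not the researchers' implementation.

```python
def fit_inner_model(context, steps=2000, lr=0.05):
    """Fit y = w * x + b to the context examples by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(context)
    for _ in range(steps):
        # Gradients of the mean squared error over the context examples.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in context) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in context) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_in_context(context, query_x):
    """Train a fresh inner model on the context, then answer the query with it."""
    w, b = fit_inner_model(context)
    return w * query_x + b


# Five hypothetical examples drawn from the rule y = 2x + 1.
context = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
prediction = predict_in_context(context, 5.0)
print(round(prediction, 3))  # close to 11, the value the rule gives for x = 5
```

In this analogy, the only thing that changes between tasks is the small inner model, which is rebuilt from whatever examples appear in the context. The code that does the fitting stands in for the large model's fixed parameters and is never modified.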
The researchers plan to push ahead with more complex experiments, along with an exploration into the types of pretraining data that can enable in-context learning.
Large language models have grown in popularity with the rapid rise of ChatGPT, which has created an AI race between some Big Tech companies.
For example, Microsoft recently revealed a new Bing search engine and Edge browser with AI capabilities in the hope of challenging Google’s market dominance. This followed Google’s sudden announcement that it is developing Bard, an AI chatbot to rival ChatGPT.