On the Front Lines of AI: Training Large Language Models on New Tasks – without All the Retraining

Massachusetts Institute of Technology

Sometimes, machine learning models learn a new task without seeming to have learned – or been trained – to do it. That’s the finding of researchers at MIT, Stanford and Google Research, who report on a curious phenomenon called “in-context learning,” in which “a large language model learns to accomplish a task after seeing only a few examples — despite the fact that it wasn’t trained for that task,” according to a blog post released today by MIT.

Unraveling this mystery, and then replicating the technique, could let AI practitioners skip much of the lengthy and expensive work of gathering data and updating parameters that retraining a model for a new task requires.

“The researchers’ theoretical results show that these massive neural network models are capable of containing smaller, simpler linear models buried inside them,” MIT reported. “The large model could then implement a simple learning algorithm to train this smaller, linear model to complete a new task, using only information already contained within the larger model. Its parameters remain fixed.”
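To make that mechanism concrete, here is a minimal sketch in Python (illustrative only, not the researchers’ code, and with made-up numbers) of the kind of computation such a buried linear learner would perform: fit a small linear model to the example pairs supplied in the prompt using ordinary gradient descent, then answer the query, with nothing outside that inner procedure being updated.

# Illustrative sketch, not the paper's implementation: the "inner" learning
# algorithm a fixed-weight model could emulate over its in-context examples.
import numpy as np

def inner_linear_learner(xs, ys, x_query, steps=200, lr=0.1):
    """Train a small linear model on the in-context pairs, then predict."""
    w = np.zeros(xs.shape[1])                    # the buried linear model
    for _ in range(steps):
        grad = xs.T @ (xs @ w - ys) / len(xs)    # gradient of squared error
        w -= lr * grad                           # simple gradient-descent step
    return w @ x_query                           # answer for the query input

rng = np.random.default_rng(0)
w_true = rng.normal(size=4)                      # hidden rule the prompt encodes
xs = rng.normal(size=(5, 4))                     # five in-context examples
ys = xs @ w_true
x_query = rng.normal(size=4)
print(inner_linear_learner(xs, ys, x_query), "vs", w_true @ x_query)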

“Usually, if you want to fine-tune these models, you need to collect domain-specific data and do some complex engineering,” said Ekin Akyürek, an MIT graduate student in computer science and lead author of a paper exploring in-context learning. “But now we can just feed it an input, five examples, and it accomplishes what we want. So in-context learning is a pretty exciting phenomenon.”
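In practice, “an input plus a handful of examples” is simply a prompt. The snippet below is a hypothetical illustration (the sentiment task and wording are not from the paper) of a five-example prompt that asks the model to label one more case by completing the text.

# Hypothetical example of a five-shot prompt; the task and labels are illustrative.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("An instant classic.", "positive"),
    ("The plot made no sense at all.", "negative"),
    ("I would happily watch it again.", "positive"),
]
query = "The acting was wooden and the pacing glacial."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"       # the model fills in the last label
print(prompt)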

Akyürek said many in the ML research community believe large language models (LLMs) are capable of in-context learning due to how they were trained. As an example, the LLM GPT-3, the basis for OpenAI’s phenomenally popular ChatGPT generative AI application, was trained by feeding it enormous amounts of internet text. Its parameters number in the hundreds of billions. “So, when someone shows the model examples of a new task, it has likely already seen something very similar because its training dataset included text from billions of websites,” MIT reported. “It repeats patterns it has seen during training, rather than learning to perform new tasks.”

However, Akyürek hypothesized that in-context learning isn’t simply matching patterns seen previously, but that the models are truly learning to do new tasks.

He and his research colleagues experimented “by giving these models prompts using synthetic data, which they could not have seen anywhere before, and found that the models could still learn from just a few examples,” according to MIT. “Akyürek and his colleagues thought that perhaps these neural network models have smaller machine-learning models inside them that the models can train to complete a new task.”

“That could explain almost all of the learning phenomena that we have seen with these large models,” he said.
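One way to picture the synthetic-prompt setup (a rough sketch under assumed details, not the authors’ exact protocol): sample a brand-new linear function, turn its input-output pairs into a single sequence the model reads as a prompt, and compare the model’s completion against what a small linear learner could recover from those pairs alone.

# Rough sketch of a synthetic in-context regression prompt; the details are assumed.
import numpy as np

rng = np.random.default_rng(1)
dim, n_examples = 4, 8
w_true = rng.normal(size=dim)                    # a task invented on the spot
xs = rng.normal(size=(n_examples, dim))
ys = xs @ w_true                                 # labels shown only in the prompt
x_query = rng.normal(size=dim)

# The sequence a transformer would consume: x1, y1, x2, y2, ..., x_query
prompt_sequence = []
for x, y in zip(xs, ys):
    prompt_sequence.append(x)
    prompt_sequence.append(np.full(dim, y))      # y broadcast to the input width
prompt_sequence.append(x_query)

# Reference answer a small linear learner could extract from the prompt alone
w_fit, *_ = np.linalg.lstsq(xs, ys, rcond=None)
print("in-context least-squares prediction:", w_fit @ x_query)
print("true value:                         ", w_true @ x_query)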

The upshot: Akyürek and his colleagues said it may be possible to enable in-context learning by adding only a few layers to the neural network. “There are still many technical details to work out before that would be possible, Akyürek cautions, but it could help engineers create models that can complete new tasks without the need for retraining with new data,” MIT said.

Akyürek was joined on the paper by Dale Schuurmans, a Google Brain research scientist and professor of computing science at the University of Alberta, as well as by senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Denny Zhou, principal scientist and research director at Google Brain.

source: Adam Zewe, MIT News Office

Comments

  1. I don’t think calling this “learning” is correct. Learning changes parameters. With fixed parameters, this is called inferencing. What this phenomenon shows is that the LLM has already learned those new tasks without the developers realizing it. Similar patterns have been learned, and that’s why it can predict the correct sentiment with just a few examples.