
Image: © pinkeyes/Stock.adobe.com
OpenAI’s ChatGPT employs a technique called reinforcement learning from human feedback, a practical application of the awardees’ work.
Andrew Barto and Richard Sutton have received one of the highest honours in computing for developing the foundations of reinforcement learning (RL) – one of the key pieces of research behind the artificial intelligence (AI) we see today.
The recipients of the 2024 Association for Computing Machinery (ACM) A M Turing Award are credited with introducing the main ideas, constructing the mathematical foundations and developing important algorithms that led to the creation of “one of the most important approaches for creating intelligent systems”.
Barto is professor emeritus at the Department of Information and Computer Sciences at the University of Massachusetts, Amherst, while Sutton is a professor of computer science at the University of Alberta, the chief scientific advisor at the Alberta Machine Intelligence Institute and a research scientist at Keen Technologies, an AI company.
The two began collaborating in 1978 at the University of Massachusetts at Amherst where Barto was Sutton’s PhD and postdoctoral advisor.
In the early 1980s, Barto and Sutton drew on mathematical foundations provided by Markov decision processes (MDPs), whereby an agent – a computational entity that can perceive and act – makes decisions in a random environment, receiving a reward signal after each transition with the aim of maximising its long-term rewards.
Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows for the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to a vast range of problems.
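To make the idea concrete, here is a minimal sketch of an agent learning in an MDP it knows nothing about. It uses tabular Q-learning, a well-known RL algorithm chosen here purely for illustration and not one the article attributes to the awardees. The five-state corridor, the 10pc “slip” chance and every name and parameter value below are assumptions made up for this example.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right
SLIP = 0.1        # assumed chance that the environment reverses the action


def step(state, action, rng):
    """Environment dynamics. The agent only sees the values returned here."""
    if rng.random() < SLIP:
        action = 1 - action
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward signal after each transition
    return nxt, reward, done


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Estimate long-term reward for each (state, action) from experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(200):  # cap on episode length
            # Explore at random sometimes (and on ties), otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action, rng)
            # Nudge the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best-looking action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The agent never reads the slip probability or the reward rule inside `step`; it only observes the next state and the reward after each move, which is the “unknown environment” setting described above.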
Later, the two, along with others, developed many of the basic algorithmic approaches for RL, leading to their textbook Reinforcement Learning: An Introduction in 1998, which is still a standard reference in the field, having been cited more than 75,000 times.

Image: Andrew Barto and Richard Sutton
However, successful practical applications for RL came decades later, and include the development of OpenAI’s ChatGPT, which employs a technique called reinforcement learning from human feedback to capture human expectations in its responses.
RL is also widely applied in other sectors, including chip design, internet advertising and global supply chain optimisation.
“Barto and Sutton’s work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field,” said Yannis Ioannidis, the president of ACM.
“Research areas ranging from cognitive science and psychology to neuroscience inspired the development of reinforcement learning, which has laid the foundations for some of the most important advances in AI and has given us greater insight into how the brain works.
“Barto and Sutton’s work is not a stepping stone that we have now moved on from. Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines.”
Meanwhile, Jeff Dean, a senior VP at Google, said that the awardees’ work has been a “lynchpin of progress in AI over the last several decades”. The company financially supported the $1m cash prize that the awardees received today (5 March).
“In a 1947 lecture, Alan Turing stated ‘What we want is a machine that can learn from experience’. Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing’s challenge,” Dean said.
“The tools they developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers and driven billions of dollars in investments. RL’s impact will continue well into the future.”
The Turing Award, often referred to as the ‘Nobel Prize in Computing,’ is named after Alan M Turing, the British mathematician who articulated the mathematical foundations of computing.
Last year, theoretical computer scientist Avi Wigderson won the prestigious award for reshaping our understanding of the role of randomness in computation. Previous winners include AI leader Geoffrey Hinton, who also won last year’s Nobel Prize in Physics, Lisp programming language inventor John McCarthy and software design pioneer Niklaus Wirth.