After the headline-grabbing drama, CEO Sam Altman was reinstated without a board seat; the company’s chief scientist, Ilya Sutskever, returned to his post; and the nonprofit’s board of directors was given a proper shakeup.
But what was behind it all?
Rumors and hype are swirling around reports that OpenAI researchers created a new model, called Q* (pronounced Q-star), able to complete grade-school-level math problems. This new development, and Altman’s push for commercialization, are what some observers believe spooked the nonprofit board, whose mission centers on developing AI for the good of humanity.
A generative artificial intelligence (AI) model that can regularly and reliably solve math problems on its own would constitute a huge advance in the capabilities of AI systems.
Even today’s most advanced and cutting-edge AI systems struggle to reliably solve relatively simple math problems, a situation that has for years both vexed AI researchers and inspired them to push the field forward.
If there is an AI model out there, or under development, that can really do math — even simple equations — on its own, then that represents a massive leap forward for AI’s applications across many industries, especially payments.
Math, after all, is a benchmark for reasoning. And the bread and butter for most AI models — particularly large language models (LLMs) — is pattern recognition, not logical sequence cognition.
LLMs are trained on text and other data that would take a human many millennia to read, but generative AI models still can’t be trusted to reliably discern that if X is the same as Y, then Y is the same as X.
AI systems with the ability to plan already exist, but they are typically confined to highly context-limited scenarios, such as playing chess, where the rules and permutations are fixed, or controlling a robot on a grid. Outside their defined zone of expertise, these systems, including Google DeepMind’s AlphaGo and AlphaGo Zero, are limited in their planning capacity even when compared to animals like cats or mice.
Building a generative AI system that is capable of unsupervised reasoning and able to solve math problems without regular mistakes is a challenging, but important, milestone.
The name of OpenAI’s alleged model, Q*, may give a hint as to how to get there. It suggests a combination of two fundamental computer science techniques: Q-learning and A* (pronounced A-star).
A* is a search algorithm originally developed to help a mobile robot plan its own actions, while Q-learning is a model-free reinforcement learning algorithm that learns the value of taking a particular action in a particular state.
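Nothing about Q* itself has been published, but the two named techniques are textbook material and easy to sketch. The toy Python example below (the environments, function names, and parameters here are illustrative, not anything from OpenAI) shows A* planning a shortest path on a grid using the Manhattan distance as a heuristic, and tabular Q-learning learning action values on a five-state chain via the standard update Q(s,a) ← Q(s,a) + α·(r + γ·max Q(s′,·) − Q(s,a)):

```python
import heapq
import random

def a_star(grid, start, goal):
    """A*: best-first search ordered by f(n) = g(n) + h(n) on a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]  # (f, cost so far, node, path)
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in best_g and best_g[node] <= g:
            continue  # already reached this cell more cheaply
        best_g[node] = g
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(
                    frontier,
                    (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]),
                )
    return None  # goal unreachable

def q_learn(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a chain: states 0..n-1, actions 0=left, 1=right.

    Reward is 1 for reaching the rightmost state, 0 otherwise. The agent
    learns purely from experienced transitions (model-free)."""
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice, breaking exact ties randomly.
            if random.random() < eps or Q[s][0] == Q[s][1]:
                a = random.randrange(2)
            else:
                a = 1 if Q[s][1] > Q[s][0] else 0
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Core Q-learning update (Bellman-style bootstrapped target).
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy at every non-terminal chain state is "move right," i.e., Q[s][1] exceeds Q[s][0]; A* returns the five-cell shortest path across an open 3x3 grid. The speculation around Q* is, loosely, that search of the first kind could be layered on top of learned values of the second kind.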
Performing math reliably, for both AIs and humans, requires planning over multiple steps. But Dr. Jim Fan, a senior NVIDIA scientist, tweeted that some combination of a search-based system like Google DeepMind’s Go-playing AlphaGo and a learning-based system like an LLM could someday get there.
Maybe that model, combining Q-learning with A*, would be called something like Q*.
Most AI models operate on learned statistical weights rather than a genuine understanding of context, meaning they work without truly comprehending what they are dealing with. To perform math, however, a step-by-step sequential understanding is crucial.
An AI capable of doing math reliably is an enticing concept because, as in the laws of nature themselves, math represents a foundation of learning for other, more abstract tasks.
A 2023 research paper by OpenAI’s Sutskever and other OpenAI researchers, titled “Let’s Verify Step by Step,” investigates this concept, attempting to reduce the frequency with which AI models trained on the MATH dataset make logical mistakes. The OpenAI scientists trained their model using a dataset of 800,000 step-level human feedback labels.
Getting AI models to solve math problems would represent a crucial step in the technology’s ability to transform enterprise workflows and operations, helping reduce the daily labor burden on internal teams by shifting their responsibilities from doing a process to managing or overseeing it.
Within security-critical areas like finance and payments, the future-fit impact of this as-yet hypothetical capability can’t be overstated.