So long, generative artificial intelligence (AI). We hardly knew ye.
The buzzy technology is approaching legacy tech status, MIT Technology Review reported Sept. 15, as AI firms rush to bring to market the next generation of foundational AI models, ones promising greater productivity and efficiency gains.
These new, multimodal, interactive large language models (LLMs) are being referred to as “interactive AI.”
Leading-edge tech companies, including both the world’s largest incumbents and its nimblest startups, are looking to push beyond what is currently possible within the AI landscape, as the tech ecosystem returns to its software-race roots.
Their hope? Create new AI models able to work with both text and images (and eventually voice) interchangeably.
“The first wave of AI was about classification,” Mustafa Suleyman, co-founder of DeepMind and Inflection AI, said in the MIT report. “Deep learning showed that we can train a computer to classify various types of input data: images, video, audio, language. Now we’re in the generative wave, where you take that input data and produce new data.”
“The third wave will be the interactive phase,” he added. “That’s why I’ve bet for a long time that conversation is the future interface. You know, instead of just clicking on buttons and typing, you’re going to talk to your AI.”
Interactive, multimodal AI will be able to do things like produce a text analysis when given a spreadsheet or chart, and design engineering equations for a product after seeing a wireframe sketch.
The big change between today’s generative AI and next-generation interactive AI is that today’s models are typically static; they do what they are told. Interactive AI, by contrast, will be able to take further multimodal action across media types and channels.
In a sign of what is to come, OpenAI announced Wednesday (Sept. 20) DALL-E 3, the latest version of its AI image synthesis model. The new image generation model features full integration with OpenAI’s ChatGPT product and will be available for OpenAI’s Enterprise and Plus subscription customers early next month.
“DALL-E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts,” OpenAI said in the announcement. “Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. This represents a leap forward in our ability.”
DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist, and the AI model also allows for content creators to opt their images out from its training data.
Also this week, OpenAI announced Tuesday (Sept. 19) the Red Teaming Network, a contracted group of AI experts who will help inform the company’s AI model risk assessment and mitigation strategies. Red teaming can catch biases in models and identify prompts that slip past safety filters.
Not to be outdone, OpenAI rival and Big Tech giant Google, whose size dwarfs the upstart, AI-only company, said last week that its latest conversational AI software, Gemini, is being tested by a small group of early-access companies.
The move comes on the heels of a sweeping set of upgrades to Google’s chatbot, Bard, released Tuesday (Sept. 19) and meant to further integrate the AI tool into end users’ lives by providing more sophisticated and streamlined capabilities.
The coming wave of interactive AI tools will only amplify those capabilities, turning spreadsheets and charts into dynamic content and reducing manual labor to a previously unimagined degree.
“I think, finance departments, if they choose to be, can be at the leading edge when it comes to embracing these new systems, new opportunities, new ways of running processes and reexploring processes that may not have been efficient in the past,” Hinge Health Chief Financial Officer James Budge told PYMNTS in an interview posted Thursday (Sept. 21). “And AI is a huge part of that.”
For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.