What’s Next for AI? Experts Say Going More Multimodal

So long, generative artificial intelligence (AI). We hardly knew ye.

The buzzy technology is approaching legacy tech status, MIT Technology Review reported Sept. 15, with AI firms already rushing to bring to market the next generation of foundational AI models, which promise greater productivity and efficiency gains.

These new, multimodal, interactive large language models (LLMs) are being referred to as “interactive AI.”

Leading-edge tech companies, including both the world’s largest incumbents and its nimblest startups, are looking to push beyond what is currently possible within the AI landscape, as the tech ecosystem returns to its software-race roots.

Their hope? Create new AI models able to work with both text and images (and eventually voice) interchangeably.

“The first wave of AI was about classification,” Mustafa Suleyman, co-founder of DeepMind and Inflection AI, said in the MIT report. “Deep learning showed that we can train a computer to classify various types of input data: images, video, audio, language. Now we’re in the generative wave, where you take that input data and produce new data.”

“The third wave will be the interactive phase,” he added. “That’s why I’ve bet for a long time that conversation is the future interface. You know, instead of just clicking on buttons and typing, you’re going to talk to your AI.”

Interactive, multimodal AI will be able to do things like produce a text analysis when given a spreadsheet or chart, and design engineering equations for a product after seeing a wireframe sketch.

Already, companies like Google and OpenAI are getting close to realizing that vision.

From Static to Animated Technology

The big difference between today’s generative AI and next-generation interactive AI is that today’s models are typically static; they do what they are told. Interactive AI, however, will be animated by the potential to take further multimodal action across media types and channels.

In a sign of what is to come, OpenAI on Wednesday (Sept. 20) announced DALL-E 3, the latest version of its AI image synthesis model. The new model features full integration with OpenAI’s ChatGPT product and will be available to OpenAI’s Enterprise and Plus subscription customers early next month.

“DALL-E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts,” OpenAI said in the announcement. “Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. This represents a leap forward in our ability.”

In response to ongoing copyright concerns and data fears, OpenAI is also taking steps to ensure that its next step forward with AI is as self-regulated and compliant as it needs to be.

DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist, and the AI model also allows content creators to opt their images out of its training data.

Also this week, OpenAI on Tuesday (Sept. 19) announced the Red Teaming Network, a contracted group of AI experts who will help inform the company’s AI model risk assessment and mitigation strategies. Red teaming can catch biases in models and identify prompts that slip past safety filters.

Bringing AI to the Enterprise

Not to be outdone, OpenAI rival and Big Tech giant Google, whose size dwarfs that of the AI-only upstart, said last week that its latest conversational AI software, Gemini, is being tested by a small group of early access companies.

The move comes on the heels of a sweeping set of upgrades to Google’s chatbot, Bard, released Tuesday (Sept. 19) and meant to further integrate the AI tool into end users’ lives by providing more sophisticated and streamlined capabilities.

Since the advent of AI, PYMNTS has been tracking how generative AI can help speed up analysis and accelerate the agility of organizational decision-making.

The coming wave of interactive AI tools will only amplify those capabilities, turning spreadsheets and charts into dynamic content and reducing manual labor to a previously unimagined degree.

“I think, finance departments, if they choose to be, can be at the leading edge when it comes to embracing these new systems, new opportunities, new ways of running processes and reexploring processes that may not have been efficient in the past,” Hinge Health Chief Financial Officer James Budge told PYMNTS in an interview posted Thursday (Sept. 21). “And AI is a huge part of that.”
