
OpenAI Says ChatGPT Can Now ‘See’ and ‘Speak’


OpenAI said the days of communicating with ChatGPT by typing are coming to an end.

The artificial intelligence (AI) company said it is introducing new voice and image capabilities for its generative AI chatbot, letting users have a voice conversation or show the AI what they’re talking about, according to a Monday (Sept. 25) blog post.

“ChatGPT can now see, hear and speak,” the post said.

“Voice and image give you more ways to use ChatGPT in your life,” it added. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow-up questions for a step-by-step recipe).”

OpenAI will begin offering voice and images in ChatGPT to Plus and Enterprise users over the next two weeks, per the post. Voice will be available on iOS and Android, and images will be an option for users on all platforms.

OpenAI’s updates to ChatGPT come as several Big Tech firms invest in AI-powered voice assistants.

For example, Apple is reportedly spending millions of dollars each day building out its generative AI capabilities across its product teams, with the initiative focused on a next-generation AI upgrade for its Siri voice assistant.

“And the embedded voice tool sorely needs it,” PYMNTS wrote Sept. 7. “Most voice assistants today, including those from Amazon and Google, still struggle to move beyond a core set of applications like playing music, turning lights on and off, telling their owners the weather or stock prices, and relaying other information directly from a website. Even the promising area of voice-activated connected commerce has yet to be fully scratched by today’s platforms.”

Commercialized voice assistants are feeling the pressure. Google and Amazon announced last month that their voice assistants, Google Assistant and Alexa, can now be used simultaneously on the same device: a new line of JBL smart speakers from Harman.

PYMNTS Intelligence found that consumers might still be unsure about the reliability and safety of voice technology, although those views could change as AI-powered tools become smarter, more available and more a part of everyday life.

According to the report “How Consumers Want to Live in the Voice Economy,” 54% of consumers said they would prefer voice technology in the future because it is faster than typing or using a touchscreen.
