Date: 2025-03-26

Alibaba Cloud has launched a multimodal artificial intelligence (AI) model that can process inputs in the form of text, images, audio and video, and can generate real-time responses in the form of text and natural speech.

The new Qwen2.5-Omni-7B can be deployed on mobile phones and laptops, the company said in an article posted on Alibaba’s news website, Alizila.

Because the model is both compact and multimodal, it can power “agile, cost-effective AI agents,” according to the article.

“For example, the model could be leveraged to transform lives by helping visually impaired users navigate environments through real-time audio descriptions, offering step-by-step cooking guidance by analyzing video ingredients, or powering intelligent customer service dialogues that really understand customer needs,” the article said.

Qwen2.5-Omni-7B is open-sourced on Hugging Face and GitHub and can be accessed via Qwen Chat and ModelScope, which is Alibaba Cloud’s open-source community, per the article.

Among the more than 200 generative AI models open-sourced by Alibaba Cloud, the new model stands apart in terms of its performance across all modalities and the “new benchmark” it set in real-time voice interaction, natural and robust speech generation, and following end-to-end speech instructions, the article said.

The announcement came about two months after Alibaba released Qwen2.5-Max, an AI model that the company said outperformed top AI models on key benchmarks.

Alibaba said at the time that Qwen2.5-Max held its own against DeepSeek V3, Llama 3.1-405B, GPT-4o and Claude 3.5 Sonnet in the MMLU-Pro, GPQA-Diamond, LiveCodeBench, LiveBench and Arena-Hard benchmarks.

In February, Alibaba said during an earnings call that it would spend more on AI in the next three years than it had in the last decade.

“We aim to continue to develop models that extend the boundaries of intelligence,” Alibaba CEO Eddie Wu said during the call. “Why is that the primary aim? Well, it’s because of all the visible AI application scenarios today that we see around content creation, search and so on and so forth have arisen precisely as a result of the ongoing extension of those boundaries, and we want to keep pushing out those boundaries to create more and more opportunities.”