Alibaba Cloud Unveils AI Models That Understand Visual Content


Alibaba Cloud has unveiled two open-source artificial intelligence (AI) models that can understand both images and text.

The models, Qwen-VL and Qwen-VL-Chat, are built on Alibaba Cloud's large language model Qwen-7B. According to a Friday (Aug. 25) press release from the cloud computing company, they outperform other open-source large vision language models at image recognition and understanding.

The launch demonstrates Alibaba Cloud's commitment to advancing multimodal capabilities for its large language models, according to the press release. By incorporating sensory inputs such as images and audio, Alibaba Cloud aims to open up new applications for researchers and commercial organizations.

The models have the potential to transform how users interact with visual content, the release said. For example, they could generate photo captions for news outlets or help non-Chinese speakers read street signs written in Chinese. The models also support visual question answering, which could make online shopping more accessible to blind and partially sighted users.

Alibaba Group's online marketplace, Taobao, has already integrated optical character recognition (OCR) technology to help visually impaired people read text, per the release. The new large vision language models go further by letting visually impaired users get answers about an image through multi-round conversations.

Alibaba Cloud's earlier large language models, Qwen-7B and Qwen-7B-Chat, have been downloaded more than 400,000 times since their launch a month ago, according to the press release. Those models were made available to developers, researchers and commercial organizations so they could build their own generative AI models cost-effectively.

Alibaba said Aug. 10 that its cloud business posted revenue growth of 4% and has seen "strong demand" for training AI models and related services.

“Cloud is relevant to all the industries,” Alibaba CEO Daniel Zhang said during the company’s earnings call.

Other tech industry leaders are also doubling down on generative AI and machine learning: Meta, Microsoft and Alphabet mentioned "AI" more than 200 times on their earnings calls this spring. Amazon, too, said it was working on an improved large language model to power its smart assistant Alexa.