Meta Unveils AI Model for Speech and Text Translations

Meta has unveiled an artificial intelligence (AI) model that performs speech and text translations for nearly 100 languages.

The new SeamlessM4T is an all-in-one multimodal and multilingual AI translation model, Meta said in a Tuesday (Aug. 22) press release. It supports speech recognition, speech-to-text translation, speech-to-speech translation, text-to-text translation, and text-to-speech translation.

One of the advantages of SeamlessM4T is its single-system approach, which enhances efficiency and quality by reducing errors and delays in the translation process, according to the release.

SeamlessM4T is publicly released under a research license, the release said. This allows researchers and developers to build upon the model’s capabilities.

Additionally, Meta is releasing the metadata of SeamlessAlign, an open multimodal translation dataset that includes 270,000 hours of mined speech and text alignments, per the release. This dataset will serve as a resource for future research and development in the field.

SeamlessM4T builds upon Meta’s previous advancements in language translation technology, according to the press release. Last year, the company released No Language Left Behind (NLLB), a text-to-text machine translation model supporting 200 languages. NLLB has been integrated into Wikipedia as one of the translation providers.

Meta also demonstrated the Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a variety of Chinese without a widely used writing system. Earlier this year, Meta unveiled Massively Multilingual Speech, which offers speech recognition, language identification and speech synthesis technology across more than 1,100 languages.

SeamlessM4T incorporates insights and learnings from all these projects to provide a state-of-the-art multilingual and multimodal translation experience, the release said.

It was reported in February 2022 that Meta was working on harnessing AI to create universal language translations and improve spoken interactions with voice assistants. In a demo the company offered at the time, a voice assistant noticed that a family preparing a meal was running low on salt and ordered more.

In another use case for the technology, generative AI-powered language translation services can enable seamless communication between customers and providers, breaking down linguistic barriers and extending the reach of telecom services to diverse markets.