Google released Gemini 3.5 Live Translate on Tuesday, an audio model that translates speech continuously across more than 70 languages instead of waiting for a speaker to finish. The model began rolling out to the Google Translate app on Android and iOS the same day, opened to developers in public preview, and reaches Google Meet this month, where speech translation has so far covered five languages.

The launch reprices a professional service as a bundled feature. AI interpretation vendors charge $8 to $35 per attendee-hour for what Google now offers free in the consumer Translate app and is folding into eligible paid Workspace tiers, starting with a private preview. The language-solutions market absorbing that difference was worth $30.85 billion in 2025, by Slator's count. What Google has not published is the reliability data that would tell buyers when the free feature is good enough.


From the Translate app to Grab's 10 million monthly calls

The model generates translated speech while the speaker is still talking, "balancing the trade-off between waiting for context to improve quality and translating immediately to stay in sync with the speaker," product manager Anuda Weerasinghe and senior staff software engineer Tony Lu wrote in the announcement. The output stays a few seconds behind the speaker and preserves intonation, pacing and pitch, the post said.

The consumer surface builds on a beta Google opened in December, which offered headphone translation on Android in the U.S., Mexico and India. Tuesday's update extends the feature to iOS and adds an Android "listening mode" that plays the translation through the phone's earpiece, held to the ear like a call, with no headphones required. Developers get the model through the Gemini Live API and AI Studio, and streaming platforms including LiveKit, Agora and Pipecat have built integrations. Grab is testing it to translate pickup calls between drivers and travelers, who place more than 10 million voice calls a month through the app, according to Google. "I genuinely think this is the beginning of the end of language barriers," Philipp Schmid, an AI developer relations engineer at Google DeepMind, posted on X.

Inside Meet, five languages and a 90-minute cap

The system 3.5 Live Translate replaces is narrower than the launch language suggests. Meet's current speech translation works between English and exactly five languages, French, German, Italian, Portuguese and Spanish, according to Google's support documentation, which also sets a 90-minute session limit and warns that "translations will be a few seconds delayed for completeness." The same page states that "no audio is saved" and "no models are trained on your voice."

The new model removes the English pivot. A single meeting can run translation across 70-plus languages and more than 2,000 language combinations, by Google's count, and a new button in Meet's control row starts it, per 9to5Google. The private preview opens this month for select business Workspace customers, with a broader rollout promised later this year. The existing feature already gates access by paid tier, from Google AI Pro subscriptions up through Workspace Business and Enterprise plans, per the support page.
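The 2,000-plus figure is consistent with simple pair counting. The sketch below assumes exactly 70 languages and unordered pairs; Google says only "70-plus" and does not state how it counts combinations, and the function names are illustrative.

```python
from math import comb


def language_pairs(n_languages: int, directed: bool = False) -> int:
    """Count language combinations among n_languages.

    directed=False counts each pair once (A<->B).
    directed=True counts each translation direction separately (A->B and B->A).
    """
    pairs = comb(n_languages, 2)
    return pairs * 2 if directed else pairs


def pivot_pairs(n_other_languages: int) -> int:
    """Pairs available when every translation must include one pivot language."""
    return n_other_languages


# Assumption: exactly 70 languages, any-to-any.
print(language_pairs(70))                  # 2415
print(language_pairs(70, directed=True))   # 4830
# The current Meet feature: English paired with five languages.
print(pivot_pairs(5))                      # 5
```

Under those assumptions, any-to-any translation among 70 languages yields 2,415 pairs, against five for the English-pivot feature it replaces.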

What the same-day model card concedes

Google's launch messaging promises "no awkward pauses or choppy audio, just real connection without language barriers," as the Google AI account put it on X. The model card the company published the same day reads differently. "Voices can be inconsistent, and voices may shift after long pauses, change gender, or get stuck on one voice during rapid multi-speaker sessions," the card states. Language detection "can struggle with non-native accents, similar languages, or rapid language switches," conditions that describe the Grab pickups and guided tours Google itself promotes as uses.

The card names the latency measures Google applies to the model, initial latency and word-level latency among them, without publishing a figure for either. Ars Technica's Ryan Whitwam observed that "the demos, which are all being recorded under controlled conditions, do sound impressive." Every audio stream carries a SynthID watermark woven into the waveform to mark it as AI-generated, and Google says there is currently no way to remove it.
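To make the two named measures concrete, here is a rough sketch of how they might be computed from timestamps. The definitions are assumptions, since the model card names the measures without defining them: initial latency is taken as the gap between the first source word and the first translated word, and word-level latency as the mean lag across aligned words. The one-to-one word alignment and the sample timestamps are hypothetical.

```python
from statistics import mean


def initial_latency(source_times: list[float], output_times: list[float]) -> float:
    """Seconds between the first spoken word and the first translated word."""
    return output_times[0] - source_times[0]


def word_level_latency(source_times: list[float], output_times: list[float]) -> float:
    """Mean lag in seconds across aligned word pairs.

    Assumes a one-to-one alignment between source and translated words,
    which real translations rarely have.
    """
    if len(source_times) != len(output_times):
        raise ValueError("timestamps must be aligned one-to-one")
    return mean(out - src for src, out in zip(source_times, output_times))


# Hypothetical timestamps in seconds, not measurements of any real system.
source = [0.0, 0.4, 0.9, 1.5]
output = [2.0, 2.6, 3.1, 3.9]

print(f"initial latency: {initial_latency(source, output):.1f}s")
print(f"word-level latency: {word_level_latency(source, output):.1f}s")
```

A system can score well on one measure and poorly on the other, which is why a published figure for each would matter to buyers.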


Slator puts the market at $30.85 billion

Slator's 2026 report values language solutions and AI at $30.85 billion for 2025 and projects $36.10 billion by 2031, a 2.65% annual growth rate for a sector AI was supposed to expand. Within that total, Slator's researchers said on the firm's podcast this month that traditional language-service integrators declined about 5% in 2025 while language-technology platforms grew nearly 20%. In live speech, platforms such as Wordly and KUDO price AI-only sessions at $8 to $35 per attendee-hour, against $60 to $200 per interpreter-hour when humans stay in the loop, per Fora Soft's 2026 buyer's guide.
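The growth rate follows from the two Slator totals, and the attendee-hour pricing scales the same way. A minimal sketch, assuming compound growth over the six years from 2025 to 2031; the 50-attendee, two-hour event is a hypothetical example, not a figure from either source.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def attendee_hour_cost(attendees: int, hours: float, rate: float) -> float:
    """Session cost under per-attendee-hour pricing."""
    return attendees * hours * rate


# Slator's totals, in billions of dollars.
growth = cagr(30.85, 36.10, 2031 - 2025)
print(f"{growth:.2%}")  # 2.65%

# Hypothetical event: 50 attendees for 2 hours at the quoted AI-only rates.
low = attendee_hour_cost(50, 2, 8)
high = attendee_hour_cost(50, 2, 35)
print(f"${low:,.0f} to ${high:,.0f}")  # $800 to $3,500
```

For that hypothetical event, the AI-only vendor range comes to $800 to $3,500 for a session Google's consumer app would translate at no charge.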

DeepL added voice translation for Zoom, Teams and contact centers in April, and built its pitch around accuracy over speed, accepting a longer delay as the cost. Apple's Live Translation on AirPods launched last September with five languages, a list an iOS 26 update later grew to ten. At 70-plus languages inside a free consumer app and a Workspace bundle, Google enters the same contest with the widest list and no per-minute price.

Meet's broader rollout is promised for later this year, and Ars Technica expects a Gemini 3.5 Pro model within weeks. Google's model card describes two latency measurements for the translation lag, one taken at the start of speech and one aligned word by word, and reports a number for neither. A buyer choosing between the free feature and a $35-per-attendee-hour vendor is setting a known price against a delay Google has measured and not disclosed.

Frequently Asked Questions

What is Gemini 3.5 Live Translate?

A Google audio model released June 9, 2026, that translates speech continuously across more than 70 languages while the speaker is still talking, staying a few seconds behind and preserving intonation, pacing and pitch. It replaces turn-by-turn translation, which waits for the speaker to finish.

Where can I use Gemini 3.5 Live Translate today?

It is rolling out in the Google Translate app on Android and iOS, with an Android-only listening mode that plays translations through the phone's earpiece. Developers get it in public preview via the Gemini Live API and Google AI Studio. Google Meet gets it in private preview for select business Workspace customers this month.

What changes in Google Meet?

Meet's current speech translation covers five languages paired with English and carries a 90-minute session limit. With Gemini 3.5 Live Translate, meetings can span more than 70 languages and 2,000-plus language combinations without an English pivot. A broader rollout is promised later this year.

What are the model's known limitations?

Google's model card says voices can shift after long pauses, change gender, or get stuck on one voice in rapid multi-speaker sessions, and language detection can struggle with non-native accents, similar languages, or rapid switching. Google publishes no latency figures. All output carries a SynthID watermark marking it as AI-generated.

What does this mean for the translation industry?

Slator values the language-solutions market at $30.85 billion for 2025, with traditional service integrators already shrinking. AI interpretation platforms charge $8 to $35 per attendee-hour for what Google now offers free in Translate and bundles into paid Workspace tiers, putting direct pricing pressure on standalone vendors.

AI-generated summary, reviewed by an editor. More on our AI guidelines.

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai