Audio Mode — Turn-Based Translation with Voice Playback

March 2026

Live Translate Live now has a second way to translate conversations: audio mode. Where the scrolling marquee translates continuously, audio mode works in turns: you speak, see your words transcribed on screen, tap a button, and hear the translation spoken aloud by an AI voice. Then you hand the phone to the other person for their turn.

How It Works

Audio mode follows a simple cycle:

  1. Speak — press and hold the push-to-talk button, or use always-listening mode, and talk naturally in your language
  2. Review — your speech appears as text on screen with dynamic font sizing that adjusts to fit the display
  3. Edit if needed — tap the transcript to fix any words the speech recognition got wrong
  4. Translate — tap the "Translate into [Language]" button to send the text for translation
  5. Listen — the translated text is read aloud by an AI voice so the other person hears the translation spoken in their language
  6. Replay or clear — replay the audio if needed, or clear the screen and start a new turn

Each turn is self-contained. You control when the microphone listens, when the translation happens, and when to move on. There is no background processing between turns.
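
For readers who think in code, the cycle above can be pictured as a small state machine. The following Python sketch is illustrative only and is not taken from the app: the state names, the event names, and the run_turn helper are all hypothetical, chosen to mirror the six steps.

```python
from enum import Enum, auto


class TurnState(Enum):
    IDLE = auto()        # nothing on screen, microphone off
    LISTENING = auto()   # microphone on, transcript building
    REVIEWING = auto()   # transcript shown and editable
    TRANSLATED = auto()  # translation ready to play or replay


# Hypothetical events mapped onto the steps: speak, review, edit,
# translate, listen/replay, clear.
TRANSITIONS = {
    (TurnState.IDLE, "start_speaking"): TurnState.LISTENING,
    (TurnState.LISTENING, "stop_speaking"): TurnState.REVIEWING,
    (TurnState.REVIEWING, "edit"): TurnState.REVIEWING,
    (TurnState.REVIEWING, "translate"): TurnState.TRANSLATED,
    (TurnState.TRANSLATED, "replay"): TurnState.TRANSLATED,
    (TurnState.REVIEWING, "clear"): TurnState.IDLE,
    (TurnState.TRANSLATED, "clear"): TurnState.IDLE,
}


def next_state(state: TurnState, event: str) -> TurnState:
    """Return the state after `event`, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run_turn(events) -> TurnState:
    """Apply a sequence of events to a fresh turn and return the final state."""
    state = TurnState.IDLE
    for event in events:
        state = next_state(state, event)
    return state
```

In this model nothing moves forward unless the user triggers an event, which is the sense in which each turn is self-contained: translation cannot happen from IDLE or LISTENING, only after a transcript is on screen for review.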

Hold Your Phone and Talk

The scrolling marquee mode works best when a device is laid flat on a table between two speakers — both people read their side of the screen simultaneously. That is great for sit-down conversations, but it does not work as well when you are standing, walking, or moving around.

Audio mode is designed for handheld use. Hold your phone normally, speak into it, and tap translate. The other person hears the translation spoken aloud — no need to read a screen. Hand the phone over for their turn. This makes audio mode practical in situations where laying a device on a table is not an option: standing in a market, walking through a hospital, or talking at a service counter.

Save Credits in Noisy Environments

In the live marquee mode, the speech recognition engine runs continuously while your session is active. Background noise in a busy restaurant, street, or airport is processed the same as real speech — and credits are consumed the entire time, whether anyone is speaking or not.

Audio mode works differently. Speech recognition only runs while you are actively speaking, and translation only happens when you tap the button. In a noisy restaurant where the marquee mode might burn through credits for an entire hour-long dinner, audio mode only uses credits for the sentences you actually translate. If you exchange 30 short phrases over dinner instead of running continuous recognition for 60 minutes, the difference in cost can be significant.

Audio mode uses per-character billing for translation and text-to-speech rather than time-based billing. You pay for the text you translate and the audio that gets generated — nothing more.
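
To make the difference concrete, here is a rough Python sketch of the two billing shapes. Every rate in it is a made-up placeholder, not actual pricing, and it assumes that translation is billed on the characters you send and text-to-speech on the characters that are spoken. Check the pricing page for real numbers.

```python
# All rates are hypothetical placeholders, NOT actual pricing.
MARQUEE_CREDITS_PER_MINUTE = 1.0      # assumed time-based rate
TRANSLATION_CREDITS_PER_CHAR = 0.002  # assumed per-character translation rate
SPEECH_CREDITS_PER_CHAR = 0.003       # assumed per-character text-to-speech rate


def marquee_cost(session_minutes: float) -> float:
    """Time-based: cost grows with session length, speech or silence."""
    return session_minutes * MARQUEE_CREDITS_PER_MINUTE


def audio_cost(source_chars: int, translated_chars: int) -> float:
    """Per-character: cost grows only with text translated and spoken."""
    return (source_chars * TRANSLATION_CREDITS_PER_CHAR
            + translated_chars * SPEECH_CREDITS_PER_CHAR)


# The dinner example: 30 short phrases versus a 60-minute session.
# The 40-character phrase length is an assumption for illustration.
phrases = 30
chars_per_phrase = 40
dinner_audio = audio_cost(phrases * chars_per_phrase, phrases * chars_per_phrase)
dinner_marquee = marquee_cost(60)
```

The specific totals mean nothing with invented rates. What matters is the shape: marquee_cost depends only on minutes elapsed, while audio_cost is zero until you translate something.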

Audio Playback — No Screen Reading Required

The translated text is read aloud by an AI-generated voice. The other person does not need to look at the screen at all. They just listen, which makes audio mode useful wherever reading a screen is impractical.

After the audio plays, you can tap replay to hear it again. The translated text also appears on screen as a fallback if the other person prefers to read.

Inline Editing

Speech recognition is accurate but not perfect. Proper nouns, technical terms, and accented speech can occasionally produce errors. In the scrolling marquee mode, those errors get translated immediately because the process is continuous — there is no chance to correct them before translation.

Audio mode gives you a review step. After you speak, your transcript appears on screen and you can tap to edit it. Fix a misspelled name, correct a number, or rephrase a sentence before it gets translated. This means the translation is based on exactly what you intended to say, not on what the speech recognition guessed.

Push-to-Talk and Always-Listening Modes

Audio mode supports two input methods: push-to-talk, where you press and hold the button while you speak, and always-listening mode.

Both modes feed into the same review-edit-translate cycle. The transcript builds on screen as you speak, and you translate when ready.
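
One way to picture two input methods sharing one cycle is as a microphone gate in front of a single transcript. The Python sketch below is a hypothetical model, not the app's implementation. In particular, it assumes that always-listening mode keeps the microphone open for the whole turn, which this article does not spell out, and the mode strings and function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Text shown on screen for the current turn."""
    parts: list = field(default_factory=list)

    def add(self, text: str) -> None:
        text = text.strip()
        if text:
            self.parts.append(text)

    def text(self) -> str:
        return " ".join(self.parts)


def mic_is_open(mode: str, button_held: bool) -> bool:
    """Decide whether recognized speech should reach the transcript."""
    if mode == "push_to_talk":
        return button_held   # only while the button is held
    if mode == "always_listening":
        return True          # assumption: open for the whole turn
    raise ValueError(f"unknown input mode: {mode!r}")


def collect(mode: str, events) -> Transcript:
    """events: iterable of (button_held, recognized_text) pairs."""
    transcript = Transcript()
    for button_held, recognized_text in events:
        if mic_is_open(mode, button_held):
            transcript.add(recognized_text)
    return transcript
```

Whichever gate is used, the result is the same Transcript object, so the review, edit, and translate steps downstream do not need to know how the words arrived.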

When to Use Audio Mode vs. the Scrolling Marquee

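Both modes are available in the same app. The scrolling marquee suits sit-down conversations with the device laid flat between two speakers: recognition and translation run continuously, and both people read their side of the screen. Audio mode suits handheld, turn-based exchanges: you review and edit each transcript before translating, the other person listens to the result, and billing is per character rather than by time.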

You can switch between modes at any time without losing your session or conversation history.

Supported Languages for Voice Playback

Audio mode's AI voice playback supports 32 languages: English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Russian, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Vietnamese, Norwegian, and Hungarian.

Languages outside this list are still available in the scrolling marquee mode, which displays translations as text without voice playback.
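
If you are scripting around this list, for example to decide which mode to suggest for a language pair, a lookup like the following Python sketch is enough. The language names come from the list above. The function names and the "audio" and "marquee" labels are hypothetical and do not correspond to any published API.

```python
# The 32 languages listed above for AI voice playback, lowercased.
VOICE_PLAYBACK_LANGUAGES = frozenset({
    "english", "japanese", "chinese", "german", "hindi", "french",
    "korean", "portuguese", "italian", "spanish", "russian",
    "indonesian", "dutch", "turkish", "filipino", "polish", "swedish",
    "bulgarian", "romanian", "arabic", "czech", "greek", "finnish",
    "croatian", "malay", "slovak", "danish", "tamil", "ukrainian",
    "vietnamese", "norwegian", "hungarian",
})


def supports_voice_playback(language: str) -> bool:
    """True if the language appears in the voice playback list."""
    return language.strip().casefold() in VOICE_PLAYBACK_LANGUAGES


def suggest_mode(target_language: str) -> str:
    """Suggest 'audio' when the translation can be spoken, else 'marquee'."""
    return "audio" if supports_voice_playback(target_language) else "marquee"
```

The check is made on the target language, since that is the one the AI voice has to speak.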

Try Audio Mode

Sign in to Live Translate Live, select your languages, and switch to audio mode. Speak, review, translate, and listen — all from your phone in your hand. No flat table required.


