How to Pick an AI Voice for Slide Narration
A practical decision tree for picking the right Oral Slides voice based on audience, deck length, language, and brand tone.
The voice carries more of the listening experience than people expect. A great deck with the wrong voice feels off; a plain deck with the right voice still works. This guide is a short decision tree you can run in 60 seconds.
Start with the audience, not the catalog
Open the voice picker only after you know two things:
- Who is going to watch this video?
- Where will they watch it (LMS, email, YouTube, internal Slack)?
Most mistakes happen when teams browse the full list of 40+ voices first and pick whichever sample clip sounds "fun." Fun is rarely what an LMS audience wants.
A short decision tree
| Audience | Channel | Voices to try first |
|---|---|---|
| Internal training | LMS / internal wiki | Ethan, Maia, Neil, Andre |
| Sales prospects | Email follow-up | Cherry, Jennifer, Aiden |
| Students | YouTube / classroom | Ethan, Mia, Maia |
| Investors | Email or short link | Neil, Jennifer, Andre |
| Casual customers | Social / product loop | Cherry, Sunny, Sohee |
These aren’t hard rules. They’re a starting shortlist so you don’t have to audition the entire catalog.
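If you want the shortlist available in an internal tool or a checklist script, the table maps directly onto a lookup. The sketch below is hypothetical. `SHORTLIST` and `voices_to_audition` are illustrative names, not part of any Oral Slides API. It keys on audience only, since each row in the table pairs one audience with one channel. When the audience isn't listed, it falls back to Ethan and Cherry, the safe defaults this guide recommends.

```python
# Hypothetical lookup mirroring the decision-tree table; not an Oral Slides API.
# Keys are audience labels; the channel each row assumes is noted alongside.
SHORTLIST: dict[str, list[str]] = {
    "internal training": ["Ethan", "Maia", "Neil", "Andre"],  # LMS / internal wiki
    "sales prospects": ["Cherry", "Jennifer", "Aiden"],       # email follow-up
    "students": ["Ethan", "Mia", "Maia"],                     # YouTube / classroom
    "investors": ["Neil", "Jennifer", "Andre"],               # email or short link
    "casual customers": ["Cherry", "Sunny", "Sohee"],         # social / product loop
}

# Safe defaults from the "When in doubt, default" advice.
DEFAULTS: list[str] = ["Ethan", "Cherry"]


def voices_to_audition(audience: str) -> list[str]:
    """Return a starting shortlist for an audience, or the safe defaults."""
    key = audience.strip().lower()
    # Return a copy so callers can reorder or trim it without mutating the table.
    return list(SHORTLIST.get(key, DEFAULTS))


print(voices_to_audition("Investors"))  # ['Neil', 'Jennifer', 'Andre']
print(voices_to_audition("Partners"))   # ['Ethan', 'Cherry']
```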
Match the language exactly
The TTS engine sounds noticeably better when the voice is native to the script language. If your script is mostly English with a Chinese product name, pick an English voice and let the model handle the brand term. Picking a Chinese voice "for the brand name" usually breaks the rest of the sentence.
Languages currently supported:
- English, Chinese, Spanish, French, German
- Italian, Portuguese, Russian, Japanese, Korean
Mixing two languages in a single slide rarely lands well. If you must, split the bilingual content across two slides and assign a different voice per slide.
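Before assigning voices, it can help to check which script dominates each slide and whether any slide is genuinely mixed. The sketch below is a rough, assumption-laden heuristic. The function names and the 0.8 "mixed" threshold are illustrative choices. It counts letters per Unicode script using the script prefix of each character's Unicode name. It cannot tell English from Spanish, French, German, Italian or Portuguese, because all of them report `LATIN`. Japanese text splits across `HIRAGANA`, `KATAKANA` and `CJK`, so it will look "mixed." It also counts letters rather than words, so a single Chinese character counts the same as a single Latin letter.

```python
import unicodedata
from collections import Counter
from typing import Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Coarse script label for a letter (e.g. 'LATIN', 'CJK'), None otherwise."""
    if not ch.isalpha():
        return None
    # Unicode names begin with the script: "LATIN SMALL LETTER E WITH ACUTE",
    # "CJK UNIFIED IDEOGRAPH-4E2D", "CYRILLIC SMALL LETTER DE", ...
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] or None


def dominant_script(text: str, mixed_below: float = 0.8) -> Tuple[Optional[str], float, bool]:
    """Return (dominant script, its share of letters, whether the slide looks mixed)."""
    counts = Counter(s for s in map(script_of, text) if s)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0, False
    script, n = counts.most_common(1)[0]
    share = n / total
    return script, share, share < mixed_below


# Mostly English with a Chinese brand term: LATIN dominates, so use an English voice.
print(dominant_script("Welcome to the 海豚 dashboard launch"))
```

A slide that comes back flagged as mixed is a candidate for splitting into two slides with one voice each.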
Pace, energy, accent
After the language match, three sub-decisions:
- Pace. Training and lectures should sit between 140 and 160 words per minute. Sales demos can push to 170. Anything past 180 sounds rushed in TTS.
- Energy. Higher-energy voices (Cherry, Sunny, Aiden) lift short clips. Lower-energy voices (Neil, Andre, Maia) hold attention longer.
- Accent. If the channel is regional (e.g., a deck for the UK team, or a deck targeted at southern China), a regional voice signals that the deck was made for that audience. Avoid an accent that fights the script: a Beijing-coded voice on a Cantonese product page rarely reads as "local."
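The pace numbers are simple arithmetic: words in the script × 60 ÷ seconds of audio gives the words per minute the voice must hit. The sketch below turns the bands above into a check. The names are hypothetical. The 140 wpm floor for sales is an assumption, because the guide only gives a ceiling for sales demos. The whitespace word count works for English-style scripts but not for Chinese, which doesn't separate words with spaces.

```python
# Pace bands in words per minute, taken from the bullets above.
# The 140 wpm floor for sales is an assumption; the guide only gives a ceiling.
PACE_BANDS = {"training": (140, 160), "sales": (140, 170)}
RUSHED_ABOVE = 180


def required_wpm(script: str, seconds: float) -> float:
    """Words per minute needed to read `script` in `seconds` of audio."""
    words = len(script.split())  # whitespace word count: fine for English, not Chinese
    return words * 60 / seconds


def check_pace(wpm: float, use_case: str = "training") -> str:
    """Classify a pace as 'ok', 'slow', 'fast' or 'rushed' for a use case."""
    low, high = PACE_BANDS[use_case]
    if wpm > RUSHED_ABOVE:
        return "rushed"
    if wpm > high:
        return "fast"
    if wpm < low:
        return "slow"
    return "ok"


script = " ".join(["word"] * 40)          # a 40-word slide script
wpm = required_wpm(script, seconds=15)
print(wpm, check_pace(wpm, "training"))   # 160.0 ok
```

With the same 40 words squeezed into 12 seconds, the voice would need 200 wpm, and the check flags that as rushed. The fix is to trim the script or give the slide more time, not to pick a faster voice.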
Test on the worst slide first
Don’t test voices on slide 1. It is usually short and forgiving. Instead, pick the slide with:
- the longest paragraph
- a number-heavy chart explanation
- or a brand name that the model has to pronounce
If a voice nails that slide, it will nail the rest. If it stumbles on numbers or proper nouns, swap it before generating audio for the entire deck.
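For a long deck, you can shortlist the "worst slide" automatically before auditioning voices. The sketch below is a hypothetical heuristic, not an Oral Slides feature. It scores each slide's text by its longest paragraph (in words). It adds a penalty for every number and for every mention of a brand term you supply. The weights are arbitrary: each number or brand mention counts as five extra words. Tune them to your content.

```python
import re
from typing import List

NUMBER = re.compile(r"\d[\d,.%]*")  # matches e.g. 12%, 4.5, 1,200, 2024


def stress_score(slide: str, brand_terms: List[str]) -> int:
    """Rough difficulty score: longest paragraph plus penalties for numbers and brands."""
    longest = max((len(p.split()) for p in slide.split("\n\n")), default=0)
    numbers = len(NUMBER.findall(slide))
    lowered = slide.lower()
    brands = sum(lowered.count(term.lower()) for term in brand_terms)
    # Arbitrary weights: each number or brand mention counts like 5 extra words.
    return longest + 5 * numbers + 5 * brands


def worst_slide(slides: List[str], brand_terms: List[str]) -> int:
    """Index of the slide to audition voices on first (ValueError if `slides` is empty)."""
    return max(range(len(slides)), key=lambda i: stress_score(slides[i], brand_terms))


slides = [
    "Welcome!",
    "Revenue grew 12% to $4.5M in Q3 2024.",
    "Oral Slides turns decks into narrated video.",
]
print(worst_slide(slides, ["Oral Slides"]))  # 1
```

The number-heavy revenue slide wins here. That matches the advice above: if a voice reads that slide cleanly, the short slides are unlikely to trip it up.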
When in doubt, default
For most teams shipping internal videos, Ethan (male, standard Mandarin and English) and Cherry (female, warm and clear) are safe defaults. They handle most slide content cleanly, sound natural in 95% of decks, and rarely surprise you. Start there, and switch only if the use case is unusually formal or unusually casual.
Once you’ve picked, lock the voice at the project level. Changing the voice mid-project invalidates the audio already in the cache, and re-rendering that audio costs credits.