How to Pick an AI Voice for Slide Narration

A practical decision tree for picking the right Oral Slides voice based on audience, deck length, language, and brand tone.

The voice carries more of the listening experience than people expect. A great deck with the wrong voice feels off; a plain deck with the right voice still works. This guide is a short decision tree you can run in 60 seconds.

Start with the audience, not the catalog

Open the voice picker only after you know two things:

Who is going to watch this video?
Where will they watch it (LMS, email, YouTube, internal Slack)?

Most picking mistakes happen when teams browse the 40+ voice list first and pick whichever clip sounds "fun." Fun is rarely what an LMS audience wants.

A short decision tree

Audience	Channel	Voice families to try first
Internal training	LMS / internal wiki	Ethan, Maia, Neil, Andre
Sales prospects	Email follow-up	Cherry, Jennifer, Aiden
Students	YouTube / classroom	Ethan, Mia, Maia
Investors	Email or short link	Neil, Jennifer, Andre
Casual customers	Social / product loop	Cherry, Sunny, Sohee

These aren’t hard rules — they’re a starting bracket so you don’t audition the entire catalog.
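
If voice selection is scripted, the table above can be kept as a small lookup. The following is a minimal Python sketch: the audience labels, channels, and voice names come from the table, while the dictionary layout and the function name `voices_to_try` are illustrative assumptions and not part of any Oral Slides API.

```python
# Starting brackets copied from the table above.
# The data layout and function name are hypothetical, for illustration only.
STARTING_BRACKETS = {
    "internal training": ("LMS / internal wiki", ["Ethan", "Maia", "Neil", "Andre"]),
    "sales prospects": ("Email follow-up", ["Cherry", "Jennifer", "Aiden"]),
    "students": ("YouTube / classroom", ["Ethan", "Mia", "Maia"]),
    "investors": ("Email or short link", ["Neil", "Jennifer", "Andre"]),
    "casual customers": ("Social / product loop", ["Cherry", "Sunny", "Sohee"]),
}


def voices_to_try(audience: str) -> list[str]:
    """Return the voices to audition first, or an empty list for an unknown audience."""
    _channel, voices = STARTING_BRACKETS.get(audience.strip().lower(), ("", []))
    return list(voices)  # copy so callers cannot mutate the table
```

An unknown audience returns an empty list rather than a guess, which is a reminder to answer the "who" and "where" questions first.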

Match the language exactly

The TTS engine sounds noticeably better when the voice is native to the script language. If your script is mostly English with a Chinese product name, pick an English voice and let the model handle the brand term. Picking a Chinese voice "for the brand name" usually breaks the rest of the sentence.

Languages currently supported:

English, Chinese, Spanish, French, German
Italian, Portuguese, Russian, Japanese, Korean

Mixing two languages in a single slide rarely lands well. If you must, split the bilingual content across two slides and assign a different voice per slide.
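
One rough way to tell "mostly English with a Chinese product name" apart from a truly bilingual slide is to count units of each script. The sketch below is a heuristic under stated assumptions: it only distinguishes Han characters from Latin words, the 0.3 threshold is an arbitrary illustrative value, and the function names are hypothetical. It does not detect languages that share a script (for example Spanish versus French).

```python
import re

HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")   # common CJK ideographs
LATIN_WORD = re.compile(r"[A-Za-z]+")       # runs of ASCII letters


def script_counts(text: str) -> tuple[int, int]:
    """Count Han characters and Latin words in a slide script."""
    return len(HAN_CHAR.findall(text)), len(LATIN_WORD.findall(text))


def dominant_script(text: str) -> str:
    """Return 'han', 'latin', or 'unknown' based on which script has more units."""
    han, latin = script_counts(text)
    if han == 0 and latin == 0:
        return "unknown"
    return "han" if han > latin else "latin"


def should_split(text: str, threshold: float = 0.3) -> bool:
    """Flag a slide as bilingual when the minority script reaches the threshold share.

    The threshold is an assumed value; tune it against real decks.
    """
    han, latin = script_counts(text)
    total = han + latin
    if total == 0:
        return False
    return min(han, latin) / total >= threshold
```

With these assumptions, a sentence such as "Our new product 小米 ships in March" counts as Latin-dominant and is not flagged, while a slide that pairs a full Chinese greeting with a full English sentence is flagged as a candidate for splitting.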

Pace, energy, accent

After the language match, three sub-decisions:

Pace. Training and lectures should sit between 140 and 160 words per minute. Sales demos can push to 170. Anything past 180 sounds rushed in TTS.
Energy. Higher-energy voices (Cherry, Sunny, Aiden) lift short clips. Lower-energy voices (Neil, Andre, Maia) hold attention longer.
Accent. If the channel is regional (e.g., a deck for the UK team, or a deck targeted at southern China), a regional voice signals attention. Avoid an accent that fights the script — a Beijing-coded voice on a Cantonese product page rarely reads as "local."
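
The pace ranges above are easy to check once a test clip has been rendered. Here is a minimal Python sketch that estimates words per minute from the script text and the clip duration. The upper bounds (160, 170, 180) come from the text; the 140 lower bound for sales demos and all function names are assumptions. Whitespace-based word counting only makes sense for space-delimited languages, so it is not a fit for Chinese or Japanese scripts.

```python
# (low, high) words-per-minute ranges. The sales lower bound is an assumption;
# the text only gives its upper bound of 170.
PACE_RANGES = {"training": (140, 160), "sales": (140, 170)}
RUSHED_ABOVE = 180  # past this, the text says TTS sounds rushed


def words_per_minute(script: str, audio_seconds: float) -> float:
    """Estimate speaking pace from a script and the rendered clip length."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return len(script.split()) / (audio_seconds / 60.0)


def pace_verdict(wpm: float, kind: str = "training") -> str:
    """Classify a pace as 'slow', 'ok', 'fast', or 'rushed' for the given deck kind."""
    low, high = PACE_RANGES[kind]
    if wpm > RUSHED_ABOVE:
        return "rushed"
    if wpm > high:
        return "fast"
    if wpm < low:
        return "slow"
    return "ok"
```

For example, a 150-word script rendered as a 60-second clip comes out at 150 words per minute, which sits inside the training range.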

Test on the worst slide first

Don’t test voices on slide 1 — slide 1 is short and forgiving. Pick the slide with:

the longest paragraph
a number-heavy chart explanation
or a brand name that the model has to pronounce

If a voice handles that slide well, it will usually handle the rest. If it stumbles on numbers or proper nouns, swap it before generating audio for the entire deck.
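
For long decks, the "worst slide" can be shortlisted automatically. The sketch below scores each slide script on the three criteria listed above: length, number density, and brand names. The weights (5 per number, 10 per brand mention) and the function names are illustrative assumptions, not a tested formula, and the brand list is something you supply yourself.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")  # integers and decimals such as 12.5


def slide_difficulty(script: str, brand_terms: list[str]) -> int:
    """Score a slide script; higher means harder for a TTS voice.

    Weights are assumed values chosen for illustration.
    """
    word_count = len(script.split())
    number_count = len(NUMBER.findall(script))
    lowered = script.lower()
    brand_hits = sum(lowered.count(term.lower()) for term in brand_terms)
    return word_count + 5 * number_count + 10 * brand_hits


def pick_worst_slide(scripts: list[str], brand_terms: list[str]) -> int:
    """Return the zero-based index of the hardest slide to narrate."""
    if not scripts:
        raise ValueError("scripts must not be empty")
    return max(range(len(scripts)), key=lambda i: slide_difficulty(scripts[i], brand_terms))
```

Treat the result as a shortlist rather than a verdict: a slide with an unusual proper noun can still be the real stress test even if it scores low.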

When in doubt, default

For most teams shipping internal videos, Ethan (M, standard Mandarin / English) and Cherry (F, warm and clear) are safe defaults. They handle most slide content cleanly, sound natural in 95% of decks, and rarely surprise you. Start there, then upgrade only if the use case is unusually formal or unusually casual.

Once you’ve picked, lock the voice at the project level. Changing the voice mid-project invalidates the cached audio for the rest of the deck, and re-rendering it costs credits.

