How do I add or edit text-to-speech clips?

Text-to-speech voices in Vyond are hosted by trusted third-party providers. These providers include Microsoft Azure, Amazon Polly, WellSaid, and Google. WellSaid and Google voices are available to Enterprise and Agency users only.

Please note that voices provided by Oddcast have been retired and are no longer available for use. For more information, please check out the link here.

Generating text-to-speech uses credits. Please check out the link here for more details on credit allocation.

Creating new text-to-speech items
Editing existing text-to-speech items
List of available text-to-speech languages
Multilingual Voices
Auto Detect
Neutral Voice
Other tips and considerations

Voices with the option to adjust tone will display a smiling face icon next to their name.

Creating new text-to-speech items

1. Select a character on the stage and open the Dialog panel in the toolbar.

2. Click ADD DIALOG to open the menu and select Text-to-Speech.

3. Select a language, voice accent, and voice name from the dropdown menus.

4. Type your text in the box and click on the robot symbol to generate the clip.

5. If available, adjust the tone, speed, or pitch of the text-to-speech audio using the VOICE STYLE section.

Editing existing text-to-speech items

1. Right-click the text-to-speech clip in the timeline and select Settings.

2. Click the Edit icon in the text-to-speech panel.

3. If needed, select a new language and/or voice from the menu.

4. Edit the text in the box and click on the robot symbol to regenerate the clip.

List of available text-to-speech languages

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bangla, Basque, Bosnian, Bulgarian, Burmese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Inuktitut, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Marathi, Mongolian, Nepali, Norwegian, Odia, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Zulu (last updated: January 26, 2022).

Please note that a child's voice is available: Female - Ana, listed under English (US).

At the bottom of the article, you will find an attachment (.xlsx) with a full list of text-to-speech voices.

Multilingual Voices

Two new voices have been added for multilingual use. These voices can also be used in different languages to create more consistency throughout a video.

Female - Jenny (Multilingual)
Male - Ryan (Multilingual)

Please note that mixing language scripts may cause the text-to-speech generation to fail, or may result in only one language being converted. We recommend using only one language script.

Auto Detect

Auto detect has been added to text-to-speech. You can enter text in any supported language, and the panel will detect the language after a few moments.

Please note that if the language is not supported, an error will appear and the language will default back to English. Supported languages can be found in this article under the section List of available text-to-speech languages.

Neutral Voice

A neutral voice has been added to text-to-speech. The voice is called Neutral-Blue.

Other tips and considerations

There is a 3,000-character limit for text-to-speech in Latin-based languages. For non-Latin characters (Chinese, Korean, Japanese, Hindi, and the other Indic languages), the TTS will fail to generate if the text contains more than 1,666 characters.
Avoid using the ampersand symbol (&) when generating non-English WellSaid Labs TTS voices. An ampersand will cause voice generation to fail no matter the script.
- This applies to non-English WellSaid Labs TTS voices whether you are creating in the Studio or creating a Vyond Go video.
You can hover your mouse over the selected text-to-speech track in the timeline to see the language, voice origin, and name used.

When typing text in a specific language, make sure to use the appropriate alphabet - for example, if typing in Russian, use the Cyrillic alphabet in the TTS field.
The text-to-speech engine is sensitive to accent marks (e.g., á, ê, ī). To help ensure correct pronunciation, make sure the text includes the appropriate accent marks where applicable.
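
If you prepare scripts outside of Vyond, the character limits and the ampersand tip above can be checked before pasting the text into the panel. The following is a minimal Python sketch, not a Vyond tool: the function names are hypothetical, the two limits are the ones quoted above, and it assumes that any text containing a non-Latin letter falls under the lower limit. It counts Unicode code points, which may not match exactly how Vyond counts characters.

```python
import unicodedata

# Limits quoted in the tips above.
LATIN_LIMIT = 3000
NON_LATIN_LIMIT = 1666


def uses_only_latin_letters(text: str) -> bool:
    """Return True if every letter in the text is a Latin letter.

    Assumption: a character is "Latin" when its Unicode name starts with
    "LATIN" (this includes accented letters such as á or ê).
    """
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
        if ch.isalpha()
    )


def check_tts_text(text: str, wellsaid_non_english: bool = False) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Assumption: any non-Latin letter puts the text under the lower limit.
    limit = LATIN_LIMIT if uses_only_latin_letters(text) else NON_LATIN_LIMIT
    if len(text) > limit:
        problems.append(
            f"{len(text)} characters exceeds the {limit}-character limit"
        )

    # The ampersand tip only concerns non-English WellSaid Labs voices.
    if wellsaid_non_english and "&" in text:
        problems.append("contains '&', which may cause generation to fail")

    return problems
```

For example, `check_tts_text(script_text, wellsaid_non_english=True)` returns an empty list when the text is within the limit and contains no ampersand.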

Here are some other useful tips:

Add "fake" punctuation for the TTS voice to mark pauses (e.g., add "," for a short pause and "." for a longer pause).
Use "phonics" (phonetic spelling) to write words that are not part of the dictionary (e.g., for the engine to pronounce "Gigya", write it as "guee gya").
Split the dialog into smaller parts of 10 words or fewer. For example:
"The Economic Framework is a set of decision rules that align everyone to the financial objectives of the solution and guides the economic decision-making process"
1) "The Economic Framework is a set of decision rules"
2) "that align everyone to the financial objectives of the solution"
3) "and guides the economic decision-making process"
While our text-to-speech is powered by our trusted third-party providers, Vyond recommends avoiding the use of sensitive information.
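
For long scripts, the splitting tip above can be roughed out in code before you paste each part into its own text-to-speech clip. This is a minimal Python sketch with a hypothetical function name. It cuts mechanically every 10 words, so the parts will not always fall on natural phrase boundaries the way the hand-split example above does; treat the output as a starting point and adjust the breaks by ear.

```python
def split_dialog(text: str, max_words: int = 10) -> list[str]:
    """Split text into consecutive parts of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


sentence = (
    "The Economic Framework is a set of decision rules that align everyone "
    "to the financial objectives of the solution and guides the economic "
    "decision-making process"
)

# Print each part on its own numbered line.
for number, part in enumerate(split_dialog(sentence), start=1):
    print(f"{number}) {part}")
```

Each printed part can then be generated as a separate clip.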

Note: No TTS provider is able to automatically provide the correct voice cadence or tone desired, and some tweaking is usually necessary.

Attachment: TTS Voices in Vyond.xlsx (50 KB)