Zonos
Ultra-expressive voice cloning with emotion and style control
About Zonos
Zonos by Zyphra is a 1.6B-parameter text-to-speech model with emotion and style control. It clones a voice from 5-30 seconds of reference audio and can modulate the emotional tone of the generated speech. Choose from seven emotion settings (neutral, happiness, sadness, anger, fear, surprise, and disgust) to produce expressive, emotionally nuanced audio.
Key Features
Emotion Control
Control speech emotions: happiness, sadness, anger, fear, surprise, disgust, and neutral.
Voice Cloning
Clone any voice from 5-30 seconds of reference audio with high fidelity.
Expressive Speech
1.6B parameters produce highly expressive speech with nuanced emotional delivery.
Multilingual
Supports English, Japanese, Chinese, French, and German.
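The 5-30 second reference window mentioned under Voice Cloning can be checked locally before uploading. The following is a minimal sketch, assuming the reference clip is an uncompressed WAV file; the helper names `clip_duration_seconds` and `is_usable_reference` are hypothetical and not part of any Zonos or TextToSpeechAI tooling.

```python
import wave

MIN_SECONDS = 5.0   # lower bound of the reference window stated on this page
MAX_SECONDS = 30.0  # upper bound of the reference window stated on this page


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by the sample rate.
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """Return True if the clip length falls inside the 5-30 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

This only checks length. Whether a clip is "clean" enough (low noise, single speaker) is a separate question that a duration check cannot answer.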
How to Use Zonos
1. Sign up or open the demo
Create a free TextToSpeechAI account to get starter credits, or use the no-signup demo to try Zonos right away.
2. Choose the Zonos engine
Select Zonos from the voice and model picker. To clone a voice, upload 5-30 seconds of clean reference audio so Zonos can match the speaker.
3. Enter your text
Type or paste the script you want spoken. Zonos works across English, Japanese, Chinese, French, and German.
4. Pick an emotion and generate
Choose one of the seven Zonos emotions (neutral, happiness, sadness, anger, fear, surprise, or disgust), then click generate to render expressive speech in that mood.
5. Download or use the API
Play back and download the finished audio, or call the same Zonos engine programmatically through the TextToSpeechAI REST API for automated workflows.
Zonos API
Generate speech programmatically using the TextToSpeechAI REST API.
curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Zonos generates incredibly expressive speech with fine-grained emotion control.",
    "voice": "en_US-lessac-medium"
  }'
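For automated workflows, the same request can be sent from Python. This is a minimal sketch using only the standard library; it mirrors the URL, headers, and the two JSON fields (`text`, `voice`) from the curl example and nothing more. The function names `build_request` and `generate` are hypothetical, and the response format is an assumption: this page does not say whether the endpoint returns audio bytes or a JSON document, so the sketch simply returns the raw body.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_request(text: str, voice: str, token: str) -> urllib.request.Request:
    """Build the POST request with the same headers and body as the curl call."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(text: str, voice: str, token: str, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the body is returned as-is because the response format
    is not documented on this page.
    """
    request = build_request(text, voice, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Separating `build_request` from `generate` keeps the request construction inspectable without touching the network, which is convenient for checking headers and payloads before spending credits.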
Technical Specs
- Generation Speed: Medium
- Output Quality: Excellent
- Voice Cloning: Supported
- Languages: 5
- GPU VRAM: 8 GB+
- Credits per 1,000 characters: 50
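The credit rate in the specs list makes it possible to estimate the cost of a script before generating it. The sketch below assumes cost is prorated by character count and rounded up to a whole credit; the actual billing and rounding rules are not stated on this page, so treat the result as an estimate only. The function name `estimate_credits` is hypothetical.

```python
CREDITS_PER_1000_CHARS = 50  # rate from the Technical Specs list


def estimate_credits(text: str) -> int:
    """Estimate the credits for a script, prorated per character.

    Assumption: partial credits round up. Integer arithmetic is used
    to avoid floating-point rounding surprises.
    """
    return (len(text) * CREDITS_PER_1000_CHARS + 999) // 1000


# A 2,500-character script at 50 credits per 1,000 characters:
print(estimate_credits("a" * 2500))  # 125
```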