Chatterbox
Premium
Zero-shot voice cloning with expressive speech in 23 languages
About Chatterbox
Chatterbox is a powerful voice cloning TTS model from Resemble AI. It performs zero-shot voice cloning from just a few seconds of reference audio, supporting 23 languages with natural expression. Chatterbox includes paralinguistic tags for adding natural sounds like laughter and coughs to generated speech.
Key Features
Zero-Shot Voice Cloning
Clone any voice from a few seconds of audio - no training required.
23 Languages
From Arabic to Chinese, covering most major world languages.
Expressive Tags
Add [laugh], [cough], [chuckle] for natural paralinguistic sounds.
Fast Inference
Sub-200ms latency with the Turbo variant for real-time applications.
How to Use Chatterbox
1. Sign up or open the demo
Create a free TextToSpeechAI account to claim 200 starter credits, or use the on-page demo to try Chatterbox without signing in.
2. Select Chatterbox and add a reference clip
Choose the Chatterbox engine, then upload a short audio clip (a few seconds long) of the voice you want to clone. Chatterbox clones the voice zero-shot, with no training required.
3. Enter your text with optional tags
Type or paste the text to speak in any of the 23 supported languages, and insert [laugh], [cough], or [chuckle] tags wherever you want natural paralinguistic sounds.
4. Generate the speech
Click Generate. TextToSpeechAI renders your text in the cloned voice on hosted GPU infrastructure, at a cost of 25 credits per 1,000 characters.
5. Download or use the API
Download the finished audio file, or automate generation through the TextToSpeechAI REST API at api.texttospeechai.com using your account token.
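Before submitting text with tags (step 3), it can help to check that every bracketed tag is one you intended. The sketch below is a hypothetical local check, not part of TextToSpeechAI or Chatterbox. It assumes tags are lowercase words in square brackets and treats only the three tags named on this page as known; the full set Chatterbox accepts may be larger.

```python
import re

# Tags named on this page. Assumption: the model may accept others as well.
KNOWN_TAGS = {"laugh", "cough", "chuckle"}

# Matches a lowercase word in square brackets, e.g. "[laugh]".
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag name in the order it appears."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tag names that are not in KNOWN_TAGS."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


script = "That was close [laugh] sorry, excuse me [cough] where were we? [sigh]"
print(find_tags(script))     # ['laugh', 'cough', 'sigh']
print(unknown_tags(script))  # ['sigh']
```

A tag flagged here is not necessarily invalid; it only means this page does not mention it.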
Chatterbox API
Generate speech programmatically using the TextToSpeechAI REST API.
curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Chatterbox can clone your voice from just a few seconds of audio and speak in 23 languages.",
    "voice": "en_US-lessac-medium"
  }'
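The same request can be sketched in Python using only the standard library. The endpoint, headers, and the "text" and "voice" fields are taken from the curl example above; the function names are hypothetical, and because this page does not describe the response format, the sketch returns the raw response body and its content type without interpreting them.

```python
import json
import urllib.request

API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_generate_request(text: str, voice: str, token: str) -> urllib.request.Request:
    """Build the POST request shown in the curl example, without sending it."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_speech(text: str, voice: str, token: str, timeout: float = 60.0) -> tuple[str, bytes]:
    """Send the request and return (content type, raw body).

    Assumption: the response format is not documented on this page, so the
    caller decides how to handle the bytes (audio file, JSON, and so on).
    """
    request = build_generate_request(text, voice, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()
```

Calling generate_speech requires a valid account token and spends credits; urlopen raises urllib.error.HTTPError on a non-success status, so wrap the call in a try/except if you need to handle rejected requests.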
Technical Specs
- Generation Speed: Fast
- Output Quality: Very Good
- Voice Cloning: Supported
- Languages: 23
- GPU VRAM: 4-8 GB
- Credits per 1,000 characters: 25
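The pricing above translates into a simple budget calculation. This is a rough sketch with hypothetical helper names; it assumes cost is prorated by character count and that every character in the text counts, since this page does not say how partial thousands are rounded or whether tags and spaces are billed.

```python
CREDITS_PER_1000_CHARS = 25  # from the specs above
STARTER_CREDITS = 200        # free credits granted on sign-up


def estimate_credits(text: str) -> float:
    """Prorated estimate; actual billing may round differently (assumption)."""
    return len(text) * CREDITS_PER_1000_CHARS / 1000


def max_characters(credits: int) -> int:
    """Number of characters a credit balance covers at the listed rate."""
    return credits * 1000 // CREDITS_PER_1000_CHARS


print(estimate_credits("x" * 2400))     # 60.0
print(max_characters(STARTER_CREDITS))  # 8000
```

At the listed rate, the 200 starter credits cover roughly 8,000 characters of Chatterbox output.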