CosyVoice2
Premium
Zero-shot multilingual voice cloning with streaming support
About CosyVoice2
CosyVoice2 is a speech synthesis model from FunAudioLLM (Alibaba). It delivers natural-sounding zero-shot voice cloning across multiple languages, with a streaming mode for low-latency applications. Built on finite scalar quantization, it achieves strong voice similarity from just a few seconds of reference audio.
Key Features
Zero-Shot Voice Cloning
Clone any voice from 3-10 seconds of reference audio with high fidelity.
Multilingual
Supports Chinese, English, Japanese, Korean, and Cantonese with cross-lingual synthesis.
Streaming Support
Low-latency streaming mode for real-time applications and interactive systems.
Natural Prosody
Advanced prosody modeling produces natural-sounding speech with appropriate intonation.
How to Use CosyVoice2
1. Sign up and claim free credits
Create a free TextToSpeechAI account to claim your starter credits, or try the demo first. No GPU or local CosyVoice2 install is needed - everything runs on our infrastructure.
2. Select CosyVoice2 and add a reference clip
Choose CosyVoice2 as your engine, then upload a clean 3-10 second reference recording of the voice you want to clone. CosyVoice2 will extract the speaker characteristics for zero-shot multilingual cloning.
3. Enter your text in any supported language
Type or paste your script in Chinese, English, Japanese, Korean, or Cantonese. CosyVoice2 supports cross-lingual synthesis, so the cloned voice can speak a language different from the reference clip.
4. Generate the speech
Click generate and CosyVoice2 synthesizes natural, multilingual speech in the cloned voice, usually within seconds for short text. Premium-tier usage costs 25 credits per 1,000 characters.
5. Download or use the API
Download the finished audio as MP3 or WAV from your history, or automate CosyVoice2 voice cloning at scale through the TextToSpeechAI REST API.
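Step 2 asks for a clean 3-10 second reference recording. As a minimal sketch, a clip's length could be checked locally before uploading. This assumes the clip is an uncompressed WAV file (the accepted upload formats are not stated here), and the helper names are hypothetical rather than part of any TextToSpeechAI tooling.

```python
import wave

MIN_SECONDS = 3.0   # lower bound of the recommended 3-10 second window
MAX_SECONDS = 10.0  # upper bound of the recommended 3-10 second window


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended 3-10 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

A length check says nothing about how clean the recording is, so background noise and overlapping speakers still need to be checked by ear.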
CosyVoice2 API
Generate speech programmatically using the TextToSpeechAI REST API.
curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "CosyVoice2 delivers natural multilingual speech with zero-shot voice cloning capability.",
    "voice": "en_US-lessac-medium"
  }'
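The same request can be sketched in Python with only the standard library. The endpoint, headers, and the "text" and "voice" fields are taken from the curl example. The function names are hypothetical, and because the response format is not described here, the sketch returns the raw response body without interpreting it.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_request(text: str, voice: str, token: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(text: str, voice: str, token: str, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format is not documented here, so the
    bytes are returned as-is for the caller to inspect.
    """
    request = build_request(text, voice, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes it possible to inspect the payload and headers without spending credits.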
Technical Specs
- Generation Speed: Fast
- Output Quality: Very Good
- Voice Cloning: Supported
- Languages: 5
- GPU VRAM: 4-6 GB
- Credits per 1,000 characters: 25
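The rate of 25 credits per 1,000 characters can be turned into a rough cost estimate for a script. This is a sketch under two assumptions that are not confirmed here: billing is proportional to character count and rounded up to a whole credit, and every character (including spaces and punctuation) is counted. The function name is hypothetical.

```python
CREDITS_PER_1000_CHARS = 25  # premium-tier rate quoted for CosyVoice2


def estimate_credits(text: str) -> int:
    """Estimate credits for a script.

    Assumption: cost scales with len(text) and is rounded up to the
    next whole credit; actual billing granularity may differ.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (len(text) * CREDITS_PER_1000_CHARS + 999) // 1000
```

Under these assumptions, a 2,500-character script comes to 63 credits (62.5 rounded up).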