About VITS
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a fast, end-to-end neural TTS model: a single network turns text directly into a waveform, with no separate vocoder stage. It is trained as a conditional variational autoencoder with adversarial (GAN) training, which yields natural-sounding speech at low synthesis cost. VITS is well suited to batch processing and to applications that need both quality and speed.
Key Features
- Fast Synthesis: End-to-end architecture for rapid speech generation.
- Batch Processing: Efficiently process multiple texts simultaneously.
- Natural Speech: VAE+GAN training produces natural prosody and rhythm.
- Multi-Speaker: Single model supports multiple speaker voices.
- Efficient: Low memory footprint with good performance.
- Open Source: MIT licensed for any use case.
VITS Voices
Showing 12 of 109 voices.
- LJSpeech (English Female), EN
- VCTK Speaker 225 (English Female), EN
- VCTK Speaker 226 (English Male), EN
- VCTK Speaker 227 (English Male), EN
- VCTK Speaker 228 (English Female), EN
- VCTK Speaker 229, EN
- VCTK Speaker 230, EN
- VCTK Speaker 231, EN
- VCTK Speaker 232, EN
- VCTK Speaker 233, EN
- VCTK Speaker 234, EN
- VCTK Speaker 236, EN
How to Use VITS
1. Sign up free or try the demo
Create a free TextToSpeechAI account to get starter credits, or use the on-page demo to hear VITS before signing up.
2. Pick a VITS voice or speaker
Browse the voice library and choose a voice marked with the VITS badge. The multi-speaker VITS library, including the VCTK speaker set, lets you select from many distinct voices.
3. Enter your text
Type or paste the text you want spoken into the editor. VITS handles long passages well and is ideal for batch and high-volume content.
4. Generate the audio
Click generate to synthesize speech with VITS. VITS is very fast and is billed at the Standard tier (10 credits per 1,000 characters), so results return quickly and at low cost.
5. Download or use the API
Download the finished audio as MP3, WAV, or OGG, or call the same VITS voice through the TextToSpeechAI REST API to automate generation in your own application.
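For batch work, one possible approach is to split a long script into paragraphs and prepare one request payload per paragraph. The sketch below is illustrative only: the `text` and `voice` fields and the `vits-ljspeech` voice id come from the curl example on this page, while the helper name `build_batch_payloads` and the choice to split on blank lines are assumptions, not part of the TextToSpeechAI API.

```python
import json
import re

VOICE_ID = "vits-ljspeech"  # voice id taken from the curl example on this page


def build_batch_payloads(document: str, voice: str = VOICE_ID) -> list[dict]:
    """Hypothetical helper: one request payload per non-empty paragraph.

    Paragraphs are separated by blank lines; line wraps inside a paragraph
    are collapsed to single spaces.
    """
    blocks = re.split(r"\n\s*\n", document)
    paragraphs = [" ".join(block.split()) for block in blocks]
    return [{"text": p, "voice": voice} for p in paragraphs if p]


if __name__ == "__main__":
    sample = "First paragraph of the script.\n\nSecond paragraph,\nwrapped over two lines."
    for payload in build_batch_payloads(sample):
        print(json.dumps(payload))
```

Each payload can then be sent as the body of its own API request, which produces one audio file per paragraph.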
VITS API
Generate speech programmatically using the TextToSpeechAI REST API.
curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "VITS delivers fast, natural speech for high-volume applications.",
    "voice": "vits-ljspeech"
  }'
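The same call can be made from application code. The following is a minimal Python sketch that mirrors the curl request using only the standard library. The endpoint, headers, and JSON fields are copied from the example above. The function names `build_request` and `generate` are hypothetical, and because this page does not describe the response format, the sketch returns the raw response bytes rather than assuming audio or JSON.

```python
import json
import urllib.request

API_URL = "https://api.texttospeechai.com/v1/generate/"  # endpoint from the curl example


def build_request(text: str, voice: str, token: str) -> urllib.request.Request:
    """Build the POST request that the curl command sends, without sending it."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def generate(text: str, voice: str, token: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format is not documented on this page, so the
    bytes are returned untouched for the caller to inspect.
    """
    request = build_request(text, voice, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Builds the request only; no network call is made here.
    req = build_request("VITS delivers fast, natural speech.", "vits-ljspeech", "YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a caller would normally wrap `generate` in a `try` block to handle an invalid token or insufficient credits.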
Technical Specs
- Generation Speed: Very Fast
- Output Quality: Good
- Voice Cloning: Not Supported
- Languages: 10
- GPU VRAM: 1-2 GB
- Credits per 1,000 characters: 10
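To budget a job, the listed rate of 10 credits per 1,000 characters can be turned into a rough estimate. The sketch below assumes that cost scales linearly with character count and that every character, including whitespace, is counted. This page does not state how partial thousands are rounded or which characters are billed, so treat the result as an approximation; `estimate_credits` is a hypothetical helper name.

```python
CREDITS_PER_1000_CHARS = 10  # Standard-tier rate quoted on this page


def estimate_credits(text: str, rate: int = CREDITS_PER_1000_CHARS) -> float:
    """Estimate credits, assuming linear pricing with no rounding (an assumption)."""
    return len(text) * rate / 1000


if __name__ == "__main__":
    script = "a" * 2500  # stand-in for a 2,500-character script
    print(estimate_credits(script))  # prints 25.0
```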