Kokoro
Standard
Lightning-fast, lightweight TTS with natural quality
About Kokoro
Kokoro is an ultra-lightweight, 82M-parameter TTS model that produces natural-sounding speech quickly. It runs in near real time even on CPU, which makes it a good fit for applications where low latency is critical. Kokoro supports multiple languages and offers voice blending.
Key Features
Ultra-Lightweight
82M parameters, ~300MB model size. Runs on CPU with minimal resources.
Near Real-Time
Generates speech faster than playback speed, even without GPU acceleration.
Multi-Language
Supports English, French, Spanish, Hindi, Japanese, Chinese, Italian, Portuguese, and Korean.
Voice Blending
Mix two voices together to create unique voice combinations.
How to Use Kokoro
1. Sign up free or try the demo
Create a free TextToSpeechAI account to get 200 starter credits, or use the no-signup demo to hear Kokoro right away. Kokoro is on the Standard tier, so it costs 10 credits per 1,000 characters.
2. Pick a Kokoro voice
Open the voice browser and select a Kokoro voice in your target language (9 supported, from English to Japanese and Korean). You can also use voice blending to mix two Kokoro voices into a custom combination.
3. Enter your text
Type or paste the text you want spoken into the editor. Kokoro handles long passages efficiently thanks to its lightweight, 82M-parameter, near real-time engine.
4. Adjust speed and generate
Set the playback speed to suit your use case, then click Generate. Kokoro renders audio faster than real time, so your speech is ready almost immediately.
5. Download or use the API
Download the finished audio as MP3 or WAV, or automate generation through the TextToSpeechAI REST API at api.texttospeechai.com for batch and real-time workloads.
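The credit figures quoted in step 1 (200 starter credits, 10 credits per 1,000 characters) can be turned into a quick budget estimate. The sketch below is only illustrative: the function names are made up for this example, and rounding partial blocks of 1,000 characters up is an assumption, since the page does not say how billing rounds.

```shell
#!/bin/sh
# Rough credit estimate from the rates quoted in step 1.
# ASSUMPTION: partial blocks of 1,000 characters are rounded up;
# the page does not state how usage is actually rounded.

CREDITS_PER_1000=10   # Standard tier rate quoted for Kokoro
STARTER_CREDITS=200   # free starter credits quoted for new accounts

# credits_for CHARS: credits needed for CHARS characters (rounded up)
credits_for() {
  echo $(( ($1 * $CREDITS_PER_1000 + 999) / 1000 ))
}

# max_chars CREDITS: characters covered by CREDITS credits
max_chars() {
  echo $(( $1 * 1000 / $CREDITS_PER_1000 ))
}

credits_for 2500               # prints 25
max_chars "$STARTER_CREDITS"   # prints 20000
```

Under these assumptions, the 200 starter credits cover about 20,000 characters of Kokoro output.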
Kokoro API
Generate speech programmatically using the TextToSpeechAI REST API.
curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Kokoro delivers natural speech with incredible speed and efficiency.",
    "voice": "en_US-lessac-medium"
  }'
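For scripted use, the curl request above can be wrapped in a small shell helper. This is a minimal sketch, not an official client: the function names and the TTSAI_API_TOKEN environment variable are hypothetical, only the endpoint, headers, and the "text" and "voice" fields come from the example, and the response format is not described here, so the body is simply written to a file as-is.

```shell
#!/bin/sh
# Hypothetical wrapper around the curl example above.

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Minimal on purpose: newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload TEXT VOICE: JSON body with the two fields used in the example.
build_payload() {
  printf '{"text": "%s", "voice": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# generate_speech TEXT VOICE OUTFILE: POST the request and save the raw response.
# ASSUMPTION: the token is read from TTSAI_API_TOKEN (a name chosen for this sketch).
generate_speech() {
  curl --fail --silent --show-error -X POST \
    "https://api.texttospeechai.com/v1/generate/" \
    -H "Authorization: Bearer ${TTSAI_API_TOKEN:?set TTSAI_API_TOKEN first}" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")" \
    --output "$3"
}
```

A call might look like `generate_speech "Hello from Kokoro." "en_US-lessac-medium" response.out`. With `--fail`, curl exits non-zero on an HTTP error, so a failed request is easier to detect in a script.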
Technical Specs
- Generation Speed: Very Fast
- Output Quality: Good
- Voice Cloning: Not Supported
- Languages: 9
- GPU VRAM: CPU OK
- Credits/1000 chars: 10