Kokoro

Standard

Lightning-fast, lightweight TTS with natural quality

Very Fast Speed
Good Quality
No Cloning
9 Languages

About Kokoro

Kokoro is an ultra-lightweight, 82M-parameter text-to-speech (TTS) model that delivers natural-sounding speech quickly. It generates audio faster than playback speed even on CPU, which makes it well suited to applications where low latency is critical. Kokoro supports 9 languages and offers voice blending.

Key Features

Ultra-Lightweight

82M parameters, ~300MB model size. Runs on CPU with minimal resources.

Near Real-Time

Generates speech faster than playback speed, even without GPU acceleration.

Multi-Language

Supports English, French, Spanish, Hindi, Japanese, Chinese, Italian, Portuguese, and Korean.

Voice Blending

Mix two voices together to create unique voice combinations.

Use Cases

  • Real-time chatbots and virtual assistants
  • Live streaming text-to-speech
  • Edge deployment and mobile applications
  • High-volume batch processing

How to Use Kokoro

  1. Sign up free or try the demo

    Create a free TextToSpeechAI account to get 200 starter credits, or use the no-signup demo to hear Kokoro instantly. Kokoro is a standard-tier engine, so it costs only 10 credits per 1000 characters.

  2. Pick a Kokoro voice

    Open the voice browser and select a Kokoro voice in your target language (9 supported, from English to Japanese and Korean). You can also use voice blending to mix two Kokoro voices into a custom combination.

  3. Enter your text

    Type or paste the text you want spoken into the editor. Kokoro handles long passages efficiently thanks to its lightweight, 82M-parameter, near real-time engine.

  4. Adjust speed and generate

    Set the playback speed to suit your use case, then click Generate. Kokoro renders audio faster than real time, so your speech is ready almost immediately.

  5. Download or use the API

    Download the finished audio as MP3 or WAV, or automate generation through the TextToSpeechAI REST API at api.texttospeechai.com for batch and real-time workloads.

Kokoro API

Generate speech programmatically using the TextToSpeechAI REST API.

curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Kokoro delivers natural speech with incredible speed and efficiency.",
    "voice": "en_US-lessac-medium"
  }'
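
The same request can be sketched in Python using only the standard library. This is a minimal sketch, assuming the endpoint, headers, and JSON fields are exactly those in the curl example; the helper name build_generate_request is hypothetical. The response format is not documented on this page, so the sketch only builds the request and leaves sending it commented out.

```python
import json
import urllib.request

# Endpoint copied from the curl example.
API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_generate_request(text, voice, token):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper: only the URL, the two headers, and the
    "text"/"voice" fields come from the page. Nothing else is assumed.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "Kokoro delivers natural speech with incredible speed and efficiency.",
    "en_US-lessac-medium",  # voice ID copied verbatim from the curl example
    "YOUR_API_TOKEN",       # placeholder, replace with a real token
)

# Sending is left out because the response format is not documented here:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect before any credits are spent.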

Frequently Asked Questions

Kokoro is an ultra-lightweight text-to-speech model with only 82 million parameters. Despite its small size, it produces natural-sounding speech across multiple languages at near real-time speed, even on CPU.

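Yes, Kokoro can be used commercially. Both the code and the model weights are released under the Apache 2.0 license, so it can be used freely in commercial applications, subject to that license's terms (such as preserving copyright and license notices).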

Kokoro supports English (US and British), French, Spanish, Hindi, Japanese, Chinese, Italian, Portuguese, and Korean.

Kokoro is one of the fastest TTS models available. It generates speech faster than real-time playback speed even on CPU, making it ideal for interactive applications.
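
"Faster than real-time" can be made concrete with the real-time factor (RTF): synthesis time divided by the duration of the audio produced. An RTF below 1 means audio is generated faster than it plays back. The sketch below is illustrative only; the function names and the timings are hypothetical, not measured Kokoro figures.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_faster_than_playback(synthesis_seconds, audio_seconds):
    """True when the audio is produced faster than it takes to play."""
    return real_time_factor(synthesis_seconds, audio_seconds) < 1.0


# Hypothetical timings: 2 seconds of compute for a 10-second clip.
rtf = real_time_factor(2.0, 10.0)
```

To check this on your own hardware, time a generation call and divide by the length of the returned audio.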

No, Kokoro does not support voice cloning. It uses a curated voice library with voice blending capabilities. For voice cloning, use F5-TTS, Chatterbox, StyleTTS2, OpenVoice, or Tortoise.

Kokoro can mix two voices together to create unique combinations. This allows you to create custom voice characteristics without traditional voice cloning.

Kokoro and Piper are both fast, lightweight models. Kokoro has a more modern architecture and supports voice blending, while Piper has a larger voice library. Both are excellent for real-time applications.

Kokoro is designed to run on CPU and requires minimal resources, with a model size of approximately 300MB. No GPU is needed, though GPU acceleration is supported for even faster processing.

Yes, Kokoro is well suited to real-time use. It generates speech faster than playback even on CPU, with very low latency, so it is an excellent fit for chatbots, voice assistants, and live streaming. Its 82M-parameter size keeps memory use low, making it practical for high-volume and edge deployments.

Voice blending lets you mix two Kokoro voices together to create a unique combination with custom characteristics. It is not traditional voice cloning - you cannot reproduce a specific person from a sample - but it gives you more variety than a fixed voice library. You can experiment with blends directly in the TextToSpeechAI editor.
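
One way to picture blending is as a weighted average of two voice representations. The sketch below assumes each voice is a fixed-length vector of numbers (a style embedding) and that a blend is a linear mix of the two. How TextToSpeechAI implements blending is not described on this page, so the function name, the weight parameter, and the toy vectors are all hypothetical.

```python
def blend_voices(voice_a, voice_b, weight_a=0.5):
    """Linearly mix two equal-length voice vectors.

    weight_a is the share given to voice_a; voice_b gets the remainder.
    This is an illustrative assumption, not the platform's documented method.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(voice_a, voice_b)]


# Toy 3-dimensional vectors standing in for real voice embeddings.
voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 4.0]
blended = blend_voices(voice_a, voice_b, weight_a=0.5)
```

An even split gives a voice halfway between the two inputs, while a weight near 1 keeps the result close to the first voice.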

Kokoro and MeloTTS are both fast, CPU-friendly standard-tier engines without voice cloning. Kokoro is the lightest option (about 300MB) and supports voice blending across 9 languages, while MeloTTS focuses on multiple English accents and real-time multilingual output. Choose Kokoro for the smallest footprint and blending; choose MeloTTS when you need specific accents.

Kokoro is a standard-tier engine, costing 10 credits per 1000 characters - the lowest tier on TextToSpeechAI. New accounts get 200 free credits, so you can try Kokoro without paying. This makes it one of the most cost-effective ways to generate high-quality speech at scale.
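
The pricing reduces to simple arithmetic. The sketch below assumes credits scale linearly with character count and round up to a whole credit. The page does not state how partial blocks of 1000 characters are billed, so treat the rounding rule and the function names as assumptions.

```python
CREDITS_PER_1000_CHARS = 10  # standard-tier rate stated on this page
STARTER_CREDITS = 200        # free credits for new accounts


def estimate_credits(num_chars):
    """Estimate credits for num_chars characters, rounded up (assumed rule)."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return (num_chars * CREDITS_PER_1000_CHARS + 999) // 1000


def chars_covered(credits):
    """Number of characters a credit balance covers at the standard rate."""
    return credits * 1000 // CREDITS_PER_1000_CHARS


starter_chars = chars_covered(STARTER_CREDITS)
```

Under these assumptions, the 200 starter credits cover 20,000 characters of Kokoro output.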

Technical Specs

  • Generation Speed: Very Fast
  • Output Quality: Good
  • Voice Cloning: Not Supported
  • Languages: 9
  • GPU VRAM: None required (CPU OK)
  • Credits/1000 chars: 10

Try Kokoro Now

Generate your first audio free. No credit card required.
