Pocket TTS

Standard pricing tier

Ultra-lightweight voice cloning that runs real-time on CPU

Speed: Very Fast
Quality: Good
Voice Cloning: Yes
Languages: 2

About Pocket TTS

Pocket TTS by Kyutai is an ultra-lightweight, 100M-parameter text-to-speech model that runs in real time on CPU. Despite its small size, it supports voice cloning from just 5 seconds of reference audio. It is well suited to edge deployment, mobile applications, and other scenarios where GPU resources are limited. It currently supports English and French.

Key Features

Ultra-Lightweight

With 100M parameters, it runs in real time on CPU with minimal resources.

Voice Cloning

Clone any voice from just 5 seconds of reference audio, even on CPU.

Real-Time on CPU

No GPU required. Generates speech at real-time speed on standard hardware.

Edge-Ready

Small enough for mobile devices, Raspberry Pi, and embedded systems.

Use Cases

  • Edge and mobile deployment
  • Real-time voice assistants on CPU
  • IoT and embedded devices
  • Low-resource voice cloning

How to Use Pocket TTS

  1. Sign up free or try the demo

    Create a free TextToSpeechAI account to receive starter credits, or use the on-site demo to hear Pocket TTS before signing up. No GPU or local install is needed.

  2. Select Pocket TTS and add a voice to clone

    Choose Pocket TTS as your engine, then upload a short reference clip of about 5 to 10 seconds to clone that voice. Pocket TTS runs entirely on CPU, so cloning is fast and lightweight.

  3. Enter your text

    Type or paste the English or French text you want spoken. Keep an eye on the character count, since Pocket TTS bills at the standard rate of 10 credits per 1,000 characters.

  4. Generate the audio

    Click generate and Pocket TTS synthesizes your text in the cloned voice at real-time speed. Most clips are ready in seconds because the model is so small and CPU-efficient.

  5. Download or use the API

    Download the finished audio, or automate generation through the TextToSpeechAI REST API at api.texttospeechai.com using your account token. The API exposes the same Pocket TTS cloning and synthesis for your own apps.

Pocket TTS API

Generate speech programmatically using the TextToSpeechAI REST API.

curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Pocket TTS delivers voice cloning that runs in real-time, even on CPU.",
    "voice": "en_US-lessac-medium"
  }'
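The same request can be sketched in Python using only the standard library. The endpoint, headers, and the "text" and "voice" fields are taken from the curl example above. The function names are hypothetical, and because the response format is not described on this page, the sketch simply returns the raw response bytes without parsing them.

```python
import json
import urllib.request

# Endpoint copied from the curl example on this page.
API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_generate_request(text: str, voice: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_speech(text: str, voice: str, token: str, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format is not documented here, so the bytes
    are returned unparsed. HTTP errors raise urllib.error.HTTPError.
    """
    request = build_generate_request(text, voice, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling generate_speech("Hello from Pocket TTS.", "en_US-lessac-medium", "YOUR_API_TOKEN") would send the request; separating the request builder from the network call makes it possible to inspect the payload without spending credits.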

Frequently Asked Questions

What is Pocket TTS?

Pocket TTS is an ultra-lightweight text-to-speech model from Kyutai with only 100 million parameters. It runs in real time on CPU and supports voice cloning from 5 seconds of audio.

Can I use Pocket TTS commercially?

Pocket TTS is licensed under CC-BY-4.0, which allows commercial use with attribution. You must credit Kyutai when using it in commercial applications.

Which languages does Pocket TTS support?

Pocket TTS currently supports English and French. More languages may be added in future releases.

Does Pocket TTS run on CPU?

Yes. With only 100M parameters, Pocket TTS runs at real-time speed on standard CPU hardware. No GPU is needed, making it ideal for edge deployment and mobile applications.

How does Pocket TTS compare to Kokoro?

Both are lightweight and run well on CPU. Pocket TTS supports voice cloning, which Kokoro does not. Kokoro supports more languages (9 vs 2). Choose Pocket TTS if you need lightweight voice cloning, and Kokoro if you need more language coverage.

How does voice cloning work with Pocket TTS?

Provide 5 seconds of reference audio. Pocket TTS extracts speaker characteristics and can generate new speech in that voice. Quality improves with longer references (up to 10 seconds).
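Before uploading a reference clip, it can help to check its length locally. The sketch below measures an uncompressed WAV file with Python's standard wave module and compares it with the 5 and 10 second figures quoted above. The function names are hypothetical, the WAV-only input is an assumption, and this page does not say which audio formats the upload accepts or whether clips longer than 10 seconds are rejected.

```python
import wave

MIN_SECONDS = 5.0   # minimum reference length quoted on this page
MAX_SECONDS = 10.0  # length up to which quality is said to improve


def wav_duration_seconds(source) -> float:
    """Return the duration of an uncompressed WAV (file path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(source) -> str:
    """Classify a clip as 'too short', 'ok', or 'longer than needed'.

    Assumption: audio beyond 10 seconds is treated as unnecessary rather
    than invalid, since the page only says quality improves up to 10 s.
    """
    duration = wav_duration_seconds(source)
    if duration < MIN_SECONDS:
        return "too short"
    if duration > MAX_SECONDS:
        return "longer than needed"
    return "ok"
```

For example, check_reference_clip("my_voice.wav") returns "too short" for a 3-second recording, which is a cue to record a longer sample before cloning.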

Can Pocket TTS clone voices without a GPU?

Yes. Unlike most cloning models that require a GPU, Pocket TTS performs zero-shot voice cloning entirely on CPU thanks to its small 100M-parameter footprint. You can clone a voice from a short clip even on a laptop or single-board computer.

What attribution does Pocket TTS require?

Pocket TTS is released under CC-BY-4.0, so you must credit Kyutai as the original creator when you use or redistribute it. A simple attribution such as "Voice generated with Pocket TTS by Kyutai" satisfies the license for commercial and non-commercial use alike.

How fast is Pocket TTS?

Pocket TTS generates speech at real-time speed or faster on a standard CPU, with no GPU required. This makes it one of the most responsive engines for low-latency use cases like live voice assistants and on-device generation.
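"Real-time or faster" is commonly expressed as a real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced, where a value of 1.0 or less means generation keeps up with playback. The sketch below shows that calculation. The function names are hypothetical and the numbers in the usage note are illustrative, not measurements of Pocket TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(synthesis_seconds: float, audio_seconds: float) -> bool:
    """True when the audio is generated at least as fast as it plays back."""
    return real_time_factor(synthesis_seconds, audio_seconds) <= 1.0
```

As an illustration, a clip that plays for 4 seconds and took 2 seconds to generate has an RTF of 0.5, so is_real_time(2.0, 4.0) is True.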

How much does Pocket TTS cost?

Pocket TTS is in the standard pricing tier, costing 10 credits per 1,000 characters. That makes it one of the most economical voice-cloning options available on TextToSpeechAI.
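The rate above makes it easy to estimate a job's cost before generating. The sketch below assumes billing is pro rata per character and rounded up to a whole credit; the page states only the rate of 10 credits per 1,000 characters, so the rounding rule, the function name, and the way characters are counted (Python's len) are assumptions.

```python
CREDITS_PER_1000_CHARS = 10  # standard-tier rate quoted on this page


def estimate_credits(text: str) -> int:
    """Estimate the credits needed to synthesize `text`.

    Assumption: cost is proportional to character count and rounded up
    to the next whole credit.
    """
    chars = len(text)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-chars * CREDITS_PER_1000_CHARS // 1000)
```

Under these assumptions, a 2,500-character script is estimated at 25 credits and a 150-character sentence at 2 credits.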

Should I choose Pocket TTS or Kokoro?

Both are lightweight, CPU-friendly, standard-tier engines. Pick Pocket TTS when you need voice cloning, since Kokoro does not support it. Pick Kokoro when you need broader language coverage and do not need to clone a specific voice.

Can I try Pocket TTS for free?

Yes. New TextToSpeechAI accounts receive free starter credits, and the on-site demo lets you hear Pocket TTS before committing. Sign up for free, upload a short reference clip, and generate cloned speech in seconds.

Technical Specs

  • Generation Speed: Very Fast
  • Output Quality: Good
  • Voice Cloning: Supported
  • Languages: 2
  • GPU VRAM: None required (runs on CPU)
  • Credits per 1,000 characters: 10

Try Pocket TTS Now

Generate your first audio free. No credit card required.
