Dia

Dialogue-oriented TTS with voice cloning and nonverbal sounds

  • Tier: Ultra
  • Speed: Medium
  • Quality: Excellent
  • Voice cloning: Yes
  • Languages: 1 (English)

About Dia

Dia by Nari Labs is a 1.6B parameter dialogue-focused text-to-speech model. It excels at generating natural conversational speech with support for nonverbal sounds like laughter, sighs, and coughs. Dia supports multi-speaker dialogue generation and voice cloning from 5-10 seconds of reference audio, making it ideal for creating realistic conversations and character voices.

Key Features

Dialogue Generation

Generate natural multi-speaker conversations with distinct voices and turn-taking.

Nonverbal Sounds

Add [laughs], [sighs], [coughs], (gasps) for natural paralinguistic expression.

Voice Cloning

Clone any voice from 5-10 seconds of reference audio for personalized speech.

Natural Conversation

The 1.6B-parameter model produces natural conversational prosody and intonation.

Use Cases

  • Dialogue and conversation generation
  • Audiobook production with multiple characters
  • Game character voices
  • Podcast and content creation

How to Use Dia

  1. Sign up free or open the demo

    Create a free TextToSpeechAI account to claim your starter credits, or open the no-signup demo to try Dia dialogue right away.

  2. Select the Dia engine

    In the TTS dashboard, choose Dia from the engine list. Dia is the dialogue-oriented, ultra-tier model with multi-speaker and voice-cloning support.

  3. Write a dialogue script with tags

    Compose your conversation using [S1] and [S2] to mark each speaker turn, and drop in nonverbal tags such as [laughs], [sighs], [coughs], or (gasps) where you want natural reactions.

  4. Generate the audio

    Click generate to send your Dia script to our hosted GPUs. Dia renders the two-speaker dialogue, including turn-taking and your nonverbal tags, into a single audio file.

  5. Download or call the API

    Download the finished dialogue in your chosen format, or automate it by posting the same [S1]/[S2] script to the TextToSpeechAI API with your account token.
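If you assemble scripts in code rather than typing them by hand, the tagging convention from step 3 can be sketched as a small helper. This is only an illustration: `build_dialogue_script` and `ALLOWED_SPEAKERS` are hypothetical names, not part of any TextToSpeechAI or Dia library, and the helper assumes a script is simply the tagged turns joined by spaces.

```python
from typing import Iterable, Tuple

# The two speaker tags described in step 3 (assumed to be the only ones used).
ALLOWED_SPEAKERS = ("S1", "S2")


def build_dialogue_script(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into one [S1]/[S2]-tagged script string."""
    parts = []
    for speaker, text in turns:
        if speaker not in ALLOWED_SPEAKERS:
            raise ValueError(f"unknown speaker tag: {speaker!r}")
        # Collapse stray newlines and repeated spaces inside a turn.
        cleaned = " ".join(text.split())
        if not cleaned:
            raise ValueError("a speaker turn must contain some text")
        parts.append(f"[{speaker}] {cleaned}")
    return " ".join(parts)


script = build_dialogue_script([
    ("S1", "Hello there! How are you today? [laughs]"),
    ("S2", "I am doing great, thanks for asking!"),
])
print(script)
```

The resulting string can be pasted into the dashboard or used as the text of an API request.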

Dia API

Generate speech programmatically using the TextToSpeechAI REST API.

curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "[S1] Hello there! How are you today? [laughs] [S2] I am doing great, thanks for asking!",
    "voice": "en_US-lessac-medium"
  }'
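For callers who prefer Python to curl, the same request can be sketched with the standard library. The endpoint, headers, and the "text" and "voice" fields are copied from the curl example above; the helper name `build_generate_request` is hypothetical, and because the response format is not described on this page, the sketch stops at building the request and leaves sending it commented out.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.texttospeechai.com/v1/generate/"


def build_generate_request(token: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_generate_request(
    "YOUR_API_TOKEN",
    "[S1] Hello there! How are you today? [laughs] [S2] I am doing great, thanks for asking!",
    "en_US-lessac-medium",
)

# Sending requires a real token. How the audio is returned (raw bytes, a URL,
# or a job id) is an open question here, so handle the response per the API docs.
# with urllib.request.urlopen(request, timeout=60) as response:
#     body = response.read()
```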

Frequently Asked Questions

What is Dia?

Dia is a 1.6B parameter dialogue-oriented text-to-speech model from Nari Labs. It specializes in generating natural conversational speech with support for multiple speakers, nonverbal sounds, and voice cloning.

Is Dia open source and free for commercial use?

Yes. Dia is fully Apache 2.0 licensed, covering both code and model weights. It can be used freely in commercial applications.

What languages does Dia support?

Currently Dia supports English only. The model is optimized for natural English conversational speech.

What hardware does Dia require?

Dia requires approximately 10GB of VRAM for its 1.6B parameter model. A GPU with at least 12GB is recommended for comfortable operation. On TextToSpeechAI all of this runs on our hosted GPUs, so you do not need any hardware of your own.

Can Dia generate multi-speaker dialogue?

Yes, dialogue is exactly what Dia is built for. By alternating [S1] and [S2] turns in your script, Dia produces a flowing two-speaker conversation with distinct voices and realistic turn-taking, which is harder to achieve with single-speaker TTS models.

How do I use the [S1] and [S2] speaker tags?

Prefix each line of your script with [S1] or [S2] to mark who is talking. Dia assigns a consistent voice to each tag and switches between them as the conversation moves, so [S1] and [S2] act as the two characters in your dialogue.

Does Dia support voice cloning?

Yes. Dia supports voice cloning from roughly 5-10 seconds of clean reference audio, letting you reuse a specific voice for a speaker. You can combine cloning with the [S1]/[S2] tags so each character in a dialogue sounds like the voice you cloned.

How do nonverbal tags work?

Dia renders [laughs], [sighs], [coughs], and (gasps) as natural paralinguistic sounds woven into the speech rather than as spoken words. Place a tag where you want the reaction, for example "[S1] That is hilarious [laughs]", to make the dialogue feel more human.

How does Dia compare to Bark?

Both Dia and Bark support expressive nonverbal sounds, but Dia is purpose-built for multi-speaker dialogue with [S1]/[S2] turn-taking and voice cloning. Choose Dia for realistic two-person conversations and character work; Bark is a better fit when you need broader language coverage in single-voice narration.

How much does Dia cost?

Dia is an ultra-tier engine, so it costs 50 credits per 1,000 characters of generated speech. The ultra tier reflects the larger 1.6B model and the ~10GB of GPU memory it uses for high-quality dialogue.

Can I try Dia for free?

Yes. New TextToSpeechAI accounts include free starter credits, and there is a demo you can run without signing up. That is enough to generate a short Dia dialogue with [S1]/[S2] tags before deciding on a paid plan.

Can I use Dia through the API?

Yes. Once you have an API token from your account page, you can submit Dia dialogue scripts, including [S1]/[S2] turns and tags like [laughs], to the TextToSpeechAI REST API and download the resulting audio programmatically.
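Before submitting a script, it can help to check that the speaker and nonverbal tags described in the answers above are where you expect them. The following is a rough sketch, not an official validator: `split_turns` and `nonverbal_tags` are hypothetical helpers, and the tag list covers only the four tags named on this page, which may not be the full set the engine accepts.

```python
import re

# Speaker tags [S1]/[S2]; the capture group keeps the tag name when splitting.
SPEAKER_TAG = re.compile(r"\[(S[12])\]")
# Only the nonverbal tags named on this page (an assumption, not a full list).
NONVERBAL_TAG = re.compile(r"\[(laughs|sighs|coughs)\]|\((gasps)\)")


def split_turns(script: str):
    """Return a list of (speaker, text) pairs from an [S1]/[S2]-tagged script."""
    # With one capture group, re.split yields [prefix, tag, text, tag, text, ...].
    pieces = SPEAKER_TAG.split(script)
    if pieces[0].strip():
        raise ValueError("script must start with [S1] or [S2]")
    return [(pieces[i], pieces[i + 1].strip()) for i in range(1, len(pieces), 2)]


def nonverbal_tags(text: str):
    """List the nonverbal tag names found in one turn, in order of appearance."""
    return [m.group(1) or m.group(2) for m in NONVERBAL_TAG.finditer(text)]


turns = split_turns("[S1] That is hilarious [laughs] [S2] (gasps) I know!")
for speaker, text in turns:
    print(speaker, nonverbal_tags(text), text)
```

A script that does not begin with a speaker tag raises `ValueError`, which catches the most common mistake of leaving the first line untagged.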

Technical Specs

  • Generation Speed: Medium
  • Output Quality: Excellent
  • Voice Cloning: Supported
  • Languages: 1 (English)
  • GPU VRAM: ~10GB
  • Credits per 1,000 characters: 50
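To budget a script against the rate in the specs above, a quick estimate can be computed from its length. This is a sketch under two assumptions that this page does not confirm: that every character in the script counts toward billing, tags included, and that partial credits round up. `estimate_credits` is a hypothetical helper, not an API call.

```python
# Ultra-tier rate quoted in the specs above.
CREDITS_PER_1000_CHARS = 50


def estimate_credits(script: str) -> int:
    """Estimate credits for a script, pro rata by length and rounded up.

    Assumes all characters (including tags) are billed; actual billing
    rules may differ.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return (len(script) * CREDITS_PER_1000_CHARS + 999) // 1000


print(estimate_credits("x" * 1000))  # 1,000 characters at the quoted rate
print(estimate_credits("x" * 2500))  # a longer 2,500-character script
```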

Try Dia Now

Generate your first audio free. No credit card required.
