VITS

Standard

End-to-end TTS with natural-sounding speech

Very Fast Speed
Good Quality
No Cloning
10 Languages

About VITS

VITS is a highly efficient end-to-end TTS model.

Key Features

End-to-End Synthesis

Single-stage architecture for fast speech generation.

Batch Processing

Efficiently processes multiple texts at the same time.

Natural Speech

VAE+GAN training produces natural prosody and rhythm.

Multi-Speaker

A single model supports multiple speaker voices.

Efficient

Low memory usage and good performance.

Open Source

MIT licensed for any use case.

Use Cases

  • Audio production
  • E-learning platforms
  • News readers
  • Automated announcements
  • IVR systems
  • High-volume content

VITS Voices

View All 109
  • LJSpeech (English Female), EN
  • VCTK Speaker 225 (English Female), EN
  • VCTK Speaker 226 (English Male), EN
  • VCTK Speaker 227 (English Male), EN
  • VCTK Speaker 228 (English Female), EN
  • VCTK Speaker 229, EN
  • VCTK Speaker 230, EN
  • VCTK Speaker 231, EN
  • VCTK Speaker 232, EN
  • VCTK Speaker 233, EN
  • VCTK Speaker 234, EN
  • VCTK Speaker 236, EN

How to Use VITS

  1. Sign up for free or try the demo

    Create a free TextToSpeechAI account to receive starter credits, or use the demo page to listen to VITS before you sign up.

  2. Choose a VITS voice

    Browse the voice library and pick a voice marked with the VITS badge. The VITS multi-speaker library, including the VCTK speaker set, lets you choose from many distinct voices.

  3. Enter your text

    Type or paste the text you want spoken into the editor. VITS handles long passages well and is suited to batch jobs and high-volume content.

  4. Generate audio

    Click Generate to synthesize audio with VITS. Because VITS is fast and sits on the Standard tier (10 credits per 1,000 characters), results come back quickly and at low cost.

  5. Download or use the API

    Download the finished audio as MP3, WAV, or OGG, or call VITS voices through the TextToSpeechAI REST API to automate generation in your application.

VITS API

Generate speech programmatically using the TextToSpeechAI REST API.

curl -X POST "https://api.texttospeechai.com/v1/generate/" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "VITS delivers fast, natural speech for high-volume applications.",
    "voice": "vits-ljspeech"
  }'
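When the text comes from user input or a file, quotes and line breaks inside it can break a hand-written JSON body. The sketch below wraps the request shown above in two small bash helpers that escape the text before sending it. The function names, the `TTS_API_TOKEN` environment variable, and the choice to write the raw response to a file are illustrative assumptions, not part of the documented API; the response format is not described here, so inspect the saved file before relying on it.

```bash
#!/usr/bin/env bash

# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab only;
# other control characters are not covered by this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quotes
  s=${s//$'\n'/\\n}    # newlines
  s=${s//$'\r'/\\r}    # carriage returns
  s=${s//$'\t'/\\t}    # tabs
  printf '%s' "$s"
}

# Build the request body with the two fields used in the example above.
vits_payload() {
  printf '{"text": "%s", "voice": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Hypothetical wrapper: send the request and save whatever the API returns.
# TTS_API_TOKEN is an assumed variable name; the call aborts if it is unset.
vits_generate() {
  local text=$1 voice=$2 out=$3
  curl --fail --silent --show-error -X POST \
    "https://api.texttospeechai.com/v1/generate/" \
    -H "Authorization: Bearer ${TTS_API_TOKEN:?set TTS_API_TOKEN first}" \
    -H "Content-Type: application/json" \
    -d "$(vits_payload "$text" "$voice")" \
    -o "$out"
}

# Example call (not executed here):
#   vits_generate 'She said "hello".' vits-ljspeech response.out
```

Keeping the payload builder separate from the network call makes it easy to check the generated JSON before spending credits.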

Frequently Asked Questions

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end TTS model that combines a variational autoencoder with adversarial GAN training. It generates natural-sounding speech in a single stage, which keeps it simple and efficient. You can try VITS for free on TextToSpeechAI.

Yes, VITS is open source under the MIT license, so it supports full commercial use without restrictions. It is widely used in commercial products and services. On TextToSpeechAI, VITS costs 10 credits per 1,000 characters on the Standard tier.

TextToSpeechAI offers a large multi-speaker VITS library, including the VCTK voice set and dozens of distinct English speakers. A single VITS model can handle many speakers, so you can choose among different voices without switching engines.

VITS language support varies by trained model. Common VITS models cover English, Chinese, Japanese, Korean, German, French, and other major languages, with multi-speaker English coverage from the VCTK dataset.

VITS is very fast, generating speech in real time or faster on a GPU. Its end-to-end architecture avoids the multiple processing stages that other systems need, so VITS is well suited to batch and high-volume processing.

No, VITS does not support voice cloning. It uses pre-trained multi-speaker models rather than copying a target voice from a sample. For voice cloning on TextToSpeechAI, use F5-TTS or GPT-SoVITS instead.

VITS produces good-quality audio with natural prosody and rhythm. Although it does not match StyleTTS 2 or Tortoise, it offers good quality for its speed, especially for batch processing.

VITS is memory-efficient, requiring only a few GB of VRAM (around 4GB). It runs well on consumer GPUs, and on TextToSpeechAI all synthesis runs on our servers, so you do not need any hardware of your own.

VITS and Piper are both fast, MIT-licensed Standard-tier engines on TextToSpeechAI. Piper is lighter and faster, while VITS offers a larger multi-speaker library (including VCTK) and slightly more natural prosody. Neither supports voice cloning.

VITS is a Standard-tier engine, costing 10 credits per 1,000 characters. This is our lowest price tier, made possible by the efficient, fast nature of the VITS model.
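To budget a job before running it, the Standard-tier rate can be turned into a quick estimate. The following is a minimal sketch: the function name is illustrative, and rounding partial amounts up to a whole credit is an assumption, since only the rate of 10 credits per 1,000 characters is stated here.

```bash
#!/usr/bin/env bash

CREDITS_PER_1000=10   # Standard-tier rate stated above

# Estimate the credits for a given character count.
# Assumption: fractional credits are rounded up to the next whole credit.
estimate_credits() {
  local chars=$1
  echo $(( (chars * CREDITS_PER_1000 + 999) / 1000 ))
}

estimate_credits 2500    # prints 25

# Estimate from an actual string (character count depends on the shell locale).
text="VITS delivers fast, natural speech for high-volume applications."
estimate_credits "${#text}"
```

For a batch, sum the character counts of all texts first and pass the total, so rounding is applied once rather than per item.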

VITS generates audio at 22050Hz by default. On TextToSpeechAI you can request MP3, WAV, or OGG output, with format conversion handled for you automatically.
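The sample rate gives a rough way to estimate how large an uncompressed WAV download will be. The sketch below assumes 16-bit mono PCM with a standard 44-byte header; neither the bit depth nor the channel count is stated here, so treat the result as an approximation. MP3 and OGG files will be considerably smaller.

```bash
#!/usr/bin/env bash

SAMPLE_RATE=22050     # default VITS output rate stated above
BYTES_PER_SAMPLE=2    # assumption: 16-bit PCM
CHANNELS=1            # assumption: mono
WAV_HEADER_BYTES=44   # size of a canonical PCM WAV header

# Approximate size in bytes of a WAV file lasting the given number of seconds.
wav_bytes() {
  local seconds=$1
  echo $(( WAV_HEADER_BYTES + seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS ))
}

wav_bytes 60   # one minute of audio: prints 2646044 (about 2.6 MB)
```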

Sign up on TextToSpeechAI to receive free credits, then choose a VITS voice, enter your text, and generate audio. You can also use the demo to listen to VITS before creating an account, and access VITS through our REST API once you have signed up.

Technical Specs

  • Generation Speed: Very Fast
  • Output Quality: Good
  • Voice Cloning: Not Supported
  • Languages: 10
  • GPU VRAM: 1-2GB
  • Credits/1000 chars: 10

Try VITS Now

Generate your first audio free. No credit card required.

Start Free