What is the state of AI voice generation?

Post by **rusty_shackleford** » April 5th, 2023, 07:35

Specifically, for use in video games.

Post by **WhiteShark** » May 6th, 2023, 12:32

Do you mean on-the-fly generation (paired with on-the-fly text generation) or pre-generated? I haven't experimented myself, but the clips I've heard have been pretty darn convincing, though not 100% perfect. I suspect, however, that you have to finesse it a little to get the exact pauses and such that you want, which would make it a bit worse for on-the-fly use.
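For the pre-generated route, one common pattern is to bake every line to an audio file at build time and have the game look clips up by ID at runtime, so no TTS model ships with the game. A minimal sketch, assuming a hypothetical `synthesize()` stand-in for whatever TTS backend you use (it returns placeholder bytes, not real audio):

```python
import hashlib
import json
import tempfile
from pathlib import Path


def synthesize(text: str) -> bytes:
    # Hypothetical stand-in for a real TTS backend; returns placeholder bytes,
    # not actual audio, so the sketch runs without any model installed.
    return text.encode("utf-8")


def bake_voice_lines(lines: dict[str, str], out_dir: Path) -> dict[str, str]:
    """Render every line once at build time; return {line_id: filename}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for line_id, text in lines.items():
        # Hash of the text in the filename: editing a line forces a re-render,
        # while unchanged lines reuse the clip already on disk.
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]
        path = out_dir / f"{line_id}_{digest}.wav"
        if not path.exists():
            path.write_bytes(synthesize(text))
        manifest[line_id] = path.name
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def clip_for(line_id: str, out_dir: Path) -> Path:
    """Runtime lookup: no TTS model needed, just read the manifest."""
    manifest = json.loads((out_dir / "manifest.json").read_text())
    return out_dir / manifest[line_id]


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    bake_voice_lines({"guard_01": "Halt!", "guard_02": "Who goes there?"}, out)
    print(clip_for("guard_02", out).name)
```

The hand-tuning of pauses mentioned above then happens once, offline, instead of having to be right on every runtime generation.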

Post by **Red7** » August 15th, 2023, 18:34

Yeah, it's still pretty much shit for "on the fly" use, but why would you do it on the fly? If you do, it means you're running your game non-locally, which means you don't own it, which is bad.

With a custom model you can do almost anything, but it requires human QA, at least for now. You make, let's say, 20-40 generations, and you'll get what you want if you set the variation high enough.
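The "generate a pile of takes, let a human pick one" workflow is easy to script. A minimal sketch, assuming a hypothetical `synthesize()` whose `variation` knob stands in for whatever the backend actually exposes (a temperature, or something like ElevenLabs' stability slider, where lower stability means more variation); 30 takes is just the middle of the 20-40 range:

```python
import random
from dataclasses import dataclass


@dataclass
class Take:
    seed: int
    variation: float
    audio: bytes


def synthesize(text: str, seed: int, variation: float) -> bytes:
    # Hypothetical TTS call. A real backend would use `variation` to control
    # how much takes differ; this fake derives bytes from the seed and text only.
    rng = random.Random(f"{seed}:{text}")
    return bytes(rng.randrange(256) for _ in range(16))


def generate_takes(text: str, n: int = 30, variation: float = 0.8,
                   base_seed: int = 0) -> list[Take]:
    # One distinct seed per take, so whichever take a human picks
    # can be re-rendered exactly later.
    return [
        Take(seed=s, variation=variation, audio=synthesize(text, s, variation))
        for s in range(base_seed, base_seed + n)
    ]


def record_choice(takes: list[Take], chosen: int) -> Take:
    # The QA step itself is a person listening; the code only keeps the pick.
    return takes[chosen]
```

Storing the seed with each take is the main design point: once QA picks a take, you can regenerate it exactly instead of hoping the model lands on it again.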

VAs (voice actors) have no reason to exist anymore other than as fake jobs, just like government employees; you can easily have one guy replace 100 VAs.

Post by **Shillitron** » September 28th, 2023, 17:11

There are sort of three camps worth looking at:

(I'm only gonna post a few git repos / examples for each; there are actually many, many more.)

Text-To-Speech Commercial AKA ElevenLabs
Pros: Quality ranges from okay to good; you can use it a lot for free
Cons: Making "problematic content" can get you nuked

https://elevenlabs.io/
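If you go the ElevenLabs route for pre-generated lines, you would normally script it against their REST API rather than clicking through the web UI. A minimal sketch that only builds the request, based on my understanding of their `POST /v1/text-to-speech/{voice_id}` endpoint authenticated with an `xi-api-key` header; the key and voice ID are placeholders, and optional body fields (model, voice settings) are left out, so check their current API docs before relying on it:

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one line."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
    )


# Sending it needs a real key and voice ID; the response body is the audio:
# with urllib.request.urlopen(build_tts_request(KEY, VOICE_ID, "Halt!")) as resp:
#     with open("guard_01.mp3", "wb") as f:
#         f.write(resp.read())
```

Keeping request construction separate from sending makes it easy to batch a whole script of lines and retry failures without touching the web UI.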

Text-To-Speech Open Source
Pros: Can do anything the fuck you want
Cons: Quality is lower but steadily improving.

https://github.com/coqui-ai/TTS (Just a generic TTS)
https://github.com/DanRuta/xVA-Synth (a library of pre-trained voice models based on popular games; most stuff I've seen is ass, although sometimes it can be good)

Retrieval-based-Voice-Conversion (RVC) Open Source
This works a little differently: it converts your voice into a different voice, keeping your speaking cadence while mimicking the target voice. Basically, you remove a lot of the difficult parts of training a voice model by speaking the lines yourself, and let the AI focus on just pitch, tone and inflection.

Pros: Has the best results IMO
Cons: You have to record someone speaking the lines as input. More work, and you can fuck it up.

https://github.com/RVC-Project/Retrieva ... ion-WebUI/
https://github.com/w-okada/voice-changer
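Because RVC keeps your recording's timing, the main pitch control is usually a transpose setting in semitones applied to the pitch (F0) contour extracted from your input, e.g. +12 for a male speaker voicing a female character. A minimal sketch of what that shift does to the contour, assuming F0 comes in as one value per frame in Hz with 0 marking unvoiced frames (a common convention, not necessarily RVC's exact internals):

```python
def transpose_f0(f0_hz: list[float], semitones: int) -> list[float]:
    """Shift a pitch contour by a number of semitones.

    Each semitone multiplies frequency by 2 ** (1 / 12), so +12 doubles it
    (one octave up) and -12 halves it. Frames with f0 == 0 are treated as
    unvoiced and left at 0 so silence/consonants stay unpitched.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


if __name__ == "__main__":
    contour = [110.0, 0.0, 123.5, 130.8]  # made-up male-range contour
    print(transpose_f0(contour, 12))       # one octave up
```

Cadence and timing come straight from your take; only the pitch is moved, which is why a badly performed input line still sounds bad after conversion.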

Post by **Humbaba** » September 28th, 2023, 18:41