What is the state of AI voice generation?

WhiteShark
Turtle
Posts: 2151
Joined: Feb 2, '23

Post by WhiteShark »

Do you mean for on-the-fly generation (paired with on-the-fly text generation) or pre-generated? I haven't experimented myself, but the clips I've heard have been pretty darn convincing, though not 100% perfect. I suspect you have to finesse it a little to get the exact pauses and such that you want, which would make it a bit worse for on-the-fly use.
Red7
Posts: 2231
Joined: Aug 11, '23

Post by Red7 »

Yeah, it's pretty much still shit "on the fly", but why would you use on-the-fly? If you do, it means you're running your game non-locally, which means you don't own it.
Which is bad.

With a custom model you can do almost anything, but it requires human QA, at least for now. You make, let's say, 20-40 generations of a line and you'll get what you want if you set variation high enough.
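A rough sketch of that "generate a batch, then let a human pick" loop, in case it helps picture it. Nothing here is a real TTS API: `synthesize`, `Take`, `generate_takes` and the `variation` parameter are all made-up placeholders, and the "audio" is just fake bytes so the sketch runs without any model installed. Swap in whatever your model actually exposes.

```python
import random
from dataclasses import dataclass


@dataclass
class Take:
    seed: int         # seed used for this candidate, so it can be reproduced
    variation: float  # hypothetical "how much takes differ" knob
    audio: bytes      # placeholder for whatever a real model would return


def synthesize(text: str, seed: int, variation: float) -> bytes:
    """Hypothetical stand-in for a real TTS call; returns fake bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(16))


def generate_takes(text: str, n: int = 30, variation: float = 0.8) -> list[Take]:
    """Produce n candidate takes of the same line, one per seed."""
    return [Take(seed, variation, synthesize(text, seed, variation))
            for seed in range(n)]


# Generate a batch; a human then listens and keeps the take that sounds right.
takes = generate_takes("You shall not pass.", n=30)
print(f"{len(takes)} takes ready for review")
```

Keeping the seed with each take means the one you like can be regenerated later, assuming the real model is deterministic for a given seed.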

VA has no reason to exist anymore other than as a fake job, just like governmental employees; you can have 1 guy replacing 100 VAs easily.
Shillitron
Turtle
Posts: 1675
Joined: Feb 6, '23
Location: ADL Head Office

Post by Shillitron »

There are sort of three camps worth looking at:

(I'm only gonna post a few git repos / examples for each; there are actually many, many more.)

Text-To-Speech Commercial AKA ElevenLabs
Pros: Ranges from okay to good, and you can use it a lot for free
Cons: Making "problematic content" can get you nuked

Text-To-Speech Open Source
Pros: Can do whatever the fuck you want
Cons: Quality is lower but steadily improving

Retrieval-based-Voice-Conversion (RVC) Open Source
This works a little differently: it converts your voice into a different voice, keeping your speaking cadence while mimicking the target voice. Basically you remove a lot of the difficult parts of training a voice model by speaking the lines yourself, and let the AI focus on just pitch, tone and inflection.

Pros: Has the best results IMO
Cons: You have to record someone speaking lines as an input. More work, can fuck it up.