You should try it! I wouldn’t say it’s the best, far from it. But I also wouldn’t say it’s terrible. If you have a 5090, then yes, you can run much more powerful models in real time. Chatterbox is a great model, though.
I should have posted the reference audio used with the examples. Honestly it doesn’t sound so different from them. Voice cloning can be from a cartoon too, doesn’t have to be from a human being
A before / after with the reference and output seems useful to me, and maybe a range from more generic to more recognizable / celebrity voice samples so people can kinda see how it tackles different ones?
(Prominent politician or actor or somebody with a distinct speaking tone?)
I agree with the comment above. I have not logged into Hacker News in _years_ but did so today just to weigh in here. If people are saying that the audio sounds great, then there is definitely something going on with a subset of users where we are only hearing garbled words with a LOT of distortion. This does not sound like natural speech to me at all. It sounds more like a warped cassette tape. And I do not mean to slight your work at all. I am actually incredibly puzzled as to why my perception of this is so radically different from others'!
Also keep in mind the processing time. The article above used an NVIDIA L4 with 24 GB of VRAM. Sopro claims a 7.5-second processing time on CPU for 30 seconds of audio!
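For anyone comparing numbers, that claim works out to a real-time factor (RTF) of 0.25, meaning synthesis runs about 4x faster than playback. Here is a rough sketch of the arithmetic; `real_time_factor` is just a hypothetical helper name for illustration, not anything from Sopro itself, and the inputs are only the figures quoted in this thread:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the thread: 7.5 s of CPU processing for 30 s of audio.
rtf = real_time_factor(7.5, 30.0)
print(f"RTF = {rtf:.2f}, i.e. {1 / rtf:.0f}x faster than real time")
```

Anything below an RTF of 1.0 can in principle keep up with playback, though that says nothing about audio quality.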
If you want to get real good quality TTS, you should check out elevenlabs.io
Yes, you are right. However, there are many upsides to this kind of technology. For example, it can restore the voices of people who were affected by numerous diseases
Ok, that's an interesting angle, I had not thought of that, but of course you'd still need a good sample of them from before that happened. Thank you for the explanation.
This is my side “hobby”. And compute is quite expensive. But if the community’s response is good, I will definitely think about it! Btw, Chatterbox is a great model and inspiration.
I was on a similar path and saw my bills go over 1000 dollars as my interest in doing research and ablations grew. Then I decided to get a Blackwell Pro 6000 and am trying things with that :)
If you have suggestions on how to manage metrics, let us know. Currently trying Langfuse since it's a one-click install on Coolify.