I quite like IndexTTS2 personally. It does voice cloning and also lets you modulate emotion manually through emotion vectors, which I've found to be quite a powerful tool. It's not necessarily something everyone needs, but it's really cool technology in my opinion.
It's been particularly useful for a model orchestration project I've been working on. I have an external emotion classification model driving both the LLM's persona and the TTS output so it stays relatively consistent. The affect system also influences which memories are retrieved; it's more likely to retrieve 'memories' created in the current affect state. IndexTTS2 was pretty much the only TTS that gives the level of control I felt was necessary.
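To make the affect-biased retrieval idea concrete, here is a minimal sketch in plain Python. It is not the actual project code (which is built on Cats Effect); the `Memory` type, the `relevance` field, the two-dimensional affect vectors, and the `affect_weight` parameter are all assumptions made for illustration. The idea is simply to blend ordinary query relevance with how closely a memory's stored affect matches the current affect state.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    relevance: float      # hypothetical: query similarity in [0, 1] from some other search step
    affect: list          # hypothetical: affect vector recorded when the memory was created


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(memories, current_affect, k=3, affect_weight=0.3):
    """Return the top-k memories, nudged towards those created in a similar affect state."""
    def score(m):
        return (1 - affect_weight) * m.relevance + affect_weight * cosine(m.affect, current_affect)

    return sorted(memories, key=score, reverse=True)[:k]
```

With equal relevance, the memory whose stored affect matches the current state is ranked first, but a sufficiently relevant memory can still win regardless of affect. How strong that bias should be (`affect_weight` here) is a tuning choice.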
Personally I'm not a fan of terse writing; if something's worth saying at all it's worth using suitably expressive language to describe it, and being short and cold with people isn't a good form of interpersonal communication in my view. Pleasantries are important for establishing mutual respect; if they're considered the baseline of politeness in a particular culture then it's overtly disrespectful to forgo them with strangers. Terseness is certainly efficient for the writer, but not necessarily for the reader.
Written like you're on one side of the cultural barrier and think that you have to be somehow naturally correct because that's what's natural to you. To others, that attitude is just arrogant and self-centered. Why should one particular culture dictate the behavior of everyone, and especially why should it be your culture?
What you call "establishing mutual respect" is just "insincere and shallow" to others. I do not believe for a second that a grocery store cashier wants to know how my day has been.
That's not what I mean, I don't like corpo-speak either. I mean just treating people like they're human beings, neither with affected shortness nor affected warmth. I really don't like the common notion that you have to be cold and short with people to be a good engineer, it makes the culture considerably less pleasant and more abrasive than it needs to be in my view.
I could just as well turn that around and say why should we all adopt your preference of unpleasantly curt communication? Is that not also an imposition of someone else's culture?
What if short isn't "cold" at all? That's a value you're projecting onto it.
I understand there are cultures that value flowery speech more than mine. I'm asking you to stop using emotionally loaded words to describe how other people behave.
Nah I disagree, tool calling isn't that difficult. I've got my own Cats Effect-based model orchestration project I'm working on, and while it's not 100% yet I can do web browsing, web search, memory search (this one is cool), and others on my own hardware.
Yeah I do think if your trust in state institutions is gone for whatever reason (such as living in a dictatorship), it'd be absolute madness to carry around an electronic snitch with you. I'm not sure what I would rely on in those circumstances, but it certainly wouldn't be smartphones. Personally I'd want to rely on in-person communication as much as possible.
I'd go even further. Even if you trust it now, can you trust it in 5 years? How much of your data do apps, companies, and mobile providers hold onto? The real answer is that you don't know. So if your phone is a super precise GPS that you can't turn off (e.g. Android), were you near a crime scene by chance? How about a big protest 2 years before the political winds shifted? Who knows you were there? You can't know for sure.
Part of why I like sailing is for a similar reason, beyond a certain range the only people who can bother you electronically are other people at sea (and you actually want to listen to them).
Swearing is a good heuristic still I think. The American corporate world remains rather prissy about swearing, so if the post sounds like a hairy docker after 12 pints then it's probably not an LLM.
I know the very roundabout you mean without having to look it up. I used to cycle in Oxford very often, and while I’m sure there’s a tendency on the internet to dismiss locals’ stories as hyperbolic, it really can’t be stressed enough how hazardous this particular feature of civil engineering is.
> If it is possible, maybe it is what some people supposedly feel as "auras"
For what it's worth, I have a disorder that causes me to see "auras" around people quite often. The nature of the disorder is that my brain can't filter out its own sensory noise properly, giving rise to a lot of visual artefacts that non-disordered brains filter out. These range from 'TV static' to stuff that's not a million miles away from diffusion model artefacts, but the auras around people are something I see pretty much all the time, especially against plain backgrounds. It's not very well-known or studied, but fMRI studies have recently implicated the same serotonin receptor psychedelics target, and it's also linked to migraine.
I think this disorder being more prevalent than expected would be a good explanation for auras. It was once thought to be very rare, but many people who have it aren't actually affected enough to seek out a diagnosis. It wouldn't be an unreasonable source for images like auras, saints' haloes, and other things like that since they're just an ordinary part of vision for me. I also think it somewhat vindicates Aldous Huxley's thoughts on the subject.
I really like the idea of electrical fields being somehow important for consciousness, and it's not something I'd rule out off the bat. I just think that disorders of perception are a better explanation for auras and similar phenomena.
I've been experimenting with something similar to this approach recently. IndexTTS2 takes emotion vectors as an input, and I used an external emotion classification model on the LLM output to modulate them. You need to manage the state of the current affect with a bit of care or it sounds unhinged, but it's worked surprisingly well so far. I wired it together using Cats Effect.
As you'd expect latency isn't great, but I think it can be improved.
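As a rough illustration of what "managing the state of the current affect" can look like, here is a minimal Python sketch that smooths the classifier output with an exponential moving average and caps the intensity, so one odd classification can't swing the voice wildly. This is not the actual Cats Effect code, and the label set, vector layout, `alpha`, and `max_intensity` are all assumptions; it does not reflect IndexTTS2's real interface.

```python
# Hypothetical label set; a real classifier and TTS model will define their own.
EMOTIONS = ["happy", "angry", "sad", "calm"]


class AffectState:
    """Keeps a smoothed emotion vector that drifts towards each new classification."""

    def __init__(self, alpha=0.3, max_intensity=0.8):
        self.alpha = alpha                  # how quickly the state follows new input
        self.max_intensity = max_intensity  # cap so no emotion is ever pushed to the extreme
        self.vector = [0.0] * len(EMOTIONS)

    def update(self, scores):
        """Blend classifier scores (label -> probability) into the current state."""
        for i, label in enumerate(EMOTIONS):
            target = scores.get(label, 0.0)
            blended = (1 - self.alpha) * self.vector[i] + self.alpha * target
            self.vector[i] = min(blended, self.max_intensity)
        return list(self.vector)


# Usage: feed each classified LLM utterance in, pass the result on to the TTS step.
state = AffectState(alpha=0.5)
vector = state.update({"angry": 1.0})
```

A lower `alpha` makes the voice change mood more gradually, at the cost of reacting more slowly to genuine shifts in tone.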
That's certainly a geopolitically contentious take if I've ever seen one.