
Do you know when we can expect an update on the realtime API? It’s still in beta and there are many issues (e.g. voice randomly cutting off, VAD issues, especially with mulaw, etc.) which make it impossible to use in production, but there’s not much communication from OpenAI. It’s difficult to know what to bet on. The push for stt->llm->tts makes you wonder whether we should carry on building with the realtime API.


we're working hard on it at the moment and hope we'll have a snapshot ready in the next month or so

we've debugged the cutoff issues and have fixes for them internally but we need a snapshot that's better across the board, not just cutoffs (working on it!)

we're all in on S2S models both for API and ChatGPT, so there will be lots more coming to Realtime this year

For today: the new noise cancellation and semantic voice activity detector are available in Realtime. And ofc you can use gpt-4o-transcribe for user transcripts there


Agreed, really not liking how they are neglecting it… I hope they are just hard at work behind the scenes and will release something soon


S2S is where we're investing the most effort on audio ... sorry it's been slow but we are working hard on it

Top priorities at the moment: 1) Better function calling performance 2) Improved perception accuracy (not mishearing) 3) More reliable instruction following 4) Bug fixes (cutoffs, run-ons, modality steering)


Appreciate the efforts. It’s not there yet, but when it gets there it will open up a lot of use cases.

Any fine-tuning for s2s on the horizon?



