
Pub/sub via WebSockets seems like the simplest solution. You'd need to rearrange your LLM serving architecture a bit so the model publishes its output to a pub/sub system that a delivery microservice can read from (to send to the client), but it's not rocket science.

It's yet another system that needs some DRAM though. The good news is that you can auto-expire the queued up responses pretty fast :shrug:

No idea if it's worth it, though. Someone with access to the statistics surrounding dropped connections/repeated prompts at a big LLM service provider would need to do some math.
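A minimal sketch of the buffering piece described above, with no real broker attached (in production you'd likely use Redis or similar; all names here are made up). The serving layer publishes tokens keyed by request id, a delivery service drains them for the WebSocket, and queued responses auto-expire after a TTL:

```python
import time
from collections import deque


class ResponseBuffer:
    """Toy in-memory stand-in for a pub/sub buffer: the LLM server
    publishes chunks per request id, a delivery microservice drains
    them to the client, and undrained entries expire after ttl seconds."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.queues = {}  # request_id -> (expiry deadline, deque of chunks)

    def publish(self, request_id, chunk, now=None):
        now = time.monotonic() if now is None else now
        _, q = self.queues.get(request_id, (0.0, deque()))
        q.append(chunk)
        self.queues[request_id] = (now + self.ttl, q)  # refresh the TTL

    def drain(self, request_id, now=None):
        """Called by the delivery service when the client (re)connects."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        _, q = self.queues.get(request_id, (0.0, deque()))
        chunks = list(q)
        q.clear()
        return chunks

    def _expire(self, now):
        for rid in [r for r, (dl, _) in self.queues.items() if dl < now]:
            del self.queues[rid]
```

Passing `now` explicitly makes the expiry logic testable; a real deployment would just use the broker's own TTL mechanism instead of hand-rolling one.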



Corporate security hates websockets though, SSE is much easier for end-users to get approved.
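Part of why SSE slips through more easily: it's just a long-lived HTTP response with `Content-Type: text/event-stream`, so middleboxes that choke on WebSocket upgrades never see anything unusual. A sketch of the wire format, which is simple enough to frame by hand:

```python
def sse_frame(chunk, event=None):
    """Format one server-sent event: an optional 'event:' line, one
    'data:' line per line of payload, terminated by a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for line in chunk.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

The server writes frames like these down a streaming HTTP response and the browser's built-in `EventSource` reassembles them, reconnecting automatically on drop.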


I think it would be even more wasteful to continue inference in the background for nothing if the user decided to leave without pressing the stop button. Saving the partial answer at the exact moment the client disappeared would be better.
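The "stop and save" behavior could look something like this sketch (all names hypothetical: `send`, `save_partial`, and `abort_inference` stand in for whatever the serving stack actually provides):

```python
class ClientDisconnected(Exception):
    """Raised by the transport when the client goes away mid-stream."""


def relay(tokens, send, save_partial, abort_inference):
    """Forward generated tokens to the client; if the client vanishes,
    stop inference immediately and persist the partial answer instead
    of generating the rest for nobody."""
    generated = []
    try:
        for tok in tokens:
            generated.append(tok)
            send(tok)
    except ClientDisconnected:
        abort_inference()               # free the GPU right away
        save_partial("".join(generated))  # resumable on reconnect
    return "".join(generated)
```

On reconnect the client could then be shown the saved partial answer and offered a "continue" button, rather than silently re-running the whole prompt.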


What if I want to have the agent go off and work on something for a while and I'll check back tomorrow?



