
Seems like the real innovation of LLM-based AI models is the creation of a new human-computer interface.

Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.

In ten years, I expect the primary interface of desktop workstations, mobile phones, etc. will be voice prompts for an AI interface. Keyboards will become a power-user interface, used only for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.



It always surprises me when someone predicts that keyboards will go away. People love typing. Or I do love typing. No way I am going to talk to my phone, especially if someone else can hear it (which is basically always).


Heh, I had this dream/nightmare where I was typing on a laptop at a cafe and someone came up to me and said, "Oh neat, you're going real old-school. I like it!" and I got an info dump about how everyone just uses AI voice transcription now.

And I was like, "But that's not a complete replacement, right? What about the times when you don't want to broadcast what you're writing to the entire room?"

And then there was a big reveal that AI has mastered lip-reading, so even then, people would just put their lips up to the camera and mouth out what they wanted to write.

With that said, as the owner of tyrannyofthemouse.com, I agree with the importance of the keyboard as a UI device.


It’s interesting to note that nobody even talks on their phone anymore; they type (on terrible “keyboards”!).


Interesting, I get so many "voice messages" in WhatsApp; nobody is really writing anymore. It's annoying. WhatsApp even has a transcription feature to turn them back into text.


Personally I block anyone who does that.


For chat apps, once you've got the conversation thread open, typing is pretty easy.

I think the more surprising thing is that people don't use voice to access deeply nested features, like adding items to calendars, etc., which would otherwise take a lot of fiddly app navigation.

I think the main reason we don't have that is that Apple's Siri is so useless that it has singlehandedly held back this entire flow, and there's no way for anyone else to get a foothold in the smartphone market.


Google Assistant is/was pretty good...for Google apps. It's useless for anything else. The new Gemini-powered version is actually a regression, imo.


I have fat fingers, so I always dictate into the phone if I need to send a message longer than 2-3 words.


They talk on Zoom, Teams, etc. Yes, the phone is almost dead in the office.


Those are applications, not interfaces. No one controls those applications with their voices, they use buttons, either touch or mechanical.


Just because you don't doesn't mean other people aren't. It's pretty handy to be able to tell Google to turn off the hallway light from the bedroom, instead of having to get out of bed to do that.


They talk to other humans on those apps, not the computer. I've noticed less dictation over time in public but that's just anecdotal. I never use voice when a keyboard is available.


I think an understated thing that's been happening is that people have been investing heavily in their desktop workspace. Even non-gamers have decked-out mics, keyboards, monitors, the whole thing. It's easy to forget, because one of the most commonly accepted sayings for a while now has been "everyone's got a computer in their pocket". They have nice setups at home too.

When you have a nice mic or headset and multiple monitors and your own private space, it's totally the next step to just begin working with the computer by voice. Voice has not been a staple feature of people's workflows, but I think all that is about to change (voice as an interface, that is; voice as a communication tool has been around since 1876).


Voice is slow and loud. If you think voice is going to make a comeback in the desktop PC space as a primary interface I am guessing you work from home and have no roommates. Am I close?


I, for one, am excited about the security implications of people loudly commanding their computers to do things for them, instead of discreetly typing.


Everyone having a computer in their pocket and multiple modes of access have made the keyboard and conventional computer less relevant.

But-- that means "not pivotal any more, just hugely important."


I talk all the time to the AI on my phone. I was using ChatGPT's voice interface until it failed, probably because my phone is too old. Now I use Gemini. I don't usually do a lot with it, but when I go on walks I talk with it about different things I want to learn. To me it's a great way to learn about something at a high level, or to talk through ideas.


What failed about ChatGPT Voice? I work on it and would love to see it fixed/make sure you haven't hit a bug I don't know about!


Nobody wants an AI voice to say "uh", "um", "er". Otherwise we’d have radio and TV full of people talking like that.


Honestly, I would love for the keyboard input style to go away completely. It is such an unnatural way to interact with a computing device compared to other things we operate in the world. Misspellings, backspacing, cramped keys, different layout styles depending on your origin, etc. make it a very poor input device, not to mention a difficult one for people with motor function difficulties. Sadly, I think it is here to stay for a while, until we get to a different computing paradigm.


I hope not. I make many more verbal mistakes than typed ones, and my throat dries and becomes sore quickly. I prefer my environment to be as quiet as possible. Voice control is also terrible for anything requiring fine temporal resolution.


> make it a very poor input device

Wow, I've always felt the keyboard is the pinnacle of input devices. Everything else feels like a toy in comparison.


The only thing better than a keyboard is direct neural interface, and we aren't there yet.

That aside, keyboard is an excellent input device for humans specifically because it is very much designed around the strengths of our biology - those dextrous fingers.


Buttons are accurate (1:1) input. They will never go away.


I play as a wizard character in an online game. If I had to actually speak all those spells, in quick succession, for hours at a time ...


If wizardry really existed, I’d guess battles would be more about pre-recorded spells and enchanted items (a la Batman) than going at it like in Harry Potter.


Voice interfaces sound awful. But maybe I am a power user. I don't even like voice as an interface to most people.


I also find current voice interfaces are terrible. I only use voice commands to set timers or play music.

That said, voice is the original social interface for humans. We learn to speak much earlier than we learn to read/write.

Better voice UIs will be built to make new workflows with AI feel natural. I'm thinking along the lines of a conversational companion, like the "Jarvis" AI in the Iron Man movies.

That doesn't exist right now, but it seems inevitable that real-time, voice-directed AI agent interfaces will be perfected in coming years. Companies like Eleven Labs (https://elevenlabs.io/) are already working on the building blocks.


Young people don't even speak to each other on the phone anymore.


For a voice-directed interface to be perfected, speech recognition would need to be perfected first. What makes that development seem inevitable?


It doesn't work well at all with ChatGPT. You say something, and in the middle of a sentence, ChatGPT in Voice mode replies with something completely unrelated.


It works great with my kids sometimes. Asking a series of questions about some kid-level science topic for instance. They get to direct it to exactly what they want to know, and you can see they are more actively engaged than watching some youtube video or whatever.

I'm sure it helps that it's not getting outside of well-established facts, and is asking for facts and not novel design tasks.

I'm not sure but it also seems to adopt a more intimate tone of voice as they get deeper into a topic, very cozy. The voice itself is tuned to the conversational context. It probably infers that this is kid stuff too.


Or it stops talking mid-sentence because you cleared your throat or someone else in the room is watching TV and other people are speaking.


Voice is really sub-par and slow, even if you're healthy and abled. And loud and annoying in shared spaces.

I wonder if we'll have smart-lens glasses where our eyes 'type' much faster than we could possibly talk. Predictive text keyboards tracking eyeballs already exist. I wonder if AI and smartglasses are a natural combo for a future form factor. Meta seems to be leaning that way with their Ray-Ban collaboration and rumors of adding a screen to the lenses.


Sci-fi may be showing the way again: subvocalization voice recognition or ‘mental’ speech recognition seem the obvious medium-term answers.


I am also very skeptical about voice, not least because I've been disappointed daily by a decade of braindead idiot "assistants" like Siri, Alexa, and Google Assistant (to be clear I am criticizing only pre-LLM voice assistants).

The problem with voice input, to me, is mainly knowing when to start processing. When humans listen, we stream and process the words constantly, and we wait either until we detect that the other person expects a response (just enough of a pause, or a questioning tone) or, as an exception, until we feel we have justification to interrupt (e.g. "Oh yeah, Jane already briefed me on the Johnson project").

Even talking to ChatGPT which embarrasses those old voice bots, I find that it is still very bad at guessing when I'm done when I'm speaking casually, and then once it's responded with nonsense based on a half sentence, I feel it's a polluted context and I probably need to clear it and repeat myself. I'd rather just type.

I think there's not much need to stream the spoken tokens into the model in realtime given that it can think so fast. I'd rather it just listen, have a specialized model simply try to determine when I'm done, and then clean up and abridge my utterance (for instance, when I correct myself) and THEN have the real LLM process the cleaned-up query.
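Roughly something like this, as a pure sketch; every object and method name below is a hypothetical placeholder for illustration, not a real API:

    # Sketch of the flow described above (hypothetical components only).

    def listen_until_done(mic, endpoint_model):
        # Buffer transcribed speech until a small, fast model decides the
        # speaker is actually finished (pause length, questioning tone,
        # a trailing "and...", etc.).
        chunks = []
        while True:
            chunks.append(mic.next_transcribed_chunk())
            if endpoint_model.speaker_is_done(chunks):
                return " ".join(chunks)

    def handle_utterance(mic, endpoint_model, cleanup_model, llm):
        raw = listen_until_done(mic, endpoint_model)
        # Strip false starts and self-corrections before the big model
        # ever sees the text, so a half-sentence never pollutes context.
        cleaned = cleanup_model.abridge(raw)
        return llm.respond(cleaned)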


> In ten years, I expect the primary interface of desktop workstations, mobile phones, etc will be voice prompts

I doubt it. The keyboard and mouse are fit predators, and so are programming, query, and markup languages. I wouldn't dismiss them so easily. This guy has a point: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


It's an interesting one, a problem I feel is coming to the fore more often. I feel typing can be too cumbersome to communicate what I want, but at the same time, when speaking I'm imprecise, and sometimes I'd prefer the privacy a keyboard allows. Both have cons.

Perhaps a brain interface, or even better, something so predictive it just knows what I want most of the time. Imagine that: grunting and getting what I want.


> Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent.

Oh, I know! Let's call it... "requirements management"!


A brain-computer interface will kill the keyboard, not voice, imho.


I disagree. A keyboard enforces a clarity and precision of information that does not naturally arise from our internal thought processes. I'm sure many people here have thought they understood something until they tried to write it down in precise language. It's the same sort of reason we use a rigid symbolic language for mathematics and programming rather than natural language with all its inherent ambiguities.

Dijkstra has more thoughts on this

https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


Why can't the brain interface be a virtual keyboard that I “type” on?


If that ever exists.

A BCI able to capture sufficient nuance to equal voice is probably further out than the lifespan of anyone commenting here.


5 years ago, almost everyone in this forum would have said that something like GPT-5 "is probably further out than the lifespan of anyone commenting here."


It has been more than 5 years since the release of GPT-3.

GPT-5 is a marginal, incremental improvement over GPT-4. GPT-4 was a moderate, but not groundbreaking, improvement over GPT-3. So, "something like GPT-5" has existed for longer than the timeline you gave.

Let's pretend the above is false for a moment though, and rewind even further. I still think you're wrong. Would people in 2015 have said "AI that can code at the level of a CS college grad is a lifespan away"? I don't think so, no. I think they would have said "That's at least a decade away", anytime pre-2018. Which, sure, maybe they were a couple years off, but if it seemed like that was a decade away in 2015, well, it's been a decade since 2015.


GPT-4 was a massive improvement over GPT-3.5, which was a moderate improvement over GPT-3.

GPT-5 is not that big of a leap, but when you compare it to the original GPT-4, it's also not a marginal improvement.


GPT-2 to 3 was the only really "groundbreaking" one. 3 to 3.5, 3.5 to 4, were all just differences in degree, not in kind.


It really just needs to let me create text faster/better than typing does; I'm not sure it needs to be voice-based at all. Maybe we “imagine” typing on a keyboard or move a phantom appendage or god knows what.


It needs to be as accurate as the typing, though. Voice can do that. A BCI cannot capture a nuanced sentence.


I can't get voice accurate. For some people it might be but nothing understands my accent. It's very frustrating.


They're ~10 years or so out, based on current research.


Perpetually 10 years out you mean? BCI tech has not meaningfully changed in the last 10 years.


Agreed, but feels like brain-computer interfaces ready for mass adoption will not be available for another decade or two.


AI is more like a compiler. Much like we used to write in C or Python, which compiles down to machine code for the computer, we can now write in plain English, which is ultimately compiled down to machine code.


I get your analogy, but LLMs are inherently non-deterministic. That’s the last thing you want your compiler to be.


Non-determinism is a red herring, and the token layer is a wrong abstraction to use for this, as determinism is completely orthogonal to correctness. The model can express the same thing in different ways while still being consistently correct or consistently incorrect for the vague input you give it, because nothing prevents it from setting 100% probability to the only correct output for this particular input. Internally, the model works with ideas, not tokens, and it learns the mapping of ideas to ideas, not tokens to tokens (that's why e.g. base64 is just essentially another language it can easily work with, for example).
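To make the "orthogonal to correctness" point concrete, here's a toy, completely made-up next-token distribution; greedy (temperature-0) decoding is perfectly deterministic whether or not the probability mass happens to sit on the right token:

    import math

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    # Toy, made-up next-token distribution for the prompt "2+2=".
    vocab = ["4", "5", "banana"]
    logits = [2.0, 5.0, 0.1]          # hypothetical model output
    probs = softmax(logits)

    # Greedy (temperature-0) decoding always picks the argmax, so the
    # output is fully deterministic...
    print(vocab[probs.index(max(probs))])   # "5", every single run

    # ...and deterministically wrong. Whether the mass sits on the
    # correct token is a separate question from determinism.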


No. Humans think it maps to ideas. This is the interpretation being done by the observer being added to the state of the system.

The system has no ideas, it just has its state.

Unless you are using ideas as a placeholder for “content” or “most likely tokens”.


That's irrelevant semantics, as terms like ideas, thinking, knowledge etc. are ill-defined. Sure, you can call it points in the hidden state space if you want, no problem. Fact is, the correctness is different from determinism, and the forest of what's happening inside doesn't come down to the trees of most likely tokens, which is well supported by research and very basic intuition if you ever tinkered with LLMs - they can easily express the same thing in a different manner if you tweak the autoregressive transport a bit by modifying its output distribution or ban some tokens.

There are a few models of what's happening inside that hold different predictive power, just like how physics has different formalisms for e.g. classical mechanics. You can probably use the same models for biological systems and entire organizations, collectives, and processes that exhibit learning/prediction/compression on a certain scale, regardless of the underlying architecture.


You're right. But many people are using it just like a compiler (by blindly accepting its outputs). Not saying that's a good thing...


They are deterministic. Random seeding makes them not. But that's a feature.


Even with t=0 they are stochastic in practice, e.g. due to the non-associative nature of floating-point operations.
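A tiny, runnable illustration of that point (nothing LLM-specific, just the underlying float behavior that shows up when reduction order differs between runs):

    # Floating-point addition is not associative, so the "same" sum can
    # change with evaluation order (which is exactly what varies between
    # parallel GPU reductions from run to run):
    a, b, c = 0.1, 1e16, -1e16
    print((a + b) + c)   # 0.0  (the 0.1 is rounded away inside 1e16)
    print(a + (b + c))   # 0.1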


That is an artifact of implementation. You can absolutely implement it using strict FP. But even if not, any given implementation will still do things in a specific order which can be documented. And then if you're running quantized (including KV cache), there's a lot less floating point involved.


Doesn’t changing even one word in your prompt affect the output?


Yes, and completely unpredictably.


LLMs are nothing like compilers. This sort of analogy based verbal reasoning is flimsy, and I understand why it correlates with projecting intelligence onto LLM output.


We are just not used to non-deterministic translation of computer programs, and LLMs are very good at non-deterministic translation.



