Right. Too often people conflate (a) risk-loving entrepreneurs and their marketing claims with (b) the realities of usage patterns and the associated value-add.
As an example, look at how cars are advertised. If you only paid attention to the marketing, you'd think everyone was zipping around winding mountain roads in their SUVs, loaded up with backcountry gear. That's not accurate, but SUVs are dominant nonetheless.
I've had the same system (M2 64GB MacBook Pro) for three years.
2.5 years ago it could just about run LLaMA 1, and that model sucked.
Today it can run Mistral Small 3.1, Gemma 3 27B, and Llama 3.3 70B - the exact same hardware, but those models are competitive with the best cloud-hosted model available two years ago (GPT-4).
The best hosted models (o3, Claude 4, Gemini 2.5, etc.) are still way better than the best models I can run on my 3-year-old laptop, but the rate of improvement for local models (on the same system) has been truly incredible.
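For a concrete sense of what "running it locally" looks like, here's a minimal sketch using the llama-cpp-python bindings (the model path and quantization level are assumptions; any roughly 4-bit GGUF build that fits in memory would do). The reason a 70B model fits at all: at ~4 bits per weight, the weights need roughly 70e9 × 0.5 bytes ≈ 35 GB (closer to 40 GB for common Q4_K_M builds), which squeezes into 64 GB of unified memory.

    # Minimal sketch: run a quantized local model with llama-cpp-python.
    # Assumptions: llama-cpp-python is installed (pip install llama-cpp-python)
    # and a ~4-bit GGUF build of Llama 3.3 70B has been downloaded locally.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Llama-3.3-70B-Instruct-Q4_K_M.gguf",  # hypothetical local path
        n_ctx=8192,       # context window size
        n_gpu_layers=-1,  # offload all layers to the GPU (Metal on Apple Silicon)
    )

    out = llm(
        "Q: Name three uses for a model that runs fully offline. A:",
        max_tokens=128,
        stop=["Q:"],
    )
    print(out["choices"][0]["text"])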
I'm surprised that it's even possible to run big models locally.
I agree we'll have to see how this plays out, but I hope models become efficient enough that, for certain tasks, it won't matter much and some parts can run locally.
I could imagine an LLM trained on far fewer natural languages and optimized for a single programming language. Something like "generate your model".
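To put a rough number on the "fewer languages" idea: the token embedding and output matrices scale with vocabulary size, so a code-focused tokenizer alone shaves off real parameters. A back-of-envelope sketch (all figures are illustrative assumptions, loosely 70B-shaped, not any real model's config):

    # Back-of-envelope: parameter savings from shrinking the vocabulary.
    # All figures are illustrative assumptions, not a real model's config.
    d_model = 8192                 # hidden size, roughly 70B-model-shaped
    multilingual_vocab = 128_000   # e.g. a large multilingual tokenizer
    code_only_vocab = 32_000       # hypothetical code-focused tokenizer

    def embedding_params(vocab_size: int, d_model: int) -> int:
        # Input embeddings plus an untied output projection.
        return 2 * vocab_size * d_model

    saved = (embedding_params(multilingual_vocab, d_model)
             - embedding_params(code_only_vocab, d_model))
    print(f"Saved parameters: {saved / 1e9:.2f}B")  # ~1.57B

That's about 1.6B parameters before you touch the transformer blocks; the bigger wins would presumably come from pruning or distilling capacity the specialized model no longer needs, but the vocabulary math alone shows why a narrower model can be meaningfully smaller.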
Of course they are. Force has a strong correlation with mass times acceleration. Objects at rest have a high chance of being observed to remain at rest. And so on.
One of the problems programmers have is loading a problem into working memory, which can take an hour. An interruption (a phone call, a meeting) can mean you have to start over, or at least redo part of that work. This is a standard programmer complaint about interruptions.
It's interesting that LLMs may have a similar issue.
GenAI just works. People don't need to be pushed to use it, and they keep using it.
OpenAI has 500 million weekly active users.