
But economically, it is still much better to buy a lower-spec laptop and pay a monthly subscription for AI.

However, I agree with the article that people will run big LLMs on their laptops N years down the line. Especially if hardware outgrows the requirements of best-in-class models. If a phone could run a 512GB model fast, you would want it.



Are you sure the subscription will still be affordable after the venture capital flood ends and the dumping stops?


100% yes.

The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

In a scenario where new investment stops flowing and some AI companies go bankrupt, all that compute will be looking for a market.

Inference providers are already profitable, so cheaper hardware will mean even cheaper AI systems.


You should probably disclose that you're a CTO at an AI startup, I had to click your bio to see that.

> The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

All going into the hands of a small group of people that will soon need to pay the piper.

That said, VC backed tech companies almost universally pull the rug once the money stops coming in. And historically those didn't have the trillions of dollars in future obligations that the current compute hardware oligopoly has. I can't see any universe where they don't start charging more, especially now that they've begun to make computers unaffordable for normal people.

And even past the bottom dollar cost, AI provides so many fun, new, unique ways for them to rug pull users. Maybe they start forcing users to smaller/quantized models. Maybe they start giving even the paying users ads. Maybe they start inserting propaganda/ads directly into the training data to make it more subtle. Maybe they just switch out models randomly or based on instantaneous hardware demand, giving users something even more unstable than LLMs already are. Maybe they'll charge based on semantic context (I see you're asking for help with your 2015 Ford Focus. Please subscribe to our 'Mechanic+' plan for $5/month or $25 for 24 hours). Maybe they charge more for API access. Maybe they'll charge to not train on your interactions.

I'll pass, thanks.


I'm no longer CTO at an AI startup. Updated, but I don't actually see how that is relevant.

> All going into the hands of a small group of people that will soon need to pay the piper.

It's not very small! On the inference side there are many competitive providers as well as the option of hiring GPU servers yourself.

> And historically those didn't have the trillions of dollars in future obligations that the current compute hardware oligopoly has. I can't see any universe where they don't start charging more, especially now that they've begun to make computers unaffordable for normal people.

I can't say how strongly I disagree with this - it's just not how competition works, or how the current market is structured.

Take gpt-oss-120B as an example. It's not frontier-level quality, but it's not far off, and it certainly sets a floor: open-source models will never get less intelligent than that.

There is a competitive market in hosting providers, and you can see the pricing here: https://artificialanalysis.ai/models/gpt-oss-120b/providers?...

In what world is there a way in which all the providers (who all want revenue!) raise prices above the premium price Cerebras is charging for their very high speed inference?

There's already Google, profitably serving at the low end at around half the price of Cerebras (but then you have to deal with Google billing!)

The fact that Azure and Amazon are pricing exactly the same as 8 (!) other providers, and the same as the price https://www.voltagepark.com/blog/how-to-deploy-gpt-oss-on-a-... gives for running your own server, shows how the economics work on NVidia hardware. There's no subsidy going on there.

This is on hardware that is already deployed. That isn't suddenly going to get more expensive unless demand increases... in which case the new hardware coming online over the next 24 months is a good investment, not a bad one!


Datacenters full of GPU hosts aren't like dark fiber - they require massive ongoing expense, so the unit economics have to work really well. It is entirely possible that some overbuilt capacity will be left idle until it is obsolete.


The ongoing costs are mostly power, and aren't that massive compared to the investment.

No one is leaving an H100 cluster idle because the power costs too much - this is why remnant markets like Vast.ai exist.


They absolutely will leave them idle if the market is so saturated that no one will pay enough for tokens to cover power and other operational costs. Demand is elastic but will not stretch forever. The build out assumes new applications with ROI will be found, and I'm sure they will be, but those will just drive more investment. A massive over build is inevitable.


Of course!

But the operational costs are much lower than some people in this thread seem to think.

You can find a safe margin for the price by looking at aggregators.

https://gpus.io/gpus/h100 shows a lowest price of $1.83/hour, and an average of around $2.85.

That easily covers running costs - an H100 server, including cooling etc., costs around $0.10/hour to keep running.
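
A rough sketch of that margin (the $1.83/hour price and the ~$0.10/hour running cost are the figures above; the 700W draw and $0.10/kWh electricity rate are my own assumptions):

    # Back-of-envelope H100 hosting margin (illustrative assumptions)
    rental_price_per_hour = 1.83   # lowest listed price on gpus.io, per the link above
    power_draw_kw = 0.7            # assumed ~700W per H100 under load
    electricity_per_kwh = 0.10     # assumed electricity rate
    other_opex_per_hour = 0.03     # assumed cooling/facility overhead
    cost_per_hour = power_draw_kw * electricity_per_kwh + other_opex_per_hour
    print(f"cost ~${cost_per_hour:.2f}/h, margin ~${rental_price_per_hour - cost_per_hour:.2f}/h")
    # -> cost ~$0.10/h, margin ~$1.73/h even at the lowest listed rental price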

And a massive overbuild pushes prices down not up!


> Inference providers are already profitable.

That surprises me, do you remember where you learned that?


Lots of sources, and you can do the math yourself.

Here's a few good ones:

https://github.com/deepseek-ai/open-infra-index/blob/main/20... (suggests Deepseek is making 80% raw margin on inference)

https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...

https://martinalderson.com/posts/are-openai-and-anthropic-re... (there's a HN discussion of this where it was pointed out this overestimates the costs)

https://www.tensoreconomics.com/p/llm-inference-economics-fr... (long, but the TL;DR is that serving Llama 3.3 70B costs around $0.28/million tokens input, $0.95 output at high utilization. These are close to what we see in the market: https://artificialanalysis.ai/models/llama-3-3-instruct-70b/... )
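
To make those per-token numbers concrete, here's a minimal sketch of what a single request costs at the quoted $0.28/$0.95 per million tokens (the request size is an assumption for illustration):

    # Cost of one chat request at the quoted Llama 3.3 70B serving costs
    cost_per_input_token = 0.28 / 1_000_000    # $ per input token, from the article above
    cost_per_output_token = 0.95 / 1_000_000   # $ per output token, from the article above
    input_tokens, output_tokens = 2_000, 500   # assumed typical request size
    cost = input_tokens * cost_per_input_token + output_tokens * cost_per_output_token
    print(f"~${cost:.4f} per request")         # ~$0.0010, roughly a tenth of a cent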


> The amount of compute in the world is doubling over 2 years because of the ongoing investment in AI (!!)

which is funded by the dumping

when the bubble pops: these DCs are turned off and left to rot, and your capacity drops by a factor of 8192


> which is funded by the dumping

What dumping do you mean?

Are you implying NVidia is selling H200s below cost?

If not, then you might be interested to see that Deepseek has released their inference costs here: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

If they are losing money it's because they have a free app they are subsidizing, not because the API is underpriced.


Doesn't matter now. GP can revisit the math and buy some hardware once the subscription prices actually grow too high.


You have to remember that companies are kind of fungible, in the sense that founders can close old companies and start new ones to walk away from the bankruptcies. When there's a bust and a lot of companies close up shop because data centers were overbuilt, there are going to be a lot of GPUs sold at firesale prices - imagine chips sold at $300k today going for $3k tomorrow to recoup a penny on the dollar. There's going to be a business model for someone buying those chips at $3k and offering subscription prices at little more than the cost of electricity to keep the dumped GPUs running somewhere.
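
A quick sanity check on that "little more than the cost of electricity" model, using the $3k firesale price from above (the amortization period, power draw, and electricity rate are my own assumptions):

    # Break-even price for a firesale GPU written off over two years of 24/7 use
    hardware_cost = 3_000                    # $ per GPU at the firesale price above
    amortize_hours = 2 * 365 * 24            # assumed two-year write-off, always on
    power_cost_per_hour = 0.7 * 0.10         # assumed 700W at $0.10/kWh
    break_even = hardware_cost / amortize_hours + power_cost_per_hour
    print(f"~${break_even:.2f}/GPU-hour")    # ~$0.24: little more than the electricity bill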


I do wonder how usable the hardware will be once the creditors are trying to sell it - as far as I can tell, the current trend is toward more and more custom, no-matter-the-cost, super expensive, power-inefficient hardware.

The situation might be a lot different from people selling ex-crypto-mining GPUs to gamers. There might be a lot of effective scrap that is no longer usable once it is no longer part of some company's technological fever dream.


They will go down. Or the company will be gone.


Running an LLM locally means you never have to worry about how many tokens you've used, and also it allows for a lot of low latency interactions on smaller models that can run quickly.

I don't see why consumer hardware won't evolve to run more LLMs locally. It is a nice goal to strive for, which consumer hardware makers have been missing for a decade now. It is definitely achievable, especially if you just care about inference.
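
As a concrete example of what "locally" can look like today, here's a minimal sketch using llama-cpp-python (the model file and parameters are assumptions, not a recommendation):

    # Minimal local inference sketch (pip install llama-cpp-python)
    from llama_cpp import Llama

    # Assumed: a quantized GGUF model already downloaded, small enough to fit in RAM
    llm = Llama(model_path="./models/llama-3.1-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

    out = llm("Q: Why run an LLM locally?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])  # no token metering, no network round-trip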


Isn't this what all these NPUs are created for?


I haven’t seen an NPU that can compete with a GPU yet. Maybe for really small models, I’m still not sure where they are going with those.


> economically, it is still much better to buy a lower-spec laptop and pay a monthly subscription for AI

Uber is economical, too; but folks prefer to own cars, sometimes multiple.

And just as there's a market for all kinds of vanity cars, fast sportscars, expensive supercars... I imagine PCs and laptops will have such a market, too: in probably less than a decade, maybe a £20k laptop running a 671B+ LLM locally will be the norm among pros.


> Uber is economical, too

One time I took an Uber to work because my car had broken down and was in the shop, and the Uber driver (somewhat pointedly) commented that I must be really rich to commute to work via Uber because Ubers are so expensive.


Most people don't realise the amount of money they spend per year on cars.


Paying $30-$70/day to commute is economical?


If you calculate depreciation and running costs on a new car in most places, I think it probably would be.
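
A back-of-envelope comparison (every number below is an illustrative assumption, not data from the thread):

    # Rough annual cost: owning a newish car vs. commuting by Uber (assumed numbers)
    depreciation = 4_000          # $/year, assumed for a ~$35k car
    insurance = 1_800             # $/year, assumed
    fuel = 1_500                  # $/year, assumed
    maintenance_parking = 1_200   # $/year, assumed
    own_per_year = depreciation + insurance + fuel + maintenance_parking

    commute_days = 230            # assumed working days with a commute
    uber_round_trip = 40          # $/day, assumed
    uber_per_year = commute_days * uber_round_trip

    print(own_per_year, uber_per_year)  # ~8500 vs ~9200: in the same ballpark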


If Uber were cheaper than the depreciation and running costs of a car, what would be left for the driver (and Uber)?


A big part of the whole "hack" of Uber in the first place is that people are using their personal vehicles, so the depreciation and many of the running costs are already sunk. Once you've paid those, it becomes a super good deal to make money from the "free" asset you already own.


My private car provides less than one commute per day, on average.

An Uber car can provide several.


While your car is sitting in the parking lot, the Uber driver is utilizing their car throughout the day.


If you're using Uber to and from work, presumably you would buy a car that's worth more than the 10-year-old Prius with 200k miles that your Uber driver drives.


The depreciation would be amortized to cover more than one person. I only travel once or twice per week; it costs me less to use an Uber than to own a car.


> Paying $30-$70/day to commute is economical?

When LLM use approaches this number, running one locally would be, yes. What you and the other commentator seem to miss is that "Uber" is a stand-in for cloud-based LLMs: someone else builds and owns those servers, runs the LLMs, pays the electricity bills... while its users find it "economical" to rent it.

(btw, taxis are considered economical in parts of the world where owning cars is a luxury)


any "it's cheaper to rent than to own" arguments can be (and must be) completely disregarded due to experience of the last decade

so stop it



