merlincorey's comments

Which ones are you claiming have already been achieved?

My understanding of the current scorecard is that he's still technically correct, though I agree with you that there's real momentum toward some of these predictions being proven wrong by 2029.

For example, in the recent thread about LLMs solving an Erdős problem, I remember reading in the comments that multiple LLMs were confirmed to be involved, along with an expert mathematician who decided what context to shuttle between them and helped formulate things.

Similarly, I've not yet heard of any non-expert software engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.


The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code are bug-free? And then there are a million caveats about what a bug actually is and how we define one.

The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from Project Gutenberg into a file and asked Claude questions about the characters, and then asked about the motivations of a random side character. It gave a good answer; I'd recommend trying it yourself.


I agree with you, but I'd point out that unless you've read the book it's difficult to know if the answer you got was accurate or if it just kinda made it up. In my experience it makes stuff up.

Like, it behaves as if any answer is better than no answer.


So do humans asked to answer tests. The appropriate thing is to compare to human performance at the same task.

At most of these comprehension tasks, AI is already superhuman (in part because Gary picked scaled tasks that humans are surprisingly bad at).


You can't really compare to human performance because the failure modes and performance characteristics are so different.

In some instances you'll get results that are shockingly good (and in no time), in others you'll have a grueling experience going in circles over fundamental reasoning, where you'd probably fire any person on the spot for having that kind of a discussion chain.

And there's no learning between sessions or subject-area mastery - results on the same topic can vary within the same session (with relevant context included).

So if something is superhuman and subhuman a large percentage of the time, but there's no good way of telling which you'll get or how - the result isn't the average if you're trying to use the tool.


I've done similar loose experiments using a 19th-century French novel I have read (L'Agonie by Jean Lombard), which is quite obscure. There is no published English translation of the novel.

Using Gemini Notebooks, I've found it passable at summarizing chapters, listing characters, major themes and so on, and it can do this in English.

"What is the symbolism of the Black Stone?"

> In the sources, the Black Stone (referred to as the Pierre-Noire) serves as the central religious icon of the Syrian Sun cult brought to Rome by the Emperor Elagabalus. Its symbolism is multifaceted, ranging from fertility and virility to a mystical pursuit of universal unity.

> It represents the perennity of the Sun, which fertilizes the world, causing "germs to rise" and spreading them through the atmosphere. It is viewed as the "definitive form" of divinity, intended to absorb and replace the "transitory forms" of all other Roman, Greek, Egyptian, and Persian gods, including the Christian "Kreistos".
>
> Virility and Phallic Symbolism
>
> The sources explicitly characterize the Black Stone as a gigantic sacred phallus. It is described as:
> • An icon of virility and the "organ of generation"
> • A "unisexual icon" that materializes the generative force of nature
> • A representation of "virility in activity," which is why it is often paraded and elevated in an "orgasmic" or "colossal adoration"

> The Androgyne and Universal Unity
>
> Beyond simple fertility, the philosopher Atillius explains a more complex, "mad" metaphysical project associated with the stone. It symbolizes "Life One" (Vie Une) and the return to a unisexual state of perfection.
> • The Androgyne: Atillius believes that by pursuing "the male sex by the male sex," the cult "inutilizes" the female sex to eventually create the Androgyne—a self-sufficient being containing both sexes
> • Unity: The stone signifies the fusion of all generative forces into a single Unity, reversing the "separation of the sexes," which is viewed as a state of unhappiness and impotence.
> • Marriage of Moon and Sun: The ritual marriage of the goddess Astaroth (representing the Moon and the female principle) to the Black Stone (representing the Sun and the male principle) symbolizes the merging of the Orient and Occident into this unified life principle
>
> Destruction of the Symbol
>
> The Black Stone ultimately becomes a symbol of Oriental pollution and decadence to the Roman populace. During the final rebellion against Elagabalus, the stone is torn from its temple on the Palatine, defiled with filth, and broken into pieces to ensure that its "signification of Life" would never again dominate Rome.

This is all accurate to the book, even teasing out a couple themes that were only subconsciously present to me.

The NotebookLM version gives citations with links to the original text to support all these assertions, and they largely serve that purpose.

The input is raw images of a book scan! Imperfect as it is, it still blows my mind. Not that long ago, any kind of semantic search or analysis was a very hard AI problem.


"quite obscure" doesn't mean there is nothing in the internet that directly addresses the question.

Here is an English analysis of the text that easily showed up in an internet search:

https://www.cantab.net/users/leonardo/Downloads/Varian%20Sym...

This source includes analysis of "the Black Stone."


Not quite the same analysis. The human is better, no surprise. But the NotebookLM output links back to the original book in a very useful way. If you think about it as fuzzy semantic search it's amazing. If you want an essay or even just creativity, yes it's lacking.

It doesn't have to be the same analysis to land in a partially overlapping vector space. Not saying it wasn't a useful perspective shuffle in the vector space, but it definitely wasn't original.

LLMs haven't solved any of the 2029 predictions as they were posited. But I expect some will be reached by 2029. The AI hype acts like all this is easy. Not by 2029 doesn't mean impossible or even most of the way there.


LLMs will never achieve anything as long as any victory can be hand-waved away with "in the training set". Somehow these models have condensed the entire internet down to a few TBs, yet people aren't backing up their terabytes of personal data down to a couple MB using this same tech... wonder why.

It wasn't a hand wave. I gave an exact source, which OP admitted was better.

They certainly haven't "condensed the entire internet into a few TBs". People aren't backing up their personal data to a few MB because your assumption is false.

Maybe when people stop hand-waving abilities that aren't there, we will better understand their use as a tool and not magic.


Surely there is analysis available online in French though?

1 and 2 have been achieved.

4 is close; the interface needs some work to allow nontechnical people to use it (Claude Code).


I strongly disagree. I've yet to find an AI that can reliably summarise emails, let alone understand nuance or sarcasm. And I just asked ChatGPT 5.2 to describe an Instagram image. It didn't even get the easily OCR-able text correct. Plus it completely failed to mention anything sports- or stadium-related. But it was looking at a cliche baseball photo taken by a fan inside the stadium.

I have had ChatGPT read text in an image, give me a 100% accurate result, and then claim not to have the ability and to have guessed the previous result when I asked it to do it again.

>let alone understand nuance or sarcasm

I'm still trying to find humans that do this reliably too.

To add on, 5.2 seems to be kind of lazy when reading text in images by default. Feeding it an image, it may give only the first word or so, but coming back with a prompt like 'read all the text in the image' makes it do a better job.

With one in particular that I tested, I thought it was hallucinating some of the words, but there was a picture within the picture with small words it saw that I missed the first time.

I think a lot of AI capabilities are kind of munged to end users because they limit how much GPU is used.


I dispute 1 & 2 more than 4.

1) Is it actually watching a movie frame by frame, or just searching for information about it and then giving you the answer?

2) Again, can it handle very long novels? Context windows are limited and it can easily miss something. Where is the proof for this?

4 is probably solved

4) This is more on the predictor, because it is easy to game. You can create gibberish code with an LLM today that is 10k lines long without issues. Even a non-technical user can do it.


I think all of those are terrible indicators; 1 and 2, for example, only measure how well LLMs can handle long context sizes.

If a movie or novel is famous the training data is already full of commentary and interpretations of them.

If it's something not in the training data, well, I don't know many movies or books that use only motifs that no other piece of content before them used, so interpreting based on what is similar in the training data still produces good results.

EDIT: With 1 I meant using a transcript of the Audio Description of the movie. If he really meant watch a movie, I'd say that's even sillier, because of course we could get another agent to first generate the Audio Description, which is definitely possible currently.


Just yesterday I saw an article about a police station's AI body-cam summarizer mistakenly claiming that a police officer turned into a frog during a call. What actually happened was that the cartoon "The Princess and the Frog" was playing in the background.

Sure, another model might have gotten it right, but I think the prediction was made less in the sense of "this will happen at least once" and more of "this will not be an uncommon capability".

When the quality is this low (or variable depending on model) I'm not too sure I'd qualify it as a larger issue than mere context size.


My point was not that those video-to-text models are good as they're used in cases like that; I was referring more generally to that list of indicators. Surely when analysing a movie it is alright if some things are misunderstood, especially as the amount of misunderstanding can be decreased a lot. That AI body camera is surely optimized for speed and inference cost, but if you give an agent ten 1-second frames along with the transcript of that period and the full prior transcript, and give it reasoning capabilities, it would take almost endlessly long to process the movie, but the result would surely be much better than the body camera's. After all, the indicator talks about "AI" in general, so it's wrong to judge it by a model optimized for something other than capability.

> Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.

Code is not an asset, it's a liability, and code that no one has reviewed is even more of a liability.

However, in the end, execution is all that matters so if you and your cofounder are able to execute successfully with mountains of generated code then it doesn't matter what assets and liabilities you hold in the short term.

The long term is a lot harder to predict in any case.


> Code is not an asset, it's a liability, and code that no one has reviewed is even more of a liability.

Code that solves problems and makes you money is by definition an asset. Whether or not the code in question does those things remains to be seen, but code is not strictly a liability or else no one would write it.


"Code is a liability. What the code does for you is an asset." as quoted from https://wiki.c2.com/?SoftwareAsLiability with Last edit December 17, 2013.

This discussion and distinction used to be well known, but I'm happy to help some people become "one of today's lucky 10,000" as quoted from https://xkcd.com/1053/ because it is indeed much more interesting than the alternative approach.


Code requires maintenance, which grows with codebase size, minus some decay over time. (LLMs do not change this, and might actually be more sensitive to it.) So increasing code size, especially with new code, implies future costs, which meets the definition of a liability on a kinda-sorta per-LOC basis.

It's not right but it's not wrong either. It at least was a useful way to think about code, and we'll see if that applies in LLM era.


It’s well known and also wrong.

Delta’s airplanes also require a great deal of maintenance, and I’m sure they strive to have no more than are necessary for their objectives. But if you talk to one of Delta’s accountants, they will be happy to disabuse you of the notion that the planes are entered in the books as a liability.


Whoa whoa whoa let's not bring the accountants in!

Code isn't a liability b/c it costs money (though it does). Code is a liability like an unsafe / unproven bridge is a liability. It works fine until it doesn't - and at that point you're in trouble. Just b/c you can build lots of bridges now, doesn't mean each new bridge isn't also a risk. But if you gotta get somewhere now, conjuring bridges might be the way to go. Doesn't make each bridge not a liability (risky thing to rely on) or an asset (thing you can sell, use to build value)


Even proven code is a liability. The point of it being a liability is that it costs time and effort to maintain and update.

The same with the bridge. Even the best built and most useful bridge requires maintenance. Assuming changing traffic patterns, it might equally require upgrades and changes.

The problem with this whole "code is a liability" thing is that it's vacuous. Your house is a liability. The bridge that gets you to work is a liability. Everything that requires any sort of maintenance or effort or upkeep or other future cost is in a sense a liability. This isn't some deep insight though. This is like saying your bones could break so they are a liability. OK, but their value drastically outweighs any liability they impose.


If Delta was going bankrupt it would likely be able to sell individual planes for the depreciated book value or close to it.

If a software company is going bankrupt, it’s very unlikely they will be able to sell code for individual apps and services they may have written for much at all, even if they might be able to sell the whole company for something.


The other half of the quote about liability is that the capabilities of the code are an asset. I don’t know if your bankrupt company would be able to sell their code, but they sure as hell couldn’t sell their capabilities without the code.

If we're bringing in other industries, you'd be wise to consider banking. Savings accounts are something most people would consider an asset, because it's money the bank has on hand and can use for loan purposes.

But it's the opposite, deposits are liabilities because they need interest paid out and can be withdrawn at any time.

Just because the company has a thing that could be assigned value doesn't make it automatically an asset.


You're hinting at the underlying problem with the quote. "Asset" in the quote reads, at least to me, in the financial or accounting meaning of the term. "Liability" reads, again to me, in the sense of potential risk rather than the financial meaning. It's apples and oranges.

Liability is also an economic term. As in, "The bank's assets (debt) are my liability, and my assets (house) are the bank's liability."

I don't think it's a wrong quote. Code's behavior is the asset, and code's source is the liability. You want to achieve maximum functionality for minimal source code investment.


Sorry, my point wasn't that liability doesn't have a meaning in finance. My read of the quote is that it uses liability in the sense of risk not debt on a balance sheet.

I could always be wrong though, that was just my interpretation of it. I don't get how code could be a liability in the financial sense, but I do get how every line of code risks bugs and other issues.


Sure, but all code is a potential future debt.

You wrote a music player that only allows one artist from a list of all artists? Tech debt.

You wrote optimized assembly for x86_64? It's the year 2060, and we only support NGPU_ARM_N_LEG.

The moment your expectations change (which is all the time), your code needs to be changed, and effort isn't free.


Tech debt is not part of a financial account or disclosure though. Yes those are forms of debt, no they aren't financial debts or financial liabilities.

It's possible for something to be both an asset and a potential liability, it isn't strictly one or the other.

Delta leases a big portion of its fleet, which makes your example pretty bad.

Not a terrible example. The planes delta owns are delta’s assets; the planes the leasing company owns are the leasing company’s assets. The point is, the code and the planes are assets despite the maintenance required to keep them in revenue-generating state.

Not a very valuable one. It never has been. That's the funny part. So many people want software but then don't know what to do once they have it.

Developers that can’t see the change are blind.

Just this week, Sunday through Tuesday: I added a fully functional subscription model to an existing platform, built out bulk async elasticjs indexing for a huge database, and migrated a very large WordPress website to NextJS. 2.5 days that would have cost me at least a month two years ago.


To me, this sounds like:

AI is helping me solve all the issues that using AI has caused.

WordPress has a pretty good export and Markdown is widely supported. If you estimate one month of work to get that into NextJS, then maybe the latter is not a suitable choice.


It's wild that in AI conversations lately, someone can say "I saved 3 months doing X" and someone else can willfully and thoughtfully reply "No you didn't, you're wrong" without hesitation.

I feel bad for AI opponents mostly because it seems like the drive to be against the thing is stronger than the drive towards fact or even kindness.

My $0.02: I am saving months of effort using AI tools to fix old (PRE-AI, PREHISTORIC!) codebases that have literally zero AI technical debt associated with them.

I'm not going to bother with the charts & stats, you'll just have to trust me and my opinion like humans must do in lots of cases. I have lots of sharp knives in my kitchen, too -- but I don't want to have to go slice my hands on every one to prove to strangers that they are indeed sharp -- you'll just have to take my word.


Slice THEIR hands. They might say yours are rigged.

I'm a non-dev and the things I'm building blow me away. I think many of the people criticizing are perhaps more on the execution side and have a legitimate craft they are protecting.

If you're more on the managerial side, and I'd say a trusting manager not a show me your work kind, then you're more likely to be open and results oriented.


From a developer POV, or at least my developer POV, less code is always better. The best code is no code at all.

I think getting results can be very easy, at first. But I force myself to not just spit out code, because I've been burned so, so, so many times by that.

As software grows, the complexity explodes. It's not linear like the growth of the software itself, it feels exponential. Adding one feature takes 100x the time it should because everything is just squished together and barely working. Poorly designed systems eventually bring velocity to a halt, and you can eventually reach a point where even the most trivial of changes are close to impossible.

That being said, there is value in throwaway code. After all, what is an Excel workbook if not throwaway code? But never let the throwaway become a product, or grow too big. Otherwise, you become a prisoner. That cheeky little Excel workbook can turn into a full-blown backend application sitting on a share drive, and it WILL take you a decade to migrate off of it.


Yeah, AI is perfect at refactoring and cleaning things up; you just have to instruct it. I've improved my code significantly by asking it to clean up and refactor functions into pure ones that I can reuse and test, rather than a messy application. Without creating new bugs.
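To illustrate the kind of refactor I mean (a minimal made-up sketch, names invented, not code from my actual project): pull the computation out of an I/O-tangled function into a pure one you can call and test anywhere:

    #include <iostream>
    #include <numeric>
    #include <vector>

    // Before (sketch): loading, computing, and printing all lived in one
    // function, so the logic could only be exercised through the whole app.

    // After: the computation is a pure function -- same inputs, same
    // outputs, no I/O -- so it is trivial to unit test in isolation.
    double total_with_tax(const std::vector<double>& prices, double tax_rate) {
        return std::accumulate(prices.begin(), prices.end(), 0.0,
            [tax_rate](double sum, double p) { return sum + p * (1.0 + tax_rate); });
    }

    int main() {
        // The thin I/O shell just feeds data to the pure core.
        std::cout << "Total: " << total_with_tax({10.0, 20.0}, 0.2) << "\n"; // 36
    }

The point is the shape: once the pure function exists, the messy application code shrinks to a shell around it.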

Holy hell, AI is not at all perfect at refactoring. Absolutely terrified on your behalf if you believe this to be the case.

You can use AI to simplify software stacks too, only your imagination limits you. How do you see things working with many less abstraction layers?

I remember coding BASIC with POKE/PEEK assembly inside it, same with Turbo Pascal with assembly (C/C++ has similar extern abilities). Perhaps you want no more web or UI (TUI?). Once you imagine what you are looking for, you can label it and go from there.


I am a (very) senior dev with decades of experience. And I, too, am blown away by the massive productivity gains I get from the use of coding AIs.

Part of the craft of being a good developer is keeping up with current technology. I can't help thinking that those who oppose AI are not protecting legitimate craft, but are covering up their own laziness when it comes to keeping up. It seems utterly inconceivable to me that anyone who has kept up would oppose this technology.

There is a huge difference between vibe coding and responsible professional use of AI coding assistants (the principal one, of course, being that AI-generated code DOES get reviewed by a human).

But that, being said, I am enormously supportive of vibe coding by amateur developers. Vibe coding is empowering technology that puts programming power into the hands of amateur developers, allowing them to solve the problems that they face in their day-to-day work. Something that we've been working toward for decades! Will it be professional-quality code? No. Of course not. Will it do what it needs to do? Invariably, yes.


I think the issue is that most vibe coders believe it is professional quality code, or is sufficient moving forward.

It produces code (in the hands of an amateur) that is good enough for a demo or at best an MVP, but it’s not at all a stable foundation.


Just look at the METR study. All predictions were massive time savings but all observations were massive time losses. That's why we don't believe you when you say you saved time.

You should know better than to form an opinion from one study. I could show you endless examples of a study concluding untrue things, endless…

I've been full-time (almost solo) building an ERP system for years, and my development velocity has roughly doubled. The new features are of equal quality, everything is code reviewed, everything is done in my style, adhering to my architectural patterns. Not to mention I've managed to build a mobile app alongside my normal full-time work, something I wouldn't even have had the time to attempt to learn about without the use of agents.

So do you think I’m lying or do you just think my eyes are deceiving me somehow?


I think any measurement of development velocity is shaky, especially when measured between two different workflows, and especially when measured by the person doing the development.

Such an estimate is far less reliable than your eyes are.

So if people want to do more and better studies, that sounds great. But I have a good supply of salt for self-estimates. I'm listening to your input, but it's much easier for your self-assessment to have issues than you're implying.


Not saying you're wrong, but solo developers building (relatively) greenfield projects are the best bet for increased AI productivity.

Solo dev projects are usually reasonably sized (< 1 million LOC), style is more uniform, there are fewer silos, etc. etc.

Good studies look at a broader picture.


It’s a very good point. I have full control and everything is incredibly uniform, and more recently designed with agents in mind. This must make things significantly easier for the LLM.

It is wild. I must admit I have a bit of Gell-Mann amnesia when it comes to HN comments. I often check them to see what people think about an article, but every time the article touches on something I know deeply, I realize it's all just know-it-all puffery. Then I forget, and check the comments on the many things I do not know much about.

My cofounder is extremely technically competent, but all these people are like good luck with your spaghetti vibe code. It’s humorous.


You are assuming a lot of things.

The work was moving the many landing pages and content elements to NextJS so we can test, iterate, and develop faster while having a more stable system. This was a 10-year-old website with a very large custom WordPress codebase and many plugins.

The content is still in the WordPress backend and will be migrated in the second phase.


To me, this sounds like:

If AI was good at a certain task then it was a bad task in the first place.

Which is just run of the mill dogmatic thinking.


There is much going on in that exchange.

I don't even know what a WordPress site is anymore.

> then maybe the latter is not a suitable choice.

But now it only takes days which makes it suitable?

There is also the paradoxical question of whether it is worth the time of someone who knows what they are doing. How would you even tell?


> Code is not an asset, it's a liability

This would imply companies could delete all their code and do better, which doesn't seem true?


A more accurate description of code is that it's a depreciating asset, perhaps, or an asset that carries maintenance costs. Neither of which makes it a liability.

It sounds like you use your personal Claude Code subscription for work of your employer, but that is not something I would ever consider doing personally so I imagine I must be mistaken.

Can you elaborate slightly on what you pay for personally and what your employer pays for with regards to using LLMs for Enterprise ERP?


Freelancers regularly use tools such as Copilot and Claude, it's always handled professionally and in agreement with their customers. I've seen other freelancers do it plenty of times in the last 1-2 years at my customer sites.

Why so narrow-minded?


I'm inquisitive, not narrow-minded.

The GP didn't mention anything about freelancing, so unless you know them or stalked them, you are perhaps being narrow-minded here.


They also never said anything about being employed.

You are being narrow-minded here.


Again, I disagree and reaffirm my being full of inquisitiveness.

You are being downright unpleasant and I don't think we should continue this conversation further until you open your mind.


Ditto.

I'm glad we agree.

I own my own business.


Interesting, and thanks for clarifying that aspect. I have a few more questions; if you can answer any of them at any level of detail, I would appreciate it.

How much would you be willing to pay to continue using Claude on a monthly basis before you stopped?

Do you currently maintain the new (as of two weeks ago) cash reserve to ensure it continues working when limits are reached, and how much do you set aside for it?

Finally, do you send your customer's code or data directly to Claude or do you use it indirectly on generic stuff and then manually specialize the outputs?


> The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up

You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.


I think you may have misunderstood something here.

When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.

You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.

As of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.

Extra Usage for Paid Claude Plans: https://support.claude.com/en/articles/12429409-extra-usage-...


Thank you for the explanation, but I did fully understand that that is what you were saying.

What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.

Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.


When I said "not limited" I meant "no longer limits your usage with a hard stop when you run out of tokens for a five hour period any more like it did until a few weeks ago".

That's why I said "not limited" as opposed to "unlimited" - a subtle difference in word choice, I'll give you that.


LLMs aren't calculators; for example, your calculator always gives you the same outputs given the same inputs.

Long division is a pretty simple algorithm that you can easily and quickly relearn if needed; even your LLM of choice can likely explain it to you, given there's plenty of writing about it in books and on the internet.
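For anyone who wants the refresher, the paper algorithm translates almost directly into code (a quick sketch, all names mine):

    #include <iostream>
    #include <string>

    // Grade-school long division of a decimal string by a small divisor:
    // bring down each digit, divide, and carry the remainder along.
    std::string long_divide(const std::string& dividend, int divisor, int& remainder) {
        std::string quotient;
        remainder = 0;
        for (char c : dividend) {
            int current = remainder * 10 + (c - '0');  // bring down the next digit
            quotient += char('0' + current / divisor); // next quotient digit
            remainder = current % divisor;             // carry the remainder
        }
        size_t first = quotient.find_first_not_of('0'); // trim leading zeros
        return first == std::string::npos ? "0" : quotient.substr(first);
    }

    int main() {
        int r = 0;
        std::cout << long_divide("1234567", 7, r) << " r" << r << "\n"; // 176366 r5
    }

And unlike the LLM, it gives the same answer every time.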


I believe prompting an AI is more like delegation than abstraction especially considering the non-deterministic nature of the results.


It goes further than non-determinism. LLM output is chaotic: two nearly identical prompts with a single minor difference can result in two radically different outputs.


Too bad they haven't done a release song since 7.3: https://www.openbsd.org/lyrics.html#73


It seems we're still collectively trying to figure out the boundaries of "delegation" versus "abstraction" which I personally don't think are the same thing, though they are certainly related and if you squint a bit you can easily argue for one or the other in many situations.

> We've gotten claude code to handle 300k LOC Rust codebases, ship a week's worth of work in a day, and maintain code quality that passes expert review.

This seems more like delegation just like if one delegated a coding task to another engineer and reviewed it.

> That in two years, you'll be opening python files in your IDE with about the same frequency that, today, you might open up a hex editor to read assembly (which, for most of us, is never).

This seems more like abstraction just like if one considers Python a sort of higher level layer above C and C a higher level layer above Assembly, except now the language is English.

Can it really be both?


I would say it's much more about abstraction and the leverage abstractions give you.

You'll also note that while I talk about "spec driven development", most of the tactical stuff we've proven out is downstream of having a good spec.

But in the end a good spec is probably "the right abstraction", and most of these techniques fall out as implementation details. But to paraphrase Sandy Metz: better to stay in the details than to accidentally build against the wrong abstraction (https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction).

I don't think delegation is right - when me and Vaibhav shipped a week's worth of work in a day, we were DEEPLY engaged with the work. We didn't step away from the desk, we were constantly resteering, and we probably sent 50+ user messages that day, in addition to some point edits to markdown files along the way.


It’s definitely not abstraction. You don’t watch a compiler output machine code and constantly “resteer” it.


I continue to write codebases in programming languages, not English. LLM agents just help me manipulate that code. They are tools that do work for me. That is delegation, not abstraction.

To write and review a good spec, you also need to understand your codebase. How are you going to do that without reading the code? We are not getting abstracted away from our codebases.

For it to be an abstraction, we would need our coding agents to not only write all of our code, they would also need to explain it all to us. I am very skeptical that this is how developers will work in the near future. Software development would become increasingly unreliable as we won't even understand what our codebases actually do. We would just interact with a squishy lossy English layer.


You don’t think early c programmers spent a lot of time reading the assembly that was produced?


No not really. They didn’t need to spend a lot of time looking at the output because (especially back then) they mostly knew exactly what the assembly was going to look like.

With an LLM, you don't need to move down to the code layer so you can optimize a tight loop. You need to look at the code so you can verify that the LLM didn't write a completely different program than what you asked it to write.


Probably at first when the compiler was bad at producing good assembly. But even then, the compiler would still always produce code that matches the rules of the language. This is not the case with LLMs. There is no indication that in the future LLMs will become deterministic such that we could literally write codebases in English and then "compile" them using an LLM into a programming language of our choice and rely on the behaviour of the final program matching our expectations.

This is why LLMs are categorically not compilers. They are not translating English code into some other type of code. They are taking English direction and then writing/editing code based upon that. They are working on a codebase alongside us, as tools. And then you still compile that code using an actual compiler.

We will start to trust these tools more and more, and probably spend less time reviewing the code they produce over time. But I do not see a future where professional developers completely disregard the actual codebase and rely entirely on LLMs for code that matters. That would require a completely different category of tools than what we have today.


I mean, the ones who were actually _writing_ a C compiler, sure, and to some who were in performance critical spaces (early C compilers were not _good_). But normal programmers, checking for correctness, no, absolutely not. Where did you get that idea?

(The golden age of looking at compiler-generated assembly would've been rather later, when processors added SIMD instructions and compilers started trying to get clever about using them.)


There's a section on "why not printf" (which is Standard C), but I can't find any section on "why not std::format" [1], which is Standard C++ since C++20 and works on all major compilers today in 2025.

They do mention "std::print"[2] from C++23 (which uses std::format) and compile times, but, they don't touch on "std::format" at all.

See:

[1] https://en.cppreference.com/w/cpp/utility/format/format.html

[2] https://en.cppreference.com/w/cpp/io/print.html
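For reference, basic std::format usage (C++20) is type-safe and supports positional arguments, which is most of the case against printf:

    #include <format>
    #include <iostream>
    #include <string>

    int main() {
        // Arguments are type-checked; a mismatched format string fails to compile.
        std::string s = std::format("{} scored {:.1f}% on attempt {}", "Ada", 97.25, 2);
        std::cout << s << "\n"; // Ada scored 97.2% on attempt 2

        // Positional arguments can be reused or reordered.
        std::cout << std::format("{0}{1}{0}", "|", "contents") << "\n"; // |contents|
    }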


This is the eternal selection pressure that slows new C++ adoption.

The kinds of places still writing C++ aren't usually the ones that put much emphasis on using a compiler from the past decade.

Java 8 and C++98 will be here forever lol


Is it in major compilers yet? Last I checked, for MSVC it was behind a "latest" compiler flag (not C++20). I've been vendoring the fmt library for a while now.
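One way to hedge while vendoring (a sketch of what I mean, assuming fmt's API matches std::format for the simple cases) is to switch on the feature-test macro:

    #include <version>            // defines __cpp_lib_format when <format> is usable
    #if defined(__cpp_lib_format)
      #include <format>
      namespace fmtns = std;      // use the standard library's implementation
    #else
      #include <fmt/core.h>       // fall back to the vendored fmt library
      namespace fmtns = fmt;
    #endif

    #include <iostream>

    int main() {
        std::cout << fmtns::format("answer = {}", 42) << "\n";
    }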


From GCC 13 and Clang 17 (2023).

Unfortunately, MSVC always lags and fails to implement some things.


std::print / std::format also bloat the binary size, which is a consideration for some platforms (eg WASM).


Was the website the only thing down?

This community Starlink Status page doesn't seem to show any outage: https://starlinkstatus.space/


The funny thing with the community status page is that stations can't report they're down when they're down :P There are big holes in individual stations' history and it looks normal.

https://www.reddit.com/r/Starlink/ paints a different picture

Perhaps due to geomagnetic storms, though stronger ones have not caused outages. Possibly just because.

