MrOrelliOReilly · 2026-01-15T22:32:12 1768516332

But they do have utility functions, which one can interpret as nearly equivalent

MrOrelliOReilly · 2026-01-15T17:53:15 1768499595

I appreciate the Fly.io team’s enthusiasm and am optimistic this will mature into a product I’d pay for, but my initial impression was of a lack of polish.

Documentation is sparse, or not even available? The API docs don’t tell you much about the service itself, and a Google search for docs returns an inaccessible website as the first result (https://docs.sprites.dev). Blog posts and forum threads and Claude skills shouldn’t be a substitute.

The snappiness of the sprites is very cool and I can definitely see it integrating into future Claude Code workflows. But the lack of base container images means you’re still doing setup work on the sprite before you can begin development. I understand the philosophy is that sandboxes should be persistent, but Claude Code sessions also work better when isolated from each other, so it’d be nice to have some precepts to get a workspace set up quickly (given agentic coding is clearly a target).

I also found the CLI unintuitive but maybe that was just me!

So, a very cool idea, but I’m left with the impression that the Fly.io team should have spent a couple of weeks on polish before shipping.

tptacek · 2026-01-15T17:55:25 1768499725

You're not wrong. The documentation actually had a hallucinated link to an Anthropic dependency in it when we shipped. Right now the attitude is mostly "if we have to document it extensively, we're doing something wrong". It's been in the works for a while, with a small team, and we're just getting it out there right now.

I've been needling Kurt for several months now that if we wait until it's polished enough that we don't see comments like this, we're doing it wrong.

macNchz · 2026-01-15T23:20:48 1768519248

For what it's worth, I evaluated Fly.io during a divorce from Heroku some time in mid 2022 (I think), found the platform was... way too rough around the edges at the time to want to migrate any real workloads. I kept it on my radar and shipped an MVP with it in 2024, found it was a lot more polished, and now have multiple production apps running there. I'm genuinely pumped about Sprites and have started building against the API—I did notice the weirdness with the docs, but you guys have been doing well on the "this thing that {was broken|I didn't like|was missing} now works the way I'd hoped it would" front.

MrOrelliOReilly · 2026-01-15T20:20:05 1768508405

Appreciate your perspective and totally understand that at some point you just have to ship it! From the outside it looks like a bit less time on XYZ feature and bit more time on marketing polish might have been a good call. But can only speculate what the trade offs were internally. Best of luck maturing the product!

dtkav · 2026-01-16T06:36:28 1768545388

The main things I think are missing are (1) how much am I spending, (2) why isn't my sprite paused, and (3) how can I get my stuff out (it would be nice to be able to mount in either direction or else integrate with git/git worktrees).

I ended up using it (and enjoying yolo mode!) but then my sprites weren't pausing and I was worried about spending too much, so I deleted them.

mcpherrinm · 2026-01-15T18:36:37 1768502197

I'm sure this is a difference-of-learning or whatever, but I'm usually unwilling to try a product until I can understand it and how it works from the documentation

tptacek · 2026-01-15T18:38:45 1768502325

Understandable. Our current take is that there's not really much to know, and that the people this will really light up are good with that. Of course, we'll flesh out documentation!

I'm really jazzed about this particular product as a product (I just really enjoy using it), but the post is mostly about how we built it, and deliberately not much about how best to use it.

el_nahual · 2026-01-15T19:08:38 1768504118

I hate being negative but it sounds like par for the course for fly.

Incredible (truly, incredible, world-class) engineers that somehow lack that final 10% of polish/naming/documentation that makes things...well, seriously usable.

I remember last time I tried them the bizarre hoops/documentation around database creation. I _think_ they solved that but I remember at the time it felt almost like I was getting looked down upon as a user. Ugh, you need clarity? how amateurish!

tptacek · 2026-01-15T19:25:30 1768505130

Naming? We got naming wrong?

el_nahual · 2026-01-15T19:44:59 1768506299

Could not have illustrated the point better if I tried.

dcre · 2026-01-15T19:47:53 1768506473

Likewise!

killthebuddha · 2026-01-15T20:56:18 1768510578

+1. This thread, the thread about documentation, and the thread about turning off Sprites, when taken together, thoroughly illustrate why I'm not currently a Fly user.

dtkav · 2026-01-16T06:37:42 1768545462

The name is excellent.

MrOrelliOReilly · 2026-01-03T16:45:33 1767458733

For everyone defending these actions on the basis of Maduro's own corruption and the desires of Venezuelans, I would encourage you to research the history of American intervention and regime change in Latin America. It is impossible to anticipate the second and third order effects of this change, and how it will be absorbed in the local politics. We are witnessing the return of American military intervention in Latin America, nothing more and nothing less.

To everyone proclaiming that we should turn to Venezuelans to assess these actions, how dare you assert that Americans have no autonomy in the actions of their own government. It is tremendously unfortunate that congress has forfeited all decision making authority to the executive branch, but as our democracy was intended this would amount to an act of war, which would require authorization by congress.

MrOrelliOReilly · 2025-12-29T12:47:19 1767012439

The author makes effectively the same comment that twenty minutes late is normal for Germany (below). I don’t have any statistics, but anecdotally I’ve had worse experiences with DB than in the UK. DB does not just run late, but has a bad habit of teleporting you to random German towns, from which you must quickly route find to your original destination (as is the exact story of the post).

> It is twenty minutes late. I consider this early.

flohofwoe · 2025-12-29T12:54:20 1767012860

The TL;DR is: regional trains are usually on time, long-distance trains usually are not. If you need to travel between cities, plan with an hour buffer time. Basically, "show some adaptability".

The one good thing about frequent long-distance delays is that you might be lucky and catch an earlier delayed train and actually arrive a bit earlier than planned ;)

(also JFC, does the author like to whine about nothing - I've been travelling frequently with DB for about 25 years now, and while shit happens from time to time, most of it is merely a slight inconvenience).

barrkel · 2025-12-29T13:28:16 1767014896

Plan with a buffer of ~40% of travel time if you ought to be there, and travel the day before if you must be there.

I travel from Basel to Hannover and back every two weeks on DB. Trains south are almost always late, trains north usually late. Frequently the train is already late in Hannover having come from Hamburg. The worst was when I was kicked out in Frankfurt and had to stay in a hotel. The delays were so bad there were no more trains left that could connect me to the last train out of Basel.

Things have been getting better for the past couple of months I think though.

MrOrelliOReilly · 2025-12-24T13:15:25 1766582125

There has been a lot of discussion on this recently in the blog-o-sphere. All conclusions I've seen so far are that the economy is basically fine and maybe people's expectations have risen (I'm oversimplifying). I'm also quite eager to hear different conclusions, because there is a lot of cognitive dissonance on the economy right now.

- https://www.slowboring.com/p/you-can-afford-a-tradlife

- https://www.slowboring.com/p/affordability-is-just-high-nomi...

- https://thezvi.substack.com/p/the-revolution-of-rising-expec...

- https://open.substack.com/pub/astralcodexten/p/vibecession-m...

hansvm · 2025-12-24T16:21:53 1766593313

There's definitely an aspect of rising expectations (e.g., everyone and their dog having a late-model smartphone). There's also an aspect where some of that is mostly unavoidable (e.g., accessing my HSA now requires a very late-model smartphone -- something I can avoid complying with for now by just finding a better place to transfer my money, but it's a worrying trend -- to achieve the same QoL as in the early 2000s I have mandatory nontrivial overhead).

It's really not just that though. A lot goes into it, but one observation is that the relative increases in wages and prices aren't distributed evenly. Some examples:

- A lot of people are legitimately substantially better off than they would have been a few decades ago. I literally never have to worry about money anymore when thinking about our purchases (for everything but a house with a big yard, which we still can't safely [0] afford without moving). I'm not alone.

- That's not true of everyone, even my next-door neighbors. I know people splitting a studio apartment and still struggling a bit. They have good jobs, and even splitting the apartment their post-tax, post-rent pay is $7.20/hr. That's fine enough I suppose, but they'll literally never be able to save for a home of any quality in the area in their entire lives using only a single income. It'll take them a while to afford a home anywhere.

- Suppose you have a couple young kids. That places hard bounds on how much money you need to make even for childcare to make sense to get up to two incomes in the first place. I've known plenty of people with PhDs and good jobs who quit to take care of the kids for financial reasons, supporting the household on just the higher-earner's pay.

- A lot of small towns haven't seen the same increase in wages as the rest of the country but have seen the increase in prices. My hometown saw an increase from $10/hr to $20/hr in what a great wage is over the last 25 years. CPI only went up 1.9x in that time, but the same caliber of house went up 3x, and the staples people used to eat (like ground beef) went up more than 3x as well. They're correctly observing that they have less take-home money (because of 3x increased rent), that take-home money doesn't go as far (they can't eat the same foods they could 25yrs ago), and it definitely doesn't go as far if you want to do something like save for a house (it's an extra 4+yrs of post-tax, post-rent income to pay for a house, assuming you could devote all of it to savings instead of groceries and whatnot).

I'm not sure exactly how to quantify who's struggling and why at a macroscopic level, but I guarantee they're real and that it's not just an increase in expectations.

[0] It depends on your relative risk levels, but if you're not convinced the gravy train will last forever and are concerned about locking up all your assets in a depreciating vehicle then you need to be a bit more frugal with your choice of home.

MrOrelliOReilly · 2025-12-22T12:54:10 1766408050

Try a commuter train in Switzerland. I work two hours on the train every single day with no issue!

MrOrelliOReilly · 2025-12-20T09:46:15 1766223975

I think this is a total misunderstanding of Anthropic’s place in the AI race. Opus 4.5 is absolutely a state of the art model. I won’t knock anyone for preferring Codex, but I think you’re ignoring official and unofficial benchmarks.

See: https://artificialanalysis.ai

wahnfrieden · 2025-12-20T16:24:34 1766247874

What am I missing? As suspicious as benchmarks are, your link shows GPT 5.2 to be superior.

It is also out of date as it does not include 5.2 Codex.

Per my point about steerability compensated for by modalities and other harness features: Opus 4.5 scores 58% while GPT 5.2 scores 75% for the instruction following benchmark in your link! Thanks for the hard evidence - GPT 5.2 is 30% ahead of Opus 4.5 there. No wonder Claude Code needs those harness features for the user to manually rein in control over its instruction following capability.

woadwarrior01 · 2025-12-20T11:20:29 1766229629

> Opus 4.5 is absolutely a state of the art model.

> See: https://artificialanalysis.ai

The field moves fast. Per artificialanalysis, Opus 4.5 is currently behind GPT-5.2 (x-high) and Gemini 3 Pro. Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

MrOrelliOReilly · 2025-12-20T14:26:58 1766240818

Totally, however OP's point was that Claude had to compensate for deficiencies versus a state of the art model like ChatGPT 5.2. I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" here to narrowly mean #1 on a given benchmark, but rather to mean near or at the frontier of current capabilities.

gessha · 2025-12-20T15:30:03 1766244603

One thing to remember when comparing ML models of any kind is that single value metrics obscure a lot of nuance and you really have to go through the model results one by one to see how it performs. This is true for vision, NLP, and other modalities.

ramoz · 2025-12-20T14:06:28 1766239588

https://x.com/giansegato/status/2002203155262812529/photo/1

https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody who uses these models daily.

MrOrelliOReilly · 2025-12-20T14:18:59 1766240339

Yes, I personally feel that the "official" benchmarks are increasingly diverging from the everyday reality of using these models. My theory is that we are reaching a point where all the models are intelligent enough for day-to-day queries, so points like style/personality and proper use of web queries and other capabilities are better differentiators than intelligence alone.

int_19h · 2025-12-21T23:36:53 1766360213

The benchmarks haven't reflected the real utility for a very long time. At best they tell you which models are definitely bad.

dr_dshiv · 2025-12-20T13:36:26 1766237786

https://lmarena.ai/leaderboard/webdev

LM Arena shows Claude Opus 4.5 on top

HarHarVeryFunny · 2025-12-20T13:57:58 1766239078

I wonder how model competence and/or user preference on web development (that leaderboard) carries over to more complex and larger projects, or more generally anything other than web development?

In addition to whatever they are exposed to as part of pre-training, it'd be interesting to know what kind of coding tasks these models are being RL-trained for? Are things like web development and maybe Python/ML coding overemphasized, or are they also being trained on things like Linux/Windows/embedded development etc in different languages?

fzzzy · 2025-12-20T16:09:15 1766246955

is x-high fast enough to use as a coding agent?

wahnfrieden · 2025-12-20T16:14:36 1766247276

Yes, if you parallelize your work, which you must learn to do if you want the best quality

MrOrelliOReilly · 2025-12-16T10:53:52 1765882432

I agree that it's annoying to have competing standards, but when dealing with a lot of unknowns it's better to allow divergence and exploration. It's a worse use of time to quibble over the best way to do things when we have no meaningful data yet to justify any decision. Companies need freedom to experiment on the best approach for all these new AI use cases. We'll then learn what is great/terrible in each approach. Over time, we should expect and encourage consolidation around a single set of standards.

pscanf · 2025-12-16T11:09:38 1765883378

> when dealing with a lot of unknowns it's better to allow divergence and exploration

I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months' time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.

For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).

¹ https://fly.io/blog/everyone-write-an-agent/

kridsdale3 · 2025-12-16T17:36:00 1765906560

I recently heard that when automobiles were new the USA quickly ended up in a state with 80 competing manufacturing brands. In a couple decades, the market figured out what customers actually want and what styles and features mattered, and the competition ecosystem consolidated to 5 brands.

The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling Graphics Cards that you could put in a PCI slot. Now there are 2.

MrOrelliOReilly · 2025-12-11T09:50:57 1765446657

I have been fighting the same bizarre quota demon. Scripts kept timing out due to quota limitations, but I haven't been able to find any indication of a limit in the console. Finally gave up and switched to Claude, since they at least have a sane interface for API keys and billing!

MrOrelliOReilly · 2025-11-04T14:38:42 1762267122

I find this a bit disingenuous.

If I visit a buffet looking for a healthy snack, but 90% of the dishes are fast food, then I'll probably spend a lot of time looking through the fast food, and may even eat some as the best worst option.

Similarly, I have found the overall content pool to have significantly worsened since Musk's takeover. The algorithm keeps serving me trash. It doesn't mean I want trash.

cloverich · 2025-11-04T18:37:57 1762281477

You can take your analogy further. The buffet noticed you pausing on unhealthy food, and begins replacing all the healthy options with unhealthy options. People shame your criticisms and note you could easily put blinders on and intentionally look longer at healthy options anytime you accidentally glance at an unhealthy one. The alternative would be an absolute repression of free speech after all.