There is important truth in your post, yet you seem to miss the really important pieces that make this hard.
> It's the parents obligation to educate their child.
> It's the child's obligation to use that education wisely.
Two obvious things complicate this:
- You weren't taught how to use a real gun at 6 months old, right?
- Would it not follow from what you said above that if you had accidentally shot and killed yourself at age 7, then it would be your own fault and nobody else's? That seems (to me, at least) like an absurd conclusion.
I think about it like this: as a parent, my jobs include identifying when my child is capable of learning about something new, providing the guidance they need to learn it (which is probably not all up front, but involves some supervision, since it's usually an iterative process), allowing them to make mistakes, accepting some acceptable risks of injury, and preventing catastrophe. I'll use cooking as an example. My kids got a "toddler knife" very young (basically a wooden wedge that's not very sharp). We showed them how to cut up avocados (already split) and other soft things. As they get older, we give them sharper knives and trickier tasks. We watch to see if they're understanding what we've told them. We give more guidance as needed. It's okay if they nick themselves along the way. But we haven't given them a sharpened chef's knife yet! And if they'd taken that toddler knife and repeatedly tried to jam it into their sibling's eye despite "educating" them several times, while I wouldn't regret having made the choice to see if they were ready, I would certainly conclude that they weren't yet ready. That's on me, not them.
You allude to this when you say:
> I am very much for showing kids how to use the internet responsibly, but I'm not of the opinion that parental controls are particularly desirable beyond an initial learning period.
Yes, the goal should be to teach kids how to operate safely, not keep them from all the dangerous things. But I'd say that devices and the internet are more like "the kitchen". There are lots of different risks there and it's going to take many years to become competent (or even safe). Giving them an ordinary device would be like teaching my 2-year-old their first knife skills next to a hot stove in a restaurant kitchen with chefs flying around with sharp knives and hot pots. By contrast, without doing any particular child-proofing, our home kitchen is a much more controlled environment where I can decide which risks they're exposed to when. This allows me to supervise without watching every moment to see if they're about to stab themselves -- which also gives them the autonomy they need to really learn. The OP, like other parents, wants something similar from their device and the internet: to gradually expose elements of these things as the parents are able to usefully guide the children, all while avoiding catastrophe.
I inferred that they’re referring to the fact that in typical C the compiler must have seen a function earlier in the file for you to use it. One solution (that the author doesn’t like) is to put the leaf functions first so that they’re defined when the compiler sees their callers. The author seems to be ignoring the alternative approach: declaring functions at the top and then writing them in the top-down order that they like.
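For anyone who hasn't seen that second approach, here's a minimal sketch. The function names are made up for illustration; nothing here comes from the article.

```c
#include <stdio.h>

/* Prototypes first: from here on the compiler knows every signature,
 * so the definitions below can appear in any order. */
int report_total(const int *values, int count);
int sum_positive(const int *values, int count);
int clamp_to_zero(int value);

/* Top-down: the high-level function comes first... */
int report_total(const int *values, int count)
{
    int total = sum_positive(values, count);
    printf("total = %d\n", total);
    return total;
}

/* ...then the helper it calls... */
int sum_positive(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += clamp_to_zero(values[i]);
    return total;
}

/* ...and the leaf function last. */
int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}
```

Calling `report_total` from `main` with the values 3, -2, 4 returns 7. If you delete the three prototypes, a modern compiler will warn about or reject the call to `sum_positive` inside `report_total`, because it hasn't seen a declaration yet.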
Doesn't this depend a lot on how long your actions run? Like, you may have already invested in your own hardware (maybe because your actions use a lot of resources and it's cheaper) and now you have to pay per-minute of action runtime for the API that does the bookkeeping?
It is, although you can have sharded PostgreSQL, in which case I agree with your assessment that you want random PKs to distribute them.
It's workload-specific, too. If you want to list ranges of them by PK, then of course random isn't going to work. But then you've got competing tensions: listing a range wants the things you list to be on the same shard, but focusing a workload on one shard undermines horizontal scale. So you've got to decide what you care about (or do something more elaborate).
Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be. (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
> Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be.
It's a question of resource margins. If you have regular and predictable windows of low resource utilization, you can afford to run closer to the sun during busy periods, deferring (and amortizing, to some degree) maintenance costs till later. If you have a 24/7/365 service, you need considerably higher safety margins.
Also, there's a lot of terrible advice on the internet, if you haven't noticed.
> (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
To be fair, I find oxides' continual low-info griping against postgres a bit tedious. There's plenty weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful? Like, would it be useful to criticize Oxide's lack of production hardware availability in 2021 or so?
What you describe is true and very important (more margin lets you weather more disruption), but it's not the whole story. The problem we had was queueing delays mainly due to I/O contention. The disks had the extra IOPS for the maintenance operation, but the resulting latency for all operations was higher. This meant overall throughput decreased when the maintenance was going on. The customer, finally accepting the problem, thought: "we'll just build enough extra shards to account for the degradation". But it just doesn't work like that. If the degradation is 30%, and you reduce the steady-state load on the database by 30%, that doesn't change the fact that when the maintenance is ongoing, even if the disks have the IOPS for the extra load, latency goes up. Throughput will still degrade. What they wanted was predictability but we just couldn't give that to them.
> To be fair, I find oxides' continual low-info griping against postgres a bit tedious. There's plenty weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful?
First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).
I admit that some of this has been hard for me personally to let go. These issues dominated my professional life for three very stressful years. For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc. And we certainly did make mistakes! But many of those problems were later acknowledged by the community. And many have been improved -- which is great! What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
I'm sorry I let my crankiness slip into the comment above. I try to leave out the emotional baggage. Nonetheless, I do feel like it's a problem that, intentionally or otherwise, a lot of the user base has absorbed the idea that it's okay for necessary database maintenance to significantly degrade performance because folks will have some downtime in which to run it.
> First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).
I said oxide, because it's come up so frequently and at such length on the oxide podcast... Without that I probably wouldn't have commented here. It's one thing to comment on bad experiences, but at this point it feels more like bashing. And I feel like an open source focused company should treat other folks working on open source with a bit more, idk, respect (not quite the right word, but I can't come up with a better one right now).
I probably shouldn't have commented on this here. But I read the message after just having spent a Sunday morning looking into a problem and I guess that made me more thin-skinned than usual.
> For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc.
I agree that the wider community sometimes has/had the issue of excusing away postgres problems. While I try to avoid doing that, I certainly have fallen prey to that myself.
Leaving fandom like stuff aside, there's an aspect of having been told over and over we're doing xyz wrong and things would never work that way, and succeeding (to some degree) regardless. While ignoring some common wisdom has been advantageous, I think there's also plenty where we just have been high on our own supply.
> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
I don't agree that we have been "bashing" Postgres. As far as I can tell, Postgres has come up a very small number of times over the years: certainly on the CockroachDB episode[0] (where our experience with Postgres is germane, as it was very much guiding our process for finding a database for Oxide) and then again this year when we talked about our use of statemaps on a Rust async issue[1] (where our experience with Postgres was again relevant because it in part motivated the work that we had used to develop the tooling that we again used on the Rust issue).
I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.
I'm certainly very biased (having worked on postgres for way too long), so it's entirely plausible that I've over-observed and over-analyzed the criticism, leading to my description.
> I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.
I agree that criticism is important and worthwhile! It's helpful though if it's at least somewhat actionable. We can't travel back in time to fix the problems you had in the early 2010s... My experience of the criticism of the last years from the "oxide corner" was that it sometimes felt somewhat unrelated to the context and to today's postgres.
> if one views recounting that experience as showing insufficient "respect" to its developers
I should really have come up with a better word, but I'm still blanking on choosing a really apt word, even though I know it exists. I could try to blame ESL for it, but I can't come up with a good German word for it either... Maybe "goodwill". Basically believing that the other party is trying to do the right thing.
>> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
> Was that done in this thread?
Well, I raised a general problem around 24/7/365 use cases (rooted in my operational experience, reinforced by the more-current words that I was replying to and the OP) and you called it "tedious", "low-info griping". Yes, that seems pretty dismissive.
(Is it fair? Though I thought the podcast episodes were fairly specific, they probably glossed over details. They weren't intended to be about those issues per se. I did write a pretty detailed post though:
https://www.davepacheco.net/blog/2024/challenges-deploying-p...
(Note the prominent caveat at the top about the experience being dated.))
You also wrote:
> running an, at the time, outdated postgres, on an outdated OS
Yes, pointing to the fact that the software is old and the OS is unusual (it was never outdated; it was just not Linux) are common ways to quickly dismiss users' problems. If the problems had been fixed in newer versions, that'd be one thing. Many (if not all) of them hadn't been. But also: the reason we were running an old version was precisely that it was a 24/7/365 service and there was no way to update databases without downtime, especially replicated ones, nor a great way to mitigate risk (e.g., a mode for running the new software without updating the on-disk format so that you can go back if it's a disaster). This should be seen as a signal of the problem, not a reason to dismiss it (as I feel like you're doing here). As for the OS, I can only think of one major issue we hit that was OS-specific. (We did make a major misconfiguration related to the filesystem that certainly made many of our issues much worse.)
I get that it sucks to keep hearing about problems from years ago. All of this was on 9.2 - 9.6 -- certainly ancient today. When this comes up, I try to balance sharing my operational experience with the fact that it's dated by just explaining that it's dated. After all, all experience is dated. Readers can ignore it if they want, do some research, or folks in the PostgreSQL world can update me when specific things are no longer a problem. That's how I learned that the single-threaded WAL receiver had been updated, apparently in part because of our work: https://x.com/MengTangmu/status/1828665449850294518 (full thread: https://x.com/MengTangmu/status/1828665439234474350). I'll happily share these updates wherever I would otherwise share my gripes!
Regarding pgstattuple specifically: If this was a 24/7/365 service and you would be concerned by the I/O impact of loading the full table or index at any time, you could run this on a replica too. For tables there is pgstattuple_approx which is much better at managing its impact, but there is no equivalent for indexes today.
The REINDEX CONCURRENTLY mentioned in OP could also be run at other times of the day - the main issue is again I/O impact (with potentially some locking concerns at the very end of the reindex concurrently to swap out the index).
There are no magic solutions here - other databases have to deal with the same practical limitations, though Postgres sometimes is a bit slow to adopt operational best practices in core (e.g. the mentioned pg_squeeze from OP may finally get an in-core "REPACK CONCURRENTLY" equivalent in Postgres 19, but it's been a long time to get there).
> The exclusive lock is only needed during the final swap phase, and its duration can be configured.
FYI: even a very short operation that requires an exclusive lock can induce significant downtime if there’s anything else that holds a shared lock for extended periods. In [1], there was:
- a wraparound autovacuum (which holds a shared lock for potentially a long time — like hours)
- lots of data path operations wanting a shared lock
- one operation that should have been very brief that merely tried to take an exclusive lock
The result is that the presence of an operation wanting an exclusive lock blocked the data path for the duration of the autovacuum. Major outage.
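Here's a rough sketch of why the queue behaves that way. It's a deliberately simplified model, not PostgreSQL's lock manager: it assumes just two lock modes (real Postgres has more) and a strict arrival-order rule where a request waits if it conflicts with a held lock or with anything already waiting ahead of it. The names `lock_mode` and `count_blocked` are made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified two-mode model; real lock managers have more modes. */
typedef enum { LOCK_SHARED, LOCK_EXCLUSIVE } lock_mode;

/*
 * Given the locks already held and the new requests in arrival order,
 * return how many requests end up waiting. A request waits if it
 * conflicts with a held lock OR with an earlier request still waiting.
 */
size_t count_blocked(const lock_mode *held, size_t nheld,
                     const lock_mode *requests, size_t nreq)
{
    bool held_any = nheld > 0;
    bool held_exclusive = false;
    bool waiting_any = false;
    bool waiting_exclusive = false;
    size_t blocked = 0;

    for (size_t i = 0; i < nheld; i++) {
        if (held[i] == LOCK_EXCLUSIVE)
            held_exclusive = true;
    }

    for (size_t i = 0; i < nreq; i++) {
        bool is_exclusive = (requests[i] == LOCK_EXCLUSIVE);
        /* Exclusive conflicts with everything; shared only with exclusive. */
        bool must_wait = is_exclusive
            ? (held_any || waiting_any)
            : (held_exclusive || waiting_exclusive);

        if (must_wait) {
            blocked++;
            waiting_any = true;
            if (is_exclusive)
                waiting_exclusive = true;
        } else {
            /* Granted immediately: it now counts as a held lock. */
            held_any = true;
            if (is_exclusive)
                held_exclusive = true;
        }
    }
    return blocked;
}
```

With one long-lived shared lock held (standing in for the autovacuum) and requests arriving as shared, exclusive, shared, shared, the function reports 3 blocked: the exclusive request waits on the held lock, and both later shared requests wait behind it. Take the exclusive request out and nothing blocks.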
I’m sympathetic to looking down on the obsession with money. But there’s something deep and important about the monetary element. Engineering is about solving real-world, practical problems. The cost is a real factor in whether a potential solution is a useful one.
I think the money question is a red herring here. I’d phrase it more like: what problem in a user’s problem space is expressible only like this? And if the only user is the programmer, that’s alright, but feels more aligned with pure academia. That’s important, too! But has a much smaller audience than engineering at large.
I guess you’re being facetious but for those who didn’t click through:
> This type of code error is prevented by languages with strong type systems. In our replacement for this code in our new FL2 proxy, which is written in Rust, the error did not occur.
You're right that you have to "write code well" to prevent this sort of thing. It's also true that Rust's language features, if you use them, can make this sort of mistake a compile-time error rather than something that only blows up at runtime under the wrong conditions. The problem with their last outage was that somebody explicitly opted out of the tool provided by the language. As you say, that's "not writing code well". But I think you're dismissing the value of the language feature in helping you write code well.
It seems obvious to me that "the ability to model, predict, and influence one’s future" is far more general and capable than "constrained to pattern recognition and prediction of text and symbols." How do you conclude that those are the same?
I do like that definition because it seems to capture what's different between LLMs and people even when they come up with the same answers. If you give a person a high school physics question about projectile motion, they'll use a mental model that's a combination of explicit physical principles and algebraic equations. They might talk to themselves or use human language to work through it, but one can point to a clear underlying model (principles, laws, and formulas) that is agnostic to the human language they're using to work through it.
I realize some people believe (and it could be) that ultimately it really is the same process. Either the LLM does have such a model encoded implicitly in all those numbers or human thought using those principles and formulas is the same kind of statistical walk that the LLM is doing. At the very least, that seems far from clear. This seems reflected in results like the OP's.