Hacker News

Something I find somewhat annoying about MySQL's INSERT...ON DUPLICATE KEY UPDATE is that it increments auto-increment columns even if no new row is inserted. Does anyone know if Postgres' ON CONFLICT and MERGE do the same?


I expect this behavior to be pretty common. In PostgreSQL, auto-incrementing keys like this are backed by sequences.

Sequences, at least in the database products I'm familiar with, do not exhibit transactional behavior. I'm not sure whether that's actually required by the standard, but if not, I'd at least expect it to be the norm. Part of the reason is to avoid introducing a point of serialization into concurrent transaction processing. If sequences were transactional, you'd take quite a performance hit: consider a long-running transaction that inserts into a table with an auto-incrementing key. Any other transaction wanting an ID would have to wait for that long-running transaction to finish before drawing a number from the sequence, because it couldn't know whether the long-running transaction would commit or roll back, so the next sequence number would be undefined until the waiting transaction could take its own lock.

If instead we set the expectation that a sequence only guarantees you a unique number from within its range, and that anything beyond that, like gapless numbering, is out of scope, you avoid that whole waiting game. Sure, you might end up with gaps as a result: if the long-running transaction finally rolls back, the number it consumed is simply lost... but no one had to wait to find out, and every record still got a unique ID.

If you need to ascribe meaning beyond mere unique identity to such a record ID, an auto-incrementing sequence may simply be the wrong tool. I think the compromise, and the expectation that sequences, and any extended functionality depending on sequence mechanics, are not transactional, is a good one given the performance cost of ensuring gapless numbering whether you really need it or not.
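A minimal sketch of the gap behavior described above (table and column names are hypothetical); the rolled-back insert consumes a value that is never returned to the sequence:

```sql
-- Hypothetical table with a sequence-backed key.
CREATE TABLE accounts (
    id   bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

BEGIN;
INSERT INTO accounts (name) VALUES ('alice');  -- consumes id 1
ROLLBACK;                                      -- row is gone, but id 1 is not given back

INSERT INTO accounts (name) VALUES ('bob');    -- gets id 2, leaving a gap at 1
```

No other session ever had to wait on the rolled-back transaction, which is exactly the trade-off being described.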


worth re-upping -

postgres requests an id from the sequence for each row of the incoming data ahead of time, since it doesn't know which rows are updates and which are inserts (although presumably this could be changed?). the sequence doesn't reset down for the unused values, so this can eat through the id space unexpectedly quickly if you have a table with a relatively large volume of updates.
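a sketch of that burn-through (names are hypothetical): the conflicting row still causes a sequence draw, so the next genuine insert skips a value.

```sql
CREATE TABLE kv (
    id bigint GENERATED BY DEFAULT AS IDENTITY,
    k  text PRIMARY KEY,
    v  text
);

INSERT INTO kv (k, v) VALUES ('a', '1');   -- id 1

-- Conflicts on k, so this becomes an UPDATE, but the id default
-- was already evaluated for the incoming row, consuming id 2:
INSERT INTO kv (k, v) VALUES ('a', '2')
    ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;

INSERT INTO kv (k, v) VALUES ('b', '3');   -- gets id 3, not 2
```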

also, as a tip: if you hit the max integer for the sequence and need breathing room to implement a fundamental fix, you can quickly change the sequence to start at -1 and count down. there's no issue with negative ids since they're also valid integers.
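a sketch of that stopgap, assuming the sequence is named orders_id_seq (find the real name with pg_get_serial_sequence); the MINVALUE must be lowered before the sequence can go negative:

```sql
-- Buy time when an int4 key is near 2147483647: run the sequence downward.
ALTER SEQUENCE orders_id_seq
    MINVALUE -2147483647
    INCREMENT BY -1
    RESTART WITH -1;
```

new rows then get -1, -2, -3, ... while you work on the real fix (e.g. a migration to a wider key type).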


> postgres requests an id from the sequence for each row of the incoming data ahead of time since it doesn't know which rows are updates and which are inserts (although presumably this could be changed?).

Not in any sort of general way - the sequence can be part of the unique key that you're conflicting on.


PG upsert also increments the counter.

I somehow feel it's wasteful, and I feel a bit conflicted. I also realize that while it feels natural to develop a sense of the row numbers, and that can help with debugging because after a while you roughly know certain IDs and ranges, it's probably just something I got used to and maybe shouldn't rely on.


Yes, unfortunately it does, and I don't think you can work around this...



