If anyone is interested, Winand's book "SQL Performance Explained" [1] really goes deep into understanding and optimizing SQL queries and improving performance of SQL queries.
Unfortunately, I have forgotten the details, but a couple of years ago, I was writing a rather complex SQL query for my boss to extract some information from our ERP database the ERP software itself was incapable of providing (it was a kind of crappy ERP system, I suspect, not being an expert).
After much head-scratching, coffee-drinking and intense screen-staring, I came up with a query that gave the desired results, but took about an hour to complete.
I have only run into SQL performance problems very rarely, but this was an obvious stinker, so I dove in, using MS SQL Server Management Studio's query explainer to figure out what was going wrong. To my mild surprise and great relief, I did find the solution, I only recall it had something to do with JOINs (HASH JOIN? Is that a thing?). The query went from ~60 minutes to less than a second.
So I learned that SQL performance tuning is not black magic, although it's not trivial either. Fortunately, most of the time, my SQL queries run sufficiently fast it's not worth the effort to make 'em faster.
But damn, going from ~60 minutes to less than a second with just a relatively minor change in the query was probably the most effective optimization I ever did. :-)
I have that book (SQL Performance Explained) and benefited from it and think it was worth it, but I'd also like something more advanced. If he has another book out, that's great, I'm ready.
Added: But, it looks like the link for this thread is just a slide deck. Oh well.
Great post: I found the problem/solution approach super helpful. I've used some of the more recent features, but have missed grouping sets, recursive CTEs, and MATCH_RECOGNIZE. hope to put these in application someday.
I really don't like this whole "OFFSET" kills performance thing that you see everywhere because it's only relevant for huge sets, think millions of rows.
And then, there's a reason why people are using offsets, cursors are usually not a good alternative and the after_id pattern is not flexible and don't allow to go to a random page number...
But still, some people are preaching for alternatives that are not actually solving the same problem as offsets.
The problem is that when I implement pagination it's often because I want automated scripts to be able to export ALL of the data, which means they need to be able to paginate to the end of the results.
"SELECT * from table limit 100 offset 10000" requires the database to skip 10000 rows in order to get to the rows it needs to start returning - and then on the next page it has to do it again, using offset 10100.
If you're showing an interface to users you can implement an easy workaround: don't allow them to click past page 10, in which case OFFSET pagination only ever needs to loop through up to about 1,000 rows (depending on your page size).
If you implement a REST API with a "page" parameter that renders as OFFSET in its DB queries, I promise you someone will come along and page through your entire "not big" table. For reporting, or to update XML sitemaps, or something else vital to their business.
Every 15 minutes.
Now put that API behind a public-facing Web app handling, say, a thousand requests a second.
Now deploy a migration that queues for an exclusive lock on a production table to add a column.
Now write a "post-mortem" report explaining to your boss' boss' bosses why their revenue-generating sites were all down for 15 minutes.
[1] https://sql-performance-explained.com/?utm_source=winand.at&...