Modern SQL (winand.at)
83 points by gilad on Aug 13, 2021 | 15 comments


If anyone is interested, Winand's book "SQL Performance Explained" [1] goes really deep into understanding, optimizing, and improving the performance of SQL queries.

[1] https://sql-performance-explained.com/?utm_source=winand.at&...


Unfortunately, I have forgotten the details, but a couple of years ago, I was writing a rather complex SQL query for my boss to extract some information from our ERP database that the ERP software itself was incapable of providing (it was a kind of crappy ERP system, I suspect, not being an expert).

After much head-scratching, coffee-drinking and intense screen-staring, I came up with a query that gave the desired results, but took about an hour to complete.

I have only run into SQL performance problems very rarely, but this was an obvious stinker, so I dove in, using MS SQL Server Management Studio's query explainer to figure out what was going wrong. To my mild surprise and great relief, I did find the solution; I only recall it had something to do with JOINs (HASH JOIN? Is that a thing?). The query went from ~60 minutes to less than a second.
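If I remember right, it boiled down to a join hint, roughly something like this (table names invented, the actual details long forgotten):

  SELECT o.order_id, c.name
  FROM orders AS o
  JOIN customers AS c ON c.customer_id = o.customer_id
  OPTION (HASH JOIN);  -- asks SQL Server's optimizer to use hash joins for this query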

So I learned that SQL performance tuning is not black magic, although it's not trivial either. Fortunately, most of the time, my SQL queries run sufficiently fast that it's not worth the effort to make 'em faster.

But damn, going from ~60 minutes to less than a second with just a relatively minor change in the query was probably the most effective optimization I ever did. :-)


I have that book (SQL Performance Explained) and benefited from it and think it was worth it, but I'd also like something more advanced. If he has another book out, that's great, I'm ready.

Added: But, it looks like the link for this thread is just a slide deck. Oh well.


See also: https://winand.at/conferences/past

Includes an updated version of the first slideset: https://modern-sql.com/slides/ModernSQL-2019-05-30.pdf


Markus's site https://modern-sql.com/ is fantastic, I've learned a ton of useful tricks from it.


This seems out of date (2015?), unless I am missing something.


Probably.

In Postgres, WITH queries are not "optimizer fences" since version 12 [0].
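For example, since 12 you can keep the fence only where you actually want it (hypothetical table, but the MATERIALIZED keyword is real):

  WITH recent_orders AS MATERIALIZED (
    -- MATERIALIZED forces the old "fence" behaviour;
    -- drop it (or use NOT MATERIALIZED) and Postgres 12+ may inline the CTE
    SELECT * FROM orders WHERE created_at > now() - interval '7 days'
  )
  SELECT count(*) FROM recent_orders WHERE status = 'shipped';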

But I discovered many other useful features/tips.

[0] https://www.depesz.com/2019/02/19/waiting-for-postgresql-12-...


I recently found out that SQLite added support for FILTER aggregate clauses in version 3.30.0, released 2019-10-04: https://til.simonwillison.net/sqlite/sqlite-aggregate-filter...
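Something like this (made-up orders table) replaces the usual SUM(CASE WHEN ...) workaround:

  SELECT
    count(*)                                   AS total_orders,
    count(*) FILTER (WHERE status = 'shipped') AS shipped_orders,
    count(*) FILTER (WHERE status = 'open')    AS open_orders
  FROM orders;  -- one scan, several conditional aggregates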


Is all the javascript broken? I can't move to the next slide, but the premise seems interesting.


Great post: I found the problem/solution approach super helpful. I've used some of the more recent features, but have missed grouping sets, recursive CTEs, and MATCH_RECOGNIZE. Hope to put these to use someday.
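Grouping sets in particular look handy; a small sketch against an imaginary sales table would be something like:

  SELECT region, product, sum(amount) AS total
  FROM sales
  GROUP BY GROUPING SETS (
    (region, product),  -- detail rows
    (region),           -- per-region subtotals
    ()                  -- grand total
  );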


I really don't like this whole "OFFSET kills performance" thing that you see everywhere, because it's only relevant for huge sets, think millions of rows.

And then, there's a reason why people use offsets: cursors are usually not a good alternative, and the after_id pattern is not flexible and doesn't allow jumping to a random page number...

But still, some people are preaching for alternatives that are not actually solving the same problem as offsets.


Millions of rows isn't a lot of data though - I've seen plenty of systems that implemented pagination with OFFSET and later regretted it.


By what measure is "millions" not big? We are talking about pagination here.

And even then, you can just scale up your database server with faster disks and implement caching.

You'll be just fine going through even tens of millions of rows.


The problem is that when I implement pagination it's often because I want automated scripts to be able to export ALL of the data, which means they need to be able to paginate to the end of the results.

Example from one of my projects: https://covid-19.datasettes.com/covid/ny_times_us_counties - it uses keyset (cursor) pagination so it's possible to efficiently export all 1,611,908 rows of data.

"SELECT * from table limit 100 offset 10000" requires the database to skip 10000 rows in order to get to the rows it needs to start returning - and then on the next page it has to do it again, using offset 10100.

If you're showing an interface to users you can implement an easy workaround: don't allow them to click past page 10, in which case OFFSET pagination only ever needs to loop through up to about 1,000 rows (depending on your page size).


Seconding simonw:

If you implement a REST API with a "page" parameter that renders as OFFSET in its DB queries, I promise you someone will come along and page through your entire "not big" table. For reporting, or to update XML sitemaps, or something else vital to their business.

Every 15 minutes.

Now put that API behind a public-facing Web app handling, say, a thousand requests a second.

Now deploy a migration that queues for an exclusive lock on a production table to add a column.

Now write a "post-mortem" report explaining to your boss' boss' bosses why their revenue-generating sites were all down for 15 minutes.



