Hacker Newsnew | past | comments | ask | show | jobs | submit | MarkusWinand's commentslogin

> Making a pretty diagram where all the behaviours from various DBMSs are listed is left as an exercise to the author. :)

It's there already!

It's in the chart as footnote "b": Outer reference must be the only argument (doesn’t support F441)

That's the one thing SQL Server doesn't eat. Those that are green in the chart work fine in this case.


No question, such a query should not be written. That's probably the reason why this odd behavior, which is even different in various DBMSes, is not causing everyday problems.


The reason the behavior doesn't cause problems is because everybody treats automatic aggregation as a voodoo where they know one recipe that works and anything different is domain of the Devil.

And, IMO, that's a very sane and reasonable way to treat it. The entire idea of automatic aggregation is flawed, and those queries should just have a `group by ()` explicit at the right place.


I agree, I think the original sin here is the fact that whether a `SELECT` is an aggregation is determined by the contents of the scalar expressions at all. I think most of this weirdness comes directly out of wanting to be able to write both `SELECT sum(x) FROM xx` and `SELECT x FROM xx` and have them work.

Not that I have a better solution offhand, in SQL grouping by a constant value is not actually the same as not writing `GROUP BY` at all since the behaviour on empty tables is different.


What's an aggregation per se? A SQL query is best thought of an arbitrary generator function. You can, using functions like UNNEST, end up emitting multiple row-tuples for each processed input row-tuple. An aggregation is just a generator function that happens to reduce over all the input rows and then emit one row-tuple when there's no more input. Query-planning engines do not special-case this. It's just a generator node like any other generator node.

Consider: using window functions, you can do partial aggregations over subsets of the input — without even necessarily partitioning the input (i.e. you can compute "running totals" and other wacky state-machine-like outputs.)


Not writing `group by` is the same as writing `group by ()`.

And yeah, the difference between that and a value is one of those really surprising things on SQL that actually make sense and should be this way. Unfortunately, there are many of those.


Ah my apologies, I wasn't familiar with that syntax.


No need to apologize. My post didn't make that clear at all.


Would you argue that automatic scalar-ism is also flawed, and the query

    SELECT a FROM aa;
should have an explicit grouping, like

    SELECT a FROM aa GROUP BY id;
?

After all, when you think about it, it's really not the aggregate functions that break expectations, it's the scalars. Of the four combinations of having or not having an explicit `GROUP BY` and having or not having an aggregate function, three of them have the unsurprising behavior of returning exactly one row per grouping.

    -- aggregate function and group by - one single result per grouping
    SELECT sum(a) FROM aa GROUP BY a; --or
    SELECT sum(a) FROM aa GROUP BY ();


    -- no aggregate function, but a group by - one single result per grouping
    SELECT a FROM aa GROUP BY a;

    -- just aggregate function, but no group by - one single result per (single, implicit) grouping
    SELECT sum(a) FROM aa;

But then, when you have neither an aggregate function nor an explicit `GROUP BY`, it breaks expectations:

    -- no aggregate function, no group by - one result row per row 
    -- in the source set, even though there should be only one big implicit group
    SELECT a FROM aa;
Therefore, I propose that the next SQL standard should introduce a new `GROUP BY INDIVIDUAL ROW` keyword, that henceforth all "scalar" queries MUST use in order to have consistent behaviour with the rest of the language.


What is flawed is not that there is an implicit grouping on all queries. It's that the implicit grouping changes, depending on a set of rules that consider stuff written in several places that are not explicitly related to it.

You are asking if it makes sense to have an implicit grouping at all; it very obviously absolutely does. And grouping by individual row is the very obvious default. But I do totally support adding that keyword expressing the default. All defaults should be expressible.


> SQLite also ignores size restrictions on text columns; [...] requires a trigger.

I'd go for CHECK constraints first: https://www.sqlite.org/lang_createtable.html#ckconst

> Procedural SQL [...] Choose your investment here carefully.

I think that the demand for procedual code has dropped drastically in the past decades as the "normal" SQL can solve so many more things with window functions, recursion, and so forth. So I'd say: Yes, choose your investment wisely and stay away from procedural SQL as long as possible.


There is still a case to embed business logic at the database layer, not the application layer, as databases tend not to change as much.

I have thousands of lines of PL/SQL that originated in the days of PowerBuilder that are now serviced by .NET; a decade from now could be totally different.

SQLite appears to use a (very small) subset of PSM for triggers.


Fair point, looks suspicious on first sight. I've checked my tests briefly: It's accepted as table and column name. As mentioned in footnote [0] I haven't checked different contextes yet.

Can you tell me: what kind of identifier is it (view name, function name) and which SQL context it causes problems (select list, create/drop statement, ...) and which system has problems with it. Thx.


Author of the website here.

All charts on modern-sql.com are backed by tests. I also keep them up-to-date by running them for new releases when they appear.


we xan always rely on you . thank you Markus


I wonder why you are so focused on SQL:2011? Since then there was SQL:2016 and now there is SQL:2023.

Also, which features in particular are you missing in PostgreSQL? Merge was added with PostgreSQL 15 a year ago: https://modern-sql.com/caniuse/merge


2016/2023 got pretty widespread support across databases for the functionality I care about, both before and after adoption - namely the JSON stuff. MSSQL also already had pretty good graph support AFAIK, although I haven't used it that much.

I find the temporal table stuff really useful and they drastically simplify a number of requirements, so it's annoying that the only non-proprietary DB that supports it is maria.


Temporal tables, most likely. Although even SQL Server only supports system time versioning, not full bitemporal tables (as per the spec).


The existing implementations (Oracle DB, SQL Server, MariaDB, Big Query) come with their problems too. I was a big fan of the new features when it came out in 2011, but pratically there is an unsolved elephant in the room: It doesn't cover schema changes.


> there is an unsolved elephant in the room: It doesn't cover schema changes.

100% agreed. It's remarkable how Datomic also arrived on the scene in the same era (2012) but actually managed to solve a lot of these hard issues of immutable versioning + schema evolution via a clean EAV-based information model and an emphasis on accrete-only schema changes.

I'm a big fan of your work by the way :)


> The standard is not generally available. Most of us will never learn what is in it.

It also not aimed at users, but at implementors. Funny enough, they don't read the standard either ;) But more seriously: Some implementations are old and generally vendors prefer not changing the behavior of their product. When the standard comes later, the train has already departed. The most critical incompatibilities are in the elder features. The newer ones have a tendency the be more aligned with standard behavior (e.g. window functions are typically implemented just fine).

> They haven't gotten as far as working out variables or function composition either [0].

Part 2 SQL[0] is declarative and intentionally doesn't have variables. Part 4 SQL (aka "pl" SQL) does have variables. I personally consider Part 4 obsolete on systems that have sufficient modern Part 2 support.

> Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.

While this is certainly true for some cases there are also plenty of examples where the standard SQL is more concise and less error prone than product-specific alternatives. E.g. there COALESCE is the way to go rather than isnull, ifnull, nvl, or the like (typically limited to two arguments, sometimes strange treatment of different types).

There is a lot of *unnecessary* vendor-lock in in the field.

[0] https://modern-sql.com/standard/parts


> Part 2 SQL[0] is declarative and intentionally doesn't have variables...

The issue I have with that is that making a bad decision for a reason doesn't change the fundamental correctness of the decision.

SQL semantics clearly support variables, because they are available as WITH ... clauses and subqueries. So either these elements are a mistake and should be discouraged, or the language has variables and they should be made to read like variables in almost all other languages for consistency and readability.

This is literally a cosmetic issue but a lot of thought on syntax has happened since the 1970s and there seems to be an overwhelming on-the-ground consensus that SQL's syntax is not the way to lay code out. The semantics? Great. But there is nearly nothing from SQL's syntax choices that is present in any surviving programming language since. Many of the more popular languages explicitly reject SQL syntax decisions like Python (significant whitespace), "function(arg,arg,arg)" instead of constructs like "VERB arg,arg WITH TEXT,arg TEXT arg" (everything I've met bar lisp) for functions and not replicating broken English (everything).

Variables are a different thing because the declarative nature of SQL is a useful property. But standard SQL overall does such an appalling job of handling syntax I feel confident that they have gotten this part of the syntax wrong too.

Gotten a bit off topic, just happens to be a thing I feel strongly about.


Lisp is in no way closer to SQL syntax than languages like python. On the contrary, lisp syntax is much more regular.


I recently covered MERGE on modern-sql.com: https://modern-sql.com/caniuse/merge

There I also look at limitations of some implementations and problems such as not reporting ambiguous column names — just guessing what you mean ;)


Great work as always! I still remember looking at your site when I was working on an ETL system for sensor data and realising in frustration how MariaDB was missing all the nice modern SQL functions like merge.


This is one of the questions I try to answer at https://modern-sql.com/


Your website is great and I regularly check it to see what's new in various implementations. Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?


> Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?

I'd guess the incentive structure is the opposite of what you're implying; the same reason why every cordless drill manufacturer has their own battery connector: vendor lock in fuels private planes and shareholder reports, versus being compatible means no one is forced to buy your batteries and thus profits are `$total - $forced_purchases`

This situation gets even worse in the case of a standard without any objective way of knowing one is in compliance. Having a badge on the mysql.com website saying "now featuring SQL:2023 compliance!11" sells how many more support contracts exactly?


That's a good point. Additionally, it seems the standard isn't freely available and I doubt most of the developers of existing SQL DBs partecipate in drafting new standards. It seems it is doomed to diverge even further, which begs the question whether is it relevant anymore to have the SQL standard at all


Luckily pretty much nobody needs the standard documents. It's actually my aim at https://modern-sql.com/ to make the relevant information more accessible — in particular including support-matrices ("Can I Use").


I’ve never had access to the SQL standard, but in things like HTML and CSS I know I reach for the specs extremely regularly, and that when working with SQL (mostly PostgreSQL or SQLite, including sometimes having to support both) I get frustrated at the general poor state of SQL documentation (and certainly a lot of this is because of engine diversity and), and have often wished I had access to the SQL Standard (even though I know engine diversity significantly lowers its value for users of particular databases—my ideal would probably be a version of the spec augmented with engine support and links to each engine’s documentation of the matter). Certainly the table of contents for this spec sounds delightful.

Not all specs are particularly accessible (e.g. ECMAScript is often hard to follow if you haven’t spent a fair bit of time around it—it’s mostly natural-language code that works with a ECMAScript virtual machine), but most of the time, I would much rather have the actual spec over someone’s digested summary that covers what they thought was important, but regularly lacks details important for my situation. Some specs are absolutely magnificent. The HTML Standard is my very favourite as both a piece of spec work and as reference material.

Seriously, specs are really good stuff and it makes me sad how people often ignore them because they assume they’ll be inscrutable. (Similar remarks apply to legislative texts. They’re normally pretty easy to understand, and you find all kinds of sometimes fun and sometimes useful gems that potted summaries exclude.)


I'm studying the SQL standard for years now and compared to other standards that I know (XSLT, a little CSS, decades ago POSIX, C and C++) the SQL standard is really hard to make sense of. You might overestimate the value of having access to it.

Having that said: free would be better.


agreeing here Markus, for all practical purposes all one needs to know are the features supported by a particular product, not the possible features. big fan of your site !


I have not read any of 9075, but 14882 is surprisingly comprehensive, and wg21 regularly publishes prerelease versions for free.


The good news (not really) is that the SQL standard isn't really followed.

So unless you are writing a database yourself, the DMBS documentation is going to be more relevant.


Just in case anyone who isn't familiar with Markus's work gets a slightly w3schools vibe when clicking the above link: his book SQL Performance explained[1] is probably the single most useful book a backend developer can read.

[1] https://sql-performance-explained.com/


As a web developer: your site is clean and crisp, with an almost brutal simplicity which makes it very attractive. Honestly, I like it a lot. It could use one or two very subtle tweaks in the super-fine details, but for a technical information-dissemination site, it’s bloody awesome.

My only issue is with the width. While whitespace between the sides and the centre content is very useful, this isn’t the 1990s anymore with its 1024×768 monitors. You _can_ make the centre column responsive to the overall width of the screen. Doing so can also give you a lot more room to do things, and make it easier to read. Your line-height is already great, and is perfect for text blocks a good half again wider if not twice as wide. Even on my vertical monitors, which are only 1500px wide, that centre column is pretty much claustrophobically narrow.


The rationale behind the width is more the ease of reading than anything else: https://en.wikipedia.org/wiki/Line_length

But thanks for the feedback. There is a rework pending anyway.


One piece of good news is that you just need to drop #page-wrapper's width (or adjust to preference) the rest of the CSS all appears to be sensibly responsive.


I love this, thanks!

How we could we add Snowflake to the compatibility matrices?

It's popular enough to deserve it, IMHO.

https://db-engines.com/en/ranking


Cool nice site man. I bought the book a while back and really enjoy the articles.


I've certainly needed them before :(


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: