I would bet the answer is that it's easier to write a script that simply downloads everything it can (foreach <a href=>: download and recurse) than to look into which sites provide data dumps and how to use them.
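Something like this, as a rough sketch of the kind of script I mean (the start URL and depth limit are made up for illustration):

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect every href from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, seen=None, depth=3):
    """foreach <a href=>: download and recurse."""
    if seen is None:
        seen = set()
    if url in seen or depth == 0:
        return
    seen.add(url)
    try:
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except Exception:
        return  # skip mailto:, broken links, non-HTML, etc.
    # ... write html to disk here ...
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        crawl(urllib.parse.urljoin(url, link), seen, depth - 1)

crawl("https://example.org/")
```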
>Every customer would prefer a firehose content delta over having to scrape for diffs.
"Customers" is a strong word, especially when you're saying they should provide a new service that is useful, more or less exclusively, to AI startups and megacorps.
Why should they create and support a whole new architecture when you can find changed articles between two dumps with a simple query? I'd rather load a big file into a database than maintain a firehose consumer.
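The "simple query" part really is simple. A sketch, assuming two dump snapshots loaded into tables `old_articles` and `new_articles` (the table and column names are invented, not any site's actual schema):

```python
import sqlite3

# Assumes two dump snapshots were loaded into tables
# old_articles(title, rev_id) and new_articles(title, rev_id);
# names are illustrative only.
conn = sqlite3.connect("dumps.db")
changed = conn.execute("""
    SELECT n.title
    FROM new_articles AS n
    LEFT JOIN old_articles AS o ON o.title = n.title
    WHERE o.title IS NULL        -- article added since the old dump
       OR o.rev_id <> n.rev_id   -- article revised since the old dump
""").fetchall()
print(f"{len(changed)} articles changed between dumps")
```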
Every customer would prefer a firehose content delta over having to scrape for diffs.
They obviously have the capital to provide this, and could still grow their funds for eternity without ever needing a single dollar of external revenue.
Why don't they?
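For comparison, the consumer side of such a firehose might look something like this, assuming a hypothetical server-sent-events endpoint (the URL and event fields are invented; no real site's API is implied):

```python
import json
import urllib.request

# Hypothetical SSE firehose endpoint, for illustration only.
FIREHOSE_URL = "https://example.org/v1/stream/changes"

def consume(url):
    """Yield one JSON change event per 'data:' line of the SSE stream."""
    with urllib.request.urlopen(url) as stream:
        for raw in stream:
            line = raw.decode("utf-8").strip()
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):])

for event in consume(FIREHOSE_URL):
    print(event.get("title"), event.get("rev_id"))
```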