The one table used for ClickBench isn’t that large. I assume these are the “warm” results, which throw away the first execution of each query, so all the performance being measured here is in-memory.
That's an assumption. My #1 rule for a benchmark is that it should have reproducible results, and using variable-performance storage works directly against that.
On a related topic: an OLAP benchmark with a small dataset that fits in memory caters only to what I'd consider a small set of OLAP use cases. I'd love to see one with a large dataset much bigger than memory.
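To make the cold/warm distinction concrete, here is a minimal Python sketch of a timing loop that keeps the first execution separate from the later ones. `time_runs` and `run_query` are hypothetical names, and the default of three runs is an assumption for illustration, not a description of ClickBench's actual harness.

```python
import time
from typing import Callable, List, Tuple


def time_runs(run_query: Callable[[], object], runs: int = 3) -> Tuple[float, List[float]]:
    """Execute run_query `runs` times and return (cold_seconds, warm_seconds).

    The first execution is reported separately as the "cold" timing; the
    remaining executions are the "warm" timings, which mostly reflect
    data already cached in memory when the dataset is small enough.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to separate cold from warm")

    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical callable that executes one query
        timings.append(time.perf_counter() - start)

    return timings[0], timings[1:]
```

Reporting only the warm numbers hides storage effects entirely. A truly cold number would also require clearing the OS page cache between runs, which this sketch does not attempt.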