It's a bit weird, very much a "software optimization" approach. But looking at the flame graph, you couldn't tell a model running in FP32, taking 3x the time and energy, from the same model running in INT8.
And? That information is trivially obtainable in a different way (e.g. using a stopwatch), while flamegraphs visualise where that time was spent, helping us determine which parts need to be optimised.
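
To make both points concrete, here is a rough sketch with made-up numbers. The frame names and timings are invented for illustration, not measured, and it assumes every operation speeds up by the same factor under INT8, which a real model probably would not. It computes what a flame graph draws (each frame's share of the total) and what a stopwatch reports (the total itself).

```python
# Illustrative only: frame names and timings are invented, not measurements.
# Assumption: INT8 speeds up every operation by the same 3x factor.
fp32_profile = {"matmul": 6.0, "softmax": 1.5, "layernorm": 1.5}  # seconds
int8_profile = {"matmul": 2.0, "softmax": 0.5, "layernorm": 0.5}  # seconds


def total_time(profile):
    """What a stopwatch would report: total wall time in seconds."""
    return sum(profile.values())


def frame_widths(profile):
    """What a flame graph draws: each frame's share of the total time."""
    total = total_time(profile)
    return {name: seconds / total for name, seconds in profile.items()}


# The relative widths match, so the two graphs would look the same...
print(frame_widths(fp32_profile))
print(frame_widths(int8_profile))

# ...while the totals differ by the 3x that the stopwatch picks up.
print(total_time(fp32_profile) / total_time(int8_profile))
```

Under that assumption the two flame graphs are indistinguishable, and the widths still show that `matmul` is where most of the time goes in either case.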