Apple's Metal GPU profiler is excellent, and it works even if you're using Metal through WebGPU from C++ compiled outside of Xcode.
Among other things, when you click on a draw command it shows you a thumbnail of the changes it made on the screen. It does this by saving and re-running the entire window render, highlighting the one command.
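To make the replay idea concrete, here is a toy sketch in Python. It is not how Metal's tooling is implemented internally; `FillRect`, `replay`, and `changed_pixels` are made-up names, and the "framebuffer" is just a grid of integers. It only illustrates the general approach: record the commands, re-run them up to the selected one, and diff the frame before and after that command.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (column, row)


@dataclass(frozen=True)
class FillRect:
    """Hypothetical recorded draw command: fill a rectangle with one value."""
    x: int
    y: int
    w: int
    h: int
    value: int


def replay(commands: List[FillRect], width: int, height: int) -> List[List[int]]:
    """Re-run recorded commands from a blank frame and return the framebuffer."""
    frame = [[0] * width for _ in range(height)]
    for cmd in commands:
        # Clamp the rectangle to the framebuffer bounds.
        for row in range(max(cmd.y, 0), min(cmd.y + cmd.h, height)):
            for col in range(max(cmd.x, 0), min(cmd.x + cmd.w, width)):
                frame[row][col] = cmd.value
    return frame


def changed_pixels(commands: List[FillRect], index: int,
                   width: int, height: int) -> Set[Pixel]:
    """Return the pixels whose value differs before and after commands[index]."""
    before = replay(commands[:index], width, height)
    after = replay(commands[:index + 1], width, height)
    return {
        (col, row)
        for row in range(height)
        for col in range(width)
        if before[row][col] != after[row][col]
    }


recorded = [FillRect(0, 0, 2, 2, 1), FillRect(1, 1, 2, 2, 2)]
highlight = changed_pixels(recorded, 1, 4, 4)
```

Here `highlight` holds the pixels touched by the second command, which is the region a thumbnail would emphasize. A real tool would presumably cache intermediate frames rather than replaying from scratch for every click.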
Does anyone know of software that can estimate pipeline flushes, execution unit saturation, memory bandwidth and latency limits, etc. for a given CPU?
Like:
simulate a.out
And it would give me a report for each line of code and tell me everything that happened on that line. Was it a cache miss? False sharing? Etc.