
More resources like this if you are interested:

If you want to understand how the GPU driver thinks under the hood, read through https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-...

If you want to see the OpenGL state machine in action, check out https://webglfundamentals.org/webgl/lessons/resources/webgl-...



Well, I hope you are happy. I just lost like 3 hours of my time, because I couldn't stop reading.


Another great series from fgiesen is https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlu...

I'd not come across the state machine site before; it looks pretty useful.


Would you be able to point me in the direction of understanding the difference between a GPU and a CPU, from ELI5 --> [whatever]?


A CPU (generally) uses one powerful core, sequentially going through a large chunk of data, while a GPU has a large number of relatively weak cores (say, hundreds) operating in parallel on small chunks of data. For instance, if you wanted to do something with every pixel on the screen (~2 million), with a CPU, you would step through each pixel in a loop, one by one. With a GPU, you would run a small program (usually called a shader for historical reasons) for each pixel in parallel on all of its cores. Hopefully that helps.
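To make the per-pixel idea concrete, here's a rough sketch in Python. The image, the 1.5x brightness "shader", and the array sizes are all made up for illustration; numpy's vectorized operation stands in for the GPU running the same small program on every pixel at once.

```python
import numpy as np

# A toy 4x4 grayscale "screen"; a real screen has ~2 million pixels.
pixels = np.arange(16, dtype=np.float32).reshape(4, 4)

# CPU style: one core walks every pixel in sequence.
brightened_cpu = pixels.copy()
for y in range(brightened_cpu.shape[0]):
    for x in range(brightened_cpu.shape[1]):
        brightened_cpu[y, x] = min(brightened_cpu[y, x] * 1.5, 255.0)

# GPU style: the same tiny "shader" (multiply, then clamp) is applied
# to every pixel simultaneously; vectorization stands in for the GPU's
# many parallel cores here.
brightened_gpu = np.minimum(pixels * 1.5, 255.0)

assert np.allclose(brightened_cpu, brightened_gpu)
```

Both paths compute the same result; the difference is that the GPU version expresses the work as one operation over all pixels instead of a loop.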


Although the GPU vendors would like you to think of them as cores, they're more like ALUs or SIMD lanes, since they all run the same instructions.


Now we are moving way out of ELI5 territory, but this is a simplification that I think hurts as much as it helps. It's not that a graphics card has thousands of ALUs restricted to always executing the same instruction.

It's more like you have a fairly high number of cores, each of which consists of a large number (typically 32 or 64) of ALUs executing the same instruction in parallel.

This means that while you cannot execute a thousand different instructions in parallel you can in theory run somewhere between tens and hundreds of different instructions each across 32 or 64 different sets of input data.
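A rough sketch of that lockstep execution, using numpy as a stand-in (the 32-lane group size and the two toy operations are illustrative assumptions): when lanes within a group want to take different branches, both sides are effectively computed and a per-lane mask selects the result, rather than each lane running its own instruction stream.

```python
import numpy as np

# One group of 32 ALUs ("lanes") that must execute the same instruction.
lane_ids = np.arange(32)
data = lane_ids.astype(np.float32)

# A branch like `if lane is even: x *= 2 else: x += 100` can't send
# lanes down different instruction streams; instead both paths run
# and a mask picks the result per lane (roughly how divergence works).
mask = lane_ids % 2 == 0
result = np.where(mask, data * 2.0, data + 100.0)
```

So across many such groups you get tens to hundreds of independent instruction streams, each applied to 32 or 64 data elements at once.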


A key trade-off is IPC per thread vs. area. The CPU needs high IPC per thread, but achieving this makes the core area grow disproportionately, so IPC / mm^2 goes down.

The GPU does not need high single-thread performance so it has lots of simpler and smaller cores, and as a result the total IPC goes way up.
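Some back-of-the-envelope arithmetic makes the trade-off visible. All the numbers below are hypothetical, chosen only to illustrate the shape of the comparison, not taken from any real chip:

```python
# Hypothetical figures: a wide out-of-order CPU core might sustain
# ~4 instructions/cycle, while a single GPU lane manages ~1, but the
# GPU has vastly more lanes in the same silicon budget.
cpu_cores, cpu_ipc_per_core = 8, 4
gpu_cores, lanes_per_core, gpu_ipc_per_lane = 40, 32, 1

cpu_total_ipc = cpu_cores * cpu_ipc_per_core                    # 32
gpu_total_ipc = gpu_cores * lanes_per_core * gpu_ipc_per_lane   # 1280
```

The per-thread number favors the CPU by 4x, but the aggregate throughput favors the GPU by a wide margin, provided the workload has enough parallel work to keep all those lanes busy.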

And then of course the GPU has dedicated hardware for operations like rasterization and texture interpolation which the CPU lacks. On the other hand, the CPU needs to run an OS so each core has support for virtual memory, interrupts and other types of instructions that are either not needed on the GPU or at least do not need to be supported by each one of the little cores and are handled in a more global manner, making these cores even smaller.


CPU = lots of branching and complicated operations, small number of cores.
GPU = lots of cores, not much branching, simple operations.



