Now we are moving way out of ELI5 territory, but this is a simplification that I think hurts as much as it helps. It's not that a graphics card has thousands of ALUs that are all restricted to executing the same instruction at the same time.
It's more like you have a fairly high number of cores, each of which consists of a large number (typically 32 or 64) of ALUs executing the same instruction in parallel.
This means that while you cannot execute a thousand different instructions in parallel, you can in theory run somewhere between tens and hundreds of different instructions at once, each across 32 or 64 different sets of input data.
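
If it helps to see that as code, here is a rough toy model in Python. It is only a sketch of the idea, not how any real GPU is programmed: the names (`run_core`, `run_gpu`, "lanes"), the 4-core count, and the four example operations are all made up for illustration. Each "core" gets its own instruction and applies it to all 32 of its inputs in lockstep.

```python
# Toy model only: real GPUs differ in many details (scheduling, branching, memory).
LANES_PER_CORE = 32   # ALUs per core; 32 or 64 is typical per the text above
NUM_CORES = 4         # hypothetical, kept small so the output stays readable

# One instruction per core. These four operations are arbitrary examples.
INSTRUCTIONS = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]


def run_core(instruction, lane_inputs):
    """Apply the same instruction to every lane's input, as one core would."""
    assert len(lane_inputs) == LANES_PER_CORE
    return [instruction(x) for x in lane_inputs]


def run_gpu(instructions, inputs_per_core):
    """Each core runs its own instruction; lanes within a core all share it."""
    return [run_core(op, data) for op, data in zip(instructions, inputs_per_core)]


# Every core gets the inputs 0..31, one value per lane.
inputs = [list(range(LANES_PER_CORE)) for _ in range(NUM_CORES)]
results = run_gpu(INSTRUCTIONS, inputs)

distinct_instructions = len(INSTRUCTIONS)        # different instructions in flight
total_operations = NUM_CORES * LANES_PER_CORE    # operations done in one step

print(distinct_instructions, total_operations)
```

So in this toy setup you get 4 different instructions in one step, not 128, even though 128 operations happen. Scale the core count up to tens or hundreds and you get the picture described above.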