The complexity of Vulkan or shaders has very little to do with the fixed-function parts of the GPU. The complexities of Vulkan come from the realities of talking to a coprocessor over a relatively low-bandwidth, high-latency pipe. It's not unlike, say, gRPC or whatever other remote call API you want. So you end up building command queues so you can send a batch of work all at once instead of dozens or hundreds of small transmissions. And then you need to deal with synchronization between those batches. And then with memory management for all of that.
Very little of that goes away as the GPU becomes more programmable / powerful. Rather it gets ever more complicated, as suddenly a texture isn't just a texture anymore. It's now a buffer, and buffers can be used for lots of things. This complexity really could only go away if the GPU & CPU merged into a single unit, which isn't entirely unlike Intel's failed Larrabee, as a sibling comment mentioned.
As for shaders, those are just arbitrary programs. The complexity there is entirely the complexity of whatever your renderer does, compounded by the extreme parallelism of a GPU. So this complexity really never goes away.
For your core question, the problem is that fixed-function hardware is just always faster & more efficient than programmable hardware. So as long as games commonly do the same set of easily ASIC-able work (like rendering triangles), you really won't ever see that fixed-function unit go away. But something like Unreal Engine 5's Nanite is kinda the productization of the idea of doing triangle rasterization in a "software" renderer instead of in the fixed-function parts of the GPU: https://www.unrealengine.com/en-US/blog/understanding-nanite...
Only a little. The bulk of the complexity for large data like textures isn't around the DMA transfer. It's instead around things like ensuring data is properly aligned, that things like textures are swizzled (if that format is even documented at all), and ensuring it's actually safe to read or write the buffer (that is, that the GPU isn't still using it). There's also complexity in actually allocating memory, in that a malloc/free isn't really provided; something more like mmap is instead. So you want a (re)allocator on top of that.
And there's also the complexity that comes from Vulkan wanting to work on both unified and non-unified memory systems.
Also, unified doesn't necessarily mean coherent, so there are additional complexities there.