Based on past experience with llama.cpp's RPC backend, 10G Ethernet would only marginally speed things up; lower latency helps much more, but even then you hit diminishing returns with that layer split.
RDMA over Thunderbolt is a possibility. RoCE (RDMA over Converged Ethernet) obviously wouldn't close the gap on raw link speed, since it sits on top of Ethernet. That said, it could still deliver higher effective throughput once you factor in the CPU time spent running custom protocols in software, work a smart NIC could simply DMA instead; but the underlying link overhead is still decidedly higher.
"Next I tested llama.cpp running AI models over 2.5 gigabit Ethernet versus Thunderbolt 5"
The graph in that test showed only a ~10% benefit for TB5 over Ethernet.
Note: the M3 Studios support 10Gbps Ethernet, but that wasn't tested; the comparison used 2.5Gbps Ethernet instead.
If 2.5G Ethernet was only ~10% slower than TB5, how would 10G Ethernet have fared?
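A quick back-of-envelope model shows why a faster link gives diminishing returns once wire time is a small slice of each token's total time. All numbers below (compute time, activation size, RTT) are illustrative assumptions, not measurements from the video:

```python
# Back-of-envelope: per-token time for a pipeline-split inference setup.
# Every constant here is an illustrative assumption, not a measured value.

def token_time_ms(compute_ms, activation_kb, link_gbps, rtt_ms):
    """Rough per-token time: compute, plus transferring the activation
    tensor at the layer split, plus one round trip of network latency."""
    # (KB * 8) kilobits / (Gbps * 1e6) kilobits-per-second -> seconds -> ms
    transfer_ms = (activation_kb * 8) / (link_gbps * 1e6) * 1e3
    return compute_ms + transfer_ms + rtt_ms

# Assumed: 30 ms compute/token, ~100 KB activation at the split, 0.5 ms RTT.
base = dict(compute_ms=30.0, activation_kb=100.0, rtt_ms=0.5)

t_2g5 = token_time_ms(link_gbps=2.5, **base)   # 2.5GbE
t_10g = token_time_ms(link_gbps=10.0, **base)  # 10GbE
t_80g = token_time_ms(link_gbps=80.0, **base)  # TB5-class bandwidth

print(f"2.5GbE: {t_2g5:.2f} ms/token")
print(f"10GbE:  {t_10g:.2f} ms/token")
print(f"TB5:    {t_80g:.2f} ms/token")
```

Under these assumptions the link only accounts for fractions of a millisecond per token, so even a 32x bandwidth jump moves the needle by about 1%, while shaving the RTT would matter more. The real benefit depends entirely on how large the transferred activations are relative to compute time.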
Also, TB5 has to be wired so that every Mac is directly connected to every other Mac, which limits you to four machines.
By comparison, Ethernet allows a hub-and-spoke configuration through an Ethernet switch, theoretically letting you scale past four machines.
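A hub-and-spoke setup could look like the following sketch using llama.cpp's RPC backend (built with `GGML_RPC=ON`). The IP addresses, port, and model path are illustrative assumptions; check the flags against your llama.cpp build, as they have shifted between versions:

```shell
# Hub-and-spoke over an Ethernet switch with llama.cpp's RPC backend.
# Hostnames/IPs, port, and model path below are illustrative.

# On each worker Mac (e.g. 192.168.1.11 through 192.168.1.15),
# start an RPC server listening on the LAN:
rpc-server -H 0.0.0.0 -p 50052

# On the hub Mac, list every worker -- unlike a fully meshed TB5
# wiring, nothing here caps you at four nodes except the switch
# and the model's tolerance for extra hops:
llama-cli -m model.gguf -p "Hello" \
  --rpc 192.168.1.11:50052,192.168.1.12:50052,192.168.1.13:50052
```

The trade-off is that every hop through the switch adds latency, and per the numbers above, latency is exactly the part that hurts most at this layer split.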