Also, I think even an M3 Ultra is more cost-effective at running LLMs than a 4090 or 5090, mostly because it's more energy-efficient, and it's less fragile than running a gamer PC build.
It can run larger models, albeit quite slowly, but it lacks the matmul acceleration (included in the M5) that helps a lot with prompt processing and long-context performance at inference time. I will probably blow my budget on an M5 Max with 256 GB (maybe even 512 GB) of memory; the price will be upsetting, but I guess that is life!
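For anyone curious where that matmul acceleration would actually show up, here's a minimal sketch using mlx-lm on Apple Silicon (the model name is just an example of a quantized mlx-community checkpoint; install with pip install mlx-lm). With verbose=True it reports prefill and generation tokens-per-second separately, and prefill is the compute-bound, matmul-heavy part that the M5's units should speed up:

```python
# Minimal sketch: observing prefill vs. generation speed on Apple Silicon
# with mlx-lm (pip install mlx-lm). The model name is illustrative; any
# mlx-community quantized model should work.
from mlx_lm import load, generate

# Downloads the weights from Hugging Face on first run.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# A long prompt stresses prefill, which is compute-bound (matmul-heavy);
# token generation is mostly bound by memory bandwidth instead.
prompt = "Summarize the trade-offs of unified memory for LLM inference. " * 50

# verbose=True prints prompt (prefill) and generation tokens/sec separately.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

On an M3-generation chip you'd expect the prompt tokens-per-second figure to lag far behind the raw model size the machine can hold, which is exactly the gap dedicated matmul hardware is meant to close.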
Yes! I think smaller models on the M3 Ultra are interesting enough, but with matmul/tensor units on an M5 Ultra or Max, paired with decent unified memory, it will be a game changer.
I can easily imagine companies running Mac Studios in prod. Apple should release another Xserve.