Interesting: _q4 on a pair of 12GB 3060s runs at 20 tok/sec. _q8 (25GB) on the same pair is about 4 tok/sec.


~360GB/s memory bandwidth on the 3060, versus ~1008GB/s on the 3090 Ti, probably accounts for that.

Given that, I'd expect a single 3060 (if a large enough one existed) to run at about 16 tok/s, so 20 tok/s on two cards isn't bad for a pair that isn't NVLinked.
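A rough sketch of the scaling arithmetic behind that estimate, assuming single-stream generation is memory-bandwidth bound so that tok/s scales roughly linearly with bandwidth. The helper name `scale_tok_per_sec` is hypothetical, and the 3090 Ti rate it prints is only what the 16 tok/s figure implies under that assumption, not a measurement.

```python
# Rule-of-thumb sketch: ASSUMES decoding is memory-bandwidth bound,
# so tokens/sec scales roughly linearly with GPU memory bandwidth.

BW_3060_GBPS = 360.0     # approximate figure quoted in the comment
BW_3090TI_GBPS = 1008.0  # approximate figure quoted in the comment


def scale_tok_per_sec(rate: float, bw_from: float, bw_to: float) -> float:
    """Hypothetical helper: scale a tok/s figure from one bandwidth to another."""
    return rate * (bw_to / bw_from)


# Fraction of the 3090 Ti's bandwidth that a 3060 has.
ratio = BW_3060_GBPS / BW_3090TI_GBPS
print(f"3060 / 3090 Ti bandwidth ratio: {ratio:.3f}")

# Working backwards: a 16 tok/s estimate for a hypothetical single large 3060
# corresponds to roughly this rate on a 3090 Ti under the same assumption.
implied_3090ti = scale_tok_per_sec(16.0, BW_3060_GBPS, BW_3090TI_GBPS)
print(f"implied 3090 Ti rate: {implied_3090ti:.1f} tok/s")
```

The ratio comes out to about 0.357, so a bandwidth-bound 3060 should manage a bit over a third of the 3090 Ti's rate.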