From this point of view, I don't understand the gap between the actual SOTA models in practice and the academic ones. The former have all been MoEs at this point, starting with GPT-4. But the open models, apart from DeepSeek V3 and Mixtral, are almost always dense.
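Just to make the dense-vs-MoE distinction concrete, here is a minimal sketch (not the implementation of any of the models mentioned above; the names `DenseFFN`, `MoEFFN`, and the expert/top-k counts are purely illustrative): a dense block runs every token through all of its parameters, while an MoE block routes each token to only a few experts, so the active parameter count per token is a fraction of the total.

```python
# Minimal illustrative sketch: dense FFN vs top-k routed MoE FFN.
# Hyperparameters and class names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """Standard transformer feed-forward block: every token uses all parameters."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class MoEFFN(nn.Module):
    """Sparse MoE block: a router sends each token to top_k of num_experts FFNs,
    so only a fraction of the parameters are active per token."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(DenseFFN(d_model, d_ff) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model); flatten batch/sequence dims before calling
        logits = self.router(x)                         # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = F.softmax(weights, dim=-1)            # normalize the chosen gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(DenseFFN(512, 2048)(tokens).shape, MoEFFN(512, 2048)(tokens).shape)
```

With 8 experts and top-2 routing, the MoE layer holds roughly 8x the FFN parameters of the dense layer but only runs about 2 experts' worth of compute per token, which is the trade-off the big closed models are apparently exploiting.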