> The obvious alternative to gradient descent here would be Bayes Formula
If you know a little about the math behind gradient descent, you can see that an embedding layer followed by a softmax layer gives you exactly the best Bayes estimate. If you want a bit of structure, such as every word depending on the previous n words, you get a convolutional RNN, which is also well studied. These ideas are natural and elegant, but it may be better to first get a grasp of the research that has already been done, to avoid diving into too many dead ends.
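(Not the original commenter's code, just a minimal sketch of the claim, assuming PyTorch and a made-up toy corpus of token ids.) On a bigram scale: an embedding table whose rows are next-token logits, followed by a softmax and trained with plain gradient descent, ends up matching the count-based conditional frequencies, which is roughly the maximum-likelihood / Bayes estimate referred to above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size = 5

# Toy corpus of token ids; consecutive pairs (t, t+1) are the training examples.
corpus = torch.randint(0, vocab_size, (10_000,))
inputs, targets = corpus[:-1], corpus[1:]

# Embedding dim == vocab size, so each looked-up row is a vector of next-token logits.
logits_table = nn.Embedding(vocab_size, vocab_size)
opt = torch.optim.SGD(logits_table.parameters(), lr=1.0)
loss_fn = nn.CrossEntropyLoss()  # softmax + negative log-likelihood

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(logits_table(inputs), targets)
    loss.backward()
    opt.step()

# Compare the learned conditionals with the count-based (empirical) conditionals.
learned = torch.softmax(logits_table.weight, dim=1)
counts = torch.zeros(vocab_size, vocab_size)
for prev, nxt in zip(inputs.tolist(), targets.tolist()):
    counts[prev, nxt] += 1
empirical = counts / counts.sum(dim=1, keepdim=True)
print((learned - empirical).abs().max())  # should be small after enough steps
```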
No, I don't "want a bit of structure" ... I want a predictive architecture that supports online learning. So far the only one I'm aware of is the cortex.
Not sure which approaches you're considering dead ends, but RNNs still have their place (e.g. Mamba), depending on what you're trying to achieve.