
The bigger problem is that the benchmarks / multiple-choice tests they are trained to optimize for don't distinguish between a wrong answer and "I don't know", so a guess never scores worse than admitting uncertainty. Which is stupid and surprising. There was a thread here on HN about this recently.
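To make the incentive concrete, here's a rough sketch of expected score per question under two grading rules. Everything in it is an illustrative assumption on my part (the `expected_score` helper, the 4-option question, the 1/3 penalty), not how any particular benchmark is actually graded:

```python
def expected_score(p_correct, wrong_penalty=0.0, abstain_score=0.0):
    """Return (guess, abstain) expected scores for a single question.

    p_correct     -- probability the guess is right
    wrong_penalty -- points subtracted for a wrong answer (hypothetical rule)
    abstain_score -- points awarded for answering "I don't know"
    """
    guess = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    return guess, abstain_score


# Blind guess on a 4-option multiple-choice question.
p = 0.25

# Accuracy-only grading: a wrong answer and "I don't know" both score 0.
plain_guess, plain_abstain = expected_score(p)

# Negative marking: a wrong answer costs 1/3 of a point.
penalized_guess, penalized_abstain = expected_score(p, wrong_penalty=1 / 3)

print("accuracy-only:", plain_guess, "vs abstain", plain_abstain)
print("with penalty: ", penalized_guess, "vs abstain", penalized_abstain)
```

With accuracy-only grading the blind guess has a positive expected score while abstaining gets zero, so optimizing for that metric favors always answering. With the assumed 1/3 penalty the blind guess comes out at roughly zero, so guessing no longer beats saying "I don't know".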

