The bigger problem is that the benchmarks / multiple-choice tests they are train...

		pegasus 61 days ago \| parent \| context \| favorite \| on: Reasoning models reason well, until they don't The bigger problem is that the benchmarks / multiple-choice tests they are trained to optimize for don't distinguish between a wrong answer and "I don't know". Which is stupid and surprising. There was a thread here on HN about this recently.