Trurl's machine, indeed. It insisted that the volumes of the unit cube and the unit ball are the same, namely 1, in all dimensions, even though it knew the correct formula for the surface area of the n-ball.
When I pointed out that n=2 is a simple counterexample, it refused to talk to me (no answer, try-again button, ad infinitum). Well, safer than Trurl's machine.
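For anyone who wants to check: the unit cube does have volume 1 in every dimension, but the unit n-ball's volume is pi^(n/2) / Gamma(n/2 + 1), which is only coincidentally close to 1 for some n. A few lines of Python make the counterexample concrete (`unit_ball_volume` is just a name for this sketch):

```python
import math

def unit_ball_volume(n):
    """Volume of the unit n-ball: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# n=2: the unit disk has area pi, not 1 -- the simple counterexample.
print(unit_ball_volume(2))   # pi ~ 3.14159
# The volume peaks around n=5 and then tends to 0 as n grows.
print(unit_ball_volume(5))
print(unit_ball_volume(20))
```

So not only is the claim wrong at n=2, the ball's volume doesn't even stay near 1: it rises to about 5.26 at n=5 and then shrinks toward zero.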
Don't even have to go that far. Just have it multiply two 3- or 4-digit numbers. It'll give an incorrect answer somewhere in the ballpark of the right answer.
You're asking a language model to do math. What's impressive there is not that it fails but that it comes up with an answer at all, especially if it is in the ballpark.
Most humans would do exactly the same unless given access to pen and paper or a calculator, and it would likely be trivial for GPT-3's input processing to detect that it has been presented with a math question and farm it out to a special calculation module. Once you start to augment its input like that, progress would be very rapid, but it would no longer be just a language model.
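The "farm it out" idea can be sketched in a few lines: pattern-match obvious arithmetic in the prompt and hand it to an exact evaluator, falling back to the language model otherwise. Everything here (the `route` function, the prompt pattern) is a hypothetical illustration, not anything GPT-3 actually does:

```python
import ast
import operator
import re

# Safe arithmetic evaluator: walk the AST instead of calling eval().
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr):
    """Evaluate a pure-arithmetic expression exactly."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not simple arithmetic")
    return walk(ast.parse(expr, mode="eval").body)

def route(prompt):
    """Send bare arithmetic to the calculator; return None to fall
    through to the language model for everything else."""
    m = re.fullmatch(r"\s*what is\s+([\d\s+\-*/().]+)\??\s*", prompt, re.I)
    if m:
        return safe_eval(m.group(1))
    return None

print(route("What is 1234 * 5678?"))  # exact, not "in the ballpark"
```

A real system would need far more robust question detection, but the point stands: the exact answer is cheap once the question is recognized as arithmetic.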
Well, everything is math at some level. Even Supreme Court decisions might be. There are software packages in use today, employing some "AI", that help judges determine the appropriate level of punishment by looking at circumstantial factors that predict recidivism rates, et cetera [1] [2].
I believe that in the not-too-distant future there will be pressure to apply these "magic" AIs everywhere, and this pressure will probably not look very hard at whether the AI is good at math or not. Just look at all the pseudoscience in the criminal justice system [3]. I believe this poses a real problem, so continuing to harp on this is probably the right response.