Your definition of reliability seems different from how most people use the word. I think most would consider a statically checked program that often produces a wrong result less reliable than a dynamically checked program that produces the right result.
>My argument is that in the totality of possible errors, statically typed programs have provably LESS errors and thus are definitionally MORE reliable than untyped programs. I am saying that there is ZERO argument here, and that it is mathematical fact. No amount of side stepping out of the bounds of the metric "reliability" will change that.
Making such broad statements about the real world with 100% confidence should already raise some eyebrows. Even through the lens of math and logic, it is unclear how to interpret your argument. Are you claiming that the sum of all possible errors across all runnable programs in a statically checked language is less than the sum of all possible errors across all runnable programs in an equivalent dynamically checked language? Both of those numbers are infinite; I remember from school that some infinities are greater than others, but I'm not sure how you would prove that here. And even if such a statement were true, how would it affect programs written in the real world?
Or is your claim that a randomly picked program from the set of all runnable statically checked programs is expected to have fewer errors than a randomly picked program from the set of all runnable dynamically checked programs? Even this statement doesn't seem trivial, because the type checker also rejects some correct programs.
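To make that concrete, here is a minimal sketch in TypeScript (my own illustration, not anything from your argument) of a program that never misbehaves at runtime yet is rejected until you restate a fact the checker cannot see:

```ts
// Illustrative only: for this input the parsed value is always a number,
// so the untyped version of this program can never fail at runtime.
const parsed: unknown = JSON.parse("41");

// A checker that cannot prove that runtime fact rejects the direct use:
// console.log(parsed + 1); // rejected: arithmetic on a value typed 'unknown'

// To satisfy the checker we have to restate what we already know:
if (typeof parsed === "number") {
  console.log(parsed + 1); // prints 42
}
```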
If your claim is about real-world programs being written, you also have to consider that their distribution among the set of all runnable programs is not random. Time, attention span, and other resources are often limited. Consider the act of twisting an already correct program in various ways to satisfy the type checker, and consider the time lost that could have been invested in further verifying the logic. The result will be much less clear cut, more probabilistic, more situation-dependent, and so on.
I think the disagreement here comes from overcomplicating what is actually a very simple claim.
I am not reasoning about infinities, cardinalities of infinite sets, or expectations over randomly sampled programs. None of that is needed. You do not need infinities to see that one set is smaller than another. You only need to show that one set contains everything the other does, plus more.
Forget “all possible programs” and forget randomness entirely. We only need to reason about possible runtime outcomes under identical conditions.
Take a language and hold everything constant except static type checking. Same runtime, same semantics, same memory model, same expressiveness. Now ask a very concrete question: what kinds of failures can occur at runtime?
In the dynamically typed variant, there exist programs that execute and then fail with a runtime type error. In the statically typed variant, those same programs are rejected before execution and therefore never produce that runtime failure. Meanwhile, any program that executes successfully in the statically typed variant also executes successfully in the dynamic one. Nothing new can fail in the static case with respect to type errors.
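Here is a minimal sketch of that asymmetry, using TypeScript as the statically typed variant and the same code with the annotation erased as the dynamic one (the function and the values below are just illustrative):

```ts
// In the dynamic variant (this code with the annotation erased), the call
// below runs and then fails with a runtime type error:
//   TypeError: name.toUpperCase is not a function
function shout(name: string) {
  return name.toUpperCase();
}

// The statically typed variant rejects that same program before it runs:
// shout(["alice", "bob"]); // rejected: string[] is not assignable to string

// And any call the checker accepts also runs fine in the dynamic variant:
console.log(shout("alice")); // "ALICE"
```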
That is enough. No infinities are involved. No counting is required. If System A allows a category of runtime failure that System B forbids entirely, then the set of possible runtime failure states in B is strictly smaller than in A. This is simple containment logic, not higher math.
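If it helps, the whole containment argument fits in one line (notation mine): write F_static and F_dynamic for the sets of runtime failures each variant can reach, and T for the runtime type errors.

```latex
% F_static ⊆ F_dynamic: anything the checked variant can do at runtime, the
% unchecked one can do too. The checked variant reaches no member of T, while
% the unchecked one reaches at least one, so the inclusion is strict.
\[
  F_{\mathrm{static}} \subseteq F_{\mathrm{dynamic}}, \qquad
  F_{\mathrm{static}} \cap T = \varnothing, \qquad
  \varnothing \neq T \subseteq F_{\mathrm{dynamic}}
  \;\Longrightarrow\;
  F_{\mathrm{static}} \subsetneq F_{\mathrm{dynamic}}
\]
```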
The “randomly picked program” framing is a red herring. It turns this into an empirical question about distributions, likelihoods, and developer behavior. But the claim is not about what is likely to happen in practice. It is about what can happen at all, given the language definition. The conclusion follows without measuring anything.
Similarly, arguments about time spent satisfying the type checker or opportunity cost shift the discussion to human workflow. Those may matter for productivity, but they are not properties of the language’s runtime behavior. Once you introduce them, you are no longer evaluating reliability under identical technical conditions.
On the definition of reliability: the specific word is not doing the work here. Once everything except typing is held constant, every other dimension is equal by assumption, so there is literally nothing else left to compare. What remains is exactly one difference: whether a class of runtime failures exists at all. At that point, reliability reduces to failure modes, not by preference or definition games, but because there is no other remaining axis. Everything else is identical; one variant can fail at runtime with type errors and the other cannot. Which one would you call more “reliable”? The answer is obvious.
So the claim is not that statically typed languages produce correct programs or better engineers. The claim is much narrower and much stronger: holding everything else fixed, static typing removes a class of runtime failures that dynamic typing allows. That statement does not rely on infinities, randomness, or empirical observation. It follows directly from what static typing is.