Respectfully, I disagree - the situation is far more complex in science than in software engineering disciplines.
I agree that different tests require different amounts of effort (obviously), but even the simplest "unit tests" you could conceive of for scientific domains are very complex, as there's no standard (or even unique) way to translate a scientific problem into a formally checkable system. Theories are frameworks within which experiments can be judged, but this is rarely unambiguous, and often requires a great deal of domain-specific knowledge - in analogy to programming it would be like the semantics of your language changing with every program you write. On the other hand, any programmer in a modern language can add useful tests to a codebase with (relatively) little effort.
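For contrast, here is the sort of low-effort software-side test I mean - a self-contained sketch (pytest assumed; the function and expected values are purely illustrative):

```python
# test_example.py - the kind of test any programmer can add in minutes (pytest).
# The function and expected values are illustrative, not from any real codebase.

def word_count(text: str) -> int:
    """Toy stand-in for real application code."""
    return len(text.split())

def test_word_count_basic():
    # The specification is unambiguous: one input, one expected output.
    assert word_count("testing is cheap here") == 4

def test_word_count_empty_string():
    assert word_count("") == 0
```

The semantics of that check never shift underneath you, which is exactly the luxury a scientific "unit test" doesn't have.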
We are talking hours versus months or even years here!
The experiment informs the ontology which informs the experiment. I don't think this is reducible to bias, although that certainly exists. Rather to me it's inherent uncertainty in the domain that experiments seek to address.
Business practice, as you use the term, evolved to serve very different needs. Automated testing is useful for building software, but that effort may be better spent in science developing new experiments and hypotheses. It's very much an open problem whether the juice is worth the squeeze - in fact the lack of such efforts is (weak) evidence that it might not be. Scientists are not stupid.
> We are talking hours versus months or even years here!
That is why there are engineers who specialize in structuring tests so they don't take so long. For example, long tests don't need to run at all if more critical short tests fail (see the sketch below). The problems you describe for astrophysics are not unique to astrophysics, even if the scale, size, and diversity of the data are unusual. Likewise, all the excuses I hear for avoiding testing are the very same excuses software developers make.
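A minimal sketch of that fail-fast ordering (pytest markers assumed; file and marker names are hypothetical):

```python
# run_tests.py - staged test execution: cheap checks gate the expensive ones.
# Marker names ("fast", "slow") are hypothetical; the pattern is what matters.
import subprocess
import sys

def run(args):
    """Run a pytest invocation and return its exit code."""
    return subprocess.call(["pytest", *args])

# Stage 1: fast, critical unit checks (seconds). Stop at the first failure.
if run(["-m", "fast", "--maxfail=1"]) != 0:
    print("Fast tests failed; skipping the slow end-to-end suite.")
    sys.exit(1)

# Stage 2: slow integration / full-pipeline tests (minutes to hours).
sys.exit(run(["-m", "slow"]))
```

The expensive suite never even starts unless the cheap, critical checks pass, so the long-running cost is only paid when it can actually tell you something.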
The reality is that these excuses are maybe 25% valid. On their face they are complete garbage, but the valid part is simple: a validation cannot occur faster than the underlying system allows. If the system being tested is remarkably slow, then any testing built on it will be, at best, just as remarkably slow. That is not a fault of the test or the testing; it is entirely the fault of that underlying system.
Uniqueness is not a requirement =) I still think you are over-generalising from experience in the software world. Science is about sense-making, finding parsimonious ontologies to describe the world. Software is about building reliable automation for various purposes.
They have orthogonal goals; why would you believe that automated testing would work the same way in both domains? I just don't see it.
Maybe you can elaborate on what you mean by automated testing of scientific hypotheses? I get the feeling we are talking past each other because we're both repeating the same points. Maybe we should focus on the 25% of excuses you've agreed are valid!
That’s because you haven’t tried. You only test what you know, what’s provable. The goal isn’t 100% validation of everything. The only goal is error identification. Surely you know something in science, like the distance to Proxima Centauri. Start with what you know.
Then when something new comes along the only goal is to see which of those known things are challenged.
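As a minimal sketch of that "start with what you know" idea - pinning a computed quantity against an accepted value with a tolerance (the parallax-to-distance relation is real physics; the tolerance and the idea of wiring this to your own pipeline's output are illustrative):

```python
# test_known_values.py - pin computed results against accepted reference values.
import pytest

def parallax_to_parsecs(parallax_arcsec: float) -> float:
    """Distance in parsecs from an annual parallax in arcseconds."""
    return 1.0 / parallax_arcsec

def test_proxima_centauri_distance():
    # Proxima Centauri: parallax ~0.7685 arcsec, accepted distance ~1.30 pc.
    distance_pc = parallax_to_parsecs(0.7685)
    assert distance_pc == pytest.approx(1.30, rel=0.01)
```

If a new measurement or a pipeline change pushes that number outside the tolerance, the test flags exactly which known thing is being challenged.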
Testing doesn’t buy certainty. It’s more like insurance: it lowers risk for a comparatively small investment of time. As with insurance, nobody has wild expectations that it will prevent a house fire, but it will prevent unexpected homelessness.