
It didn't. Hence my question, as that's the gold standard for science. How is this validated? How do they design the experiments when they do science on programming-related stuff?

I don't get the downvotes for a genuine question asked out of curiosity. I'm just interested in how science is done in this type of research.



Lots of excellent science is done without double-blind experiments. Not all experiments need to be double-blind. All of the Pluto flyby researchers know their data comes from Pluto.

You'll mostly see the phrase 'gold standard for science' in biomedical and social science fields where subjective evaluation plays, or may play, an important role.

While these studies are definitely in social science, they mostly stuck with objective measures which cannot easily be affected by an un-blinded experimenter.

Here's what a quick scan of some of the papers shows, a scan you could have done yourself.

The paper "Static type systems (sometimes) have a positive impact on the usability of undocumented software" uses randomly assigned test subjects and a measurement - time - which does not have a subjective component that would be affected by blinding the experimenter.

The same with "How Do API Documentation and Static Typing Affect API Usability?", which also looked at other objective parameters.

"An Experiment About Static and Dynamic Type Systems" also used time, as well as the ability to pass a set of test cases which were uniformly applied. They considered using code reviews, but the subjectiveness requires a large number of reviewers to hope to get an interpretable number. Had they gone the code review route, yes, I think the reviewers would also need to be blinded.

"Work In Progress: an Empirical Study of Static Typing in Ruby" was a pilot study which had no experimental numbers, and was mostly meant for hypothesis generation and working out kinks in the study protocol.

"Haskell vs. Ada vs. C++ vs. Awk vs. ... An Experiment in Software Prototyping Productivity" used LoC, development time, and subjective reviewers, noting the issues with subjective reviews, but curiously omitting the low statistical confidence. Blinding wouldn't have made a difference in the reviews as all programs but Haskell were written in a different languages. (Two were in Haskell.)

FWIW, double-blind studies are not immune to p-hacking; for that you need a preregistered study. I guess a preregistered triple-blind study would be the platinum standard?


Double blinds are not the gold standard for science. They are not even the gold standard for medicine, although for drug studies in particular they have great utility. They actually have quite limited applicability outside of a particular set of circumstances.

The double blind study is a special construct created to deal with the confounding effect of placebos, which really isn’t a thing outside of medicine.


So blinding is pretty much impossible for something like typing. The programmer can't be ignorant of whether the language is typed; it's a central aspect of using the language.

The reason blinding works so well in medicine is because you don't consciously interact with the medicine's mechanism. Your body does. If you had to understand the shape of the molecules in a pill, say, for it to work, you wouldn't be able to blind.


Yes, you can. A language may not have explicit type declarations, but still have machine-inferable types.

"Work In Progress: an Empirical Study of Static Typing in Ruby" gives an example.

> In this paper, we present an empirical pilot study of four skilled programmers as they develop programs in Ruby, a popular, dynamically typed, object-oriented scripting language. Our study compares programmer behavior under the standard Ruby interpreter versus using Diamondback Ruby (DRuby), which adds static type inference to Ruby. The aim of our study is to understand whether DRuby’s static typing is beneficial to programmers.
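To make that concrete, here's a minimal sketch of my own (not from the paper) of what machine-inferable types without declarations look like, using Python and mypy's real --check-untyped-defs flag. There are no annotations anywhere, yet the checker infers the types and flags the bug:

    # demo.py - no explicit type declarations anywhere.
    # `mypy --check-untyped-defs demo.py` still infers the types below
    # and reports the str/int mismatch on the return line.

    def total_length(words):
        n = 0                  # inferred: int
        for w in words:
            n += len(w)        # len() returns int, so n stays int
        return "total: " + n   # error: unsupported operand types for +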

I can imagine a similar experiment where both groups use Python, but one setup uses plain Python and the other uses Python+mypy to report type issues.

(The WIP paper points out that DRuby is a lot slower than Ruby, so users aren't blind to the effect. The Python experiment should probably run Python+mypy for both cases, but only report the mypy output for one case.)
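As a hedged sketch of how that setup could be wired up (all names here are my invention, not from any of the papers): a wrapper runs mypy for every participant so the overhead is identical, logs the diagnostics for later analysis, but only prints them for the treatment group.

    #!/usr/bin/env python3
    # Hypothetical experiment harness; EXPERIMENT_GROUP and the log
    # path are assumptions for illustration, not from the papers.
    import os
    import subprocess
    import sys

    GROUP = os.environ.get("EXPERIMENT_GROUP", "control")  # or "typed"
    LOG_PATH = "mypy_runs.log"

    def main():
        if len(sys.argv) != 2:
            print("usage: harness.py program.py", file=sys.stderr)
            return 2
        target = sys.argv[1]
        # Run the type checker for BOTH groups so any slowdown is equal.
        check = subprocess.run(
            ["mypy", "--check-untyped-defs", target],
            capture_output=True, text=True,
        )
        with open(LOG_PATH, "a") as log:  # keep diagnostics for analysis
            log.write(f"group={GROUP} file={target}\n{check.stdout}\n")
        if GROUP == "typed":  # only the treatment group sees the output
            print(check.stdout, end="")
        # Both groups actually run their program as usual.
        return subprocess.run([sys.executable, target]).returncode

    if __name__ == "__main__":
        sys.exit(main())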


Getting better diagnostics from secretly running a type checker as a "linter" is not the same as writing a program in a language you know is typed, and in which you design the types first before constructing the program.

I'm sorry but this seems uncontroversial to me, I don't think this negates the point I was making at all.


The linked-to literature review considers it a relevant paper. Take up your controversial issue with Dan Luu.

> not the same as writing a program in a language you know is typed, and in which you design the types first before constructing the program

You know that's not the only way to work with static types, right?

Furthermore, the same DRuby paper hypothesizes "that programmers have ways of reasoning about types that compensate for the lack of static type information". It suggests that programmers are designing with types first, in their head, even in languages without static types.

So I think you meant to write something more like "express" or "implement" the types first, not "design."


> I don't get the downvotes for a genuine question asked out of curiosity.

It doesn't come across this way. It comes across as "this isn't legitimate because they aren't looking at studies that do double-blind experimentation."


Well, that's an uncharitable reading of it, imo. Anyhow, I explained my question in a follow-up (can no longer edit). Basically I'm curious about how to do proper science on things like these.


It was the same reading I had.

"How are they doing double blinded experiments on typing?" comes across as assuming double blinded experiments are the only acceptable way to a valid answer.

Why can't single-blinded, or even unblinded, give useful answers?

That is, why do you think any answer to your question would make a difference as to the quality of the results?

Epidemiology is an important field of science, even when based on observational and not experimental science. The answer to "How are they doing double blinded experiments on observational studies" is "they aren't."

Yet it's still good science.


Some areas of science use double blind studies a lot, others don’t. I assume you won’t get mad if I don’t use a double blind study when investigating a black hole, right? We use different types of studies for different types of science.

Double blind studies work for areas where the experimental subject can't tell the difference between the different treatments - for example, whether a drug relieves pain. A double blind study for static/dynamic typing is definitionally impossible, since the experimental subject must know whether the language is static or dynamic, and they come into the experiment with biases about that.

Now, you could do something like a double blind study, where the participants are assigned to program in a language without knowing what they’re testing. I hope that those types of studies are done! But that’s a different thing.



