Surely those seemingly smart anonymous reviewers now feel pretty dumb in hindsight.
Peer review does not work for new ideas, because no one ever has the time or bandwidth to spend hours upon hours upon hours trying to understand new things.
It's worth pointing out that most of the best science happened before peer review was dominant.
There's an article I came across a while back, which I can't easily find now, that basically mapped out the history of our current peer review system. Peer review as we know it today was largely born in the 1970s as a response to several funding crises in academia. It was a strategy to make research appear more credible.
The most damning critique of peer review, of course, is that it completely failed to stop (and arguably aided) the reproducibility crisis. We have an academic system where the prime motivation is to secure funding through the image of credibility, which from first principles is a recipe for widespread fraud.
>It's worth pointing out that most of the best science happened before peer review was dominant.
It's worth pointing out that most of everything happened before peer review was dominant. Given how many advances we've made in the past 50 years, I'm not sure everyone would agree with your statement. If they did, they'd probably also agree that most of the worst science happened before peer review was dominant, too.
Our advances in the last 50 years have largely been in engineering, not science. You could probably take a random physics professor from 1970 and they'd not sweat too much trying to teach physics at the graduate level today.
But a biology professor from that time period would have a lot of catching up to do, perhaps too much, especially (but not only) if any part of their work touched molecular biology or genetics.
But there is zero good reason why peer review hasn't long since been extended to include:
- accessing and verifying the datasets (via some tamper-proof mechanism with an audit trail; a rough sketch of what that could look like follows this list). Ditto the code. This would have detected the Francesca Gino and Dan Ariely alleged frauds, and many others. It's much easier in domains like behavioral psychology, where the datasets are spreadsheets well under 1 MB rather than GB or TB.
- spot-checking a sample of papers for reproducibility; you can't verify all submissions, but you sure could verify most accepted papers, or at least the top 1,000 most-cited new papers each year in each field. This would prevent the worst excesses.
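To make the first point concrete, here is a minimal sketch of a tamper-evident audit trail (my own illustration with made-up file names, not any existing journal system): hash the submitted dataset and analysis code, and chain each log entry to the previous one so any later edit is detectable.

```python
# Minimal sketch (illustration only): record a tamper-evident audit trail for a
# submission's dataset and code by hashing the files and chaining each log
# entry to the previous one, git-style.
import hashlib
import json
import time
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Stream a file through SHA-256 so large datasets don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def append_audit_entry(log_path: Path, artifact: Path, prev_hash: str = "") -> str:
    """Append a log entry whose own hash covers the artifact hash and the previous entry."""
    entry = {
        "artifact": str(artifact),
        "artifact_sha256": sha256_of_file(artifact),
        "timestamp": time.time(),
        "prev_entry_sha256": prev_hash,
    }
    entry_hash = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    with log_path.open("a") as log:
        log.write(json.dumps({"entry": entry, "entry_sha256": entry_hash}) + "\n")
    return entry_hash

# Usage (hypothetical file names): hash the dataset and analysis code at
# submission time; editing either file, or an earlier log line, breaks the chain.
# prev = append_audit_entry(Path("audit.log"), Path("study1_data.csv"))
# prev = append_audit_entry(Path("audit.log"), Path("analysis.py"), prev)
```

A reviewer, or a replication auditor years later, would only need to re-hash the files and replay the log to confirm nothing was swapped after acceptance.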
PS: a superb overview video [0] by Pete Judo, "6 Ways Scientists Fake Their Data" (p-hacking, data peeking, variable manipulation, hypothesis shopping and selective sampling, selective reporting, and questionable outlier treatment), based on the article [1]. As Judo frequently remarks, there should also be much more formal incentive for publishing replication studies and negative results.
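As a toy illustration of one item on that list (my own simulation, not taken from the video or the article), here is how "data peeking" / optional stopping inflates the false-positive rate even when there is no real effect at all:

```python
# Both groups are pure noise, yet checking the p-value after every batch and
# stopping as soon as p < 0.05 produces "significant" results far more than
# 5% of the time.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def peeking_run(max_n=200, batch=10, alpha=0.05) -> bool:
    a, b = [], []
    while len(a) < max_n:
        a.extend(rng.normal(size=batch))
        b.extend(rng.normal(size=batch))   # same distribution: no true effect
        if ttest_ind(a, b).pvalue < alpha:
            return True                    # "significant" -- stop and publish
    return False

false_positive_rate = np.mean([peeking_run() for _ in range(1000)])
print(f"False positive rate with peeking: {false_positive_rate:.0%}")  # well above 5%
```

Pre-registered sample sizes (or corrections for sequential looks) are exactly the sort of thing a verification-oriented review step could check mechanically.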
It seems kind of obvious that peer review is going to reward peer think, peer citation, and academic incremental advance. Obviously that's not how innovation works.
The system, as flawed as it is, is very effective for its purpose. See, e.g., "success is 10% inspiration and 90% perspiration". On the darker side, the purpose is not to be fair to any particular individual, or even to be conducive to human flourishing at large.
I finished a PhD in AI just this past year, and I can assure you there exist reviewers who spend hours per review to do it well. It's true that these days you can (and are more likely than not to) get unlucky with lazier reviewers, but that does not appear to have been the case with this paper.
For example, just see this from the review by reviewer f5bf:
"The main contribution of the paper comprises two new NLM architectures that facilitate training on massive data sets. The first model, CBOW, is essentially a standard feed-forward NLM without the intermediate projection layer (but with weight sharing + averaging before applying the non-linearity in the hidden layer). The second model, skip-gram, comprises a collection of simple feed-forward nets that predict the presence of a preceding or succeeding word from the current word. The models are trained on a massive Google News corpus, and tested on a semantic and syntactic question-answering task. The results of these experiments look promising.
...
(2) The description of the models that are developed is very minimal, making it hard to determine how different they are from, e.g., the models presented in [15]. It would be very helpful if the authors included some graphical representations and/or more mathematical details of their models. Given that the authors still almost have one page left, and that they use a lot of space for the (frankly, somewhat superfluous) equations for the number of parameters of each model, this should not be a problem."
These reviews in turn led to significant (though apparently not significant enough) modifications to the paper (https://openreview.net/forum?id=idpCdOWtqXd60&noteId=C8Vn84f...). These were quality reviews, and the paper benefited from going through this review process, IMHO.
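For anyone who hasn't worked with the two models the review summarizes, here is a minimal toy sketch of the CBOW and skip-gram objectives (my own illustration, not the paper's implementation, which relied on hierarchical softmax / negative sampling and a huge corpus):

```python
import torch
import torch.nn as nn

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
ids = [stoi[w] for w in corpus]
V, dim, window = len(vocab), 16, 2

class CBOW(nn.Module):
    """Predict the centre word from the average of its context embeddings."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, dim)
        self.out = nn.Linear(dim, V)
    def forward(self, context):              # context: (batch, 2*window)
        return self.out(self.emb(context).mean(dim=1))

class SkipGram(nn.Module):
    """Predict each surrounding word from the centre word alone."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, dim)
        self.out = nn.Linear(dim, V)
    def forward(self, centre):                # centre: (batch,)
        return self.out(self.emb(centre))

# Build (context, centre) training pairs and fit CBOW with plain cross-entropy.
# (SkipGram would be trained analogously, predicting each context word from the centre.)
pairs = [(ids[i - window:i] + ids[i + 1:i + window + 1], ids[i])
         for i in range(window, len(ids) - window)]
contexts = torch.tensor([c for c, _ in pairs])
centres = torch.tensor([t for _, t in pairs])

model, loss_fn = CBOW(), nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(contexts), centres).backward()
    opt.step()
```

Note how thin each model is: one embedding lookup and one linear layer. Architecturally there is very little here; as others in this thread point out, the value came from simplicity, scale, and the released vectors.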
I have been deeply unimpressed with the ML conference track this last year... There are too many papers and too few reviewers, leading to an insane number of PhD-student reviewers. We've gotten some real nonsense reviews, with some real sins against the spirit of science baked into them.
For example, a reviewer essentially insisting that nothing is worth publishing if it doesn't include a new architecture idea and SOTA results... God forbid we better understand and simplify the tools that already exist!
This is not the takeaway I got. The takeaway I got was that the review process improved the paper and made it more rigorous. How is that a bad thing? But yes, sometimes reviewers focus on other issues instead of 'is this going to revolutionize A, B, and C'.
I currently have a paper under review (first round) that was submitted on the 2nd of August. This is at the second journal; the first submission was a few months before that.
I'm not sure peer review makes things more rigorous, but it surely makes them slower.
The issue here wasn't that the reviewers couldn't handle a new idea. They were all very familiar with word embeddings and ways to make them. There weren't a lot of new concepts in word2vec; what distinguished it was that it was simple, fast, and good quality. The software and pretrained vectors were easy to access and use compared to existing methods.
Peer review isn't about the validity of your findings, and the reviewers are not tasked with evaluating them. The point is to be a light filter that makes sure a published paper has the necessary information and rigor for someone else to try to replicate your experiment or build off of your findings. Replication and follow-up work are the processes for evaluating the correctness of the findings.