Sometimes I wonder how much better leetcode interviews are at picking the best candidate than a random number generator would be (obviously after doing all the other interview stuff besides the leetcode).
This is a great question; more generally, what is the expected improvement in quality of hire conditional on a given selection method?
The famous Hunter & Schmidt meta-analysis suggests that only work-sample tests and IQ tests work well enough to use, but it's an old set of studies, so it's hard to know if it still applies.
In the updated paper, work sample tests do not come out as important as they were in the original:
It appears that almost all of the validity of work sample tests is captured by GMA measures, because the incremental validity is essentially zero.
But work sample tests are a great mechanism when GMA cannot be used for legal or other reasons. The paper concludes with:
The research evidence summarized in this article shows that different methods and combinations of methods have very different validities for predicting future job performance. Some, such as person-job fit, person-organization fit, and amount of education, have low validity. Others, such as graphology, have essentially no validity; they are equivalent to hiring randomly. Still others, such as GMA tests and integrity tests, have high validity. Of the combinations of predictors examined, two stand out as being both practical to use for most hiring and as having high composite validity: the combination of a GMA test and an integrity test (composite validity of .78); and the combination of a GMA test and a structured interview (composite validity of .76).
Well, the problem is that you can't actually have a "realistic" work problem in the space of an interview. Given that constraint, I think it's a reasonable approximation.
Sure you can. At least, if our baseline for "realism" is "an approximation of what someone will do on the job".
I'd say asking someone to regurgitate some solution for a random leetcode problem is far worse an approximation than asking someone to write, say, a little toy API that does nothing more than retrieve a value out of a set.
See how well they're able to develop in a language of their choosing. Can they get started immediately or do they stumble putting together the first little building blocks?
Treat it like a "real-world example" and make that clear up front. Do they think about logging and metrics? (For the purposes of a toy interview problem, just writing to stdout for both would be sufficient.) Do they think about dependency injection? What about unit tests?
Then follow it up by asking them to modify a bit of their logic. ("okay, we've got it returning a matching value from the set if it exists - what if we wanted to add in wildcard support at the end of the incoming string?").
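To make that concrete, here's a minimal sketch of what such a toy exercise could look like. I've used Python, but the candidate would pick their own language; the ValueStore/lookup names and the trailing-'*' wildcard semantics are illustrative assumptions, not anyone's actual interview question:

    # Minimal sketch of the toy "retrieve a value from a set" exercise.
    # Names (ValueStore, lookup) are illustrative only.
    import logging

    logging.basicConfig(level=logging.INFO)  # stand-in for real logging/metrics
    log = logging.getLogger("toy-api")


    class ValueStore:
        """Tiny in-memory store handed to callers (dependency injection point)."""

        def __init__(self, values):
            self._values = set(values)

        def lookup(self, key):
            """Return the key if it is present in the set, else None."""
            log.info("lookup key=%s", key)
            if key in self._values:
                return key
            return None

        def lookup_wildcard(self, pattern):
            """Follow-up: support a trailing '*' wildcard on the incoming string."""
            if pattern.endswith("*"):
                prefix = pattern[:-1]
                matches = [v for v in self._values if v.startswith(prefix)]
                log.info("wildcard lookup pattern=%s matches=%d", pattern, len(matches))
                return matches
            exact = self.lookup(pattern)
            return [exact] if exact is not None else []


    if __name__ == "__main__":
        store = ValueStore({"alpha", "alpine", "beta"})
        print(store.lookup("alpha"))          # -> "alpha"
        print(store.lookup_wildcard("alp*"))  # -> ["alpha", "alpine"] (set order may vary)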
Tons of very real things to consider, even in the constraints of a simple toy problem.
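And on the unit-test question above, a couple of hypothetical pytest-style tests against that sketch (assuming it was saved as toy_api.py, which is my naming, not the thread's) would be enough to show a candidate's instincts:

    # Hypothetical tests for the sketch above; assumes it lives in toy_api.py.
    from toy_api import ValueStore


    def test_exact_lookup_hit_and_miss():
        store = ValueStore({"alpha", "beta"})
        assert store.lookup("alpha") == "alpha"
        assert store.lookup("gamma") is None


    def test_trailing_wildcard_matches_prefix():
        store = ValueStore({"alpha", "alpine", "beta"})
        assert sorted(store.lookup_wildcard("alp*")) == ["alpha", "alpine"]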
As someone who works in Big Tech, I would much rather have people on my team who think about maintainability, debuggability, monitoring, what can go wrong, etc. etc. (and have shown during an interview they're capable of writing some trivial business logic around that) than someone who absolutely nailed mirroring a binary tree and solving the longest common sub-sequence problem.
Have you ever actually been asked to implement a sorting algorithm or balance a tree from memory? I've done a lot of whiteboard interviews, including at some of the Big Ns, and I can't say I ever experienced this, despite those exercises being used as a kind of metonym for whiteboarding.
It was mostly a joke; I'm a DS, so I get arbitrary take-homes rather than leetcode.
The more general point is that the algorithmic approaches from leetcode problems have little relation to what most programmers do all day, and as such are less useful as a work-sample test.
Doing a take-home where you fix some bugs would probably work better (i.e., be more correlated with outcomes) than leetcode interviews.
Well, I think this is an important point, because the tests, IME, ask you to apply the concepts to solve a toy problem, not to actually implement stuff like sorting algorithms from memory. The latter is indeed unrealistic, but the former is something my job actually does entail.