Another way to frame it is that these models still perform very poorly at the task they're designed to do. Imagine if a real programmer needed to write a solution a hundred times before they achieved (average) performance. You'd probably wonder whether it was just blind luck that got them to the solution. You'd also fire them. What these models are very good at is plagiarizing content, so part of me wonders if they aren't just copying previous solutions with slight adjustments.
> Imagine if a real programmer needed to write a solution a hundred times
To be fair, a lot of creative work requires plenty of trial and error. And since no problem is solved from scratch, you and the most immediate contributors to your result might, all things considered, have iterated through tens of dozens of possibilities.
My advantage as a human is that I can often tell you why I'm eliminating a given branch of the search space. The catch is that my reasoning can be flawed. But we do ok.
> just copying previous solutions with slight adjustments.
It's not just doing that; Copilot can do a workable job of providing suggestions for an invented DSL. A better analogy than autocomplete is inpainting: filling in missing or corrupted details based on the surrounding context. Except instead of a painting, we are probabilistically filling in patterns common to solutions of leetcode-style problems. Novelty beyond slight adjustments comes in when the constraints are insufficient to pin the problem down to a known combination of concepts. The intelligence of the model is then how appropriate its best guesses are.
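As a toy sketch of that framing (not how Copilot itself works, and the model here is an arbitrary choice for illustration): a masked language model scores candidate tokens for a blank using only the surrounding context, which is the token-level version of inpainting.

```python
# Toy illustration of "inpainting" a gap from context.
# roberta-base is an arbitrary masked language model picked for the sketch;
# a code-trained model would make more appropriate guesses.
from transformers import pipeline

infill = pipeline("fill-mask", model="roberta-base")

# The surrounding code constrains what is plausible at the masked position,
# and the model fills the gap probabilistically from that context.
snippet = "def add(a, b):\n    return a <mask> b"
for candidate in infill(snippet, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```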
The limitations of GPT-3 Codex and AlphaCode seem to be that they're relatively weak at selection, and that they require problem spaces with enough data to both distill a sketch of the space and learn how to inpaint well within it. Leetcode-style puzzles are constructed to be soluble in a reasonable number of lines, are not open ended, and have a trick to them. One can complain that while we're closer to real-world utility, we're still restricted to the closed worlds of verbose APIs, games, and puzzles.
While lots of commenters seem concerned about jobs, I look forward to having the dataset oliphaunt and ship computer from A Fire Upon the Deep someday soon.
>> While lots of commenters seem concerned about jobs, I look forward to having the dataset oliphaunt and ship computer from A Fire Upon the Deep someday soon.
I think this is more worthy of debate than anything about DSL models or current limits to problem spaces.
I'm not concerned about my job, but I am concerned about a world where corporate money starts shifting toward managing AIs as beasts rather than coding clever solutions. I'm concerned about it because (1) it has always been possible, in theory, to invent an infinite number of solutions and, given enough processing power, narrow them down to those that "work", but this leaves us in a position where we don't understand the code we're running (as a society) or know how to fix it (as individuals). And (2) because learning to manage an elephant, as a beast, is utterly different from learning to build one, and it will lead to a dumbing-down of people entering the trade. In turn, they'll become more reliant on things just working the way they're expected to work. This is a very negative cycle for humanity as a whole.
Given the thing you're looking forward to, it's only about 30 years before no one can write code at all; worse, no one will know how to fix a broken machine. I don't think that's the thing we should advocate for.
"Understanding the code" might not be that big of a deal as you might think -- we have this problem today already. A talented coder might leave the company and the employer may not be able to hire a replacement who's as good. Now they have to deal with some magic in the codebase. I don't hear people giving advice not to hire smart people.
At least with AI, you can (presumably) replicate the results if you re-run everything from the same state.
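For instance, here is a minimal sketch of that kind of replay, assuming a Hugging Face text-generation pipeline run on CPU with fixed library versions; the model and prompt are arbitrary choices, and the only point is that restoring the same random state reproduces the same samples.

```python
# Minimal reproducibility sketch; gpt2 and the prompt are arbitrary choices.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")

def generate(prompt: str, seed: int = 42) -> str:
    set_seed(seed)  # restore the same random state before every run
    return generator(prompt, max_length=40, do_sample=True)[0]["generated_text"]

# Same state in, same text out.
assert generate("def fizzbuzz(n):") == generate("def fizzbuzz(n):")
```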
There's also a very interesting paragraph in the paper (I'm in no position to judge whether it's valid or not) that touches on this subject, but with a positive twist:
> Interpretability. One major advantage of code generation models is that code itself is relatively interpretable. Understanding the behavior of neural networks is challenging, but the code that code generation models output is human readable and can be analysed by traditional methods (and is therefore easier to trust). Proving a sorting algorithm is correct is usually easier than proving a network will sort numbers correctly in all cases. Interpretability makes code generation safer for real-world environments and for fairer machine learning. We can examine code written by a human-readable code generation system for bias, and understand the decisions it makes.
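To make that concrete with a toy sketch (the candidate function below is hypothetical, standing in for model output): because the artifact is ordinary source code, it can be scrutinized with completely conventional techniques, such as randomized testing against a trusted oracle, with no interpretability tooling involved.

```python
import random

# Hypothetical candidate emitted by a code generation model.
# The paper's point is that this artifact is plain, readable source code.
def candidate_sort(xs):
    out = list(xs)
    for i in range(len(out)):
        for j in range(len(out) - 1 - i):
            if out[j] > out[j + 1]:
                out[j], out[j + 1] = out[j + 1], out[j]
    return out

# Conventional, model-free scrutiny: randomized testing against a trusted oracle.
for _ in range(1000):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert candidate_sort(xs) == sorted(xs)
print("candidate passed 1000 randomized checks")
```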
> Now they have to deal with some magic in the codebase. I don't hear people giving advice not to hire smart people.
People do advise against hiring people who write incomprehensible code.
Yeah, every now and then you run across some genius with sloppy code style, and you have to confine them to a module you'll mark "you're not expected to understand this" when they leave, because they really are that much of a genius. But usually the smart people are smart enough to write readable code.
> The limitations of GPT-3 Codex and AlphaCode seem to be that they're relatively weak at selection
This really does seem like the key here -- the knowledge apparently is all in the language model; we just haven't found the best ways to extract it in a consistent and coherent manner. Right now it's just: generate a bunch of candidates and cherry-pick the good ones.
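Something like the sketch below, where `sample_program` and `run_program` are hypothetical stand-ins for a call to a code model and a sandboxed runner; filtering the samples against the problem's example I/O is the entire selection step.

```python
from typing import Callable, Sequence

def select_candidates(
    sample_program: Callable[[], str],        # hypothetical: one model call, returns source text
    run_program: Callable[[str, str], str],   # hypothetical: runs source on stdin in a sandbox, returns stdout
    examples: Sequence[tuple[str, str]],      # the problem's example (stdin, expected stdout) pairs
    num_samples: int = 100,
) -> list[str]:
    """Generate many candidates, keep only those that pass every example test."""
    survivors = []
    for _ in range(num_samples):
        src = sample_program()
        if all(run_program(src, stdin) == expected for stdin, expected in examples):
            survivors.append(src)
    return survivors
```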
How do you know the inner workings of the mind don't operate in a similar manner? How many different solutions to the problem are constructed within your mind before the correct one 'just arrives'?
I suspect there is some similarity between language models and the structure of language in the mind, but there's a whole lot more going on behind the scenes in the brain than simple runtime statistical model output. Intentionality, planning, narrativity, memory formation, object permanence... Language models are exciting and interesting because apparently they can do abstract symbolic manipulation and produce coherent text, but I wouldn't call AGI solved quite yet.
I was really impressed with a lot of the GPT-3 output I'd seen people showing, so I gave it a spin myself. I was surprised by how repetitive it seemed to be: it would write new sentences, but it would repeat the same concepts across similar prompts. I wish I had saved the examples; it was like when a chat bot gets stuck in a loop, except GPT-3 varied the sentence structure. I think that if you look closely at transformer model outputs you can expect the same sort of thing. It's like in high school when people would copy homework but use different wording.
I also think that, generally in ML and DL, the overarching progress gets hyped while, in the background, there are murmurs in the research community about the limitations. That's how we end up with people in 2012 saying FSD is a couple of years away, while in 2022 we know we aren't even close yet. We tend to oversell how capable these systems are.
I'd be shocked if people pitching startups, research grants, etc. all started saying "yeah, this stuff isn't going to work in any kind of sustainable manner for a couple of decades", even if these types of unknowable unknowns were known.