It kind of is expected, right? If a 70B model can have great overall performance, a 1B model focused on coding in a single language could well be comparable within that niche.
I am actually hoping we see more per-language models soon, though obviously a model can't be as "smart" if it's trained on only a single language.