Yes, except that I'm not so sure there is a clear distinction between following general instructions and generating new heuristics. It's just a difference in the level of abstraction, and probably not even a discrete one; more like a continuum.
(Current) models may of course lack sufficient training data to operate at a meta level ("be creative problem solvers"), or they may lack deep enough representations to act efficiently in a more creative way. (And those two may or may not be more or less the same thing.)
Up to a point, general instructions can be generated from a collection of specific examples by abstracting over what differs between them, but it is not clear to me that abstraction is all you need to come up with novel methods.
This seems consistent with the main points of this paper: a single to-the-point statement of the answer to a factual question is all you need [1], whereas, if no example of a chain of reasoning uses exactly the parameters given in the prompt, more than one example will be needed.
The authors write, "We falsify the hypothesis that the correlations are caused by the fact that the reasoning questions are superficially similar to each other, by using a set of control queries that are also superficially similar but do not require any reasoning and repeating the entire experiment. For the control queries we mostly do not observe a correlation." In the examples of control queries they give, however, this just amounts to embedding the specific answer to the question asked in language that resembles an example of reasoning to a solution (and in the first example, there is very little of the latter). The result, in such cases, is that there is much less correlation than with genuine examples of reasoning to a solution, but it is not yet clear to me how this fact justifies the claim quoted at the start of this paragraph: if the training set contains the answer stated as a fact, is it surprising that the LLM treats it as such?
[1] One caveat: if the answer to a factual question is widely disputed within the training data, there will likely be many to-the-point statements presented as the one correct answer (or, much less likely I think, a general agreement that no definitive answer can be given). The examples given in figure 1 are not like this, however, and it would be interesting to know whether the significance of individual documents extends to such cases.
Not "exactly" how we learn. Humans learn through a combination of reinforcement learning (which is costly/risky/painful) and observation of existing patterns and norms.
Better observation-based learning is a less expensive way to improve on existing corpus-based approaches than trial and error or direct participation in an environment.
Except that careful observation comes late in the curriculum. Children don't learn if you start them out with the Stern-Gerlach experiment; they sing the ABCs.
The parent of any young child can tell you that children learn through lots of exploration and reinforcement, often to the worry and chagrin of caregivers. Indeed, much of our job is to guide that exploration away from excessively dangerous "research" activities (e.g., by locking away cleaning products).