>Visual sharpness at the expense of wider-scale coherence (see: sliding/floating walking woman in Tokyo demo or tiny people next to giant people in Lagos demo)
Wider-scale coherence is still much better than in previous models and has been improving consistently. It's not "visual sharpness at the expense of coherence". At worst, the models are learning wider-scale coherence more slowly.
Not everything is equally difficult to learn, so it follows that some aspects will lag behind others. If coherence weren't improving you might have a point, but it is, so...
Scaling laws operate in the limit, but eventually practical considerations dominate. There's a lot about biological vision and cognition (and indeed about common sense as it applies to sensible video generation and processing) that we haven't yet fully appreciated and that hasn't made its way into this kind of model. NeRFs are interesting and I hope to see more from that side of things in the coming months and years.
Yes, and in that time we've learned some important lessons that it would be unwise to ignore, e.g. that comprehension of 3D geometry can emerge despite purely 2D visual input.