Hacker News

Any example of that? One would think that predicting what comes next from an image is basically video generation, which doesn't work perfectly but does work to some degree (Veo/Sora/Grok).


Here's one I made in Veo 3.1, since Gemini is the only premium AI I have access to.

Using this image - https://www.whimsicalwidgets.com/wp-content/uploads/2023/07/... and the prompt: "Generate a video demonstrating what will happen when a ball rolls down the top left ramp in this scene."

You'll see that it struggles (https://streamable.com/5doxh2), which is often the case with video gen. You have to describe the scene carefully and orchestrate natural-feeling motion and interactions.

You're welcome to try with any other models but I suspect very similar results.


A Rube Goldberg machine was not part of their training data. We humans, on the other hand, have seen such things.


Physics textbooks are, though, so it should know how they'd work, or at least know that balls don't spontaneously appear and disappear, and that gears don't work when they aren't connected.


I love how it still copies the slow pan and zoom from Rube Goldberg machine videos, but it's just following along with utter nonsense lol


It is video generation, but succeeding at this task involves detailed reasoning about cause and effect to construct chains of events, and may not be something that can be readily completed by applying "intuitions" gained from "watching" lots of typical movies, where most of the events are stereotypical.



