
I do a test on multimodal LLMs where I show them a dog with 5 legs, and ask them to count how many legs the dog has. So far none of them can do it. They all say "4 legs".

Segment Anything, however, was able to segment all 5 dog legs when prompted to, which means that Meta is doing something else under the hood here, and it may lend itself to a very powerful future LLM.

Right now some of the biggest complaints people have with LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.



Segmentation doesn't need to count legs. I'd guess something like YOLO could segment 5-legged dogs too.


YOLO is not a segmentation model.



Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.


I thought it was a joke about YAML


Lol you obviously haven't seen what cheats for FPS games look like in the last 3 years.

https://github.com/Babyhamsta/Aimmy


You don’t need segmentation to count legs. Object detection can do that. DeepLabCut from 2020 perhaps.


I doubt that Gemini 3 cannot do it.



