Well it's good to see they are showcasing examples where the model really fails too.

- The second one in case 2 doesn't look anything like the reference map

- The face in case 5 changes completely despite the model being instructed to not do that

- Case 8 ignores the provided pose reference

- Case 9 changes the car positions

- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is

- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much

- Same with case 29, and the text that is readable doesn't relate to the parts of the image it is referencing

- Case 33 just generated a generic football ground

- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)

- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much

Super nice to see how honest they are about the capabilities!



> - Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is

> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much

16 makes it seem like it can "do text" — almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".

I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.

(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)


EDIT: Yeah, on closer inspection, 28 is definitely a bit screwy. I wasn't clicking on the images themselves to view the enlarged ones, and from the preview I didn't see anything that immediately jumped out at me. I have no idea what that line at the bottom is meant to represent!

Also, you're right, I didn't notice the scroll had gone. On another inspection, it has also removed the original prompter's watermark.


In Case 16 (diagram of the heart), every single label (aside from the superior vena cava) is incorrect.


Yeah, I appreciate this kind of benchmarking too. That other Gen AI Showdown in the comments also does a good job with this - mentions that it was best of 8 attempts and so on.


47 is also very questionable

48 is impossible to do in a way that is accurate and meaningful



