> - Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
16 makes it seem like it can "do text" — almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".
I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.
(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)
EDIT: Yeah, on closer inspection, 28 is definitely a bit screwy. I wasn't clicking on the images themselves to view the enlarged ones, and from the preview I didn't see anything that immediately jumped out at me. I have no idea what that line at the bottom is meant to represent!
Also, you're right, I didn't notice the scroll had gone. On another look, though, it has also removed the original prompter's watermark.
Yeah, I appreciate this kind of benchmarking too. That other Gen AI Showdown in the comments also does a good job with this - mentions that it was best of 8 attempts and so on.
- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed to not do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
- Same with case 29, and the text that is readable doesn't relate to the parts of the image it points to
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much
Super nice to see how honest they are about the capabilities!