not adhering to the prompt guide is def a valid strong criticism. resampling i think less so for the demo just because fewer people look at k samples per model, so just taking literally the first one has the fewest of my own biases injected into it
I actually think it's ok to inject your own bias here: if you're deploying these models in production, then you probably test on your own domain other than half eaten burritos lol
But individual users usually iterate/pick, so just sharing a blurb about your preference is probably enough if you choose 1 of n