> Are you saying this with the personal experience of being from a country that now speaks the language of its colonizer?
And to be clear, the US is excluded from this. Our cultural memory of our colonial history is an outlier—for most Americans our sense of our relationship with Britain is more that of friendly rivals than colonizer-colonized. The difference is largely because most of us are descended from the colonists (or people who arrived much later), not from the people that were there first, so the abuses that our ancestors suffered barely even register on the scale of colonial abuse.
That contrasts sharply with how the Irish or most Africans feel towards their former colonial powers. It's hard to feel positively towards a flag that represents a power that repeatedly committed genocide against your people.
Cascadia is a perfectly non-fictional name for the region; the Republic of Cascadia Department of Cephalopod Conservation, OTOH, is an extremely fictional agency of an equally fictional government.
> You would ask an LLM based assistant what it can do
But this has the same problem that it's trying to solve in the first place: the LLM's behavior is unpredictable, and that includes its answers to questions like this. There's no guarantee that it won't hallucinate.
Maybe this can be ameliorated by giving it access to some hard-coded and highly vetted list of capabilities?
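A minimal sketch of that idea, assuming a generic chat API (`ask_llm` below is a placeholder, not any particular library's call, and the capability list is invented for illustration): the vetted list is injected into the system prompt, and the same list can be reused outside the model to refuse anything it isn't actually allowed to do.

```python
# Sketch: answer "what can you do?" from a vetted list rather than the model's own claims.
# `ask_llm` is a stand-in for whatever chat-completion API is in use; capabilities are made up.

VETTED_CAPABILITIES = [
    "create and edit calendar events",
    "search your own uploaded documents",
    "draft emails for your review (never sends them)",
]

def build_system_prompt() -> str:
    bullet_list = "\n".join(f"- {c}" for c in VETTED_CAPABILITIES)
    return (
        "You are an assistant with exactly these capabilities and no others:\n"
        f"{bullet_list}\n"
        "If asked about or to do anything not on this list, say you cannot do it."
    )

def ask_llm(messages: list[dict]) -> str:
    raise NotImplementedError("call your chat-completion API of choice here")

def answer(user_message: str) -> str:
    messages = [
        {"role": "system", "content": build_system_prompt()},
        {"role": "user", "content": user_message},
    ]
    return ask_llm(messages)
```

The prompt alone doesn't remove the hallucination risk, which is the objection above; the list only really helps if the surrounding code also refuses to act on anything outside it.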
It's still not quite what the experiment is designed to test. How many people who opt out when given the ability would stick around when not given the ability?
If they make opting out difficult (i.e. you go to a page in settings that you have to link to directly, because you're "signed out" in the main app) then that wouldn't skew the results that significantly, while still providing an escape hatch that isn't "wait for this to be finished".
I'm struggling to imagine a mental model of the LLM for which that would make sense. A human who's willing to lie a little, but comes clean when called out? A robot that mostly doesn't make mistakes, and is more likely to catch its own mistakes than to make new ones?
For example, I tried asking ChatGPT about cartoons from my childhood. I wrote "What was that cartoon in the 1980s that was based on some kind of gummy candy?" and it correctly identified "The Adventures of the Gummi Bears". I wrote "Sing the theme song for me" and it produced the song, but missing the first verse. I wrote "That is missing the first verse!" and it produced the whole correct song.
On the other hand, when I asked it to describe the instrumental 90s X-Men theme song, it told me:
'...the lyrics are epic and uplifting, with lines like "We're the X-Men, we're the best there is at what we do." The song also has a sense of urgency and danger, with lines like "We're fighting for our lives" and "The mutant race will survive"...'
When I replied "The X-Men theme song doesn't have lyrics," it readily accepted the correction, but unlike getting back the missing first verse, I wasn't really getting any verifiable information out of making the correction.
And of course it was happy to tell me about a nonexistent Gummi Bears / Rescue Rangers crossover episode.
Honestly, at least when it comes to code in widely used languages, GPT-4 is very good at catching mistakes in its generated output after a simple request for a second check. In most cases there isn't even a need to spell out the specific issue, concern, or error in the provided code.
This does make sense: beyond what has been typed, most of these models have no memory implemented, so revisions are currently the only game in town for getting more accurate results.
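In practical terms, that "second check" is just one more turn appended to the same transcript and re-sent in full. A rough sketch of the pattern, again with `ask_llm` standing in for whatever chat API is in use (nothing here is a specific vendor's method):

```python
# Sketch: a "double-check" pass is just the old transcript plus one more request.
# The model has no memory outside of `history`; it re-reads everything each turn.

def ask_llm(history: list[dict]) -> str:
    raise NotImplementedError("placeholder for the chat-completion call")

def generate_then_review(task: str) -> str:
    history = [{"role": "user", "content": task}]
    draft = ask_llm(history)
    history.append({"role": "assistant", "content": draft})

    # The review request goes in as an ordinary user turn; nothing about the
    # first answer is "remembered" except what is sitting in the transcript.
    history.append({"role": "user", "content": "Please double-check the code above for bugs."})
    return ask_llm(history)
```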
What surprised me most about this entire situation is that the model insisted on being correct. Normally ChatGPT has been set up to be a bit cautious and more likely to admit to having made a mistake, to the point where, if you ask in a direct manner like this lawyer did, the model may claim to have been incorrect even when the output was actually correct, in my experience. Bing's implementation of the same underlying model, meanwhile, can be so forceful in trying to convince users that its output is correct, even when provided with online resources that show the opposite, that it would not be unreasonable to feel gaslit by that LLM.
The rest of this situation was not very surprising, and I have to admit I am happy that this was caught right away. Lawyers' actions have a massive impact on countless people every day; if this had not become such a public scandal right away, perhaps a lot of defendants would have suffered from improper representation due to reliance on imperfect models.
My layman understanding is that it's not grounded. GPTs are schizophrenic, but unlike a schizophrenic man, who is delusional and failing to stay bound to the reality he is in, GPT is actually up in the air.
That, and it's just a language model: an approximation not of the world, nor of a body of knowledge, but of English, and not an answering machine at all.
This reminds me of a story, which I believe refers to Pascal's demonstration of his (newly invented, entirely mechanical) calculator to the Royal Society. He showed that pressing certain levers in the correct order selects the numbers you want to operate on, certain other levers choose the operation you want, and then, by cranking the calculator, you can read off the answer to your operation on your numbers. World's first calculator! Someone asked: if you press the levers wrong, do you still get the right answer?
Their mental model is "magic" and they don't understand the details of how magic works, because it's magic.
This is not an unreasonable mental model. You can ask ChatGPT to give you a program, ask it "is this correct?" and it'll find and fix bugs. To a layperson it looks like it is capable of double checking its work and finding an error. Why would it be any different here?
(the answer, of course, being that the LLM doesn't actually search the internet and/or doesn't have access to a law database it can query)
GP means the mental model that the lawyer had of ChatGPT: why did he think that he could check ChatGPT's work by just asking it "hey are you sure about that"?
I love how you make up this fake scenario in your head where the guy is threatening the woman with his power. Yes, the scenario where a powerful guy is threatening the little girl is unsexy. Good thing the only brain conflating the article and the discussion thread with this little world is the one that created it.
The guy was on an island with a horny 17yo who was all over him. He wasn't threatening anyone. Now maybe the owner of the island was - and that guy went to jail and died.
Now the real question is, why would you run over a puppy with your car? You see, if running the puppy over with your car is wrong, then I'm right.
Someone's actual preference is stored in their memory. If you give the LLM access to a database that includes "preferences", I'm sure it will be able to retrieve that.
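A rough sketch of what that could look like, assuming a plain SQLite table of stored preferences (the table name and schema here are invented for illustration): the retrieved rows are pasted into the prompt, so the model "knows" the preference only because it is sitting in the context window.

```python
import sqlite3

# Sketch: look up stored preferences and paste them into the prompt.
# The "preferences" table and its schema are invented for illustration.

def load_preferences(db_path: str, user_id: str) -> list[str]:
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT preference FROM preferences WHERE user_id = ?", (user_id,)
        ).fetchall()
    return [r[0] for r in rows]

def build_prompt(db_path: str, user_id: str, question: str) -> str:
    prefs = load_preferences(db_path, user_id)
    prefs_block = "\n".join(f"- {p}" for p in prefs) or "- (none recorded)"
    return (
        "Known preferences for this user:\n"
        f"{prefs_block}\n\n"
        f"User question: {question}"
    )
```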
This doesn't seem obvious to me. I don't think the word "English" is likely to confuse any English speaker.
> anyone offended by this needs to grow a tougher skin
Are you saying this with the personal experience of being from a country that now speaks the language of its colonizer?