> Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, what value does OpenAI have?
What you probably get is an LLM that can perfectly understand well-written text as you might find on Wikipedia, but which would struggle severely with colloquial language of the kind found on Reddit and Twitter.
> then it should give some of that value back to the content.
That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
> That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
How is OpenAI compensating the owners of IP they trained their models on? Or is that not what you mean? It's certainly how I read the part of the GP comment you quoted.
So far, looks like funding a UBI study. As the IP owners are approximately "everyone" in law, UBI is kinda the only way to compensate all the IP owners.
So the researchers, shareholders, and leadership of OpenAI will be happy to give up being ridiculously wealthy so they can be only moderately wealthy, and everyone else gets a basic income?
I'm also just skeptical of UBI in general, I suppose - 'free' money tends to just inflate everything to account for it, and it still won't address scarcity issues for limited physical assets like land/property.
This is a load of bullshit and I sincerely hope you know that as well as I do.
As a thought experiment, let's say I pirate enough ebooks to stock a virtual library roughly equivalent in scope to a large metropolitan library system, then put up a website where you can download these books for free. I make money on the ads I run on this website, etc. This is theft, but as "compensation" I put some percentage of my revenues into funding a UBI study that might, if we're lucky—in half a century or so, in a progressive, enlightened version of the future we are by no means guaranteed to realize—make a fractional contribution to the thrust of a successful UBI movement.
Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?
That to me is a joke, and the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of. We will see what our lawmakers and courts make of it now, but either way, making a promise to pay me back later does not justify you in taking all the cash out of my wallet without my consent. Nor, for that matter, does tearing it up and returning it to me in the form of a papier-mâché sculpture of Shrek's head protruding from the bowl of a "skibidi toilet".
UBI is great! But taking away artists' livelihoods and justifying it with an uncertain promise of UBI years or (far more likely) decades in the future is (hypothetically) a particularly malignant example of putting the cart before the horse.
My observation of the vast majority of people who think intellectual property law is moral anathema to some idealized notion of "freedom" is that they either know nothing about making art themselves, or at the very least they are dilettantes who don't rely on it for any substantial portion of their livelihood. And, as it turns out, the latter group very rarely produce anything of note. Why is that? Because making interesting art is hard—just making art the average person would be willing to spend more than a second or two thinking about is hard enough, let alone art they'd be willing to pay for. It's hard, and it takes time and incredible effort and focus.
By taking (in whatever sense) an artist's work for less than it's worth, you're damaging their ability to extract value from their time investment in art, and so forcing them to choose between their artistic endeavors and other profitless aspects of their lives (e.g. their families). Even if you think that's your right (which I absolutely, categorically disagree with), your theft will have a very real effect of stunting young artists' development and reducing their output, and therefore cultural impact, in aggregate.
The sort of (frankly) idiots who think AI art is "giving power back to the people", or whatever, probably don't care about that because they don't understand art, or its impact on people and culture, at all. All I can say to them is that they have an incredibly juvenile outlook on the whole subject, and that instead of whining about artists gatekeeping art they should spend $10 on a pad of paper and a box of pencils, or $0 on a Google Doc, and go make some fucking art for themselves. Nobody's stopping them, and if they put in enough work maybe they'll start to understand some things.
Or, if you think UBI fixes everything, go ahead and start sending out checks (with a robust guarantee that they will keep coming). I'm sure that will change the tenor of the conversation.
> Or, if you think UBI fixes everything, go ahead and start sending out checks (with a robust guarantee that they will keep coming). I'm sure that will change the tenor of the conversation.
So far as I can tell, the capacity to do that is called "communism". Specifically "fully automated luxury communism". Why? Because governments are the only entities capable of making such a robust guarantee, and they can only do it without hyperinflation if they own the means of production… which in this case would be the AI.
Now, convincing a government to take an AI into public ownership and use its economic productivity to fund a UBI, that's going to be hard work, and will need evidence to show which of the many options within this very ill-defined and barely researched topic will work vs. which are mere wishful thinking.
This research is what's currently being funded by the actual entity you're most visibly upset with (OpenAI).
And that research may well also determine the answer is "no", at which point we have to come up with something else.
Saying "if you want that why don't you fund it yourself?" was a retort against government-funded pensions back in the Victorian era, which I mention because pensions are an age-gated UBI[0]. It's never sufficient to find volunteers for stuff of this scale in a democracy, because "sufficient" is on the same scale as becoming the government and changing the laws.
[0] thus not literally universal, but that's always going to be a sticking point — one open question is if "UBI" should apply to just "all adults" or be the same rate also for minors
You are rationalizing an unsupported prior ("OpenAI and its ilk are right to keep doing what they're doing") with a science fiction premise in which what they're doing is a clear path to so-called "fully automated luxury gay space communism". I don't find this kind of thinking to be particularly well-founded in reality, especially when it's used as a justification for causing harm.
IMO that's a terrible thought experiment given the situation.
LLMs do not store enough content, or store it with enough accuracy, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), and the latter is straight up an archive of all the sites it crawls.
Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.
> Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?
Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and then still deprives everyone of all publishing opportunities forever because, in this thought experiment, it's better and cheaper than any human novelist, is that any more, or any less, fair on literally any person on the planet? The outcomes are the same.
I'm sure someone's already thought of making such a model; it's just a question of whether they raised enough money to train it.
> OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of
You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said:
"""Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model."""
Indeed, they are still mocked for suggesting their models may carry any risk at all. Of any kind. There are plenty of people who want to rush forward with this and think OpenAI are needlessly slow and cautious.
You may also have noticed their CEO gave testimony in the US Congress, and that the people asking him questions were surprised he said (to paraphrase) "regulate us specifically — not the open source models, they're not good enough yet — us".
To the extent that any GenAI can pose an economic threat to a creative job, it has to be better than a human in that same job. For now, IMO, they're assistant-level, not economic-threat-level. And when they get to economic-threat-level (which in fairness could be next month or next year), they'll be that threat even if none of your IP ever entered their training runs.
> LLMs do not store enough content, or store it with enough accuracy, to come even close to being a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it's presenting to you as search results (and they got sued for that in certain categories of result), and the latter is straight up an archive of all the sites it crawls.
I already addressed this: "the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of."
You are certainly welcome to disagree with what I've said, but you can't simply pretend I didn't say it.
> Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.
A quick Googling suggests that OpenAI employees are not working for free—far from it, in fact. In this frame I don't particularly care whether the organization itself is nominally "non profit", because profit motives are obviously present all the same.
> Here's a different question for you: If a generative AI is trained only on out-of-copyright novels and open-licensed modern works, and then still deprives everyone of all publishing opportunities forever because, in this thought experiment, it's better and cheaper than any human novelist, is that any more, or any less, fair on literally any person on the planet? The outcomes are the same.
They are certainly welcome to try! Given how profoundly incapable extant genAI systems are of generating novel (no pun intended) output, including but not limited to developing artistic styles of their own, I think it would be quite funny to see these companies try to outcompete human artists with AI-generated slop 70+ years behind the curve of art and culture. As for modern "public domain"-ish (openly licensed) content, if genAI companies actually decided to respect intellectual property rights, I expect those licenses would quickly be amended to prohibit use in AI training.
AI systems will probably get there eventually, though it's very difficult to predict when. However, that speculation does not justify theft today.
> I'm sure someone's already thought of making such a model; it's just a question of whether they raised enough money to train it.
People are absolutely throwing money at genAI right now, so if nobody has thrown enough money at this particular idea to give it a fair shake then the obvious conclusion is that people who know genAI think it's a relatively bad one. I'm inclined to agree with them.
> You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said [...]
Why is this relevant? I'm not talking about AI safety or "X risk" or whatever—I'm talking about straightforward intellectual property theft, which OpenAI and their contemporaries are obviously very comfortable with. The models they sell to anybody willing to pay today could literally not exist without their training datasets.
That... doesn't seem sufficient, or legal, or (if legal) ethical. You can't just "compensate" people for using their copyrighted works via whatever means you've decided is fair.
I think funding UBI studies and lobbying for that sort of thing is a public good, but is entirely unrelated to -- and does not make up for -- wholesale copyright infringement.
I take no position at all about legality; whether scraping is or is not legal, and whether LLMs are or are not "fair use", is quite beyond my limited grasp of international copyright law.
But moral/ethical, and/or sufficient compensation?
IMO (and it is just opinion), the damage GenAI does to IP value happens when, and only when, an AI is good enough to make some human unemployable, and that happens to all novelists (for example) at around the same time, even those whose IP was explicitly excluded from training the model. So, twist question: is it fair to pay a UBI to people who refuse to contribute to some future AI that does end up making us all redundant? (My answer is "yes, it's fair to pay everyone even if they contributed nothing; this is a terrible place for schadenfreude").
Conversely, mediocre regurgitation of half-remembered patterns that mimic the style of a famous author cannot cause any more harm when done by AI than when done by fan fiction.
Right now these models are pretty bad at creative fiction but pretty good at writing code, so I expect this to impact us before novelists, despite the flood of mediocre AI books reported in various places.
Other damage can happen independently of IP value damage, like fully automated propaganda, but that seems like it's not a place where compensation would go to a copyright holder in the first place.
Sufficient compensation? If AI works out, nobody will have any economic advantage, and UBI is the only fair and ethical option I am yet aware of for that.
It's what happens between here and there that's messy.
> As the IP owners are approximately "everyone" in law
That makes no sense. If I write a book by myself, post part of it on my website, and OpenAI ingests part of it - how does that make anyone besides me, myself, and I an "owner" of the IP?
I don't understand why you're confused, but I think it's linguistics.
If you write a book by yourself and post parts on your website and they ingest it, you are the copyright holder of that specific IP, and when I post this specific comment to Hacker News I am the copyright holder of this specific IP.
In aggregate you and I together are the copyright holders of that book sample and this post, and I don't know any other way of formulating that sentence, though it sounds like you think I'm trying to claim ownership of your hypothetical book while also giving you IP ownership of this post? But that's not my intent.
I don't think you're trying to claim ownership. It sounded like you were suggesting that the only recourse for OpenAI would be to fund a UBI program as a form of payment instead of directly paying the people who own the IP it ingested?
They ingested the entirety of the internet. Everything anyone has ever written that is online, including our (implicitly copyrighted) HN comments and letters written 400 years ago, was used to train GPT-4.
Yes, I'm saying that because there's (currently) no way to even tell how much the model was improved by my comments on HN vs. an equal number of tokens that came from e.g. nytimes.com; furthermore, to the extent that it is even capable of causing economic losses to IP holders, I think this necessarily requires the model to be actually good and not just a bad mimic[0] or prone to whimsy[1], and that this economic damage will occur equally to all IP holders regardless of whether or not their IP was used in training. For both of these reasons independently, I currently think UBI is the only possible fair outcome.
[0] I find the phrase "stochastic parrot" to be ironic, as people repeat it mindlessly and with a distribution that could easily be described by a Markov model.
[1] if the model is asked to produce something in the style of NYT, but everyone knows it may randomly insert a nonsense statement about President Trump's first visit to the Moon, that's not detracting from the value of buying a copy of the newspaper.
So because it's "difficult" for ChatGPT to pay people for what it ingested we need to change our entire economic model to accommodate their inability (I'd argue it's the lack of will) to solve this problem?
Imagine a scenario where your employer decides that it's going to go plant trees to save the environment, a laudable goal, in lieu of your paycheck. Their excuse would be "it's too difficult to process payroll and easier to plant trees. Since planting trees is good for the environment by the transitive property it's good for the employee. Thus, they should be happy."
> So because it's "difficult" for ChatGPT to pay people for what it ingested we need to change our entire economic model to accommodate their inability (I'd argue it's the lack of will) to solve this problem?
It's not merely "difficult". So far as I know, the only way to do it even in theory (and I'm about to explain why it is sufficiently hard as to be impossible in practice, even though it may not sound like it before you get to that explanation) is to try retraining the model from scratch with certain subsets of the training data removed.
Unfortunately, while you can do this for any single source, you absolutely cannot (owing to the computational limits of the universe) do it for all 2^(2*10^9) subsets of the people on the internet who produced some training data (even a mere 2^(765.5) is too large to *count* within the theoretical computational limits of the universe). And those subsets are necessary, because inside the model all these sources interact with each other, so you also can't (accurately) just assume that whatever Alice's contribution was, it was independent of whatever Bob's contribution was. As a toy model, pretend Alice explained algebra and Bob explained geometry: there's a lot you can do with either one, but even more you can do with both.
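To put rough numbers on that combinatorial explosion, here's a back-of-the-envelope sketch (my own illustration; the contributor counts are assumptions for the sake of the example, not anything OpenAI has published):

    # Attributing value per contributor by "retrain from scratch with subsets
    # removed" needs one training run per subset of contributors: 2^n runs
    # for n contributors. The n values below are illustrative assumptions.
    from math import log10

    def log10_retrains(n_contributors: int) -> float:
        """log10 of the number of retraining runs (2^n) this would take."""
        return n_contributors * log10(2)

    for n in (10, 40, 766, 2_000_000_000):
        print(f"{n:>13,} contributors -> ~10^{log10_retrains(n):,.0f} retraining runs")

    # 10 contributors -> ~10^3 runs (already costly at GPT scale);
    # 2,000,000,000 contributors -> ~10^602,059,991 runs, which no amount
    # of compute will ever cover.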
As for "changing our entire economic model": if/when AI is good enough to make everyone permanently unemployable, what else would you suggest?
Even in the lead-in to that if/when conditional, I think we need something like UBI when there's an AI that's "only" making everyone of IQ≤85 permanently unemployable; similar economic changes, but with different potential solutions, are also needed for "only" truck drivers (as a category, I'm not making an IQ claim about them), or "only" the traditional handloom weavers[0], as in all cases the alternative is they choose between rioting and starving.
And in case you're wondering, I think the image generators may put artists in the second group (a specific role in society that is quite understandably upset about being suddenly automated out of a perfectly respectable career), while LLMs look like they might be the former (even if the question of exactly what this whole "IQ" thing means anyway is surprisingly difficult to answer).
> Imagine a scenario where your employer decides that it's going to go plant trees to save the environment, a laudable goal, in lieu of your paycheck. Their excuse would be "it's too difficult to process payroll and easier to plant trees. Since planting trees is good for the environment by the transitive property it's good for the employee. Thus, they should be happy."
Bad example. For one, as a software developer my job is to automate myself into redundancy. For another, I am saying that *all actual harm* from GenAI has to be balanced by someone else making *the same economic output*, which in the environment/trees example would have to be "my employer plants 2,400 trees per minute to make up for not paying anyone" (according to Wikipedia)… or possibly "my employer cut down literally all the trees on the planet and has decided to make up for this by planting replacements, which means they no longer have any money to pay anyone", which would leave me very confused about my financial status but I'd probably ask the government to step in to make sure they didn't go under before finishing putting the trees back?
"That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter. "
that sounds like insane bullshit to me. they're trained on the whole internet. there's no way they give back to the whole of the internet; more likely, a lot of jobs will be taken away by their work.
If they make a model that's good enough to actually take away jobs rather than merely making everyone more productive — is it a tool or a human-level AI, I'm not sure either way, though I lean toward the latter — the only possible compensation is UBI… which they're funding research into.
> plus, there was no consent.
I agree with you about this. That's a different problem, but I do agree.
It’s not a matter of “is it good enough to replace humans”: despite all of us here knowing it’s not, we could list many companies (and even industries) where it’s already happening.
That comment is self-contradicting. If it's already replacing humans, then economically speaking (which is what matters for economic harm), it's good enough to replace those specific humans.
The reason I'm not sure how much this tech really is at the level of replacing humans in the workplace is that there's always a lot of loud excitement about new tech changing the world, and a lot of noise and confounding variables in employment levels.
But if it is actually replacing them, then it must be at the level of those employees in the ways that matter.
“In the ways that matter”, and the only way that matters for a lot of employers is what is cheaper.
This maybe isn’t strictly related to the topic of this post or conversation, but a lot of companies have been replacing most, or even all, support channels with AI assistants.
No, it isn’t good enough to replace those humans in the sense most would consider essential (actually helping the customers who reach the support line), but businesses find it “good enough” in the sense that it’s cheaper than human workers, and the additional cost of unhappy customers is small enough for it to still be worth it.
I would agree with you that what counts as "good enough" is kinda hard to quantify (which itself leads into the whole free-market vs. state-owned business discourse from 1848 to 1991), but I do mean specifically from the PoV of "does it make your jobs go away?"
Although now I realise I should be even more precise, as I mean "you singular and jobs plural for now and into the future" while my previous words may reasonably be understood as "you plural and each job is just your current one".
> What you probably get is an LLM that can perfectly understand well-written text as you might find on Wikipedia, but which would struggle severely with colloquial language of the kind found on Reddit and Twitter.
Great. Let's do that then. No good reason to volunteer it for a lobotomy.
Plus side: a smaller, more focused model with a lower rate of falsehoods.
Minus side: it kant reed txt ritten liek this, buuuut dat only matters wen nrml uzers akchuly wanna computr dat wrks wif dis style o ritin lol
I suspect there is a lot of value in the latter, and while I don't expect it to be as much as the value in the former, I wouldn't want to gamble very much either way.