>> If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content.
But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it. If one doesn't want anyone to see/consume or be influenced by one's copyrighted work, then lock it in a box and don't show it to anyone.
I have some, but diminishing, sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's works out there, I'd bet I can find some preexisting human-made art that looks similar to your stuff too.
This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before. Visual art may be approaching a similar point. No matter how original you think your drawings are, someone out there has already done something similar. They may not have created exactly the same image, but neither does AI literally copy images. That reality won't kill the visual arts, just as it didn't kill off the fashion industry.
I firmly believe that training models qualifies as fair use. I think it falls under research, and is used to push the scientific community forward.
I also firmly believe that commercializing models built on top of copyrighted works (which is what all works start out as) does not qualify as fair use (or at least shouldn't), and that commercializing models built on copyrighted material is nothing more than license laundering. Companies that commercialize copyrighted work in this manner should be paying for a license to train on the data, or should stick to using the licenses that the content was released under.
I don't think your example is valid either. The reason AI models generate content similar to other people's work is that those models were explicitly trained to do that. That is literally what they are and how they work. That is very different from people having similar styles.
> I firmly believe that training models qualifies as fair use
There's a hell of a lot of money to be made from this belief, so of course the HN crowd will hold it.
Some of us here who have been around the copyright hustle a little longer laugh at this bitterly and pray that the courts and/or Doctorow's activism save us. But there's so much money to be made from automated plagiarism, and the forces against it are so weak, that there's not much hope.
The world will be a much, much poorer place once all the artists this view exploits stop making art because they need to make a living.
I literally met and worked with Doctorow on a protest back in 2005, so I'm not exactly new to this. I also think that the only way you could have written your comment was by grossly misinterpreting my comment.
No, I have to disagree here. I'm not an artist, but I respect the creations of others. OpenAI does not. They could have trained on free data, but they did not want to because it would cost more (paying humans to find and vet said data, etc.).
> Sure. It's just a tool. That needs other people's art to work.
So does a human brain.
Which brings us to the other side of the reasoning: tools like Midjourney and OpenAI enable idiots (when it comes to drawing/animating... that includes me) to create engaging artwork.
Recently, generating artwork like that went from almost impossible to "great, but easily recognizable as AI artwork". Frankly, I expect the discussion will end when it stops being recognizable.
I hate Andreessen Horowitz's reasoning, but they're right about one thing: once we have virtual artists that are not easy to distinguish from "real" ones, the discussion will end. It does not really matter what anyone's opinion on the matter is, as it will not make a difference in the end.
A major difference between a human training himself by looking at art and a computer doing it is that the human ends up working for himself, while the computer is owned by some billionaire.
One enhances the creative potential of humanity as a whole, the other consolidates it in the hands of the already-powerful.
Another major difference is that a human can't use that knowledge to mass-produce that art at a scale that will put other artists in the poorhouse. The computer can.
Copyright exists to benefit humanity as a whole... And frankly, I see no reason why a neural network's output should be protected by copyright. Only humans can produce copyrightable works, and a prompt is not a sufficient creative input.
No, the beneficiaries of generative AI are the users because they set the prompt and use the outputs. Providers make cents per million tokens. It is a people empowerment technology.
Visual artists cannot create without tools, whether that tool is a brush and paint, a camera, or a neural network.
Whether an artist pays for a subscription to OpenAI or buys paint pots on Amazon.com, money is going to a billionaire; that is not a difference between AI and other art.
You are also ignoring the existence of non-commercial open-source AI models; they exist.
Regarding copyright, we copyright output not input. Otherwise most photography would be uncopyrightable.
There's a substantive difference in whether the artist is using the tool, or the tool works on its own. A paintbrush doesn't produce a painting by itself, a human needs to apply an incredibly specialized creative skillset, in conjunction with the paintbrush to do so.
An LLM takes a prompt and produces a painting. No sane person would say that I 'drew' the painting in question, even if I provided the prompt.
> Regarding copyright, we copyright output not input. Otherwise most photography would be uncopyrightable.
We copyright things that require creative input. A list of facts or definitions does not require creative input, and is therefore not copyrightable.
Using an LLM does not meet the bar for creative input.
> There's a substantive difference in whether the artist is using the tool, or the tool works on its own. A paintbrush doesn't produce a painting by itself, a human needs to apply an incredibly specialized creative skillset, in conjunction with the paintbrush to do so.
That sounds like a kinder restatement of the opinion at the top of the thread: "Artists are mad because this tool empowers other people, who they view as less talented, to make art too."
Artists might not like the phrasing, but scratch the surface and there's a degree of truth there. It's an argument from self-interest, at core.
One small nitpick: It is completely possible for an artist to make all of their own tools, and indeed for the majority of history that is exactly how things went.
But today, the artist who can also create a robust version of Photoshop on their own doesn't really exist. Maybe some can write code at that level, but certainly not a majority, and it's certainly not the same as sanding wood to make a paintbrush.
> I see no reason why a neural network's output should be protected by copyright. Only humans can produce copyrightable works, and a prompt is not a sufficient creative input.
If you graduated from school and only used work that was public domain, would you have all the knowledge you currently have? Have you learned anything from anybody since graduating?
Where is the line? It's ok for humans to learn from others' work but not a machine?
It is NOT okay for a machine's owner to profit from that learning, unless the machine's owner compensates the owners of the training data. That is where the line should be drawn.
If I read a lot of stories in a certain genre that I like, and I later write my own story, it’s almost by definition going to be a mish-mash of everything I like.
Should I pay the authors of the books I read when I sell mine?
We shouldn't hold individual humans and ML models to the same standards, because ML models themselves are products capable of mass production and individual humans are not even remotely at the same scale.
If you write that book, chances are you will gain some fans that are also fans of other authors in that genre.
If ML models write in that genre, they can flood it so thoroughly that human artists won't be able to compete.
I feel like the issue here is that you are giving AIs agency.
AIs are not magic. They are tools. They are not alive, they do not have agency. They do not do things by themselves. Humans do things, some humans use AI to do those things. Agency always rests with a combination of the tool's creator and operator, never the tool itself.
Is there really a difference between a human flooding the market using AI and a human flooding the market using a printing press?
Even if humans can't compete (an obviously untrue premise from my perspective, but let's assume it for the sake of argument), is that a bad thing? The human endeavor is not meant to be a make-work project. Humans should not be forced to pointlessly toil out of protectionism when they could be turning their attention to something that can't be automated.
>Is there really a difference between a human flooding the market using AI and a human flooding the market using a printing press?
A magnitude of difference, yes. Even a printing press will be limited by natural resources, which require humans to procure.
A computer server can do a lot more with a lot less. And is much easier to scale than a printing press.
>Even if humans can't compete (an obviously untrue premise from my perspective, but let's assume it for the sake of argument), is that a bad thing?
When the AI can be argued to be stealing humans' work, yes. A printing press didn't need to copy Shakespeare to be useful. And it would have benefited Shakespeare anyway, because more people got to read his works.
So far I don't see how AI benefits artists. Optimistically:
- An artist can make their own market? Doubtful; they will be outgunned by SEO-optimized ads from corporations.
- They can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x the labor and somehow keep the same prices, they aren't living well. But in reality, they will get less business, as people will AI-generate their "good enough" art and probably won't pay as much for something not fully hand-drawn.
- Okay, they can make bigger commissions? There's a drama about spending 50k on a 3-minute AMV; imagine if that could be done by a single artist in a day now!... Well, give it another 10 years. A lot of gen AI is static assets. Rigging and animating are still far from acceptable quality, and a much harder problem space. I also wouldn't be surprised if by then AI models have their own phase of enshittification and you end up blowing hundreds or thousands anyway.
-----
>Humans should not be forced to pointlessly toil out of protectionism when they could be turning their attention to something that can't be automated.
Until someone conceptualizes a proper UBI scheme, pointlessly toiling is how most of the non-elite live. I have yet to hear of a real alternative for these displaced artists to move towards.
So what? So we all just become managers in meetings in 30 years?
> A magnitude of difference, yes. Even a printing press will be limited by natural resources, which require humans to procure.
A computer server can do a lot more with a lot less. And is much easier to scale than a printing press.
AI runs on some of the most power-hungry and expensive silicon on the planet. Comparing a GPU cluster to a printing press and then stating that the GPU cluster is not limited by natural resources is just silly. Where do the materials to make the processors come from?
> When the AI can be argued to be stealing humans' work, yes. A printing press didn't need to copy Shakespeare to be useful. And it would have benefited Shakespeare anyway, because more people got to read his works.
The same can be true for AI as well. I could see a picture and then ask AI whose style it is. Then I could go look up more work by that artist, increasing their visibility.
> - They can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x the labor and somehow keep the same prices, they aren't living well. But in reality, they will get less business, as people will AI-generate their "good enough" art and probably won't pay as much for something not fully hand-drawn.
Is this a complaint that something got cheaper to make? This affects more than just artists. For instance, the quality of code output from LLMs is quite high. So wages across the board will decrease, yet capabilities will increase. This is a problem external to AI.
> Until someone conceptualizes a proper UBI scheme, pointlessly toiling is how most of the non-elite live. I have yet to hear of a real alternative for these displaced artists to move towards.
Again, this is not just artists, and the path forward is the same as it's always been with technological advancements: increase your skill level to above the median created by the new technology.
>Comparing a GPU cluster to a printing press and then stating that the GPU cluster is not limited by natural resources is just silly. Where do the materials to make the processors come from?
Probably mined by slaves in third-world countries (in the literal "owning people" sense). But still, these servers already exist and scale up way more than a tree.
>The same can be true for AI as well. I could see a picture and then ask AI whose style it is. Then I could go look up more work by that artist, increasing their visibility.
Sure, and you can use p2p to download perfectly legal software. We know how the story ends.
>Is this a complaint that something got cheaper to make... not just artists, and the path forward is the same as it's always been with technological advancements: increase your skill level to above the median created by the new technology.
It's a complaint that people, even with more efficiency, still can't make a living, while the millionaires become billionaires. I'm not even concerned about software wages; some principal SWE going from 400k to 200k will still live fine.
Artists going from 40k to 40k (but now working more efficiently) is exactly how we ended up with wages stagnating for 30 years. And yes, it is affecting everyone even pre-AI. The median is barely a living wage anymore, which is what "minimum wage" used to be.
If we lived in a work optional world I don't think many would care. But we don't and recklessly taking jobs to feed the billionaires is just going to cause societal collapse if left unchecked.
I'm not. But I'm going to hold a company that's responsible for the production more accountable than a consumer who can't research the sourcing of every single part of their personal device.
>Was your comment written by some new flamebait AI
IIRC AI focuses on formal language and discourages slang. Either way:
>Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
> Is there really a difference between a human flooding the market using AI and a human flooding the market using a printing press?
Yes. A printing press only floods the market with copies. An AI floods the market with new derivative works.
A human producing a single creative work and then flooding the market with copies leaves lots of room for other humans to produce their own novel work. An AI flooding the market with new derivative works leaves no such room.
I work with DNNs a lot professionally and remain a proponent of the technology, but what OpenAI et al are doing is highly exploitative and scummy. It’s also damaging their social licence and may end up setting the field back.
It’s potentially nice for the consumer. If I could get personalised audio and video content created on demand for me, that would be pretty amazing. But it does disincentivise people from creating content rather than just consuming it, and I think that could end up taking away a lot of the magic from life.
It'd be a good point if it weren't for the fact that search engines didn't exist until Google, because of technology, and that courts didn't need to consider the issue until then. So where does your point get us? We are here now.
When I was in university, I remember a humanities professor who had a concordance for the Iliad on his shelf. As a CS person, it was so cool to see the ancient version of a search engine.
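A concordance is essentially an inverted index: for each word, the list of places it occurs. A toy sketch in Python (my own illustration, with made-up sample lines) of how little machinery that "ancient search engine" needs:

    from collections import defaultdict

    def build_concordance(lines):
        # Map each normalized word to the line numbers where it appears.
        index = defaultdict(list)
        for lineno, line in enumerate(lines, start=1):
            for word in line.lower().split():
                index[word.strip(".,;:!?")].append(lineno)
        return index

    text = [
        "Sing, O goddess, the anger of Achilles",
        "son of Peleus, that brought countless ills",
    ]
    print(build_concordance(text)["of"])  # -> [1, 2]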
I think the post is making reference to the "thing" which made Google stand out in a sea of existing search engines.
"Using the idea of relevancy, they built Google in a way that - in comparison to other search engines at the time - was simply better at connecting users with more pertinent results.
"A query typed in Google provided more utility and relevancy than did Excite, Yahoo and other search engines.
Why? I don't see any way the ranking algorithm would affect this discussion, and if it did, I think old-style search engines (ranking based on keywords) would make the comparison slightly better than Google-style (PageRank) ones.
Computers and machines have been capable of mass production for decades, and humans have used them as tools. In the past 170 years, these tools of mass production have already diminished many thousands of professions that were staffed by people who had to painstakingly craft things one at a time.
Why is art some special case that should be protected, when many other industries were not?
Why should we kill this technology to protect existing artistic business models, when many other technologies were allowed to bloom despite killing other existing business models?
>Why is art some special case that should be protected, when many other industries were not?
Because in this case the art is still necessary for the machine to work. You don't need horse buggies to make a car, nor existing books to make a printing press. You DO need artists' art to make these generative AI tools work.
If these worked purely off of open-source art or truly from scratch, I wouldn't personally have an issue.
>Why should we kill this technology to protect existing artistic business models,
We don't need to kill it. Just pay your dang labor. But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
Maybe in the 2010s tech would have had the goodwill to pull this off in PR, but the 2020s have drained that goodwill and then some. Tech made so many promises to make lives easier, and now it has joined the very corporations it claimed to fight against.
>Nobody can really answer these questions.
Well, it's in the courts, so someone is going to answer it soon-ish.
> We don't need to kill it. Just pay your dang labor.
> But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
That's just it, nobody looking to get paid by OpenAI actually did any labor for OpenAI. They did labor for other reasons, and were happy with it.
OpenAI found a way to benefit by learning from these images. The same way that every artist on the planet benefits by learning from the images of their fellow artists. OpenAI just uses technology to do it much more efficiently.
This has never been considered labor in the past. We've never asked artists to "properly compensate" each other for learning/inspiration in the past. I don't know why it should be considered labor or proper compensation now.
There are many ways an artist can compensate their influences. Some of them are monetary.
When discussing our work, we can name them.
When one of our influences comes out with a new body of work, we can gush about it to our own fans.
When we find ourselves in a position of authority, we can offer work to our influences. No animation studio is really complete without someone old enough to be a grandfather hanging out helping to teach the new kids the ropes in between doing an amazing job on their own scenes, and maybe putting together a few pitches, for instance.
We can draw fan art and send it to them.
None of these are mandatory, but artists tend to do this, because we are humans, and we recognize that we exist in a community of other artists, and these all just feel like normal human things to do for your community.
And if an artist suddenly starts wholesale swiping another artist's style without crediting them, their peers get angry. [1]
OpenAI isn't gonna tell you that it was going for a Cat & Girl kind of feel in this drawing. OpenAI isn't gonna offer Dorothy Gambrell a job. OpenAI isn't going to tell you that she just came out with a new collection and she's still at the top of her game, and that you should buy it. OpenAI's not going to send her a painting of Cat & Girl that it did for fun. OpenAI isn't going to do anything for her unless the courts force it to, because OpenAI is a corporation who has found a way to make money by strip-mining the stuff people post publicly on the Internet because they want other humans to be able to see it.
Most people know 20,000-40,000 words. Let's call it 30,000. You've learned 99.999% of those 30,000 words from other people. And don't get me started on phrases, cliches, sentence structures, etc.
How many of those words do you remember learning? How many can you confidently say you remember the person or the book that taught you the word? 5? 10? Maybe 100?
That's how brains work. We ingest vast amounts of information that other people put out into the world. We consume it, incorporate it, and start using it in our own work. And we forget where we even got it. My brain works this way. Your brain works this way. Artists' brains work this way. GPT-4 works this way.
The idea that a visual artist can somehow recall where they first saw many of the billions of images stored in their brain -- the photos, movies, architecture, paintings, and real-life scenes that play out every second of every day -- is laughable. Almost all of that goes uncredited, and always will.
I tend to fall more on the "training should be fair use" side than most, but your comment seems to be missing the point. Nobody is arguing that models are violating copyright or social norms around credit simply because they consume this information. Nobody ever argued/argues that the traditional text generation in Markov models on your phone's keyboard runs afoul of these issues. The argument being made is that these particular models are now producing content that very clearly does run into these norms in a qualitatively different way. You cannot convincingly make the argument that the countless generated "X, but in the style of Y" images, text, and video going around the internet are exclusively the product of some unknowable mishmash of influences; there is clearly some internalized structure of "this work has this name" and "these works are associated with this creator".
To take it to an extreme: you obviously can't use one of the available neural-net lossless compression algorithms to circumvent copyright law or citation rules (e.g., distributing a local LLM that helpfully displays the entirety of some particular book when you ask it to), and you can't just tweak it to be a little lossy by changing one letter, or a little more lossy than that, etc. On the other hand, any LLM that performs exactly the same as a Markov model would presumably be fine. So there is a line somewhere.
A company hires an artist. That artist has observed a ton of other artists' work over the years. The company instructs that artist to draw, "X but in the style of Y", where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
A company builds an AI tool. That AI tool is trained on a ton of artists' work over the years. The company opens up the AI tool and asks it to draw, "X but in the style of Y," where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
What's the difference?
I'd argue there isn't one. The copyright infringement isn't the ability of the artist or the AI tool to make a copy. It's the act of actually using it to make a copy, and then putting that out into the world.
Ultimately, only high courts in each jurisdiction can decide. I can imagine a case where some highly advanced nations decide different interpretations that cause conflict. Then, we need an amendment to the widely accepted international copyright rules, the Berne Convention. Ref: https://en.wikipedia.org/wiki/Berne_Convention
Okay, but then that's an argument subject to the critiques made upthread that you were initially trying to dismiss. You can't claim that AI doesn't need to worry about citing influences because it's just doing a thing humans wouldn't cite influences for, then proceed to cite an example where you would very much be expected to cite your influences, and AI wouldn't, as evidence.
I never argued that AI doesn't need to worry about citing influences. If I am a person using a tool to create a work, and the final product clearly resembles some copyrighted work that I need to reference and give credit to, what does it matter if my tool is a pencil, a graphics editing program, a GPT, or my own mind? I can cite the work.
Like I said, this is exactly what the comment you first replied to was explaining. It is very clearly not the same as a pencil or a graphics editing program, because those things do not have a notion of Cat & Girl by Willem de Kooning embedded in them that they can utilize without credit. It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence.
Again, take it back to basics: do you believe it is permissible to share a model itself (not the model output, the model), either directly or via API, that can trivially reproduce entire copyrighted works?
I'd say that a tool itself can't be guilty of copyright infringement, only the person using the tool can. So it doesn't matter if the GPT has some sort of "notion" of a copyrighted work in it or not. GPTs aren't sentient beings. They don't go around creating things on their own. Humans have to sit down and command them, and at that point, whoever issued the command is responsible for the output. Copyright violation happens at the point of creation or distribution, not at the much earlier point of inspiration or learning.
So yeah, of course imo it should be permissible to share a model that can reproduce copyrighted works. Being "capable of being used" to violate a law is not the same thing as violating a law.
A ton of software on my computer can copy-paste others' work, both images and words. It can trivially break copyright. Hell, there are even programs out there that can auto-generate code for me, code that various companies have patent claims for. Do I think distributing any of this software should be illegal? No. But I think using that software to infringe on someone's copyright should be.
(Note: This is different than if the program distributed came with a folder that included bunch of copyrighted works. To me, sharing something like that would be a copyright violation.)
I'm not sure how to explain this any clearer. I am talking about neural net compression algorithms. As in, it is literally just a neural net encoding some copyrighted work, and nothing else. It is ultimately no more intelligent than a zip file, other than the file and program are the same. You can't seriously believe that these programs allow you to avoid copyright claims, can you? Movie studios, music producers, and book publishers should just pack it in, pirates just need to switch to compressing by training a NN, and seeding those instead, and there's no legal precedent to stop them? If you do think that, do you at least understand why nobody is going to take your position seriously?
A neural net designed to do nothing other than compress and decompress a copyrighted work is completely different than GPT-4, unless I'm uninformed. To me that sounds like comparing a VCR to a brain. GPT-4's technology is clearly something that "learns" in order to be able to produce novel thoughts and ideas, rather than merely compressing. A judge or jury would easily understand that it wasn't designed just to reproduce copyrighted works.
> It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence
I forgot to respond to this, but it's not true. Your mind is incapable of providing credit for 99.9% of its influence and inspiration, even when you want it to. You simply don't remember where you've learned most of the things you've learned. And when you have a seemingly novel idea, you can't always be aware of every single influential example of another person's work/art that combined to generate that new idea.
> A neural net designed to do nothing other than compress and decompress a copyrighted work is completely different than GPT-4, unless I'm uninformed.
Compression and the output from LLMs are cousins. The model tries to predict what continuations are likely, given context. Indeed, it takes a lot of effort to make LLMs less willing to just output training data verbatim. And conversely, you can get compression algorithms to do things similar to what LLMs do (poorly).
Whether this also describes most of the human cognitive process is subject to debate.
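To make the "cousins" claim concrete, here's a minimal sketch (my own toy code, not anyone's actual system): a model that predicts the next symbol with probability p lets an entropy coder store that symbol in about -log2(p) bits, so the better the model predicts the text, the fewer bits the text costs.

    import math
    from collections import Counter, defaultdict

    def train_bigram(text):
        # Count next-character frequencies for each single-character context.
        counts = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
        return counts

    def ideal_bits(text, counts):
        # Sum of -log2(p) over the text: the size an entropy coder
        # driven by this predictive model could approach.
        bits = 0.0
        for prev, nxt in zip(text, text[1:]):
            ctx = counts[prev]
            # Laplace smoothing over a 256-symbol alphabet so unseen
            # continuations still get nonzero probability.
            p = (ctx[nxt] + 1) / (sum(ctx.values()) + 256)
            bits += -math.log2(p)
        return bits

    sample = "the cat sat on the mat. the cat sat on the hat."
    model = train_bigram(sample)
    print(len(sample) * 8, "raw bits")
    print(round(ideal_bits(sample, model)), "model-coded bits")

Note that the model here is scored on its own training data, which is the memorization concern from above in miniature.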
Individual words aren't comparable to the things people are worried about getting copied. People are much more able to tell you where they learned about more sophisticated concepts and styles.
The same principle applies, though. They can tell you maybe a dozen, maybe a few dozen, concepts they've learned and use in their work. But what about the thousands of concepts they use in their work that they can't tell you about? The patterns they've noticed, the concepts that don't even have names, but that came from seeing things in the world that were all created by other people?
For example, how many artists drawing street scenes credit the designer at Ford Motors for teaching them what a generic car looks like? How many even know which designers created their mental model of a car?
> That's just it, nobody looking to get paid by OpenAI actually did any labor for OpenAI.
To me this is a strong point in favor of the idea that OpenAI has no business using their work. How can you even think it's ok for OpenAI to use work that was not done for them without paying some kind of license? They aren't entitled to the free labor of everyone on the internet!
> How can you even think it's ok for OpenAI to use work that was not done for them without paying some kind of license?
At the risk of answering a rhetorical question: because copyright covers four rights: copying, distribution, creation of derivative works, and public performance, and LLM training doesn't fit cleanly into any of these, which is why many think copying-for-the-purpose-of-training might be fair use (courts have yet to rule here).
I think the most sane outcome would be to find that:
- Training is fair use
- Direct, automated output of AI models cannot be copyrighted (I think this has already been ruled on[0] in the US).
- Use of a genAI to create works that would otherwise be considered a "derivative work" under copyright law can still be challenged under copyright.
The end result here would be that AI can continue to be a useful tool, but artists still have legal teeth to come after folks using the tool to create infringing works.
Of course, determining whether a work is similar enough to be considered infringing remains a horribly difficult challenge, but that's nothing new[1], and will continue to hinge on how courts assess the four factors that govern fair use[2].
The entire point of the patent system was to say inventors can publish their designs without them being stolen, so that future inventors can build on their work.
>They did labor for other reasons, and were happy with it.
True; sadly, most of those copyrights are probably owned by other megacorps. So they either collude to suppress the entire industry or eat each other alive in legal clashes. The latter is happening as we speak (the writers for the NYT are probably long retired, but the NYT still owns the words), so I guess we'll see how that goes.
>OpenAI found a way to benefit by learning from these images. The same way that every artist on the planet benefits by learning from the images of their fellow artists.
If we treat AI like humans, art historically has an equally thin line between inspiration and plagiarism. There are simply more objective metrics to measure now because we can indeed go inside an AI's proverbial brain. So the metaphor is pretty apt, except with more scrutiny able to be applied.
> Why is art some special case that should be protected, when many other industries were not?
It shouldn't be.
As soon as someone makes an AI that can produce its own artwork without requiring ingesting every piece of stolen artwork it can, then I'm on board.
But as long as it needs to be trained on the work of humans it should not be allowed to displace those people it relied on to get to where it is. Simple as that.
Are there any humans that can produce artwork without ingesting inspiration from other art? Do you know any artists that lived in a box their whole life and never saw other art? Do you know any writers who'd never read a book?
Are there any human artists who can't, if requested, draw or write something that's a copy of some other person's drawings or writings?
Also, FYI, you can't steal digital artwork. You can only commit copyright infringement, which is not the same crime as theft, because theft requires depriving the owner of something in their possession.
> Are there any humans that can produce artwork without ingesting inspiration from other art? Do you know any artists that lived in a box their whole life and never saw other art? Do you know any writers who'd never read a book?
> Are there any human artists who can't, if requested, draw or write something that's a copy of some other person's drawings or writings?
This is still pretending that humans and AI models are equivalent actors that should have the same rights.
Emphatically no they shouldn't. The capabilities are vastly different. Fair use should not apply to AI.
This isn't about giving "rights" to machines. Machines are just tools. The question is about what humans are allowed to do with those tools. Are humans using AI models and humans not using AI models equivalent actors that should have the same rights? I'd argue emphatically yes they should.
The thing is, we already have doctrine that starts to encompass some of these concepts with fair use.
The four pronged test in US case law:
- the purpose and character of use (is a machine doing this different in purpose and character? many would say yes. is "ripping-off-this-artist-as-a-service" different than an isolated work that builds upon another artist's art?)
- the nature of the copyrighted work
- the amount and substantiality of the portion taken (can this be substantially different with AI?)
- the effect of the use upon the potential market for the original work (might mechanization of reproducing a given style have a larger impact than an individual artist inspired by it?)
These are well balanced tests, allowing me as a classroom teacher to duplicate articles nearly freely but preventing me from duplicating books en masse for profit (different purpose; different portion taken; different impact on market).
The problem with this conversation is that it's being had by people who make the top-level comment here stating that clothing is not copyrightable. It is. Clothing design is copyrightable. This was a huge recent case, Star Athletica. They know nothing about copyright law; they just build intuitions from the world around them, but those intuitions are complete nonsense because they are formed in ignorance of the actual law, what it does, and why. I find it exhausting.
Your sentiment is probably correct in that there are many aspects of copyright law that are not strictly aligned with the public's intuition. But your example is a bit of a reach. Star Athletica was a relatively novel holding that a specific piece of clothing, when properly argued, can qualify as copyrightable as a semi-sculptural work of art; however, that quality of a given piece is separate from its character as clothing. In fact, the USSC in Star Athletica explicitly held a designer/manufacturer has "no right to prohibit any person from manufacturing [clothing] of identical shape, cut, and dimensions" to clothing which they design/manufacture. That quote is directly from a discussion of the ability to apply copyright protections to clothing design. I think the end result is that trying to argue technical legal issues around a poorly implemented statutory regime is always fraught with errors. That really leaves moral and commercial arguments outstanding, and advocacy should try to focus on those when not fighting to effect change in the law these copyright determinations are based on.
And just to be clear, this post does not constitute legal advice.
You're dismissing my comment because of what someone else said upthread?
I hate the desire to meta-comment about the site rather than argue on the merits.
We obviously don't know so much about how courts will interpret copyright with LLMs. There's a lot of arguments on all sides, and we're only going to know in several years after a whole lot of case law solidifies. There are so many questions, (fair use, originality, can weights be copyrighted? when can model output be copyrighted? etc etc etc). Not to mention that the legislative branch may weigh in.
This discourse by citizens who are informed about technology is essential for technology to be regulated well, even if not all participants in the conversation are as legally informed as you'd wish. Today's well-meaning intuitions about what deserves copyright and why inform tomorrow's case law and legislation.
> Emphatically no they shouldn't. The capabilities are vastly different. Fair use should not apply to AI.
Fair use applies even to use of traditional algorithms, like the thumbnailing/caching performed by search engines. If I make a spam detector network, why should it not be covered by fair use?
Fair use applies to humans and the things they do (including with AI). It is not something that applies to algorithms in themselves. AIs are not people; the people who use them are people, and fair use may or may not apply to the things they do depending on the circumstances of whatever it is they do. The agent is always the human, not the machine.
True; consider the "it" in my question ("If I make a spam detector network, why should it not be covered by fair use?") as "my making (and usage) of the network".
No idea on the legality, but common sense suggests that the difference would be that a spam detector doesn't replace the products that it was trained on, while AI-generated "art" is intended to replace human artists.
> common sense suggests that the difference would be that a spam detector doesn't replace the products that it was trained on
The extent to which it supplants the original work is one of the fair use considerations.
I think it'd make more sense to have a stance of "current LLMs and image generators should be judged by fair use factors and I believe they'd fail", though I'd still disagree, instead of having machine learning models subject to a different set of rules than humans and traditional algorithms.
That is indeed the most common stance. There isn't nearly as much outcry over, say, image classification by LLMs, as there is over AI "art" generation.
The question is "is it a derivative work of the original?" - not if it is a generative work.
If that were the distinction to be made, using ChatGPT as a classifier would be acceptable, while using it to write new spam (see the "I am sorry" Amazon listings of the other day) would be unacceptable.
If two different uses of a tool allow for both infringing and non-infringing uses (are photocopiers allowed to make copies(!) of copyrighted works?) it has generally been the case that the tool is allowed and the person with agency to either use the copyrighted work in an infringing or a non-infringing way is the one to come under scrutiny.
I believe that if OpenAI is found to have committed copyright infringement in training the model, then an argument that training a model on spam constitutes copyright infringement could be reasonably constructed.
If, on the other hand, OpenAI is found to have been sufficiently transformative in its creation of the model and only some uses are infringing, then it is the person who did the infringing (as with a photocopier, or a printer printing off a copy of a comic from the web) who should face legal consequences.
> Are there any humans that can produce artwork without ingesting inspiration from other art?
Logically, the answer to this is (almost certainly) yes, so you’ll need to discount this argument.
If the answer were no, then either an infinite number of humans have lived (such that there was always a previous artist to learn from), or it was true in the past but false in the present, which seems unlikely given human brains have generally become more and not less sophisticated over time.
I presume what you’re missing here is that the brain can be inspired from other sources than human art. For example: nature; life experience; conversation.
Not making any other comment about what machines can or can't do; I just wanted to point out that this argument is invalid, as it comes up a lot and is probably grounded in ignorance of the artistic process. It's such a strange idea to suggest that the artistic process is ingesting lots of art to make more art. That's such a weird worldview. It's like insisting every artist makes art the way Quentin Tarantino makes films.
I’ve spent a lot of time with artists, I’ve worked with them, I’ve been in relationships with artists, and I can tell you the great ones see the world differently. There’s something about their brains that would cause them to create art even if born on a desert island without other human contact. Some of them don’t even take an interest in other art.
In fact, those artists that _do_ make art heavily based on other artists’ work as suggested are often derided as “derivative” and “unoriginal”.
> When a little child draws a tree for the first time, where do they draw inspiration? Do you think they were reviewing works of Picasso?
Are we going to discount the hundreds to thousands of artistic pictures children are exposed to? Or how about the teacher sitting up front demonstrating to the class how to draw a tree?
> Do you not have eyes, ears, do you not perceive and get inspiration from the natural world around you?
Learning to see as an artist is a distinct skill. Being able to take the super-compressed, simplified world view that the mind sees and put something recognizable on paper is a specialized skill that has to be developed. That skill is developed by doing it over and over again, often by copying the style of an artist that someone enjoys.
Or to put it another way: go to any period in history prior to the mid-20th century, and art in a given region starts to share the same style, dramatically so, because people were inspired by each other, almost to a comical extent. (Financial reasons also had something to do with it, of course; artists paint/carve/engrave/etc. what sells!)
Yeah, but that’s not really your sole source of inspiration. My son has been ‘inspired’ by the art of all other kids in his kindergarden. Certainly by the time he gets to the age where he does it professionally he’s been inspired by an uncountable number of people.
Being inspired isn't against the law; copying is. It'd be one thing if this conversation could be had with useful terminology that's actually on point. Instead we have you, insisting that there is no creative process, that there is only experiencing other art and inevitably copying (because apparently you think that's the only thing humans can do!). It's all so telling. Yet it's tragic, because so many here don't even realize it. I'm sad for your inability to engage with creativity and creative acts.
We don't know what percentage is independent inspiration for a person using the AI to create art.
Once upon a time it was a contentious idea that humans had significant authorship in photographs, which merely mechanically captured the world. What % is the camera's independent inspiration?
Here, we have humans guiding what's often a quite involved process of synthesis of past human (and machine) creation.
> The person using the AI doesn't matter in the equation. They aren't an artist, they're a monkey with a typewriter.
That's an opinion.
Does your opinion hold in all circumstances? If I spend 20 hours with an AI, iterating prompts, erasing portions of output and asking it to repaint and blend, and combining scenes-- did I do anything creative?
Of course the person using the AI matters. It's literally the same as holding a brush. You can give it a prompt, get a result and be unhappy with it, modify it or remove it, and proceed doing that until you are happy with what you have.
No matter how great the AI is, a monkey with an AI will never generate anything useful.
> But as long as it needs to be trained on the work of humans it should not be allowed to displace those people it relied on to get to where it is. Simple as that.
Do you feel the same way about tools like Google Translate?
Tbh I'm not familiar enough with how Google Translate is built, but if it's ingesting tons of people's work without their permission so it can be used to replace them, then yes, I do.
For what it's worth: that's pretty much how Translate works.
Translate operates at a large-chunk resolution, and one of the insights in solving the problem was the idea that you can often get a pretty-good-enough translation by swapping a whole sentence for another whole sentence. So they ingest vast amounts of pre-translated content (the UN publications are a great source, because they have to be published in the language of every member nation), align it for sentence- and paragraph-match, and feed the translation engine at that level.
It's created an uncanny amount of accuracy in the result, and it's basically fed wholesale by the diligent work of translators who were not asked their consent to feed that beast. Almost nobody bats an eye about this because the value (letting people using different languages communicate with each other) grossly outstrips the opportunity cost of lost human translator work, and even the translators are, in general, in favor of it; they aren't going to be displaced because (a) it doesn't really work in realtime (yet), (b) it can't handle any of the deeper signal (body language, tone, nuance) of face-to-face negotiation, and (c) languages are living things that constantly evolve, and human translators handle novel constructs way better than the machines do (so in high-touch political environments, they matter; the machines have replaced translators in roles like "rewriting instruction manuals" that were always pretty under-served in the first place).
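A grossly simplified sketch of that whole-sentence-swapping idea (my illustration; the real pipeline is statistical, not a literal lookup): given a sentence-aligned parallel corpus, "translation" can start out as retrieval of the nearest known source sentence.

    from difflib import SequenceMatcher

    # Hypothetical aligned pairs, e.g. mined from documents that are
    # published in both languages (the UN-corpus idea described above).
    parallel = [
        ("The meeting is adjourned.", "La séance est levée."),
        ("The committee approved the report.", "Le comité a approuvé le rapport."),
    ]

    def translate(sentence):
        # Return the stored translation of the most similar known source
        # sentence: swapping a whole sentence for a whole sentence.
        best = max(parallel,
                   key=lambda p: SequenceMatcher(None, sentence, p[0]).ratio())
        return best[1]

    print(translate("The meeting is adjourned"))  # -> "La séance est levée."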
I would argue that Translate being fed by paid UN translators, who likely agreed to the use of their translations in a TOS or something, is not an equal comparison to unpaid artists whose art, submitted online to various sites, became part of a training set used in for-profit models such as OpenAI's, a use they never consented to. OpenAI is a nonprofit parent company, but it spawned a for-profit child company, OpenAI LP, which most of its staff work for and which is meant to deliver many-fold returns to shareholders who are effectively profiting from the labor of all the artists and sources in its training data.
Google translate is very basic and not even close to something good if you already know both languages. Useful if you're translating to your language (you do the correction when reading), but can lead to confusion the other way.
If you can do the correction when reading, it seems reasonable to assume the reader in the opposite direction has the same correction capability.
I would expect the chance of confusion to be identical. The only difference is a matter of perspective, where in one case you are the reader and in one case you are the author.
Yes, they are identical. But I believe the reader is better armed to deal with the confusion, or at least to recognize the error, because it does not fit the context. When producing, you don't know the target language, so there's a better chance for errors to slip in unnoticed.
It's better for me to receive a text in the original language and translate it myself than to try to decipher something translated automatically.
Code has licenses too. And we've had very high profile lawsuits based on "copying code".
>what about if we eventually have robot labourers that are trained by observing human labourers?
Interesting point, but by that point in time I don't think generative art will even be in the top 10 ethical dilemmas to solve for "sentient" robots.
As it is now, robots aren't the ones at the helm grabbing data for themselves. Humans give orders (scripts) and provide data and what/where to obtain that data.
Just the people in this discussion thread, devs and entrepreneurs, have probably automated a huge amount of work. But here we are, bickering about AI and copyright like it's a new thing.
>What would you buy? A $10 H&M shirt or a $100 hand-made one? - (My guess: the latter, if you could afford it.)
This is an interesting example, because even in the $100 case you are still talking about machine augmentation. You can have a seamstress or a tailor customize patterns, using off-the-shelf textiles, for that order-of-magnitude price; but if you want custom-built, exotic materials, or many kinds combined, the cost is on the order of thousands, not hundreds. There is also a large industry of just printing designs on stock shirts, which sits at a different effort-scale equilibrium.
Thinking about how automation disintermediates is very important. In animation, productions often have key-frame artists in the pipeline who define scenes, and then others who flesh out all the details of those scenes. GenAI can potentially automate that process: you could still have the artist producing a keyframe, and render that into a video.
Another big factor is style. One hypothesized reason that impressionism, absurdism, and abstract art all became styles is photography. Once cheap machine-produced photography became available, there was less need for portrait artists. But further, portraiture was no longer high-status, and others pushed trends in alternative directions.
All the experimentation and innovation going on right now will definitely settle into a different set of roles for artists, and trends that they will seek to satisfy. Art style itself will change as a result of both what is technically possible and what is _not_ easily automatable, in order to gain prestige.
What's killing art is this idea by a vocal minority of "artists" that they need to mass produce their work, enter the market, and attempt to make millions of dollars by selling and distributing it to millions.
That's not art. That's capitalism. That's competing to produce something that customers will want to buy more than what your competitors offer.
If you want to compete on the capitalistic marketplace, then compete on the capitalistic marketplace. But if you want to be an artist, be an artist.
Art is still alive and well and always will be. Every day I see people singing because they love singing, making pottery because they love making pottery, writing because they love writing. Whether other people love or enjoy their art, the artist may or may not care. Whether they can profit from their art, the artist may or may not care. But many billions of artists will keep creating, crafting, and designing day after day, and they will never be stopped by AI or anything else.
People do whatever they want with their own property. You have no right to steal it just because they want to monetise it. What’s killing art is stealing it en masse using procedural generators.
Jobs have never been less soul crushing, or more creative, in the history of humanity. And that becomes increasingly true every decade.
Do you know what a job does? What a company does? It contributes to society! It produces something that someone else values. That they value so much they're willing to pay for it. Being part of this isn't a bad thing. It's what makes society work.
A job/company entertains. It keeps things clean. It transports people to where they need to go. It produces. It gives people things they want. It creates tools, and paints, and nails, and shirts. I look out my window, and I see people delivering furniture, chefs cooking food and selling it out of trucks, keepers maintaining grounds, people walking dogs.
Being useful to the fellow members of your society for 40 hours a week is not "soul crushing."
(This is a response to your comment before you edited it.)
Find the intersection of something that people increasingly value, that you enjoy, and that you can compete at.
The best proof that people value something is that they're spending money for it. If people aren't spending money, they don't value it, and you probably don't want to go into it. If people aren't spending more and more money on it every year, then it's not increasing in value, and you probably don't want to go into it.
The best proof that you enjoy something is that you enjoyed it in the past. Things you liked as a kid, activities that excited you as a young adult, etc., are often the best candidates.
Look for intersections of the two things above. Do some Googling, do some research.
Finally, you need to be able to compete at it. If you do something worse than everyone else does it, then no one will pick you, because you're probably not being helpful. The simple answer to this is to practice to make yourself better. But most people don't want to do that. A better answer to this is to be more unique, so you can avoid the competition. Don't do a job that has a title, a college major, and millions of talented applicants. It's not that helpful to society to do something a hundred million other people can already do, which is why there's more competition and lower wages.
When you find the intersection of what's valued and what you enjoy, call up some people in those fields and ask what's rare. What in their area is needed. What are they missing. What is no one else doing.
Or just start your own company. That's the easiest way to be unique. But it's hard.
Finally, if you feel you're too "mid," then make sure your standards aren't crazy. Don't let society tell you that you need to be a millionaire with a yacht and designer clothes to be happy. Get a normal 9 to 5 with some purpose in it, that you can be proud of, that others appreciate. Live within your means and don't stress yourself out financially. Spend your free time doing things you like. Take care of your health, find good relationships, and treasure them. That's a happy life at any income. I know a bunch of miserable depressed rich people who are very good at making money and very bad at health/relationships/etc., which is the real stuff that life is made out of.
It's an interesting predicament. Assuming these stories between person and machine are indistinguishable and of the same quality, then the difference here is the ability to scale. Setting aside bias in favor of humanity, why should we privilege output derived from a human over something else of the same quality?
I hate making analogies, but if we make humans plant rows of potatoes, should that command a higher price and be seen as more valuable than planting potatoes with a tractor 20 rows wide?
Exactly; their flesh, blood, energy, etc. does matter. This is my argument for it, not for your argument against it, lmao. There's nothing more remarkable about my planted potato row vs the tractor-planted rows, and my energy can be spent elsewhere. I am not entitled to make a living hand-planting potatoes if there's no market for it.
People have the choice to continue making stories, and they'll have a fanbase for it, and always will, because that's ultimately a part of freedom and choice. Many are less what I'll call purist here and don't care about how it came to be; they just want a quality story.
What you're loosely proposing is making art a protected class of output, when we have tools that can match it, and soon potentially surpass it. Is that not a terrific way to stunt the very thing you're trying to defend?
For transparency, I am an advocate for human made art, but I am against stunting tooling that can otherwise match said creativity. I see that as an artform in itself.
This is just gatekeeping. Art is not better because it was made by hand as opposed to with technology. If I use a generative model to make art then I’m an artist.
I would argue art is better when it's the result of the effort and vision of an individual.
Prompting a search engine to stitch images together on your behalf might result in an image you can call art, but IMO all the art generated wholecloth like this sucks: necessarily derivative, put into the world without thought.
My favorite critique of LLM work: "why would I bother to read a story that no one bothered to write"
This is just the fallacy of the Protestant work ethic with different words. Things don’t need to be difficult to be good. You can’t tell how hard an artist worked just by looking at the piece. There’s a lot of truly terrible art that has had a ton of work put into it.
It’s very easy to make bad art quickly with powerful tools. It’s also possible to carefully craft prompts which generate amazing results that win awards. Source: I’ve done this. You should see the reactions when people have heaped flowery accolades on a drawing and then find out it’s Dall-e. The irony of the transition from “art is rebellion” to pearl-clutching is almost the best part.
That critique says more about your understanding than it does about the work.
These models are not conscious; they’re not acting on their own. If I make art using a generative model, it’s no more the model doing it than it is the sketchbook doing it if I were to use that. I’m making art using whatever tool; sometimes that tool is more or less powerful. But I’m the one doing it.
How many books per second can you read to influence and change your personal style?
I don't think any person who has actually worked on anything creative in their life would compare a personal style to a model that can output in nearly any style at extreme speeds. And even if you're inspired by a specific author, invariably what happens is it becomes a mix of yourself + those influences, not a damn near-copy.
With visual mediums it's even worse, because you have to take the time [months, years] to specialize in that specific medium/style.
> I don't think any person who has actually worked on anything creative in their life would compare a personal style to a model that can output in nearly any style at extreme speeds. And even if you're inspired by a specific author, invariably what happens is it becomes a mix of yourself + those influences, not a damn near-copy.
I don't think anyone who has ever read a novel in their life would say that an AI can write literature at all, in any style.
> not a damn near-copy.
The obvious solution is to just treat it as if a human did it. If you did not know the authorship of the output and thought it was a human, would you still consider it copyright infringement? If yes, fair enough. If no, then I think it is clearly not a "damn near-copy".
On my laptop, using modern tools backed by AI? ... many.
>> How many books per second can you read to influence and change your personal style?
Thanks now to AI, hundreds. I can plug the output of the book-reading AI into the input of the tool I use to write my books and thereby update my personal style to incorporate all the latest trends. Blame the idiots who are paying me for my books.
You should read the response more carefully. Generative models are just tools. If I use one to write a story it’s no less a story that I wrote than if I’d chiseled it into a Persian mountainside.
There are two problems with this (very common) line of argument.
First, the law is pretty clear that yes if your story is too similar to another work, they have rights. Second, it's not at all obvious we can or should generalize from "what a human can do" and "what a bunch of computers can do" in areas like this.
Did you not pay them when you bought their book to read it in the first place? That dead trees don't lend themselves to that sort of payoff is a limitation of the technology. In music, sampling is a well-accepted mechanism for creating new music, and the original authors of the music they used do get paid when the new one is used.
>> the authors did not benefit from my secondary market transaction.
But they did. The presence of a secondary market for used books increased the value of some new books. People buy them knowing that they might one day recoup some costs by selling them. Would people pay more, or less, for a new car if they were told they could never sell or trade it away as a used car?
They actually do exactly this, they’re just not thinking about it. You buy a book because you want the physical possession, which gives you the ability to sell it or give it to someone or display it. Not because you want to read the contents - else you would just borrow it from a library.
> You buy a book because you want the physical possession, which gives you the ability to sell it or give it to someone or display it. Not because you want to read the contents - else you would just borrow it from a library.
Please back up this statement.
I for one, buy primary market books for the content; and to support creators whom I wish to support. Not to display the book or resell it. That is some capitalist jive.
People typically use libraries because
A) they are too poor to afford the books, and libraries therein provide a valuable community function
B) they are doing research and only need the books for a short time
Gee I don't know, but I'm glad that digital goods do not incur the same material costs as a car. "You wouldn't download a car", we've come full circle.
"In 2012, the Court of Justice of the European Union (ECJ) held in UsedSoft
GmbH v. Oracle International Corp that the first sale doctrine applies to
used copies of [intangible goods] downloaded over the Internet and sold in the
European Union." [0]
Arguably the U.S. courts are in the wrong here. We can only hope first sale doctrine is extended to digital goods in the U.S. in the future, as it has been in the EU for over a decade.
That depends on a variety of factors. You may find yourself in trouble if you write about a wizard boy called Perry Hotter going to Elkwood school of magic and he ends up with two sidekicks (a smarter girl and a redhead boy).
It could be argued quite convincingly that stories like Brooks's Shannara and Eddings's Belgariad are LOTR with the serial numbers filed off, but there is more than enough difference in how the various pieces work for those series to be unique creations that neither infringe on the properties nor hew too closely to the story. (Although I cringe at putting the execrable Belgariad books in any class with either LOTR or Shannara.)
The "best" modern example of this is the 50 Shades series. These are Twilight fan fiction (it is acknowledged as such) with the vampire bits filed off. They are inspired by Twilight, but they are not identifiably Twilight in the end. It might be hard to tell the quality of writing from that which an LLM can produce, and frankly Anne Rice did it all better decades before (both vampires and BSDM).
Humans can be influenced by writers, artists, etc. LLMs cannot. They can produce statistically approximated mishmashes of the original works themselves, but there is no act or spark of creation, insight, or influence going on that makes the sort of question you’re asking silly. LLMs are just math. Humans may be just chemistry, but there’s qualia that LLMs do not have any more than `fortune` does.
I'm with all your other arguments ... but not this point. What is the special magic property that machine-generated art doesn't have? Both human and machine generated art can be banal, can be crap. And I think there is plenty of machine generated art that is quite beautiful, and if well prompted even very insightful. Non-GenAI art can be this way too: Conway's Game of Life has a quality of beauty to it that rivals many forms of modern art (see the toy sketch below). If you wanted to argue that there is still the need for a human to provide some initial inspiration as input, or programming, before something of value can be generated, then I would agree, at least for now, though there is a meta-argument about asking LLMs to generate their own prompts that makes this an increasingly gray area.
But I don't think the stochastic parrot argument holds water. Most _human_ creation is derivative. A unique mix of pre-existing approaches, techniques, and substance often _is_ the creative act. True innovation with no tie to existing materials seems vanishingly rare to me and is a really high bar, one most humans never clear.
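To make the Conway point concrete: here's roughly all the machinery the Game of Life needs. A toy Python sketch, nothing more; the starting cells are just the standard "glider" pattern.

    # Toy sketch of Conway's Game of Life. Three rules, zero training data,
    # and the output can still be strangely beautiful.
    from collections import Counter

    def step(live):
        """Advance one generation; `live` is a set of (x, y) live cells."""
        # Count live neighbors for every cell adjacent to a live cell.
        counts = Counter(
            (x + dx, y + dy)
            for (x, y) in live
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        # A cell lives next turn with exactly 3 neighbors,
        # or with 2 neighbors if it was already alive.
        return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

    # The classic "glider": it crawls diagonally across the grid forever.
    cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    for _ in range(4):
        cells = step(cells)
    print(sorted(cells))  # the same glider shape, shifted one cell diagonally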
> Humans can be influenced by writers, artists, etc. LLMs cannot.
This is literally wrong; LLMs are influenced by their users. And people can input new ideas and facts, and explore new directions, by choosing how they interact with the LLM.
An LLM writing a book all on its own, start to finish, would be a different story; the output would be derived 100% from its training set. An LLM being prompted with a book chapter would risk borrowing too much.
But if you prompt an LLM without copy-pasting protected content into the prompt, then you are the main influence. And LLMs can explore outside their training distribution in this way, helped by humans.
Many people think a trained LLM is frozen, but they do in-context learning; they can even acquire new concepts/words and properly use them in the same session (a toy example below). They don't keep the memory of this until retraining, but that doesn't mean they are locked up. There is plenty of space in the context buffer you can use to add new material after training.
This kind of thinking is like saying you can't drive a nail unless you own a licensed hammer, so someone using a shoe to drive a nail would be infringing. Maybe the shoe is also a hammer if the user says so.
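To make "in-context learning" concrete, here's a minimal sketch using the OpenAI Python client. The model name and the made-up word are just placeholders I picked, not anything special.

    # Minimal in-context-learning sketch: teach the model a made-up word
    # purely through the prompt. The weights never change; the "learning"
    # lives entirely in the context window.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model would do
        messages=[
            # A word the training set cannot contain, defined on the spot:
            {"role": "user", "content": (
                "A 'glorbix' is a teapot that whistles in minor keys. "
                "Write one sentence using 'glorbix' correctly."
            )},
        ],
    )
    print(resp.choices[0].message.content)
    # The model will use the new concept correctly for the rest of the
    # session, and forget it the moment the context is gone.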
Have you noticed that authors and artists love sharing their inspirations? Let's say you're an up-and-coming author. In an interview, you list your sources of inspiration.
Using your logic, why does the creative community celebrate you and your inspirations instead of crying foul like they are with LLMs?
> If I read a lot of stories in a certain genre that I like, and I later write my own story, it’s almost by definition going to be a mish-mash of everything I like.
But it's also going to be affected by the teachers you had in pre-school, the people you hang around with, your relatives, films you've seen, adverts you watched, good memories and bad memories of events. You bring your lived experience to your story, and not just a mish-mash of stories in a particular genre, but everything.
Whereas when you train a model, you know the exact input, and that exact input may be 100% copyright material.
I feel like the keyword is 'almost' and then you begin pulling on that thread:
How closely is this the case?
What blind spots exist?
How do you measure this?
What capacity for original idea generation does the human mind have, and how does it inspire a unique spin?
This is one of those areas where 'thought experiments' are never going to pass muster against genuine experiments with metrics, trials, and robust scientific research.
But with the stakes as they are, I don't have faith there exists a good-faith dialogue in this arena.
However, AI models require copying and reproducing copyrighted works, whereas when I read a book I’m not copying it, and I’ve secured some sort of license to use it.
The collection, mass copying, and redistribution of work to create these models seems quite obviously to be a violation of IP laws.
If OpenAI paid usage fees for the training material for each user it generates content for, it would never be profitable, and artists would be fine. But as it is, all the shares are owned by people who have given this system none of its knowledge.
>If OpenAI paid usage fees for the training material for each user it generates content for, it would never be profitable
In that case, good? I thought if nothing else, these past year or two would teach companies about sinking money into unsustainable businesses and then price gouging later (I know it won't; the moment interest rates fall we are back to square one). If it isn't profitable, scale down the means of production (which may include paying C-suite executives one less yacht per year, tragic), charge customers more upfront, or work out better deals with your third parties (which is artists in this case).
I also find some schadenfreude in that these companies are trying to sell "fewer employees" to other companies but would also benefit from said scaling down, as they throw out defenses like "we can't afford to pay every copyright holder".
> The reason that AI models are generating content similar to other people's work is because those models were explicitly trained to do that.
Ah, just like humans who train against the output of other humans. AI models are not fundamentally different in kind in this regard, only scope, and even that isn't perfectly obvious to me a priori.
Humans usually add their own style to things, and it’s hard to discuss copyright without that larger context along with the question of scale (me making copies of your paintings by hand is not as significant a risk to your livelihood as being able to make them unimaginably faster than you can at much lower cost). Just as rules about making images of people in certain ways or places only became critical when photography made image reproduction an industrial-scale process, I think we’ll be seeing updates to fair-use rules based on scale and originality.
Humans can also come up with their own styles and can draw things they’ve never seen, which ML models as they currently exist are not capable of (and likely will never be). A human artist who has lived their entire life in the wilderness and has never trained themselves with the work of another artist will still be able to produce art with styles produced entirely by personal experimentation.
ML models have a long way to go before comparisons to humans make any kind of sense.
I really don't get why so many people seem to think that an AI model training on copyrighted work and outputting work in that same style is exactly the same thing (morally, ethically, legally, whatever dimension you want) as a human looking at copyrighted work and then being influenced by that work when they create their own work.
The first thing is the output of a mathematical function as computed by a computer, while the second is an expression of human creativity. AI models are not alive. They are not creative. They do not have emotion. These things are not even in the same ballpark, let alone similar or the same.
Maybe someday AI will be sophisticated enough to be considered alive, to have emotion, and to be deserving of the same rights and protections that humans have. And I hope if and when that day comes, humanity recognizes that and doesn't try to turn AI into an enslaved underclass. But we are far, far from that point now, and the current computer programs generating text and images and video are not exhibiting creativity, and do not deserve these kinds of protections. The people creating the art that is used to feed the inputs and outputs of these computer programs... those are the people that should have their rights protected.
> The first thing is the output of a mathematical function as computed by a computer, while the second is an expression of human creativity.
The mathematical function is the expression of human creativity. It consumes other expressions of human creativity, a creative act in and of itself, to output a different expression of human creativity. Other humans consume the output of the system, and the system alike, to generate further creative works. An AI model is a higher-order creative process, much like humans, and is fundamentally predicated on the creative involvement of humans.
Consider the seminal work _Designing Programmes_ by Karl Gerstner if you'd like further consideration of optimizing creative output via self imposed, systematized restrictions and permutations (design programs as meta art). Or alternatively, consider aleatoric music or Toshiko Takaezu for the incorporation of _chance_ into art.
There really isn't anything too new here in my book (AI) - just increased scope and fruition.
> They do not have emotion.
Art does not require an input or output of emotion nor an emotional affect.
Well, mostly because of corporate greed of ownership. But the underlying issue is that training AI on AI output is a recipe for ruining the entire training set. At least in these early stages.
Not just greed; they want to silence copyright holders whose works they freely use, and at the same time prevent others from using theirs. It is like having a different set of rules for themselves. I don't believe training itself is ruining anything; it is the proposed model of value capture and the marginalizing of content creators that poses the greater threat.
Yes, you've condensed the problem on display quite well here. It's not even just hypocrisy, but also short-sighted behaviour.
Artists will learn not to trust the web, if they haven't already. The greatest time to train a model was yesterday; eventually no novel ideas, expressions, or art will prosper on the "open" web. Just a regurgitation of some statistical idea of words and pixels.
Oh, right. It just reads a million books in a couple of days, removes all the source information, mixes and matches it the way it sees fit, and sells the output for $10/month to anyone who comes with a credit card.
It's the same thing with GitHub's copilot.
A book publisher would seize everything I have and shoot me in a back alley if I did 0.0001% of this.
Yeah, fair use implicitly relies on the constraints of a typical human lifetime and ability to moderate how much damage is done to publishers with it. That wasn’t an issue until recently, as humans were the only ones who could create output under fair use laws.
AI models are fundamentally different because a computer is a lump of silicon which is neither a moral subject nor object. A human author is a living sentient being that needs to earn a living and is deserving of dignity and regard.
I'm sorry, but I'm going to fundamentally disagree with you. One does not get a morality pass because "the computer did it". People are creating these AI models, selecting data and feeding the models data on which to be trained. The outcome of that rests upon _both_ the creators of the models and the users prompting the models to achieve a result.
I interpret the comment I was replying to as basically saying "We let humans do it, so therefore we should let machines do it." And my response is basically, "we let humans do it because it provides benefits to an actual living being that deserves them".
When Midjourney serves up an image, it does not collect a paycheck that enables it to feed its family. It doesn't go home and sleep well at night with the satisfaction that it has created a piece of art that meant something to others.
It may be the case that the executives and engineers who own Midjourney feel some of that, but I think the experience of making a machine to make X is fundamentally different from making an X.
It may also be the case that the person who wrote a prompt to ask Midjourney to produce an image generates some value from that and feels good about it. I get that. But I think the amount of creative effort they put into doing that, and the amount of value in the result that is derived from uncompensated other artists, is profoundly different from sitting down and actually drawing a picture.
A large enough difference in degree is a difference in kind.
As other already pointed out, that's not how human artists learn or produce art. Everyone who uses this brain-dead argument outs themselves as someone who knows nothing about the subject.
So what about the fact that these cartoons look like Keith Haring meets Cathy Guisewite meets Scott Adams? These cartoons are artistically derivative. They are obviously not derivative from the perspective of copyright as style is an idea, not an expression.
These models were not trained on just the cartoonist in question, nor just their inspirations. The intent was to train on all images and styles. The expression of the idea using these models is not going to match the expression of the idea of all images, even those conforming to a certain bounded prompt.
For the life of me I can't get DALL-E or Stable Diffusion to produce anything like Cat and Girl, nor anything coherent for the above-mentioned inspirations. DALL-E flat out refuses to create things in the style of the above, and Stable Diffusion has insane-looking outputs, overwhelmed by Haring.
Most importantly, copyright is concerned with specific works that specifically infringe, and whose damages are either statutory or based on quantifiable earnings from the infringement. Copyright does not cover styles across all works, especially when, again, the intent is to learn all styles, a process that rarely, if at all, reproduces direct expressions.
The only point at which these images are directly copied is when they are in the machine's memory, which already has case law allowing it, followed by back-propagation that begins the process of modifying the direct copies toward the underlying formal qualities.
It seems like a lot of people are going to be upset when the courts eventually rule in favor of the training and use of these models, if only because the defendant has a lot of resources to throw at a legal team.
your argument is that it's not infringing because they copied everything at once?
I get that there's case law on copying in memory on the input side not being infringing but can't for the life of me understand how they get away with not paying for it. At least libraries buy the books before loaning them out, OpenAI and midjourney presumably pirated the works or otherwise ignored the license of published works and just say "if we found it on the internet it's fair game"
1. Google's unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google's commercial nature and profit motivation do not justify denial of fair use.
2. Google's provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement.
Nor, on this record, is Google a contributory infringer.
Focus on the protected aspects of the originals. This would be their expressions. Unless an LLM is reproducing the exact expression and not merely the idea, then there is no market substitute. The market substitute for a Picasso painting is the facsimile of one of Picasso's actual paintings, not just some other cubist artwork.
I think it's worth noting that one of the things that makes this question so vexing is that this topic really is pretty novel. We've only had a few machines like this in history and almost no legal precedent around how they should be treated. I can't remember anyone ever bringing suit over a Markov chain engine, for example, and fabricating one is basically "baby's first 'machine intelligence' project" these days (partially because the output sucks, so nobody has ever felt they have something to lose from competing with a Markov engine; see the toy sketch below).
Existing copyright precedent serves this use-case poorly, and so the question is far more philosophical than legal; there's a good case to be made that there's no law clearly governing this kind of machine, only loose-fit analogies that degenerate badly upon further scrutiny.
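For the curious: a word-level Markov text generator really is just a few lines, which is part of why nobody ever bothered to sue over one. A toy sketch, with a made-up one-line corpus standing in for real training text:

    # Toy word-level Markov chain generator -- "baby's first machine
    # intelligence project," as described above.
    import random
    from collections import defaultdict

    def train(text):
        """Map each word to the list of words observed right after it."""
        follows = defaultdict(list)
        words = text.split()
        for a, b in zip(words, words[1:]):
            follows[a].append(b)
        return follows

    def generate(follows, start, length=15):
        out = [start]
        for _ in range(length - 1):
            options = follows.get(out[-1])
            if not options:
                break
            # random.choice over a list with repeats = frequency weighting
            out.append(random.choice(options))
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the cat"
    print(generate(train(corpus), "the"))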
That’s literally what human artists do, and how they work. Art is iteratively building on the work of others. That’s why it’s so easy to trace its evolution.
The poster probably meant "fair use", which is an American term of copyright law. The UK, Canada, and other Commonwealth countries have a concept known as "fair dealing", which is similar to, but different from, fair use[1]. EU copyright law has explicitly permitted uses[2] which are exceptions to copyright restrictions. Research is one of them, but it requires explicit attribution.
> I firmly believe that training models qualifies as free use. I think it falls under research, and is used to push the scientific community forward.
I don't think this is as cut-and-dried as you make it out to be here. If I train a model on, say, every one of the New York Times' articles and release it for free, and it finds use as a way of circumventing their paywall, I have difficulty justifying that as fair use/fair dealing. The purpose/character of the model should indeed be a factor, but certainly not nearly as dispositive a one as I think you're suggesting.
To the extent that training the model serves a research purpose I think that the general use / public release of the trained model does not in general serve the same research purpose and ought to have substantially lower protection on the basis of, e.g., the effect on the original work(s) in the market.
Wouldn't that depend on the use case? If you just had the model regenerate articles that roughly approximate its source material, that is a much more clear-cut violation of a paywall. But if you use that data as general background knowledge to synthesize aggregative works, such as a history of the Vietnam War, or trends in musical theatre in the 1980s relative to the 1970s, or shifts in the usage of formal honorifics, then those seem to me to be clearly fair-use categories. There are gray areas, such as aggregating the opinions of a certain op-ed writer over a short timeframe, which while it might produce a novel work is basically a mishmash of recent articles. But would that be unfair, especially if not done in the original author's style?
Technical distinctions like these probably will matter in whatever form regulation eventually takes.
Quite a lot of what news publications like the New York Times do is precisely regenerating articles that roughly approximate source material from some other publication. If I remember rightly, a lot of smaller, more local news organisations aren't happy about this because of course it's a more or less direct substitute for the original and a handful of big news organisations (particularly the New York Times) are taking so much of the money that people are willing to pay for news that it's affecting the viability of the rest - but it's not illegal, since facts cannot be copyrighted.
Yes, I think this is a rather fact-specific inquiry. My main point is that the research/commercial distinction is not the only factor (and not even the most important one).
> if you use that data as general background knowledge to synthesize aggregative works, such as a history of the Vietnam War, or trends in musical theatre in the 1980s relative to the 1970s, or shifts in the usage of formal honorifics, then those seem to me to be clearly fair-use categories
I don't think this is clear. If someone were to train a model on several books about the Vietnam War and then publish their own model-created book on the history of the Vietnam War, I would be inclined to say that that is infringement. And if they changed the dataset to include a plurality of additional books which happen to not be about Vietnam, I don't think that changes the analysis substantially.
I think it is hard to earnestly claim that in that instance the output (setting aside the model itself, which is also a consideration) is transformative, and so I would think, absent more specific facts, that all four fair use factors are against it.
> And if they changed the dataset to include a plurality of additional books which happen to not be about Vietnam, I don't think that changes the analysis substantially.
I think the question is if it changes analysis if the dataset DOES include a bunch of books and articles related to Vietnam beyond your specific book.
In the first case, where it is just rewriting a single book's content, the unfairness is clear.
But in the case where it is producing a new synthesis and analysis of the data, derived in part from, but not regurgitating, the source material, is that unfair?
Sorry, but these arguments by analogy are patently ridiculous.
We are not talking about the eons old human practice of creative artistic endeavor, which yes, is clearly derivative in some fashion, but which we have well established practices around.
We are discussing a new phenomenon of mass replication or derivation by machine at a scale impossible for a single individual to achieve by manual effort.
Further, artists tend to either explicitly or implicitly acknowledge their priors in secondary or even primary material, much like one cites work in an academic context.
Also, the claim:
>But if I take your work and compare it to millions of other people's work...
Is ridiculous. A. You haven't, nor will you ever, actually do this. B. This is never how the system of artistic practice up to this point has worked, precisely because this sort of activity is beyond the scale of human effort.
In addition, plagiarism exists and is bad. There's no reason that concept can't be extended and expanded to include stochastic reproduction at scale.
If you want a future in which artists don't have a say, and in which capital concentrates even further into the hands of a few technological elite who make their money off of flouting existing laws and the labor of thousands, by all means. But this argument that, by analogy to human behavior, companies should somehow not be responsible for the vast use of material without permission is absolutely preposterous. These are machines owned by companies. They are not human beings and they do not participate in the social systems of human beings the way human beings do. You may want to consider a distinction in the rules that adequately reflects this distinction in participatory status in a social system.
>> a future in which capital concentrates even further into the hands of a few technological elite who make their money off of flouting existing laws and the labor of thousands
I think this is the central issue and is not limited to just AI generated art. Wealth concentrates to the few from each technological development. When robots replaced factory workers, the surplus profit went to the capital holders, not the workers who lost their jobs. AI generated art will be no different but I don't think it will replace the creative art that people will want to make, just the art that people are making to pay the bills.
So your argument is predicated on the scale of inspired work being the problem?
> They are not human beings and they do not participate in the social systems of human beings the way human beings do
I don't think this adds anything to the argument besides you using this as a reason analogies with humans can't be used to compare the specific concept of inspired works? I don't think this holds up.
Algorithms participating in social systems has nothing to do with whether inspired works have a moral claim to existence for some. The fact that your ethics system values the biological classification of the originator of inspired works is something that can't be reconciled into a general argument. I could make the claim that the prompt engineer is the artist in this case.
> capital concentrates even further into the hands of a few technological elite who make their money off of flouting existing laws
That can be said by the development of any technology. Fear of capital concentration is more a critique on capitalism than it is on technological development.
> That can be said by the development of any technology. Fear of capital concentration is more a critique on capitalism than it is on technological development.
Technology does not exist in a vacuum. All of the utility and relevance of technology to humans is dependent on the social and economic conditions in which that technology is developed and deployed. One cannot possibly critique technology without also critiquing a social system, and typically a critique of technology is precisely a critique about its potential abuses in a given social system. And yes, that's what I'm attempting to do here.
> I don't think this adds anything to the argument besides you using this as a reason analogies with humans can't be used to compare the specific concept of inspired works? I don't think this holds up.
This is a fair point. One could argue that an LLM, properly considered, is just another tool in the artist's toolbox. I think a major distinction, though, between an LLM and, say, a paintbrush, or even a text editor or Photoshop, is that those tools do not have content baked into them. An LLM is in a different class insofar as it is not simply a tool, but is also partially the content.
The use of two different LLMs by the same artist, with the same prompt, will produce different results regardless of the intent of the so-called artist/user. The use of a different paintbrush, by the same artist, with the same pictorial intention may produce slightly different results due to material conditions, but the artist is able to consciously and partially deterministically constrain the result. In the LLM case, the tool itself is a partial realization of the output already, and that output is trained on masses of works of unknown individuals.
I think this is a key difference in the "AI as art tool" case. A traditional tool does not harbor intentionality, or digital information. It may constrain the type of work you can produce with it, but it does not have inherent, specific forms that it produces regardless of user intent. LLMs are a different beast in this sense.
Law is a realization of the societal values we want to uphold. Just as we can't in principle claim that training of LLMs on scores of existing work is wrong solely due to the technical function of LLMs, we cannot claim that this process shouldn't be subject to constraints and laws due to the technical function of LLMs and/or human beings, which is precisely what the arguments by analogy try to do. They boil down to "well it can't be illegal since humans basically do the same thing" which is a hyper-reductive viewpoint that ignores both the complexities and novelty of the situation and the role of law in shaping willful societal structure, and not just "adhering" to natural facts.
> They are not human beings and they do not participate in the social systems of human beings the way human beings do.
Your original quote was not using the impact of the technology, it was disparaging the algorithmic source of the inspired work (by saying it does not participate in social systems the way humans do).
> I think a major distinction, though, between an LLM and, say, a paintbrush, or even a text editor or Photoshop, is that those tools do not have content baked into them
LLMs, despite being able to reproduce content in the case of overtraining, do not store the content they are trained from. Also, the usage of "content" here is ambiguous so I assumed you meant the storage of training data.
To me, the content of an LLM is its algorithm and weights. If the weights can reproduce large swaths of content to a verifiable metric of closeness (and to an amount that's covered by current law) I can understand the desire to legally enforce current policies. The problem I have is against the frequent argument to ban generative algorithms altogether.
> The use of a different paintbrush, by the same artist, with the same pictorial intention may produce slightly different results due to material conditions, but the artist is able to consciously and partially deterministically constrain the result.
I would counter this by saying the prompts constrain the result. How deterministically depends on how well one understands the semantic meaning of the weights and what the model was trained on. Also, as a disclaimer, I don't think that makes prompts proprietary (for various different reasons).
> I think this is a key difference in the "AI as art tool" case. A traditional tool does not harbor intentionality, or digital information
Assigning "intent" is an anthropomorphism of the algorithm in my opinion as they don't have any intent.
I do agree with your last paragraph though, one (or even a group of) individual's feelings don't make something legal or illegal. I can make a moral claim as to why I don't think it should be subject to constraints and laws, but of course that doesn't change what the law actually is.
The analogies are trying to make this appeal in an effort to influence those who would make the laws overly restrictive. There are many laws that don't make sense, and logic can't change their enforcement. The idea is to make a logical appeal to those who may have inconsistencies in their value system, to try and prevent more nonsensical laws from being developed.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which LLMs can abuse copyrighted work to threaten the livelihoods of the authors of those works is reason enough to consider it unethical.
"abusing those rights" is a subjective phrase. What about it is "abuse"? If I learned how to draw cartoon characters from copying Family Guy and released a cartoon where the characters are drawn in a similar style, would that be abuse (assuming my show takes some of Family Guy's viewership)? Is your ethical hangup with the fact it's wrong to use the data of others to influence one's work (which could potentially be an algorithm) or that people are losing opportunities based on the influenced work?
If it's the latter how do we find the line between what's acceptable and what's not? For example, most people wouldn't be against the creation and release of a cure for cancer developed in this way. It would lead to the loss of opportunities for cancer researchers but I believe most people would deem that an acceptable tradeoff. A grayer area would be an AI art generator used to generate the designs for a cancer research donation page. If it could potentially lead to a 10% increase in donations, does that make it worth it?
>For example, most people wouldn't be against the creation and release of a cure for cancer developed in this way.
Intellectual property law does presently restrict the development of cancer treatments and demands in many cases exorbitant royalties from patients and practitioners, so I'm not convinced that this is accurate. If people believed that the loss of opportunities would constrain innovation in the field of cancer research, I think they'd expect the AI users to pay royalties as well.
>If people believed that the loss of opportunities would constrain innovation in the field of cancer research, I think they'd expect the AI users to pay royalties as well.
This comes down to the product of AI.
If the AI produces a cancer treatment identical to what is already covered by patent, I think commercialization would be contingent on the permission of the IP holder.
If the AI produced a novel cancer treatment, using a transformative synthesis of available knowledge, most people would not expect royalties.
I never made a legal appeal in my previous comment, so legalities are irrelevant. My argument is also about derivative/transformative works rather than specific works.
What I was questioning was whether people would think it's morally right or not to generate inspired works. For example, if someone made an algorithm to read the relevant papers and produce a cancer treatment that addresses the same areas/conditions as a method under IP law but doesn't equate to the exact method, I don't see that as a morally wrong action by itself.
I don’t think it is. What you describe is similar to any other industry disruption, and I don’t think those are unethical. I’d actually argue that preventing disruption is often (not always) unethical, because you artificially prolong an inefficient or inferior alternative.
So you're saying that we should stop pursuing art and prose? Because when you fine-tune Midjourney with 30 or so images by an artist, it can create any image in that artist's style.
You removed the value and authenticity of that artist in 30 minutes, you applauded it, and you defended that it should be the norm.
OK then, we can close down the entire entertainment business and generate everything with AI, because it can mimic styles, clone sounds, animate things with Gaussian splats, and so on.
Maybe we can hire coders to "code" films? Oh sorry, ChatGPT can do that too. So we need a keypad then, that only the most wealthy can press. Press 1 for a movie, 2 for a new music album, 3 for a new book, and so on.
We need 10 buttons or so, as far as I can see. Maybe I can ask ChatGPT 4 to code one for me.
Doesn't matter. You pay the artist for their style of rendering things. Consider XKCD, PHD Comics, Userfriendly, etc. At least 50% of the charm is the style, remaining 50% is the characters and the story arc.
You can't copyright the style of a Rolex, but people pay a fortune to get the real deal. Same thing.
> My word, the lawsuits that would arise between artists...
Artists imitate/copy artists as a compliment, at least in the illustration and comics world. Unless you do it in bad faith, I don't think artists are gonna do that. Artists have a sense of humor to begin with, because art is making fun of this world, in a sense.
No, you pay them for the finished product. The STYLE is independent. Lots of artists have similar styles. They don't all pay each other for copying their styles.
Every artist has their own style, because it's their way of creating the product.
Pixar, Disney and Dreamworks have different styles, same for actors, writers, and designers, too. You can generally tell who made what by reading, looking, listening, etc.
I can recognize a song by Deep Purple or John Mayer or Metallica, just by their guitar tone, or their mastering profile (yes, your ear can recognize that), in a couple of seconds.
If style were that easy, we could have 50 Picassos, 200 John Mayers, or 45 Ara Gulers (a photographer) that you couldn't tell apart, but it doesn't work that way.
XKCD brought in a couple of guest artists for personal reasons. It was very evident, even though the drawing style was the same.
People, art, and hand-made things are much more complex than they look. Many programmers forget this because everything is rendered in their favorite font, but no two hand-made things are ever the same. Eat the same recipe from two different cooks, even if you measure out the ingredients and hand them over beforehand, and you'll get different tastes.
Style is a reflection of who you are. You can maybe imitate it, but you can't be it.
Heck, even two people implementing the same algorithm in the same programming language don't write the same thing.
> Style is a reflection of who you are. You can maybe imitate it, but you can't be it.
Isn't this an argument that AI-generated artwork will never be more than a lesser facsimile? That'd suggest that human-made works will always be more sought-after, because they're authentic.
It will be, and human-made things will always be better and more sought-after; however, capitalism doesn't work that way.
When the replacements become "good enough", they'll push out the better things by being cheaper and 90% of the way there. I have some hand-made items and they're a treat to hold and use. They perform way better than their mass-produced counterparts, they last longer, they feel human, and no, they're not inferior in quality. In fact it's the opposite, but most of them are not cheap, and when you want to maximize profits, you need to reduce your costs, ideally to zero.
Honestly, that'll be boring. I don't want to be a star of a movie, that's not what pulls me in.
I want to see what the person has imagined, what the story carries from the author, what the humans in it added to it and what they got out of it.
When I read a book, I look from another human's eyes, with their thoughts and imagination. That's interesting and life-changing actually. Also, the author's life and inner world leaks into the thing they created.
The most notable example for me is Neon Genesis Evangelion. Its psychological aspects (which hit very hard, actually) are a reflection of Hideaki Anno's clinical depression. You can't fake this even if you want to.
This is what makes human creation special. It's a precipitation of a thousand and one things in an unforeseen way, and this is what feeds us, even if we are not aware of it and love to deny it at the same time.
"This is what makes human creation special.", that's a load of garbage. There is nothing inherently special about human creation. Some AI artwork I've seen is incredible, the fact it was AI generated didn't change its being an incredible piece of art.
Thinking our creation has some kind of 'specialness' to it is like believing in a soul, or some other stupid thing. It's pure hubris.
Actually, I'm coming from a gentler point of view: "Nature and living things are much more complex than we anticipate".
There are many breakthroughs and realizations in science which excite me more than "this thing called AI": Bacteria have generational memory. Bees have a sense of time. Mitochondria (and cells) inside a human body communicate and try to regulate aging and call for repairs. Ants have evolved antibiotics, and expel the ones with incurable and spreadable diseases. Bees and ants have social norms; they have languages. Plants show more complex behavior than we anticipated. I'm not even entering the primates' and birds' territory, because the titles alone would fill a short chapter.
While some of them might be very simple mechanisms at the chemical level, they make up a much more complex system, and the nature we live in is much more sophisticated than we know, or want to acknowledge.
I'm not looking from a "humans are superior" perspective. Instead, I'm looking from an "our understanding of everything is too shallow" perspective. Instead of trying to understand or acknowledge that we're living in a much more complex system on a speck of dust in vast emptiness, we connect a bunch of silicon chips, dump everything we ever babbled into a "simulated neural network", and it gives us semi-nonsensical, grammatically correct half-truths.
That thing can do it because it puts word after word via a very complex and weighted randomization learned from how we do it, imitating us blindly, and we think we have understood and unlocked what intelligence is. Then we applaud ourselves because we're one step closer to stripping a living thing of its authenticity and making Ghost in the Shell a reality.
Living things form themselves over a long life with sight, hearing, communication, interaction, and emotions, at least, and we assume that a couple of million lines of code can do much better because we poured in a quadruple-distilled, three-times-diluted version of what we have gone through.
This is pure hubris if you ask me, if there ever was any.
Then the market will decide, won't it? Why the fuss about generative AI then? If you're so confident about its inferiority, you shouldn't have to worry about it, right? The better product will win, right?
The market does not choose the superior product. It might choose the least common denominator, the cheapest product, the product that got on the market the earliest, or the one with the richest backers, but not "the superior product".
The first part is debatable, unless you qualify it as "superior at making their creator money".
The market selects for that, and only that. Other qualities of the product are secondary, making any statements to the effect of "the best product [outside the context of simply making the most money] will win" misguided at best.
What will actually happen is people will think "meh good enough", shitty AI art will become the norm, and we'll be boiling frogs and not realize how shitty things have become.
Yes, that is true. I 100% agree. It is needed without a doubt.
For one moment, let's think of it this way. You are a 20-year experienced engineer making whatever money you are making. Suddenly, your skills are invalidated because of a new disruption. And you have another friend in the same situation.
Fortunately for you, luck played out and you could transition! You found your way back to a life with meaning and value. Your joy and your everyday life continued as before.
But the other friend enjoyed the process, liked doing what they were doing, and there was no suitable transition for them. Humans are adaptable, but to them, nothing mattered because their whole existence no longer offered any value. The sole act of doing was robbed from them WITHOUT ANY ALTERNATIVE. The experience and value of a person rendered worthless.
Can you relate to that feeling? If yes, thank you.
If no, your words are empty and hold no value.
Artists went through a similar phase during the invention of photography. Now it is rather soul-crushing, because anything an artist makes can easily be replicated, rendering the whole artistic journey moot.
> Can you relate to that feeling? If yes, thank you.
> If no, your words are empty and hold no value.
Being sympathetic towards those people doesn't mean you should bend to their will if you don't believe it's the right thing to do. I can be sympathetic to a child who cries over not being able to ride a roller coaster because they aren't tall enough without thinking the height requirement should be removed.
I think the big difference is that it's not a direct replacement - it feeds off of the existing people while making it much harder for them to make a living.
It would be as if instead of cars running on gasoline, they ran on chopped up horseflesh. Not good for the horses, and not sustainable in the long term.
Some "disruptions" are unethical, some are not. It's about what they actually consist of. Labelling many things as "industry disruption" abstracts beyond usefulness.
Do you really feel that way universally? Would it be ethical to disrupt the pharmaceutical industry by removing all restrictions around drug trials? Heck, you could probably speed things up even further if you could administer experimental drugs to subjects without their consent.
Obviously this is a bit facetious, but basing your ethical framework on utilitarianism and _nothing_ else is pretty radical.
If having those restrictions makes the world worse overall, then it would be ethical to remove them. But I assume the restrictions are designed by intelligent people with the intention of making the world better, so I don’t see any reason to think that’s the case.
I agree that the current crop of artists are worse off with AI art tools being generally available. But consumers of art, and people who like making art with AI art tools, are better off with those tools being available. To me it’s clear that the benefit of the consumers outweighs the cost to the artists, and I would say the same if it was coders being put out of jobs instead. You can prove this to yourself by applying it to anything else that’s been automated. Recording music playback put thousands of musicians out of work, but do you really regret recorded music playback having been invented?
P.S. Adobe Firefly is pretty competent and is only trained on material that Adobe has the license to. If copyright were the real reason people didn’t like AI art tools, you would see artists telling everyone to get Adobe subscriptions instead of Midjourney.
> If having those restrictions makes the world worse overall, then it would be ethical to remove them
Worse how? As defined by whom?
You could make a pretty compelling argument that "the world" would be better off by, e.g., forcing cancer patients through drug trials against their will. We could basically speedrun a cure for cancer!
These longtermist, ends justify the means, ideas can easily turn extremely gross.
Don't even try to stop my grocery-store-sample-hoarding robot army, Wegmans! You're being unethical in your pathetic attempt to prevent your sampling disruption!
Are photocopy machines illegal? Are CD-ROM burners illegal? Both allow near-unlimited copies of copyrighted material at a scale much faster than a human could do alone.
The tools are not the problem, it's how humans use them.
Same as an LLM, they can be used in an illegal way if used to copy copyrighted material. So I can't tell it to reproduce a copyrighted work. But it can create new material in the style of another artist.
The difference is that the LLM is still copying copyrighted material in your case, but if I burn a Linux ISO, that is not happening.
You do not have to produce an exact copy of something to violate copyright, and I think anything the LLM outputs is violating copyright for everything it has ever trained on, unless the operator (the person operating the LLM and/or the person prompting it) has rights to that content.
No, and I don't think anyone is arguing that LLMs should be illegal either.
I personally am not against LLMs training on things the operator has rights to, and even training on copyrighted things, but I am against it laundering those things back out and claiming it's legal.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which looms can abuse copyrighted work to threaten the livelihoods of the seamstresses of those works is reason enough to consider it unethical.
Replace loom with printing press, etc., and you realize you're a Luddite?
Ned Ludd was onto something. He wasn't anti-progress. He was anti-labour theft. The problem was not that people were losing their jobs, but that they were being punished by society for losing their jobs and not being given the ability to adapt, all to satisfy the greed of the ownership class.
I am hearing a strong rhyme.
Commercialized LLMs are absolutely labour theft even if they are useful.
We do not want our labour stolen. We want to labour less, and we want to be fairly compensated for when we have to labour.
The Luddites and the original saboteurs (from the French sabot) had a problem where the capital class invested in machines that let them (a) get more work done per person, (b) employ fewer people, and (c) pay those fewer people less because now they weren't working as hard. The people they fired? They (and the governments of the day — just like now) basically told them to go starve.
> The Luddites were members of a 19th-century movement of English textile workers which opposed the use of certain types of cost-saving machinery, and often destroyed the machines in clandestine raids. They protested against manufacturers who used machines in "a fraudulent and deceitful manner" to replace the skilled labour of workers and drive down wages by producing inferior goods.[1][2] Members of the group referred to themselves as Luddites, self-described followers of "Ned Ludd", a legendary weaver whose name was used as a pseudonym in threatening letters to mill owners and government officials.[3]
Yes, we want to work less. But fair work should result in fair compensation. Ultimately, this is something that the copyright washing of current commercialized LLMs cannot achieve.
I agree with the other commenters that the scale of this "deriving inspiration from others" is where this feels wrong.
It feels similar to the ye olden debates on police surveillance. Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with. Collating your behavior across every website and device you own from a data broker is fundamentally the same thing as a single phone’s wiretap, but it obviously feels way grosser and more unethical because it scales way past the point of what you’d imagine as being acceptable.
In that example it's not the scale that makes it right or wrong, the scale of people impacted just affects the degree of wrongs that have been committed.
> Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with.
If acquiring a warrant is the basic action being scaled, I'd be okay with that ethically if it was done under what I define as reasonable pretenses. Regardless of how it scales, I still think it would be the right thing to do, assuming the pretenses for the first action could be applied to everyone wiretapped. Now if I thought the base action was morally wrong (someone was tailed or wiretapped without proper pretenses), I'd think it's wrong regardless of the scale. The number of people it affected might impact how wrong I saw it, but not whether it was right or wrong.
I'm not as interested in making a technical/legal argument as I am in sharing my feelings on the topic (and eventually, what I think the law should be), but during training, copies are made of copyrighted material, even if the model doesn't contain exact copies of work. Crawling, downloading, and storing (temporarily) for training all involve making copies, and thus are subject to copyright law. Maybe those copies are fair use, maybe they aren't (I think they shouldn't be).
My main point is that OpenAI is generating an incredible amount of value all hinging on other people's work at a massive scale, without paying for their materials. Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho.
> Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho
I'd feel a lot better about that argument if we had sane copyright laws and anything older than 7-10 years was automatically in the public domain. Suddenly, Netflix is looking a lot more valuable with just public domain works, and there'd be a ton of public domain art to train AI models with. I suspect the technology would still leave a lot of artists concerned in that situation, though, because even once the issue of copyright is largely solved, the fact remains that AI enables people who aren't artists to create art.
Assume I agree that copyright holders should be compensated for their works (because I do in some sense).
How would this compensation work? Let's say a portion of profits from LLMs that were trained on copyrighted work should be sent to the copyright holders.
How would we allocate which portion of the profits goes to which creators? The only "fair" way here would be if we could trace how much a specific work influenced a specific output, but this is currently impossible and will likely remain impossible for quite some time.
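To make the problem concrete, here's a minimal sketch (in Python, and entirely hypothetical) of what a pro-rata scheme would have to look like. Note that `influence(work, output)` is precisely the attribution function we don't know how to compute; everything else is trivial arithmetic once you assume it exists:

    # Hypothetical pro-rata royalty allocation. The `influence` callable is
    # a stand-in for the attribution capability that doesn't exist today.

    def allocate_royalties(profit, works, outputs, influence):
        # Score each work by its total influence across all outputs.
        scores = {w: sum(influence(w, o) for o in outputs) for w in works}
        total = sum(scores.values())
        if total == 0:
            return {w: 0.0 for w in works}
        # Split the profit in proportion to those scores.
        return {w: profit * s / total for w, s in scores.items()}

Even this toy version shows the dependency: without a defensible `influence`, any split degenerates into flat per-work payouts or negotiated lump sums.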
This is what licensing negotiations are for. One doesn't get to throw up their hands and say "I don't know how to fairly pay you so I won't pay you at all".
Your argument is ridiculous, because it could identically be applied to "every human artist should have to pay a license to every artist whose work they were inspired by". That would obviously be a horrible future, but megacorps like Disney would love it.
Calling it copyright today is a misnomer: it's not actually the act of copying the work that's the problem. It should really be called "Performance Rights" or "Redistribution Rights." The part where this gets complicated is that OpenAI has (presumably; if they haven't, that's a different matter) acquired the works through legal means. And having acquired them, they're free to do most anything with them so long as they don't redistribute or perform the works.
The big question is where "training an AI on this corpus of works and then either distributing the weights or performing the work via API" falls. Should the weights be considered derivative works? I personally don't think so, and although the weights can be used to produce obviously infringing works, I don't think this meets the bar of being a redistribution of the work via a funny lossy compression algo like some are claiming. But who knows? Copyright is more political than logical, so I think which way this bends will really come down to a balance of the tangible IRL harms artists can demonstrate vs. the desires of unrelated industries that wish to leverage this technology and are better off for having all this data available.
I do think it's worth remembering there's a difference between "legal" and "good".
It's entirely legal for me to leave the pub every time it comes up to my round. It's legal for me to get into a lift and press all the buttons.
It's not unreasonable, I think, for people to be surprised at what is now possible. I'm personally shocked at the progress in the last few years - I'd not have guessed five years ago that putting a picture online might result in my style being easily recreated by anyone, mostly for the benefit of a profitable company.
Another example is proprietary software that may have its source available, either intentionally or not. If you view this and then work on something related to it, like WINE for example, you are definitely at risk of being successfully sued.
If you worked at Microsoft and worked on Windows, you would not be able to participate in WINE development at all without violating copyright.
If you viewed leaked Windows source code you also would not be able to participate in WINE development.
An interesting question that I have is whether training on proprietary, non-trade-secret sources would be allowed. Something like Unreal Engine, where you can view the source but it's still proprietary.
Another question is whether training on leaked sources of proprietary and private but non-trade-secret code, like source dumps of Windows, is legal.
The way this works is the way many of us are arguing that AI and copyright should work.
Viewing (or training on) copyrighted work isn't copyright infringement.
What can be copyright infringement is using an employee who has viewed (or a model that was trained on) copyrighted work to create a duplication of that work.
In most of the examples of infringing output that I've seen, the prompt is pretty explicit in its request to duplicate copyrighted material.
Models that produce copyrighted content when not explicitly asked to will have trouble getting traction among users who are concerned about the risk of infringement (such as the examples you listed).
I also see this approach opening an opportunity for models that acquire specific licenses for the content they train on, licenses that would grant the users of the model the right to duplicate some or all of the copyrighted works.
The responsibility for how a model is used should rest primarily on the user, not the model trainers.
Let's say I'm an artist. I have, thus far, distributed my art for consumption without cost, because I want people to engage with and enjoy it. But, for whatever reason, I have a deep, irrational philosophical objection to corporate profit. I want to preclude any corporation from ever using my art to turn a profit, when at all possible. I have accepted that in some sense, electrical and internet corporations will be turning a profit using my work, but cannot stomach AI corporations doing so. If I cannot preclude AI corporations from turning a profit using my work, I will stop producing and distributing my work.
Do you think it's reasonable for me to want some legal framework that allows me to explicitly deny that use of my work? Because I do.
Copyright is a bad idea in the first place, and should just be thrown out entirely; but that isn't the whole picture here.
If OpenAI is allowed to be ignorant of copyright, then the rest of us should be allowed, too.
The problem is that OpenAI (alongside a handful of other very large corporations) gets exclusive rights to that ignorance. They get to monopolize the un-monopoly. That's even worse than the problem we started with.
Who is "we" here? Are you making a distinction between people and machines? If I built a machine that randomly copied from a big sample of arts that I wanted, would that machine be ok?
OpenAI built a machine that does exactly that. They just sampled _everyone_.
Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalizations? Honestly wondering. This seems to be the crux of the issue here.
Google is not producing something that competes with or is comparable to what it's parsing and displaying, which makes it very different.
Google is displaying the exact content and a link to the source, and is functioning as a search engine.
Copying music (or whatever), and then outputting music based on the copied music is not the same thing as a search engine, it's outputting a new "art" that is competing with the original.
Another way to put it is that you can't use a search engine to copy something in any meaningful way, but copying music to produce more music is actually copying something.
The goal of my post was not to answer what differentiates Google search from LLMs and other generative models; it was to respond to the original post above:
> Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalizations
The reasons why I don't think training on copyrighted data is wrong are stated in my other comments, replying to people who have made arguments about its immorality.
Authors Guild, Inc. v. Google, Inc. determined that Google's wholesale scanning and indexing of books was fair use; the books were borrowed from the University of Michigan library, which had paid for them (or a donor had, at some point). Here's a book of bedtime stories available in its entirety: https://www.google.com/books/edition/Picnics_in_the_Wood_and...
In your first two examples, Google is still not providing an alternative to the original content; the people who have uploaded the content are, and they are doing so illegally.
The Google Books thing is a lot more interesting, I think. I guess the idea is that Google is acting as a library, and they are lending you the book? I'm not sure how I feel about this, and I would need to do a lot more research on it before having a strong opinion.
The other big difference here is that you can't use the content linked from Google search as if it were your own. If you Google search "nes emulator in javascript" and get a link to a GitHub repo, you can't copy-paste the code as if it were your own, and even basing your code on what you have seen could be risky depending on the license of the repo. LLMs are acting as a sort of search engine, pretty similar to how Google search does, but people are using the output from the "search" as if it were their own work that they have full rights to!
If crawling the web, ingesting copyrighted content, and ranking it does not produce a derivative work of that content, then using it to change the values of a mathematical expression should likewise exempt the expression from being a derivative work.
> Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works
In that case, the OP should never have posed this irrelevant question, because access to the expression doesn't give access to a derivative work.
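To make the "values of a mathematical expression" framing concrete, here's a toy sketch (hypothetical, reduced to a single weight and a squared-error loss) of what training does: it nudges numeric parameters rather than storing the material it sees.

    # Toy illustration: one gradient step for the model y = weight * x,
    # minimizing the squared error (weight * x - target) ** 2.

    def train_step(weight, x, target, lr=0.01):
        prediction = weight * x
        gradient = 2 * (prediction - target) * x  # derivative of the loss
        return weight - lr * gradient

    w = 0.0
    for x, target in [(1.0, 2.0), (2.0, 4.0)]:  # stand-ins for training data
        w = train_step(w, x, target)
    # `w` now holds a statistical trace of the data, not a copy of it.

Whether that numeric trace should count as a derivative work is, of course, exactly what's being argued.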
>> ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalizations?
Entire industries exist dedicated to such things. News aggregators. TV parody shows. Standup comedians. Fan fiction. Biography writers. Art critics. Movie critics. Sometimes the derivative work even outsells the original, especially when the original was horrible or unappreciated. I have never played Among Us or the new Call of Duty, but I do enjoy watching NeebsGaming do their youtube parodies of them.
No, copyright law prohibits that. The best example so far is Google's image search being considered fair use; notably, it's not commercial insofar as they do not sell the derivative work, though they might sell ads on the image search results. OpenAI sells their service, which is the result of the copies, i.e., a derivative work. It's also probably true that the AI weights themselves are derivatives of the works they are based on.
Yes, I believe that is correct. If you do something "transformative" with the material, then you are allowed to treat it as something new. There's also the idea of using a portion of a copyrighted work (like a quote or a clip of a song or video); this would be "fair use".
I sue people for copyright infringement frequently. It's rare that I have a defendant whose defense is "the internet is full of other infringers, why should I be held responsible?" Never have they won. This debate would go better if people didn't base it on assumptions they glean from the world around them, but with regard to the actual law, and not with specious reasoning like "well, they did it too!!"
Style is not copyrightable. Please make an actual effort to engage with copyright law instead of asking me smarmy questions because you think you're right, having made no effort past looking at what's immediately in front of you.
The artist in the article clearly states that his work was free to use only if it was not used to make a profit, those were the terms of their license. In the artist's opinion, OpenAI violated that license by training their tool on their work and then selling that tool.
This artist doesn't complain about work similar to their own being generated, and their artwork is very clearly not clothing.
>> In the artist's opinion, OpenAI violated that license...
So? Why does the author's opinion even enter into the equation? Authors cannot claim ownership beyond the bounds of copyright. If what AI is doing qualifies as fair use, the artist cannot do anything about it. I'm sure that lots of artists would not want anyone to lampoon or criticize their work. They cannot stop such things. I'm sure lots of artists would never want anyone to ever create anything in any way similar to their work. They cannot do that either.
It is not clear that training an LLM falls under "fair use". We are then left with the license of the work; in this case, that license forbids re-selling the work for a profit. It is the artist's license for their work that is at issue, not their opinion.
> "...clearly states that his work was free to use only if it was not used to make a profit"
Replace "use" with "copy". No one may copy the work to make a profit. Fair Use has long been an exemption to copyright, with Learning an example of Fair Use. But no one expected AIs to learn so quickly. I don't think it is clear either way, and will end up in SCOTUS.
>> Fair Use has long been an exemption to copyright,
The proper construction is that copyright is an exemption from the freedom of speech. Fair use is a partial description of freedom of speech, a description to narrow the limits of copyright rather than to broaden the already limitless bounds of freedom of speech.
The default for expression is that it is allowed except if copyrighted, as opposed to copyrighted except when covered by fair use.
I disagree that a person learning is the same as an AI model being trained. That aside, fair use typically covers the use of an excerpt or a portion of the material, not reproduction of the work in its entirety.
Agreed: in the end, courts will make the decision.
Well, not exactly. Certain uses are fair. The question is whether OpenAI's use counts as fair, and I don't think your immediate response comes close to addressing that question, despite your conviction that it does.
Also, elements of clothing designs are copyrightable: prints, patterns, and other separable artistic features are protected. The conviction expressed by some participants in this debate is exhausting in light of their unfamiliarity with actual copyright law.
Copyright is just made up for pragmatic purposes: to incentivize creation. It does not matter that training models is not the same as reproducing something exactly; if we decide that it's unfair, or even just desirable for economic incentives, to disallow it, then we are free to make that decision. The trade-offs are fairly profound in both directions, I think, and likely some compromise will need to be made that is fair to all parties and does not cripple economic and social progress.
>But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it.
It's important to consider in any legalistic argument over copyright that, unlike conventional property rights, which are to some degree prehistoric, copyright is a recent legal construct that was developed for a particular economic purpose.
The existing standards of fair use are what they are because copyright was developed with supporting the art industry as an intentional goal, not because it was handed down from the heavens or follows a basic human instinct. Ancient playwrights clipped each other's ideas liberally; late medieval economists observed that restricting this behavior seemed to encourage more creativity. Copyright law is a creation of humans, for humans, and is subordinate to moral and economic reasoning, not prior to it.
Copyright only makes any sense for goods with a high fixed cost of production and low to zero marginal cost. Any use beyond solving that problem is pure rent-seeking behavior.
Also, with computers being functional, copyright has become a tool of social control; any function in a physical object can be taken away from you at a whim, with no recourse, so long as a computer can be inserted into the object. Absent a major change in how society sees copyright, I envision a very bleak totalitarian future arising from this trend.
No, copyright only makes sense insofar as it provides a net positive value for society: that it promotes/protects more creativity leading to economic output than it prevents.
That is, does the amount of creative/economic output dissuaded by allowing AI (from people who would not be able to, or would not want to, create art if they couldn't get paid) exceed the creative/economic output of letting people develop and use such AIs?
GenAI reduces the fixed cost of creating images/text/whatever, which, all else equal, will increase the amount created. Whether or not you think that is a good thing is probably mostly a function of whether you make money creating these things or pay money to have them created.
> Copyright only makes any sense for goods with a high fixed cost of production and low to zero marginal cost. Any further use beyond solving that problem is pure rent seeking behavior
100% agree. But even then it's not very good. Abolish copyright, severely limit patents, and leave trademarks as they are. The IP paradigm needs an overhaul.
The future of good AI art is Adobe Firefly: a tool in a picture editor that gives users great productivity for certain tasks. Artists won't go extinct; they will be able to produce a lot more art.
That's the future of AI art - but is AI art the future of art? If AI artists can't maintain any profit from their work, how are they going to afford the compute time?
> Copyright only makes any sense for goods with a high fixed cost of production and low to zero marginal cost.
If that's the case, then novels, news articles, digital images, etc. are things that copyright absolutely makes sense for. If you think that they have a "low cost of production", you are sadly misinformed about the artistic process.
Some of these have vanishingly low marginal costs when it comes to reproduction, but in light of their high fixed cost of production, I don't see how that matters.
Clothes are inherently consumable goods. If you use them, they will wear out. If you do not use them, they still age over time. You cannot "copy" a piece of clothing without a truly astonishing amount of effort. Both the processes and the materials may be difficult or impossible to imitate without a very large investment of effort.
Compare this to digital art: you can copy it literally for free. Before AI, at least, you had to copy it mostly verbatim (modulo some relatively boring transforms, like up/down-scaling, etc.). That limited artists' incomes, but not their future works. But in a post-AI world, you can suck in an artist's life's work and generate an unlimited number of copycats. Right now, the quality of those might be insufficient to be true replacements, but it's not hard to imagine a world, not so far off, where it will be sufficient, and then artists will be truly screwed.
GP compared copying a piece of clothing to copying digital art. I'd say that setting up a factory to make knockoffs - or even "just" buying a sewing machine, finding and buying the right fabric, laying out the piece you want to copy, tracing it, cutting the fabric, sewing it, and iterating until it comes out right - would qualify as "a truly astonishing amount of effort" for a person.
Yes. Now you - a generic person without any specific skills - just have to search for someone who is willing to sew knockoff clothing to your specification, ship them the item, and wait for them to get back to you - a month or longer, probably, as they are likely based in China and shipping things by boat - while hoping that your cursory search led you to an honest knockoff manufacturer, who won't just take your money and disappear.
I'd say that it still qualifies as a "truly astonishing amount of effort" when compared to the right-clicking and pressing save method of copying digital art.
> We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it.
In theory: sure
In practice: not really, especially when you're small and the other side is big and has lots of lawyers and/or lawmakers in their pockets.
Disney ("In 1989, for instance, the company even threatened to sue three Florida daycare centers unless they removed murals featuring some of its characters") and Deutsche Telekom[1][2] ("the company's actions just smack of corporate bully tactics, where legions of lawyers attempt to hog natural resources — in this case a primary color — that rightfully belong to everyone") are just two examples that spring to mind.
People and companies are copying copyrighted content when they use datasets that contain copyrighted content (datasets which also repackage and distribute copyrighted content - not just as links but as actual works/images too), download the linked copyrighted content, and store that copyrighted content. Plenty of copies created and stored, it seems to me.
And what, do you think they're trying their damnedest to keep datasets clean and to not store any images in the process? How do you think they retrain on datasets over and over? It's really simple: by storing terabytes of copyrighted content. For ease of use, of course - why download something over and over if you can just download it once and keep it. And if they really wanted to steer clear of copyright infringement, if there's truly "no good solution" (which is bullshit for compute - oh, they can compute everything but not that part), why can't they just refrain from recklessly scraping everything, if something were to just 'slip in'? Like, if you know it's kinda bad, just don't do the thing, right? Well, maybe copyright infringement is just acceptable to them. If not the actual goal.
What they generate is kinda irrelevant - there's plenty of copyright infringement happening even before any training is done. Assembling datasets, and bad datasets containing copyrighted content, is the start and the core of the copyright problems.
There's a really banal thing at the core of this, and it's just multi-TB storage filled with pirated works.
If training a model is fair use, then model output should also satisfy fair use criteria. The very first thing you can find on the internet about fair use is the Wikipedia article on the topic. It lists a number of factors for deciding whether something is fair use. The very first one has a quote from an old copyright case:
> [A] reviewer may fairly cite largely from the original work, if his design be really and truly to use the passages for the purposes of fair and reasonable criticism. On the other hand, it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.
Most uses of LLMs and image generation models do not produce criticism of their training data. The most common use is to produce similar works. There's a very common "trick" to get a specific style of output: add "in style of <artist>". Is this a direct way "to supersede the use of the original work"?
You can certainly see how the other factors more or less put gen AI output into the grey zone.
The fact that clothing doesn't qualify for copyright doesn't mean text and images don't. Or if you advocate that they don't, then you pretty much advocate for the abolition of copyright, because those are the major areas of copyright applicability at the moment. That is a stance one can take, but you'd probably do better to actually say so, because claiming that copyright applies to some images and text but not others is a much harder position to defend.
>I have some, but diminishing sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that also looks similar to your stuff too.
Just like the rest of AI, if your argument is "humans can already do this by hand, why is it a problem to let machines do it?", it's because you are incorrectly valuing the labor that goes into doing it by hand. If doing X has potentially negative side effect Y, then the human labor required to accomplish X is the principal barrier to Y, which can be mitigated via existing structures. Remove the labor barrier, and the existing mitigation structures cease to be effective. The fact that we never deliberately established those barriers is irrelevant to the fact that our society expects them to be there.
I feel the emotionally charged nature of the topic prevents a lot of rational discussion from taking place. That's totally understandable, too; it's the livelihood of some of those involved. Unless we start making specific regulations for generative AI, current copyright law is pretty clear: you can't call your art a Picasso, but you can certainly say it was inspired by Picasso. The difference is that GAI can do it much faster and cheaper. The best middle ground, in my opinion, is to allow GAI to train on copyrighted data, but the output cannot be copyrighted, and the model weights creating it can't be copyrighted either. Any works modified by a human attempting to gain copyright protection should have to be substantive and transformative, just as fair use requires now.
I think there is a case to be made when AI models do produce copies. For instance, I think the NYT has a right to take issue with the near-verbatim recall of NYT articles. It's not clear-cut, though: when these models produce copies, they are not functioning as intended. Legally that might produce a quagmire. Is it fair use when you intend to be transformative but by accident aren't? Does it matter if you have no control over which bits are not transformative? Does it matter if you know in advance that some bits will be non-transformative but you don't know which ones?
I presume there are people working on research into how to prevent models from outputting raw training data; what is the state of the art in this area? Would it be sufficient to prevent output of the training data, or should the models be required to have no significant internal copies of training examples?
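One naive baseline that gets discussed is an output-side filter: compare each generation against the training corpus and reject anything with a long verbatim overlap. The sketch below is hypothetical Python; the 13-token window and the precomputed `training_ngrams` set are assumptions for illustration, not anyone's actual deployed system.

    # Hypothetical check: flag output that reproduces any 13-token window
    # of the training data verbatim. `training_ngrams` is assumed to be a
    # set of token 13-grams built once over the whole corpus.

    def ngrams(tokens, n=13):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def looks_memorized(output_tokens, training_ngrams, n=13):
        # True if any n-token window of the output appears in the corpus.
        return not ngrams(output_tokens, n).isdisjoint(training_ngrams)

Even setting aside the storage cost of the n-gram set, this only catches verbatim copies; near-verbatim recall with small edits slips through, which is one reason filtering outputs is generally considered weaker than deduplicating the training data itself.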
> This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before.
Most every fashion company has a legal team that reviews print and pattern, as well as certain other aspects of design, relative to any source of inspiration. My husband works in the industry and has to send everything he does for review in this way. I’m not sure where you got the idea that there are no IP protections for fashion, but this is untrue.
AI doing things that humans laboriously learned and drew inspiration from is just different. After all, sheer quantity can be its own quality, especially with AI learning.
Now, I am worried about companies like OpenAI monopolizing the technology by making it proprietary. I think their output should be public domain, and copyright should apply only to human authors, if it should apply at all.