mfryd said: I am not a lawyer, however this is my understanding of what someone can do without violating copyright law.
Yes, a person can do that without violating copyright. But AI doesn't work like the human mind. AI requires the complete works of Stephen King to be copied into a database. If it's done without permission, then it's a violation of copyright.
One can certainly make a reasonable case that an internal copy is a violation. One can also make a case that an internal copy is fair use.
AI may not store text in a simple format. A computer can parse the text, break it down into verbs, nouns, concepts, etc. The computer might be storing a sophisticated parse tree and analysis of the original text, not the original text itself. Perhaps this conversion is transformative enough that it isn't a violation? Does it make a difference if the computer can undo the transformation and recover the original text?
The courts are currently working on clarifying how to apply existing copyright law to these new and unforeseen usages.
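To make the reversibility question concrete, here is a toy sketch (nothing like how a real LLM actually stores text, just an illustration of the distinction) contrasting a transformation that can be undone with one that cannot:

```python
from collections import Counter

text = "the cat sat on the mat"

# Reversible transformation: map each word to an integer ID.
# The original text can be recovered exactly, so the "copy"
# arguably still exists inside the encoded form.
vocab = {word: i for i, word in enumerate(dict.fromkeys(text.split()))}
inverse = {i: word for word, i in vocab.items()}
encoded = [vocab[word] for word in text.split()]
decoded = " ".join(inverse[i] for i in encoded)
assert decoded == text  # round-trips perfectly

# Lossy transformation: keep only word frequencies.
# Word order is gone, so the original text cannot be rebuilt.
bag = Counter(text.split())
print(bag["the"])  # 2, but no way to reconstruct the sentence
```

Whether a real model sits closer to the reversible or the lossy end of this spectrum is exactly the open question the poster raises.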
Again, "fair use" for a professional product is always going to be limited to excerpts. And I think you're aware that if the AI program only used public domain writings then the output is only going to be similar to public domain writings. It isn't suddenly going to start writing like Stephen King.
mfryd said: I am not a lawyer, however this is my understanding of what someone can do without violating copyright law.
Yes, a person can do that without violating copyright. But AI doesn't work like the human mind. AI requires the complete works of Stephen King to be copied into a database. If it's done without permission, then it's a violation of copyright.
Not according to this week's ruling.
"In short, the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use."
Meh. Seems emotional and sentimental. If you are placing your content on the web, you are practically posting it on the street for general view, with absurd hopes of pennies trickling in on some desperate fancy rather than through proper business channels with an effective strategy of legally protecting and promoting yourself - childish. Most people do such art so that they may avoid other types of structured paid work - what do they expect when they treat their skill set as a hobby, likely not wanting to work for others on a structured gig - if that's even around much? What's even the issue here - not getting a piece of the trifling leavings of scrapers and edu-content peddlers? Pedantic. Art needs to stop being a vague creation-vocation of random people and grow up. Successful society is based on complex businesses and legal structures requiring serious people acting seriously. Creativity is a real skill and needs focused training and a hierarchy of knowledgeable people to propagate it through society. Sorry, but I have little sympathy for the dilettantes and dabblers hoping to otherwise avoid the soulless cubicle, construction site, and assembly line.
Or you could seek therapy for dealing with whatever it is that the person who hurt you did.
mfryd said: I am not a lawyer, however this is my understanding of what someone can do without violating copyright law.
Yes, a person can do that without violating copyright. But AI doesn't work like the human mind. AI requires the complete works of Stephen King to be copied into a database. If it's done without permission, then it's a violation of copyright.
Not according to this week's ruling.
"In short, the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use."
Yes, but the judge is ignoring the fact that the AI program is a professional product and a human being is not. Plus, if the AI is really all that "transformative", why wouldn't AI companies simply restrict the training data to public domain material? Answer: they know that it isn't really as "transformative" as the judge thinks it is. For example, if the AI were only trained on the text from Bazooka Joe comics, take a wild guess at what the output would be like. Not all that "transformative".
mfryd said: I am not a lawyer, however this is my understanding of what someone can do without violating copyright law.
Yes, a person can do that without violating copyright. But AI doesn't work like the human mind. AI requires the complete works of Stephen King to be copied into a database. If it's done without permission, then it's a violation of copyright.
One can certainly make a reasonable case that an internal copy is a violation. One can also make a case that an internal copy is fair use.
AI may not store text in a simple format. A computer can parse the text, break it down into verbs, nouns, concepts, etc. The computer might be storing a sophisticated parse tree and analysis of the original text, not the original text itself. Perhaps this conversion is transformative enough that it isn't a violation? Does it make a difference if the computer can undo the transformation and recover the original text?
The courts are currently working on clarifying how to apply existing copyright law to these new and unforeseen usages.
Again, "fair use" for a professional product is always going to be limited to excerpts. And I think you're aware that if the AI program only used public domain writings then the output is only going to be similar to public domain writings. It isn't suddenly going to start writing like Stephen King.
A critical part of this week's ruling is that Anthropic put controls in place so that their service would not output copyrighted materials, and the plaintiffs didn't claim that it did. As the judge says, nothing about this ruling addresses the question of whether the service is violating copyright with its output. The only question answered was whether Anthropic's input of copyrighted material was fair use, and the court decided it was.
When each LLM was put into a public-facing version of Claude [Anthropic's AI service], it was complemented by other software that filtered user inputs to the LLM and filtered outputs from the LLM back to the user (id. ¶¶ 75–77). As a result, Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service. Yes, Claude could help less capable writers create works as well-written as Authors’ and competing in the same categories. But Claude created no exact copy, nor any substantial knock-off. Nothing traceable to Authors’ works. Such allegations are simply not part of plaintiffs’ amended complaint, nor in our record.
...
Authors further argue that the training was intended to memorize their works’ creative elements — not just their works’ non-protectable ones (Opp. 17). But this is the same argument. Again, Anthropic’s LLMs have not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not.
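The ruling doesn't describe how that filtering software works. One simple approach such an output filter could take (this is an illustrative guess, not Anthropic's actual method) is to block responses that repeat long verbatim word sequences from protected texts. A minimal sketch of that idea, with made-up texts and an arbitrary threshold:

```python
def ngrams(words, n):
    """All length-n word sequences in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shares_long_verbatim_span(candidate, protected, n=5):
    """True if the candidate output repeats any n-word sequence
    verbatim from the protected text (n=5 chosen arbitrarily)."""
    return bool(ngrams(candidate.split(), n) & ngrams(protected.split(), n))

protected = "it was the best of times it was the worst of times"

# Repeats "it was the best of" verbatim -> would be blocked.
print(shares_long_verbatim_span(
    "he said it was the best of times indeed", protected))  # True

# Shares vocabulary and theme but no 5-word run -> would pass.
print(shares_long_verbatim_span(
    "these are good times and bad times", protected))  # False
```

A filter like this catches exact copying but not style imitation, which matches the ruling's point that no "substantial knock-off" reached users even though stylistic influence remained.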
mfryd said: I am not a lawyer, however this is my understanding of what someone can do without violating copyright law.
Yes, a person can do that without violating copyright. But AI doesn't work like the human mind. AI requires the complete works of Stephen King to be copied into a database. If it's done without permission, then it's a violation of copyright.
Not according to this week's ruling.
"In short, the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use."
Yes, but the judge is ignoring the fact that the AI program is a professional product and a human being is not. Plus, if the AI is really all that "transformative", why wouldn't AI companies simply restrict the training data to public domain material? Answer: they know that it isn't really as "transformative" as the judge thinks it is. For example, if the AI were only trained on the text from Bazooka Joe comics, take a wild guess at what the output would be like. Not all that "transformative".
The transformative part is specifically in the way the computer stores the data and how it uses it. Tokenizing all the words, etc., etc., to build the LLM is the transformation.
As I quoted above, this case says nothing about whether the output of an LLM can violate fair use (but strongly suggests that it can). It just asks if ingesting legally obtained content is fair use, and strongly comes down on "yes."
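As a toy illustration of "tokenizing the words to build the model" (vastly simpler than a real LLM, but it shows why the trained artifact is derived statistics rather than a stored copy), here is a bigram counter that "trains" on a sentence and keeps only which-word-follows-which counts:

```python
from collections import defaultdict

# Toy "training": count which token follows which. The resulting
# table is derived from the text, but it is not the text itself.
corpus = "to be or not to be that is the question"
tokens = corpus.split()

model = defaultdict(lambda: defaultdict(int))
for current, nxt in zip(tokens, tokens[1:]):
    model[current][nxt] += 1

# After training, the model holds statistics, not the sentence:
print(dict(model["to"]))  # {'be': 2}

def most_likely_next(word):
    """Generate by picking the most frequent continuation."""
    followers = model.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)

print(most_likely_next("to"))  # 'be'
```

Even this trivial "model" can regenerate fragments of its training text when the statistics are narrow enough, which is why the output side remains a separate legal question from the training side.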
Meh. Seems emotional and sentimental. If you are placing your content on the web, you are practically posting it on the street for general view, with absurd hopes of pennies trickling in on some desperate fancy rather than through proper business channels with an effective strategy of legally protecting and promoting yourself - childish. Most people do such art so that they may avoid other types of structured paid work - what do they expect when they treat their skill set as a hobby, likely not wanting to work for others on a structured gig - if that's even around much? What's even the issue here - not getting a piece of the trifling leavings of scrapers and edu-content peddlers? Pedantic. Art needs to stop being a vague creation-vocation of random people and grow up. Successful society is based on complex businesses and legal structures requiring serious people acting seriously. Creativity is a real skill and needs focused training and a hierarchy of knowledgeable people to propagate it through society. Sorry, but I have little sympathy for the dilettantes and dabblers hoping to otherwise avoid the soulless cubicle, construction site, and assembly line.
It's not that simple. The AI companies are scraping material that isn't on the web. They are scanning and scraping printed books. They are scraping copyrighted movies.
They are scraping the copyrighted works of artists who earn their living licensing their work.
I agree with @designguybrown's comments; the only thing is these fools (if you can call them that) are out here trying to see who can be the first trillionaire. And here is the 100% problem: you have computers, robots, factories, machinery, and other technology trying just to see who can be the next trillionaire or even zillionaire, and you can say, well, what's the problem with that? It's that they are using tech from 2025 back to 1900 to destroy a CURRENCY system designed back in, what, freaking 600 BC? I sure hope Jeff Bezos, armed to the teeth with his tech, our tech, their tech, and everybody's stuff, can beat the game Monopoly circa 1900s. Because gees, I mean, they practically already BEAT the financial game without AI, just with simple Excel spreadsheets... And you doubt machines and robots exist that do MASSIVE AMOUNTS of work, where people literally already DON'T WORK because of it? Please...!
Like I said, I agree with your comments. But here is my hope: I hope that when every last human being has not a single penny, and 3-5 guys have all the money, we turn around and say, "All that money that you have... it's worthless, we won't accept it." You have to earn some other kind of monetary unit...
And I know, I know, you'll say bitcoin, but then AI cracks that too? It's never-ending stupidity we're in here, where all that these douches are trying to do is be "first" to the trillionaire mark...
One of the opening paragraphs from that ruling hits on one of the critical points in our discussion here.
Speaking of which, in a recent ruling on this topic, Judge Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on. Such harm would be no different, he reasoned, than the harm caused by using the works for “training schoolchildren to write well,” which could “result in an explosion of competing works.” According to Judge Alsup, this “is not the kind of competitive or creative displacement that concerns the Copyright Act.” But when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.
These two rulings take very different positions on a key test in determining whether something is fair use, namely the effect of the use upon the potential market.
Alsup says that, while there could be copyright violations based on how the LLMs are used, training the LLMs itself isn't impacting any potential market.
Chhabria, the judge in this Meta case, says:
And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.
Both judges accept the four-part fair use framework, but they differ in how they interpret the effect on the market. Is it the actual effect of the company training the model, or the potential for such a model to have an impact?
Here's what the judge said in this second case:
As for the potentially winning argument—that Meta has copied their works to create a product that will likely flood the market with similar works, causing market dilution—the plaintiffs barely give this issue lip service, and they present no evidence about how the current or expected outputs from Meta’s models would dilute the market for their own works.
and
In this case, because Meta’s use of the works of these thirteen authors is highly transformative, the plaintiffs needed to win decisively on the fourth factor [the market impact issue] to win on fair use. And to stave off summary judgment, they needed to create a genuine issue of material fact as to that factor. Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury. Or perhaps the plaintiffs could even have made a strong enough showing to win on the fair use issue at summary judgment. But the plaintiffs presented no meaningful evidence on market dilution at all. Absent such evidence and in light of Meta’s evidence, the fourth factor can only favor Meta.
In other words, "I am very open to declaring that training an AI with copyrighted materials (even though "transformative") is a violation of fair use, but you have to actually make that case, you morons." (this is my interpretation, not a quote from the ruling).
But to be clear, neither ruling says that training a model with copyrighted material is fair use in all circumstances, and both point out that the critical question is harm to the copyright holder.
Sources:
Anthropic ruling (quoted above, at page 13): https://storage.courtlistener.com/recap/gov.uscourts.cand.434709/gov.uscourts.cand.434709.231.0_2.pdf
Wired coverage of the Meta ruling: https://www.wired.com/story/meta-scores-victory-ai-copyright-case/
Text of the Meta case: https://www.courtlistener.com/docket/67569326/598/kadrey-v-meta-platforms-inc/