Must-reads: Timothy Lee, Grimmelmann, Sag on AI lawsuits; Samuelson on remedies, destruction of AI models
For those of you following the 16 copyright lawsuits against AI companies, including OpenAI, Microsoft, Google, and Meta, plus several AI text-to-image generators, three very insightful articles were recently published by some of the leading copyright experts in the field.
Timothy B. Lee and James Grimmelmann wrote an article in Ars Technica, provocatively titled "Why The New York Times might win its copyright lawsuit against OpenAI: The AI community needs to take copyright lawsuits seriously." Main points:
The AI community is underestimating the likelihood that the AI companies will lose the fair use argument in the lawsuits against them.
The AI community fails to appreciate that commercial use in machine learning may cut against fair use, even though the same use in a research or academic setting might favor fair use.
Copyright liability can be existential for a business: it led to the demise of other startups that failed to win on fair use, such as MP3.com (and Napster!).
Generative AI might be viewed differently from past technologies that courts found to involve fair uses (e.g., Google Books) because AI produces entire new works that might compete with plaintiffs’ works (I might add, even if they are not substantially similar, but see also Sega v. Accolade, which suggests the contrary).
AI generators have a “Mario,” aka “Italian plumber,” problem: regurgitating copyrighted content.
Judges may nevertheless be cautious, with “judges reluctant to shut down an innovative and useful service with tens of millions of users.”
I think the most important point they make on fair use is this, what I call holistic fair use analysis:
“Those who advocate a finding of fair use like to split the analysis into two steps, which you can see in OpenAI’s blog post about The New York Times lawsuit. OpenAI first categorically argues that “training AI models using publicly available Internet materials is fair use.” Then in a separate section, OpenAI argues that “‘regurgitation’ is a rare bug that we are working to drive to zero.”
“But the courts tend to analyze a question like this holistically; the legality of the initial copying depends on details of how the copied data is ultimately used.”
Matt Sag responded to their article.
Agrees on the need to take these copyright lawsuits seriously
Suggests that AI researchers may have been oblivious to the copyright issues when they were using datasets consisting of massive amounts of copyrighted materials without ever mentioning copyright.
But also points out that the Supreme Court and lower courts have recognized that commercial use does not always cut against fair use.
Fair use should apply when the use is nonexpressive use, a term Sag coined in prior writings: “In a nutshell, that argument is that a technical process that creates some effectively invisible copies along the way but ultimately produces only uncopyrightable facts, abstractions, associations, and styles should be fair use because it does not interfere with the author’s right to communicate her original expression to the public.”
The “Italian plumber” problem is really the “Snoopy problem” Sag identified first: “The Snoopy Problem is that the more abstractly a copyrighted work is protected, the more likely it is that a generative AI model will ‘copy’ it.”
The Snoopy problem isn’t an existential threat.
U.S. AI companies will move to more friendly countries if the courts rule against them here!
Pamela Samuelson did a comprehensive review of all the remedies sought in the 16 lawsuits against AI companies, including, in some, the complete destruction of the AI model (assuming infringement is found).
Statutory damages for Section 1202 violations (removal of copyright management information, or CMI): $2,500 – $25,000 per violation.
In the GitHub case, the potential amount alleged in the complaint was, at a minimum, $9 billion (the court, however, dismissed these claims without prejudice).
Statutory copyright damages range from $750 to $30,000 per work, escalating up to $150,000 per work for willful infringement.
Ordering the destruction of AI models, either based on the court’s power to order impoundment and destruction of infringing materials and “all plates, molds, matrices, masters, tapes, film negatives, or other articles by means of which such copies or phonorecords may be reproduced” (§ 503), or based on the court’s general power to order injunctions (§ 502).
Wow, this is quite interesting. “The New York Times’s complaint against OpenAI and Microsoft goes farthest in seeking model destruction as a remedy. It asks the court to order the destruction “of all GPT or other LLM models and training sets that incorporate Times Works,” even though OpenAI and Microsoft are the only defendants in that lawsuit. The threat of model destruction is, however, very real for these defendants.”
My quick take
1. Don’t forget the jury! The Seventh Amendment exists for a reason.
The Lee-Grimmelmann and the Sag articles make a number of astute observations about the questions of liability and fair use. However, the entire focus of their articles is on how the various judges presiding over the 16 lawsuits against AI companies may rule. None of the articles ever discusses the juries.
Unless the courts rule on summary judgment (or JMOL), most, if not all, of the cases will be decided by juries. Indeed, Judge Bibas in Thomson Reuters v. ROSS Intelligence declined to grant summary judgment on the fair use issue because it depended on factual issues for the jury to decide at trial. The trial is scheduled for August 26, 2024. It will be the first of all the AI lawsuits to go to trial.
Let’s also not forget the last big tech fair use decision was Google v. Oracle, which was decided by a jury. We all know the jury is a wildcard. It doesn’t necessarily favor either side. (Some potential members of a jury might even have a bias against AI, a phenomenon my co-researcher and I found in a recent study. This bias does not bode well for the AI companies if it’s present among jurors.)
2. None of the copyright lawsuits poses an existential threat to Microsoft, Google, or Meta. They might to startups, but not to OpenAI.
The tragic fate of MP3.com might be relevant to thinking about the potential risks to the AI startup companies (except for OpenAI, given its relationship with Microsoft). But it’s completely irrelevant to thinking about how the lawsuits might impact the Big Tech companies. Their existence isn’t in peril. And, if it were, the U.S. Supreme Court would make sure it wasn’t. Most judges aren’t oblivious to how their decisions might impact the U.S. economy, especially negatively, if not catastrophically.
3. Fair Use might be analyzed holistically, but it may also be analyzed use by use, and case by case.
Lee and Grimmelmann are spot on in arguing that courts might not divide the fair use analysis into two, for training of AI versus outputs of AI. Instead, courts might examine the question of fair use holistically, as Judge Bibas did in denying summary judgment.
At the same time, the Supreme Court’s last fair use ruling in Andy Warhol Foundation v. Goldsmith does indicate that the fair use analysis should examine each use of the plaintiff’s work individually, or in a use-by-use manner. The Court stated point-blank: “The fair use provision, and the first factor in particular, requires an analysis of the specific ‘use’ of a copyrighted work that is alleged to be “an infringement.” §107. The same copying may be fair when used for one purpose but not another.”
Indeed, the Goldsmith Court divided up the multiple uses by the defendant and focused separately on only one use, the licensing use for a magazine cover:
“Here, Goldsmith’s copyrighted photograph has been used in multiple ways: After Goldsmith licensed the photograph to Vanity Fair to serve as an artist reference, Warhol used the photograph to create the Vanity Fair illustration and the other Prince Series works. Vanity Fair then used the photograph, pursuant to the license, when it published Warhol’s illustration in 1984. Finally, AWF used the photograph when it licensed an image of Warhol’s Orange Prince to Condé Nast in 2016. Only that last use, however, AWF’s commercial licensing of Orange Prince to Condé Nast, is alleged to be infringing. We limit our analysis accordingly.”
This analysis by the Supreme Court does lend support to dividing up the uses into two: (1) training of AI as one use, and (2) any generation of output by AI as another use.
I’m sure this issue will be a major bone of contention. But, paradoxically, Andy Warhol Foundation v. Goldsmith may help the fair use defense of the AI companies.
4. The results on fair use in the 16 lawsuits may differ based on the AI platform and what outputs the Plaintiffs dig up.
In addition, it’s important to bear in mind that the 16 different lawsuits involve different AI platforms and different types of AI generators (text v. image) from different defendants, which may have been trained on different datasets, may produce quite different outputs, and may be better or worse at avoiding regurgitation. Plus, it’s likely that the strength of the plaintiffs’ claims in the various lawsuits differs based on what substantially similar outputs they found generated by the respective AI platform. Add in different juries, and there’s a decent chance the results of the 16 lawsuits will differ, even discounting for the consolidation of several cases.