Xuan-Thao Nguyen* & Elizabeth Porter**
The humanities have long been under attack; now Big Tech is eating them for breakfast. Artificial Intelligence is undermining the cultural and constitutional values of human creativity; it is also threatening the livelihoods of the creative working class. Because neither Congress nor regulators can keep up with AI’s pace of change, class actions have stepped into the breach. In the first wave of such litigation, authors and artists in courts on both coasts claim that AI companies violate intellectual property law when they “train” their systems on copyrighted works.
This Article analyzes what we call “input” AI claims: that is, proposed class action copyright claims against AI companies for inputting creative works into their models. This action is variously described as copying, scraping, feeding, or training. We argue that input copyright class action claims have both merit and power. Substantively, the creators’ claims of direct copyright infringement are supported by almost three decades of caselaw applying the concept of “copying” to code and other innovative technologies. Procedurally, input claims based on AI “training” fall into the heartland of the letter and spirit of Rule 23, which governs federal class actions. Defendants’ claims of fair use, while seductive, would swallow copyright law and crush human-centered creativity.
Class action settlements, which combine private ordering with judicial oversight, may offer the best opportunity to establish a licensing regime that protects creators from future infringement. In lieu of fair use, we urge a different doctrine: “fair licensing.” Licensing poses its own risks to creativity; it risks creating moats around knowledge and paying creators symbolic amounts for giving up control of their works. But AI class actions can and should play an influential role in spurring the creation of fair AI licensing regimes. Courts and litigants are wise to act expeditiously in doing so: there is a constitutional urgency to protecting the essence of human creativity.
Introduction
Generative artificial intelligence models are fast becoming a normal feature of the digital landscape.1 So quickly, in fact, that it has proven impossible for lawmakers and regulators to fully conceive of, much less respond to, the host of issues raised by this new but artificially natural-feeling technology.2 Indeed, AI’s creators are struggling with the impact of their own work.3 The rapidity with which generative AI is spreading, combined with lawmakers’ deep-rooted commitment (embodied in the Communications Decency Act) to minimizing regulation of the Internet, means that litigation, rather than regulation, will play the lead role in establishing legal limits on 21st-century AI systems in the U.S.4
Moreover, the sheer magnitude of people and entities affected by AI foundation models, and the depersonalized, algorithmic mechanisms that propel those models, mean that many legal claims against AI companies will only provide effective relief if they are brought as class actions.5 Without the benefits of aggregation, there is no viable way for the creative working class (from journalists, novelists, playwrights, poets, nonfiction writers, screenwriters, composers, lyricists, musicians, videogame developers, and cartoonists, to illustrators, painters, and artists) who allege unlawful activity by AI corporations to pursue compensation and/or injunctive relief against further harm. Even with class action lawsuits, a favorable outcome is uncertain.
We anticipate waves of class litigation by parties impacted by generative AI as they seek to shelter their businesses, livelihoods, and identities from the rising influence of artificial intelligence in everyday life.6 This Article addresses the first wave, intellectual property disputes against the AI companies themselves, which is already here.
Copyright, the protection for “the creative powers of the mind,”7 is enshrined in the U.S. Constitution.8 To replicate natural-seeming language, AI models are dependent on complex, rich narratives: narratives that can be found in novels, short stories, academic articles, poetry, songs, and other copyright-protected creative output. So AI companies copied that output without slowing down to seek permission or offer compensation. The companies decided that it was better to ask forgiveness than permission.9 At a time when the humanities are already under attack, technology companies are eating them for breakfast.10
Over the past year, authors, artists, coders, publishers, and organizations representing them have filed suits arguing that AI foundation model companies, from OpenAI to Anthropic, backed by Big Tech money, are violating their copyrights by “training” their large language models on tens of thousands of copyrighted creative works without even the veneer of an attempt to obtain consent or provide compensation.11 The claims in these lawsuits can be conceptually divided into two categories. The first category, and the focus of this Article, is what we call input claims: that is, claims alleging that AI companies’ initial copying of text, art, or code is a form of direct copyright infringement. The second category consists of output claims alleging that the synthetic contents generated by AI models in response to users’ prompts violate copyright holders’ exclusive right to prepare derivative works under copyright law.12
Despite the enormous influence of a handful of class suits on the development of copyright law, there remains very little scholarship on the intersection of class actions and copyright.13 In this Article, we seek to advance the study of copyright class actions in the field of artificial intelligence. To that end, we argue that class action plaintiffs’ input claims are correct on the merits. Beginning in the 1990s with the Ninth Circuit’s decisions in the so-called MAI Trio, courts have consistently and adeptly used existing copyright law to preserve authorial rights against incursions by new duplicating technologies. Under that precedent, AI “training” constitutes unlawful copying. This same set of precedents also undermines AI companies’ fair use defense.
Procedurally, this Article argues that the Rule 23 class action device is an effective and appropriate tool with which to impose legal guardrails for direct infringement on AI companies. Examining earlier copyright class actions, we show that the AI plaintiffs’ direct copyright infringement claims (which we call “input” claims because they arise out of material that was input into AI engines) fall squarely within the letter of Rule 23. AI class actions also embody Rule 23’s spirit. Class actions are appropriate when a defendant’s conduct impacts a group of people in functionally the same way. This maps precisely onto the facts of what AI companies did. Moreover, the class action device can achieve a remedy for large numbers of people who otherwise would be without the resources or knowledge to vindicate their rights. That is also true here: Only as a group do creators have the clout to negotiate with companies valued in the billions of dollars.14
Importantly, input claims are only one facet of creators’ challenges to AI. Plaintiffs in AI cases have also alleged various downstream, post-“training” infringement claims, which we refer to as “output” claims.15 For example, plaintiffs have argued that AI engine outputs may themselves be infringing.16 Our focus here is on input claims, but two observations are in order. First, while class certification may be appropriate for adjudicating some output claims, requirements for individualized proof and/or individualized assessment of defenses will render class certification more challenging for such claims. Second, if (as we think likely) the parties reach a settlement in the pending cases, any such agreement is likely to encompass both input and output claims, which arise from the same basic factual predicate. Any release of future output claims will thus hover in the air during resolution of the input claims. In assessing the fairness of a settlement, the key question will be whether the agreement preserves authorial voice and independence into the future in a sustainable way.
With regard to both input and output claims, the class action device serves a vital forward-focused role. Xiyin Tang has argued that copyright class actions spur the creation of licensing schemes and generate blueprints for legislative gap-filling.17 That is already happening in the AI context. Ironically, at the very same time that AI companies are defending their “fair learning” justification for stealing from the creative working class, they are already entering into licensing deals with news publishers as well as user platforms such as Reddit.18 The difference in the class action context is that a court will need to approve any settlement; an important element of that judicial review will be ensuring that working class artists, and not only best-selling authors, receive fair compensation for any licensing agreement that results.
We conclude by centering the risk to human creativity that is at the heart of the pending class actions. Artificial intelligence inevitably has the buzz, and the buzzwords, of cutting-edge post-modern life.19 But generative AI raises questions at least as old and as important as the Constitution. And the Constitution’s far-sighted protection of intellectual property rights should not be forgotten in the heady rush of innovation. In fundamental ways, AI copyright class actions are not solely about the right of this or that author, or the economic impact of copyright infringement on individuals; rather, these claims are about a threat to the concept of authorship as a human endeavor, and to the vision of America’s founders that copyright will protect and incentivize human creativity.
Since the Founding, copyright law has withstood the challenges of successive new technologies. But in a world where 90% of online content will be synthetically generated by 2026, human creativity is endangered.20 Thus, we conclude by arguing that there is a constitutional (and not merely a statutory or economic) urgency to protecting the human role, and human agency, in the process of creation. Class actions are a feasible, viable way to respect this constitutional urgency. We recognize that licensing agreements are the most likely, and perhaps the only, way to resolve the impasse between creators and AI companies.
But licensing regimes pose their own risks to the livelihood and dignity of artists. Fair licensing must do more than give authors a smidgeon of cash in exchange for their creative souls; it must address the constitutional urgency, and the unknowable future, brought on by generative AI. To qualify as fair, we argue that a licensing regime must bear three hallmarks: 1) it must allow creators to opt out; 2) it must not allow licensees’ unilateral expansion of the scope of the license; and 3) it must provide a mechanism for attribution to human authors.
Part I of this Article illustrates how AI transforms, and taints, human creativity.
Part II briefly describes the intellectual property class actions currently pending against AI companies and their central copyright claims.
Part III traces how, beginning in the 1990s, courts have successfully applied existing copyright law to successive waves of technological disruption. It shows that, under these precedents, AI companies are liable for feeding copyright-protected creative works into their engines.
Part IV analyzes courts’ application of Rule 23 in pre-AI copyright class actions. Current AI class claims reflect the same fundamental pressure points as did earlier suits.
Part V argues that, just as existing copyright law can meet this moment, so too can existing class action doctrine. Using two current AI cases as examples, one from the Second Circuit and one from the Ninth, we argue that AI plaintiffs’ input claims fall in the heartland of what Rule 23 is designed to accomplish. We also recognize that class certification is likely to prove more difficult (though not impossible) for “output” claims: claims for derivative infringement and other downstream claims that arise out of the original unlawful copying.
Part VI contests the AI companies’ fair use defense, which comes down to one philosophy: “trust us.”
Part VII concludes by urging courts and Congress to reject fair learning in favor of fair licensing. It proposes three terms that can ensure that a licensing regime stays faithful to the constitutional essence of copyright law.
I. AI and the Risk of Post-Human Creativity
In a rock band, four or five members each play one or more roles: lead guitar, rhythm guitar, bass guitar, drums, keyboard, and lead vocals.21 A symphony requires many musicians with talent in their respective instruments to collaborate under a conductor’s baton to perfect a beautiful movement.22 Jazz musicians are known for their partnerships, merging their different styles and elevating their art or seizing inspiration from listening to other jazz musicians.23 The “intimate exchange of the creative pair, such as John Lennon and Paul McCartney” illuminates the value of human connectivity for the creative process and its achievements.24 Recording an album requires not just the band but also the producer, the studio, the engineer, the mixing engineer, and the mastering engineer.25 Connecting humans for collaboration is essential for creativity.
Michelangelo is known for the breathtaking painting of the Sistine Chapel ceiling. But he could not accomplish the feat alone. He “worked with a team of assistants” to achieve the results.26 Picasso developed Cubism together with Georges Braque, and his rivalry with Henri Matisse amounted to an adversarial collaboration.27 Albert Einstein accumulated inventive knowledge from reading and analyzing other people’s inventions while working as a patent examiner at the Swiss Patent Office,28 and tested his ideas for the theory of relativity in conversation with the engineer Michele Besso, “the best sounding board in Europe.”29 Creativity is demonstrably the product of human connectivity and collaboration.
With generative AI, creativity may become post-human. A new rock song, a newly improvised jazz piece, a new symphony, among other myriad new music products, can be synthetically generated.30 A known musician’s style can be imitated through AI generation.31 A famed vocalist’s unique sound can be generated because GPT-4o “can respond to audio inputs in as little as 232 milliseconds.”32 Connectivity and collaboration are not invited. All the models need is prompts.
Do you want to write in the style of Tolstoy’s novels, perhaps War and Peace or Anna Karenina? AI models are standing by to assist in generating deepfake Tolstoy-like new text through a series of prompts: “1. Directly ask the AI to write like the writer you have in mind[,] 2. Describe what the writer’s voice is like, and ask the AI to write like that (preferably with examples)[, and] 3. Fine-tune the AI on a specific writer[.]”33 Do you wish to draft a sitcom dialogue mimicking Jerry and Kramer in a new Seinfeld episode?34 In minutes, AI models trained on novels and scripts written by human authors present you with synthetic literary text in the original authors’ style.
Writing pitches for sequels to popular movies in an instant? AI models’ performance in the movie sequels task garnered praise such as “surprising,” “good,” “good enough that you could see someone buying a ticket to the film,” and “brilliant.”35 For instance, in response to a prompt naming Casablanca, AI wrote: “One possible plot for a sequel to Casablanca could explore what happened to Rick Blaine and Ilsa Lund after they parted ways in Casablanca. The sequel could pick up several years after the events of the original film, with Rick and Ilsa living separate lives in different parts of the world. Perhaps circumstances conspire to bring them back together, and they must grapple with the complicated feelings they have for each other and the choices they made in the past.”36 Having ingested all the scripts and movies, AI models produce new pitches deriving from the originals.
AI can generate a painting that helps you win a jury prize.37 Prompts, again, are what you input at the computer.38 Learning how to paint from teachers, imitating styles from accomplished artists, studying their techniques, and listening to critiques from others: none of these steps is necessary. AI models trained on existing images from all sources are ready at your prompt command to generate whatever you desire and whenever you wish.39 All you need is to type in the prompts, and generative AI programs will do the rest in seconds.40
Learning to speak or sing beautifully in a foreign language typically takes years of learning from teachers and coaches. In many cases, students travel abroad to immerse themselves in life in foreign countries, for the opportunity to connect with the people and culture while improving their language abilities. In the AI age, the human connectivity and creativity involved in learning foreign languages can simply be bypassed. For instance, AI can generate an English-speaking rock band vocalist (who is real) rocking away fluently and with nuanced flair in Japanese and Korean (synthetically, of course).41
Albums of new songs by different artists can be synthetically generated without ever stepping into a studio.42 Google’s Dream Track AI tool entices performers to generate new content using their replicas.43 The days of collaboration in the studio are numbered. For now, those famous few get paid by Google’s DeepMind AI for having their voices cloned as part of new synthetic content creation without collaborating with others in the studio. For many others with ordinary voices, there is nothing to collect.44
Generating a haiku in seconds? Ditto. AI models ingest haiku and are trained to generate haiku, poems known for having 17 syllables, and apply the world’s shortest and clearest rules to compose new poems and evaluate their aesthetic. AI can generate haiku quicker than humans, and humans fail to distinguish which haiku are synthetic and which are human-created.45 John Grisham took years to research and develop characters and stories for his books. However, AI can generate sequel chapters mimicking John Grisham’s style in an instant.
AI poses an existential threat to the creative working class, and also to the very concept of human creativity.46 If we do not act soon, we will no longer be able to distinguish, or perhaps even remember, what human creativity is.
II. Copying and Fair Use in Disruptive Technologies
The Constitution’s Copyright Clause was drafted during the age of lithography, more than 50 years before the rotary press spurred modern print media.47 Yet its language, and its mission, have remained relevant.48 Copyright law has outlasted cycles of disruptive technological advancements. When personal computers first began to enter businesses and households, critics lamented that “as people and businesses often find the transition to computers difficult, integrating computer software into the realm of copyright law has proven to be a daunting task.”49 Ultimately, however, both courts and Congress rose to the challenge, and copyright law has remained a relevant, practical tool for protecting authorial rights from technology-enabled infringement. Beneath each technological innovation, the heart of the issue has remained: the protection of human creative output. And in each instance, the Constitution’s Copyright Clause has served as a compass, guiding courts to encourage technological innovation while simultaneously protecting human creativity.
This Part recounts how courts have thoughtfully applied copyright doctrine (both affirmative claims and the defense of fair use) to waves of disruptive digital technology. The precedents described below provide a firm basis for holding AI companies liable for their unauthorized “training” of AI models on copyrighted creative works, and for denying a “fair learning” defense. Simultaneously, however, this Part recounts the Supreme Court’s somewhat expansive use of the fair use doctrine to protect some code-copying in its 2021 decision in Google LLC v. Oracle America, Inc.50
A. The MAI Trio: Code Copying
The Copyright Act explicitly contemplates the evolution of technological methods of copying creative output. It defines “copies” as “material objects…in which a work is fixed by any method now known or later developed.”51 In the early 1990s, the Ninth Circuit penned three opinions later referred to as the MAI Trio, defining the meaning of “copying” under the Copyright Act during the rise of software and personal computers. These decisions, which granted property rights and copyright protections to the output of the software age, apply directly in the age of artificial intelligence.52
1. Code Is a “Copy”
The first of the Trio, MAI Systems Corp. v. Peak Computer, established that even temporary copies of software code are protected by copyright law.53 In so doing, MAI Systems defined the concept of “copying” for the digital age.
The case concerned a dispute between MAI, a computer manufacturer and software developer, and Peak, a computer maintenance company.54 When Peak provided maintenance and repair services to those of its customers using MAI computers, it necessarily ran MAI software.55 MAI argued that this unlicensed use of its software constituted “copying” for purposes of the Copyright Act; it sought and obtained a permanent injunction against Peak.56 The Court framed the question as whether “copying” occurs when a computer program is transferred from a permanent storage device to a computer’s random-access memory (RAM).57 It answered that question in the affirmative.
To begin, the Court noted that it was “generally accepted that the loading of software into a computer constitutes the creation of a copy.” It applied this same logic to RAM copies.58 A computer executing any program must load that program into RAM.59 Peak argued that RAM copies were not “fixed” under copyright law. But the Ninth Circuit held that a RAM copy is “sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration.”60 Therefore, it held that the “loading of copyrighted computer software from a storage medium (hard disk, floppy disk, or read-only memory) into the memory of a central processing unit (“CPU”) causes a copy to be made.”61
The immediate impact of the case was that MAI could prevent Peak from using its software, thereby preventing Peak from offering its services as a competitor. The lasting impact was even broader. MAI was innovative, clear, and, given the lack of Supreme Court precedent in copyright, “far-reaching.”62 It was also problematic. Scholars have criticized MAI on textual and policy grounds.63 As Danielle D’Onfro recently observed, the effect of MAI was to entrench restrictive intellectual property licenses into anything related to software, which is fast becoming everything.64 Indeed, the Ninth Circuit now appears to offer a more nuanced take on software copies.65
Nevertheless, what is good for the goose should be good for the gander. MAI’s broad conception of a fixed copy, which has tended to benefit corporate software developers, provides authors and other copyright owners with a strong shield against non-consensual copying by AI companies.
2. Limiting Fair Use for Code Copying
The second case in the MAI Trio, Triad Systems Corp. v. Southeastern Exp. Co., which reached the Ninth Circuit in 1995, narrowly construed fair use in the copying of software.66 Triad manufactured computers for use by automotive parts stores.67 Triad also developed software for its service technicians to use.68 Southeastern competed against Triad for the business of servicing and maintaining Triad computers.69 Upon discovering that Southeastern had been making RAM copies of Triad software when providing services to customers, Triad sued.70 Following MAI, Triad moved for summary judgment on its copyright infringement claim; Southeastern countered with a fair use defense.71 The district court denied both motions, the case proceeded to trial, and the jury found that Southeastern had violated Triad’s copyrights.72 Following the verdict, the district court granted Triad’s motion for a preliminary injunction, pending the resolution of other issues in the case.73 Southeastern appealed.
Applying MAI, the Ninth Circuit expressed “no doubt” that Southeastern had made infringing copies of Triad’s software.74 The Court also noted that, for purposes of injunctive relief, there is a presumption of irreparable harm from copyright infringement, notwithstanding that a copyright plaintiff can theoretically be made whole through money damages.75
Most significantly, the Ninth Circuit rejected Southeastern’s fair use defense.76 Southeastern, the Court held, “has invented nothing of its own; its use of Triad’s software is…neither creative nor transformative, and does not provide the marketplace with new creative works.”77 Southeastern was “simply commandeering its customers’ software and using it for the very purpose for which, and in precisely the manner in which, it was designed to be used.”78 The Court also found that Southeastern’s conduct yielded no appreciable public benefit.79 Accordingly, the Ninth Circuit rejected the fair use defense in RAM copy infringement cases.
In 1998, after MAI and Triad, Congress used the Digital Millennium Copyright Act to amend Section 117 of the Copyright Act, prohibiting software developers from using the RAM copy doctrine to eliminate competition from computer service companies.80 But Congress did not eliminate the RAM copy doctrine itself; it merely provided an exception to it.81 Congress also did not overturn MAI’s holding that license agreements define the limits of licensees’ rights with regard to software.82
3. The Age of Licenses
A decade later, the Ninth Circuit decided the final case in the MAI Trio, Wall Data Inc. v. LA County Sheriff’s Dept.83 Wall Data developed RUMBA software and the Los Angeles County Sheriff’s Department purchased licenses for 3,663 computers.84 But the Sheriff’s Department installed the software on 6,007 computers.85 Wall Data brought a copyright infringement suit against the Sheriff’s Department, which raised “fair use” and “essential step” defenses.86
The Court rejected fair use. It ruled that making verbatim copies of software, for the same purpose, was not transformative, nor was it an industry norm.87 It also held that, were the practice of duplicating software in excess of one’s license to become widespread, it “could seriously impact” the market.88
The Court also rejected an “essential step” defense under 17 U.S.C. § 117.89 The “essential step” defense covers the making of another copy of a computer program as an essential step in the utilization of that program in conjunction with a machine, provided the copy is used in no other manner. The Ninth Circuit held that the “essential step” defense only applies to an “owner” of a program, not to a licensee such as the LA Sheriff’s Department.90 The Ninth Circuit cited MAI in support of its ruling that the Sheriff’s Department, as a licensee, was bound by the license terms and did not have the statutory right to make extra copies of the software.91
The MAI Trio applied long-standing copyright law to burgeoning digital technologies. The Trio established that digital copying of copyrighted works in the computer software context is unlawful infringement, barring permission or a statutory exemption. It also empowered and protected software creators by allowing them to define the terms of use for their product through the (now ubiquitous) use of licenses. Notably, courts, not Congress, took the lead in analyzing the application of the Copyright Act to new technologies.92 Although Congress eventually weighed in, amending § 117, it did so only after the issues of copying and fair use of software had been explicated through an adversary process between those parties most directly impacted.
B. Cached Contents and Fair Use: Field v. Google
The explosion of networked computers in the early 2000s forced courts to update copyright doctrine once again. Companies such as Google, Yahoo, AltaVista, and Lycos developed technologies to capture the emerging search sector of the digital economy. To do so, they sought to copy, analyze, and cache every existing web page. Google’s automatic program for finding and cataloging web pages was called the “Googlebot.”93 Search companies did not seek permission from the copyright owners of any of these web pages. Litigation soon followed, pitting copyright owners against the search companies.
In this battle, the search companies held their own.
When a user conducts a search at Google, the search engine displays its search results, comprising the title of a webpage, short snippets from the webpage, the URL for the page, and another link labeled “cached.”94 Clicking on the “cached” link “directs an Internet user to the archival copy of a Web page stored in Google’s system cache, rather than to the original Web site for that page.”95 A disclaimer explains that the page is a snapshot from the Google cache, not the original page.96 The snapshot is therefore not current if the webpage has been updated by the owner.97
The “cached” links served several purposes, according to Google, including: (1) providing archival copies that allowed users to access the pages in the event the website became inaccessible, (2) providing webpage comparisons, enabling users to determine how a particular webpage changed over time, and (3) identifying search query terms, thus allowing users to determine why a particular page was more responsive to their query.98 Also, according to Google, the vastness of the internet prevented it from contacting website owners to ascertain whether they wanted their pages listed in search results or accessible through “cached” links.99
In Field, the plaintiff, an attorney and amateur author, brought a copyright infringement action against Google for copying his website, on which he had posted 50 creative works, and storing these copies in a cache without authorization.100 The plaintiff argued that Google directly infringed his copyrights when a user clicked on a “cached” link to the web pages containing the plaintiff’s copyrighted works and downloaded a copy of those pages from Google’s cache.101 That meant, according to the plaintiff, that Google itself had created and distributed copies of the plaintiff’s works.102
Field did not prevail. The district court held that the user, not Google, creates and downloads a copy of the cached web page when the user requests a web page contained in the Google cache.103 Without a user’s request, no copy is created or distributed; even with such a request, the “automated, non-volitional conduct by Google in response to a user’s request does not constitute direct infringement” under copyright law.104 The district court also sustained Google’s implied license defense. At the time of the lawsuit, Field had an easy way to protect his digital material from being cached: he could have used a “no-archive” meta-tag, which would have informed Google not to display “cached” links to his page. Field did not do this.105 Instead, the plaintiff “made a conscious decision to permit” Google to make available the cached links.106
Most significantly, the district court found that Google’s copying and distributing of the plaintiff’s copyrighted works by allowing users access to them through cached links constituted fair use.107 Under the first fair use factor, the purpose and character of Google’s use, the court observed that Google’s system cache serves a different purpose from that of the plaintiff’s original works. The plaintiff intended his copyrighted works to serve an artistic function.108 But Google’s cache enabled users to access content when the original page was inaccessible, detect changes in a particular website, and understand why a page was or was not responsive to their query.109 “Because Google serves different and socially important purposes…and does not merely supersede the objectives of the original creations,” the district court concluded that Google’s alleged copying and distribution of the plaintiff’s web pages was transformative.110
Assessing the three remaining fair use factors, the district court found that Google operated its cache system in good faith; that it provided a disclaimer and explanation; and that it followed industry protocols by not displaying “cached” links if the owner of a page opted out.111
Field established a blueprint for other copyright infringement actions relating to cached contents.112 For instance, the district court in Parker v. Yahoo! adopted much of Field’s reasoning and held that Yahoo!, a search engine with similar caching practices, had an implied license to display a cached copy of the plaintiff’s copyrighted works.113 AI companies will likely invoke Field to argue that their “training” is simply caching by another name. As we discuss further below, however, AI large language models are a far cry from Google’s caches.114
C. Licensing and Fair Use: American Geophysical Union v. Texaco, Inc.115
As courts expanded the power of licensors in copyright, they also incorporated licensing schemes into their concept of fair use. This trend is best embodied in American Geophysical Union v. Texaco, Inc., where the Second Circuit expressly considered the availability of a licensing mechanism in rejecting a fair use defense.116 In Texaco, academic publishers sued oil giant Texaco, alleging that the company routinely photocopied academic research publications for its scientists without permission from or compensation to the publishers.117 The court’s rejection of fair use in Texaco provides an important check on Field’s fair use approach.
In Texaco, the court questioned whether fair use could even apply to mechanical copying of an entire document.118 Even under the traditional four-factor fair use test, however, the court rejected the defense. It found that Texaco’s practice of making copies for individual scientists was “part of a systematic process of encouraging employee researchers to copy articles so as to multiply available copies while avoiding payment.”119 While the second factor (the nature of the copyrighted work) favored Texaco, the third and fourth factors tipped in favor of the plaintiffs. The court was particularly persuaded by the fact that Texaco copied each copyrighted work in its entirety. Texaco had attempted to reframe the question by noting that only a small fraction of any particular volume of the academic journal was copied, but the court refused to take a macro view, given that “each article enjoys independent copyright protection.”120
Most importantly, the court found in favor of the plaintiffs on the fourth factor, purportedly the “most important” fair use element, because of the availability of the Copyright Clearance Center, a private licensing organization that could negotiate a royalty in exchange for additional copies.121 The court noted that the existence of “traditional, reasonable, or likely to be developed” licensing markets is an important factor in assessing the effect of an infringing activity on the market; if paying for a use is relatively easy, free use is, simply, “less fair.”122
Some scholars have criticized Texaco for allowing the existence of a private licensing scheme to warp the traditional fair use analysis.123 As discussed below, however, the doctrine of fair use must respond to current copyright culture. And within that culture, licensing organizations and agreements have become a routine and important way (perhaps the only feasible way) to mediate between copyright holders and digital innovators.
D. Digital Copying of Creative Works
Beginning in the early 2000s, technology enabled companies to duplicate creative works with increasing efficiency. This opened the door to mass-scale digital duplication, and individual as well as class suits followed, seeking redress for copyright infringement by digital disruptors.
1. Digital Copying: New York Times Co. v. Tasini124
In New York Times Co. v. Tasini, the Supreme Court confronted the first wave of large-scale digital replication of copyrighted works. The plaintiffs in Tasini had published their works in print periodicals. The periodicals’ publishers had, without consent, submitted the authors’ pieces to an electronic database, making them retrievable by users as individual works, detached from the volume in which they first appeared. The defendant publishers argued that this was a “revision” of a work authorized by copyright law. The Court rejected that argument.125 Detaching these works from their context and dropping them into a massive database, the Court held, “effectively overrides the Authors’ exclusive right to control the individual reproduction and distribution” of their work.126
The publishers offered several other defenses that recur in digital copying cases, none of which prevailed (and many of which were raised anew by subsequent digital disruptors). First, they claimed that the decontextualization of the copyrighted works could be solved technologically by enabling users to “manipulate the Databases” to retrieve all the works contained within the original volume.127 The Court was unimpressed. Second, defendants argued that, at most, they could be contributory infringers; end users, rather than the Databases, would be committing the infringing acts. Again, the Court disagreed: “It is the copies themselves, without any manipulation by users,” that violate the Copyright Act.128 Finally, the defendants argued that disallowing inclusion of these works in electronic databases would, as the Court phrased it, “punch gaping holes in the electronic record of history.”129 The Court remained unmoved.130 This, according to the Court, is what licensing agreements are made for.131
2. Fair Use: Google Book Project132
Although Tasini was a Supreme Court decision, it is overshadowed in cultural memory by the Second Circuit’s adjudication of Google’s book project. Google began with an ambitious (and ultimately unrealized) plan to create a comprehensive database of books: a quasi-mythical universal library.133 In 2004, announcing the “Google Books Library Project,” Google secured agreements with the New York Public Library, the Library of Congress, and several university libraries to digitally copy their collections, and it scanned more than twenty million volumes in its quest to create a massive digital book repository.134 Left out of this vision was a process for seeking permission from, or paying compensation to, copyright holders included in the scanned book collection.
The Authors Guild, Inc., an organization of published authors, as well as several individual authors, challenged the project in a class action. Google asserted that its unauthorized activities (copying, storing, distributing, and displaying the copyrighted books) were protected by the doctrine of fair use.135 The parties participated in extensive negotiations and entered into a proposed settlement.136 That settlement would have granted Google broad (though not exclusive) rights over the books in its electronic database, including the right to sell subscriptions to an e-book database and the right to sell online access to individual books, in exchange for paying 63% of the revenues to copyright holders.137 Objectors contested the settlement on the ground that it did not comply with Rule 23; they also alleged that it violated copyright, privacy, and antitrust law, among others.138 The district court rejected the proposed settlement.139
Google reduced the scope of its project so that users would only be able to see “snippets” of the copied texts in the public domain; what was originally a digital bookstore became a vast public index instead. The Second Circuit found that Google’s now-much-narrowed search and snippet functions satisfy fair use.140 It held that the first factor, the purpose and character of the use, favored Google.141 Google’s use of copyrighted works was “highly transformative” because its digitizing technology enables “a search for identification of books containing a term of interest to the searcher” and permits “a searcher to identify those that contain a word or term of interest”.142 With the snippet views, searchers can only “read snippets from the book searched” but gain “important value to the basic transformative search function, which tells only whether and how often the searched term appears in the book.”143 Google’s tiny snippets show “the searcher just enough context surrounding the searched term to help her evaluate whether the book falls within the scope of her interest (without revealing so much as to threaten the author’s copyright interests).”144 In other words, Google’s transformative use of the copyrighted works did not directly compete as a substitute for the plaintiffs’ books.145
On the second factor, the nature of copyrighted works, the Second Circuit noted that “each of three” books scanned by Google is “factual” and concluded that nonfiction or fiction books are “not dispositive” in finding fair use.146 The Circuit observed that “[n]othing in this case influences us one way or the other with respect to the second factor considered in isolation.”147 Considering both the first factor and second factor of fair use doctrine together, the Circuit stated, “the second factor favors fair use not because Plaintiffs’ works are factual, but because the secondary use transformatively provides valuable information about the original, rather than replicating protected expression in a manner that provides a meaningful substitute for the original.”148 The Circuit, however, did not provide extensive explanation. This seems to suggest that authors of published nonfiction books may face a conclusory finding of fair use under the second factor, or that the second factor is no longer relevant in fair use analysis.
For the third factor, the amount and substantiality of the portion used, the Second Circuit found that “[w]hile Google makes an unauthorized digital copy of the entire book, it does not reveal that digital copy to the public. The copy is made to enable the search functions to reveal limited, important information about the books.”149 Accordingly, with respect to the search function, Google satisfied the third factor. As to the snippet view function, Google constructed the snippet feature in a “manner that substantially protects against its serving as an effectively competing substitute for Plaintiffs’ books”.150 In fact, Google included many limitations on the snippet function:
These include the small size of the snippets (normally one eighth of a page), the blacklisting of one snippet per page and of one page in every ten, the fact that no more than three snippets are shown, and no more than one per page, for each term searched, and the fact that the same snippets are shown for a searched term no matter how many times, or from how many different computers, the term is searched. In addition, Google does not provide snippet view for types of books, such as dictionaries and cookbooks, for which viewing a small segment is likely to satisfy the searcher’s need. The result of these restrictions is,…, that a searcher cannot succeed, even after long extended effort to multiply what can be revealed, in revealing through a snippet search what could usefully serve as a competing substitute for the original.151
….
The other restrictions built into the program work together to ensure that, even after protracted effort over a substantial period of time, only small and randomly scattered portions of a book will be accessible.152
With such limitations and restrictions imposed by Google in creating its search and snippet functions, Google satisfied the third factor for both the search and snippet functions.
Under the fourth factor, the effect of the use upon potential market or value, the Second Circuit ruled for Google because Google did not sell scanned books in competition with the plaintiffs. Specifically, the snippet views in the aggregate amount to “no more than 16% of a book.”153 The Circuit also found that a loss of sale of a book occurs “in relation to interests that are not protected by the copyright” because “a searcher’s need for access to a copyrighted book will at times be because the snippet conveys a historical fact that the searcher needs to ascertain.”154 In summary, Google’s making of a complete digital copy of the copyrighted works for the narrow and unchanging purpose of providing the public with its search and snippet view functions did not constitute copyright infringement.
Google, the defendant in both Field and Google Book Project, prevailed in its fair use defense in both suits because both cached content and the snippet index involved limited, unchanging, and transformative use of existing copyrights. In neither case did the search company’s output interfere with copyright holders’ potential market of selling their books or negatively impact the value of copyrighted works. In the context of AI input class actions, by contrast, there are no firm guardrails on the use of the copied copyrighted works. There is also a strong profit motive to resist the formation of such guardrails. AI companies rely on existing literary text, images, and sounds to create AI models. Without human creative works to train AI models, there can be no AI models.155 That creativity is the essence of what copyright law protects.
3. Fair Use of Code: Google LLC v. Oracle America, Inc.
In setting up its open-source Android platform, Google engineers wrote millions of lines of new code. But they also copied 11,500 lines of code from Java SE, a program ultimately owned by Oracle.156 By using this code, which included tasks that were essential to using Java coding language at all, Google enabled “millions of programmers, familiar with Java, to be able easily to work with its new Android platform.”157 Oracle sued Google, arguing among other things that Google’s use of the code violated copyright. The question before the Supreme Court was whether this copied code is subject to copyright protections (the trial court had found otherwise158), and, if yes, whether Google’s actions were shielded as fair use.
Reasonable minds could and did disagree about these issues. One jury deadlocked on the question of fair use;159 following an appeal, a subsequent jury found Google’s use of the code did constitute fair use.160 On appeal again, the Federal Circuit reversed, finding “nothing fair about taking a copyrighted work verbatim and using it for the same purpose and function as the original in a competing platform.”161
The Supreme Court, in an archetypal opinion by Justice Stephen Breyer, reversed the reversal.162 It assumed, without deciding, that the code at issue was protected by copyright.163 But it concluded that Google’s use of the 11,500 lines of code was protected by fair use.164 The purpose of the judge-made, equitable doctrine of fair use, the Court noted, is to “permit courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.”165 In the Court’s view, Google’s use of the Java code merited that protection.
To a large extent, the Court’s fair use holding reflects underlying skepticism toward the concept of granting copyright protection to computer code. Although it assumed for purposes of argument that code is copyrightable, it strongly suggested that, to the extent it is protected at all, code should enjoy much thinner protection than fiction, and even than less-protected non-fiction.166 In the Court’s view, the code at issue was a type of functional organizer: the equivalent of cabinets, drawers, and files.167 On this basis, when analyzing the nature of the copyrighted work, the factor it addressed first, the Court ruled that the code was, “if copyrightable at all, further than are most computer programs…from the core of copyright.”168
The Court also indicated strong support for Google’s argument that its use of the Java code lines was transformative, thus tipping the purpose-and-character factor in Google’s favor. In the Court’s view, Google’s use of the code furthered the development of computer systems in the new and important realm of smartphone platforms.169 The third factor, the amount and substantiality of the portion used, also favored Google: the Court found that the copied code was only 0.4 percent of the total set of Java’s code.170 Finally, the Court found that Google’s use of the code lines posed little threat to Oracle’s market, because Oracle was “poorly positioned to succeed in the mobile market.”171 In other words, Google was not developing a market substitute for Oracle’s products.172 The Court also saw Oracle’s attempt to shield this code as a potentially monopolistic move that would thwart innovation.173 It ruled in favor of Google on the ground of fair use.
***
The cases involving RAM copies, cached contents, digital libraries of scanned books, and lines of computer code illustrate the ways that copyright law has adapted in response to the computer revolution in the last five decades. These precedents show that AI companies, in “feeding” copyrighted works to their systems, were engaging in what copyright law defines as unlawful infringement. Yet that is not the end of the matter. In the era of networked computers, courts, squeamish about squelching innovation, appear to have become more open to findings of fair use. Where Big Tech enjoyed fair use protection, no asking for forgiveness was necessary. But the scope of fair use doctrine remains ambiguous, and fair use decisions such as Field and Oracle apply imperfectly, if at all, to the wholesale copying of thousands of copyrighted works of fiction and non-fiction.
This uncertainty lays the foundation for the current wave of class actions by creators against AI companies. The technology is new, but the questions are familiar: what constitutes infringement? What infringement is protected by fair use? With one notable recent exception, courts in the many pending cases have yet to answer.174
III. Generative AI and the First Wave of Litigation
The unexpected arrival of generative AI in 2022, as the world resumed normalcy after the COVID-19 pandemic, sent a shockwave across the globe. Generative AI dominated headlines in late 2022 and early 2023. This new technology was lauded, but also criticized for trampling on copyright and privacy laws. This Section briefly describes the three primary types of cases in the current, first wave of litigation arising from generative AI.175
A. Copyright Infringement Actions
Generative AI systems rely on data to train models for the creation and generation of new works that mirror the training data. The quality of that data matters.176 The phrase “garbage in, garbage out,” first uttered by IBM programmer and instructor George Fuechsel,177 applies squarely to generative AI.178 Indeed, AI developers actively seek to avoid poor data in training their models. This has led AI companies to use copyrighted content to train models, without taking the time to seek permission from, or offer compensation to, copyright holders. This type of infringement falls in the “input” claims category.
In general, copyright refers to a bundle of rights, namely, the right to make copies, the right to prepare derivative works, the right to distribute the works, the right to publicly perform the works, and the right to publicly display the works.179 Before the arrival of generative AI, most copyright infringement cases centered on unauthorized copying and distribution. In American Geophysical Union v. Texaco, Inc., for example, the defendant’s employees used a photocopy machine to make unauthorized copies of journal articles.180
The age of networked computers spurred the creation of new technologies for ease of copying and distribution, as seen in A&M Records, Inc. v. Napster, where the defendant was accused of providing a central server that facilitated music file uploading and sharing among users without permission from copyright holders.181 As a result, fewer copyright disputes rested on violation of the right to prepare derivative works, and if they did, the focus was on situations where a subsequent work was allegedly derived from the original.182 Moreover, the development of well-recognized IP structures, such as the licensing model and movie right options, helped guard against infringing derivative works. For instance, J.K. Rowling sold both movie and merchandise rights based on her Harry Potter series.183
In the AI era, copyright disputes have taken a darker turn. Pending copyright class actions zero in on the violation of both the right to make copies and the right to prepare derivative works. In Getty Images v. Stability AI, for example, Getty Images alleges that Stability AI copied more than 12 million photographs without permission or compensation, with the purpose of “build[ing] a competing business.”184 In the course of so doing, Getty Images alleges, Stability AI also either removed or altered Getty Images’ copyright management information and infringed on its trademarks.185 Now stuffed to the gills with rich and complex images created by human artists, Stability AI dazzles the public with its generative AI models, with which users can generate stunning AI images in seconds. In sum, unlike the copyright infringement actions of prior decades, AI copyright suits seek to protect creators’ rights to authorial creative integrity against a pending torrent of copycat creativity.
Two unsettled doctrinal questions dominate the pending copyright suits. The first is whether the AI companies’ actions constitute “copying” for purposes of copyright law. We argue above that, in light of a series of cases applying copyright law to emerging technologies, they do.186
The second question is whether AI developers’ use of copyrighted works is protected by fair use. Courts, including the Supreme Court, have struggled to apply the traditional four fair use factors in this new context.187 In Oracle, the Supreme Court held that fair use protected Google’s use of 11,500 lines of Oracle’s Java code.188 But neither Oracle nor other cases address fair use when creative works are being copied and made available to the public wholesale, nor when the purpose of this copying is to enable the instantaneous creation of limitless derivative works that will directly compete against the works that were copied.
The pending AI copyright suits, which are seeking substantial money damages for past infringement,189 leverage the utilitarian and incentivization theories underlying copyright infringement claims.190 These theories, however, understate the core value of copyright to society.191 The heart of the copyright violations by AI companies is the complete disregard for creators’ originality, manifested as unique voices, styles, signatures, manners, and genres.192 Rich, diverse works of authorship enhance the progress of humankind. Encouraging that progress is an explicit purpose of the Intellectual Property Clause of the Constitution.
B. Claims Under the Digital Millennium Copyright Act
Open-source software developers are also in the first wave of generative AI class actions. In Doe v. GitHub, Inc.,193 anonymous members of a proposed class of developers allege that AI companies have taken billions of lines of open-source code, without attribution or permission, to train their generative models, thus violating traditional copyright law as well as the Digital Millennium Copyright Act of 1998 (DMCA).194 This type of claim also falls in the “input” claims category.
Computer software developers spend significant time and creativity drafting software code.195 Coders have adopted open-source licensing regimes to facilitate collaborative efficiency and creativity within the copyright law framework. For more than three decades, coders worldwide have followed the requirements of open-source license agreements, including the GPL and the MIT License.196 In these agreements, the copyright management information (CMI), such as the copyrighted work’s title, copyright registration number, the copyright owner’s name, the creator’s name, and terms and conditions for use of the work, is included as part of the licensed materials.197 Under these agreements, billions of lines of code have become publicly accessible. Developers post code in so-called Git repositories, hence the name GitHub. The open-source software movement fueled the rapid development of the World Wide Web, among countless other projects.
Big Tech companies, notably Microsoft, in teaming up with startup OpenAI, are releasing generative AI systems that are trained on billions of lines of code taken from these publicly accessible code repositories. But coders allege that the companies have exploited this code without complying with the open-source license terms.198 In fact, according to the class action complaint brought by coders, Big Tech companies removed or altered the copyright management information (CMI) from the licensed code, cleansing it of copyright identity and protection.199 The cleansed code is used as training data for generative AI models. The output of models trained on that data is new code that either copies the original code or derives from it. Though AI companies have not made their own code public, the output exposes the scope of the copying and the damage to coders and authors alike. Output that mirrors the training data is as recognizable to its creators as children are to their parents: coders can immediately identify the work because they were the ones who brought it into being. The coders seek to bring their grievances as a class.
C. Data Privacy Violations
The third type of AI class action challenges the unlawful taking of private information, including personally identifiable information, from hundreds of millions of internet users, including children of different age groups, across platforms and sites, without their informed consent or knowledge. The allegations in J.L. v. Google LLC and A.T. v. OpenAI exhibit this “move fast and break things” attitude.200 Indeed, Google, the defendant in one of the two data privacy class actions, has repeatedly engaged in the strategy of “Do First, Ask Forgiveness Later” that entangled the company in legal troubles with both regulators and rights holders.201 The data privacy claims, however, are not in copyright’s “input” claims category.
In J.L., eight plaintiffs filed a proposed class action against Google regarding its generative AI system Bard.202 The plaintiffs alleged that Bard scraped personal data and property for its training data. Google moved to dismiss the complaint based on a wide range of procedural and substantive defects.203 Google asserted that the use of the data is necessary to train Bard and the lawsuit would “take a sledgehammer not just to Google’s services but to the very idea of generative AI.”204 Google’s response reflects the current arms race among Big Tech companies to develop and train what will become the dominant generative AI models.
In another class action, A.T. v. OpenAI, the complaint alleges that OpenAI, together with Microsoft, has continually released new versions of ChatGPT, Dall-E, and Vall-E since the first version was released, jolting and entrancing the public with “human-like” generative AI capability.205 Subsequently, with ChatGPT 4.0, for instance, OpenAI charges a subscription fee of $20 per month per user. OpenAI is amassing billions of dollars from both its alliance with Microsoft and charging high user fees. According to the complaint filed in the class action, to develop ChatGPT, Dall-E, Vall-E, and its other generative AI products, OpenAI turns to the internet to steal private information, including personal data from millions of unsuspecting consumers worldwide. This stolen data, the complaint alleges, includes private information and private conversations, medical data, information about children, and other types of data OpenAI can scrape off the internet to feed its large language models and deep language algorithms.
In addition, the complaint asserts, OpenAI takes private and personal information from its 100 million registered users, without disclosing that to them. This information includes all input details, account information users enter for registration, names, contact details, login credentials, emails, payment information for paid users, transaction records, identifying data harvested from users’ devices and browsers, social media information, chat log data, usage data, analytics, cookies, keystrokes, typed searches, and other online activity data.206 Most significantly, OpenAI scrapes users’ digital footprints accumulated in real-time and as far back as fifteen years ago.
The allegations paint a stark picture: the massive trove of personal data gives OpenAI enough information to create and train generative AI products with the ability to “create our digital clones, including the ability to replicate our voice and likeness and predict and manipulate our next move,” and to “misappropriate our skill sets and encourage our own professional obsolescence.”207
***
AI models pose a threat to privacy and human creativity. In October 2023, President Biden signed an executive order directing agencies to develop “New Standards for AI Safety and Security.”208 But that directive is unlikely to provide compensation to those creators whose copyrights have already been disrespected, or to those people whose privacy has been violated. Especially given the pace and scale of AI companies’ efforts, litigation is the best option for copyright holders, including everyday people, to shelter their creative works and personal data from this new force. The first wave of AI class actions seeks to do just that.
IV. Class Certification in Pre-AI Copyright Suits
Class actions, governed in federal courts by Rule 23, are a well-established (if sometimes controversial) vehicle for groups of individuals alleging similar harm to seek redress for that harm in a single suit led by representative plaintiffs.209 The quintessential class action is for so-called “negative-value” claims, where, in the absence of a class, individuals would not have the incentive or resources to vindicate their legal rights in court.210 AI class actions brought by creators whose works have been used to “train” large language models fit this framework perfectly. Yet class actions are high-stakes, uncertain endeavors. The very things that make class actions so valuable (their flexibility, their potentially broad preclusive effect, and their ability to generate productive settlement discussions) also make them risky for everyone involved.
Although group litigation has ancient roots, the modern class action dates to 1966, when rulemakers promulgated what is now Rule 23.211 From the outset, many proponents of Rule 23 conceived of it as a quasi-regulatory device, resolving concrete disputes in a way that could provide closure to large numbers of people, many of whom would lack the resources to seek individual vindication of their rights.212 Today, civil rights, consumer, and tort litigants argue that class actions are “an important substitute for, or addition to, public administration.”213 In contrast to this quasi-regulatory conception, the defense bar has sought to depict Rule 23 as a mere joinder device, albeit one that, in their view, is being exploited by rapacious plaintiffs’ lawyers. David Marcus refers to this formalist conception as an “adjectival” view of Rule 23: one in which the procedure is subordinate to, and should not distort, the substantive law.214
Inevitably, these theoretical tensions have played out in copyright class actions.215 Even prior to the Roberts Court’s procedural revival216 (which has created heightened barriers to class certification), courts evinced skepticism toward copyright class actions. One court described a proposed copyright class unflatteringly as a “Frankenstein monster.”217 Perhaps as a result, until recently, there were relatively few such suits, with the Google Book Project and Spotify cases likely the only ones to receive mainstream media coverage.218
Yet over time many groups of creators did seek to bring class actions to vindicate their copyright claims. As described below, plaintiffs in earlier suits invoked different categories of Rule 23, sought different forms of relief, and met with varying degrees of success. Among those that failed, some founded on substantive grounds (such as a fair use defense) unrelated to class certification, while others were deemed not to have met the requirements of Rule 23. Many of these decisions were unpublished, which limited their precedential value.219
Nevertheless, these pre-AI copyright class actions provide insight into the applicability of Rule 23 in todayâs AI suits. In general, when courts concluded that the heart of a claim was a defendantâs allegedly infringing conduct, they tended to find Rule 23 satisfied. Conversely, courts focusing on the individualized nature of copyrights, or on individualized defenses such as notice or fair use, tended to find class resolution inappropriate.220 Predictably, then, plaintiffsâ theory of liability was rooted in the defendantsâ conduct: the class argued that the defendant acted according to a uniform policy or practice, often with a dollop (or more) of bad faith. Conversely, copyright class defendants emphasized the scattered nature of the plaintiff copyright holders; they told a story of uniqueness, messy layers of contracts, and the need for individualized determinations of copyright ownership and fair use.221 Details of Rule 23 aside, certification appears to depend on which of these conflicting narrativesâand class action theoriesâprevails.
This Section describes how courts have applied the certification requirements of Rule 23(a) and (b) in pre-AI copyright class actions. It also explains the specific way that courts have applied Rule 23(e), which governs approval of class action settlements, in the context of copyright. To approve a class action settlement, a court must find that the proposed class satisfies all of these provisions.
A. Copyright and Class Action Rule 23(a)
To obtain class certification, whether to litigate or to obtain judicial approval of a settlement, a party bears the burden of satisfying the Rule 23 criteria. First, the proposed class representative must demonstrate that the claims meet the four requirements of Rule 23(a), which are referred to in shorthand as numerosity, commonality, typicality, and adequacy of representation.222 Taken collectively, these four factors are intended to ensure that a proposed class is sufficiently large and yet sufficiently cohesive to litigate as a collective, through the proxies of class representatives.
Prior to the Roberts Courtâs heightened scrutiny of class actions, 23(a) was not considered particularly onerous. This shoot-from-the-hip view of Rule 23(a) applied in early copyright class actions as it did more generally. For example, in Bernstein v. Universal Pictures, Inc., the district court provisionally certified a class of lyricists and composers in a suit alleging that movie and television producers used adhesive contracts to constrain plaintiffs from entering the industry.223 The Second Circuit ultimately rejected the suit on its merits, but the class certification appears to have been decided in a cursory preliminary order. Another court certified a class of composers alleging infringement by cable company Showtime/The Movie Channel in five breezy paragraphs (at least, breezy by civil procedure standards).224
Later decisions, however, gave more attention to 23(a) factors:
23(a)(1): To meet the numerosity requirement, a party seeking class certification must show that the class âis so numerous that joinder of all members is impracticable.â225 There is no numerical cutoff for a class; this finding is within the discretion of the court. Class sizes smaller than 50 people are unusual, but not unheard of.226 Copyright class actions have not typically struggled to meet the numerosity threshold, although in one early case, a court denied certification in a one-page order because the plaintiffs had failed to demonstrate the existence of a sufficient number of freelance photographers affected by the defendantâs alleged infringement.227
23(a)(2): This factor, which requires a showing of “questions of law or fact common to the class,”228 was long considered a minimal burden, a view that is reflected in early copyright class actions.229 In Ryan v. CARL Corporation, for example, the plaintiff class argued that the defendant’s document retrieval service, which functioned “like a private interlibrary loan service,” had provided over 700,000 copies of plaintiffs’ copyrighted publications to users of its service, making payments for only 461 of them.230 In affirming commonality, the court noted that “plaintiffs are essentially challenging a standard business procedure” with a “common core of salient facts.”231 The same logic applied in American Geophysical Union v. Texaco, Inc., although that case settled without a judicial analysis of Rule 23 after courts in the Second Circuit ruled against Texaco’s fair use defense.232 Texaco’s activity of making copies of academic articles for its employees was uniform in its purpose and effect.
Since 2011, however, commonality has been a higher hurdle. That year, the Supreme Court decided Wal-Mart v. Dukes, a Title VII class action in which over a million employees claimed gender discrimination. The Court emphasized that not just any common question would do; claims must depend on a “common contention…that is capable of classwide resolution,” meaning that “determination of its truth or falsity will resolve an issue that is central to the validity of each one of the claims in one stroke.”233
Unsurprisingly, Wal-Martâs more stringent approach affected proposed copyright classes. For example, in Wu v. Pearson Education, Inc., the district court initially certified a class of copyright owners who alleged that Pearson had printed educational materials containing the ownersâ work product in numbers in excess of the amount for which Pearson had paid.234 Subsequently the case was transferred to a different district court judge, who decertified the class, in part based on a finding that âlayers of individually negotiated contractual arrangementsâ defeated commonality.235 Similarly, the district court in Football Association Premier League, Ltd. v. YouTube found no commonality in a suit by a âworldwide classâ of copyright owners against YouTube, given the need for individual determinations of the validity of a copyright, notice to YouTube of infringement, fair use, damages, and other specific defenses.236 Commonality continues to pose a significant challenge to class plaintiffs, including those in pending copyright class actions.
23(a)(3): This provision asks whether âthe claims or defenses of the representative parties are typical of the claims or defenses of the class.â237 Although there is some overlap between this provision and both (a)(2) and (a)(4),238 the focus of the typicality factor is on the congruency between the claim of the named plaintiff(s) and those absent class members: it requires that the claims of the class representatives are fundamentally aligned with those of the absentees.239
Analysis of typicality in copyright class actions has often been cursory. In one early suit by a proposed plaintiff class of music publishers against a proposed defendant class of producers and distributors, the court held that the single named plaintiff lacked Article III standing to make claims against defendants other than the one that had allegedly reproduced his copyrighted music.240 For that reason, the court held that the plaintiff also failed to show typicality.241
But typicality may also be a sticking point in plaintiff-side class actions. For example, the court in Football Association minced no words in rejecting certification based on typicality as well as commonality.242 The class claims in that case were dauntingly broad. The proposed class included copyright holders worldwide who fell into either of two categories: (1) YouTube had previously blocked infringement of their copyrights on its site, but there were subsequent infringing uploads; and/or (2) YouTube had allowed copyrighted musical compositions to be used without authorization.243 The court noted that, to establish liability, each class member of this proposed worldwide class would be required to show that YouTube had legal knowledge of the unlawful conduct.244 As to each allegedly unlawful download, moreover, YouTube would have a potential fair use defense.245 âThe unique nature of each work and of its infringement,â the court stated, âcannot be obliterated by its inclusion in a sea of other claims.â246 Other courts have also evinced skepticism of typicality in broadly framed class claims.247
23(a)(4): The final, and most important, element of 23(a) is adequacy of representation. This provision encapsulates the need to ensure due process before allowing representative parties to litigate the rights of those who are absent.248
Adequacy contemplates two separate inquiries. The firstâand the one most pressing in copyright class actionsâis whether the named class representatives will fairly and adequately represent the class. To protect this fundamental requirement, courts must ensure there are no significant conflicts of interest between the named representatives and absent class members. In Amchem Products, Inc. v. Windsor, the Supreme Court rejected a massive asbestos-related class settlement on the ground that the class representatives, who had current asbestos-related injuries, were structurally inadequate to represent class members whose injuries might manifest in a more-distant future.249 Although the Court pointed to specific examples that gave rise to its concernsâsuch as the settlementâs failure to adjust for future inflationâits holding was premised not only on the terms of the settlement, but the structural conflict of interests between representatives and absent class members.
Courts have found adequacy to be an important consideration in copyright class settlements involving future licensing agreements. For example, the district court rejected a proposed 2011 settlement in Authors Guild v. Google (Google Book Project) in part because the âinterest and valuesâ of certain absent class members differed from those of the class representatives.250 Under such circumstances, the court found it unacceptable that absent class members might be âdeemedâby their silenceâto have granted to Google a license to future use of the copyrighted works.â251 Similarly, the Second Circuit reversed approval of a proposed settlement in In re Literary Works in Electronic Database Copyright Litigation because the Court agreed with objectors that the named plaintiffs had a structural incentive to favor class members with registered copyrights over class members without any registered copyrightsâa bias that was reflected in the structure of the proposed settlement.252
In both Google Book Project and In re Literary Works, the plaintiffs were subsequently able to satisfy the adequacy requirement, at least at the district court level. In Google Book Project, Judge Chin in 2012 approved Authors Guildâs motion for class certification notwithstanding Googleâs claim that the class representatives were inadequate because a large percentage of absent class members in fact supported Googleâs book project and/or believed they benefited from it.253 In dicta, the Second Circuit noted its view that Googleâs adequacy argument âmay carry some force,â but it vacated certification without analysis to prioritize resolution of the fair use question.254
The class in In re Literary Works also ultimately satisfied the adequacy requirement. The Second Circuit mandated the creation of sub-classes to ensure adequate representation of authors with only non-registered copyrights.255 Subsequently the district court approved a revised settlement agreement that included such a sub-class.256
The second facet of adequacy addresses whether class counsel has the requisite experience and resources to prosecute a complex action on behalf of a class. As of 2003, Rule 23 mandates that courts adjudicating class actions appoint class counsel.257 Since that time, it is unusual for a court to deny class certification on the ground that counsel is inadequate. But in one caseâcoincidentally decided in 2003âthe court did deny certification in a copyright case based in part on poor representation.258
B. Copyright and Class Action Rule 23(b)
In addition to satisfying Rule 23(a), a proposed class must comport with one provision of Rule 23(b), which delineates three major forms of the class action. Given the small number of proposed copyright classes that survived Rule 23(a), there is minimal caselaw analyzing Rule 23(b) in the context of copyright claims. That said, copyright plaintiffs have sought certification under all three Rule 23(b) class subtypes, a pattern that is replicated in pending AI class actions. The various Rule 23(b) categories differ in ways that are significant for copyright cases. For example, some Rule 23(b) categories prohibit class members from opting out; one category mandates such an opt-out right. Some categories allow money damages; others bar them. As described below, the AI copyright plaintiffs have pleaded certification under all these potential categories, leaving their options open.
Rule 23(b)(1): Although it comes first in the text of the rule, Rule 23(b)(1) is a lesser-known, somewhat obscure provision of the class action rule. It contains two subparts.259 Referred to colloquially as the “prejudice class” provision, (b)(1) authorizes class resolution in situations where individual suits reaching inconsistent results might prejudice either the party opposing the class (typically the defendant), or members of the class (typically the plaintiffs). Rule 23(b)(1) does not permit putative class members to opt out of a certified class; this means that a class certified under (b)(1) will bind, for good or for ill, all who are encompassed within the class definition. In addition, there is no requirement that class members receive individualized notice in (b)(1) classes, although Rule 23 does provide that a court “may direct appropriate notice to the class.”260
(b)(1)(A): This sub-provision of (b)(1) allows certification where individual suits might “establish incompatible standards of conduct.”261 On its face, this language seems to contemplate class certification under a very broad set of circumstances. That is not true in practice. Most importantly, this provision does not make certification appropriate simply due to the risk of inconsistent jury verdicts. Construed that way, (b)(1)(A) would make most cases certifiable.262 In practice, this Rule 23(b) category typically applies in contexts where a ruling against the defendant as to one party would force the court to take identical steps with respect to others similarly situated, such as in a case brought by one of multiple riparian landowners.263 Today, this subtype is most commonly used in ERISA cases, where a plaintiff’s action is brought on behalf of a retirement plan.264
The court in Showtime/The Movie Channel granted certification in a copyright suit under (b)(1)(A), based on its finding that individual suits by copyright owners against the defendant cable company might create contrary, seemingly arbitrary, results.265 Reflecting an earlier, more naïve approach to class action analysis, however, the court’s analysis was barely a paragraph long.266
(b)(1)(B): This sub-provision allows certification where individual judgments “as a practical matter, would be dispositive of the interests” of those not before the court.267 Just as with (b)(1)(A), stare decisis, or the possibility that different courts might reach different results, is not enough to justify certification under this sub-provision. As with Rule 23(b)(1)(A), absent class members do not have a right to opt out of a (b)(1)(B) class.
This is not a strong basis for class certification in the AI context. The paradigmatic use of (b)(1)(B) isâor wasâwhere a defendantâs resources are inadequate to meet the substantial legal claims against it; in the absence of a class action, the first plaintiffs to reach judgment will deplete the available funds at the expense of those later in line. But forcing all affected parties to sue as part of a class raises due process concerns, especially given the lack of an opt-out right. In Ortiz v. Fibreboard, the Supreme Court cast a shadow on this type of class action. It rejected a settlement of an asbestos class action that would have been confined to the available insurance proceeds of the defendant (and not every available dollar); and it warned against âadventurous applicationâ of Rule 23(b)(1)(B).268 As a result, courts rarely certify limited fund classes; like its counterpart (b)(1)(A), this subtype is predominantly used in ERISA suits.269
Other courts have refused to certify copyright class actions under Rule 23(b)(1). In WB Music Corp. v. Rykodisk, Inc., the court refused such certification, noting that because each class member’s infringement claim was particular to them, resolution of one would not impair the rights of others.270 Similarly, the First Circuit rejected an attempt by a graphic artist to sue a defendant class made up of wholesalers and retailers who infringed the copyright to her wallpaper design.271 In essence, the court found, the plaintiff was arguing, incorrectly, that stare decisis was sufficient to qualify as a prejudice requiring class resolution; the court was unpersuaded.272
23(b)(2): This class action category, referred to as the “civil rights” provision, applies where “the party opposing the class has acted or refused to act on grounds that apply generally to the class, so that final injunctive relief or corresponding declaratory relief is appropriate respecting the class as a whole.”273 This provision is a vehicle for obtaining declaratory and injunctive relief, rather than money damages.274 As with 23(b)(1), there is no right to opt out of a certified (b)(2) class: a decision is binding on all those who fall within the class.
Rule 23(b)(2) may provide relief to groups of copyright owners in two ways. First, (b)(2) certification can provide the basis for class-wide injunctive relief; that is, it may be invoked to prevent future infringement. Second, parties and courts may invoke (b)(2) in so-called bifurcated proceedings. Under Rule 23(c)(4), a court may certify a class only for certain issues. Using this issue-class provision, a court could certify a class under (b)(2) for purposes of determining liability, while leaving damages or other downstream claims to be resolved through a different, perhaps individual, mechanism.275
Two early copyright class actions illustrate this. In one of the cases, (b)(2) certification was not appropriate given the plaintiffs’ request for substantial damages awards.276 But the plaintiffs in Ryan v. CARL Corp. were successful in obtaining certification under (b)(2), despite the potential for money damages. The Ryan class requested that the court bifurcate the issues into liability, for which (b)(2) was appropriate, and damages, which could be adjudicated later, if necessary, either in the form of a (b)(3) class or in individual proceedings.277 The court in Ryan agreed: it approved (b)(2) certification given the “uniformity” of the defendant’s allegedly infringing practice and stayed consideration of a damages class for another day.278
Yet courts have been skeptical of (b)(2) copyright claims where plaintiffs cannot show a high degree of uniformity in the defendantâs relationship with class members. In Kihn v. Bill Graham Archives LLC, the Ninth Circuit reversed a district court decision certifying a class of musicians alleging copyright violations by a company that allows users access to recordings of live concerts from the 1950s to 1990s.279 The district court had certified the class under both (b)(2) and (b)(3); the appellate court rejected both.280 As to (b)(2), the Court of Appeals found that the result of certificationâan injunction requiring the defendant to remove a whole category of recordings from its websiteâfailed to recognize significant distinctions among class members.281 For example, some of the musicians involved might have reached agreements with the defendant, and others may wish to do so in the future.282 Under (b)(2), those musicians would have no opportunity to opt out of the class.283
23(b)(3): This final class action category, added in 1966, allows certification of a class for purposes of seeking money damages for individual class members.284 The pending AI copyright class actions all seek some form of money damages, which means that they will very likely need to contend with the thorny requirements of (b)(3). The Supreme Court has labeled (b)(3) âthe most adventuresome innovationâ of the modern Rule 23.285 As one scholar stated two decades ago, the (b)(3) standard is fuzzy at best; the rules for interpreting it âhave received virtually no clarification from the Supreme Court, have bewildered lower courts, and have not attracted substantial scholarly scrutiny.â286 Notwithstanding the Roberts Courtâs strong procedural bent, that remains largely true.
Because of inherent tensions between individuals’ right to their own day in court and the monolithic nature of a class determination, plaintiffs seeking (b)(3) certification must make additional showings to demonstrate that a representative action, rather than individual suits, is appropriate. Specifically, parties seeking (b)(3) certification must prove that common issues in the case predominate over individual issues, and also that a class action is superior to individual suits or other dispute resolution formats.287 Finally, a court certifying a class under Rule 23(b)(3) must also find that administering the class is feasible,288 although the Supreme Court has clarified that a court need not address trial manageability if it is certifying a class for purposes of settlement.289 Unlike in the other 23(b) categories, members of (b)(3) classes have the right to opt out of membership in the class.290
To establish predominance, a proposed class must show that the class issues are not only common, but in fact are central: that they are the driving force in the dispute. Not surprisingly, therefore, proposed class suits with questionable commonality and typicality will inevitably founder on the extra predominance requirement.291
In Google Book Project, for example, the district court used Rule 23(b)(3) to certify a class of copyright holders in their suit alleging that Google’s project to digitize and create a scannable index of books was a “massive copyright infringement.”292 The court found that “[e]very potential class member’s claim arises out of Google’s uniform, widespread practice of copying entire books without permission of the copyright holder and displaying snippets of those books for search.”293 Google’s fair use defense, the court found, could also be resolved on an aggregate basis.294 The district court also found that a class action was a superior method of resolving the claims against Google, saving both time and money.295 On appeal, the Second Circuit vacated the certification order.296 The appellate court did not take specific issue with the district court’s analysis, although it intimated some skepticism.297 Rather, it ordered the district court to hold its class certification in abeyance until after it resolved the fair use question.298
Class actions involve competing narratives: connectivity v. entropy. Copyright certification decisions reflect this tension. For example, in opposing (b)(3) certification, Google argued that copyright ownership was too varied and complex for resolution on a class-wide basis.299 But the district court found that those individual issues were subsidiary to the central question of the legality of Google’s book project.300 In contrast, in Waite v. UMG Recordings, Inc., the district court denied (b)(3) certification on the ground that individual issues (specifically, whether the musical compositions at issue were “made for hire”) prevented resolution of the class claims on common proof.301 Resolution of these same connectivity v. entropy questions will determine class certification in the AI context.
C. Copyright and Class Action Settlements Under Rule 23(e)
Settlements of class actions are different from, and more onerous than, the private settlements that are traditional in litigation. Because any settlement will bind absent class members to a judgment, Rule 23(e) requires a court to ensure that those absent class members have been adequately represented and that the proposed settlement is “fair, reasonable, and adequate.”302 Reflecting the quasi-regulatory nature of class actions, the end-product of Rule 23(e) is a hybrid creature between a court judgment and a private settlement contract.303 Parties first bargain with each other; then they must persuade the court, over the voices of any objectors, that the settlement is worthy of a judicial stamp of approval.
This approval requires multiple steps. First, a court must determine that a proposed settlement class meets the Rule 23(a) and (b) requirements described above; the Supreme Court has expressly admonished litigants and courts that the requirements apply, perhaps with even greater import, in the settlement context.304 Second, Rule 23(e) mandates that a court give appropriate notice to all class members.305 The court must also hold a “fairness hearing” to evaluate the settlement, and to allow objectors to the proposal to have their say.306 In evaluating whether a proposed settlement is adequate, a court must consider, inter alia: whether the representatives fairly represented the class; whether the proposal was negotiated at arm’s length; and whether the relief provided for the class is adequate, taking into account factors such as the costs and risks of proceeding to trial and the method proposed for remitting the relief provided by the settlement (money damages or otherwise) to class members.307
Central to this fairness determination is an evaluation of whether there is a significant conflict among different subsets of the class. In Amchem Products, Inc. v. Windsor, the Supreme Court’s master blueprint for class action settlements, the Court rejected a global asbestos settlement on the ground that the agreement impermissibly privileged class members with current asbestos-related injuries over those whose injuries would only manifest in the future.308
Notwithstanding the relative paucity of copyright class actions, copyright class settlements have played an influential role in establishing the parameters for technology companies’ use of intellectual property. The Second Circuit’s analysis of the proposed settlement in In re Literary Works provides an apt illustration.309
Following the Supreme Courtâs decision in Tasini,310 the Second Circuit evaluated a proposed consolidated class settlement in a suit by authors whose works had been placed in electronic databases without their permission.311 The district court had referred the case to mediation; three years later, the parties had reached an agreement.312 They jointly moved the district court to certify the class and approve the settlement.313 In re Literary Works contains two lessons for AI class actions:
1. Class Settlements Must Not Mask Significant Intra-Class Conflicts
The Second Circuit initially rejected the proposal on the ground that it did not adequately represent certain class members.314 Under the terms of the proposed settlement, class members were divided into three categories.315 Category A included class members who held registered copyrights making them eligible for statutory damages.316 Category B included those with copyrights registered too late to render them eligible for statutory damages, and Category C included those who had not yet registered their copyrights, a group that composed 99% of the total claims.317 The Court found that there had been inadequate representation of the Category C group.318 Noting that the interests of Category C should have been protected by a sub-class represented by independent counsel,319 the Court remanded with instructions for the creation of three separately represented sub-classes.320 On remand, the district court approved a settlement with those sub-classes.321 As discussed below, some of the pending AI suits have defined the class to include only creators who have timely registered their copyrights; other suits, however, are including both registered and non-registered copyright holders in the class, thus raising the likely need for sub-classes.322
2. Class Settlements May Release Defendants from Liability for Future Actions
Objectors to the proposed settlement in In re Literary Works also challenged the scope of the settlement release, which applied to claims that were beyond those that had been pleaded in the class action.323 Under the terms of the settlement, class members who did not opt out were prohibited from barring future use of their works by the defendant publishers, including the licensing of those works to third parties.324 Objectors argued that future infringements would be new and distinct harms, and they also objected to the provision allowing future sub-licensing.325 But the Second Circuit overruled the objections, finding that the settlementâs scope was acceptable and that the terms were reasonably within the factual predicate of the underlying suit.326
There may be limits to courts’ tolerance of future releases in some class settlements.327 In the proposed settlement of Google Book Project, for example, the district court rejected a release outlined in a 166-page settlement agreement, finding the scope of the release of future claims too broad and too unpredictably constraining on class members.328 But the court’s rejection of that settlement may be an outlier.329 Courts have approved copyright class settlements with releases in many other contexts.
D. Administrative Rule 23 Requirements
In addition to satisfying Rule 23(a) and one of the categories of Rule 23(b), a class action must be administrableâthat is, a court must find that the parties are capable of implementing a judgment in a way that is fair to absent class members. Several practical elements of Rule 23 are particularly relevant to copyright class actions. Although pre-AI copyright opinions do not expressly address all these administrative elements, they are discussed here for purposes of setting forth the main points of contention in many certification disputes.
1. Defining the Class: Ascertainability and the “Fail-safe Class”
Over the past decade, some federal courts have read an âimplied requirement of ascertainabilityâ into Rule 23.330 The basic concept is not controversial. To be certified, a class must be âsufficiently definite so that it is administratively feasible for the court to determine whether a particular individual is a member.â331 To ensure this, a class must be defined by objective criteria.332 Sometimes this can be complicated. For example, in a consumer class action, it may be difficult to determine by an objective standard who purchased a particular food item or cosmetic.
As it is applied by some courts, ascertainability can interpose a high, seemingly arbitrary barrier to class certification.333 The Third Circuit is the leader of this aggressive approach.334 That Court defends its rigorous ascertainability requirement on the ground that it protects absent class members, ensures due process to the defendant, and eases the cost and administrative burden of an ill-defined class.335 This is purportedly necessary in order to determine the preclusive scope of a judgmentââwho gets the benefit of any relief and who gets the burden of any loss.â336 The term ascertainability is nowhere in the text or rulemaking commentary to Rule 23.
But the Circuits do not agree on the nature and scope of an ascertainability requirement, or even on whether it really exists, and the Supreme Court has yet to weigh in. The Second and Ninth Circuits, where most of the major AI class actions are pending, have both expressly rejected the Third Circuit’s ascertainability requirement as grounded in policy rather than the text of Rule 23.337 Therefore, while ascertainability (sometimes called “administrative feasibility”) may pose a challenge to class action litigants, the AI class actions are not likely to founder on this ground.
Related to ascertainability is the argument that a court should not certify a “fail-safe” class.338 A fail-safe class is one in which membership in the class depends on whether a party has a valid claim.339 In other words, if a class member loses their claim for relief, then by definition they were never properly a member of the class, and therefore are not bound by res judicata.340 Neither the Second nor the Ninth Circuit has ruled on the existence or import of the concept of a “fail-safe” class.341 In any case, however, as discussed below, the AI plaintiffs are not proposing such an ipse-dixit, preclusion-proof class. If courts in the pending AI cases certify classes of copyright holders, the members of those classes (that is, those creators who do not exercise their right to opt out342) may either prevail or take nothing, depending on courts’ views of the merits.
2. The Order of Operations
Under Rule 23, district courts have wide discretion as to how they manage class actions.343 Therefore, while Rule 23 provides that courts should assess class certification at “an early practicable time,”344 many courts allow certification to be decided after decisions, or even judgment, on the merits.345
This procedural flexibility inevitably interjects another axis of strategy and uncertainty into class actions, including the pending AI copyright cases. For example, Texaco, discussed above, was brought as a class action. However, the parties stipulated to try the question of fair use prior to addressing class certification. Once the Second Circuit denied Texaco’s fair use defense, basically ensuring that Texaco would be held liable for copying research publications for its employees, the case settled.346
As discussed below, litigants in the pending AI copyright class actions have taken different procedural pathways. Those decisions have pragmatic and substantive implications.
V. Certification of AI Input Classes
Against the above background, multiple content creators have filed proposed class actions in federal courts against generative AI companies, seeking relief for various forms of copyright infringement, sometimes accompanied by state law claims. These lawsuits have sprung up primarily in two locations: New York and California, although two are in Delaware.347 The New York cases, all in the Southern District, include actions brought by fiction authors, non-fiction authors, and journalists.348 On the west coast, several suits (including those brought by visual artists,349 music publishers,350 coders,351 and additional suits by authors352) were brought in California; all are pending in the Northern District.
In addition to the suits seeking formal class certification, there are a handful of cases against AI companies brought by single organizations or corporations that support, aggregate, and disseminate the work of a large number of creators. These include cases by The New York Times, the Center for Investigative Reporting, and Getty Images.353 Because these suits do not implicate the technical or logistical complexities of Rule 23, courts in these suits may reach decisions on the merits, including the applicability of fair use, earlier than their class-action cousins. If that happens, those precedents will influence the default expectations, and therefore the settlement options, of AI litigants across the board.
In the class suits, the parties’ litigation strategies reflect those adopted by early copyright class litigants discussed above.354 The plaintiff creators argue that the defendants’ rapacious copying of literary works is the defining conduct in the dispute, providing the basis for class certification and liability. As their complaint states, “OpenAI and Microsoft have built a business valued into the tens of billions of dollars by taking the combined works of humanity without permission.”355 In contrast, the defendants, while trying to evade liability entirely based on fair use,356 have generally made blanket denials regarding the class-wide actions necessary to support class certification.357 The success of these opposing strategies will largely depend on courts’ views of the underlying intellectual property claims. As is always true of class actions, procedure and substantive law are intertwined. But even if the copyright claims appear to chart new ground (something we dispute), the class certification issues are, mostly, old hat.
For purposes of analyzing the likelihood of obtaining class certification, we divide AI claims into two categories. In the first category, input claims, the plaintiffs have a strong claim for class certification.358 The plaintiffs’ input claim is that AI companies committed direct copyright infringement by duplicating copyrighted works, without asking permission or offering compensation, in order to “train” their AI foundation models. To be sure, these are not slam-dunk claims; the AI defendants are straining mightily for a fair use defense, and the plaintiffs must obtain evidence proving which works were copied, evidence that they claim the defendants possess. Nevertheless, we believe these input claims are not only meritorious but also that they fit solidly within the Rule 23 class action framework. By way of illustration, we describe two input class claims below.
Several of the pending AI class suits seek a remedy not only for the initial copying of their works, but also for the “market usurpation defendants have enabled by making Plaintiffs unwilling accomplices in their own replacement.”359 We refer to claims based on this market usurpation as output claims.360 Output claims, often alleging derivative infringement361 or asserting violations of the Digital Millennium Copyright Act, allege that AI companies are using creators’ content (whether computer code, plot devices, literary text, images, or even names) to instantly produce and market derivative content without permission, attribution, or compensation. Although output claims are not the focus of this Article, we note that they are likely to require more individualized evidence and be subject to more individualized analysis (for example, of fair use, or substantial similarity). This is likely to make class certification of such claims challenging.362 In all likelihood, however, the strength of those claims will never be tested: the lingering uncertainty and risk for all parties will provide a strong incentive to reach a settlement of all outstanding claims.
Below are two illustrative examples of input claims where we believe class certification is appropriate. As these examples demonstrate, AI copyright classes are not cookie-cutter. To the contrary, parties and attorneys have adopted quite different litigation strategies in these cases.
A. The Input Class: Direct Copyright Infringement in Authors Guild v. OpenAI, Inc.
In Authors Guild v. OpenAI, a professional writers’ organization363 as well as two groups totaling approximately 30 plaintiffs (including best-selling fiction authors such as George R. R. Martin and Jodi Picoult, as well as non-fiction authors including Stacy Schiff and Jia Tolentino) seek to represent classes of fiction and non-fiction authors in consolidated suits against OpenAI and Microsoft.364 Among other claims, the plaintiffs allege that OpenAI and Microsoft committed direct copyright infringement when they scraped class members’ works in order to “train” their AI large language models GPT-N and ChatGPT.365
The consolidated complaint defines the fiction class as “[a]ll natural persons in the United States who are the sole authors of, and legal or beneficial owners of Eligible Copyrights in, one or more Fiction Class Works; and all persons in the United States who are the legal or beneficial owners of Eligible Fiction Copyrights in one or more Fiction Class Works held by literary estates.”366 The non-fiction class definition is the same for non-fiction copyright owners. For purposes of these claims, eligible class members are limited to those who timely registered their copyrights. In their Answers, OpenAI and Microsoft repeatedly deny that any works were infringed.367
In their pleadings, the defendants also assert blanket denials regarding class certification.368 Assuming that the plaintiff authors succeed in defeating a motion for summary judgment on the question of fair use, however, we believe plaintiffs’ direct infringement claims merit class certification.
1. Order of Operations: Fair Use First
As an initial matter, the parties to the consolidated Authors Guild suit, now in the discovery process in the S.D.N.Y., followed the blueprint established in Texaco and negotiated an agreement to allow the district court to first resolve the defendants’ fair use defense, and only then to take up class certification if necessary.369 In exchange for this concession by the plaintiffs, the defendants agreed to forgo filing Rule 12(b)(6) motions to dismiss, and to comply with an expedited briefing and discovery schedule.370
However it is resolved, the Authors Guild court’s fair use ruling, or a similar ruling in another case, will set the tone for the remainder of the case. As in Texaco, a rejection, even a partial one, of fair use will put the authors in a strong settlement position. A recent decision by the District of Delaware rejecting an AI company’s fair use defense is one such positive development for creators. In Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., Thomson Reuters, owner of legal search engine Westlaw, alleged that Ross had unlawfully copied Westlaw’s headnotes in order to train Ross’s competing product.371 Ross had initially sought to license Westlaw’s headnotes. When Westlaw refused, Ross hired a company to create “Bulk Memos” to train Ross’s AI model; the Bulk Memos were created from Westlaw headnotes.372 The district court found over 2,000 acts of direct infringement.373
The court rejected Ross’s claim of fair use. It held that “using Thomson Reuters’s headnotes as AI data to create a legal research tool to compete with Westlaw” was not transformative.374 Further, it found that the copying could have an effect on Westlaw’s market for its own current and future products.375 The court therefore granted summary judgment to Thomson Reuters/Westlaw as to the specific infringed headnotes.376
While this is a promising example of a court rejecting a fair use defense by an AI company for training on copyrighted material, the decision in Thomson Reuters is limited by its context: legal search engine competitors.
If the court in Authors Guild does not reach the same conclusion, however, and it finds that the defendants’ “training” constitutes fair use, either for all authors or (for example) only for the non-fiction authors, the settlement dynamics will shift in favor of the AI companies. Neither ruling is likely to destroy the incentive to reach some sort of settlement in these cases; the possibility of appellate or Supreme Court reversal will remain a threat. But the fair use determination will set the tone.
That said, no order of operations is risk-free here. Copyright class action plaintiffs on the west coast have opted for the traditional certification-first route. In Tremblay v. OpenAI, Inc., the defendants have already succeeded in dismissing some of the plaintiffs’ claims,377 although not the direct infringement claims.378 That dismissal will narrow the scope of potentially valuable discovery. Litigating class certification is complex and costly. In addition, the presence of a potential fair use defense may muddy the certification analysis. The possibility that the defendants have a valid fair use defense as to even some class members might hinder a court from finding that the plaintiffs have met their burden to show Rule 23 commonality and typicality. In contrast, the Authors Guild plaintiffs will have the Rule 23 wind at their back if they survive a fair use motion for summary judgment.
The decision to resolve fair use prior to certification has other collateral consequences as well, because there are competing class actions in different jurisdictions, with potential overlap in class members. Before class certification, absent class members have not yet been joined to a suit.379 If the Authors Guild case suffers a serious setback prior to certification, or if certification is delayed, absent class members may migrate to Tremblay or a different case that reaches class certification and settlement sooner.
2. Authors Guild Satisfies Rule 23(a) Factors
To succeed on certification, plaintiffs must show that their class claims meet the four Rule 23(a) requirements of numerosity, commonality, typicality, and adequacy of representation.380 As one scholar has noted, these quippy labels “mask the complexity of the case law and scholarship that has accreted over generations to give these terms their meanings.”381 Notwithstanding this complexity, and assuming that fair use is not an obstacle, the Authors Guild plaintiffs have strong arguments for all four 23(a) factors.
Rule 23(a)(1), the requirement that the class be “so numerous that joinder of all members is impracticable,”382 is satisfied. Notwithstanding the weak protests of the defendants,383 numerosity is not seriously in doubt.384 Relatedly, by limiting the class membership to creators in the U.S. with registered copyrights,385 the plaintiffs have defined the class in a way that is objective and verifiable, thus satisfying any potential ascertainability arguments. (Note: This is not necessarily true of the Tremblay plaintiffs in the Northern District of California, who are seeking to represent all copyright holders, whether registered or not.386)
The Authors Guild plaintiffs can also likely meet their burden to show commonality under Rule 23(a)(2). Predictably, the plaintiffs’ theory of the case centers on the unifying and uniform act of AI companies duplicating copyrighted works for purposes of “training” large language models. Specifically, the Consolidated Complaint alleges that “[d]efendants copied and data-mined the works of [class members], without permission or compensation.”387 In so doing, the plaintiffs allege, the registered copyrights of all class members “were infringed in materially the same way.”388
At some level, this allegation may appear to be circular; it begs the question of what constitutes a material infringement. But the plaintiffs’ claims are more straightforward than that: they allege that the AI defendants unlawfully and directly “copied Plaintiffs’ works and then fed them into their ‘large language models.’”389 This is a simple, unifying claim of liability for direct infringement. Whether or not the defendants’ actions are in fact unlawful, copying is the central question in the case for all class members.390
Plaintiffs’ direct infringement claim also satisfies Rule 23(a)(3)’s typicality requirement, which tends to merge into the commonality analysis.391 Typicality asks whether the claims of the class representatives align with those of the absent class members. In this case, the question is whether the fiction author representatives’ claims mirror those of absent fiction authors, and whether the same dynamic is true for the non-fiction representatives vis-à-vis the absent non-fiction class members. It appears that they do. The class representatives allege that the defendant AI companies used all class members’ copyrighted works in the same way and for the same general (albeit broad and evolving) purpose.392 Beneath the technological complexity of AI, the direct infringement claims are simple and uniform, unlike the “Frankenstein” claims that the district court rejected in Football Association. In fact, the Second Circuit has found typicality in class actions with far more divergence among class members’ experiences than exists here.393 Moreover, unlike in Football Association, the classes in Authors Guild are defined concretely and objectively. They are limited to persons in the United States who possess (or are beneficial owners of) registered copyrights.394 The claims are cohesive.
Finally, and for many of the same reasons, the Authors Guild classes satisfy Rule 23(a)(4) adequacy of representation. As just discussed, it does not appear that either the Authors Guild or the individual class representatives have interests that are misaligned with those of the absent class members. In the Google Book Project case, Google alleged that many plaintiff class members supported Google’s efforts and disagreed with the class claims, thus calling adequacy of representation into question. Although it decided the case on other grounds, the Second Circuit acknowledged that Google’s argument “may carry some force.”395 As of now, however, there is no significant outcry by creators who favor giving their intellectual property to AI companies at no cost. To be sure, there are academics and policymakers who favor AI companies’ free use of copyrighted material.396 But there is as of yet no sign of a serious fissure among creators that would undermine adequacy of representation. In fact, the Library Copyright Alliance, which supports a finding of fair use for AI copying, submitted comments to the Copyright Office urging that office to defer to courts’ resolution of these questions.397 As to the second element of (a)(4) adequacy, the quality of the lawyering, there is no serious likelihood that the court or the defendants will argue that the renowned attorneys representing the class are in any way inadequate.398
At first glance, it may also appear that the fame and wealth of named plaintiffs like John Grisham and Jodi Picoult could undermine typicality and/or adequacy in a class composed largely of minor authors. After all, Grisham and other best-selling authors might argue that the defendants’ actions resulted in vastly larger damages. But that should not prove to be a barrier. Copyright law provides for two possible forms of damages: actual and statutory.399 It is often difficult to calculate actual damages. In lieu of that calculation, a plaintiff may elect at any time before final judgment to receive statutory damages for each act of infringement.400 The precise amount of a statutory damages award depends on a court’s assessment of the seriousness of the infringement and the defendant’s ability to pay; it does not depend on a showing of the damages suffered by the copyright owner.401
The Authors Guild Complaint notes that, should the class elect statutory damages, “the damages inquiry will likewise be common, if not identical,” among all members.402 The simplicity of statutory damages bolsters the classes’ claims of commonality, typicality, and adequacy. It also raises the specter of a massive damages award.
3. Authors Guild Satisfies Rule 23(b)
The Authors Guild complaint asserts that class certification would be appropriate under all of the Rule 23(b) categories.403 In addition, it alleges that, as an alternative to certification of the entire dispute, the court could bifurcate the proceeding into liability and damages under its power to certify a class only as to certain issues.404 This would entail first certifying a liability-only class under Rule 23(b)(2), for the purposes of resolving the common question of whether the defendants are liable for infringement for copying class members’ works,405 while leaving the question of damages to another process on another day.
Despite the class certification laundry list in the pleadings, however, the plaintiffs’ most straightforward and effective option is to obtain certification under the most rigorous of the Rule 23(b) options, Rule 23(b)(3), and they are in a strong position to do that on their direct infringement claims. This form of certification has distinct advantages. First, it will allow any objecting authors to opt out of the class and either pursue claims elsewhere or drop any potential claim, thus minimizing the force of any arguments that the class members are not united. Second, unlike (b)(2) classes, in which class members may only obtain declaratory or injunctive relief, (b)(3) classes may seek money damages.
As explained above, to obtain certification under Rule 23(b)(3), the Authors Guild class representatives will need to show what are referred to in shorthand as predominance and superiority. This they can do.
Predominance requires plaintiffs to show that the common issues are at the heart of their case, and that any differences among plaintiffs are of only secondary importance. As discussed above, this provision is pragmatic rather than formalistic; courts have discretion as to how they interpret and apply it.406
But predominance should not be an insuperable barrier to certification of input copyright classes. As of now, the Authors Guild plaintiffs’ strong arguments in favor of commonality and typicality carry over directly to this predominance argument: The plaintiffs are alleging identical conduct by the AI defendants, and they have invoked their right to seek statutory damages, which would minimize the need for thousands of fact-specific individual determinations.
The superiority requirement asks whether a class action is the best option for resolution of the claims at issue.407 Rule 23(b)(3) lists four non-exclusive factors that courts should consider in evaluating superiority: whether individual class members would have an interest in proceeding separately; the nature of any ongoing litigation in other courts; the appropriateness of the forum; and potential difficulties in managing a class suit.408
As one scholar has recently observed, courts apply this requirement in different and sometimes haphazard ways.409 But the murkiness of superiority in class action doctrine writ large should not detract from its application to this case: a class action is the superior way to resolve the case of thousands of identically situated creators and to give repose to AI defendants. The fiscal and logistical impossibility of individual small-time authors taking on AI companies is what makes the class device superior here.
Moreover, the district court in this case has already, albeit indirectly, analyzed some of the superiority factors. The west-coast plaintiffs in Tremblay moved to intervene in Authors Guild for the purpose of moving to dismiss, stay, or transfer the New York action to the Northern District of California.410 In denying the motion, the district court stated its view that the Authors Guild classes had brought suit in an appropriate forum.411 The court’s order also makes clear its view that the “nature of any litigation concerning the controversy already commenced by or against members of the class” is not a barrier to the New York proceedings.412 This finding closely parallels the superiority finding required under Rule 23(b)(3)(B).413
B. The Input Class: Andersen v. Stability AI, Ltd.414
The Authors Guild case concerns AI models’ training on text; Andersen v. Stability AI is about images. The named plaintiffs in this proposed class action (artists including Karla Ortiz, Gerald Brom and Jingna Zhang415) allege that defendants Runway, Stability AI, DeviantArt, and Midjourney downloaded billions of copyrighted images without permission for the purpose of creating and/or using an AI product called Stable Diffusion.416
Specifically, the plaintiffs claim that Runway worked with Stability to help train and then distribute Stable Diffusion, an AI image generator whose work products are “in the style of” the named plaintiffs and other artists.417 The training images came from so-called LAION datasets.418 Midjourney trained on Stable Diffusion, and DeviantArt used the model in its own image-generating product, DreamUp. In addition, plaintiffs allege that defendant Midjourney “has repeatedly promoted the use of artist names, including Plaintiffs’ names, within text prompts as a means of getting better results.”419 These companies now market and sell AI image products; the plaintiffs allege that “what they’re really selling is copyright infringement as a service.”420
The named plaintiffs have alleged several causes of action against each defendant. In addition to their input claim (direct copyright infringement), they assert claims for induced copyright infringement, vicarious copyright infringement, violation of the DMCA and the Lanham Act, as well as common law claims for unjust enrichment.421 They characterize defendants’ AI image-generating products as “copyright-laundering devices, promising customers the benefits of art without the costs of artists.”422
Unlike in Authors Guild, the parties in this suit did not stipulate to a non-traditional order of operations. Instead, the defendants moved to dismiss for failure to state a claim. Details aside, the plaintiffs’ claims largely survived this motion. The district court found that “the plausible inferences at this juncture are that Stable Diffusion by operation by end users creates copyright infringement and was created to facilitate that infringement by design.”423
With regard to the specific claims, none of the defendants moved to dismiss the direct copyright infringement claims based on scraping of and “training” with copyrighted images. The court rejected defendants’ motion to dismiss plaintiffs’ induced copyright infringement claims.424 For all defendants, the court dismissed the plaintiffs’ DMCA claims with prejudice, and their common law unjust enrichment claims with leave to amend.425 The court ruled variously on the other claims against each defendant:
- Stability AI: the district court granted the defendants’ motion to dismiss the plaintiffs’ DMCA claims as well as their unjust enrichment claims.426 The court declined to dismiss the plaintiffs’ induced infringement claim.427
- Runway: the district court denied Runway’s motion to dismiss direct infringement claims based on plaintiffs’ argument that Stable Diffusion is itself an “infringing statutory copy” of plaintiffs’ works, and/or that distributing Stable Diffusion is the equivalent of distributing plaintiffs’ works.428 It also denied Runway’s motion to dismiss the induced infringement claims.429
- Midjourney: the district court denied Midjourney’s motion to dismiss plaintiffs’ copyright claims; Midjourney had contended that the plaintiffs’ pleadings did not plausibly allege that the artists’ work was covered by registered copyrights.430 The court also denied Midjourney’s motion to dismiss plaintiffs’ Lanham Act trademark claims.431
- DeviantArt: the district court denied DeviantArt’s motion to dismiss copyright claims.432 It granted the motion to dismiss plaintiffs’ breach of contract claims and unjust enrichment claims.433
The named representatives in Andersen are seeking class certification under Rules 23(b)(2) and (b)(3).434 They break their (b)(3) class into three sub-classes, based on the database from which the copyrighted images were downloaded.435 They also seek separate certification of a (b)(3) class composed of all artists whose names were invoked in prompts by the Midjourney Image Product.436 These multiple class claims are more nuanced than those in Authors Guild. But the Rule 23 analysis for direct infringement is functionally the same.
Discovery is ongoing. Based on currently available information, however, the plaintiffs can satisfy Rule 23(a) and (b) for their direct copyright claims. As with Authors Guild, numerosity is not controversial: the plaintiffs allege that billions of copyrighted artistic works, paired with descriptive captions, were copied without permission.437 And as with Authors Guild, the driving force of the litigation, namely defendants’ allegedly unlawful replication of plaintiffs’ copyrighted works, meets the commonality and typicality requirements.438 The First Amended Complaint states the common question as “whether Defendants violated the copyrights of Plaintiffs and the Class when they downloaded and stored copies of [class members’] Works…and…used copies of the Works to train AI Image Products.”439 This allegation of centralized and uniform conduct is the heartland of commonality and typicality. Finally, as of now there are no allegations that either the named representatives or the highly competent attorneys representing the class are inadequate to represent the rights of absent class members.
The logic of Authors Guild also supports certification of the Andersen plaintiffs’ (b)(2) and (b)(3) classes for direct copyright infringement. Again, the centerpiece of certification under both subprovisions is the uniform conduct of the defendants, conduct that affects the plaintiff visual artists in more or less identical ways. In fact, while analytically Authors Guild and Andersen are very similar, it is likely to be easier, and more intuitive, to show copying of images than it is of texts.440
Of course, as with Authors Guild, the plaintiffs in Andersen will need to prevail against a fair use summary judgment motion. But, as explained below, we argue that fair use should not be extended to give enormous, for-profit technology companies free license to replicate copyrighted works.
VI. Stealing is Not Fair Use
In the early days of computers, Steve Jobs was known for bragging, “Good companies copy, great companies steal,” while Bill Gates begged hobbyists to stop copying Microsoft software without paying. Ironically, Apple subsequently sued Microsoft for stealing the displays of Apple’s Macintosh computer user interface in Windows 2.0. Perhaps both companies are great, and stealing is at the heart of Big Tech, both then and now.
Fair use, however, was not at issue in that stealing case. Stealing is not, and should never be deemed, fair, as both great companies understand from the fundamentals of good business and corporate reputation. Wholesale stealing, in particular, like that which gave rise to the AI input class actions, is never fair.441
The defendants in the AI input class actions are riding on their fair use defense to avoid infringement liability. In addition to Oracle, they will rely on Field v. Google and Authors Guild v. Google for support. Unlike Google in these three cases, the AI companies are not engaging in “caching” webpages, nor indexing snippets for users to search, nor using a minuscule amount of essential code; instead, they are brazenly stealing wholesale literary text and images to train their models. AI models need copyrighted content, and without the creative works of authorship input into the AI models, there will be no AI models. The stealing itself is not transformative. The act of copying or scraping creative works belonging to others is not transformative. The act at that moment does not create anything new to benefit anyone except AI models that have been valued at billions of dollars and enriched Big Tech and Big AI companies. Without acknowledging the existence of copyrights and without compensating creators, the stealing avoids payments to creators and disregards existing human creativity.
Furthermore, like the defendant in American Geophysical Union v. Texaco, Inc., which made copies of academic articles for its corporate use and faced the court’s ruling that the act of copying the articles was not transformative, AI defendants copy the entirety of literary text and images for their own corporate use in building their AI models; that, likewise, is just plain copying without pay. Also, the stealing in the AI input class constitutes some exceptions under copyright law as seen in the MAI Trio cases, because the stealing is massive, permanent, and intentional to build powerful models of artificial intelligence for worldwide domination.
The input claim of copyright infringement is established at the moment the stealing occurs. At that moment, the act of copying and scraping does not transform the literary text or images into anything other than AI models, which are trained to commit the second type of copyright infringement by violating the right to prepare derivative works. AI companies will attempt to conflate the input and output claims, interposing end users as a shield against liability. At the input stage, however, there are no users to enter prompts because the AI companies are busy stealing copyrighted works to complete the training of the AI models. Users enter the scheme only after AI companies debut their AI models, long after the models have ingested and trained on all the copyrighted works.
The second factor of fair use is the nature of the copyrighted works. The creators will easily satisfy this factor against fair use because their literary text and nonfiction contain the level of creativity entitled to protection in fair use analysis as discussed in Harper & Row v. Nation Enterprises.442 Likewise, the third factor, the amount and substantiality of the portion used in relation to the copyrighted work as a whole, favors the creators because the AI defendants steal the copyrighted works in their entirety. The last factor, the effect of the use on the potential market for or value of the copyrighted work, also weighs against a finding of fair use. Indeed, as we saw in the discussion of the last factor by the court in American Geophysical Union v. Texaco, Inc., the existence of a licensing model to establish royalties for creators makes stealing unfair. Also, AI defendants themselves openly acknowledged the licensing model for non-literary content when they paid Reddit millions for a license to use the platform's user content.
Stealing is never fair to victims or to society. Stealing is never fair when Big AI and Big Tech, with enormous resources, are teaming up to extract the heart and soul of the creative class. The fair use defense is not available to shield AI defendants from the input claim. AI defendants must pay under a licensing model.
VII. Fair Licensing
The Framers enshrined the Intellectual Property Clause in the Constitution to "promote the Progress of Science and useful Arts" by securing "for limited Times" to authors "the exclusive Right" to their "Writings."443 At the time of the Framing, "promote the Progress of Science" referred to "the creation and spread of knowledge and learning" from works of authorship.444 The young nation needed authors and new works of authorship as expressed and embodied in literature, music, theatre, journalism, fine arts, and architecture for development and growth.445 The works of authorship, they believed, were key contributors to humanity and to defining a new nation, and must be cultivated and incentivized through uniform federal protection.446 Without legal protection, others could freely copy authors' works, reducing the incentive to create works of authorship in the young nation and thereafter.447 Executing this constitutional mandate without delay, the new Congress passed the first federal copyright law during its first session in 1790.448
Copyright law embraces human creativity, and the level of creativity necessary for a work of authorship to qualify for protection is minimal. Because only a modicum of creativity is required, new works of authorship have been constantly in the stream of creation from the Founding to the present. As long as the works possessing a modicum of creativity are independently created by the authors and fixed in a tangible medium of expression, the works enjoy the exclusive rights enshrined in the Constitution. An email, a tweet,449 a poem, a song, a drawing, a painting, an essay, an investigative report, a photograph, a blueprint, a chevron design on a cheerleader uniform,450 a musical score, a dance, a recording, a video, a game, a homemade movie, a pamphlet, and a compilation of data,451 regardless of their aesthetic appeal, whether high or low culture, captivating or distasteful, are all entitled to federal protection. Human creativity, in all forms, drives the nation's soul, as the Constitution's reference to the "Progress of Science" recognizes.
AI companies and their Big Tech backers, as seen in the class action complaints, brazenly take works of authorship, from literary and nonfiction texts to photographs, images, and musical works, to develop their datasets. They steal to build their models. They disregard the constitutionally mandated copyright protection for human creativity as expressed in eight categories of works of authorship.452 Their AI foundation models generate output that infringes copyrighted works and populates online content with a synthetic commons in which fakes are quickly replacing human-created text, images, and sounds. The theft of human creativity should be addressed by compensating the authors for their losses. Before we discuss "how," a detour into the copyright law of damages is illuminating.
Federal protection of works of authorship encompasses a robust enforcement system against infringers. Indeed, copyright law accords the copyright owner the option to elect statutory damages in an amount of no less than $750 and no more than $30,000 per work infringed.453 If the infringer committed the act(s) willfully, the statutory damages per work can increase up to $150,000.454 Statutory damages are available only in copyright, not in other types of intellectual property infringement, because Congress recognized that actual damages and profits are difficult for copyright owners to establish, and desired to compensate copyright owners for the losses to their creativity as expressed in works of authorship. Statutory damages also serve to deter copyright infringement, which copyists can easily accomplish. To obtain statutory damages, copyright owners must register their copyrights within three months of publication or before the infringement begins.
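The per-work ranges described above lend themselves to simple arithmetic. The following Python sketch is illustrative only: the function name `statutory_damages_range` is hypothetical, the dollar figures are those stated in the text, and the calculation assumes every work in the set is eligible for statutory damages and ignores the court's discretion within the range.

```python
# Illustrative sketch only; the function name is a hypothetical helper.
# Dollar figures are the per-work amounts stated in the text.
MIN_PER_WORK = 750            # statutory minimum per infringed work
MAX_PER_WORK = 30_000         # statutory maximum per infringed work
MAX_PER_WORK_WILLFUL = 150_000  # ceiling per work for willful infringement


def statutory_damages_range(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the (minimum, maximum) aggregate statutory damages in dollars.

    Assumes every work is eligible for statutory damages and that the
    same ceiling applies to each work.
    """
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    ceiling = MAX_PER_WORK_WILLFUL if willful else MAX_PER_WORK
    return num_works * MIN_PER_WORK, num_works * ceiling


if __name__ == "__main__":
    low, high = statutory_damages_range(10)
    print(f"10 works: ${low:,} to ${high:,}")
    low, high = statutory_damages_range(10, willful=True)
    print(f"10 works, willful: ${low:,} to ${high:,}")
```

Even for a small set of works the range is wide, which is why the per-work figures can serve only as outer guideposts rather than as a price.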
Returning to "the how," we suggest fair licensing to compensate the creative working class, the authors whose copyrighted works AI companies and their Big Tech backers have taken with neither permission nor compensation. What is licensing? What is fair?
Licensing is an old business model in the intellectual property field. Bill Gates sold licenses, that is, the right to use Microsoft's software, which is protected under copyright law, to end users who are individuals and business entities.455 In demanding that people pay for his copyrighted software instead of using it without permission, Gates asked, "Who can afford to do professional work for nothing?" and sparked the licensing of software.456 Microsoft dictates the prices of different licenses, and the individuals and business entities pay. The prices are fair market value because Microsoft, the seller, and the individuals and business entities, the buyers, are unrelated, and they agree to engage in the transactions without pressure.457 In other words, Microsoft and other Big Tech companies know the licensing of copyrights well, and their lawyers understand their clients' licensing-centered business.
Moreover, the music industry is known for its savviness in licensing copyrights; its experience can shape licensing deals. Further, under the compulsory licenses for music, anyone can use others' songs for programming and for digital and physical sound recording releases without permission from the composition rightsholders, but must pay the statutory mechanical rate for each copy sold of the song covered. The rate is 9.1 cents per song or 1.75 cents per minute of playing time, whichever is greater.458 Spotify, Apple Music, and other digital streaming services pay for blanket licenses covering mechanical royalties, and parties on both sides of blanket licenses can furnish the licensing terms as samples in AI copyright class action cases.459
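The "whichever is greater" rule can be illustrated with a short calculation. This Python sketch is offered only as an illustration: the function names are hypothetical, the two rates are those stated in the text, and the playing time is assumed to be supplied already expressed in minutes (how partial minutes are counted is left to the caller).

```python
# Illustrative sketch only; function names are hypothetical helpers.
# Rates are the figures stated in the text, expressed in cents.
from decimal import Decimal

PER_SONG_CENTS = Decimal("9.1")     # flat rate per song
PER_MINUTE_CENTS = Decimal("1.75")  # rate per minute of playing time


def per_copy_rate_cents(minutes) -> Decimal:
    """Return the per-copy mechanical rate in cents: the greater of the
    flat per-song rate and the per-minute rate times playing time."""
    by_length = PER_MINUTE_CENTS * Decimal(str(minutes))
    return max(PER_SONG_CENTS, by_length)


def total_royalty_dollars(minutes, copies: int) -> Decimal:
    """Return the total mechanical royalty in dollars for `copies` copies."""
    return per_copy_rate_cents(minutes) * copies / Decimal(100)


if __name__ == "__main__":
    for length in (3, 6):
        rate = per_copy_rate_cents(length)
        total = total_royalty_dollars(length, 1000)
        print(f"{length}-minute song: {rate} cents per copy, ${total} per 1,000 copies")
```

On these figures the per-minute rate overtakes the flat rate only for songs longer than 5.2 minutes (9.1 divided by 1.75), so most songs are paid at the flat per-song rate.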
Likewise, the licensing deals that AI companies have secured with various social media platforms, media entities, and publishing companies indicate the current prices of different types of content for AI companies to train their models.460 These deals demonstrate that AI companies know how to value and pay for the content types that they need. Fair market value is what the parties agree to in deals where both have reasonable knowledge of the copyrighted works, the use of the copyrighted works, and their value, and where both act under no undue pressure.
Regarding representation in negotiating fair licensing terms for the creative working class, the plaintiffs in the AI class action cases are represented by sophisticated law firms. These firms are capable of negotiating with AI companies and their Big Tech backers for a fair market value of different licenses of copyrights. Using the statutory damages provisions as guideposts, $750 to $150,000 per work of authorship, the parties through their lawyers can reach sensible licensing terms that reflect the widely recognized concept of fair market value.
Moreover, for the licensing to be fair to the creative class, who typically possess neither resources nor sophistication for negotiation, we suggest three key terms for consideration. First, the scope of the license grant must satisfy four components: (i) the royalty setting must be fairly commensurate with the individual creator's works, (ii) the license grant must preserve the author's control over the use of their copyrights in subsequent technologies, unknown at the time of the license but later developed, (iii) the authors are entitled to receive notice of any proposed changes to the scope of the license grant, and (iv) licensees' unilateral expansion of the scope of the license grant constitutes intentional copyright infringement. These components will help tilt the balance from AI defendants/licensees to the creators.
Second, the creators must also retain control over their copyrights in addition to the control relating to the scope of the license grant. Here, the authorship of the creators' works must be acknowledged. This requires appropriate attribution to the original human creators who created the works before machines.
Lastly, creators' autonomy must be respected. That means creators have the right to opt in to or opt out of the licensing scheme at any time during its term.
Licensing, though not a perfect model, is the only viable option for righting the wrong: the stealing by AI companies of works of authorship produced by the creative class. The massive stealing must end. With the three key terms outlined above, we believe that the lawyers for both sides can negotiate towards an agreement to conclude the AI input claims, the first of two steps in the preservation of humanity.
Conclusion
Upholding copyright law in the AI age will not stymie innovation; this is not about fiction authors barring technology from accurately diagnosing cancer. Nor is upholding copyright law against AI companies administratively impossible. In a world where AI can instantaneously accomplish so many tasks, it can also accomplish the task central to the constitutional concept of copyright: preserving and respecting the small, individual, and fundamentally human contributions of creators to national and international culture. The pending AI suits allow these individual creators to aggregate to take on AI defendants, the ultimate technological aggregators. We believe that courts can, and should, use long-existing copyright law to recognize and vindicate the copyrights of the creative working class.
****
VIII. Appendix A
Conducted on October 1, 2024, 12:45 p.m.
Sent by you: generate multiple choice questions for sorting Hogwarts houses
Sent by Copilot: Sure! Here are some multiple-choice questions to help sort someone into a Hogwarts house:
- Which quality do you value most?
- A) Bravery
- B) Loyalty
- C) Intelligence
- D) Ambition
- What would you do if you found a lost wallet?
- A) Try to find the owner yourself
- B) Hand it over to the authorities
- C) Look for clues to find the owner
- D) Keep it for yourself
- Which magical creature do you find most fascinating?
- A) Phoenix
- B) Hippogriff
- C) Thestral
- D) Basilisk
- How do you prefer to spend your free time?
- A) Going on adventures
- B) Helping friends
- C) Reading books
- D) Planning your future
- What is your greatest fear?
- A) Failure
- B) Betrayal
- C) Ignorance
- D) Weakness
- Which of these subjects would you excel in?
- A) Defense Against the Dark Arts
- B) Herbology
- C) Charms
- D) Potions
- How do you handle conflict?
- A) Face it head-on
- B) Seek a peaceful resolution
- C) Analyze the situation
- D) Use it to your advantage
- What is your ideal pet?
- A) Lion
- B) Badger
- C) Eagle
- D) Snake
- Which of these quotes resonates with you the most?
- A) "Courage is not the absence of fear."
- B) "Loyalty is the strongest bond."
- C) "Knowledge is power."
- D) "Ambition drives success."
- What is your approach to teamwork?
- A) Lead the team
- B) Support the team
- C) Strategize for the team
- D) Ensure the team wins
These questions can help identify traits associated with Gryffindor, Hufflepuff, Ravenclaw, and Slytherin. Have fun sorting!
****
Sent by you: generate crypto-related questions sorting Hogwarts houses
Sent by Copilot: Sure! Here are some crypto-related multiple-choice questions to help sort someone into a Hogwarts house:
- Which aspect of cryptocurrency excites you the most?
- A) The potential for revolutionary change
- B) The sense of community and collaboration
- C) The innovative technology behind it
- D) The opportunities for financial gain
- How would you handle a sudden drop in the value of your crypto investments?
- A) Stay calm and hold on
- B) Discuss strategies with fellow investors
- C) Analyze the market trends and data
- D) Look for ways to capitalize on the situation
- What type of blockchain project would you most likely support?
- A) One that aims to disrupt traditional systems
- B) One that focuses on social good and inclusivity
- C) One that introduces groundbreaking technology
- D) One that promises high returns on investment
- How do you prefer to learn about new cryptocurrencies?
- A) Through hands-on experience and experimentation
- B) By joining online forums and communities
- C) By reading whitepapers and technical documents
- D) By following market trends and expert advice
- What is your approach to securing your crypto assets?
- A) Use a combination of methods to ensure safety
- B) Share tips and best practices with others
- C) Stay updated with the latest security technologies
- D) Invest in the most secure and profitable options
- Which of these crypto-related activities appeals to you the most?
- A) Mining or staking
- B) Participating in airdrops and giveaways
- C) Developing or coding blockchain solutions
- D) Trading and investing
- How do you view the future of cryptocurrency?
- A) As a tool for global change and empowerment
- B) As a way to build stronger communities
- C) As a field ripe for innovation and discovery
- D) As a lucrative investment opportunity
- What would you do if you discovered a new, promising cryptocurrency?
- A) Invest immediately and take the risk
- B) Share the information with your network
- C) Research thoroughly before making a decision
- D) Evaluate its potential for high returns
- Which of these statements resonates with you the most?
- A) "Fortune favors the bold."
- B) "Together, we are stronger."
- C) "Knowledge is the key to success."
- D) "Ambition knows no bounds."
- What is your strategy for staying informed about the crypto market?
- A) Follow the latest news and trends
- B) Engage with the crypto community
- C) Study market analysis and reports
- D) Monitor investment opportunities closely
These questions can help identify traits associated with Gryffindor, Hufflepuff, Ravenclaw, and Slytherin in the context of cryptocurrency. Enjoy sorting!
****
Sent by you: Generate law school-related questions sorting Hogwarts houses
Sent by Copilot: Sure! Here are some law school-related multiple-choice questions to help sort someone into a Hogwarts house:
- Which area of law interests you the most?
- A) Criminal Law
- B) Family Law
- C) Intellectual Property Law
- D) Corporate Law
- How do you handle a challenging case?
- A) Tackle it head-on with determination
- B) Collaborate with colleagues for the best outcome
- C) Research extensively to find the best solution
- D) Strategize to turn the case in your favor
- What motivates you to become a lawyer?
- A) Fighting for justice
- B) Helping those in need
- C) The intellectual challenge
- D) Achieving success and recognition
- How do you prepare for a big exam?
- A) Study diligently and practice under pressure
- B) Form study groups and support each other
- C) Create detailed outlines and review notes
- D) Focus on key areas to maximize your score
- What is your approach to legal ethics?
- A) Uphold justice at all costs
- B) Ensure fairness and compassion
- C) Follow the rules and guidelines meticulously
- D) Navigate the system to benefit your client
- Which legal figure do you admire the most?
- A) Ruth Bader Ginsburg
- B) Thurgood Marshall
- C) Oliver Wendell Holmes Jr.
- D) Clarence Darrow
- How do you handle courtroom pressure?
- A) Stay calm and assertive
- B) Rely on teamwork and support
- C) Use logic and evidence to persuade
- D) Use your wit and charm to influence
- What is your ideal legal career path?
- A) Public defender or prosecutor
- B) Legal aid or non-profit work
- C) Academic or legal scholar
- D) Corporate lawyer or partner at a firm
- Which of these quotes resonates with you the most?
- A) "Justice delayed is justice denied."
- B) "Injustice anywhere is a threat to justice everywhere."
- C) "The law is reason, free from passion."
- D) "The ends justify the means."
- How do you approach legal writing?
- A) Write with passion and conviction
- B) Ensure clarity and empathy
- C) Focus on precision and detail
- D) Make persuasive and compelling arguments
These questions can help identify traits associated with Gryffindor, Hufflepuff, Ravenclaw, and Slytherin in the context of law school. Good luck with your sorting!
