Adobe faces new class-action over AI training on pirated books

Adobe is facing a proposed class-action lawsuit that accuses the software giant of using pirated books to train one of its AI models, escalating a widening legal battle over how generative AI systems are built and what data they can lawfully learn from.

The complaint was filed on behalf of Elizabeth Lyon, an Oregon-based author known for guidebooks on nonfiction writing. The lawsuit alleges that Adobe’s SlimLM language model was trained, at least in part, on copyrighted works that were copied without permission, potentially affecting a broad class of authors whose books were included in the same training sources.

What the lawsuit alleges about Adobe’s SlimLM training data

According to the filing, the core allegation is that Adobe used pirated versions of numerous books—including Lyon’s works—to train SlimLM. Adobe has described SlimLM as a small language model series designed to be “optimized for document assistance tasks on mobile devices,” a category that typically includes summarization, rewriting, drafting, and other text-centric features.

The lawsuit focuses on the provenance of the data used to pre-train SlimLM. Adobe has stated that SlimLM was pre-trained on SlimPajama-627B, which it characterizes as a “deduplicated, multi-corpora, open-source dataset” released by Cerebras in June 2023. The plaintiff argues that, although the dataset is described as open-source, it contains material derived from sources that included pirated copyrighted books.

The chain: SlimPajama, RedPajama, and Books3

The filing argues that SlimPajama was created by copying and manipulating the RedPajama dataset, and that RedPajama, in turn, incorporated Books3, a large corpus of approximately 191,000 books that has repeatedly surfaced in disputes over AI training data. The complaint claims that because SlimPajama is a derivative dataset, it “contains the Books3 dataset,” including copyrighted works belonging to Lyon and other authors.

At the heart of the dispute is a question that has become central to the generative AI era: whether training on copyrighted text without consent, attribution, or compensation is permissible, and whether “open” datasets can still embed unlawful copies when their upstream sources are contested.

Why Books3 keeps showing up in AI copyright fights

Books3 has become a flashpoint because it is widely discussed as a training source for multiple generative AI systems, yet it is also frequently described by rightsholders as a repository of pirated books. As AI developers race to build more capable models, large-scale text datasets have been assembled from many sources—some licensed, some public domain, and some alleged to be scraped or copied without authorization.

The Lyon complaint underscores a growing tension: even when a company points to an intermediary dataset that is labeled “open-source,” authors argue that the presence of copyrighted works in upstream components can still create liability for downstream users—especially if the copyrighted works were unlawfully obtained in the first place.

Adobe joins a crowded field of AI training lawsuits

The proposed class-action arrives amid a broader wave of litigation targeting how AI models are trained. In recent months, lawsuits have increasingly cited shared datasets and common pipelines that are used across the industry, alleging that multiple companies benefited from the same questionable sources.

RedPajama, specifically, has been referenced in other high-profile cases. A September lawsuit against Apple alleged that the company used copyrighted material in training for its Apple Intelligence efforts, claiming protected works were copied “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce also alleged use of RedPajama for training purposes. The Adobe complaint follows this pattern by focusing less on a single file or a single book and more on the training supply chain that can propagate disputed content across models.

Settlements raise the stakes for the industry

Legal pressure has also produced expensive outcomes. In September, Anthropic agreed to pay $1.5 billion to authors who accused the company of using pirated versions of their works to train its chatbot, Claude. That resolution was widely viewed as a marker of how costly these disputes can become—and as a signal that courts and plaintiffs may increasingly test the boundaries of copyright law as applied to machine learning.

What this could mean for Adobe’s AI strategy

Adobe has been one of the most prominent creative software companies to embrace generative AI since 2023, rolling out multiple AI features and services, including its Firefly media-generation suite. While the present lawsuit centers on SlimLM and book datasets rather than image generation, it touches a similar reputational nerve: creative professionals and rightsholders want clarity on what data is used, who gets credited, and whether creators are compensated.

If the case proceeds, it may force deeper scrutiny into Adobe’s documentation around model training, dataset selection, and internal governance—particularly how the company evaluates third-party datasets that are described as open-source or publicly available. It could also intensify calls for standardized data audits and provenance tracking, especially as more AI products move onto mobile devices and into everyday workflows.

Key questions likely to shape the case

While the lawsuit’s merits will be argued in court, the allegations raise several questions that have become central across the AI sector:

Whether using copyrighted books for model training constitutes infringement or can be defended under doctrines like fair use (depending on jurisdiction and specific facts).
How liability should be assigned when a model is trained on a dataset that is itself derived from other datasets.
What “open-source dataset” should mean when upstream components may include disputed or unlawful material.
Whether authors are entitled to compensation, attribution, opt-out mechanisms, or other remedies when their works are used in training.

For now, the proposed class-action adds another major name to the list of companies being challenged over AI training data practices, and it reinforces a message rippling through the tech industry: dataset provenance is no longer a back-office detail—it is becoming a defining legal and public-trust issue.
