Dailyza | Tech, Investments, Business & World News
Adobe logo on a screen as authors file a class-action lawsuit alleging SlimLM AI was trained on pirated books via the SlimPajama and Books3 datasets

Adobe faces new class-action over AI training on pirated books

19 December 2025 Technology

Adobe is facing a proposed class-action lawsuit that accuses the software giant of using pirated books to train one of its AI models, escalating a widening legal battle over how generative AI systems are built and what data they can lawfully learn from.

The complaint was filed on behalf of Elizabeth Lyon, an Oregon-based author known for nonfiction writing guidebooks. The lawsuit alleges that Adobe’s SlimLM language model was trained, at least in part, on copyrighted works that were copied without permission, potentially affecting a broad class of authors whose books were included in the same training sources.

What the lawsuit alleges about Adobe’s SlimLM training data

According to the filing, the core allegation is that Adobe used pirated versions of numerous books—including Lyon’s works—to train SlimLM. Adobe has described SlimLM as a small language model series designed to be “optimized for document assistance tasks on mobile devices,” a category that typically includes summarization, rewriting, drafting, and other text-centric features.

The lawsuit focuses on the provenance of the data used to pre-train SlimLM. Adobe has stated that SlimLM was pre-trained on SlimPajama-627B, which it characterizes as a “deduplicated, multi-corpora, open-source dataset” released by Cerebras in June 2023. The plaintiff argues that, despite being described as open-source, the dataset contains material derived from pirated copies of copyrighted books.

The chain: SlimPajama, RedPajama, and Books3

The filing argues that SlimPajama was created by copying and manipulating the RedPajama dataset, and that RedPajama, in turn, incorporated Books3, a large corpus of approximately 191,000 books that has repeatedly surfaced in disputes over AI training data. The complaint claims that because SlimPajama is a derivative dataset, it “contains the Books3 dataset,” including copyrighted works belonging to Lyon and other authors.
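The chain described in the filing can be pictured as a small lineage graph. The sketch below is purely illustrative: the `DERIVED_FROM` map and the `upstream_path` helper are hypothetical names, and the edges encode only what the complaint alleges, not a verified record of how any of these datasets were built.

```python
from collections import deque

# Hypothetical lineage map: each artifact -> the sources it was derived from.
# The edges reflect the allegations in the complaint, not verified facts.
DERIVED_FROM = {
    "SlimLM": ["SlimPajama-627B"],
    "SlimPajama-627B": ["RedPajama"],
    "RedPajama": ["Books3"],
    "Books3": [],
}


def upstream_path(start, flagged, lineage):
    """Breadth-first search from `start` toward its upstream sources.

    Returns the list of artifacts linking `start` to `flagged`,
    or None if `flagged` is not an upstream source of `start`.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == flagged:
            return path
        for parent in lineage.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


if __name__ == "__main__":
    path = upstream_path("SlimLM", "Books3", DERIVED_FROM)
    print(" <- ".join(path) if path else "no upstream link found")
```

A provenance audit of the kind the lawsuit implies would ask the same question of a real dataset manifest: does any upstream path from a model lead to a contested source?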

At the heart of the dispute is a question that has become central to the generative AI era: whether training on copyrighted text without consent, attribution, or compensation is permissible, and whether “open” datasets can still embed unlawful copies when their upstream sources are contested.

Why Books3 keeps showing up in AI copyright fights

Books3 has become a flashpoint because it is widely discussed as a training source for multiple generative AI systems, yet it is also frequently described by rightsholders as a repository of pirated books. As AI developers race to build more capable models, large-scale text datasets have been assembled from many sources—some licensed, some public domain, and some alleged to be scraped or copied without authorization.

The Lyon complaint underscores a growing tension: even when a company points to an intermediary dataset that is labeled “open-source,” authors argue that the presence of copyrighted works in upstream components can still create liability for downstream users—especially if the copyrighted works were unlawfully obtained in the first place.

Adobe joins a crowded field of AI training lawsuits

The proposed class-action arrives amid a broader wave of litigation targeting how AI models are trained. In recent months, lawsuits have increasingly cited shared datasets and common pipelines that are used across the industry, alleging that multiple companies benefited from the same questionable sources.

RedPajama, specifically, has been referenced in other high-profile cases. A September lawsuit against Apple alleged that the company used copyrighted material in training for its Apple Intelligence efforts, claiming protected works were copied “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce also alleged use of RedPajama for training purposes. The Adobe complaint follows this pattern by focusing less on a single file or a single book and more on the training supply chain that can propagate disputed content across models.

Settlements raise the stakes for the industry

Legal pressure has also produced expensive outcomes. In September, Anthropic agreed to pay $1.5 billion to authors who accused the company of using pirated versions of their works to train its chatbot, Claude. That resolution was widely viewed as a marker of how costly these disputes can become—and as a signal that courts and plaintiffs may increasingly test the boundaries of copyright law as applied to machine learning.

What this could mean for Adobe’s AI strategy

Adobe has been one of the most prominent creative software companies to embrace generative AI since 2023, rolling out multiple AI features and services, including its Firefly media-generation suite. While the present lawsuit centers on SlimLM and book datasets rather than image generation, it touches a similar reputational nerve: creative professionals and rightsholders want clarity on what data is used, who gets credited, and whether creators are compensated.

If the case proceeds, it may force deeper scrutiny into Adobe’s documentation around model training, dataset selection, and internal governance—particularly how the company evaluates third-party datasets that are described as open-source or publicly available. It could also intensify calls for standardized data audits and provenance tracking, especially as more AI products move onto mobile devices and into everyday workflows.

Key questions likely to shape the case

While the lawsuit’s merits will be argued in court, the allegations raise several questions that have become central across the AI sector:

  • Whether using copyrighted books for model training constitutes infringement or can be defended under doctrines like fair use (depending on jurisdiction and specific facts).
  • How liability should be assigned when a model is trained on a dataset that is itself derived from other datasets.
  • What “open-source dataset” should mean when upstream components may include disputed or unlawful material.
  • Whether authors are entitled to compensation, attribution, opt-out mechanisms, or other remedies when their works are used in training.

For now, the proposed class-action adds another major name to the list of companies being challenged over AI training data practices, and it reinforces a message rippling through the tech industry: dataset provenance is no longer a back-office detail—it is becoming a defining legal and public-trust issue.

Aden Erickson
