Dailyza | Tech, Investments, Business & World News
Adobe faces new class-action over AI training on pirated books

19 December 2025 · Technology

Adobe is facing a proposed class-action lawsuit that accuses the software giant of using pirated books to train one of its AI models, escalating a widening legal battle over how generative AI systems are built and what data they can lawfully learn from.

The complaint was filed on behalf of Elizabeth Lyon, an Oregon-based author known for her guidebooks on nonfiction writing. It alleges that SlimLM, one of Adobe's language models, was trained at least in part on copyrighted works copied without permission, and it seeks to represent a broader class of authors whose books were included in the same training sources.

What the lawsuit alleges about Adobe’s SlimLM training data

According to the filing, the core allegation is that Adobe used pirated versions of numerous books—including Lyon’s works—to train SlimLM. Adobe has described SlimLM as a small language model series designed to be “optimized for document assistance tasks on mobile devices,” a category that typically includes summarization, rewriting, drafting, and other text-centric features.

The lawsuit focuses on the provenance of the data used to pre-train SlimLM. Adobe has stated that SlimLM was pre-trained on SlimPajama-627B, which it characterizes as a "deduplicated, multi-corpora, open-source dataset" released by Cerebras in June 2023. The plaintiff argues that, despite its open-source label, the dataset contains material derived from sources that included pirated copyrighted books.

The chain: SlimPajama, RedPajama, and Books3

The filing argues that SlimPajama was created by copying and manipulating the RedPajama dataset, and that RedPajama, in turn, incorporated Books3, a corpus of approximately 191,000 books that has repeatedly surfaced in disputes over AI training data. Because SlimPajama is derived from RedPajama, the complaint claims, it “contains the Books3 dataset,” including copyrighted works belonging to Lyon and other authors.

At the heart of the dispute are two questions that have become central to the generative AI era: whether training on copyrighted text without consent, attribution, or compensation is permissible, and whether “open” datasets can still embed unlawful copies when their upstream sources are contested.

Why Books3 keeps showing up in AI copyright fights

Books3 has become a flashpoint because it is widely reported to have been used to train multiple generative AI systems, yet rightsholders frequently describe it as a repository of pirated books. As AI developers race to build more capable models, large-scale text datasets have been assembled from many sources: some licensed, some in the public domain, and some alleged to have been scraped or copied without authorization.

The Lyon complaint underscores a growing tension: even when a company points to an intermediary dataset that is labeled “open-source,” authors argue that the presence of copyrighted works in upstream components can still create liability for downstream users—especially if the copyrighted works were unlawfully obtained in the first place.

Adobe joins a crowded field of AI training lawsuits

The proposed class-action arrives amid a broader wave of litigation targeting how AI models are trained. In recent months, lawsuits have increasingly cited shared datasets and common pipelines that are used across the industry, alleging that multiple companies benefited from the same questionable sources.

RedPajama, specifically, has been cited in other high-profile cases. A September lawsuit against Apple alleged that the company used copyrighted material to train the models behind Apple Intelligence, claiming protected works were copied “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce also alleged that the company used RedPajama for training. The Adobe complaint follows this pattern, focusing less on a single file or a single book and more on the training supply chain that can propagate disputed content across models.

Settlements raise the stakes for the industry

Legal pressure has also produced expensive outcomes. In September, Anthropic agreed to pay $1.5 billion to authors who accused the company of using pirated versions of their works to train its chatbot, Claude. That resolution was widely viewed as a marker of how costly these disputes can become—and as a signal that courts and plaintiffs may increasingly test the boundaries of copyright law as applied to machine learning.

What this could mean for Adobe’s AI strategy

Adobe has been one of the most prominent creative software companies to embrace generative AI since 2023, rolling out multiple AI features and services, including its Firefly media-generation suite. While the present lawsuit centers on SlimLM and book datasets rather than image generation, it touches a similar reputational nerve: creative professionals and rightsholders want clarity on what data is used, who gets credited, and whether creators are compensated.

If the case proceeds, it may force deeper scrutiny into Adobe’s documentation around model training, dataset selection, and internal governance—particularly how the company evaluates third-party datasets that are described as open-source or publicly available. It could also intensify calls for standardized data audits and provenance tracking, especially as more AI products move onto mobile devices and into everyday workflows.

Key questions likely to shape the case

While the lawsuit’s merits will be argued in court, the allegations raise several questions that have become central across the AI sector:

  • Whether using copyrighted books for model training constitutes infringement or can be defended under doctrines like fair use (depending on jurisdiction and specific facts).
  • How liability should be assigned when a model is trained on a dataset that is itself derived from other datasets.
  • What “open-source dataset” should mean when upstream components may include disputed or unlawful material.
  • Whether authors are entitled to compensation, attribution, opt-out mechanisms, or other remedies when their works are used in training.

For now, the proposed class-action adds another major name to the list of companies being challenged over AI training data practices, and it reinforces a message rippling through the tech industry: dataset provenance is no longer a back-office detail—it is becoming a defining legal and public-trust issue.

Aden Erickson
