Murdoch vs. Machine: News Corp Takes on Bezos-Backed Perplexity AI in Landmark Copyright Battle


AI companies are revolutionizing how news content is processed and distributed, creating unprecedented challenges for traditional media organizations. The battle between News Corp and Perplexity AI represents a critical moment that will shape the future relationship between artificial intelligence and journalism, potentially setting new precedents for digital content rights.

October 22, 2024

Rupert Murdoch’s news empire is taking on Jeff Bezos-backed artificial intelligence company Perplexity in a blockbuster lawsuit that could have major implications for the future of AI and intellectual property rights. Dow Jones & Company, publisher of The Wall Street Journal, and NYP Holdings, owner of the New York Post, have accused Perplexity of “massive” copyright infringement, “illegal” misappropriation of content, and peddling fake news “hallucinations” in its AI-powered search engine.

This guide breaks down everything you need to know about this high-stakes legal battle, from the key allegations and players to the potential impacts on the news industry and AI regulation. Learn how the case could set major precedents at the intersection of copyright law, trademark rights, and cutting-edge AI technologies.

Understand the ins and outs of the Copyright Act and Lanham Act claims, the fair use defense, and the evolving legal landscape around training AI on copyrighted works. See how this lawsuit fits into the larger debate over whether AI is a threat or a boon to content creators and publishers.

1. The Parties & Background

    • Dow Jones & NYP Holdings: Subsidiaries of Rupert Murdoch’s News Corp, they publish leading newspapers The Wall Street Journal and New York Post.
    • Perplexity: An AI startup backed by Jeff Bezos and founded in 2022 that uses generative AI and large language models to power a search engine.
    • News Content: Plaintiffs employ journalists to create original, copyrighted articles. This content fuels their subscription and ad revenue.
    • AI Training: Perplexity allegedly scraped plaintiffs’ articles without permission to train its AI models and build its search index.
    • “Skip the Links”: Perplexity touts that its AI provides full answers based on scraped sources, so users can bypass clicking through to publishers’ sites.

Key Context

    • Many AI companies rely on “web scraping” to ingest online data, like news articles, to train their large language models and knowledge bases.
    • Publishers argue this mass copying violates their copyrights and diverts readers and revenue from their sites by providing article summaries and excerpts.
    • But some argue this AI training is “fair use” – a legal doctrine allowing limited use of copyrighted material without permission for transformative purposes.
    • The rise of generative AI and chatbots that can produce human-like text has accelerated debates over what constitutes copyright infringement by AI.
    • This case could help establish key legal parameters and licensing models as publishers seek to protect content while tech giants aim to train AI on it.

Stakes & Implications

    • If successful, the lawsuit could force AI companies to pay to license news content for training purposes and limit article scraping and summarization.
    • It may establish that verbatim copying of articles to train AI is not fair use, even if the final outputs are different from the original articles.
    • The case highlights the tension between publishers fighting to protect their business models and AI firms seeking to index the web to build services.
    • Striking the right balance is crucial – if publishers can’t monetize content, it could reduce quality journalism; but overly restricting AI could hamper innovation.
    • As generative AI advances, the lawsuit may spur new legal frameworks and norms around licensing content to train AI while protecting IP rights.

FAQs

    • Is all web scraping for AI training illegal? Not necessarily – it depends on factors like how much was copied, for what purpose, and effects on the market for the original.
    • Have there been similar lawsuits against AI companies? Yes – for example, The New York Times sued OpenAI and Microsoft in December 2023, and other publishers and authors have brought suits over the use of their content to train AI models.
    • What’s the fair use argument for AI training? That ingesting articles to build an AI index that provides different outputs is a “transformative” use allowed under copyright law.
    • Will this shut down Perplexity’s search engine? Not necessarily, but if Perplexity loses, it may need to remove infringing content, limit article use, and potentially pay damages.
    • Could this stifle AI innovation? Some worry aggressive copyright enforcement against AI will chill development of new technologies and services.

2. The Copyright Infringement Claims

    • Unauthorized Copying: Perplexity allegedly copied plaintiffs’ articles without permission to train its AI and build its search index.
    • Two Forms of Infringement: The complaint alleges Perplexity infringed by 1) copying articles as “inputs” to train AI, and 2) generating infringing “outputs” in search results.
    • Unfair Substitution: Perplexity’s search results with article excerpts, summaries and paraphrases aim to substitute for and replace original articles.
    • Lost Revenue: By discouraging clicks to publishers’ sites, Perplexity allegedly diverts readers and ad/subscription revenues from plaintiffs.
    • Not Fair Use: Plaintiffs argue Perplexity’s copying is not protected “fair use” as it’s commercial, not transformative, copies full articles, and harms their markets.

How Copyright Applies to AI

    • Copyright gives authors exclusive rights to copy and make “derivative works” (adaptations) of their original content.
    • Plaintiffs argue Perplexity violates these rights by copying full articles to build its training database and index without a license.
    • They also contend Perplexity search outputs infringe by reproducing excerpts of their articles and creating derivative summaries/paraphrases.
    • Perplexity will likely argue its use is “fair use” – allowed if purpose is “transformative” and doesn’t substitute for original articles in the market.
    • But plaintiffs say Perplexity encourages users to skip their sites, so the AI results do replace the articles and divert readers and revenue.

Training vs. Outputs

    • An important issue is whether ingesting articles to train an AI model is a copyright violation itself, even if the AI’s outputs differ from the original content.
    • Plaintiffs argue Perplexity’s initial copying to its database is infringement regardless of what the final AI-generated text contains.
    • But some courts have found that intermediate copying to build searchable databases is fair use if the final product shown to users is transformative.
    • Plaintiffs will stress Perplexity’s search results include verbatim article excerpts, especially for paid subscribers, in addition to AI-generated summaries.
    • So the argument is both the initial copying and the AI-generated previews in search infringe, acting as an illegal substitute for the original articles.

FAQs

    • Is scraping content always copyright infringement? Not necessarily – scraping just facts/ideas is allowed, but copying creative expression raises infringement issues.
    • What’s the test for “fair use”? Courts weigh 1) purpose of use, 2) nature of copyrighted work, 3) amount copied, and 4) effect on market for original work.
    • Have courts ruled on fair use and AI training? Not definitively – there are ongoing cases, but no clear precedent on scraping web data for AI models.
    • Can AI-generated text infringe copyright? Yes, if it reproduces copyrighted content like verbatim excerpts or close paraphrases that substitute for the original.
    • What’s the difference between derivative works and transformative fair uses? Derivative works recast or adapt the original and require the owner’s permission; transformative fair uses serve a new purpose that doesn’t substitute for the original.

3. The Trademark Claims & AI “Hallucinations”

    • Unauthorized Use of Marks: Perplexity allegedly uses plaintiffs’ trademarks like “The Wall Street Journal” and “New York Post” without permission in search results.
    • “Hallucinations”: Plaintiffs argue Perplexity’s AI wrongly attributes false statements and made-up articles to their publications.
    • Reputation Harm: Linking plaintiffs’ respected news brands to fake content allegedly harms their reputation for accuracy and misleads readers.
    • Trademark Violations: The unauthorized use of marks with false info allegedly violates trademark law by creating confusion and diluting the brands.
    • Lanham Act Claims: The complaint alleges false designation of origin and trademark dilution under the federal Lanham Act.

Trademark Rights in News Brands

    • Trademarks protect brand names, logos and slogans used to identify the source of goods/services, like news publications.
    • Trademark owners can stop others from using their marks in a way that confuses consumers about the source or affiliation of content.
    • Plaintiffs argue Perplexity violates their trademark rights by wrongly linking made-up “hallucinated” articles to trusted news sources.
    • By falsely attributing fake news to the WSJ and NY Post, plaintiffs say Perplexity deceives readers, tarnishes their reputation, and waters down their brand value.
    • Trademark dilution law specifically protects “famous” brands from uses that blur or harm their distinctiveness, even without confusion.

AI “Hallucinations” & Fake News

    • AI “hallucinations” are false or nonsensical statements the AI confidently makes, often by mixing up or wrongly combining info from training data.
    • Perplexity’s AI allegedly hallucinates fake quotes, made-up stories, and inaccurate info that it attributes to the WSJ and NY Post.
    • The lawsuit argues this sullies plaintiffs’ reputation for accuracy, makes readers think they published false content, and erodes trust in their reporting.
    • By “passing off” hallucinated content as plaintiffs’ work, Perplexity allegedly violates trademark law’s prohibition on false designation of origin.
    • Plaintiffs will have to show Perplexity’s use of their marks with fake info is likely to confuse consumers or harm the goodwill and reputation of their brands.

FAQs

    • Does all AI-generated fake news violate trademarks? Not necessarily – it depends on whether the fake info is attributed to a specific source in a way that uses that source’s marks to confuse consumers.
    • What’s the difference between trademark infringement and dilution? Infringement requires likelihood of confusion; dilution protects famous marks from blurring or tarnishment even without confusion.
    • Why are AI hallucinations a growing concern? AI models can mix up info in training data and state false claims as facts; as AI use grows, this misinformation can spread rapidly.
    • How else could AI raise trademark issues? AI could create logos/designs that are confusingly similar to existing brands; as AI generates more content, trademark conflicts may rise.
    • Could disclaimers help avoid trademark liability? Possibly, if they clearly inform users that AI outputs are not affiliated with or endorsed by trademark owners. But disclaimers aren’t always effective.

4. Perplexity’s Potential Defenses & Counterarguments

    • Fair Use: Perplexity will likely argue that web scraping to train AI is fair use because the resulting search outputs are transformative and provide a unique service.
    • Different Purpose: Perplexity may say its AI-generated previews serve a different purpose (finding relevant sources) rather than substituting for full articles.
    • Intermediate Copying: Perplexity may stress that scraping articles for its index/training is an intermediate step; the end product is different and often links to sources.
    • Public Benefit: Perplexity may tout that its service makes useful article info more easily accessible to readers, an AI-enabled public benefit.
    • No Substitution: Perplexity may argue its short previews and excerpts don’t substitute for paywalled articles behind login screens, which aren’t publicly accessible.

Key Fair Use Factors

    • Purpose: Perplexity says using article text/data to provide an AI-powered search tool is transformative with public benefits.
    • Nature of Works: But plaintiffs stress that their highly creative, expressive articles deserve strong protection against copying, even for AI.
    • Amount Copied: Plaintiffs emphasize Perplexity copies full articles for its index/training; but Perplexity argues the end outputs are limited excerpts.
    • Market Effects: A key dispute is whether Perplexity’s previews substitute for and divert revenue from the original articles or serve a different purpose.
    • Balancing: The court will have to weigh these factors to assess if Perplexity’s copying ultimately counts as fair use or infringement.

Copyright Preemption of Unfair Competition Claims

    • Copyright law “preempts” (overrides) some state law claims like unfair competition that target unauthorized uses of content.
    • The idea is copyright should provide uniform nationwide protection for expressive works, not a patchwork of state laws.
    • If plaintiffs had also sued for state unfair competition, Perplexity would argue those claims are preempted by the Copyright Act.
    • But plaintiffs wisely focused on federal copyright and trademark claims, which are not subject to preemption.
    • Still, the preemption defense could nullify any state law claims plaintiffs later try to add to challenge Perplexity’s use of their content.

FAQs

    • How will Perplexity try to prove fair use? By arguing its search purpose is transformative, outputs are limited, and use doesn’t substitute for original articles.
    • What’s the key question on market impact? Whether Perplexity’s previews satisfy reader interest so they don’t visit the original articles, diverting traffic and revenue.
    • Why does copyright preemption matter here? It could block plaintiffs from bringing additional state law claims against Perplexity for using their content.
    • How might the fair use ruling impact other AI companies? It could clarify whether scraping web data to train AI is fair use and what limits apply to AI-generated content.
    • What other defenses might Perplexity raise? Arguing some article material is unprotectable facts/ideas; disputing copyright ownership or validity; DMCA safe harbor.

5. Potential Paths Forward & Key Takeaways

    • Settlement & Licensing: Parties could settle with Perplexity agreeing to license news content, limit scraping/excerpts, and give more credit to sources.
    • Proceed to Discovery & Trial: If no settlement, expect a fierce court battle to set precedents on scraping copyrighted content for AI training.
    • Potential Injunction: Plaintiffs will likely seek an injunction to stop Perplexity from scraping and reproducing their content without permission.
    • Congressional Action: The case may spur calls for Congress to update copyright law to address AI’s use of creative content.
    • Industry Shifts: As the litigation unfolds, watch for news/media companies and AI developers to negotiate more content licensing deals.

Importance of the Fair Use Ruling

    • How the court applies copyright fair use to Perplexity’s practices will be closely watched as a key ruling for the AI industry.
    • Other generative AI companies scrape Internet data to train models, so broad precedents on fair use could impact their practices too.
    • The ruling could help clarify where data scraping for AI crosses the line and when AI-generated outputs infringe copyrights.
    • Perplexity will push for a broad, flexible reading of fair use; plaintiffs want a ruling that scraping full articles to train AI is not transformative and harms their markets.
    • The nuanced fair use analysis could establish guidelines for what “robot readers” can lawfully ingest as AI rapidly evolves.

Balancing IP Rights & AI Innovation

    • This case reflects the tricky balance between protecting intellectual property and enabling beneficial AI technologies.
    • News publishers understandably want to control and monetize their content, especially as AI threatens traditional media models.
    • But overly rigid copyright rules could stifle AI tools that make information more accessible and offer new creative possibilities.
    • Policymakers and courts will need to adapt IP law for the AI age in ways that incentivize quality content creation and cutting-edge innovation.
    • Collaborative licensing approaches could help, but high-stakes lawsuits may also be needed to set boundaries and prompt congressional action.

Key Takeaways

    • Uncharted Territory: This case is a key early test of how courts will apply copyright and trademark law to generative AI and web scraping to train AI models.
    • Fair Use Showdown: Expect a fierce battle over whether Perplexity’s scraping and AI-generated previews of news articles are copyright fair use or infringement.
    • Reputational Risk: The trademark claims highlight how generative AI can create “hallucinated” content that is falsely attributed to brands, misleading consumers.
    • Innovation Policy: The rulings in this case could shape practices of other AI companies that rely on Internet data and how strictly copyright restricts AI.
    • New Legal Frameworks: As more suits challenge AI’s use of creative content, watch for potential congressional action to update IP law and for new licensing models to emerge.

Summary


The lawsuit by WSJ and NY Post parent companies against Perplexity AI could establish major precedents on copyright, trademark, and fair use boundaries as applied to artificial intelligence systems that are trained on creative content.

The high-stakes copyright and trademark infringement lawsuit by Dow Jones and NYP Holdings against AI startup Perplexity will be closely watched for its potential to set landmark precedents on critical issues regarding AI’s use of creative content.

With allegations of massive web scraping of news articles to train Perplexity’s AI models and generate search previews, the case tees up a major battle over fair use rights in the age of artificial intelligence. The trademark claims involving AI “hallucinations” of false news content attributed to the media plaintiffs also highlight the reputational risks of generative AI for brands.

As one of the most closely watched copyright lawsuits against a generative AI company proceeds, the rulings could have a profound impact on the practices of AI developers that rely on ingesting Internet data and on the degree to which existing intellectual property rights restrict artificial intelligence. The case may also spur action in Congress to adapt copyright law for AI, as well as new licensing models between content owners and AI companies.

Ultimately, this landmark litigation reflects the critical need to strike a careful balance between upholding the intellectual property rights that incentivize the creation of quality content and journalism and enabling the development of innovative and societally beneficial artificial intelligence capabilities.

Test Your Knowledge

Questions: Dow Jones/NYP v. Perplexity AI Lawsuit

    • 1. What are the key claims against Perplexity?
      • A) Copyright infringement for scraping articles to train its AI
      • B) Trademark violations for falsely attributing “hallucinated” articles to the publishers
      • C) Patent infringement for its AI technology
      • D) Both A and B
    • 2. Why do the plaintiffs argue Perplexity’s scraping is not “fair use”?
      • A) It copies full articles for a commercial purpose
      • B) The AI previews replace the original articles
      • C) It harms the market/value of the content
      • D) All of the above
    • 3. What are some of Perplexity’s likely defenses?
      • A) Its use is transformative with a different purpose
      • B) The AI outputs are different from the articles
      • C) It benefits the public by making info more accessible
      • D) All of the above
    • 4. Why does it matter if the AI outputs contain fake info?
      • A) It could violate trademarks by falsely attributing content to the publishers
      • B) It harms the publishers’ reputations for accuracy
      • C) It misleads readers about the origins of the content
      • D) All of the above
    • 5. How might this case impact other AI companies?
      • A) Set precedents on whether scraping data to train AI is fair use
      • B) Clarify copyright limits on AI-generated content
      • C) Spur Congress to adapt IP law for AI
      • D) All of the above

Answers

    • 1. D) The suit alleges both copyright infringement for scraping news articles to train Perplexity’s AI, and trademark violations for the AI misattributing fake articles to the publishers.
    • 2. D) The plaintiffs argue Perplexity’s article scraping is not fair use because it copies full works for a commercial purpose, the AI previews replace the articles, and this harms the market for the original content.
    • 3. D) Perplexity will likely argue its use of the articles is fair because it’s transformative with a different purpose, the AI outputs differ from the articles, and the service benefits the public.
    • 4. D) The trademark claims argue the AI’s false article attributions violate the publishers’ marks, harm their reputations for accuracy, and mislead readers about the origins of the fake content.
    • 5. D) The rulings could set major precedents for the AI industry on whether scraping data to train AI is fair use and on what copyright limitations apply to AI outputs, and they could spur Congress to adapt IP law for AI.
