WASHINGTON – Legal cases continue to unfold over AI companies’ use of major publishers’ data, amid intense debate over how exclusive, curated content is used in model training and in AI-generated outputs.
Against this backdrop, CNN has filed a copyright infringement lawsuit against Perplexity AI, accusing the company of using and distributing CNN’s news content without permission through its AI platform. The lawsuit, filed Thursday in a New York court, alleges that Perplexity scraped and reproduced CNN’s journalistic material, including more than 17,000 articles, along with related images, videos, and other media assets.
Several media giants have also taken similar action, arguing that AI tools are increasingly relying on copyrighted journalism without proper licensing or compensation. At the heart of the argument is how news content is being used in the age of AI. Publishers say their reporting is being repackaged and redistributed at scale, while AI firms maintain that their systems are built on publicly available information and legal data use.
A CNN spokesperson said the lawsuit is about ensuring that large technology companies do not profit from journalistic work without fair payment. The statement also highlighted the costs and risks involved in producing credible reporting, arguing that professional journalism requires proper compensation when used commercially.
Perplexity AI pushed back on the claims. The company’s representatives said that facts cannot be copyrighted, arguing that information itself is not protected under copyright law. Court filings also show that CNN attempted to negotiate a content licensing agreement with Perplexity last year, but the talks did not result in a deal.
The New York Times earlier took ChatGPT maker OpenAI and Microsoft to court in 2023. The Times alleges that millions of its articles were used without permission to train AI models, and that tools like ChatGPT and Copilot sometimes reproduce or closely mirror its reporting. The case is still in pre-trial stages, with disputes ongoing over access to user logs and key claims such as contributory infringement. A settlement is widely expected, possibly involving licensing agreements.
At the center of these disputes are two key questions: whether training AI models on copyrighted material qualifies as fair use, and whether AI systems that reproduce or closely paraphrase news content violate copyright law.
AI companies argue that training on publicly available data is transformative and legally protected, while publishers claim it amounts to large-scale unauthorised use of journalism that undermines their revenue.
The legal outcomes of these cases are expected to reshape how AI systems are trained, pushing the industry toward more licensing-based data use rather than open scraping.
