A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.

The suit alleges—and we were able to verify—that it's comically easy to get GPT-powered systems to offer up content that is normally protected by the Times' paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.

The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.

The suit is dismissive of attempts to justify this as a form of fair use. "Publicly, Defendants insist that their conduct is protected as 'fair use' because their unlicensed use of copyrighted content to train GenAI models serves a new 'transformative' purpose," the suit notes. "But there is nothing 'transformative' about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it."

The suit seeks nothing less than the erasure of any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: "statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity."

  • daisy
    ·
    6 months ago

    The more I think about this, the more I wonder if it's all an elaborate play by the media companies to get the tech companies to buy them out. The tech companies have ridiculously huge cash reserves, and media companies' stocks aren't nearly as valuable as people think. For example, the New York Times has a market cap of $8 billion USD, and made a profit of $90 million USD in their July/August/September 2023 quarter. Apple made $23 billion USD in profit in that same quarter, has a market cap of $3 trillion USD, and has cash reserves that would make Scrooge McDuck envious.

    Imagine if all these legal fights over AI scraping are the media industry's way of saying to the tech companies, "Hey, the data we have the rights to is incredibly valuable to your AI work. We could tie you up in court for years, setting you well behind your competitors. Wanna make a bid?"

    • FunkyStuff [he/him]
      ·
      6 months ago

      That's totally valid, but what about the Disneys, the Universals, and the Sonys? Not all media companies are made equal, and there's a lot of inertia behind those giants despite the falling rate of profit.

      • drhead [he/him]
        ·
        6 months ago

        Have you SEEN what Disney has been making lately? They'd gladly pivot to AI slop the second it matches their declining quality standards.

      • daisy
        ·
        6 months ago

        Of course it's just an idea. It's probably also a plan that would appeal more to print media companies that have doubts about long-term profitability and stand to lose a lot from text-generation AIs.