New York Times sues Microsoft and OpenAI for impacting its business, claims generative AI models don't qualify for fair use

Satya Nadella with Sam Altman at a conference
It looks like Satya Nadella and Sam Altman have more battles they have to face together. (Image credit: Bullfrag)

What you need to know

  • Microsoft and OpenAI are partners on ChatGPT, the leading AI chatbot.
  • These AI models are trained on content scraped from the internet, and they often paraphrase or directly quote sources without compensation.
  • The question of whether AI models fall under fair use is already being investigated by government regulators, but the NYTimes has taken the issue to the courts.
  • If the courts side with the NYTimes, it could be a huge blow to all AI models.

The New York Times has announced it is suing OpenAI and Microsoft over AI use of copyrighted work. Breaking the cardinal rule that a news outlet should never become the story, the New York Times has decided to stand up to Microsoft and OpenAI in the hopes of helping its bottom line.

In the lawsuit, the New York Times describes how Microsoft and OpenAI's use of its copyrighted material is hurting it financially. While I personally agree that the issue needs to be addressed, the fact that I can't read the New York Times' own coverage of this lawsuit without subscribing is likely a bigger reason for its declining revenue.

The real story here is the fight for copyright protections and fair compensation for content creators, whether they be journalists, artists, or storytellers. This lawsuit, if it doesn't end in a settlement, has the chance to make it all the way to the Supreme Court and define the future of AI in the world. 

Why is the New York Times suing Microsoft and OpenAI?

The New York Times requires a subscription to access articles on the site.  (Image credit: NewYorkTimes)

The topic of the lawsuit is nothing new and has been a simmering pot ready to boil over for a while now. We recently discussed how, in response to regulators looking into AI and fair use, Microsoft believes that end-users should be responsible for copyright infringement instead of the companies. The New York Times' suit, though, is slightly different: it argues that the scraping of the information by Microsoft and OpenAI should itself be stopped, because that information is not being purchased from the copyright holder.

While Defendants engaged in widescale copying from many sources, they gave Times content particular emphasis when building their LLMs—revealing a preference that recognizes the value of those works. Through Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.

New York Times via lawsuit

The New York Times also stated that it has been trying to negotiate a resolution with Microsoft and OpenAI, but the talks have not produced one. The New York Times blocked OpenAI's web crawler back in August, per the Verge, and other companies seem to be following suit, but this doesn't solve the issue of the "millions of articles" OpenAI has already used to train ChatGPT.
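
For context, blocking a crawler like this is typically done with a robots.txt file, which well-behaved crawlers consult before fetching pages. Here is a minimal sketch in Python of how such a rule is evaluated, using the standard library's urllib.robotparser. The robots.txt content, the example.com URL, and the is_allowed helper are illustrative assumptions and not the Times' actual configuration; GPTBot is the user-agent name OpenAI has published for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT the New York Times's actual file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/2023/12/27/some-article.html"  # placeholder URL
    print("GPTBot allowed:", is_allowed("GPTBot", page))
    print("Other crawler allowed:", is_allowed("SomeOtherBot", page))
```

Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it, and it does nothing about content that was collected before the rule was added.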

The New York Times is asking the court to force Microsoft and OpenAI to remove the New York Times' copyrighted material from the AI models' datasets. The bullets below are pulled from the "Prayer for Relief" in the lawsuit (Case 1:23-cv-11195, Document 1, filed 12/27/23).

  • WHEREFORE, The Times demands judgment against each Defendant as follows:
  • 1. Awarding The Times statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity;
  • 2. Permanently enjoining Defendants from the unlawful, unfair, and infringing conduct alleged herein;
  • 3. Ordering destruction under 17 U.S.C. § 503(b) of all GPT or other LLM models and training sets that incorporate Times Works;
  • 4. An award of costs, expenses, and attorneys’ fees as permitted by law; and
  • 5. Such other or further relief as the Court may deem appropriate, just, and equitable. 

Does AI fall under fair use?

This is the real question at the heart of this lawsuit. As previously stated, the issue is already being looked into by the U.S. Copyright Office. However, it seems that the New York Times, likely because of the worsening economy, doesn't feel it can wait for a bureaucratic resolution and would rather ask the courts for a swifter response.

Any content creator on the web right now, especially writers, feels the pain being caused by the advent of AI. If the AI bots were confined to their respective websites or applications, it wouldn't be such a big issue, but with Copilot integrated into Bing and Google now testing AI results for searches, many news and publication sites on the internet are seeing decreased traffic. Searches using AI-generated results are really just "paraphrasing with style," as Toy Story's Woody might say, rather than using any actual intelligence.

The biggest issue facing the New York Times, though, is its choice of opponent. Satya Nadella has been pitching something close to a perfect game lately, from closing the Activision Blizzard King acquisition despite opposition from multiple government regulators around the world to weathering the internal coup at OpenAI that ousted Sam Altman until Nadella got involved.

If this lawsuit goes the distance and either sets a precedent or forces the U.S. Copyright Office to accelerate its decision on fair use with regard to AI, and the outcome is that companies are required to compensate the original content creators, it will forever change the future of AI.

What do you think about the New York Times suing Microsoft and OpenAI? Do you think the Times has a legitimate case? Let us know in the comments. 

Colton Stradling
Contributor

Colton is a seasoned cybersecurity professional who wants to share his love of technology with the Windows Central audience. When he isn’t assisting in defending companies from the newest zero-days or sharing his thoughts through his articles, he loves to spend time with his family and play video games on PC and Xbox. Colton focuses on buying guides, PCs, and devices and is always happy to have a conversation about emerging tech and gaming news.

  • GraniteStateColin
    Interesting subject. In general, I strongly favor enforcing copyrights and, when in doubt or when the law is murky, erring in favor of the creator's rights over those using the creator's work. At the same time, I generally frown on large payments for damages, which plaintiffs and their attorneys tend to inflate beyond all reason.

    I do think there's an irony here with the NYT in particular, the paper that fired its chief editor for allowing a conservative editorial (by a sitting Senator, no less) with nothing resembling parallel behavior for leftist editorials even by disreputable sources, per NYT's own admission (under the contradictory logic that it's good to hear from contrarian voices). For the NYT to now claim to have superior content (and for MS to treat it as such, if that accusation is true) seems comical.

    The NYT does do news (and occasionally some good investigative reporting), but if there is any political angle, even a secondary one, the news and facts are heavily subordinated to the "narrative" that fits their far left political views.

    Still, between my disdain for the NYT's bias and my support of copyright, for me, the copyright protection is more important. Whether we agree or disagree with someone, we should all want their rights to their own words and thoughts protected, just as we would want done with our own. IF (don't know the facts here) MS is using NYT writing without compensating the NYT to build the text composition for its chat responses, then the NYT is correct in demanding recompense. However, if it's just using the facts as others do who summarize or re-report the news, then that is not a copyright infringement.

    An interesting case for sure. Another question worthy of debate on this: is this something that the courts should settle under existing copyright law, or is AI sufficiently unique in its application here that new legislation is needed, in which case, what should that legislation be?
  • fjtorres5591
    At least the NYT is honest in saying the lawsuit is a desperation money grab.
    That it took until August to block the web crawler is either an indication of implied consent or technological cluelessness. Neither will be helpful against a good legal team.

    Also noteworthy is their filing being in Manhattan, with its history of pro-publishing bias, rather than California or Seattle, homes of OpenAI and Microsoft respectively.

    Expect round one to be a fight over venue.
  • fdruid
    This is pretty pathetic to read. Someone wants to make an example out of this.
    But you just can't stop the future. We're not going back to buying paper news, nor paying for individual news outlets.
  • larakurst
    Because of the scale of these lawsuits, who they're suing, and the way the tool has been so rapidly integrated into everything, I think the more of these lawsuits get added on, the more likely it is that it will be found not to be copyright infringement in the first place, or that they'll just have to make some minor modification to the tool. Either way, they're not going to have to pay anything, or it'll be pocket change.
  • wojtek
    fdruid said:
    This is pretty pathetic to read. Someone wants to make an example out of this.
    But you just can't stop the future. We're not going back to buying paper news, nor paying for individual news outlets.
    You are aware that without these 'awful newspapers' (which provide the bulk of the training material) the AI would still be quite stupid and wouldn't be able to "write" such an article? Still, even now a lot of outlets out there use AI and it's just painful to read.

    It's sad that some are so blinded and hooked on yet another tech bandwagon that they want to ignore everything and praise their new "gold idol" without even batting an eyelash...

    I'm annoyed by all the big corps that crawl and scrape the Web and then make bazillions of dollars without creating proper content (I'm looking at you, Google) and then claim "everyone could see it so we just stole it"... ffs, how deprived of any morals would you have to be?
  • fjtorres5591
    wojtek said:
    You are aware that without these 'awful newspapers' (which provide the bulk of the training material) the AI would still be quite stupid and wouldn't be able to "write" such an article? Still, even now a lot of outlets out there use AI and it's just painful to read.

    It's sad that some are so blinded and hooked on yet another tech bandwagon that they want to ignore everything and praise their new "gold idol" without even batting an eyelash...

    I'm annoyed by all the big corps that crawl and scrape the Web and then make bazillions of dollars without creating proper content (I'm looking at you, Google) and then claim "everyone could see it so we just stole it"... ffs, how deprived of any morals would you have to be?
    Look to the law and precedent.
    Authors Guild v. Google found transformative use to be fair.
    Crawling publicly available websites is legal globally, by fair use in the US, by politician "permission" in the EU.

    And most damning to the AI training handwringers, the Fair Use tests:

    To determine whether a proposed use is a fair use, you must consider the following four factors:

    Purpose: The purpose and character of the use, including whether such use is of a commercial nature, or is for nonprofit education purposes.
    Nature: The nature of the copyrighted work.
    Amount: The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
    Effect: The effect of the use upon the potential market for, or value of, the copyrighted work.

    By precedent the tests resolve into two main tests: transformation and substitution. The first is about how the allegedly infringing product compares to the allegedly copied product. The second is about whether the alleged infringing product *itself* can substitute significantly for the copyrighted product itself in the market.

    Note that, first, the NYT is suing for the training of LLM models, not the output of the models, which is what might have market impact. LLM models themselves aren't sold in lieu of newspapers. That will have to factor in the substitution question.

    Second, the damage they attribute to LLM model training, has been ongoing for decades, long before LLMs hit the mainstream, and in recent times particularly, their revenues have declined from reduced ad spending. That will have to factor into the claim of harm from the training of LLM models.

    The NYT has spun one narrative that paints them as victims of a new tech.
    A closer look at the facts suggests a different narrative, that they are operating a declining business with a product with a declining market and they are desperately looking for a payout from a different business to minimize their ongoing losses.

    This lawsuit strongly resembles the Google lawsuit that was ruled fair use in that it is about "scanning" content to create a technical product. Google scanned to create a database of citations to index the web, LLM training "scans" for language use and data to create a model of how humans perceive and interact with both.

    In similar recent lawsuits against LLM training the judges have so far been skeptical of the plaintiff claims, demanding specific examples of infringement.

    The courts will decide but I doubt the Vegas odds makers will be favoring the NYT.

    Those that don't like the current legal framework are free to petition the Congress for changes.
  • wojtek
    fjtorres5591 said:
    Look to the law and precedent.
    Fortunately I'm not from a place that runs on precedent law.... phew.
    fjtorres5591 said:
    Web crawlers through publicly available websites is legal globally, by fair use in the US, by politician "permission" in the EU.
    Erm... in the same way that any attempt at pushback against extortionist practices by our beloved Internet monopolies, e.g. Canada's and Australia's publishers' attempts at blocking Google, resulted in Google removing them from the results... if only we weren't run by monopolies that built their position on stealing others' work, I wonder... but I guess it's all good and dandy as Google is modern and publishers are old and musty and dying so it's ok to steal... clappity-clap :D
  • fjtorres5591
    wojtek said:
    Fortunately I'm not from a place that runs on precedent law.... phew.

    Erm... in the same way that any attempt at pushback against extortionist practices by our beloved Internet monopolies, e.g. Canada's and Australia's publishers' attempts at blocking Google, resulted in Google removing them from the results... if only we weren't run by monopolies that built their position on stealing others' work, I wonder... but I guess it's all good and dandy as Google is modern and publishers are old and musty and dying so it's ok to steal... clappity-clap :D
    What exactly is stolen by LLM?
    Can you unsubscribe from the NYT and get a news feed from ChatGPT?

    This isn't a case of NYT whining about a competitor in their business taking away their customers (remember Tom Hanks' YOU'VE GOT MAIL) but about losing customers for whatever reason and trying to blame a random business somewhere else. The court may very well demand specific examples that don't involve the LLM model invoking a browser.

    As to Google, their business is built on indexing what is *freely* available online and *sending* traffic to those sites.

    Google has many sins to answer for, but the search engine itself isn't one. Paying billions to phone vendors to block alternatives? They're in court to answer for that.

    Bear in mind that not only are crawlers legal, there is also a long established mechanism to prevent your "precious" from being visited by crawlers. It is called robots.txt. Look it up. In the NYT article it explicitly says they *didn't* block crawlers before August. So, either they had no objection to being visited or were operating a web site without understanding the basics. Neither is an excuse to be *retroactively* demanding a toll tax.

    Again, the courts will speak soon enough.
  • fjtorres5591
    larakurst said:
    Because of the scale of these lawsuits, who they're suing, and the way the tool has been so rapidly integrated into everything, I think the more of these lawsuits get added on, the more likely it is that it will be found not to be copyright infringement in the first place, or that they'll just have to make some minor modification to the tool. Either way, they're not going to have to pay anything, or it'll be pocket change.
    Apple, who is yet again late to the party, is reported to be talking to the NYC glass tower publishers to pay them to train a model off their books. The offer? $50M.

    "Go 'way kid, don' bother me." -- Foghorn Leghorn.

    That is about what the NYT might aspire to at best.
    Not going to save them even in the unlikely case their friendly judge ignores precedent.
  • GraniteStateColin
    @fjtorres5591 , @wojtek , @larakurst , @fdruid , interesting points and discussion above. Two points:

    1. A judicial review of the law generally does not (and should never) care how many people are using something to determine if it's legal or not. Granted, 100M users may warrant more careful consideration than 100 users, but ultimately assessing who is protected and who is not should not be a function of the number of people on either side of that scale. Therefore, suggesting that "this is the future" or "this is good for XXXXX" should be irrelevant. It's either legal or it's not. (I would acknowledge, that there are some partial exceptions to this in, for example, the realm of mandatory patent licensing to cell phone manufacturers, but the patent owners are still paid for those licenses, more like the legal logic behind eminent domain.)

    2. It is very possible that the law does need to be rewritten for AI use of copyrighted material. It is a fundamentally different consideration. I think the courts will hold that copyright holders have rights over their work. But in the past, where an explanatory summary (as opposed to a straight summary with sampling, like for Cliff Notes who do pay for rights to the original works) on the work of others has never been considered copyright infringement, the ability of AI to instantly rewrite something in entirely new words and pull in data from multiple sources would effectively render copyrights on nonfiction irrelevant, or, at best, effectively require nonfiction writers to use a peculiar style so customers are buying for the writer's personality rather than the content. Traditionally, this usage of prior works would have been considered standard research and would not have been copyright infringement, but in a world where AI can do that instantly and rewrite the work, the original people who put in the effort are prevented from monetizing their work product, which is the core reason for copyright law in the first place.

    I don't think there is a simple answer to this, though hopefully we end up at a place where most of us can agree that the solution is fair in its protection of the intellectual property rights of the creators without merely raising new hurdles to productivity or holding back the useful advances in technology. One proposal I've heard (from Rick Beato, mostly on the music side of AI), is that it should not be possible to monetize AI creations -- use them if you want, but you can't then sell the product, which reduces the incentive to steal. Not sure how this plays into search engine advertising monetization (and recall the damage Napster did with free sharing of music), but it seems like a good foundation or starting point for driving the thought around this.