
Eight Regional Newspapers Sue Microsoft and OpenAI

The publishers say ChatGPT infringes on their copyright.

Happy Wednesday! To the Idaho man arrested Monday on alcohol charges after allegedly kicking a bison, here’s a friendly reminder that, in the game of drunk man versus wildlife, the wildlife always wins.

Quick Hits: Today’s Top Stories

  • Israeli Prime Minister Benjamin Netanyahu said on Tuesday that Israel would move forward with a military offensive into the southern Gaza city of Rafah “to eliminate the Hamas battalions there,” regardless of whether his government is able to strike a deal with the terror group to secure the release of hostages in exchange for a temporary truce. Netanyahu’s pledge came hours before U.S. Secretary of State Antony Blinken arrived in Israel to push a U.S.-backed deal that would free 33 Israeli hostages in the first phase in exchange for a six-week ceasefire and the release of hundreds of Palestinian prisoners. “We are determined to get a ceasefire that brings the hostages home and to get it now, and the only reason that that wouldn’t be achieved is because of Hamas,” Blinken said Wednesday morning, speaking alongside Israeli President Isaac Herzog. “There is a proposal on the table, and as we’ve said, no delays, no excuses.”
  • The United Kingdom’s National Health Service (NHS) plans to propose changes to its constitution that would separate single-sex wards according to “biological sex,” Health and Social Care Secretary Victoria Atkins announced Tuesday. The new measure would mean that transgender individuals would be placed in wards with people of their biological sex or in a single-patient room when possible. “The government has been clear that biological sex matters,” Atkins said. “The constitution proposal makes clear what patients can expect from NHS services in meeting their needs, including the different biological needs of the sexes.” The NHS Constitution for England was last updated in 2015 and is required to be updated at least every ten years; there will now be an eight-week review of the proposal.
  • House Democrats confirmed on Tuesday they would not let a measure to oust Republican House Speaker Mike Johnson succeed, effectively killing the motion to vacate led by Republican Rep. Marjorie Taylor Greene of Georgia. The motion was already unlikely to succeed—having the public support of only two other House Republicans—but Greene indicated Tuesday she will move ahead with the measure anyway. Meanwhile, Johnson clarified to reporters that he never requested House Democrats’ help, saying their statement was “the first I heard of” their support.
  • New York Judge Juan Merchan held former President Donald Trump in criminal contempt of court on Tuesday, fining him $9,000 and threatening the possibility of jail time. Merchan—who is overseeing Trump’s New York criminal trial—determined Trump violated a gag order restraining him from publicly commenting on witnesses and others related to his trial. Trump was also ordered to remove the nine violations—seven social media posts and two campaign website posts—by 2:15 p.m. Wednesday. Shortly after Merchan’s ruling, Trump took to social media to accuse the judge of “taking away my FREEDOM OF SPEECH” and “RIGGING THE PRESIDENTIAL OF 2024 ELECTION.” 
  • Eight regional U.S. newspapers—owned by hedge fund Alden Global Capital—on Tuesday sued OpenAI and Microsoft for copyright infringement. The publications—which include the Chicago Tribune and New York Daily News—allege OpenAI and Microsoft’s artificial intelligence chatbots were trained on “millions” of their articles without permission. The New York Times filed a similar lawsuit against the two tech companies in December. 
  • New York police dressed in riot gear on Tuesday evening cleared an administrative building anti-Israel students had occupied, vandalized, and blockaded overnight. Columbia administrators said Wednesday they invited the NYPD onto campus to “restore safety and order to our community,” and threatened students in the building with expulsion. Police arrested dozens of students, and administrators requested a police presence remain on campus until at least May 17. At UCLA, meanwhile, violent clashes between anti-Israel demonstrators and counter-protesters broke out overnight, prompting school administrators to request support from the Los Angeles police. 
Smartphones display OpenAI ChatGPT and Microsoft Copilot with The New York Times in the background in this photo illustration taken in Brussels, Belgium, on December 28, 2023. (Photo Illustration by Jonathan Raa/NurPhoto via Getty Images)

After a handful of newspapers on Tuesday sued Microsoft and artificial intelligence (AI) company OpenAI, alleging the companies’ chatbots surfaced their articles verbatim, we tried to get ChatGPT—OpenAI’s souped-up search engine—to reproduce one of our favorite G-Files, 2021’s “American Passover.”

Either ChatGPT isn’t trained on intellectual property or it’s covering its tracks after landing in hot water, because this is the response it came up with: “I’m sorry for any confusion, but I can’t provide the full text of a specific column from The Dispatch, including ‘American Passover,’ as it’s a paid subscription-based publication, and providing verbatim text would go against copyright policies,” the response read. “However, I can summarize the key points or themes of the column if you’d like.” (It couldn’t, actually.)

Eight newspapers owned by Alden Global Capital, an investment firm, filed suit Tuesday against OpenAI and its financial and cloud-computing backer, Microsoft, alleging the companies violated copyright laws by training their generative AI programs on the newspapers’ content and surfacing exactly what ChatGPT claimed it doesn’t: verbatim replicas of copyrighted material. It’s just the latest in a series of suits from copyright owners concerned that OpenAI is profiting off their protected content by feeding it into large language models (LLMs) like ChatGPT, and the litigation could ultimately set new limits on the technology. 

OpenAI, founded in 2015 by Sam Altman, Elon Musk, and seven others, has—like any self-respecting Silicon Valley startup—been rocked by infighting in recent months. But its pioneering AI chatbot—ChatGPT, released in November 2022—has remained ahead of its competition despite the turmoil at the company. As we explained late last year:

The tool is a cross between a search engine—it can give you the answer to a simple question in a conversational way—and a robot that can do your homework, and its responses read eerily similar to human conversation. The generative AI can create or fix computer code, write an essay or other creative product, translate text from one language to another, or tell you what you should make for brunch for six people with dietary restrictions, among a host of other capabilities. … Sometimes ChatGPT “hallucinates”—that is, provides information that is simply incorrect, even if it sounds convincing (a word of caution for our younger readers trying to pass AI content off as their own schoolwork). The “large language model” (LLM) that creates the responses has to be fed vast quantities of information to keep up; the currently free, publicly available ChatGPT model is only working off data from before January 2022.

Since ChatGPT launched, ushering in a generative AI boom, high-profile authors, music labels, and artists have sued the companies developing those models, arguing that training LLMs on their words, music, or images infringes on their copyrights. In December, the New York Times became the first major news publication to sue OpenAI and Microsoft, seeking “to hold them responsible for the billions of dollars in statutory and actual damages” without putting a specific dollar figure on the suit. 

Eight regional newspapers—including the Chicago Tribune, Orlando Sentinel, and New York Daily News—jumped into the fray on Tuesday, alleging OpenAI and Microsoft “purloined”—how’s that for an SAT word?—the newspapers’ copyrighted content, effectively stealing the original reporting and sharing it in a way that was more accessible than the publishers’ paid product. Likewise, the Times accused the companies of trying to “free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”

Beyond the claims about training data, both lawsuits allege that ChatGPT and Microsoft’s chatbot, Copilot, can and do surface nearly word-for-word copies of their articles without linking back to the original article, undercutting the news organizations’ paywalls even more explicitly than simply summarizing or repackaging facts. “Defendants are taking the Publishers’ work with impunity and are using the Publishers’ journalism to create GenAI products that undermine the Publishers’ core businesses by retransmitting ‘their content’—in some cases verbatim from the Publishers’ paywalled websites—to their readers,” lawyers for the eight papers allege in this week’s lawsuit.

OpenAI says direct replication like that is not a feature, but a bug—and one it’s trying to fix. “Memorization is a rare failure of the learning process that we are continually making progress on, but it’s more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites,” the company said earlier this year in response to the New York Times’ lawsuit. “So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs.”

But the tech company has also passed some of the blame off onto its users. It took a swipe at an example from the Times lawsuit, where ChatGPT produced verbatim content in response to a hyper-specific prompt explicitly asking the bot to give the user access to an article they had been “pay-walled out of.” “We also expect our users to act responsibly,” OpenAI said. “Intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.” 

Altman, the OpenAI founder, does accept that his company is using copyrighted material to train its models. “Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” he wrote in December, before the Times lawsuit, in testimony before the United Kingdom’s House of Lords. “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

So how does OpenAI justify using this material without paying for it? There are exceptions to copyright restrictions under so-called “fair use” principles, which weigh a handful of factors: how much of the content is copied, whether the copying is done for profit and deprives the copyright owner of the chance to profit from the work, and whether the use is transformative—in other words, sufficiently unlike the source material. “OpenAI and the other defendants in these lawsuits will ultimately prevail, because no one—not even the New York Times—gets to monopolize facts or the rules of language,” lawyers for the tech companies argued in a February filing in the Times case. 

OpenAI has asserted that training LLMs qualifies as fair use, in part because it maintains its products generate new and unique content. OpenAI and Microsoft programmers would say their models “are not slavishly copying verbatim passages from existing content,” said Michael Rosen, a trial attorney specializing in intellectual property issues in tech. “What they’re doing instead is using that information that’s out there to train their machines to be able to generate something that is genuinely transformative and new and different from what came before it.” OpenAI has said that publications can opt out of having their content scraped, as it claims the New York Times did in 2023. Hundreds of news sites have installed blockers to keep the models from collecting their data. 
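What “installing a blocker” means in practice is usually just a couple of lines in a site’s robots.txt file: for OpenAI’s documented crawler, a “User-agent: GPTBot” entry followed by “Disallow: /”. As a minimal, hypothetical sketch (the site URL and function name below are placeholders, not details from the lawsuits), here is how one could check whether a given site currently asks GPTBot to stay away, using only Python’s standard library:

from urllib import robotparser

def gptbot_allowed(site: str) -> bool:
    """Return True if the site's robots.txt permits GPTBot to fetch the homepage."""
    parser = robotparser.RobotFileParser()
    parser.set_url(site.rstrip("/") + "/robots.txt")  # e.g. https://www.example.com/robots.txt
    parser.read()  # download and parse the live robots.txt
    return parser.can_fetch("GPTBot", site)

if __name__ == "__main__":
    site = "https://www.example.com"  # placeholder; swap in any news site
    print(site, "allows GPTBot" if gptbot_allowed(site) else "blocks GPTBot")

Because robots.txt is a voluntary standard, a rule like this merely asks crawlers to keep out rather than technically locking them out, which is part of why publishers are also pressing the question in court.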

In a potentially inconvenient set of facts, OpenAI is simultaneously pursuing agreements with various publications and publishers to pay to use their content. On Monday, just a day before the eight papers filed suit, OpenAI announced a licensing agreement with the Financial Times that will allow ChatGPT users “to see select attributed summaries, quotes and rich links to FT journalism in response to relevant queries.” The company has also reached licensing agreements with the Associated Press to use part of its archive to train its LLM and with Axel Springer—the German company that owns both Business Insider and Politico—allowing ChatGPT to use its content for both training and outputs of the program. When the Times sued, it was in talks with OpenAI to license the use of some of its content—obviously, those discussions weren’t going so well. 

That parallel track could undercut OpenAI’s fair use argument. “If you’re a jury or judge looking at this and you see that OpenAI has reached agreements with certain companies but has not reached agreements with others,” Rosen told TMD, “one could draw the conclusion that, ‘Yeah, why isn’t OpenAI just taking a license in this case?’”

While such licensing agreements may satisfy copyright holders, they could raise barriers to entry for smaller companies trying to develop their own models. “As someone who’s supportive of AI development, I’m happy to see that they’re working on some resolution everybody can be happy about,” Michael Frank—CEO and founder of Seldon Strategies, a macro and geopolitical advisory firm built on AI—told TMD. “And on the other hand, as a leader of a startup and thinking about all the small players in this ecosystem … I know there are companies that are trying to get into the model development game, and they’re not going to afford all these deals that are being negotiated.”

Rosen believes these cases will end in a settlement rather than going to trial—at least in part because the discovery process and accompanying negative publicity could be uncomfortable for OpenAI and Microsoft. But that doesn’t mean cases like this will stop. “I do think that we’re seeing the beginnings of the first wave of these types of intellectual property lawsuits that are deeply getting into the weeds of how generative AI operates,” Rosen said. “And if not this case, or the previous ones, we will start to see a jurisprudence emerge over how to handle these really unique challenges. And that jurisprudence will include some sort of philosophical approach that the courts are comfortable with as to how we want to treat AI as a tool—whether it is an automaton or autonomous or a little bit of both.”

Worth Your Time

  • “Donald Trump thinks he’s identified a crucial mistake of his first term: He was too nice,” Time Magazine’s Eric Cortellessa wrote of his recent interview with the former president, in which Trump reveals what he wants a second term to look like. “Trump has sought to recast an insurrectionist riot as an act of patriotism,” Cortellessa wrote. “‘I call them the J-6 patriots,’ he says. When I ask whether he would consider pardoning every one of them, he says, ‘Yes, absolutely.’ Trump has also vowed to appoint a ‘real special prosecutor’ to go after Biden. ‘I wouldn’t want to hurt Biden,’ he tells me. ‘I have too much respect for the office.’ Seconds later, though, he suggests Biden’s fate may be tied to an upcoming Supreme Court ruling on whether Presidents can face criminal prosecution for acts committed in office. ‘If they said that a President doesn’t get immunity,’ says Trump, ‘then Biden, I am sure, will be prosecuted for all of his crimes.’” Perhaps most concerningly, Cortellessa writes: “Trump does not dismiss the possibility of political violence around the election. ‘If we don’t win, you know, it depends,’ he tells TIME.”

Presented Without Comment

At a press conference in front of the occupied academic building at Columbia University: 

Reporter: “Why should the university be obligated to provide food to people who’ve taken over a building?” 

Student protester: “To allow it to be brought in. Well, I mean, I guess it’s ultimately a question of what kind of community and obligation Columbia feels it has to its students. Do you want students to die of dehydration and starvation or get severely ill even if they disagree with you? If the answer is no, then you should allow basic—I mean it’s crazy to say because we are on an Ivy League campus, but this is like basic humanitarian aid we’re asking for. Like, could people please have a glass of water?”

Reporter: “But they did put themselves in that, very deliberately in that situation, in that position, so it seems like you’re saying, ‘We want to be revolutionaries, we want to take over the building, now would you please bring us some food and water.’”

In the Zeitgeist 

Matt “Tugboat” Wilkinson—a 10th-round draft pick from Central Arizona College—is taking the minor league baseball world by storm, pitching six hitless innings in a game last week that dropped his earned run average for the season down to a sterling 0.44. Tug, as his friends call him, has only given up six hits in his first four starts. 

Toeing the Company Line

  • How will the TikTok ban play out? What’s going on with the Methodists? Is the RNC’s ground game in trouble? Will and Kevin were joined by Alex Reinauer, Mark Tooley, Drucker, and John to discuss all that and more on last night’s Dispatch Live (🔒). Members who missed the conversation can catch a rerun—either video or audio-only—by clicking here.
  • In the newsletters: Nick wondered (🔒) whether the campus protests will ultimately serve to put Trump back in the White House. 
  • On the podcasts: Jonah and Sarah discuss subtweets and Trump’s immunity case on The Remnant, while Jonah joins the Road to Now podcast for an episode Dispatch listeners can also check out on The Skiff (🔒). 
  • On the site: Alison Somin explores how competitive K-12 schools are still getting away with race-based admission policies and Jonah argues that today’s college protests are the product of 1960s radicalism nostalgia.

Let Us Know

Is it fair use to train LLMs on copyrighted material?

Mary Trimble is the editor of The Morning Dispatch and is based in Washington, D.C. Prior to joining the company in 2023, she interned at The Dispatch, in the political archives at the Paris Institute of Political Studies (Sciences Po), and at Voice of America, where she produced content for their French-language service to Africa. When not helping write The Morning Dispatch, she is probably watching classic movies, going on weekend road trips, or enjoying live music with friends.

Grayson Logue is the deputy editor of The Morning Dispatch and is based in Philadelphia, Pennsylvania. Prior to joining the company in 2023, he worked in political risk consulting, helping advise Fortune 50 companies. He was also an assistant editor at Providence Magazine and is a graduate student at the University of Edinburgh, pursuing a Master’s degree in history. When Grayson is not helping write The Morning Dispatch, he is probably working hard to reduce the number of balls he loses on the golf course.

Peter Gattuso is a fact check reporter for The Dispatch, based in Washington, D.C. Prior to joining the company in 2024, he interned at The Dispatch, National Review, the Cato Institute, and the Competitive Enterprise Institute. When Peter is not fact-checking, he is probably watching baseball, listening to music on vinyl records, or discussing the Jones Act.
