Regulating Frontier Models in AI

Diving deep on a new proposal in California.

Published May 9, 2024

Welcome back to Techne! This week I am tearing through God, Human, Animal, Machine by Meghan O’Gieblyn on the recommendation of subscriber Michael. The book is a series of essays that explores what it means to be human in the digital age.

Notes and Quotes

I missed this when it happened a couple of weeks back, but the 5th Circuit Court of Appeals ruled that Texas can continue to enforce a law mandating that online porn sites verify the ages of their visitors.
TikTok is suing the federal government over a recent bill that would either force the company’s divestiture or ban it. My Morning Dispatch colleagues had much more on the lawsuit in their newsletter earlier today.
A Philippine court banned golden rice late last month, prohibiting the type of rice that has been genetically modified to produce beta carotene—a deficiency of which leads to blindness in about 500,000 children a year worldwide, including thousands in the Philippines.* The decision was made following a lawsuit from Greenpeace.
The Washington Post reported last week on what happened when Raymond Dolphin, an assistant principal of a middle school in Connecticut, got rid of cell phones in the school. The school is louder because students are in conversation with each other and “the angsty intensity kids are living under” has diminished.
Florida Gov. Ron DeSantis signed a bill last week banning lab-grown meat in the state. I’m not sure I understand why people would support this bill.
Tennessee Gov. Bill Lee signed HB 1891 into law earlier this month. Starting next year, the law will require social media companies to obtain permission from known minors’ parents before those minors can create an account. The Alaska House just passed a similar bill, as did the Oklahoma House.
The United States has a wild mismatch between politicians and the citizenry in terms of age, according to a new paper from Adam Bonica and Jacob Grumbach. “Despite being among the youngest by median age of population,” the paper reads, “the U.S. has the oldest legislators of any [Organisation for Economic Co-operation and Development] member nation.”

A New Proposed AI Regulation in California

California is the undisputed champion of state data regulation. In 2018, Gov. Jerry Brown signed the California Consumer Privacy Act (CCPA), making California the first state with a comprehensive data privacy law. CCPA has since been followed by the California Privacy Rights Act ballot measure in 2020, at least nine revisions to fix definitional problems in those two laws, and a raft of other supplementary laws to regulate data in the state.

It’s no surprise, then, that 31 bills that would regulate artificial intelligence (AI) systems are currently before California’s state legislature. SB 1047—the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act—seems increasingly likely to pass. SB 1047 is authored by Democratic Sen. Scott Wiener, the darling of the pro-housing movement; has garnered the support of online writers Zvi Mowshowitz and Scott Alexander; and has at times hovered around a 50 percent passage rate in the futures markets. This bill could very well become law.

In advocating for the legislation, Sen. Wiener said that it would establish “clear, predictable, common-sense safety standards for developers of the largest and most powerful AI systems.” But there is nothing about this bill that is common sense to me. It is an extensive safety compliance regime that accords serious power to a new agency and has countless gaps. AI safety advocates have been dramatically underplaying how extensive these requirements would be, and as I’ve pointed out earlier, there has been effectively no discussion about the dubious constitutionality of the bill.

If enacted, SB 1047 would regulate the next generation of advanced models or frontier models. When large models such as ChatGPT or Meta’s LLaMA hit a certain computing threshold, they would be designated as a “covered AI model” in California and be subject to a litany of requirements, including safety assessment mandates, third-party model testing, shutdown capability, certification, and safety incident reporting.

Covered AI models under SB 1047 are partially defined by the amount of computing power needed to train the model. The industry typically measures training compute in petaFLOPS, which are 10^15 floating-point operations. OpenAI’s GPT-4 is estimated to have taken 21 billion petaFLOPS to train, while Google’s Gemini Ultra probably took 50 billion petaFLOPS. Similar to the standard set by President Joe Biden’s executive order on AI, SB 1047 would apply to models trained with greater than 10^26 floating-point operations, which amounts to 100 billion petaFLOPS. So the current frontier models are still below the covered AI model threshold, but the next generation of models, including GPT-5, will probably hit that regulation mark.
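To make the arithmetic concrete, here is a minimal sketch in Python that converts the threshold into the same units as the estimates above. It follows the text in treating one petaFLOPS as 10^15 floating-point operations. The GPT-4 and Gemini Ultra numbers are the estimates quoted above, not official figures, and the function names are my own.

```python
# Arithmetic check of the compute figures quoted in the text.
# Assumption (from the text): one "petaFLOPS" = 10^15 floating-point operations.
PETA = 10**15

# Coverage threshold described in the text: 10^26 floating-point operations.
THRESHOLD_FLOP = 10**26
THRESHOLD_PETAFLOP = THRESHOLD_FLOP // PETA  # 10^11, i.e. 100 billion

# Training-compute estimates quoted in the text, in petaFLOP.
# These are estimates, not figures confirmed by the developers.
ESTIMATES_PETAFLOP = {
    "GPT-4": 21 * 10**9,
    "Gemini Ultra": 50 * 10**9,
}


def share_of_threshold(training_petaflop: int) -> float:
    """Return training compute as a fraction of the 10^26 threshold."""
    return training_petaflop / THRESHOLD_PETAFLOP


def exceeds_threshold(training_petaflop: int) -> bool:
    """The text describes the rule as 'greater than' 10^26 operations."""
    return training_petaflop > THRESHOLD_PETAFLOP


if __name__ == "__main__":
    for name, petaflop in ESTIMATES_PETAFLOP.items():
        print(
            f"{name}: {share_of_threshold(petaflop):.0%} of the threshold, "
            f"over the compute line: {exceeds_threshold(petaflop)}"
        )
```

On these estimates, GPT-4 sits at roughly 21 percent of the compute threshold and Gemini Ultra at roughly 50 percent, so neither crosses the line on compute alone.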

But SB 1047 goes further than Biden’s executive order because it also captures any model “trained using a quantity of computing power sufficiently large that it could reasonably be expected to have similar or greater performance … using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.” Like so much else in the bill, the language here is convoluted. A plain reading of the proposed law suggests that once the threshold of 100 billion petaFLOPS is met, models that match or best those regulated models in common benchmarks like ARC or HellaSwag would also be subject to regulation, creating a cascading downward effect. Microsoft, for example, has been working on small language models that achieve remarkable performance on a variety of benchmarks, and those models are likely to get caught up in the regulatory scheme.
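The cascading effect is easier to see in a toy example. The Python sketch below encodes my reading of the definition, not the bill's actual legal test. Every model, compute figure, and benchmark score in it is hypothetical. It also assumes that "similar or greater performance" means matching or beating an already covered model on every benchmark the two share.

```python
from dataclasses import dataclass, field
from typing import Optional

# Compute threshold described in the text: 10^26 floating-point operations.
COMPUTE_THRESHOLD_FLOP = 10**26


@dataclass
class Model:
    """A hypothetical model with made-up compute and benchmark numbers."""
    name: str
    training_flop: int
    benchmark_scores: dict = field(default_factory=dict)  # benchmark -> score


def is_covered(model: Model, covered_reference: Optional[Model] = None) -> bool:
    """One reading of the two-pronged definition (an illustrative assumption).

    Prong 1: training compute greater than 10^26 operations.
    Prong 2: the model matches or beats an already covered model on every
             benchmark the two have in common.
    """
    if model.training_flop > COMPUTE_THRESHOLD_FLOP:
        return True
    if covered_reference is None:
        return False  # no covered model exists yet, so prong 2 cannot apply
    shared = model.benchmark_scores.keys() & covered_reference.benchmark_scores.keys()
    return bool(shared) and all(
        model.benchmark_scores[b] >= covered_reference.benchmark_scores[b]
        for b in shared
    )


# Hypothetical models; none of these numbers describe a real system.
big = Model("big-model", 2 * 10**26, {"ARC": 0.90, "HellaSwag": 0.90})
small = Model("small-model", 10**24, {"ARC": 0.91, "HellaSwag": 0.90})
weak = Model("weak-model", 10**24, {"ARC": 0.50, "HellaSwag": 0.60})
```

Here `is_covered(small)` is false while no covered model exists, but `is_covered(small, big)` becomes true once the larger model crosses the compute line, even though the small model used a hundredth of the threshold compute. That is the downward cascade in miniature.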

If SB 1047 is enacted, before a covered AI model is even trained, the developer of the model would be required to:

“Implement administrative, technical, and physical cybersecurity protections to prevent unauthorized access to, or misuse or unsafe modification of, the covered model, including to prevent theft, misappropriation, malicious use, or inadvertent release or escape of the model weights from the developer’s custody, that are appropriate in light of the risks associated with the covered model, including from advanced persistent threats or other sophisticated actors;”
Build in a killswitch;
Implement all covered guidance by the newly created Frontier Model Division;
Implement a detailed safety and security protocol that is certified by the company;
Conduct an annual review of the protocol “to account for any changes to the capabilities of the covered model and industry best practices and, if necessary, make modifications to the policy;”
“Refrain from initiating training of a covered model if there remains an unreasonable risk that an individual, or the covered model itself, may be able to use the hazardous capabilities of the covered model, or a derivative model based on it, to cause a critical harm;” and finally
“Implement other measures that are reasonably necessary, including in light of applicable guidance from the Frontier Model Division, National Institute of Standards and Technology, and standard-setting organizations, to prevent the development or exercise of hazardous capabilities or to manage the risks arising from them.”

Then, before the model goes public, developers of covered AI models would have to perform capability testing, implement “reasonable safeguards,” prevent people from using “a derivative model to cause a critical harm,” and refrain from deploying “a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model.”

To top it all off, developers of covered AI models would be required to implement “other measures that are reasonably necessary, including in light of applicable guidance from the Frontier Model Division, National Institute of Standards and Technology, and standard-setting organizations, to prevent the development or exercise of hazardous capabilities or to manage the risks arising from them.”

Covered AI models could get a “limited duty exemption” from all of the requirements above if they could demonstrate that the model could not be used to enable:

“A chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties;”
“At least five hundred million dollars ($500,000,000) of damage through cyberattacks on critical infrastructure via a single incident or multiple related incidents;”
“At least five hundred million dollars ($500,000,000) of damage by an artificial intelligence model that autonomously engages in conduct that would violate the Penal Code if undertaken by a human;” or
“Other threats to public safety and security that are of comparable severity to the harms described in paragraphs (A) to (C), inclusive.”

Whether exempt or not, all covered AI models would have to report incidents to the Frontier Model Division, a new regulatory body created by the law that would set safety standards and broadly administer the law. Additionally, SB 1047 would impose a “know your customer” requirement on operators of computing clusters, borrowing a regime that was designed for anti-money-laundering and counterterrorism enforcement. Intended to guard against malicious intent, this provision of the law would mandate that organizations operating computing clusters obtain identifying information from prospective customers who might make a covered AI model.

The penalties for developers who violate the legislation’s provisions are high: a fine of 10 percent of the cost of training the model for the first violation, and 30 percent of the model’s cost for every violation after that. The bill would also give California’s attorney general the ability to ask a judge to delete the model. Then, to top it all off, the bill would establish a new, publicly funded cloud computing cluster called “CalCompute.”
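As a rough illustration of how quickly those fines stack up, here is a short Python sketch of the schedule as described above: 10 percent of training cost for the first violation and 30 percent for each one after that. The $100 million training cost in the usage note is a hypothetical figure chosen for round numbers, and the function name is my own.

```python
# Fine schedule as described in the text (percent of the model's training cost).
FIRST_VIOLATION_PERCENT = 10
REPEAT_VIOLATION_PERCENT = 30


def total_fines(training_cost_dollars: int, violations: int) -> int:
    """Cumulative fines in whole dollars for a given number of violations."""
    if training_cost_dollars < 0 or violations < 0:
        raise ValueError("cost and violation count must be non-negative")
    if violations == 0:
        return 0
    first = training_cost_dollars * FIRST_VIOLATION_PERCENT // 100
    each_repeat = training_cost_dollars * REPEAT_VIOLATION_PERCENT // 100
    return first + (violations - 1) * each_repeat
```

For a hypothetical model that cost $100 million to train, `total_fines(100_000_000, 1)` comes to $10 million, and `total_fines(100_000_000, 3)` comes to $70 million, or 70 percent of the training bill after three violations.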

The constitutionality of, and gaps in, SB 1047.

As I have explained elsewhere, the discourse over AI regulation “has largely been bereft of legal analysis.” I focused primarily in that piece on the constitutionality of pausing AI development, but the analysis could just as easily be extended to most of SB 1047. AI bills run right into issues of constitutionality and the First Amendment:

As [John Villasenor of the Brookings Institution] explained it, “to the extent that a company is able to build a large dataset in a manner that avoids any copyright law or contract violations, there is a good (though untested) argument that the First Amendment confers a right to use that data to train a large AI model.”

The idea is untested because the issue has never been formally ruled on by the Supreme Court. Instead, it’s been the lower courts that have held that software is a kind of speech. All of the modern cases stem from the cryptography wars of the early to mid-1990s. Bernstein v. United States stands as the benchmark. Mathematician, cryptologist, and computer scientist Daniel Bernstein brought the cases, which contested U.S. export controls on cryptographic software. The Ninth Circuit Court recognized software code as a form of speech and struck down the law.

Junger v. Daley also suggests that software is speech. Peter Junger, a professor specializing in computer law at Case Western Reserve University, initiated the legal challenge due to concerns over his self-created encryption programs. Junger intended to publish these programs on his website but worried about potential legal risks, so he sued. Initially, a District Court judge determined that encryption software lacked the expressive content needed for First Amendment protections. On appeal, the Sixth Circuit Court was clear: “Because computer source code is an expressive means for the exchange of information and ideas about computer programming, we hold that it is protected by the First Amendment.”

SB 1047 might also conflict with federal law. As Kevin Bankston, a law professor and well-respected expert in technology law, pointed out, the “know your customer” provisions of SB 1047 “appear to violate the federal Stored Communications Act.” He’s not the only one to express such concerns, but broadly, the government has to get a subpoena or court order for that kind of information.

Beyond the serious constitutional and federal concerns, there are countless holes in the bill. The definition of covered AI models is unclear. A lot of power is granted to the Frontier Model Division to define the rules. The bill is riddled with reasonableness standards, which are notoriously tricky to define in the law. The penalties are remarkably steep, and the legality of a court-ordered deletion is dubious. Oh, and all of this is funded by fees assessed on covered AI models that will also help fund CalCompute. But the details of CalCompute aren’t at all fleshed out.

The bigger picture.

Stepping back for a second, all of this feels like too much too quickly. ChatGPT is not even two years old, and yet there are already significant and wide-ranging bills being proposed to rein it in. But that’s by design. In making the case for the bill, Wiener argued that:

[The rise of large-scale AI systems] gives us an opportunity to apply hard lessons learned over the last decade, as we’ve seen the consequences of allowing the unchecked growth of new technology without evaluating, understanding, or mitigating the risks. SB 1047 does just that, by developing responsible, appropriate guardrails around development of the biggest, most high-impact AI systems to ensure they are used to improve Californians’ lives, without compromising safety or security.

I’ve heard this argument before, and I think it is profoundly mistaken. Big Tech companies aren’t saints, but if there is a lesson to be learned from the past decade in tech policy, it is that these rules impose serious costs on the tech ecosystem, so policymakers should be careful how they match legislative solutions to clearly established harms. I wrote all about this in a previous edition of Techne on privacy laws, and it applies to AI regulation too.

If Sen. Wiener gave me the pen, I would probably scrap most of this bill. Besides, a lot of it is probably illegal under the First Amendment anyway. So I tend to agree with Dean Ball (a Dispatch contributor!) when he opined that:

SB 1047 was written, near as I can tell, to satisfy the concerns of a small group of people who believe widespread diffusion of AI constitutes an existential risk to humanity. It contains references to hypothetical models that autonomously engage in illegal activity causing tens of millions in damage and model weights that “escape” from data centers—the stuff of science fiction, codified in law.

Policymakers need to ensure that their legislative solutions are precisely tailored to address clearly defined problems rather than imposing broad requirements, because those mandates can reverberate back into the industry with distorting effects. In the coming months, I’ll dive further into this topic and explain what should be done to assuage AI doomsayers who worry about catastrophic and existential risk.

Until next week,

🚀 Will

AI Roundup

NOTE: This is a new section I’m testing out since there is so much happening in AI.

I really liked this op-ed from my former colleague Taylor Barkley, which argued that “policymakers should not reference or rely on fictional scenarios as reasons to regulate AI.”
Apparently, Meta has spent almost as much buying computing power in the last five years as the Manhattan Project cost in total.
Andrew Ng explains where large-language models (LLMs) are headed: “An LLM wrapped in an agentic workflow may produce higher-quality output than it can generate directly.”
“Recruiters are going analog to fight the AI application overload.”
A quick read on “Deterministic Quoting,” which is a “technique that ensures quotations from source material are verbatim, not hallucinated.”
Hundreds of nurses protested the implementation of sloppy AI into hospital systems.

Research and Reports

This paper from Ariell Reshef and Caitlin Slattery has got me thinking. Apparently, in the 20 years between 1970 and 1990, the employment share of legal services in the United States more than doubled, and the wages doubled as well. This was “in stark contrast to stability during 1850–1970 and in 1990–2015.” So what changed? “Using historical data, we observe a tight correlation between the employment and compensation of lawyers and the scope of, and uncertainty created by, federal regulations and legislation.” A back-of-the-envelope “calculation implies that 42% of income to lawyers and partners are in excess of what these payments would be if relative income remained at 1970 levels. This represents an excess cost of $104 billion dollars in 2015 alone.”

Smartphones were banned in middle schools across Norway, but at varying rates around the country. A new paper by Sara Abrahamsson analyzes the impact:

Grades improved, going up by 0.08 of a standard deviation (SD) for girls, while boys experienced no effect of the treatment;
Bullying fell by 0.42 of an SD for girls and 0.39 of an SD for boys;
Girls consulted less with mental health-related professionals;
“I find no effect on students’ likelihood (extensive margin) of being diagnosed or treated by specialists or GPs for a psychological symptom and diseases;”
The grade gains were highest for students from lower socioeconomic backgrounds; and
There appeared to be no correlation between the improvement in grades and the reduction in visits to mental health professionals.

Clarification, May 10, 2024: This newsletter was updated to make clear that a deficiency of beta carotene can lead to blindness, not beta carotene itself.