How Well Do Tech Companies Know You?

James C. Scott’s lessons about government can be applied to big tech platforms.

Welcome back to Techne! Apparently, there is a shop in Memphis, Tennessee, that uses 100-plus-year-old grease to fry hamburgers. I might have to make a trip down, because I am old enough to remember and yearn for the old-style McDonald’s fries. Formula 47, as the frying oil was called, was mostly beef tallow and made a much better fry, but it was phased out in the 1990s thanks largely to one man, Phil Sokolof. For the full story, you can’t do better than Malcolm Gladwell.

Notes and Quotes

  • From hospital networks to the airline industry, a major digital disruption occurred Friday when the cybersecurity company CrowdStrike updated its software. Its service—designed to protect against cyberattacks predominantly from Russia and China—ironically became the source of one of the most significant tech outages in recent memory. The incident underscores vulnerabilities in our infrastructure, where a single point of failure can have massive consequences. 
  • The slow crawl of New York’s Metropolitan Transportation Authority into the digital age turned out to be a blessing in disguise on Friday when the global tech outage hit. Unlike other transportation bodies, where all operations halted, the only portions of the MTA’s infrastructure that were hit were its newer real-time data feeds. The older location system from the 1990s continued working as usual. 
  • Netflix strengthened its position in the streaming market last quarter, gaining 8 million new subscribers and increasing its sales and profit margin forecasts. Meanwhile, Apple is adjusting its approach to content production, with lower upfront payments for shows and more readily canceling underperforming productions. 
  • NASA announced Wednesday that it will discontinue the VIPER lunar rover project. The decision comes due to rising costs, launch delays, and potential future budget overruns. Originally slated for a 2023 launch, VIPER’s readiness date had already been pushed to late 2025 because of schedule and supply chain issues. NASA plans to repurpose VIPER’s instruments and components for future moon missions. 
  • Last year, Meta introduced a “pay or consent” model in the EU, offering users either ad-free Facebook and Instagram for up to 12.99 euros a month or free access in exchange for consenting to personal data collection for targeted ads. EU regulators are now challenging Meta, citing confusing language, a pressured rollout, and insufficient time for users to consider the implications. Regulators are giving the company about six weeks to propose changes to this model or face potential fines. 
  • Sens. Joe Manchin of West Virginia and John Barrasso of Wyoming introduced the Energy Permitting Reform Act of 2024. The bipartisan bill aims to enhance U.S. energy security by streamlining the approval process for crucial energy and mineral projects, as well as by accelerating domestic energy development.
  • A group of researchers used the launch of Apple’s App Tracking Transparency feature as a natural experiment to better understand the effect of users limiting access to their data: “Leveraging data from all U.S. advertisers on Meta combined with offline administrative data, we find that reductions in digital ad effectiveness led to decreases in investments in advertising, increases in market concentration, and increases in product prices.” 
  • The CEO of Taiwan Semiconductor Manufacturing Company (TSMC) is predicting that supply won’t balance out demand for advanced chips until 2025 or 2026. 
  • The ever-insightful Brian Potter at Construction Physics thinks through what it would take to recreate Bell Labs, AT&T’s research lab that began in the 1920s and is credited with the development of radio astronomy, the transistor, the laser, and the photovoltaic cell.

James C. Scott, Legibility, and the Omnipresence of Tech

Picture via Getty Images.

Last week, political scientist James C. Scott passed away. Scott’s 1998 book, Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed, easily ranks near the top of my most influential books, right alongside Michael Billig’s Arguing and Thinking, Deirdre McCloskey’s If You’re So Smart, and Virginia Postrel’s The Future and Its Enemies.

Scott’s primary research centered on the people of Southeast Asia and how they resisted authority. This interest led him to write a series of books on resistance to government from the 1970s into the 1990s. Seeing Like a State reflects the whole of that scholarship, read through the concept Scott calls “legibility”: government efforts “to arrange the population in ways that simplified the classic state functions of taxation, conscription, and prevention of rebellion.” As he explains in the book: 

Having begun to think in these terms, I began to see legibility as a central problem in statecraft. The premodern state was, in many crucial respects, partially blind; it knew precious little about its subjects, their wealth, their landholdings and yields, their location, their very identity. It lacked anything like a detailed “map” of its terrain and its people. It lacked, for the most part, a measure, a metric, that would allow it to “translate” what it knew into a common standard necessary for a synoptic view.

Seeing Like a State was about rationalist government schemes that failed, including scientific management of forestry in Germany in the late 1800s, collective farms in the Soviet Union, the building of Brasilia, forced villagization in Tanzania, and the urban plans of Le Corbusier. But more fundamentally, the book focused on how systems need to narrow their own vision to have an impact. As he continued:

Certain forms of knowledge and control require a narrowing of vision. The great advantage of such tunnel vision is that it brings into sharp focus certain limited aspects of an otherwise far more complex and unwieldy reality. This very simplification, in turn, makes the phenomenon at the center of the field of vision more legible and hence more susceptible to careful measurement and calculation.

Scott’s notion of legibility and its obverse, illegibility, offers a framework to understand the relationship between a system and the people it is intended to act upon. 

So naturally, I have found it helpful in understanding how social media companies and other big tech platforms interact with users.

Legibility and illegibility.

Platforms and other data-centric businesses seem to possess a kind of insight into people’s lives. TikTok is “astonishingly good at revealing people’s desires even to themselves,” reported Ben Smith in the New York Times. “The TikTok Algorithm Knew My Sexuality Better Than I Did,” read one headline. Facebook users describe the platform as “powerful, perceptive, and ultimately unknowable.” In collecting, storing, and processing massive troves of data, these systems seem to be able to peer into our lives to deliver us the things we want. 

But there are serious limits to these systems. Even with all of the data being produced and collected, people’s lives are still illegible to a degree.  

To be clear, I am not denying that user data can be used to accurately predict highly sensitive personal attributes like sexual orientation, ethnicity, religious views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, gender, and, most important for this discussion, political opinions.
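To make that claim concrete, here is a minimal sketch of how such predictions work, written in Python with entirely synthetic data. Nothing below comes from Facebook or from the published studies; it only illustrates the general technique of fitting a simple linear classifier to behavioral signals, in the spirit of the Facebook-likes research.

```python
# Illustrative sketch: predicting a binary personal attribute from
# behavioral signals (e.g., which pages a user follows).
# All data below is synthetic; real studies use millions of users.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_users, n_signals = 1000, 50
# Each row is a user; each column is a behavioral signal (followed page,
# liked post, etc.), encoded as 0/1.
X = rng.integers(0, 2, size=(n_users, n_signals))

# Pretend the attribute correlates with a handful of signals plus noise.
# In real data the correlations are weaker and messier.
weights = np.zeros(n_signals)
weights[:5] = [2.0, -1.5, 1.0, 1.0, -2.0]
logits = X @ weights + rng.normal(0, 1, n_users)
y = (logits > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

The point is narrow: When the correlations exist, even a simple model over enough behavioral columns can recover a sensitive attribute. The cases that follow are about what happens when they don’t.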

But in practice, these digital systems sometimes succumb to the same fundamental fault as governments: They impose an order on a reality more complex than they can imagine. 

For example, Facebook categorizes users into groups for advertising based on what it terms “affinities.” In a 2019 Pew survey of how those categories matched people’s preferences, only 13 percent of users felt the categories described their preferences very accurately, while 46 percent found them somewhat accurate. Meanwhile, 27 percent of users felt misrepresented by their categories, and 11 percent weren’t assigned to any category at all. In other words, nearly 40 percent of users are effectively illegible to Facebook, while the nearly half who found their categories only somewhat accurate have shades of illegibility. 
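To make the arithmetic explicit, here is a back-of-the-envelope tally of those Pew shares (assuming, as the survey’s framing implies, that the categories are mutually exclusive):

```python
# Tally of the 2019 Pew shares quoted above (percentages of users).
pew = {
    "very accurate": 13,
    "somewhat accurate": 46,
    "not accurate": 27,
    "no category assigned": 11,
}
illegible = pew["not accurate"] + pew["no category assigned"]  # 38 percent
partially_legible = pew["somewhat accurate"]                   # 46 percent

print(f"Effectively illegible: {illegible}%")          # nearly 40 percent
print(f"Shades of illegibility: {partially_legible}%")  # nearly half
print(f"Unaccounted for: {100 - sum(pew.values())}%")   # 3 percent (no answer)
```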

This mismatch would eventually lead to a lawsuit. 

A complaint by the National Fair Housing Alliance alleged that Facebook’s user classification and ad-targeting tools permitted landlords, developers, and housing service providers to limit the audience for their ads based on sex, religion, familial status, and national origin, in violation of the Fair Housing Act. The Department of Justice eventually joined the suit, National Fair Housing Alliance v. Facebook, Inc., and in 2022 Facebook parent Meta agreed to change its ad services.

As a result of the settlement, Meta dropped the “Special Ad Audience” tool for its housing ads. It also got rid of thousands of ad categories, including the much-maligned “African-American multicultural affinity.” I remember looking at those ad categories in 2016, when they became public, and finding that I was tagged with an “African-American multicultural affinity.” I told a co-worker (another white guy) about it, and he looked at his ad categories. He too was tagged with that affinity. But the court case never addressed this aspect of the tool: that the category included people who most would assume didn’t belong. 

Legibility is the view from the systematizer down to the person. It highlights the boundaries of knowledge in action, one of the fundamental questions in tech policy. 

Back in 2021, I listed a bunch of examples of illegibility, like the Phoenix man who sued the city over a wrongful arrest based on faulty data obtained from Google. But I’ve since come across lots of other examples:

  • Employees at Meta had discovered by March 2021 that “our ability to detect vaccine-hesitant comments is bad in English, and basically non-existent elsewhere,” The Verge reported. Another employee chimed in, noting that comments “are almost a complete blind spot for us in terms of enforcement and transparency right now” even though they make up a “significant portion of misinfo on FB.”
  • Reporting from CNN on leaked internal Meta documents about January 6 found that, “At the time it was very difficult to know whether what we were seeing was a coordinated effort to delegitimize the election, or whether it was protected free expression by users who were afraid and confused and deserved our empathy. But hindsight being 20:20 makes it all the more important to look back to learn what we can about the growth of the election delegitimatizing movements that grew, spread conspiracy, and helped incite the Capitol insurrection.”
  • Twitter miscounted the number of users for years, overstating its daily users for three years straight and overcounting by up to 1.9 million users each quarter. 
  • Mark Serrels wrote about his experience being served ads for pants he already bought in a piece titled, “I bought ninja pants, and Instagram’s algorithms will punish me forever.”

And there are other examples too: in machine learning, in the controlled experiments that advertisers and web designers use, and in so much else.
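Those controlled experiments deserve a word. A quick power calculation, sketched below with made-up but plausible numbers, shows why even a clean A/B test often can’t see small effects: Detecting a lift in conversion rates from 1.0 percent to 1.1 percent requires on the order of 160,000 users per arm.

```python
# Back-of-the-envelope power calculation for an A/B test.
# The numbers are illustrative: a 1.0% baseline conversion rate and a
# hoped-for lift to 1.1%, tested at alpha = 0.05 with 80% power.
from statistics import NormalDist

def users_per_arm(p1: float, p2: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per arm for a two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2) + 1

print(users_per_arm(0.010, 0.011))  # roughly 163,000 users per arm
```

Effects smaller than an experiment can resolve stay invisible to it, which is its own flavor of illegibility.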

The lesson.

Seeing Like a State is about much more than legibility. But then, James C. Scott’s work is about so much more than Seeing Like a State. Still, I think the concept of legibility is an incredibly powerful way to frame our understanding of the digital world and its limits. 

I’m not alone. Venkatesh Rao has found insights in the book, Eugene Wei used the concept to understand TikTok, and Neil Chilson, the Federal Trade Commission’s former chief technologist, relied on Scott’s historical lessons to offer guidance in tech policy.

Awash in data, we commonly assume that the problem of legibility has been solved. But I’m not sure that assumption is ever safe. Instead, I think we should be comfortable, as Scott was, with fuzzy boundaries, limited knowledge, and foiled plans.

Until next week, 

🚀 Will

AI Roundup 

  • MIT-led researchers have developed a machine-learning framework to predict heat movement in semiconductors and insulators. This approach can forecast phonon dispersion relations up to 1,000 times faster than other AI methods, with similar or improved accuracy. Compared to traditional non-AI techniques, it could be a million times faster.
  • The supply of training data for AI models is shrinking as individuals and companies implement measures to protect their information from being harvested.
  • This article outlines strategies for leveraging LLMs in various coding tasks. Techne readers, you’ll find this useful! 
  • Researchers Anders Humlum and Emilie Vestergaard conducted a large-scale survey that reveals widespread adoption of ChatGPT among workers, but with significant disparities: Women use it less frequently, and higher earners are more likely to adopt it. While workers recognize ChatGPT’s productivity potential, employer restrictions and training needs often limit its use, and efforts to inform workers about its benefits have minimal impact on adoption rates.

Will Rinehart is author of Techne and a senior fellow at the American Enterprise Institute.
