White knight captures black queen. Checkmate.
A human being can immediately recognize this as a move from a chess game. But to a smart-yet-dumb social media algorithm, it can seem rather sinister: a call to violence, the use of racialized adjectives, and an invocation of what must surely be the White Knights of the Ku Klux Klan!
This is not merely hypothetical. In early 2021, YouTube’s algorithm flagged a number of videos on popular chess channels for hate speech, including some featuring world grandmasters. A subsequent experiment by an artificial intelligence researcher found that more than 80 percent of chess videos or comments that had been flagged for hate speech were false positives. In trying to filter out racist speech, the algorithm had swept up at least four times as much innocent speech in its net.
This case might sound esoteric, but the episode is a warning of what happens when platforms ramp up algorithmic content moderation. A similar false-positive problem is coming for more substantive domains, including politics and culture, potentially chilling political dissent and squelching social movement formation.
Algorithmic content moderation is being paired with a seductive disciplinary method: the “shadowban.” Platforms use algorithms to identify objectionable speech, but, with a shadowban, rather than suspending the account or removing the content, the platform simply shunts it off to the side. The creator will still be able to see his or her content, but it won’t appear in search results and could be excluded from users’ organic feeds.
Platforms often deny engaging in shadowbanning, but the incentives to shadowban are significant. It can be hard for a creator to prove that they were shadowbanned: The content is still on their page and receiving at least some views from existing friends.
Furthermore, there is no appeal process for shadowbanning, which reduces any negative public relations exposure that comes when a platform takes down viral content or popular accounts.
Unlike other platforms, TikTok openly shadowbans, saying it “may reduce discoverability” for objectionable content by “redirecting search results” and “making videos ineligible for recommendation in the For You feed.” In other words, a shadowbanned TikTok video may still be viewed by one’s existing followers but remain quarantined from the discovery mechanism that normally accounts for the overwhelming majority of views.
Shadowbanning is not a new concept. The term dates to the early days of the World Wide Web, when forum moderators would ban users without telling them, leaving them angrily typing into the void. But it is increasingly problematic given the rise of a particular kind of social media platform, of which TikTok is the most significant example.
TikTok is fundamentally different from other platforms. Although it presents itself as a social media platform, it is really a discovery algorithm, a refinement of older, now-defunct discovery engines like StumbleUpon that once trawled the internet to generate random interesting content for users.
While Instagram serves up avocado toast pictures from your favorite influencer and Facebook shows you the latest baby pictures from your niece, TikTok foregrounds encounters with complete strangers. Your “For You” page—where most users spend most of their time—will serve up a dazzling array of content, from a beatbox remix of an Indian folk song to a video of a tiny dog that woke up on the wrong side of the bed. By assessing your attention and engagement, TikTok’s algorithm tailors the feed to your specific interests.
TikTok claims more than 1 billion users worldwide, including two-thirds of Generation Z in America. But more significant than its sheer size is its expanding function as an engine of cultural, social, and political discovery.
We are in the middle of a sweeping reset in cultural production. Record companies used to send representatives to dive bars and honky tonks looking to sign promising musical acts, incubate them, and turn one in a dozen into a profitable hit for the label.
Today, recording labels avidly watch platforms like SoundCloud and TikTok, hoping to be the first to sign the next hot rapper or to find the dance song of the summer. The TikTok-to-airplay pipeline includes artists like GAYLE (“abcdefu”), Tai Verdes (“Stuck in the Middle”), and even the singing Scottish postman responsible for 2021’s brief sea shanty craze (“Wellerman”).
Similarly, YouTube’s discovery function allowed top content creator MrBeast, who has more than 100 million subscribers watching his wacky stunts and entering his lucrative contests, to skip moving to Hollywood, stay in his hometown in North Carolina, and still draw more eyeballs than most blockbuster movie directors.
The power of social media discovery is coming for the political sphere. Although politics is an older person’s game and TikTok is dominated by youth, enterprising politicians are already incorporating TikTok into their campaign strategies.
Ken Russell, who ran in the 2022 Democratic primary for Florida’s 27th Congressional District, went viral on TikTok with a play on a thirst trap video (or “vote-trap,” as he styled it) that reached 7 million viewers. Russell’s bid for the nomination fell short, but the video enabled someone with a thin political resume to mount a meaningful challenge.
Young politicians like Alexandria Ocasio-Cortez and Madison Cawthorn have ridden their successful Instagram accounts all the way to Congress and into the national conversation in a way few first-term representatives managed before them.
Politicians co-opting popular media is not an entirely novel phenomenon. Ronald Reagan’s telegenic presence is legendary, Obama’s re-election campaign ran a Twitter Q&A in 2012, and, well, we probably don’t need to say much about Donald Trump and Twitter. Yet previous mass media favored those with pre-existing public recognition. TikTok’s discovery function throws wide open the door of virality—and thus the potential for political influence—to anybody.
In a broader social context, TikTok’s primary virtue is that it allows ordinary people, unknown political candidates, and local activists to bypass traditional gatekeepers and potentially reach a national audience. But shadowbanning disciplines creators by teaching them to be careful what they say in order to increase their potential for a viral video. The algorithm rewards political speech of the lowest common denominator and disincentivizes frank conversations about topics like the war on drugs, sexuality, and racial discrimination. For major creators, the risk of even a temporary shadowban can have a chilling effect on the topics they choose to cover and the words they say.
For example, black creators have alleged that TikTok routinely shadowbans their content, especially videos discussing systemic racism, the post-George Floyd protests, and the Black Lives Matter movement. Ziggi Tyler, who accuses TikTok of a “consistent undertone of anti-Blackness,” ran an experiment using the self-description function that showed any entry using the word “black” was blocked while white supremacist labels were accepted.
TikTok acknowledged that the fault lay with automated protections “which flag phrases typically associated with hate speech” and which had flagged Tyler’s entries by accident. The algorithm could not adequately tell the difference between hate speech and hated speech.
Contrast that with Twitter, where the hashtag “#BLM” played a vital role in launching a mass social movement with significant political consequences. Think of how different recent American politics would have been if Twitter’s algorithm had accidentally flagged the first uses of the phrase “black lives matter” as hate speech. TikTok’s shadowbanning process could have already forestalled social movement formation and we would have no way of knowing.
To be fair, content moderation at scale is hard. So many millions of minutes of video are uploaded each day that moderation must be done automatically, without human intervention in more than a tiny handful of cases. TikTok tries to correct for its false positive problem by making exceptions for otherwise objectionable content that is “educational, scientific, artistic … counterspeech, or content that otherwise enables individual expression on topics of social importance.” Yet that definition is so broad that it is functionally meaningless; even blatant hate speech from a neo-Nazi would technically qualify as “individual expression on topics of social importance.”
But quibbles about exception criteria aside, the fundamental problem is that shadowbanning is automated—instantly flagged by the algorithm—while exceptions are manual, requiring scarce human input. Regardless of the platform’s intentions, that is a chilling, default thumb on the content moderation scale.
Some have called for government regulation in response—the idea being that mandated algorithmic transparency via routine government audits will solve the problem. However, doing so will only further politicize the content moderation process and create additional incentives for partisan manipulation.
This is not a hypothetical warning. Facebook CEO Mark Zuckerberg disclosed in an interview that Facebook had restricted sharing of a New York Post exclusive about Hunter Biden’s laptop during the 2020 election season because the FBI had warned him via back channels that it might be Russian propaganda. Regardless of how one feels about the particular political uses of the Biden laptop story, it is alarming to contemplate the ways government pressure on content moderation could be abused to punish disfavored speech.
Content moderation appears caught between a whirlpool of excessive moderation on one side—sucking down false positives and chilling speech—and the shoals of extremists on the other, who spread hate and drive away ordinary users. There seems to be no simple solution to the dilemma. Steer away from one risk and you tack closer to the other.
That said, there are practical, light-touch countermeasures that platforms can adopt to combat misinformation and hate speech. For example, a recent study demonstrated that allowing social media users to rate the accuracy of an article before sharing it decreased the proliferation of fake news, increased the ratio of real news, and did so without diminishing user engagement.
Those who use hate speech are a small, albeit vocal, part of any online community; one study found that fewer than 1 percent of tweets contain hate speech. Since most social media users engage in good faith, platforms can develop ways to crowdsource hate speech detection and prevention with escalating thresholds for content removal or user suspension based on feedback.
Crowdsourcing could more effectively police hate speech and do so without generating an excessive number of false positives. Platforms could go a step further, taking a page from sites like Wikipedia by actively building a culture of user empowerment in the moderation process.
This is the third way in the content moderation debates, a path between the false choices commonly offered up. We do not have to choose between an overly strict, government-regulated algorithmic moderation model that spits out false positives according to politicized standards and a moderation free-for-all featuring an online community filled with fascists and trolls.
But framing shadowbanning primarily as a platform responsibility is problematic because it absolves us, as users, of our role in fueling the problem. It is like blaming a mirror for showing us our reflection.
One side in the content moderation debate is prone to magical thinking, wishing that waving a regulatory wand at platforms will convince them to simply program away the problems. How hard can it really be to write a perfect algorithm?! The other side has its head in the sand, hiding behind the rhetoric of free speech while ignoring any responsibility to police its own content or inculcate civic virtue among its users.
The answer to the problem of online hate speech is the same as the answer to the problem of offline hate speech. For a century, the standard for free speech has hewed to Supreme Court Justice Louis Brandeis’s comment in 1927 that “the remedy to be applied” to wrong and evil speech “is more speech, not enforced silence.”
It is vital that we remember this counterspeech principle when discussing the unintended effects of shadowbanning. If combating hate speech requires more speech, then flagging false positives and enacting shadowbans will have the opposite effect, leaving less counterspeech on major platforms even as hate speech continues to fester on minor ones. Overreliance on algorithmic content moderation as a solution to offensive speech can backfire.
Finding the proper balance between excessive and inadequate content moderation will require us to have thicker skins, to possess a greater toleration of discomfort for the sake of ultimate communal benefit. The word “toleration” might have positive connotations today, but the Latin root evokes bearing a burden or enduring a beating. Toleration is the pain we endure to avoid creating a system that impedes our own speech rights, and it is what preserves our ability to talk back.
Extending a robust principle of toleration to online platforms does not require passive acceptance of all content. Counterspeech, not silence, is the goal. Rather than naively looking for a perfect algorithmic moderator or relying on a partisanized bureaucrat, we endure the rhetorical blow of hateful speech in order to ensure the chance to respond with a double measure of truth, compassion, and decency.
Endurance is the price we pay in order to be able to effectively debate, counter, embarrass, upstage, and excoriate offensive speech. Were Louis Brandeis alive and on social media today, I would expect him to be the first to comment, “Delete your account.”