Systematic Reviews of Gender Transition Treatments, Explained

The American Academy of Pediatrics is playing catch-up with European health officials.

Testosterone treatment. (Photo by Rory Doyle for The Washington Post via Getty Images)

Published August 18, 2023 • Updated February 25, 2025

On Wednesday, North Carolina became the 22nd state to restrict gender transition treatments for minors after both Republican-controlled legislative chambers voted to override the governor’s veto of a bill prohibiting “surgical gender transition procedures on minors and prescribing, providing, or dispensing puberty-blocking drugs or cross-sex hormones to minors.”

The near-party-line override of Democratic Gov. Roy Cooper’s veto comes two weeks after the board of the American Academy of Pediatrics (AAP) voted to authorize a systematic review of such treatments, while also reaffirming a 2018 policy statement that calls for health insurance plans to cover “medical, psychological, and, when indicated, surgical gender-affirming interventions.”

The AAP’s review “reflects the board’s concerns about restrictions to access to health care with bans on gender-affirming care in more than 20 states,” the organization announced.

But health authorities in Europe have already conducted systematic reviews over the last five years—which have prompted many countries to change their policies and curtail access to such treatments except in rare cases. Could a similar scenario play out in the U.S.?

How do systematic reviews work?

Most scientific studies contain a basic “literature review” section surveying past studies on related topics, but more thorough systematic reviews aim to identify virtually all relevant studies before homing in on the ones with the highest-quality evidence. They allow researchers (and in the AAP’s case, practicing physicians) to get a better, more comprehensive picture of the research that already exists.

Researchers working on a systematic review follow a well-defined process, identifying ahead of time what population(s), interventions, comparisons, and outcomes (PICO) they are interested in. For example, one review of gender transition treatments commissioned by the National Health Service in England considered the following question: “In children and adolescents with gender dysphoria, what is the clinical effectiveness of treatment with [puberty blockers] compared with one or a combination of psychological support, social transitioning to the desired gender or no intervention?” For clinical effectiveness, the researchers looked at a range of outcomes, including “the impact on gender dysphoria, mental health and quality of life.”

After defining the PICO, researchers then scour peer-reviewed journals and online databases for potentially relevant studies and narrow down from there. For example, an English-language review of hormone treatments conducted by Swedish researchers started by identifying nearly 10,000 papers—but ultimately only 24 were relevant according to the PICO. Finally, important data are extracted from the relevant studies so they can be more easily compared and assessed.

The “GRADE” system most researchers follow in rating evidence is “a very structured, well-worked-out approach,” Dr. Gordon Guyatt, a professor of health research methods at McMaster University in Ontario and a leader in the field of evidence-based medicine, tells The Dispatch. But “inevitably, there is judgment involved.”

What have the reviews in Europe found?

The most notable systematic reviews were commissioned by government agencies in Sweden and the United Kingdom that wanted evidence-based clinical guidelines amid an unprecedented increase in adolescents seeking care for gender dysphoria. At the Gender Identity Development Service (GIDS), England’s main clinic for such care, total referrals increased by more than 1,700 percent from 2012 to 2022.

Gender dysphoria is defined by the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) as “a marked incongruence between one’s experienced/expressed gender and assigned gender,” lasting at least six months and manifesting itself in multiple specific symptoms. Adolescents and adults need to meet at least two of six listed symptoms, and children need to meet at least six of eight, including “a strong desire to be of the other gender or an insistence that one is the other gender.” Other symptoms on the children’s list are “a strong preference for the toys, games or activities stereotypically used or engaged in by the other gender” and “a strong dislike of one’s sexual anatomy.”

A “Dutch protocol” for treating adolescent gender dysphoria with hormonal interventions was developed in the 1990s, based on research suggesting that hormone treatments and eventual surgeries could reduce the symptoms of gender dysphoria and improve mental and physical health. But that approach was developed before the recent surge in referrals, which has been driven primarily by adolescents born female—a notable departure from years of precedent in which people born male accounted for the majority of gender dysphoria cases. By 2017, female referrals made up 70 percent of the total at GIDS, with the number of teenage girls presenting with gender dysphoria rising by 5,000 percent in seven years, according to a 2022 report from The Times of London.

The changing circumstances surrounding diagnoses of gender dysphoria have led to increased scrutiny. The English systematic reviews, published in 2021, came in the context of a broader independent review of the country’s “gender identity services” for young people. Among its interim findings was evidence that the Dutch protocol was apparently not followed at the GIDS clinic, with “children and young people with neurodiversity and/or complex mental health problems” given hormone intervention instead of “therapeutic support.”

The Swedish review, published earlier in 2023, looked at the effects of hormone treatments on children and teens with gender dysphoria. The treatment under study varied: some studies examined puberty blockers alone, some cross-sex hormones alone, some a combination of the two, and some also included gender-reassignment surgery. The upshot was clear: High-quality information is lacking, and puberty blockers “should be considered experimental treatment of individual cases rather than standard procedure.”

The results of the two English reviews—one for puberty blockers and one for cross-sex hormones—were similar. They concluded that while the treatments may help improve gender dysphoria in some people, “any potential benefits of gender-affirming hormones must be weighed against the largely unknown long-term safety profile of these treatments in children and adolescents with gender dysphoria.”

Now, transition treatments for minors in England and Sweden are limited to those enrolled in clinical trials. Finland, Norway, and France are being similarly cautious.

What will the AAP’s systematic review accomplish?

Many U.S. lawmakers might still oppose gender transition treatments on moral grounds even if the evidence suggested that they were effective in treating gender dysphoria. And there’s always a chance the weight of the evidence could shift as more research is done.

But the research to date suggests that we don’t know much about the long-term effects of hormone treatments in gender-dysphoric minors, something the AAP hasn’t referenced in its public-facing communications or its treatment recommendations.

“It seems quite problematic to me to say ‘we’ll make our recommendations first, and then we’ll look at the evidence later,’” Guyatt says. He attributes the organization’s “slightly bizarre” decision to the polarization that is “epidemic” in the United States.

“They’re responding to the political situation around gender-affirming therapy,” he says. “Those extremes are not evidence-based.”

Instead, he hopes policymakers consider the evidence “in the context of values and preferences.”

“If you place an extremely high value on the autonomy of the young folks and a much lower value on harm reduction, you would take one approach, and if you place a low value on the autonomy of the young folks and a high value on harm minimization with uncertain benefit, then you would take another approach,” he says. “But to dismiss either of those—which strike me as both legitimate values that need to be considered—to dismiss either one of them, to devalue them completely, would be foolish.”