Closing the gender data gap in healthcare

Now more than ever, data is at the center of engagement and decision making. Healthcare and life sciences are no exception: data is central to discussions about public health and is core to enabling continued scientific advancement. The accelerated digitalization of healthcare during the pandemic has only expanded the amount of health-related data available—and cemented the key role that it plays in care delivery, disease prediction and diagnosis, biopharma and medtech innovation, and patient outcomes.

Despite the exponential growth in data generated across the healthcare ecosystem, notable gaps remain. One such area is women’s health, in which gaps span the entire data value chain—from defining women’s health (pre–data generation) to diagnosing (data generation) to tracking at the national level (data collection) to translating data into insights at the global level through epidemiological studies (data analysis). These data disparities ultimately influence health outcomes for women globally by creating blind spots in the insights that drive research design, investment decisions, and pipeline priorities. Certain subsets of women, such as those of different backgrounds, sexual orientations, and gender identities, are more vulnerable to the gaps and negative effects of these blind spots. Furthermore, insufficient availability and analysis of women-specific health data undermine advancements in disease-state understanding and limit asset discovery opportunities across medical conditions with meaningful unmet need.

This article highlights these disparities and explores options to remedy them.

Understanding the data value chain

Health-related data has numerous sources and is analyzed for different applications. This article explores examples of data gaps in women’s health at key moments across the health data value chain (Exhibit 1).

Shortcomings exist at each stage of the data value chain in women’s health.

The following sections examine these examples in greater detail.

Pre–data generation: Defining women’s health

Good data sets begin with good definitions. Without clear definitions, the metrics to track and the conclusions to draw remain murky. However, at present there is no one definition of “women’s health.”

Historically, women’s health was largely defined as “reproductive health.” More recently, academics and clinicians have used a more expansive lens, recognizing that sex is a significant factor in the development and progression of many diseases. The National Academy on Women’s Health Medical Education describes women’s health as a field “devoted to facilitating the preservation and wellness of and prevention of illness in women and includes screening, diagnosis and management of conditions that are unique in women, are more common in women, are more serious in women, and have manifestations, risk factors or interventions that are different in women.”

For the purposes of this analysis, we define “women’s health” as encompassing both female-specific conditions, including those tied to female reproduction or female biology, and general health conditions that may affect women differently, such as cardiovascular diseases, or disproportionately, such as autoimmune diseases (Exhibit 2). It is crucial to understand sex-driven differences, as today’s care models often ignore those differences, resulting in health outcomes that can vary by sex (often to women’s disadvantage).

The definition of women’s health goes beyond reproductive health.

Data generation: Documenting women’s diagnoses in claims data

Insurance claims data, particularly in the United States, provides critical insight into the nature of health conditions and how they are treated. These data sets have inherent limitations, because physicians generate them primarily for billing. However, claims data is high quality and widely used, so the analysis for this article uses diagnosis codes on claims as a proxy for overall diagnosis rates.

According to US claims data from January 2019 through August 2022, the prevalence of women’s health conditions (as estimated from epidemiological data sources) is roughly five times the number of documented diagnoses. In other words, for every woman diagnosed with a women’s health condition, roughly four go undiagnosed. For men’s health conditions, by comparison, epidemiological prevalence is only about 1.5 times the number of documented diagnoses (Exhibit 3).

There is meaningful variation between prevalence and diagnosis of women’s health conditions.
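The step from “five times” to “four undiagnosed” is simple arithmetic, and the same step applied to the men’s figure makes the contrast concrete. The sketch below is illustrative only: it assumes prevalence and documented diagnoses are counts drawn from the same population and that every documented diagnosis is included in the prevalence estimate (a simplification of the claims analysis). The function name is hypothetical, and the inputs are the rounded ratios quoted above, not the underlying data.

```python
def undiagnosed_per_diagnosed(prevalence_to_diagnosis_ratio: float) -> float:
    """Estimated undiagnosed cases for each documented diagnosis.

    Hypothetical helper. If prevalence P equals r times documented diagnoses D,
    then undiagnosed cases are P - D = (r - 1) * D, i.e. (r - 1) per diagnosis.
    Assumes every documented diagnosis is also counted in P.
    """
    if prevalence_to_diagnosis_ratio < 1:
        raise ValueError("prevalence should not be lower than documented diagnoses")
    return prevalence_to_diagnosis_ratio - 1


# Rounded ratios quoted in the text (illustrative, not the source data).
women = undiagnosed_per_diagnosed(5.0)  # 4.0 undiagnosed cases per diagnosis
men = undiagnosed_per_diagnosed(1.5)    # 0.5, i.e. one undiagnosed case per two diagnoses
print(women, men)
```

Read this way, women’s health conditions show about four undiagnosed cases per documented diagnosis, while men’s health conditions show about one undiagnosed case for every two documented diagnoses: an eightfold difference in the undiagnosed share under these simplifying assumptions.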

While this disconnect between prevalence and diagnosis indicates an inconsistency in women’s health data across sources, such sex-based differences may also reflect structural drivers, such as biases in care delivery. Implicit biases, defined as “attitudes and beliefs about race, ethnicity, age, ability, gender, or other characteristics that operate outside our conscious awareness and can be measured only indirectly,” have been found to be associated with “diagnostic uncertainty.” Patient surveys corroborate this phenomenon: 20 percent of women say a healthcare provider has ignored or dismissed their symptoms, compared with 14 percent of men.

Furthermore, we found that the sex of the diagnosing physician appears to correlate with the likelihood of a diagnosis. In the claims data we analyzed, women appear more likely to be diagnosed with a female-specific condition, including menopause, polycystic ovary syndrome (PCOS), and endometriosis, if their physician is a woman: women represent approximately 40 percent of primary care physicians (PCPs) but nearly 50 percent of PCPs documenting diagnoses of these key women’s health conditions (Exhibit 4).

Female primary care physicians are disproportionately the ones diagnosing women’s health conditions.
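One way to express “disproportionately” as a single number is a representation ratio: a group’s share of diagnosing physicians divided by its share of the physician workforce. The sketch below is a hypothetical illustration, not the method used in the claims analysis; it treats the rounded 40 percent and “nearly 50 percent” figures quoted above as point values.

```python
def representation_ratio(share_of_diagnosers: float, share_of_workforce: float) -> float:
    """Over- or under-representation of a group among diagnosing physicians.

    Hypothetical helper. 1.0 means the group documents diagnoses in proportion
    to its share of the workforce; values above 1.0 indicate over-representation.
    """
    if not 0 < share_of_workforce <= 1:
        raise ValueError("workforce share must be in (0, 1]")
    return share_of_diagnosers / share_of_workforce


# Approximate shares quoted in the text, treated as point values (an assumption).
female_pcps = representation_ratio(share_of_diagnosers=0.50, share_of_workforce=0.40)
male_pcps = representation_ratio(share_of_diagnosers=0.50, share_of_workforce=0.60)
print(round(female_pcps, 2), round(male_pcps, 2))
```

On these rounded figures, female PCPs document these diagnoses at roughly 1.25 times their workforce share, and male PCPs at roughly 0.83 times. Because the diagnosing share is “nearly” rather than exactly 50 percent, the true ratios would sit slightly closer to 1.0.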

As Caroline Criado Perez writes in her book Invisible Women: Data Bias in a World Designed for Men, “It’s not always easy to convince someone a need exists if they don’t have that need themselves.” Given that claims data informs life sciences investment decisions, “blind spots” in these data sets may contribute to a perception of less unmet need and less need for continued innovation.

These biases in care delivery are also likely reinforced during medical training. Out of 112 internal-medicine residency programs reviewed in a 2016 study, approximately 25 percent did not include menopause in the core curriculum, 30 percent did not include contraception, nearly 40 percent did not include PCOS, and more than 70 percent did not include infertility. These educational gaps are present even in programs dedicated to women’s health: a survey of US obstetrics and gynecology residents found that fewer than two in ten receive formal training in menopause medicine, but seven in ten would like to receive it.

The underdiagnosis of women is all the more striking given data showing that women are, on average, more likely to seek out care. A 2013 Kaiser survey found that 81 percent of women and 68 percent of men identified a clinician they see for routine care, and women were more likely than men to have seen a provider in the past two years (91 percent versus 75 percent).

Data collection: Reporting sex-disaggregated health data at the national level

Globally, the quality and quantity of women’s health data collection are uneven. The World Bank tracks country-level reporting rates of gender-specific healthcare indicators and finds significant gaps. For example, in 2020, fewer than 10 percent of countries reported data on female access to contraception. This lack of visibility undermines the ability to understand drivers of maternal and child health, maternal mortality, and sexually transmitted infections (STIs) such as HIV/AIDS. And while the availability of data on contraceptive use has improved, many countries still provide data only for married women. Fewer than 5 percent of countries reported 2020 data on menstrual material usage, a critical indicator of public health, gender equity, and human rights.

The availability of sex-disaggregated health data also differs across countries. During the COVID-19 pandemic, for instance, 76 percent of high-income countries reported COVID-19 case data by sex, compared with 37 percent of low-income countries. Sex-disaggregated data provides important insights into the biological mechanisms and socioeconomic risk factors that shape disease risk and prevention, and it may inform the development of more effective biopharma (and other) interventions tailored by sex. Without it, the picture of global women’s health remains incomplete, particularly in lower-income countries.

Data analysis: Improving the metrics of epidemiological studies

In addition to data published by individual nations, the Global Burden of Disease (GBD) study is often used by clinicians, payers, researchers, analysts, and policy makers to understand the evolving global healthcare landscape. The GBD is the world’s most comprehensive observational epidemiological study, spanning 204 countries, 369 diseases and injuries, and 87 risk factors.

However, traditional metrics tracked in the GBD may not capture the full scale of need in women’s health. Using the GBD tool of the Institute for Health Metrics and Evaluation (IHME), we investigated the prevalence and burden of disease associated with select women’s health conditions (Exhibit 5). Of the conditions defined in this analysis, nearly 60 percent of prevalence was attributed to female-specific conditions, such as those related to maternal health, contraception, and menopause. However, these same female-specific conditions represent less than 25 percent of the disability-adjusted life years associated with women’s health conditions. In other words, traditional health metrics do not accurately reflect the widespread suffering associated with female-specific conditions.

Traditional health metrics do not accurately reflect the prevalence and burden of women’s health conditions.
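The mismatch can be summarized by dividing each group’s share of disability-adjusted life years by its share of prevalence. The sketch below is a hypothetical illustration under two stated assumptions: the rounded figures (“nearly 60 percent,” “less than 25 percent”) are treated as point values, and the remaining shares (40 percent of prevalence, 75 percent of disability-adjusted life years) are assigned to the other women’s health conditions in the analysis, as if the two groups partitioned it exactly.

```python
def burden_to_prevalence(daly_share: float, prevalence_share: float) -> float:
    """Share of measured burden relative to share of cases for a condition group.

    Hypothetical helper. 1.0 means a group's share of disability-adjusted life
    years matches its share of prevalence; below 1.0 means each case contributes
    less measured burden than the average case in the analysis.
    """
    if prevalence_share <= 0:
        raise ValueError("prevalence share must be positive")
    return daly_share / prevalence_share


# Rounded shares from the text treated as point values; the second group is the
# assumed complement of the first (not a figure reported in the source).
shares = {
    "female-specific conditions": {"prevalence": 0.60, "dalys": 0.25},
    "other women's health conditions": {"prevalence": 0.40, "dalys": 0.75},
}

for group, s in shares.items():
    ratio = burden_to_prevalence(s["dalys"], s["prevalence"])
    print(f"{group}: {ratio:.2f}")
```

Under these assumptions, female-specific conditions register about 0.42 units of measured burden per unit of prevalence, compared with about 1.9 for the other conditions, which is the gap the examples below suggest the metric fails to capture.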

A gap between prevalence and disability-adjusted life years is not surprising in itself. However, the low number of disability-adjusted life years associated with female-specific conditions (some of which, such as menopause, are not tracked at all in the GBD) appears to meaningfully understate the disruption and suffering these conditions cause. For example:

Menopause. Approximately 80 percent of women indicate that menopause interferes with their lives, and roughly one-third of these women also experience depression. Furthermore, with an estimated $810 billion in healthcare spending and productivity losses, menopause places a significant burden on the global economy.
Infertility. About 40 percent of women with infertility are reported to experience depression, and another 35 percent are reported to experience anxiety. These rates are an estimated 1.5 to 2.0 times higher in low- and middle-income countries than in high-income countries.
Endometriosis. Women with endometriosis have, on average, about three times higher healthcare costs. The time to diagnosis is estimated to be more than seven years, with an even longer delay for Black women. Furthermore, approximately half of women with endometriosis reported earning less money because of their symptoms.
Dysmenorrhea. Period pain can disrupt women’s lives, with about 40 percent of young women reporting negative effects on classroom performance and about 20 percent reporting school absences due to pain. This impact is more pronounced in low- and middle-income countries.

Steps to close the data gaps in women’s health

Data is foundational to our understanding of disease states and is a crucial catalyst for continued life-sciences innovation. A 2023 report from the Enterprise Strategy Group and Splunk found that leaders in key data-maturity metrics also excelled in product innovation; these leaders report a higher number of product launches per year and a higher share of revenue accounted for by new innovations.

In women’s health, data gaps coincide with lower rates of clinical development: excluding oncology, just 1 percent of biopharma pipeline assets and 2 percent of medtech novel approvals are directed at women’s health conditions. (Including oncology, the rates increase to 5 and 4 percent, respectively.) Without data that accurately documents the extent and nature of these conditions, there is a limited fact base to fuel innovation, in women’s health specifically and in life sciences overall. But opportunities to close these gaps are plentiful, particularly for industry participants with a stake in improving women’s health outcomes.

Acknowledge the importance of sex in the definition and treatment of disease. The data gap is large and will take many hands and sustained effort to bridge. Today’s clinical trials, diagnoses, and treatment plans are built on existing data sets, only some of which include thorough analysis of robust sex-disaggregated data. Fortunately, working to close the data gap in women’s health could create opportunities throughout the data value chain for life-sciences organizations, providers, payers, academics, and investors alike. The starting point is building a widely acknowledged definition of women’s health that includes all relevant conditions (not just those related to reproductive health) and highlights the biological relevance of sex to health outcomes.

Reinforce incentives at every step of the women’s health data value chain. At the regional and national levels, clinicians, academics, and researchers could benefit from updated guidance on the impact of sex-based differences on clinical outcomes, which demographic data to collect, and mechanisms for the collection of women’s health data. Improving visibility into women’s unmet medical needs will also require new mechanisms, incentives, and infrastructure to facilitate the generation of sex-disaggregated data. Some changes, such as those enhancing the understanding of sex-based differences in clinical outcomes, could have a substantial impact by creating a stronger foundation for women’s health data collection and analysis. Incumbent health and life sciences companies could also take the lead in working with companies that have unanalyzed sex-disaggregated data.

Improve the generation and use of data in care delivery. Clinician training in sex-specific biology and in the implicit biases identified to date, with an emphasis on narrowing the gap between the prevalence of conditions and the volume of diagnoses, could help improve both health outcomes for women and a critical source of data generation. Once an expanded set of sex-disaggregated data is collected and assessed, that data can also guide future training on sex-based differences in healthcare research, clinical trials, pharmaceutical development, treatment, and more. Organizations that use clinical decision support systems to assist in clinical training may consider how a more robust set of sex-disaggregated data could improve their models. The goal is to equip everyone, from researchers to providers to payers, with comprehensive training and tools built on best-in-class data practices.

Fund new ventures related to women’s health data. Investors could also seek out opportunities to fund new ventures focused on generating women’s health data and understanding the impact of sex-driven differences on health outcomes. The successful models that led to the proliferation of precision medicine in other therapeutic areas, such as oncology-focused electronic medical records companies, could guide these new ventures. Investors, entrepreneurs, and established life-sciences incumbents each have a role to play in investing in this white space.

Rethink traditional epidemiological metrics. Finally, stakeholders can consider how health metrics and studies are used, as well as the implications of those choices. Both key health outcome metrics and population-level analyses (for example, global epidemiological studies) should aim to accurately reflect the patient experience of different subpopulations. Revisiting and expanding these metrics could be a joint effort among governments, academics, clinicians, and public-health experts.

Our collective understanding of disease burden drives not only health outcomes but also investment decisions, and yet that understanding does not benefit from a comprehensive data set on women. The gaps are many, and women in different demographic groups feel the effects to varying degrees, but a new commitment to raising the bar in women’s health data could unlock the next generation of life-sciences innovations and care delivery for women globally. Moreover, taking care of women is taking care of communities. Everyone will benefit from a world with a comprehensive definition of women’s health, diagnosis rates more in line with condition prevalence, nationally reported sex-disaggregated data, and global epidemiological studies that consider the full breadth of women’s health experiences. The future of innovation in women’s health is only as strong as the data value chain that supports it. It’s time to close these gaps.