Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms
Cardiology
The American Heart Association (AHA) Get with the Guidelines–Heart Failure Risk Score predicts the risk of death among patients admitted to the hospital with heart failure.9 It assigns three additional points to any patient identified as “nonblack,” thereby categorizing all black patients as being at lower risk. The AHA does not provide a rationale for this adjustment. Clinicians are advised to use this risk score to guide decisions about referral to cardiology and allocation of health care resources. Since “black” is equated with lower risk, following the guidelines could direct care away from black patients. A 2019 study found that race may influence decisions in heart-failure management, with measurable consequences: black and Latinx patients who presented to a Boston emergency department with heart failure were less likely than white patients to be admitted to the cardiology service.24
Cardiac surgeons also consider race. The Society of Thoracic Surgeons produces elaborate calculators to estimate the risk of death and other complications during surgery.10 The calculators include race and ethnicity because of observed differences in surgical outcomes among racial and ethnic groups; the authors acknowledge that the mechanism underlying these differences is not known. An isolated coronary artery bypass in a low-risk white patient carries an estimated risk of death of 0.492%. Changing the race to “black/African American” increases the estimated risk by nearly 20% in relative terms, to 0.586%. Changing to any other race or ethnicity does not increase the estimated risk of death as compared with a white patient, but it does change the estimated risk of renal failure, stroke, or prolonged ventilation. When used preoperatively to assess risk, these calculations could steer minority patients, deemed to be at higher risk, away from surgery.
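The “nearly 20%” figure is a relative change. Worked out from the two estimates quoted above, it corresponds to a small absolute change:

$$
\frac{0.586\% - 0.492\%}{0.492\%} = \frac{0.094}{0.492} \approx 0.191
$$

That is a relative increase of about 19%. The absolute difference is 0.094 percentage points, or roughly 1 additional predicted death per 1,060 comparable operations.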
Nephrology
Since it is cumbersome to measure kidney function directly, researchers have developed equations that estimate the glomerular filtration rate (eGFR) from a readily available measurement, the serum creatinine level. These algorithms result in higher reported eGFR values (which suggest better kidney function) for anyone identified as black.11,25 The algorithm developers justified these adjustments with evidence of higher average serum creatinine concentrations among black people than among white people. Explanations that have been given for this finding include the notion that black people release more creatinine into their blood at baseline, in part because they are reportedly more muscular.11,25 Analyses have cast doubt on this claim,26 but the “race-corrected” eGFR remains the standard. Proponents of the equations have acknowledged that race adjustment “is problematic because race is a social rather than a biological construct” but warn that ending race adjustment of eGFR might lead to overdiagnosis and overtreatment of black patients.27 Conversely, race adjustments that yield higher estimates of kidney function in black patients might delay their referral for specialist care or transplantation and lead to worse outcomes, even though black people already have higher rates of end-stage kidney disease and death due to kidney failure than the overall population.25 As long as uncertainty persists about the cause of racial differences in serum creatinine levels, we should favor practices that may alleviate health inequities over those that may exacerbate them.
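To make the mechanism concrete, here is a minimal sketch of one widely used race-adjusted formula, the 2009 CKD-EPI creatinine equation, written from its published form as we understand it. The text does not reproduce any equation, so the choice of this one is our assumption, and the coefficients should be checked against the original publication. The race term is a fixed multiplier: for identical creatinine, age, and sex, a patient recorded as black receives an eGFR about 16% higher.

```python
def ckd_epi_2009_egfr(scr_mg_dl: float, age_years: float, female: bool, black: bool) -> float:
    """Estimate GFR (mL/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Illustrative sketch only; coefficients are reproduced from memory and
    must be verified against the original publication before any use.
    """
    kappa = 0.7 if female else 0.9         # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411   # exponent applied below the knot
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient: a flat multiplier on the estimate
    return egfr


if __name__ == "__main__":
    # Same laboratory value, same patient; only the race field differs.
    for black in (False, True):
        value = ckd_epi_2009_egfr(1.3, 60, female=False, black=black)
        print(f"black={black}: eGFR = {value:.0f} mL/min/1.73 m^2")
```

With these inputs (a 60-year-old man with a serum creatinine of 1.3 mg/dL), the estimate is about 59 without the race term and about 69 with it. These values fall on opposite sides of 60 mL/min/1.73 m², the threshold below which GFR alone meets the definition of chronic kidney disease.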
Similar adjustment practices affect kidney transplantation. The Kidney Donor Risk Index (KDRI), implemented by the national Kidney Allocation System in 2014, uses donor characteristics, including race, to predict the risk that a kidney graft will fail.12 The race adjustment is based on an empirical finding that black donors’ kidneys perform worse than nonblack donors’ kidneys, regardless of the recipient’s race.28 The developers of the KDRI do not provide possible explanations for this difference.12 If the potential donor is identified as black, the KDRI returns a higher risk of graft failure, marking that donor as less suitable. Meanwhile, black patients in the United States still have longer wait times for kidney transplants than nonblack patients.29 Since black patients are more likely to receive kidneys from black donors, anything that reduces the likelihood of donation from black people could contribute to the wait-time disparity.29 Use of the KDRI may do just that. Mindful of this limitation of the KDRI, some observers have proposed replacing “the vagaries associated with inclusion of a variable termed ‘race’” with a more specific, ancestry-associated risk factor, such as APOL1 genotype.28
Obstetrics
The Vaginal Birth after Cesarean (VBAC) algorithm predicts the likelihood that a trial of labor will succeed in someone who has previously undergone cesarean section. It assigns a lower likelihood of success to anyone identified as African American or Hispanic.13 The study used to produce the algorithm found that other variables, such as marital status and insurance type, also correlated with VBAC success.14 Those variables, however, were not incorporated into the algorithm. The health benefits of successful vaginal deliveries are well known, including lower rates of surgical complications, faster recovery time, and fewer complications during subsequent pregnancies. Nonwhite U.S. women continue to have higher rates of cesarean section than white U.S. women. Use of a calculator that lowers the estimate of VBAC success for people of color could exacerbate these disparities. This dynamic is particularly troubling because black people already have higher rates of maternal mortality.30
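As we understand it, the VBAC calculator was built from a logistic regression model. In such a model, each race or ethnicity indicator shifts the log-odds of success by a fixed amount. The sketch below assumes that structure and uses entirely hypothetical coefficients, not the published calculator's values. It shows how each race term lowers the predicted probability, and how two terms would stack if a patient were recorded as both African American and Hispanic. The function name and the `baseline_logit` input are illustrative inventions.

```python
import math

# HYPOTHETICAL log-odds shifts, chosen only for illustration.
# They are NOT the coefficients of the published VBAC calculator.
RACE_TERMS = {"african_american": -0.7, "hispanic": -0.7}


def predicted_vbac_success(baseline_logit: float, race_flags: set) -> float:
    """Return P(success) from a logistic model with additive race indicators.

    baseline_logit is a hypothetical stand-in for every non-race term in the
    model (age, body-mass index, obstetric history, and so on).
    """
    logit = baseline_logit + sum(RACE_TERMS[flag] for flag in race_flags)
    return 1.0 / (1.0 + math.exp(-logit))  # inverse-logit (sigmoid)


if __name__ == "__main__":
    baseline = 1.0  # hypothetical combined value of all non-race terms
    for flags in (set(), {"hispanic"}, {"african_american", "hispanic"}):
        p = predicted_vbac_success(baseline, flags)
        print(f"{sorted(flags)}: {p:.2f}")
```

With these made-up numbers, the same clinical profile falls from a predicted success of about 0.73 to 0.57 with one race term, and to 0.40 with two.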
Urology
The STONE score predicts the likelihood of kidney stones in patients who present to the emergency department with flank pain. The “origin/race” factor adds 3 points (of a possible 13) for a patient identified as “nonblack.”15 By assigning a lower score to black patients, the STONE algorithm may steer clinicians away from thorough evaluation for kidney stones in black patients. The developers of the algorithm did not suggest why black patients would be less likely to have a kidney stone. An effort to externally validate the STONE score determined that the origin/race variable was not actually predictive of the risk of kidney stones.16 In a parallel development, a new model for predicting urinary tract infection (UTI) in children similarly assigns lower risk to children identified as “fully or partially black.”17 This tool echoes UTI testing guidelines released by the American Academy of Pediatrics in 2011 that were recently criticized for categorizing black children as low risk.31
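The effect of the origin/race item is easy to see in a point-score sketch. Only the 3-of-13 origin/race item comes from the passage above. The other component points (sex, timing, nausea, erythrocytes) and the low/moderate/high cut points are our reading of the published STONE score and are assumptions that should be checked against the original before any use.

```python
def stone_score(male: bool, hours_since_onset: float, nonblack: bool,
                nausea: bool, vomiting: bool, hematuria: bool) -> int:
    """Sum STONE points (0-13). Non-race point values are assumptions."""
    score = 0
    score += 2 if male else 0              # S: sex
    if hours_since_onset < 6:              # T: timing of pain onset
        score += 3
    elif hours_since_onset <= 24:
        score += 1
    score += 3 if nonblack else 0          # O: origin/race, the item at issue
    if vomiting:                           # N: nausea or vomiting
        score += 2
    elif nausea:
        score += 1
    score += 3 if hematuria else 0         # E: erythrocytes (hematuria)
    return score


def risk_category(score: int) -> str:
    """Map a STONE score to the assumed low/moderate/high bands."""
    if score <= 5:
        return "low"
    if score <= 9:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Identical presentation: a man with 4 hours of pain, nausea without vomiting, hematuria.
    for nonblack in (True, False):
        s = stone_score(True, 4, nonblack, nausea=True, vomiting=False, hematuria=True)
        print(f"nonblack={nonblack}: score {s}, {risk_category(s)} probability")
```

Here the same man with the same symptoms scores 12 (high) if recorded as nonblack and 9 (moderate) if recorded as black. The only input that changed is the race field.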
Assessment
Similar examples can be found throughout medicine. Some algorithm developers offer no explanation of why racial or ethnic differences might exist. Others offer rationales, but when these are traced to their origins, they lead to outdated, suspect racial science or to biased data.22,30,31 In the cases discussed here, researchers followed a defensible empirical logic. They examined data sets of clinical outcomes and patient characteristics and then performed regression analyses to identify which patient factors correlated significantly with the relevant outcomes. Since minority patients routinely have different health outcomes from white patients, race and ethnicity often correlated with the outcome of interest. Researchers then decided that it was appropriate — even essential — to adjust for race in their model.
These decisions are the crux of the problem. When compiling descriptive statistics, it may be appropriate to record data by race and ethnicity and to study their associations. But if race does appear to correlate with clinical outcomes, does that justify its inclusion in diagnostic or predictive tools? The answer should depend on how race is understood to affect the outcome.30 Arriving at such an understanding is not a simple matter: relationships between race and health reflect enmeshed social and biologic pathways.32 Epidemiologists continue to debate how to responsibly make causal inferences based on race.33 Given this complexity, it is insufficient to translate a data signal into a race adjustment without determining what race might represent in the particular context. Most race corrections implicitly, if not explicitly, operate on the assumption that genetic difference tracks reliably with race. If the empirical differences seen between racial groups were actually due to genetic differences, then race adjustment might be justified: different coefficients for different bodies.
Such situations, however, are exceedingly unlikely. Studies of the genetic structure of human populations continue to find more variation within racial groups than between them.34,35 Moreover, in many cases the racial differences found in large data sets most likely reflect the effects of racism (that is, the experience of being black in America rather than being black itself), such as toxic stress and its physiological consequences.32 In such cases, race adjustment would do nothing to address the cause of the disparity. Instead, if adjustments deter clinicians from offering clinical services to certain patients, they risk baking inequity into the system.
This risk was demonstrated in 2019 when researchers revealed algorithmic bias in medical artificial intelligence.36 A widely used clinical tool took past health care costs into consideration in predicting clinical risk. Since the health care system has spent more money, on average, on white patients than on black patients, the tool returned higher risk scores for white patients than for black patients. These scores may well have led to more referrals for white patients to specialty services, perpetuating both spending discrepancies and race bias in health care.
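The argument above can be illustrated with a small, entirely hypothetical population: racial differences in outcomes may track an unequally distributed exposure rather than race itself. In the sketch below, outcome risk depends only on an exposure, which stands in for something like toxic stress or poor access to care. The two groups differ only in how often they are exposed. A model that sees group membership but not exposure finds a sizable group effect, and stratifying by exposure makes that effect disappear. All counts and the group labels A and B are invented for illustration.

```python
from fractions import Fraction

# Hypothetical counts: (group, exposed) -> (patients, adverse outcomes).
# Outcomes follow exposure alone: 30% risk if exposed, 10% if not, in
# both groups. Only exposure prevalence differs (60% in A, 20% in B).
COUNTS = {
    ("A", True): (600, 180),
    ("A", False): (400, 40),
    ("B", True): (200, 60),
    ("B", False): (800, 80),
}


def crude_risk(group: str) -> Fraction:
    """Risk by group with exposure ignored, as a model with only a group term sees it."""
    cells = [v for (g, _), v in COUNTS.items() if g == group]
    return Fraction(sum(e for _, e in cells), sum(n for n, _ in cells))


def stratum_risk(group: str, exposed: bool) -> Fraction:
    """Risk by group within a single exposure stratum."""
    n, events = COUNTS[(group, exposed)]
    return Fraction(events, n)


crude_gap = crude_risk("A") - crude_risk("B")
stratum_gaps = {exposed: stratum_risk("A", exposed) - stratum_risk("B", exposed)
                for exposed in (True, False)}

if __name__ == "__main__":
    print(f"crude risk gap, A minus B: {float(crude_gap):.2f}")
    for exposed, gap in stratum_gaps.items():
        print(f"gap among {'exposed' if exposed else 'unexposed'}: {float(gap):.2f}")
```

The crude gap is 0.08, and the gap within each exposure stratum is zero. A correction fitted to the crude data would encode that 0.08 as a property of group membership. The quantity that actually drives outcomes, the exposure, stays invisible to it.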
A second problem arises from the ways in which racial and ethnic categories are operationalized. Clinicians and medical researchers typically use the categories recommended by the Office of Management and Budget: five races and two ethnicities. But these categories are unreliable proxies for genetic differences and fail to capture the complexity of patients’ racial and ethnic backgrounds.34,35 Race correction therefore forces clinicians into absurdly reductionistic exercises. For example, should a physician use a double correction in the VBAC calculator for a pregnant person from the Dominican Republic who identifies as black and Hispanic? Should eGFR be race-adjusted for a patient with a white mother and a black father? Guidelines are silent on such issues — an indication of their inadequacy.
Researchers are aware of this dangerous terrain. The Society of Thoracic Surgeons acknowledged concerns raised by clinicians and policymakers “that inclusion of SES factors in risk models may ‘adjust away’ disparities in quality of care.” Nonetheless, it proceeded to consider “all preoperative factors that are independently and significantly associated with outcomes”: “Race has an empiric association with outcomes and has the potential to confound the interpretation of a hospital’s outcomes, although we do not know the underlying mechanism (e.g., genetic factors, differential effectiveness of certain medications, rates of certain associated diseases such as diabetes and hypertension, and potentially [socioeconomic status] for some outcomes such as readmission).”10 This decision reflects a default assumption in medicine: it is acceptable to use race adjustment even without understanding what race represents in a given context.
To be clear, we do not believe that physicians should ignore race. Doing so would blind us to the ways in which race and racism structure our society.37-39 However, when clinicians insert race into their tools, they risk interpreting racial disparities as immutable facts rather than as injustices that require intervention. Researchers and clinicians must distinguish between the use of race in descriptive statistics, where it plays a vital role in epidemiologic analyses, and in prescriptive clinical guidelines, where it can exacerbate inequities.
This problem is not unique to medicine. The criminal justice system, for instance, uses recidivism-prediction tools to guide decisions about bond amounts and prison sentences. One tool, COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), while not using race per se, uses many factors that correlate with race and returns higher risk scores for black defendants.40 The tool’s creators explained that their design simply reflected empirical data.41 But if the underlying data reflect racist social structures, then their use in predictive tools cements racism into practice and policy. When these tools influence high-stakes decisions, whether in the clinic or the courtroom, they propagate inequity into our future.
In 2003, Kaplan and Bennet asked researchers to exercise caution when they invoked race in medical research: whenever researchers publish a finding based on race or ethnicity, they should follow seven guidelines, including justifying their use of race and ethnicity, describing how subjects were assigned to each category, and carefully considering other factors — especially socioeconomic status — that might affect the results.42 We propose an adaptation of these guidelines to evaluate race correction in clinical settings. When developing or applying clinical algorithms, physicians should ask three questions: Is the need for race correction based on robust evidence and statistical analyses (e.g., with consideration of internal and external validity, potential confounders, and bias)? Is there a plausible causal mechanism for the racial difference that justifies the race correction? And would implementing this race correction relieve or exacerbate health inequities?
If doctors and clinical educators rigorously analyze algorithms that include race correction, they can judge, with fresh eyes, whether the use of race or ethnicity is appropriate. In many cases, this appraisal will require further research into the complex interactions among ancestry, race, racism, socioeconomic status, and environment. Much of the burden of this work falls on the researchers who propose race adjustment and on the institutions (e.g., professional societies, clinical laboratories) that endorse and implement clinical algorithms. But clinicians can be thoughtful and deliberate users. They can discern whether the correction is likely to relieve or exacerbate inequities. If the latter, then clinicians should examine whether the correction is warranted. Some tools, including eGFR and the VBAC calculator, have already been challenged; clinicians have advocated successfully for their institutions to remove the adjustment for race.43,44 Other algorithms may succumb to similar scrutiny.45 A full reckoning will require medical specialties to critically appraise their tools and revise them when indicated.
Our understanding of race has advanced considerably in the past two decades. The clinical tools we use daily should reflect these new insights to remain scientifically rigorous. Equally important is the project of making medicine a more antiracist field.46 This involves revisiting how clinicians conceptualize race to begin with. One step in this process is reconsidering race correction in order to ensure that our clinical practices do not perpetuate the very inequities we aim to repair.