Beyond the Boxes, Part 5: Analysis and Interpretation of Race and Ethnicity

Natalie Smith, Rae Anne Martinez, Nafeesa Andrabi, Andrea (Andi) Goodwin, Rachel Wilbur, Paul Zivich

This post is the fifth in a series about the use of race and ethnicity in population health research. Our previous posts have reviewed defining, measuring, and coding race and ethnicity. Researchers then typically move into analysis and interpretation, which is the focus of this post.

By the time we begin analyzing data, we should have (1) explicit working definitions of race and ethnicity, (2) a clear understanding of our measures, (3) a clear rationale for how we coded our variables, and (4) the ability to justify our choices. Without serious attention to each of these, our analyses and interpretations will at best be superficial; at worst, they will produce conclusions that do not reflect reality and could harm the populations we study.

Considerations for analyses

Descriptive aims seek to describe a defined population (e.g., the proportion of individuals with characteristic “X”). These aims are often exploratory and could benefit from including multiple measurement schemes for race and ethnicity.

For example, a study may consider measuring both self-identification and self-classification. Open-ended measures may be preferred over requiring respondents to select from predetermined categories. Self-classification could also be measured to align with reporting requirements or later analytic methods. Directly measuring self-classification is preferred over re-coding self-identity into more highly aggregated categories, because direct measurement avoids imposing our own assumptions on others’ identities.

Predictive aims focus on creating algorithms or rule-sets to predict future characteristics. These studies often build an algorithm using training data and then deploy it in a real-world setting. In that case, studies should ensure that the training data use race and ethnicity dimensions that are consistent with the data in the real-world setting. However, we note that the utility of including race or ethnicity in clinical screening tools and other predictive algorithms (e.g., prison sentencing) is questionable.
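One way to make this consistency check concrete is a small pre-deployment comparison of how race and ethnicity were measured in each data source. The sketch below is only an illustration: the function name, the dictionary layout, and the category labels are all hypothetical and would need to be replaced with a study's actual measures.

```python
def check_measure_consistency(training, deployment):
    """Compare how race/ethnicity was measured in two data sources.

    Each argument is a dict with a 'dimension' string (e.g. how the
    information was collected) and a 'categories' set of labels.
    Returns a list of mismatch descriptions (empty if none were found).
    """
    problems = []
    if training["dimension"] != deployment["dimension"]:
        problems.append(
            f"dimension differs: {training['dimension']} vs {deployment['dimension']}"
        )
    only_train = sorted(training["categories"] - deployment["categories"])
    only_deploy = sorted(deployment["categories"] - training["categories"])
    if only_train:
        problems.append(f"categories only in training: {only_train}")
    if only_deploy:
        problems.append(f"categories only in deployment: {only_deploy}")
    return problems


# Hypothetical descriptions of two data sources (not real study data).
training_source = {
    "dimension": "self-classification",
    "categories": {"Asian", "Black", "White", "Multiracial"},
}
deployment_source = {
    "dimension": "observer classification",
    "categories": {"Asian", "Black", "White", "Other"},
}

problems = check_measure_consistency(training_source, deployment_source)
for problem in problems:
    print(problem)
```

A check like this only flags mismatched labels and dimensions. It cannot tell us whether including race or ethnicity in the algorithm is justified in the first place.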

Inferential aims focus on the estimation of causal effects, such as what the risk of death would have been had everyone been exposed to chemical “A.” These studies require explicit conceptualization, measurement, and coding of race and ethnicity.

When race or ethnicity is treated as a confounder, researchers should consider which dimensions of race and ethnicity are most relevant. Even when race or ethnicity is a confounder, it is critical to use the correct dimension. For example, melanin expression, rather than self-reported race, would likely be a confounder of the relationship between sunscreen use and development of skin cancer. Another example is the relationship between taking “Medication B” and subsequent stress. In this context, racism may be the common cause of both not receiving “Medication B” and elevated levels of stress.
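To illustrate the “Medication B” example, the sketch below uses entirely made-up counts in which exposure to racism lowers the chance of receiving the medication and raises the risk of elevated stress, while the medication itself has no effect on stress. It compares a crude risk difference with one standardized to the distribution of the confounder. The numbers, variable names, and the choice of standardization are our assumptions for illustration, not results from any study.

```python
# Hypothetical counts: (racism exposure stratum, received Medication B)
# -> (number with elevated stress, number of people)
counts = {
    ("exposed", 1): (10, 20),
    ("exposed", 0): (40, 80),
    ("unexposed", 1): (16, 80),
    ("unexposed", 0): (4, 20),
}


def crude_risk(treated):
    """Risk of elevated stress in one treatment group, ignoring the confounder."""
    events = sum(e for (_, a), (e, n) in counts.items() if a == treated)
    total = sum(n for (_, a), (e, n) in counts.items() if a == treated)
    return events / total


def standardized_risk(treated):
    """Stratum-specific risks weighted by each stratum's share of the population."""
    population = sum(n for (_, n) in counts.values())
    strata = sorted({c for (c, _) in counts})
    risk = 0.0
    for c in strata:
        stratum_size = counts[(c, 0)][1] + counts[(c, 1)][1]
        events, n = counts[(c, treated)]
        risk += (stratum_size / population) * (events / n)
    return risk


crude_rd = crude_risk(1) - crude_risk(0)
adjusted_rd = standardized_risk(1) - standardized_risk(0)
print(f"crude risk difference: {crude_rd:.2f}")
print(f"standardized risk difference: {adjusted_rd:.2f}")
```

In this toy data the crude comparison makes the medication look protective, while the standardized comparison shows no difference. That is the pattern we would expect when the common cause is left out of the analysis.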

When race or ethnicity is of direct interest, we might be interested in the causal effect of race or ethnicity itself (perhaps as a proxy for racism). In this context, an unclear definition of “race” likely does not meet the definition of a “well-defined” exposure. Better defining race or ethnicity along specific dimensions can move us closer to a well-defined exposure, and could then be incorporated into the potential outcomes framework. Another framework is to study racial or ethnic disparities through mediation analyses, as discussed here and here. Finally, researchers might also be interested in the modification of another exposure effect by race or ethnicity. In that case, measures that focus on disparities between groups have also been proposed.
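For readers less familiar with the potential outcomes framework, the idea of a well-defined exposure can be written in standard notation. The symbols below are our own shorthand and do not appear in the studies discussed: A is the specific dimension under study (for example, exposure to a particular form of discrimination), Y is the outcome, and Y^{a} is the outcome that would be observed if A were set to the value a.

```latex
\[
\text{Average causal effect} = E\left[Y^{a=1}\right] - E\left[Y^{a=0}\right]
\]
\[
\text{Consistency: } Y = Y^{a} \quad \text{whenever } A = a
\]
```

The consistency condition is where an unclear definition causes trouble. If “a = 1” could correspond to many different exposures, then Y^{a=1} does not refer to a single quantity, and the contrast above is hard to interpret.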

Considerations for interpretation

In our review of the epidemiology literature, we could not credibly assess whether researchers interpreted results for race and ethnicity appropriately because, as our previous post discusses, authors typically do not state how they conceptualize race or ethnicity. We offer two examples showing how the interpretation changes with the conceptualization. Consider a study where race is a “risk factor.” Where authors have stated that race is a social construct, race in a “risk factor” study might be interpreted as a proxy for structural and historical factors that influence access to resources (power, privilege, control of governmental or commercial institutions, aggregation of economic resources, and so on) and constrain life chances.

However, where authors have stated that race reflects underlying biological and genetic variation, race in a “risk factor” study would be interpreted as immutable, innate biological differences between groups of people. Despite widespread scientific agreement that race has no biological or genetic basis, we see time and time again that scientists still interpret race as a biological difference. Focusing on interpreting “racism not race” has been highlighted in recent months, and we are encouraged by this work.

As we are interpreting our analytic findings, we ask ourselves the following questions:

Does your interpretation align with the dimension of race that you measured?
Have you checked that you are not assigning a biological cause to racial or ethnic differences that were measured using a self-reported dimension?
Have you considered interpersonal, organizational, and structural racism as a potential explanation or mechanism?

Resources:

LaVeist, T. A. (2000). On the study of race, racism, and health: a shift from description to explanation. International Journal of Health Services, 30(1), 217-219.
Vyas, D. A., Eisenstein, L. G., & Jones, D. S. (2020). Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. New England Journal of Medicine, 383(9), 874-882.
Hernán, M. A. (2016). Does water kill? A call for less casual causal inferences. Annals of Epidemiology, 26(10), 674-680.
Robins, J. M., & Weissman, M. B. (2016). Commentary: Counterfactual causation and streetlamps: what is to be done? International Journal of Epidemiology, 45(6), 1830-1835.
Naimi, A. I., & Kaufman, J. S. (2015). Counterfactual theory in social epidemiology: reconciling analysis and action for the social determinants of health. Current Epidemiology Reports, 2(1), 52-60.
VanderWeele, T. J., & Robinson, W. R. (2014). On causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology (Cambridge, Mass.), 25(4), 473.
Naimi, A. I. (2016). The counterfactual implications of fundamental cause theory. Current Epidemiology Reports, 3(1), 92-97.