This post is the fourth in a series about the use of race and ethnicity in population health research. Our previous posts have outlined our thoughts around conceptualizing and measuring race and ethnicity. Generally, researchers will then have to code those race and ethnicity measurements for use in analyses. This post details how we approach coding these variables.
As we’ve discussed, issues of race and ethnicity measurement are directly linked to analysis and interpretation. But there is another layer of complexity here: coding. The practice of coding variables—collapsing groups together or merging different variables—can fundamentally alter the results of data analysis, and ultimately, the interpretations of those results. As a part of our larger project studying how population researchers incorporate race and ethnicity into their work, we examined how population health studies code race and ethnicity. You can see some of our results here. In sum, we find:
- Emphasis on binary coding schemes oriented around whiteness (i.e. “white,” “non-white”)
- Broad use of “white,” “Black,” “Hispanic,” and “other,” where “Hispanic” is used as a de facto racial category, and everyone else is aggregated into the ambiguous “other”
- Slight variations on the above
These findings drive our guiding questions:
- How will you code race or ethnicity in statistical analyses?
- Have you collapsed race and ethnicity into an ethno-racial construct?
- If you aggregate different racial or ethnic groups, what are the potential implications for your findings?
- In your manuscript, have you communicated which groups you collapsed together, why these decisions were made, and what implications they have for interpreting your findings (biases, limitations)?
The underlying assumption when we engage with race and ethnicity variables is that these groups are meaningful in terms of history, privilege, access to resources, cultural similarities, and so on. When we collapse groups together, we implicitly make decisions about whose history, power, privilege, and so on are more or less similar and important. Sometimes, depending on the context and specific study question, collapsing might be advisable. Racial or ethnic groups may share similar histories along various axes. But oftentimes they do not. In such instances, collapsing groups is not advisable, particularly in the context of the highly ambiguous “other” category.
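To make the stakes of collapsing concrete, here is a minimal sketch in Python. The group names and counts are invented purely for illustration and do not come from our sample or from any study; the helper names `rates` and `collapse` are ours. It shows how two groups with very different outcome rates can disappear into a single “other” rate that describes neither of them.

```python
from collections import defaultdict

# Invented toy counts, purely for illustration; not from any study.
# group -> (number with the outcome, number of people)
toy_counts = {
    "group_a": (30, 100),
    "group_b": (5, 100),
    "group_c": (18, 100),
}


def rates(counts):
    """Return the outcome rate for each category."""
    return {group: cases / n for group, (cases, n) in counts.items()}


def collapse(counts, mapping):
    """Sum counts after recoding groups through `mapping`.

    Groups that do not appear in `mapping` keep their own label.
    """
    merged = defaultdict(lambda: [0, 0])
    for group, (cases, n) in counts.items():
        target = mapping.get(group, group)
        merged[target][0] += cases
        merged[target][1] += n
    return {group: tuple(totals) for group, totals in merged.items()}


# Hypothetical coding decision: fold group_a and group_b into "other".
to_other = {"group_a": "other", "group_b": "other"}
collapsed = collapse(toy_counts, to_other)

print(rates(toy_counts))  # separate rates for group_a, group_b, group_c
print(rates(collapsed))   # one pooled rate for "other", plus group_c
```

In this toy example the pooled “other” rate sits between the two original rates and close to that of the third group, so an analysis run on the collapsed coding would suggest little difference where the uncollapsed data show a large one.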
We are still considering how to approach the aggregation of groups—and really, we should all push ourselves to think about this more. How can we meaningfully interpret a coefficient that represents people with vastly different contexts and backgrounds? Why did we bother including a group at all if we can’t produce findings with respect to their unique lived experience?
We also want to note the differences between using an ethno-racial construct and using separate ethnicity and race constructs. These two approaches to coding carry very different embedded assumptions. When race and ethnicity are kept separate for analyses, we assume that each captures distinct information and could be related to population health outcomes in distinct ways. Conversely, when they are combined into an ethno-racial construct (e.g., non-Hispanic white, non-Hispanic Black, Hispanic, other), we assume that race and ethnicity capture similar information and have similar relationships to health outcomes.
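The two coding approaches can be sketched side by side. This is a simplified illustration under our own assumptions: the function names, the race labels, and the four example respondents are hypothetical, and real survey instruments vary. The combined function mirrors the common scheme described above, in which Hispanic ethnicity overrides whatever race a respondent reported.

```python
def ethno_racial(race, hispanic):
    """Combined coding: Hispanic ethnicity overrides reported race."""
    if hispanic:
        return "Hispanic"
    if race in ("white", "Black"):
        return f"non-Hispanic {race}"
    return "other"


def separate(race, hispanic):
    """Separate coding: race and ethnicity stay as two variables."""
    return (race, "Hispanic" if hispanic else "non-Hispanic")


# Hypothetical respondents as (reported race, reports Hispanic ethnicity).
records = [
    ("Black", True),
    ("Indigenous", True),
    ("white", True),
    ("Black", False),
]

combined = [ethno_racial(race, hispanic) for race, hispanic in records]
kept = [separate(race, hispanic) for race, hispanic in records]

print(combined)  # the first three respondents share one category
print(kept)      # all four respondents remain distinguishable
```

Under the combined coding, the first three respondents are indistinguishable even though they reported different races. Whether that loss of information is acceptable depends on the research question, which is why the choice should be deliberate.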
As researchers, we must be aware of the differences and similarities between race and ethnicity as constructs. Your research question might demand an ethno-racial perspective, or data limitations might leave you with no alternative.
Some additional guiding questions on this point:
- Did you intentionally end up with an ethno-racial construct?
- Do you agree with the assumptions behind this position?
Racial boundaries and identities may overlap with ethnic boundaries and identities. Clear delineation can be challenging. On that note, we believe it is important to name those we did not see represented, discussed, or acknowledged in the studies we sampled.
We never saw MENA (Middle Eastern or North African) people, Black or Afro-Latinos, or Indigenous Latinos. We also very rarely saw the diversity of Native American/Alaska Native, Asian, or Hawaiian and Pacific Islander populations explored in health scholarship. These groups are frequently relegated to the “other” category. For example, when we treat “Hispanic” or “Latino/a/x” as a racial group, Afro-Latinos and Indigenous Latinos typically fall into the same group, masking potentially important differences between them. Most authors fail to state who even constitutes the “other” category. As such, we are unable to tell if folks are present in our studies but masked by coding practices, or if they aren’t included at all. Regardless, more work is needed to explore the unique health concerns of these erased populations.
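One low-cost way to avoid an undocumented “other” is to record who ends up in it at the moment of recoding. The sketch below is only an illustration: the function name `code_with_audit`, the set of retained categories, and the example responses are all hypothetical, and the retained categories simply reproduce the common scheme we describe above rather than one we recommend.

```python
from collections import Counter

# Categories the hypothetical analysis keeps as-is; everyone else is
# recoded to "other". This mirrors the common scheme, not a recommendation.
KEPT = {"white", "Black", "Hispanic"}


def code_with_audit(responses, kept=KEPT):
    """Recode responses and tally which original groups became "other"."""
    coded = [r if r in kept else "other" for r in responses]
    other_makeup = Counter(r for r in responses if r not in kept)
    return coded, other_makeup


# Invented example responses, purely for illustration.
responses = ["white", "MENA", "Asian", "Black", "MENA", "Hispanic"]
coded, other_makeup = code_with_audit(responses)

# A tally like this can go straight into a methods section or supplement.
for group, n in other_makeup.most_common():
    print(f"{group}: {n}")
```

Reporting this tally lets readers see whether a group was present but masked by the coding scheme or absent from the sample altogether.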
It is critical that we justify why we chose a particular action and that we understand the assumptions those actions entail. Over the last few posts, we have highlighted that our conceptualization of race and ethnicity, and how we measure these social constructs, influence the assumptions we make when we collapse groups together. To be clear: We are all making assumptions that influence our methodological choices. We recommend that authors make their assumptions and choices clear in their scientific communication.
Our next post will discuss our guiding questions around interpretation of race and ethnicity in analyses.