Quality of Life Measures and Standardized Tests Share Equity Problems
Leah McClimans
We live in a culture of evaluation and assessment, perhaps nowhere more so than in public health. Epidemiology, health promotion, and health services research are just some of the areas that rely on standardized metrics to obtain outcomes for mortality, morbidity, and quality of life, the bread and butter of public health professionals.
Measurement stands for values of rigor and precision. It also stands for values of justice and equity. Ideally, every death is counted, illness is reported where it is found, and quality of life is assessed without prejudice. Of course, ideals are difficult to realize, and measuring public health outcomes in practice is a messy and difficult business (as the recent pandemic has reminded us). Yet it is important to have ideals. They are our North Star.
Measurement outcomes give the impression of justice and equity by avoiding bias in human judgment. When two students in a class are selected for a gifted and talented program, administrators can defend the choice with reference to IQ tests, not teacher judgment. Similarly, health economists and government officials use QALYs (Quality-Adjusted Life Years), rather than State preferences, to defend health expenditures on curative treatments in patients with a longer life expectancy over life-extending treatments in patients with a poor baseline quality of life.
Yet we might wonder: Why do so many White middle-class students enter gifted and talented programs? Why is research on disability and chronic illness underfunded?
Consider a measure methodologically similar to many quality of life measures: the standardized achievement test (SAT). Last week the New York Times reported that the University of California would end the use of the SAT and ACT in admissions. In 2018 the University of Chicago went “test optional.”
The problem with the SAT and ACT? They aren’t equitable. As a 2019 lawsuit against the University of California states, instead of academic ability or curriculum mastery, these exams reflect demographic and socioeconomic characteristics such as wealth, race and parental education. For years, similar arguments have been made about QALYs and non-utilitarian quality of life measures. Measures don’t always fulfill their promise of justice and equity.
There are at least two ways that SATs and quality of life measures can reflect problems of equity. For instance, when we look at SAT outcomes, we find White and Asian students are nearly two times as successful as African American, Latino and Native Indian students in hitting SAT benchmarks for college readiness. And persons with disabilities and/or chronic illnesses have a poorer quality of life when measured with standardized patient-reported outcome measures.
What should we make of these outcomes? Some will say that while these outcomes are regrettable, the problem isn’t with the measuring instrument. The quality of education in the U.S. is uneven, and that is what SAT outcomes reveal. Similarly, access to physical and social activities is uneven for people with disabilities and chronic illnesses—and some disabilities inherently limit participation—and this is what quality of life outcomes reveal.
Not everyone agrees. Ableist expectations about a “good” quality of life are often built into the design of quality of life measures. For instance, an instrument assessing the quality of life of a person with epilepsy may focus on the number and severity of seizures, rather than on stigma and privacy. People with epilepsy report that assumptions about how seizures affect a person’s ability to function are important to a good quality of life. For example, if an employer believes that seizures limit performance, then their number and frequency take on a special significance. Thus, part of the problem with frequent and/or severe seizures is the lack of control one has over where they happen and who might witness them. In short, seizures tend to violate privacy. Although the FDA has tried to improve the measure development process to include patient perspectives, change is slow.
As with quality of life measures, we see certain expectations built into the design of SATs. SATs are designed with the expectation that college readiness is a matter of conscious knowledge delivered under time pressure. This expectation rewards students who are good at recalling certain kinds of facts and working through certain kinds of problems. Even the College Board recognizes this is a flawed expectation, and last year tried to add an “adversity score” (they abandoned the attempt a few months later).
Another problem with SATs and quality of life measures is “differential item functioning” or DIF. DIF occurs when different groups—say, White students and African American students—have different probabilities of getting a question right even though the groups both have the same, say, reading ability. In quality of life measures, DIF is also investigated across time. For instance, my quality of life is supposedly the same, but I answer a question about my ability to participate in leisure activities differently now than I did two months ago.
When DIF is present, it suggests that a question is biased. This bias, in turn, suggests a problem with the validity of the instrument. Multiple studies have found DIF on SAT items, with item difficulty differing across racial groups. DIF is also found on questions in quality of life measures.
Measures, like humans, are imperfect and complex. It is important to remember this as we strive to fulfill our ideals. Measurement is not our North Star. As we aim for equity in public health and elsewhere, we must use measures with our eyes open to their flaws.
There is some very good literature that can help to open our eyes to bias in health measurement. Quality of Life and Human Difference: Genetic Testing, Health Care, and Disability is an excellent anthology with some relevant pieces, in particular, this one authored by Ron Amundson. There is also this more recent paper from the Kennedy Institute of Ethics Journal, and here is a paper I co-authored that explores one version of DIF (referred to as response shift) in quality of life measurement. Or you might take the advice that was recently given to me by a patient advocate who is herself disabled: consider getting involved with the FDA’s Patient Representative Program. This program gives patients and patient advocates the opportunity to work with the FDA directly to improve the development of measures, such as patient-reported outcome measures (which measure constructs such as quality of life or physical functioning) and provide direct input into drug approvals.