Our obsession with measurements
Fitbits, sleep score, steps, followers, one-rep max, heart rate, screen time, word count, IQ, GDP, GPA, net worth
Objective measurements have had phenomenal success in the physical sciences. In my experience, it isn’t intuitive to go out into the world and proactively confirm or refute our beliefs. There is a little bit of anxiety before we run an experiment or look at the final numbers: will we see what we actually expect to see?
However, not all things can be measured objectively. The less objective a thing is, the more careful we have to be. Usually, we’ll pick a measurement that approximates the thing we’re actually interested in. For example, I sometimes use word count to measure how much writing I get done in a day, even though it doesn’t capture how much quality writing I get done. Other times, like some other writers, I use the amount of time spent writing each day to gauge whether I’m on track.
In these cases, measurements can still be useful, but we run into problems when we confuse the measurement with the thing we’re actually interested in. The problems grow bigger when we blindly optimize for those measurements, which are only rough approximations of what we actually want to optimize for. Things become especially dangerous when we use those measurements to create incentives.
Our health, our lives, our productivity can’t be summarized by a single number.
First, an example of a good use of measurement in a subjective area
A few years ago, I went to a seminar by Neel Shah, a physician with a focus on reproductive health. He told us the story of how he discovered a surprising fact: one of the strongest risk factors for a C-section was ‘how busy a hospital was’ when a mother arrived to give birth.
This was a good use of data and measurements. The measurements were selected and interpreted by a physician, someone who understands the context in which the data was generated. He selected C-sections as a health measure because they are associated with surgical complications and had increased dramatically in the last few years. In interpreting the results, Shah pointed out that when there wasn’t enough space on the floor, doctors felt extra pressure to get mothers out more quickly, and a C-section is often quicker than a prolonged labor.
Importantly, the health measure itself was not, as far as I know, used as a direct incentive for doctors. You can imagine that, with a high enough incentive, the number of C-sections performed could be reduced to zero, or pegged at a specific target number. But that wouldn’t be useful, since some C-sections are medically necessary. Still, the thought experiment demonstrates an interesting property of measuring humans: they can respond to being measured, and as soon as you use a measurement as an incentive, the measurement becomes a weaker indicator of the thing you’re actually interested in.
Standardized tests in grade schools
For those who aren’t familiar, standardized tests are used to evaluate the performance of schools. The students themselves are given tests on math, reading comprehension, and writing, but it is the teachers, schools, and administrators who are being evaluated. The schools that do well are rewarded.
You could argue that in this case, although the tests are used as an incentive, they remain a somewhat good indicator of students’ performance. Students have to know how to read, after all, to do well on the tests. However, the tests still become worse indicators of overall education, because teachers start to teach to the test.
I grew up in the era of the standardized test. We had class periods dedicated to teaching us how to take them. It’s better to guess on a multiple-choice question than to leave it blank. If you get stuck on a problem, make a note of it and come back to it later. Fill in the circle completely, or the machine won’t count it. They explained how the essay question would be graded, and we were told to stick strictly to the five-paragraph formula. And we practiced reading excerpts and answering the multiple-choice questions that tested our reading comprehension.
And what about all the other aspects of education that get pushed to the wayside? Independent projects that give room for creativity, the cultivation of social skills and friendships, reading books1? Schools do a lot for the next generation that these tests don’t capture.
So, yes, test the students. Track their progress. Identify what gets them to do better and worse on these tests, but don’t optimize for them2.
Academia
Educated people are not immune to the incentivizing effects of metrics. Academics are often evaluated on the number of papers they publish and how often those papers are cited. This has, predictably, led to paper mills. Many academics would also argue that the overall quality of academic publications has declined, and I would certainly agree.
Other numbers in our personal life
Another area where metrification has clearly gone too far is in how we see ourselves. So often, we confuse our self-worth with our GPA, our monthly income, or the number of likes on our most recent social media post.
With technology, we have more personal measurements to track: Fitbits, sleep score, steps per day, one-rep max at the gym, heart rate, followers, likes, screen time. For the social-media-inclined like me, there are also Substack notes per day and posts per month.
Some people confuse the number with what it is trying to reflect. They try to increase their follower count, forgetting that it is only a reflection of the impact their work makes, not a thing to strive for in itself. They try to improve their overall health but get fixated on the numbers, and their actual health declines.
IQ and human value
A number of people seem to confuse one measure in particular, IQ, with human value. Some people think we should optimize future generations for IQ, promoting weird breeding strategies and positive eugenics.
Weird breeding strategies aside, the fact that people seize on IQ as the most valuable trait is strange in itself. They seem to take for granted the traits of conscientiousness, health, kindness, empathy, and collaboration. Can you imagine a world of intelligent conmen?
Charities and philanthropy
Metrification has come to charity, too. The idea here is to remove our subjective feelings about charities so that we can compare them objectively and decide which one will do the most good with our money. In some ways, this is a good thing. We want to spend our charitable giving wisely. However, we have to be careful not to fall into the same trap we’ve fallen into time and time again in other areas.
There is no objective measure of ‘what good a charity does’. I’ve seen metrics like quality-adjusted life years (QALYs), which weight each year of life by its quality of health, thrown around. I worry that deciding where the money goes based on these numbers will have unintended consequences.
Closing remarks
I guess the takeaway is that numbers are useful tools, but we need to be careful with them. Be mindful of when you start optimizing for the number instead of the soft, unquantifiable thing that really matters.
If you feel like numbers are promoting negative behaviors in you or your workplace, take a step back from them. Recenter yourself on your actual goal. Remember why you started in the first place.
1. Which has been in the news lately: The Elite College Students Who Can't Read Books
2. As I was doing research for this article, I found a book called The Tyranny of Metrics by Jerry Muller. He goes into more depth about the problems of using metrics in institutions like schools. For example, he talks about how external incentives often reduce intrinsic motivation. He also argues that incentives lower morale if the people being measured don’t buy into the measurement: they need to agree that it’s a good idea.
Another point that I think is pertinent: sometimes a number can reflect two things at once.
My go-to example for this is research on procrastination. About 95% of research on procrastination uses survey measures to assess procrastination. Basically, you ask a person ten to twenty questions about procrastination in their life and assign them a procrastination score.
This measure correlates pretty strongly with *actually* procrastinating, when you measure procrastination with hard behavioral metrics (like when someone turns in a school report). So researchers assume it's safe to use as a proxy. And when they use it, they find that procrastination correlates very strongly with things like anxiety.
So, the consensus in the field is that procrastination and anxiety are related, and people examine that relationship in all sorts of papers. The problem is that, once you move from measuring procrastination using a survey to measuring procrastination using a hard measure, the relationship with anxiety vanishes, or grows so small that it's basically inconsequential.
Why? Well, for my money, the survey measures reflect two things: 1) a person's honest evaluation of their actual behavior, and 2) how they feel about it.
And anxious people are more likely to judge themselves harshly. That self-judgment adds an element to the survey measures that corrupts them.
Overall, a very insightful look at our sometimes blind trust in metrics. This one quote stood out to me: "the thought experiment demonstrates an interesting property of measuring humans—they can respond to being measured, and as soon as you use a measurement as an incentive, the measurement becomes a weaker indicator of the thing your actually interested in."