Editor’s note: Joe Hopper is president of Chicago-based Versta Research. This is an edited version of a post that originally appeared under the title, “How random correlations ruined a gender study.”
The problem with most business approaches to analytics is that they rely on automated, unthinking algorithms. The algorithms scan boatloads of data and generate correlations that are surprising and, therefore, presumably deeply insightful. But in most cases the correlations are not insightful. They are random coincidences, which unfortunately does not stop humans from sharing with others the surprise (and presumably importance) of those findings.
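As a rough illustration of how easily an automated scan turns up coincidences, the sketch below generates columns of pure random noise and counts how many pairs look "strongly" correlated. Everything in it is an assumption made for illustration: the function names, the 50 variables, the 30 observations, and the 0.361 cutoff (roughly the conventional 5 percent significance threshold for a sample of 30) do not come from any real analytics system.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def count_chance_correlations(n_vars=50, n_obs=30, threshold=0.361, seed=0):
    """Count pairs of unrelated random columns whose |r| exceeds the threshold.

    All parameters are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)
    # Every column is independent noise, so any correlation is coincidence.
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    strong = [
        (i, j) for i, j in pairs
        if abs(pearson(columns[i], columns[j])) > threshold
    ]
    return len(strong), len(pairs)


if __name__ == "__main__":
    strong, total = count_chance_correlations()
    print(f"{strong} of {total} pairs of pure noise exceed the cutoff")
```

With 50 variables there are 1,225 pairs to compare, so a cutoff that admits about one pair in 20 by chance should be expected to flag dozens of them, none of which means anything.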
The New York Times’ analytics department offered a recent case in point with a story about the number of women CEOs who head up large companies. They showed that “the number of chief executives named John … is very similar to the number of female executives” among Fortune 500 companies. They went on to list all sorts of strange statistical comparisons like this one: “Fewer women directed the top-grossing 100 films last year than men named Michael and James combined.”
Comparisons like these are always silly and uninformative. But sometimes they are even worse because they suggest the opposite of what the analysts should be conveying.
First, comparing to the number of men named John, when John is one of the most common names in America, implicitly suggests that the number of women CEOs is actually quite high. If you want to emphasize that something is uncommon, then you should compare it to something that everyone knows is uncommon.
Second, comparing something seemingly random like the number of CEOs with the first name John suggests that the incidence of women in positions of power is similarly random. But it is not. The social mechanisms behind gender discrimination are deep, powerful and pervasive. The random correlation between these two statistics (and the near-randomness by which men named John become or do not become CEOs) trivializes the reasons why women do not become CEOs.
So it turns out that silly, random correlations are not harmless entertainment. Sometimes they have a rather pernicious effect. Turning data into stories requires finding and showing powerful comparisons, but those comparisons need to be authentic and meaningful.