Looking for a few to stand for the many
Editor’s note: Kevin Raines is principal, Corona Research Inc., Denver.
Some years back, my company was given an interesting problem by a client. The client, a large government agency, wanted to do some focus group research within the black population. The research needed to be representative of all black households in the United States - within the context of being small-group research, of course - and the topic was one that could vary significantly by community. (I use the term “black” in this article because of a debate about whether the term African-American might exclude people who are of black racial ancestry but whose heritage is traced through an intermediate geography other than Africa. [Think Jamaica or Haiti.] Our client wanted “black” in its broadest definition.)
So where, they asked us, should they do the focus groups? On the surface, the question is easy to understand. Blacks, like Americans of any race, live in a vast range of economic and social environments. Other than their racial identification, a black financial analyst living in downtown Manhattan may have absolutely nothing in common with a black sales rep living in Cheyenne, Wyo., who may have nothing in common with a black farmer living outside Gadsden, Ala. Our mission was to find six communities in the nation that, in combination, best reflected this diversity. (Within those six communities, multiple focus groups would be conducted of different demographic strata such as age and gender, so we didn’t have to worry about that.)
Piece of cake. Well, except for that “in combination” part. There are 3,141 counties in the United States, which means that there are more than 1 quintillion possible combinations of six. Now, I like to think I’m good at math, and I would humbly say that I’m pretty darn good at it. But crunching a quintillion combinations was a bit of a daunting assignment, particularly when the supercomputer at the office was already tied up doing some sort of calculations related to the origins of the universe. (One of our analysts has a theory…) If we evaluated a million combinations per second, it would take us more than 32,000 years to run through them all.
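For readers who want to check the arithmetic, here is a minimal Python sketch. It uses only the figures quoted above (3,141 counties, six sites, a million combinations per second); the variable names are illustrative. The exact count comes out a little above 1.3 quintillion, so the running time is somewhat longer than the 32,000 years implied by a round quintillion.

```python
import math

COUNTIES = 3141              # county count quoted in the article
SITES = 6                    # number of communities to pick
RATE_PER_SECOND = 1_000_000  # evaluation speed assumed in the article

# Number of ways to choose 6 counties out of 3,141, ignoring order
combos = math.comb(COUNTIES, SITES)

# Time to test every combination at the assumed rate
seconds = combos / RATE_PER_SECOND
years = seconds / (365.25 * 24 * 3600)

print(f"{combos:.3e} combinations")
print(f"{years:,.0f} years at a million per second")
```

Changing COUNTIES to a smaller pool shows how quickly the count drops as candidates are eliminated.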
This meant that we had to have a system. We had to first break down the problem to its core elements and narrow the possibilities as much as possible, and then we had to come up with a specific analytical method to identify the best combination using a method other than brute force.
Demographics and math - I love this stuff. I hope you do, too, because you’re about to get a big, interesting dose of it. However, I promise to keep it at a layman’s level for those who may be more interested in results than process.
Phase one: narrowing the problem
First off, I should address the whole definition of “community.” We initially thought about considering a city to be a community, but this caused all sorts of problems, the biggest of which is that there are a lot of rural people in the U.S. who don’t live in cities. So we went with counties as our definition of a community. Plus, there’s some great data available at a county level that isn’t available for cities, so it became an easy decision.
Having made that definition, we first looked at some raw geographic criteria. Where in the country do most blacks live? What sizes of communities do they populate? Knowing that we had to pick exactly six counties when all was said and done, we decided to immediately eliminate the possibility that all six counties would be next to each other, or that all six would coincidentally be big cities or small towns. By setting constraints that forced us to pick counties in different parts of the country, and counties of different (but representative) sizes, we could force the process to be geographically equitable.
First, we divided the black population up by the size of the community that they live in (considering only the black population). What we found was interesting. At the top end, we found that one-sixth of the U.S. black population lives in counties that are home to 600,000 or more black people. And what’s really interesting is that there are only six counties in the U.S. that fit that description: the core counties of New York City, Chicago, Philadelphia, Detroit, Houston and Los Angeles. So we had already narrowed down the candidates for one of our six sites. At the bottom end, we found that one-sixth of the U.S. black population lives in counties that have fewer than 15,300 blacks. There were a whole bunch of those, which would get narrowed down quickly when we looked at them more closely.
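The slicing described above can be sketched in a few lines of Python. This is an illustration only, not the analysis actually run: the county names and populations are made-up placeholders, the function name assign_size_slices is hypothetical, and assigning each county by the midpoint of its span in the running total is an assumption. The example uses three slices instead of six to keep the toy data short.

```python
def assign_size_slices(populations, n_slices=6):
    """Map each county to a slice index (0 = slice holding the largest counties).

    Counties are ranked from largest to smallest population, and each slice
    holds roughly 1/n_slices of the total population.
    """
    total = sum(populations.values())
    ranked = sorted(populations.items(), key=lambda kv: kv[1], reverse=True)
    slices = {}
    running = 0
    for name, pop in ranked:
        # Assumption: a county belongs to the slice containing the midpoint
        # of its span in the cumulative population total.
        midpoint = running + pop / 2
        slices[name] = min(int(midpoint * n_slices / total), n_slices - 1)
        running += pop
    return slices


# Hypothetical populations (not real county data)
toy = {"A": 200, "B": 100, "C": 100, "D": 50, "E": 50, "F": 50, "G": 50}
slices = assign_size_slices(toy, n_slices=3)
for name in toy:
    print(name, slices[name])
```

In the toy data, one large county fills the first slice by itself while four small ones share the last, which mirrors the pattern in the article: very few counties at the top end and a whole bunch at the bottom.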
After figuring out these size distributions, we made the decision that one of our six communities would be drawn from each slice of the pie in Figure 1, in order to represent the varying sizes of black communities around the country.
Next, we looked at geography. Where in the nation do black populations live? That too was interesting. If you go by the regional definitions of the Census Bureau, slightly more than half of all American black people live in the South - that broad swath from Maryland to Texas. About one-sixth live in the Northeast, one-sixth in the Midwest, and about one-tenth in the West.
This again helped us. We decided that three of our six communities would be represented by Southern counties, one would be in the Northeast, one in the Midwest, and, for lack of a better landing spot, one in the West. Other than a little uncertainty about the West, the match was pretty easy.
Finally, we went back to our smallest size category and started cutting out individual counties. If we needed to invite 50 or so people to focus groups in each community, we first needed to ensure that there were enough people to recruit in that county. Believe it or not, the black population of Alaska’s North Slope is pretty small. Since it’s not possible to get everyone in a county to come to focus groups, or even to find and recruit them, we set a minimum population size and limited our candidates to counties that met it.
Based on our past experience conducting focus groups with black participants, we somewhat arbitrarily set a population threshold of 10,000 or more blacks. Any county with a smaller black population was immediately eliminated. While we could certainly populate focus groups in smaller communities, and while we hated to eliminate those smaller population clusters, it made a lot of sense to do so. First off, about 90 percent of the black population lives in counties with 10,000 or more blacks, so we were still keeping most of our eligible population. Second, there may be practical issues in targeting smaller populations in terms of being able to do solid scientific recruiting. And third, there was another practical issue: we were able to eliminate 2,622 of our 3,141 counties by setting this threshold.
We were already making great progress. By cutting our list of candidates from 3,141 to 519, we were able to cut our number of combinations down from 1 quintillion to only about 26 trillion. That was a good start, and our computers were breathing a sigh of relief. We were now down to only about 300 days of work at a million combinations per second.
We got kind of fancy at this point. Knowing the information in the map and the pie chart, along with a bunch of local-area statistics about the 519 counties, you can mathematically optimize which size of county should be picked from which part of the country, with a goal of being the most representative. For those of you who know linear programming, you can e-mail me and I’ll walk you through it, but the bottom line is that, if you want to best approximate the nation, the optimum combination of region and community size is shown in Table 1. You’ll see that it satisfies both of the constraints that we decided on in terms of region and community size.
So this thing keeps getting easier. There are only two Midwestern counties that are home to 600,000 or more blacks: Wayne County (the core county of Detroit) and Cook County (the core county of Chicago). One of them was destined to be picked. There are only eight Southern counties with black populations of 275,000 to 599,999. One of those would be picked. It got a little hairier from there, as far more of the smaller communities were eligible, including 292 Southern counties in the two smallest categories.
All in all, we were left with 325 candidates from our initial list of 3,141. I didn’t care to determine our exact combinations at that point, but a quick back-of-the-envelope calculation suggested that we were probably below 1 trillion combinations now. Worst case, 1.5 trillion. This was great!
Phase two: identifying the candidates
By using just a few basic geographic criteria, we were able to winnow our list of candidates from 3,141 to 325. But we care about a lot more than just the raw population and the part of the country they’re in. A community is defined by a number of factors, some of which we can measure and some of which we can’t. Since we needed to do this scientifically, we stuck with only those things that we could measure. Fortunately, there are a lot of those factors.
We began sifting through data sets to figure out what types of data are available. Our goal was to find data that was meaningful in defining the characteristics of a community (that one’s obvious) and was consistent for these 325 communities. If we researched each community individually, it would be too easy to end up with lots of apples and oranges - data for each community that was gathered in different ways and didn’t allow us to do direct head-to-head comparisons. For that reason, we decided to stick with data from federal government sources.
At this point, we buried ourselves in numbers. For every community, we developed a statistical profile of the community, both for the community in general and for the black community as a subset of the general community. (Okay, I admit that we didn’t just do it for the 325 candidate counties. As long as we were doing the work, it was kind of interesting to put them together for all 3,141 counties. We had a fixed-price contract, but I still couldn’t resist, so it became kind of a hobby for a while.) These profiles contained 18 different measures, including information on households, demographics, the local economy and a few measures that were specific to our client. A few examples that are representative of the main categories include the population growth rate (both for the entire county and for the black population), household income levels, the proportion of the population that was black, home ownership rates, economic differences between black and non-black households, the job structure of the community (e.g., service economy, manufacturing, etc.), poverty rates, family structures and many others.
Phase three: number-crunching time
At the end of phase two, we had 325 communities in our pool, and we knew a whole lot about each of them. Now we returned to our original question: Which six communities, in combination, best represent the diversity of America’s black population?
Here’s where it got fun. Theoretically, we could look at each of our 18 community statistics and identify whether a community was in the top one-sixth nationally, the bottom one-sixth, or somewhere in between when compared to all of the communities that had a black population of 10,000 or more. In a perfect world, we could identify six communities where each community fell into one segment for a particular statistic. For example, one community would have high average incomes, one moderately high, one slightly above average, one slightly below average, one moderately low, and one with low average incomes. The diversity of America, right?
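The one-community-per-segment idea can be sketched for a single statistic as follows. This is a minimal illustration, not the model described in the article: the income figures are invented, the function names are hypothetical, and splitting the reference pool into six equal-count groups is one reasonable reading of "one-sixth nationally."

```python
from bisect import bisect_right


def sextile_cutoffs(reference_values):
    """Five cut points that split the reference pool into six equal-count groups."""
    ordered = sorted(reference_values)
    n = len(ordered)
    return [ordered[(k * n) // 6] for k in range(1, 6)]


def sextile_of(value, cutoffs):
    """Return 0 (bottom sixth) through 5 (top sixth)."""
    return bisect_right(cutoffs, value)


def sextiles_covered(candidate_values, cutoffs):
    """Count how many of the six segments a candidate set of communities touches."""
    return len({sextile_of(v, cutoffs) for v in candidate_values})


# Hypothetical average incomes (in $ thousands) for a 12-community reference pool
reference = [22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55]
cutoffs = sextile_cutoffs(reference)

spread_out = [22, 28, 34, 40, 46, 52]  # one community in each segment
clustered = [22, 25, 28, 31, 52, 55]   # bunched at the low and high ends

print(sextiles_covered(spread_out, cutoffs))
print(sextiles_covered(clustered, cutoffs))
```

A set of six that touches all six segments on a statistic is the "perfect world" case for that statistic; the difficulty lies in doing well on many statistics at once.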
Our challenge was that we had to do it simultaneously for 18 different measures, and for around a trillion combinations. So it was back to advanced math, or more specifically, a really big linear programming model.
For those of you not familiar with this type of model, it’s essentially a way to define a real-life situation in numbers, and also to define a measure that you want to either maximize or minimize. Once you do that, you define constraints that must be met in the selection process, and then you can use this mathematical technique to sift through potential solutions and find the “best” minimum or maximum without violating the constraints. This type of model can be used for everything from maximizing seating in a restaurant to optimizing a client’s advertising dollars. The interesting part is that, if you’re creative, you can go beyond traditional statistics and set constraints that go beyond basic math. For example, in this model, we mathematically set a constraint that one of our six sites had to be within driving distance of Washington, D.C., so the client could drive out and observe at least one set of focus groups.
So after a lot of pondering and a few days of intense work, we were able to set up a mathematical model that included three main parts: One part was a database that defined all of our candidate communities, so the model could distinguish between communities. A second part contained a bunch of constraints, which were basically equations that defined “diversity of locale” for each of the community measures, as well as a few other specific constraints that the client wanted (e.g., the driving distance from D.C.). The third part, the master equation, would take a combination of six counties, examine their characteristics, and do two things: determine if they met the constraints at all, and if so, assign them a score for how well they met them. High score wins.
The beauty of this type of model is that it doesn’t have to test every combination. It looks at trends and slopes, and quickly pares out blocs of combinations that don’t look promising, without testing every one of them. It’s not a perfect system, for practical reasons too detailed to describe here, but it’s a good system that will yield a strong result. We assigned it to a computer and left it alone for a while. (A while being a day or so.)
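The three-part structure described above (a database of candidates, a set of constraints, and a scoring equation) can be sketched in Python. This is not the linear programming model from the project. As a stand-in, it simply tests every combination of a tiny pool, which is only workable because the pool has eight entries. Every community, measure and segment value below is a made-up placeholder, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Part 1: the "database". All rows are invented placeholders.
# name -> (region, within driving distance of D.C.?, national sixth (0-5)
#          the community falls into on each of two toy measures)
COMMUNITIES = {
    "S1": ("South", True, (0, 5)),
    "S2": ("South", False, (1, 4)),
    "S3": ("South", False, (2, 2)),
    "S4": ("South", False, (1, 1)),
    "N1": ("Northeast", False, (3, 0)),
    "M1": ("Midwest", False, (4, 3)),
    "M2": ("Midwest", False, (3, 3)),
    "W1": ("West", False, (5, 1)),
}
N_MEASURES = 2

# Part 2: the constraints. The regional split follows the article;
# the D.C. rule is the client constraint it mentions.
REGION_QUOTA = {"South": 3, "Northeast": 1, "Midwest": 1, "West": 1}


def is_feasible(combo):
    """True if the combination meets the regional quota and the D.C. rule."""
    regions = Counter(COMMUNITIES[c][0] for c in combo)
    near_dc = any(COMMUNITIES[c][1] for c in combo)
    return dict(regions) == REGION_QUOTA and near_dc


# Part 3: the score. Assumption: one point per distinct segment covered
# on each measure, so a perfect set of six scores 6 per measure.
def score(combo):
    return sum(
        len({COMMUNITIES[c][2][m] for c in combo}) for m in range(N_MEASURES)
    )


def best_combination(site_count=6):
    """Exhaustively test every combination; high score wins."""
    feasible = [
        combo
        for combo in combinations(sorted(COMMUNITIES), site_count)
        if is_feasible(combo)
    ]
    return max(feasible, key=score)


best = best_combination()
print(best, score(best))
```

With eight candidates there are only 28 combinations to check. With 325 candidates that approach stops being practical, which is why a method that can discard whole blocs of combinations without testing them matters.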
Results made sense
Once we heard the little chime that the model was done running, we checked it out. The results made sense. We ended up with six communities that seem to show quite a variety of environmental and social conditions. Our six communities were:
Cook County, Ill. – The core county of Chicago, this county represented a very large “flagship” African-American community that spanned a broad spectrum of occupations, incomes, lifestyles and backgrounds. (An interesting bit of trivia: one out of every 30 black people in the U.S. lives in Cook County.)
City of Baltimore – This combined city-county community represented a large central-city environment with relatively low median income and proportionally high black population.
Clark County, Nev. – This community, the core county of Las Vegas, represented a “black enclave,” a community where a relatively sizeable, fast-growing black population is somewhat isolated from other black populations. The black community here, as compared to other communities, was also statistically somewhat similar to the general population, another criterion in our evaluation.
Middlesex County, N.J. – This community represented a “dispersed black population” that is a small proportion of the overall population, has a relatively high income, and (perhaps out of necessity) is statistically very similar to the general population.
Gregg County, Texas – This represented a classic mid-sized Southern community of a type that is home to a large number of black households.
Oktibbeha County, Miss. (and surrounding rural counties) – This area represented a classic small Southern community with a proportionally high black population that, on average, economically lags the general community. (A university in this location skewed our numbers here a bit, but after further analysis the site was kept.)
A spectrum
Are these sites good for you if you want to do market research with the black population? Probably. They may not be the perfect combination for your needs, because our client had some specific criteria for its own purposes (and indeed the client later added other project-specific criteria that resulted in some changes). However, unless you have specific needs, these six communities represent a very strong combination that depicts a spectrum of black households.