Listed-surname sample suitable for most research among Hispanics
Editor's note: Roger S. Sennott, Ph.D., is vice president/general manager of Market Development, Inc., San Diego. David H. Taber is the company's director of business development.
Thanks to its size (22.4 million in 1990), its anticipated growth (72 percent from 1990 to 2010), and the fact that individuals can be reached cost-effectively because of their dependence on Spanish-language media, the U.S. Hispanic market continues to be a key target for many consumer goods and services. U.S. Hispanics will account for 31 percent of U.S. population growth from 1990 to 2010, compared with 36 percent for non-Hispanic whites.
Budget constraints, however, often prevent companies from conducting quantitative research among Hispanics. Many other companies limit their research to one large-scale study (which attempts to answer every question the marketing department has) every three to four years. This means that companies often throw away their opportunity to gather solid, usable information to guide their Hispanic marketing efforts.
Research personnel frequently claim that Hispanic research tends to cost much more than comparable general market research. The reason for increased cost is their insistence that the sample source used for Hispanic studies represent as many potential respondents as possible. The increased representation, however, increases the cost of the research by lowering the overall study incidence (i.e., the proportion of potential respondents who are Hispanic decreases as the representation increases).
Market Development, Inc. (MDI), a San Diego research firm, funded research to evaluate the most common sample sources used to conduct telephone studies among Hispanics. Its purpose was to measure the differences among the various sample sources in terms of respondent profiles and dialing productivity. The results suggest that for most standard Hispanic studies, the least expensive sampling alternative is more than satisfactory.
Background on telephone sampling among Hispanics
Sampling approaches for Hispanic telephone studies range from samples that represent only part of the Hispanic telephone household population but yield a high incidence of Hispanics, to samples that represent the entire Hispanic telephone household population and yield a significantly lower incidence of Hispanics.
Three common methods for creating Hispanic telephone samples are:
- randomly selecting telephone numbers chosen from listed numbers of individuals with Hispanic surnames (listed-surname sample);
- randomly selecting telephone numbers that have exchanges located in high-density Hispanic population areas (high-density sample); and
- combining the methods above, drawing part of the sample from a high-density sample and the remainder from a listed-surname sample in low-density Hispanic areas only (hybrid sample).
If cost and timing were not considerations, a fourth and optimal sample choice would be the general market standard of selecting telephone numbers that are truly random within particular markets (random-digit dialing, or an RDD sample).
Listed-surname samples are based on a list of over 12,000 Hispanic surnames supplied by the Bureau of the Census. Listed-surname samples have the advantage of a very high incidence of Hispanics, typically 70 percent or higher. Errors of commission, i.e., respondents who have Hispanic surnames but do not consider themselves Hispanic, are eliminated by screening for Hispanic self-identification.
Listed-surname samples, however, represent only part of the Hispanic telephone household population. The two sources of omission errors, i.e., households with telephones that would classify themselves as Hispanic but do not appear in listed-surname samples, are Hispanics:
- with unlisted telephone numbers. These could account for anywhere from 30 percent to 55 percent of Hispanic telephone households depending on the market; and
- without Hispanic surnames. According to the Census, this could include up to 18 percent of the Hispanic population, half of whom would have listed phone numbers, thereby omitting an additional 9 percent of the Hispanic telephone household population.
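As a rough illustration of how these two omission sources combine, the sketch below simply subtracts both shares from 100 percent, treating them as additive in the way the figures above are presented. The function and parameter names are hypothetical, and the result is only a back-of-the-envelope range, not a measured coverage rate.

```python
def listed_surname_coverage(unlisted_share, listed_non_surname_share=0.09):
    """Approximate share of Hispanic telephone households that a
    listed-surname sample can reach.

    unlisted_share: share with unlisted numbers (0.30 to 0.55 by market).
    listed_non_surname_share: share with listed numbers but no Hispanic
        surname (the article's 9 percent; half of the 18 percent figure).
    Assumption: the two omitted groups are treated as simply additive.
    """
    return 1.0 - unlisted_share - listed_non_surname_share


# Worst case (55 percent unlisted) and best case (30 percent unlisted)
low = listed_surname_coverage(0.55)
high = listed_surname_coverage(0.30)
print(f"Approximate coverage: {low:.0%} to {high:.0%}")
```

Under these assumptions, a listed-surname sample would reach somewhere between roughly one-third and three-fifths of Hispanic telephone households, depending on the market.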
The purpose of high-density samples is to include unlisted numbers within the sampling frame and, at the same time, maintain a reasonably high incidence of Hispanics. Telephone exchanges can be drawn from high-density census tracts or high-density ZIP codes. Telephone numbers are then typically created by randomly generating the last four digits. Many of the resulting phone numbers may be from households outside of high-density areas. This occurs for two reasons. First, telephone numbers with the same exchange are not necessarily located in the same area. Second, ZIP codes and census tracts based on the 1990 census may be somewhat inaccurate in terms of today's Hispanic population.
The most common type of high-density sample is based on census tracts or exchanges that have a Hispanic population density of at least 30 percent. This type of sample can yield a Hispanic incidence of anywhere from 40 percent to 60 percent, depending on the market. High-density random samples also omit part of the Hispanic telephone population, namely those Hispanics who live in low-density areas. The number of households excluded can vary substantially (from 10 percent to 80 percent) depending on the market.
The hybrid sample offers a compromise. Its Hispanic incidence is greater than that of the high-density sample and less than that of the listed-surname sample. It also includes a greater proportion of the Hispanic telephone population than either of the other two sample types.
Key questions when using the hybrid sample are: What proportion of the numbers should be derived from a high-density sample? What proportion from a listed-surname sample in low-density areas? And, what should be the cut-off for determining whether a density level is high or low? A typical approach is to use 30 percent as the cut-off density level and to administer one-half of the completed interviews using each of the two sample source components.
Study findings
In terms of past four-week household usage of the 13 different products or foods asked about, only one product from one sample source differed significantly from the RDD sample: usage of bottled salad dressing was higher in the high-density sample. (See Table 1.)
In terms of household ownership of the 10 high-ticket products and/or services asked about, incidence figures for two products deviated from those for the RDD sample. When using the listed-surname sample, the incidence of having an answering machine was lower and the incidence of having a washing machine was higher. (See Table 2.)
Respondents were asked about 13 different classification variables, and significant differences relative to the RDD sample appeared on two of them. All three samples reported a higher incidence of Spanish-television viewing than did the RDD sample, and the listed-surname sample resulted in a higher proportion of respondents reporting they were of Mexican origin. (See Table 3.)
As for productivity issues, there were, as expected, dramatic differences between the samples. For the listed-surname sample, 13 percent of all dials resulted in contacts (i.e., getting the opportunity to ask the correct respondent the first screener question), and 74 percent of the households claimed to have a member who was of Hispanic origin. Using the high-density sample resulted in an 8 percent contact rate and a Hispanic incidence of 45 percent. The productivity of the hybrid sample was between that of the other two.
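To make the productivity gap concrete, the sketch below converts the contact rates and Hispanic incidence figures above into an approximate number of dials needed to reach one household with a Hispanic member. The function name is hypothetical, and the calculation deliberately ignores cooperation rates, callbacks, and any further screening, so it should not be read as dials per completed interview.

```python
def dials_per_hispanic_contact(contact_rate, hispanic_incidence):
    """Approximate dials needed to reach one contacted household that
    reports a Hispanic member.

    Assumption: contacts and Hispanic incidence combine multiplicatively;
    refusals, callbacks, and later screening questions are ignored.
    """
    return 1.0 / (contact_rate * hispanic_incidence)


# Figures reported in the study
listed_surname = dials_per_hispanic_contact(0.13, 0.74)
high_density = dials_per_hispanic_contact(0.08, 0.45)

print(f"Listed-surname sample: {listed_surname:.1f} dials")
print(f"High-density sample:   {high_density:.1f} dials")
print(f"Ratio: {high_density / listed_surname:.1f}x")
```

On these figures, the high-density sample requires well over twice as many dials as the listed-surname sample to reach each Hispanic household, which is the main driver of the cost differences discussed later.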
All of the results were also analyzed by looking only at respondents who speak Spanish more than English at home (a very common screening requirement for Hispanic studies). Of the 36 variables asked about, only three produced results that were significantly different from those for the RDD sample. In two cases the listed-surname sample was different, in one case the high-density sample was different, and none of the variables was significantly different using the hybrid sample.
And the winner is . . .
If a winner needed to be selected from the sample test study, it would be the hybrid sample. When compared with the RDD sample among Spanish-speaking respondents, it was not significantly different on any of the 36 variables. Both the high-density and listed-surname samples, however, also provided results that were very similar to those of the RDD sample. By chance alone, one would expect each sample to produce results different from those of the RDD sample on one or two variables.
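The "one or two variables" expectation can be checked with simple arithmetic. The sketch below assumes a 5 percent significance level and independent tests; neither is stated in the study, so both are illustrative assumptions, and the variable names are hypothetical.

```python
# Assumptions (not stated in the study): each comparison is tested at the
# 5 percent significance level, and the 36 tests are independent.
NUM_VARIABLES = 36
ALPHA = 0.05

# Expected number of "significant" differences arising purely by chance
expected_false_positives = NUM_VARIABLES * ALPHA

# Probability that at least one variable differs by chance alone
prob_at_least_one = 1.0 - (1.0 - ALPHA) ** NUM_VARIABLES

print(f"Expected chance differences: {expected_false_positives:.1f}")
print(f"Probability of at least one: {prob_at_least_one:.0%}")
```

Under these assumptions, a sample identical to the RDD sample in every respect would still be expected to differ on nearly two of the 36 variables, so the handful of differences observed is consistent with chance.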
If all samples are satisfactory for representing the Hispanic consumer, the sample selection criteria should focus on productivity, or cost, issues. The listed-surname sample is clearly the most productive and the least costly because it:
- has fewer non-working numbers, since all numbers are from listed households;
- has a Hispanic incidence much higher than for the others; and
- has lower sample costs, since fewer telephone numbers need to be purchased.
Using a high-density sample as the baseline, a hybrid sample could reduce the total study cost by 10 percent to 20 percent, and using a listed-surname sample could reduce the total study cost by 20 percent to 40 percent. Ideally, the listed-surname approach would free up enough dollars to conduct at least one quantitative study instead of relying on information from a Hispanic expert. It also might allow the company to divide a large "one size fits all" questionnaire into two or more focused questionnaires.
The results of the study show that when a proportion of the population is excluded from a sample, the resulting sample is inferior only if the excluded individuals are significantly different on key variables from those who are included. This principle also applies to sampling targets other than Hispanics.
Often, sampling issues gain exaggerated importance since this is one of the few areas in which error can be measured scientifically. When conducting research among Hispanics, researchers would probably be best served by improving other study considerations that typically have a much greater impact on the results. Their attention should be focused on eliminating the possible biases caused by:
- interviewers who speak Spanish poorly;
- use of English-speaking supervisors for Spanish-language studies;
- asking questions that are inappropriate for the Hispanic market;
- excluding questions that are critical for Hispanic studies;
- questionnaires that are too long; and
- questionnaires that are poorly adapted into Spanish.
In summary, MDI's study shows that, at least for the Los Angeles ADI (the largest U.S. Hispanic market), it is possible for companies to gain quantitative insights into the Hispanic market without paying research prices that are higher than those for comparable general market studies. Given the size and potential of this lucrative market, that's good news indeed.