Editor’s note: Patrick Elms is project group manager/quantitative research at TNS Market Development, a San Diego research firm.
When measuring attribute importance and brand performance among Hispanics, a frequent problem researchers encounter is the tendency for Hispanics to use only the upper end of the rating scale. For example, on a 5-point scale nearly all responses will be in the 3 to 5 range, and on a 10-point scale there will be few ratings lower than 6. Ratings such as these are much higher than are typically found among the general population.
In its years of conducting research among Hispanics, TNS Market Development has encountered this phenomenon numerous times and with various types of scales, including importance ratings, brand attribute ratings, agreement ratings, and psychographic attitude ratings. While most pronounced when using a traditional 5-point scale, the problem also occurs when using a 10-point scale, with responses concentrated in the top three or four response choices. With a 5-point scale, the level of high ratings is similar whether only the end points are anchored (e.g., 5 = very important and 1 = not at all important) or all the points are anchored (e.g., extremely important, very important, somewhat important, not very important, not at all important). Specifically with importance, we find that Hispanics resist identifying any attribute as unimportant when using a traditional rating scale.
This tendency to use the upper end of rating scales poses several challenges for researchers. First, the narrow range of responses results in few significant differences between mean or top-box (percent giving the highest rating) attribute ratings, making it difficult to identify the key attributes. Secondly, if the ratings are to be used in a factor analysis or other multivariate procedure, the low variability among the ratings reduces the effectiveness of the analysis. These problems are summed up well by DeVellis (1991):
A desirable quality of a measurement scale is variability. A measure cannot covary if it does not vary. If a scale fails to discriminate differences in the underlying attribute, its correlations with other measures will be restricted and its utility will be limited [p. 64].
A third problem occurs when making comparisons to a general population survey on the same questions, where comparing anything but the rank order of the attribute ratings can be misleading. For example, an attribute that is most important among non-Hispanics might have a top box score of 65 percent, while the same attribute among Hispanics could have a top box score of 75 percent yet be only moderately important relative to other attributes. Comparing the 65 percent score among non-Hispanics to the 75 percent among Hispanics, one might mistakenly conclude that the attribute is more important to Hispanics.
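To make the comparison pitfall concrete, the short Python sketch below uses purely illustrative top box figures (not drawn from any survey) to show how an attribute can score higher among Hispanics in absolute terms while ranking lower relative to other attributes.

```python
# Purely illustrative top box scores (not from any survey) for three attributes.
top_box = {
    "good health benefits": {"non_hispanic": 0.65, "hispanic": 0.75},
    "job security":         {"non_hispanic": 0.55, "hispanic": 0.82},
    "flexible hours":       {"non_hispanic": 0.40, "hispanic": 0.78},
}

def rank_order(scores):
    """Rank attributes from most to least important within one population."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {attribute: position + 1 for position, attribute in enumerate(ordered)}

non_hispanic = {a: s["non_hispanic"] for a, s in top_box.items()}
hispanic = {a: s["hispanic"] for a, s in top_box.items()}

# "good health benefits" scores higher among Hispanics in absolute terms
# (75 percent vs. 65 percent) yet ranks last for them, so comparing the raw
# percentages across populations would be misleading.
print("non-Hispanic ranks:", rank_order(non_hispanic))
print("Hispanic ranks:    ", rank_order(hispanic))
```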
Only a handful of solutions have been suggested to address this problem. The most common is to expand the number of response items on the scale (Dana, 2000; DeVellis, 1991; Hui and Triandis, 1989). Hui and Triandis (1989) found that differences in responses between Hispanics and non-Hispanics narrowed significantly when a 10-point scale replaced a 5-point scale. However, DeVellis (1991) points out that expanding the scale runs the risk of respondent fatigue, which would lower reliability. He also notes that “false precision” may result if the respondent cannot discriminate meaningfully between the response choices (e.g., what makes an attribute a 7 versus an 8), since the error portion of the variability would increase rather than the portion attributable to the measured phenomenon.
Other proposed solutions concern the data analysis phase, including examining the complete distribution of each item rather than a single measure of central tendency or proportion, combining the two highest categories, and reporting standardized z-scores (Hui & Triandis, 1989). While these methods may shed additional light on the dynamics of the responses, they require additional analysis or data manipulation, and they fail to address the underlying problem: the lack of variability created when ratings cluster at the high end. Clearly the better path is to create a scale where the tendency to heavily use the high end is less likely to occur in the first place.
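For reference, the sketch below illustrates one way the z-score approach could be implemented: each respondent's ratings are standardized around his or her own mean, which removes an overall tendency to rate everything high. The data frame and column names are hypothetical, and the exact procedure described by Hui and Triandis may differ in its details.

```python
import pandas as pd

# Hypothetical ratings, one row per respondent, one column per attribute.
ratings = pd.DataFrame({
    "high_income":    [5, 5, 4, 5],
    "job_security":   [5, 4, 4, 5],
    "flexible_hours": [4, 5, 3, 3],
})

# Standardize each respondent's ratings around his or her own mean, so a
# respondent who rates everything near the top no longer lifts every attribute
# equally. Respondents who give identical ratings to every item would need
# special handling, since their standard deviation is zero.
row_means = ratings.mean(axis=1)
row_stds = ratings.std(axis=1)
z_scores = ratings.sub(row_means, axis=0).div(row_stds, axis=0)

print(z_scores.round(2))
```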
To address this problem in importance ratings among Hispanics, TNS Market Development created a behaviorally anchored importance scale designed to spread responses more evenly across the scale, and tested it against a standard importance rating using a split sample. We wanted to limit the scale to five points, since that is the standard for most importance research and clients are therefore more familiar with its interpretation than with scales that use more points. Our hypothesis was that a scale in which every point is verbally anchored and which requires relative comparisons between attributes in the decision process would greatly decrease the occurrence of high ratings and therefore increase variability.
Scale construction
Examples of rating scales with behavioral anchors appear only sparingly in psychology and marketing. An early example of a behaviorally anchored scale is given by Nunnally (1967), who quickly dismisses behavioral scales as too difficult to construct. More recently, Glowa and Lawson (2000) propose a “Propositional-Descriptive” scale to measure satisfaction, noting that compared to standard Likert scales, “respondents are less likely to cluster towards the middle or top of the scale.” Additional rationale for using behavioral anchors is presented by Schuman and Presser (1996) in comparing measures of intensity (attitude strength) and centrality (importance to a decision) toward social issues:
The centrality measure has a behavioral reference, whereas the intensity question refers only to feelings…People apparently find it easier to report feeling strong about an issue than to say that it is one of the most important issues they would consider in an election [p. 237].
One area in which behavioral anchors are widely used is employee evaluations, where they are known as behaviorally anchored rating scales (BARS). Two of the advantages of BARS noted by Maiorca (1997) are applicable to the high ratings issue among Hispanics. First, BARS “eliminate the use of potentially misleading numerical volume measures that are not readily interpretable,” and second, they “reduce rater bias and error by anchoring the rating with specific behavioral examples” (p. 1).
With these goals in mind, we selected wording to anchor each point on the scale that intentionally set a high standard for the top response choice. The anchors were also designed to reflect the weight of each attribute in the decision-making process rather than some general notion of importance. The topic for the test was the importance of job attributes and benefits when making employment decisions. The following are the scale anchors and their corresponding numeric values:
5) You would never accept a job that did not have this benefit.
4) You consider this benefit equally with other important benefits when deciding whether to take a job.
3) You would accept a job that did not have this benefit if it had other benefits you want.
2) You consider this benefit a minor factor in your job decisions.
1) You don’t care at all about this benefit when considering job opportunities.
We call this scale a decision criteria anchor importance scale because points are anchored with phrases that describe attributes in relation to the decision process. The scale can be easily transformed for any product or service category by replacing “benefit” with “feature” and referring to purchasing the appropriate category rather than accepting a job.
Test methodology
The scale was tested as part of a quarterly telephone omnibus study among Hispanics in the Los Angeles, Houston and San Antonio designated market areas (DMAs). The sample was drawn using listed Hispanic-surname selection, in which phone numbers are randomly selected from directory listings whose surnames appear among the 13,000 identified by the U.S. Census Bureau as typically Hispanic. Respondents were screened to be self-identified Hispanics age 18 or older. Approximately half of the sample was in Los Angeles, with one-quarter each in Houston and San Antonio, and the data was weighted by market to reflect Hispanic population size. Half the interviews were among males and half among females.
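For readers who want to see the mechanics of the weighting step, the sketch below computes a market weight as the ratio of a market's share of the target population to its share of completed interviews. The population shares shown are placeholders, not the figures used in this study.

```python
import pandas as pd

# Placeholder population shares for the three DMAs (not the study's actual figures).
population_share = {"Los Angeles": 0.55, "Houston": 0.25, "San Antonio": 0.20}

# Simulated completes roughly matching the sample design described above.
respondents = pd.DataFrame({
    "market": ["Los Angeles"] * 300 + ["Houston"] * 150 + ["San Antonio"] * 151
})

# Weight = market's share of the target population / market's share of completes.
sample_share = respondents["market"].value_counts(normalize=True)
respondents["weight"] = respondents["market"].map(
    lambda market: population_share[market] / sample_share[market]
)

print(respondents.groupby("market")["weight"].first().round(3))
```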
A total of 601 interviews were completed, with 302 asked the employment importance questions using a traditional importance scale and 299 asked the questions using the decision criteria anchor importance scale (we will refer to this as the “behavioral scale” in the analysis). The traditional scale asked for a rating from 1 to 5, where 5 means very important and 1 means not important at all, with the middle points undefined. Random selection was used to determine which type of scale each respondent received. The employment benefits tested were as follows:
- high income;
- job security;
- flexible working hours;
- good health care and other employee benefits;
- a fun, social working atmosphere;
- opportunities for advancement;
- close to where you live;
- a company with a good reputation.
Results
The results for each item on the two scale types, using top box (5) and top two box (4 or 5) ratings, are shown in Table 1. The behavioral scale performed as expected, significantly reducing the percentage of responses in the upper two choices for every attribute. The difference was most pronounced for top box scores, where the anchored scores ranged from 18 percent to 46 percent, compared with traditional scores ranging from 56 percent to 89 percent.
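The article does not specify which significance test was used for these comparisons; the sketch below illustrates one common choice, a two-proportion z-test on top box scores between the two scale groups, applied to simulated ratings rather than the study data.

```python
import numpy as np
from scipy.stats import norm

def top_box_pct(ratings, boxes=(5,)):
    """Share of respondents whose rating falls in the specified top box(es)."""
    return np.isin(np.asarray(ratings), boxes).mean()

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))

# Simulated ratings for one attribute under each scale version (not study data).
rng = np.random.default_rng(0)
traditional = rng.choice([3, 4, 5], size=302, p=[0.10, 0.20, 0.70])
behavioral = rng.choice([1, 2, 3, 4, 5], size=299, p=[0.10, 0.15, 0.25, 0.25, 0.25])

p_trad, p_beh = top_box_pct(traditional), top_box_pct(behavioral)
z, p_value = two_proportion_z_test(p_trad, len(traditional), p_beh, len(behavioral))
print(f"Top box: traditional {p_trad:.0%}, behavioral {p_beh:.0%}, p = {p_value:.4f}")
```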
To detect any interactions between the ratings and respondent characteristics, Tables 2 through 5 show the top box scores for both scale types by gender, age, place of birth and language spoken at home. There are no significant differences by gender or age, but place of birth and language show that more acculturated Hispanics (i.e., U.S.-born and speaking English more than Spanish) are much less likely than less acculturated Hispanics to give high importance ratings on the traditional scale. With the behavioral scale, responses from more acculturated Hispanics were somewhat less positive than among those less acculturated, but not nearly as much as with the traditional scale. In other words, the ratings gap between more and less acculturated Hispanics was smaller with the behavioral scale.
Another hypothesized effect of the behavioral scale is greater variability than the traditional scale produces. Table 6 compares the standard deviation (dispersion of responses around the mean) and squared multiple correlation (correlation between each item and a combination of the other seven items) for each employment benefit on each scale. For every item, variability is higher with the behavioral scale than with the traditional scale, indicating that multivariate analyses based on the correlation matrix will yield better results.
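For analysts who wish to replicate these diagnostics, the sketch below computes each item's standard deviation and squared multiple correlation from the inverse of the item correlation matrix. The ratings it uses are randomly generated for illustration only, and the column names are placeholders for the study's eight benefits.

```python
import numpy as np
import pandas as pd

def item_statistics(items: pd.DataFrame) -> pd.DataFrame:
    """Standard deviation and squared multiple correlation (SMC) for each item.

    SMC is the R-squared from regressing an item on all of the other items,
    computed here from the inverse of the item correlation matrix:
    SMC_i = 1 - 1 / (R^-1)_ii.
    """
    inverse_corr = np.linalg.inv(items.corr().to_numpy())
    smc = 1.0 - 1.0 / np.diag(inverse_corr)
    return pd.DataFrame({"std_dev": items.std(), "smc": smc}, index=items.columns)

# Randomly generated ratings for three benefits, for illustration only.
rng = np.random.default_rng(42)
example = pd.DataFrame(
    rng.integers(1, 6, size=(299, 3)),
    columns=["high_income", "job_security", "flexible_hours"],
)
print(item_statistics(example).round(3))
```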
Conclusions
As predicted, the decision criteria anchor importance scale successfully reduced the number of high ratings, with top box ratings typically less than half of those produced with a traditional numeric rating scale. The scale is effective across gender and age groups, and it narrows the differences between acculturation levels that stem from high ratings.
One issue to consider when deciding whether to use the behavioral scale is whether any comparisons are to be made against other data sets, either Hispanic benchmark data or similar studies of other populations. It is clear that data from the behavioral scale cannot be directly compared to previous research among Hispanics that used a traditional importance scale. However, as noted at the beginning of this article, this problem also exists when comparing traditional scales on which Hispanics concentrate at the upper end. Given the benefits of the behavioral scale, we recommend that it be used as a new benchmark and that any comparisons to past data rely on the rank order of the attributes. For general market and other population comparisons, we similarly recommend using the anchored scale, either for Hispanics only, with rank order comparisons, or ideally for the entire project, including non-Hispanics.
Several avenues of further research would help test and extend this scale. The first is a comparison using the behavioral scale between importance ratings of Hispanics and the general population on the same set of attributes, to see whether the scale lowers the number of upper category responses equally for both populations or instead narrows the gap between Hispanics and non-Hispanics. A similar comparison among Hispanics in markets other than the Mexican-dominated cities used in this test (e.g., New York and Miami) could confirm whether the behavioral scale reduces high ratings for other Hispanic-origin groups. A second useful inquiry is to explore the interval properties of the behavioral scale to determine how far the intervals depart from a true metric scale. A third path is to develop scales for other types of questions using the decision criteria anchor importance scale as a guide. Attribute performance ratings, satisfaction ratings and psychographic attitude ratings are three possible areas for this type of work.
References
Dana, J. (2000). Lingua franca. Marketing News, 34, 17.
DeVellis, R. F. (1991). Scale development: Theory and applications. Newbury Park, Calif.: Sage.
Glowa, T., & Lawson, S. (2000). Satisfaction measurement: Is it worth it? Quirk’s Marketing Research Review, 14(9), 32-38.
Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20, 296-309.
Maiorca, J. (1997). How to construct behaviorally anchored rating scales (BARS) for employee evaluations. Supervision, 58(8), 15-18.
Schuman, H., & Presser, S. (1996). Questions and answers in attitude surveys: Experiments on question form, wording and context. Thousand Oaks, Calif.: Sage.