Editor’s note: Keith Chrzan is division vice president, marketing sciences at Maritz Research, St. Louis. Joey Michaud is the firm’s director, marketing sciences.

The choice of a response scale is one of the most contentious debates in customer satisfaction research and one of the least crucial. Academic research on scale design is voluminous but not always relevant. Much of the published work on attitude rating scales comes from the field of public opinion research rather than from satisfaction research or even marketing research. Nor is this academic research consistent. We learn, for example, that more scale points are better than fewer; that two, five, nine, 10, 11, and 101 are optimal; that two are not enough; and that there is no relationship between scale quality and the number of scale points.

However, some broad themes and specific findings do emerge. Maritz Research’s experience and testing of satisfaction scales lead to the following recommendations about satisfaction response scales.

What scales should accomplish

Response scales for overall customer satisfaction should meet a variety of objectives. These scales:

  • reliably measure the construct of customer satisfaction;
  • validly measure the construct of customer satisfaction;
  • provide discriminating measures;
  • permit interval-level statistical analyses;
  • apply to a wide variety of product and service categories;
  • are appropriate for mail, telephone, Internet and personal data collection;
  • are easy for respondents to understand and remember during interviewing.

This article concerns only overall satisfaction scales. Other measures (such as performance scales for product or service attributes, agree/disagree scales, etc.) are outside the scope of this article.

Decisions about response scale properties

Our recommendation depends upon five major factors:

A. Mode of administration
Sometimes respondents see a scale (paper-and-pencil surveys, Web or PC-based surveys) and sometimes they just hear it (telephone and interactive voice response surveys). Humans process visual and aural information differently. Some scales are more confusing when heard than when seen, which can lead respondents to answer the same question differently in different data collection modes. For this reason, some of the recommendations depend on whether a scale is to be perceived visually or aurally.

B. Scale balance
Balanced scales have equal numbers of positive and negative points: completely satisfied; mostly satisfied; mostly dissatisfied; completely dissatisfied. Although there are exceptions, a best practice with respect to balanced scales is that the use of modifiers should be symmetrical on the positive and negative ends of the scale.

Unbalanced scales attempt to get greater discrimination on one side of the scale than on the other. If past experience suggests that most respondents are satisfied with a certain product or service, a researcher might want to “stretch out” the positive side of the scale, as in this extreme example (five positives and only two negatives): completely satisfied; very satisfied; mostly satisfied; somewhat satisfied; barely satisfied; mostly dissatisfied; completely dissatisfied.

Whether a scale should be balanced or unbalanced usually depends on whether we’re measuring a unipolar or a bipolar concept of satisfaction. A unipolar satisfaction scale might range from “not satisfied” to “completely satisfied” (i.e., it doesn’t measure any more extreme dissatisfaction at all). In contrast, a bipolar scale would range from “completely dissatisfied” to “completely satisfied” (i.e., it measures extremes of both satisfaction and dissatisfaction).

When we don’t know ahead of time whether most respondents will tend to be satisfied or dissatisfied, or if we expect high levels of dissatisfaction, a balanced bipolar scale is appropriate. If we know from past experience to expect low levels of dissatisfaction, an unbalanced unipolar scale will be better. In our firm’s experience, most studies show low levels of dissatisfaction, so our most frequent scale recommendation is for an unbalanced satisfaction scale.

Some people think that satisfaction and dissatisfaction are different entities and should be measured separately. Although Maritz has observed that satisfaction and dissatisfaction may sometimes have different drivers, and that there may be nonlinear relationships between satisfaction and other variables, we have seen no compelling reason to measure them separately.

C. Midpoints
A midpoint communicates neutrality on a balanced scale. “Neither satisfied nor dissatisfied” serves as the midpoint in this balanced bipolar scale: completely satisfied; mostly satisfied; neither satisfied nor dissatisfied; mostly dissatisfied; completely dissatisfied.

There seems to be no difference in quality between scales that have a midpoint and those that do not. For bipolar scales, however, Maritz Research follows the advice of Sudman and Bradburn (1982) to “include the middle category unless there are persuasive reasons not to do so.” Krosnick (in press) suggests the same.

D. Number of scale points
Despite the strong opinions of some scale enthusiasts, there just isn’t powerful empirical evidence that a single number of scale points is always best. A literature review by Cox (1980) summarizes the consensus regarding self-administered surveys: “Seven, plus or minus two, appears to be a reasonable range for the optimal number of response alternatives.” Internal research has not indicated a consistent difference in quality between 5- and 10-point scales. Krosnick (in press) finds that 5-point scales have the greatest test-retest reliability in paper-and-pencil surveys. Maritz’s experience with mail studies has shown that 5-point scales provide better dispersion of responses (and are, therefore, more discriminating) than other scales. We expect these results would generalize to other visual survey modes (e.g., PC or Web-based).
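To make “better dispersion” concrete, the short sketch below (in Python, with hypothetical responses; the metric and the data are illustrative only, not Maritz’s actual analysis) compares the spread of ratings from a 5-point and a 10-point version of the same question, expressing the standard deviation as a share of each scale’s range so the two formats can be compared directly.

    from statistics import pstdev

    # Hypothetical split-sample responses to the same satisfaction question
    five_point = [5, 4, 4, 5, 3, 4, 5, 2, 4, 5]     # answered on a 1-5 scale
    ten_point = [9, 8, 10, 9, 7, 10, 9, 6, 10, 9]   # answered on a 1-10 scale

    def relative_dispersion(responses, low, high):
        # Standard deviation as a proportion of the scale's range, so scales
        # with different numbers of points can be compared on spread alone
        return pstdev(responses) / (high - low)

    print(f"5-point dispersion:  {relative_dispersion(five_point, 1, 5):.3f}")
    print(f"10-point dispersion: {relative_dispersion(ten_point, 1, 10):.3f}")

On this view, a more discriminating scale is simply one whose responses spread across more of its range rather than piling up in the top category.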

The findings about the superiority of 5-point scales apply specifically to visually perceived scales and may or may not generalize to questions that respondents hear rather than see. Ten-point response scales are common in telephone surveys and have also been found to provide some advantages: greater statistical precision, statistical power, and room for improvement (Wittink and Bayer, 1994). We have also seen evidence that respondents from some ethnic groups shy away from using the extreme endpoints and may answer more accurately if they have a greater number of scale points from which to choose. Hence, while our basic recommendation is to use 5-point scales, we are comfortable using 10-point scales for some surveys, particularly those conducted via telephone.

E. Scale anchors
Marketing research convention, Maritz Research’s experience, and recent academic work (Krosnick, in press) suggest that 5-point scales be fully word-anchored. Similarly, convention and our experience with the difficulty of administering more than about five verbally anchored scale points support word anchoring only the endpoints of 10-point scales.

Recommended satisfaction response scale

For the reasons above, Maritz prefers a 5-point fully-word-anchored unbalanced response scale for measuring overall satisfaction. The preferred scale works well in both aural and visual survey modes (Figure 1). We also recommend that it be presented without associated numbers (see section on other considerations).

Acceptable alternatives

There are situations when other scales may be appropriate.

A. Bipolar scale
When we don’t know how responses may be distributed or when we suspect generally lower levels of satisfaction in a particular study, it may be prudent to use a bipolar scale. Recommended bipolar scales for visual and aural presentation are shown in Figure 2.

B. 10-Point Scale
Figures 3 and 4 show unipolar and bipolar versions of the 10-point scale Maritz Research recommends for telephone interviewing.

Other considerations

A. Presentation of the scale
In an aural data collection mode (telephone), presentation refers to the wording of the question. In a visual mode (mail, Internet, personal interview with “show card”), it means how the scale is configured (e.g., horizontally or vertically, symbol used to express categories, highest category at the right or left, etc.).

In a visual mode, the recommended response scale is presented horizontally and ordered from low on the left to high on the right. There is little evidence supporting either low-to-high or high-to-low ordering. Sudman and Bradburn (1982) generally recommend low-to-high, and our firm’s experience provides mild evidence that a low-to-high ordering is better.

Overall satisfaction is usually measured alone, and not in a battery of like-scaled items, so a vertical ordering (from low satisfaction at the top to high satisfaction at the bottom) may also be used:

not at all satisfied;
slightly satisfied;
somewhat satisfied;
very satisfied;
completely satisfied.

When presenting endpoint-only anchored scales visually, whether or not numbers should be associated with the unlabeled categories is open to debate. Maritz’s experience and internal research show that responses may differ with and without the numbers, but they do not indicate which version is more valid or reliable.

B. Multi-language surveys
If a survey is to be translated into multiple languages, the meaning of verbal scale anchors may not translate perfectly. Since this problem is more likely to occur as the number of verbal anchors increases, consider using verbal anchors only for the endpoints of multi-language scales.

C. Consistency in administration
The human brain is good at many things, but measuring its own internal mental states isn’t one of them. Compared to a micrometer, an odometer or a Breathalyzer, the brain is a poor measuring device indeed. As a result, minor differences in how rating scales are administered can cause large differences in study results. Asking the same question of similar respondents in phone and mail surveys routinely produces sizable differences in both mean ratings and the distribution of ratings. Scales with different anchors, or even identical scales administered in different ways or in different contexts, can return dramatically different results. Any time a scale is used in tracking, or when results on a scale will be compared across two separate studies, any of these differences, plus others, can wreck comparability.

Much more important than which scale you use is that you maintain wave-to-wave consistency. Before any changes are made to a scale or its presentation, it is wise to conduct a side-by-side test to see how the change will affect the results.
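One way such a side-by-side test might be run is sketched below (a minimal illustration, assuming split samples answer the current and the proposed version of the question; the counts and the choice of a chi-square test are ours for illustration and not a prescribed Maritz procedure).

    from scipy.stats import chi2_contingency

    # Hypothetical counts of respondents in each of five response categories,
    # one row per scale version, lowest category on the left
    counts = [
        [30, 55, 80, 160, 175],   # current scale wording
        [25, 50, 70, 150, 205],   # proposed scale wording
    ]

    # Chi-square test of homogeneity: do the two versions yield the same
    # distribution of responses?
    chi2, p_value, dof, expected = chi2_contingency(counts)
    print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
    if p_value < 0.05:
        print("Versions produce different distributions; expect a break in trend data.")
    else:
        print("No detectable difference between versions in this sample.")

A detectable difference does not say which version is better; it simply warns that switching scales mid-track will break comparability with earlier waves.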

Variety of objectives

A good satisfaction response scale must meet a variety of objectives, most importantly valid and reliable measurement of the satisfaction construct. A good scale will also be discriminating, permit interval-level statistical analyses, and be easy for respondents to understand. Maritz Research prefers response scales that are applicable to a wide variety of product and service categories and can be used in all data collection modes.

Our basic recommendation, a 5-point fully-word-anchored unbalanced scale (appropriate in both visual and aural modes), is shown in Figure 5.

If low levels of satisfaction are anticipated, a 5-point fully-word-anchored balanced bipolar scale (Figure 6) may be used (appropriate in both visual and aural modes).

In telephone surveys, 10-point satisfaction scales have been shown to provide some advantages. Ten-point scales are also useful with certain ethnic groups who shy away from using the extremes, and can be helpful when a study is being conducted in multiple languages (fewer verbal anchors to translate). For the 10-point scale, Maritz offers unipolar and bipolar alternatives, with only the endpoints anchored:

Unipolar
1 = Not at all satisfied; 10 = Completely satisfied

Bipolar
1 = Completely dissatisfied; 10 = Completely satisfied

These recommendations result from an extensive literature review, internal research and considerable experience with customer satisfaction research. However, Maritz recognizes that there may be good arguments supporting other satisfaction response scales. When data comparisons are important, we recommend using the existing scale. The ability to compare data from identical scales far outweighs any benefit from response scale redesign in nearly all situations. 

References

Sudman, Seymour and Norman M. Bradburn. Asking Questions: A Practical Guide to Questionnaire Design. San Francisco: Jossey-Bass, 1982.

Cox, Eli P. III. “The Optimal Number of Response Alternatives for a Scale: A Review,” Journal of Marketing Research, 17 (1980): 407-22.

Wittink, Dick R. and Leonard R. Bayer. “The Measurement Imperative.” Marketing Research, 6 (1994): 14-22.