Rating scales can influence results

Abstract

A summarized excerpt of a U.S. Department of Commerce study testing the merits of a 7-point rating scale versus a 10-point rating scale.

Editor’s note: A U.S. Department of Commerce report, "Approaches to Developing Questionnaires," includes a section on the results of a study testing a seven-point rating scale and a ten-point rating scale using the split sample technique. A summarized excerpt from this section appears below.

Introduction

The Income Survey Development Program (ISDP) was established in 1976 to develop and test procedures to improve survey data on income, participation in government aid programs, and economic well-being. Because of known measurement problems and because results were to be used in a series of national panel studies, the testing phase was considerably more extensive than is usual for household surveys. The program was jointly sponsored by the Department of Health and Human Services and the Bureau of the Census.

The ISDP Research Panel included a number of split sample and other tests. A single example, a test of two alternative subjective measures of well-being, is described here. This example was chosen because its straightforward field procedures are easily transferable to many survey situations and because the evaluation incorporated several common techniques.

The Problem

Attitudinal measures originally developed and tested by Andrews and Withey had been used in earlier ISDP field tests. The items asked respondents to rate their life as a whole, their personal economic situation and, for those with children, their income in terms of providing for their children. The items were designed to provide an additional means of evaluating the impact of government aid programs and to assess overall economic well-being.

Previously, respondents answered by choosing one of seven labeled categories as shown in the left panel of Figure 1. Results using these seven "delighted-to-terrible" categories showed that reported attitudes have a strong positive skew, with most responses clustering on the "delighted" end of the scale. Empirically, such skewed distributions and the lack of variation hampered many applications of the scale, especially in multivariate analyses.

Figure 1: The “Delighted-Terrible” response categories

Seven-category version	Ten-category version
Delighted	Delighted
	Very Pleased
Pleased	Pleased
Mostly Satisfied	Mostly Satisfied
	Somewhat Satisfied
Mixed (about equally satisfied and dissatisfied)	Mixed (about equally satisfied and dissatisfied)
	Somewhat Dissatisfied
Mostly Dissatisfied	Mostly Dissatisfied
Unhappy	Unhappy
Terrible	Terrible

Design of the Test

Because of these limitations, additional response categories were developed. The result was a 10-category version of the "delighted-terrible" scale, shown in the right panel of Figure 1. This expanded set of response categories was primarily meant to allow respondents more choice among the positive categories. Designers were uncertain, however, whether respondents could make meaningful distinctions among so many categories.

Therefore, it was decided to test the items using a split sample aimed at assessing whether a greater proportion of valid variance (in the sense of meaningful distinctions) was captured in the 10-category scale than in the 7-category one.

Field Implementation

The panel involved a national probability sample of 7,500 households in which all adults were to be personally interviewed.

Sample households were divided into random halves prior to interviewing, and a numerical designation indicated the half to which each household was assigned. Since the questions are attitudinal ones, interviewers were instructed to ask them only of adults interviewed personally. While all respondents were asked the same questions, half of the households received the seven-category response choices and the other half received the ten-category response choices.
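The random-halves assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the ISDP procedure itself: the function name, the designation codes 1 and 2, the sequential household IDs, and the seed are all assumptions made for the example.

```python
import random

def split_households(household_ids, seed=1979):
    """Randomly divide households into two halves before interviewing.

    Returns a dict mapping each household ID to a numeric designation.
    The codes (1 = seven-category card, 2 = ten-category card) and the
    seed are hypothetical; the report does not specify them.
    """
    rng = random.Random(seed)      # fixed seed so the assignment can be reproduced
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    assignment = {hid: 1 for hid in shuffled[:midpoint]}
    assignment.update({hid: 2 for hid in shuffled[midpoint:]})
    return assignment

# Illustrative IDs 1..7500, matching the panel's stated sample size.
assignment = split_households(range(1, 7501))
half_sizes = {code: sum(1 for c in assignment.values() if c == code)
              for code in (1, 2)}
print(half_sizes)  # {1: 3750, 2: 3750}
```

Assigning at the household level, rather than per adult, means everyone in a household sees the same flashcard, which keeps field procedures simple for the interviewer.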

Flashcards listing the "delighted-terrible" response categories were used for the two sets of questions; interviewers were instructed to read the questions exactly as worded, and not to read the answer categories unless respondents were blind or unable to read. If a respondent was unsure which of two or three boxes to choose, interviewers were to probe by asking the respondent to choose "the one that comes closest to the way you feel." Finally, interviewer manuals emphasized the importance of neutrality and accuracy in administering these attitudinal items.

Field Evaluation

Staff researchers and questionnaire designers observed as many interviews as possible. Respondents (and interviewers) appeared to enjoy the opportunity to express their attitudes, and respondents did not appear confused by the longer list. Written observation reports and informal discussions were used to elicit observers' views about the questionnaire and interview interaction.

Evaluation

First, item nonresponse associated with the two scales was examined. It was thought that nonresponse on the experimental 10-point scale might be higher if respondents found it too difficult to discriminate among so many categories. However, results showed that item nonresponse rates were relatively low, ranging from 0.5 to 5 percent, and respondents using the 10-point scale were as likely to respond as those using the 7-point scale.
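The report does not say how the two nonresponse rates were compared. One common approach in a split sample test is a two-proportion z-test, sketched below using only the Python standard library. The function name and the counts in the usage lines are hypothetical, not figures from the study, and the test ignores the survey's weighting and the clustering of adults within households, so it is a rough first check at best.

```python
import math

def two_proportion_z(missing_a, n_a, missing_b, n_b):
    """Two-sided z-test for a difference in item nonresponse rates.

    missing_* = respondents who skipped the item, n_* = respondents asked.
    Uses the pooled proportion under the null hypothesis of equal rates.
    """
    p_a = missing_a / n_a
    p_b = missing_b / n_b
    pooled = (missing_a + missing_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical counts for illustration only; NOT taken from the report.
z, p_value = two_proportion_z(missing_a=150, n_a=5900, missing_b=140, n_b=5600)
print(round(z, 2), round(p_value, 3))
```

A large p-value here would be consistent with the finding that respondents were as likely to answer on the longer scale as on the shorter one.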

Frequency distributions on the two scales for the three questions are presented in the upper panel of Table A; summary statistics, using numbers arbitrarily assigned from 1 to 7 and 1 to 10, are provided in the lower panel. Overall, the data suggest that the 10-point scale resulted in somewhat more dispersion and less positive skew than the 7-point scale. For example, a lower percentage of respondents chose one of the two most positive categories in the 10-point scale, and positive skew was reduced for all three test items (reductions of about 40 percent occurred for the income assessment items).

Table A. Distribution Of Responses to Three Test Questions Using 7- and 10-Point Scales

	Item and number of scale points
	Life in general		Family income overall		Family income for children
Category	10-point scale	7-point scale	10-point scale	7-point scale	10-point scale	7-point scale
	Part 1. Percent Distributions
Total	100	100	100	100	100	100
Delighted	9	11	2	2	2	4
Very pleased	15	--	5	--	7	--
Pleased	21	29	14	16	14	16
Mostly satisfied	23	34	21	33	19	31
Somewhat satisfied	8	--	14	--	12	--
Mixed	11	17	13	23	13	23
Somewhat dissatisfied	4	--	12	--	14	--
Mostly dissatisfied	3	5	7	13	8	12
Unhappy	3	2	5	7	5	8
Terrible	2	3	6	6	6	6
	Part 2. Summary Statistics
Mean	4.0	2.9	5.4	3.7	5.4	3.7
Standard deviation	2.1	1.3	2.3	1.4	2.3	1.5
Percent in highest category	9.1	10.6	2.3	1.9	2.0	3.6
Percent in two highest categories	24.0	39.5	6.8	18.0	8.9	19.9
Percent below “mixed”	12.2	9.8	30.5	26.0	33.8	25.9
Skew	.8	.9	.4	.6	.3	.5
Kurtosis	.3	1.1	-.6	-.2	-.8	-.3
Coefficient of variation	52.1	44.6	41.7	37.8	42.6	39.6
Number of cases	5,753	5,458	5,741	5,467	2,460	2,276
*Note: For the 10-point scale, assigned numerical values ranged from 1 (delighted) to 10 (terrible). For the 7-point scale, values ranged from 1 (delighted) to 7 (terrible). Distributions are based on weighted counts.
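As a rough check on how the Part 2 statistics relate to the Part 1 distributions, the sketch below recomputes the mean, standard deviation, skew, and coefficient of variation from the rounded "Life in general" percentages, scoring categories from 1 (delighted) upward as the note describes. The function name and the use of population (not sample) moments are assumptions. Because the inputs are rounded percentages and not the weighted counts, the results only approximate the published figures.

```python
import math

def summarize(percents):
    """Summary statistics for a scale scored 1 (delighted) through
    len(percents) (terrible), given the percent in each category."""
    total = sum(percents)              # rounded percents may not sum to exactly 100
    weights = [p / total for p in percents]
    scores = range(1, len(percents) + 1)
    mean = sum(w * x for w, x in zip(weights, scores))
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, scores))
    sd = math.sqrt(variance)
    # Skew as the third central moment divided by sd cubed (an assumed definition).
    skew = sum(w * (x - mean) ** 3 for w, x in zip(weights, scores)) / sd ** 3
    return {
        "mean": mean,
        "sd": sd,
        "skew": skew,
        "cv": 100 * sd / mean,                       # coefficient of variation, percent
        "top_two": 100 * (weights[0] + weights[1]),  # share in two most positive categories
    }

# "Life in general" columns from Table A, Part 1.
life_10 = summarize([9, 15, 21, 23, 8, 11, 4, 3, 3, 2])
life_7 = summarize([11, 29, 34, 17, 5, 2, 3])

for name, stats in (("10-point", life_10), ("7-point", life_7)):
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

The recomputed values come out close to the table (a mean near 4.0 and a skew near 0.8 for the 10-point scale, for example). That agreement suggests the published skew is a standard moment-based measure, although the report does not state its formula.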

Variation in respondents' subjective assessments of well-being was then related to their objective characteristics as reported in the survey. Bivariate associations between attitudes—especially individuals' assessments of income—and income showed the expected relatively high correlations. However, the results also showed the 7-point scale to be as strongly associated with income as the 10-point scale, suggesting that the larger variance yielded by the 10-point scale might not be meaningful.

To explore that question further, a simple multivariate model was used, regressing the "income adequacy for children" attitude item on income and controlling for family size. Under selected specifications of measured income, consistently more variance was explained in the regressions using the 10-point dependent variable than in those using the 7-point measure, although in two regressions, estimated with an income variable believed to be "weak," differences of only 8 percent were found.

For the most part, the regression models showed encouraging relative differences in explained variance using the 10-point versus the 7-point scale. To date, however, statistical evaluation has not provided an unequivocal answer to the issue of construct validity. Work in this area is continuing, and more conclusive results may lead to a clearer recommendation about the use of these items in future questionnaires.