Address objectives without objections
Editor’s note: Neil Kalt is president of Neil Kalt and Associates Inc., a Middlebury, Conn., research firm.
Every questionnaire has two overriding goals. The first is to keep respondents on-task - to hold their attention as they move through the questionnaire, keep them focused and get them to answer each question honestly. The second is to generate data that fully addresses the study’s objectives.
Achieving the second goal is the key. The sole purpose of achieving the first goal is to make achieving the second possible. How do you achieve both goals? By forging a questionnaire that is well thought-out; that is clearly, logically and succinctly written; that is constructed with the study’s objectives in mind; that is always user-friendly; and that, if the stars align, has moments of ingenuity and imagination. The considerations that follow are intended to provide you with the tools and insights to construct questionnaires that satisfy these criteria.
1. A study’s objectives are its first and most important consideration. They drive the research design, the construction of the questionnaire and the analysis and interpretation of the data. Which is why this bears repeating: when constructing a questionnaire, the key question is always: Will this questionnaire generate data that fully addresses the objectives of this study? Keep asking yourself this question as you construct the questionnaire.
2. Each question should be clearly written, in plain English, free of jargon and without ambiguity. Any question that falls short of this requirement may lessen the validity of the questionnaire. Unfortunately, writing clearly is easier said than done. A key reason is the gap that usually exists between the clarity with which we think we write and the clarity with which we actually write.
There are several things we can do to narrow, and possibly close, this gap:
• Be aware that it exists. It’ll make you think more critically as you write and it’ll get you to review what you’ve written with a more discerning eye.
• Ask at least one person whose judgment about the written word you trust to look at your questionnaire and, as warranted, suggest changes.
• Personally administer your questionnaire to a few people. If you ask, they’ll tell you whether the questions are easy to understand and whether they’re saying what you want them to say.
3. Try to write questions the way you speak, in a conversational style. It should make it easier for respondents to understand the questions and answer them, and it may help to keep them interested. One way to get an idea of how close you’ve come is to read the questionnaire aloud and listen carefully to how it sounds - you want it to sound as if you’re conversing with someone rather than reading aloud to them. Another way is to try it out on a few people and ask how easily it reads.
4. Every questionnaire should have a logical flow to it, should make intuitive sense as the respondent moves from one question to the next. One technique is to order the questions in a way that’s consistent with how most people would approach the subject at hand. For example, if you’re asking about a particular product, you might begin with questions about awareness, then move to questions about expectations, then to a purchase decision, then to reactions to the product, and finally to the likelihood of purchasing the product again. When you ask people how easily the questionnaire reads, ask them about its flow as well.
5. The cost of wearing out your welcome is almost always high. You’re asking people to give you one of their more precious possessions - their time. If they feel that you’re asking for too much, they may either stop in midstream and walk away or begin to answer quickly and with little or no thought, which is an ugly compromise between feeling obligated to complete the questionnaire and not wanting to give it any more time and effort.
Accordingly, the time it takes to complete a questionnaire should always be reasonable. A key determinant is the respondents’ level of involvement in the category. For example, you can probably get away with a longer questionnaire when you’re interviewing people who ride motorcycles and asking questions about Harley-Davidsons than you can when you’re asking people about toothpaste.
Another key determinant is how easy, or difficult, it is to get through the questionnaire. If there are no bumps in the road, no thorny patches, nothing to annoy or frustrate respondents, then a 15-minute questionnaire should be just fine. However, if there are questions that are less than clear, questions that involve rating and ranking an overly long list of attributes, repetitive questions and questions that don’t make sense, then 15 minutes is going to seem like forever and respondents will react accordingly.
Once you have a questionnaire that fully addresses the objectives of the study, resist the temptation, and sometimes the pressure, to make it any longer - at least until you’ve pilot-tested both the original and the lengthened version. While it would be nice to get answers to additional questions that you and/or your client would like to ask, and while you may be inclined to feel that asking just a few more questions won’t hurt, once respondents begin to feel that “this is taking up too much of my time,” the quality of the information you collect will decline, sometimes precipitously. So if you can, take some people through the version of the questionnaire that doesn’t include the additional questions, take some others through the version that does, and ask how they feel about its length.
6. Construct scales that respondents can easily understand and use. To help you do that, consider the following:
• Use descriptors that are a good fit with the subject matter. For example, if you want to ask respondents how they feel about a magazine, a scale that uses degrees of liking is a better fit than a scale that uses degrees of satisfaction. That is, people are more likely to use “like” and “don’t like” than “satisfied” and “dissatisfied” to describe how they feel about a magazine.
• Verbal descriptors, if they’re sufficiently focused and clearly stated, are often - but not always - preferable to numeric or symbolic descriptors. For example, let’s look at a five-point scale that’s composed entirely of verbal descriptors, a five-point numeric scale with two verbal descriptors anchoring the ends of the scale, and a five-point symbolic scale that’s anchored by two verbal descriptors, as shown in Figure 1.
Scales consisting entirely of verbal descriptors - if sufficiently focused and clearly stated - are usually preferred because they leave no doubt about the meaning of each of the points on the scale. The same cannot be said about numeric and symbolic scales. Indeed, rather than assign meaning to each scale’s interior points (2, 3 and 4 in the numeric scale), respondents tend to think about these points in terms of “more” and “less.”
Still, verbal descriptors are not always the best choice. There are times when less direction is better than more, when you want respondents to decide where on a scale they belong without giving them a road map - in short, when you want to use a minimally defined scale.
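To make the distinction concrete, here’s a rough sketch of the three formats in Python; the descriptors are hypothetical stand-ins, not necessarily the ones shown in Figure 1:

```python
# Three five-point scale formats. The labels below are hypothetical
# stand-ins, not the descriptors shown in Figure 1.

# Fully verbal: every point carries its own stated meaning.
verbal_scale = [
    "Like it very much",
    "Like it somewhat",
    "Neither like nor dislike it",
    "Dislike it somewhat",
    "Dislike it very much",
]

# Numeric: only the endpoints are anchored; respondents tend to read
# the interior points (2, 3 and 4) simply as "more" or "less".
numeric_scale = {1: "Dislike it very much", 2: None, 3: None, 4: None,
                 5: "Like it very much"}

# Symbolic: for example, faces running from a frown to a smile, again
# anchored by verbal descriptors only at the two ends.
symbolic_scale = {":-((": "Dislike it very much", ":-(": None, ":-|": None,
                  ":-)": None, ":-))": "Like it very much"}
```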
The number of points that you build into a scale should be the number of points you need to produce a reliable measure - and no more. If you have good reasons for believing that it takes three points, three points is all you want to use. If you’re right, adding more points will only tend to lessen the reliability, and hence the validity, of the scale.
7. A scale does not have to be symmetric, nor does it have to have an equal number of favorable and unfavorable points. Typically, discrimination in the favorable part of the scale is far more useful than discrimination in the unfavorable part. For example, the difference between feeling that a product is “very good” and feeling that it’s “good” can be the difference between keeping a customer and losing him. In contrast, the difference between feeling that a product is “fair” and feeling that it’s “poor” is largely immaterial. While these descriptors and the reasons respondents give for feeling this way can shed light on the product’s shortcomings and, as a result, on some of the changes that should be made, neither “fair” nor “poor” can be taken as a vote of confidence. Moreover, we should provide an array of choices that permits each respondent who is interested in the product to find one that closely mirrors his/her feelings. Accordingly, enhanced discrimination in the favorable part of the scale is a goal worth pursuing.

For example, the scale in Figure 2 includes nine numeric descriptors, six of which are paired with verbal descriptors. Together, they give respondents seven increasingly favorable choices, ranging from 3 (“good”) to 9 (“the very best”). Using more numbers than there are verbal descriptors stretches the scale, giving respondents more options. That not all nine numeric choices are paired with verbal descriptors isn’t a problem as long as there are enough verbal descriptors. And there are. The six verbal descriptors provide more than enough context to get a pretty good sense of the meaning of the three numeric choices that are unpaired.
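Here’s a rough Python sketch of that kind of stretched scale. Only “good” at 3 and “the very best” at 9 are taken from the description above; the other labels, and which points go unpaired, are hypothetical placeholders:

```python
# A nine-point scale stretched toward the favorable end: nine numeric
# choices, six paired with verbal descriptors, three left unpaired.
# Points 3 and 9 are labeled as in the text; the other four labels, and
# the choice of which points go unpaired, are hypothetical placeholders.
stretched_scale = {
    1: "Poor",           # hypothetical
    2: "Fair",           # hypothetical
    3: "Good",           # from the text
    4: None,             # unpaired: read in the context of its neighbors
    5: "Very good",      # hypothetical
    6: None,             # unpaired
    7: "Excellent",      # hypothetical
    8: None,             # unpaired
    9: "The very best",  # from the text
}
# Seven of the nine choices (3 through 9) are favorable, which is where
# the discrimination matters most.
```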
8. Many respondents may choose a verbal descriptor based on its location in a scale rather than the meaning that the descriptor conveys. To illustrate, in separate readership studies of the same issue of a magazine, the scale used in the first study and the scale used in the second study generated the following distributions in response to a question about satisfaction with this issue of the magazine:
Scale one: the first study
Very satisfied 53%
Somewhat satisfied 41%
Somewhat dissatisfied 4%
Very dissatisfied 1%
Scale two: the second study
Extremely satisfied 18%
Very satisfied 58%
Somewhat satisfied 21%
Not too satisfied 2%
Not at all satisfied 1%
Given these distributions, what we really have is a two-point scale in the first study - “somewhat satisfied” and “very satisfied” - and a three-point scale in the second - “somewhat satisfied,” “very satisfied” and “extremely satisfied.” If the meaning of a descriptor, rather than its position in the scale, determines the choices respondents make, the percentage that said “very satisfied” in scale one would be about the same as the sum of the percentages that said “very satisfied” and “extremely satisfied” in scale two - since “very satisfied” in scale one encompasses both “very satisfied” and “extremely satisfied” in scale two. One look at the numbers - 53 percent in response to scale one and 76 percent in response to scale two - tells us that this wasn’t what happened.
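The arithmetic behind that conclusion is worth making explicit; a few lines of Python show the check:

```python
# If meaning, not position, drove respondents' choices, "very satisfied"
# in scale one should absorb both of scale two's top categories.
very_scale_one = 53            # % "very satisfied", first study
top_two_scale_two = 18 + 58    # % "extremely" + "very satisfied", second study

print(top_two_scale_two)                   # 76
print(top_two_scale_two - very_scale_one)  # 23-point gap: position mattered
```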
What appears to have happened is this: scale one effectively gave respondents two choices, scale two gave them three. Making use of the choices they were given, about 40 percent of the respondents in the second study “moved up” to a more favorable descriptor. Since the same specifications were used to select both samples and since sample size wasn’t an issue, the cause lies elsewhere. Let’s look at probable reasons why so many respondents moved up when given the chance:
• The addition of “extremely satisfied” to scale two enabled 18 percent of the sample to select a descriptor that came closer to their feelings than “very satisfied.” So they chose it.
• “Somewhat satisfied” is part of scale one and scale two - exactly the same words are used in both scales. So why did the percentage of people who said they were somewhat satisfied fall from 41 percent in the first study to 21 percent in the second? My guess is that the 20 percent who moved from somewhat satisfied to very satisfied were what I’ll call top-box-averse: that is, they’re cautious, careful about making up their minds and don’t like to go out on a limb. Unfortunately, they felt that selecting the most favorable descriptor in the scale was going out on a limb. So they opted for the lower-profile choice - the second descriptor. In the first scale, that choice is “somewhat satisfied.” In the second scale, it’s “very satisfied.” In all probability, these people were very satisfied with this issue of the magazine in both studies. However, the first study forced them to choose between the descriptor that best captured their feelings and the psychological comfort of the second position. They chose the latter.
Regardless of the validity of this explanation, the findings indicate that we’re not going to get the most valid data we can unless our scales include choices that allow the full range of respondents’ feelings to be expressed. At the same time, this analysis suggests that there are respondent idiosyncrasies that impose limits on the validity of the data, idiosyncrasies whose impact can be lessened but not eliminated.
9. Memory problems. Very few people can remember, with anything approaching accuracy, the number of times they engaged in a particular behavior in the last several months or more, unless it’s something they do on a regular basis - like once a day or once a week. Unless it is, don’t bother to ask. What you’ll get is mostly guesswork.
10. If you’re going to use ratings or rankings, use both. They give you largely different pieces of information, pieces that complement one another, pieces that are both instructive.
Ratings shed light on how important each attribute is to a respondent. However, they may tell you little, and sometimes nothing, about the relative importance of these attributes. Consider, for example, the respondent who rates a number of the attributes of a given product as “very important.” Are all these attributes equally important to the respondent? Are some more important than others? Is there one that’s more important than the rest? If we don’t get these attributes ranked, we don’t know.
Rankings tell you about the relative importance of a list of attributes. What they don’t tell you is how important each of these attributes is to a respondent. To a given respondent, they may all be important, or they may all be unimportant.
Ask respondents to rate each attribute. Next, ask them to rank the attributes that they rated as sufficiently important - for example, all the attributes that were rated as at least “fairly important.” If an attribute isn’t important to a respondent, there’s no reason to have the respondent rank it.
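Here’s a minimal sketch of that two-stage flow in Python, using hypothetical attributes and ratings and the “fairly important” cutoff from the example above:

```python
# Hypothetical attributes and ratings from one respondent; a higher index
# in RATING_ORDER means more important.
RATING_ORDER = ["not at all important", "not too important",
                "fairly important", "very important"]

ratings = {
    "price":      "very important",
    "durability": "very important",
    "warranty":   "fairly important",
    "color":      "not too important",
}

# Stage 1 (rating): keep only attributes rated at least "fairly important".
cutoff = RATING_ORDER.index("fairly important")
to_rank = [a for a, r in ratings.items() if RATING_ORDER.index(r) >= cutoff]

# Stage 2 (ranking): present only those attributes for ranking. The ranking
# breaks the tie among the attributes rated "very important", which the
# ratings alone cannot do.
print(to_rank)  # ['price', 'durability', 'warranty']
```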
11. Given a choice, try to keep your questions closed-ended. Though open-ended questions can provide very useful, insightful information, the data that they generate can be seriously flawed. For example, if interviewers are used, you don’t know how often answers were recorded in their entirety, how often they were paraphrased and how often an interviewer jotted down just a small portion of an answer. If the questionnaire is self-administered, substantial variation in the ability and willingness of respondents to record their thoughts and feelings in any detail is almost always going to compromise the quality and usefulness of their responses.
However, there is a way to generate all the information that open-ended questions provide using a questionnaire that’s composed solely of closed-ended questions and, in doing so, to substantially improve the quality of this information by eliminating the effects of both interviewer and respondent variation. If you’re running a big study - one that involves a large sample and a sizable number of must-have open-ended questions - and you have the luxury of time, you can make it a lot easier for respondents to work their way through the questionnaire, appreciably shorten the time it takes to complete, substantially increase the quality of the yield of the open-ended questions and eliminate a chunk of the time and expense of interviewing (if you’re using interviewers) and coding. Here’s how:
• Conduct a pilot test with at least 200 respondents. Use the responses to each open-ended question to build well-thought-out and clearly stated codes. That is, create groups of responses, such that the responses that comprise each group all have to do with the same issue and are headed by an umbrella statement that gives voice to this issue.
• Order the groups, and the responses within each group, based on the frequency with which they were mentioned.
• Finally, use these codes to convert each open-ended question to a closed-ended question by replacing the blank lines or empty text box with the groups of responses you’ve built.
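Here’s a minimal Python sketch of that coding-and-ordering step, assuming the pilot responses have already been hand-assigned to issue groups; the groups and responses shown are hypothetical:

```python
from collections import Counter

# Hypothetical pilot data: each open-ended response has been hand-coded
# to an (issue group, specific response) pair.
coded_pilot = [
    ("cost", "too expensive"), ("cost", "too expensive"),
    ("cost", "no generic available"),
    ("side effects", "fatigue"), ("side effects", "fatigue"),
    ("side effects", "nausea"),
    ("cost", "too expensive"),
]

# Order the groups, and the responses within each group, by frequency
# of mention, then lay them out as the closed-ended question's choices.
group_counts = Counter(group for group, _ in coded_pilot)
for group, _ in group_counts.most_common():
    responses = Counter(r for g, r in coded_pilot if g == group)
    print(group.upper())               # umbrella statement heads the group
    for response, n in responses.most_common():
        print(f"  [ ] {response}")     # becomes a checkbox in the final version
```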
What you’ll have is a powerhouse questionnaire that yields incomparably rich data - data that will take you to new levels of understanding and insight. One key reason why: converting open-ended questions into closed-ended questions turns unaided recall into aided recall, which will almost always make a considerable difference when the converted questions are multi-layered and richly detailed. Why? Because they organize respondents’ thoughts and feelings, which makes it easier to identify the responses that are connected to their lives. They also create awareness of the full range of responses, at least some of which respondents would have forgotten to mention had the question been open-ended.
To give you an idea of what a converted question looks like, I’ve borrowed one from a self-administered questionnaire (Figure 3). Respondents were caregivers to people - almost always family members - who had a progressive, often debilitating, disease. Although the question is now closed-ended, it still provides opportunities to record a response, should that be necessary.
Can be expanded
I have no doubt that this list of questionnaire construction guidelines can be expanded - there is always more to say. At the same time, what is here covers a lot of ground, and in a way that I hope you’ll find useful. It’s hard to deftly balance the research project’s aims while respecting the respondent’s time, but it can be done. By applying the above considerations, you’ll go a long way toward making the questionnaires you construct even more user-friendly, insightful and effective.