Data processing made easy

Editor's note: Eric DeRosia is assistant phone center manager with Western Wats Center, Provo, Utah.

With the availability of powerful personal computers and simple-to-use software, many researchers who used to rely on outside companies for data processing are now considering hiring clerks for data entry and doing the number crunching themselves. If you are thinking about doing your own data processing, or if you simply want to avoid problems when others work on your projects, it may be helpful to listen to the advice of someone who has had years of experience solving marketing research data processing problems. According to Steve Woodall, coordinator of data services for Western Wats Center in Provo, Utah, many common data processing problems can be avoided by taking a few preventative measures during the survey design process. At first glance these suggestions may seem simplistic, but their implementation will solve many real-life data processing problems. (Note: The following examples represent telephone surveys. However, the same principles apply to surveys conducted in malls, through the mail, or by any other data collection technique.)

Use numeric codes to denote options. When writing a questionnaire, many researchers designate respondent options with letters to be circled or empty boxes to be checked. Since almost all forms of statistical analysis require that the data be represented by numbers, this can cause accuracy problems during data entry. The mental gymnastics required for data entry clerks to convert letters or boxes to numbers can lead to many mistakes. The simplest solution to this problem is to denote each respondent option with a numeric code. (See example 1.)

Example 1

Do you feel that things in the country are going in the right direction, or have things pretty seriously gotten off on the wrong track?

Right direction 1
Wrong track 2

(DO NOT READ)
DK/REF 3

Use a constant number of digits in codes. Most statistical software identifies data from its column in the matrix of data, that is, its place in that survey's assigned row of data. To avoid data entry mistakes that cause the data to be "misplaced" in its row, leading zeros should be used to give every option code for a question the same number of digits. (See example 2.)

Example 2

First option 01
Second option 02
. . .
Ninth option 09
Tenth option 10
Eleventh option 11

These extra digits will serve as a reminder to data entry clerks that this question requires the entry of two digits. Without a reminder, clerks may enter only a single digit for options one through nine, causing subsequent data to be "misplaced" and analyzed in the wrong column as part of the wrong question. Although the clerk may notice the error, there is no reason to take chances. Simply adding leading zeros will prevent such errors before they happen.
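
The column shift described above can be illustrated with a short Python sketch. The three-question layout, the names LAYOUT, parse_record and format_code, and the sample rows are hypothetical and chosen only for illustration; they do not describe any particular statistical package.

```python
# Hypothetical fixed-column layout (an assumption for illustration):
# Q1 occupies column 1, Q2 occupies columns 2-3, Q3 occupies column 4.
LAYOUT = {"Q1": (0, 1), "Q2": (1, 3), "Q3": (3, 4)}
ROW_WIDTH = 4


def format_code(option, width):
    """Pad an option code with leading zeros to a fixed width."""
    return str(option).zfill(width)


def parse_record(row):
    """Read each answer from its assigned columns in the row."""
    return {q: row[start:end] for q, (start, end) in LAYOUT.items()}


# Clerk enters the ninth option for Q2 with a leading zero.
good_row = "1" + format_code(9, 2) + "3"
# Clerk enters the ninth option as a single digit.
bad_row = "1" + "9" + "3"

print(parse_record(good_row))  # {'Q1': '1', 'Q2': '09', 'Q3': '3'}
print(parse_record(bad_row))   # {'Q1': '1', 'Q2': '93', 'Q3': ''}

# A simple length check flags the short row before it reaches analysis.
print(len(bad_row) == ROW_WIDTH)  # False
```

In the second row, the answer to Q3 has been read as part of Q2, which is exactly the kind of misplacement that printing the codes with leading zeros is meant to prevent.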

At the same time, leading zeros should only be used when necessary. If they are added when there are nine or fewer options, they will create unnecessary work, making the data entry process less efficient.

Align codes on the right. If clerks must search all over each page for the respondent's answers, the data entry process will be unnecessarily difficult. If the difficulty in finding the data on the page causes a clerk to accidentally skip a response during data entry, all the subsequent questions will be entered into the wrong columns, as explained above. Aligning the codes on one side of the page solves this problem.

Because surveys are usually stapled in the upper-left corner, aligning the answer codes on the right side of the page, rather than on the left, will make both interviewing and data entry faster and more convenient.

Options for don't know/refused. Almost all questions will have at least one respondent who says "I don't know" or refuses to answer. Without a consistent way of handling these responses, interviewers will record the answer in a myriad of ways, making data entry difficult. Unless these two answers must be analyzed separately, a single code can simply be added, as in example 1. By adding this code to each question, don't know/refused responses can be handled with consistency.
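
One benefit of a single, consistent code shows up at the analysis stage, as the following Python sketch suggests. It uses the codes from example 1; the response list is made-up data, and the function names tabulate and substantive_share are hypothetical.

```python
from collections import Counter

DK_REF = 3  # the single don't know/refused code from example 1
LABELS = {1: "Right direction", 2: "Wrong track", DK_REF: "DK/REF"}


def tabulate(responses):
    """Count how many respondents chose each coded option."""
    counts = Counter(responses)
    return {label: counts.get(code, 0) for code, label in LABELS.items()}


def substantive_share(responses, code):
    """Share of one option among respondents who gave a real answer."""
    answered = [r for r in responses if r != DK_REF]
    if not answered:
        return 0.0
    return answered.count(code) / len(answered)


# Made-up responses, for illustration only.
responses = [1, 2, 3, 1, 1, 2, 3, 2]

print(tabulate(responses))
# {'Right direction': 3, 'Wrong track': 3, 'DK/REF': 2}
print(substantive_share(responses, 1))  # 0.5
```

Because every don't know/refused response carries the same code, it can be counted or set aside in one step rather than reconciled by hand.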

Define skip pattern logic clearly. Confusion over skip pattern instructions during data collection will cause problems during data processing. Consider example 3.

Example 3

Do you feel Blake Ream has done his job as congressman well enough to deserve re-election, or do you feel it is time to give someone else the chance to do a better job?

Re-elect 1 (Skip to Q.11)
New person 2 (Skip to Q.12)

(DO NOT READ)
DK/REF 3

Since there is no skip instruction for the third option, the interviewers will be unsure of what to do when a respondent answers "I don't know." Some interviewers will respond by asking question 11, some will ask question 12, some will ask neither, and a few will ask both. Such inconsistencies will cause confusion during data processing. By simply adding an instruction to the DK/REF option, interviewing and data processing confusion will be avoided in situations such as this.
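
A questionnaire's skip instructions can also be checked mechanically before fieldwork begins. The Python sketch below is one possible approach, not a description of any existing tool. The question label "Q10" is an assumption (the article does not number the question in example 3), and SKIP_RULES and missing_skip_instructions are hypothetical names.

```python
# Skip table for the question in example 3, labeled "Q10" here as an
# assumption. None means no instruction was written for that option.
SKIP_RULES = {
    "Q10": {
        1: "Q11",   # Re-elect
        2: "Q12",   # New person
        3: None,    # DK/REF: no skip instruction, as in example 3
    },
}


def missing_skip_instructions(rules):
    """Return (question, option code) pairs that lack a skip instruction."""
    gaps = []
    for question, options in rules.items():
        for code, target in options.items():
            if target is None:
                gaps.append((question, code))
    return gaps


print(missing_skip_instructions(SKIP_RULES))  # [('Q10', 3)]
```

Any pair the check reports is an option for which interviewers would have to guess what to ask next.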

Provide generous room for open-ended answers. When detailed open-ended answers are desired but not enough page space has been allotted, interviewers will often "cram" the answer into the space given, making the answers difficult to read. Simply expanding the allotted space makes answers easier to read and, therefore, makes open-end processing more accurate.

These few steps, if taken during the survey design process, will make your data processing easier. Numeric codes should be used to identify respondent options. The number of digits in the codes should be minimized and kept constant. The codes should be aligned on the right side of the page. In addition, skip pattern instructions should be clearly explained. Lastly, generous page space should be provided to interviewers to write open-ended responses. Applying these simple suggestions will increase data processing accuracy, improve efficiency, and prevent headaches.