Gary M. Mullet is director of statistical services with Sophisticated Data Research in Atlanta. He holds a Ph.D. from the University of Michigan and has taught there and at Georgia Tech, the University of Cincinnati and Berry College. Previous non-academic experience includes Burke Marketing Research. He has published several papers in both the statistics and marketing research literature and is a frequent speaker at meetings of professional societies. An applied statistician, Mullet seeks to de-mystify statistical techniques through research seminars, writing and lecturing, as well as in his client consultation.
If this were the type of publication I steal looks at while waiting in the check-out lane of my neighborhood supermarket, you'd possibly see correspondence analysis described as the "up and coming star" of marketing research data analysis or a technique that researchers "cannot live without" or even "Researcher confesses to loving new analytic technique."
Since we are not going to purvey purple prose, we'll take a somewhat more professional approach in the following paragraphs. First, we'll present a brief history of the technique, then take a look at why correspondence analysis is so valuable to today's marketing researchers and finally, as propriety permits, list some of the product categories in which correspondence analysis has actually been used. Also, to save the typesetter's sanity, we'll use the abbreviation CA in what follows.
Type of data
As you are well aware, in marketing research much of the data collected are of the "yes-no" variety (Aware of brand? Ever used brand? Employed outside the home?). Another variety of the same type of data is the "one from many," such as brand used most recently, product used most frequently and so on. Both of these types of questions yield nominal scale data.
Further, we tend to collect lots of rank order data, especially in concept/product testing work, and (ordered) categorical data: income, age, expenditure and the like. These latter types are referred to as ordinal scaled data. Complicating the analyst's job is the fact that, for whatever reasons, we generally have a handful or more of respondents who give us "don't know" or "no answer" or "none of the above" or whatever in response to our carefully worded inquiry.
CA history
While CA seems to have been known since the 1930s, current awareness, interest and usage got a big boost in the 1960s when J.P. Benzecri and his colleagues started developing and refining the mathematical and computer algorithms which now allow researchers to do more than merely crosstabulate the above data types. Working with the Benzecri results we can now use nominal and ordinal data as inputs to generate perceptual maps and/or respondent clusters or segments.
The researcher now has a tool to reduce the large data sets typically generated in a marketing research study without having to make the simplifying and potentially dangerous assumption that the data are metric. Within the past four or five years in particular, various theoretical and applied works about CA have appeared in print. (Some will be found referenced as "dual-scaling," a sometime U.S. appellation for CA.) These, in turn, have generated the current groundswell of interest among today's marketing researchers, although some U.S. research companies have been doing CA since the mid-to-late 1970s.
Easy interpretation
In addition to allowing researchers to better analyze nominal and ordinal scaled data, the output of CA seems easy to interpret to most who have tried it. The perceptual maps which are generated have no vectors; each item and each scale is represented as a single point.
If you've written or read a report which included a discussion of a typical point-vector map, say where several brands were rated on several attributes, you recognize the clutter which occurs when either of the two lists gets fairly large. Not only is this clutter reduced in CA, but you also have far less difficulty interpreting the relative positioning of the brands and attributes (although you do need to avoid going overboard in this part of the analysis, since it appears to be almost too easy).
Respondents, too, feel that it's easier to tell an interviewer whether or not a product is, say, sweet, as opposed to rating the sweetness on a 10-point, or whatever, scale. Since respondents are merely answering yes or no to the attribute or adjective list for each brand (all that describe the brand), you may find some references to CA listed as "pick any" analysis.
The reading and interpreting of the clusters or segments, if you also do that type of analysis with CA, seems easier as well. Since all of the data are reduced to "yes-no" (i.e., each respondent is either in the "less than $15,000 per year" category or not, each is in the "$15,000 but less than $30,000" category or not, and so on), the clusters show up with a narrative description of how many and what percent of the respondents in each cluster fall within each of the answer categories.
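For readers who like to see the mechanics, the "yes-no" reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income bands, the respondent identifiers and the function name indicator_row are all hypothetical, and no particular CA package is implied.

```python
# Hypothetical answer categories, for illustration only.
INCOME_BANDS = [
    "less than $15,000",
    "$15,000 but less than $30,000",
    "$30,000 or more",
    "no answer",
]


def indicator_row(answer, categories):
    """Return one 0/1 flag per category; exactly one flag is 1."""
    if answer not in categories:
        raise ValueError(f"unknown answer: {answer!r}")
    return [1 if answer == category else 0 for category in categories]


# Hypothetical respondents, keyed by a four-character identifier.
respondents = {
    "R001": "less than $15,000",
    "R002": "$30,000 or more",
    "R003": "no answer",
}

# Each respondent becomes a row of yes-no flags, one flag per answer category.
indicator_matrix = {
    rid: indicator_row(answer, INCOME_BANDS)
    for rid, answer in respondents.items()
}

for rid, row in indicator_matrix.items():
    print(rid, row)
```

Note that "no answer" is simply one more category here, so respondents who skip a question need not be dropped or given an invented score.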
There are also attendant chi-square statistics and significance levels for each answer category shown, which compare those within a given cluster with those in the total sample. Further, any true metric data can be carried along and used, too. Such data are not used to form the clusters, but the means of these are compared with corresponding means in the total sample, cluster-by-cluster. Again, significance levels are supplied.
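One way such a comparison might be computed for a single answer category is sketched below. The exact statistic a given CA program reports is not specified here, so this is an assumption: an ordinary Pearson chi-square on a 2x2 table (cluster members versus everyone else, in the category versus not), with no continuity correction. The counts and the function name chi_square_2x2 are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    a, b: cluster members in / not in the answer category
    c, d: all other respondents in / not in the answer category
    Returns the statistic and its significance level (1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 40 of 100 cluster members fall in the category,
# against 60 of the remaining 300 respondents.
chi2, p_value = chi_square_2x2(40, 60, 60, 240)
print(f"chi-square = {chi2:.2f}, significance level = {p_value:.5f}")
```

A small significance level suggests that the cluster's share in that answer category differs from the rest of the sample by more than chance alone would explain.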
Other refinements
Depending on whether you are doing a mapping project or working on a cluster analysis, there are some other nice refinements in CA computer output. You can, for example, look at eigenvalues and variance explained (as with factor analysis). For a map, you are given the coordinates in up to six dimensions and variance allocated by axis and for each "variable." You can use supplementary variables to overlay other information on maps (such as age category). If desired, you can get projections of all points on each axis; these help interpret the dimensions of a given map.
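For the curious, the eigenvalues, variance explained and map coordinates mentioned above can be sketched with the standard textbook formulation of simple CA, which applies a singular value decomposition to the standardized residuals of a two-way table. This is a minimal sketch, not the output of any particular program: it assumes the NumPy library is available, and the brands, attributes and counts are hypothetical.

```python
import numpy as np

# Hypothetical "pick any" counts: rows are brands, columns are attributes.
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
attributes = ["sweet", "crunchy", "healthy"]
counts = np.array([
    [60, 20, 10],
    [25, 55, 15],
    [10, 30, 50],
    [30, 30, 30],
], dtype=float)

n = counts.sum()
P = counts / n          # correspondence matrix (proportions of the grand total)
r = P.sum(axis=1)       # row masses
c = P.sum(axis=0)       # column masses

# Standardized residuals from what independence of rows and columns would predict.
expected = np.outer(r, c)
S = (P - expected) / np.sqrt(expected)

U, sing, Vt = np.linalg.svd(S, full_matrices=False)

eigenvalues = sing ** 2                  # one per axis
total_inertia = eigenvalues.sum()        # Pearson chi-square divided by n
explained = eigenvalues / total_inertia  # share of variance per axis

# Principal coordinates: every brand and every attribute is a single point.
row_coords = (U * sing) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]

for axis in range(2):
    print(f"axis {axis + 1}: eigenvalue {eigenvalues[axis]:.4f}, "
          f"explained {explained[axis]:.1%}")
for name, xy in zip(brands + attributes, np.vstack([row_coords, col_coords])[:, :2]):
    print(f"{name:10s} {xy[0]: .3f} {xy[1]: .3f}")
```

With three attributes the table supports at most two meaningful axes, so the third eigenvalue is essentially zero; the first two columns of the coordinates are what would be plotted as the map.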
Respondent numbers consisting of any four alpha-numeric characters can be printed as cluster group member identifiers; obviously, this is a tremendous help for further data tabulation and analysis. Axes can be tested for statistical significance in explaining variance for a given data set. The current research and interest in CA have given researchers these and many other analytical enhancements and opportunities.
Some of the many product categories to which your peers have applied one or the other of CA's major features include:
- Consumer durables
- News media
- Financial services
- Health services
- OTC health products
- Retail outlets
- Names of a new product/line extension
- Food products
- Clothing
- Travel/tourism
Since this is a non-technical introduction to a fairly technical subject, we won't show you the inevitable "squigglies" that appear in some of the more technical journals. However, if you are so inclined, drop a note to the managing editor of this publication or to me and a bibliography will be sent to you. Most of the sources have enough complexity to satisfy anyone, in addition to showing some very nice detailed examples with actual computer output. I'll be happy to send you a sample of either or both types, too, at your request.
Not a panacea
As with any relatively new and fascinating technique, we should avoid the temptation of fitting a given study into CA where it's unwarranted. CA is not a panacea; used wisely, however, it lets us do a much better job of analyzing and interpreting particular types of data from marketing research studies than we could a few years ago. Hence, our decisions are better and that's really what this business is all about.