Editor's note: Steven Struhl is vice president, senior methodologist at Total Research, Chicago.
Roughly 20 years ago, conjoint analysis was hailed as an innovative new way to determine consumers' true, product and service preferences. Many companies quickly adopted it as the method for discovering the worth of a product's features. Many years' experience and a wealth of experimental evidence showed that earlier research methods simply did not produce accurate predictions of consumer response. While conjoint analysis solved many of the problems that arose with earlier methods, it also began showing some shortcomings. Discrete choice modeling was developed as an analytical technique to resolve these problems.
Nearly all methods used before conjoint came along involved asking consumers direct questions about the features they wanted in a product, often using importance-rating scales. Direct rating methods had several serious drawbacks. If given no constraints, consumers will tend to rate nearly everything as important. After all, it costs nothing to give a proposed feature a "highly important" rating when participating in a survey. In a product-feature study using a list of 20 product attributes, for instance, 15 or 17 features (if not all 20) could emerge as "crucial" for the product.
Another problem with rating scales lies in the way people tend to give what they perceive to be socially desirable answers. But what people think they should do and what they actually do often differ.
Finally, even sophisticated shoppers often have trouble stating what motivates them to choose one product over another. The choice process is something like the process of riding a bicycle in that it is largely based on types of judgment that most people find difficult to precisely describe.
Direct-question research typically resulted in over-priced products loaded with features that had little appeal to real-world consumers, and stories of failed products and services are legion.
Enter conjoint analysis
Conjoint analysis made product and service testing more realistic. Its most common form, full-profile conjoint analysis, presents the study participant with a series of product descriptions (or other representations). The participant is asked to look at the descriptions and make a series of choices similar to those they would make in the real world. The results of the study allow the value of various product features to be derived mathematically. Figure 1 shows the typical conjoint profile, or card.
Conjoint analysis has several basic analytical requirements or ground rules. First, it requires that the products or services tested be treated as sets of distinct attributes (or features). It also requires a limited set of variations (or levels) for each feature. In a test of hotel features, for example, one feature might be the type of hotel lobby. Suppose this attribute had three levels: plain and simple, small and opulent, and large and ostentatious. Four to ten other hotel attributes might be varied in similar ways to develop the product profiles.
In most cases, full-profile conjoint analysis uses a special type of experimental design that selects a specific subset of the many possible combinations of attributes that could be tested. For instance, suppose that in our hotel example, we decided to test seven attributes, three of which had three levels and four of which had two levels. This would lead to 432 possible combinations of attribute levels (3 x 3 x 3 x 2 x 2 x 2 x 2).
Full-profile conjoint analysis could estimate the worth of all these possible combinations using just 16 product profiles. This is good, because the average respondent could not be counted upon to rate 432 different product profiles without heavy use of certain illegal stimulants. Use of experimental design methods clearly can extract a great deal more information than you could get otherwise.
Conjoint analysis predicts consumer choices better than rating scales because participants in a conjoint analysis based study look at relatively realistic product profiles and make trade offs among various product features, selecting some combinations of attributes as better than others. Hence, conjoint analysis is called a multi-attribute trade-off technique.
Why discrete choice modeling?
For all its strengths, conjoint analysis also shows some weaknesses, particularly in testing branded products. In conjoint analysis, brand often gets treated as a product feature, with various brands as the levels. Problems arise when brand appears as an attribute to be tested along with other attributes. Since each level of each attribute must appear with each other attribute level, impossible combinations of brands and features can appear. Impossible brand-price combinations are especially likely to appear. Also, when the brand name itself signals a degree of product quality, it cannot be traded accurately against other attributes.
Problems also can arise in asking respondents to rate or rank product profiles. If the procedure is to work at all, study participants must rank or rate all the conjoint profiles. Sometimes rankings prove difficult. Looking at a series of 16 product profiles, most of us probably could select the one we like best and the one we like the least. We probably also would have little trouble identifying our second favorite and our next-to-least favored choices. It's harder to decide which should be ranked fifth or sixth. Forcing respondents to carefully rank or rate alternatives they would never choose can make the task imposed by conjoint somewhat different from real-world product selection behavior. And unfortunately, accurate rankings or ratings of all the product profiles are needed for conjoint analysis to provide meaningful results.
Discrete choice modeling avoids impossible combination and forced choice problems, while preserving - and even extending - the estimating power of conjoint. With DCM, respondents see products or services alongside competitive products in a series of market scenarios. They are asked to look at each scenario, and answer a simple question: If these were all the choices available, which would you choose, if any?
Once the respondent has done this, he or she simply goes on to the next scenario and makes the same simple decision. Figure 2 shows a sample DCM Scenario.
DCM's approach has several benefits, aside from posing a more realistic and natural task. Perhaps most importantly, the question asked is the question researchers care about most. The respondent makes a choice, decision or purchase, rather than just stating a preference. In addition to having a great deal of theoretical support, the DCM approach has high face validity with non technical (managerial) audiences.
Further, representations of brands can be customized to match marketplace reality. Each brand can have its own attributes and attribute levels. Attributes can vary for different brands. Several products with the same brand name can appear side by side in the scenarios, allowing for direct measurement of product line effects.
DCM can even be set up to have one or several products missing from some scenarios, which allows for the measurement of marketplace effects of product introduction or withdrawal.
DCM also handles certain experimental designs more easily than conjoint. With DCM, many products can share one large experimental design, or each product can have its own conjoint-style design. Products can share experimental designs (for instance, two experimental designs can be split among six products). Very large designs (requiring, say, 32 or 64 product scenarios) can be fractionalized (split apart) very easily; the downside is that fractionalization requires larger samples. However, it is still much easier than the complex gyrations required when splitting conjoint designs, which tend to become messy and produce less than ideal results.
With large enough samples, you can even use random designs with DCM. With DCM, random designs do not have the exact formulations of experimental designs. Instead, they put levels of attributes together in a different random configuration for each respondent. Sawtooth Software's CBC product does this.
You can also use DCM to analyze data collected with no explicit design, an approach called "revealed preference analysis" that has long been used in econometrics. However, any design departing from strict experimental design principals must be tested carefully.
Key characteristics
DCM analysis necessarily involves choices among alternative products or services, typically shown side-by-side in scenarios. In addition, DCM typically uses estimation by logistic regression procedures. Multinomial (or polytomous) logistic regression is used for choices among more than two alternatives. When more than two alternatives are tested, DCM also must make an important mathematical assumption, the independence of irrelevant alternatives (IIA). IIA is a key property of DCM.
Conditional variables
DCM differs from most other forms of analysis in that it often uses conditional variables. Conditional variables exist only for one or a few of the choices available. They also can have different levels for the various choices. Again considering hotels, suppose Marriott was the only chain that offered an automated robot bar in each room. The automated bar could be a small refrigerator that electronically recorded drinks you took from it, or a smaller wall-mounted unit. With DCM, "automated bar" could appear as a conditional variable only in connection with the Marriott. Since this feature would not be offered by the other hotels in the real world it would not appear as a feature tested in connection with them.
Aggregate level analysis
DCM also has one salient limitation: Analysis can be done at the aggregate level only. This is a consequence of logistic regression, which works in terms of likelihoods or odds.
Odds, of course, only can be estimated at the group or aggregate level. Some experts say you need, at a minimum, the moral equivalent of six people to do any estimation with logistic regression.
Aggregate-level analysis makes it relatively easy to fractionalize large DCM tasks. Fractionalization is done most efficiently by splitting the scenarios randomly among respondents. For a large design that has many attributes and levels, and that requires 32 scenarios, you can simply give each respondent 16 scenarios at random. This would require a larger sample, of course; in this case, double the initial number. Assuming respondents could handle a lengthy task, you also could try a 50 percent boost in sample size with each respondent rating 24 scenarios.
The ease of splitting tasks between respondents leads back to the idea of using the moral equivalent of a certain number of people. But sampling error is a factor in DCM just as it is in other survey research. Given this, you will want to use samples adequate for accurate estimation. There is nothing moral about a sample of six.
Greater complexity
Analytical complexity is, unfortunately, another key aspect of DCM. A few software packages have addressed the issue, building in analytical procedures that, to varying degrees, make DCM somewhat simpler. Still, DCM remains far more difficult than conjoint when it comes to analysis and model specification. You may need special designs to capture interactions (e.g., so-called response surface-type designs, which standard conjoint design programs cannot generate, or other more esoteric designs).
DCM models also characteristically include testing for relations beyond the raw data. For example, you might look at effects based upon squared or cubed variables, especially in investigating the effects of price. You also usually would test for interactions between key variables.
Iterative analytical procedures and models that must "converge"
Multinomial logit itself works differently from many multivariate procedures. It runs iteratively until it converges upon a solution, making it at least roughly akin to K-Means clustering, which runs and reruns solutions until two consecutive runs fall within some acceptable tolerance of each other. Unfortunately, while clustering models usually converge, or behave, DCM models may not.
Several factors can cause non convergence with DCM:
- multicollinear variables (variables highly correlated with each other or some combination of other variables);
- the presence among the choices in the scenarios of one or more alternatives selected very infrequently; and
- the presence in the design of any highly infrequent variable or variables. Infrequent variables can arise because DCM allows variables to be conditional, to apply to only one of the choices being tested.
Unfortunately, it is unclear how infrequent a variable or an alternative can be without causing problems. If the model does not converge, though, infrequent alternatives within the scenarios and infrequent variables should be among the first suspects discarded from the model.
DCM models may not work on the first half dozens tries. Even when a model works, it may be far from the best in terms of what you need to find. So DCM usually requires more exploration of alternatives than other methods, such as conjoint analysis.
Simulations closely related to scenarios
Simulations must be closely tied to the scenarios presented to respondents. If five products appear in all the DCM scenarios, you cannot do accurate estimations of what might happen if there were only four, or if another product entered the market as a sixth competitor. Such contingencies must be considered as part of the initial DCM design. Conjoint provides more flexibility, allowing you to make unplanned, after the fact estimations of effects.
Reasons favoring the logistic model
Given that the logistic model can be harder to get right than the models used in many other procedures, you may well ask why it is worth the bother. The simple answer is that it provides more analytical precision and power. Logistic regression models handle problems with discrete (not continuous) dependent variables that ordinary linear regression cannot. Linear regression definitely does not work correctly when you are trying to predict a variable that can take only two values, such as choose vs. don't choose, yes vs. no, or O vs. 1. Linear regression is not bounded by the values that you are trying to predict in this case. Predictions from linear regression can take any value, so instead of just the 0 or 1 you are hoping to predict, it might produce a prediction of 0.1 or two, or even a negative number. The correct answer always would be either 0 or 1 (or yes or no) when considering a single product and a single choice between two alternatives, so linear regression definitely is not the best method for analyzing that situation.
The theory of linear regression also states that it should not be used when the dependent variable can take only a few discrete values. So linear regression often will not work well even when the consumer can make several choices at once (for instance, choosing which types of soft drink and how many of each to buy on a shopping trip).
Discriminant analysis, not linear regression, is the linear model technique that should be used to predict which choices consumers make. Multi-nomial logit (MNL) can be thought of as similar to multiple discriminant analysis in several ways. Predictions of group memberships (or which thing gets chosen) from both discriminant and MNL often will be highly similar - assuming that discriminant analysis can handle the problem in question. Some statisticians even use MNL as a sort of superset of discriminant analysis, capable of handling more complex analytical questions. However, MNL has different theoretical underpinnings than discriminant analysis. MNL works in terms of likelihoods and odds. As such, its approach and the output it produces are more closely related to the question of choice than the assumptions and output of discriminant analysis.
Logistic models: Beyond a simple straight-line approach
The logistic model is not linear and additive. It does not assume that just adding a few utilities will accurately model how people respond. Rather, it assumes an S-shaped (sigmoidal) response curve. You can think of this curve as representing a view or theory of how consumers will respond to a product.
When utilities are near zero, utility must increase by a large amount to get consumers to a middling position. In other words, it takes a lot for a product or service to move consumers from indifference to some interest.
In the middle range of utilities, small improvements (or gains in utility) can lead to sharp increases in consumers' likelihood to respond. It only takes a little extra utility (or perceived value) to generate strong interest in a product among people who already have some interest in it. The end of the logistic curve gets flat like the beginning. In other words, it takes a lot of extra utility to move people from interest to action.
Not all writers on the subject see the logistic curve in quite this way, but it certainly goes with the shape of the logistic function. It is a helpful way to think of how logistic models work.
The nonlinear nature of the logistic response curve has some practical implications as well. First, you cannot directly estimate the value of an alternative by summing the utilities that it includes. With conjoint analysis, you simply add up the utilities of all the product/attribute variations of interest to you, and the total is what the product is worth. Not with DCM. Instead, you typically must run simulations through the MNL estimation program.
DCM and the independence of irrelevant alternatives
Strictly, the independence of irrelevant alternatives is a mathematical property of error terms that the MNL model assumes to exist. It is something like the assumption in linear regression that errors in estimations have a common variance and are independent (homoskedasticity).
Practically, though, assuming IIA works as a property of DCM means that you are assuming the odds of selecting one alternative versus another are not influenced by the presence of other alternatives that you are not actively considering. As a result, some critics have called the MNL model unrealistic as a representation of peoples' behavior. They argue that consumers always consider all alternatives in making any choice, even if many of the choices are things they would never select. Defenders of MNL say that if your decision comes down to a choice between A and B, the presence of any number of other alternatives makes no difference.
Arguments like these can go on all night, with each involved party more certain of the correctness of their position at the end. Recall though, that IIA is simply a consequence of reasonable assumptions about error terms. Fortunately, in practice, violations of IIA do not appear too often, and when they do, they usually prove remediable. However, the remedy involves experimenting with the DCM model until it no longer violates the IIA assumption.
The special properties and requirements of conditional variables
Conditional variables exist for one (or a few) of the alternatives being tested, but not for all. Using conditional variables allows each product to have its own unique attributes and levels. Conditional variables require special coding of variables. Normally, nominal-level variables are handled in analytical procedures by the use of dummy variables. These variables take on values of 1 or O, corresponding to the attribute being present or absent.
A nominal-level variable that has three levels would get translated into two dummy variables. For instance, in the hotel lobby example, we would create two dummy variables to describe the three types of lobby:
Conditional variables may require a different scheme, since a code of 0 is best used to stand for an attribute level that is absent in an alternative. Rather than using the usual 0 vs. 1 form, codes of -I and I are used, eliminating the chance of confusion between a code for an attribute level and the "level absent" code.
DCM: No individual-level utilities
Aggregate-level analysis means no individual-level utilities of the type obtained from conjoint. And the absence of individual-level utilities means segmentation is not available from DCM results. Segmentation based on conjoint utilities often provides tremendously useful results. The groups that emerge usually have sharply different wants and needs related to the product in question.
It's likely that it will be possible to infer individual-level utilities from DCM before long. On one side, the academics are rushing to the rescue, which means a batch of not terribly readable articles about something called latent class models are available. A few papers also have been presented by proponents at conferences. And a some academics even claim to have used such models.
Latent class models may become interesting, if not useful, at about the time Intel introduces the 80986 chip. The models now take a week or two to run on very large machines (faster and bigger than a fully loaded Pentium). At a recent conference, one academic stated that he simply had his department buy a Sun Computers work station (a highly powerful machine, something on the order of the most powerful computer in the world as of 1980). It was then a simple matter to run this machine for two or three weeks straight every time he needed to solve one of these problems. The academic failed to see why some listeners found this less than practical.
Other practitioners are investigating alternative approaches, but so far nothing promising has emerged.
In the meantime, nothing prevents you from clustering on other criteria, then running the DCM among the groups or segments developed in the clustering.
DCM: Evaluating the results
Once you have completed a DCM analysis, there are several criteria you can examine to see how well the model has performed. The first thing that needs to happen is simple: The model must run, or converge. Sometimes getting a model to converge takes some doing. You may need to drop some terms or add some others. Highly correlated variables can spoil a DCM model - something like the problem of multicollinearity in regular linear regression. You may need to transform some of the basic variables. (Squaring, cubing and taking the square roots of numerical variables are some of the more common transformations.) You may need to eliminate infrequent choices from consideration.
Once the model runs, one of the most straightforward ways to diagnose its goodness is to check the variables that emerge as significant. As in regular linear regression models, the variables entered into DCM models may or may not prove to contribute significantly. With DCM models, you also need to be particularly careful about variables that emerge as significant but have signs that seem backwards (a variable that you expect to have a positive effect emerges as apparently having a negative effect). This can happen when the model has too many terms (again, due to problems similar to multicollinearity in linear regression), or too few terms (you left out something important).
DCM also has a few standard measurements that help you assess the goodness of a model. One of these, the Rhosquared, is like the it-squared (explained variance) in linear regression. There is one caution here: The more terms or variables you have in the final model, the more likely you are to get a "good" Rho-squared. This again is like regular linear regression. A regression equation with 100 or so variables is likely to explain more variance than a model with just a few terms in it. As a result, big, complicated DCM models often look pretty good, based on the Rho-squared statistic. Below are listed some guidelines for reading the Rho-squared values that emerge from a DCM analysis. The figures from Williams and Louviere seem more fitted to large models. If you keep just a few of the most important variables - which after all, is what many clients want and need - some alternative interpretations become available.
Another key measure of a model's goodness is a correct classification table. You read these much as you read the correct classification table in a discriminant analysis. The correct classification table shows how often each choice was selected, and how often it would be predicted as the choice, based on the best DCM model that could be generated. The following table shows a simple DCM example that did not come out very well.
The table shows that consumers had four choices in each scenario (three products and "none of these"). Let's review what the data in the table show for the first alternative, or product. The first product, was actually chosen 740 times. That number appears at the end of the first row. Of those 740 actual choices, we would predict 495 choices based on the best DCM model. This leads to the "% correct" classification of 66.89 percent for this alternative (shown at the bottom of the first column).
Overall, we would predict that the first alternative would be chosen some 1,650 times, with 260 of the choices coming from those who actually chose the second alternative, 490 from those who actually chose the third and 405 from those who actually chose the fourth (none of these). The figures can all be found in the first column of numbers, which corresponds to the first alternative.
Although the model performs well predicting choices of the first alternative, it does not do well elsewhere. It predicts that nobody would choose the second or fourth alternatives, with correct classification rates of 0 percent for each. Overall correct classification is 29 percent, not appreciably better than the level of 25 percent that would be expected based on chance.
In short, the table shows that the model did not do a good job of predicting choices, unless our only interest was in discovering why people selected the first alternative. Incidentally, this fictional data led to an overall model fit (Rho-squared) of 0.21 - just marginally acceptable even by the most lenient standards. If we could do no better than this with a DCM model in real life, we might find the results somewhat disappointing. Fortunately, all the real-world models I have seen have performed significantly better than this one.
DCM to do it yourself: The analytical alternatives
Two new programs, CBC from Sawtooth Software, and Ntelogit from Intelligent Marketing Systems, make it possible for the nonspecialist to analyze a DCM problem. For some years, the analytically more advanced (or more brazen) could use multinomial logit programs from Systat and SAS to perform many of the same tasks. (The May issue of QMRR contains a review of the two new programs.)
DCM - a two-minute summary
DCM is a powerful technique that can solve many problems better than full-profile conjoint analysis. It is not an overstatement to say that DCM provides the ultimate in product and service testing.
DCM uses the most realistic methods available for measuring consumers' choices. Consumers respond to products and services as they would in the marketplace. All the products shown can have the attributes that they would in the marketplace. They do not need to share the same attributes and the attributes do not have to vary in the same ways, as in conjoint. Several products from a given manufacturer can be included in a single DCM task, allowing for the explicit modeling of product-line effects. Products can appear in some scenarios and not others, which leads to realistic testing of the elimination or introduction of new products.
Often, DCM gets applied to complex problems with many attributes. In fact, more attributes and levels actually can make it easier to fit a model closely to the data.
Even at its most accessible (for the moment, CBC from Sawtooth Software), DCM is more difficult to do than standard conjoint. It requires understanding of model specification and strategies for trouble-shooting.
Highly complex problems probably require an expert. However, not all experts necessarily will give the same answers. Even the authorities in the field are still debating the best approaches to intricate DCM problems. Unless you have some analytical sophistication, you may not want to try DCM by yourself - at least the first time. Regardless of how you try the technique, though, it definitely merits serious investigation, given its great analytical power and the tremendous amount of useful information it can provide.