Editor's note: Keith Chrzan is senior vice president of analytics at Sawtooth Software Inc. He can be reached at keith@sawtoothsoftware.com.

In the November/December 2024 edition of Quirk’s, Lynn Welsh and Tamara Fraley describe the use of patient chart audits to understand physicians’ prescribing behavior (“From recollection to reality: Understanding prescribing trends through chart audits”). They note that using chart audits enables us to see actual behaviors, not just behaviors recalled and reported in a typical survey research environment. Patient chart audits thus provide better data because they capture recorded rather than recalled behaviors.

Let’s consider this an extension of that article, because below I’ll describe how we can apply choice modeling techniques to understand and predict physicians’ therapy choices. Alternatively, you could view this as a companion piece to my earlier article on situational choice experiments (SCEs), because the choice modeling we do with chart audits can resemble what we do in those experiments (see “Situational choice experiments for marketing research,” Quirk’s, March/April 2024).

We’ll start by introducing three kinds of models we often use with chart audit data. Then we’ll illustrate their outputs using results from a disguised case study conducted recently for a client.

Background

Not all choice models are experiments involving researcher-designed choice scenarios like choice-based conjoint. Dan McFadden published the first multinomial logit (MNL) choice model in the early 1970s as part of a study analyzing the choices commuters made among real-world travel options (McFadden 1974). One can build models to explain all manner of observed or self-reported choices. In fact, the first use of a choice experiment, which creates hypothetical choice scenarios for respondents to choose among, wasn’t published until the early 1980s, a decade after McFadden invented multinomial logit. For those familiar with the academic literature, choice models built from experimentally designed stimuli are called stated preference (SP) models, while those built from observed or reported actual choices are called revealed preference (RP) models.

Choice models built from chart audits thus qualify as RP models, because they measure actual choices made by real decision makers in their natural environments. In a patient chart audit, we collect the therapies actually prescribed by physicians for particular patients and we model those therapy choices as a function of patient and physician characteristics. Depending on the details of the chart audit, there are three modeling approaches we use most often.

Polytomous multinomial logit (P-MNL) – a classic statistical modeling approach

P-MNL is a special case of the multinomial logit choice model, different from the “conditional” multinomial logit used in conjoint and other choice experiments (Theil 1969, Hoffman and Duncan 1988). In P-MNL, predictor variables are invariant across choice alternatives – in the case of patient chart audits they describe the patient or the physician, not the therapies. P-MNL produces a set of regression-like coefficients that quantify the impact of each patient or physician variable on therapy choice. These coefficients also enable us to build a simulator, so that we can predict physicians’ therapy choices for new patients not included in the chart audit. The case study below contains an example of P-MNL model output.
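
For readers who want to experiment, here is a minimal sketch of a P-MNL fit in Python using statsmodels, with simulated data; the variable names (age, cardiac history) are hypothetical stand-ins for the patient and physician variables a real chart audit would supply.

```python
# Minimal P-MNL sketch with simulated data; all names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 600
age = rng.uniform(35, 85, size=n)        # patient age
cardiac = rng.integers(0, 2, size=n)     # 1 = cardiac history

# Simulate therapy choices (0 = A, 1 = B, 2 = C) from known utilities,
# with therapy A as the reference alternative (utility fixed at zero).
u = np.column_stack([
    np.zeros(n),
    -1.5 + 0.02 * age + 0.8 * cardiac,
    -2.0 + 0.03 * age,
])
p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
therapy = np.array([rng.choice(3, p=row) for row in p])

# The predictors describe the patient (the chooser), not the therapies:
# the defining feature of the polytomous form of MNL.
X = sm.add_constant(np.column_stack([age, cardiac]))
fit = sm.MNLogit(therapy, X).fit(disp=False)
print(fit.params)  # one coefficient column per non-reference therapy
```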

Classification tree – a visual, intuitive alternative

A classification tree is a decision tree that predicts a categorical outcome variable (Breiman, Friedman, Olshen and Stone 1984). In patient audit studies, this dependent variable is usually the physician’s prescribed or recommended therapy, and the candidate explanatory variables can be any of the other variables collected in the patient audit that describe the physician, the patient or the disease. Starting with the entire set of candidate explanatory variables, the analysis examines each variable in turn to see which best differentiates the charts on the basis of therapy choice (think of a chi-square test where the variable with the most significant chi-squared statistic is selected, though the analysis is actually a bit more complex than that). The tree uses that single most discriminating variable to split the set of patient charts into two groups. The analysis then repeats the splitting process within each of those two groups, and again within each subgroup that results: find the most significant splitting variable, split into two maximally different groups, repeat. Eventually the analysis reaches a stopping point (think of this as the point where no further significant differences result, though again the analysis is a little more complex than that). The final result is an inverted tree that starts with the entire sample of patient charts and branches down into many terminal nodes, each defined by a different combination of splits and each with potentially very different therapy shares. For example, a classification tree might look something like Figure 1.

Figure 1: An example classification tree.

Notice that the tree gives structure to the decision hierarchy clients sometimes request – it appears that the physicians first considered their patient’s age in the decision process, before moving on to cardiac history, the patient’s sex at birth and insurance coverage.

This process of repeated splits of the sample, a process called recursive partitioning, makes classification tree models sample size-intensive. We can do them with a patient chart audit because even though we may only have a few hundred respondents, each respondent gives us data from a few different patient charts. The case study below shows an example of the classification tree analysis.
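
As an illustration of the splitting logic, here is a minimal sketch using scikit-learn’s CART-style tree (the Breiman et al. 1984 method); note that the case study below uses a different algorithm, conditional inference trees, implemented for example in R’s partykit package rather than in scikit-learn. All data and variable names below are simulated.

```python
# Minimal CART-style classification tree sketch with simulated data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
n = 2000  # e.g., 250 physicians x 8 charts each
X = np.column_stack([
    rng.uniform(35, 85, size=n),     # patient age
    rng.integers(0, 2, size=n),      # cardiac history (1 = yes)
    rng.integers(0, 3, size=n),      # insurance type (coded 0-2)
])
# Make therapy depend on age and cardiac history so the tree has
# something real to find (purely simulated relationships).
therapy = np.where(X[:, 0] > 65, "A", np.where(X[:, 1] == 1, "B", "C"))

# max_depth and min_samples_leaf act as a simple stopping rule,
# the analogue of the significance-based stopping described above.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50)
tree.fit(X, therapy)
print(export_text(tree, feature_names=["age", "cardiac", "insurance"]))
```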

Random forest – the power of machine learning with many trees

When you first hear about random forests, they may strike you as a method you wouldn’t expect to work well. Extensive experience has shown that they work extremely well as predictive models, however, so bear with me as I describe them. As the name suggests, a random forest isn’t a single tree – instead, it builds a large number of trees, a forest (Breiman 2001). As the name also suggests, there is a random element to their construction – actually two random elements. First, only a randomly selected two-thirds or so of the cases contribute to the construction of a given tree, with the remaining third held out to test the predictive quality of that tree. Second, at each branch of each tree, only a random subset of the predictor variables is considered as the basis for a split. These two randomizing elements are said to “decorrelate” the trees. This can be valuable because in the single classification tree analysis described earlier, if two variables are highly correlated, it’s possible for one of them to enter the tree and for the other to be left out, even though both of them might be important – the random forest would identify both of them as important. After growing a forest of 500 or 1,000 trees, we look at how each tree would classify each of our patient charts and we give each tree a vote; a majority vote among the trees is our prediction for any particular patient chart. Summing the votes across charts gives us predicted shares. We assign variable importance by quantifying how much prediction accuracy falls off when a given variable’s values are randomly scrambled compared to when they are left intact.

While we can use random forests for prediction, doing so requires running each new observation through the existing forest of trees and recording each tree’s vote. This is easy enough in the software that builds the forest, but it would be very messy to build into Excel, as the client in the case study below wanted.
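
Here is a minimal sketch of that workflow in Python with scikit-learn, using simulated data and hypothetical variable names; it shows the two randomizing elements, the voting prediction and the variable importances just described.

```python
# Minimal random forest sketch with simulated data; names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 2000
X = rng.normal(size=(n, 10))            # ten stand-in predictors
therapy = np.where(X[:, 0] > 0.5, "A", np.where(X[:, 1] > 0, "B", "C"))

# n_estimators is the number of trees; max_features="sqrt" draws the
# random subset of predictors considered at each split, and bootstrap
# resampling supplies the other random element described above.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                oob_score=True, random_state=7)
forest.fit(X, therapy)
print(forest.oob_score_)                # accuracy on the held-out cases

# Classify new patient charts: predict() reports the vote winner and
# predict_proba() the per-therapy shares (scikit-learn averages tree
# probabilities, a soft version of the hard vote count described above).
new_charts = rng.normal(size=(3, 10))
print(forest.predict(new_charts))
print(forest.predict_proba(new_charts))

# Variable importances rescaled to sum to 100%. Note these defaults are
# impurity-based; the accuracy-drop measure described above is available
# via sklearn.inspection.permutation_importance.
print((100 * forest.feature_importances_).round(1))
```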

When we have experimentally designed patients, as in the situational choice experiments mentioned earlier, the P-MNL and the classification tree models tend to work beautifully. When we don’t have experimental control – that is, when predictor variables may be correlated (such as for patient chart studies involving real patients) – the random forest model may work better. 

A case study

This case study shows how we used the three methods above to understand physicians’ observed choices among therapies A, B and C for patients suffering from a specific disease, based on nearly 50 variables collected in a patient chart study. These were actual, not hypothetical, choices and no experimental design forced the predictors to be uncorrelated. Because of the proprietary nature of the study, the results below use disguised variable names and model coefficients.

Our client wanted to understand how physicians made decisions about what therapy to prescribe to a particular class of patients suffering from a particular disease. The client also wanted a simulator so that they could see how changes to physician and patient variables affected treatment decisions and they wanted a way for the management team to visualize the results.

The survey collected eight patient charts from each of 250 physicians, so we had a total of 2,000 therapy decisions with which to build our models. In addition to a dozen or so variables about the physician (years in practice, subspecialty and some disease-specific attitudes) in the main questionnaire, the patient charts included another 30+ variables related to the patient – demographics, concomitant conditions, disease-specific information (“diseasographics”), insurance coverage, disease-relevant behaviors and so on. 

After discussing the relative merits of the P-MNL model, a random forest model and a classification tree, the client opted against the random forest analysis, because they were interested in a predictive model they could build into an Excel workbook, which is difficult to do with a random forest. The client decided that the classification tree and the P-MNL would best suit their needs and both analyses ended up predicting therapy choices well.

Classification tree. The tree analysis in this study produced a branchier tree than the example above: 13 binary splits that resulted in 14 final nodes based on different values of five of the variables (all five came from the patient chart data and none measured physician characteristics). Because the potential predictor variables had different numbers of levels, we used a particular classification tree method called conditional inference, which doesn’t favor splits on variables with more levels, as some other classification tree models may (Hothorn, Hornik and Zeileis 2006). 

The tree was richly differentiating, as the percentages of Therapy A ranged from 14% to 70% across the 14 nodes. Those for Therapy B ranged from 21% to 58% and those for Therapy C ranged between a low of 8% and a high of 40%. The tree also provided a nice visualization of how the variables interacted with one another to influence therapy decisions.

Also note that, with 14 ending paths, the model requires a healthy amount of sample to work with – eight charts from each of 250 physicians in this case gave us a sample size of 2,000.

Polytomous logit choice model. The P-MNL choice model ended up with the same five variables as significant predictors (not something that was guaranteed to happen but something that one might expect). The utility model in a P-MNL analysis has one vector (column) of coefficients for each therapy option the physicians might prescribe (one for each of therapies A, B and C in this case). Each column has a unique set of coefficients (utilities) for each level of each predictor variable – this is because the model uses patient and physician characteristics to predict choices, not the characteristics of the therapies. A model using product characteristics to make the prediction would have a single column of utilities, one for each level of each attribute, as in a typical conjoint experiment. Another way to think of the difference is that in P-MNL “each explanatory variable has a different effect on each outcome” (Long 1997, p. 178). Please refer to the previously cited article on situational choice experiments in Quirk’s, March/April 2024, to learn more about the structure of the P-MNL model. This utility model resulted for the therapy choices captured in our chart audit (Figure 2).

Figure 2: P-MNL utilities for each level of each attribute.

For example, when X1=1 the utility of Therapy A increases by 0.78, the utility of Therapy B decreases by 0.95 while the utility of the reference level, Therapy C, remains unchanged. Share prediction follows the same logit choice rule used in conjoint studies, allowing us to report model results in user-friendly Excel simulators.
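
To make the logit rule concrete, here is a small worked example using the X1 coefficients above and assuming, purely for illustration, that all other utility terms are zero:

```python
import numpy as np

# Utilities for A, B and C when X1 = 1, taking the coefficients quoted
# above and assuming all other utility terms are zero (illustration only).
u = np.array([0.78, -0.95, 0.0])
shares = np.exp(u) / np.exp(u).sum()
print(shares.round(3))  # about 61%, 11% and 28% for A, B and C
```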

Simulation

The two models had similar levels of predictive accuracy but their specific predictions varied slightly across scenarios. For this reason, the simulator we built gave the client three options. They could: predict from the classification tree; predict from the P-MNL; or predict from an ensemble that averaged the predictions of the two models.
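
As a sketch of the third option, the ensemble prediction is just the average of the two models’ predicted shares for a given patient profile (the numbers below are illustrative, not from the study):

```python
import numpy as np

# Illustrative share predictions for one patient profile.
tree_shares = np.array([0.55, 0.30, 0.15])   # from the classification tree
pmnl_shares = np.array([0.61, 0.11, 0.28])   # from the P-MNL model
ensemble = (tree_shares + pmnl_shares) / 2   # simple average of the two
print(ensemble.round(3))                     # [0.58, 0.205, 0.215]
```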

The simulator was built in Excel, and the user interface looked just like any other conjoint/choice simulator: the user input patient and physician characteristics and the simulator displayed the resulting choice shares. The models operated behind the scenes (on hidden sheets).

Random forest analysis

As noted, the client opted against the random forest model. However, we include it here for purposes of the case study. As described above, the random forest analysis produces an importance score for each variable (Figure 3). We usually scale these to sum to 100% for ease of interpretation.

Figure 3: Attributes and their importance.

Again, the same five variables are the most important, albeit in a slightly different order than the classification tree or the P-MNL model might have suggested. As it turned out, the predictor variables were not correlated enough with one another for the random forest to produce a markedly different result from the other analyses, though this isn’t something we could have guaranteed ahead of time. Happily, this means the tree and P-MNL analyses reported to the client weren’t missing any important variables through the bad luck of their being correlated with other predictors. This points to another benefit of the random forest analysis: it can serve as something of a safety net. Had it identified important variables that were not included in the P-MNL and classification tree models, we could have alerted the client that those models suffered from some multicollinearity and should be interpreted cautiously.

The client ended up with a useful visualization of the decision process as well as a flexible simulator in Excel that allowed them to make therapy share predictions for specific patient profiles. 

Add value

We thus have a variety of choice modeling tools we can apply to patient audit data. The choice models add value to the audit data by explaining the choices observed in the audits. This combination of capturing and modeling actual choice behavior makes chart audits a powerful tool in the pharmaceutical marketer’s toolbox. Moreover, the RP modeling methods described here also apply outside of pharmaceutical markets, wherever we can collect data about choices and choosers.

References

Breiman, L. (2001) “Random forests,” Machine Learning, 45: 5-32.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone (1984) “Classification and Regression Trees.” New York: Chapman and Hall.
Hoffman, S.D. and G.J. Duncan (1988) “Multinomial and conditional logit discrete-choice models in demography,” Demography, 25(3): 415-427.
Hothorn, T., K. Hornik and A. Zeileis (2006) “Unbiased recursive partitioning: A conditional inference framework,” Journal of Computational and Graphical Statistics, 15: 651-674.
Long, J.S. (1997) “Regression Models for Categorical and Limited Dependent Variables.” Thousand Oaks: Sage.
McFadden, D. (1974) “Conditional logit analysis of qualitative choice behavior.” In: P. Zarembka (ed.) “Frontiers in Econometrics.” New York: Academic Press, 105-142.
Theil, H. (1969) “A multinomial extension of the linear logit model.” International Economic Review, 10(3): 251-259.