Reducing survey bias by capturing actual moment contexts

By Zachary Schendel, Frances James, Sarah Reneau, Dave Decelle | February 1, 2015

Reading time: 9 minutes

Abstract

Netflix researchers explore the differences between e-mail and push-notification smartphone surveys in capturing in-the-moment responses to questions about video-viewing choices.

Research Topics: E-mail Surveys | Mobile Surveys | Quantitative Research | Software-Mobile Surveys | Survey Design
Industry/Market Focus: Entertainment | Film/Movie | Television
Content Type: Case Study

Editor's note: Zachary A. Schendel is senior manager consumer insights at Netflix. Frances James is senior manager UX research at Netflix. Sarah Reneau is marketing coordinator consumer insights at Netflix. Dave Decelle is director consumer insights at Netflix.

Recently, Netflix set out to explore the influence of one’s context on video source choice in the actual decision moment. When you decide you want to watch video content, what drives you to choose Netflix or another source, like live TV?

Following a foundational series of video diaries and in-home ethnographies, hypotheses were formulated around drivers that might push Netflix members toward or away from the service. Drivers could be either implicit (e.g., needstate) or explicit (e.g., time of day). A survey was created to capture two key dependent variables for each driver: the fraction of moments in which a driver occurs; and the impact of each significant driver on the odds of a Netflix choice versus the competition (the analysis used was binomial logistic regression).
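
For readers less familiar with the analysis, a binomial logistic regression of this kind can be written roughly as follows. The notation is illustrative and not taken from the study: p is assumed to be the probability that Netflix is chosen in a given moment, x_k indicates whether driver k was present in that moment, and β_k is that driver's fitted coefficient.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{k=1}^{K} \beta_k x_k ,
\qquad
\mathrm{OR}_k = e^{\beta_k}
```

Under this form, an odds ratio OR_k below 1 means that the presence of driver k lowers the odds of a Netflix choice, and an odds ratio above 1 means it raises them.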

The business goal of the project was to prioritize the most frequent drivers that had the largest negative impact on Netflix’s odds of being chosen; in other words, the most common moments in which Netflix members tended to choose competing services.
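
One way such a prioritization could be sketched in code is shown below. This is only an illustration under stated assumptions: the `Driver` class, the scoring rule (frequency multiplied by the drop in log-odds) and all numbers are hypothetical and are not the scoring used in the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Driver:
    name: str
    frequency: float   # fraction of moments in which the driver occurs (0 to 1)
    odds_ratio: float  # odds of a Netflix choice with the driver present vs. absent


def prioritize(drivers):
    """Rank drivers that lower the odds of a Netflix choice, most pressing first."""
    # Keep only drivers with a negative impact (odds ratio below 1).
    negative = [d for d in drivers if d.odds_ratio < 1.0]
    # Hypothetical score: how often the driver occurs times the size of the
    # drop in log-odds it is associated with.
    return sorted(
        negative,
        key=lambda d: d.frequency * -math.log(d.odds_ratio),
        reverse=True,
    )


# Made-up numbers for illustration only.
example = [
    Driver("driver_a", frequency=0.30, odds_ratio=0.50),
    Driver("driver_b", frequency=0.05, odds_ratio=0.40),
    Driver("driver_c", frequency=0.40, odds_ratio=1.80),
]
ranked = [d.name for d in prioritize(example)]
```

In this made-up example, driver_c is dropped because it raises the odds of a Netflix choice, and driver_a outranks driver_b because it occurs far more often even though its per-moment impact is smaller.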

The research plan hinged entirely on capturing valid, bias-free, in-the-moment data on a massive scale. Putting aside typical survey biases (e.g., responder bias, panel self-selection bias), there were two additional important sources of bias that needed to be accounted for by this survey: memory bias and observer bias.

Memory bias: This comes into play when a participant attempts to recall certain facts about a context that occurred in the past. Memory can be flawed and selective (Cann, McRae and Katz, 2011) and privilege recent events (e.g., recency effect in Crowder, 1976). Participants may be better at recalling the actual name of the show they watched than the needstate they were attempting to fulfill when it was chosen. They might also have different opinions of the moment after completing the show than they did in the moment when it was first chosen. Where were you the last time you watched video? That’s easy to recall. Now, can you recall how you were feeling or what mood you were in immediately before you made the choice?

Observer bias: This comes into play, for example, when an observed participant changes their behavior simply because they know someone else is watching them. The classic example is the Hawthorne effect, found originally in the 1920s at Western Electric's Hawthorne Works (Roethlisberger and Dickson, 1939). It was observed that any manipulation of the work environment had a positive impact on worker productivity. Later it was hypothesized that the real driver of productivity was simply the fact that workers knew their work was being observed. This is important for the current research because asking a participant to wait to describe the next context might artificially impact the choices they end up making. If you knew you were being observed, would you decide to watch Real Housewives?

To attempt to cancel out these biases, the survey always asked about the current moment. If there was no current moment, then the survey came in two forms: Next and Previous. For the Next version, if the participant was not actually watching something at the moment they received the notification, then they were instructed to wait until the next viewing moment to fill out the survey. For the Previous version, the participant recalled what happened during the last viewing moment they had.

Methodologies

E-mail survey: One of the most common quantitative research methods employed at Netflix is the e-mail survey. E-mail invitations will remain in an in-box (or junk mail) until a participant checks their e-mail. When they click on the link in the e-mail, the survey typically opens in their device’s Web browser.

There are some drawbacks to e-mail surveys that are particularly important for this exploration. First, even if a company sends the e-mail survey at a specific time, if a participant is unaware it is available then the survey will not be completed at the targeted time. Second, if the participant is unable to fill in the survey upon receipt because of device constraints, then the moment will have passed before it can be captured. For example, for participants who do not own or are uncomfortable checking e-mail on a smartphone or tablet, a laptop/desktop computer would need to be available at the target moment. Finally, the device might also bias participants away from important yet uncommon contexts. For example, if a participant was at a friend’s house during the target moment the e-mail was sent but didn’t see the e-mail on their computer until they got home, they might end up answering the survey about a completely different moment that occurred at their own home instead of the moment at their friend’s house. To help alleviate this potential issue, the participant was only asked to recall a single moment rather than an entire day’s worth of moments.

Push-notification survey: Another less common method that has been used at Netflix is the push-notification smartphone survey (e.g., through Research Now). The smartphone is an inherently omnipresent and personal device that will accompany you throughout all possible viewing contexts. In this method, participants who have opted into receiving push notifications receive a message that a survey is available on their smartphone. Clicking on the notification opens an app that contains the survey.

There are some drawbacks to the push-survey method as well. Because the methodology and technology are relatively new, panel size, target population representation and logistics can all be issues. One’s ability to recruit enough people can be impacted, especially if a niche audience is desired.

For this research, the recruit targeted Netflix members who watched at least once a week and were also willing to take part in the study. Incidence was rather low, which can also affect how representative the sample is. Because the study recruits from a limited panel of people who have the technology, knowledge and time to sign up for push-notification studies, and because the sample is typically pulled from an even smaller subset of that panel, it might be impossible to recruit a sample that looks like the target population. Further, the process of programming on proprietary mobile platforms and scheduling push notifications on a participant-by-participant basis (among other additional logistics with the push method) added labor, time and cost.

However, one advantage to the push survey is the flexibility to carefully control send times such that each push can be explicitly scheduled on a participant-by-participant basis. To take advantage of this capability, each participant was screened to discover how likely it was that they would be watching video in the foreground (paying attention) throughout the 24 hours of a typical weekday (five-point scale). Separately they were asked the same question about watching video in the background. The resulting top two-box distributions (e.g., weekday/foreground in Figure 1) were used to find the exact proportion of push surveys that should be sent during “foreground” and “background” moments and to weight the number of push-notification sends by time of day. For example, the majority of viewing moments occur during prime time (7-10 p.m. local) while the fewest are between 12 and 4 a.m. Proportions of push notifications were weighted accordingly. Each participant was then assigned a specific moment to receive the notification during which they had indicated they would be highly likely to be viewing something. These distributions also formed the testable ground truth.
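
The weighting step described above might be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names, the three example hours, the ratings and the largest-remainder rounding are all assumptions made for the example.

```python
def top_two_box(ratings):
    """Share of ratings that are 4 or 5 on a five-point scale."""
    return sum(1 for r in ratings if r >= 4) / len(ratings)


def allocate_sends(weights, total_sends):
    """Split total_sends across time slots in proportion to their weights.

    Uses largest-remainder rounding so the integer counts add up exactly
    to total_sends. Assumes at least one weight is greater than zero.
    """
    total_weight = sum(weights.values())
    exact = {slot: total_sends * w / total_weight for slot, w in weights.items()}
    sends = {slot: int(x) for slot, x in exact.items()}  # round down first
    leftover = total_sends - sum(sends.values())
    # Hand the remaining sends to the slots with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - sends[s], reverse=True)
    for slot in by_remainder[:leftover]:
        sends[slot] += 1
    return sends


# Made-up screener ratings (likelihood of foreground viewing) for three hours.
ratings_by_hour = {
    "03:00": [1, 1, 2, 1, 4],
    "12:00": [3, 4, 2, 5, 1],
    "20:00": [5, 4, 4, 5, 3],
}
weights = {hour: top_two_box(r) for hour, r in ratings_by_hour.items()}
plan = allocate_sends(weights, total_sends=100)
```

With these made-up ratings, most of the 100 sends land in the 20:00 slot and the fewest in the 03:00 slot, mirroring the prime-time versus overnight pattern described in the text.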

Overall hypothesis

It was hypothesized that the push notification method would be superior to the e-mail survey method because it would capture participants’ in-the-moment contexts more accurately than the e-mail survey. This would result in a reduction in the memory and observer biases mentioned above and, therefore, yield results that were more representative of the variety of actual moments of choice. This would, however, not come without a significant investment both in time and money. To explore this overarching hypothesis, a number of specific hypotheses were tested, three of which are discussed below.

Because the e-mail method will not capture participants in the moment as accurately as the push notification method, it is hypothesized that:

1. The time difference between the survey being opened and the moment of choice recorded will increase.

2. Recorded moments will overindex during times of day when personal e-mail tends to be more frequently checked (e.g., first thing in the morning before work).

3. The variety of recorded moments will be reduced – participants will tend to recall the more obvious or aspirational viewing moments (e.g., watching in the living room, foreground viewing) over the less obvious, less thoughtful viewing moments (e.g., watching a TV at a bar, background viewing while doing chores).

Results

Testing Hypothesis 1: The correlation between send time and the time the survey was opened was significantly higher in the push survey, from r=.20 (e-mail) to r=.34 (push) (these numbers would both be higher but the survey was unable to track in local time). The e-mail survey was most commonly opened first thing in the morning (Figure 2), either at work or at home, and the push survey was opened at regular intervals throughout the day.
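
For reference, a correlation of this kind (Pearson's r between send time and open time) can be computed as in the sketch below. The data are made up, and times are assumed to be expressed as hours since midnight. That simplification ignores wrap-around at midnight and the local-time issue noted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up send and open times, in hours since midnight, for five participants.
send_hours = [8.0, 12.0, 19.0, 20.5, 22.0]
open_hours = [8.5, 12.25, 19.0, 21.0, 22.5]
r = pearson_r(send_hours, open_hours)
```

A value of r near 1 would indicate that surveys were opened close to when they were sent. The values reported in the study were far lower for both methods.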

Testing Hypothesis 2: There were significant bumps in viewing moments captured by the e-mail survey in the middle of the day and immediately after work into prime time (Figure 3), while the push survey was a closer match to the known distribution of viewing (e.g., Figure 1). The push curve was also much closer to the typical weekday viewing curve from internal data.

Testing Hypothesis 3: For the purposes of this article, we are only displaying one metric used to test Hypothesis 3: foreground vs. background viewing. There were almost equal amounts of foreground and background viewing in the e-mail method (Figure 4a) while there was more background viewing in the push method (Figure 4b).

Discussion

The primary goal was to capture the actual viewing moment, reduce memory and observer bias and, therefore, improve the validity of the overall survey responses. To this end, it was concluded that the push-notification method was superior to the e-mail method. The push survey was opened closer to the target send time, resulted in viewing proportions that were more similar to the ground truth and curbed an overincidence of the most common, stereotypical or aspirational viewing moments. In addition, concerns about sampling from a representative population were alleviated when it was found that the push survey's typical weekday viewing curve (Figure 3) matched internal data on playstarts more closely than the e-mail data did. Going forward, despite the added time, complexity and costs, the push-notification method is recommended for any researcher looking to test a number of hypotheses on a massive, national scale where unobtrusive ethnographies might be seen as the gold standard.

In order to determine if a push method might be the best approach for you, ask yourself these questions:

Are the answers to your survey context-dependent or context-agnostic?
Are you trying to capture variables that might change across time of day or location?
Do you believe that the device context might artificially influence responses?
Are you concerned that participants might ignore something they do as “insignificant” or “unimportant” that is actually critical for you to capture (e.g., watching TV at a bar)?
Do you think participants might change their behavior simply because they know you are observing them?

If any of these seem like they might pertain to your research projects, consider moving away from an e-mail survey and toward something with a bit more control, ecological validity and reduced bias.

REFERENCES

Cann, D.R., McRae, K. and Katz, A.N. (2011). “False recall in the Deese-Roediger-McDermott paradigm: The roles of gist and associative strength.” Quarterly Journal of Experimental Psychology, 64, 1515-1542.

Crowder, R.G. (1976). Principles of Learning and Memory. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Roethlisberger, F.J. and Dickson, W.J. (1939). Management and the Worker. Cambridge, Mass.: Harvard University Press.