How to successfully approach generative AI applications
Editor’s note: Rachel Dreyfus is the president of Dreyfus Advisors.
In this article I share my learnings to help market research practitioners apply generative AI survey tools. After completing two quantitative projects for two different clients, in which the surveys embedded an AI chatbot to converse with respondents, I found myself debriefing: what could I have done differently? I hope to help others through my learning experience.
A chatbot tool in a survey questionnaire can replace traditional open-ended – and some closed-ended – questions. The tool allows exploration beyond immediate rational responses, eliciting deeper insights and elevating the survey beyond a typical hybrid qual-quant instrument. The text analytics provide another set of reliable quantitative data. With responses collected from a stable sample of several hundred or more, AI is deployed on both the front end, in the respondent experience, and the back end, in the analysis. On the front end, the tool behaves much like a customer service chatbot: an avatar pops up and begins a conversation with the respondent, generating unique probes that build on the responses provided. On the back end, common themes, sentiment and deeper insights can be investigated with text analytics using a proprietary platform and dashboard user interface.
For context, one project was a finished copy test of a nonprofit’s video advertisement among prospective and current donors, designed to sharpen the execution. The other was a survey among employees to collect feedback on a company’s vision statement (text, no images), designed to identify red flags and improve the language used in the statement. For both projects, AI conversations and text analyses were run on the HumanListening platform.
Considering gen AI research solutions
I had experience with two commonly used qualitative solutions embedded into survey instruments: live moderators who intercept individual respondents and offer additional incentives for one-on-one web chats (expensive), and asking survey respondents to record an optional one-minute selfie video answering an open-ended question (varying depth and quality). Neither of these respondent experience solutions provided advanced analytics on the back end. My experience with AI text analytics was primarily with traditional survey open ends, using text data to model and measure the topics that drive NPS, for instance. I expected improved productivity from combining this conversational front-end experience with back-end text analytics. Here’s what I learned.
A little conversation goes a long way: Use generative conversations judiciously in survey questionnaires
It’s tempting to include an entire focus group’s worth of probing in the survey questionnaire, but just because it’s possible doesn’t mean it will be a good respondent experience. Each conversation is equivalent to about three probes and responses. After two of these sequences, some respondents may be tapped out, so make any further conversations shorter and/or optional. Probe and response time varies, but it’s safe to assume each probe will add about one to two minutes to the survey length. Two conversations can therefore add six to 10 minutes to your length of interview (LOI), so it’s important to work with your vendor to stay within target, ideally by removing the multiple-choice questions that are no longer needed.
One client was initially concerned about my suggestion to use a chatbot, worrying it would be an unpleasant experience for respondents. Ensure the vendor has a demo to “show vs. tell” what the experience will be like and assuage those concerns. The project leader is responsible for keeping the chatbot probes easy to answer so respondents aren’t annoyed.
Survey respondent experience summary
- Although the temptation will be to go heavy on the conversations, keep them simple for a better respondent experience and better insights.
- Some respondents will find conversing with the chatbot enjoyable, while others will feel less comfortable and express frustration. My takeaway is to shorten the interview for all respondents. Respondents always have the freedom to respond “next” to bypass the rest of the conversation (I noticed only a handful did so).
Keep guardrails on the chatbot: Use a ladder-up technique throughout the conversation
I had the option to provide coding terms and topics up front to seed the large language model (LLM), with the ability to update the model with additional terms and topics after the soft launch. I lost time trying to guess the likely conversation themes and topics. When we pretested that survey version, the chatbot probed on the model terms I had fed it rather than following the organic terms surfacing from the respondent conversations. I ended up abandoning my preset terms.
What worked better was to structure the conversation around the moderator’s “ladder-up” approach, whereby the chatbot repeats the response and probes a step further into the feelings and perceptions the respondent has offered. With this technique, which nearly imitates a focus group moderator, respondents feel “listened to” and provide more detailed responses than we’d typically get from flat open-ended questions such as, “Why did you rate the ad’s appeal ‘very high’?” We also had the opportunity to ask “why” questions designed to investigate emotions, including, “How did the ad make you feel?” and “What images or phrases in the ad made you feel that way?” Connecting the respondent’s side of the conversational probes creates a richer, more insightful paragraph than a traditional open-ended verbatim response.
Infrequently, the chatbot missed the mark; fortunately, conversations quickly recovered. It usually happened when a respondent answered a question with another question (possibly using sarcasm). For example, one response about the ad’s copy was, “What does this even mean?” and the chatbot promptly responded with the textbook definition of the tagline. We would have preferred, “What do you think it means?” So the tools are not quite human, yet. And because the themes can be either positive or negative in sentiment, the multiple-choice questions act as the guardrails needed to filter and separate the likes and dislikes on the back end.
Questionnaire development summary
- Stick close to the objectives of the study when devising questions for the conversation.
- Instead of trying to predict the LLM topics in advance, let them surface from the conversations for deeper insights. Use multiple-choice questions in combination with conversations in which respondents reflect on the reasons for the rating they selected.
- Because the themes can be either positive or negative in sentiment, the multiple-choice questions will act as the guardrails needed to filter and separate the likes and the dislikes on the back end.
It takes time and effort to unearth valuable insights
One area where the tool saves time is the ability to dip in during fieldwork and begin to understand the text insights. In a study of one thousand completed interviews, my report was drafted by the time the two-week fieldwork period had elapsed. Yet I now had a data set worthy of 10 two-hour focus groups. One- or two-word themes surfaced through the back-end text analytics. Separating the actionable insights from the more obvious ones requires a combination of art and science. In the end, there are no shortcuts to deep thinking.
Some respondents provide their first impression in a thorough manner while others give only a couple of words, just as in focus groups. With a reliable sample size of several hundred responses, the data set becomes incredibly rich with detailed perceptions and feelings. The AI will quantify your topics, but the heavy lifting of figuring out what it all means rests with the project lead. With practice it’s possible to become more efficient with these tools, but beginners may want to budget extra time and resources to sift through the results.
Analytics learning summary
- Be sure to analyze the “first impression,” or first response in the conversation, for sentiment separately from the responses to probing. This capability was available, but I had to ask for it.
- Ensure all variables are available for analysis of category comparisons. Sometimes a processing step will be required; request that up front.
- The amount and types of text analysis available can be as broad as the number of quantitative variables and subgroups, and wading through the noise to find the interesting learnings is similar to reviewing tabulations. Ask whether the vendor can provide text or topic banners.
Choose the right platform and ensure it meets your needs
Get references for the analysis platform in advance. Many vendors now offer these tools, both in hosted survey platforms and as add-ons to common survey platforms.
For me, a data nerd who enjoys rolling up my sleeves and immersing myself in the analysis, self-service was key. I needed platform training but also wanted to direct the analysis plan to meet my clients’ objectives. Each platform will differ in its strengths and weaknesses, so set some criteria and challenge the vendors before signing up.
It can be tempting to choose the low-cost solution and a DIY tool, yet I was pleased to have a trained custom research team that understood my questions from both a methodology and an insights perspective. Is the sentiment analysis using first-impression ratings? Can you meet my specifications for banner plans (summary tables, top-two box in the stubs, etc.)? I used the banners less than I typically do because there were fewer multiple-choice questions than in a typical survey, relying on them primarily for descriptive profiling and high-level ratings questions. All the “why” questions can be satisfied in the conversations. Multiple-choice data also help confirm that all responses are included and categorized correctly.
Vendor selection criteria
- White papers or case studies that demonstrate past knowledge of the category.
- Ability to update the LLM midstream as new themes surface.
- The vendor has a market research and marketing science focus vs. an IT or SaaS focus.
- The team assigned has enough experience with the platform to be thought partners rather than just order takers (some providers are new and their staff are still green).
- Sophisticated analysis tools that allow charting, statistical testing and subgroup comparisons.
- Transparency of the underlying text data and good alignment between that data and the AI topic names.
- Charts can be exported directly into editable Excel files (vs. uneditable images).
Include privacy assurance statements to reduce AI concerns
Marketing and legal teams may have concerns about using AI tools and want assurances; include these in the statement of work with the vendor.
Privacy assurances summary
- Assurance that proprietary data collected will not be used to train the vendor’s AI platform in the future.
- Assurance that the data will be free of hallucinations. Again, the hard work lies in adequately pretesting the chatbot for the directions the generative conversation can take and, on the back end, scanning the respondent-level data for anything strange or unexpected.
Benefitting from gen AI survey tools
In conclusion, approach conversational AI in surveys with careful enthusiasm. These tools can add two new dimensions to research studies. First, a more interesting experience for respondents leads to conversational richness comparable to a focus group (and in some ways superior, given the bias and small samples of traditional focus groups). Respondents are curious and somewhat forgiving about the occasional odd probe the chatbot might serve up and, let’s face it, our respondents have been forgiving about poorly worded questions in traditional surveys too.
Second, the tools’ text analytics add dimension to the insights. There’s no substitute for a trained research project lead; text analytics require rigorous interpretation. I renamed and re-categorized some topics and themes after digging into how the model worked, and I knew to evaluate first impressions separately and how to communicate my tabulation specifications. A DIY researcher on the product or marketing team may not be as well-versed in these best practices.
I hope this overview of some of the solutions’ strengths and weaknesses will lead to efficiency, and perhaps my job will remain necessary for the time being. Researchers can benefit as text analytics tools gain scale and continue to evolve to meet our future needs.