How AI will affect market research in 2024 and beyond
By Rick Kelly, Chief Strategy Officer, Fuel Cycle.
Within two years, AI will become an essential tool for researchers and their stakeholders, representing a dramatic shift in the way insights are captured and delivered. While initial AI product implementations have focused on qualitative data, we also see significant enhancements in how quantitative data is created and analyzed. We expect AI applications to drive massive efficiencies in existing research processes and enable organizations to scale insights-led decision-making dramatically.
Despite our enthusiasm, our research has found that throwing AI into insights applications without careful implementation can degrade quality. The foundational AI models we use, the way data is structured and the way we prompt AI models to generate results can all significantly impact the quality of analysis AI produces. I’ll cover why this matters and share an initial experiment we’ve run, which will hopefully inform your evaluation of AI insights applications.
AI adoption among researchers
While there is substantial coverage of AI in industry periodicals and conferences, most researchers have yet to adopt it widely. Every year, Fuel Cycle publishes The State of Insights report based on responses from validated brand-side research practitioners. In Q4 2023, only 15% of corporate researchers said they’re using AI to aid their insights reporting and 46% said they “never” or “rarely” use AI.
Despite this currently low adoption, use cases for research are already coming to light, both from industry practitioners and academics. Two highlights from recent academic work identify the potential for AI’s impact on our industry:
- Researchers at UC Berkeley and Columbia University found that synthetic respondents provide reliable data: “Using AI-based tools is a reliable augmentation or even substitute for human brand perception surveys. We find that automatically generated sentences can be used to create perceptual maps that broadly match those created from human surveys.”1
- Researchers at Wharton and OpenAI found that survey research is among the job functions most prone to automation via generative AI. Given that about 80% of the U.S. workforce could see at least 10% of their tasks affected by AI, survey researchers, who often engage in tasks like data analysis, questionnaire design and report writing, might find a considerable portion of their work automated by these technologies.2
A consistent concern that surfaced in qualitative interviews is confidence in AI-generated results. AI application users frequently highlighted that AI models are prone to hallucinations (made-up results presented as facts), which undermines their confidence. This makes sense; after all, we can’t make confident decisions if we don’t trust the analysis.
To understand why this is important, let’s review the concept of Jobs To Be Done in the context of insights.
Jobs To Be Done
As articulated by the late Clayton Christensen, the concept of Jobs To Be Done is a framework for understanding the underlying need behind every business transaction. In essence, when stakeholders engage insights professionals, they are “hiring” them to fulfill a specific job. In the context of insights, that job is to equip decision makers with the clarity and confidence needed to act. It's about transforming data into a strategic asset that reduces decision latency and financial overhead while enhancing the quality of outcomes.
Using this framework, we can ask: What is the job to be done by insights in an enterprise setting? Succinctly stated, businesses “hire” market research and insights to support leaders in making confident decisions. This doesn't mean confident decisions at any cost or timeframe but rather, optimizing both to enhance the decision-making process. It's a delicate balance where time, cost and confidence intersect.
As we integrate AI into insights, the goal isn't merely to expedite or cheapen the process but to elevate the quality of decision-making. The use of synthetic respondents, for instance, might reduce costs but the crux lies in whether it genuinely improves how insights are utilized in the enterprise. That's the key metric for success.
Throwing AI at a problem doesn't guarantee success; it must be a purposeful integration. True success lies in enhancing the job to be done of insights – improving not just speed and cost but also bolstering decision-making confidence. The intelligent application of AI starts not with the technology itself but with understanding the job to be done and executing it well.
AI model selection and prompt engineering trade-offs
Not all AI models are equal, meaning the selection of underlying models for insights generation is an important consideration. Take, for instance, GPT-3.5 vs. GPT-4. GPT-3.5, while less expensive and faster, tends to be less accurate than GPT-4. GPT-4, though more costly and slower, offers significantly higher accuracy. These differences aren’t limited to GPT models; similar trade-offs run across the wide array of available models.
This diversity in model capabilities underscores the importance of selecting the right LLM for research studies. The choice of model directly impacts the quality and reliability of the results but it’s only part of the equation.
The other critical aspect is prompt engineering – a technique akin to programming in a natural language. Prompt engineering is about strategically structuring the inputs for AI to elicit specific, high-fidelity outputs. It's a nuanced process where the way questions or commands are framed can dramatically influence the AI's response. This approach is crucial because using generative AI isn't just about deploying technology; it's about harnessing it with precision. Effective prompt engineering ensures that AI addresses the problem and yields accurate, insightful results.
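To make this concrete, here is a minimal sketch of the idea using the OpenAI Python SDK. The prompt wording, model names and parameters are illustrative assumptions, not the prompts used in the experiment described below; the point is that the same data, framed differently, produces very different summaries.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A bare prompt: grammatically fine, but the task is underspecified.
NAIVE_PROMPT = "Summarize these discussion comments."

# An engineered prompt: role, audience, structure and guardrails are explicit.
ENGINEERED_PROMPT = """You are a senior market research analyst. Write an
executive summary of the discussion comments below for a brand-side
stakeholder. Structure it as: (1) three key themes, each supported by a
representative verbatim quote; (2) notable points of disagreement;
(3) one recommended next step. Do not state anything the comments do
not support."""

def summarize(system_prompt: str, comments: str, model: str = "gpt-4") -> str:
    """Run one summarization call; compare outputs across prompts and models."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,  # low temperature favors faithful, repeatable output
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": comments},
        ],
    )
    return response.choices[0].message.content
```

Swapping the model argument (e.g., passing "gpt-3.5-turbo" instead of "gpt-4") is also the simplest way to observe the cost, speed and accuracy trade-offs described above on your own data.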
Evaluating the impact of different models and prompts
Given the potential for different outcomes based on model selection and prompting techniques, we conducted an experiment to assess whether different AI models and prompt engineering techniques would impact researchers’ acceptance of AI-generated results. Our participants were a blend of 23 corporate researchers and research suppliers, most of whom held neutral views regarding the role of AI within their field.
The experiment involved generating four distinct versions of executive summaries based on discussions from Fuel Cycle's research communities. These discussions varied widely in scale, with comment volumes ranging from dozens to thousands of responses. This variability was intentional, designed to test the robustness of AI analysis across different data sets.
Each participant was exposed to results from one discussion only, which had been analyzed in four unique ways. To ensure a fair comparison, we concentrated our efforts on a select group of LLMs and corresponding prompt engineering strategies. We included Anthropic's model and GPT-4, each with and without advanced prompt engineering techniques – namely, Tree of Thoughts and Chain of Density.3 Llama and GPT-3.5 had been eliminated beforehand in our internal evaluations due to underperformance.
For the Anthropic and GPT-4 models, we maintained consistent prompts to ensure any observed differences in performance were due to the models and techniques themselves, not the variability in the input. Our goal was to craft prompts that could coax the best possible outcomes from each model, thereby offering a clear comparison of their capabilities in analyzing and summarizing complex research discussions.
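We aren’t reproducing our exact experimental prompts here, but to show what an advanced prompt engineering technique looks like in practice, below is a hedged sketch of a Chain of Density-style summarization prompt, adapted from the published description of the technique; the wording and iteration counts are illustrative.

```python
# A Chain of Density-style summarization prompt: the model drafts a summary,
# then repeatedly rewrites it at the same length while folding in themes it
# missed, yielding a denser executive summary. The wording is adapted from
# the published technique and is illustrative only - it is not the exact
# prompt used in the experiment described above.
CHAIN_OF_DENSITY_PROMPT = """You will write increasingly dense summaries of
the discussion below.

Repeat the following two steps four times:
Step 1. Identify one to three informative themes from the discussion that
are missing from your previous summary.
Step 2. Write a new summary of identical length that covers everything in
the previous summary plus the missing themes.

A missing theme is relevant, specific and not yet in the summary. Never
drop a theme from an earlier pass; make room by fusing sentences and
cutting filler. Output only the final, densest summary.

Discussion:
{discussion}
"""
```

Tree of Thoughts works differently – it has the model propose several candidate lines of analysis, evaluate them and pursue the most promising – but the principle is the same: the prompt encodes a working procedure rather than a one-shot request.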
We found the following:
AI model selection and prompt development can influence research results.
- GPT-4 alone displayed strong performance; GPT-4 plus the prompt technique “Tree of Thoughts” improved results further.
- 21 of 23 respondents said they either “strongly agreed” or “agreed” that results generated using this approach were “clear,” “human-like” and “useful.”
- GPT-4’s performance and accuracy can be degraded by prompt techniques that aren’t designed with the outcome in mind.
Prompt refinement should be a key consideration in R&D efforts.
- Researchers and research providers should account for trade-offs in cost, speed and accuracy when developing generative AI solutions.
- Anthropic’s model and GPT-3.5 are cheaper and faster than GPT-4 but come with other performance trade-offs.
Analysis of significant amounts of data consistently took seconds.
- Processing as many as 3,000 discussion board comments and transforming them into an executive summary took under a minute, a process we estimate would take 24 hours of human work.
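For readers wondering how thousands of comments fit through a model at all, the common pattern is map-reduce summarization: batch the comments to fit the context window, summarize each batch, then summarize the summaries. The sketch below illustrates the pattern; the batch size, prompts and model choice are illustrative assumptions, not a description of Fuel Cycle’s pipeline.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BATCH_SIZE = 200  # illustrative; size batches to fit the model's context window

def ask(model: str, instruction: str, text: str) -> str:
    """Send one instruction plus payload to the model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

def executive_summary(comments: list[str], model: str = "gpt-4") -> str:
    # Map step: condense each batch of raw comments into a partial summary.
    batches = [comments[i:i + BATCH_SIZE] for i in range(0, len(comments), BATCH_SIZE)]
    partials = [
        ask(model, "Summarize the key themes in these discussion comments.", "\n".join(batch))
        for batch in batches
    ]
    # Reduce step: fuse the partial summaries into one executive summary.
    return ask(
        model,
        "Combine these partial summaries into a single executive summary "
        "covering key themes, disagreements and a recommended next step.",
        "\n\n".join(partials),
    )
```

With 3,000 comments this is 15 map calls plus one reduce call; running the map step concurrently means wall-clock time is dominated by the slowest single call, which is how sub-minute turnaround becomes plausible.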
This design aimed to shed light on the practical trade-offs between different AI tools and methodologies in a real-world research setting, ultimately guiding us toward more informed decisions about integrating AI into market research.
As AI models begin to play a pivotal role in the research process, the findings from our experiment underscore the importance of model selection and prompt engineering in producing useful results that improve the job to be done that insights are “hired” for.
Selecting the right AI model and mastering prompt engineering are not just operational tasks but strategic decisions that significantly affect the quality of insights generated. Researchers and organizations must weigh the trade-offs to ensure that AI not only drives efficiency but also fortifies confidence in the decisions made.
AI will become an indispensable tool for many insights practitioners but we need to implement it well. With careful implementation, the promise of AI to empower insights-led decision-making is within grasp, pointing to a future where data transforms seamlessly into strategic action.
1 Li et al., “Language Models for Automated Market Research: A New Way to Generate Perceptual Maps.” Latest revision: August 2023.
2 Eloundou, Manning, Mishkin and Rock, “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models.” March 21, 2023.
3 See promptengineering.ai for initial guidance on developing prompting techniques.