Promise or peril?

Editor's note: Raeann Bilow is marketing strategist at Cascade Insights. Sean Campbell is CEO of Cascade Insights. Jack Bowen is CEO of Coloop.

Market research is witnessing a significant transformation with the introduction of synthetic respondents, or "synths" for short. These artificially created profiles are designed to mimic the human interactions typically encountered in market research studies. Crafted from extensive datasets that may include public records, targeted research projects and other relevant data sources, synths offer the potential to provide a more nuanced understanding of market dynamics.

Synths allow companies to gain deeper insights into their specific inquiries by simulating real human interactions within research initiatives. This approach supplements and enhances traditional methods of gathering feedback, aiding businesses in refining their marketing strategies, product development processes and overall strategic plans. However, the emergence of synths also raises potential concerns and risks associated with their use.

Below, we examine the advantages and challenges of using synthetic respondents in market research, highlighting both the opportunities they present and the risks involved. By adopting a balanced approach, researchers can effectively navigate the complexities associated with synthetic respondents and leverage them for maximum benefit.

Rewards of using synthetic respondents in research

Incorporating synthetic respondents as an augmentation to market research projects offers several benefits, including:

Enhanced preparation and refinement

Synthetic respondents offer a range of benefits in the pre-research phase of market research projects. To start, they help researchers with question development and refinement by generating an array of potential questions based on their research objectives. They can also be used to analyze questions for clarity, potential bias and their likelihood of eliciting meaningful responses.

For example, let's imagine that you want to conduct research on cybersecurity professionals and you have created a synthetic model that emulates their personality and response style. You can ask this model about their most common pain points, concerns, motivators or how they might react to different topics or questions. Based on their responses, you can refine and adjust the most pertinent questions to ask real respondents. And by testing different questions with the model, you can identify which ones elicit the most meaningful responses and identify any that might be problematic, unclear or potentially biased.

Synthetic respondents can also help to sharpen the study's focus right from the start, ensuring it zeroes in on the most pertinent and influential topics and target personas. Synthetic respondents, when utilized properly, can stand in as model profiles that represent ideal target audiences, incorporating factors such as demographics, interests and behaviors.

Finally, synthetic respondents can assist in refining the research methodology. Through preliminary interactions with synthetics, a researcher can experiment with various research designs, sampling techniques and analytical methods. This experimentation helps to identify the approaches that are most likely to enhance the study's outcomes, ensuring the research methodology is both effective and efficient.

Cost-effectiveness

Using synthetic respondents in addition to traditional participants can be a cost-effective approach to market research. By replacing a portion of human respondents with synthetic respondents, companies can cut the expenses tied to recruiting and compensating human participants.

While there are initial costs involved in collecting the necessary data to develop synthetic respondents, this early investment can result in significant savings in the long run. By integrating insights from both synthetic and traditional respondents, businesses can achieve a balance, reducing research costs without compromising the quality and reliability of the data collected.

Speed and efficiency

Leveraging synthetic respondents allows for the rapid refinement of initial research concepts, facilitating the presentation of more refined ideas to human participants. This ensures that the concepts being tested are not only well-developed but also sharply focused. In essence, synthetics give researchers a laboratory of sorts to experiment within, safely and appropriately and in ways that enhance the entire research effort from start to finish.

Working in this laboratory of synths can streamline the research process, leading to faster completion of research projects and faster access to critical business insights.

Engaging presentation of findings

Synthetic respondents can transform the analysis and presentation of research findings into a more dynamic and engaging process, moving beyond the constraints of traditional methods such as slide decks or reports. 

For example, synthetic respondents can be generated at the conclusion of a research study, with their profiles based on the data and insights collected to date. Doing so allows stakeholders to “chat with the data” in a way that is becoming more familiar each day as organizations and individuals embrace technologies such as ChatGPT. Ultimately, by engaging with synths after a research effort is complete, stakeholders can better understand the nuances of the data and explore various outcomes based on questions that may arise weeks or months after a traditional readout.

Risks associated with using synthetic respondents in research

While using synthetic respondents in a market research study offers numerous benefits, it also comes with risks. Researchers need to be aware of:

Bias introduced by synths

Relying too heavily on synthetic respondents can compromise the integrity of traditional market research, risking biased results. Recent tests [1] revealed that synthetic respondents exhibit biases and lack the diversity and subtlety found in qualitative and quantitative analyses.

Therefore, it is crucial to corroborate synthetic findings with real human feedback and quantitative data. This cross-verification process is essential to enhance the accuracy of research outcomes and reduce bias, ensuring that the insights derived are both reliable and representative of the target population.
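
As one illustration of what this cross-verification might look like for a closed-ended survey question, the sketch below compares how answers are distributed among human and synthetic respondents. It is a minimal example, not a prescribed method: the answer data, the 0.15 review threshold and the function names are all hypothetical, and total variation distance is just one of several reasonable ways to measure the gap.

```python
from collections import Counter


def answer_shares(answers):
    """Return each answer option's share of all responses."""
    if not answers:
        raise ValueError("at least one answer is required")
    counts = Counter(answers)
    total = len(answers)
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(human, synthetic):
    """Half the summed absolute difference in answer shares.

    0.0 means the two groups answered identically in aggregate;
    1.0 means they chose entirely different options.
    """
    h = answer_shares(human)
    s = answer_shares(synthetic)
    options = set(h) | set(s)
    return 0.5 * sum(abs(h.get(o, 0.0) - s.get(o, 0.0)) for o in options)


# Hypothetical answers to the same question from both groups.
human_answers = ["agree"] * 40 + ["neutral"] * 35 + ["disagree"] * 25
synthetic_answers = ["agree"] * 70 + ["neutral"] * 25 + ["disagree"] * 5

# Assumed cutoff; a real study would set this from its own tolerance for error.
REVIEW_THRESHOLD = 0.15

gap = total_variation_distance(human_answers, synthetic_answers)
needs_review = gap > REVIEW_THRESHOLD
print(f"Distribution gap: {gap:.2f}, needs review: {needs_review}")
```

A large gap does not say which group is wrong, only that the synthetic answers should not be treated as a stand-in for the human ones on that question without further investigation.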

Diversity, equity and inclusion

AI models can exhibit biases due to the origins and composition of their training datasets, which are frequently not transparent. When these datasets fail to comprehensively represent a diverse array of demographics, cultural backgrounds and behaviors, the AI's outputs can be biased, leading to skewed outcomes.

An example of this issue is the use of the Common Crawl dataset for training large language models (LLMs). Common Crawl, a vast dataset collected from the internet, is a popular source for training AI due to its size and breadth. However, its composition reveals significant imbalances in language representation; for instance, English content makes up approximately 45% of the dataset, while Polish, among other languages, constitutes less than 2%. This disparity in language representation can result in AI models that are more adept at understanding and generating English content, potentially marginalizing non-English languages and the cultures associated with them.

Without deliberate efforts to include a broad and representative range of data, AI systems risk perpetuating existing biases and creating outcomes that do not fairly or accurately serve the global community.

Privacy and security concerns

When synthetic respondents are built on LLMs, there is a risk that private or NDA-protected data is accidentally included in public datasets. If that confidential information is inadvertently incorporated, it can lead to breaches of confidentiality, legal and financial repercussions and concerns about data integrity and security.

This is particularly concerning for businesses and individuals who entrust sensitive data to systems that utilize synthetic respondents. The unauthorized disclosure of private information can damage relationships, tarnish reputations and lead to a loss of trust in the entity responsible for the data breach.

To address privacy and security concerns, it's essential for organizations to implement stringent data governance and security measures. This includes conducting thorough data audits, anonymizing personal information and ensuring that the data used for training synthetic respondents is devoid of sensitive content. Moreover, it's critical that organizations creating synthetic users maintain ownership or at least control over the models produced. This control ensures that the synthetic respondents can be managed, updated or corrected in alignment with evolving data privacy standards and organizational needs, thereby safeguarding the integrity and confidentiality of the data involved.
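
To make the anonymization step concrete, here is a minimal sketch that masks e-mail addresses and phone numbers in interview text before it is used to build a synthetic respondent. The patterns and the function name are our own illustrative assumptions. Pattern matching of this kind only catches identifiers with a predictable shape; names, employers and NDA-protected details would still require a dedicated tool or human review.

```python
import re

# Illustrative patterns only; real projects would need broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_contact_details(text):
    """Replace e-mail addresses and phone numbers with placeholder tags."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


# Hypothetical transcript line.
raw = "Reach me at jane.doe@example.com or +1 (206) 555-0100."
print(redact_contact_details(raw))
```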

Predictive limitations

Synthetic respondents, by their nature, cannot experience the present moment as humans do, which may limit their ability to forecast future trends accurately. Unlike real human interviews, which can address emerging issues that might have happened as recently as today, synths may not be able to meaningfully comment in these scenarios.

This limitation is partly because synthetic respondents tend to be based on models that rely on historical data, which inherently cannot include the very latest developments. For example, as of the publication date of this article, systems similar to ChatGPT would not have access to information or trends that emerged after their training data was collected.

Jeff Bezos once emphasized the significance of anecdotal evidence over data, stating, “When the anecdotes and the data disagree, the anecdotes are usually right.” This perspective underscores the value of obtaining human experiences that are based in the present and close observations of these experiences in real time.

Moreover, most LLMs, including those capable of processing text and images, still cannot integrate a wide range of sensory inputs – such as audio, vision, touch and spatial awareness – into their training datasets and their understanding the way humans do. This highlights a fundamental gap between synthetic respondents and human experiences. While synthetic models can provide comprehensive analyses based on extensive datasets, they lack the depth of perception that comes from direct, multisensory engagement with the world.

Navigating the risks of using synthetic respondents in market research

Employing synthetic respondents in market research offers innovative opportunities for data collection and analysis. However, to effectively navigate the associated risks, researchers must adopt a comprehensive and cautious approach. Below are key strategies to mitigate these risks:

Combine research methods

Combining both synthetic respondents and traditional research methodologies is crucial for a balanced and comprehensive analysis. This hybrid approach allows researchers to leverage the efficiency and scalability of synthetic respondents while grounding their findings in the rich, nuanced insights that traditional research methods provide. By doing so, researchers can achieve a more accurate and holistic understanding of their subject matter, ensuring that the insights gleaned are both robust and reliable.

Assess outputs critically

Researchers must critically evaluate the outputs generated by synthetic respondents, especially in areas where bias is likely or data diversity may be insufficient. This involves a thorough examination of the assumptions underlying the synthetic models, as well as an assessment of how well the data represents the target population. By scrutinizing the results for potential biases and gaps, researchers can identify and address any distortions or oversights, ensuring that the conclusions drawn are valid and reflective of reality.

Ensure transparency and traceability

Maintaining transparency and traceability in the responses generated by synthetic respondents is essential for accountability. Researchers should ensure that each response can be traced back to its underlying data sources, allowing for a clear understanding of how conclusions were reached. This level of transparency not only bolsters the credibility of the research but also enables other researchers to replicate or challenge the findings, fostering a culture of openness and rigorous inquiry.
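
One way to picture this traceability is as a simple record that ties every synthetic answer to the sources it drew on. The sketch below is a hypothetical structure of our own, not a description of any particular vendor's system: each response carries a list of source IDs, and a lookup fails loudly if a response cites nothing or cites a source that is not in the project's catalog.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """A piece of underlying data, such as an interview or survey wave."""
    source_id: str
    description: str


@dataclass
class TracedResponse:
    """A synthetic answer plus the IDs of the sources it was derived from."""
    question: str
    answer: str
    source_ids: list[str] = field(default_factory=list)


def resolve_sources(response, catalog):
    """Return the source records behind a response, or raise if any are missing."""
    if not response.source_ids:
        raise ValueError("response cites no sources")
    missing = [sid for sid in response.source_ids if sid not in catalog]
    if missing:
        raise KeyError(f"untraceable source ids: {missing}")
    return [catalog[sid] for sid in response.source_ids]


# Hypothetical catalog and response.
catalog = {
    "int-007": SourceRecord("int-007", "Interview with a security architect"),
    "svy-2024": SourceRecord("svy-2024", "Survey of IT decision makers"),
}
response = TracedResponse(
    question="What is your biggest pain point?",
    answer="Alert fatigue across too many tools.",
    source_ids=["int-007", "svy-2024"],
)
for record in resolve_sources(response, catalog):
    print(record.source_id, "-", record.description)
```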

Respect the limits of synthetic users

It's important to acknowledge that synths are tools that enhance, not replace, traditional market research. They offer significant advantages in terms of efficiency and can handle large volumes of data with ease. However, they lack the depth of understanding and the ability to capture the full spectrum of human experiences and emotions that traditional methods, such as interviews and focus groups, can provide. Researchers should leverage synthetic respondents to complement and enrich their research efforts rather than viewing them as a standalone solution.

Consider the analogy of computational, or in silico, tools used in drug discovery. These tools, which simulate new molecules, have become an invaluable asset alongside traditional experimental methods. They streamline the drug discovery process by refining and narrowing down the hypotheses that need to be tested in actual trials. Similarly, synthetic respondents act as a preparatory tool in market research. They help ensure that researchers are asking the right questions and focusing their real-world studies efficiently, thereby complementing the traditional research process. Just as in silico models do not replace the need for real-world testing in drug development, synthetic respondents should be seen as a means to enhance the depth and relevance of market research findings.

Transforming market research

Synthetic respondents are rapidly transforming market research efforts. Researchers who skillfully embrace synths while ensuring their limitations are well understood are positioned to provide critical insights for their clients today and long into the future. When leveraged safely and effectively, synths can jumpstart our understanding. They should be seen not as a threat but as a meaningful complement to traditional research methods.

Reference
1 https://www.kantar.com/inspiration/analytics/what-is-synthetic-sample-and-is-it-all-its-cracked-up-to-be