Not all synthetic data is created equal

By Richard Preedy | April 22, 2025

Reading time: 4 minutes

Abstract

Marketing researchers around the world are examining the risks of using AI-generated synthetic data as a standalone replacement for traditional consumer research. While AI can offer faster insights and cost savings, not all AI methods are equally effective. The article argues that lumping all AI-driven tools under "synthetic data" is misleading and may cause unnecessary skepticism.

Research Topics: Artificial Intelligence / AI | Consumer Research | Data Quality
Industry/Market Focus: Consumers | Research Industry
Content Type: E-Newsletter Article


Anchoring AI-generated insights data in human truth

Editor’s note: Richard Preedy is head of AI at market research firm Verve, London.

A paper from the UK’s Market Research Society reflects on the ongoing debate around the application of AI-generated “synthetic data” for insight. The conclusions from the analysis – that synthetic data used alone, as a direct replacement for consumer research, can be unreliable – strongly align with the perspective I’ve been developing for the last couple of years.

AI is transforming the way researchers generate insights, but not all AI-driven approaches are created equal. Synthetic data alone isn’t enough. Furthermore, using synthetic data as a catch-all term for all AI-powered outputs can be unhelpful and, potentially, misleading. When used well, AI-generated data can deliver value. The benefits of AI as a solution to long-standing industry challenges have become increasingly clear: delivering faster real-time insights, overcoming limitations in access panel data quality, providing secure testing for confidential materials and cutting costs by reducing reliance on expensive participant-driven research.

In lumping all AI-driven approaches under the umbrella term “synthetic data,” we’re doing a disservice to the real potential of AI-empowered insight – and fueling unnecessary skepticism.

Instead, we should look to AI simulations rooted in high-quality, proprietary client data, enriched with curated contextual data sets, to ensure every insight is anchored in and validated by human truth.

Synthetic data: The good, the bad and the deeply misleading

At its core, synthetic data is any data generated by AI that is designed to mimic real-world data. But not all synthetic data is created equal.

Fully synthetic data sets: Created primarily by algorithms with little direct connection to real people.
Partially synthetic data: Human data with some gaps filled in by AI, used to improve accuracy or overcome data set limitations.
Augmented data: AI uses real-world inputs as a foundation to expand on or extrapolate from.

These methods can vary greatly in their reliability.

Fully or partially synthetic approaches – and any poorly designed augmented approaches – face challenges that can create trust and data integrity issues if not addressed:

With a lack of human anchoring, synthetic data insights risk being disconnected from how real people actually think, feel and behave.
Oversimplification can be a challenge as AI can generalize too much, resulting in data that lacks context and smooths out nuances and niche trends that represent the messy reality of human behavior.
Bias reinforcement can occur when the data the AI is trained on is limited, flawed or unrepresentative. You risk an output that exaggerates or overfits existing patterns. Garbage in, garbage out.
Black box syndrome arises when synthetic data solutions lack transparency, as many do. If clients don’t know how insights were generated, how can they trust them?
With a closed loop, AI-generated data feeds back into itself over time. Without fresh, human inputs, it stagnates, distorts and declines in accuracy.

A lack of transparency and quality makes it difficult for clients to trace the insights back to real consumers and customers. This can be fine when fast, directional (quick and dirty) guidance is needed, but not for making decisions of consequence.

AI, human intelligence and cultural insight

Researchers can deliver AI-empowered insight without relying on synthetic data.

To do this, researchers must focus on the interplay of AI with human intelligence and cultural insight – using augmented approaches and building transparent AI simulations using real, high-quality data from real people to ensure insight is credible, actionable and reflective of the real world.

Everything marketing researchers do must be anchored in human truth and verified back to humans.

AI plus human truth equals real insights. Train AI on proprietary, high-quality data sets, ensuring the insights are grounded in real behavior, not statistical guesswork.
Context matters. Consumer behavior is deeply tied to cultural, economic and psychological factors. That’s why researchers must integrate carefully curated contextual data sets, layering in social trends and cultural artifacts to reflect real-world trends. AI alone can’t do that.
Full data transparency. Researchers must know what goes into their insights. Make sure any vendor you work with is open about their methods.
Human expertise. AI insights don’t run on autopilot, so be sure to overlay human expertise at every step. Set the benchmark for quality using proven frameworks of real-world testing, validation and human empathy to extract meaning.

Combining the power of AI with deep human psychology, cultural analysis and real-world context helps researchers uncover real insight.

Don’t settle for artificial insights

The best AI-powered approaches are built on real-world data, cultural intelligence and human truth – verifiable, actionable and reflective of the real world.

So, let’s stop talking only about synthetic data. When used correctly, AI has the power to transform insights more broadly.