Faster, cheaper — and provably defensible.
The headline number: 100% of drivers significant at p<0.01 in human panel data were reproduced in the synthetic output. In multivariate analysis at the standard p<0.05 threshold, the pipeline reproduced 65–75% of significant drivers.
In plain decision terms: any research conclusion that would have been confident on human panel data stayed confident on synthetic data. Borderline multivariate calls were recovered most of the time, but not always. That is enough fidelity for the research questions we scoped around, and not enough for the ones we didn't. The product never promised more than that.
For those questions, that level of fidelity was enough to replace panel recruitment on cost, speed, and defensibility — without losing the signal the research was paid to find.
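The replication figures above can be read as a simple ratio. Of the drivers that clear a significance threshold on the human panel, what share also clear it on the synthetic output? The sketch below assumes that definition: the same model on both datasets, the same threshold on both, and p-values already estimated elsewhere. The `DriverResult` type, the function name, and the driver names and p-values are all hypothetical, not project data. A stricter version would also require the effect to point in the same direction on both datasets.

```python
from dataclasses import dataclass


@dataclass
class DriverResult:
    """p-values for one candidate driver, from the same model fitted to each dataset."""
    name: str
    p_human: float      # p-value on the human panel data
    p_synthetic: float  # p-value on the synthetic output


def replication_rate(drivers: list[DriverResult], alpha: float) -> float:
    """Share of drivers significant on human data at `alpha` that are also
    significant on synthetic data at the same `alpha`."""
    human_significant = [d for d in drivers if d.p_human < alpha]
    if not human_significant:
        raise ValueError("no driver is significant on the human data at this alpha")
    replicated = [d for d in human_significant if d.p_synthetic < alpha]
    return len(replicated) / len(human_significant)


# Hypothetical drivers and p-values, for illustration only.
drivers = [
    DriverResult("price_perception", p_human=0.001, p_synthetic=0.004),
    DriverResult("brand_trust", p_human=0.003, p_synthetic=0.008),
    DriverResult("ad_recall", p_human=0.030, p_synthetic=0.120),
    DriverResult("pack_design", p_human=0.040, p_synthetic=0.020),
]

print(replication_rate(drivers, alpha=0.01))  # 1.0
print(replication_rate(drivers, alpha=0.05))  # 0.75
```

In this made-up example, both strong drivers survive at p<0.01. At p<0.05 one borderline driver (`ad_recall`) is lost, which is the kind of gap the 65–75% figure describes.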
Where the pipeline gets it wrong.
The drop from 100% (strong-significance drivers, p<0.01) to 65–75% (multivariate drivers, p<0.05) is a real loss of fidelity, and two failure modes account for most of it:
- Variance compression. Synthetic responses cluster more tightly around the mean than real respondents do. The centre of the distribution is faithful; the spread is softer.
- Range restriction. The tails of the response distribution (extreme attitudes, edge demographics) are under-represented in synthetic output compared with real panel data. The pipeline reproduces the typical response well; the atypical response, less so.
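Both failure modes can be checked with simple summary statistics before any driver analysis runs. A minimal sketch, assuming human and synthetic responses to the same question on a 1–7 scale: a standard-deviation ratio below 1.0 flags variance compression, and a lower share of responses at the scale endpoints flags range restriction. The function names, the endpoint definition of a "tail", and the sample responses are illustrative assumptions, not the pipeline's actual diagnostics.

```python
import statistics


def spread_ratio(synthetic: list[float], human: list[float]) -> float:
    """Sample standard deviation of synthetic responses divided by that of human ones.

    Values below 1.0 suggest variance compression."""
    return statistics.stdev(synthetic) / statistics.stdev(human)


def tail_share(responses: list[float], low: float, high: float) -> float:
    """Fraction of responses sitting at or beyond either end of the scale."""
    extreme = [r for r in responses if r <= low or r >= high]
    return len(extreme) / len(responses)


# Hypothetical answers to one 7-point question (not project data).
human = [1, 2, 4, 4, 5, 6, 7, 7, 3, 1]
synthetic = [3, 4, 4, 4, 5, 5, 4, 5, 3, 4]

print(statistics.mean(human), statistics.mean(synthetic))  # centres are close
print(round(spread_ratio(synthetic, human), 2))  # 0.33
print(tail_share(human, 1, 7), tail_share(synthetic, 1, 7))  # 0.4 0.0
```

In this made-up example the means are close (4.0 vs 4.1) while the spread and the tail share diverge sharply. That is the pattern described above: a faithful centre with a softer spread and thinner tails.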
For research questions where the centre is the signal — driver identification, brand-attribute strength, segment positioning — those failure modes were tolerable. Strong drivers stayed strong; the borderline multivariate calls that got lost were the ones a researcher would have hedged on anyway. For research questions where the tail is the signal — extreme-attitude tracking, low-incidence segments, edge demographics — synthetic isn't a substitute. Naming what doesn't work has been part of how we earned trust on what does.
The cost shape we're still tuning.
There's a real cost ceiling worth naming. Brand-positioning research (measuring how a brand sits relative to the rest of the market) requires generating respondents across many brands and their cross-brand interactions simultaneously, with full reference-panel validation for each. Doing that rigorously pushes the per-batch cost substantially above that of single-brand work. We're continually experimenting with ways to optimise it: smarter sampling strategies, cheap validation passes that gate the expensive ones, and batch reuse across related questions. Like the pipeline itself, the cost shape is something we tune, not something we've solved.
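The value of gating can be sketched with expected-cost arithmetic. The model below is hypothetical. It assumes validation work scales as one unit per brand plus one per brand pair, that every unit gets a cheap screen, and that only units passing the screen receive the full reference-panel check. Costs and the pass rate are placeholder numbers, and the cost of regenerating rejected units is ignored.

```python
from math import comb


def validation_units(n_brands: int) -> int:
    """Hypothetical workload: one unit per brand plus one per brand pair."""
    return n_brands + comb(n_brands, 2)


def expected_batch_cost(units: int, cheap_cost: float, full_cost: float,
                        pass_rate: float, gated: bool) -> float:
    """Expected validation cost for one batch.

    Ungated: every unit gets the full reference-panel check.
    Gated: every unit gets the cheap screen; only the `pass_rate` share that
    passes goes on to the full check.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if gated:
        return units * (cheap_cost + pass_rate * full_cost)
    return units * full_cost


# Placeholder costs and pass rate (not project figures).
for n in (1, 8):
    units = validation_units(n)
    ungated_cost = expected_batch_cost(units, 1.0, 20.0, 0.5, gated=False)
    gated_cost = expected_batch_cost(units, 1.0, 20.0, 0.5, gated=True)
    print(n, units, ungated_cost, gated_cost)
# 1 1 20.0 11.0
# 8 36 720.0 396.0
```

Under these assumptions gating pays off whenever `cheap_cost < (1 - pass_rate) * full_cost`, that is, when the screen costs less than the full validation it avoids on failing units. The pairwise term is what makes multi-brand batches grow faster than single-brand ones.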
That quantitative case is what made the commercial story real. I led the technical side of an enterprise commercial engagement and supported commercialisation alongside the BrandComms founder, who carried the buyer relationship; my part was translating the methodology into cost, turnaround, and pricing implications. I also led a client-facing brief that positioned the approach against an 8-supplier competitive landscape.
The product never needed to claim it replaced real humans for every research question. It needed to credibly replace panels for the specific research questions where we could demonstrate fidelity. Narrow claim, measured evidence, commercial traction.
100%
Drivers replicated at p<0.01
65–75%
Drivers replicated at p<0.05 (multivariate)
Enterprise
Technical lead on the engagement