Synthetic data and marketing
The map is not the territory; it’s an abstraction of reality that can help you “see” over the hill without moving. A drawing of a tree on paper describes a tree, but the drawing itself is not a tree.
Synthetic data works the same way: it is data as an abstraction of source data. Just as an image of a tree is easier to carry than the tree itself, synthetic data offers benefits that marketers can take advantage of without the friction that accompanies first-party data.
How is it made?
Synthetic data differs from anonymized or de-identified data because it is a creation. Drawing a tree does not involve cutting one down and making it smaller and flatter; likewise, synthetic data is not made by altering the source records but through a process of observation followed by creation. The original data sources are observed, their patterns and details are noted, and those observations are used to create the abstraction.
The first question to ask is, “What geography are we creating?” The answer defines the number of synthetic individuals to be created. Each synthetic individual must then be described with attributes, using the source data to inform those decisions. Observations such as ‘60 per cent of Canadians tend to drive SUVs and live in certain areas’ influence the creative decisions. Through generative artificial intelligence, these percentages are “downscaled” to individuals, transforming a representative sample of 20,000 people into 38 million synthetic Canadians allocated across 825,000 postal codes. Postal codes can then be aggregated into towns, cities, political ridings, Census Metropolitan Areas, and provinces, summing to the whole country.
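To make the observe-then-create steps concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the generative AI model described above; it swaps in simple proportional allocation instead. An attribute share is observed per region in a survey sample. Each postal code then gets one synthetic person per resident, with the attribute assigned to a rounded share of them, and postal codes are rolled up to larger geographies. The sample records, postal codes, regions, and the helper names observe_shares, downscale, and aggregate are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical survey sample of (region, drives_suv) records. The article's
# source is a representative sample of about 20,000 people; these are made up.
SAMPLE = [
    ("East", True), ("East", True), ("East", True), ("East", False),
    ("West", True), ("West", False), ("West", False), ("West", False),
]

# Hypothetical geography: postal code -> region, city, province, residents.
POSTAL_CODES = {
    "PC1": {"region": "East", "city": "City A", "province": "P1", "population": 4},
    "PC2": {"region": "East", "city": "City A", "province": "P1", "population": 8},
    "PC3": {"region": "West", "city": "City B", "province": "P2", "population": 8},
}


def observe_shares(sample):
    """Observation step: share of sampled people with the attribute, per region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [with attribute, total]
    for region, drives_suv in sample:
        counts[region][0] += drives_suv
        counts[region][1] += 1
    return {region: hits / total for region, (hits, total) in counts.items()}


def downscale(postal_codes, shares, seed=0):
    """Creation step: one synthetic person per resident; in each postal code the
    number with the attribute is the region's observed share, rounded."""
    rng = random.Random(seed)
    people = []
    for code, info in postal_codes.items():
        n = info["population"]
        n_suv = round(n * shares[info["region"]])
        flags = [True] * n_suv + [False] * (n - n_suv)
        rng.shuffle(flags)  # which individuals get the attribute is arbitrary
        for i, drives_suv in enumerate(flags):
            people.append({"id": f"{code}-{i}", "postal_code": code,
                           "drives_suv": drives_suv})
    return people


def aggregate(people, postal_codes, level):
    """Roll synthetic people up from postal codes to 'city' or 'province'."""
    totals = defaultdict(lambda: {"population": 0, "suv_drivers": 0})
    for person in people:
        key = postal_codes[person["postal_code"]][level]
        totals[key]["population"] += 1
        totals[key]["suv_drivers"] += person["drives_suv"]
    return dict(totals)


shares = observe_shares(SAMPLE)
people = downscale(POSTAL_CODES, shares)
print(aggregate(people, POSTAL_CODES, "province"))
# {'P1': {'population': 12, 'suv_drivers': 9}, 'P2': {'population': 8, 'suv_drivers': 2}}
```

This sketch handles a single attribute and rounds independently in each postal code. A fuller approach would likely model many attributes jointly, so that combinations such as SUV ownership and place of residence stay realistic.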
This new format has major benefits. The first is that this data is a creation. Consider Don Draper, the fictional advertising executive from Mad Men: would anyone ask whether privacy law applies to him? Don Draper and his colleagues are vivid representations of people in a certain time and place, but wondering how to obtain and manage their consent is absurd. Since privacy law does not apply to a synthetic national population, this data is easy to share between companies, it can serve as a universal reference point, and it has nearly perfect match rates with other data. For these reasons, synthetic data is also well suited to medical research, where collaboration and transparency are prized but privacy and data breach concerns are high.
Currently, many in the marketing community are focused on first-party data and the technology that supports its storage, management, and use, and of course consent and security. Now that marketers have access to synthetic data, the tech stack needs a rethink. Privacy legislation asks whether there are less intrusive means of achieving a given purpose at a comparable cost and with comparable benefit. That means reviewing established data practices and weighing, for each appropriate purpose, the features and protections of synthetic data against those of first-party data. Simply put, does a marketer need first-party data to target their customers? If the answer is no, what needs to change in the martech stack? If brand growth is the priority, shouldn’t marketers focus on prospects, an area where first-party data is scarce compared with the synthetic data available? And what about marketers with very little ability to collect first-party data at all? Synthetic data lets them see their customers in unprecedented detail. Anything that can be done in synthetic formats should be done there: the utility is higher, while the privacy and technology overhead is lower.
Some consider first-party data collection part of the “moat” that protects and enhances their company’s value. However, marketers need to recognize that synthetic data can create clones of first-party data without access to that first-party data. Accuracy, timeliness, and depth vary with the sources used, but for some uses, such as competitive intelligence or the creation of retail media audiences, the result can pass the “good enough” test.
The ability to create a national population is a boon to local Canadian media, because smaller regional markets are where synthetic data adds the most resolution. Additionally, synthetic data creates a common, actionable target across both digital and analogue media, making it ideal for cross-media planning, execution, and evaluation. Internationally, development is under way on using synthetic individuals and populations to understand cross-media reach and frequency, with extensions into building planning models. A shared synthetic population of the country gives marketers, agencies, and media vendors a common truth set to work with.
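Because every medium is measured against the same synthetic individuals, cross-media reach can be deduplicated simply by counting people rather than adding up per-medium audiences. The Python sketch below assumes each synthetic person already carries impression counts per medium. The SyntheticPerson class, the reach_and_frequency function, and the four-person population are hypothetical illustrations, not an industry measurement standard.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticPerson:
    """A hypothetical synthetic individual with impressions seen per medium."""
    person_id: str
    exposures: dict[str, int] = field(default_factory=dict)


def reach_and_frequency(population, media=None):
    """Return (reach, average frequency) across the given media (all if None).

    Reach is the share of synthetic people exposed at least once; frequency is
    impressions per reached person. Each person is counted once, so someone
    exposed on both TV and digital is not double counted.
    """
    reached = 0
    impressions = 0
    for person in population:
        seen = sum(count for medium, count in person.exposures.items()
                   if media is None or medium in media)
        if seen > 0:
            reached += 1
            impressions += seen
    reach = reached / len(population) if population else 0.0
    frequency = impressions / reached if reached else 0.0
    return reach, frequency


# Hypothetical campaign exposures for a four-person synthetic population.
POPULATION = [
    SyntheticPerson("p1", {"digital": 2, "tv": 1}),
    SyntheticPerson("p2", {"tv": 2}),
    SyntheticPerson("p3", {"radio": 1}),
    SyntheticPerson("p4"),
]

print(reach_and_frequency(POPULATION))          # (0.75, 2.0)
print(reach_and_frequency(POPULATION, {"tv"}))  # (0.5, 1.5)
```

Summing the single-medium reaches in this toy population (0.25 digital, 0.5 TV, 0.25 radio) gives 1.0, while the deduplicated cross-media reach is 0.75. That gap is the double counting a shared synthetic population is meant to avoid.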
There seems to be a quest to collect all the data from everybody and to manage consent across a myriad of technologies, partners, and jurisdictions. The sheer scale of this undertaking ensures it cannot succeed, so other marketing data ecosystems must be developed. Synthetic data is part of the foundation of this new arrangement, offering marketers new uses for data and new views of their business.