Proving Incrementality in Retail Media
Retail media networks promise billions in ad revenue, but without proof of incrementality, budgets remain stuck in legacy channels. This article breaks down expert-backed methods that leading retailers use to demonstrate real sales lift and justify media spend. Learn how holdout tests, synthetic control models, customer journey analysis, and other causal designs separate true performance from correlation.
Force Holdouts to Measure True Lift
The only way we've found to get real incrementality on retail media is to stop trusting platform-reported last click and force some kind of control into the system. For Amazon, Walmart, and Instacart, we lean on geo splits or time-based holdouts where a portion of demand is intentionally left unexposed, even if it makes people nervous. One test that gave us confidence was pausing retail ads in a handful of matched regions while keeping everything else constant, then watching what happened to total sales, not just attributed sales. What surprised people was that some "top-performing" campaigns barely moved the needle once you removed last-click bias. The big lesson was that incrementality shows up in lift against a baseline, not in a dashboard that's paid to take credit. If you're not willing to let a slice of demand go dark temporarily, you're guessing, not measuring.
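For teams that want to see the mechanics, here is a minimal sketch of the matched-region read described above, assuming a daily table of total sales (not attributed sales) by region; the file name, column names, region IDs, and dates are hypothetical, and a real test would add matching diagnostics and significance checks.

```python
import pandas as pd

# Minimal matched-region read. Assumes a daily table of TOTAL sales (not
# attributed sales) with columns: date, region, sales. Ads were paused in the
# "dark" regions during the holdout window; everything else stayed constant.
df = pd.read_csv("daily_total_sales.csv", parse_dates=["date"])

pre = df[df["date"] < "2024-03-01"]       # calibration weeks, ads on everywhere
holdout = df[df["date"] >= "2024-03-01"]  # holdout weeks, ads paused in dark regions

dark = {"R04", "R11", "R17"}              # hypothetical matched regions left unexposed

def total_sales(frame, in_dark):
    mask = frame["region"].isin(dark)
    return frame.loc[mask if in_dark else ~mask, "sales"].sum()

# Scale exposed-region holdout sales by the pre-period ratio to estimate what
# the dark regions would have sold had ads kept running.
ratio = total_sales(pre, True) / total_sales(pre, False)
expected_dark = total_sales(holdout, False) * ratio
actual_dark = total_sales(holdout, True)

# A drop below baseline means pausing ads cost sales, i.e. the ads were incremental.
print(f"Dark regions vs. expected baseline: {(actual_dark / expected_dark - 1):+.1%}")
```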

Build Synthetic Twins for SKU Counterfactuals
Incrementality across Amazon, Walmart, and Instacart works best when measured at the SKU x retailer level using a Synthetic Control approach. Instead of leaning on last-click logic, this method builds a counterfactual for each advertised SKU within each retail network, using a weighted blend of similar SKUs that were not advertised during the same window. Those controls come from the same retailer, same category, similar price bands, seasonality, and historical velocity. Because the comparison lives inside each retail ecosystem, the read reflects true demand lift rather than media proximity to conversion, and it stays insulated from regional quirks that often distort geo tests.
This structure avoids geo bias because it never relies on ZIP codes or regional holdouts where retailer coverage, delivery speed, or store density varies. Every SKU competes against its own synthetic twin drawn from national sales patterns inside the same platform. That makes Amazon results comparable to Walmart or Instacart without forcing artificial geographic splits that never behave the same across networks.
The outcome is a clean answer to one question advertisers care about: what sales would have happened anyway for this exact product at this exact retailer.
One example from our agency involved a packaged food brand running always-on sponsored placements across Amazon and Walmart. Last-click reports suggested strong performance on both, yet the synthetic control showed a different story. Amazon delivered a 14 percent incremental lift at the SKU level, while Walmart landed closer to 4 percent, with most volume reflecting baseline demand. Budget allocation changed the following quarter, favoring Amazon for conquesting and Walmart for coverage, and total incremental revenue rose without increasing spend.
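As a rough illustration of how a synthetic twin can be assembled and read, here is a minimal sketch, assuming weekly sales for one advertised SKU and a pool of non-advertised donor SKUs from the same retailer and category; the file and column names are hypothetical, and it uses a simple non-negative least squares weight fit rather than a full synthetic control implementation.

```python
import pandas as pd
from scipy.optimize import nnls

# Weekly unit sales for one retailer and category: one column per SKU, indexed
# by week. "SKU_AD" was advertised; the remaining SKUs were not.
sales = pd.read_csv("weekly_sku_sales_amazon.csv", index_col="week")

pre = sales.loc[:"2024-W08"]     # weeks before the campaign
post = sales.loc["2024-W09":]    # campaign weeks

donors = [c for c in sales.columns if c != "SKU_AD"]

# Fit non-negative weights so the donor blend tracks the advertised SKU in the
# pre-period, then normalise the weights to sum to one.
weights, _ = nnls(pre[donors].to_numpy(), pre["SKU_AD"].to_numpy())
weights = weights / weights.sum()

# The synthetic twin is the weighted donor blend projected into campaign weeks.
synthetic = post[donors].to_numpy() @ weights
actual = post["SKU_AD"].to_numpy()

lift = actual.sum() / synthetic.sum() - 1
print(f"Incremental lift vs. synthetic twin: {lift:+.1%}")
```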

Trace Journeys and Value New Customers
In retail media, a big challenge is understanding whether ads actually create new sales, or whether they just take credit for sales that would have happened anyway. Many strategies still rely on "last click" reporting, which gives all the credit to the final ad someone clicks before buying. This often makes results look better than they truly are, especially for brand search ads. To avoid this, we focus on incrementality: would this sale still have happened if we had not shown the ad? We usually use two strategies for this:
1. Full Customer Journey
Instead of only looking at the final click, we analyze the full customer journey. Using Amazon Marketing Cloud (AMC), we can connect:
ad impressions (who saw which ads)
and purchase data (who actually bought)
This allows us to understand what happened before the purchase, not just the final interaction. We then compare two groups of shoppers:
Group A: Shoppers who first saw an upper-funnel ad (such as Sponsored Display or Streaming TV), later searched for the brand and then bought the product
Group B: Shoppers who only searched for the brand (and then bought) but did not see those earlier ads
If Group A converts at a higher rate than Group B, it indicates that the earlier ad helped create demand. This approach gives us a much more realistic view of performance than standard ROAS metrics.
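As a rough sketch of that Group A versus Group B comparison, assuming a shopper-level export from AMC with one row per shopper and illustrative column names (upper_funnel_impression, branded_search, purchased):

```python
import pandas as pd

# Sketch of the Group A vs. Group B comparison on a shopper-level export from
# Amazon Marketing Cloud. Columns are illustrative: one row per shopper with
# flags for upper-funnel exposure, branded search, and purchase.
journeys = pd.read_csv("amc_user_journeys.csv")

searched = journeys[journeys["branded_search"] == 1]
group_a = searched[searched["upper_funnel_impression"] == 1]  # saw Sponsored Display / Streaming TV first
group_b = searched[searched["upper_funnel_impression"] == 0]  # branded search only

rate_a = group_a["purchased"].mean()
rate_b = group_b["purchased"].mean()

print(f"Group A (exposed, then searched): {rate_a:.1%} conversion")
print(f"Group B (searched only):          {rate_b:.1%} conversion")
print(f"Relative lift from upper-funnel exposure: {rate_a / rate_b - 1:+.1%}")
```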
2. Using "New-to-Brand" as a Practical Signal
Both Amazon and Walmart provide New-to-Brand (NTB) metrics, which show whether a customer is buying from a brand for the first time.
While NTB is not a perfect measure of incrementality, it is a very useful indicator.
For example, a campaign may show strong ROAS, but if 90% of sales come from existing customers, the campaign is likely low-incremental. Campaigns that drive a healthy share of new customers are generally much more incremental and valuable for long-term growth.
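A quick way to operationalize this screen, assuming a campaign-level export with illustrative column names for spend, attributed sales, and NTB sales:

```python
import pandas as pd

# Quick screen that pairs ROAS with New-to-Brand share. Column names are
# illustrative for a campaign-level export from Amazon or Walmart reporting.
campaigns = pd.read_csv("campaign_report.csv")

campaigns["roas"] = campaigns["attributed_sales"] / campaigns["spend"]
campaigns["ntb_share"] = campaigns["ntb_sales"] / campaigns["attributed_sales"]

# High ROAS but almost no new customers: likely taking credit for baseline demand.
flagged = campaigns[(campaigns["roas"] > 4) & (campaigns["ntb_share"] < 0.10)]
print(flagged[["campaign", "roas", "ntb_share"]])
```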
Leverage Auction Shocks as Instruments
An instrumental-variables design can use auction bid shocks as a source of clean variation in ad delivery. Sudden moves in rival bids or pacing change your impressions for reasons unrelated to your own demand. A first-stage check confirms that the shock truly shifts exposure.
With strong controls and pre-trend checks, the exclusion assumption becomes more believable. A two-stage IV regression then turns that shift into a causal sales lift with credible error bars. Set up an IV plan that uses auction shocks and test it on recent campaigns now.
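A minimal two-stage least squares sketch, assuming a campaign-week panel with hypothetical columns for sales, delivered impressions, a rival bid shock instrument, and demand controls; a production analysis would use a packaged IV estimator for correct standard errors and a formal weak-instrument test.

```python
import pandas as pd
import statsmodels.api as sm

# Minimal two-stage least squares sketch. Assumes a campaign-week panel with
# hypothetical columns: sales, impressions, rival_bid_shock (the instrument),
# and demand controls such as price, promo flag, and seasonality.
df = pd.read_csv("campaign_weeks.csv")

controls = sm.add_constant(df[["price", "promo_flag", "week_of_year"]])

# First stage: does the auction shock actually move delivered impressions?
first = sm.OLS(df["impressions"], controls.join(df["rival_bid_shock"])).fit()
print(first.summary().tables[1])          # check instrument strength before trusting the IV

# Second stage: regress sales on the predicted (shock-driven) impressions.
df["impressions_hat"] = first.fittedvalues
second = sm.OLS(df["sales"], controls.join(df["impressions_hat"])).fit()

# Note: manual second-stage standard errors are not the correct 2SLS errors;
# a packaged IV estimator adjusts them properly.
print(second.params["impressions_hat"], "incremental sales per impression")
```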
Apply Bayesian Mix Approach for Impact
Bayesian marketing mix models can separate baseline sales from sales caused by retail media. The model can include season, price, promos, stock, and ad effects with carryover and fading. Prior beliefs keep the numbers stable when data are thin and give clear uncertainty ranges.
Shared layers across stores or brands help learn faster while keeping key differences. Holdout weeks and simple model checks show if the lift is real. Build a Bayesian mix model with these parts and validate it on holdout weeks now.
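A minimal sketch of such a model, assuming weekly brand-level data with hypothetical column names and using PyMC; the adstock decay is fixed here for brevity, whereas a fuller model would estimate carryover and saturation directly.

```python
import numpy as np
import pandas as pd
import pymc as pm

# Minimal Bayesian mix sketch on weekly data. Column names are illustrative:
# sales, retail_media_spend, price_index, promo_flag.
df = pd.read_csv("weekly_brand_data.csv")

def adstock(x, decay=0.5):
    # Geometric carryover: this week's effect plus a decayed tail of past weeks.
    out = np.zeros(len(x))
    for t in range(len(x)):
        out[t] = x[t] + (decay * out[t - 1] if t > 0 else 0.0)
    return out

media = adstock(df["retail_media_spend"].to_numpy())
media = media / media.max()                    # scale so priors sit on a sane range
sales_sd = float(df["sales"].std())

with pm.Model() as mmm:
    baseline = pm.Normal("baseline", mu=float(df["sales"].mean()), sigma=sales_sd)
    beta_media = pm.HalfNormal("beta_media", sigma=sales_sd)   # ad effect kept non-negative
    beta_price = pm.Normal("beta_price", mu=0.0, sigma=sales_sd)
    beta_promo = pm.Normal("beta_promo", mu=0.0, sigma=sales_sd)
    noise = pm.HalfNormal("noise", sigma=sales_sd)

    mu = (baseline
          + beta_media * media
          + beta_price * df["price_index"].to_numpy()
          + beta_promo * df["promo_flag"].to_numpy())
    pm.Normal("obs", mu=mu, sigma=noise, observed=df["sales"].to_numpy())

    trace = pm.sample(1000, tune=1000, target_accept=0.9)

# The posterior for beta_media gives the incremental contribution of retail
# media, with uncertainty, separated from baseline, price, and promo effects.
print(pm.summary(trace, var_names=["baseline", "beta_media"]))
```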
Exploit Budget Cutoffs for Local Effects
Budget rules and category caps often create hard cutoffs that decide who gets extra spend. A regression discontinuity design compares outcomes for units just above and just below the cutoff to find local lift. Smooth covariates and no bunching near the cutoff support the design.
Using a narrow window and simple curves makes the estimate stable. Linking exposure near the cutoff to sales gives a clean incremental number with clear error ranges. Map your cutoffs and run an RD study around them this quarter.
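A minimal sketch of the local comparison, assuming a unit-level table with a hypothetical running variable measuring distance to the cutoff and a sales outcome; bandwidth selection and bunching tests are left out for brevity.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Minimal regression discontinuity sketch. Assumes a unit-level table with a
# hypothetical running variable "score" (distance to the budget cap cutoff at
# zero, positive = received the extra spend) and an outcome column "sales".
df = pd.read_csv("budget_cutoff_units.csv")

bandwidth = 10.0                                    # narrow window around the cutoff
local = df[df["score"].abs() <= bandwidth].copy()
local["treated"] = (local["score"] >= 0).astype(int)

# Local linear fit with separate slopes on each side; the coefficient on
# "treated" is the jump in sales at the threshold, i.e. the local lift.
rd = smf.ols("sales ~ treated + score + treated:score", data=local).fit(cov_type="HC1")
print(f"Local lift at the cutoff: {rd.params['treated']:.1f} "
      f"(std err {rd.bse['treated']:.1f})")
```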
Use Natural Experiments from Platform Changes
Natural experiments happen when rules or platforms change in ways advertisers do not control. A new privacy rule, a fee change, or a ranking update can shift exposure in some places but not others. A before and after comparison with a matched control group can turn that split into a causal sales effect.
Timelines of effects and a weighted control group help test trends and isolate the treated unit. Checks for spillovers across stores and channels protect against bias. Track coming changes and pre-register a natural experiment plan now.
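A minimal difference-in-differences sketch for such a change, assuming a store-week panel with hypothetical columns and a known effective date; a weighted or synthetic control group and event-study plots would strengthen the read.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Minimal difference-in-differences sketch around a platform change. Assumes a
# store-week panel with hypothetical columns: sales, store, week (ISO week
# strings), and a "treated" flag for stores affected by the change.
panel = pd.read_csv("store_week_panel.csv")

panel["post"] = (panel["week"] >= "2024-W20").astype(int)   # change took effect here
panel["did"] = panel["treated"] * panel["post"]

# Store and week fixed effects absorb level differences and common trends;
# the coefficient on "did" is the causal sales effect of the change.
model = smf.ols("sales ~ did + C(store) + C(week)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["store"]}
)
print(f"Effect of the change on treated stores: {model.params['did']:.1f} "
      f"(std err {model.bse['did']:.1f})")
```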
Target High-Uplift Audiences via Causal Forests
Uplift modeling predicts which shoppers change behavior when they see retail media. Labels can come from true tests or from careful matching when tests are not possible. Simple two-model methods and newer causal forests can estimate a treatment effect for each shopper.
Uplift and Qini curves then check that the model ranks the right people. Turning spend toward high-uplift groups makes the added sales clear. Launch an uplift program that pairs small tests with targeted rollout today.
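A minimal two-model (T-learner) sketch, assuming shopper-level data from a past test with hypothetical feature and flag columns; Qini validation on a holdout split is noted in the comments but not implemented here.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Minimal two-model (T-learner) uplift sketch. Assumes shopper-level data from
# a past test with hypothetical columns: feature_* (shopper attributes),
# "exposed" (saw retail media) and "converted" (bought within the window).
df = pd.read_csv("shopper_test_data.csv")
features = [c for c in df.columns if c.startswith("feature_")]

treated = df[df["exposed"] == 1]
control = df[df["exposed"] == 0]

# One response model per arm; per-shopper uplift is the gap between the
# predicted conversion probability with and without exposure.
model_t = GradientBoostingClassifier().fit(treated[features], treated["converted"])
model_c = GradientBoostingClassifier().fit(control[features], control["converted"])

df["uplift"] = (model_t.predict_proba(df[features])[:, 1]
                - model_c.predict_proba(df[features])[:, 1])

# Shift spend toward the top uplift deciles; validate the ranking with a Qini
# or uplift curve on a holdout split before any targeted rollout.
top_decile = df.sort_values("uplift", ascending=False).head(max(len(df) // 10, 1))
print(f"Average predicted uplift in the top decile: {top_decile['uplift'].mean():.3f}")
```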

