Measuring True Incrementality in Programmatic Spend

Without holdout testing, you can't tell which conversions the DSP actually caused and which would have happened anyway. Here's how to build an incrementality framework.


The question your CFO is asking isn't "what's our ROAS?" — it's "if we cut programmatic spend in half, how many conversions would we actually lose?" These are different questions, and ROAS doesn't answer the second one.

ROAS measures efficiency: attributed revenue per dollar of ad spend. Incrementality measures causality: conversions that occurred because of ad spend and would not have occurred without it. A campaign can have excellent ROAS while contributing almost no incremental conversions. This happens routinely in retargeting programs where the audience being "retargeted" was going to convert anyway, based on intent signals that preceded any ad exposure.

Without a rigorous incrementality measurement framework, you don't know what portion of your programmatic conversions is incremental. This article outlines how to build one.

The foundational concept: intent-to-treat and the counterfactual

Incrementality measurement is fundamentally a counterfactual problem: what would have happened in the absence of the advertising? You can't observe the counterfactual directly — you can't simultaneously show someone an ad and not show them an ad. What you can do is randomly assign users to treatment (ad exposure) and control (withheld ad) groups, then observe conversion rates in each group.

The difference in conversion rate between the treatment group and the control group, multiplied by the number of treated users, gives you the incremental conversions attributable to the advertising. This framework is borrowed from randomized controlled trial (RCT) methodology — the same statistical logic used in clinical trials.
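As a minimal sketch of that arithmetic, the function below takes user and conversion counts for each group and returns the estimated incremental conversions. The function name and the input numbers are illustrative assumptions, not results from a real test.

```python
def incremental_conversions(treated_users, treated_conversions,
                            control_users, control_conversions):
    """Estimate conversions caused by the ads from a randomized holdout."""
    treated_rate = treated_conversions / treated_users
    control_rate = control_conversions / control_users
    # Difference in conversion rate, scaled to the treated population.
    return (treated_rate - control_rate) * treated_users


# Illustrative numbers only: 2.3% treated rate vs. 2.0% control rate.
lift = incremental_conversions(
    treated_users=100_000, treated_conversions=2_300,
    control_users=25_000, control_conversions=500,
)
print(round(lift))
```

Note that the groups do not need to be the same size, because the comparison is made on rates rather than raw counts.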

In programmatic contexts, the most common implementation is a ghost bidding test (also called holdout bidding or impression-level holdout). Your DSP bids on impressions for both treatment and control users, but withholds ad delivery for the control group. The bid log shows you which users were eligible to receive the ad; the conversion log shows you who converted. The comparison gives you the incrementality estimate.
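One way to picture the mechanics is a deterministic hash split plus a tally over the two logs. This is a sketch under assumptions: `assign_group`, `conversion_rates`, the 20% holdout share, and the idea that both logs reduce to plain lists of user IDs are all hypothetical, and a real DSP will handle assignment and logging in its own way.

```python
import hashlib


def assign_group(user_id: str, test_name: str, holdout_share: float = 0.2) -> str:
    """Deterministically place a user in 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "control" if bucket < holdout_share else "treatment"


def conversion_rates(bid_log_user_ids, converted_user_ids, test_name):
    """Conversion rate per group among users the DSP bid on."""
    eligible = set(bid_log_user_ids)   # every user with at least one bid
    converted = set(converted_user_ids)
    counts = {"treatment": [0, 0], "control": [0, 0]}  # [users, converters]
    for user_id in eligible:
        group = assign_group(user_id, test_name)
        counts[group][0] += 1
        counts[group][1] += user_id in converted
    return {g: (c / n if n else 0.0) for g, (n, c) in counts.items()}
```

Hashing on the test name plus the user ID keeps a user in the same group for the whole test, and gives a fresh, independent split when a new test starts.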

Ghost bidding vs. geo holdout vs. time-based holdout

There are three main approaches to creating the counterfactual, each with different tradeoffs:

Ghost bidding (impression-level holdout): The DSP bids on auctions for control users but doesn't deliver an ad — the impression is "ghosted." This is methodologically the cleanest approach because it creates matched treatment and control groups at the individual user level, controlling for differences in audience composition between the groups. The limitation: most DSPs incur costs for ghost bid auctions (you pay for the bid evaluation even without delivery), and not all platforms support impression-level holdouts natively.

Geo holdout: You designate certain geographies as control regions where you stop or pause programmatic activity, then compare conversion rates in the holdout regions versus active regions. This is operationally simpler than ghost bidding but requires larger budget hold-outs (you're withholding an entire geography's exposure, not just a percentage of individual users) and is subject to geographic confounding — regional differences in market conditions, seasonality, and competitor activity can contaminate the comparison.

Time-based holdout: Alternating "on" and "off" periods for a campaign, comparing conversion rates in on-periods versus off-periods. This approach is highly vulnerable to temporal confounding — any systematic difference between the time periods (day of week, seasonal patterns, external events) will contaminate the incrementality estimate. Not recommended for performance campaigns where conversion timing is already volatile.

Ghost bidding is the preferred method for most programmatic contexts where the platform supports it. Geo holdout is appropriate when ghost bidding isn't available or when you need to measure full-funnel brand-level incrementality rather than campaign-level.

Setting up the test correctly

A properly designed incrementality test requires five things before launch:

  1. Random assignment at the user level: Treatment and control groups must be randomly assigned from the same eligible audience pool. Convenience splits (e.g., users who were reached vs. users who weren't reached) introduce selection bias that will inflate your incrementality estimate — users who are easier to reach with programmatic ads are often different from users who aren't, in ways that correlate with conversion intent.
  2. Pre-specified conversion window: Define the conversion window before the test starts. Post-hoc window selection is p-hacking. For most performance campaigns, a 7-day or 14-day click-through window plus 1-day view-through is a reasonable prior; adjust based on your product's average purchase cycle.
  3. Sufficient statistical power: The sample size required depends on your baseline conversion rate and the minimum detectable effect size you care about. If your campaign converts 2% of reached users and you want to detect an incremental lift of 0.3 percentage points (15% relative lift), you need roughly 37,000 users in each group for 80% power at a two-sided 5% significance level. Most performance teams underpower their incrementality tests by an order of magnitude and end up with inconclusive results.
  4. Clean holdout boundaries: Control users should not be reachable by any other active campaign that targets the same population. If your control group users are being reached by a separate prospecting campaign targeting the same audience definition, the control group is contaminated.
  5. Pre-specified analysis plan: Define how you'll handle user attrition, missing data, and the statistical test you'll run before you see any results. Moving the goalposts after seeing partial results is the most common way incrementality tests generate misleading conclusions.
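The power requirement in item 3 can be checked before launch with a standard normal-approximation formula for comparing two proportions. The sketch below assumes equal group sizes and a two-sided test at 5% significance; the function name is hypothetical. Under those assumptions, a 2% baseline and a 0.3 point lift come out at roughly 37,000 users per group.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_group(baseline_rate, absolute_lift, alpha=0.05, power=0.80):
    """Approximate users needed in each group (two-sided z-test, equal groups)."""
    p1 = baseline_rate
    p2 = baseline_rate + absolute_lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / absolute_lift ** 2)


# Example from item 3: 2% baseline, 0.3 percentage point lift.
print(users_per_group(0.02, 0.003))
```

Halving the detectable lift roughly quadruples the required sample, which is why small expected effects get expensive to measure.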

Interpreting the results and what to do with them

The output of a well-designed holdout test is an incremental conversion lift estimate with a confidence interval. For example: "This campaign generated 340 incremental conversions (95% CI: 280-400) over the 30-day test period."
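A statement like that can be produced with a simple normal-approximation (Wald) interval on the difference in conversion rates, scaled to the treated population. This is one common choice, not the only valid one, and the function name and inputs below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def lift_with_interval(treated_users, treated_conversions,
                       control_users, control_conversions, confidence=0.95):
    """Return (estimate, lower, upper) for incremental conversions."""
    p_t = treated_conversions / treated_users
    p_c = control_conversions / control_users
    diff = p_t - p_c
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_t * (1 - p_t) / treated_users + p_c * (1 - p_c) / control_users)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (diff * treated_users,
            (diff - z * se) * treated_users,
            (diff + z * se) * treated_users)


# Illustrative counts only, not results from a real test.
estimate, lower, upper = lift_with_interval(100_000, 2_300, 25_000, 500)
print(f"{estimate:.0f} incremental conversions (95% CI: {lower:.0f} to {upper:.0f})")
```

If the lower bound falls below zero, the test has not distinguished the lift from no effect at the chosen confidence level.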

From this you can calculate true incremental ROAS: incremental conversions multiplied by average order value, divided by total campaign spend. This number will almost always be lower than your platform-reported ROAS, often materially lower for mature retargeting programs where a significant share of "conversions" would have happened organically.
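As a small worked sketch, incremental ROAS is incremental revenue (incremental conversions times average order value) divided by spend. The 340 conversions come from the example above; the average order value and spend figures are hypothetical.

```python
def incremental_roas(incremental_conversions, average_order_value, spend):
    """Incremental revenue per dollar of campaign spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return incremental_conversions * average_order_value / spend


# 340 incremental conversions; $120 AOV and $25,000 spend are assumed values.
print(incremental_roas(340, 120.0, 25_000.0))
```

Running the same function on the lower and upper ends of the confidence interval gives a range for incremental ROAS rather than a single point.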

What you do with that number depends on what you find. If incremental ROAS is above your target threshold: continue investing, potentially scale. If it's below threshold but the absolute lift is meaningful relative to program cost: re-examine the audience targeting strategy (you may be over-investing in high-intent audiences that don't need to be reached programmatically) and consider shifting budget toward awareness-stage placements where you'd expect higher incrementality. If incremental lift is not statistically distinguishable from zero: this is important information — it means the program as currently configured is not demonstrably driving conversions.

Cross-DSP incrementality measurement

When you're running multiple DSPs, incrementality measurement gets more complex. Each platform's holdout test tells you the incremental lift of that platform in isolation. What it doesn't tell you is the joint incrementality of the full multi-DSP program: the conversions that would not have happened if every platform had been switched off at once.

Testing the joint program requires a holdout that withholds all platforms simultaneously from the control group, which is operationally complex when you have 4+ active DSPs. The simpler approach is to run sequential incrementality tests — establish baselines on your primary platforms, then measure the marginal lift of adding each additional platform to the mix. This gives you a rough ranking of platforms by incremental contribution that informs budget allocation decisions.
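The sequential approach reduces to simple differencing once the tests are done. The sketch below assumes you have one incremental-conversion estimate per stage, each measured with that platform and all earlier ones active; the function name and platform labels are hypothetical.

```python
def marginal_lift(cumulative_results):
    """Rank platforms by the lift each one added when it joined the mix.

    cumulative_results: ordered list of (platform, incremental conversions
    measured with this platform and all earlier platforms active).
    """
    ranking = []
    previous = 0.0
    for platform, cumulative in cumulative_results:
        ranking.append((platform, cumulative - previous))
        previous = cumulative
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical estimates, in the order the platforms were added.
print(marginal_lift([("dsp_a", 300), ("dsp_b", 380), ("dsp_c", 500)]))
```

The ranking depends on the order in which platforms were added, so a platform tested last may look weaker simply because earlier platforms already reached the same users.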

The key insight from multi-platform incrementality testing is typically this: platforms with the highest reported ROAS are rarely the platforms with the highest incremental lift. Retargeting campaigns consistently show this pattern — high ROAS, low incrementality, because the audience being targeted was already converting at high rates from organic and direct channels. Awareness and prospecting campaigns show the inverse: lower reported ROAS, but higher incremental lift, because these campaigns reach users who wouldn't have converted through other channels.

This doesn't mean retargeting is bad. It means the budget allocation between retargeting and prospecting should reflect incremental contribution, not just efficiency metrics — and that balance is very different for most programs than their platform ROAS numbers would suggest.

PSA holdouts and why some teams prefer them

An alternative to ghost bidding is a PSA (public service announcement) holdout. Rather than withholding the impression entirely, the DSP serves a PSA creative (a generic nonprofit or public interest message) to control group users in place of the advertiser's creative. The advertiser pays for the impression but doesn't deliver their message to control users.

PSA holdouts are preferred by some teams because they provide a cleaner conversion environment: control users receive an impression, which means they're in an active browsing context comparable to treatment users, but they don't see the test creative. This eliminates one potential confound in ghost bidding holdouts — the possibility that ghost bidding slightly alters user browsing behavior by changing which impression won the auction, creating a subtle compositional difference between treatment and control groups.

The practical tradeoff is cost. PSA holdouts require paying for every impression in the control group without any commercial benefit from those impressions. For large holdout samples, this can represent a meaningful budget allocation to generate the counterfactual measurement. Ghost bidding at platforms that support it is typically more cost-efficient for the same statistical power, but the PSA method remains valid and is appropriate in contexts where cost is secondary to methodological rigor.

MMM as a complement, not a substitute

Media mix modeling (MMM) is frequently positioned as an alternative to holdout incrementality testing for measuring programmatic effectiveness. The framing is misleading — MMM and holdout testing answer different questions at different levels of resolution.

MMM operates on aggregate time-series data: total spend by channel per time period, correlated against total conversion volume per time period, with controls for external variables (seasonality, promotion activity, competitive spend). The output is an estimated contribution of each channel to total sales. For high-level portfolio allocation decisions — how much to spend across paid social, programmatic, search, and offline collectively — MMM provides useful directional guidance.

MMM cannot tell you whether a specific DSP campaign is driving incremental conversions. The granularity is insufficient: programmatic spend is typically one line in an MMM model, not disaggregated by DSP, audience segment, or creative type. The temporal resolution (typically weekly or monthly data) misses the within-platform optimization signals that determine whether your budget is working. And MMM is particularly weak at measuring channels that rarely hold the last touch. Programmatic display frequently appears to contribute less in MMM models than holdout tests show, because search and direct channels, which sit closest to the conversion, absorb credit for conversions that programmatic influenced at an earlier funnel stage.

Multi-touch attribution (MTA) sits between MMM and holdout testing in both granularity and causal validity. MTA models distribute conversion credit across all touchpoints in the conversion path using statistical rules (data-driven attribution, linear, position-based). MTA provides more granular insight than MMM but less causal validity than holdout testing — the credit assignment is a model assumption, not an observed counterfactual. In cookieless environments where cross-site user tracking is increasingly limited, MTA data quality degrades significantly.
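To make the "model assumption" point concrete, here is a sketch of two rule-based schemes, linear and position-based. The 40/20/40 split used for position-based credit is a common convention and an assumption here, as are the function names and the example path.

```python
def _accumulate(path, weights):
    """Sum credit per channel, since a channel can appear more than once."""
    credit = {}
    for channel, weight in zip(path, weights):
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit


def linear_credit(path):
    """Every touchpoint in the path gets an equal share of one conversion."""
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    return _accumulate(path, [1.0 / len(path)] * len(path))


def position_based_credit(path, endpoint_share=0.4):
    """First and last touch get endpoint_share each; the middle splits the rest."""
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    n = len(path)
    if n == 1:
        weights = [1.0]
    elif n == 2:
        weights = [0.5, 0.5]
    else:
        middle = (1.0 - 2 * endpoint_share) / (n - 2)
        weights = [endpoint_share] + [middle] * (n - 2) + [endpoint_share]
    return _accumulate(path, weights)


path = ["display", "social", "search", "direct"]  # hypothetical conversion path
print(linear_credit(path))
print(position_based_credit(path))
```

The same path yields different credit for display under each rule, and neither number says whether the conversion would have happened without the display impression.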

The measurement stack that provides the most complete picture combines all three: holdout testing for causal validity on specific campaigns, MTA for path-level understanding of cross-channel sequencing, and MMM for long-horizon portfolio allocation decisions where experiment-based testing isn't feasible. Each method has a role; treating any single approach as the definitive measurement system will produce blind spots.

Incrementality measurement at scale: practical constraints

Running rigorous holdout tests on every campaign simultaneously isn't operationally feasible for most performance teams. It requires dedicated holdout budget (typically 10-20% of the tested campaign's budget allocated to the control group), careful audience management to prevent control group contamination, and statistical analysis after the test period. For teams running 15+ simultaneous campaigns across multiple DSPs, running proper holdouts on all of them at once would require more measurement infrastructure than most organizations have.

The practical approach is to prioritize holdout testing for the campaigns and platforms where the incrementality question has the highest decision value: campaigns representing the largest budget concentration, campaigns where platform-reported ROAS differs significantly from what the team expects based on prior performance, and campaigns being evaluated for budget increases or decreases. These are the situations where getting the incrementality answer right changes a decision with material economic consequences.

Platforms where incrementality is relatively well-understood — branded search campaigns, for example, where the treatment-control split can be approximated through geography-based experiments — need less frequent holdout validation than platforms where attribution is complex and multi-touch path dependencies are high.

The iROAS conversation with finance

Incremental ROAS (iROAS) is the metric that connects programmatic measurement to finance team conversations about budget justification. Standard ROAS is an efficiency metric that finance teams have learned to discount: they've seen enough post-hoc attribution reports to understand that platforms claim credit generously. iROAS is a causal metric: it represents the return on programmatic spend net of the organic conversions that would have occurred without advertising.

A program with a 4.2× platform-reported ROAS and a 1.8× iROAS is a substantially different investment case than a program with a 2.1× platform-reported ROAS and a 1.9× iROAS. The first program is mostly attributing conversions that would have happened anyway. The second is generating genuine incremental return, even at a lower claimed efficiency. Finance teams who understand iROAS will prefer the second program; finance teams working from platform ROAS alone may prefer the first.
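One way to put those two programs side by side is the ratio of iROAS to platform-reported ROAS. Assuming both metrics use the same spend and the same revenue per conversion, that ratio approximates the share of platform-credited return that is actually incremental. The function name and program labels are illustrative.

```python
def incremental_share(platform_roas, iroas):
    """Fraction of platform-credited return that the holdout test supports."""
    return iroas / platform_roas


programs = [("Program 1", 4.2, 1.8), ("Program 2", 2.1, 1.9)]
for name, platform_roas, iroas in programs:
    share = incremental_share(platform_roas, iroas)
    print(f"{name}: {share:.0%} of platform-credited return is incremental")
# Program 1: 43% of platform-credited return is incremental
# Program 2: 90% of platform-credited return is incremental
```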

Building iROAS into your measurement reporting requires running holdout tests with sufficient rigor and frequency to generate statistically valid estimates. It's a higher bar than pulling platform-reported ROAS dashboards. It's also the only way to have a defensible answer to "what would happen if we cut programmatic spend?" — which is the question that actually determines programmatic's budget allocation when priorities shift.

The teams that have built incrementality measurement programs consistently report that the initial holdout results are uncomfortable: iROAS on retargeting programs is almost always lower than platform ROAS suggests, sometimes by 50% or more. That discomfort is informative. It means budget shifts — typically from retargeting toward prospecting and awareness — are warranted. The measurement work isn't about confirming what you hoped was true; it's about finding out what is.

Brandpathio includes incrementality modeling in the Scale and Enterprise tiers.
