Bayesian Priors in MMM: The Good, The Bad, and The Ugly

If you’ve sat in a marketing measurement meeting in the last two years, you’ve likely heard the “B-word” dropped with increasing frequency: Bayesian.

For a long time, Media Mix Modeling (MMM) was the domain of “Frequentist” statistics—the kind of math where you throw data into a black box, and it tells you what happened based solely on the data provided. 

As privacy changes (iOS 14, the death of cookies) have decimated our ability to track individual user actions, the marketing-measurement industry has pivoted back to the “top-down” view of MMM. And this time, it’s the Bayesian approach rather than the frequentist one that is gaining popularity.

Whether you’re using Google’s Meridian, open-source powerhouses like PyMC Marketing, or a custom-built solution, you are dealing with Priors. To some, priors are a “cheat code” that makes models actually work. To others, they are a way to “cook the books.”

It is time to pull back the curtain. This isn’t a data science lecture; it’s an honest look at the tools the industry is using to decide where your next $1 million goes.

Why "Clean Data" is a Myth: The MMM Struggle

Before we talk about Bayesian math, we have to talk about why MMM is so hard. In a perfect world, an analyst would have 24 months of perfectly clean, varied data. In the real world, we have:

  1. Multicollinearity (The “Double-Up” Problem): You never run TV ads in a vacuum. You run TV, and simultaneously, you ramp up Search spend because people are looking for you. When TV and Search always move together, the model can’t easily tell who actually drove the sale. It’s like trying to figure out which twin ate the cookie when they’re both covered in crumbs.
  2. Sparse Data (The “Small Channel” Problem): You spent $5,000 on a TikTok test. In the grand scheme of a $15M budget, that’s a rounding error. A standard model might look at that tiny spend and say, “I see no signal here; therefore, the ROI is zero.” We know that’s probably not true, but the math is literal.
  3. The “Flatline” Problem: If you spend exactly $60k every single week on a YouTube Awareness campaign, the model has no “variance” to learn from. It doesn’t know what would happen if you spent $30k or $150k.
 

This is where “Frequentist” models can often break. They return “nonsensical” results—like saying your most expensive channel has an ROI of 50x, or worse, a negative ROI.
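
To make the “Double-Up” problem concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true effects (2.0 for TV, 3.0 for Search), the spend ranges, the noise levels, and the function names are invented, not taken from any real dataset or library. It fits a plain least-squares regression many times and compares how much the TV estimate bounces around when Search tracks TV versus when it moves independently.

```python
import random
import statistics


def ols_two_predictors(x1, x2, y):
    """Least-squares slopes for y ~ a + b1*x1 + b2*x2 (centered normal equations)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_tv_estimates(search_follows_tv, n_runs=200, n_weeks=104, seed=0):
    """Re-fit the regression on fresh simulated data and collect the TV slope."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        tv = [rng.uniform(50, 150) for _ in range(n_weeks)]
        if search_follows_tv:
            # Search spend is ramped up whenever TV is on air.
            search = [0.5 * t + rng.gauss(0, 2) for t in tv]
        else:
            search = [rng.uniform(25, 75) for _ in range(n_weeks)]
        # Hypothetical ground truth: TV effect 2.0, Search effect 3.0.
        sales = [2.0 * t + 3.0 * s + rng.gauss(0, 40) for t, s in zip(tv, search)]
        b_tv, _ = ols_two_predictors(tv, search, sales)
        estimates.append(b_tv)
    return estimates


correlated = simulate_tv_estimates(search_follows_tv=True)
independent = simulate_tv_estimates(search_follows_tv=False)
print("TV estimate spread when Search tracks TV:   ", round(statistics.stdev(correlated), 2))
print("TV estimate spread when Search is unrelated:", round(statistics.stdev(independent), 2))
```

The true TV effect is identical in both worlds. The only thing that changes is how entangled the two channels are, and that alone is enough to make the estimate swing far more from one dataset to the next.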

The Bayesian Engine: Priors, Likelihood, and Posteriors

In simple terms, Bayesian modeling allows us to give the model a “head start.”

Imagine you’re guessing how much a stranger weighs.

  • The Prior: Before they walk in the room, you know the average adult weighs around 70–85 kg. You wouldn’t guess 10 kg or 300 kg. That’s your prior knowledge.
  • The Likelihood: The person walks in. They look quite tall and muscular. This is the “new data” you’re seeing right now.
  • The Posterior: You combine your prior knowledge (averages) with what you see (the data) to make an informed guess: “I bet they weigh 95 kg.”

In MMM, it works exactly the same way:

  • The Prior: “Based on 10 years of marketing and a Geo-test we did last quarter, we think the ROI for Brand Search is roughly 5-6x.”
  • The Likelihood: “The data from the last 6 months says the ROI is 2x.”
  • The Posterior: The model finds a middle ground, perhaps landing on 4x.
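
For the curious, the “middle ground” can be sketched with the simplest textbook case, where both the prior and the data are treated as normal distributions. This is an illustration under stated assumptions, not how any particular MMM library computes its posterior: the prior spread (1.0) and the data’s standard error (1.0) are invented numbers, and the function name is hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal data estimate (conjugate update)."""
    prior_precision = 1.0 / prior_sd ** 2   # how sure we were beforehand
    data_precision = 1.0 / data_se ** 2     # how sure the data is
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Prior: Brand Search ROI around 5.5x. Data: the last 6 months suggest 2x.
# The two spreads (1.0 and 1.0) are assumptions made up for this sketch.
post_mean, post_sd = normal_posterior(prior_mean=5.5, prior_sd=1.0, data_mean=2.0, data_se=1.0)
print(f"Posterior ROI: {post_mean:.2f}x (sd {post_sd:.2f})")
```

With equal confidence on both sides, the answer lands halfway between the prior and the data, close to the 4x in the story. Whichever side is more certain pulls the result toward itself.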

The Good: Why Priors are a Superpower

When used correctly, priors are the “adult in the room.” They prevent the model from hallucinating and allow us to inject real-world business intelligence into the math.

1. Stability in the Chaos

Priors act as “guardrails.” We can set a prior that says, “We are 95% sure that TV ROI isn’t negative.” This prevents the model from spitting out a result that would get an analyst laughed out of the boardroom. It keeps the math within the realm of what is plausible for the business.
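
One way to translate “95% sure it isn’t negative” into numbers is sketched below. It assumes a normal prior and a hypothetical central guess of 1.5x for TV ROI; both choices are illustrative, and real tools may use other distributions (a prior that rules out negative values entirely is also common). The helper name is invented for this example.

```python
from statistics import NormalDist


def sd_for_positive_mass(prior_mean, prob_positive=0.95):
    """Standard deviation such that a Normal(prior_mean, sd) prior puts
    `prob_positive` of its probability above zero."""
    if prior_mean <= 0 or not 0.5 < prob_positive < 1.0:
        raise ValueError("needs a positive prior mean and 0.5 < prob_positive < 1")
    z = NormalDist().inv_cdf(prob_positive)  # about 1.645 for 95%
    return prior_mean / z


tv_prior_mean = 1.5  # hypothetical central guess for TV ROI
tv_prior_sd = sd_for_positive_mass(tv_prior_mean, prob_positive=0.95)
tv_prior = NormalDist(tv_prior_mean, tv_prior_sd)

print(f"Prior sd: {tv_prior_sd:.2f}")
print(f"Prior probability that TV ROI is negative: {tv_prior.cdf(0.0):.1%}")
```

The guardrail is soft by design: a negative ROI is still possible if the data insists on it, but the model needs strong evidence to go there.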

2. Learning from Experiments (The "Holy Grail")

The biggest breakthrough in modern MMM (and why tools like Meridian are so hyped) is the ability to use Informed Priors from experiments.

If you ran a clean, randomized Geo-test on YouTube and found a 2.50x incremental ROAS, you can “feed” that result into the MMM as a prior. The MMM then uses its own data to “fine-tune” that result across the rest of the year. It bridges the gap between short-term testing and long-term planning.
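
As a rough sketch of what “feeding” an experiment into the model can look like, the snippet below turns a test result into the two parameters of a log-normal prior (a common choice for quantities that cannot go below zero). The 2.50x point estimate comes from the example above; the 95% interval of 1.8x to 3.5x is a made-up assumption, and the function is a hypothetical helper, not part of Meridian, PyMC Marketing, or any other library.

```python
import math


def lognormal_prior_from_interval(point_estimate, ci_low, ci_high, z=1.96):
    """Convert an experiment's point estimate and 95% interval into the
    (mu, sigma) of a log-normal prior whose median is the point estimate."""
    mu = math.log(point_estimate)
    # Width of the interval on the log scale, spread over +/- z standard deviations.
    sigma = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return mu, sigma


# 2.50x is the geo-test result; the interval is invented for illustration.
mu, sigma = lognormal_prior_from_interval(2.50, ci_low=1.8, ci_high=3.5)

implied_low = math.exp(mu - 1.96 * sigma)
implied_high = math.exp(mu + 1.96 * sigma)
print(f"Prior median: {math.exp(mu):.2f}x")
print(f"Prior 95% range: {implied_low:.2f}x to {implied_high:.2f}x")
```

The width matters as much as the center. A noisy test should produce a wide prior, so the MMM stays free to disagree with it.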

3. The "Cold Start" for New Channels

If you’re launching a new channel—say, Reddit—you have no history. A traditional model is useless here. With a Bayesian approach, you can use “industry benchmarks” or “similar channel” data as a prior. You’re telling the model: “Start by assuming Reddit works a bit like Meta, and as we get more data, feel free to change your mind.”

The Bad: When Priors Become a Crutch

If priors are a superpower, they are also a dangerous temptation. The line between “informed” and “biased” is incredibly thin.

1. The "Anchor" Effect

If you set a prior that is too “narrow” (meaning you tell the model you are very certain about a result), the model will ignore the actual data. This is called over-weighting the prior. You could spend $1M and see zero sales, but if your prior says “Facebook is amazing,” the model might still tell you Facebook is amazing. It becomes a self-fulfilling prophecy.
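
The anchor effect is easy to see in the simplest normal-prior, normal-data setting. The numbers below are invented for illustration: a prior belief that ROI is 4.0x, data that points to 0.0x with a standard error of 0.5, and three different levels of prior certainty. The function name is hypothetical.

```python
def posterior_mean_and_prior_weight(prior_mean, prior_sd, data_mean, data_se):
    """Return the posterior mean and the share of it that comes from the prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    prior_weight = prior_precision / (prior_precision + data_precision)
    post_mean = prior_weight * prior_mean + (1.0 - prior_weight) * data_mean
    return post_mean, prior_weight


# Same data every time; only the certainty of the prior changes.
for prior_sd in (0.1, 0.5, 2.0):
    post_mean, weight = posterior_mean_and_prior_weight(
        prior_mean=4.0, prior_sd=prior_sd, data_mean=0.0, data_se=0.5
    )
    print(f"prior sd {prior_sd}: posterior ROI {post_mean:.2f}x, prior weight {weight:.0%}")
```

With the narrowest prior, the posterior barely moves from 4.0x even though the data says zero. With the widest one, the data wins almost completely.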

2. The Subjectivity Trap

Whose “expert opinion” do we use for the prior?

  • The Brand Manager who needs their campaign to look successful?
  • The CFO who thinks all marketing is a waste of money?
  • The agency that has a vested interest in a specific channel?

If you aren’t careful, the MMM stops being an objective measurement tool and starts being a scoreboard for whoever has the most political power in the office.

3. Complexity Overload

Explaining a standard regression to a CMO is hard enough. Explaining “Probability Distributions” and “Half-Normal Priors” is a recipe for glazed eyes. There is a risk that the “Technical Marketers” and “Analysts” become the only ones who understand the “why” behind the numbers, creating a gap in trust with leadership.

The Ugly: The Laundering of Opinions

This is the part people don’t like to talk about in white papers. Because Bayesian models can be influenced, they will be influenced.

1. "Priors-In, Priors-Out" (The Feedback Loop)

The “Ugly” happens when an analyst “tunes” the priors until the model shows what they want it to show. “Oh, the model says TikTok is a 0.5x ROI? Let me just tighten the prior based on this one ‘best-case scenario’ case study I found online… okay, now it’s a 1.2x. Perfect.”

When we use priors to force a “palatable” answer, we aren’t modeling reality; we are laundering our opinions through math to make them look like science.

2. Hiding Behind the "Black Box"

Modern Bayesian libraries are sophisticated – they can handle complex shapes like “Adstock” (the carryover effect of ads) and “Saturation” (the point of diminishing returns) or complex hierarchies. However, if an analyst sets 50 different priors for 10 different channels, the “Ugly” reality is that no one—not even the analyst—fully understands how those priors are interacting. You can end up with a model that is “mechanically” correct but logically bankrupt.
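
For readers who want to see what those two “shapes” are, here is a minimal sketch using two commonly used functional forms: geometric decay for Adstock and a Hill curve for Saturation. These are assumptions for illustration; actual libraries offer several variants with their own parameterizations, and the spend figures and parameter values below are invented. Note that each channel already needs at least a decay rate, a half-saturation point, and an effect size, which is how the prior count climbs so quickly.

```python
def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of last week's accumulated effect into this week."""
    carried = 0.0
    adstocked = []
    for x in spend:
        carried = x + decay * carried
        adstocked.append(carried)
    return adstocked


def hill_saturation(x, half_saturation, slope=1.0):
    """Diminishing returns: 0 at no spend, 0.5 at `half_saturation`, approaching 1."""
    return x ** slope / (x ** slope + half_saturation ** slope)


# Hypothetical channel: one burst of 100 followed by silence.
weekly_spend = [100.0, 0.0, 0.0, 0.0]
carryover = geometric_adstock(weekly_spend, decay=0.5)
response = [hill_saturation(x, half_saturation=50.0) for x in carryover]

print("Adstocked spend:", carryover)
print("Saturated response:", [round(r, 2) for r in response])
```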

3. The Illusion of Certainty

Bayesian models give you “credible intervals” (e.g., “We are 90% sure the ROI is between 2.0 and 3.5”). The “Ugly” side is that marketing leaders often ignore the range and just look at the midpoint. They take the “informed guess” of a Bayesian model and treat it as a cold, hard fact. This leads to overconfidence in budgets that are actually built on very shaky ground.
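
Under the hood, a credible interval is usually read straight off the model’s posterior draws. The sketch below shows the idea with simulated draws standing in for real model output; the distribution, the seed, and the helper name are all assumptions made for illustration.

```python
import random
import statistics

# Stand-in for real posterior draws of a channel's ROI (simulated, not model output).
rng = random.Random(42)
roi_draws = [rng.lognormvariate(1.0, 0.15) for _ in range(4000)]


def credible_interval(draws, mass=0.90):
    """Equal-tailed interval that contains roughly `mass` of the draws."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    low = ordered[int(tail * (len(ordered) - 1))]
    high = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return low, high


low, high = credible_interval(roi_draws, mass=0.90)
print(f"Midpoint (median): {statistics.median(roi_draws):.2f}x")
print(f"90% credible interval: {low:.2f}x to {high:.2f}x")
```

Reporting both lines, not just the first, is the cheapest protection against treating the midpoint as a certainty.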

A Tale of Two Priors: Real-World Scenarios

To move beyond the theoretical, let’s look at how priors actually behave in the wild. These are common scenarios where the “B-word” makes a tangible difference in how millions of dollars are allocated.

Example 1: The "Video Tangle" (When Priors Save the Day)

The Setup: A retail brand runs heavy Linear TV and YouTube simultaneously. Because both channels follow the same promotional calendar (scaling up for Black Friday and down in January), they are highly correlated.

The Problem: Without priors, the model looks at the data and gets confused. It sees that sales went up when both channels were high, but it can’t decide which one did the heavy lifting. The “Frequentist” result? It gives YouTube a 12x ROI and TV a 0.1x ROI. The math “works,” but the split is implausible: we know TV drives reach, yet the model is just picking a winner based on a statistical coin flip.

The Bayesian Fix: The analyst sets an Informed Prior. They look at a Geo-test conducted two years ago and a series of “Brand Lift” studies. They tell the model: “We have a high degree of confidence that TV ROI is between 1.5x and 2.5x.”

The Outcome: Instead of “robbing” TV to pay YouTube, the model uses this prior as a leash. It keeps TV in a realistic range and then uses the remaining “unexplained” sales to calculate a more grounded ROI for YouTube (let’s say, 3.0x).

  • Why this is “The Good”: The prior didn’t ignore the data; it provided the context necessary to solve a correlation problem that math alone couldn’t fix.

Example 2: The "Optimistic Agency" (When Priors Go "Ugly")

The Setup: A performance agency is responsible for Meta and TikTok. They are under pressure to prove that TikTok is a viable growth engine so they can unlock more production budget.

The Problem: The actual data is lukewarm. TikTok is driving some sales, but the signal is “noisy” and the ROI isn’t clearly beating Meta.

The Subtle Tactic: Rather than “faking” the numbers, the agency’s analysts get sophisticated with the Priors. They set a very “tight” (highly certain) prior on TikTok’s Saturation Point (the point where more money stops helping) and a “generous” prior on its Adstock (how long the ad stays in a consumer’s mind).

The Result: The model spits out a glowing report: TikTok has a high “long-term” ROI and plenty of “headroom” to spend more.

  • The Trap: Here’s the “Ugly” part: The model’s validation metrics (how well the model fits the data) still look excellent. On paper, it’s a “perfect” model. But the result isn’t coming from the data; it’s coming from the fact that the analysts “tuned” the priors so specifically that the model had no choice but to agree.
  • The Reality: The business stakeholders see a scientifically “validated” model and double the TikTok budget. In reality, they are just doubling down on the agency’s own optimistic assumptions, laundered through a Bayesian black box.

If you are a marketing leader or a technical marketer, you don’t need to know how to write Python code or understand the intricacies of Bayesian modeling. But you must ask these three questions of your measurement team to ensure you’re seeing “The Good” and not “The Ugly”:

  1. “What are our ‘Informed Priors’ and where did they come from?” If the answer is “we just used the defaults,” be careful—the model might be drifting. If the answer is “we used our 2025 incrementality tests,” that’s a win.
  2. “Show me the ‘Prior vs. Posterior’ plot.” This is a standard chart in libraries like Meridian. If the “Prior” (what you told the model) and the “Posterior” (the final result) are identical, the model didn’t learn anything from your data—it’s just repeating what you told it.
  3. “What happens if we ‘relax’ the priors?” A good analyst should perform a Sensitivity Analysis. If shifting a prior slightly causes the whole budget recommendation to flip upside down, your model is a house of cards. It means the data isn’t strong enough to support the conclusions.
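
For teams that want to automate the second question, here is one possible sketch of a “did the model learn anything?” check. It compares prior draws with posterior draws on two simple measures: how far the center moved and how much the spread shrank. The thresholds (0.1 for both), the function names, and the simulated draws are all assumptions for illustration, not an established standard or a feature of any library.

```python
import random
import statistics


def prior_posterior_diagnostics(prior_draws, posterior_draws):
    """Return (shift, contraction): how far the mean moved, in prior standard
    deviations, and the fraction by which the spread shrank."""
    prior_sd = statistics.stdev(prior_draws)
    shift = abs(statistics.fmean(posterior_draws) - statistics.fmean(prior_draws)) / prior_sd
    contraction = 1.0 - statistics.stdev(posterior_draws) / prior_sd
    return shift, contraction


def looks_prior_dominated(prior_draws, posterior_draws, min_shift=0.1, min_contraction=0.1):
    """Flag a parameter whose posterior is nearly a copy of its prior."""
    shift, contraction = prior_posterior_diagnostics(prior_draws, posterior_draws)
    return shift < min_shift and contraction < min_contraction


# Simulated draws standing in for real model output.
rng = random.Random(7)
prior = [rng.gauss(2.5, 0.5) for _ in range(5000)]
unchanged = [rng.gauss(2.5, 0.5) for _ in range(5000)]  # posterior mirrors the prior
learned = [rng.gauss(1.8, 0.2) for _ in range(5000)]    # data moved and sharpened it

print("Unchanged posterior flagged:", looks_prior_dominated(prior, unchanged))
print("Learned posterior flagged:  ", looks_prior_dominated(prior, learned))
```

A flag here is a prompt for a conversation, not a verdict. A posterior that matches a well-run experiment’s prior may simply mean the data agrees with the experiment.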

And of course, whenever feasible, validate the model results with well-designed and well-executed experiments (e.g., conversion lift studies, geo-experiments, etc.).

The Bottom Line

Bayesian MMM is not a crystal ball. It is a way to combine historical wisdom with current data.

It is “Good” because it handles the messy, correlated reality of modern marketing better than anything else we have. It is “Bad” because it requires a level of subjective judgment that many marketers aren’t prepared for. And it is “Ugly” when it’s used to hide biases behind a veneer of sophisticated probability.

As we move into the era of automated tools and sophisticated libraries, remember: The model is only as smart as the assumptions you give it. Use priors to inform the math, but never let them silence the data.

How often do you currently audit the assumptions (or “priors”) that go into your measurement models?
