What data do we need for MMM?

One of the most frequent questions we get is what data is necessary for MMM and in what structure / format. Let’s explore the topic.

Basic data requirements for MMM

MMM quantifies how various marketing activities contribute to a company’s KPIs, such as sales. The model’s accuracy hinges on the quality and comprehensiveness of the data fed into it. Typically, a dataset spanning at least 2* years is recommended to capture the cyclical nature of business and consumer patterns. This means you should have 2 years of time-series data on:

1) Required data inputs

  • Sales (or a similar business KPI) – the target variable (also called dependent, modelled or explained variable – because we explain its value using other variables).
    • Here we mean the total Sales – its breakdown into marketing channels is the output of MMM, not its input – this is sometimes confusing to people first encountering MMM.
    • Other frequently used target metrics are number of new customers, number of app installs or new registered users  
  • Media costs by channel – e.g. daily or weekly costs for Google Ads PMax, FB Ads, YouTube, TV etc. – for all your major marketing channels. How to structure this data is described below in the section on Channel Grouping.

2) Optional data inputs

  • Media impressions or a similar “volume metric” – e.g. GRPs or TRPs for TV. This is recommended, but MMM is possible even without these as long as you have costs for the channels.
  • Major events – these can be either internal (e.g. product or new collection launches, a website redesign, a change in level of service/CX, changed stock availability, a major PR event…) or external (a new competitor entering the market or some other sudden change in market conditions).
  • Pricing and discounting – optional but highly recommended for businesses that actively work with pricing (e.g. retail, e-commerce), ideally also capturing the competitive part of pricing (your brand vs your competitors).
  • Major promotions – e.g. free delivery or major seasonal promotions.
  • Competitor activity – e.g. their major promotions and media spending.
  • Macroeconomic factors – e.g. consumer sentiment or inflation.

In general, you should attempt to capture all major demand drivers in your model – if something is missing, there is a risk that the model predictions will be wrong. An example of a demand driver “map” for a retailer might be this:

* There are exceptions to this and 2 years is not a “hard” limit – but you should take it as a general guideline and recommendation. A shorter data range may reduce model quality in many cases.

For a one-off analysis the format can be a simple Excel, CSV or Google Sheet looking like this:

…or for online platforms it is usually faster and easier to use connectors in your MMM solution. For always-on MMM solutions, there is always some form of data processing pipeline that ingests new data automatically and feeds it into the model.
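
To make the flat-file layout more concrete, here is a minimal sketch in Python. It assumes one row per week and one column per variable; the column names (week_start, sales, google_ads_cost and so on) and all numbers are invented for illustration and are not a required schema. The helper load_mmm_table is hypothetical and only checks two basic things: that the weeks follow each other without gaps and that no value is missing.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical example file: one row per week, one column per variable.
# All column names and numbers are invented for illustration.
SAMPLE_CSV = """week_start,sales,google_ads_cost,fb_ads_cost,tv_cost,avg_discount_pct
2024-01-01,120000,8000,5000,0,5
2024-01-08,135000,9500,5200,20000,10
2024-01-15,128000,9000,4800,20000,5
"""


def load_mmm_table(text, date_col="week_start", step_days=7):
    """Parse a flat MMM table and list basic structural problems."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    previous = None
    for row in rows:
        current = date.fromisoformat(row[date_col])
        # Consecutive rows should be exactly one period apart.
        if previous is not None and current - previous != timedelta(days=step_days):
            problems.append(f"gap or irregular step before {current}")
        previous = current
        # Every non-date column should have a value (0 is a valid value).
        for name, value in row.items():
            if name != date_col and not (value or "").strip():
                problems.append(f"missing {name} on {current}")
    return rows, problems


rows, problems = load_mmm_table(SAMPLE_CSV)
print(len(rows), "rows;", problems or "no structural problems found")
```

Note that a week with no TV activity is recorded as a cost of 0 rather than left empty, so the check can tell "no spend" apart from "no data".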

Channel Grouping

For media, the main question is how to structure the media channels into a so-called channel grouping – e.g. should we treat Google Ads as 1 channel, or should it be split into several by campaign type, objective or bidding strategy? This is an area where your MMM partner should provide guidance:

  • The channels should not be too small (e.g. <2% spend share)
    • otherwise there is a high risk of the channel being hard to detect or having an unrealistic result
  • Also, having one huge channel (e.g. 60% of total spend) and then lots of small channels is not ideal
  • The channel structure should reflect (as much as possible) how you work with the budgets and KPIs
  • There is also a limit on the number of channels – given by the number of data points
    • A general rule of thumb is that you should have at least 10x more data points than channels – so 3 years of weekly data = 3 x 52 = 156 data points, which means at most 15 channels
  • If two or more channels are too correlated, it may be a problem for MMM – again something your MMM partner should identify and fix.
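
The rules of thumb in the list above can be turned into a quick sanity check. The sketch below is only an illustration, not a replacement for a modeller's judgement. The 2% minimum share and the 10x rule come from the list, the 60% figure is the example used there, and the 0.9 correlation threshold is an assumption chosen purely for the example. The function name check_channel_grouping and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def check_channel_grouping(spend, min_share=0.02, max_share=0.60, max_corr=0.9):
    """spend maps channel name -> list of per-period costs (equal lengths).

    max_corr is an assumed threshold, not a value from the article.
    """
    warnings = []
    n_points = len(next(iter(spend.values())))
    total = sum(sum(series) for series in spend.values())

    # Spend share: flag very small and very dominant channels.
    for name, series in spend.items():
        share = sum(series) / total
        if share < min_share:
            warnings.append(f"{name}: spend share {share:.1%} is very small")
        elif share >= max_share:
            warnings.append(f"{name}: spend share {share:.1%} dominates the mix")

    # Rule of thumb: at least 10x more data points than channels.
    max_channels = n_points // 10
    if len(spend) > max_channels:
        warnings.append(f"{len(spend)} channels, but {n_points} data points "
                        f"support at most {max_channels}")

    # Pairwise correlation between channel spend series.
    for a, b in combinations(spend, 2):
        r = pearson(spend[a], spend[b])
        if abs(r) > max_corr:
            warnings.append(f"{a} and {b} are highly correlated (r={r:.2f})")
    return warnings


# Invented data: 20 periods, three channels.
search = [100 + 5 * i for i in range(20)]
example = {
    "search": search,
    "social": [2 * v for v in search],  # moves in lockstep with search
    "tiny_test": [1] * 20,              # negligible, constant spend
}
warnings = check_channel_grouping(example)
for message in warnings:
    print(message)
```

A warning here is a prompt to reconsider the grouping (for example by merging a tiny channel into a related one), not an automatic verdict.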

All in all – designing a good channel grouping is quite important for MMM success and it is one of the areas where your MMM partner should help you as there are unavoidable trade-offs and navigating this requires experience with modelling and expert judgement.

In practice, media channels are most often broken down into 10-20 channels in total, and large platforms and media types like Google, Facebook or TV usually into 2-5 “sub-channels”.

Use daily or weekly (or monthly) data?

Daily data allow for a more granular channel grouping (more channels measured) but they may be too “noisy” in some cases which can make the modelling process much more difficult. This is something that needs to be decided case by case by the modeller.
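
If daily data turns out to be too noisy, one common option is to aggregate it to weeks before modelling. The sketch below shows one way this could be done, assuming weeks start on Monday and assuming the metric is additive (costs, sales, impressions). The function to_weekly and the sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_weekly(daily):
    """Sum daily values into weeks starting on Monday.

    daily maps datetime.date -> numeric value.
    """
    weekly = defaultdict(float)
    for day, value in daily.items():
        # date.weekday() is 0 for Monday, so this steps back to the week start.
        week_start = day - timedelta(days=day.weekday())
        weekly[week_start] += value
    return dict(sorted(weekly.items()))


# Invented example: 14 days of daily cost starting on Monday 2024-01-01.
start = date(2024, 1, 1)
daily_cost = {start + timedelta(days=i): 100.0 + 10 * (i % 7) for i in range(14)}
weekly_cost = to_weekly(daily_cost)
print(weekly_cost)
```

Summing is only appropriate for additive metrics; variables such as price or discount level would normally be averaged over the week instead.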

Do we need to include “everything”?

No, but even for the first versions of the model, you should include all factors that probably impact your sales in a major way – main media channels, often pricing/discounting and major promotions. 

What about seasonality?

Seasonality (yearly and weekly) is typically detected by the modelling process and you do not need to input it.  

Do all the channels need to have 2 years of data?

No, it is perfectly ok if you ran some channels only occasionally or only for some period of time.

There needs to be some variance

For MMM to work, there needs to be some variance in the costs of a given channel (in some days/weeks you spend more than in others) – this is typically not a problem. But for example if you have a multi-year sponsorship of a sports club (and its start or end does not fall into the modelling period), it may be difficult or impossible to quantify its effect.
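
A simple way to spot channels with too little variance is to compare the spread of spend with its average level (the coefficient of variation). The sketch below is a rough screening aid under stated assumptions: the 0.05 threshold is arbitrary and chosen only for illustration, and the function low_variance_channels and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def low_variance_channels(spend, min_cv=0.05):
    """Return channels whose spend barely changes over time.

    spend maps channel name -> list of per-period costs.
    min_cv is an assumed threshold on the coefficient of variation.
    """
    flagged = []
    for name, series in spend.items():
        avg = mean(series)
        cv = pstdev(series) / avg if avg > 0 else 0.0
        if cv < min_cv:
            flagged.append(name)
    return flagged


# Invented data: a flat multi-year sponsorship fee vs. a channel that varies.
example_spend = {
    "sponsorship": [5000] * 8,
    "search": [800, 1200, 950, 1500, 700, 1100, 1300, 900],
}
flat_channels = low_variance_channels(example_spend)
print(flat_channels)
```

In this example the constant sponsorship fee is flagged, which matches the case described above: with no change in spend inside the modelling period, the model has nothing to learn the effect from.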
