Mar 2

Propensity Modeling for Marketing

Mindli Team

AI-Generated Content


Propensity modeling moves marketing from broad-blast campaigns to precise, individualized engagement. By predicting a customer's likelihood to take a specific action—like making a purchase or canceling a service—you can allocate resources efficiently, personalize messaging, and proactively retain valuable relationships.

From Business Question to Predictive Features

The journey begins by framing a clear business objective, such as "propensity to buy Product X in the next 30 days" or "propensity to churn in the next quarter." This objective directly defines your target variable: a binary label (e.g., 1 for purchased, 0 for did not purchase) for a specific historical cohort of customers. The model's job is to learn the patterns that distinguish the "1"s from the "0"s.
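
Building the label from history can be sketched as follows. This is a minimal example with a hypothetical transactions table and column names; the key idea is that the target is defined over a fixed outcome window after a snapshot date.

```python
import pandas as pd

# Hypothetical tables: the customer cohort as of the snapshot date,
# and a raw transaction log.
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "product": ["X", "Y", "X"],
    "date": pd.to_datetime(["2024-06-05", "2024-05-20", "2024-06-28"]),
})

# Target: bought Product X within the 30-day outcome window after the snapshot.
snapshot = pd.Timestamp("2024-06-01")
window_end = snapshot + pd.Timedelta(days=30)
buyers = transactions.loc[
    (transactions["product"] == "X")
    & (transactions["date"] > snapshot)
    & (transactions["date"] <= window_end),
    "customer_id",
].unique()

# Binary label: 1 for purchased in the window, 0 otherwise.
customers["label"] = customers["customer_id"].isin(buyers).astype(int)
```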

The fuel for this learning is feature engineering from behavioral data. Raw transactional logs, website clicks, email opens, and support tickets are transformed into predictive signals. Effective features are often rolling-window aggregates that capture recent behavior and trends. For a purchase propensity model, you might engineer features like 'total spend in the last 90 days,' 'number of category page views in the last 7 days,' and 'days since last purchase.' For churn, features like 'session frequency trend (slope),' 'number of support tickets logged,' and 'percent change in feature usage' become critical. The goal is to create a rich set of attributes that describe the customer's state and trajectory leading up to the target event.
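
A minimal sketch of rolling-window feature engineering with pandas, using hypothetical column names. Note that every window ends at the snapshot date, so no post-snapshot information leaks into the features.

```python
import pandas as pd

# Hypothetical transaction log; all feature windows end at the snapshot date.
snapshot = pd.Timestamp("2024-06-01")
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [40.0, 25.0, 10.0, 60.0, 15.0],
    "date": pd.to_datetime(
        ["2024-05-28", "2024-03-15", "2024-05-30", "2024-05-01", "2023-12-01"]
    ),
})

# Keep only the trailing 90 days before the snapshot.
recent = tx[tx["date"] >= snapshot - pd.Timedelta(days=90)]
features = recent.groupby("customer_id").agg(
    spend_90d=("amount", "sum"),
    orders_90d=("amount", "size"),
    last_purchase=("date", "max"),
)
features["days_since_last_purchase"] = (snapshot - features["last_purchase"]).dt.days
features = features.drop(columns="last_purchase")
```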

A universal challenge in this domain is class imbalance handling. Typically, only a small percentage of customers buy a specific product or churn in a given window. If 95% of your labels are "0" (non-event), a naive model can achieve 95% accuracy by simply predicting "0" for everyone, which is useless. You must actively address this imbalance. Common techniques include using algorithmic approaches like XGBoost or LightGBM that have built-in parameters for handling imbalance, applying class weighting to tell the model that the rare "1" class is more important to get right, or carefully using downsampling of the majority class. The choice depends on your data volume and the specific algorithm.
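
Two of these options can be sketched in a few lines. The weight below follows the common convention for XGBoost's `scale_pos_weight` parameter (the ratio of negatives to positives); the 4:1 downsampling ratio is an illustrative choice, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1] * 50 + [0] * 950)  # 5% positive class, as in the text

# Option 1: class weighting. XGBoost's scale_pos_weight is commonly set
# to the ratio of negatives to positives.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()  # 950 / 50 = 19.0

# Option 2: downsample the majority class to a chosen ratio (here 4:1).
neg_idx = np.flatnonzero(y == 0)
pos_idx = np.flatnonzero(y == 1)
keep_neg = rng.choice(neg_idx, size=4 * len(pos_idx), replace=False)
train_idx = np.concatenate([pos_idx, keep_neg])  # 50 positives + 200 negatives
```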

Model Validation Beyond Accuracy

Once you have a trained model that outputs a probability score between 0 and 1 for each customer, you must validate its quality. Standard classification metrics like accuracy are misleading under imbalance. Instead, you should rely on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The ROC curve plots the true positive rate against the false positive rate at various thresholds. An AUC of 0.5 indicates a random model, while an AUC closer to 1.0 indicates strong predictive power. A value above 0.75 is often considered good for marketing applications.
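
In practice you would call `sklearn.metrics.roc_auc_score`, but the metric itself is small enough to sketch directly: AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative (the rank-sum formulation).

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive outscores a randomly chosen negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Average ranks over ties so tied scores contribute 0.5 each.
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = (y_true == 1).sum()
    n_neg = (y_true == 0).sum()
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A model that ranks every positive above every negative scores 1.0; reversing half the pairs pulls it toward 0.5.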

Crucially, you need calibrated probability estimation. A well-calibrated model means that if it predicts a 70% propensity, the customer truly has a 70% chance of taking the action. Some powerful algorithms, like boosted trees, can produce biased (miscalibrated) probabilities. You can correct this by applying a post-processing technique like Platt Scaling (logistic regression on the model scores) or Isotonic Regression. Calibrated probabilities are non-negotiable for the next step: using an expected value framework to set action thresholds.
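
Platt Scaling is just a one-feature logistic regression fitted on held-out scores. The sketch below fits it with plain gradient descent on the log loss so it stays dependency-free; in practice you would use scikit-learn's `CalibratedClassifierCV` with `method='sigmoid'`.

```python
import numpy as np

def fit_platt(scores, y, lr=0.1, steps=2000):
    """Platt scaling sketch: fit p = sigmoid(a*score + b) to held-out
    labels by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    s = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = p - y                      # d(log loss)/d(logit)
        a -= lr * (grad * s).mean()
        b -= lr * grad.mean()
    return a, b
```

Fit `(a, b)` on a calibration split, then map every raw model score `s` to `sigmoid(a*s + b)` before using it in the expected value framework.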

The most insightful validation comes from decile analysis. You sort all customers in a hold-out validation set (data not used for training) from highest to lowest predicted propensity score and split them into ten equal groups (deciles). You then calculate the actual response rate (e.g., % who bought) in each decile. A powerful model will show a steep gradient, with the top decile having a response rate many times higher than the average. This analysis directly answers the business question: "If we target only the top 10% of customers, what lift in response can we expect?"
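
The whole analysis is a sort, a split into ten bins, and a group-wise mean. The sketch below uses synthetic hold-out data in which the true response probability rises with the score, so the gradient across deciles is visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
scores = rng.uniform(0, 1, 10_000)                  # hold-out propensity scores
actual = rng.uniform(0, 1, 10_000) < scores * 0.2   # response prob. rises with score

df = pd.DataFrame({"score": scores, "responded": actual.astype(int)})
# Decile 1 = the highest-scoring 10% of customers.
df["decile"] = pd.qcut(
    df["score"].rank(method="first", ascending=False), 10, labels=range(1, 11)
)
# Actual response rate within each decile.
decile_table = df.groupby("decile", observed=True)["responded"].mean()
```

A healthy model shows a steep, monotonic drop from decile 1 down to decile 10; a flat table means the scores carry little signal.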

This is visualized formally with lift and gain charts. A lift chart shows how much better your model performs at a given targeting depth compared to a random targeting baseline. For example, a lift of 3 in the top 5% means you get 3 times the responses by targeting your model's top 5% versus randomly selecting 5% of customers. A cumulative gains chart plots the cumulative percentage of all actual responders captured as you target from the top percentile down. It shows the efficiency of your targeting; a perfect model would capture 100% of responders in the top deciles.
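
Both quantities come from the same sorted list, so a single helper can return them. This is a minimal sketch; plotting libraries are omitted and `pct` is the fraction of customers targeted.

```python
import numpy as np

def lift_and_gain(y_true, scores, pct):
    """Lift and cumulative gain when targeting the top `pct` fraction of
    customers by score, versus random targeting."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]            # highest scores first
    k = max(1, int(round(pct * len(y_true))))
    captured = y_true[order][:k].sum()
    gain = captured / y_true.sum()              # share of all responders captured
    lift = (captured / k) / y_true.mean()       # response rate vs. baseline
    return lift, gain
```

Evaluating `lift_and_gain` over a grid of `pct` values (e.g. 0.05, 0.10, ..., 1.0) traces out the full lift and cumulative gains curves.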

From Prediction to Business Action and Measurement

A raw propensity score is not an instruction. To decide whom to target, you need an expected value framework for action thresholds. This combines your calibrated probability (p) with the economics of your campaign: Expected Value = p × V − C. Here, V is the net profit from a conversion, and C is the cost of your marketing intervention (e.g., discount, phone call). You calculate the breakeven probability where Expected Value = 0: p* = C / V. You should only target customers whose propensity score exceeds this p*, as they are predicted to be profitable.
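
A worked example with assumed economics (V = $40 net profit per conversion, C = $5 per intervention; both numbers are illustrative):

```python
# Assumed campaign economics: V = net profit per conversion, C = cost
# per intervention. Breakeven propensity p* satisfies p*V - C = 0.
V, C = 40.0, 5.0
p_star = C / V                      # 5 / 40 = 0.125

# Target only customers whose calibrated propensity exceeds p*.
propensities = [0.30, 0.12, 0.125, 0.50, 0.05]
targeted = [p for p in propensities if p > p_star]
expected_profit = sum(p * V - C for p in targeted)  # (0.3*40-5) + (0.5*40-5)
```

Note that a customer scoring exactly at p* has zero expected value, so targeting them is a wash; everyone strictly above it contributes positive expected profit.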

Finally, you must A/B test propensity-driven interventions. Deploying your model is not the end. You need to run a randomized controlled trial to measure its incremental impact. A standard design is:

  • Treatment Group (Model-Driven): Customers above the action threshold receive the marketing intervention.
  • Control Group A (Holdout): A random sample of customers above the threshold receives no intervention. Their organic conversion rate is the counterfactual baseline for the treatment group, isolating the campaign's incremental effect on high-propensity customers.
  • Control Group B (Random): A random sample of customers below the threshold receives the intervention. Their response rate shows what the campaign achieves without the model's targeting, validating that the threshold is set correctly.

By comparing the response rates between these groups, you can isolate the true incremental lift generated by your propensity model and validate its return on investment.
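
The assignment logic above can be sketched as follows. The group sizes (10% holdout above the threshold, 5% sample below) and the threshold itself are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
scores = rng.uniform(0, 1, 1000)   # calibrated propensity scores
threshold = 0.6                    # action threshold from the EV framework

groups = np.full(len(scores), "none", dtype=object)
above = scores >= threshold

# Above the threshold: 90% treatment, 10% holdout (Control A).
holdout = above & (rng.uniform(size=len(scores)) < 0.10)
groups[above] = "treatment"
groups[holdout] = "control_a_holdout"

# Below the threshold: a small random slice gets the intervention (Control B).
below_sample = ~above & (rng.uniform(size=len(scores)) < 0.05)
groups[below_sample] = "control_b_random"
```

Incremental lift is then the treatment group's response rate minus the holdout's, and a standard two-proportion test tells you whether the difference is statistically significant.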

Common Pitfalls

  1. Leaking the Future: The most critical error is using information that would not have been available at the time of prediction. If your target is "purchase in June," all your feature engineering must use data only from May or earlier. Including a feature like "visited the checkout page in June" invalidates the model, as that visit is part of the purchase event you're trying to predict.
  2. Ignoring Probability Calibration: Using uncalibrated scores for your expected value calculation will lead to targeting the wrong customers. If your model's "80% score" is really only a 50% chance, you will target unprofitable segments. Always check calibration on a hold-out set and apply scaling if needed.
  3. Optimizing for the Wrong Metric: Maximizing AUC-ROC is good for overall model selection, but the final business decision depends on the lift in the top deciles and the expected value. A model with a slightly lower AUC but a much sharper concentration of responders in the top 5% may be far more valuable for a budget-constrained campaign. Always evaluate with decile analysis and business metrics.
  4. Skipping the A/B Test: Assuming your model's lift in historical validation will translate directly to the same lift in a live campaign is a mistake. Market conditions change, and the act of marketing itself can influence behavior. An A/B test is the only way to credibly measure the causal impact and true ROI of your propensity modeling program.

Summary

  • Propensity models predict the likelihood of a customer action by learning from historical behavioral data. Success starts with precise business objective definition and rigorous feature engineering that avoids data leakage.
  • Address class imbalance directly using algorithmic choices, weighting, or sampling, and validate models using AUC-ROC and, more importantly, decile analysis and lift/gain charts to understand performance where it matters most.
  • Probability scores must be calibrated to reflect true likelihoods. Use these calibrated scores within an expected value framework to set profitable targeting thresholds based on your campaign economics.
  • The ultimate validation is a live A/B test that measures the incremental impact of acting on the model's predictions compared to random targeting or taking no action.
