
Partial Dependence and ICE Plots


Understanding why a complex machine learning model makes a certain prediction is just as important as its accuracy. When stakeholders need to trust a model's logic or when you need to validate its behavior, visualizing how features influence predictions becomes essential. Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots are powerful tools that illuminate the marginal effects of features, bridging the gap between model complexity and human interpretability.

What Are Partial Dependence Plots (PDPs)?

A Partial Dependence Plot (PDP) shows the average relationship between a target feature and the model's predicted outcome, while marginalizing over the values of all other features. It answers the question: "All else being equal, how does changing this one feature affect the average model prediction?" The PDP is a global explanation method, providing a summary of the model's behavior across the entire dataset.

The partial dependence function for a feature of interest, $x_S$, is calculated as the average prediction when $x_S$ is forced to a specific value across all instances in your dataset. Mathematically, it is estimated as:

$$\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\big(x_S, x_C^{(i)}\big)$$

Here, $\hat{f}$ is the trained model's prediction function, $x_S$ is the feature we want to plot, and $x_C^{(i)}$ represents the values of all other features for the $i$-th data point in our dataset. In practice, you create a grid of values for $x_S$. For each grid value, you replace the original value in every row of your dataset with this constant, get a new prediction from the model for each row, and then average all those predictions to get one point on the PDP.
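The grid-and-average procedure above can be sketched in a few lines. This is a minimal illustration, not a library API: `partial_dependence` and the toy data are assumptions for the example, and any fitted model exposing a `.predict` method would work in place of the gradient-boosted regressor used here.

```python
# Minimal sketch of PDP estimation: force the feature to each grid value
# across all rows, predict, and average. Names here are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def partial_dependence(model, X, feature_idx, grid):
    """Average prediction with X[:, feature_idx] forced to each grid value."""
    pdp_values = []
    for value in grid:
        X_modified = X.copy()
        X_modified[:, feature_idx] = value  # overwrite the feature everywhere
        pdp_values.append(model.predict(X_modified).mean())
    return np.array(pdp_values)

# Toy data: the target rises linearly with feature 0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
y = 3.0 * X[:, 0] + X[:, 1]

model = GradientBoostingRegressor(random_state=0).fit(X, y)
grid = np.linspace(0.1, 0.9, 5)
pdp = partial_dependence(model, X, feature_idx=0, grid=grid)
# With a linear true effect, the PDP curve should be increasing.
```

Each entry of `pdp` is one point on the plot; connecting them over a finer grid yields the familiar PDP curve.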

For example, in a model predicting house prices, a PDP for "square footage" would show the average predicted price as square footage increases from a minimum to a maximum value, while holding all other features (like number of bedrooms, location) at their real, observed values for each house in the data.

Individual Conditional Expectation (ICE) Plots: Seeing the Trees in the Forest

While a PDP shows an average effect, it can hide heterogeneous relationships—instances where the feature's effect varies. An Individual Conditional Expectation (ICE) plot addresses this by displaying one line per instance, showing how the prediction for that specific instance changes as the feature of interest changes.

Each ICE curve is generated by taking a single data instance and calculating its prediction as you vary $x_S$ across its grid, keeping all its other feature values ($x_C$) constant. The PDP is simply the average of all ICE lines at each grid point.

ICE plots are invaluable for uncovering interactions. If all ICE lines are parallel, it suggests the feature's effect is consistent for all instances (additive). If the lines cross or have different slopes, it signals that the feature interacts with other features in the model. For instance, in a credit risk model, the ICE plot for "income" might show a strongly positive slope for most applicants but a flat or negative slope for those with a very high number of existing loans, revealing an interaction between income and debt burden.
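The same sweep, done per instance rather than averaged, produces the ICE matrix. The sketch below uses illustrative names and a deliberately constructed interaction (the sign of feature 1 flips the slope of feature 0) to show how ICE lines can cross while the averaged PDP stays nearly flat:

```python
# One prediction curve per instance as the feature sweeps the grid.
# The PDP is recovered by averaging the ICE curves column-wise.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ice_curves(model, X, feature_idx, grid):
    """Return a matrix with one ICE line per row of X."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_modified = X.copy()
        X_modified[:, feature_idx] = value
        curves[:, j] = model.predict(X_modified)
    return curves

# Toy interaction: feature 0's slope flips with the sign of feature 1.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(600, 2))
y = X[:, 0] * np.sign(X[:, 1])

model = GradientBoostingRegressor(random_state=0).fit(X, y)
grid = np.linspace(-0.8, 0.8, 5)
curves = ice_curves(model, X, feature_idx=0, grid=grid)

slopes = curves[:, -1] - curves[:, 0]  # per-instance end-to-end slope
pdp = curves.mean(axis=0)              # PDP = average of the ICE lines
# Some slopes are positive and some negative (crossing ICE lines),
# while the averaged PDP is nearly flat.
```

This is exactly the "hidden heterogeneity" case: the PDP alone would suggest feature 0 has no effect.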

Extending to Interactions and Handling Correlated Features

You can create a two-way PDP to visualize the interaction effect between two features. This generates a contour or heatmap surface showing the average predicted outcome as two features are simultaneously varied. It's a direct way to see if the effect of one feature depends on the level of another. If the contour lines are parallel, the interaction is weak; if they converge or curve, a meaningful interaction is present.
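A two-way PDP is the same computation with two features fixed simultaneously. The sketch below is illustrative (the `pdp_2d` helper and toy data are assumptions, not a library function); the "double difference" across the surface corners gives a quick numeric check for interaction, since it would be near zero for a purely additive model:

```python
# Two-way PDP: sweep two features jointly and average predictions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pdp_2d(model, X, i, j, grid_i, grid_j):
    """Average prediction over the data with features i and j both fixed."""
    surface = np.empty((len(grid_i), len(grid_j)))
    for a, vi in enumerate(grid_i):
        for b, vj in enumerate(grid_j):
            X_mod = X.copy()
            X_mod[:, i] = vi
            X_mod[:, j] = vj
            surface[a, b] = model.predict(X_mod).mean()
    return surface

# Toy data with a multiplicative interaction between features 0 and 1.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(500, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]

model = GradientBoostingRegressor(random_state=0).fit(X, y)
grid = np.array([0.1, 0.5, 0.9])
surface = pdp_2d(model, X, i=0, j=1, grid_i=grid, grid_j=grid)

# Double difference across the corners: near zero if the two features
# act additively, clearly nonzero when they interact.
interaction = surface[-1, -1] - surface[-1, 0] - surface[0, -1] + surface[0, 0]
```

In practice you would render `surface` as a contour or heatmap; the corner check is just a sanity test that the visual pattern reflects a real interaction.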

A critical assumption of standard PDPs is feature independence. When features are correlated, the PDP can create misleading plots by averaging over impossible or highly unlikely data points. For example, if you have a model with "latitude" and "average temperature" as features, a PDP for latitude will average predictions where latitude is arctic but temperature is tropical—a combination not found in the real data. This leads to extrapolations that may not reflect the model's actual behavior in the feasible data space.

Accumulated Local Effects (ALE) Plots: A Robust Alternative

Accumulated Local Effects (ALE) plots were developed specifically to handle correlated features. Instead of averaging predictions over the entire marginal distribution of other features (like PDPs do), ALE plots compute differences in predictions within small intervals of the feature of interest, using only the data points that actually fall in those intervals. They then accumulate these local differences.

The ALE method avoids averaging predictions for unrealistic data combinations, making it more reliable when features are correlated. It shows the main effect of the feature, conditioned on the actual data distribution. While the PDP asks "What is the average prediction when we set $x_S$ to a certain value?", the ALE plot asks "How does the prediction change when $x_S$ changes within a realistic local neighborhood?" In many practical scenarios with real-world, correlated data, ALE plots provide a more faithful visualization of the model's learned relationship.
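A minimal first-order ALE sketch follows the description above: within each interval of the feature, compute prediction differences using only the points that fall in that interval, then accumulate and center. The `ale_1d` helper and toy data are assumptions for illustration, not a reference implementation (libraries such as `alibi` provide production versions):

```python
# First-order ALE: per-interval prediction differences on the points
# actually inside each interval, accumulated and then centered.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ale_1d(model, X, feature_idx, n_bins=10):
    x = X[:, feature_idx]
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    local_effects = []
    for k in range(n_bins):
        lo, hi = edges[k], edges[k + 1]
        in_bin = (x >= lo) & ((x < hi) if k < n_bins - 1 else (x <= hi))
        if not in_bin.any():
            local_effects.append(0.0)
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature_idx] = lo   # move only the in-bin points to the edges
        X_hi[:, feature_idx] = hi
        local_effects.append((model.predict(X_hi) - model.predict(X_lo)).mean())
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    return edges, ale - ale.mean()  # center the accumulated curve

# Toy data where features 0 and 1 are strongly correlated.
rng = np.random.default_rng(3)
x0 = rng.uniform(0, 1, size=800)
x1 = x0 + rng.normal(0, 0.05, size=800)  # nearly collinear with x0
X = np.column_stack([x0, x1])
y = 2.0 * x0 + x1

model = GradientBoostingRegressor(random_state=0).fit(X, y)
edges, ale = ale_1d(model, X, feature_idx=0)
# The ALE curve for feature 0 should rise across its range.
```

Because each interval only uses nearby, realistic points, the method never asks the model about an arctic-latitude, tropical-temperature combination.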

Applying Feature Effect Plots for Validation and Communication

These visualization tools are not just diagnostic; they are central to the model development and deployment lifecycle. For model validation, you should always check if the relationships shown in PDPs and ICE plots align with domain knowledge and common sense. A model predicting health outcomes that shows risk decreasing with age might be a red flag requiring investigation. These plots can also reveal unexpected interactions or non-linearities that you can then test for significance.

For stakeholder communication, a well-crafted PDP is often the most effective way to build trust. It translates a "black box" model into an intuitive, graphical story. You can show business leaders: "See, the model predicts that customer churn risk plateaus after three support calls, which aligns with our team's experience." ICE plots can be used to illustrate segments—"For most customers, discount increases purchase probability, but for this loyal segment, it actually has little effect." This moves the conversation from blind faith in an algorithm to informed discussion about its logical outputs.

Common Pitfalls

  1. Ignoring Feature Correlation: As discussed, using standard PDPs on correlated features leads to extrapolation and misleading interpretations. Correction: Always check feature correlation matrices. If strong correlations exist with the feature of interest, prefer ALE plots for a more accurate view of the model's behavior within the data manifold.
  2. Over-interpreting the PDP as a Causal Effect: A PDP shows an associative relationship within the context of the model, not necessarily a causal one. The observed effect is contingent on which other features are included and how the model has used them. Correction: Frame interpretations cautiously: "The model associates higher values of X with increased Y," not "Higher X causes higher Y."
  3. Missing Heterogeneity by Only Using PDPs: Relying solely on the average line of a PDP can mask subgroup effects or strong interactions, leading you to believe a relationship is uniform when it is not. Correction: Always generate ICE plots alongside PDPs. Look for differing slopes and patterns among the ICE lines to uncover hidden interactions.
  4. Using Too Many ICE Lines: Plotting ICE lines for thousands of instances creates an unreadable, solid block of color. Correction: Use subsampling (e.g., plot 100-200 randomly selected instances) or cluster the ICE lines and plot a representative from each cluster. This maintains the insight into heterogeneity without the visual noise.
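The fix for the last pitfall can be sketched directly. Here `curves` is a stand-in matrix with one precomputed ICE line per row (an assumption for the example); both the random subsample and the cluster representatives are what you would actually plot:

```python
# Taming thousands of ICE lines: random subsampling, or k-means
# clustering with one representative mean curve per cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
curves = rng.normal(size=(5000, 20))  # stand-in: 5000 ICE lines, 20 grid points

# Option 1: plot only a random subsample of the lines.
idx = rng.choice(curves.shape[0], size=150, replace=False)
subsample = curves[idx]

# Option 2: cluster the lines and plot one mean curve per cluster.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(curves)
representatives = kmeans.cluster_centers_
```

Clustering has the advantage of surfacing distinct behavioral segments (e.g., the "loyal customer" group mentioned earlier) rather than a random cross-section.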

Summary

  • Partial Dependence Plots (PDPs) visualize the average marginal effect of one or two features on a model's predictions, providing a global model summary ideal for stakeholder communication.
  • Individual Conditional Expectation (ICE) plots reveal per-instance effects, allowing you to diagnose heterogeneity, uncover interactions, and see beyond the average trend shown in a PDP.
  • For investigating feature interactions, two-way PDPs are an excellent visual tool to see how the combined effect of two features differs from their individual effects.
  • When features are correlated, standard PDPs can be unreliable due to extrapolation; Accumulated Local Effects (ALE) plots are a robust alternative that computes effects conditioned on the actual data distribution.
  • These visualizations are critical for model validation (checking for plausible relationships) and stakeholder communication (translating model logic into intuitive graphics), forming a cornerstone of responsible and interpretable machine learning.
