Estimating predicted probabilities from logistic regression: different methods correspond to different target populations
Abstract
We review three common methods for estimating predicted probabilities after confounder-adjusted logistic regression: marginal standardization (individual predicted probabilities combined into a weighted average whose weights reflect the confounder distribution in the target population); prediction at the modes (conditional predicted probabilities calculated by setting each confounder to its modal value); and prediction at the means (predicted probabilities calculated by setting each confounder to its mean value). That each method corresponds to a different target population is underappreciated in practice. Specifically, prediction at the means is often incorrectly interpreted as estimating average probabilities for the overall study population, and it yields nonsensical estimates in the presence of dichotomous confounders, because the mean of a 0/1 confounder is a proportion that describes no actual individual. Default commands in popular statistical software packages often lead to inadvertent misapplication of prediction at the means.
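To make the three definitions concrete, the estimators can be written out for an assumed main-effects model with exposure X and confounder vector C. The notation below (beta, gamma, weights w_i, modal profile c^mode) is introduced here for illustration and is not taken from the paper; it is a sketch of the definitions, not the authors' formal presentation.

```latex
\begin{align*}
\operatorname{logit}\Pr(Y=1 \mid X=x,\ \mathbf{C}=\mathbf{c})
  &= \beta_0 + \beta_1 x + \boldsymbol{\gamma}^{\top}\mathbf{c},
  \qquad \operatorname{expit}(u) = \frac{1}{1+e^{-u}} \\[4pt]
\hat{p}_{\mathrm{MS}}(x)
  &= \sum_{i=1}^{n} w_i\,\operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 x + \hat{\boldsymbol\gamma}^{\top}\mathbf{c}_i\right),
  \qquad \sum_{i=1}^{n} w_i = 1 \\[4pt]
\hat{p}_{\mathrm{mode}}(x)
  &= \operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 x + \hat{\boldsymbol\gamma}^{\top}\mathbf{c}^{\mathrm{mode}}\right) \\[4pt]
\hat{p}_{\mathrm{mean}}(x)
  &= \operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 x + \hat{\boldsymbol\gamma}^{\top}\bar{\mathbf{c}}\right),
  \qquad \bar{\mathbf{c}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{c}_i
\end{align*}
```

Setting w_i = 1/n standardizes to the study population; other weights standardize to another target population. Because expit is nonlinear, the probability evaluated at the average confounder values generally differs from the average of the individual probabilities, so the prediction at the means does not equal marginal standardization with w_i = 1/n. For a dichotomous confounder, the corresponding component of the mean vector is a proportion strictly between 0 and 1 whenever both categories occur.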
Using an applied example, we demonstrate discrepancies in predicted probabilities across these methods, discuss implications for interpretation and provide syntax for SAS and Stata.
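The discrepancy can be reproduced numerically with a small Python sketch. This is not the paper's SAS or Stata syntax, and it does not use the paper's data: the coefficients in `COEF` and the 100-person `SAMPLE` are hypothetical values chosen for illustration, and the model is assumed to contain only main effects of one exposure and two dichotomous confounders. The functions compute marginal standardization, prediction at the modes and prediction at the means from the same fitted linear predictor.

```python
import math
from collections import Counter

# Hypothetical coefficients of an already-fitted main-effects logistic model:
#   logit P(Y=1 | X=x, sex, smoker) = b0 + bx*x + g_sex*sex + g_smoker*smoker
# Illustrative values only; they are NOT estimates from the paper's example.
COEF = {"intercept": -3.0, "x": 1.0, "sex": 1.0, "smoker": 2.5}
CONFOUNDERS = ("sex", "smoker")

# Hypothetical study sample of 100 people with two dichotomous confounders.
SAMPLE = (
    [{"sex": 1, "smoker": 1}] * 10
    + [{"sex": 1, "smoker": 0}] * 30
    + [{"sex": 0, "smoker": 1}] * 20
    + [{"sex": 0, "smoker": 0}] * 40
)


def expit(u: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-u))


def predict(x: float, c: dict) -> float:
    """Predicted probability for exposure level x and one confounder profile c."""
    lp = COEF["intercept"] + COEF["x"] * x
    lp += sum(COEF[k] * c[k] for k in CONFOUNDERS)
    return expit(lp)


def marginal_standardization(x, sample, weights=None):
    """Weighted average of individual predictions over the target population.

    With weights=None each person gets weight 1/n, so the target population is
    the study sample itself; other weights (summing to 1) standardize to a
    different confounder distribution.
    """
    if weights is None:
        weights = [1.0 / len(sample)] * len(sample)
    return sum(w * predict(x, c) for w, c in zip(weights, sample))


def prediction_at_modes(x, sample):
    """One prediction with each confounder set to its most common value."""
    modal = {k: Counter(c[k] for c in sample).most_common(1)[0][0]
             for k in CONFOUNDERS}
    return predict(x, modal)


def prediction_at_means(x, sample):
    """One prediction with each confounder set to its sample mean.

    For a 0/1 confounder the mean is a proportion (0.4 for sex here), a
    covariate value that no individual in the sample actually has.
    """
    means = {k: sum(c[k] for c in sample) / len(sample) for k in CONFOUNDERS}
    return predict(x, means)


if __name__ == "__main__":
    for x in (0, 1):
        print(
            f"x={x}: standardized={marginal_standardization(x, SAMPLE):.3f} "
            f"modes={prediction_at_modes(x, SAMPLE):.3f} "
            f"means={prediction_at_means(x, SAMPLE):.3f}"
        )
```

With these made-up inputs the three methods give clearly different probabilities for the same exposure level, because each one answers a question about a different target population: the whole sample, the most common confounder profile, or a fictitious "average" person.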
Topics & keywords
- Statistics
- Logistic regression
- Econometrics
- Confounding
- Inference
- Population
- Causal inference
- Regression analysis
- Reduced inequalities