Article · Computation · Jan 15, 2026 · Gold OA

The Health-Wealth Gradient in Labor Markets: Integrating Health, Insurance, and Social Metrics to Predict Employment Density

University of Pennsylvania · Boston University · +2 more institutions

Indexed in: Crossref, DOAJ

Abstract

Methods

We constructed a multi-source longitudinal dataset (2014–2024) by combining county-level Quarterly Census of Employment and Wages (QCEW) data with County Health Rankings and aggregating both to the state level. Using a time-aware split to evaluate performance across the COVID-19 structural break, we compared LASSO, Random Forest, and regularized XGBoost models, and used SHAP values for interpretability.
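A time-aware split of a state-year panel can be sketched as follows. This is only an illustration, not the authors' code: the `Obs` record, the `time_aware_split` helper, the toy states, and the 2019 cutoff are all assumptions, since the abstract does not state where the split falls.

```python
from typing import List, NamedTuple, Tuple


class Obs(NamedTuple):
    """One state-year observation (hypothetical schema, not from the paper)."""
    state: str
    year: int
    employment_density: float


def time_aware_split(rows: List[Obs], last_train_year: int) -> Tuple[List[Obs], List[Obs]]:
    """Split by calendar year so every test row is later than every training row."""
    train = [r for r in rows if r.year <= last_train_year]
    test = [r for r in rows if r.year > last_train_year]
    return train, test


# Toy panel covering 2014-2024 for two states; the values are placeholders.
rows = [Obs(s, y, 0.0) for s in ("PA", "MA") for y in range(2014, 2025)]

# Assumed cutoff: train on pre-2020 years, test on 2020 onward.
train, test = time_aware_split(rows, last_train_year=2019)
print(len(train), len(test))
```

Splitting on the year rather than shuffling rows keeps post-2019 observations out of training, so the test score reflects how a model fitted before the break performs after it.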

Results

The tuned, regularized XGBoost model achieved strong out-of-sample performance (Test R² = 0.800). A leakage-safe stacked Ridge ensemble yielded comparable performance (Test R² = 0.827) while preserving the interpretability of the underlying tree model used for SHAP analysis.
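The abstract does not say how the stacked ensemble was made leakage-safe. One common approach, sketched here under that caveat, is to train the Ridge meta-learner only on out-of-fold predictions from expanding-window folds, so each base-model prediction comes from a model that never saw its target year. The `expanding_year_folds` helper and the choice of three initial training years are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_year_folds(
    years: Sequence[int], min_train_years: int
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_years, holdout_year) pairs with strictly earlier training years."""
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        # Train on every year before position i, hold out the year at position i.
        yield ordered[:i], ordered[i]


# Assumed training window 2014-2019 with at least three years of history per fold.
folds = list(expanding_year_folds(range(2014, 2020), min_train_years=3))
for train_years, holdout_year in folds:
    print(train_years, "->", holdout_year)
```

In such a setup, each base model would be fitted on `train_years` and asked to predict `holdout_year`; the collected predictions would then serve as features for the meta-learner, with the final test years left untouched throughout.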

Citation impact

Total citations: 13
FWCI: 355.35
Percentile: 100%
References: 17

Authors

3

Topics & keywords

Keywords
  • Interpretability
  • Census
  • Random forest
  • Workforce
  • Population
  • Ridge
  • Geocoding
UN Sustainable Development Goals
  • Decent work and economic growth