Abstract

As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Domain experts are using such explanations to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used…
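The scaffolding idea can be illustrated with a toy, hypothetical example. This sketch assumes the adversary can tell real inputs apart from the synthetic points a perturbation-based explainer generates. Here that is done with a deliberately simple check, `looks_real`, which accepts only integer-valued records. The attribution is a simplified perturb-one-feature score, not the actual LIME or SHAP algorithm. All names (`biased_model`, `innocuous_model`, `looks_real`, `scaffolded_model`, `perturbation_importance`) are illustrative assumptions.

```python
import random

# Hypothetical toy setup: feature 0 is a sensitive attribute, feature 1 is an
# unrelated feature. In this toy, real records take integer values only.
SENSITIVE, UNRELATED = 0, 1


def biased_model(x):
    # Decision depends only on the sensitive attribute.
    return 1 if x[SENSITIVE] >= 1 else 0


def innocuous_model(x):
    # Decision depends only on the unrelated feature.
    return 1 if x[UNRELATED] >= 1 else 0


def looks_real(x):
    # Hypothetical out-of-distribution check: real records are integer-valued,
    # while Gaussian-noise perturbations almost never are.
    return all(float(v).is_integer() for v in x)


def scaffolded_model(x):
    # Route in-distribution inputs to the biased model and
    # perturbation-like inputs to the innocuous one.
    return biased_model(x) if looks_real(x) else innocuous_model(x)


def perturbation_importance(model, x, n_samples=500, scale=1.0, seed=0):
    """Toy perturbation-based attribution (not LIME or SHAP themselves):
    for each feature, add Gaussian noise to that feature only and return
    the fraction of samples whose prediction differs from the original."""
    rng = random.Random(seed)
    base = model(x)
    scores = []
    for i in range(len(x)):
        flips = 0
        for _ in range(n_samples):
            z = list(x)
            z[i] = z[i] + rng.gauss(0.0, scale)
            flips += int(model(z) != base)
        scores.append(flips / n_samples)
    return scores


record = [1, 1]
print("biased:    ", perturbation_importance(biased_model, record))
print("scaffolded:", perturbation_importance(scaffolded_model, record))
```

On real integer-valued records, the scaffold makes exactly the same decisions as the biased model. Yet for the scaffold, the perturbation-based scores give the sensitive feature zero importance, and roughly half of the perturbations of the unrelated feature flip the prediction. The explanation therefore points at a feature that never drives real decisions.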

Citation impact

Total citations: 753
FWCI: 46.92
Percentile: 100%
References: 17

Authors: 5
