Fooling LIME and SHAP

Slack, Dylan; Hilgard, Sophie; Jia, Emily; Singh, Sameer; Lakkaraju, Himabindu

doi:10.1145/3375627.3375830

Article · Proceedings of the AAAI/ACM Conference on AI Ethics and Society · Feb 5, 2020 · Bronze OA

University of California, Irvine · Harvard University

Indexed in Crossref

Abstract

As machine learning black boxes are increasingly deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Domain experts use such explanations to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used…
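The scaffolding idea can be illustrated with a toy sketch. The abstract text above does not spell out the mechanism, so this assumes the scaffold routes each query through an out-of-distribution check: points that look like real data go to the biased classifier, while points that look like explainer-generated perturbations go to an innocuous one. Every name here (`biased_model`, `innocuous_model`, `looks_perturbed`, `make_scaffold`) is hypothetical, and the detector is a hand-written stand-in rather than a trained model.

```python
from typing import Callable, Sequence

Point = Sequence[float]


def biased_model(x: Point) -> int:
    # Hypothetical biased classifier: decides purely on the sensitive feature x[0].
    return 1 if x[0] > 0.5 else 0


def innocuous_model(x: Point) -> int:
    # Hypothetical innocuous classifier: decides purely on an unrelated feature x[1].
    return 1 if x[1] > 0.5 else 0


def looks_perturbed(x: Point) -> bool:
    # Toy stand-in for an out-of-distribution detector (an assumption for this
    # sketch): in the toy "real" data, x[2] always equals x[0], so a point that
    # breaks this relation is treated as a synthetic perturbation.
    return abs(x[2] - x[0]) > 1e-9


def make_scaffold(
    biased: Callable[[Point], int],
    innocuous: Callable[[Point], int],
    detector: Callable[[Point], bool],
) -> Callable[[Point], int]:
    # Wrap both classifiers: answer with the innocuous one when the query looks
    # perturbed, and with the biased one otherwise.
    def scaffold(x: Point) -> int:
        return innocuous(x) if detector(x) else biased(x)

    return scaffold


scaffold = make_scaffold(biased_model, innocuous_model, looks_perturbed)

real_point = (1.0, 0.0, 1.0)       # consistent with the toy data relation
perturbed_point = (1.0, 0.0, 0.0)  # x[2] changed independently of x[0]
```

On `real_point` the scaffold returns the biased model's decision, while on `perturbed_point` it returns the innocuous model's decision. An explainer that probes the classifier mostly at perturbed points would therefore mostly observe the innocuous behavior.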

Citation impact

Total citations: 753

FWCI: 46.92
Percentile: 100%
References: 17

Authors: 5

Topics & keywords

Keywords

Computer science
Classifier (UML)
Machine learning
Adversarial system
Artificial intelligence
Post hoc
Toolbox
Data science

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
