Interpretation of Neural Networks Is Fragile

Stanford University

Indexed in Crossref

Abstract

In order for machine learning to be trusted in many applications, it is critical to be able to reliably explain why the machine learning algorithm makes certain predictions. For this reason, a variety of methods have been developed recently to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent the interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptively indistinguishable inputs that are assigned…

Citation impact

Total citations: 675
FWCI: 48.67
Percentile: 100%
References: 26

Authors

3

Topics & keywords

Keywords
  • Robustness (evolution)
  • Computer science
  • Artificial intelligence
  • Adversarial system
  • Hessian matrix
  • Machine learning
  • Artificial neural network
  • Interpretation (philosophy)