Interpretation of Neural Networks Is Fragile
Indexed in Crossref
Abstract
For machine learning to be trusted in many applications, it is critical to be able to reliably explain why a machine learning algorithm makes certain predictions. For this reason, a variety of methods have recently been developed to interpret neural network predictions by providing, for example, feature importance maps. For both scientific robustness and security reasons, it is important to know to what extent these interpretations can be altered by small systematic perturbations to the input data, which might be generated by adversaries or by measurement biases. In this paper, we demonstrate how to generate adversarial perturbations that produce perceptually indistinguishable inputs that are assigned…
Citation impact
- Total citations: 675
- FWCI: 48.67
- Percentile: 100%
- References: 26
Authors: 3
Topics & keywords
Keywords
- Robustness (evolution)
- Computer science
- Artificial intelligence
- Adversarial system
- Hessian matrix
- Machine learning
- Artificial neural network
- Interpretation (philosophy)