Understanding Black-box Predictions via Influence Functions
Indexed in: arXiv, DataCite
Abstract
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural…
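To make the abstract's recipe concrete, here is a minimal sketch of an influence computation that touches the model only through gradients and Hessian-vector products. It is not the authors' implementation. It assumes a toy ridge regression, where the Hessian is positive definite and the theory holds exactly, and it uses plain conjugate gradient to apply the inverse Hessian. The data, the regularization strength `lam`, and all function names are hypothetical. The score for a training point z is the upweighting influence on the test loss, -grad L(z_test)^T H^{-1} grad L(z), with only the data-fit term of the loss being upweighted.

```python
# Toy influence-function sketch for ridge regression (illustrative only).
# Objective: R(theta) = (1/n) * sum_i 0.5*(theta.x_i - y_i)^2 + 0.5*lam*|theta|^2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Return a*x + y for plain Python lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def grad_loss(theta, x, y):
    """Gradient of the squared-error loss 0.5*(theta.x - y)^2 w.r.t. theta."""
    r = dot(theta, x) - y
    return [r * xi for xi in x]

def hvp(X, lam, v):
    """Hessian-vector product H v, computed without forming H explicitly."""
    out = [lam * vi for vi in v]
    for x in X:
        out = axpy(dot(x, v) / len(X), x, out)
    return out

def conjugate_gradient(matvec, b, iters=50, tol=1e-20):
    """Solve A s = b for symmetric positive-definite A, given only s -> A s."""
    s = [0.0] * len(b)
    r = list(b)          # residual b - A s (s starts at zero)
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        s = axpy(alpha, p, s)
        r = axpy(-alpha, Ap, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs, p, r)
        rs = rs_new
    return s

# Hypothetical training set (bias feature plus one input) and test point.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [0.1, 0.9, 2.2, 2.8]
lam = 0.1
x_test, y_test = [1.0, 1.5], 2.0

def H(v):
    return hvp(X, lam, v)

# Fit: the gradient of R is H theta - b, so the minimizer solves H theta = b.
b = [sum(y * x[j] for x, y in zip(X, Y)) / len(X) for j in range(2)]
theta = conjugate_gradient(H, b)

# s_test = H^{-1} grad L(z_test), computed once and reused for every training point.
g_test = grad_loss(theta, x_test, y_test)
s_test = conjugate_gradient(H, g_test)

# Influence of upweighting each training point on the test loss.
influences = [-dot(s_test, grad_loss(theta, x, y)) for x, y in zip(X, Y)]

for i, score in enumerate(influences):
    print(f"training point {i}: influence on test loss = {score:+.4f}")
```

A negative score means that upweighting the training point would lower the loss on the test point, and a positive score means it would raise it. Solving once for `s_test` and reusing it across all training points is what keeps the cost to one inverse-Hessian-vector solve per test point rather than one per training point.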
Citation impact
- Total citations: 1,190
- FWCI: n/a
- Percentile: n/a
- References: 38
Authors: 2
Topics & keywords
Topics
Keywords
- Computer science
- Black box
- Debugging
- Hessian matrix
- Oracle
- Set (abstract data type)
- Machine learning
- Differentiable function