Understanding Black-box Predictions via Influence Functions

Koh, Pang Wei; Liang, Percy

doi:10.48550/arxiv.1703.04730

Preprint | arXiv (Cornell University) | Mar 14, 2017 | Green OA

Pang Wei Koh, Percy Liang

Stanford University

Indexed in: arXiv, DataCite

Abstract

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural…
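To make the idea of needing only gradients and Hessian-vector products concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the usual up-weighting form of the influence of a training point z on the loss at a test point, -grad L(z_test)^T H^{-1} grad L(z), and it assumes a tiny L2-regularized logistic regression so that the Hessian is positive definite. The function names, the `lam` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * theta.x)), with y in {-1, +1}."""
    s = sigmoid(-y * dot(theta, x))
    return [-y * s * xi for xi in x]

def hvp(theta, data, v, lam):
    """Hessian-vector product of the mean training loss plus (lam / 2) * ||theta||^2.

    The Hessian is never formed; only its action on v is computed.
    """
    out = [lam * vi for vi in v]
    n = len(data)
    for x, _ in data:
        p = sigmoid(dot(theta, x))
        w = p * (1.0 - p) * dot(x, v) / n
        out = [o + w * xi for o, xi in zip(out, x)]
    return out

def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Solve A s = b for symmetric positive definite A, given only v -> A v."""
    s = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return s

def fit(data, lam, steps=2000, lr=0.1):
    """Plain gradient descent on the regularized mean loss (illustrative only)."""
    theta = [0.0] * len(data[0][0])
    n = len(data)
    for _ in range(steps):
        g = [lam * t for t in theta]
        for x, y in data:
            g = [gi + li / n for gi, li in zip(g, grad_loss(theta, x, y))]
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def influences(theta, data, test_point, lam=0.1):
    """Score each training point by -grad L(z_test)^T H^{-1} grad L(z).

    A negative score means up-weighting that training point is predicted
    to lower the loss on the test point.
    """
    x_test, y_test = test_point
    g_test = grad_loss(theta, x_test, y_test)
    # One linear solve is shared across all training points.
    s_test = conjugate_gradient(lambda v: hvp(theta, data, v, lam), g_test)
    return [-dot(s_test, grad_loss(theta, x, y)) for x, y in data]

if __name__ == "__main__":
    train = [([1.0, 2.0], 1), ([1.0, 2.0], -1), ([2.0, -1.0], 1), ([-1.5, 0.5], -1)]
    theta = fit(train, lam=0.1)
    scores = influences(theta, train, ([1.0, 2.0], 1), lam=0.1)
    for (x, y), s in sorted(zip(train, scores), key=lambda t: t[1]):
        print(x, y, round(s, 4))
```

The design choice worth noting is that the Hessian inverse is never computed explicitly: the solve for `s_test` uses only repeated Hessian-vector products, and the result is reused for every training point. For a non-convex model the Hessian need not be positive definite, so this exact recipe would not carry over without modification.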

Citation impact

Total citations: 1,190
References: 38
Authors: 2

Topics & keywords

Keywords

Computer science
Black box
Debugging
Hessian matrix
Oracle
Set (abstract data type)
Machine learning
Differentiable function
