Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Kim, Been; Wattenberg, Martin; Gilmer, Justin; Cai, Carrie J.; Wexler, James; Viégas, Fernanda; Sayres, Rory

doi:10.48550/arxiv.1711.11279

Preprint, arXiv (Cornell University), Nov 30, 2017, Green OA

Indexed in arXiv

Abstract

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a…
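
The directional-derivative idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the "layer" is random synthetic data, the classifier head is a made-up differentiable function, and the concept direction is taken as a simple difference of means, because the truncated abstract does not say how the vector is obtained. It is not the authors' implementation.

```python
import numpy as np

def concept_direction(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations.

    Assumption: a difference of means stands in for however the concept
    vector is actually learned.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def head_logit_grad(acts, w_head):
    """Gradient of the toy logit sum(w_head * tanh(a)) w.r.t. each activation row."""
    return w_head * (1.0 - np.tanh(acts) ** 2)

def concept_score(acts, w_head, cav):
    """Fraction of inputs whose logit increases along the concept direction."""
    directional = head_logit_grad(acts, w_head) @ cav  # one directional derivative per input
    return float(np.mean(directional > 0))

rng = np.random.default_rng(0)
dim = 8

# Synthetic layer activations: concept examples are shifted along axis 0.
concept_axis = np.zeros(dim)
concept_axis[0] = 1.0
concept_acts = rng.normal(size=(50, dim)) + 3.0 * concept_axis
random_acts = rng.normal(size=(50, dim))
cav = concept_direction(concept_acts, random_acts)

# Toy classifier head that depends only on axis 0 of the layer.
w_head = np.zeros(dim)
w_head[0] = 2.0
class_acts = rng.normal(size=(100, dim))
score = concept_score(class_acts, w_head, cav)
print(f"fraction of inputs with positive concept sensitivity: {score:.2f}")
```

In this toy setup the head reads exactly the axis that separates concept from random activations, so the score is high; flipping the sign of `w_head` drives it to zero. A real model would replace `head_logit_grad` with gradients obtained from the network itself.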

Citation impact

Total citations: 735

FWCI: —
Percentile: —
References: 0

Authors: 7

Topics & keywords

Keywords

Interpretability
Computer science
Artificial intelligence
Interpretation (philosophy)
Image (mathematics)
Domain (mathematical analysis)
Machine learning
Feature (linguistics)