Multiple Object Recognition with Visual Attention

Ba, Jimmy; Mnih, Volodymyr; Kavukcuoglu, Koray

doi:10.48550/arxiv.1412.7755

Preprint · arXiv (Cornell University) · Dec 24, 2014 · Green OA

Jimmy Ba · Volodymyr Mnih · Koray Kavukcuoglu

University of Toronto · Google (United States) · +1 more institution

Indexed in: arXiv, DataCite

Abstract

We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show that the model learns to both localize and recognize multiple objects despite being given only class labels during training. We evaluate the model on the challenging task of transcribing house number sequences from Google Street View images and show that it is more accurate than state-of-the-art convolutional networks while using fewer parameters and less computation.
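
The abstract describes a network that chooses where to look and is trained with reinforcement learning. This record does not give the architecture, so the following is only a generic sketch of that idea, not the authors' implementation: it crops a small "glimpse" around a sampled location and forms a REINFORCE-style gradient signal for the location policy. The function names, the Gaussian location policy, the toy image, and the reward are all assumptions made for illustration.

```python
import numpy as np

def extract_glimpse(image, center, size):
    """Crop a size x size patch centered at (row, col), zero-padding at the borders.

    Hypothetical helper; `size` is assumed to be odd.
    """
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = int(center[0]), int(center[1])
    # After padding by `half`, the patch centered on original pixel (r, c)
    # starts at index (r, c) in the padded array.
    return padded[r:r + size, c:c + size]

def sample_location(mean, sigma, rng):
    """Sample a location from an assumed Gaussian policy.

    Also returns the gradient of the log-probability with respect to `mean`,
    which is the quantity REINFORCE scales by the reward.
    """
    loc = rng.normal(mean, sigma)
    grad_log_prob = (loc - mean) / sigma**2
    return loc, grad_log_prob

rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[2, 5] = 1.0                      # toy "object" at row 2, column 5

mean = np.array([4.0, 4.0])            # current policy mean (assumed)
loc, grad_log_prob = sample_location(mean, 1.0, rng)
center = np.clip(np.rint(loc), 0, 7).astype(int)
glimpse = extract_glimpse(image, center, 3)

reward = float(glimpse.sum() > 0)      # hypothetical reward: 1 if the glimpse saw the object
reinforce_grad = reward * grad_log_prob  # single-sample REINFORCE estimate for the mean
```

In a full model the glimpse would feed a recurrent network that emits both the next location and the class predictions; here only the sampling and gradient-estimation step is shown.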

Citation impact

Total citations: 702

FWCI: —
Percentile: —
References: 6

Authors: 3

Topics & keywords

Keywords

Computer science
Artificial intelligence
Task (project management)
Convolutional neural network
Object (grammar)
Computation
Class (philosophy)
Cognitive neuroscience of visual object recognition
