Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja

doi:10.1109/iccv.2015.11

Preprint · Dec 1, 2015 · Green OA

University of Toronto · Canadian Institute for Advanced Research · +1 more institution

Indexed in Crossref

Abstract

Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books with their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books, we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine…
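
The abstract mentions computing similarities between movie clips and book sentences in a shared embedding space. The sketch below illustrates that idea only in the most generic way: it assumes clips and sentences have already been mapped to fixed-length vectors, and it assumes cosine similarity as the score. The function names (`cosine`, `best_sentence_per_clip`) and the toy vectors are hypothetical, and nothing here reproduces the paper's embedding models or its context-aware CNN.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def best_sentence_per_clip(clip_vecs, sentence_vecs):
    """For each clip vector, return (index of the most similar sentence, its score)."""
    matches = []
    for clip in clip_vecs:
        scores = [cosine(clip, s) for s in sentence_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Toy, hand-made vectors (hypothetical; not taken from the paper).
clips = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
sentences = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3]]
matches = best_sentence_per_clip(clips, sentences)
```

Picking the single best sentence per clip independently is the simplest possible matching rule; it ignores the ordering of clips and sentences, which a real alignment method would likely want to exploit.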

Citation impact

Total citations: 2,063

FWCI: 29.95
Percentile: 100%
References: 73

Authors: 7

Topics & keywords

Keywords

Computer science
Reading (process)
Context (archaeology)
Semantics (computer science)
Sentence
Embedding
Object (grammar)
Artificial intelligence

UN Sustainable Development Goals

Quality Education
