Article · Jun 1, 2016 · Closed access

End-to-End People Detection in Crowded Scenes

Stanford University · Max Planck Institute for Informatics

Indexed in Crossref

Abstract

Current people detectors operate either by scanning an image in a sliding-window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
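The abstract does not spell out the set loss. As a minimal sketch, assuming one common way to build a loss over sets of detections (a one-to-one matching between ground-truth boxes and predictions, a box-regression term on matched pairs, and a confidence term on every prediction), the Python below shows why such a loss does not depend on the order in which predictions are emitted. The box encoding, the L1 matching cost, the weight `alpha`, and the names `best_matching` and `set_loss` are hypothetical; this is not the authors' exact formulation.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical box encoding: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


def l1_distance(a: Box, b: Box) -> float:
    """Sum of absolute differences between two boxes' parameters."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_matching(gt: Sequence[Box], pred: Sequence[Box]) -> List[Tuple[int, int]]:
    """Return (gt_index, pred_index) pairs with minimal total L1 cost.

    Each ground-truth box is matched to a distinct prediction. This brute-forces
    every injective assignment, which is only viable for a handful of boxes;
    a real system would use the Hungarian algorithm instead.
    Assumes len(gt) <= len(pred).
    """
    best_cost = math.inf
    best_pairs: List[Tuple[int, int]] = []
    for perm in itertools.permutations(range(len(pred)), len(gt)):
        cost = sum(l1_distance(gt[i], pred[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs


def set_loss(gt: Sequence[Box], pred: Sequence[Box],
             conf: Sequence[float], alpha: float = 0.1) -> float:
    """Hypothetical set loss: box regression on matched pairs plus binary
    cross-entropy on every prediction's confidence (matched -> 1, unmatched -> 0).
    alpha is an arbitrary illustrative weight."""
    pairs = best_matching(gt, pred)
    matched = {j for _, j in pairs}
    box_term = sum(l1_distance(gt[i], pred[j]) for i, j in pairs)
    eps = 1e-7  # keeps log() finite for confidences of exactly 0 or 1
    conf_term = 0.0
    for j, c in enumerate(conf):
        c = min(max(c, eps), 1.0 - eps)
        target = 1.0 if j in matched else 0.0
        conf_term -= target * math.log(c) + (1.0 - target) * math.log(1.0 - c)
    return alpha * box_term + conf_term


if __name__ == "__main__":
    gt = [(10.0, 10.0, 4.0, 8.0)]
    # Two nearly identical detections of the same person.
    pred = [(10.0, 10.0, 4.0, 8.0), (10.5, 10.0, 4.0, 8.0)]
    print(best_matching(gt, pred))  # [(0, 0)]
    print(set_loss(gt, pred, [0.9, 0.1]) < set_loss(gt, pred, [0.9, 0.9]))  # True
```

Because each ground-truth box can claim only one prediction, the second, nearly identical detection is labeled negative and a high confidence on it is penalized; this is one way to read the abstract's claim that non-maximum suppression becomes unnecessary. The brute-force matcher is a stand-in chosen only to keep the sketch dependency-free.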

Citation impact

  • Total citations: 546
  • Field-Weighted Citation Impact (FWCI): 29.23
  • Citation percentile: 100%
  • References: 42

Authors: 3

Topics & keywords

Keywords
  • Computer science
  • End-to-end principle
  • Decoding methods
  • Set (abstract data type)
  • Artificial intelligence
  • Detector
  • Task (project management)
  • Image (mathematics)