Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Microsoft Research Asia (China) · Microsoft (United States) · +2 more institutions
Abstract
Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a…
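The fixed-length property described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the floor/ceil bin boundaries are all assumptions chosen for illustration. The sketch pools a feature map of shape (channels, height, width) over an n × n grid at each level and concatenates the results, so the output length depends only on the channel count and the levels, not on the spatial size.

```python
import math
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Hypothetical helper: the levels and the bin-boundary rule are
    illustrative assumptions, not taken from the paper.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor for the start, ceil for the end,
            # so every bin contains at least one row.
            r0 = (i * height) // n
            r1 = math.ceil((i + 1) * height / n)
            for j in range(n):
                # Column range of bin j, computed the same way.
                c0 = (j * width) // n
                c1 = math.ceil((j + 1) * width / n)
                # One max value per channel for this spatial bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Total length is C * sum(n * n for n in levels).
    return np.concatenate(pooled)

# Two feature maps with different spatial sizes give vectors of equal length.
small = spatial_pyramid_pool(np.random.rand(3, 13, 13))
wide = spatial_pyramid_pool(np.random.rand(3, 10, 17))
print(small.shape, wide.shape)
```

With 3 channels and levels (1, 2, 4), each call produces 3 × (1 + 4 + 16) = 63 values, which is what lets a fully connected layer follow convolutional features of any spatial size.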
Citation impact
- FWCI: 214.11
- Percentile: 100%
- References: 70
Authors
- 4
Topics & keywords
- Pooling
- Pascal (unit)
- Artificial intelligence
- Computer science
- Convolutional neural network
- Pattern recognition (psychology)
- Pyramid (geometry)
- Contextual image classification