Bilinear CNN Models for Fine-Grained Visual Recognition
University of Massachusetts Amherst
Abstract
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only.…
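The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `bilinear_pool`, the random arrays standing in for the outputs of the two CNN feature extractors, and the choice of sum pooling are all assumptions made for illustration.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Sum-pool the outer products of two feature maps over all locations.

    feat_a: array of shape (H, W, M); feat_b: array of shape (H, W, N).
    Returns a flattened descriptor of length M * N.
    (Hypothetical helper; sum pooling is an assumption.)
    """
    if feat_a.shape[:2] != feat_b.shape[:2]:
        raise ValueError("feature maps must share spatial dimensions")
    m, n = feat_a.shape[-1], feat_b.shape[-1]
    a = feat_a.reshape(-1, m)  # (H*W, M)
    b = feat_b.reshape(-1, n)  # (H*W, N)
    # a.T @ b equals the sum over locations l of outer(a[l], b[l]).
    return (a.T @ b).reshape(-1)

# Random stand-ins for the two feature extractors' outputs.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 5, 3))
feat_b = rng.standard_normal((4, 5, 2))
desc = bilinear_pool(feat_a, feat_b)

# Apply the same shuffle of spatial locations to both maps.
perm = rng.permutation(4 * 5)
shuf_a = feat_a.reshape(-1, 3)[perm].reshape(4, 5, 3)
shuf_b = feat_b.reshape(-1, 2)[perm].reshape(4, 5, 2)
desc_shuffled = bilinear_pool(shuf_a, shuf_b)
```

Because the outer products are summed over locations, `desc` and `desc_shuffled` agree up to floating-point error. This is the orderless property that makes the descriptor insensitive to where in the image a pair of features co-occurs.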
Citation impact
- FWCI: 60.45
- Percentile: 100%
- References: 59
Authors
- 3
Topics & keywords
- Bilinear interpolation
- Computer science
- Convolutional neural network
- Pattern recognition (psychology)
- Pairwise comparison
- Artificial intelligence
- Feature (linguistics)
- Computation