Learning Representations by Maximizing Mutual Information Across Views
Microsoft Research (United Kingdom)
Abstract
We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual). Or, an ImageNet image could provide a context from which one produces multiple views by repeatedly applying data augmentation. Maximizing mutual information between features extracted from these views requires capturing information about high-level factors whose influence spans multiple views -- e.g., presence of…
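To make the objective concrete, here is a minimal sketch of one common way to maximize a lower bound on mutual information between paired views: an InfoNCE-style contrastive loss. The abstract as shown does not say which estimator the authors use, so treat the choice of InfoNCE as an assumption. The function names `info_nce` and `mi_lower_bound` and the toy feature vectors are hypothetical illustrations, not the paper's implementation.

```python
import math


def dot(u, v):
    """Inner product used as the score between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(feats_a, feats_b):
    """Mean InfoNCE loss over a batch.

    feats_a[i] and feats_b[i] are assumed to be features of two views of
    the same context (a positive pair); feats_b[j] with j != i serve as
    negatives for feats_a[i].
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        scores = [dot(feats_a[i], feats_b[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of picking the true partner.
        total += log_norm - scores[i]
    return total / n


def mi_lower_bound(feats_a, feats_b):
    """InfoNCE bound: I(A; B) >= log(n) - loss, so it never exceeds log(n)."""
    return math.log(len(feats_a)) - info_nce(feats_a, feats_b)


# Toy features (assumed): each context has its own direction in feature space.
view_a = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
view_b_matched = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
# Same features, but paired with the wrong contexts.
view_b_mismatched = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]

loss_matched = info_nce(view_a, view_b_matched)
loss_mismatched = info_nce(view_a, view_b_mismatched)
bound_matched = mi_lower_bound(view_a, view_b_matched)
```

In this sketch the loss is low only when features from the two views of the same context agree more with each other than with features from other contexts. That is the sense in which such an objective rewards information shared across views. In practice the features would come from a trained encoder and the loss would be minimized by gradient descent.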
Citation impact
- FWCI: not available
- Percentile: not available
- References: 45
Authors: 3
Topics & keywords
- Mutual information
- Computer science
- Psychology
- Data science
- Artificial intelligence