Unifying Flow, Stereo and Depth Estimation
ETH Zurich · The University of Sydney · +2 more institutions
Abstract
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality…
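The matching formulation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `global_match_flow`, the softmax normalization, and the expected-coordinate readout are assumptions chosen as one common way to turn feature similarities into dense correspondences. The features are taken as given, so the Transformer and cross-attention stage is not modeled here.

```python
import numpy as np

def global_match_flow(feat0, feat1):
    """Hypothetical helper: dense 2D matching by comparing feature similarities.

    feat0, feat1: float arrays of shape (H, W, C), one feature vector per pixel.
    Returns a flow field of shape (H, W, 2), stored as (dx, dy) per pixel.
    """
    h, w, c = feat0.shape
    f0 = feat0.reshape(h * w, c)
    f1 = feat1.reshape(h * w, c)

    # All-pairs similarity between every pixel of image 0 and image 1.
    # Scaling by sqrt(C) is an assumption borrowed from attention layers.
    corr = f0 @ f1.T / np.sqrt(c)

    # Softmax over all candidate positions in image 1 (numerically stable).
    corr -= corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinate grid in (x, y) order, flattened to (H*W, 2).
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matched coordinate under the matching distribution;
    # flow is the displacement from each source pixel to its match.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)
```

Under the same assumptions, rectified stereo would restrict the candidates to the same image row (a 1D search along x), and depth from posed images would compare features at positions induced by a set of depth candidates; the similarity-then-softmax step stays the same.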
Citation impact
- FWCI: 21.73
- Percentile: 100%
- References: 119
Authors
- 7
Topics & keywords
- Computer science
- Artificial intelligence
- Optical flow
- Discriminative model
- Unified Model
- Inference
- Transformer
- Feature (linguistics)
- Reduced inequalities