MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Peking University · ETH Zurich · +1 more institution
Abstract
Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis…
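The depth ambiguity the abstract refers to can be illustrated with a small sketch. It assumes an idealized pinhole camera and ignores skeletal constraints such as fixed bone lengths; it is not code from the paper. The function names, the focal length `f`, and the joint coordinates are all hypothetical. The sketch shows that two different 3D poses can produce the same 2D projection, which is why a single 2D observation admits several pose hypotheses.

```python
# Illustrative sketch only: idealized pinhole camera, hypothetical focal length f.
def project(point, f=1.0):
    """Perspective-project a 3D point (x, y, z), z > 0, to 2D image coordinates."""
    x, y, z = point
    return (f * x / z, f * y / z)


def slide_along_ray(point, scale):
    """Move a 3D point along the camera ray that passes through it (scale > 0)."""
    return tuple(scale * c for c in point)


# A toy two-joint "pose" with made-up coordinates (metres, camera frame).
pose_a = [(0.2, -0.5, 4.0), (0.3, 0.1, 4.0)]
# A different 3D pose: the second joint is pushed 25% farther from the camera.
pose_b = [pose_a[0], slide_along_ray(pose_a[1], 1.25)]

proj_a = [project(p) for p in pose_a]
proj_b = [project(p) for p in pose_b]

# Compare the 2D projections joint by joint, with a small float tolerance.
same_2d = all(
    abs(u1 - u2) < 1e-9 and abs(v1 - v2) < 1e-9
    for (u1, v1), (u2, v2) in zip(proj_a, proj_b)
)
differs_3d = pose_a != pose_b

print("same 2D projection:", same_2d)
print("different 3D poses:", differs_3d)
```

In practice, bone-length and joint-angle limits rule out many of these candidates, but several plausible poses typically remain, especially when joints are occluded.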
Citation impact
- FWCI: 22.54
- Percentile: 100%
- References: 59
Authors
- 5
Topics & keywords
- Computer science
- Ambiguity
- Artificial intelligence
- Merge (version control)
- Transformer
- Machine learning
- Intuition
- Pattern recognition (psychology)