MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Peking University · ETH Zurich · +1 more institution

Indexed in Crossref

Abstract

Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting spatial and temporal relationships. However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. In order to effectively model multi-hypothesis dependencies and build strong relationships across hypothesis features, the task is decomposed into three stages: (i) Generate multiple initial hypothesis…

Citation impact

  • Total citations: 409
  • FWCI: 22.54
  • Percentile: 100%
  • References: 59

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Ambiguity
  • Artificial intelligence
  • Merge (version control)
  • Transformer
  • Machine learning
  • Intuition
  • Pattern recognition (psychology)

Funding