Exploiting Temporal Contexts With Strided Transformer for 3D Human Pose Estimation

Li, Wenhao; Liu, Hong; Ding, Runwei; Liu, Mengyuan; Wang, Pichao; Yang, Wenming

doi:10.1109/tmm.2022.3141231

Article · IEEE Transactions on Multimedia · Jan 7, 2022 · Closed access

Peking University · Sun Yat-sen University · +4 more institutions

Indexed in Crossref

Abstract

Despite the great progress in 3D human pose estimation from videos, it is still an open problem to take full advantage of a redundant 2D pose sequence to learn representative representations for generating one 3D pose. To this end, we propose an improved Transformer-based architecture, called Strided Transformer, which simply and effectively lifts a long sequence of 2D joint locations to a single 3D pose. Specifically, a Vanilla Transformer Encoder (VTE) is adopted to model long-range dependencies of 2D pose sequences. To reduce the redundancy of the sequence, fully-connected layers in the feed-forward network of VTE are replaced with strided convolutions to progressively shrink the sequence length and…
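The core mechanism in the abstract is that a Transformer encoder block's feed-forward network uses strided temporal convolutions instead of fully-connected layers, so each block both mixes long-range context and shortens the 2D pose sequence. The dependency-free sketch below illustrates that idea only; it is not the authors' implementation. Every function name is hypothetical, and the following details are assumptions: identity query/key/value projections, no LayerNorm, no residual around the strided sublayer, kernel size 3 and stride 3 in every block, 27 input frames, two joints, and random weights in place of learned ones.

```python
import math
import random
from typing import List

# A sequence of T frames; each frame is a feature vector of C floats.
Seq = List[List[float]]


def self_attention(x: Seq) -> Seq:
    """Single-head scaled dot-product self-attention over the time axis.

    Simplification (assumption): queries, keys and values are the inputs
    themselves, i.e. identity projections, to keep the sketch dependency-free.
    """
    c = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Each output frame mixes information from every frame in the sequence.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) / total
                    for j in range(c)])
    return out


def strided_conv1d(x: Seq, weight, stride: int) -> Seq:
    """Temporal 1D convolution without padding.

    weight[k][i][o] maps input channel i at kernel tap k to output channel o.
    Output length is (T - K) // stride + 1, so stride > 1 shortens the sequence.
    """
    k_size = len(weight)
    c_out = len(weight[0][0])
    out = []
    for start in range(0, len(x) - k_size + 1, stride):
        frame = [0.0] * c_out
        for k in range(k_size):
            for i, xi in enumerate(x[start + k]):
                for o in range(c_out):
                    frame[o] += xi * weight[k][i][o]
        out.append(frame)
    return out


def random_kernel(k_size: int, c_in: int, c_out: int, rng: random.Random):
    # Small random weights standing in for learned parameters (hypothetical).
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k_size)]


def strided_block(x: Seq, weight, stride: int) -> Seq:
    """Attention with a residual connection, then a strided-conv sublayer
    standing in for the fully-connected feed-forward network (LayerNorm and
    the feed-forward residual are omitted for brevity)."""
    attn = self_attention(x)
    x = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attn)]
    return strided_conv1d(x, weight, stride)


rng = random.Random(0)
J = 2                 # hypothetical number of joints
C = 2 * J             # (x, y) per joint for each 2D frame
frames = [[rng.uniform(-1.0, 1.0) for _ in range(C)] for _ in range(27)]

x = frames
lengths = [len(x)]
for _ in range(3):    # kernel size 3, stride 3 in every block (assumption)
    x = strided_block(x, random_kernel(3, C, C, rng), stride=3)
    lengths.append(len(x))

# Linear head lifting the single remaining frame to J joints x (x, y, z).
head = [[rng.uniform(-0.1, 0.1) for _ in range(3 * J)] for _ in range(C)]
flat = [sum(x[0][i] * head[i][o] for i in range(C)) for o in range(3 * J)]
pose_3d = [flat[3 * j: 3 * j + 3] for j in range(J)]

print(lengths)                        # [27, 9, 3, 1]
print(len(pose_3d), len(pose_3d[0]))  # 2 3
```

Because the kernel size equals the stride here, each block divides the sequence length by 3, so three blocks reduce 27 frames to the single frame that is lifted to one 3D pose. Attention preserves the length; only the strided sublayer shrinks it.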

Citation impact

293 total citations

FWCI: 27.10
Percentile: 100%
References: 70

Authors: 6

Topics & keywords

Keywords

Encoder
Computer science
Transformer
Computation
Artificial intelligence
Pose
Redundancy (engineering)
Pattern recognition (psychology)

No related works found for this paper.