MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Zhang, Jinlu; Tu, Zhigang; Yang, Jianyu; Chen, Yujin; Yuan, Junsong

doi:10.1109/cvpr52688.2022.01288

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Wuhan University · Technical University of Munich · +1 more institution

Indexed in Crossref

Abstract

Recent transformer-based solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation. We observe that the motions of different joints differ significantly. However, previous methods cannot efficiently model the solid inter-frame correspondence of each joint, leading to insufficient learning of spatio-temporal correlation. We propose MixSTE (Mixed Spatio-Temporal Encoder), which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation. These two blocks are utilized alternately to obtain better…
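
The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a parameter-free self-attention (queries, keys, and values are all the raw input, with no learned projections, MLP, normalization, or positional embedding), and the function names, the `depth` parameter, and the spatial-then-temporal ordering are hypothetical choices made only for illustration. The input is indexed as `x[frame][joint]`, each entry being a feature vector.

```python
import math


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: Q = K = V = tokens, i.e. no learned weights at all.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def spatial_block(x):
    """Attend across joints inside each frame (inter-joint correlation)."""
    return [self_attention(frame) for frame in x]


def temporal_block(x):
    """Attend across frames separately for each joint (per-joint motion)."""
    num_frames, num_joints = len(x), len(x[0])
    per_joint = [
        self_attention([x[t][j] for t in range(num_frames)])
        for j in range(num_joints)
    ]
    # Transpose back to the x[frame][joint] layout.
    return [[per_joint[j][t] for j in range(num_joints)] for t in range(num_frames)]


def mixed_encoder(x, depth=2):
    """Alternate the two blocks `depth` times (the ordering is an assumption)."""
    for _ in range(depth):
        x = spatial_block(x)
        x = temporal_block(x)
    return x


if __name__ == "__main__":
    # 3 frames, 2 joints, 2 feature channels of made-up data.
    x = [
        [[0.0, 1.0], [1.0, 0.0]],
        [[0.5, 0.5], [2.0, -1.0]],
        [[1.0, 1.0], [0.0, 0.0]],
    ]
    y = mixed_encoder(x)
    print(len(y), len(y[0]), len(y[0][0]))
```

The point of the sketch is the token layout. The spatial block treats the joints of one frame as the token sequence, while the temporal block treats the trajectory of one joint as the token sequence, so each joint's motion is modeled on its own and the output keeps the input's frame-by-joint shape.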

Citation impact

Total citations: 365

FWCI: 19.35
Percentile: 100%
References: 70

Authors: 5

Topics & keywords

Keywords

Encoder
Computer science
Artificial intelligence
Transformer
Coherence (philosophical gambling strategy)
Joint (building)
Pattern recognition (psychology)
Motion estimation