Generating Human Motion from Textual Descriptions with Discrete Representations

Zhang, Jianrong; Zhang, Yangsong; Cun, Xiaodong; Zhang, Yong; Zhao, Hongwei; Lu, Hongtao; Shen, Xi; Ying, Shan

doi:10.1109/cvpr52729.2023.01415

Article · Jun 1, 2023 · Closed access

Jilin University · Tencent (China) · +2 more institutions

Indexed in Crossref

Abstract

In this work, we investigate a simple and well-known conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a simple corruption strategy during training to alleviate the training-testing discrepancy. Despite its simplicity, our T2M-GPT shows better performance than competitive approaches, including recent diffusion-based approaches. For example, on HumanML3D, which is currently the largest dataset,…
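The corruption strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact scheme, so the version below is an assumption: each ground-truth code index fed to the transformer during training is replaced by a random codebook index with some probability, while the prediction targets stay unchanged. The function name `corrupt_tokens` and the parameters `codebook_size` and `corrupt_prob` are hypothetical and are not taken from the paper.

```python
import random


def corrupt_tokens(tokens, codebook_size, corrupt_prob, rng):
    """Return a copy of `tokens` where each entry is swapped for a random
    codebook index with probability `corrupt_prob`.

    Hypothetical illustration only; the paper's actual scheme may differ.
    """
    corrupted = []
    for tok in tokens:
        if rng.random() < corrupt_prob:
            # Replace the ground-truth index with a uniformly sampled one.
            corrupted.append(rng.randrange(codebook_size))
        else:
            corrupted.append(tok)
    return corrupted


rng = random.Random(0)

# Ground-truth VQ-VAE code indices for one motion sequence (made-up values).
targets = [5, 17, 17, 3, 42, 8]

# Corrupted copy used as transformer input; `targets` stays the training label.
inputs = corrupt_tokens(targets, codebook_size=512, corrupt_prob=0.5, rng=rng)
```

The intent of such a scheme is that the model sees imperfect histories during training, which is closer to inference, where it conditions on its own (possibly wrong) predictions.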

Citation impact

Total citations: 257

FWCI: 29.25
Percentile: 100%
References: 93

Authors: 8

Topics & keywords

Keywords

Computer science
Autoencoder
Consistency (knowledge bases)
Artificial intelligence
Generative grammar
Simple (philosophy)
Generative model
Motion (physics)

UN Sustainable Development Goals

Peace, Justice and strong institutions