Article · Jun 1, 2023 · Closed access

Generating Human Motion from Textual Descriptions with Discrete Representations

Jilin University · Tencent (China) · +2 more institutions

Indexed in Crossref

Abstract

In this work, we investigate a simple and well-known conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. We show that a simple CNN-based VQ-VAE with commonly used training recipes (EMA and Code Reset) allows us to obtain high-quality discrete representations. For GPT, we incorporate a simple corruption strategy during training to alleviate the discrepancy between training and testing. Despite its simplicity, our T2M-GPT shows better performance than competitive approaches, including recent diffusion-based approaches. For example, on HumanML3D, which is currently the largest dataset,…
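
The abstract does not spell out the corruption strategy, so the following is only a minimal sketch under stated assumptions: during training, each discrete motion code fed to the GPT is replaced by a uniformly random codebook index with some probability, so the model sees imperfect histories like those it produces at test time. The function name `corrupt_tokens`, the parameter `tau`, and the codebook size of 512 are hypothetical and chosen for illustration.

```python
import random


def corrupt_tokens(tokens, codebook_size, tau, rng=None):
    """Return a copy of `tokens` where each code index is replaced by a
    uniformly random index in [0, codebook_size) with probability `tau`.

    Illustrative assumption only; not taken from the paper's code.
    """
    if rng is None:
        rng = random.Random()
    return [
        rng.randrange(codebook_size) if rng.random() < tau else t
        for t in tokens
    ]


# Hypothetical example: a motion encoded as 8 code indices
# drawn from an assumed 512-entry codebook.
rng = random.Random(0)
clean = [12, 7, 7, 301, 45, 45, 45, 9]
noisy = corrupt_tokens(clean, codebook_size=512, tau=0.5, rng=rng)
print(noisy)
```

In a setup like this, the corrupted sequence would serve as the model input while the clean sequence remains the prediction target; setting `tau` to 0 recovers ordinary teacher forcing.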

Citation impact

Total citations: 257
FWCI: 29.25
Percentile: 100%
References: 93

Authors

8 authors

Topics & keywords

Keywords
  • Computer science
  • Autoencoder
  • Consistency (knowledge bases)
  • Artificial intelligence
  • Generative grammar
  • Simple (philosophy)
  • Generative model
  • Motion (physics)
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions

Funding