Article | Jun 16, 2024 | Closed access
MoMask: Generative Masked Modeling of 3D Human Motions
Indexed in Crossref
Abstract
We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. Starting at the base layer, with a sequence of motion tokens obtained by vector quantization, the residual tokens of increasing orders are derived and stored at the subsequent layers of the hierarchy. This is consequently followed by two distinct bidirectional transformers. For the base-layer motion tokens, a Masked Transformer is designated to predict randomly masked motion tokens conditioned on text input at training stage. During generation (i.e., inference)…
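The hierarchical quantization described in the abstract can be illustrated with a small residual vector quantization sketch. This is not the authors' implementation: the function names (`nearest_code`, `residual_quantize`, `reconstruct`) are hypothetical, the codebooks are random rather than learned, and a single 4-dimensional vector stands in for one frame of motion features. It only shows the idea that the base layer quantizes the input and each later layer quantizes what the previous layers left over.

```python
import random

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def residual_quantize(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen code so the next layer sees only the leftover.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual

def reconstruct(tokens, codebooks):
    """Sum the selected code from every layer to approximate the input."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Toy setup (assumption): 3 layers, 8 codes per layer, 4-dimensional vectors.
rng = random.Random(0)
dim, num_codes, num_layers = 4, 8, 3
codebooks = [
    [[rng.uniform(-1, 1) / (2 ** layer) for _ in range(dim)]
     for _ in range(num_codes)]
    for layer in range(num_layers)
]
vec = [rng.uniform(-1, 1) for _ in range(dim)]

tokens, residual = residual_quantize(vec, codebooks)
approx = reconstruct(tokens, codebooks)
print("tokens per layer:", tokens)
print("remaining residual:", residual)
```

In this sketch `tokens[0]` plays the role of the base-layer token and the later entries play the role of residual tokens. The reconstruction plus the final residual recovers the input up to floating-point rounding, which is the property that lets additional layers add detail.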
Citation impact
- Total citations: 121
- FWCI: 36.31
- Percentile: 100%
- References: 57
Authors: 5
Topics & keywords
Topics
Keywords
- Generative grammar
- Computer science
- Artificial intelligence