Article · Jun 16, 2024 · Closed access

MoMask: Generative Masked Modeling of 3D Human Motions

University of Alberta

Indexed in Crossref

Abstract

We introduce MoMask, a novel masked modeling framework for text-driven 3D human motion generation. In MoMask, a hierarchical quantization scheme is employed to represent human motion as multi-layer discrete motion tokens with high-fidelity details. Starting at the base layer, with a sequence of motion tokens obtained by vector quantization, the residual tokens of increasing orders are derived and stored at the subsequent layers of the hierarchy. This is followed by two distinct bidirectional transformers. For the base-layer motion tokens, a Masked Transformer is designated to predict randomly masked motion tokens conditioned on text input at the training stage. During generation (i.e. inference)…
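The hierarchical (residual) quantization described above can be illustrated with a small sketch. This is not MoMask's implementation: the codebooks are tiny hand-written 2-D examples rather than learned ones, the motion encoder is omitted, and the function names (`residual_quantize`, `dequantize`, `mask_tokens`) are hypothetical. Layer 0 quantizes the latent vector itself (the base-layer token), and each later layer quantizes the residual left by the layers before it. The `mask_tokens` helper assumes a fixed mask ratio with uniformly random positions; the abstract does not specify how MoMask chooses what to mask.

```python
import random
from typing import List, Sequence, Tuple

Codebook = Sequence[Sequence[float]]


def nearest_index(x: Sequence[float], codebook: Codebook) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    best_i, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def residual_quantize(x: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Return one token per layer: layer 0 codes x, later layers code the residual."""
    residual = list(x)
    tokens = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen code vector; the next layer quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def dequantize(tokens: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct a vector by summing the selected code vector of every layer."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def mask_tokens(tokens: Sequence[int], mask_ratio: float, mask_id: int,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Hypothetical masking step: replace a random subset of base-layer tokens with mask_id."""
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    for p in positions:
        masked[p] = mask_id
    return masked, sorted(positions)


if __name__ == "__main__":
    # Toy codebooks: each layer has finer-scale code vectors than the one before.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # layer 0 (base)
        [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],   # layer 1 (first residual)
        [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05]],   # layer 2 (second residual)
    ]
    x = [1.3, 1.02]
    tokens = residual_quantize(x, codebooks)
    print(tokens)  # → [1, 1, 1]
    print(dequantize(tokens, codebooks))
```

Reconstructing from only the first k layers shows why the residual layers matter: with the toy codebooks above, the squared error shrinks as each layer adds a finer correction to the base-layer code vector.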
