SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Zhang, Wenxuan; Cun, Xiaodong; Wang, Xuan; Zhang, Yong; Shen, Xi; Guo, Yu; Shan, Ying; Wang, Fei

doi:10.1109/cvpr52729.2023.00836

Article · Jun 1, 2023 · Closed access

Wenxuan Zhang · Xiaodong Cun · Xuan Wang · Yong Zhang · Xi Shen

Xi'an Jiaotong University · Tencent (China)

Indexed in Crossref

Abstract

Generating talking head videos from a face image and a piece of speech audio still poses many challenges, i.e., unnatural head movement, distorted expression, and identity modification. We argue that these issues are mainly caused by learning from the coupled 2D motion fields. On the other hand, explicitly using 3D information also suffers from problems of stiff expression and incoherent video. We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation. To learn the realistic motion coefficients, we explicitly model the connections between audio and different types of motion…

Citation impact

Total citations: 269

FWCI: 30.62
Percentile: 100%
References: 65

Authors: 8

Topics & keywords

Keywords

Computer science
Animation
Artificial intelligence
Face (sociological concept)
Motion (physics)
Computer vision
Expression (computer science)
Computer facial animation

Funding

National Key Research and Development Program of China
Award: 2022YFB3303800