VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Chen, Haoxin; Zhang, Yong; Cun, Xiaodong; Xia, Menghan; Wang, Xintao; Weng, Chao; Shan, Ying

doi:10.1109/cvpr52733.2024.00698

Article | Jun 16, 2024 | Closed access

Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang

Tencent (China)

Indexed in Crossref

Abstract

Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these models rely on large-scale, well-filtered, high-quality videos that are not accessible to the community. Many existing research works, which train models using the low-quality WebVid-10M dataset, struggle to generate high-quality videos because the models are optimized to fit WebVid-10M. In this work, we explore the training scheme of video models extended from Stable Diffusion and investigate the feasibility of leveraging low-quality videos and synthesized high-quality…

Citation impact

Total citations: 163

FWCI: 36.54
Percentile: 100%
References: 81

Authors: 7

Topics & keywords

Keywords

Computer science
Diffusion
Quality (philosophy)
Data modeling
Database
