M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

Wang, Junke; Wu, Zuxuan; Ouyang, Wenhao; Han, Xintong; Chen, Jingjing; Jiang, Yu-Gang; Li, Ser-Nam

doi:10.1145/3512527.3531415

Preprint · Jun 23, 2022 · Closed access

Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen

Fudan University

Indexed in Crossref

Abstract

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of…
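
The two input views the abstract mentions, patches at several sizes and a frequency-domain representation alongside RGB, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `split_into_patches` and `log_magnitude_spectrum`, the patch sizes (8, 16, 32), the use of raw pixels, and the choice of a log-magnitude DFT are all assumptions made for illustration. The transformer layers and the fusion block are not shown.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) array into non-overlapping square patches.

    Hypothetical helper. Returns an array of shape
    (num_patches, patch_size, patch_size, C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate the patch-grid axes from the within-patch axes, then group them.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size, patch_size, c)


def log_magnitude_spectrum(image):
    """Per-channel, centred log-magnitude of the 2D DFT of an (H, W, C) array.

    Hypothetical helper; one simple way to obtain a frequency-domain view.
    """
    spectrum = np.fft.fft2(image, axes=(0, 1))
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))
    return np.log1p(np.abs(spectrum))


# A random array stands in for a face crop (assumed 64x64 RGB).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))

# Same image viewed at three assumed patch sizes.
multi_scale = {p: split_into_patches(image, p) for p in (8, 16, 32)}

# Frequency-domain view with the same spatial shape as the input.
freq = log_magnitude_spectrum(image)

for size, patches in multi_scale.items():
    print(size, patches.shape)
print(freq.shape)
```

Smaller patches give more tokens covering finer regions, while larger patches give fewer tokens with wider context; a model along the lines the abstract describes would attend over each set and combine the result with features derived from the frequency view.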

Citation impact

319 total citations

FWCI: 29.33
Percentile: 100%
References: 70

Authors

7

Topics & keywords

Keywords

Computer science
Artificial intelligence
Transformer
Modal
RGB color model
Pattern recognition (psychology)
Computer vision
Machine learning