LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Harbin Institute of Technology · Beihang University · Microsoft Research Asia (China)
Abstract
Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of…
Citation impact
- FWCI: 22.66
- Percentile: 100%
- References: 16
Authors (6)
- Yiheng Xu (corresponding author)
  Harbin Institute of Technology
- Minghao Li
  Beihang University
- Lei Cui
  Microsoft Research Asia (China)
- Shaohan Huang
  Microsoft Research Asia (China)
- Furu Wei
  Microsoft Research Asia (China)
Topics & keywords
- Leverage (statistics)
- Document layout analysis
- Document image processing
- Focus (optics)
- Image (mathematics)
- Variety (cybernetics)
- Information extraction
- Historical document