TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models

Li, Minghao; Lv, Tengchao; Chen, Jingye; Cui, Lei; Lu, Yijuan; Florêncio, Dinei; Zhang, Cha; Li, Zhoujun; Wei, Furu

doi:10.1609/aaai.v37i11.26538

Article · Proceedings of the AAAI Conference on Artificial Intelligence · Jun 26, 2023 · Diamond OA

Beihang University · Microsoft (Finland)

Indexed in Crossref

Abstract

Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments…
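The pipeline the abstract describes (an image Transformer that reads the picture, then a text Transformer that emits wordpieces one at a time) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the patch-splitting step, the `[EOS]` marker, the `##` continuation prefix, and the scripted stand-in for the decoder are hypothetical and are not taken from the TrOCR code or stated in the abstract.

```python
from typing import Callable, List, Sequence

Patch = List[List[int]]


def split_into_patches(image: Sequence[Sequence[int]], patch: int) -> List[Patch]:
    """Cut a 2-D grayscale image (a list of rows) into non-overlapping square tiles.

    Assumption: the image encoder consumes fixed-size patches in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    tiles: List[Patch] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([list(row[left:left + patch]) for row in image[top:top + patch]])
    return tiles


def greedy_decode(
    next_piece: Callable[[List[Patch], List[str]], str],
    patches: List[Patch],
    max_len: int = 16,
) -> List[str]:
    """Emit wordpieces one at a time until the end marker or max_len is reached."""
    pieces: List[str] = []
    for _ in range(max_len):
        piece = next_piece(patches, pieces)
        if piece == "[EOS]":  # hypothetical end-of-sequence marker
            break
        pieces.append(piece)
    return pieces


def join_wordpieces(pieces: List[str]) -> str:
    """Merge pieces into words; a leading '##' (assumed convention) continues a word."""
    words: List[str] = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
        else:
            words.append(piece)
    return " ".join(words)


def toy_next_piece(patches: List[Patch], prefix: List[str]) -> str:
    """Stand-in for a trained decoder: replays a fixed script and ignores the image."""
    script = ["tr", "##oc", "##r", "[EOS]"]
    return script[len(prefix)]


image = [[0] * 8 for _ in range(4)]          # blank 4 x 8 image
patches = split_into_patches(image, patch=4)  # two 4 x 4 tiles
text = join_wordpieces(greedy_decode(toy_next_piece, patches))
print(text)
```

In a real system `toy_next_piece` would be replaced by a trained decoder attending over the encoded patches; the loop structure is the only part this sketch is meant to convey.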

Citation impact

Total citations: 363
FWCI: 21.10
Percentile: 100%
References: 91

Authors: 9

Keywords

Transformer
Computer science
AKA
Artificial intelligence
Language model
Optical character recognition
Pattern recognition (psychology)
Text recognition

UN Sustainable Development Goals

Quality Education

Funding

National Natural Science Foundation of China
Awards: U1636211, 62276017, 61672081