An Overview of the Tesseract OCR Engine

Smith, Ray

doi:10.1109/icdar.2007.4376991

Article. Proceedings of the International Conference on Document Analysis and Recognition, Sep 1, 2007. Closed access.

Ray Smith

Google (United States)

Indexed in Crossref

Abstract

The Tesseract OCR engine, as was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, including in particular the line finding, features/classification methods, and the adaptive classifier.

Citation impact

Total citations: 2,217

FWCI: 19.34
Percentile: 100%
References: 13

Authors

Ray Smith (corresponding author)
Google (United States)

Topics & keywords

Keywords

Optical character recognition
Computer science
Artificial intelligence
Classifier (UML)
Information retrieval
Natural language processing
Pattern recognition (psychology)

Funding

University of Nevada, Las Vegas