A Comparison of EasyOCR and Tesseract Performance in Text Extraction from Digital Images

Abyan Hanif Alfatah; Ardan Sahid; Alifia Anita Firdaus

doi:10.61902/jkti.v2i1.2436

Abyan Hanif Alfatah Universitas Muhammadiyah Klaten
Ardan Sahid Universitas Muhammadiyah Klaten
Alifia Anita Firdaus Universitas Muhammadiyah Klaten

Keywords: OCR, EasyOCR, Tesseract, Text Extraction, Deep Learning, Image Processing, CER, WER

Abstract

Rapid advances in digital image processing have increased the demand for systems that automatically extract text from images, commonly known as OCR (Optical Character Recognition). Two widely used tools in this field are Tesseract and EasyOCR, both popular open-source packages frequently used by developers. However, choosing between them is often difficult for researchers because the two libraries differ significantly in underlying architecture and performance. This study conducts an in-depth comparative analysis of Tesseract and EasyOCR, focusing on character recognition accuracy and processing speed under various image conditions. The methodology involves collecting a diverse dataset of images, ranging from very clean printed text to images with significant visual noise or distortion. Both tools are then run from Python to extract text from the same dataset in a systematic, measurable way. Performance is evaluated objectively using the weighted average Character Error Rate (CER) and Word Error Rate (WER) metrics.
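To make the evaluation metrics concrete, the following is a minimal sketch of how CER and WER can be computed from a ground-truth string and an OCR output using Levenshtein edit distance. It is not the authors' implementation: the function names (`edit_distance`, `cer`, `wer`, `weighted_cer`) are hypothetical, and the weighting scheme, which pools errors over all images so that longer references count proportionally more, is an assumption about what "weighted average" means here.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def weighted_cer(pairs: List[Tuple[str, str]]) -> float:
    """Assumed weighting: total character edits / total reference characters."""
    total_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_errors / max(total_chars, 1)
```

In use, each `(reference, hypothesis)` pair would hold the ground-truth transcription of one image and the text returned by one of the OCR tools for that image, so the same list of references can be scored once per tool.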