A Comparison of EasyOCR and Tesseract Performance in Text Extraction from Digital Images
Abstract
Rapid advances in digital image processing have increased the demand for systems that automatically extract text from images, commonly known as OCR (Optical Character Recognition). Two open-source tools are widely used by developers for this task: Tesseract and EasyOCR. Choosing between them is often difficult for researchers, because the two libraries differ significantly in their underlying architecture and in the performance they offer. This study presents an in-depth comparative analysis of Tesseract and EasyOCR, focusing on character recognition accuracy and processing speed under various image conditions. The methodology involves collecting a diverse dataset of images, ranging from very clean printed text to images with significant visual noise or distortion. Both tools are then run from Python on the same dataset, so that text is extracted systematically and measurably. Performance is evaluated objectively using the weighted average Character Error Rate (CER) and Word Error Rate (WER).
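To make the evaluation metrics concrete, the following is a minimal sketch of how a weighted average CER and WER could be computed in Python. It assumes that "weighted" means each image contributes in proportion to the length of its reference text, so the total edit distance is divided by the total reference length; the abstract does not specify the weighting scheme, so this is an illustrative assumption. The function names (`edit_distance`, `weighted_error_rate`, `weighted_cer`, `weighted_wer`) are hypothetical and are not taken from Tesseract, EasyOCR, or the study itself.

```python
from typing import Callable, Iterable, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def weighted_error_rate(
    pairs: Iterable[Tuple[str, str]],
    tokenize: Callable[[str], Sequence],
) -> float:
    """Total edits divided by total reference length over all pairs.

    Assumption: weighting by reference length, so longer ground-truth
    texts count for more than shorter ones.
    """
    total_edits = 0
    total_length = 0
    for reference, hypothesis in pairs:
        ref_tokens = tokenize(reference)
        hyp_tokens = tokenize(hypothesis)
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_length += len(ref_tokens)
    if total_length == 0:
        raise ValueError("reference texts contain no tokens")
    return total_edits / total_length


def weighted_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Character-level error rate: tokens are single characters."""
    return weighted_error_rate(pairs, list)


def weighted_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Word-level error rate: tokens are whitespace-separated words."""
    return weighted_error_rate(pairs, str.split)
```

Each pair holds the ground-truth text and the OCR output for one image, in that order. In a comparison of this kind, the same list of reference texts would be paired once with the output of each tool, and the two resulting scores compared; lower values indicate better recognition.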






