Student: Zihao Mu
Mentor: Vladimir Tyan
Links to accomplished work:
- Merged PR: opencv/pull/17675
- Multiple text recognition models for OpenCV DNN: Shared model link
- Train your own text recognition model for OpenCV: deep-text-recognition-benchmark
- Detailed Tutorial: OpenCV OCR Tutorial
Hi, I'm Zihao Mu! I was a student developer for OpenCV in GSoC 2020. The goal of this project is to improve the text and digit recognition samples in OpenCV. In this deep learning era, more efficient text recognition methods can be implemented. The project mainly consists of two parts:
- First part: digit recognition through a live camera.
  - Digit detector: Connected Component Analysis
  - Digit recognizer: LeNet-5 pre-trained on the MNIST dataset
- Second part: text recognition through a live camera.
  - Text detector: EAST
  - Text recognizer: multiple text recognition models based on deep learning
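To illustrate the detection step of the first part, here is a minimal, dependency-free sketch of connected component analysis on a binarized image. It is not the code from the sample: the function name `find_components`, the `min_area` noise filter, and the toy image are assumptions made for illustration. With OpenCV itself, `cv2.connectedComponentsWithStats` would typically do this work.

```python
from collections import deque


def find_components(binary, min_area=1):
    """Return bounding boxes (x, y, w, h) of 8-connected foreground blobs.

    `binary` is a list of rows containing 0 (background) or 1 (foreground).
    `min_area` is a hypothetical noise filter: smaller blobs are dropped.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            if area >= min_area:
                boxes.append((min_x, min_y,
                              max_x - min_x + 1, max_y - min_y + 1))
    # Sort by x so candidate digits come out in left-to-right order.
    return sorted(boxes)


# Toy binarized frame: two digit-like blobs and one single-pixel speck.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
digit_boxes = find_components(image, min_area=2)
print(digit_boxes)
```

Each returned box would then be cropped, resized to the recognizer's input size, and passed to the digit classifier.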
- Implemented opencv/sample/cpp/digits_lenet.cpp, based on Connected Component Analysis and LeNet-5. This included finding stable preprocessing methods and implementing rotation prediction for the digit ROI.
- Extended opencv/sample/dnn/text_detection.cpp and opencv/sample/dnn/text_detection.py so that they not only detect text but also recognize it. Based on this GitHub project, multiple text recognition models have been trained, and all of them can be called correctly by the OpenCV DNN module.
- Provided a Detailed Tutorial, including how to train your own text recognition model and how to convert the model so that it can be called by OpenCV DNN.
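The recognition models listed on this page end in a CTC layer, so their raw output is a sequence of per-time-step class scores that still has to be decoded into a string. The sketch below shows greedy (best-path) CTC decoding in plain Python. It is not taken from the samples: the function name, the placement of the blank symbol at index 0, and the toy alphabet are assumptions for illustration, and the real layout depends on how each model was trained.

```python
def ctc_greedy_decode(scores, alphabet):
    """Greedy CTC decoding.

    `scores` holds one list of class scores per time step.
    Assumption: class 0 is the CTC blank and class i > 0 is alphabet[i - 1].
    """
    blank = 0
    chars = []
    prev = blank
    for step in scores:
        # Pick the highest-scoring class at this time step.
        best = max(range(len(step)), key=step.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen at the previous time step.
        if best != blank and best != prev:
            chars.append(alphabet[best - 1])
        prev = best
    return "".join(chars)


def one_hot(index, size):
    """Helper for the toy example: a score vector peaking at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


alphabet = "ehlo"  # toy alphabet; classes 1..4 map to e, h, l, o
best_path = [2, 2, 1, 0, 3, 3, 0, 3, 4]  # h h e _ l l _ l o
scores = [one_hot(i, len(alphabet) + 1) for i in best_path]
decoded = ctc_greedy_decode(scores, alphabet)
print(decoded)
```

The blank between the two runs of `l` is what lets the decoder keep a genuine double letter while still collapsing repeated predictions of the same character.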
The performance of the models on different text recognition datasets is shown in the table below:
Model name | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average accuracy (%) | Parameters (x10^6) |
---|---|---|---|---|---|---|---|---|---|
DenseNet-CTC | 72.267 | 67.39 | 82.81 | 80 | 48.38 | 49.45 | 42.50 | 63.26 | 0.24 |
DenseNet-BiLSTM-CTC | 73.76 | 72.33 | 86.15 | 83.15 | 50.67 | 57.984 | 49.826 | 67.69 | 3.63 |
VGG-CTC | 75.96 | 75.42 | 85.92 | 83.54 | 54.89 | 57.52 | 50.17 | 69.06 | 5.57 |
CRNN_VGG-BiLSTM-CTC | 82.63 | 82.07 | 92.96 | 88.867 | 66.28 | 71.01 | 62.37 | 78.03 | 8.45 |
ResNet-CTC | 84.00 | 84.08 | 92.39 | 88.96 | 67.74 | 74.73 | 67.60 | 79.93 | 44.28 |
The performance of the text recognition models was tested on OpenCV DNN and does not include the text detection model.
CPU: i5-8300 RAM: 16GB GPU: 1050 4GB
Ubuntu 18.01 OpenCV 4.4 CUDA 10.0
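The average accuracy column in the table above appears to be the unweighted mean of the seven per-dataset accuracies. This is an inference from the numbers rather than something stated on this page, and the short check below reproduces the reported averages to within 0.01.

```python
# Per-dataset accuracies (%) copied from the table, in column order:
# IIIT5k, SVT, ICDAR03, ICDAR13, ICDAR15, SVTP, CUTE80.
# The second tuple element is the average reported in the table.
rows = {
    "DenseNet-CTC": ([72.267, 67.39, 82.81, 80, 48.38, 49.45, 42.50], 63.26),
    "DenseNet-BiLSTM-CTC": ([73.76, 72.33, 86.15, 83.15, 50.67, 57.984, 49.826], 67.69),
    "VGG-CTC": ([75.96, 75.42, 85.92, 83.54, 54.89, 57.52, 50.17], 69.06),
    "CRNN_VGG-BiLSTM-CTC": ([82.63, 82.07, 92.96, 88.867, 66.28, 71.01, 62.37], 78.03),
    "ResNet-CTC": ([84.00, 84.08, 92.39, 88.96, 67.74, 74.73, 67.60], 79.93),
}

# Absolute gap between the recomputed mean and the reported average.
gaps = {}
for name, (accuracies, reported) in rows.items():
    mean = sum(accuracies) / len(accuracies)
    gaps[name] = abs(mean - reported)
    print(f"{name}: mean={mean:.3f}, reported={reported}")
```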
The demo video can be found here.
[1] Scene Text Detection and Recognition: The Deep Learning Era
[2] http://cs-chan.com/doc/ICDAR17.pdf
[3] https://github.com/hwalsuklee/awesome-deep-text-detection-recognition
[4] https://arxiv.org/abs/1704.03155v2 (EAST)
[5] https://github.com/chineseocr/darknet-ocr (CTPN without BiLSTM)
[6] https://github.com/senlinuc/caffe_ocr (DenseNet with BiLSTM and DenseNet without BiLSTM)
[7] https://github.com/huoyijie/AdvancedEAST (Advanced EAST)
[8] https://github.com/meijieru/crnn.pytorch (CRNN)