Refine document and scripts of CTC model. by wanghaoshuang · Pull Request #798 · PaddlePaddle/models

wanghaoshuang · 2018-04-01T12:49:57Z

Add document.
Add arguments for saving model and init model.
Refine inference.py and eval.py.
Make ctc_reader.py support for custom data.


qingqing01 · 2018-04-03T03:46:56Z

+
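+**- -test_list :** The list file containing image information for the test set. If set to None, ctc_reader automatically downloads and uses the default dataset. To test with your own data, change this option. Defaults to None.
+
+**- -num_classes :** The size of the character set. If set to None, the character-set size provided by ctc_reader is used. To train with your own data, change this option. Defaults to None.


I don't think every argument needs to be explained here; any future change would mean updating the document as well. It would be better to write good help text for the arguments in train.py and tell users to run python train.py --help to see the usage.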
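
The suggestion above (keep argument documentation in the script's help text rather than in the README) could look roughly like the following. This is a minimal sketch using only the standard-library `argparse`. The `add_arg` call style mirrors what appears in this PR's diff, but the helper body here is an assumption, not the repository's actual implementation; the two arguments are the ones quoted in the hunk above.

```python
import argparse
import functools


def add_arguments(argname, type, default, help, argparser):
    """Register --<argname> and append the default value to its help text."""
    argparser.add_argument(
        "--" + argname,
        type=type,
        default=default,
        help=help + " Default: %(default)s.",
    )


def build_parser():
    """Build a parser whose --help output documents every argument."""
    parser = argparse.ArgumentParser(description="Train the CRNN-CTC model (sketch).")
    add_arg = functools.partial(add_arguments, argparser=parser)
    add_arg("test_list", str, None,
            "List file of test images. None means the reader's default test list.")
    add_arg("num_classes", int, None,
            "Size of the character set. None means the reader's default.")
    return parser


if __name__ == "__main__":
    # Running the script with --help prints the usage generated from the help strings.
    args = build_parser().parse_args()
    print(args)
```

With this layout the README only needs to point at `--help`, so the two cannot drift apart.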

qingqing01 · 2018-04-03T03:47:04Z

+
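+**--input_images_list :**  Path to the list file containing information about the images to predict. If set to None, the default data provided by ctc_reader is used. Defaults to None.
+
+**--device DEVICE :** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.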


qingqing01 · 2018-04-03T03:47:36Z

+
+**--device DEVICE :** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.
+
+Prediction results are printed to standard output.


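For OCR inference, it would be best to take one image as input, output one piece of text, and display the result.

Outputting text directly is a bit difficult, because the user would also have to supply a dictionary.
For now, the user can enter an image path and immediately get the resulting sequence of indexes.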

python inference.py --model_path models/model_00044_15000
----------- Configuration Arguments -----------
device: 0
input_images_dir: None
input_images_list: None
model_path: models/model_00044_15000
------------------------------------------------
Init model from: models/model_00044_15000.
Please input the path of image: data/test_images/00008_4700.jpg
result: [6514 5919 3415 173]
Please input the path of image:
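
As the reply notes, turning the index sequence into text requires a character dictionary supplied by the user. A minimal sketch of that lookup follows. The dictionary format (a list in which position i holds the character for index i), the function name, and the toy entries are all assumptions made for illustration; they are not the format used by ctc_reader.

```python
def indexes_to_text(indexes, dictionary):
    """Map each predicted index to its character; out-of-range indexes become '?'."""
    chars = []
    for idx in indexes:
        idx = int(idx)
        if 0 <= idx < len(dictionary):
            chars.append(dictionary[idx])
        else:
            chars.append("?")
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical four-character dictionary; a real one would cover the full character set.
    toy_dictionary = ["你", "好", "世", "界"]
    print(indexes_to_text([0, 1, 2, 3], toy_dictionary))
```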

qingqing01 · 2018-04-03T03:48:01Z

+
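+**--input_images_list :**  Path to the list file containing information about the images to evaluate. If set to None, the default data provided by ctc_reader is used. Defaults to None.
+
+**--device DEVICE :** Device ID. Set to -1 to run on CPU; set to 0 to run on GPU. Defaults to 0.


Same as above. Could Evaluation be placed before Inference?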

1. Remove illustration of arguments. 2. Make inference support for more format input.

qingqing01 · 2018-04-11T05:07:07Z


-This model built with paddle fluid is still under active development and is not
-the final version. We welcome feedbacks.
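+Running the example programs in this directory requires PaddlePaddle v0.11.0. If your installed PaddlePaddle version is lower than this, please update it by following the instructions in the installation documentation.


This requirement needs to be updated: ask users to use the latest develop version. We can change it later once things have stabilized.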

qingqing01 · 2018-04-11T05:08:02Z

+<p align="center">
+<img src="images/train.jpg" width="620" hspace='10'/> <br/>
+<strong>Figure 2</strong>
+</p>


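https://github.com/wanghaoshuang/models/tree/ctc_doc/fluid/ocr_recognition Figure 2 is not displayed.

Fixed. Thx.

Along with the figure, explain what it shows and state the seq error. Is there a train seq error as well? If so, could two curves be plotted?

Should the train and test cost plots also be given?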

qingqing01 · 2018-04-11T05:08:32Z

+
+
+
+### 1.3 Evaluate


This is a Chinese document; please use Chinese for the headings too.

qingqing01 · 2018-04-11T05:08:53Z

+    --model_path models/model_00044_15000
+```
+
+Read image path from list file and inference：


Please use Chinese consistently.

qingqing01 · 2018-04-11T05:09:09Z

+env CUDA_VISIBLE_DEVICE=0 python inference.py \
+    --model_path=models/model_00044_15000 \
+    --input_images_list="data/test.list"
+```


Please give an example of the prediction output.

qingqing01 · 2018-04-11T05:11:47Z

+
+```
+env CUDA_VISIABLE_DEVICES=0 python ctc_train.py \
+    --device=0 \


Could device be removed from the code and replaced with use_gpu?
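
A boolean `use_gpu` flag, as suggested above, needs a little care with `argparse`, because `type=bool` treats any non-empty string (including "False") as true. The sketch below shows one way to handle that. `str2bool`, `build_parser`, and `describe_place` are hypothetical names, and the default of `True` is an assumption rather than what the PR ended up using.

```python
import argparse


def str2bool(value):
    """Parse common textual booleans so that --use_gpu=False really means False."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Parser with a single boolean flag replacing a numeric device ID."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_gpu", type=str2bool, default=True,
                        help="Whether to run on GPU. Default: %(default)s.")
    return parser


def describe_place(use_gpu):
    """Return a label for the chosen device.

    A real script would construct the framework's device object here instead.
    """
    return "GPU" if use_gpu else "CPU"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Running on", describe_place(args.use_gpu))
```

Compared with `--device=-1` for CPU, a boolean flag removes the need to remember a magic value, and the GPU index can still be chosen through the environment.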

qingqing01 · 2018-04-11T05:13:23Z

+    --device=0 \
+    --parallel=False \
+    --batch_size=32
+```


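If these arguments all take their default values, how about writing it as below? That makes it easy for users to copy and paste directly.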

env CUDA_VISIABLE_DEVICES=0 python ctc_train.py

qingqing01 · 2018-04-11T05:14:58Z

+add_arg('learning_rate',     float, 1.0e-3,    "Learning rate.")
+add_arg('l2',                float, 0.0004,    "L2 regularizer.")
+add_arg('max_clip',          float, 10.0,      "Max clip threshold.")
+add_arg('min_clip',          float, -10.0,     "Min clip threshold.")


Are max_clip/min_clip actually used? I think we should keep the number of arguments as small as possible.

qingqing01 · 2018-04-11T05:16:20Z

+add_arg('test_list',        str,    None,   "The list file of training images."
+        "None means using the default test_list file of reader.")
+add_arg('num_classes',      int,    None,      "The number of classes."
+        "None means using the default num_classes from reader.")


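I think user-configurable arguments should be the ones that are used [frequently] in experiments. There are too many arguments here.

I think it is fine. Some arguments are not adjusted often, but they are still indispensable.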


1. Remove unused arguments. 2. Refine doc. 3. Change 'device' to 'use_gpu'.

qingqing01

I approve the PR, but some comments need to be addressed in the next PR.

qingqing01 · 2018-04-13T07:28:16Z

@@ -1,4 +1,179 @@
-# OCR Model
+
+[toc]


https://github.com/wanghaoshuang/models/tree/ctc_doc/fluid/ocr_recognition

The toc is not displayed.

qingqing01 · 2018-04-13T07:28:33Z

+
+# Optical Character Recognition
+
+This document describes how to use the CRNN-CTC and CRNN-Attention models under PaddlePaddle fluid to recognize the text in images.


fluid -> Fluid

qingqing01 · 2018-04-13T07:29:12Z

+
+## 1. CRNN-CTC
+
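+The task in this chapter is to recognize images containing a single line of Chinese characters. First, convolution converts the image into a `features map`; then the `im2sequence op` converts the `features map` into a `sequence`, which passes through a `bidirectional GRU RNN` to obtain the probability distribution over Chinese characters at each step. The loss function used for training is CTC loss, and the final evaluation metric is `instance error rate`.


Please change features map to Chinese: 特征图 (feature map).
sequence -> 序列 (sequence).

"passes through a bidirectional GRU RNN to obtain the probability distribution over Chinese characters at each step"

The actual model does not produce a probability distribution.

Sequence features are learned through a bidirectional GRU.

Where CTC first appears, its Chinese name is needed.

instance error rate: this needs a clear explanation; it could be written as 样本级别的错误率 (sample-level error rate).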
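
One reading of "sample-level error rate" is the fraction of samples whose predicted sequence does not exactly match its label. The sketch below computes the metric under that assumption. The function name and the exact-match criterion are illustrative; the model's actual evaluator may define the metric differently.

```python
def instance_error_rate(predictions, labels):
    """Fraction of samples whose predicted index sequence differs from its label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # A sample counts as wrong if any position differs or the lengths differ.
    wrong = sum(1 for pred, label in zip(predictions, labels)
                if list(pred) != list(label))
    return wrong / float(len(labels))


if __name__ == "__main__":
    # Toy data: the second prediction has one wrong index, so half the samples are wrong.
    print(instance_error_rate([[1, 2], [3]], [[1, 2], [4]]))
```

Unlike a character-level error rate, a single wrong character makes the whole sample count as an error, which is why the term needs spelling out in the document.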

qingqing01 · 2018-04-13T07:44:05Z

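+- **ctc_reader.py :** Downloads, reads, and processes the data. Provides the methods `train()` and `test()`, which produce data iterators for the training set and the test set respectively.
+- **crnn_ctc_model.py :** Defines the training network, the inference network, and the evaluate network.
+- **ctc_train.py :** Used to train the model. Run `python train.py --help` for usage.
+- **inference.py :** Loads a trained model file and runs prediction on new data. Run `python inference.py --help` for usage.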


inference.py -> infer.py

qingqing01 · 2018-04-13T07:45:58Z

+<strong>Figure 1</strong>
+</p>
+
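+In the training set, the label for each image is a sequence of numbers. Each number in the sequence is the index of a character in the dictionary. The label corresponding to `Figure 1` is shown below:


"The label for each image is a sequence of numbers. Each number in the sequence is the index of a character in the dictionary."

The label for each image is the indices of its Chinese characters in the dictionary.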

qingqing01 · 2018-04-13T07:48:15Z

+Init model from: /home/work/models/fluid/ocr_recognition/models/model_00052_15000.
+Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0060.jpg
+result: [3298 2371 4233 6514 2378 3298 2363]
+Please input the path of image: /home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg


/home/work/models/fluid/ocr_recognition/data/test_images/00001_0429.jpg
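Paths like this in the documentation are not user-friendly.

Could the image from Figure 1 be used here? Could the output be replaced with the Chinese characters obtained after converting through the dictionary?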

qingqing01 · 2018-04-13T07:50:12Z

+<p align="center">
+<img src="images/train.jpg" width="620" hspace='10'/> <br/>
+<strong>Figure 2</strong>
+</p>


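Along with the figure, explain what it shows and state the seq error. Is there a train seq error as well? If so, could two curves be plotted?

Should the train and test cost plots also be given?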

wanghaoshuang added 3 commits April 1, 2018 20:36

Refine document and scripts.

07d4531

1. Add document. 2. Add arguments for saving model and init model. 3. Refine inference.py and eval.py. 4. Make ctc_reader.py support for custom data.

Merge branch 'develop' of https://github.com/PaddlePaddle/models into ctc_doc

57dd5c7

Merge branch 'develop' of https://github.com/PaddlePaddle/models into ctc_doc

e4fa968

wanghaoshuang requested a review from qingqing01 April 1, 2018 12:50

wanghaoshuang added 2 commits April 1, 2018 20:55

Format readme.

d38f9aa

Format markdown.

1c78d27

qingqing01 reviewed Apr 3, 2018


Fix some issues:

53988dd

1. Remove illustration of arguments. 2. Make inference support for more format input.

wanghaoshuang requested review from abhinavarora and removed request for abhinavarora April 8, 2018 05:13

qingqing01 reviewed Apr 11, 2018


wanghaoshuang added 2 commits April 11, 2018 19:34

Merge branch 'develop' of https://github.com/PaddlePaddle/models into ctc_doc

f67e732

Fix some issues:

bd97b39

1. Remove unused arguments. 2. Refine doc. 3. Change 'device' to 'use_gpu'.

qingqing01 approved these changes Apr 13, 2018


wanghaoshuang mentioned this pull request Apr 13, 2018

Fix comments in OCR CTC model doc. #843

Closed

wanghaoshuang merged commit 609dc34 into PaddlePaddle:develop Apr 13, 2018


