Add error rate calculation script. #82

pkuyym · 2017-06-08T09:22:20Z

resolves #81

kuke

Almost LGTM

kuke · 2017-06-08T11:10:35Z

deep_speech_2/error_rate.py

+    :param hypophysis: The hypophysis sentence.
+    :type reference: str
+    :param squeeze: If set true, consecutive space character 
+    will be squeezed to one


Would it be better to add an indent on line 114? The same applies below.

kuke · 2017-06-08T11:12:30Z

deep_speech_2/error_rate.py

+    :param squeeze: If set true, consecutive space character 
+    will be squeezed to one
+    :type squeezed: bool
+    :param ignore_case: Whether ignoring character case.


Whether case-sensitive or not

kuke · 2017-06-08T11:13:58Z

deep_speech_2/error_rate.py

+def wer(reference, hypophysis, delimiter=' ', filter_none=True):
+    """
+    Calculate word error rate (WER). WER is a popular evaluation metric used
+    in speech recognition. It compares a reference to an hypophysis and


compare to --> compare with?

kuke · 2017-06-08T11:15:15Z

deep_speech_2/error_rate.py

+        return ref_len
+
+    distance = np.zeros((ref_len + 1) * (hyp_len + 1), dtype=np.int64)
+    distance = distance.reshape((ref_len + 1, hyp_len + 1))


The two lines above can be merged into one:

distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)

kuke · 2017-06-08T11:21:10Z

deep_speech_2/error_rate.py

+        Iw is the number of words inserted,
+        Nw is the number of words in the reference
+
+    We can use levenshtein distance to calculate WER. Take an attention that 


There seems to be no such phrase as "take an attention"; use "pay attention" or "draw attention" instead.
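For readers following the docstring under review: the standard definition is WER = (Sw + Dw + Iw) / Nw, with the counts taken from a minimum-edit alignment. A minimal sketch of that arithmetic, where `wer_from_counts` is a hypothetical helper used only for illustration and is not part of this PR:

```python
def wer_from_counts(substitutions, deletions, insertions, ref_word_count):
    """Word error rate from edit-operation counts: (Sw + Dw + Iw) / Nw."""
    if ref_word_count == 0:
        raise ValueError("Reference's word number should be greater than 0.")
    return float(substitutions + deletions + insertions) / ref_word_count


# Reference "the cat sat" vs. hypothesis "the cat sat down": one inserted word.
print(wer_from_counts(0, 0, 1, 3))

# Reference "hello" vs. hypothesis "a b c": one substitution plus two
# insertions, so the rate is 3.0. WER is not bounded above by 1.
print(wer_from_counts(1, 0, 2, 1))
```

Because insertions count against a denominator that only measures the reference, a long hypothesis can push the rate above 1.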

qingqing01 · 2017-06-08T13:52:31Z

Fairly general-purpose code like this could go into the paddle repo, right?

xinghai-sun · 2017-06-11T11:46:36Z

deep_speech_2/error_rate.py

+    return distance[ref_len][hyp_len]
+
+
+def wer(reference, hypophysis, delimiter=' ', filter_none=True):


"hypophysis" or "hypothesis"?

Why not provide an ignore_case argument, just as CER does?

xinghai-sun · 2017-06-11T11:48:57Z

deep_speech_2/error_rate.py

+    reference and hypophysis sentences before calculating WER.
+
+    :param reference: The reference sentence.
+    :type reference: str


str --> basestring. The same below.
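Some context on this suggestion: `basestring` exists only in Python 2, where it is the common base class of `str` and `unicode`. If a runtime check were ever wanted on top of the docstring type, one possible sketch that runs on both Python 2 and 3 follows; `string_types` and `is_text` are hypothetical names, not part of this PR:

```python
try:
    string_types = basestring  # Python 2: common base of str and unicode
except NameError:
    string_types = str  # Python 3: basestring no longer exists


def is_text(value):
    """Return True if value is a text string on the running interpreter."""
    return isinstance(value, string_types)


print(is_text('abc'), is_text(42))
```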

xinghai-sun · 2017-06-11T11:56:48Z

deep_speech_2/error_rate.py

+    :return: WER
+    :rtype: float
+    """
+


Remove the blank line.

xinghai-sun · 2017-06-11T11:58:47Z

deep_speech_2/error_rate.py

+    :type delimiter: char
+    :param filter_none: Whether to remove None value when splitting sentence.
+    :type filter_none: bool
+    :return: WER


"WER" --> "Word error rate."

xinghai-sun · 2017-06-11T12:02:50Z

deep_speech_2/error_rate.py

+
+    .. code-block:: text
+
+        Sc is the number of character substituted,


character --> characters

xinghai-sun · 2017-06-11T12:58:04Z

deep_speech_2/error_rate.py

+def wer(reference, hypophysis, delimiter=' ', filter_none=True):
+    """
+    Calculate word error rate (WER). WER is a popular evaluation metric used
+    in speech recognition. It compares a reference with an hypophysis and


an hypothesis --> a hypothesis.

Please keep this description similar to cer function doc.

xinghai-sun · 2017-06-11T12:58:33Z

deep_speech_2/error_rate.py

+
+def cer(reference, hypophysis, squeeze=True, ignore_case=False, strip_char=''):
+    """
+    Calculate charactor error rate (CER). CER will compare reference text and


will compare --> compares

xinghai-sun · 2017-06-11T13:01:36Z

deep_speech_2/error_rate.py

+    if hyp_len == 0:
+        return ref_len
+
+    distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)


int32 is enough here. Done.

xinghai-sun · 2017-06-11T13:03:05Z

deep_speech_2/error_rate.py

+
+    distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)
+
+    # initialization distance matrix


initialization --> initialize

xinghai-sun · 2017-06-11T13:05:39Z

deep_speech_2/error_rate.py

@@ -0,0 +1,137 @@
+# -- * -- coding: utf-8 -- * --
+import numpy as np


Please add a simple module doc.

xinghai-sun · 2017-06-11T13:11:12Z

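@qingqing01 For now, ctc_beam_search is temporarily implemented outside the computation graph, so the corresponding WER/CER can only live outside the graph for the time being as well; it is not suitable for a paddle layer/evaluator.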

pkuyym

Remove unnecessary parameters to keep the processing logic correct and simple.

pkuyym · 2017-06-12T03:38:14Z

deep_speech_2/error_rate.py

+    if hyp_len == 0:
+        return ref_len
+
+    distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)


int32 is enough here. Done.

pkuyym · 2017-06-12T03:38:48Z

deep_speech_2/error_rate.py

+
+    distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)
+
+    # initialization distance matrix


pkuyym · 2017-06-12T03:42:01Z

deep_speech_2/error_rate.py

+    :type delimiter: char
+    :param filter_none: Whether to remove None value when splitting sentence.
+    :type filter_none: bool
+    :return: WER


pkuyym · 2017-06-12T03:42:07Z

deep_speech_2/error_rate.py

+    reference and hypophysis sentences before calculating WER.
+
+    :param reference: The reference sentence.
+    :type reference: str


pkuyym · 2017-06-12T03:42:15Z

deep_speech_2/error_rate.py

+    :return: WER
+    :rtype: float
+    """
+


pkuyym · 2017-06-12T04:34:11Z

deep_speech_2/error_rate.py

+def wer(reference, hypophysis, delimiter=' ', filter_none=True):
+    """
+    Calculate word error rate (WER). WER is a popular evaluation metric used
+    in speech recognition. It compares a reference with an hypophysis and


pkuyym · 2017-06-12T04:34:27Z

deep_speech_2/error_rate.py

+        raise ValueError("Reference's word number should be greater than 0.")
+
+    if filter_none == True:
+        ref_words = filter(None, reference.strip(delimiter).split(delimiter))


pkuyym · 2017-06-12T04:34:35Z

deep_speech_2/error_rate.py

+    return wer
+
+
+def cer(reference, hypophysis, squeeze=True, ignore_case=False, strip_char=''):


pkuyym · 2017-06-12T04:35:03Z

deep_speech_2/error_rate.py

+
+    .. code-block:: text
+
+        Sc is the number of character substituted,


pkuyym · 2017-06-12T04:35:27Z

deep_speech_2/error_rate.py

+        hypophysis = hypophysis.strip(strip_char)
+    if squeeze == True:
+        reference = ' '.join(filter(None, reference.split(' ')))
+        hypophysis = ' '.join(filter(None, hypophysis.split(' ')))


xinghai-sun

Could you please add a unit test (with both English and Mandarin test cases)?

xinghai-sun · 2017-06-16T08:55:28Z

deep_speech_2/error_rate.py

@@ -0,0 +1,133 @@
+# -*- coding: utf-8 -*-
+"""
+    This module provides functions to calculate error rate in different level.


-->"""This .....
Let's keep this consistent across all files.

xinghai-sun · 2017-06-16T08:56:58Z

deep_speech_2/error_rate.py

+
+def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
+    """
+    Calculate word error rate (WER). WER compares reference text and 


--> """ Calculate

xinghai-sun · 2017-06-16T08:57:16Z

deep_speech_2/error_rate.py

+
+def cer(reference, hypothesis, ignore_case=False):
+    """
+    Calculate charactor error rate (CER). CER compares reference text and


--> """Calculate

xinghai-sun · 2017-06-16T08:58:01Z

deep_speech_2/error_rate.py

+    :param ignore_case: Whether case-sensitive or not.
+    :type ignore_case: bool
+    :return: Character error rate.
+    :rtype: float


Add :raises ValueError: If reference length is zero.

xinghai-sun · 2017-06-16T08:59:23Z

deep_speech_2/error_rate.py

+    :param delimiter: Delimiter of input sentences.
+    :type delimiter: char
+    :return: Word error rate.
+    :rtype: float


Add :raises ValueError: If there are zero reference words.
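Putting the review points on `wer` together (the `hypothesis` spelling, an `ignore_case` argument, and a documented `ValueError`), the function might end up looking roughly like the sketch below. This is an illustration under assumptions, not the merged code: `_edit_distance` is a hypothetical one-row-at-a-time helper written in plain Python instead of the numpy matrix used in the PR.

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping one row at a time."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[len(hyp)]


def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
    """Calculate word error rate (WER).

    :param reference: The reference sentence.
    :param hypothesis: The hypothesis sentence.
    :param ignore_case: Whether to ignore case when comparing.
    :param delimiter: Delimiter of input sentences.
    :return: Word error rate.
    :rtype: float
    :raises ValueError: If there are zero reference words.
    """
    if ignore_case:
        reference = reference.lower()
        hypothesis = hypothesis.lower()
    # Drop empty strings produced by repeated delimiters.
    ref_words = [w for w in reference.split(delimiter) if w]
    hyp_words = [w for w in hypothesis.split(delimiter) if w]
    if len(ref_words) == 0:
        raise ValueError("Reference's word number should be greater than 0.")
    return float(_edit_distance(ref_words, hyp_words)) / len(ref_words)
```

Calling `wer("Hello World", "hello world")` counts both words as substitutions, while passing `ignore_case=True` makes the two sentences identical.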

xinghai-sun · 2017-06-16T09:04:18Z

deep_speech_2/tests/test_error_rate.py

@@ -0,0 +1,29 @@
+# -*- coding: utf-8 -*-


Add:

""" Test error rate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

xinghai-sun · 2017-06-16T09:05:06Z

deep_speech_2/error_rate.py

+"""
+    This module provides functions to calculate error rate in different level.
+    e.g. wer for word-level, cer for char-level.
+"""


Add:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Let's keep this consistent across the DS2 project.

xinghai-sun · 2017-06-16T09:05:40Z

deep_speech_2/tests/test_error_rate.py

+# -*- coding: utf-8 -*-
+import unittest
+import sys
+sys.path.append('..')


Could we avoid using sys.path.append?

xinghai-sun · 2017-06-16T09:15:49Z

deep_speech_2/tests/test_error_rate.py

+    def test_wer(self):
+        ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
+        hyp = 'i GOT IT TO the FULLEST i LOVE TO portable FROM OF STORES last night'
+        word_error_rate = error_rate.wer(ref, hyp)


Could we add more test cases?
e.g.
self.assertTrue(error_rate.wer(ref, ref) == 0)
Also test that ValueError is raised if len(ref) == 0.

1. First install dependencies; 2. Surpport abosulate import.

pkuyym

Extend ci and fix unittest following comments.

pkuyym · 2017-06-18T06:06:51Z

deep_speech_2/error_rate.py

@@ -0,0 +1,133 @@
+# -*- coding: utf-8 -*-
+"""
+    This module provides functions to calculate error rate in different level.


pkuyym · 2017-06-18T06:07:10Z

deep_speech_2/error_rate.py

+"""
+    This module provides functions to calculate error rate in different level.
+    e.g. wer for word-level, cer for char-level.
+"""


pkuyym · 2017-06-18T06:07:25Z

deep_speech_2/error_rate.py

+
+def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
+    """
+    Calculate word error rate (WER). WER compares reference text and 


pkuyym · 2017-06-18T06:09:53Z

deep_speech_2/error_rate.py

+    :param delimiter: Delimiter of input sentences.
+    :type delimiter: char
+    :return: Word error rate.
+    :rtype: float


pkuyym · 2017-06-18T06:10:13Z

deep_speech_2/error_rate.py

+    :param ignore_case: Whether case-sensitive or not.
+    :type ignore_case: bool
+    :return: Character error rate.
+    :rtype: float


pkuyym · 2017-06-18T06:10:17Z

deep_speech_2/error_rate.py

+
+def cer(reference, hypothesis, ignore_case=False):
+    """
+    Calculate charactor error rate (CER). CER compares reference text and


pkuyym · 2017-06-18T06:14:36Z

deep_speech_2/tests/test_error_rate.py

@@ -0,0 +1,29 @@
+# -*- coding: utf-8 -*-


pkuyym · 2017-06-18T06:15:35Z

deep_speech_2/tests/test_error_rate.py

+# -*- coding: utf-8 -*-
+import unittest
+import sys
+sys.path.append('..')


pkuyym · 2017-06-18T06:30:43Z

deep_speech_2/tests/test_error_rate.py

+    def test_wer(self):
+        ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
+        hyp = 'i GOT IT TO the FULLEST i LOVE TO portable FROM OF STORES last night'
+        word_error_rate = error_rate.wer(ref, hyp)


xinghai-sun

Great job! Almost LGTM.

xinghai-sun · 2017-06-18T12:04:22Z

deep_speech_2/error_rate.py

+"""This module provides functions to calculate error rate in different level.
+e.g. wer for word-level, cer for char-level.
+"""
+


Remove Line 5.

xinghai-sun · 2017-06-18T12:04:50Z

deep_speech_2/error_rate.py

+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function


Add a blank line below Line 8.

xinghai-sun · 2017-06-18T12:09:16Z

deep_speech_2/tests/test_error_rate.py

+        try:
+            word_error_rate = error_rate.wer(ref, hyp)
+        except Exception as e:
+            self.assertTrue(isinstance(e, ValueError))


self.assertRaises?

xinghai-sun · 2017-06-18T12:09:41Z

deep_speech_2/tests/test_error_rate.py

+        try:
+            char_error_rate = error_rate.cer(ref, hyp)
+        except Exception as e:
+            self.assertTrue(isinstance(e, ValueError))


use assertRaises ?
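One reason `assertRaises` is the safer choice: with the try/except pattern, the test still passes when no exception is raised at all, because the `except` branch never runs. The sketch below shows both patterns side by side. `stub_rate` is a hypothetical stand-in that reproduces only the empty-reference check, not the real `error_rate.wer` or `error_rate.cer`.

```python
import unittest


def stub_rate(reference, hypothesis):
    """Hypothetical stand-in: only the empty-reference check is modelled."""
    if len(reference) == 0:
        raise ValueError("Reference length should be greater than 0.")
    return 0.0


class EmptyReferenceTest(unittest.TestCase):
    def test_with_assert_raises(self):
        # Fails if stub_rate returns normally instead of raising ValueError.
        with self.assertRaises(ValueError):
            stub_rate('', 'some hypothesis')

    def test_try_except_passes_vacuously(self):
        # The reference is non-empty, so nothing is raised, the except
        # branch is skipped, and the test passes without checking anything.
        try:
            stub_rate('non-empty', 'some hypothesis')
        except Exception as e:
            self.assertTrue(isinstance(e, ValueError))
```

Saved as a test module, this can be run with `python -m unittest`; both tests pass, which is exactly the problem with the second one.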

xinghai-sun · 2017-06-18T12:10:43Z

deep_speech_2/error_rate.py

+    :type ignore_case: bool
+    :return: Character error rate.
+    :rtype: float
+    :raises ValueError: If reference length is zero.


reference length --> the reference length ?

The same in the cer.

xinghai-sun · 2017-06-18T12:12:34Z

deep_speech_2/error_rate.py

+import numpy as np
+
+
+def levenshtein_distance(ref, hyp):


rename it to _levenshtein_distance ?

Add a simple description or reference for levenshtein_distance?

pkuyym

Follow comments.

pkuyym · 2017-06-19T03:09:17Z

deep_speech_2/error_rate.py

+"""This module provides functions to calculate error rate in different level.
+e.g. wer for word-level, cer for char-level.
+"""
+


pkuyym · 2017-06-19T03:09:25Z

deep_speech_2/error_rate.py

+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function


pkuyym · 2017-06-19T03:11:00Z

deep_speech_2/error_rate.py

+    :type ignore_case: bool
+    :return: Character error rate.
+    :rtype: float
+    :raises ValueError: If reference length is zero.


pkuyym · 2017-06-19T03:12:15Z

deep_speech_2/tests/test_error_rate.py

+        try:
+            word_error_rate = error_rate.wer(ref, hyp)
+        except Exception as e:
+            self.assertTrue(isinstance(e, ValueError))


pkuyym · 2017-06-19T03:12:18Z

deep_speech_2/tests/test_error_rate.py

+        try:
+            char_error_rate = error_rate.cer(ref, hyp)
+        except Exception as e:
+            self.assertTrue(isinstance(e, ValueError))


pkuyym requested review from kuke and xinghai-sun June 8, 2017 09:22

Add error rate calculation script.

98b2a22

pkuyym force-pushed the fix-81 branch from 7313bb1 to 98b2a22 Compare June 8, 2017 09:27

kuke reviewed Jun 8, 2017

Fix typos and follow comments.

8e3c26f

xinghai-sun requested changes Jun 11, 2017

pkuyym commented Jun 12, 2017

xinghai-sun approved these changes Jun 12, 2017

xinghai-sun requested changes Jun 12, 2017

Follow comments.

9752884

pkuyym force-pushed the fix-81 branch from 0d7640f to 9752884 Compare June 13, 2017 04:24

yangyaming added 2 commits June 14, 2017 14:46

Merge branch 'develop' of https://github.com/PaddlePaddle/models into…

28dfad7

… fix-81

Add unittest.

d8345eb

xinghai-sun requested changes Jun 16, 2017

yangyaming added 3 commits June 18, 2017 13:55

Merge branch 'develop' of https://github.com/PaddlePaddle/models into…

d6300ef

… fix-81

Extend ci:

7afa9db

1. First install dependencies; 2. Surpport abosulate import.

Follow comments.

0322d75

pkuyym commented Jun 18, 2017

xinghai-sun approved these changes Jun 18, 2017

Follow comments.

ada4096

pkuyym commented Jun 19, 2017

pkuyym merged commit 0a0fcad into PaddlePaddle:develop Jun 19, 2017

pkuyym deleted the fix-81 branch June 20, 2017 02:26
