@pkuyym pkuyym commented Jun 8, 2017

resolves #81

@pkuyym pkuyym requested review from kuke and xinghai-sun June 8, 2017 09:22
Collaborator

@kuke kuke left a comment

Almost LGTM

:param hypophysis: The hypophysis sentence.
:type reference: str
:param squeeze: If set true, consecutive space character
will be squeezed to one
Collaborator

Would it be better to add an indent in line 114? The same below.

Contributor Author

Done.

:param squeeze: If set true, consecutive space character
will be squeezed to one
:type squeezed: bool
:param ignore_case: Whether ignoring character case.
Collaborator

Whether case-sensitive or not

Contributor Author

Done.

def wer(reference, hypophysis, delimiter=' ', filter_none=True):
"""
Calculate word error rate (WER). WER is a popular evaluation metric used
in speech recognition. It compares a reference to an hypophysis and
Collaborator

compare to --> compare with?

Contributor Author

Done.

return ref_len

distance = np.zeros((ref_len + 1) * (hyp_len + 1), dtype=np.int64)
distance = distance.reshape((ref_len + 1, hyp_len + 1))
Collaborator

Above two lines can be merged into one line

distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)

Contributor Author

Done.

Iw is the number of words inserted,
Nw is the number of words in the reference

We can use levenshtein distance to calculate WER. Take an attention that
Collaborator

Seems there is no "take an attention", but pay/draw ...

Contributor Author

Done.
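
For reference, the docstring fragment quoted above lists the terms of the WER formula. Below is a minimal sketch of that arithmetic, assuming the usual definition WER = (Sw + Dw + Iw) / Nw. The helper name `wer_from_counts` and the symbols Sw and Dw (substituted and deleted words) are illustrative assumptions; only Iw and Nw appear in the quoted lines. The Levenshtein distance is the smallest possible value of Sw + Dw + Iw, which is why it can stand in for the numerator.

```python
def wer_from_counts(substitutions, deletions, insertions, ref_word_count):
    """Return (Sw + Dw + Iw) / Nw as a float (hypothetical helper)."""
    if ref_word_count <= 0:
        raise ValueError("Reference's word number should be greater than 0.")
    return float(substitutions + deletions + insertions) / ref_word_count


# One substitution, one deletion and one insertion against a 6-word reference.
print(wer_from_counts(1, 1, 1, 6))  # 0.5
```

Because insertions are counted against the reference length only, the result can exceed 1.0 when the hypothesis is much longer than the reference.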

@qingqing01
Collaborator

Fairly general-purpose code like this could probably go into the paddle repo.

return distance[ref_len][hyp_len]


def wer(reference, hypophysis, delimiter=' ', filter_none=True):
Contributor

  1. "hypophysis" or "hypothesis"?
  2. Why not provide an ignore_case argument, just as CER does?

Contributor Author

Done.

reference and hypophysis sentences before calculating WER.

:param reference: The reference sentence.
:type reference: str
Contributor

str --> basestring. The same below.

Contributor Author

Done.
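
Some background on the `str --> basestring` suggestion: in Python 2, `basestring` is the common base class of `str` and `unicode`, so it covers both byte strings and unicode text. The name does not exist in Python 3. A minimal sketch of a compatibility check, assuming the code might run under either version; `string_types` and `is_text` are hypothetical names, not part of the PR:

```python
try:
    string_types = basestring  # Python 2: matches both str and unicode
except NameError:
    string_types = str         # Python 3: basestring was removed


def is_text(value):
    """Return True if value is a string type on the running interpreter."""
    return isinstance(value, string_types)


print(is_text('reference sentence'))  # True
print(is_text(42))                    # False
```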

:return: WER
:rtype: float
"""

Contributor

Remove the blank line.

Contributor Author

Done.

:type delimiter: char
:param filter_none: Whether to remove None value when splitting sentence.
:type filter_none: bool
:return: WER
Contributor

"WER" --> "Word error rate."

Contributor Author

Done.


.. code-block:: text

Sc is the number of character substituted,
Contributor

character --> characters

Contributor Author

Done.

def wer(reference, hypophysis, delimiter=' ', filter_none=True):
"""
Calculate word error rate (WER). WER is a popular evaluation metric used
in speech recognition. It compares a reference with an hypophysis and
Contributor

  1. an hypothesis --> a hypothesis.
  2. Please keep this description similar to cer function doc.

Contributor Author

Done.


def cer(reference, hypophysis, squeeze=True, ignore_case=False, strip_char=''):
"""
Calculate charactor error rate (CER). CER will compare reference text and
Contributor

will compare --> compares

Contributor Author

Done.

if hyp_len == 0:
return ref_len

distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)
Contributor

Why int64?

Contributor Author

int32 is enough here. Done.


distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int64)

# initialization distance matrix
Contributor

initialization --> initialize

Contributor Author

Done.

@@ -0,0 +1,137 @@
# -- * -- coding: utf-8 -- * --
import numpy as np
Contributor

Please add a simple module doc.

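@xinghai-sun
Contributor

@qingqing01 Because ctc_beam_search is currently implemented outside the computation graph, the corresponding WER/CER can also only be placed outside the graph for now; it is not suitable to put them into a paddle layer/evaluator.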

Contributor Author

@pkuyym pkuyym left a comment

Remove unnecessary parameters to keep the processing logic correct and simple.

raise ValueError("Reference's word number should be greater than 0.")

if filter_none == True:
ref_words = filter(None, reference.strip(delimiter).split(delimiter))
Contributor Author

Done.
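
A note on the `filter(None, ...)` call in the quoted lines: in Python 2 it returns a list, but in Python 3 it returns a lazy iterator, so a later `len(ref_words)` would fail. A minimal sketch of a version-independent alternative, assuming the intent is simply to drop empty tokens; `split_words` is a hypothetical helper, not the PR's code:

```python
def split_words(sentence, delimiter=' ', filter_none=True):
    """Split a sentence on delimiter, optionally dropping empty tokens."""
    words = sentence.strip(delimiter).split(delimiter)
    if filter_none:  # plain truth test instead of `== True`
        # A list comprehension yields a real list on both Python 2 and 3.
        words = [word for word in words if word]
    return words


print(split_words('a  b c '))                 # ['a', 'b', 'c']
print(split_words('a  b', filter_none=False))  # ['a', '', 'b']
```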

return wer


def cer(reference, hypophysis, squeeze=True, ignore_case=False, strip_char=''):
Contributor Author

Done.


hypophysis = hypophysis.strip(strip_char)
if squeeze == True:
reference = ' '.join(filter(None, reference.split(' ')))
hypophysis = ' '.join(filter(None, hypophysis.split(' ')))
Contributor Author

Done.
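
To illustrate what the squeeze step in the quoted lines does: splitting on a single space produces empty strings wherever spaces repeat, and `filter(None, ...)` drops them before the tokens are joined again. A small sketch, with `squeeze_spaces` as a hypothetical wrapper name:

```python
def squeeze_spaces(text):
    """Collapse runs of spaces into one and drop leading/trailing spaces."""
    return ' '.join(filter(None, text.split(' ')))


print(squeeze_spaces('a   b  c'))  # a b c
print(squeeze_spaces('  a b '))    # a b
```

Only the space character is handled this way; tabs and newlines are left untouched. `' '.join(text.split())` would be one option if all whitespace should be squeezed.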

Contributor

@xinghai-sun xinghai-sun left a comment

Could you please add a unit test (with both English and Mandarin test case)?
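
As an illustration of what an English and a Mandarin case could look like, here is a self-contained sketch. It assumes Python 3 (where `str` is unicode, so each Chinese character counts as one unit) and uses a compact single-row edit distance rather than the PR's numpy implementation. The body of `cer` is an assumption built around the signature quoted in this review, and `_edit_distance` is a hypothetical helper.

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance using a single rolling row."""
    row = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, 1):
        prev_diag, row[0] = row[0], i
        for j, hyp_item in enumerate(hyp, 1):
            prev_diag, row[j] = row[j], min(
                row[j] + 1,                          # deletion
                row[j - 1] + 1,                      # insertion
                prev_diag + (ref_item != hyp_item))  # match or substitution
    return row[-1]


def cer(reference, hypothesis, ignore_case=False):
    """Character error rate: edit distance divided by reference length."""
    if ignore_case:
        reference, hypothesis = reference.lower(), hypothesis.lower()
    # Squeeze repeated whitespace so spacing noise is not counted as errors.
    reference = ' '.join(reference.split())
    hypothesis = ' '.join(hypothesis.split())
    if len(reference) == 0:
        raise ValueError("Length of reference should be greater than 0.")
    return float(_edit_distance(reference, hypothesis)) / len(reference)


print(cer('abcd', 'abed'))                                  # 0.25
print(cer('我是中国人', '我是美国人'))                          # 0.2
print(cer('Hello  World', 'hello world', ignore_case=True))  # 0.0
```

Under Python 2 the Mandarin strings would need to be decoded to `unicode` first, otherwise the distance would be computed over bytes.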

@@ -0,0 +1,133 @@
# -*- coding: utf-8 -*-
"""
This module provides functions to calculate error rate in different level.
Contributor

-->"""This .....
Let's keep consistent across all files.

Contributor Author

Done.


def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
"""
Calculate word error rate (WER). WER compares reference text and
Contributor

--> """ Calculate

Contributor Author

Done.


def cer(reference, hypothesis, ignore_case=False):
"""
Calculate charactor error rate (CER). CER compares reference text and
Contributor

--> """Calculate

Contributor Author

Done.

:param ignore_case: Whether case-sensitive or not.
:type ignore_case: bool
:return: Character error rate.
:rtype: float
Contributor

@xinghai-sun xinghai-sun Jun 16, 2017

Add :raises ValueError: If reference length is zero.

Contributor Author

Done.

:param delimiter: Delimiter of input sentences.
:type delimiter: char
:return: Word error rate.
:rtype: float
Contributor

@xinghai-sun xinghai-sun Jun 16, 2017

Add :raises ValueError: If there is zero reference words.

Contributor Author

Done.

@@ -0,0 +1,29 @@
# -*- coding: utf-8 -*-
Contributor

@xinghai-sun xinghai-sun Jun 16, 2017

Add:

"""Test error rate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Contributor Author

Done.

"""
This module provides functions to calculate error rate in different level.
e.g. wer for word-level, cer for char-level.
"""
Contributor

@xinghai-sun xinghai-sun Jun 16, 2017

Add:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Let's keep consistent across the DS2 project.

Contributor Author

Done.

# -*- coding: utf-8 -*-
import unittest
import sys
sys.path.append('..')
Contributor

Could we avoid using sys.path.append?

Contributor Author

Done.

def test_wer(self):
ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
hyp = 'i GOT IT TO the FULLEST i LOVE TO portable FROM OF STORES last night'
word_error_rate = error_rate.wer(ref, hyp)
Contributor

Could we add more test cases?
e.g.
self.assertTrue(error_rate.wer(ref, ref) == 0)
test if ValueError is raised if len(ref) == 0.

Contributor Author

Done.

yangyaming added 3 commits June 18, 2017 13:55
Contributor Author

@pkuyym pkuyym left a comment

Extend ci and fix unittest following comments.

Contributor

@xinghai-sun xinghai-sun left a comment

Great job! Almost LGTM.

"""This module provides functions to calculate error rate in different level.
e.g. wer for word-level, cer for char-level.
"""

Contributor

Remove Line 5.

Contributor Author

Done.


from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
Contributor

Add a blank line below Line 8.

Contributor Author

Done.

try:
word_error_rate = error_rate.wer(ref, hyp)
except Exception as e:
self.assertTrue(isinstance(e, ValueError))
Contributor

self.assertRaises?

Contributor Author

Done.
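
The reason `assertRaises` is preferable here: with the quoted try/except pattern, the test also passes when no exception is raised at all, because the assertion inside `except` never runs. A minimal sketch of the `assertRaises` form follows. The `wer` function below is a hypothetical stand-in for `error_rate.wer` that only reproduces the empty-reference check; its return value is a placeholder, not a real WER.

```python
import unittest


def wer(reference, hypothesis, delimiter=' '):
    """Hypothetical stand-in for error_rate.wer; only the input check is real."""
    ref_words = [word for word in reference.split(delimiter) if word]
    if not ref_words:
        raise ValueError("Reference's word number should be greater than 0.")
    # Placeholder result, NOT a real WER: 0.0 for identical text, else 1.0.
    return 0.0 if reference == hypothesis else 1.0


class TestWerErrors(unittest.TestCase):
    def test_identical_sentences(self):
        ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
        self.assertEqual(wer(ref, ref), 0.0)

    def test_empty_reference_raises(self):
        # Fails if wer() returns normally, unlike the try/except version.
        with self.assertRaises(ValueError):
            wer('', 'any hypothesis')
```

Saved as a test module, this can be run with `python -m unittest`.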

try:
char_error_rate = error_rate.cer(ref, hyp)
except Exception as e:
self.assertTrue(isinstance(e, ValueError))
Contributor

use assertRaises ?

Contributor Author

Done.

:type ignore_case: bool
:return: Character error rate.
:rtype: float
:raises ValueError: If reference length is zero.
Contributor

reference length --> the reference length ?

The same in the cer.

Contributor Author

Done.

import numpy as np


def levenshtein_distance(ref, hyp):
Contributor

  1. rename it to _levenshtein_distance ?
  2. Add a simple description or reference for levenshtein_distance?

Contributor Author

@pkuyym pkuyym left a comment

Follow comments.

@pkuyym pkuyym merged commit 0a0fcad into PaddlePaddle:develop Jun 19, 2017
@pkuyym pkuyym deleted the fix-81 branch June 20, 2017 02:26
Linked issue: Add WER and CER evaluation script.

4 participants