Merged
Changes from 8 commits
5 changes: 2 additions & 3 deletions .travis.yml
@@ -21,9 +21,8 @@ before_install:
   - docker pull paddlepaddle/paddle:latest
 script:
   - .travis/precommit.sh
-  - docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c
-    "cd /py_unittest && find . -name 'tests' -type d -print0 | xargs -0 -I{} -n1 bash -c 'cd {};
-    python -m unittest discover -v'"
+  - docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c
+    'cd /py_unittest; sh .travis/unittest.sh'

 notifications:
   email:
35 changes: 35 additions & 0 deletions .travis/unittest.sh
@@ -0,0 +1,35 @@
#!/bin/bash

abort(){
    echo "Run unittest failed" 1>&2
    echo "Please check your code" 1>&2
    exit 1
}

unittest(){
    cd "$1" > /dev/null
    if [ -f "requirements.txt" ]; then
        pip install -r requirements.txt
    fi
    if [ $? != 0 ]; then
        exit 1
    fi
    # Run unittest discovery in every directory named 'tests'.
    find . -name 'tests' -type d -print0 | \
        xargs -0 -I{} -n1 bash -c \
        'python -m unittest discover -v -s {}'
    cd - > /dev/null
}

# Print a failure message on any exit until the trap is cleared below.
trap 'abort' 0
set -e

for proj in */ ; do
    if [ -d "$proj" ]; then
        unittest "$proj"
        if [ $? != 0 ]; then
            exit 1
        fi
    fi
done

# All projects passed: replace the exit trap with a no-op.
trap : 0
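
As a rough illustration of what the script's `find . -name 'tests' -type d` step selects, the following Python 3 sketch walks a project tree and collects every directory named `tests`. The helper name `find_test_dirs` and the sample directory layout are hypothetical and not part of this PR.

```python
import os
import tempfile


def find_test_dirs(root):
    """Return sorted paths (relative to root) of directories named 'tests'."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            if name == "tests":
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def demo():
    """Build a throwaway project layout and list its test directories."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "deep_speech_2", "tests"))
        os.makedirs(os.path.join(root, "other_proj", "pkg", "tests"))
        os.makedirs(os.path.join(root, "other_proj", "docs"))
        return find_test_dirs(root)


if __name__ == "__main__":
    print(demo())
```

Each directory found this way corresponds to one `python -m unittest discover -v -s <dir>` invocation in the script.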
135 changes: 135 additions & 0 deletions deep_speech_2/error_rate.py
@@ -0,0 +1,135 @@
# -*- coding: utf-8 -*-
"""This module provides functions to calculate error rates at different levels,
e.g. WER at the word level and CER at the character level.
"""
@xinghai-sun (Contributor), Jun 16, 2017:

Add:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Let's keep this consistent across the DS2 project.

Contributor Author:

Done.


Contributor:

Remove Line 5.

Contributor Author:

Done.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
Contributor:

Add a blank line below Line 8.

Contributor Author:

Done.

import numpy as np
Contributor:

Please add a simple module doc.



def levenshtein_distance(ref, hyp):
Contributor:

  1. Rename it to _levenshtein_distance?
  2. Add a simple description or reference for levenshtein_distance?

    ref_len = len(ref)
    hyp_len = len(hyp)

    # special case
    if ref == hyp:
        return 0
    if ref_len == 0:
        return hyp_len
    if hyp_len == 0:
        return ref_len

    distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int32)

    # initialize distance matrix
    for j in xrange(hyp_len + 1):
        distance[0][j] = j
    for i in xrange(ref_len + 1):
        distance[i][0] = i

    # calculate levenshtein distance
    for i in xrange(1, ref_len + 1):
        for j in xrange(1, hyp_len + 1):
            if ref[i - 1] == hyp[j - 1]:
                distance[i][j] = distance[i - 1][j - 1]
            else:
                s_num = distance[i - 1][j - 1] + 1
                i_num = distance[i][j - 1] + 1
                d_num = distance[i - 1][j] + 1
                distance[i][j] = min(s_num, i_num, d_num)

    return distance[ref_len][hyp_len]
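
A reviewer asked for a short description of the Levenshtein distance. As a sketch: it is the minimum number of substitutions, insertions and deletions needed to turn one sequence into another, and it can be computed with the dynamic-programming recurrence used in the function above. The pure-Python version below keeps only two rows of the table instead of the full matrix. The name `edit_distance` is illustrative, and the code is written for Python 3 (`range` rather than `xrange`).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # At the start of iteration i, prev[j] is the distance between
    # ref[:i - 1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j - 1],  # substitution
                                  curr[j - 1],  # insertion
                                  prev[j])      # deletion
        prev = curr
    return prev[len(hyp)]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("the cat sat".split(), "the cat sat down".split()))
```

Because the function only indexes and compares elements, the same code serves word lists (for WER) and character strings (for CER).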


def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
    """Calculate word error rate (WER). WER compares reference text and
    hypothesis text at the word level. WER is defined as:

    .. math::
        WER = (Sw + Dw + Iw) / Nw

    where

    .. code-block:: text

        Sw is the number of words substituted,
        Dw is the number of words deleted,
        Iw is the number of words inserted,
        Nw is the number of words in the reference

    We can use the Levenshtein distance to calculate WER. Note that empty
    items are removed when splitting sentences by the delimiter.

    :param reference: The reference sentence.
    :type reference: basestring
    :param hypothesis: The hypothesis sentence.
    :type hypothesis: basestring
    :param ignore_case: Whether to ignore case when comparing.
    :type ignore_case: bool
    :param delimiter: Delimiter of input sentences.
    :type delimiter: char
    :return: Word error rate.
    :rtype: float
@xinghai-sun (Contributor), Jun 16, 2017:

Add :raises ValueError: If there is zero reference words.

Contributor Author:

Done.

    :raises ValueError: If reference length is zero.
    """
    if ignore_case:
        reference = reference.lower()
        hypothesis = hypothesis.lower()

Contributor:

Remove the blank line.

Contributor Author:

Done.

    ref_words = filter(None, reference.split(delimiter))
    hyp_words = filter(None, hypothesis.split(delimiter))

    if len(ref_words) == 0:
        raise ValueError("Reference's word number should be greater than 0.")

    edit_distance = levenshtein_distance(ref_words, hyp_words)
    wer = float(edit_distance) / len(ref_words)
    return wer
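
To illustrate the docstring's note about empty items: splitting on a single-space delimiter produces empty strings wherever delimiters repeat or sit at either end of the sentence, and those are dropped before words are counted. A small Python 3 sketch follows. The helper name `split_words` is hypothetical, and a list comprehension stands in for the Python 2 `filter(None, ...)` call, which returns an iterator rather than a list in Python 3.

```python
def split_words(sentence, delimiter=' '):
    """Split a sentence and drop the empty items left by repeated delimiters."""
    return [word for word in sentence.split(delimiter) if word]


if __name__ == "__main__":
    raw = ' the  cat sat '
    print(raw.split(' '))      # includes empty strings
    print(split_words(raw))    # empty items removed
    print(split_words(' '))    # no words at all
```

An all-whitespace reference therefore yields zero words, which is the case in which `wer` raises `ValueError`.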


def cer(reference, hypothesis, ignore_case=False):
    """Calculate character error rate (CER). CER compares reference text and
    hypothesis text at the character level. CER is defined as:

    .. math::
        CER = (Sc + Dc + Ic) / Nc

    where

    .. code-block:: text

        Sc is the number of characters substituted,
        Dc is the number of characters deleted,
        Ic is the number of characters inserted,
        Nc is the number of characters in the reference

    We can use the Levenshtein distance to calculate CER. Chinese input should
    be encoded to unicode. Note that leading and trailing white space
    characters are removed, and multiple consecutive white space characters in
    a sentence are replaced by a single white space character.

    :param reference: The reference sentence.
    :type reference: basestring
    :param hypothesis: The hypothesis sentence.
    :type hypothesis: basestring
    :param ignore_case: Whether to ignore case when comparing.
    :type ignore_case: bool
    :return: Character error rate.
    :rtype: float
@xinghai-sun (Contributor), Jun 16, 2017:

Add :raises ValueError: If reference length is zero.

Contributor Author:

Done.

    :raises ValueError: If reference length is zero.
Contributor:

reference length --> the reference length ?

The same in the cer.

Contributor Author:

Done.

"""
    if ignore_case:
        reference = reference.lower()
        hypothesis = hypothesis.lower()

    reference = ' '.join(filter(None, reference.split(' ')))
    hypothesis = ' '.join(filter(None, hypothesis.split(' ')))

    if len(reference) == 0:
        raise ValueError("Length of reference should be greater than 0.")

    edit_distance = levenshtein_distance(reference, hypothesis)
    cer = float(edit_distance) / len(reference)
    return cer
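
The docstring's requirement that Chinese input be unicode matters because CER counts characters, not bytes. A brief Python 3 sketch of the difference is shown below. In Python 3, `str` is already unicode; the PR's code targets Python 2, where `u'...'` literals play that role. The variable names are illustrative.

```python
text = '我是中国人'
encoded = text.encode('utf-8')

# Five characters as unicode text, but three bytes per character in UTF-8.
num_chars = len(text)
num_bytes = len(encoded)

if __name__ == "__main__":
    print(num_chars, num_bytes)
```

Running the edit distance over the byte string would count bytes rather than characters, so the result would no longer be a character-level rate.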
63 changes: 63 additions & 0 deletions deep_speech_2/tests/test_error_rate.py
@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
@xinghai-sun (Contributor), Jun 16, 2017:

Add:

"""Test error rate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

Contributor Author:

Done.

"""Test error rate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import unittest
import error_rate


class TestParse(unittest.TestCase):
    def test_wer_1(self):
        ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
        hyp = 'i GOT IT TO the FULLEST i LOVE TO portable FROM OF STORES last night'
        word_error_rate = error_rate.wer(ref, hyp)
Contributor:

Could we add more test cases? E.g.:
self.assertTrue(error_rate.wer(ref, ref) == 0)
Also test that ValueError is raised if len(ref) == 0.

Contributor Author:

Done.

        self.assertTrue(abs(word_error_rate - 0.769230769231) < 1e-6)

    def test_wer_2(self):
        ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
        word_error_rate = error_rate.wer(ref, ref)
        self.assertEqual(word_error_rate, 0.0)

    def test_wer_3(self):
        ref = ' '
        hyp = 'Hypothesis sentence'
        try:
            word_error_rate = error_rate.wer(ref, hyp)
        except Exception as e:
            self.assertTrue(isinstance(e, ValueError))
Contributor:

self.assertRaises?

Contributor Author:

Done.
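
The reviewer's suggestion could look roughly like the sketch below. With `try`/`except`, the test passes silently when no exception is raised at all, whereas `assertRaises` fails in that case. `fake_wer` is a hypothetical stand-in for `error_rate.wer` so that the example runs on its own; it is not the PR's implementation.

```python
import unittest


def fake_wer(reference, hypothesis):
    """Hypothetical stand-in: rejects a reference that contains no words."""
    if not reference.split():
        raise ValueError("Reference's word number should be greater than 0.")
    return 0.0


class TestEmptyReference(unittest.TestCase):
    def test_callable_form(self):
        # Pass the callable and its arguments separately.
        self.assertRaises(ValueError, fake_wer, ' ', 'Hypothesis sentence')

    def test_context_manager_form(self):
        # The block must raise ValueError, otherwise the test fails.
        with self.assertRaises(ValueError):
            fake_wer(' ', 'Hypothesis sentence')
```

Saved to a file, the two tests can be run with `python -m unittest <file>`. Either form also covers the `cer` case with an empty reference.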


    def test_cer_1(self):
        ref = 'werewolf'
        hyp = 'weae wolf'
        char_error_rate = error_rate.cer(ref, hyp)
        self.assertTrue(abs(char_error_rate - 0.25) < 1e-6)

    def test_cer_2(self):
        ref = 'werewolf'
        char_error_rate = error_rate.cer(ref, ref)
        self.assertEqual(char_error_rate, 0.0)

    def test_cer_3(self):
        ref = u'我是中国人'
        hyp = u'我是 美洲人'
        char_error_rate = error_rate.cer(ref, hyp)
        self.assertTrue(abs(char_error_rate - 0.6) < 1e-6)

    def test_cer_4(self):
        ref = u'我是中国人'
        char_error_rate = error_rate.cer(ref, ref)
        self.assertEqual(char_error_rate, 0.0)

    def test_cer_5(self):
        ref = ''
        hyp = 'Hypothesis'
        try:
            char_error_rate = error_rate.cer(ref, hyp)
        except Exception as e:
            self.assertTrue(isinstance(e, ValueError))
Contributor:

use assertRaises ?

Contributor Author:

Done.



if __name__ == '__main__':
    unittest.main()
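
As a possible follow-up, the tolerance checks written as `assertTrue(abs(x - expected) < 1e-6)` could use `assertAlmostEqual`, which reports both values when the assertion fails. A minimal Python 3 sketch, assuming the expected value in `test_wer_1` corresponds to 10 word errors over the 13 reference words; the class name `TestTolerance` is illustrative.

```python
import unittest


class TestTolerance(unittest.TestCase):
    def test_delta(self):
        word_error_rate = 10.0 / 13.0
        # delta is an absolute tolerance, like the 1e-6 bound in the tests.
        self.assertAlmostEqual(word_error_rate, 0.769230769231, delta=1e-6)
```

Saved to a file, this can be run with `python -m unittest <file>`.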