Commit 204f124

Fix sacrebleu parameter name (#2674)
* Fix argument name in sacrebleu kwargs description
* Improve sacrebleu kwargs description
* Align documentation on using metrics with sacrebleu kwargs description
* Add example of passing additional arguments in using metrics
* Fix style
1 parent df941fe commit 204f124

2 files changed (+54, -35 lines changed)


docs/source/using_metrics.rst

Lines changed: 44 additions & 28 deletions
@@ -69,13 +69,15 @@ Here is an example for the sacrebleu metric:
 from a source against one or more references.
 
 Args:
-    predictions: The system stream (a sequence of segments)
-    references: A list of one or more reference streams (each a sequence of segments)
-    smooth: The smoothing method to use
-    smooth_value: For 'floor' smoothing, the floor to use
-    force: Ignore data that looks already tokenized
-    lowercase: Lowercase the data
-    tokenize: The tokenizer to use
+    predictions: The system stream (a sequence of segments).
+    references: A list of one or more reference streams (each a sequence of segments).
+    smooth_method: The smoothing method to use. (Default: 'exp').
+    smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+    tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+        Japanese and '13a' (mteval) otherwise.
+    lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+    force: Insist that your tokenized input is actually detokenized.
+
 Returns:
     'score': BLEU score,
     'counts': Counts,
@@ -84,6 +86,7 @@ Here is an example for the sacrebleu metric:
     'bp': Brevity penalty,
     'sys_len': predictions length,
     'ref_len': reference length,
+
 Examples:
 
     >>> predictions = ["hello there general kenobi", "foo bar foobar"]
@@ -101,15 +104,17 @@ Here is an example for the sacrebleu metric:
 >>> print(metric.inputs_description)
 Produces BLEU scores along with its sufficient statistics
 from a source against one or more references.
-
+
 Args:
-    predictions: The system stream (a sequence of segments)
-    references: A list of one or more reference streams (each a sequence of segments)
-    smooth: The smoothing method to use
-    smooth_value: For 'floor' smoothing, the floor to use
-    force: Ignore data that looks already tokenized
-    lowercase: Lowercase the data
-    tokenize: The tokenizer to use
+    predictions: The system stream (a sequence of segments).
+    references: A list of one or more reference streams (each a sequence of segments).
+    smooth_method: The smoothing method to use. (Default: 'exp').
+    smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+    tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+        Japanese and '13a' (mteval) otherwise.
+    lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+    force: Insist that your tokenized input is actually detokenized.
+
 Returns:
     'score': BLEU score,
     'counts': Counts,
@@ -118,6 +123,7 @@ Here is an example for the sacrebleu metric:
     'bp': Brevity penalty,
     'sys_len': predictions length,
     'ref_len': reference length,
+
 Examples:
     >>> predictions = ["hello there general kenobi", "foo bar foobar"]
     >>> references = [["hello there general kenobi", "hello there !"], ["foo bar foobar", "foo bar foobar"]]
@@ -168,7 +174,7 @@ Let's use ``sacrebleu`` with the official quick-start example on its homepage at
 
 Note that the format of the inputs is a bit different than the official sacrebleu format: we provide the references for each prediction in a list inside the list associated to the prediction while the official example is nested the other way around (list for the reference numbers and inside list for the examples).
 
-Querying the length of a Metric object will return the number of examples (predictions or predictions/references pair) currently stored in the metric's cache. As we can see on the last line, we have stored three evaluation examples in our metric.
+Querying the length of a Metric object will return the number of examples (predictions or predictions/references pair) currently stored in the metric's cache. As we can see on the last line, we have stored three evaluation examples in our metric.
 
 Now let's compute the sacrebleu score from these 3 evaluation datapoints.
 
@@ -195,11 +201,18 @@ These additional arguments are detailed in the metric information.
 
 For example ``sacrebleu`` accepts the following additional arguments:
 
-- ``smooth``: The smoothing method to use
-- ``smooth_value``: For 'floor' smoothing, the floor to use
-- ``force``: Ignore data that looks already tokenized
-- ``lowercase``: Lowercase the data
-- ``tokenize``: The tokenizer to use
+- ``smooth_method``: The smoothing method to use. (Default: 'exp').
+- ``smooth_value``: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+- ``tokenize``: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+  Japanese and '13a' (mteval) otherwise.
+- ``lowercase``: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+- ``force``: Insist that your tokenized input is actually detokenized.
+
+To use `"floor"` smooth method with floor value 0.2, pass these arguments to :func:`datasets.Metric.compute`:
+
+.. code-block::
+
+    score = metric.compute(smooth_method="floor", smooth_value=0.2)
 
 You can list these arguments with ``print(metric)`` or ``print(metric.inputs_description)`` as we saw in the previous section and have more details on the official ``sacrebleu`` homepage and publication (accessible with ``print(metric.homepage)`` and ``print(metric.citation)``):
 
@@ -210,13 +223,15 @@ You can list these arguments with ``print(metric)`` or ``print(metric.inputs_des
 from a source against one or more references.
 
 Args:
-    predictions: The system stream (a sequence of segments)
-    references: A list of one or more reference streams (each a sequence of segments)
-    smooth: The smoothing method to use
-    smooth_value: For 'floor' smoothing, the floor to use
-    force: Ignore data that looks already tokenized
-    lowercase: Lowercase the data
-    tokenize: The tokenizer to use
+    predictions: The system stream (a sequence of segments).
+    references: A list of one or more reference streams (each a sequence of segments).
+    smooth_method: The smoothing method to use. (Default: 'exp').
+    smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+    tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+        Japanese and '13a' (mteval) otherwise.
+    lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+    force: Insist that your tokenized input is actually detokenized.
+
 Returns:
     'score': BLEU score,
     'counts': Counts,
@@ -225,6 +240,7 @@ You can list these arguments with ``print(metric)`` or ``print(metric.inputs_des
     'bp': Brevity penalty,
     'sys_len': predictions length,
     'ref_len': reference length,
+
 Examples:
     >>> predictions = ["hello there general kenobi", "foo bar foobar"]
     >>> references = [["hello there general kenobi", "hello there !"], ["foo bar foobar", "foo bar foobar"]]

metrics/sacrebleu/sacrebleu.py

Lines changed: 10 additions & 7 deletions
@@ -47,13 +47,15 @@
 from a source against one or more references.
 
 Args:
-    predictions: The system stream (a sequence of segments)
-    references: A list of one or more reference streams (each a sequence of segments)
-    smooth: The smoothing method to use
-    smooth_value: For 'floor' smoothing, the floor to use
-    force: Ignore data that looks already tokenized
-    lowercase: Lowercase the data
-    tokenize: The tokenizer to use
+    predictions: The system stream (a sequence of segments).
+    references: A list of one or more reference streams (each a sequence of segments).
+    smooth_method: The smoothing method to use. (Default: 'exp').
+    smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+    tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+        Japanese and '13a' (mteval) otherwise.
+    lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+    force: Insist that your tokenized input is actually detokenized.
+
 Returns:
     'score': BLEU score,
     'counts': Counts,
@@ -62,6 +64,7 @@
     'bp': Brevity penalty,
     'sys_len': predictions length,
     'ref_len': reference length,
+
 Examples:
 
     >>> predictions = ["hello there general kenobi", "foo bar foobar"]
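The defaults the new docstring states (``smooth_method`` defaults to 'exp'; ``smooth_value`` is valid only for 'floor' and 'add-k', defaulting to 0.1 and 1 respectively) can be sketched as a small resolver. This is an illustrative reimplementation of the documented rules only, not sacrebleu's internal logic:

```python
# Illustrative resolver for the documented smoothing defaults (not sacrebleu's
# actual code): smooth_value only applies to 'floor' and 'add-k'.
SMOOTH_VALUE_DEFAULTS = {"floor": 0.1, "add-k": 1}

def resolve_smoothing(smooth_method="exp", smooth_value=None):
    if smooth_method in SMOOTH_VALUE_DEFAULTS:
        if smooth_value is None:
            smooth_value = SMOOTH_VALUE_DEFAULTS[smooth_method]
        return smooth_method, smooth_value
    if smooth_value is not None:
        raise ValueError(
            f"smooth_value is only valid for 'floor' and 'add-k', got {smooth_method!r}"
        )
    return smooth_method, None

print(resolve_smoothing())            # ('exp', None)
print(resolve_smoothing("floor"))     # ('floor', 0.1)
print(resolve_smoothing("add-k", 2))  # ('add-k', 2)
```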
