* Fix argument name in sacrebleu kwargs description
* Improve sacrebleu kwargs description
* Align documentation on using metrics with sacrebleu kwargs description
* Add example of passing additional arguments in using metrics
* Fix style
docs/source/using_metrics.rst
44 additions & 28 deletions
@@ -69,13 +69,15 @@ Here is an example for the sacrebleu metric:
     from a source against one or more references.

     Args:
-        predictions: The system stream (a sequence of segments)
-        references: A list of one or more reference streams (each a sequence of segments)
-        smooth: The smoothing method to use
-        smooth_value: For 'floor' smoothing, the floor to use
-        force: Ignore data that looks already tokenized
-        lowercase: Lowercase the data
-        tokenize: The tokenizer to use
+        predictions: The system stream (a sequence of segments).
+        references: A list of one or more reference streams (each a sequence of segments).
+        smooth_method: The smoothing method to use. (Default: 'exp').
+        smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+        tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+            Japanese and '13a' (mteval) otherwise.
+        lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+        force: Insist that your tokenized input is actually detokenized.
+
     Returns:
         'score': BLEU score,
         'counts': Counts,
@@ -84,6 +86,7 @@ Here is an example for the sacrebleu metric:
         'bp': Brevity penalty,
         'sys_len': predictions length,
         'ref_len': reference length,
+
     Examples:

         >>> predictions = ["hello there general kenobi", "foo bar foobar"]
@@ -101,15 +104,17 @@ Here is an example for the sacrebleu metric:
     >>> print(metric.inputs_description)
     Produces BLEU scores along with its sufficient statistics
     from a source against one or more references.
-
+
     Args:
-        predictions: The system stream (a sequence of segments)
-        references: A list of one or more reference streams (each a sequence of segments)
-        smooth: The smoothing method to use
-        smooth_value: For 'floor' smoothing, the floor to use
-        force: Ignore data that looks already tokenized
-        lowercase: Lowercase the data
-        tokenize: The tokenizer to use
+        predictions: The system stream (a sequence of segments).
+        references: A list of one or more reference streams (each a sequence of segments).
+        smooth_method: The smoothing method to use. (Default: 'exp').
+        smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+        tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+            Japanese and '13a' (mteval) otherwise.
+        lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+        force: Insist that your tokenized input is actually detokenized.
+
     Returns:
         'score': BLEU score,
         'counts': Counts,
@@ -118,6 +123,7 @@ Here is an example for the sacrebleu metric:
         'bp': Brevity penalty,
         'sys_len': predictions length,
         'ref_len': reference length,
+
     Examples:
         >>> predictions = ["hello there general kenobi", "foo bar foobar"]
         >>> references = [["hello there general kenobi", "hello there !"], ["foo bar foobar", "foo bar foobar"]]
@@ -168,7 +174,7 @@ Let's use ``sacrebleu`` with the official quick-start example on its homepage at

 Note that the format of the inputs is a bit different than the official sacrebleu format: we provide the references for each prediction in a list inside the list associated to the prediction while the official example is nested the other way around (list for the reference numbers and inside list for the examples).

-Querying the length of a Metric object will return the number of examples (predictions or predictions/references pair) currently stored in the metric's cache. As we can see on the last line, we have stored three evaluation examples in our metric.
+Querying the length of a Metric object will return the number of examples (predictions or predictions/references pair) currently stored in the metric's cache. As we can see on the last line, we have stored three evaluation examples in our metric.

 Now let's compute the sacrebleu score from these 3 evaluation datapoints.

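A minimal sketch of the caching workflow this hunk documents, assuming the ``datasets`` ``load_metric`` API the guide is built around (the segment strings below are illustrative, not taken from the PR):

    >>> from datasets import load_metric
    >>> metric = load_metric("sacrebleu")
    >>> predictions = ["the dog bit the man", "it was not surprising", "the man had just bitten him"]  # hypothetical segments
    >>> references = [["the dog bit the man"], ["it was not unexpected"], ["the man bit him first"]]  # one reference list per prediction
    >>> for pred, refs in zip(predictions, references):
    ...     metric.add(prediction=pred, reference=refs)  # stores one evaluation example in the cache
    >>> len(metric)  # number of prediction/references pairs currently cached
    3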
@@ -195,11 +201,18 @@ These additional arguments are detailed in the metric information.

 For example ``sacrebleu`` accepts the following additional arguments:

-- ``smooth``: The smoothing method to use
-- ``smooth_value``: For 'floor' smoothing, the floor to use
-- ``force``: Ignore data that looks already tokenized
-- ``lowercase``: Lowercase the data
-- ``tokenize``: The tokenizer to use
+- ``smooth_method``: The smoothing method to use. (Default: 'exp').
+- ``smooth_value``: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+- ``tokenize``: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+  Japanese and '13a' (mteval) otherwise.
+- ``lowercase``: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+- ``force``: Insist that your tokenized input is actually detokenized.
+
+To use `"floor"` smooth method with floor value 0.2, pass these arguments to :func:`datasets.Metric.compute`:

 You can list these arguments with ``print(metric)`` or ``print(metric.inputs_description)`` as we saw in the previous section and have more details on the official ``sacrebleu`` homepage and publication (accessible with ``print(metric.homepage)`` and ``print(metric.citation)``):
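The example block that the last added line introduces is collapsed in this view of the diff. As a hedged sketch of such a call, continuing from the session above (``compute`` forwards metric-specific keyword arguments; the names come from the list in this hunk):

    >>> results = metric.compute(smooth_method="floor", smooth_value=0.2)  # 'floor' smoothing with a floor of 0.2
    >>> # likewise, case-insensitive scoring with the mteval '13a' tokenizer:
    >>> results = metric.compute(predictions=predictions, references=references, lowercase=True, tokenize="13a")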
@@ -210,13 +223,15 @@ You can list these arguments with ``print(metric)`` or ``print(metric.inputs_des
     from a source against one or more references.

     Args:
-        predictions: The system stream (a sequence of segments)
-        references: A list of one or more reference streams (each a sequence of segments)
-        smooth: The smoothing method to use
-        smooth_value: For 'floor' smoothing, the floor to use
-        force: Ignore data that looks already tokenized
-        lowercase: Lowercase the data
-        tokenize: The tokenizer to use
+        predictions: The system stream (a sequence of segments).
+        references: A list of one or more reference streams (each a sequence of segments).
+        smooth_method: The smoothing method to use. (Default: 'exp').
+        smooth_value: The smoothing value. Only valid for 'floor' and 'add-k'. (Defaults: floor: 0.1, add-k: 1).
+        tokenize: Tokenization method to use for BLEU. If not provided, defaults to 'zh' for Chinese, 'ja-mecab' for
+            Japanese and '13a' (mteval) otherwise.
+        lowercase: Lowercase the data. If True, enables case-insensitivity. (Default: False).
+        force: Insist that your tokenized input is actually detokenized.
+
     Returns:
         'score': BLEU score,
         'counts': Counts,
@@ -225,6 +240,7 @@ You can list these arguments with ``print(metric)`` or ``print(metric.inputs_des
         'bp': Brevity penalty,
         'sys_len': predictions length,
         'ref_len': reference length,
+
     Examples:
         >>> predictions = ["hello there general kenobi", "foo bar foobar"]
         >>> references = [["hello there general kenobi", "hello there !"], ["foo bar foobar", "foo bar foobar"]]
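To tie the ``Returns:`` block repeated in these hunks back to code, a closing usage sketch (only keys shown in the diff are referenced; the values depend on your data):

    >>> results = metric.compute(predictions=predictions, references=references)
    >>> results["score"]  # BLEU score
    >>> results["bp"]  # brevity penalty
    >>> results["sys_len"], results["ref_len"]  # predictions length, reference length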