docs/source/a_quick_tour.mdx (0 additions, 18 deletions)
```diff
@@ -65,7 +65,6 @@ All evalution modules come with a range of useful attributes that help to use a
 |---|---|
 |`description`|A short description of the evaluation module.|
 |`citation`|A BibTex string for citation when available.|
-|`config`| A `dataclass` containing the settings of the module. |
 |`features`|A `Features` object defining the input format.|
 |`inputs_description`|This is equivalent to the modules docstring.|
 |`homepage`|The homepage of the module.|
```
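For context, the attributes in this table are plain fields on a loaded module; a minimal sketch of inspecting them, using `accuracy` as an arbitrary example:

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Each attribute from the table above can be read directly off the module.
print(accuracy.description)         # short description of the evaluation module
print(accuracy.citation)            # BibTeX string, when available
print(accuracy.features)            # `Features` object defining the input format
print(accuracy.inputs_description)  # equivalent to the module's docstring
print(accuracy.homepage)            # homepage of the module
```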
```diff
@@ -179,23 +178,6 @@ A common way to overcome this issue is to fallback on single process evaluation.
 
 This solution allows 🤗 Evaluate to perform distributed predictions, which is important for evaluation speed in distributed settings. At the same time, you can also use complex non-additive metrics without wasting valuable GPU or CPU memory.
 
-## Configuration
-
-Some metrics can be configured with additional settings. For example, `accuracy` has an extra `normalize` setting which returns the fraction of correctly classified samples and is set to `True` by default. To change it you have two options: pass it as a keyword argument with `load()` or during `compute()`. With `load()`, the setting is changed permanently for the module, while passing it to `compute()` only changes it for the duration of the `compute()` call.
-This is also useful for the following `combine()` method since it allows to load modules with specific settings before combining them.
-
 ## Combining several evaluations
 
 Often one wants to not only evaluate a single metric but a range of different metrics capturing different aspects of a model. E.g. for classification it is usually a good idea to compute F1-score, recall, and precision in addition to accuracy to get a better picture of model performance. Naturally, you can load a bunch of metrics and call them sequentially. However, a more convenient way is to use the [`~evaluate.combine`] function to bundle them together:
```
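The removed Configuration paragraph above describes a per-call `normalize` setting on `accuracy`. A minimal sketch of the `compute()`-time form it mentions, assuming `normalize` behaves as described (the sample labels are purely illustrative):

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Default: normalize=True returns the fraction of correctly classified samples.
print(accuracy.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0]))
# -> {'accuracy': 0.75}

# normalize=False switches to the raw count of correct predictions (3 of 4 here),
# but only for this one call; the module's default is unchanged.
print(accuracy.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0], normalize=False))
```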
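For the `combine()` usage the context lines introduce, a short sketch along the lines of the quick tour (the chosen metric names are the standard classification modules and are assumptions here):

```python
import evaluate

# Bundle several classification metrics into one module.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])

# A single compute() call then returns all of them in one dictionary,
# e.g. {'accuracy': ..., 'f1': ..., 'precision': ..., 'recall': ...}.
results = clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1])
print(results)
```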
measurements/toxicity/README.md (7 additions, 7 deletions)
```diff
@@ -30,7 +30,7 @@ The model should be compatible with the AutoModelForSequenceClassification class
 For more information, see [the AutoModelForSequenceClassification documentation](https://huggingface.co/docs/transformers/master/en/model_doc/auto#transformers.AutoModelForSequenceClassification).
 
 Args:
-`data` (list of str): prediction/candidate sentences
+`predictions` (list of str): prediction/candidate sentences
 `toxic_label` (str) (optional): the toxic label that you want to detect, depending on the labels that the model has been trained on.
 This can be found using the `id2label` function, e.g.:
 ```python
```
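A minimal usage sketch for the renamed `predictions` argument, following the loading pattern this README uses (the example sentences are illustrative, and the default classifier is downloaded on first use):

```python
import evaluate

# Toxicity is a measurement module, so it is loaded with module_type="measurement".
toxicity = evaluate.load("toxicity", module_type="measurement")

# `predictions` holds the candidate sentences to score.
results = toxicity.compute(predictions=["she went to the library", "he is a douchebag"])
print(results["toxicity"])  # one toxicity score per input sentence
```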
```diff
@@ -47,7 +47,7 @@ Args:
 
 ## Output values
 
-`toxicity`: a list of toxicity scores, one for each sentence in `data` (default behavior)
+`toxicity`: a list of toxicity scores, one for each sentence in `predictions` (default behavior)
 
 `max_toxicity`: the maximum toxicity over all scores (if `aggregation` = `maximum`)
```
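For the aggregated output described above, a short follow-up sketch, assuming `aggregation` is passed as a `compute()` keyword as the output description implies:

```python
import evaluate

toxicity = evaluate.load("toxicity", module_type="measurement")

# With aggregation="maximum", compute() returns only the highest score
# over all input sentences, under the key `max_toxicity`.
results = toxicity.compute(
    predictions=["she went to the library", "he is a douchebag"],
    aggregation="maximum",
)
print(results["max_toxicity"])  # a single float
```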