Return Python float instead of numpy.float64 in sklearn metrics #2612

lewtun · 2021-07-09T09:48:09Z

This PR converts the return type of all sklearn metrics to be Python float instead of numpy.float64.

The reason behind this is that our Hub evaluation framework relies on converting benchmark-specific metrics to YAML, and a numpy.float64 value gets serialized as a verbose Python object tag instead of a plain number:

import yaml
from datasets import load_metric

metric = load_metric("accuracy")
score = metric.compute(predictions=[0,1], references=[0,1])
print(yaml.dump(score["accuracy"])) # output below
# !!python/object/apply:numpy.core.multiarray.scalar
# - !!python/object/apply:numpy.dtype
#   args:
#   - f8
#   - false
#   - true
#   state: !!python/tuple
#   - 3
#   - <
#   - null
#   - null
#   - null
#   - -1
#   - -1
#   - 0
# - !!binary |
#   AAAAAAAA8D8=
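
A minimal sketch of the intended effect, assuming PyYAML and NumPy are installed. Here `np.float64(1.0)` is a stand-in for the value a sklearn-backed metric returns, and casting with `float()` is only one possible way to get a built-in float, not necessarily the exact change made in this PR:

```python
import numpy as np
import yaml

# Stand-in for a metric result (assumption: the real value comes from
# metric.compute(...) and is a numpy.float64).
score = {"accuracy": np.float64(1.0)}

# Cast every value to a built-in Python float before serializing.
clean = {name: float(value) for name, value in score.items()}

# With built-in floats, PyYAML emits a plain mapping with no python/object tags.
dumped = yaml.dump(clean)
print(dumped)
```

Because the values are now built-in floats, the output can also be read back with `yaml.safe_load`.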

lewtun · 2021-07-09T10:02:09Z

I opened an issue on the sklearn repo to understand why numpy.float64 is the default: scikit-learn/scikit-learn#20490

lhoestq

Thanks ! :)

lhoestq · 2021-07-09T12:58:38Z

It could be surprising at first to use tolist() on numpy scalars but it works ^^
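
For reference, a small sketch of the two conversions mentioned in this thread, using a hand-built `np.float64` in place of a real metric output. The helper name `to_python_scalars` is hypothetical and not part of `datasets`:

```python
import numpy as np

# Stand-in for a metric value (assumption: real metrics return numpy.float64).
score = np.float64(0.5)

# Both methods exist on numpy scalars and return a built-in Python float.
via_tolist = score.tolist()
via_item = score.item()

# numpy.float64 subclasses Python's float, so isinstance(score, float) is True;
# compare with type(...) to tell the two apart.
print(type(score) is float, type(via_tolist) is float, type(via_item) is float)


def to_python_scalars(results):
    """Hypothetical helper: convert numpy scalars in a metric dict to built-in types."""
    return {
        key: value.item() if isinstance(value, np.generic) else value
        for key, value in results.items()
    }


converted = to_python_scalars({"accuracy": score, "n": 2})
```

On a scalar, `item()` and `tolist()` give the same result; `item()` arguably states the intent more directly, which matches the follow-up PR title "Use ndarray.item instead of ndarray.tolist".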

lhoestq · 2021-07-09T13:23:11Z

Did the same for Pearsonr here: #2614

Return Python float instead of numpy.float64 in sklearn metrics

da0507a

lewtun requested review from albertvillanova and lhoestq July 9, 2021 09:48

lhoestq approved these changes Jul 9, 2021

lhoestq merged commit 060dc85 into huggingface:master Jul 9, 2021

lewtun deleted the cast-numpy-to-python-types branch July 9, 2021 13:05

lewtun mentioned this pull request Jul 9, 2021

Use ndarray.item instead of ndarray.tolist #2613

Merged

lhoestq mentioned this pull request Jul 9, 2021

Convert numpy scalar to python float in Pearsonr output #2614

Merged

albertvillanova added this to the 1.10 milestone Jul 12, 2021
