@davidheineman

Removes sklearn, which was being used to compute the F1 score, and adds a custom implementation in its place.

This script checks that the two implementations produce the same result:

from sklearn.metrics import f1_score as sklearn_f1

def custom_f1_score(y_true, y_pred, pos_label=1):
    y_true = list(y_true)
    y_pred = list(y_pred)
    tp = sum((yt == pos_label) and (yp == pos_label) for yt, yp in zip(y_true, y_pred))
    fp = sum((yt != pos_label) and (yp == pos_label) for yt, yp in zip(y_true, y_pred))
    fn = sum((yt == pos_label) and (yp != pos_label) for yt, yp in zip(y_true, y_pred))

    if tp + fp == 0 or tp + fn == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    if precision + recall == 0:
        return 0.0

    return 2 * precision * recall / (precision + recall)

# Test: compare both implementations on two prediction sets, treating label 0 as the positive class
labels = [0, 1, 0, 1, 0, 0, 1]
preds = [0, 1, 1, 1, 0, 0, 0]
preds_no_leading_space = [0, 1, 0, 1, 1, 0, 0]

score_sklearn = sklearn_f1(labels, preds, pos_label=0)
score_custom = custom_f1_score(labels, preds, pos_label=0)

score_sklearn_no_leading = sklearn_f1(labels, preds_no_leading_space, pos_label=0)
score_custom_no_leading = custom_f1_score(labels, preds_no_leading_space, pos_label=0)

def isclose(a, b, rel_tol=1e-9, abs_tol=0.0):
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

assert isclose(score_custom, score_sklearn)
assert isclose(score_custom_no_leading, score_sklearn_no_leading)
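For anyone reading along without sklearn installed, here is a minimal sketch that works out the first test case by hand. The helper names `confusion_counts` and `f1_from_counts` are hypothetical and not part of this PR; the sketch assumes the identity F1 = 2·TP / (2·TP + FP + FN), which is algebraically the same as the precision/recall form used in `custom_f1_score`, including the zero-denominator cases where that function returns 0.0.

```python
def confusion_counts(y_true, y_pred, pos_label=1):
    """Return (tp, fp, fn) for the given positive label (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(yt == pos_label and yp == pos_label for yt, yp in pairs)
    fp = sum(yt != pos_label and yp == pos_label for yt, yp in pairs)
    fn = sum(yt == pos_label and yp != pos_label for yt, yp in pairs)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """Compute F1 directly from counts; returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


labels = [0, 1, 0, 1, 0, 0, 1]
preds = [0, 1, 1, 1, 0, 0, 0]

# With pos_label=0: indices 0, 4, 5 are true positives,
# index 6 is a false positive, index 2 is a false negative.
counts = confusion_counts(labels, preds, pos_label=0)
print(counts)                   # (3, 1, 1)
print(f1_from_counts(*counts))  # 0.75
```

Precision and recall are both 3/4 here, so the harmonic mean is also 0.75, which is the value both assertions above should be comparing.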

@epwalsh left a comment

Just one note about the CHANGELOG, otherwise LGTM

Co-authored-by: Pete Walsh <[email protected]>
@davidheineman merged commit ed1b21b into main on Jul 19, 2025
7 checks passed
@davidheineman deleted the clean-deps branch on July 19, 2025 22:15
@davidheineman added a commit to allenai/OLMo-core that referenced this pull request on Jul 20, 2025