
Conversation


@hatimbr hatimbr commented Apr 20, 2023

Adds XNLI to lm-evaluation-harness, based on PR #134.

StellaAthena (Collaborator) commented

@hatimbr Thank you for the contribution! Can you run this on a couple of models with public evaluations to confirm that the reported scores match what is expected?


hatimbr (Author) commented Apr 24, 2023

Hi @StellaAthena, I tested it with bloom-7b1 and bloom-560m and compared the results against the public evaluation scores.

With bloom-7b1, I got:

| Task | Prompt | Version | Metric | Value | Stderr |
|------|--------|---------|--------|-------|--------|
| xnli_en | GPT-3 style | 1 | acc | 0.3335 | ± 0.0067 |
| xnli_en | GPT-3 style | | acc_norm | 0.3285 | ± 0.0066 |

The public accuracy score was 0.3333333432674408

With bloom-560m, I got:

| Task | Prompt | Version | Metric | Value | Stderr |
|------|--------|---------|--------|-------|--------|
| xnli_fr | MNLI crowdsource | 1 | acc | 0.3497 | ± 0.0067 |
| xnli_fr | MNLI crowdsource | | acc_norm | 0.3345 | ± 0.0067 |

The public accuracy score was 0.35261043906211853

Is this good enough? Should I run more tests?
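As a sanity check on the tables above: the reported Stderr values are consistent with the standard error of a sample proportion, sqrt(p·(1−p)/n), assuming the XNLI test split of 5,010 examples per language (an assumption about the split size, not stated in this thread). A minimal sketch:

```python
import math

def acc_stderr(p: float, n: int) -> float:
    """Standard error of a sample proportion (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

# Assumed split size: XNLI test set, 5,010 examples per language.
N = 5010

print(round(acc_stderr(0.3335, N), 4))  # bloom-7b1, xnli_en acc
print(round(acc_stderr(0.3497, N), 4))  # bloom-560m, xnli_fr acc
```

Both come out to roughly 0.0067, matching the harness output, which suggests the task is scoring over the full test split.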

@hatimbr hatimbr closed this May 9, 2023
@hatimbr hatimbr deleted the hb_dev branch May 9, 2023 13:25
