lewtun commented May 23, 2022

This PR changes the metadata keys that define the column mapping for question answering datasets. This is needed to faithfully reconstruct column names like `answers.text` and `answers.answer_start` from the keys in AutoTrain.

As observed in #4367, we cannot use periods (`.`) in the keys of the YAML tags, so a decision was made to use a flat mapping with underscores. For QA datasets, however, it's handy to be able to reconstruct the nesting, hence this PR.
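
To illustrate the reconstruction described above, here is a minimal sketch. It assumes, hypothetically, that the QA column mapping is stored in the metadata as a nested dictionary; the exact key layout introduced by this PR is not shown in this thread. The helper `flatten_col_mapping` and the sample mapping are illustrative names, not part of `datasets` or AutoTrain.

```python
from typing import Any, Dict


def flatten_col_mapping(mapping: Dict[str, Any], parent: str = "") -> Dict[str, str]:
    """Flatten a nested column mapping into period-separated column names.

    Hypothetical helper: shows how names like `answers.text` could be
    rebuilt from nested keys that contain no periods themselves.
    """
    flat: Dict[str, str] = {}
    for key, value in mapping.items():
        # Join the parent and child key with a period, e.g. "answers" + "text".
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested mappings such as the "answers" block.
            flat.update(flatten_col_mapping(value, name))
        else:
            flat[name] = value
    return flat


# Assumed example metadata; none of the keys contain a period.
nested = {
    "question": "question",
    "context": "context",
    "answers": {"text": "text", "answer_start": "answer_start"},
}

flat = flatten_col_mapping(nested)
print(sorted(flat))
```

Under these assumptions, the nested keys stay valid in YAML while the period-separated names can still be recovered on the consumer side.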

cc @sashavor

lewtun requested a review from lhoestq May 23, 2022 09:13

HuggingFaceDocBuilderDev commented May 23, 2022

The documentation is not available anymore as the PR was closed or merged.

albertvillanova left a comment

Thanks.

I have no visibility into this, but if you say it is more useful for AutoTrain this way...

lewtun commented May 23, 2022

> Thanks.
>
> I have no visibility into this, but if you say it is more useful for AutoTrain this way...

Thanks for the review, @albertvillanova! Yes, I need some way to reconstruct the original column names with a period, because that's how they appear after we flatten the nested columns. In any case, we can adjust this later if needed :)

sashavor commented

Does that mean that we need to change the metadata?

lewtun commented May 24, 2022

> Does that mean that we need to change the metadata?

Yes, but this PR takes care of it :)

sashavor commented

Oh good! Thanks for the heads-up!

lewtun merged commit 4a90b8a into master May 24, 2022
lewtun deleted the fix-qa-meta-2 branch May 24, 2022 12:48