@Timothyxxx
Contributor

@Timothyxxx Timothyxxx commented Feb 13, 2022

The previous version has several problems:

  1. Errors in table loading: splitting each line on commas instead of parsing with pandas is wrong.
  2. Duplicated ids in the _generate_examples function.
  3. The history of previous questions is missing, which makes the dataset hard to use.

I fixed these with reference to https://github.com/HKUNLP/UnifiedSKG, and we have tested that it works as expected.
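
To illustrate the first problem, here is a minimal sketch of why splitting on commas mis-parses a table. It is not the code from this PR: the table content is made up, and it uses the standard-library csv module as a stand-in for pandas so it runs without extra dependencies (pandas.read_csv honors quoting in the same way).

```python
import csv
import io

# Hypothetical table content; the real SQA tables live in separate CSV files.
RAW_TABLE = 'City,Motto\n"Portland, Oregon","The City of Roses"\n'


def load_table_naive(text):
    """Split every line on commas; breaks on quoted fields that contain commas."""
    return [line.split(",") for line in text.splitlines()]


def load_table_csv(text):
    """Parse with a CSV reader, which keeps quoted fields intact."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = load_table_naive(RAW_TABLE)
parsed_rows = load_table_csv(RAW_TABLE)

# The naive split yields three cells for a two-column row.
print(len(naive_rows[1]), len(parsed_rows[1]))
```

With the naive split, the quoted cell "Portland, Oregon" is cut in two, so the row no longer lines up with the header.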

@Timothyxxx
Contributor Author

Timothyxxx commented Feb 13, 2022

This is what I get when I run the tests:

FAILED tests/test_dataset_common.py::LocalDatasetTest::test_load_dataset_all_configs_msr_sqa - ValueError: Unknown split "validation". Should be one of ['train', 'test'].

It makes no sense to me😂.

@mariosasko
Collaborator

@albertvillanova Does this PR have any additional fixes compared to #3771, or can we close it?

@albertvillanova
Member

albertvillanova commented Feb 23, 2022

@mariosasko besides the fix of the DuplicatedKeysError, this PR:

  • changes the reading of one of the files: use pandas instead of splitting by comma
  • changes the splits: modifying train and adding validation
  • adds some extra logic in the processing of the data: adding a new field "question_and_history"

We should decide whether to accept these additional changes.

  • for example, if we accept the addition of the field "question_and_history" as pertinent, it should be added as a feature to the info, and the metadata should be regenerated...
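
For context, the duplicated-key fix and the "question_and_history" field can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: the row layout (an "id" shared by all questions of a sequence, a "position" that restarts at 0) and the function name generate_examples are hypothetical.

```python
def generate_examples(rows):
    """Yield (key, example) pairs with unique keys and accumulated question history."""
    history = []
    for idx, row in enumerate(rows):
        # Assumption: position 0 marks the first question of a new sequence.
        if row["position"] == 0:
            history = []
        history.append(row["question"])
        # Copy the history so later appends do not mutate earlier examples.
        example = dict(row, question_and_history=list(history))
        # The running index is unique even when row["id"] repeats.
        yield idx, example


# Hypothetical rows: "q-1" appears twice, so using it as the key would collide.
ROWS = [
    {"id": "q-1", "position": 0, "question": "Which cities are listed?"},
    {"id": "q-1", "position": 1, "question": "Which of them is largest?"},
    {"id": "q-2", "position": 0, "question": "Who won in 2010?"},
]

examples = list(generate_examples(ROWS))
for key, example in examples:
    print(key, example["question_and_history"])
```

Using the enumeration index as the key avoids a DuplicatedKeysError, and each example carries every question asked so far in its sequence.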

@Timothyxxx
Contributor Author

Hi guys, is there anything we can do to fix that bug👀? @mariosasko @albertvillanova @lhoestq

@albertvillanova albertvillanova added the "dataset contribution" label (Contribution to a dataset script) Sep 22, 2022
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@albertvillanova albertvillanova changed the title from "Fix problems in msr_sqa" to "Fix bugs in msr_sqa dataset" Oct 3, 2022
Member

@albertvillanova albertvillanova left a comment

@Timothyxxx, thanks for your contribution fixing the bugs. And sorry for the late response.

@albertvillanova albertvillanova merged commit 55924c5 into huggingface:main Oct 3, 2022
