@konflux-internal-p02

This PR contains the following updates:

Package | Update | Change
datasets | minor | ==2.15.0 -> ==2.21.0

Warning

Some dependencies could not be looked up. Check the warning logs for more information.


Release Notes

huggingface/datasets (datasets)

v2.21.0

Compare Source

Features
What's Changed
New Contributors

Full Changelog: huggingface/datasets@2.20.0...2.21.0

v2.20.0

Compare Source

Important
Datasets features
  • [Resumable IterableDataset] Add IterableDataset state_dict by @​lhoestq in https://github.com/huggingface/datasets/pull/6658
    • checkpoint and resume an iterable dataset (e.g. when streaming):

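      >>> from datasets import Dataset
      >>> # Build a small streaming-style dataset split across 3 shards
      >>> iterable_dataset = Dataset.from_dict({"a": range(6)}).to_iterable_dataset(num_shards=3)
      >>> for idx, example in enumerate(iterable_dataset):
      ...     print(example)
      ...     if idx == 2:
      ...         # Capture the current position after the third example
      ...         state_dict = iterable_dataset.state_dict()
      ...         print("checkpoint")
      ...         break
      >>> # Restore the saved position, then continue iterating from it
      >>> iterable_dataset.load_state_dict(state_dict)
      >>> print("restart from checkpoint")
      >>> for example in iterable_dataset:
      ...     print(example)

      Output: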

      {'a': 0}
      {'a': 1}
      {'a': 2}
      checkpoint
      restart from checkpoint
      {'a': 3}
      {'a': 4}
      {'a': 5}
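
In practice the checkpoint usually has to survive a process restart. The following is a minimal sketch of persisting it to disk, not something taken from the release notes. The helper names `save_checkpoint` and `restore_checkpoint` are hypothetical, and the sketch assumes the dictionary returned by `state_dict()` is JSON-serializable (plain nested dicts, lists, numbers and strings); if that does not hold for a given dataset, another serializer would be needed.

```python
import json
from pathlib import Path


def save_checkpoint(iterable_dataset, path):
    """Write the dataset's current position to `path` as JSON.

    Hypothetical helper. Assumes `state_dict()` returns JSON-serializable data.
    """
    state = iterable_dataset.state_dict()
    Path(path).write_text(json.dumps(state))


def restore_checkpoint(iterable_dataset, path):
    """Load a saved position back into the dataset.

    Returns False when no checkpoint file exists, so the caller can
    simply start from the beginning in that case.
    """
    checkpoint = Path(path)
    if not checkpoint.exists():
        return False
    iterable_dataset.load_state_dict(json.loads(checkpoint.read_text()))
    return True
```

A training loop could call `restore_checkpoint(iterable_dataset, "ckpt.json")` once at startup and `save_checkpoint(iterable_dataset, "ckpt.json")` every N examples.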
      
General improvements and bug fixes
New Contributors

Full Changelog: huggingface/datasets@2.19.0...2.20.0

v2.19.2

Compare Source

Bug fixes

Full Changelog: huggingface/datasets@2.19.1...2.19.2

v2.19.1

Compare Source

Bug fixes

Full Changelog: huggingface/datasets@2.19.0...2.19.1

v2.19.0

Compare Source

Dataset Features
General improvements and bug fixes
New Contributors

Full Changelog: huggingface/datasets@2.18.0...2.19.0

v2.18.0

Compare Source

Dataset features
General improvements and bug fixes
New Contributors

Full Changelog: huggingface/datasets@2.17.1...2.18.0

v2.17.1

Compare Source

Bug Fixes

Full Changelog: huggingface/datasets@2.17.0...2.17.1

v2.17.0

Compare Source

Dataset Features
General improvements and bug fixes
New Contributors

Full Changelog: huggingface/datasets@2.16.1...2.17.0

v2.16.1

Compare Source

Bug fixes

Full Changelog: huggingface/datasets@2.16.0...2.16.1

v2.16.0

Compare Source

Security features
  • Add trust_remote_code argument by @​lhoestq in https://github.com/huggingface/datasets/pull/6429
    • Some Hugging Face datasets contain custom code that must be executed to load the dataset correctly. The code can be inspected in the repository content at https://hf.co/datasets/<repo_id>. A warning tells the user about the custom code; passing the argument trust_remote_code=True avoids this message in future.
    • Passing trust_remote_code=True will be mandatory to load these datasets from the next major release of datasets.
    • Setting the environment variable HF_DATASETS_TRUST_REMOTE_CODE=0 already disables custom code by default, without waiting for the next major release of datasets.
  • Use parquet export if possible by @​lhoestq in https://github.com/huggingface/datasets/pull/6448
    • This allows loading most old datasets based on custom code by downloading the Parquet export provided by Hugging Face
    • You can see a dataset's Parquet export at https://hf.co/datasets/<repo_id>/tree/refs%2Fconvert%2Fparquet
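
For reviewers checking whether this bump affects existing loading code, here is a rough sketch of the three options described above. It is illustrative only: the helper names `parquet_export_url` and `load_trusting_remote_code` are hypothetical, and setting the environment variable before `datasets` is imported is an assumption about when the library reads it. `load_dataset` and its `trust_remote_code` argument are the ones named in the release notes.

```python
import os
from urllib.parse import quote

# Opt out of custom dataset code by default.
# Assumption: the variable should be set before `datasets` is imported.
os.environ["HF_DATASETS_TRUST_REMOTE_CODE"] = "0"


def parquet_export_url(repo_id):
    """Build the Parquet export URL pattern quoted in the release notes."""
    # "refs/convert/parquet" must be percent-encoded as a single path segment.
    ref = quote("refs/convert/parquet", safe="")
    return f"https://hf.co/datasets/{repo_id}/tree/{ref}"


def load_trusting_remote_code(repo_id, **kwargs):
    """Load a dataset whose repository ships custom loading code.

    Only call this after inspecting the code at https://hf.co/datasets/<repo_id>.
    """
    # Imported lazily so the environment variable above is already in place.
    from datasets import load_dataset

    return load_dataset(repo_id, trust_remote_code=True, **kwargs)
```

Passing `trust_remote_code=True` per call keeps the opt-in visible at each call site, while the environment variable makes refusal the default everywhere else.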
Features

Configuration

📅 Schedule: Branch creation - "after 5am on saturday" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

Signed-off-by: konflux-internal-p02 <126015336+konflux-internal-p02[bot]@users.noreply.github.com>
@konflux-internal-p02

Edited/Blocked Notification

Renovate will not automatically rebase this PR, because it does not recognize the last commit author and assumes somebody else may have edited the PR.

You can manually request rebase by checking the rebase/retry box above.

⚠️ Warning: custom changes will be lost.
