set_format resets features #8110

@plutonium-239

Describe the bug

Calling dataset.set_format('...') resets informative feature types such as Array2D to generic nested types such as List(List(Value)), so the shape information stored in the feature is lost.

Steps to reproduce the bug

>>> dataset.features
{'lld': Array2D(shape=(None, 26), dtype='float32')}

>>> dataset.set_format('torch')
>>> dataset.features
{'lld': List(List(Value('float32')))}
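The transcript above starts from an existing dataset. A minimal, self-contained sketch of the same check might look like the following. The helper names `features_after_set_format` and `build_example_dataset` are hypothetical and not part of the datasets API, the example data is made up, and the sketch has not been run against datasets 4.8.2.

```python
import copy


def features_after_set_format(dataset, format_type):
    """Return (before, after) snapshots of dataset.features around set_format.

    Works with any object exposing `.features` and `.set_format(...)`.
    """
    before = copy.deepcopy(dataset.features)
    dataset.set_format(format_type)
    after = copy.deepcopy(dataset.features)
    return before, after


def build_example_dataset():
    """Build a tiny dataset with an Array2D column like the one in the report.

    Imports are local so the helper above stays usable without the libraries.
    The row contents are arbitrary placeholder data (assumption).
    """
    import numpy as np
    from datasets import Array2D, Dataset, Features

    features = Features({"lld": Array2D(shape=(None, 26), dtype="float32")})
    # Two rows with a variable first dimension and 26 columns each.
    rows = [np.zeros((n, 26), dtype="float32").tolist() for n in (2, 3)]
    return Dataset.from_dict({"lld": rows}, features=features)
```

With datasets and torch installed, `before, after = features_after_set_format(build_example_dataset(), "torch")` followed by `assert before == after` would express the expected behavior as a regression check. According to this report, that assertion is expected to fail on the affected version.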

Expected behavior

set_format should not discard feature information: in the example above, the feature should remain Array2D with its original shape and dtype.

Environment info

  • datasets version: 4.8.2
  • Platform: Linux-6.8.0-78-generic-x86_64-with-glibc2.39
  • Python version: 3.13.11
  • huggingface_hub version: 0.36.2
  • PyArrow version: 23.0.1
  • Pandas version: 2.3.3
  • fsspec version: 2026.2.0
