Closed
Labels: enhancement, good first issue
Description
Feature request
In this blog post https://huggingface.co/blog/audio-datasets, I noticed the following code:
COLUMNS_TO_KEEP = ["text", "audio"]
all_columns = gigaspeech["train"].column_names
columns_to_remove = set(all_columns) - set(COLUMNS_TO_KEEP)
gigaspeech = gigaspeech.remove_columns(columns_to_remove)

This kind of thing happens a lot when you don't need to keep all columns from the dataset. It would be more convenient (and less error-prone) if you could just write:

gigaspeech = gigaspeech.keep_columns(["text", "audio"])

Internally, keep_columns could still call remove_columns, but it expresses the user's intent more clearly.
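To make the request concrete, here is a minimal sketch of what such a helper might look like as a free function. The name keep_columns, its signature, and the error on unknown column names are all assumptions for illustration, not an existing library API. The sketch only assumes that the object has a column_names list and a remove_columns method that returns a new object, as a single split such as gigaspeech["train"] does.

```python
from typing import Any, Iterable


def keep_columns(dataset: Any, columns_to_keep: Iterable[str]) -> Any:
    """Hypothetical helper: drop every column not listed in columns_to_keep.

    Assumes `dataset` exposes `column_names` as a list of strings and a
    `remove_columns` method that returns a new object (as a single split
    of a dataset does). It is not part of any library.
    """
    keep = set(columns_to_keep)
    existing = list(dataset.column_names)

    # Fail loudly on a misspelled name instead of silently dropping everything.
    missing = keep - set(existing)
    if missing:
        raise ValueError(f"Columns not found in dataset: {sorted(missing)}")

    # Preserve the original column order and pass a list rather than a set.
    to_remove = [name for name in existing if name not in keep]
    return dataset.remove_columns(to_remove)
```

With a single split, this would be called as keep_columns(gigaspeech["train"], ["text", "audio"]). Delegating to remove_columns keeps the behavior identical to the four-line version above while stating the intent in one call.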
Motivation
Less code to write for the user of the dataset.
Your contribution