
Conversation

@albertvillanova
Member

Implement Dataset.add_column.

Close #1954.

Member

@lhoestq left a comment


This feature has been requested many times :)
It's awesome to finally have it! Thanks for adding it!

I added some suggestions:

"""
column_table = InMemoryTable.from_pydict(column)
# Concatenate tables horizontally
self._data = ConcatenationTable.from_blocks([[self._data, column_table]])
Member


self._data may not be a valid table block (i.e., either an InMemoryTable or a MemoryMappedTable object).
For example, if self._data is a ConcatenationTable, this won't work.

Maybe we can use another ConcatenationTable constructor for this?
For example, a version of ConcatenationTable.from_tables, but for axis=1?
Under the hood this uses from_blocks anyway, but it accepts any kind of table as input.
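
To make the suggestion concrete, here is a rough, self-contained sketch of the idea. It is a toy model only, not the actual `datasets.table` code: `Block`, `Concatenation` and `_consolidate` are hypothetical stand-ins, and the `axis=1` path simply flattens each input into one block, which is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Block:
    """Toy stand-in for a leaf table (e.g. InMemoryTable or MemoryMappedTable)."""
    columns: Dict[str, list]

    @property
    def num_rows(self) -> int:
        return len(next(iter(self.columns.values()), []))


@dataclass
class Concatenation:
    """Toy stand-in for ConcatenationTable: a 2D grid of blocks.

    The outer list stacks rows of blocks vertically; blocks inside one
    inner list sit side by side.
    """
    blocks: List[List[Block]]

    @classmethod
    def from_blocks(cls, blocks: List[List[Block]]) -> "Concatenation":
        # Only leaf blocks are accepted here, which is the limitation
        # pointed out in the comment above.
        for row in blocks:
            for block in row:
                if not isinstance(block, Block):
                    raise TypeError(f"not a table block: {type(block).__name__}")
        return cls(blocks)

    @classmethod
    def from_tables(cls, tables: List[Union[Block, "Concatenation"]], axis: int = 0) -> "Concatenation":
        # Accept any kind of table, reduce it to blocks, then defer to from_blocks.
        if axis == 0:
            rows: List[List[Block]] = []
            for table in tables:
                rows.extend(table.blocks if isinstance(table, Concatenation) else [[table]])
            return cls.from_blocks(rows)
        # axis=1 (assumption for this sketch): flatten each input to one block
        # and place the resulting blocks side by side.
        flat = [_consolidate(table) for table in tables]
        if len({block.num_rows for block in flat}) > 1:
            raise ValueError("all tables must have the same number of rows")
        return cls.from_blocks([flat])


def _consolidate(table: Union[Block, Concatenation]) -> Block:
    """Hypothetical helper: merge a grid of blocks into a single block."""
    if isinstance(table, Block):
        return table
    merged: Dict[str, list] = {}
    for row in table.blocks:
        for block in row:
            for name, values in block.columns.items():
                merged.setdefault(name, []).extend(values)
    return Block(merged)
```

With this toy model, passing an existing `Concatenation` straight to `from_blocks([[data, column]])` raises a `TypeError`, while `from_tables([data, column], axis=1)` works for both leaf blocks and concatenations.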

Member Author


I am going to finish this PR first: #2151 😉

@albertvillanova added this to the 1.6 milestone Apr 20, 2021
@albertvillanova modified the milestones: 1.6, 1.7 Apr 20, 2021
Member

@lhoestq left a comment


Cool! Added a few comments :)

Member

@lhoestq left a comment


All looks good to me :)

Once #2274 is merged, we can update the test function to also check the metadata.

Member

lhoestq commented Apr 29, 2021

#2274 has been merged. You can now merge master into this branch and use assert_arrow_metadata_are_synced_with_dataset_features(dset) to check that the Arrow metadata are in sync with the dataset features :)

@albertvillanova merged commit 02a27b7 into huggingface:master Apr 29, 2021

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

add a new column

3 participants