-
Notifications
You must be signed in to change notification settings - Fork 3k
Implement Dataset add_column #2145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Dataset add_column #2145
Conversation
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature has been asked many times :)
This is awesome to finally have it ! Thanks for adding it !
I added some suggestions:
src/datasets/arrow_dataset.py
Outdated
| """ | ||
| column_table = InMemoryTable.from_pydict(column) | ||
| # Concatenate tables horizontally | ||
| self._data = ConcatenationTable.from_blocks([[self._data, column_table]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._data may not be a valid table block (i.e. either a InMemoryTable or a MemoryMappedTable object).
For example if self._data is a ConcatenationTable, this won't work.
Maybe we can use another ConcatenationTable constructor for this ?
Like for example a version of ConcatenationTable.from_tables but for axis=1 ?
Under the hood this uses from_blocks anyway, but it allows any kind of tables as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to finish this PR first: #2151 😉
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool ! Added a few comments :)
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all good to me :)
Once #2274 is merged we can update the test function to also check the metadata
|
#2274 has been merged. You can now merge master into this branch and use |
Implement
Dataset.add_column.Close #1954.