Previously, a type other than SparseFrame could be returned even though the computed type was SparseFrame (when meta was not a SparseFrame). Imports are unchanged, only reorganized.
Adds support for the `dsp[index]` syntax. It doesn't aim to work the same as in pandas; it just maps `__getitem__` onto partitions.
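A minimal sketch of what "mapping `__getitem__` onto partitions" could look like. `PartitionedFrame` and `_select` are hypothetical stand-ins written for illustration (each partition is a plain dict of column name to values); they are not the sparsity or dask API.

```python
class PartitionedFrame:
    """Hypothetical stand-in for a partitioned collection."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def __getitem__(self, key):
        # Forward the same key to every partition and wrap the results again.
        return PartitionedFrame([_select(part, key) for part in self.partitions])


def _select(part, key):
    # Accept a single column name or a list of column names.
    cols = key if isinstance(key, list) else [key]
    return {col: part[col] for col in cols}


dsp = PartitionedFrame([
    {"a": [1, 2], "b": [3, 4]},
    {"a": [5], "b": [6]},
])
sub = dsp[["a"]]
```

Because the key is applied per partition, the result keeps the original partitioning, and any pandas-specific indexing semantics are left to whatever each partition's own `__getitem__` does.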
Previously it returned a DataFrame, even though for a 1-column non-empty SparseFrame it returned a Series. Imports are only reorganized.
Codecov Report
@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   89.53%   89.81%   +0.27%
==========================================
  Files           7        7
  Lines        1204     1217      +13
==========================================
+ Hits         1078     1093      +15
+ Misses        126      124       -2
It works by mapping `SparseFrame.todense` onto partitions. It was necessary to allow `map_partitions` to return types other than SparseFrame, so a `cls` kwarg was added. This implies that `cls` cannot be passed as a keyword argument to the mapped function, because it is consumed by `map_partitions` and not forwarded.
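The consequence of the `cls` kwarg can be sketched in plain Python. Everything here (`SparseLike`, `DenseLike`, `describe`, and this simplified `map_partitions` signature) is a hypothetical stand-in for illustration, not the actual code in `sparsity/dask/core.py`.

```python
class SparseLike:
    """Hypothetical stand-in for the partitioned SparseFrame collection."""

    def __init__(self, partitions):
        self.partitions = list(partitions)


class DenseLike:
    """Hypothetical stand-in for a dense result collection."""

    def __init__(self, partitions):
        self.partitions = list(partitions)


def map_partitions(func, collection, *args, **kwargs):
    # `cls` is popped here, so it never reaches `func`.
    cls = kwargs.pop("cls", SparseLike)
    return cls([func(part, *args, **kwargs) for part in collection.partitions])


def describe(part, **kwargs):
    # Report the partition sum and the keyword names the function actually saw.
    return (sum(part), sorted(kwargs))


source = SparseLike([[1, 2], [3]])
same = map_partitions(describe, source, scale=2)
dense = map_partitions(describe, source, scale=2, cls=DenseLike)
```

In both calls `describe` only sees `scale`; a mapped function that itself needs a parameter named `cls` would have to receive it under a different name.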
kayibal
left a comment
Nice new features, although this introduces some breaking changes. I just have a few remarks on the `todense()` method.
sparsity/dask/core.py
Outdated
        return map_partitions(func, self, meta, *args, **kwargs)

    # noinspection PyTypeChecker
    def todense(self):
I'd rather use `to_delayed` here, then apply `todense` to each delayed object and use `dd.from_delayed` to construct the dense dask collection.
sparsity/dask/core.py
Outdated
        -------
        res: dd.DataFrame | dd.Series
        """
        meta = dd.from_pandas(self._meta.todense(), npartitions=1)
Meta could be just `self._meta.todense()`
Not really, we need a dask object here.
        if idx is None:
            return self.copy()
        # we have a hidden zero column to replace missing indices (-1)
        new_data = self._data.T[idx].T[:-1]
This should work on empty frames if scipy>=1.0.0
This leverages the `dask.delayed` object API to achieve the same result, which was previously a hack combining `map_partitions` with initializing `dd.DataFrame` directly.