Skip to content

Update indexer instantiation. Allow loc from index with duplicates.#46

Merged
kayibal merged 12 commits intomasterfrom
fix/indexing-duplicates
Aug 22, 2018
Merged

Update indexer instantiation. Allow loc from index with duplicates.#46
kayibal merged 12 commits intomasterfrom
fix/indexing-duplicates

Conversation

@kayibal
Copy link
Copy Markdown

@kayibal kayibal commented Aug 21, 2018

Ended up updating to pandas>=0.23.0 and also fixing circle ci

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 21, 2018

Codecov Report

Merging #46 into master will decrease coverage by 0.2%.
The diff coverage is 92.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
- Coverage   88.36%   88.16%   -0.21%     
==========================================
  Files           7        7              
  Lines        1143     1174      +31     
==========================================
+ Hits         1010     1035      +25     
- Misses        133      139       +6
Impacted Files Coverage Δ
sparsity/sparse_frame.py 87.42% <92.68%> (-0.45%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d5fd2b...c21824a. Read the comment docs.

@kayibal kayibal requested review from michcio1234 and vitords August 21, 2018 13:39
Copy link
Copy Markdown

@vitords vitords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

new_data = SparseFrame(new_mat, columns=index,
index=self.index)
else:
raise ValueError('Only supported aces are 0 and 1.')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*axes

assert res.index.date.min() == dt.date(2016, 1, 15)
res = res.compute(get=get_sync)
assert res.index.levels[0].max().date() == dt.date(2016, 2, 15)
assert res.index.levels[0].min().date() == dt.date(2016, 1, 15)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use datetime if you already need/have pandas, but that's so lightweight I don't think it justifies changing it.

Copy link
Copy Markdown

@michcio1234 michcio1234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that we can use new pandas now. I have just a few comments and one concern (the one with __getitem__).

"""Create an indexer like _name in the class."""
if getattr(cls, name, None) is None:
_v = int(pd.__version__.split('.')[1])
if _v >= 23:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with this for now, but it would be prettier to do it like this:

tuple(map(int, pd.__version__.split('.')))
Out[6]: (0, 22, 0)
tuple(map(int, pd.__version__.split('.'))) > (0, 23, 0)
Out[7]: False

if item is not None and len(item) > 0:
return self.reindex_axis(item, axis=1)
else:
return self
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I do my_frame[[]], I'll get the whole frame in return? I think it should return a frame with no columns.

res = sp.SparseFrame.concat(res.compute(get=get_sync).tolist())
assert res.index.date.max() == dt.date(2016, 2, 15)
assert res.index.date.min() == dt.date(2016, 1, 15)
res = res.compute(get=get_sync)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't get kwarg deprecated?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noo but it's superfluous

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's been deprecated:
dask/dask@2036040

assert len(sf.loc['A'].index) == 3
assert len(sf.loc['B'].index) == 2
assert np.all(sf.loc['A'].todense().values == np.identity(5)[:3])
assert np.all(sf.loc['B'].todense().values == np.identity(5)[3:])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you wanted to test columns too, but didn't do it.

@kayibal
Copy link
Copy Markdown
Author

kayibal commented Aug 22, 2018

@michcio1234 update

if not isinstance(item, (tuple, list)):
item = [item]
if item is not None and len(item) > 0:
if len(item) > 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kayibal Previously you took care of the situation when None is passed, now you don't. I'm not sure if it's necessary, just wanted to drop a hint.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should raise an error

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's great.

Copy link
Copy Markdown

@michcio1234 michcio1234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good.

@kayibal kayibal merged commit 8d2f8f6 into master Aug 22, 2018
@kayibal kayibal deleted the fix/indexing-duplicates branch August 23, 2018 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants