
Optimization of distributed procedures #38

Merged
kayibal merged 11 commits into master from optimization/distributed
Apr 19, 2018
Conversation

@kayibal kayibal commented Apr 1, 2018

Some smaller fixes and alternative procedures for the simple cases. Also adds renaming.

kayibal added 9 commits March 7, 2018 13:49
# Conflicts:
#	sparsity/test/test_dask_sparse_frame.py
Implements a distributed rename method and adds quicker routines to groupby_sum if divisions are known. Adds support for joining sp.SparseFrames onto a distributed SparseFrame.
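
Why known divisions allow a quicker path can be illustrated with a small sketch. It assumes, as in dask, that known divisions mean each index value lives in exactly one partition; under that assumption a group-by-index sum can be computed per partition with no cross-partition combine step. The names `partition_sum` and `groupby_sum_known_divisions` are hypothetical and do not come from this PR, and plain Python lists stand in for SparseFrame partitions.

```python
from collections import defaultdict


def partition_sum(partition):
    """Sum values per index key inside a single partition."""
    totals = defaultdict(float)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def groupby_sum_known_divisions(partitions):
    """Hypothetical fast path: assumes no key spans two partitions."""
    result = {}
    for partition in partitions:
        # Each partition is reduced independently; results are only merged.
        result.update(partition_sum(partition))
    return result


# Two partitions split at a division boundary: keys 0-1, then keys 2-3.
partitions = [
    [(0, 1.0), (1, 2.0), (0, 3.0)],
    [(2, 4.0), (3, 5.0), (2, 6.0)],
]
totals = groupby_sum_known_divisions(partitions)
```

If divisions are unknown, the same key may appear in several partitions, so the per-partition results would have to be combined again, which is presumably the slower general path.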
@kayibal kayibal requested review from michcio1234 and vitords April 1, 2018 23:21
codecov bot commented Apr 1, 2018

Codecov Report

Merging #38 into master will decrease coverage by 1.08%.
The diff coverage is 52.94%.

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   85.55%   84.47%   -1.09%     
==========================================
  Files           7        7              
  Lines        1094     1108      +14     
==========================================
  Hits          936      936              
- Misses        158      172      +14

Impacted Files            Coverage Δ
sparsity/dask/multi.py    84.48% <ø> (ø) ⬆️
sparsity/dask/core.py     79% <52.94%> (-3.29%) ⬇️
sparsity/dask/io.py       67.24% <0%> (+1.72%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4d85502...b937e2f.

@michcio1234

@kayibal, should it be rebased onto 87f9928 (Distributed groupby sum operation (#35))?

@michcio1234 michcio1234 left a comment

Cool, I have only minor remarks.

        from .shuffle import sort_index
        return sort_index(self, npartitions=npartitions, divisions=None)

    def groupby_sum(self, split_out=1, split_every=8):

Please add a docstring with arguments description.
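
One possible shape for such a docstring is sketched below in NumPy style. The parameter descriptions are assumptions based on how `split_out` and `split_every` are used in dask's tree aggregations; they have not been checked against this PR's implementation, and `SparseFrameStub` is a hypothetical stand-in rather than the real distributed SparseFrame.

```python
class SparseFrameStub:
    """Stand-in for the distributed SparseFrame (hypothetical, illustration only)."""

    def groupby_sum(self, split_out=1, split_every=8):
        """Sum all rows that share the same index value.

        Parameters
        ----------
        split_out : int, default 1
            Assumed meaning: number of output partitions of the result.
        split_every : int, default 8
            Assumed meaning: number of intermediate partitions combined
            in each step of the tree reduction.

        Returns
        -------
        Assumed: a distributed SparseFrame with one row per unique
        index value.
        """
        # Sketch only: the body is intentionally left out.
        raise NotImplementedError("docstring sketch, not the real method")
```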

    (np.array(list('0123'*25)).astype(int), False),
    (np.array(list('0123'*25)).astype(float), False),
])
def test_groupby_sum(idx, sorted):

Please rename the sorted parameter, as it is currently misleading. Maybe known?
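
Besides being misleading, a parameter named `sorted` shadows Python's built-in `sorted` inside the function body. The sketch below shows the effect with two hypothetical helpers, neither of which is taken from the test suite; `known` is the replacement name proposed in this comment.

```python
def check_shadowed(idx, sorted):
    # `sorted` is now the boolean flag, so the built-in is unreachable here.
    try:
        return sorted(idx)
    except TypeError:
        # Calling a bool raises TypeError.
        return None


def check_renamed(idx, known):
    # With the flag renamed, the built-in `sorted` works as usual.
    return sorted(idx), known


shadowed = check_shadowed([3, 1, 2], False)
renamed = check_renamed([3, 1, 2], True)
```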

sort=False,
)
ddf2 = ddf.assign(idx=ddf.index).set_index('idx')
# ddf2.compute()

Looks like ddf2 is unused.

kayibal added 2 commits April 19, 2018 17:56
# Conflicts:
#	sparsity/dask/core.py
#	sparsity/dask/shuffle.py
#	sparsity/test/test_dask_sparse_frame.py
@kayibal kayibal merged commit 9352ea3 into master Apr 19, 2018
@kayibal kayibal deleted the optimization/distributed branch April 19, 2018 16:08

2 participants