Get size of annexed files from keys where possible #86
Conversation
Codecov Report
Base: 71.06% // Head: 71.51% // Increases project coverage by +0.44%

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master      #86      +/-   ##
==========================================
+ Coverage   71.06%   71.51%   +0.44%
==========================================
  Files          11       11
  Lines         788      839      +51
==========================================
+ Hits          560      600      +40
- Misses        228      239      +11
==========================================
```
I am curious -- have you tried this branch on 000026 in https://github.com/dandi/dandisets-healthstatus -- did it provide a remedy for the slow "traversal"?
@yarikoptic I'd rather not try the healthcheck on this unless #83 is merged in so I can rebase on top of it.
take #83 out of draft? ;-)
@yarikoptic Traversing 000026 using this branch now takes about 2 or 3 minutes (I don't have an exact time).
yarikoptic left a comment:
looks great! Just one comment possibly to act on -- let's not bother with commit date for the files under .git/annex/objects
```python
@dataclass
class AnnexKey:
```
eh, we better have/re-use this construct in DataLad to avoid duplicating it across the codebase. It would be useful to replace AnnexRepo.get_size_from_key (https://github.com/datalad/datalad/blob/HEAD/datalad/support/annexrepo.py#L560) and useful for _sanitize_key (https://github.com/datalad/datalad/blob/HEAD/datalad/support/annex_utils.py) -- probably get to_filename for that purpose
Are you telling me to use those DataLad functions here, or to copy AnnexKey to DataLad, or something else?
I was just saying that eventually we might want to "borrow" your construct from here (I like it) instead of our functions in datalad.
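For context, the idea behind such an `AnnexKey` construct can be sketched as below. This is a hypothetical illustration, not the exact code from this PR: a git-annex key like `MD5E-s1048576--<hash>.bin` encodes the backend, an optional `-s<bytes>` size field, and a backend-specific name, so the file size can be read straight from the key without stat'ing annexed content. The regex here handles only the common size field; other optional key fields (chunking, mtime) are omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the common shape of a git-annex key:
#   BACKEND[-sSIZE]--NAME
# e.g. "MD5E-s1048576--e2fc714c4727ee9395f324cd2e7f331f.bin"
KEY_RGX = re.compile(
    r"^(?P<backend>[A-Z0-9_]+)"
    r"(?:-s(?P<size>\d+))?"
    r"--(?P<name>.+)$"
)


@dataclass
class AnnexKey:
    """Parsed form of a git-annex key (illustrative sketch)."""

    backend: str
    name: str
    size: Optional[int] = None  # None when the backend does not embed a size

    @classmethod
    def parse(cls, key: str) -> "AnnexKey":
        m = KEY_RGX.match(key)
        if m is None:
            raise ValueError(f"Invalid git-annex key: {key!r}")
        size = m["size"]
        return cls(
            backend=m["backend"],
            name=m["name"],
            size=int(size) if size is not None else None,
        )


# Usage: the size comes from the key itself, no filesystem access needed.
k = AnnexKey.parse("MD5E-s1048576--e2fc714c4727ee9395f324cd2e7f331f.bin")
print(k.backend, k.size)
```

A helper like DataLad's `get_size_from_key` could then be a thin wrapper around `AnnexKey.parse(key).size`.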
```python
r = mkstat(
    is_file=True,
    size=iadok.size,
    timestamp=self._adapter.get_commit_datetime(path),
```
aren't we under .git/annex/objects here, and thus the commit date wouldn't really be pertinent to that key file? Then let's just use some arbitrary timestamp -- e.g. a fixed timestamp of when we started this fusefs instance. Should help us save some CPU cycles
The commit date is cached when the adapter for the (sub)dataset is created [link], so there aren't many cycles to save.
it still needs to do some traversal to figure out the top of the dataset, right? indeed, it might be negligible though
ok, let's proceed for now as is, and optimize if we see it adds penalty
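The optimization discussed above could look roughly like the following sketch. All names here (`DataladFuse`, `timestamp_for`, `get_commit_datetime`) are assumptions for illustration, not the actual implementation: one timestamp is captured when the FUSE filesystem starts and reused for paths under `.git/annex/objects`, instead of resolving a per-path commit date for content-addressed key files.

```python
import time


class DataladFuse:
    """Illustrative sketch of the mount-time-timestamp idea (not real code)."""

    def __init__(self) -> None:
        # Single timestamp captured once, at mount time.
        self.start_time = time.time()

    def timestamp_for(self, path: str) -> float:
        if ".git/annex/objects" in path:
            # Key files are content-addressed; a commit date is not
            # meaningful for them, so reuse the fixed mount-time stamp
            # and skip the per-path lookup entirely.
            return self.start_time
        return self.get_commit_datetime(path)

    def get_commit_datetime(self, path: str) -> float:
        # Placeholder for the real (cached) per-path commit-date lookup.
        return time.time()
```

As the thread concludes, this only matters if the commit-date lookup shows up as a measurable cost; the PR proceeded without it.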
ok, not super fast, but much better than before, and given the number of files -- not too bad really. It would be worth py-spy top'ing it to see where the time is spent. Let's proceed with this, as it is already a significant improvement.
Closes #84.