Jorropo commented Mar 16, 2022

Fixes #8794
Fixes #8791

Todo:

  • Add tests!

This adds:

  • skip-raw-leaves option, which avoids downloading raw leaves when you
    already know their size from the parent block
  • multithreaded download
  • properly count blocks that show up multiple times with different
    multicodecs
  • do not count the size of inlined blocks
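The counting rules listed above can be illustrated with a small in-memory walk. This is a minimal sketch, not the PR's code: `Link`, `Block`, `Stat`, `dagStat` and `exampleDAG` are hypothetical names, blocks live in a plain map instead of a blockservice, and a "CID" is reduced to a codec, a multihash code and a multihash key. It assumes the parent records each raw leaf's size, which holds for dag-pb links but not in general. Dedup uses the multihash rather than the full CID, so the same bytes referenced under two codecs are counted once, and identity-hashed (inlined) children are neither counted nor fetched.

```go
package main

import "fmt"

// Real multicodec / multihash codes used by this sketch.
const (
	codecRaw     = 0x55 // multicodec "raw"
	codecDagPB   = 0x70 // multicodec "dag-pb"
	hashIdentity = 0x00 // multihash "identity": the data is inlined in the CID
	hashSHA256   = 0x12 // multihash "sha2-256"
)

// Link is a hypothetical, simplified stand-in for a CID plus the size the
// parent block records for it.
type Link struct {
	Codec     uint64 // multicodec of the child
	HashCode  uint64 // multihash function code of the child
	Multihash string // full multihash, used as the dedup key
	Size      uint64 // size recorded in the parent (assumed reliable for raw leaves)
}

// Block is a hypothetical stored block.
type Block struct {
	Size  uint64
	Links []Link
}

// Stat is what the walk reports.
type Stat struct {
	NumBlocks int
	Size      uint64
	Fetches   int // blocks actually loaded from the store
}

// dagStat walks the DAG from root. Simplification: the first codec a
// multihash is seen under decides whether its links are followed.
func dagStat(store map[string]Block, root Link, skipRawLeaves bool) Stat {
	var st Stat
	seen := map[string]bool{}
	var walk func(l Link)
	walk = func(l Link) {
		if l.HashCode == hashIdentity {
			return // inlined: no separate block to count or fetch
		}
		if seen[l.Multihash] {
			return // same bytes already counted, possibly under another codec
		}
		seen[l.Multihash] = true
		st.NumBlocks++
		if skipRawLeaves && l.Codec == codecRaw {
			st.Size += l.Size // trust the parent's size and skip the download
			return
		}
		b := store[l.Multihash]
		st.Fetches++
		st.Size += b.Size
		for _, c := range b.Links {
			walk(c)
		}
	}
	walk(root)
	return st
}

// exampleDAG builds a root with one leaf referenced twice (once as raw,
// once as dag-pb) and one inlined child.
func exampleDAG() (map[string]Block, Link) {
	leaf := Link{Codec: codecRaw, HashCode: hashSHA256, Multihash: "leafA", Size: 100}
	leafAsPB := Link{Codec: codecDagPB, HashCode: hashSHA256, Multihash: "leafA", Size: 100}
	inline := Link{Codec: codecRaw, HashCode: hashIdentity, Multihash: "tiny", Size: 4}
	store := map[string]Block{
		"root":  {Size: 50, Links: []Link{leaf, leafAsPB, inline}},
		"leafA": {Size: 100},
	}
	return store, Link{Codec: codecDagPB, HashCode: hashSHA256, Multihash: "root"}
}

func main() {
	store, root := exampleDAG()
	fmt.Printf("%+v\n", dagStat(store, root, false)) // {NumBlocks:2 Size:150 Fetches:2}
	fmt.Printf("%+v\n", dagStat(store, root, true))  // {NumBlocks:2 Size:150 Fetches:1}
}
```

Both runs report the same totals; skipping raw leaves only saves the leaf fetch.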

Speed:
Much, much faster. It's hard to give an objective number because it depends heavily on how much multithreading helps you and how many raw-leaf blocks you skip; in my case it was more than 100x faster.
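The multithreaded download can be pictured as a walk with bounded parallelism. This is a minimal sketch, not the PR's implementation: `Node`, `parallelStat` and the `fetch` callback are hypothetical, and `fetch` stands in for a slow network block fetch. A buffered channel caps concurrent fetches at `workers`, a mutex guards the shared seen-set and totals, and each goroutine releases its slot before scheduling children so that no goroutine holds a slot while others wait on it.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical block: its size and the keys of its children.
type Node struct {
	Size     uint64
	Children []string
}

// parallelStat walks the DAG rooted at root, running at most `workers`
// fetches at once, and counts each distinct key once.
func parallelStat(root string, workers int, fetch func(string) Node) (int, uint64) {
	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		seen      = map[string]bool{root: true}
		numBlocks int
		total     uint64
	)
	sem := make(chan struct{}, workers) // bounds concurrent fetches

	var visit func(key string)
	visit = func(key string) {
		defer wg.Done()
		sem <- struct{}{} // acquire a fetch slot
		n := fetch(key)
		<-sem // release the slot before scheduling children

		mu.Lock()
		numBlocks++
		total += n.Size
		var todo []string
		for _, c := range n.Children {
			if !seen[c] { // dedup: shared children are fetched once
				seen[c] = true
				todo = append(todo, c)
			}
		}
		mu.Unlock()

		for _, c := range todo {
			wg.Add(1) // added before this goroutine's Done, so Wait can't return early
			go visit(c)
		}
	}

	wg.Add(1)
	go visit(root)
	wg.Wait() // all goroutines finished; totals are safe to read
	return numBlocks, total
}

func main() {
	dag := map[string]Node{
		"root": {Size: 10, Children: []string{"a", "b"}},
		"a":    {Size: 20, Children: []string{"c"}},
		"b":    {Size: 30, Children: []string{"c"}}, // "c" is shared
		"c":    {Size: 40},
	}
	// Concurrent reads of a map with no writers are safe in Go.
	n, size := parallelStat("root", 4, func(k string) Node { return dag[k] })
	fmt.Println(n, size) // 4 100
}
```

This is a plain link walk; it does not attempt a parallel IPLD selector traversal.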

@Jorropo Jorropo marked this pull request as draft March 16, 2022 15:17
@Jorropo Jorropo self-assigned this Mar 16, 2022
@Jorropo Jorropo force-pushed the better/ipfs-dag-stat branch from bf9eb70 to f1d673f March 16, 2022 15:28
@Jorropo Jorropo force-pushed the better/ipfs-dag-stat branch 2 times, most recently from 9b4bf2a to 90dbf55 March 16, 2022 15:29
This bundles all raw-leaf emissions in skip mode into one.
@Jorropo Jorropo mentioned this pull request Mar 17, 2022
@BigLep BigLep added this to the Best Effort Track milestone Mar 18, 2022
BigLep commented Mar 18, 2022

@aschmahmann will provide first round of feedback to determine if this is something that we want to land.

BigLep commented Mar 18, 2022

General comment: it looks like this PR covers multiple issues. We think it's best to have separate PRs in case there are different levels of alignment on the issues themselves.

aschmahmann commented

skip-raw-leaves option, that avoids downloading raw leaves when you already know their size in the parent block

This seems specific to dag-pb and won't work for anything else. I don't think we should be adding anything dag-pb specific in here. Almost all dag-pb data is UnixFS and so people could look at ipfs files commands for this.

multithreaded download

This seems good, although this will become problematic with the introduction of selectors as there is currently no parallelizable selector traversal. I'd rather invest time/code complexity budget in parallelizable selector traversals if you're interested.

properly count blocks that show up multiple times with different multicodec

Seems reasonable.

do not count size of inlined blocks

Seems reasonable.

Jorropo commented Mar 18, 2022

@aschmahmann

Almost all dag-pb data is UnixFS and so people could look at ipfs files commands for this.

UnixFS, and by extension ipfs files, does not provide information that is as accurate:

  • it counts identical blocks that show up multiple times once per occurrence
  • it counts inlined blocks
  • it doesn't report the block count (by the way, if you only care about the block count and not the size, this code would also work with other codecs; I say "would" because there is no block-count-only option, and its use would likely be pretty limited)

@Jorropo Jorropo mentioned this pull request Apr 1, 2022
Jorropo commented Apr 1, 2022

Closed in favor of #8843. I do not plan to work on raw-leaf skipping or multithreaded downloads anytime soon.

@Jorropo Jorropo closed this Apr 1, 2022
@Jorropo Jorropo deleted the better/ipfs-dag-stat branch December 12, 2022 22:32
