IPIP-499: UnixFS CID Profiles #499
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,132 @@ | ||||||||||
| --- | ||||||||||
| title: 'IPIP-0499: CID Profiles' | ||||||||||
| date: 2025-04-03 | ||||||||||
| ipip: proposal | ||||||||||
| editors: | ||||||||||
| - name: Michelle Lee | ||||||||||
| github: mishmosh | ||||||||||
| affiliation: | ||||||||||
| name: IPFS Foundation | ||||||||||
| - name: Daniel Norman | ||||||||||
| github: 2color | ||||||||||
| affiliation: | ||||||||||
| name: Shipyard | ||||||||||
| url: https://ipshipyard.com | ||||||||||
| relatedIssues: | ||||||||||
| - https://discuss.ipfs.tech/t/should-we-profile-cids/18507 | ||||||||||
| order: 0499 | ||||||||||
| tags: ['ipips'] | ||||||||||
| --- | ||||||||||
|
|
||||||||||
| ## Summary | ||||||||||
|
|
||||||||||
| This proposal introduces configuration profiles for CIDs used to represent files and directories with UnixFS. These profiles ensure deterministic CID generation for the same data, regardless of the implementation. | ||||||||||
|
|
||||||||||
| Profiles explicitly define the UnixFS parameters that affect the resulting CID, e.g. DAG width, hash algorithm, and chunk size, so that, given the same profile and input data, different implementations generate identical CIDs. | ||||||||||
|
|
||||||||||
| ## Motivation | ||||||||||
|
|
||||||||||
| UnixFS CIDs are not deterministic. This means that the same file tree can yield different CIDs depending on the parameters used by the implementation to generate it, which, in some cases, aren't even configurable by the user. For example, the chunk size, DAG width, and layout can vary between implementations or even between different versions of the same implementation. | ||||||||||
|
|
||||||||||
| This lack of determinism has a number of drawbacks: | ||||||||||
|
|
||||||||||
| - It is difficult to verify content across different tools and implementations, as the same content may yield different CIDs. | ||||||||||
| - Users are required to store and transfer UnixFS merkle proofs in order to verify CIDs, adding storage overhead, network bandwidth, and complexity to the verification process. | ||||||||||
| - In terms of developer experience, it deviates from the mental model of a hash function, where the same input should always yield the same output. This leads to confusion and frustration when working with UnixFS CIDs. | ||||||||||
|
|
||||||||||
| By introducing profiles which define the parameters that affect the root CID of the DAG, we can benefit from both the optionality offered by UnixFS, where users are free to choose their own parameters, and determinism through profiles. | ||||||||||
|
|
||||||||||
| ## Detailed design | ||||||||||
|
|
||||||||||
| We introduce a set of named profiles that define a set of parameters for generating UnixFS CIDs. These profiles can be used by implementations to ensure that the same content will yield the same CID across different tools and implementations. | ||||||||||
|
|
||||||||||
| ### UnixFS parameters | ||||||||||
|
|
||||||||||
| The profiles define a set of parameters that affect the resulting string encoding of the CID. These parameters are based on the UnixFS specification and are used to generate the CID for a given file tree. The parameters include: | ||||||||||
|
|
||||||||||
| 1. CID version, e.g. CIDv0 or CIDv1 | ||||||||||
| 1. Multibase encoding for the CID, e.g. base32 | ||||||||||
| 1. Hash function used for all nodes in the DAG, e.g. sha2-256 | ||||||||||
| 1. UnixFS file chunking algorithm | ||||||||||
| 1. UnixFS file chunk size or target (if required by the chunking algorithm) | ||||||||||
| 1. UnixFS DAG layout (e.g. balanced, trickle, etc.) | ||||||||||
| 1. UnixFS DAG width (max number of links per `File` node) | ||||||||||
| 1. `HAMTDirectory` bitwidth, i.e. the number of bits that determines the fanout of the `HAMTDirectory` (the default bitwidth of 8 gives 2^8 = 256 leaves). | ||||||||||
| 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links | ||||||||||
|
||||||||||
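As a rough illustration of the parameter list above, a profile can be modelled as an immutable record. The sketch below is hypothetical: the `Profile` class, its field names, the example values, and `toy_root_digest` are not part of this proposal. The toy function hashes fixed-size chunks with plain SHA-256 and is not a real UnixFS or dag-pb builder; it only shows why the same bytes produce different roots when chunk size and DAG width differ, and the same root when they match.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical record of the parameters a profile would pin down."""
    cid_version: int
    multibase: str
    hash_function: str
    chunker: str
    chunk_size: int            # bytes per leaf chunk
    dag_layout: str
    dag_width: int             # max links per File node
    hamt_bitwidth: int
    hamt_threshold_bytes: int

    @property
    def hamt_fanout(self) -> int:
        # A bitwidth of 8 gives 2**8 == 256.
        return 2 ** self.hamt_bitwidth


def toy_root_digest(data: bytes, profile: Profile) -> str:
    """Toy Merkle root over fixed-size chunks (NOT real UnixFS encoding)."""
    if profile.chunk_size < 1 or profile.dag_width < 2:
        raise ValueError("chunk_size must be >= 1 and dag_width >= 2")
    size, width = profile.chunk_size, profile.dag_width
    # Leaf level: one digest per chunk (a single empty chunk for empty input).
    level = [hashlib.sha256(data[i:i + size]).digest()
             for i in range(0, max(len(data), 1), size)]
    # Group up to `width` child digests per parent until one root remains.
    while len(level) > 1:
        level = [hashlib.sha256(b"".join(level[i:i + width])).digest()
                 for i in range(0, len(level), width)]
    return level[0].hex()


# Illustrative values only; the proposal does not define these two profiles.
example_a = Profile(0, "base58btc", "sha2-256", "fixed-size",
                    256 * 1024, "balanced", 174, 8, 256 * 1024)
example_b = Profile(1, "base32", "sha2-256", "fixed-size",
                    1024 * 1024, "balanced", 1024, 8, 256 * 1024)

data = bytes(range(256)) * 8192  # 2 MiB of sample input
same = toy_root_digest(data, example_a) == toy_root_digest(data, example_a)
differs = toy_root_digest(data, example_a) != toy_root_digest(data, example_b)
```

Two implementations that agree on every field of the record have no remaining freedom in this toy model, which is the property the profiles aim to provide for real UnixFS DAGs.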
| 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links | |
| 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links. We do not include details about the estimation algorithm as we do not encourage implementations to support it. |
Bit odd to discourage this, when both of the most popular implementations (Go and JS) use a size-based heuristic - #499 (comment)
Unsure how to handle this. Perhaps clarify the heuristic is implementation-specific, and when deterministic behavior is expected, a specific heuristic should be used?
I don't think we should be estimating the block size as it's trivial to calculate it exactly. Can we not just define this (and punt to the spec for the details) to make it less hand-wavey?
| 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links | |
| 1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on the final size of the serialized form of the [PBNode protobuf message](https://specs.ipfs.tech/unixfs/#dag-pb-node) that represents the directory. |
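To make the "calculate it exactly" point concrete, here is a minimal sketch of computing the serialized size of a dag-pb `PBNode` without encoding it. It assumes every link carries `Hash`, `Name`, and `Tsize`, that `Data` is present, and that CIDs are passed in as their binary length. The function names and the threshold value are hypothetical and not taken from the proposal or from any implementation.

```python
from typing import Iterable, Tuple


def varint_len(n: int) -> int:
    """Bytes needed for the protobuf varint (unsigned LEB128) encoding of n."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def bytes_field_len(payload_len: int) -> int:
    # One tag byte (field numbers 1-15), a varint length prefix, the payload.
    return 1 + varint_len(payload_len) + payload_len


def pblink_len(cid_len: int, name: str, tsize: int) -> int:
    """Size of one PBLink body: Hash (bytes), Name (string), Tsize (varint)."""
    name_len = len(name.encode("utf-8"))
    return (bytes_field_len(cid_len)
            + bytes_field_len(name_len)
            + 1 + varint_len(tsize))


def pbnode_len(links: Iterable[Tuple[int, str, int]], data_len: int) -> int:
    """Size of a PBNode whose links are (cid_len, name, tsize) tuples."""
    total = sum(bytes_field_len(pblink_len(*link)) for link in links)
    return total + bytes_field_len(data_len)


# An empty UnixFS directory: no links, Data is the 2-byte message `08 01`.
empty_dir = pbnode_len([], data_len=2)

# Hypothetical threshold; a profile would fix the actual value.
THRESHOLD = 256 * 1024
links = [(34, f"file-{i}.txt", 1024) for i in range(10_000)]  # 34-byte CIDv0
use_hamt = pbnode_len(links, data_len=2) > THRESHOLD
```

Because the calculation uses only lengths, an implementation could run it incrementally while adding directory entries, instead of serializing the block on every insert.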
mishmosh marked this conversation as resolved.
Just noting this is (imo) a blocker.
We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as a reference.
The spec may be incomplete, but a fixture will let people reverse-engineer any details and then PR improvements to the spec.
Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that someone may decide to place leaves one level sooner as an "optimization" and someone else always at the bottom, for "formal consistency")
Tracking this in ipfs/kubo#11071
Thanks!
- I will implement `kubo-*` profiles as part of 0.40, and test fixtures will be part of that work.
- Then we will be able to link to them from the spec, like we did in https://specs.ipfs.tech/unixfs/#appendix-test-vectors