Open
32 commits
8842176
Create ipip-0000.md
mishmosh Apr 3, 2025
4ba68f0
Update and rename ipip-0000.md to ipip-0499.md
mishmosh Apr 3, 2025
6cc64cb
add extra attributes proposed in review
lidel Apr 15, 2025
d8b8389
incorporate kubo#10774
lidel Apr 15, 2025
600d1fc
Merge branch 'main' into patch-1
bumblefudge May 5, 2025
595588c
Update src/ipips/ipip-0499.md
2color Aug 12, 2025
41f9b86
add daniel as editor
2color Aug 12, 2025
229988f
edit summary and motivation
2color Aug 12, 2025
f37e610
edit summary
2color Aug 12, 2025
7a12f0a
edit parameters and design
2color Aug 12, 2025
ff69e56
edit user benefit and compatibility
2color Aug 12, 2025
09baf68
refine parameters and introduce a named profile
2color Aug 12, 2025
cffade8
Apply suggestions from code review
2color Aug 20, 2025
0402c84
edit based on hector's feedback
2color Aug 20, 2025
ec07e30
Apply suggestions from code review
2color Aug 20, 2025
f454912
add multibase encoding
2color Aug 20, 2025
9c621ba
address feedback from rvagg
2color Aug 20, 2025
c109c1a
Update ipip-0499.md
mishmosh Nov 15, 2025
383f9e3
Update src/ipips/ipip-0499.md
mishmosh Nov 20, 2025
e564968
Update src/ipips/ipip-0499.md
mishmosh Nov 20, 2025
bbd547f
Update src/ipips/ipip-0499.md
lidel Nov 20, 2025
70514b9
fix typo (the the)
mishmosh Nov 21, 2025
89c9c62
Merge branch 'main' into patch-1
lidel Dec 12, 2025
92352d7
feat(ipip-0499): add chunking algorithm and align profile tables
lidel Dec 12, 2025
9d0d415
fix(ipip-0499): correct kubo legacy profile
lidel Dec 12, 2025
a3dc7e2
fix(ipip-0499): document legacy profile filtering behavior
lidel Dec 13, 2025
94a1b79
fix(ipip-0499): note that legacy table includes non-UnixFS implementa…
lidel Dec 13, 2025
7a8d6ab
feat(ipip-0499): add implementation versions to legacy profiles table
lidel Dec 13, 2025
a3044d6
fix(ipip-0499): update HAMTDirectory threshold and clean up parameters
lidel Dec 13, 2025
5b19f2b
feat(ipip-0499): document symlink handling in profiles
lidel Dec 13, 2025
3a092a4
fix(ipip-0499): clarify HAMTDirectory threshold calculation methods
lidel Dec 13, 2025
123be3d
fix(ipip-0499): update metadata and add contributors
lidel Dec 13, 2025
178 changes: 178 additions & 0 deletions src/ipips/ipip-0499.md
@@ -0,0 +1,178 @@
---
title: 'IPIP-0499: UnixFS CID Profiles'
date: 2025-12-13
ipip: proposal
editors:
  - name: Michelle Lee
    github: mishmosh
    affiliation:
      name: IPFS Foundation
      url: https://ipfsfoundation.org
  - name: Daniel Norman
    github: 2color
    affiliation:
      name: Independent
      url: https://norman.life
  - name: Marcin Rataj
    github: lidel
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com/
relatedIssues:
  - https://discuss.ipfs.tech/t/should-we-profile-cids/18507
thanks:
  - name: Alex Potsides
    github: achingbrain
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com/
  - name: Juan Caballero
    github: bumblefudge
    affiliation:
      name: IPFS Foundation
      url: https://ipfsfoundation.org
  - name: Hector Sanjuan
    github: hsanjuan
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com/
  - name: Steven Vandevelde
    github: icidasset
  - name: Christian Paul
    github: jaller94
  - name: Rod Vagg
    github: rvagg
  - name: Seth Docherty
    github: SethDocherty
order: 0499
tags: ['ipips']
---

## Summary

This proposal introduces **configuration profiles** for CIDs that represent files and directories using [UnixFS](https://specs.ipfs.tech/unixfs/). The legacy profiles table also documents non-UnixFS implementations for reference.

## Motivation

UnixFS CIDs are currently non-deterministic. The same file or directory can produce different CIDs depending on the implementation, because parameters such as chunk size, DAG width, and layout differ between them. Often, these parameters are not even configurable by users.

This creates three problems:

- **Verification difficulty:** The same content produces different CIDs across tools, making content verification unreliable.
- **Additional overhead:** Users must store and transfer UnixFS merkle proofs to verify CIDs, adding storage overhead, network bandwidth, and complexity.
- **Broken expectations:** Unlike standard hash functions where identical input produces identical output, UnixFS CIDs behave unpredictably.

Configuration profiles solve this by explicitly defining all parameters that affect CID generation. This preserves UnixFS flexibility (users can still choose parameters) while enabling deterministic results.

## Detailed design

We introduce a set of **named configuration profiles**, each specifying the complete set of parameters for generating UnixFS CIDs. When implementations use these profiles, they guarantee that the same input, processed with the same profile, will yield the same CID across different tools and implementations.

### UnixFS parameters

Here is the complete set of UnixFS parameters that affect the resulting string encoding of the CID:

1. CID version, e.g. CIDv0 or CIDv1
1. Multibase encoding for the CID, e.g. `base32`
1. Hash function used for all nodes in the DAG, e.g. `sha2-256`
1. UnixFS file chunking algorithm
1. UnixFS file chunk size or target (if required by the chunking algorithm)
1. UnixFS DAG layout, e.g. `balanced`, `trickle`
1. UnixFS DAG width (max number of links per `File` node)
1. `HAMTDirectory` fanout: the bitwidth determines the fanout of the `HAMTDirectory` (the default bitwidth of 8 gives 2^8 = 256 leaves).
1. `HAMTDirectory` threshold: max `Directory` size before switching to `HAMTDirectory`. Size can be calculated using full serialized [PBNode](https://specs.ipfs.tech/unixfs/#dag-pb-node) size (recommended), or estimated by `PBNode.Links` size (name + CID), or link count (naive).
1. Leaf Envelope: either `dag-pb` or `raw`
1. Whether empty directories are included in the DAG. Some implementations may apply filtering.
1. Whether hidden entities (including dot files) are included in the DAG. Some implementations may apply filtering.
1. Directory wrapping for single files: to retain the name of a single file, some implementations have the option to wrap the file in a `Directory` with a link to the file.
1. Presence and accurate setting of `Tsize`.
1. Symlink handling: preserved as UnixFS Type=4 nodes, or followed (dereferenced to target).

The [UnixFS spec](https://specs.ipfs.tech/unixfs/) defines Type=4 for symlinks with target path stored in the Data field.
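
As a rough illustration of how several of these parameters (CID version, multibase, hash function, chunk size, leaf envelope) feed into the final string, the sketch below builds CIDv1 strings for `raw` leaves produced by fixed-size chunking. It is a minimal sketch, not a reference implementation: the function names `raw_leaf_cidv1` and `fixed_size_chunks` are hypothetical, and it only covers leaf blocks, not the `dag-pb` nodes that link them together.

```python
import base64
import hashlib

RAW_CODEC = 0x55   # multicodec code for `raw`
SHA2_256 = 0x12    # multihash code for sha2-256
DIGEST_LEN = 0x20  # sha2-256 digests are 32 bytes long


def raw_leaf_cidv1(chunk: bytes) -> str:
    """Return the base32 CIDv1 string for one `raw` leaf block (hypothetical helper)."""
    digest = hashlib.sha256(chunk).digest()
    multihash = bytes([SHA2_256, DIGEST_LEN]) + digest
    # CIDv1 binary form: <version><codec><multihash>; all codes here fit in one varint byte.
    cid_bytes = bytes([0x01, RAW_CODEC]) + multihash
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


def fixed_size_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes (hypothetical helper)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Same input bytes, two different chunk sizes: the leaf CIDs no longer match.
data = bytes(range(256)) * 16  # 4096 bytes of sample data
small = [raw_leaf_cidv1(c) for c in fixed_size_chunks(data, 1024)]
large = [raw_leaf_cidv1(c) for c in fixed_size_chunks(data, 2048)]
```

Because the root CID is derived from the leaf CIDs, changing any one of these parameters changes the root CID as well, even though the file bytes are identical.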

## CID profiles

To enable consistent CID generation, we define a series of named profiles that specify complete UnixFS parameter sets. Profile names may have any prefix, but must end in `YYYY-MM`.

The initial profile in the series, **`unixfs-2025`**, captures the baseline default parameters used by multiple implementations as of November 2025.

| Parameter | `unixfs-2025` |
| ----------------------------- | ------------------------------------------------------- |
| CID version | CIDv1 |
| Hash function | sha2-256 |
| Chunking algorithm | fixed-size |
| Max chunk size | 1MiB |
| DAG layout | balanced |
| DAG width (children per node) | 1024 |
| `HAMTDirectory` fanout | 256 blocks |
| `HAMTDirectory` threshold | TODO (likely entire block size, as in Helia) |
| Leaves | raw |
| Empty directories | TODO (kubo needs opt-out flag) |
| Hidden entities | TODO |
| Symlinks | TODO (preserved?) |
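
To make the effect of the chunk size and DAG width rows concrete, the following sketch estimates the shape of a balanced DAG for a given file size. It is a simplified model under stated assumptions: the helper name `balanced_dag_shape` is hypothetical, an empty file is treated as a single leaf, and depth counts the levels of `File` nodes above the leaves.

```python
def balanced_dag_shape(file_size: int, chunk_size: int, width: int) -> tuple[int, int]:
    """Return (leaf_count, depth) for a simplified balanced layout (hypothetical helper)."""
    # Ceiling division; assumption: an empty file still yields one (empty) leaf.
    leaves = max(1, -(-file_size // chunk_size))
    depth, capacity = 0, 1
    # Each extra level of File nodes multiplies the number of reachable leaves by `width`.
    while capacity < leaves:
        capacity *= width
        depth += 1
    return leaves, depth


MiB = 1024 * 1024
GiB = 1024 * MiB

print(balanced_dag_shape(GiB, 1 * MiB, 1024))    # (1024, 1)
print(balanced_dag_shape(GiB, 256 * 1024, 174))  # (4096, 2)
```

With 1MiB chunks and a width of 1024, a 1 GiB file fits under a single root node, while 256KiB chunks with a width of 174 need an intermediate level. The two DAGs contain different blocks, so the root CIDs differ.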

## Legacy profiles

We also define a series of **legacy profiles**, used by various implementations as of November 2025:

| | `kubo-legacy-2025` (v0.39) | `helia-2025` | `storacha-2025` | `kubo-2025` | `kubo-wide-2025` | `dasl-2025` |
| ----------------------------- | ------------------------------ | --------------- | ------------------ | ------------------ | ----------------------- | ------------- |
| Based on | kubo v0.39 (`legacy-cid-v0`) | @helia/unixfs 6.0.4 | w3cli 7.12.0 | kubo v0.39 (`test-cid-v1`) | kubo v0.39 (`test-cid-v1-wide`) | 2025-12 |
| CID version | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 |
| Chunking algorithm | fixed-size | fixed-size | fixed-size | fixed-size | fixed-size | not specified |
| Max chunk size | 256KiB | 1MiB | 1MiB | 1MiB | 1MiB | not specified |
| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified |
| DAG width (children per node) | 174 | 1024 | 1024 | 174 | **1024** | not specified |
| `HAMTDirectory` fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified |
| `HAMTDirectory` threshold | 256KiB (est:links[name+cid]) | 256KiB (est) | 1000 **links** | 256KiB (est:links[name+cid]) | **1MiB** (est:links[name+cid]) | not specified |
| Leaves | dag-pb | raw | raw | raw | raw | not specified |
| Empty directories | included | included | excluded | included | included | not specified |
| Hidden entities | opt-in | opt-in | opt-in | opt-in | opt-in | not specified |
| Symlinks | preserved | followed | followed | preserved | preserved | not specified |

**Terminology:**
- `included`: Always included in the DAG (no option to exclude)
- `excluded`: Always excluded from the DAG (no option to include)
- `opt-in`: Excluded by default; implementations provide a flag to include (e.g., `--hidden` in Kubo/Storacha, `hidden: true` in Helia)
- `opt-out`: Included by default; implementations provide a flag to exclude
- `preserved`: Symlinks stored as UnixFS Type=4 nodes with target path (per [UnixFS spec](https://specs.ipfs.tech/unixfs/)). Note: Kubo (v0.39) `--dereference-args` only follows symlinks passed as CLI arguments; symlinks found during recursive traversal are always preserved.
- `followed`: Symlinks dereferenced and treated as target files/directories
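
The `HAMTDirectory` threshold row mixes two of the calculation methods listed under UnixFS parameters: an estimate based on link sizes (name + CID) and a plain link count. The sketch below shows how the two can disagree for the same directory. All function names are hypothetical, and whether the comparison is strict (`>`) or inclusive (`>=`) is an assumption here, not something this document specifies. The recommended method, full serialized `PBNode` size, is omitted because it needs a `dag-pb` encoder.

```python
def links_size_estimate(links: list[tuple[str, bytes]]) -> int:
    """Estimate directory size as the sum of name bytes plus CID bytes per link."""
    return sum(len(name.encode("utf-8")) + len(cid) for name, cid in links)


def exceeds_size_threshold(links: list[tuple[str, bytes]], threshold_bytes: int = 256 * 1024) -> bool:
    """Assumption: switch to HAMTDirectory once the estimate is strictly above the threshold."""
    return links_size_estimate(links) > threshold_bytes


def exceeds_link_count(links: list[tuple[str, bytes]], max_links: int = 1000) -> bool:
    """Naive method: switch based on the number of links alone (strict comparison assumed)."""
    return len(links) > max_links


def sample_links(count: int) -> list[tuple[str, bytes]]:
    """Build placeholder links: 13-byte names and 36-byte binary CIDv1 (sha2-256) values."""
    return [(f"file-{i:04d}.txt", bytes(36)) for i in range(count)]


directory = sample_links(1001)
print(exceeds_link_count(directory))      # True
print(exceeds_size_threshold(directory))  # False
```

A directory with 1001 short-named entries crosses a 1000-link threshold but stays far below a 256KiB size estimate, so two implementations using different methods would emit different directory nodes and therefore different CIDs.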

See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/
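
One way an implementation might represent these profiles is as plain data, which makes it easy to see where two profiles diverge. The sketch below transcribes a subset of rows for three columns of the legacy table. The key names and the `diff_profiles` helper are hypothetical and not part of this proposal.

```python
# Values copied from the legacy profiles table; key names are illustrative only.
PROFILES = {
    "kubo-legacy-2025": {
        "cid_version": "CIDv0", "chunk_size": "256KiB", "dag_width": 174,
        "hamt_threshold": "256KiB (est:links[name+cid])",
        "leaves": "dag-pb", "symlinks": "preserved",
    },
    "kubo-2025": {
        "cid_version": "CIDv1", "chunk_size": "1MiB", "dag_width": 174,
        "hamt_threshold": "256KiB (est:links[name+cid])",
        "leaves": "raw", "symlinks": "preserved",
    },
    "helia-2025": {
        "cid_version": "CIDv1", "chunk_size": "1MiB", "dag_width": 1024,
        "hamt_threshold": "256KiB (est)",
        "leaves": "raw", "symlinks": "followed",
    },
}


def diff_profiles(a: str, b: str) -> dict[str, tuple]:
    """Return the parameters whose values differ between profiles a and b."""
    pa, pb = PROFILES[a], PROFILES[b]
    return {key: (pa[key], pb[key]) for key in pa if pa[key] != pb[key]}


print(sorted(diff_profiles("kubo-legacy-2025", "kubo-2025")))
# ['chunk_size', 'cid_version', 'leaves']
print(sorted(diff_profiles("kubo-2025", "helia-2025")))
# ['dag_width', 'hamt_threshold', 'symlinks']
```

Any non-empty diff means the two profiles can produce different CIDs for the same input, depending on whether the input exercises the differing parameter.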

### User benefit

Profiles provide three key advantages for working with content-addressed data:

1. **Predictable, deterministic behavior:** Profiles restore the expected property of content addressing: identical input data always produces identical CIDs, regardless of which implementation generates them.

2. **Lightweight verification:** Users can verify content without needing to rely on additional merkle proofs or CAR files.

3. **Simplified workflow:** Users can select a profile and automatically get consistent CIDs across all implementations, without needing to configure or understand the underlying parameters.

### Compatibility

UnixFS data encoded with the CID profiles defined in this IPIP remains fully compatible with existing implementations, since it conforms to the [specification](https://specs.ipfs.tech/unixfs/).

To generate CIDs in compliance with this IPIP, implementations must support the parameters defined in the profiles and support the set of named profiles. They MAY also support legacy profiles. For existing implementations, this may involve:

* Adding new functionality to support parameters and/or profiles
* Exposing configuration options for profiles

### Alternatives

As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.

## Test fixtures

Review comment (Member):

Just noting this is (imo) a blocker.

We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as reference.

The spec may be incomplete, but a fixture will let people reverse-engineer any details, and then PR improvements to the spec.

Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that someone may decide to place leaves one level sooner as an "optimization" and someone else always at the bottom, for "formal consistency").

Review comment (Contributor Author):

Tracking this in ipfs/kubo#11071

Review comment (Member):

Thanks!


TODO

List relevant CIDs. Describe how implementations can use them to determine
specification compliance. This section can be skipped if IPIP does not deal
with the way IPFS handles content-addressed data, or the modified specification
file already includes this information.

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).