Description
go-ipfs 0.13 shipped with #8853, which introduced a band-aid: the "size" column is hidden in big directories.
This allows us to skip child block resolution for directories bigger than 100 items, making the entire thing load really fast.
Sadly, directories with fewer than 100 items are still slow.
They load slower than a directory with 101 items.
Why showing size and type is expensive
The root node of every UnixFS DAG includes information about node type and the size of raw data inside of it (without metadata).
It also has links to other DAGs representing files and directories, and the total size of the DAGs they represent (data + metadata).
The "size" shown in a directory listing is the raw data size, without metadata.
To know the exact size and type of each item in a directory listing, every item triggers an additional block fetch, which balloons the time it takes to return a response with the listing.
Proposed Change
Replace "size" based on raw data with "DAG size" based on the Tsize value already present in the root UnixFS node (see logical format). The interface will look the same.
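To make the proposal concrete, here is a minimal sketch in Go of building a listing from the root node alone. It assumes the root is a plain (non-sharded) dag-pb directory node already fetched and rendered as dag-json, for example via `ipfs dag get --output-codec=dag-json`. The type names, the `listFromRoot` helper, and the sample node with its placeholder CIDs are all hypothetical and are not the gateway's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pbLink mirrors one entry of the "Links" array in the dag-json
// representation of a dag-pb node.
type pbLink struct {
	Hash  map[string]string `json:"Hash"` // {"/": "<cid>"}
	Name  string            `json:"Name"`
	Tsize uint64            `json:"Tsize"` // total size of the linked DAG (data + metadata)
}

// pbNode keeps only the part of the root node needed for a listing.
type pbNode struct {
	Links []pbLink `json:"Links"`
}

// entry is one row of the directory listing (hypothetical shape).
type entry struct {
	Name    string
	CID     string
	DagSize uint64
}

// listFromRoot builds the listing from the root node only.
// No child block is fetched: every value comes from the root's links.
func listFromRoot(rootJSON []byte) ([]entry, error) {
	var n pbNode
	if err := json.Unmarshal(rootJSON, &n); err != nil {
		return nil, err
	}
	out := make([]entry, 0, len(n.Links))
	for _, l := range n.Links {
		out = append(out, entry{Name: l.Name, CID: l.Hash["/"], DagSize: l.Tsize})
	}
	return out, nil
}

// sampleRoot is a made-up directory node; the CIDs are placeholders.
const sampleRoot = `{
  "Data": {"/": {"bytes": "CAE"}},
  "Links": [
    {"Hash": {"/": "QmPlaceholderChildA"}, "Name": "a.txt", "Tsize": 1024},
    {"Hash": {"/": "QmPlaceholderChildB"}, "Name": "photos", "Tsize": 5242880}
  ]
}`

func main() {
	entries, err := listFromRoot([]byte(sampleRoot))
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s\t%d\n", e.Name, e.DagSize)
	}
}
```

The trade-off is visible in `entry`: it carries a name, a CID, and a DAG size, but no node type and no raw data size, because those would require fetching each child block.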
What is user benefit?
Every directory will load fast, as soon as the root UnixFS node is available.
This makes an extreme difference on the first load, when the directory is not present in local cache.
Loading bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm (10k items)
will be as fast as a directory with 100.
Won't this be causing issues?
These values are provided in HTML dir listing only for quick eyeballing, and the difference between raw data and Tsize will usually be small enough to not impact this purpose:
- small files that fit into a single raw block will have
- big file size won't be significantly impacted by ipld metadata
But just to be sure, we will add an on-hover tooltip explaining the value (right now, there is none).
Is this worth it?
To illustrate, I started a new, empty repo each time, and listed a directory with 1864 items.
go-ipfs 0.12 (which fetched every UnixFS child block to read the size) took nearly 3 minutes:
$ time ipfs ls -s --size=true /ipfs/QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
0.59s user 0.13s system 0% cpu 2:37.03 total
Same directory, but listed with the change to Tsize:
$ time ipfs dag get --output-codec=dag-json QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
0.44s user 0.33s system 126% cpu 0.611 total
From nearly 3 minutes to under 1s. I'd say worth it.
