gateway/dir-index-html: switch dir listing sizes to Tsize #9058

@lidel

Description

This was inspired by #8178, #8455

go-ipfs 0.13 shipped with #8853, which introduced a band-aid that hides the "size" column in big directories.
This allows us to skip child block resolution for directories bigger than 100 items, making the entire thing load really fast.

Sadly, directories with fewer than 100 items are still slow.
They load slower than a directory with 101 items.

Why showing size and type is expensive

The root node of every UnixFS DAG includes information about node type and the size of raw data inside of it (without metadata).
It also has links to other DAGs representing files and directories, and the total size of the DAGs they represent (data + metadata).

The "size" shown in a directory listing is the size of the raw data, without metadata.
To know the exact size and type of items in a directory listing, every item triggers an additional block fetch, which balloons the time it takes to return a response with a directory listing.
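
To make the cost concrete, here is a toy model of the two listing strategies. It is only a sketch: `Link`, `CountingStore`, `ListExact` and `ListTsize` are hypothetical names made up for illustration (they are not go-ipfs APIs), and the CIDs and Tsize values are placeholders.

```go
package main

import "fmt"

// Link is a simplified, hypothetical stand-in for a directory entry as it is
// stored in the parent (root) UnixFS node.
type Link struct {
	Name  string
	Cid   string
	Tsize uint64 // cumulative DAG size (data + metadata), known without fetching the child
}

// CountingStore is a toy block store that only counts how many blocks were requested.
type CountingStore struct {
	Fetches int
}

// Get pretends to fetch a block; on a cold cache each call is a network round trip.
func (s *CountingStore) Get(cid string) {
	s.Fetches++
}

// ListExact models the old behaviour: fetch the root, then fetch every child
// block to read its exact raw-data size and type.
func ListExact(s *CountingStore, rootCid string, links []Link) {
	s.Get(rootCid)
	for _, l := range links {
		s.Get(l.Cid)
	}
}

// ListTsize models the proposed behaviour: fetch only the root and reuse the
// Tsize value already stored next to each link.
func ListTsize(s *CountingStore, rootCid string, links []Link) []uint64 {
	s.Get(rootCid)
	sizes := make([]uint64, 0, len(links))
	for _, l := range links {
		sizes = append(sizes, l.Tsize)
	}
	return sizes
}

// makeLinks builds n placeholder entries (made-up names, CIDs and sizes).
func makeLinks(n int) []Link {
	links := make([]Link, n)
	for i := range links {
		links[i] = Link{
			Name:  fmt.Sprintf("file-%d", i),
			Cid:   fmt.Sprintf("placeholder-cid-%d", i),
			Tsize: uint64(1000 + i),
		}
	}
	return links
}

func main() {
	links := makeLinks(1864)

	exact := &CountingStore{}
	ListExact(exact, "placeholder-root", links)

	fast := &CountingStore{}
	ListTsize(fast, "placeholder-root", links)

	fmt.Printf("exact sizes: %d block fetches\n", exact.Fetches)
	fmt.Printf("Tsize only:  %d block fetches\n", fast.Fetches)
}
```

The model ignores HAMT-sharded directories and parallel fetching; it only illustrates that the exact-size listing needs one extra block per item, while the Tsize listing needs the root alone.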

Proposed Change

Replace "size" based on raw data with "DAG size" based on the Tsize value already present in the root UnixFS node (see logical format). The interface will look the same.

(screenshot: 2022-06-24_23-23)
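
As a rough illustration of where the Tsize values come from, the sketch below parses the DAG-JSON form of a dag-pb root node (the shape printed by `ipfs dag get --output-codec=dag-json`) and lists each link's name and Tsize. The embedded sample node, its placeholder CIDs and sizes, and the `Entry` / `ParseLinks` names are assumptions made for this example; this is not the gateway's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pbLink mirrors one entry of the dag-pb "Links" array in DAG-JSON form.
// A CID is encoded as {"/": "<cid>"}, so Hash is decoded into a small map.
type pbLink struct {
	Hash  map[string]string `json:"Hash"`
	Name  string            `json:"Name"`
	Tsize uint64            `json:"Tsize"`
}

// pbNode keeps only the part of the root node needed for a listing.
type pbNode struct {
	Links []pbLink `json:"Links"`
}

// Entry is a hypothetical row of a directory listing.
type Entry struct {
	Name  string
	Cid   string
	Tsize uint64
}

// ParseLinks extracts name, CID and Tsize for every link of a root node
// without touching any child block.
func ParseLinks(raw []byte) ([]Entry, error) {
	var node pbNode
	if err := json.Unmarshal(raw, &node); err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(node.Links))
	for _, l := range node.Links {
		entries = append(entries, Entry{Name: l.Name, Cid: l.Hash["/"], Tsize: l.Tsize})
	}
	return entries, nil
}

// sample is a hand-written node with placeholder CIDs and made-up sizes.
const sample = `{
  "Data": {"/": {"bytes": "CAE"}},
  "Links": [
    {"Hash": {"/": "placeholder-cid-1"}, "Name": "a.txt", "Tsize": 1024},
    {"Hash": {"/": "placeholder-cid-2"}, "Name": "photos", "Tsize": 5242880}
  ]
}`

func main() {
	entries, err := ParseLinks([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%-10s %10d bytes (DAG size)  %s\n", e.Name, e.Tsize, e.Cid)
	}
}
```

Note that the link entries carry no node type, so this approach alone cannot tell files from directories; it only provides the cumulative size.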

What is the user benefit?

Every directory will load fast, as soon as the root UnixFS node is available.
This makes an extreme difference on the first load, when the directory is not present in local cache.

Loading bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm (10k items)
will be as fast as loading a directory with 100 items.

Won't this cause issues?

These values are provided in HTML dir listing only for quick eyeballing, and the difference between raw data and Tsize will usually be small enough to not impact this purpose:

  • small files that fit into a single raw block will have
  • the size of big files won't be significantly impacted by IPLD metadata

But just to be sure, we will add an on-hover tooltip explaining the value (right now, there is none).

Is this worth it?

To illustrate, I started a new, empty repo each time, and listed a directory with 1864 items.

go-ipfs 0.12 (which fetched every UnixFS child block to read the size) took nearly 3 minutes:

$ time ipfs ls -s --size=true /ipfs/QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
0.59s user 0.13s system 0% cpu 2:37.03 total

The same directory, listed after the switch to Tsize:

$ time ipfs dag get --output-codec=dag-json QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm
0.44s user 0.33s system 126% cpu 0.611 total

From nearly 3 minutes to under 1s. I'd say worth it.

Labels

  • kind/enhancement: A net-new feature or improvement to an existing feature
  • need/analysis: Needs further analysis before proceeding
  • need/maintainers-input: Needs input from the current maintainer(s)
  • need/triage: Needs initial labeling and prioritization
  • topic/gateway: Topic gateway
  • topic/perf: Performance
