
Excessively large number of inbound /ipfs/id/push/1.0.0 streams with v0.21.0-rc1 #9957

@mrd0ll4r


Installation method

built from source

Version

Compiled from tag v0.21.0-rc1 with Go 1.20.5:

Kubo version: 0.21.0-rc1
Repo version: 14
System version: amd64/linux
Golang version: go1.20.5

Config

# Modified as follows:

ipfs config profile apply server

ipfs config --bool 'Swarm.ResourceMgr.Enabled' false

ipfs config --json 'Swarm.ConnMgr' '{
  "GracePeriod": "0s",
  "HighWater": 100000,
  "LowWater": 0,
  "Type": "basic"
}'

ipfs config --bool 'Swarm.RelayService.Enabled' false

Description

I'm that guy running https://grafana.monitoring.ipfs.trudi.group
This is our setup.
In particular, we run two daemons in docker-compose, see here.
The images are built using this Dockerfile
and configured using this script.

I recently moved from v0.18.1 to v0.21.0-rc1 and did not change the config modifications I had been running before. We have a plugin to export Bitswap messages and information from the peerstore (it is called every few minutes by an external client). We also export information about the peerstore to Prometheus, see here.
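Roughly, the peerstore part of that export looks like the following. This is an illustrative sketch, not our actual plugin code; it assumes direct access to the libp2p host from inside the plugin, and the function name is made up:

```go
package exporter

import (
	"fmt"

	"github.com/libp2p/go-libp2p/core/host"
)

// dumpPeerstore sketches the read-only access our exporter performs:
// enumerate known peers and read their addresses and supported protocols,
// using only public Peerstore methods.
func dumpPeerstore(h host.Host) {
	ps := h.Peerstore()
	for _, pid := range ps.Peers() {
		addrs := ps.Addrs(pid)
		protos, err := ps.GetProtocols(pid)
		if err != nil {
			continue
		}
		fmt.Printf("%s: %d addrs, %d protocols\n", pid, len(addrs), len(protos))
	}
}
```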

It's mostly running fine, although with fewer connections, but that's probably just a matter of time.
I noticed, however, that I'm approaching 1M goroutines per daemon, which is quite a bit more than before, see here.
I believe this might be connected to the number of inbound /ipfs/id/push/1.0.0 streams I have, see here.
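For reference, one way to count this from inside a plugin is a sketch like the one below (again assuming access to the libp2p host; the helper name is made up, and this is not necessarily how the dashboard numbers are produced):

```go
package exporter

import (
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// countInboundStreams tallies currently open inbound streams per protocol,
// e.g. to see how many /ipfs/id/push/1.0.0 streams are sitting around.
func countInboundStreams(h host.Host) map[protocol.ID]int {
	counts := make(map[protocol.ID]int)
	for _, conn := range h.Network().Conns() {
		for _, s := range conn.GetStreams() {
			if s.Stat().Direction == network.DirInbound {
				counts[s.Protocol()]++
			}
		}
	}
	return counts
}
```

(I think `ipfs swarm peers --streams` shows similar per-peer stream information from the CLI.)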

Interestingly, the (linear over time) rise in inbound streams does not begin immediately when we start the daemons, and it does not begin at the same time in both daemons, although they were started within seconds of each other, see this graph. The second daemon follows a few hours later. Because the symptoms don't show up at the same time in both daemons, it doesn't feel like this is directly related to our regular data exports. It feels more like some concurrency bug in kubo that only shows up after a while. This is the graph in question, in case Grafana doesn't work:
[Graph: inbound /ipfs/id/push/1.0.0 streams over time for both daemons; the rise begins at different times]
The daemons did not restart in between (there's a panel for that somewhere).

Not too sure what's going on here. Let me know if I can help debug. I wonder if this is related to how we're exporting data from the peerstore -- we're only using public functionality, but was there some API change I missed, some required cleanup, or something like that? I will try running without our client for a while.
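In the meantime, one thing I can do is grab goroutine dumps and count how many stacks sit in identify. A rough sketch, assuming Kubo's pprof handlers are reachable on the default API address 127.0.0.1:5001 (the helper name is made up):

```go
package exporter

import (
	"io"
	"net/http"
	"strings"
)

// countIdentifyGoroutines fetches a full goroutine dump from the daemon's
// pprof endpoint and counts how many goroutine stacks mention "identify".
// Assumes the RPC API (with its /debug/pprof handlers) listens on 127.0.0.1:5001.
func countIdentifyGoroutines() (int, error) {
	resp, err := http.Get("http://127.0.0.1:5001/debug/pprof/goroutine?debug=2")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}

	// In debug=2 output, individual goroutine stacks are separated by blank lines.
	n := 0
	for _, stack := range strings.Split(string(body), "\n\n") {
		if strings.Contains(stack, "identify") {
			n++
		}
	}
	return n, nil
}
```

(`ipfs diag profile` should capture the same information, if that's more useful.)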

Labels: P0 (Critical: tackled by core team ASAP), kind/bug (a bug in existing code, including security flaws)
