
fix: set LimitNOFILE=infinity for thymis-agent systemd service#687

Open
elikoga wants to merge 1 commit into master from fix-agent-fd-limit


elikoga (Member) commented Mar 27, 2026

Problem

The thymis-agent process leaks one file descriptor for each relay-proxied connection that ends via remote EOF (sshd closes its side before the relay sends RtEConnectionCloseMessage). In edge_agent.py, read_from_tcp_and_send() silently breaks out of its read loop on an empty read without closing the writer or removing the connection from active_connections.

The default systemd soft fd limit is 1024. Once exhausted, asyncio.open_connection() fails with [Errno 16] Device or resource busy (Python 3.13 happy-eyeballs translates EMFILE as EBUSY). This brought down homepi4's terminal/SSH relay today.
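To see how close a process is to that limit, a small diagnostic along these lines could be run inside it. This is only a sketch and not part of this PR: `fd_usage()` is a hypothetical helper name, and counting entries in `/proc/self/fd` is Linux-specific.

```python
import os
import resource
import socket


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Linux-only: each entry in /proc/self/fd is one open descriptor.
    # Listing the directory briefly opens one extra fd, which is counted too.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    before, soft, hard = fd_usage()
    leaked = socket.socket()  # stands in for a relay socket that is never closed
    after, _, _ = fd_usage()
    print(f"open fds: {before} -> {after}, soft limit: {soft}, hard limit: {hard}")
    leaked.close()
```

Logging the count periodically would show it climbing by one per leaked connection until it reaches the soft limit.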

Fix

Set LimitNOFILE=infinity as a stopgap while the underlying leak in edge_agent.py is addressed separately.

Root cause (to fix next)

In edge_agent.py, initiate_connection() → inner read_from_tcp_and_send(): add a try/finally that calls writer.close() and active_connections.pop() whenever the read loop exits.
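A minimal sketch of that cleanup pattern, assuming the agent uses asyncio streams. The real signature of read_from_tcp_and_send(), the shape of active_connections, and the `send` callback are not shown in this PR, so everything below (including `demo()` and the `"conn-1"` id) is hypothetical and only illustrates the try/finally:

```python
import asyncio

# Hypothetical stand-in for the agent's connection registry.
active_connections: dict[str, asyncio.StreamWriter] = {}


async def read_from_tcp_and_send(connection_id, reader, writer, send):
    """Forward TCP data until EOF, then always release the socket."""
    try:
        while True:
            data = await reader.read(4096)
            if not data:  # empty read means the remote side sent EOF
                break
            await send(connection_id, data)
    finally:
        # Runs on EOF, on errors, and on task cancellation.
        active_connections.pop(connection_id, None)
        writer.close()
        try:
            await writer.wait_closed()
        except (ConnectionError, OSError):
            pass  # the peer may already have torn the connection down


async def demo():
    """Local server that sends a few bytes and closes first (remote EOF)."""
    sent = []

    async def send(connection_id, data):
        sent.append(data)

    async def handle(reader, writer):
        writer.write(b"hello")
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    active_connections["conn-1"] = writer
    await read_from_tcp_and_send("conn-1", reader, writer, send)
    server.close()
    return b"".join(sent), writer.is_closing(), dict(active_connections)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Using `pop(connection_id, None)` keeps the cleanup safe if the relay's RtEConnectionCloseMessage path has already removed the entry.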

Agent accumulates open fds per relay-proxied connection (socket not
closed on remote EOF). Default soft limit of 1024 is reachable under
normal terminal usage, causing asyncio.open_connection() to fail with
EBUSY (Python 3.13 happy-eyeballs failure path on EMFILE). This is a
stopgap; the underlying fd leak in edge_agent.py should also be fixed.

elikoga (Member, Author) commented Mar 27, 2026

Not a long-term fix, but it pushes back some of the cannot-connect crashes caused by the resource leak.
