This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Federation reader stops processing incoming requests after database crash #8470

@chr-1x

Description

After my postgres instance was OOM-killed (a presumably unrelated issue), my federation reader worker stops processing incoming events, or processes them extremely slowly.

Here's the database server's memory usage chart showing the time at which the crash occurred:
image

Stacked up with the requests-in-flight (dark red is PUT FederationSendServlet on my federation_reader worker):
image

And the age of the last processed event (the new events that do come in are probably due to local activity?):
image

(I can provide other metrics graphs for this period upon request)

Note that the rest of the server continued working fine, it could exchange local messages and sync with clients without issues.

A log excerpt from the time of the crash is attached. Note that the worker appears to recover: the logs continue as if it were processing incoming requests, but this is not reflected in the graphs above or in the observed behavior (messages from other servers stop coming in).
federation_reader.log.txt

Steps to reproduce

(note: I haven't attempted to reproduce this in isolation, but it has happened multiple times in situ with my current configuration)

  1. Set up the homeserver with a postgres database, a separate synapse.app.generic_worker handling the ^/_matrix/federation/v1/send/ endpoint, and redis replication.

(My worker config:
federation_reader.yaml.txt)

  2. Kill postgres

Expected: possibly a few requests error out, but the worker should recover after the database comes back up

Actual: worker stops processing requests until killed and restarted
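
To make the expected behavior concrete, here is a minimal sketch of the kind of recovery I mean. This is not Synapse's actual code: `FlakyDatabase` and `run_with_reconnect` are hypothetical names made up for this illustration, and the retry counts and delays are arbitrary assumptions. It only shows that a few requests may fail while the database is down, and that work resumes once it is back.

```python
import time


class FlakyDatabase:
    """Hypothetical stand-in for a database that is down for a few calls."""

    def __init__(self, failures_before_recovery):
        self.failures_left = failures_before_recovery

    def query(self, sql):
        # Simulate an outage: fail until the "database" has come back up.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("database is unavailable")
        return "ok: " + sql


def run_with_reconnect(db, sql, max_attempts=5, base_delay=0.01):
    """Retry a query with exponential backoff until the database is back."""
    for attempt in range(max_attempts):
        try:
            return db.query(sql)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up: the outage outlasted the retry budget
            time.sleep(base_delay * 2 ** attempt)


# The first three attempts fail, the fourth succeeds.
db = FlakyDatabase(failures_before_recovery=3)
result = run_with_reconnect(db, "SELECT 1")
print(result)
```

The point of the sketch is only the shape of the behavior: errors during the outage are acceptable, but a permanent stall after the database returns is not.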

Version information

  • Homeserver: matrix.cybre.space

If not matrix.org:

  • Version:
{
   "python_version": "3.6.8",
   "server_version": "1.20.1 (b=master,86a72d1)"
}
  • Install method: pip

  • Platform: Ubuntu 18.04 VPS, not containerized.

Metadata

    Labels

    A-Federation; T-Defect (Bugs, crashes, hangs, security vulnerabilities, or other reported issues); z-bug (Deprecated Label); z-p2 (Deprecated Label)
