1 change: 1 addition & 0 deletions changelog.d/19394.bugfix
@@ -0,0 +1 @@
Prevent excessively long numbers for the retry interval of `WorkerLock`s. Contributed by Famedly.
Contributor
In #19390 (comment) (another Famedly PR), the following was stated:

I am submitting this PR as an employee of Famedly, who has signed the corporate CLA, and used my company email in the commit.

I assume the same applies here?

Author
Yes, this is correct. Will we have to state such each time we upstream changes?

Contributor
To explain: we didn't have any corporate CLA signed from Famedly at the time, but we just got confirmation today that this has now happened ⏩

The whole https://github.com/famedly org is now exempt from the CLA check, but you need to make your org membership public for the check to work.

8 changes: 4 additions & 4 deletions synapse/handlers/worker_lock.py
@@ -275,8 +275,8 @@ async def __aexit__(

 def _get_next_retry_interval(self) -> float:
     next = self._retry_interval
-    self._retry_interval = max(5, next * 2)
-    if self._retry_interval > Duration(minutes=10).as_secs(): # >7 iterations
+    self._retry_interval = min(Duration(minutes=15).as_secs(), next * 2)
+    if self._retry_interval > Duration(minutes=10).as_secs(): # >12 iterations
Contributor
It would be nice to have these as constants, WORKER_LOCK_MAX_RETRY_INTERVAL and WORKER_LOCK_WARN_RETRY_INTERVAL (perhaps a better name), so we can better describe these values.

Author
I had actually considered that, but I also think a more flexible approach for different locks may be worth exploring. For example: when a lock is taken because an event is being persisted, the retry interval could be capped to a much smaller value (and likewise the threshold for logging excessive times), whereas a lock for purging a room might start with a longer retry interval but keep the same cap.

Perhaps these constants could serve as defaults, if that exploration bears fruit. I shall add it to my notes for more work in this area, but I would rather do it separately.
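A hypothetical sketch of what such per-lock flexibility could look like. The `RetryPolicy` type, its field names, and all the values are illustrative assumptions (not Synapse code), and plain seconds stand in for Synapse's `Duration` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Illustrative per-lock retry tuning, as floated in this thread."""

    initial: float     # first retry interval, in seconds
    cap: float         # hard ceiling on the doubled interval
    warn_above: float  # log a warning once the interval exceeds this


# Event persistence might want a tight cap; a room purge might start
# slower but keep the 15-minute ceiling (values are made up).
EVENT_PERSIST_POLICY = RetryPolicy(initial=0.1, cap=30.0, warn_above=20.0)
ROOM_PURGE_POLICY = RetryPolicy(initial=5.0, cap=15 * 60.0, warn_above=10 * 60.0)


def next_retry_interval(policy: RetryPolicy, current: float) -> float:
    """Double the interval, but never past the policy's cap."""
    return min(policy.cap, current * 2)
```

Each lock kind would then pick its policy at construction time, with the current hard-coded values as the default policy.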

Contributor
Sounds good to tackle that in a separate PR.

In the meantime, I think my original suggestion still makes sense. We can also assert that WORKER_LOCK_MAX_RETRY_INTERVAL > WORKER_LOCK_WARN_RETRY_INTERVAL so we always get a warning about excessive timeouts.
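A minimal sketch of that suggestion. The constant names come from the review comments and the values from the diff; plain seconds stand in for Synapse's `Duration` helper, so this is an assumption-laden illustration, not the actual implementation:

```python
# Values from the diff: cap at 15 minutes, warn above 10 minutes.
WORKER_LOCK_MAX_RETRY_INTERVAL = 15 * 60.0
WORKER_LOCK_WARN_RETRY_INTERVAL = 10 * 60.0

# Guarantee the warning threshold lies below the cap, so the
# "excessive timeout" warning can fire before the interval stops growing.
assert WORKER_LOCK_MAX_RETRY_INTERVAL > WORKER_LOCK_WARN_RETRY_INTERVAL


def get_next_retry_interval(current: float) -> float:
    """Double the interval, capped at the maximum (mirroring the diff)."""
    return min(WORKER_LOCK_MAX_RETRY_INTERVAL, current * 2)
```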

     logger.warning(
         "Lock timeout is getting excessive: %ss. There may be a deadlock.",
         self._retry_interval,
@@ -362,8 +362,8 @@ async def __aexit__(

 def _get_next_retry_interval(self) -> float:
     next = self._retry_interval
-    self._retry_interval = max(5, next * 2)
-    if self._retry_interval > Duration(minutes=10).as_secs(): # >7 iterations
+    self._retry_interval = min(Duration(minutes=15).as_secs(), next * 2)
+    if self._retry_interval > Duration(minutes=10).as_secs(): # >12 iterations
     logger.warning(
         "Lock timeout is getting excessive: %ss. There may be a deadlock.",
         self._retry_interval,
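For context on the bugfix itself, a standalone sketch (not Synapse code; plain seconds replace the `Duration` helper) contrasting the old unbounded doubling with the new capped version:

```python
FIFTEEN_MINUTES = 15 * 60.0  # the cap introduced by this PR


def old_next(current: float) -> float:
    return max(5, current * 2)  # doubles without bound


def new_next(current: float) -> float:
    return min(FIFTEEN_MINUTES, current * 2)  # pinned at 15 minutes


old = new = 1.0
for _ in range(30):
    old = old_next(old)
    new = new_next(new)

# After 30 retries the old interval is in the billions of seconds
# (the "excessively long numbers" the changelog entry refers to),
# while the new one is held at 900 seconds.
```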