Skip to content

Conversation

@yyforyongyu
Copy link
Member

Fix the following deadlock,

goroutine 12207 [sync.WaitGroup.Wait, 34 minutes]:
sync.runtime_SemacquireWaitGroup(0x2783f78?)
    runtime/sema.go:110 +0x25
sync.(*WaitGroup).Wait(0xc007ef21c0?)
    sync/waitgroup.go:118 +0x48
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).Stop(0x1fce980?)
    github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:866 +0x186
github.com/lightningnetwork/lnd/contractcourt.(*ChainArbitrator).ResolveContract(0xc000570788, {{0x60, 0x32, 0x94, 0xff, 0x72, 0x53, 0xc5, 0x9f, 0xf, ...}, ...})
    github.com/lightningnetwork/lnd/contractcourt/chain_arbitrator.go:546 +0x21c
github.com/lightningnetwork/lnd/contractcourt.(*ChainArbitrator).loadPendingCloseChannels.func4()
    github.com/lightningnetwork/lnd/contractcourt/chain_arbitrator.go:1361 +0x78
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).stateStep(0xc007e9f508, 0xdd387, 0x0, 0x0?)
    github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:1400 +0x1439
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).advanceState(0xc007e9f508, 0xdd387, 0x0, 0x0)
    github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:1673 +0x18e
github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).channelAttendant(0xc007e9f508, 0xdd376, 0xc0009fb4a0?)
    github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:2933 +0x81e
created by github.com/lightningnetwork/lnd/contractcourt.(*ChannelArbitrator).Start in goroutine 2319
    github.com/lightningnetwork/lnd/contractcourt/channel_arbitrator.go:493 +0x29e

@yyforyongyu yyforyongyu added this to the v0.20.0 milestone Jul 24, 2025
@yyforyongyu yyforyongyu self-assigned this Jul 24, 2025
@yyforyongyu yyforyongyu added channel closing Related to the closing of channels cooperatively and uncooperatively bug fix size/micro small bug fix or feature, less than 15 mins of review, less than 250 labels Jul 24, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @yyforyongyu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical deadlock scenario within the contract court's arbitrator components. By refactoring the channel resolution signaling from a synchronous callback to an asynchronous channel-based notification, the potential for circular dependencies and blocking that led to the deadlock has been eliminated, improving the robustness of the system.

Highlights

  • Deadlock Resolution: Implemented an asynchronous signaling mechanism to resolve a deadlock occurring when a ChannelArbitrator attempts to stop while the ChainArbitrator is resolving a contract. The previous synchronous call from ChannelArbitrator to ChainArbitrator.ResolveContract has been replaced.
  • Asynchronous Channel Resolution: Introduced a new resolvedChan (lines 273-276) in ChainArbitrator to receive asynchronous notifications from ChannelArbitrator when a channel has been fully resolved. The ChainArbitrator now processes these resolution signals in its main event loop (dispatchBlocks, lines 718-726), preventing direct synchronous calls that could lead to deadlocks.
  • API Change: The ChannelArbitratorConfig now uses NotifyChannelResolved func() (lines 156-158 in channel_arbitrator.go) instead of MarkChannelResolved func() error, reflecting the change from a direct action to an asynchronous notification. The ChannelArbitrator now calls this new notification function (line 1396 in channel_arbitrator.go) when a channel is resolved.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request provides a solid fix for the identified deadlock by decoupling the channel resolution notification from the contract resolution logic. The use of a channel to signal resolution asynchronously is a great approach. My review includes a few minor suggestions to align the code with the repository's style guide, primarily concerning line length for comments and log statements.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still trying to understand how the dead lock occurs.

IIUC, this is only on shutdown? What exactly is the circular waiting condition? The resolver trying to call into the chain arb while it's shutting down which is what's calling shutdown?

activeWatchers: make(map[wire.OutPoint]*chainWatcher),
chanSource: db,
quit: make(chan struct{}),
resolvedChan: make(chan wire.OutPoint),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a buffer of one?

Can we get a sequence flow explanation of how the deadlock could arise before?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can guarantee that the ChainArb is always started before we call the ChannelArbs, so even without the buffer we are fine ?

@Roasbeef
Copy link
Member

After a bit of ultrathink, I understand the deadlock scenario now. See this diagram for a breakdown: https://gist.github.com/Roasbeef/4a3fbf28294eee98eca0143865981030#the-deadlock-scenario

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🫜

@ziggie1984 ziggie1984 self-requested a review July 25, 2025 08:27
Copy link
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice fix 🎉

// resolveContracts listens to the `resolvedChan` to mark a given channel as
// fully resolved.
func (c *ChainArbitrator) resolveContracts() {
// Consume block epochs until we receive the instruction to shutdown.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Are we consuming block epochs here? Seems to me we only listen to the resolvedChan ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah wrong copy, now fixed!

activeWatchers: make(map[wire.OutPoint]*chainWatcher),
chanSource: db,
quit: make(chan struct{}),
resolvedChan: make(chan wire.OutPoint),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can guarantee that the ChainArb is always started before we call the ChannelArbs, so even without the buffer we are fine ?

@guggero guggero modified the milestones: v0.20.0, v0.19.3 Jul 25, 2025
@guggero guggero merged commit 8393709 into lightningnetwork:master Jul 25, 2025
37 of 39 checks passed
@yyforyongyu yyforyongyu deleted the fix-arb-deadlock branch July 25, 2025 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix channel closing Related to the closing of channels cooperatively and uncooperatively size/micro small bug fix or feature, less than 15 mins of review, less than 250

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants