Skip to content

fix(wecom): use channel context instead of HTTP request context for async message processing#914

Merged
yinwm merged 2 commits intosipeed:mainfrom
alexhoshina:fix/wecom-context-canceled
Feb 28, 2026
Merged

fix(wecom): use channel context instead of HTTP request context for async message processing#914
yinwm merged 2 commits intosipeed:mainfrom
alexhoshina:fix/wecom-context-canceled

Conversation

@alexhoshina
Copy link
Collaborator

@alexhoshina alexhoshina commented Feb 28, 2026

📝 Description

The HTTP request context is canceled as soon as the handler returns the response, causing PublishInbound to fail with "context canceled" when processMessage runs asynchronously in a goroutine. Use the channel's long-lived context (c.ctx) instead.

…sync message processing

The HTTP request context is canceled as soon as the handler returns the
response, causing PublishInbound to fail with "context canceled" when
processMessage runs asynchronously in a goroutine. Use the channel's
long-lived context (c.ctx) instead.
Copilot AI review requested due to automatic review settings February 28, 2026 13:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes async WeCom webhook message processing by avoiding the short-lived HTTP request context (which gets canceled when the handler returns), and instead using the channel’s long-lived context for background processing.

Changes:

  • Switch WeCom Bot webhook async processing to use c.ctx instead of the handler/request context.
  • Switch WeCom App webhook async processing to use c.ctx instead of the handler/request context.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/channels/wecom/bot.go Uses the channel lifecycle context for goroutine-based message processing to avoid context canceled on publish.
pkg/channels/wecom/app.go Same context change for the WeCom App webhook callback async processing path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

go c.processMessage(ctx, msg)
// Process the message with the channel's long-lived context (not the HTTP
// request context, which is canceled as soon as we return the response).
go c.processMessage(c.ctx, msg)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.ctx may be nil here if the channel hasn't been started yet (e.g., current tests invoke handleMessageCallback on a newly constructed channel without calling Start). A nil context will panic downstream in BaseChannel.HandleMessage / MessageBus.PublishInbound when ctx.Err() is called. Ensure c.ctx is always non-nil (initialize it in NewWeComAppChannel, or add a safe fallback here when c.ctx == nil).

Suggested change
go c.processMessage(c.ctx, msg)
procCtx := c.ctx
if procCtx == nil {
procCtx = context.Background()
}
go c.processMessage(procCtx, msg)

Copilot uses AI. Check for mistakes.
Comment on lines +295 to +297
// Process the message with the channel's long-lived context (not the HTTP
// request context, which is canceled as soon as we return the response).
go c.processMessage(c.ctx, msg)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing tests for handleMessageCallback, but they only assert the HTTP response and don't validate the async processing path. Please add a regression test that passes a canceled request context into handleMessageCallback and asserts an inbound message is still published (e.g., via msgBus.ConsumeInbound with a timeout), to prevent reintroducing the "context canceled" failure this change is addressing.

Copilot uses AI. Check for mistakes.
Comment on lines +570 to +572
// Process the message with the channel's long-lived context (not the HTTP
// request context, which is canceled as soon as we return the response).
go c.processMessage(c.ctx, msg)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current webhook callback tests only check the immediate HTTP response and don't assert that the async goroutine successfully publishes the inbound message. Add a regression test that cancels the request context and verifies an inbound message is still consumed from the bus (e.g., ConsumeInbound with a timeout), so this bugfix stays covered.

Copilot uses AI. Check for mistakes.
go c.processMessage(ctx, msg)
// Process the message with the channel's long-lived context (not the HTTP
// request context, which is canceled as soon as we return the response).
go c.processMessage(c.ctx, msg)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.ctx can be nil here when the channel hasn't been started yet (e.g., current unit tests call handleMessageCallback without calling Start). Passing a nil context into processMessage eventually reaches BaseChannel.HandleMessageMessageBus.PublishInbound(ctx, ...), which will panic on ctx.Err(). Initialize a non-nil default (e.g., set c.ctx = context.Background() in the constructor, or fall back to context.Background()/a detached context here when c.ctx == nil).

Suggested change
go c.processMessage(c.ctx, msg)
procCtx := c.ctx
if procCtx == nil {
procCtx = context.Background()
}
go c.processMessage(procCtx, msg)

Copilot uses AI. Check for mistakes.
…n tests

The ctx field was only set in Start(), so tests calling handleMessageCallback
without Start() caused a nil pointer dereference in MessageBus.PublishInbound.
Copy link
Collaborator

@yinwm yinwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

✅ Correct Problem Identification

The PR correctly identifies a real bug:

  • Root cause: HTTP request context is canceled immediately after the handler returns, but processMessage runs asynchronously in a goroutine, causing PublishInbound to fail with "context canceled" error
  • Fix direction is correct: Using the channel's long-lived context is the right approach

⚠️ Potential Issue: Resource Leak

There's a subtle bug in the implementation. The cancel function created in the constructor is overwritten in Start(), causing a resource leak:

// In constructor:
ctx, cancel := context.WithCancel(context.Background())
return &WeComBotChannel{
    ctx:    ctx,
    cancel: cancel,  // cancel_1
}

// In Start():
c.ctx, c.cancel = context.WithCancel(ctx)  // cancel_2 overwrites cancel_1

cancel_1 will never be called, causing a goroutine/resource leak (albeit small).

🔧 Suggested Fix

Release the previous cancel in Start():

func (c *WeComBotChannel) Start(ctx context.Context) error {
    logger.InfoC("wecom", "Starting WeCom Bot channel...")

    // Release the old cancel created in constructor
    if c.cancel != nil {
        c.cancel()
    }
    
    c.ctx, c.cancel = context.WithCancel(ctx)
    // ...
}

This applies to both WeComBotChannel and WeComAppChannel.

📝 Minor Suggestions

  1. Consider adding a test case to verify async processing doesn't fail due to context cancellation
  2. The comments are good, could be slightly more explicit about the "context canceled" error scenario

Summary

Aspect Assessment
Problem identification ✅ Correct
Fix direction ✅ Correct
Implementation details ⚠️ Minor issue (resource leak)
Code style ✅ Good

Please address the cancel overwrite issue and this PR should be good to merge. Thanks for the fix!

Copy link
Collaborator

@yinwm yinwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait you leak pr @alexhoshina

@yinwm yinwm merged commit 9c9524f into sipeed:main Feb 28, 2026
2 checks passed
vvr3ddy added a commit to vvr3ddy/picoclaw that referenced this pull request Mar 1, 2026
Changes from upstream:
- fix(channel): config cleanup and regex precompile (sipeed#911, sipeed#916)
- fix(github_copilot): improve error handling (sipeed#919)
- fix(wecom): context leaks and data race fixes (sipeed#914, sipeed#918)
- fix(tools): HTTP client caching and response body cleanup (sipeed#940)
- feat(tui): Add configurable Launcher and Gateway process management (sipeed#909)
- feat(migrate): Update migration system with openclaw support (sipeed#910)
- fix(skills): Use registry-backed search for skills discovery (sipeed#929)
- build: Add armv6 support to goreleaser (sipeed#905)
- docs: Sync READMEs and channel documentation
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
…eled

fix(wecom): use channel context instead of HTTP request context for async message processing
Pluckypan pushed a commit to Pluckypan/picoclaw that referenced this pull request Mar 6, 2026
…eled

fix(wecom): use channel context instead of HTTP request context for async message processing
@alexhoshina alexhoshina deleted the fix/wecom-context-canceled branch March 12, 2026 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants