fix(feishu): invalidate cached token on auth error to enable retry recovery#1318
Conversation
nikolasdehor
left a comment
There was a problem hiding this comment.
Solid fix for a real operational issue. The Lark SDK v3's built-in token retry not clearing stale tokens from cache is a well-known pain point, and this custom tokenCache implementation is the correct workaround.
Review notes:
-
Cache implementation is clean --
tokenCachecorrectly implementslarkcore.CachewithGet/Set, and theInvalidateAllmethod usesclear(c.store)(Go 1.21+) for a clean wipe. The mutex usage is correct:RLockfor reads,Lockfor writes. -
Expiry check in Get -- returning empty string for expired entries is correct per the
larkcore.Cachecontract. The SDK will then request a new token. -
Comprehensive coverage -- all API call sites that check
resp.Success()now also callinvalidateTokenOnAuthError(resp.Code). I count:sendCard,EditMessage,SendPlaceholder,ReactToMessage,fetchBotOpenID,sendImage(upload + send),sendFile(upload + send),downloadResource. That looks complete. -
Idempotent invalidation -- calling
InvalidateAll()multiple times in quick succession (e.g., if multiple API calls fail simultaneously) is safe since it just clears the map. -
Note on
stripMentionPlaceholderssignature change -- this PR also includes the mention handling changes (addingbotOpenIDparameter, sender identity metadata). These seem shared with PR #1319 and #1283. The changes look correct, but having them in multiple PRs creates merge conflict risk. Coordination between these PRs would be good.
LGTM on the token cache fix.
|
|
pkg/channels/feishu/feishu_64.go
Outdated
| // Prepend sender identity so the LLM knows who sent the message. | ||
| if sender != nil && sender.SenderId != nil { | ||
| openID := "" | ||
| if sender.SenderId.OpenId != nil { | ||
| openID = *sender.SenderId.OpenId | ||
| } | ||
| if openID != "" { | ||
| content = fmt.Sprintf("[sender: open_id=%s] %s", openID, content) |
There was a problem hiding this comment.
This causes all messages to start with [sender: open_id=xxxx], which prevents the command system from working, as the command system only recognizes messages that begin with / or !.
Original message: /help
Will be processed as: [sender: open_id=ou_xxx] /help
handleCommand cannot recognize it as a command, so the message will be sent to the agent as ordinary chat text and not executed as a command.
pkg/channels/feishu/feishu_64.go
Outdated
| // Replace mention placeholders for all chat types: bot mentions are stripped, others become @Name(open_id:xxx) | ||
| if len(message.Mentions) > 0 { | ||
| knownBotID, _ := c.botOpenID.Load().(string) | ||
| content = stripMentionPlaceholders(content, message.Mentions, knownBotID) |
There was a problem hiding this comment.
For example
Group message: @Alice /help
will be processed as: @Alice(open_id:...) /help
causing the command to be unrecognized.
There was a problem hiding this comment.
Sorry, the old commits were mixed with other changes. Now the commit history is clean.
…covery The Lark SDK v3's built-in token retry loop does not clear stale tokens from cache when the server returns error 99991663 (tenant_access_token invalid), causing all API calls to fail until the token naturally expires (~2 hours). - Add tokenCache struct (implementing larkcore.Cache) with Get/Set/InvalidateAll methods and proper expired-entry cleanup - Wire custom cache into lark.NewClient via WithTokenCache() - Add invalidateTokenOnAuthError helper called in all API methods
3cf644d to
5a2b34f
Compare
…covery (sipeed#1318) The Lark SDK v3's built-in token retry loop does not clear stale tokens from cache when the server returns error 99991663 (tenant_access_token invalid), causing all API calls to fail until the token naturally expires (~2 hours). - Add tokenCache struct (implementing larkcore.Cache) with Get/Set/InvalidateAll methods and proper expired-entry cleanup - Wire custom cache into lark.NewClient via WithTokenCache() - Add invalidateTokenOnAuthError helper called in all API methods
|
@Vast-Stars 飞书token缓存失效后无法自动重试的问题抓得很准,自定义tokenCache的方案也很干净。之前这个bug能让API调用卡住两小时,修复很及时! 我们正在组建 PicoClaw Dev Group,在Discord上方便贡献者之间交流。感兴趣的话,发邮件到 |
…covery (sipeed#1318) The Lark SDK v3's built-in token retry loop does not clear stale tokens from cache when the server returns error 99991663 (tenant_access_token invalid), causing all API calls to fail until the token naturally expires (~2 hours). - Add tokenCache struct (implementing larkcore.Cache) with Get/Set/InvalidateAll methods and proper expired-entry cleanup - Wire custom cache into lark.NewClient via WithTokenCache() - Add invalidateTokenOnAuthError helper called in all API methods
Summary
99991663(tenant_access_token invalid), causing all API calls to fail until the token naturally expires (~2 hours)tokenCache(implementinglarkcore.Cache) with anInvalidateAll()method, injected vialark.WithTokenCache()99991663, invalidate the cache so the next application-level retry fetches a fresh tokenChanges
tokenCachestruct withGet/Set/InvalidateAllmethodslark.NewClientviaWithTokenCache()invalidateTokenOnAuthErrorhelper called in all API methods:sendCard,EditMessage,SendPlaceholder,ReactToMessage,fetchBotOpenID,sendImage,sendFile,downloadResourceTest plan
make buildcompiles successfullygo test ./pkg/channels/feishu/all pass