Skip to content

fix(pkg):do regex precompile insteadd on the fly#911

Merged
xiaket merged 5 commits intosipeed:mainfrom
ItsT0ng:tong/fix/do-regex-precompile
Mar 1, 2026
Merged

fix(pkg):do regex precompile insteadd on the fly#911
xiaket merged 5 commits intosipeed:mainfrom
ItsT0ng:tong/fix/do-regex-precompile

Conversation

@ItsT0ng
Copy link
Contributor

@ItsT0ng ItsT0ng commented Feb 28, 2026

📝 Description

  • Move HTTP status regex compilation from per-call in extractHTTPStatus to package-level var init, avoiding repeated regexp.MustCompile on every error classification
  • Fix the HTTP regex to use lowercase http since ClassifyError lowercases the input before calling extractHTTPStatus (the uppercase HTTP pattern was a dead code path)
  • Add a standalone word-boundary status code matcher (\b([3-5]\d{2})\b) as a fallback for bare status codes like "error 429"
  • Rename the package-level variable from patterns to httpStatusPatterns for clarity and to avoid shadowing the matchesAny parameter

Pattern priority

extractHTTPStatus now tries patterns in order of specificity:

  1. status[:\s]+(\d{3}) — most specific, matches "status: 429" / "status 429"
  2. http[/\s]+...(\d{3}) — matches "http/1.1 502" with protocol prefix
  3. \b([3-5]\d{2})\b — standalone fallback, matches any bare 3xx–5xx code at a word boundary

Test changes

  • Fix test input from "HTTP/1.1 502 Bad Gateway" to "http/1.1 502 bad gateway" to match the actual lowercased calling convention
  • Add test case for standalone status code extraction ("error 429" → 429)

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

📚 Technical Context (Skip for Docs)

  • Reference URL:
  • Reasoning:

🧪 Test Environment

  • Hardware: PC
  • OS: Ubuntu 22.04
  • Model/Provider: N/A
  • Channels: N/A

📸 Evidence (Optional)

Click to view Logs/Screenshots

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Copilot AI review requested due to automatic review settings February 28, 2026 11:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the provider error classifier to avoid recompiling HTTP-status regexes on the hot path (extractHTTPStatus), improving performance when classifying frequent LLM API errors.

Changes:

  • Moved HTTP-status regexp.MustCompile calls from inside extractHTTPStatus to a package-level precompiled slice.
  • Updated extractHTTPStatus to iterate over the shared precompiled regex list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings February 28, 2026 12:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ItsT0ng ItsT0ng force-pushed the tong/fix/do-regex-precompile branch from 40ac9ef to 59c3300 Compare February 28, 2026 12:19
Copilot AI review requested due to automatic review settings February 28, 2026 13:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(pkg/providers): do regex precompile instead of on the fly

Good performance optimization with a correctness fix included.

Changes

  1. error_classifier.go -- Moved httpStatusPatterns from local variables inside extractHTTPStatus() to package-level var block. This avoids recompiling 2-3 regexes on every error classification call. Since regexp.MustCompile is safe for concurrent use after compilation, this is the correct Go pattern.

  2. Fixed case mismatch -- Changed HTTP[/\s]+ to http[/\s]+ in the regex. This is a real bug fix: ClassifyError lowercases the input before calling extractHTTPStatus, so the uppercase HTTP pattern was dead code that could never match. Good catch.

  3. Added standalone status code fallback -- New pattern \b([3-5]\d{2})\b matches bare status codes like "error 429". This improves error classification coverage for messages that don't follow the "status: NNN" or "http/1.1 NNN" format.

  4. Test updates -- Fixed test input to use lowercase (matching actual calling convention) and added "error 429" test case.

shell.go Changes

  1. absolutePathPattern moved to package-level -- Same precompile optimization. Previously compiled inside guardCommand() on every call.

  2. defaultDenyPatterns reformatted from var = []*regexp.Regexp{...} to var (\n defaultDenyPatterns = ...\n absolutePathPattern = ...\n) block -- This is just grouping, no functional change.

Concern

  1. Standalone status code regex \b([3-5]\d{2})\b is broad -- This will match any 3-digit number from 300-599 at a word boundary. Strings like "connection on port 443" would return 443 as a status code. "error code 500" is fine, but "timeout after 300ms" would incorrectly match 300. Consider narrowing the pattern or adding it only as a last-resort fallback with lower priority (which it already is, being the last pattern tried). The current test cases pass, but edge cases could produce false positives.

  2. httpStatusPatterns variable rename -- Changed from patterns to httpStatusPatterns. Good -- avoids shadowing the patterns parameter name in matchesAny. Clean naming.

LGTM with the caveat on point #7 (broad fallback pattern). The precompile optimization is correct and the case-sensitivity fix addresses a real bug.

@ItsT0ng ItsT0ng changed the title fix(pkg/providers):do regex precompile insteadd on the fly fix(pkg):do regex precompile insteadd on the fly Mar 1, 2026
Copy link
Collaborator

@xiaket xiaket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

ItsT0ng added 5 commits March 1, 2026 23:03
…de matcher

The precompiled HTTP regex used uppercase "HTTP" which never matched
because ClassifyError lowercases the input. Replace it with a
case-insensitive word-boundary pattern that matches any standalone
3-digit status code (300-599), which also subsumes the HTTP/x.x case.

Add test case for standalone status code extraction.
…cher

Restore the http-prefixed regex (without unnecessary (?i) flag since
input is already lowercased by ClassifyError) as a mid-priority pattern
to reduce false positives. Add a standalone word-boundary matcher as a
fallback for bare status codes like "429". Fix test to use lowercased
input matching the actual calling convention.
The path regex in guardCommand was compiled on every call. Hoist it
to a package-level var (absolutePathPattern) alongside defaultDenyPatterns
in a single var block, so it is compiled once at init time.
@ItsT0ng ItsT0ng force-pushed the tong/fix/do-regex-precompile branch from a310804 to 10f142c Compare March 1, 2026 12:04
@xiaket xiaket merged commit d6e88da into sipeed:main Mar 1, 2026
2 checks passed
vvr3ddy added a commit to vvr3ddy/picoclaw that referenced this pull request Mar 1, 2026
Changes from upstream:
- fix(channel): config cleanup and regex precompile (sipeed#911, sipeed#916)
- fix(github_copilot): improve error handling (sipeed#919)
- fix(wecom): context leaks and data race fixes (sipeed#914, sipeed#918)
- fix(tools): HTTP client caching and response body cleanup (sipeed#940)
- feat(tui): Add configurable Launcher and Gateway process management (sipeed#909)
- feat(migrate): Update migration system with openclaw support (sipeed#910)
- fix(skills): Use registry-backed search for skills discovery (sipeed#929)
- build: Add armv6 support to goreleaser (sipeed#905)
- docs: Sync READMEs and channel documentation
liangzhang-keepmoving pushed a commit to liangzhang-keepmoving/picoclaw that referenced this pull request Mar 2, 2026
* fix(pkg/providers):do regex precompile insteadd on the fly

* fix(providers): replace HTTP-specific regex with standalone status code matcher

The precompiled HTTP regex used uppercase "HTTP" which never matched
because ClassifyError lowercases the input. Replace it with a
case-insensitive word-boundary pattern that matches any standalone
3-digit status code (300-599), which also subsumes the HTTP/x.x case.

Add test case for standalone status code extraction.

* fix(providers): restore http regex and add standalone status code matcher

Restore the http-prefixed regex (without unnecessary (?i) flag since
input is already lowercased by ClassifyError) as a mid-priority pattern
to reduce false positives. Add a standalone word-boundary matcher as a
fallback for bare status codes like "429". Fix test to use lowercased
input matching the actual calling convention.

* perf(tools): move path regex compilation from per-call to package init

The path regex in guardCommand was compiled on every call. Hoist it
to a package-level var (absolutePathPattern) alongside defaultDenyPatterns
in a single var block, so it is compiled once at init time.

* style(tools): move inline comment to fix golines formatting error
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
* fix(pkg/providers):do regex precompile insteadd on the fly

* fix(providers): replace HTTP-specific regex with standalone status code matcher

The precompiled HTTP regex used uppercase "HTTP" which never matched
because ClassifyError lowercases the input. Replace it with a
case-insensitive word-boundary pattern that matches any standalone
3-digit status code (300-599), which also subsumes the HTTP/x.x case.

Add test case for standalone status code extraction.

* fix(providers): restore http regex and add standalone status code matcher

Restore the http-prefixed regex (without unnecessary (?i) flag since
input is already lowercased by ClassifyError) as a mid-priority pattern
to reduce false positives. Add a standalone word-boundary matcher as a
fallback for bare status codes like "429". Fix test to use lowercased
input matching the actual calling convention.

* perf(tools): move path regex compilation from per-call to package init

The path regex in guardCommand was compiled on every call. Hoist it
to a package-level var (absolutePathPattern) alongside defaultDenyPatterns
in a single var block, so it is compiled once at init time.

* style(tools): move inline comment to fix golines formatting error
Pluckypan pushed a commit to Pluckypan/picoclaw that referenced this pull request Mar 6, 2026
* fix(pkg/providers):do regex precompile insteadd on the fly

* fix(providers): replace HTTP-specific regex with standalone status code matcher

The precompiled HTTP regex used uppercase "HTTP" which never matched
because ClassifyError lowercases the input. Replace it with a
case-insensitive word-boundary pattern that matches any standalone
3-digit status code (300-599), which also subsumes the HTTP/x.x case.

Add test case for standalone status code extraction.

* fix(providers): restore http regex and add standalone status code matcher

Restore the http-prefixed regex (without unnecessary (?i) flag since
input is already lowercased by ClassifyError) as a mid-priority pattern
to reduce false positives. Add a standalone word-boundary matcher as a
fallback for bare status codes like "429". Fix test to use lowercased
input matching the actual calling convention.

* perf(tools): move path regex compilation from per-call to package init

The path regex in guardCommand was compiled on every call. Hoist it
to a package-level var (absolutePathPattern) alongside defaultDenyPatterns
in a single var block, so it is compiled once at init time.

* style(tools): move inline comment to fix golines formatting error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants