make gateway aware of config.json change #1187

Merged
yinwm merged 13 commits into sipeed:main from cytown:d4
Mar 13, 2026

Conversation

@cytown
Contributor

@cytown cytown commented Mar 6, 2026

📝 Description

make gateway command aware of config.json change

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

📚 Technical Context (Skip for Docs)

  • Reference URL:
  • Reasoning:

🧪 Test Environment

  • Hardware: Hackintosh
  • OS: macOS 13.6
  • Model/Provider:
  • Channels:

📸 Evidence (Optional)

Click to view Logs/Screenshots

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Contributor

Copilot AI left a comment


Pull request overview

Adds gateway-side awareness of config.json changes by introducing a polling watcher that hot-reloads configuration and recreates the LLM provider/agent registry at runtime.

Changes:

  • Add polling-based config.json watcher in gatewayCmd to trigger live reloads.
  • Add AgentLoop.SetProvider to swap provider/config during runtime reload.
  • Extend pkg/logger with formatted helper functions used by the gateway hot-reload path.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File                                       Description
cmd/picoclaw/internal/gateway/helpers.go   Adds config polling watcher + main loop to reload config and recreate provider on change.
pkg/agent/loop.go                          Adds SetProvider to update agent loop config/provider via registry rebuild.
pkg/logger/logger.go                       Adds formatted logging helper functions used by gateway reload logging.
go.mod                                     Promotes github.com/h2non/filetype to a direct dependency.


@sipeed-bot sipeed-bot bot added type: enhancement New feature or request domain: config labels Mar 6, 2026
@yinwm
Collaborator

yinwm commented Mar 6, 2026

PR #1187 Review: Make Gateway Aware of config.json Changes

📋 Overview

This PR implements configuration hot-reload functionality: when config.json is modified, the gateway automatically detects and reloads the configuration, creates a new provider, and updates the agent loop without requiring a service restart.

Changed Files:

File                                       +Lines  -Lines  Description
cmd/picoclaw/internal/gateway/helpers.go   162     20      Core logic
pkg/agent/loop.go                          8       0       New SetProvider method
pkg/logger/logger.go                       16      0       New formatted log functions
go.mod                                     1       1       Dependency change

✅ What's Done Well

  1. Polling instead of fsnotify - Simple and reliable, avoids cross-platform filesystem event issues
  2. Debounce handling - time.Sleep(500ms) ensures file write is complete
  3. Non-blocking send - Uses select + default to prevent config update pile-up
  4. Config validation - Calls ValidateModelList() after loading to ensure config is valid
  5. Graceful degradation - Continues using old config when new config fails to load

⚠️ Issues to Address

1. Suspicious go.mod Change

- github.com/h2non/filetype v1.1.3 // indirect
+ github.com/h2non/filetype v1.1.3

Changed filetype from indirect to direct dependency, but the code doesn't seem to use this package. This might be an accidental side effect of go mod tidy. Please confirm if this is intentional.

2. Potential Race Condition

provider = newProvider  // Direct assignment, no lock
agentLoop.SetProvider(provider, newCfg)

If provider is accessed concurrently in other goroutines, there could be a race condition. Please confirm that provider is only used in the main event loop.

3. Limited Scope of Config Reload

As noted in the code comment:

// Note: Some changes (like channel configs) may require restart to take full effect

Only provider and agent are updated, but the following config changes may not take effect:

  • channelManager config
  • deviceService config
  • cronService config

Consider documenting which configs support hot-reload in the PR description or logs.

4. Hardcoded Polling Interval

ticker := time.NewTicker(2 * time.Second)

The 2-second polling interval is hardcoded. Consider making it configurable if flexibility is needed in the future.

5. Inconsistent Logger Function Naming

func DebugSF(...)   // S + F ?
func Infof(...)     // Standard fmt style
func Errorf(...)

DebugSF naming style is inconsistent with Infof, Errorf. Suggest using Debugf for consistency or rename others to match.


🔴 Suggested Changes

  1. Revert the filetype change in go.mod (unless actually needed)
  2. Consider adding a lock for provider or confirm no concurrent access
  3. Unify logger function naming
  4. Log which configs don't support hot-reload

📊 Summary

Aspect           Rating  Notes
Functionality    ⭐⭐⭐⭐    Core feature implemented, but limited scope
Code Quality     ⭐⭐⭐⭐    Clean structure, proper error handling
Safety           ⭐⭐⭐     Possible race condition
Maintainability  ⭐⭐⭐⭐    Readable code, some hardcoded values

Overall: A useful feature improvement, but suggest addressing the above issues before merging.

@xiaket
Collaborator

xiaket commented Mar 6, 2026

I have some opinions on this approach. I think the UNIX way of solving this problem is to listen for SIGHUP in the code: when the process receives SIGHUP, it triggers a function to reload the config.

@cytown
Contributor Author

cytown commented Mar 7, 2026

I have some opinions on this approach. I think the UNIX way of solving this problem is to listen for SIGHUP in the code: when the process receives SIGHUP, it triggers a function to reload the config.

I think when you modify config.json on the fly with the bot, you don't want to go to the terminal and send SIGHUP...

@xiaket
Collaborator

xiaket commented Mar 7, 2026

I have some opinions on this approach. I think the UNIX way of solving this problem is to listen for SIGHUP in the code: when the process receives SIGHUP, it triggers a function to reload the config.

I think when you modify config.json on the fly with the bot, you don't want to go to the terminal and send SIGHUP...

I mean it should be explicit, not automatic. For example, you could have a skill that updates the config and then sends a signal to the process.

@cytown
Contributor Author

cytown commented Mar 8, 2026

I have some opinions on this approach. I think the UNIX way of solving this problem is to listen for SIGHUP in the code: when the process receives SIGHUP, it triggers a function to reload the config.

I think when you modify config.json on the fly with the bot, you don't want to go to the terminal and send SIGHUP...

I mean it should be explicit, not automatic. For example, you could have a skill that updates the config and then sends a signal to the process.

Then you would need the process ID, and permission to send SIGHUP...

@cytown
Contributor Author

cytown commented Mar 10, 2026

@xiaket Lint is fixed now, please check again. BTW, the newly released launcher will work perfectly with this PR.

@yinwm
Collaborator

yinwm commented Mar 10, 2026

Code Review Summary

Thanks for implementing the config hot-reload feature! The overall design approach is solid. However, I found some issues that need to be addressed before merging.


🔴 Critical Issue: Incomplete Thread Safety

The PR adds mu sync.RWMutex and thread-safe getter methods GetRegistry() and GetConfig(), but most code still directly accesses al.registry and al.cfg without lock protection.

From my analysis, there are 20+ places with direct unguarded access:

// loop.go:242 - unguarded access
if al.cfg.Tools.IsToolEnabled("mcp") { ... }

// loop.go:255 - unguarded access  
defaultAgent := al.registry.GetDefaultAgent()

// loop.go:343, 384-385, 400, 541, 631, 640, 642, 704, 760, etc.

Impact: During hot-reload, if ReloadProviderAndConfig() is swapping registry/config while Run() or processMessage() is reading these fields, data races will occur. Running go test -race would detect these issues.

Suggested Fix: Replace all direct accesses with thread-safe getters:

// Before (unsafe)
agent := al.registry.GetDefaultAgent()

// After (safe)
agent := al.GetRegistry().GetDefaultAgent()

Or use a snapshot pattern when atomic access to multiple fields is needed:

func (al *AgentLoop) getSnapshot() (*config.Config, *AgentRegistry) {
    al.mu.RLock()
    defer al.mu.RUnlock()
    return al.cfg, al.registry
}

🟡 Medium Issues

1. Panic Handling Could Be Clearer

In ReloadProviderAndConfig(), when a panic occurs:

go func() {
    defer func() {
        if r := recover(); r != nil {
            logger.ErrorCF(...)  // Only logs
        }
        close(done)
    }()
    registry = NewAgentRegistry(cfg, provider)
}()

The error message "registry creation failed (nil result)" doesn't indicate it was caused by a panic. Consider improving the error message to distinguish panic cases.

2. defer cancel() in Loop Context

case newCfg := <-configReloadChan:
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()  // This defers to function return, not loop iteration

While this works due to continue, it's semantically unclear. Consider extracting the reload logic into a separate function.


🟢 What Looks Good

  • ✅ Atomic swap design for provider + config is correct
  • ✅ Proper error handling - continues with old config on validation failure
  • ✅ Resource cleanup - correctly closes old StatefulProvider
  • ✅ Debounce mechanism (500ms) ensures file write completion
  • ✅ Clear logging throughout the hot-reload process
  • ✅ New logger format functions (Infof, Errorf, etc.) follow Go conventions

📝 Suggestions

  1. Add unit tests for ReloadProviderAndConfig() - it's a critical method
  2. Consider fsnotify instead of polling for better efficiency (2s polling + 500ms debounce means up to 2.5s delay)
  3. Add metrics for reload success/failure counts for monitoring

Summary

Recommendation: Request Changes

The core implementation approach is correct, but the thread safety is incomplete. Introducing locks without protecting all access points will likely cause data races during config hot-reload in production.

Please:

  1. Replace all direct al.registry and al.cfg accesses with GetRegistry() and GetConfig()
  2. Run go test -race ./... to verify no race conditions

@cytown
Contributor Author

cytown commented Mar 10, 2026

> Code Review Summary … (full review quoted above)
done

@yinwm
Collaborator

yinwm commented Mar 10, 2026

Code review

Found 1 issue:

  1. Thread safety issue: PR adds mu sync.RWMutex and thread-safe getter methods GetRegistry() and GetConfig(), but most code still directly accesses al.registry and al.cfg without lock protection. There are 20+ places with direct unguarded access (e.g., lines 290, 316-332, 342, 435 in loop.go), which creates data races when config reload happens concurrently with message processing.

mediaStore media.MediaStore
transcriber voice.Transcriber
cmdRegistry *commands.Registry
mu sync.RWMutex
}
// processOptions configures how a message is processed
type processOptions struct {

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@cytown
Contributor Author

cytown commented Mar 11, 2026

> Code review … (full comment quoted above)

done.

Collaborator

@alexhoshina alexhoshina left a comment


This PR only rebuilds the config object related to the Provider. For modules like channels, they still hold pointers to the old config, so consistency might need to be considered

@cytown
Contributor Author

cytown commented Mar 12, 2026

This PR only rebuilds the config object related to the Provider. For modules like channels, they still hold pointers to the old config, so consistency might need to be considered

Refactored to use a mechanism that rebuilds all services, including channels.

@yinwm
Collaborator

yinwm commented Mar 12, 2026

Code Review

This PR implements config hot-reload functionality. The overall design is sound, but there are blocking issues that need to be addressed.

🔴 Blocking Issues

1. Logger changes conflict with main branch (zerolog refactor)

PR #1239 (refactor logger to zerolog) was merged on 2026-03-11. This PR's logger changes are based on the old logger implementation and will cause conflicts.

  • Remove changes to pkg/logger/logger.go and pkg/logger/logger_test.go
  • Rebase onto latest main
  • Use zerolog API instead of adding Infof/Errorf/Debugf

2. activeRequests counter not released on panic (pkg/agent/loop.go)

// Current code - Done() won't execute if Chat() panics
for attempt := 0; attempt < maxRetries; attempt++ {
    al.activeRequests.Add(1)
    resp, err = agent.Provider.Chat(...)
    al.activeRequests.Done()
    ...
}

// Should be:
for attempt := 0; attempt < maxRetries; attempt++ {
    al.activeRequests.Add(1)
    resp, err = func() (*providers.LLMResponse, error) {
        defer al.activeRequests.Done()
        return agent.Provider.Chat(...)
    }()
    ...
}

🟡 Suggestions

  1. Polling interval hardcoded (helpers.go:517) - Consider using fsnotify instead of polling, or make the interval configurable

  2. restartServices ctx source (helpers.go:398) - The ctx comes from the main loop and could be canceled during reload. Consider using an independent context with timeout

  3. Timeout values hardcoded - The 30s timeouts for stopAndCleanupServices and reload context should be constants or configurable

✅ What's Done Well

  • Thread-safe design with sync.RWMutex
  • Graceful degradation when provider creation fails
  • 500ms debounce for file write completion
  • Non-blocking send on config reload channel

🤖 Generated with Claude Code

@cytown
Contributor Author

cytown commented Mar 13, 2026

4. restartServices

done, besides 3... it's changed from fsnotify...

Collaborator

@yinwm yinwm left a comment


LGTM

@yinwm yinwm merged commit 9676e51 into sipeed:main Mar 13, 2026
4 checks passed
davidburhans added a commit to davidburhans/picoclaw that referenced this pull request Mar 13, 2026
Conflicts resolved:
- helpers.go: merged import sections (io, log, net/http + sync)
- config.go: merged AgentDefaults with Schedule, SafetyLevel, BirthYear

Upstream features merged:
- Config hot reload (PR sipeed#1187)
- Anthropic Messages protocol (PR sipeed#1284)
- Enhanced Skill Installer v2 (PR sipeed#1252)
- Model command CLI (PR sipeed#1250)
- ModelScope provider (PR sipeed#1486)
- LINE webhook DoS protection (PR sipeed#1413)
@Orgmar
Contributor

Orgmar commented Mar 13, 2026

@cytown Making the gateway react to config.json changes on the fly is a useful improvement, saves having to restart every time you tweak settings.

We've got a PicoClaw Dev Group on Discord for contributors. If you want to join, send an email to [email protected] with the subject [Join PicoClaw Dev Group] + your GitHub account and we'll send you the invite!

@cytown
Contributor Author

cytown commented Mar 14, 2026

@cytown Making the gateway react to config.json changes on the fly is a useful improvement, saves having to restart every time you tweak settings.

We've got a PicoClaw Dev Group on Discord for contributors. If you want to join, send an email to [email protected] with the subject [Join PicoClaw Dev Group] + your GitHub account and we'll send you the invite!

I am already in there. This one?

[Screenshot: CleanShot 2026-03-14 at 10 34 42@2x]

dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 16, 2026
* make gateway aware of config.json change

* fix according to code review

* fix lint

* fix review comment

* fix for review

* refactor to fix review

* fix for review

* fix for review
neotty pushed a commit to neotty/picoclaw that referenced this pull request Mar 17, 2026
* make gateway aware of config.json change

* fix according to code review

* fix lint

* fix review comment

* fix for review

* refactor to fix review

* fix for review

* fix for review
@cytown cytown deleted the d4 branch March 19, 2026 06:36
j0904 pushed a commit to j0904/picoclaw that referenced this pull request Mar 22, 2026
* make gateway aware of config.json change

* fix according to code review

* fix lint

* fix review comment

* fix for review

* refactor to fix review

* fix for review

* fix for review