feat(plugin): phase2 core selection + resolver (1/4) #936

Closed
gh-xj wants to merge 2 commits into sipeed:main from gh-xj:xj/plugin-system-02281809-1-core

Conversation

@gh-xj
Collaborator

@gh-xj gh-xj commented Mar 1, 2026

📝 Description

Stacked series PR (1/4) for Plugin System Phase 2/3.

Scope in this PR:

  • plugin selection config schema (default_enabled, enabled, disabled)
  • deterministic selection resolver
  • builtin catalog scaffolding
  • shared bootstrap resolver primitives

🔗 Stack Context

📌 Implementation Parity Matrix

  • Config schema exists in code: pkg/config/config.go, pkg/config/defaults.go
  • Deterministic selection resolver exists in code: pkg/plugin/manager.go
  • Bootstrap resolver exists in code: cmd/picoclaw/internal/pluginruntime/bootstrap.go

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Related: #473 (Plugin System Phase 2/3 rollout)

📚 Technical Context (Skip for Docs)

🧪 Test Environment

  • Hardware: GitHub Actions runner (x86_64)
  • OS: Ubuntu (CI)
  • Model/Provider: N/A
  • Channels: N/A

📸 Evidence (Optional)

Click to view Logs/Screenshots

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

@gh-xj gh-xj changed the title from "feat(plugin-system 1/4): phase2 core selection + resolver" to "feat(plugin): phase2 core selection + resolver (1/4)" Mar 1, 2026
@gh-xj
Collaborator Author

gh-xj commented Mar 1, 2026

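Plugin System Phase 2/3 Design Review (technical version, seeking design buy-in first)

1. Purpose of this review

This review answers one question:
without taking on the complexity of dynamic loading, does the current Phase 2/3 design hold up on three dimensions (correctness, operability, evolvability) well enough to proceed with the current PR stack?

PR stack (all Draft):

Suggested merge order: #936 -> #937 -> #938 -> #939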


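2. Design boundaries and assumptions

2.1 In Scope

  • Selection and assembly of compile-time builtin plugins
  • Config-driven enable/disable (default/enabled/disabled)
  • Startup-time observability summary
  • CLI status inspection and config lint

2.2 Out of Scope

  • Dynamic plugin loading (.so / RPC runtime loader)
  • Hot reload
  • Plugin signing and distribution
  • Sandbox / permission model

2.3 Key assumptions

  • Plugins currently come from the builtin catalog (no external sources)
  • The plugin lifecycle still depends on the existing hook registry
  • Entrypoint assembly happens at Run()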

3. Core architecture split

Phase 2: Selection Plane (control plane)

Goal: decide "which plugins to load" and keep the result predictable.

Key implementation locations:

  • Config model: pkg/config/config.go, pkg/config/defaults.go
  • Selection algorithm: pkg/plugin/manager.go
  • Builtin catalog: pkg/plugin/builtin/catalog.go
  • Startup assembly: cmd/picoclaw/internal/pluginruntime/bootstrap.go
  • Entrypoint wiring: cmd/picoclaw/internal/agent/helpers.go, cmd/picoclaw/internal/gateway/helpers.go

Phase 3: Introspection Plane (observability plane)

Goal: let operators see whether the current state matches expectations.

Key implementation locations:

  • CLI list: cmd/picoclaw/internal/plugin/list.go
  • CLI lint: cmd/picoclaw/internal/plugin/lint.go
  • Run summary: agent/gateway startup log fields

4. Data contracts (already implemented)

4.1 Config contract

{
  "plugins": {
    "default_enabled": true,
    "enabled": ["policy-demo"],
    "disabled": ["legacy-policy"]
  }
}

Field semantics:

  • default_enabled: when enabled is empty, whether plugins in the catalog are enabled by default
  • enabled: explicit enable set
  • disabled: explicit disable set; takes precedence over enabled
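
As a rough illustration of the config contract, the plugins block can be decoded into a small Go struct. The JSON keys come from the example above; the type names (PluginsConfig, Config) and the parseConfig helper are assumptions made for this sketch and may not match what pkg/config/config.go actually defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PluginsConfig mirrors the three keys of the "plugins" block.
// The Go type and field names are assumptions; only the JSON keys are from the PR.
type PluginsConfig struct {
	DefaultEnabled bool     `json:"default_enabled"`
	Enabled        []string `json:"enabled"`
	Disabled       []string `json:"disabled"`
}

// Config is a hypothetical top-level wrapper that holds only the plugins block.
type Config struct {
	Plugins PluginsConfig `json:"plugins"`
}

// parseConfig decodes a JSON document into Config.
func parseConfig(raw []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"plugins":{"default_enabled":true,"enabled":["policy-demo"],"disabled":["legacy-policy"]}}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg.Plugins)
}
```

Omitted keys decode to Go zero values (false and nil slices) in this sketch, so any non-zero default such as default_enabled = true would have to be applied before decoding, presumably in DefaultConfig().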

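4.2 CLI contract

  • picoclaw plugin list / --format json: currently promises only name + status
  • picoclaw plugin lint --config <path>: reports whether the config is valid (an unknown name in enabled fails)

4.3 Startup summary contract

Fields currently stable for external consumers:

  • plugins_enabled
  • plugins_disabled
  • plugins_unknown_enabled
  • plugins_unknown_disabled
  • plugins_warnings

Note: finer-grained fields such as plugins.mode and disabled_reason are not part of the implemented contract yet.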
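
To make the startup summary contract concrete, here is a minimal sketch that maps a resolver outcome onto the five stable keys. The Summary type and the logFields helper are hypothetical names for illustration; how the agent and gateway actually emit these fields is not shown in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Summary is a hypothetical carrier for the resolver outcome.
type Summary struct {
	Enabled, Disabled, UnknownEnabled, UnknownDisabled, Warnings []string
}

// logFields maps a Summary onto the five stable startup keys.
func logFields(s Summary) map[string]any {
	return map[string]any{
		"plugins_enabled":          s.Enabled,
		"plugins_disabled":         s.Disabled,
		"plugins_unknown_enabled":  s.UnknownEnabled,
		"plugins_unknown_disabled": s.UnknownDisabled,
		"plugins_warnings":         s.Warnings,
	}
}

func main() {
	fields := logFields(Summary{
		Enabled:  []string{"policy-demo"},
		Disabled: []string{"legacy-policy"},
	})
	// Print keys in sorted order so the output is stable across runs.
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%v\n", k, fields[k])
	}
}
```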


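5. Selection algorithm (deterministic resolution after normalization)

Inputs:

  • available (set of catalog names)
  • default_enabled
  • enabled[]
  • disabled[]

Preprocessing:

  • Names are trimmed and lowercased
  • Duplicates are removed and a warning is recorded

Resolution rules (in priority order):

  1. If a name is not in available:
    • it appears in enabled => error (fail fast)
    • it appears in disabled => warning (continue)
  2. For valid plugin names:
    • in disabled => disabled
    • otherwise, if enabled is non-empty => enabled only if listed in enabled
    • otherwise => default_enabled decides

Algorithm properties:

  • The decision function is pure (the same input always yields the same output)
  • Output ordering is stable (testable, comparable)
  • Conflicts are explainable (disabled overrides enabled)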
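
The rules above can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the code in pkg/plugin/manager.go: the Selection type, the resolve signature, and the helper names are all made up for the example, and the warning texts only approximate those quoted in the review.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Selection is a hypothetical result type used only in this sketch.
type Selection struct {
	Enabled, Disabled, Warnings []string
}

// normalize applies the trim + lower preprocessing step.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

// toSet normalizes names and drops duplicates, recording one warning per duplicate.
func toSet(kind string, names []string, warnings *[]string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, raw := range names {
		n := normalize(raw)
		if set[n] {
			*warnings = append(*warnings, fmt.Sprintf("duplicate %s plugin %q ignored", kind, n))
			continue
		}
		set[n] = true
	}
	return set
}

// sortedMissing returns, in ascending order, the keys of set that are absent
// from known. Passing a nil known map returns every key.
func sortedMissing(set, known map[string]bool) []string {
	var out []string
	for n := range set {
		if !known[n] {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

// resolve checks unknown names first, then applies disabled, the enabled
// allowlist, and finally default_enabled to every available plugin.
func resolve(available []string, defaultEnabled bool, enabled, disabled []string) (Selection, error) {
	var sel Selection
	avail := make(map[string]bool, len(available))
	for _, a := range available {
		avail[normalize(a)] = true
	}
	enabledSet := toSet("enabled", enabled, &sel.Warnings)
	disabledSet := toSet("disabled", disabled, &sel.Warnings)

	if unknown := sortedMissing(enabledSet, avail); len(unknown) > 0 {
		return Selection{}, fmt.Errorf("unknown enabled plugins: %s", strings.Join(unknown, ", "))
	}
	for _, n := range sortedMissing(disabledSet, avail) {
		sel.Warnings = append(sel.Warnings, fmt.Sprintf("unknown disabled plugin %q ignored", n))
	}

	// Iterate in sorted order so the output is stable across runs.
	for _, n := range sortedMissing(avail, nil) {
		on := defaultEnabled
		if len(enabledSet) > 0 {
			on = enabledSet[n]
		}
		if disabledSet[n] {
			on = false // disabled always wins over enabled
		}
		if on {
			sel.Enabled = append(sel.Enabled, n)
		} else {
			sel.Disabled = append(sel.Disabled, n)
		}
	}
	return sel, nil
}

func main() {
	sel, err := resolve(
		[]string{"policy-demo", "legacy-policy"}, true,
		[]string{" Policy-Demo "}, []string{"legacy-policy", "ghost"},
	)
	fmt.Println(sel.Enabled, sel.Disabled, sel.Warnings, err)
}
```

Because map iteration order in Go is randomized, the sketch sorts every list it derives from a map; that is what keeps the output comparable between runs and between the agent and gateway entrypoints.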

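6. Correctness and compatibility invariants

Invariants that must hold:

  1. When no plugins block is configured, behavior stays consistent with the baseline.
  2. An unknown plugin in enabled must abort startup, to prevent silent misconfiguration.
  3. The selection result is identical across the agent and gateway entrypoints.
  4. Plugin manager injection completes before Run(); no half-initialized state is allowed at runtime.
  5. The CLI and the startup summary describe the same real state, with no docs contract drift.

Recent hardening (stability fixes):

  • pkg/agent/loop.go adds a default-agent nil guard (avoids a panic)
  • Restores and tightens the classification of LLM timeout vs. context-window errors, reducing spurious compress-and-retry cycles
  • extractPeer gains a metadata -> msg.Peer compatibility fallback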

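7. Failure modes and rollback path

7.1 Failure modes

  • Config names an unknown plugin in enabled -> startup fails
  • Mistyped entry in the disabled list -> warning; the process runs with the remaining valid config
  • catalog/factory error -> startup fails and reports the specific plugin name

7.2 Rollback

  • Delete the plugins block (or empty enabled/disabled) to restore baseline behavior
  • The phase2 control plane stays in place and does not affect the core agent execution path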

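8. Observability and operations

Current observability surface:

  • Startup summary fields (enabled/disabled/unknown/warnings)
  • CLI plugin list status snapshot
  • CLI plugin lint pre-check

Not included in this phase (candidates for later):

  • Per-hook execution latency and results
  • Plugin-level error code taxonomy
  • Aggregated runtime health dashboard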

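9. Why dynamic loading is not part of this round

Dynamic loading would introduce the following unresolved problems:

  • ABI/toolchain coupling
  • Process isolation and permission boundaries
  • Supply-chain trust (signing, verification, revocation)
  • Crash recovery and crash-loop control

Conclusion: solidify "selection + observability" first, then pursue the runtime loader in a separate RFC (suggested approach: subprocess + RPC).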


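10. Decision points (explicit answers needed)

Please give a yes/no on each:

  1. Whether the two-plane split (Selection -> Introspection) holds.
  2. Whether unknown in enabled = error / unknown in disabled = warning fits the production safety policy.
  3. Whether it is acceptable for the Phase 3 CLI to promise only name/status at first.
  4. Whether deferring dynamic loading to a separate RFC is acceptable.

If all four points pass, we proceed with the existing PR stack; otherwise I will split out incremental revision PRs based on your decisions (without widening the current risk surface).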


11. ASCII architecture diagrams (review at a glance)

11.1 Control plane + observability plane

                    +-----------------------+
                    |   config.json         |
                    | plugins.{...}         |
                    +-----------+-----------+
                                |
                                v
                 +--------------+---------------+
                 | ResolveSelection()           |
                 | pkg/plugin/manager.go        |
                 +--------------+---------------+
                                |
                   enabled/disabled/unknown/warn
                                |
                                v
          +---------------------+----------------------+
          | pluginruntime.ResolveConfiguredPlugins()   |
          | cmd/picoclaw/internal/pluginruntime        |
          +---------------------+----------------------+
                                |
                   instantiate enabled plugins
                                |
          +---------------------+----------------------+
          |                                            |
          v                                            v
+---------+----------------+              +------------+---------------+
| agent entrypoint         |              | gateway entrypoint         |
| internal/agent/helpers.go|              | internal/gateway/helpers.go|
+---------+----------------+              +------------+---------------+
          |                                            |
          +----------------------+---------------------+
                                 |
                                 v
                       +---------+----------+
                       | AgentLoop + Hooks  |
                       | pkg/agent/loop.go  |
                       +---------+----------+
                                 |
                                 v
                       startup diagnostics keys:
                       - plugins_enabled
                       - plugins_disabled
                       - plugins_unknown_enabled
                       - plugins_unknown_disabled
                       - plugins_warnings

11.2 CLI operations path

picoclaw plugin list
        |
        v
LoadConfig -> ResolveConfiguredPlugins -> buildPluginStatuses
        |                                   |
        |                                   +--> name,status (text/json)
        v
 output to operator

picoclaw plugin lint --config <path>
        |
        v
LoadConfig -> ResolveConfiguredPlugins
        |
        +--> ok  -> exit 0
        +--> err -> non-zero + actionable error

11.3 Selection algorithm (decision tree)

for plugin in sorted(available):
  n = normalize(plugin)

  if n in disabled:
      state = DISABLED
      continue

  if enabled is not empty:
      if n in enabled:
          state = ENABLED
      else:
          state = DISABLED
      continue

  if default_enabled:
      state = ENABLED
  else:
      state = DISABLED

Validation:
- unknown in enabled  -> ERROR (fail fast)
- unknown in disabled -> WARNING
- duplicates          -> dedupe + WARNING
- overlap             -> disabled wins

@gh-xj gh-xj marked this pull request as ready for review March 1, 2026 09:39
Copilot AI review requested due to automatic review settings March 1, 2026 09:39
Contributor

Copilot AI left a comment

Pull request overview

Introduces Phase 2 plugin selection primitives and scaffolding: a config schema for enable/disable selection, a deterministic resolver, a builtin plugin catalog with a demo plugin, and bootstrap logic to resolve configured plugins into instances.

Changes:

  • Add plugins config schema (default_enabled, enabled, disabled) with defaults + JSON tests.
  • Implement plugin.ResolveSelection and a plugin.Manager for compile-time plugin registration + introspection.
  • Add builtin plugin catalog + bootstrap resolver to instantiate enabled builtin plugins (plus a demo policy plugin and tests).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/plugin/manager.go | Adds plugin contract, selection resolver, and manager registration/describe APIs. |
| pkg/plugin/manager_test.go | Unit tests for manager behavior and selection resolver edge cases. |
| pkg/plugin/demoplugin/policy_demo.go | Adds a demo “policy” plugin exercising tool/message/session hooks. |
| pkg/plugin/demoplugin/policy_demo_test.go | Tests demo plugin blocking/redaction/allowlist/timeout normalization. |
| pkg/plugin/builtin/catalog.go | Adds builtin plugin catalog + deterministic Names(). |
| pkg/plugin/builtin/catalog_test.go | Ensures catalog contains policy demo and Names() is deterministic/sorted. |
| pkg/hooks/hooks.go | Implements hook registry with priority ordering, modifying vs void hooks, concurrency + cloning. |
| pkg/hooks/hooks_test.go | Extensive tests for ordering, cancellation, concurrency, mutation isolation, budgets, panic recovery. |
| pkg/hooks/types.go | Defines hook event types consumed by the registry and plugins. |
| pkg/config/config.go | Adds PluginsConfig to top-level config schema. |
| pkg/config/defaults.go | Sets plugin defaults in DefaultConfig(). |
| pkg/config/config_test.go | Adds tests for plugin defaults and JSON unmarshalling. |
| cmd/picoclaw/internal/pluginruntime/bootstrap.go | Adds bootstrap helper to resolve config -> enabled builtin plugin instances + summary. |
| cmd/picoclaw/internal/pluginruntime/bootstrap_test.go | Tests bootstrap behavior for unknown enabled/disabled and deterministic ordering. |

Comment on lines +76 to +84
    enabledSet := make(map[string]struct{}, len(in.Enabled))
    for _, name := range in.Enabled {
        normalized := NormalizePluginName(name)
        if _, exists := enabledSet[normalized]; exists {
            result.Warnings = append(result.Warnings, fmt.Sprintf("duplicate enabled plugin %q ignored", normalized))
            continue
        }
        enabledSet[normalized] = struct{}{}
    }

Copilot AI Mar 1, 2026

ResolveSelection() adds normalized plugin names to enabledSet even when normalization yields an empty string (e.g., input of "" or whitespace). This can produce confusing warnings (duplicate enabled plugin ""), and will surface as an "unknown enabled plugins" error with an empty name. Skip empty normalized names (optionally emitting a warning) before de-dupe/validation.
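
One possible shape for the fix, as a sketch only: check for an empty normalized name before the duplicate check. The helper names normalizePluginName and buildNameSet are stand-ins invented for this example; NormalizePluginName's real body is not shown in the PR and is assumed here to trim and lowercase, and the warning for empty entries is an optional choice.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePluginName is a local stand-in for NormalizePluginName,
// assumed to trim whitespace and lowercase the name.
func normalizePluginName(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

// buildNameSet normalizes names, skips entries that normalize to "",
// and drops duplicates. Each skipped entry yields one warning.
func buildNameSet(kind string, names []string) (map[string]struct{}, []string) {
	set := make(map[string]struct{}, len(names))
	var warnings []string
	for _, name := range names {
		normalized := normalizePluginName(name)
		if normalized == "" {
			// Empty entries never reach de-dupe or the unknown-name check.
			warnings = append(warnings, fmt.Sprintf("empty %s plugin name ignored", kind))
			continue
		}
		if _, exists := set[normalized]; exists {
			warnings = append(warnings, fmt.Sprintf("duplicate %s plugin %q ignored", kind, normalized))
			continue
		}
		set[normalized] = struct{}{}
	}
	return set, warnings
}

func main() {
	set, warnings := buildNameSet("enabled", []string{"", "  ", "Policy-Demo", "policy-demo"})
	fmt.Println(len(set), len(warnings))
}
```

The same helper would cover the disabled list by passing "disabled" as kind, which keeps the two loops from drifting apart.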

Comment on lines +86 to +94
    disabledSet := make(map[string]struct{}, len(in.Disabled))
    for _, name := range in.Disabled {
        normalized := NormalizePluginName(name)
        if _, exists := disabledSet[normalized]; exists {
            result.Warnings = append(result.Warnings, fmt.Sprintf("duplicate disabled plugin %q ignored", normalized))
            continue
        }
        disabledSet[normalized] = struct{}{}
    }

Copilot AI Mar 1, 2026

ResolveSelection() has the same issue for the disabled list: empty/whitespace-only entries are normalized to "" and added to disabledSet. This leads to misleading "unknown disabled plugin "" ignored" warnings. Skip empty normalized names before de-dupe and unknown checks.

    c := *e
    c.Args = cloneMapStringAny(e.Args)
    if e.Result != nil {
        r := *e.Result

Copilot AI Mar 1, 2026

cloneVoidEvent()'s AfterToolCallEvent cloning shallow-copies tools.ToolResult. ToolResult contains a Media slice, so mutations to Result.Media inside a void hook can leak across handlers and/or back to the caller, violating the stated isolation guarantee. Deep-clone Result.Media when copying ToolResult.

Suggested change
-   r := *e.Result
+   r := *e.Result
+   // Deep-clone the Media slice inside Result (if present) to avoid aliasing.
+   rv := reflect.ValueOf(&r).Elem()
+   mediaField := rv.FieldByName("Media")
+   if mediaField.IsValid() && mediaField.Kind() == reflect.Slice && !mediaField.IsNil() {
+       mediaCopy := reflect.MakeSlice(mediaField.Type(), mediaField.Len(), mediaField.Len())
+       reflect.Copy(mediaCopy, mediaField)
+       mediaField.Set(mediaCopy)
+   }
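
If the field type is known at compile time, the same isolation can be had without reflection. The sketch below assumes a stand-in ToolResult type: only the existence of a Media slice comes from the review comment, while the element type ([]string), the Content field, and the cloneToolResult helper are assumptions for illustration.

```go
package main

import "fmt"

// ToolResult is a stand-in for tools.ToolResult. Only the Media slice is
// taken from the review comment; its element type is assumed to be string.
type ToolResult struct {
	Content string
	Media   []string
}

// cloneToolResult copies the struct and gives the copy its own Media backing
// array, so element writes on one value are not visible through the other.
func cloneToolResult(in *ToolResult) *ToolResult {
	if in == nil {
		return nil
	}
	out := *in
	if in.Media != nil {
		out.Media = make([]string, len(in.Media))
		copy(out.Media, in.Media)
	}
	return &out
}

func main() {
	orig := &ToolResult{Content: "ok", Media: []string{"a.png"}}
	dup := cloneToolResult(orig)
	dup.Media[0] = "changed.png" // does not touch orig.Media
	fmt.Println(orig.Media[0], dup.Media[0])
}
```

This copies one level only. If the Media elements were themselves pointers or contained slices, a handler could still reach shared state through them, and a deeper copy would be needed.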

Comment on lines +258 to +261
        // Truncation is intentional for timeout normalization.
        return int(n), true
    case float64:
        // Truncation is intentional for timeout normalization.

Copilot AI Mar 1, 2026

toInt() converts float32/float64 to int without any range checks. If a tool arg contains a large float value, the float->int conversion can overflow (Go spec: result is implementation-dependent when out of range), potentially producing a negative/small value that bypasses clampArgNumber(). Add max/min bounds checks for float32/float64 similar to the int64/uint64 cases.

Suggested change
-       // Truncation is intentional for timeout normalization.
-       return int(n), true
-   case float64:
-       // Truncation is intentional for timeout normalization.
+       // Truncation is intentional for timeout normalization, but enforce int range.
+       fn := float64(n)
+       if fn < float64(minInt64) || fn > float64(maxInt64) {
+           return 0, false
+       }
+       return int(fn), true
+   case float64:
+       // Truncation is intentional for timeout normalization, but enforce int range.
+       if n < float64(minInt64) || n > float64(maxInt64) {
+           return 0, false
+       }
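
One detail worth checking in a range guard like this: float64 cannot represent the largest int64 exactly, so converting it rounds up to 2^63, and a strict `>` comparison against that value would still admit 2^63 itself. The following is a minimal standalone sketch of a guard with an exclusive upper bound; floatToInt is a hypothetical helper, not the PR's toInt(), and it also rejects NaN and infinities.

```go
package main

import (
	"fmt"
	"math"
)

// floatToInt truncates f toward zero and reports false when f is NaN,
// infinite, or outside the range of int on the current platform.
func floatToInt(f float64) (int, bool) {
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return 0, false
	}
	t := math.Trunc(f)
	// float64(math.MinInt) is exact (a power of two). Its negation is the
	// first value above the int range, so the upper bound is exclusive.
	if t < float64(math.MinInt) || t >= -float64(math.MinInt) {
		return 0, false
	}
	return int(t), true
}

func main() {
	for _, f := range []float64{3.9, -3.9, 1e30, math.NaN()} {
		v, ok := floatToInt(f)
		fmt.Println(f, v, ok)
	}
}
```

A float32 input can be widened with float64(n) first, since every float32 value is exactly representable as a float64.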

@sipeed-bot sipeed-bot bot added the type: enhancement and domain: config labels Mar 3, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sipeed-bot

sipeed-bot bot commented Mar 25, 2026

@gh-xj Hi! This PR has had no activity for over 2 weeks, so I'm closing it for now to keep things organized. Feel free to reopen anytime if you'd like to continue.

@sipeed-bot sipeed-bot bot closed this Mar 25, 2026