feat(dream): add SQLite backend for dream version control #3015

Open
JiajunBernoulli wants to merge 30 commits into HKUDS:nightly from JiajunBernoulli:dream-by-sqllite

@JiajunBernoulli
Contributor

Replace git-based version control with SQLite to avoid conflicts with the user's own git repositories. The SQLite backend exposes the same API as GitStore, so switching backends requires no changes to calling code.

Changes

  • Add SQLiteStore class with git-compatible API
  • Add version_backend config option (default: sqlite)
  • Update MemoryStore to support both backends
  • Add comprehensive tests for SQLiteStore
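
To make the idea of a "git-compatible API" on top of SQLite concrete, here is a minimal sketch of a snapshot store. It is not the SQLiteStore from this PR: the class name `SketchSQLiteStore`, the table layout, and the method names `commit`, `log`, and `show` are assumptions chosen for illustration, and the real class may differ in all of them.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class CommitInfo:
    # Hypothetical fields; the real CommitInfo in the PR may differ.
    sha: str
    message: str
    timestamp: float


class SketchSQLiteStore:
    """Toy snapshot store: each commit records the full content of the given files."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS commits (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                message TEXT NOT NULL,
                timestamp REAL NOT NULL
            );
            CREATE TABLE IF NOT EXISTS snapshots (
                commit_id INTEGER NOT NULL REFERENCES commits(id),
                path TEXT NOT NULL,
                content TEXT NOT NULL,
                PRIMARY KEY (commit_id, path)
            );
            """
        )

    def commit(self, files: dict[str, str], message: str) -> str:
        """Store a snapshot of `files` and return its identifier as a string."""
        with self._conn:  # one transaction per commit
            cur = self._conn.execute(
                "INSERT INTO commits (message, timestamp) VALUES (?, ?)",
                (message, time.time()),
            )
            commit_id = cur.lastrowid
            self._conn.executemany(
                "INSERT INTO snapshots (commit_id, path, content) VALUES (?, ?, ?)",
                [(commit_id, path, content) for path, content in files.items()],
            )
        return str(commit_id)

    def log(self, max_entries: int = 20) -> list[CommitInfo]:
        """Return the most recent commits, newest first."""
        rows = self._conn.execute(
            "SELECT id, message, timestamp FROM commits ORDER BY id DESC LIMIT ?",
            (max_entries,),
        ).fetchall()
        return [CommitInfo(str(cid), msg, ts) for cid, msg, ts in rows]

    def show(self, sha: str) -> dict[str, str]:
        """Return the file contents recorded by one commit."""
        rows = self._conn.execute(
            "SELECT path, content FROM snapshots WHERE commit_id = ?", (int(sha),)
        ).fetchall()
        return dict(rows)
```

Because the database is a single file under the memory directory, a store like this never touches a `.git/` directory the user may already have in the workspace.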

Motivation

The current dream mechanism uses git for version control, which may conflict with the user's own git repository in the workspace. This PR introduces SQLite as the default backend, avoiding such conflicts while keeping the git backend available as an option.

Configuration

agents:
  defaults:
    dream:
      version_backend: sqlite  # default, can omit
      # version_backend: git   # use legacy git backend
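
As a rough illustration of how the option above might be read, the sketch below resolves `version_backend` from a parsed config mapping. The function name `resolve_version_backend` and the `SUPPORTED_BACKENDS` tuple are hypothetical; only the key path and the two values `sqlite` and `git` come from the configuration shown above.

```python
from typing import Any, Mapping

# The two values named in the configuration snippet.
SUPPORTED_BACKENDS = ("git", "sqlite")


def resolve_version_backend(config: Mapping[str, Any], default: str = "sqlite") -> str:
    """Return the configured backend name, or `default` when the key is omitted."""
    dream = config.get("agents", {}).get("defaults", {}).get("dream", {})
    backend = dream.get("version_backend", default)
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown version_backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    return backend
```

Passing the default as a parameter keeps the fallback in one place, which makes it easy to change if the default backend is revisited.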

Closes #2980

chengyongru and others added 27 commits April 5, 2026 22:09
…diately

Dream Phase 2 uses fail_on_tool_error=True, which terminates the entire
run on the first tool error (e.g. old_text not found in edit_file).
Normal agent runs default to False so the LLM can self-correct and retry.
Dream should behave the same way.
PyJWT and cryptography are optional msteams deps; they should not be
bundled into the generic dev install.  Tests now skip the entire file
when the deps are missing, following the dingtalk pattern.
* feat(dream): enhance memory cleanup with staleness detection

- Phase 1: add [FILE-REMOVE] directive and staleness patterns (14-day
  threshold, completed tasks, superseded info, resolved tracking)
- Phase 2: add explicit cleanup rules, file paths section, and deletion
  guidance to prevent LLM path confusion
- Inject current date and file sizes into Phase 1 context for age-aware
  analysis
- Add _dream_debug() helper for observability (dream-debug.log in workspace)
- Log Phase 1 analysis output and Phase 2 tool events for debugging

Tested with glm-5-turbo: MEMORY.md reduced from 149 to 108-129 lines
across two rounds, correctly identifying and removing weather data,
detailed incident info, completed research, and stale discussions.

* refactor(dream): replace _dream_debug file logger with loguru

Remove the custom _dream_debug() helper that wrote to dream-debug.log
and use the existing loguru logger instead. Phase 1 analysis is logged
at debug level, tool events at info level — consistent with the rest
of the codebase and no extra log file to manage.

* fix(dream): make stale scan independent of conversation history

Reframe Phase 1 from a single comparison task to two independent
tasks: history diff AND proactive stale scan. The LLM was skipping
stale content that wasn't referenced in conversation history (e.g.
old triage snapshots). Now explicitly requires scanning memory files
for staleness patterns on every run.

* fix(dream): correct old_text param name and truncate debug log

- Phase 2 prompt: old_string -> old_text to match EditFileTool interface
- Phase 1 debug log: truncate analysis to 500 chars to avoid oversized lines

* refactor(dream): streamline prompts by separating concerns

Phase 1 owns all staleness judgment logic; Phase 2 is pure execution
guidance. Remove duplicated cleanup rules from Phase 2 since Phase 1
already determines what to add/remove. Fix remaining old_string -> old_text.
Total prompt size reduced ~45% (870 -> 480 tokens).

* fix(dream): add FILE-REMOVE execution guidance to Phase 2 prompt

Phase 2 was only processing [FILE] additions and ignoring [FILE-REMOVE]
deletions after the cleanup rules were removed. Add explicit mapping:
[FILE] → add content, [FILE-REMOVE] → delete content.
ExecTool hardcoded bash, breaking exec on Windows. Now uses cmd.exe
via COMSPEC on Windows with a curated minimal env (PATH, SYSTEMROOT,
etc.) that excludes secrets. bwrap sandbox gracefully skips on Windows.
- test_exec_head_tail_truncation: use temp script file instead of
  python -c to avoid cmd.exe quote-parsing issues after PR HKUDS#2893
- test_grep_files_with_matches_supports_head_limit_and_offset: query
  full result set first to avoid mtime-dependent sort assumption
* feat(feishu): add done emoji support for reaction lifecycle

* feat(feishu): add done emoji support and update documentation
exec tool hints previously used val[:40] which cut paths mid-segment
(e.g. "D:\Documents\GitHub\nanobot.worktree…"). Now uses regex to
detect file paths in commands and abbreviates them properly, with
smart truncation at chain separators (&&, |, ;) as fallback.
Two improvements to Feishu streaming card experience:

1. Handle _resuming in send_delta: when a mid-turn _stream_end arrives
   with resuming=True (tool call between segments), flush current text
   to the card but keep the buffer alive so subsequent segments append
   to the same card instead of creating a new one.

2. Inline tool hints into streaming cards: when a tool hint arrives
   while a streaming card is active, append it to the card content
   (e.g. "🔧 web_fetch(...)") instead of sending a separate card.
   The hint is automatically stripped when the next delta arrives.

Made-with: Cursor
Three fixes for inline tool hints:

1. Consecutive tool hints now replace the previous one instead of
   stacking — the old suffix is stripped before appending the new one.

2. When _resuming flushes the buffer, any trailing tool hint suffix
   is removed so it doesn't persist into the next streaming segment.

3. When final _stream_end closes the card, tool hint suffix is
   cleaned from the text before the final card update.

Adds 3 regression tests covering all three scenarios.

Made-with: Cursor
Tool hints should be kept as permanent content in the streaming card
so users can see which tools were called (matching the standalone card
behavior). Previously, hints were stripped when new deltas arrived or
when the stream ended, causing tool call information to disappear.

Now:
- New delta: hint becomes permanent content, delta appends after it
- New tool hint: replaces the previous hint (unchanged)
- Resuming/stream_end: hint is preserved in the final text

Updated 3 tests to verify hint preservation semantics.

Made-with: Cursor
…splay

Two display fixes based on real-world Feishu testing:

1. tool_hints.py: format_tool_hints now deduplicates by comparing the
   fully formatted hint string instead of tool name alone. This fixes
   `ls /Desktop` and `ls /Downloads` being incorrectly merged as
   `ls /Desktop × 2`. Truly identical calls still fold correctly.
   (_group_consecutive and all abbreviation logic preserved unchanged.)

2. feishu.py: inline tool hints now display one tool per line with
   🔧 prefix, and use double-newline trailing to prevent Setext heading
   rendering when followed by markdown `---`.

Made-with: Cursor
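
The deduplication change in item 1 above can be sketched as follows. This is not the project's `format_tool_hints`; the function name `fold_hints` is hypothetical, and it assumes the hints have already been formatted into display strings.

```python
from itertools import groupby


def fold_hints(formatted_hints: list[str]) -> list[str]:
    """Fold runs of consecutive, fully identical hint strings into 'hint × N'."""
    folded = []
    for hint, group in groupby(formatted_hints):
        count = sum(1 for _ in group)
        folded.append(hint if count == 1 else f"{hint} × {count}")
    return folded
```

Comparing the whole formatted string means `ls /Desktop` and `ls /Downloads` stay separate, while two identical `ls /Desktop` calls in a row still fold into `ls /Desktop × 2`.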
…_delta for throttling

- Make tool_hint_prefix configurable in FeishuConfig (default: 🔧)
- Delegate tool hint card updates from send() to send_delta() so hints
  automatically benefit from _STREAM_EDIT_INTERVAL throttling
- Fix staticmethod calls to use self.__class__ instead of self
- Document all supported metadata keys in send_delta docstring
- Add test for empty/whitespace-only tool hint with active stream buffer
- Add explicit error logging for missing file_key and message_id
- Add logging for download failures
- Change audio extension from .opus to .ogg for better Whisper compatibility
- Feishu voice messages are opus in OGG container; .ogg is more widely recognized
Port Python implementation from a1ec7b1
(websocket channel module and channel tests; excludes webui debug app).
- Use hmac.compare_digest for timing-safe static token comparison
- Add issued token capacity limit (_MAX_ISSUED_TOKENS=10000) with 429 response
- Use atomic pop in _take_issued_token_if_valid to eliminate TOCTOU window
- Enforce TLSv1.2 minimum version for SSL connections
- Extract _safe_send helper for consistent ConnectionClosed handling
- Move connection registration after ready send to prevent out-of-order delivery
- Add HTTP-level allow_from check and client_id truncation in process_request
- Make stop() idempotent with graceful shutdown error handling
- Normalize path via validator instead of leaving raw value
- Default websocket_requires_token to True for secure-by-default behavior
- Add integration tests and ws_test_client helper
- Refactor tests to use shared _ch factory and bus fixture
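
Three of the hardening points in the list above (timing-safe comparison, a capacity limit on issued tokens, and an atomic pop) can be illustrated with the standard library. The class `IssuedTokens` and the function `static_token_matches` are hypothetical names for this sketch and are not taken from the channel code.

```python
import hmac
import time


def static_token_matches(expected: str, provided: str) -> bool:
    """Compare tokens without short-circuiting on the first differing byte."""
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


class IssuedTokens:
    """Single-use tokens with an expiry and a cap on how many may be outstanding."""

    def __init__(self, max_tokens: int = 10000) -> None:
        self._expiry: dict[str, float] = {}
        self._max_tokens = max_tokens

    def issue(self, token: str, ttl_s: float) -> bool:
        """Record a token; return False at capacity (a server might answer 429)."""
        if len(self._expiry) >= self._max_tokens:
            return False
        self._expiry[token] = time.monotonic() + ttl_s
        return True

    def take_if_valid(self, token: str) -> bool:
        """Consume a token in one step so it cannot be checked and then reused."""
        expiry = self._expiry.pop(token, None)  # lookup and removal in a single call
        return expiry is not None and time.monotonic() < expiry
```

Using `dict.pop` with a default replaces a separate "check, then delete" sequence, which is where a time-of-check to time-of-use gap would otherwise appear.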
Comprehensive guide covering wire protocol, configuration reference,
token issuance, security notes, and common deployment patterns.
QQ channel improvements (on top of nightly):
- Add top-level try/except in _on_message and send() for resilience
- Use defensive getattr() for attachment attributes (botpy version compat)
- Skip file_name for image uploads to avoid QQ rendering as file attachment
- Extract only file_info from upload response to avoid extra fields
- Handle protocol-relative URLs (//...) in attachment downloads

WeCom channel improvements:
- Add _upload_media_ws() for WebSocket 3-step media upload protocol
- Send media files (image/video/voice/file) via WeCom rich media API
- Support progress messages (plain reply) vs final response (streaming)
- Support proactive send when no frame available (cron push)
- Pass media_paths to message bus for downstream processing
- Use asyncio.to_thread for file I/O to avoid blocking event loop
- Add 200MB upload size limit with early rejection
- Fix file handle leak by using context manager
- Free raw bytes early after chunking to reduce memory pressure
- Add file attachments to media_paths (was text-only, inconsistent with image)
- Use robust _sanitize_filename() instead of os.path.basename() for path safety
- Remove re-raise in send() for consistency with QQ channel
- Fix truncated media_id logging for short IDs
- Use asyncio.to_thread for file I/O to avoid blocking event loop
- Add 200MB upload size limit with early rejection
- Fix file handle leak by using context manager
- Use memoryview for upload chunking to reduce peak memory
- Add inbound download size check to prevent OOM
- Use asyncio.to_thread for write_bytes in download path
- Extract inline media_type detection to _guess_wecom_media_type()
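
For the upload-related items above, a small sketch of size-limited chunking with `memoryview` may help. The function name `iter_upload_chunks` and the chunk size are assumptions; only the 200MB limit and the use of `memoryview` come from the commit message.

```python
from typing import Iterator

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # 200MB limit named in the commit message


def iter_upload_chunks(data: bytes, chunk_size: int) -> Iterator[memoryview]:
    """Reject oversized payloads up front, then return zero-copy slices of `data`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("payload exceeds upload size limit")
    view = memoryview(data)
    # Slicing a memoryview does not copy the underlying bytes.
    return (view[start:start + chunk_size] for start in range(0, len(view), chunk_size))
```

Slicing `bytes` directly would allocate a new object per chunk, so the view keeps peak memory close to the size of the original payload.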
Cover helpers (sanitize_filename, guess media type), outbound send
(exception handling, media-then-text order, fallback), inbound message
processing (attachments, dedup, empty content), _post_base64file
payload filtering, and WeCom upload/download flows.
…oken cost and latency (HKUDS#2982)

When a user is idle for longer than a configured TTL, nanobot **proactively** compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.
Replace git-based version control with SQLite to avoid conflicts with
user's own git repositories. The SQLite backend provides the same API
as GitStore, ensuring seamless migration.

- Add SQLiteStore class with git-compatible API
- Add version_backend config option (default: sqlite)
- Update MemoryStore to support both backends
- Add comprehensive tests for SQLiteStore

Closes HKUDS#2980
@chengyongru
Collaborator

Overall: The SQLite backend itself looks clean and well-tested. The API compatibility with GitStore is solid. However, the backward compatibility story needs work before this can safely merge to nightly.

1. Default value "sqlite" silently migrates all existing users [CRITICAL]

# schema.py — new field, defaults to sqlite
version_backend: str = "sqlite"

Every existing user who upgrades will:

  • Lose access to their existing dream git commit history (it stays in .git/ but the runtime stops reading it)
  • Start fresh with an empty SQLite DB at memory/.dream_history.db

This is a silent behavior change. The default should remain "git" so existing users keep their current behavior, and only new installs or explicit opt-ins get SQLite:

version_backend: str = "git"  # preserve existing behavior

2. helpers.py:init_workspace() still hardcodes GitStore [MAJOR]

nanobot/utils/helpers.py:473-474 creates a GitStore unconditionally during onboard / workspace init. After this PR, the onboard flow initializes git, but the runtime agent uses SQLite by default — you end up with both .git/ and memory/.dream_history.db in the same workspace. This should either:

  • Read from the same config to decide which backend to init, or
  • Be updated to initialize through MemoryStore instead

3. No migration path for existing git history [MAJOR]

Users with existing dream commit histories have no way to bring them into SQLite. A one-time migration would be straightforward — read all commits from GitStore via log(), replay snapshots into SQLiteStore. Even a simple nanobot migrate-dream CLI command or a startup detection (.git exists with dream commits but no .dream_history.db → offer migration) would help.
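
A replay-style migration along those lines might look like the sketch below. It assumes both stores expose `log()` (newest first), `show(sha)`, and `commit(files, message)`; those signatures, the `VersionStore` protocol, and the `InMemoryStore` stand-in are hypothetical and exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class CommitInfo:
    sha: str
    message: str


class VersionStore(Protocol):
    # Assumed interface; the real GitStore/SQLiteStore signatures may differ.
    def log(self) -> list[CommitInfo]: ...
    def show(self, sha: str) -> dict[str, str]: ...
    def commit(self, files: dict[str, str], message: str) -> str: ...


def migrate_history(source: VersionStore, target: VersionStore) -> int:
    """Replay every snapshot from `source` into `target`, oldest first."""
    commits = list(reversed(source.log()))  # log() is assumed to be newest first
    for info in commits:
        target.commit(source.show(info.sha), info.message)
    return len(commits)


class InMemoryStore:
    """Stand-in store used only to exercise migrate_history."""

    def __init__(self) -> None:
        self._commits: list[tuple[CommitInfo, dict[str, str]]] = []

    def commit(self, files: dict[str, str], message: str) -> str:
        sha = str(len(self._commits) + 1)
        self._commits.append((CommitInfo(sha, message), dict(files)))
        return sha

    def log(self) -> list[CommitInfo]:
        return [info for info, _ in reversed(self._commits)]

    def show(self, sha: str) -> dict[str, str]:
        return dict(self._commits[int(sha) - 1][1])
```

Replaying oldest first preserves commit order in the target, although identifiers and timestamps would be regenerated unless the target store accepts them explicitly.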

4. Duplicate CommitInfo definition [MINOR]

CommitInfo is defined identically in both gitstore.py:14 and sqlitestore.py:17. Consider extracting it to a shared location (e.g. nanobot/utils/version_store.py) to avoid drift.

5. Property name git returns SQLiteStore [NIT]

# memory.py:58
def git(self) -> GitStore | SQLiteStore:
    """Version store for dream history (GitStore or SQLiteStore)."""
    return self._version_store

The property is called git but may return a SQLiteStore. Consider renaming to version_store or adding an alias.
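
One way to rename without breaking callers is to keep the old name as a thin alias. The class below is a hypothetical stand-in for MemoryStore, shown only to illustrate the alias pattern.

```python
class MemoryStoreSketch:
    """Hypothetical stand-in for MemoryStore showing a backward-compatible alias."""

    def __init__(self, version_store: object) -> None:
        self._version_store = version_store

    @property
    def version_store(self) -> object:
        """Version store for dream history, whichever backend is configured."""
        return self._version_store

    @property
    def git(self) -> object:
        """Alias kept for existing callers; returns the same object as version_store."""
        return self.version_store
```

Existing code that reads `store.git` keeps working, while new code can use the backend-neutral name.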


Summary: The SQLite implementation itself looks good. I'd suggest flipping the default to "git", syncing helpers.py, and adding a migration path.

@chengyongru added the invalid ("This doesn't seem right") label Apr 11, 2026
- Change version_backend default from 'sqlite' to 'git' for backward compatibility
- Fix helpers.py to respect version_backend config instead of hardcoding GitStore
- Add migrate-dream CLI command for migrating git history to SQLite
- Extract CommitInfo to shared version_store.py to avoid duplicate definitions
- Add version_store property to MemoryStore (keep 'git' as backward compat alias)
- Add tests for default version_backend value
Labels

enhancement: New feature or request
invalid: This doesn't seem right

8 participants