Skip to content

feat: real-time bot logs and health monitoring dashboard#87

Merged
BillChirico merged 80 commits intomainfrom
feat/logs-and-health
Feb 25, 2026
Merged

feat: real-time bot logs and health monitoring dashboard#87
BillChirico merged 80 commits intomainfrom
feat/logs-and-health

Conversation

@BillChirico
Copy link
Collaborator

@BillChirico BillChirico commented Feb 25, 2026

Summary

Closes #35 — Real-time log streaming and health monitoring dashboard.

Backend

Component File Description
WebSocket transport src/transports/websocket.js Custom Winston transport — buffers + broadcasts logs to WS clients
Log stream server src/api/ws/logStream.js WS endpoint with HMAC ticket auth, level/module/search filtering, queryLogs() historical replay, graceful shutdown
Health endpoint src/api/routes/health.js Extended: uptime, memory, CPU, Discord stats, error counts (1h/24h), Node version. Lazy-loads queryLogs with try/catch fallback.
Restart tracker src/utils/restartTracker.js PostgreSQL-backed restart history with signal/reason tracking, self-healing table creation on 42P01

Frontend

Component File Description
Log viewer log-viewer.tsx Terminal-style display with auto-scroll, color-coded levels, expandable metadata, keyboard-accessible
Log filters log-filters.tsx Level, module, and search filtering (disabled until WS connected)
WS client log-ws.ts Reconnecting WS with HMAC ticket auth, exponential backoff, metadata normalization
Health cards health-cards.tsx 8-card grid: uptime, memory, Discord ping, guilds, errors (1h/24h), CPU utilization %, Node version
Restart history restart-history.tsx Last 20 restarts with color-coded reason badges
Health section health-section.tsx Client orchestrator with 60s auto-refresh and AbortController cleanup
Logs page logs/page.tsx Route at /dashboard/logs combining health + logs
API routes bot-health/route.ts, ws-ticket/route.ts Proxy endpoints for health data and WS ticket generation

Security

  • WebSocket auth uses short-lived HMAC tickets (nonce.expiry.hmac) — BOT_API_SECRET never leaves the server
  • WS ticket endpoint validates session before issuing tickets

Review Fixes (Round 2)

All 25 CodeRabbit review comments addressed:

  • Critical: handleAuth async/await fix, HMAC ticket auth replacing raw secret exposure
  • Major: CPU utilization % (not raw seconds), keyboard a11y, metadata flattening, ticket retry with backoff, waiter cleanup in tests
  • Minor: Structured logging metadata, 300ms ping boundary, dead code removal, substring matching fix, isBotHealth item validation, redundant wrapper removal, filter disable-until-connected
  • Backend hardening: Lazy imports, try/catch fallbacks, self-healing table creation, structured warn metadata

Tests

  • 54 new tests across 4 test files
  • Full suite: 1308 passing, 0 failing

Stats

  • +3,182 lines across 27 files | 50 commits (progressive workflow)
  • next build clean | tsc --noEmit clean | npm test clean

Deferred

…and restart data

- Add system info (platform, nodeVersion, cpuUsage) to authenticated response
- Add error counts (lastHour, lastDay) via queryLogs from logs table
- Add restart data with graceful fallback when restartTracker unavailable
- Route is now async to support parallel queryLogs calls
- Tests: 3 new test cases covering system, errors, and restart fallback
- All 1261 tests passing, lint clean on changed files

Closes #35
fix: scope gitignore logs/ pattern to root only
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds real-time WebSocket log streaming (server, transport, and client), restart tracking persisted to PostgreSQL, enhanced /health diagnostics, logger/server lifecycle integration for WS transport, Next.js API endpoints and UI for log viewing and health, and comprehensive tests.

Changes

Cohort / File(s) Summary
Repo config & deps
/.gitignore, package.json
Restrict logs/ ignore to repo root, preserve data/.gitkeep; add ws ^8.19.0 dependency.
WebSocket server & lifecycle
src/api/ws/logStream.js, src/api/server.js, src/index.js
New WS log stream implementation with auth, filters, heartbeat; startServer accepts { wsTransport }; start/stop integrate WS stream and lifecycle wiring in index.
Winston transport & logger API
src/transports/websocket.js, src/logger.js
New WebSocketTransport broadcasting filtered log entries; addWebSocketTransport/removeWebSocketTransport added; logger included in default export.
Restart tracking
src/utils/restartTracker.js
New module to record restarts, update uptime on shutdown, and query restart history (creates table if missing).
Health route
src/api/routes/health.js
Health route made async; exposes system metadata, error counts via optional log query, and restart info when authenticated; graceful fallbacks on missing modules/errors.
Server-side tests
tests/api/routes/health.test.js, tests/api/ws/logStream.test.js, tests/transports/websocket.test.js, tests/utils/restartTracker.test.js, tests/index.test.js
Add tests covering health extensions, WS auth/filter/lifecycle, WebSocketTransport behavior, restartTracker DB flows, and package.json version mocking.
Frontend API routes
web/src/app/api/bot-health/route.ts, web/src/app/api/log-stream/ws-ticket/route.ts
New Next.js proxy for upstream /health and new endpoint to mint short-lived HMAC WS tickets (auth + config validation).
Frontend log streaming & types
web/src/lib/log-ws.ts, web/src/components/dashboard/types.ts
Client hook useLogStream with ticket fetch, WS auth/reconnect/history/filter; runtime BotHealth and Log types with guard.
Frontend dashboard pages & components
web/src/app/dashboard/logs/page.tsx, web/src/components/dashboard/*
New Logs page and components: HealthSection, HealthCards, RestartHistory, LogFilters, LogViewer; wiring for filters, auto-refresh, and UI interactions.
Frontend nav
web/src/components/layout/sidebar.tsx
Added "Logs" navigation item linking to /dashboard/logs.
Utilities
web/src/lib/format-time.ts
Added formatUptime helper used by dashboard components.

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Out of Scope Changes check ❓ Inconclusive Most changes are in-scope with issue #35, but memory/CPU time-series graphs and alert configuration are deferred; minor ws dependency addition and .gitignore changes appear tangential. Clarify whether ws package and .gitignore updates are necessary infrastructure changes or should be separate commits.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: real-time bot logs and health monitoring dashboard' clearly and concisely summarizes the main changes—adding real-time log streaming and health monitoring features to the dashboard.
Description check ✅ Passed The PR description is comprehensive and directly related to the changeset, detailing backend components, frontend components, security measures, and test coverage.
Linked Issues check ✅ Passed The PR implements core objectives from #35: WebSocket endpoint for real-time log streaming, log viewer UI with color-coded levels and auto-scroll, log filters by level/module/search, health cards with uptime/memory/CPU/Discord stats/error counts, and restart history tracking.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/logs-and-health

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 25

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/api/server.js (1)

111-127: ⚠️ Potential issue | 🟠 Major

If setupLogStream throws synchronously the Promise never settles.

app.listen's callback is not wrapped in a try/catch by Node. An exception thrown by setupLogStream at line 117 would propagate as an unhandled exception at process level; resolve(server) at line 120 would never be reached and reject is only wired to the 'error' event (OS-level failures), so startServer's caller would hang.

🔧 Proposed fix
  return new Promise((resolve, reject) => {
    server = app.listen(port, () => {
      info('API server started', { port });

      // Attach WebSocket log stream if transport provided
      if (options.wsTransport) {
-       setupLogStream(server, options.wsTransport);
+       try {
+         setupLogStream(server, options.wsTransport);
+       } catch (err) {
+         error('Failed to start WebSocket log stream', { error: err.message });
+         // Non-fatal: HTTP server is already listening; continue without WS streaming
+       }
      }

      resolve(server);
    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/server.js` around lines 111 - 127, The app.listen callback may throw
if setupLogStream throws, leaving the Promise unresolved; update the app.listen
callback (in startServer) to wrap the setupLogStream call in a try/catch: call
setupLogStream(server, options.wsTransport) inside try, and on catch log the
error, set server = null, close the server if it started (or call server.close
if available), and call reject(err) so the Promise settles; only call
resolve(server) after the try completes successfully. Ensure the
server.once('error', ...) handler remains for OS-level errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 4: The /logs/ entry in .gitignore is root-scoped and thus already
excludes only the repository root logs directory, making the later unignore
patterns for web/src/app/dashboard/logs/ redundant; either remove the two
unignore lines "!web/src/app/dashboard/logs/" and
"!web/src/app/dashboard/logs/**" or add a clear comment above them explaining
why they must be kept (for example, if you intentionally want to allow that
specific nested logs dir while still ignoring root /logs/), and ensure the final
.gitignore reflects that intent consistently.

In `@src/api/routes/health.js`:
- Line 8: The route currently statically imports queryLogs which makes the
optional diagnostics feature crash authenticated health checks if the logs table
or DB pool is missing; change the code to either perform a dynamic import (like
restartTracker) and fall back gracefully when import fails, or null-guard the DB
pool before invoking queryLogs by checking the pool (e.g. dbPool != null) and
only calling queryLogs when present, returning a safe default/omitting logs
otherwise; update any references to queryLogs in the health handler to use the
dynamically imported module or the guarded call so absence of the logs
table/pool does not cause the health endpoint to fail.
- Around line 63-71: The health endpoint currently uses Promise.all to fetch
error counts via queryLogs which causes the whole handler to reject if one query
fails; change this to a fault-tolerant approach (wrap the queries in try/catch
or use Promise.allSettled) so that failures don't throw and instead you set safe
fallback values (e.g., null or 0) for body.errors.lastHour and
body.errors.lastDay; locate the Promise.all call that assigns
hourResult/dayResult and replace it with independent guarded calls to queryLogs
(or process Promise.allSettled results) and map each result to result.total or a
fallback before assigning body.errors.

In `@src/api/ws/logStream.js`:
- Line 313: The shutdown log at the info('WebSocket log stream server stopped')
call must include structured metadata for observability; update the call in
src/api/ws/logStream.js to pass a metadata object with at least service:
'ws-log-stream' (or similar), event: 'shutdown', processId (process.pid), uptime
(process.uptime()), and any server identifier/port (e.g. server.address() or
socket name) so downstream systems can correlate this event; ensure the metadata
object keys match existing logging conventions and include a timestamp if not
automatically added.
- Around line 142-145: Guard the async auth path by awaiting and catching
rejections from handleAuth(ws, msg) in the switch branch for msg.type === 'auth'
(wrap the call in try/catch or attach .catch) so malformed payloads don't cause
unhandled rejections; inside handleAuth ensure the secret is validated/coerced
to a string before calling isValidSecret (e.g., check typeof msg.secret ===
'string' or String(msg.secret') and guard Buffer.from(secret) with a try/catch
or validation) to prevent throws on unexpected types; finally, update the
logging call around the authentication flow (the unstructured log at the later
logging site) to use structured metadata (include fields like event, userId,
clientIp, and error) instead of a plain string to comply with src/ logging
guidelines.

In `@src/index.js`:
- Around line 262-268: The current shutdown block swallows all errors when
calling getPool() and updateUptimeOnShutdown(), making failures invisible;
change the try/catch to catch the error object and surface it (e.g.,
processLogger.error or console.error) with context mentioning
updateUptimeOnShutdown/getPool, and only suppress the error when you can confirm
the pool is intentionally uninitialized (e.g., getPool() returned null or no
DATABASE_URL); if it's a real runtime failure, log the full error and rethrow or
exit to ensure the shutdown problem is visible.

In `@src/transports/websocket.js`:
- Around line 82-85: filter.search currently assumes entry.message is a string
and calls .toLowerCase(), which can throw or silently fail for non-string
messages; coerce entry.message to a string (e.g. via String(...) or explicit
typeof check and .toString()) and normalize it before doing
.includes(searchLower) so null/objects/numbers won't break filtering; apply the
same fix in the other filtering block that uses filter.search (the block
referencing searchLower / entry.message later in the file).

In `@src/utils/restartTracker.js`:
- Line 72: The warning at warn('updateUptimeOnShutdown called before
recordRestart — skipping') should include structured metadata so logs are
searchable; modify the call in updateUptimeOnShutdown to pass an object with
contextual fields (e.g., { function: 'updateUptimeOnShutdown', action:
'recordRestartMissing', restartId, service: 'restartTracker' }) and any
available identifiers (like restartId or processId) so downstream aggregation
can filter and correlate with recordRestart and other events.
- Around line 96-108: getRestarts currently fails if the bot_restarts table is
missing; update getRestarts to detect that case and self-heal by creating the
table (or running a CREATE TABLE IF NOT EXISTS bot_restarts (...) with the
expected columns id, timestamp, reason, version, uptime_seconds) and then retry
the query—use the existing pool to run the CREATE TABLE IF NOT EXISTS before
re-querying, log creation attempts via logError/log info, and only return []
after a genuine non-recoverable error; reference getRestarts, bot_restarts, and
logError when implementing.

In `@TASK.md`:
- Line 3: The markdown lint failures MD022/MD058 are caused by missing blank
lines around headings and the requirements table; update TASK.md by ensuring
there is an empty line before and after each heading (e.g., the "## Parent"
heading) and by surrounding the requirements table with a blank line above and
below it (and apply the same fix to the other occurrences noted in the review).
Make the minimal edits to add single blank lines so headings and the table
comply with MD022/MD058 without changing content.

In `@tests/api/ws/logStream.test.js`:
- Around line 49-56: The timeout handler removes waiters by comparing the stored
resolve function reference, which differs from the resolve passed later, so
timed-out waiters remain and can steal future messages; fix by creating the
waiter object first (e.g., const waiter = { resolve: ..., timer: ... }), push
that waiter into waiters, and in the timeout callback remove the waiter object
by identity (findIndex(w => w === waiter) or filter) before rejecting; also
ensure clearTimeout(timer) is called when the waiter is resolved to avoid leaks
— update the logic around waiters, timer, resolve and the timeout callback in
this test.

In `@web/src/app/api/bot-health/route.ts`:
- Around line 68-69: The catch block currently calls
logger.error("[api/bot-health] Failed to proxy health data:", error) which
passes an Error object directly as meta; change it to extract and pass a plain
object (e.g. const errObj = error instanceof Error ? { error: error.message,
stack: error.stack } : { error: String(error) }) and call
logger.error("[api/bot-health] Failed to proxy health data:", errObj) so the
logger receives a serializable meta object; update the catch in the same scope
where logger.error is used.
- Around line 29-30: The error message in route.ts always claims both
BOT_API_URL and BOT_API_SECRET are missing; update the check around
botApiBaseUrl and botApiSecret to build and log a message that lists which
specific env var(s) are actually missing (e.g., inspect botApiBaseUrl and
botApiSecret, collect the names of those that are falsy, then call logger.error
with a message like "[api/bot-health] Missing config: <list>"); ensure you still
return/throw after logging so execution stops when required config is absent and
reference the existing symbols botApiBaseUrl, botApiSecret, and logger.error.

In `@web/src/app/api/log-stream/ws-ticket/route.ts`:
- Line 56: The endpoint currently returns the long-lived botApiSecret in
NextResponse.json({ wsUrl, secret: botApiSecret }) which exposes the static API
key to the browser; instead, generate a short-lived single-use ticket
server-side (e.g. crypto.randomUUID() or a short-TTL JWT signed with
botApiSecret), store it in an in-memory store or cache with a 30–60s TTL keyed
by the ticket, return NextResponse.json({ wsUrl, ticket }) (remove botApiSecret
from the response), and update the bot WS handshake (/ws/logs handler) to
validate and consume the ticket server-side before establishing the session so
the static secret is never sent to clients.

In `@web/src/app/dashboard/logs/page.tsx`:
- Around line 43-46: The LogFilters component is still enabled during the
"connecting" WebSocket state so filters set then can be dropped; update the
disabled prop to disable filters whenever status !== "connected" (i.e.,
disabled={status !== "connected"}) or implement queuing by storing the last
filter in state inside the component that owns handleFilterChange and calling
sendFilter when the socket emits "connected" (reference LogFilters,
handleFilterChange, sendFilter and the status variable) so filter changes made
during handshake are either blocked or reliably resent on connect.
- Around line 20-25: The handleFilterChange wrapper is unnecessary—replace uses
of handleFilterChange with the sendFilter function from useLogStream, remove the
handleFilterChange constant and its useCallback, and delete the now-unused
imports (useCallback and LogFilter) if they are no longer referenced; update any
event/prop that received handleFilterChange to receive sendFilter directly so
the stable sendFilter reference is used without an extra wrapper.

In `@web/src/components/dashboard/health-cards.tsx`:
- Around line 82-87: The displayed cpuUser and cpuSystem values are cumulative
CPU times (health.system.cpuUsage.user/system) converted to seconds, not
utilization; update the UI to either 1) clearly label these fields as "CPU Time"
(rename cpuUser/cpuSystem or their display labels in the HealthCard component)
so they aren't mistaken for percent utilization, or 2) compute true utilization
on the frontend by sampling health.system.cpuUsage at two timestamps and
calculating (delta_cpu_time / elapsed_time) * 100 to produce a percent
(implement sampling logic around the component that reads health.system.cpuUsage
and computes deltas into a new cpuUserPercent/cpuSystemPercent), and update the
display accordingly; adjust the same logic for the other occurrences flagged
(around cpuUser/cpuSystem and lines ~200-215).
- Around line 45-47: The ping-to-color logic in
web/src/components/dashboard/health-cards.tsx currently treats ping === 300 as
red; update the conditional that reads if (ping < 300) to be inclusive (if (ping
<= 300)) so a 300ms ping returns "text-yellow-500" (locate the snippet using the
variable name ping and the returned classes
"text-green-500"/"text-yellow-500"/"text-red-500").

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 77-80: In the finally block that currently does `if
(abortControllerRef.current === controller) { setLoading(false); }`, avoid
calling setLoading for background refreshes by only clearing loading when this
fetch actually set it: change the guard to check the backgroundRefresh flag (or
the same loading state) — e.g., `if (abortControllerRef.current === controller
&& !backgroundRefresh) setLoading(false);` — so `setLoading(false)` runs only
for non-background fetches; reference abortControllerRef, controller, setLoading
and backgroundRefresh to find and update the code.

In `@web/src/components/dashboard/log-viewer.tsx`:
- Around line 71-79: The metadata toggle currently uses a click-only <div> (the
element rendering className and onClick={hasMeta ? onToggle : undefined}), which
is not keyboard-accessible; update that element to be keyboard-accessible by:
adding role="button" and tabIndex={hasMeta ? 0 : -1}, wiring an onKeyDown
handler that calls onToggle when Enter or Space are pressed, keeping the
existing onClick behavior, and adding an appropriate aria-expanded (e.g.,
aria-expanded={isOpen} or !!hasMetaState) and aria-controls if applicable;
reference the container element that uses hasMeta and the onToggle handler to
locate where to apply these changes.

In `@web/src/components/dashboard/restart-history.tsx`:
- Around line 29-41: The fallback "|| \"< 1m\"" in formatUptime is
dead/unreachable; update the function to return the joined parts directly
instead of using the OR fallback. Locate the formatUptime function and change
the final return from parts.join(" ") || "< 1m" to just parts.join(" "), leaving
the existing early `if (seconds < 60) return \`${seconds}s\`` guard and the
logic that builds parts (d, h, m) intact.
- Around line 49-71: The reasonStyle function misclassifies "restart" because
normalized.includes("start") matches substrings; update reasonStyle to avoid
substring matches by replacing the includes("start") check with a word-boundary
check or explicit checks (e.g., test for whole-word "start" using a regex like
/(^|\s)start(\s|$)/ or first guard against "restart" by checking
normalized.includes("restart") and routing it to the appropriate bucket) so that
the startup (start) branch only triggers for standalone "start" and not
"restart".

In `@web/src/components/dashboard/types.ts`:
- Line 65: The isBotHealth validator currently only checks
Array.isArray(v.restarts) but must also validate each RestartRecord element;
update the check inside isBotHealth to ensure v.restarts.every(restart =>
restart && typeof restart === 'object' && typeof restart.timestamp === 'number'
&& typeof restart.uptimeBefore === 'number' && typeof restart.uptimeAfter ===
'number' && (restart.reason === undefined || typeof restart.reason ===
'string')), so downstream restart-history.tsx can safely access
restart.timestamp, restart.uptimeBefore, etc.; adjust the type predicate to
reject null/ malformed entries accordingly.

In `@web/src/lib/log-ws.ts`:
- Around line 64-70: The normalization currently leaves server-provided metadata
nested as meta.metadata; update the logic around the destructuring of r (the
variables { message: _m, timestamp: _t, level: _l, module: _mod, type: _type,
...rest }) and construction of meta so that if rest.metadata exists and is an
object its own properties are shallow-merged into meta (and removed from nested
metadata), and if metadata is empty it is omitted; in other words, build meta
from rest plus flattened rest.metadata (only when it contains keys) before
returning { id, timestamp, level, message, module, meta } so the UI sees
flattened fields instead of meta.metadata.
- Around line 115-119: The current code in the block that calls fetchTicket()
(inside the log WS connection setup) returns early when no ticket is returned or
unmountedRef.current is true, which prevents scheduling reconnects; change this
so that when fetchTicket() fails or returns falsy (and the component is still
mounted) you do not permanently return but instead schedule a reconnect/retry
(use the existing reconnect scheduling path or call the same function that
enqueues retries) and keep connectingRef.current updated; specifically modify
the fetchTicket() handling in the function that uses unmountedRef and
connectingRef so that on falsy ticket you trigger the reconnect/backoff logic
(or set a retry timer) rather than simply returning, while still
short-circuiting immediately if unmountedRef.current is true.

---

Outside diff comments:
In `@src/api/server.js`:
- Around line 111-127: The app.listen callback may throw if setupLogStream
throws, leaving the Promise unresolved; update the app.listen callback (in
startServer) to wrap the setupLogStream call in a try/catch: call
setupLogStream(server, options.wsTransport) inside try, and on catch log the
error, set server = null, close the server if it started (or call server.close
if available), and call reject(err) so the Promise settles; only call
resolve(server) after the try completes successfully. Ensure the
server.once('error', ...) handler remains for OS-level errors.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8a4579 and d00a5c3.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (26)
  • .gitignore
  • TASK.md
  • package.json
  • src/api/routes/health.js
  • src/api/server.js
  • src/api/ws/logStream.js
  • src/index.js
  • src/logger.js
  • src/transports/websocket.js
  • src/utils/restartTracker.js
  • tests/api/routes/health.test.js
  • tests/api/ws/logStream.test.js
  • tests/index.test.js
  • tests/transports/websocket.test.js
  • tests/utils/restartTracker.test.js
  • web/src/app/api/bot-health/route.ts
  • web/src/app/api/log-stream/ws-ticket/route.ts
  • web/src/app/dashboard/logs/page.tsx
  • web/src/components/dashboard/health-cards.tsx
  • web/src/components/dashboard/health-section.tsx
  • web/src/components/dashboard/log-filters.tsx
  • web/src/components/dashboard/log-viewer.tsx
  • web/src/components/dashboard/restart-history.tsx
  • web/src/components/dashboard/types.ts
  • web/src/components/layout/sidebar.tsx
  • web/src/lib/log-ws.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • tests/transports/websocket.test.js
  • src/api/server.js
  • tests/index.test.js
  • tests/utils/restartTracker.test.js
  • tests/api/routes/health.test.js
  • src/utils/restartTracker.js
  • tests/api/ws/logStream.test.js
  • src/transports/websocket.js
  • src/api/ws/logStream.js
  • src/logger.js
  • src/index.js
  • src/api/routes/health.js
tests/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Test files must achieve at least 80% code coverage on statements, branches, functions, and lines

Files:

  • tests/transports/websocket.test.js
  • tests/index.test.js
  • tests/utils/restartTracker.test.js
  • tests/api/routes/health.test.js
  • tests/api/ws/logStream.test.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/server.js
  • src/utils/restartTracker.js
  • src/transports/websocket.js
  • src/api/ws/logStream.js
  • src/logger.js
  • src/index.js
  • src/api/routes/health.js
🧠 Learnings (2)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : NEVER use `console.log`, `console.warn`, `console.error`, or any `console.*` method in src/ files

Applied to files:

  • .gitignore
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Always use Winston for logging — import `{ info, warn, error }` from `../logger.js`

Applied to files:

  • src/logger.js
🧬 Code graph analysis (15)
web/src/components/dashboard/log-filters.tsx (3)
web/src/lib/log-ws.ts (2)
  • LogFilter (20-24)
  • LogLevel (7-7)
src/utils/modAction.js (1)
  • opts (46-56)
web/src/components/ui/button.tsx (1)
  • Button (58-58)
tests/transports/websocket.test.js (1)
src/transports/websocket.js (1)
  • WebSocketTransport (25-159)
src/api/server.js (1)
src/api/ws/logStream.js (2)
  • setupLogStream (53-84)
  • stopLogStream (294-315)
web/src/lib/log-ws.ts (1)
src/api/ws/logStream.js (1)
  • logs (203-209)
web/src/components/dashboard/log-viewer.tsx (4)
web/src/lib/log-ws.ts (3)
  • LogLevel (7-7)
  • ConnectionStatus (26-26)
  • LogEntry (9-18)
web/src/lib/utils.ts (1)
  • cn (4-6)
src/api/ws/logStream.js (1)
  • logs (203-209)
web/src/components/ui/button.tsx (1)
  • Button (58-58)
web/src/app/api/bot-health/route.ts (2)
web/src/lib/bot-api.ts (1)
  • getBotApiBaseUrl (8-18)
src/logger.js (1)
  • error (237-239)
tests/utils/restartTracker.test.js (2)
src/utils/restartTracker.js (9)
  • recordRestart (43-61)
  • updateUptimeOnShutdown (70-87)
  • getRestarts (96-110)
  • getLastRestart (118-121)
  • getStartedAt (128-130)
  • _resetState (137-140)
  • result (49-52)
  • result (98-104)
  • rows (119-119)
src/logger.js (1)
  • warn (230-232)
tests/api/routes/health.test.js (1)
src/api/server.js (2)
  • app (28-28)
  • app (98-98)
web/src/components/dashboard/health-section.tsx (3)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • isBotHealth (35-68)
web/src/components/dashboard/health-cards.tsx (1)
  • HealthCards (67-236)
web/src/components/dashboard/restart-history.tsx (1)
  • RestartHistory (94-150)
src/utils/restartTracker.js (3)
src/index.js (1)
  • pool (264-264)
tests/utils/restartTracker.test.js (1)
  • recordRestart (11-11)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
tests/api/ws/logStream.test.js (2)
src/transports/websocket.js (1)
  • WebSocketTransport (25-159)
src/api/ws/logStream.js (3)
  • setupLogStream (53-84)
  • stopLogStream (294-315)
  • getAuthenticatedClientCount (323-325)
src/transports/websocket.js (1)
src/logger.js (1)
  • info (223-225)
src/api/ws/logStream.js (2)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
src/api/middleware/auth.js (1)
  • isValidSecret (16-25)
src/logger.js (1)
src/transports/websocket.js (1)
  • WebSocketTransport (25-159)
web/src/components/dashboard/health-cards.tsx (3)
web/src/components/dashboard/types.ts (1)
  • BotHealth (10-33)
web/src/components/ui/card.tsx (4)
  • Card (76-76)
  • CardHeader (77-77)
  • CardContent (81-81)
  • CardTitle (79-79)
web/src/components/ui/skeleton.tsx (1)
  • Skeleton (12-12)
🪛 markdownlint-cli2 (0.21.0)
TASK.md

[warning] 3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 12-12: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 39-39: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 48-48: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 58-58: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 64-64: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (8)
package.json (1)

30-31: LGTM!

ws is correctly added for WebSocket support, and explicitly listing winston-transport is appropriate when directly subclassing it for the custom WebSocketTransport.

tests/index.test.js (1)

240-247: LGTM!

The mockImplementation correctly short-circuits on package.json path reads before falling through to stateRaw, preventing the version-read from accidentally consuming or being blocked by a test-supplied stateRaw value.

web/src/components/layout/sidebar.tsx (1)

12-12: LGTM!

ScrollText is correctly imported, and the new Logs nav item follows the established pattern exactly. Active-state detection via pathname.startsWith('/dashboard/logs/') will work correctly for any nested routes under the logs page.

Also applies to: 43-47

web/src/components/dashboard/log-filters.tsx (1)

32-104: LGTM — filter logic, debounce, and cleanup are correctly implemented.

The onFilterChangeRef pattern correctly prevents stale-closure issues in emitFilter while keeping it stable with an empty dep array. The handleLevelChange cancelling the pending debounce before immediate emission correctly handles the race between text-field debounces and an instant level change. Cleanup on unmount is properly guarded.

src/logger.js (1)

291-315: LGTM — addWebSocketTransport / removeWebSocketTransport are cleanly implemented and consistent with the existing Postgres transport pair.

The format pipeline (redactSensitiveData → timestamp → json) matches the Postgres transport, ensuring secrets are scrubbed before reaching WebSocket clients. The synchronous close() (versus the async removePostgresTransport) is appropriate since WebSocketTransport.close() only clears an in-memory Set.

tests/transports/websocket.test.js (1)

1-219: Comprehensive and well-structured test suite — LGTM.

All public methods are exercised, including the no-client fast path, send errors, circular-reference metadata fallback, and per-client filter AND logic. Coverage should comfortably exceed the 80% threshold.

tests/api/routes/health.test.js (1)

15-17: No changes needed — the vi.mock approach used is valid and documented.

The vi.mock factory throwing an error is a standard and documented pattern in Vitest for simulating module load failures. Since restartTracker.js exists and is resolvable, throwing inside the factory correctly simulates a runtime import failure. The health route properly handles this with try-catch (lines 15-20), and the test correctly verifies the fallback behavior when the module is unavailable.

Likely an incorrect or invalid review comment.

tests/utils/restartTracker.test.js (1)

49-229: Strong behavioral coverage across happy-path and failure-path scenarios.

The suite exercises stateful restart tracking, DB query shape, and structured logger assertions in a maintainable way.

@BillChirico
Copy link
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/api/routes/health.js`:
- Around line 99-109: The health route sets body.restarts to an object on
error/absent getRestartData which breaks the frontend contract expecting an
array; update the code in the getRestartData branch so you coerce restartInfo to
an array and return an empty array on failure/when getRestartData is missing.
Specifically, in the try block use something like body.restarts =
Array.isArray(restartInfo) ? restartInfo : []; and in the catch and the else set
body.restarts = []; to guarantee the shape is always an array (referencing
getRestartData, restartInfo, and body.restarts).

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 70-71: After successfully updating health and last updated time in
the success branch (where setHealth and setLastUpdatedAt are called), clear the
stale error state by calling the component's error-state setter (e.g., setError
or setHealthError) with null/undefined so any existing error banner is removed;
add this call immediately after setHealth and setLastUpdatedAt in the same
success handler.

In `@web/src/components/dashboard/restart-history.tsx`:
- Around line 100-102: The UI is reversing health.restarts even though
getRestarts() in restartTracker.js already returns timestamp DESC; remove the
.reverse() on the restarts construction and instead apply a deterministic sort
on health.restarts by timestamp descending (newest first) before slicing to
MAX_RESTARTS so the restarts array (const restarts: RestartRecord[])
consistently contains the most recent entries; keep the existing slice(0,
MAX_RESTARTS) and reference health.restarts and MAX_RESTARTS when making the
change.

In `@web/src/components/dashboard/types.ts`:
- Around line 44-45: The type guard isBotHealth currently checks m.heapUsed and
m.heapTotal but fails to validate optional memory.rss, letting non-number rss
pass; update the guard in isBotHealth to also verify that if m.rss is defined it
is a number (e.g., add a check like typeof m.rss === "undefined" || typeof m.rss
=== "number") alongside the existing checks for m.heapUsed and m.heapTotal so
memory.rss is validated when present.

In `@web/src/lib/log-ws.ts`:
- Around line 105-106: The fetch to "/api/log-stream/ws-ticket" can be cached by
the browser; update the fetch call in web/src/lib/log-ws.ts (the call that
assigns to res) to explicitly prevent caching by adding cache: "no-store" (or
setting a 'Cache-Control: no-store' header) in the fetch options so each ticket
request is fresh and cannot be reused from browser cache.
- Around line 99-100: Race condition: when enabled toggles during an in-flight
fetchTicket(), an old async connect() can resume and create a stale WebSocket.
Fix by introducing an "attempt token" guarded by connectingRef: set
connectingRef.current to a unique token (e.g., a counter or Symbol) at the start
of connect(), store it locally inside the async flow, and before any post-await
side effects (especially before creating the WebSocket and after fetchTicket()),
verify the token still matches connectingRef.current; if not, abort that connect
attempt. Also ensure connect() early-returns if connectingRef indicates an
active attempt to avoid concurrent connects, and clear/reset connectingRef when
the connect attempt finishes or is aborted; keep using unmountedRef only for
unmount checks.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d4e697 and a1d8b6d.

📒 Files selected for processing (12)
  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • src/index.js
  • src/transports/websocket.js
  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
  • web/src/components/dashboard/health-cards.tsx
  • web/src/components/dashboard/health-section.tsx
  • web/src/components/dashboard/log-viewer.tsx
  • web/src/components/dashboard/restart-history.tsx
  • web/src/components/dashboard/types.ts
  • web/src/lib/log-ws.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • tests/api/ws/logStream.test.js
  • src/api/ws/logStream.js
  • src/api/routes/health.js
  • tests/utils/restartTracker.test.js
  • src/transports/websocket.js
  • src/index.js
tests/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Test files must achieve at least 80% code coverage on statements, branches, functions, and lines

Files:

  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/ws/logStream.js
  • src/api/routes/health.js
  • src/transports/websocket.js
  • src/index.js
🧠 Learnings (2)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Pass structured metadata to Winston logging calls (e.g. `info('Message processed', { userId, channelId })`)

Applied to files:

  • src/api/ws/logStream.js
  • src/transports/websocket.js
  • web/src/lib/log-ws.ts
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/commands/*.js : Use `parseDuration()` from `src/utils/duration.js` for duration-based commands (timeout, tempban, slowmode)

Applied to files:

  • web/src/components/dashboard/restart-history.tsx
🧬 Code graph analysis (8)
tests/api/ws/logStream.test.js (1)
src/api/ws/logStream.js (4)
  • nonce (171-171)
  • setupLogStream (53-84)
  • stopLogStream (327-349)
  • getAuthenticatedClientCount (357-359)
src/api/ws/logStream.js (3)
tests/api/ws/logStream.test.js (3)
  • nonce (15-15)
  • expiry (16-16)
  • hmac (17-17)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
src/api/routes/health.js (1)
  • queryLogs (76-76)
src/api/routes/health.js (1)
src/api/ws/logStream.js (1)
  • queryLogs (234-234)
tests/utils/restartTracker.test.js (1)
src/utils/restartTracker.js (10)
  • recordRestart (43-61)
  • updateUptimeOnShutdown (70-91)
  • getRestarts (100-131)
  • getLastRestart (139-142)
  • getStartedAt (149-151)
  • _resetState (158-161)
  • result (49-52)
  • result (102-108)
  • result (115-121)
  • rows (140-140)
web/src/components/dashboard/health-cards.tsx (3)
web/src/components/dashboard/types.ts (1)
  • BotHealth (10-33)
web/src/components/ui/card.tsx (4)
  • Card (76-76)
  • CardHeader (77-77)
  • CardContent (81-81)
  • CardTitle (79-79)
web/src/components/ui/skeleton.tsx (1)
  • Skeleton (12-12)
web/src/components/dashboard/log-viewer.tsx (3)
web/src/lib/log-ws.ts (3)
  • LogLevel (7-7)
  • ConnectionStatus (26-26)
  • LogEntry (9-18)
web/src/lib/utils.ts (1)
  • cn (4-6)
src/api/ws/logStream.js (1)
  • logs (236-242)
web/src/components/dashboard/health-section.tsx (3)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • isBotHealth (35-77)
web/src/components/dashboard/health-cards.tsx (1)
  • HealthCards (67-240)
web/src/components/dashboard/restart-history.tsx (1)
  • RestartHistory (99-155)
web/src/lib/log-ws.ts (1)
src/api/ws/logStream.js (1)
  • logs (236-242)
🔇 Additional comments (8)
src/api/routes/health.js (1)

10-26: Lazy queryLogs tombstone handling looks good.

The one-time failure tombstone avoids repeated import retries under persistent optional-feature absence.

src/api/ws/logStream.js (1)

108-122: Connection lifecycle and shutdown cleanup are solid.

Async message error handling, client cleanup, and server shutdown state reset are implemented defensively and cleanly.

Also applies to: 327-349

web/src/components/dashboard/health-cards.tsx (1)

82-91: Good null-safety and CPU metric clarification.

The error counters are now safely rendered when values are null, and CPU derivation is explicitly documented in the implementation.

Also applies to: 178-200

tests/utils/restartTracker.test.js (1)

200-262: Excellent branch coverage for self-heal behavior.

The added tests for 42P01 retry-success and retry-failure paths materially improve confidence in restart tracker resilience.

src/index.js (1)

421-433: Good orphan-transport cleanup on API startup failure.

The catch-path removal of wsTransport prevents lingering logger transports when startServer fails.

src/transports/websocket.js (1)

101-153: Transport broadcast/filter flow is robust.

The zero-client fast path, per-client filter checks, and serialization fallback make this transport resilient under normal failure modes.

tests/api/ws/logStream.test.js (1)

41-76: Strong WS contract coverage and race-resistant message queueing.

The waiter-by-identity queue handling and end-to-end lifecycle/auth/filter tests materially improve reliability and reduce flake risk.

Also applies to: 143-320

web/src/components/dashboard/log-viewer.tsx (1)

70-91: Accessibility and badge sizing fixes look good.

Keyboard toggling semantics and minimum badge width are both correctly implemented and improve usability.

Also applies to: 98-99

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (8)
web/src/lib/log-ws.ts (2)

117-119: Stale connect() race when enabled toggles mid-flight is still unguarded.

If enabled flips while fetchTicket() is in-flight, the old connect() call resumes and may create a WebSocket after a new call has started. connectingRef prevents concurrent starts but not a stale completion racing against a fresh one. An attempt-token pattern (e.g., incrementing counter checked after each await) would close this gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/log-ws.ts` around lines 117 - 119, The connect() function can
complete after enabled has toggled and start a stale WebSocket because
connectingRef only prevents concurrent starts; add an attempt-token (e.g.,
increment a local counter on each connect start stored on a ref like
connectAttemptRef) and check that token after each await (notably after
fetchTicket()) and before creating the socket to ensure the current attempt is
still valid and enabled; also verify unmountedRef and enabled after awaits and
bail out if the token changed so stale connect() runs cannot create a socket or
mutate state.

104-106: Ticket fetch still missing cache: "no-store".

Short-lived HMAC tickets must always be fetched fresh. Without an explicit cache directive, browsers may serve a cached (expired) ticket, causing auth failures on reconnect.

🔧 Proposed fix
-      const res = await fetch("/api/log-stream/ws-ticket");
+      const res = await fetch("/api/log-stream/ws-ticket", { cache: "no-store" });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/log-ws.ts` around lines 104 - 106, The fetch call for the
short-lived HMAC ticket in web/src/lib/log-ws.ts must disable caching; update
the fetch("/api/log-stream/ws-ticket") call to pass the request option cache:
"no-store" so browsers never return a stale ticket. Locate the fetch call within
the try block (the one assigning const res) and add the options object with
cache: "no-store" to the call, keeping the existing error/return behavior
intact.
web/src/components/dashboard/restart-history.tsx (1)

99-102: .reverse() likely inverts backend's DESC order, contradicting the "most recent first" description.

getRestarts() in src/utils/restartTracker.js returns rows ORDER BY timestamp DESC (newest first). Calling .reverse() flips this to ASC (oldest first), then .slice(0, MAX_RESTARTS) takes the 20 oldest entries. This contradicts the CardDescription at Line 108: "most recent first."

🔧 Proposed fix — use deterministic sort instead
   const restarts: RestartRecord[] = health
-    ? [...health.restarts].reverse().slice(0, MAX_RESTARTS)
+    ? [...health.restarts]
+        .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
+        .slice(0, MAX_RESTARTS)
     : [];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/restart-history.tsx` around lines 99 - 102,
RestartHistory is reversing an array that getRestarts() already orders DESC,
causing the UI to show oldest first; remove the .reverse() and instead
explicitly sort health.restarts deterministically by timestamp (descending)
before slicing to MAX_RESTARTS so the list is reliably "most recent first"
(refer to RestartHistory, the restarts variable, getRestarts, and MAX_RESTARTS;
ensure sort compares restart.timestamp values to produce descending order and
then slice(0, MAX_RESTARTS)).
web/src/components/dashboard/types.ts (1)

41-44: Optional memory.rss still unvalidated in the type guard.

The BotHealth interface declares rss?: number (Line 15), but the guard doesn't verify its type when present. A non-number rss value would pass validation.

🔧 Proposed fix
   const m = mem as Record<string, unknown>;
   if (typeof m.heapUsed !== "number" || typeof m.heapTotal !== "number") return false;
+  if (m.rss !== undefined && typeof m.rss !== "number") return false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/types.ts` around lines 41 - 44, The type guard
for BotHealth in types.ts currently checks memory.heapUsed and memory.heapTotal
but ignores the optional memory.rss, so a non-number rss would slip through;
update the guard around the mem/m variables (the block using const mem =
v.memory; const m = mem as Record<string, unknown>) to also verify that if m.rss
is present/defined then typeof m.rss === "number" (return false otherwise),
ensuring the optional rss field is validated when provided.
src/api/routes/health.js (1)

99-109: body.restarts fallback shape still mismatches the frontend contract.

The BotHealth interface in types.ts (Line 32) defines restarts: RestartRecord[], and isBotHealth validates with Array.isArray(v.restarts). The error/fallback paths on Lines 105 and 108 assign { total: 0, last: null } — an object, not an array — which will cause isBotHealth to reject the entire health payload on the frontend.

Additionally, verify that getRestartData is actually exported from restartTracker.js. The relevant snippets only show getRestarts, getLastRestart, recordRestart, etc. — no getRestartData export.

#!/bin/bash
# Check if getRestartData is exported from restartTracker.js
rg -n "export.*getRestartData" src/utils/restartTracker.js

# Check all exports from restartTracker
rg -n "^export " src/utils/restartTracker.js
🔧 Proposed fix for shape consistency
     if (getRestartData) {
       try {
         const restartInfo = await getRestartData();
-        body.restarts = restartInfo;
+        body.restarts = Array.isArray(restartInfo) ? restartInfo : [];
       } catch {
-        body.restarts = { total: 0, last: null };
+        body.restarts = [];
       }
     } else {
-      body.restarts = { total: 0, last: null };
+      body.restarts = [];
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/routes/health.js` around lines 99 - 109, The health route is
returning an object for body.restarts which violates the BotHealth contract
expecting RestartRecord[] and may also reference a non-exported function
getRestartData; update src/api/routes/health.js to set body.restarts to an array
fallback (e.g., empty array) on error or when no data, and call the actual
exported restart helper (replace getRestartData with the real export such as
getRestarts or a newly exported getRestartData from
src/utils/restartTracker.js); ensure you import the correct symbol and that the
success path assigns the restart array (restartInfo) directly to body.restarts
so isBotHealth(Array.isArray(...)) passes.
web/src/components/dashboard/health-section.tsx (1)

70-71: Stale error banner persists alongside fresh health data on a successful background refresh.

A background refresh skips setError(null) (by design, Lines 32–35), so if a prior foreground fetch left error set, a subsequent successful background refresh updates health and lastUpdatedAt but leaves the error banner visible. Users will see both fresh data and an outdated error simultaneously.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-section.tsx` around lines 70 - 71, When a
background refresh successfully updates health, also clear any prior error so
the stale error banner doesn't remain: in the background refresh path where
setHealth(payload) and setLastUpdatedAt(new Date()) are called, add a call to
setError(null) (the same state setter used for errors) so successful background
fetches clear existing errors; ensure this change only runs on successful
responses in the same handler that currently updates health and lastUpdatedAt.
src/transports/websocket.js (1)

108-109: ⚠️ Potential issue | 🟠 Major

Normalize message to a string before building payloads.

Line 121 currently uses message || '', which drops 0/false and can preserve non-serializable objects. If serialization fails at Line 129, the fallback at Line 135 reuses the same value and can throw again, skipping Line 152 (callback()).

🔧 Proposed fix
   log(info, callback) {
@@
-    const { level, message, timestamp } = info;
+    const { level, message, timestamp } = info;
+    const messageText = typeof message === 'string' ? message : String(message ?? '');
@@
     const entry = {
       type: 'log',
       level: level || 'info',
-      message: message || '',
+      message: messageText,
       metadata,
       timestamp: timestamp || new Date().toISOString(),
       module: metadata.module || null,
     };
@@
       payload = JSON.stringify({
         type: 'log',
         level: entry.level,
-        message: entry.message,
+        message: messageText,
         metadata: {},
         timestamp: entry.timestamp,
         module: null,
       });

Also applies to: 118-139, 152-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transports/websocket.js` around lines 108 - 109, Normalize the extracted
info.message to a safe string immediately after const { level, message,
timestamp } = info so falsy values like 0/false are preserved and
non-serializable objects are converted to a string; then use that
normalizedMessage when building payloads and when calling JSON.stringify inside
the transport send logic (ensure try/catch around JSON.stringify and, on
failure, replace with a safe fallback string like String(normalizedMessage) or a
generic placeholder), and always invoke callback() in both success and
fallback/error branches; reference the variables info, message, timestamp,
level, normalizedMessage, payload and callback when making these changes.
src/api/ws/logStream.js (1)

165-176: ⚠️ Potential issue | 🟠 Major

Harden ticket parsing so malformed auth payloads follow the explicit rejection path.

Line 168 calls .split() without enforcing a string ticket, so malformed payloads (e.g., object/array ticket values) can throw inside validateTicket instead of returning a normal auth failure. Also, Line 175 accepts non-numeric expiry strings because parseInt(...) can produce NaN and NaN <= Date.now() is false.

🔧 Proposed fix
 function validateTicket(ticket, secret) {
-  if (!ticket || !secret) return false;
+  if (typeof ticket !== 'string' || typeof secret !== 'string') return false;

   const parts = ticket.split('.');
   if (parts.length !== 3) return false;

   const [nonce, expiry, hmac] = parts;
   if (!nonce || !expiry || !hmac) return false;

   // Check expiry
-  if (parseInt(expiry, 10) <= Date.now()) return false;
+  if (!/^\d+$/.test(expiry)) return false;
+  if (Number(expiry) <= Date.now()) return false;
@@
-  if (!msg.ticket || !validateTicket(msg.ticket, process.env.BOT_API_SECRET)) {
+  if (typeof msg.ticket !== 'string' || !validateTicket(msg.ticket, process.env.BOT_API_SECRET)) {
     warn('WebSocket auth failed', { reason: 'invalid ticket' });
     ws.close(4003, 'Authentication failed');
     return;
   }

Also applies to: 201-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/ws/logStream.js` around lines 165 - 176, The validateTicket function
can throw for non-string tickets and incorrectly allow non-numeric expiry; first
ensure ticket is a string (and secret present) before calling ticket.split by
returning false if typeof ticket !== 'string', then split and check parts length
and that nonce, expiry, hmac are non-empty strings; parse expiry with Number or
parseInt and explicitly check for NaN (e.g., const expNum = Number(expiry); if
(!Number.isFinite(expNum) || expNum <= Date.now()) return false) so only valid
numeric future expiries pass; apply the same hardened checks to the duplicate
parsing logic referenced around lines 201-203.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/components/dashboard/health-cards.tsx`:
- Around line 27-38: formatUptime is duplicated with slightly different behavior
between health-cards.tsx and restart-history.tsx; extract a single shared helper
(e.g., formatUptime) into a common utility module (like utils/time.ts),
implement the agreed-upon canonical behavior (pick one: include seconds for <1m
like "Xs" or normalize to "0m" — update implementation accordingly), remove the
local formatUptime from health-cards.tsx and restart-history.tsx, and import the
shared function into both components (update references to the function name
formatUptime to ensure consistent formatting everywhere).

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 12-18: The formatLastUpdated function currently hardcodes the
"en-US" locale; update formatLastUpdated to let Intl.DateTimeFormat use the
runtime/browser locale by removing the "en-US" argument or passing undefined as
the first parameter when constructing new Intl.DateTimeFormat (keep the existing
options: hour, minute, second) so times render according to the user's locale.
- Around line 27-83: The foreground fetch can leave loading stuck true when
aborted by a background refresh; in fetchHealth capture whether this invocation
actually set loading (e.g., const didSetLoading = !backgroundRefresh && (() => {
setLoading(true); setError(null); return true })() or set didSetLoading =
!backgroundRefresh before modifying state) and then in the finally block always
call setLoading(false) if didSetLoading is true (regardless of
abortControllerRef.current === controller), while keeping the existing
abortControllerRef/controller logic and preserving backgroundRefresh behavior;
update references in fetchHealth (abortControllerRef, controller,
backgroundRefresh, setLoading) so the invocation that set loading always clears
it.
- Around line 43-46: Replace the direct window.location.href assignment used in
the 401 handling with Next.js router navigation: import and call the useRouter
hook (useRouter from next/navigation or next/router depending on App Router vs
Pages Router) inside the component that contains the response.status === 401
check, obtain router via const router = useRouter(), and call
router.replace("/login") instead of setting window.location.href so navigation
stays within Next.js routing and works in SSR/test environments.

In `@web/src/components/dashboard/log-viewer.tsx`:
- Around line 220-246: The log list currently renders up to MAX_LOGS entries
directly (in the component that maps logs to <LogRow> inside the containerRef
div), causing potential jank; replace the direct logs.map rendering with a
windowed/virtualized list (e.g., `@tanstack/react-virtual`) that renders only
visible rows while preserving behavior from handleScroll, bottomRef and
auto-scroll-to-bottom logic, passing expandedIds and toggleExpand into the
virtualized row renderer and ensuring variable row heights (or measuring
heights) so expanded metadata displays correctly. Update the container element
to be the virtual scroller's parent, use the library's scroll container for
handleScroll and bottomRef anchoring, and ensure keys use entry.id and the
virtualized list maintains focus on the bottom when the user is at the tail of
the stream.

---

Duplicate comments:
In `@src/api/routes/health.js`:
- Around line 99-109: The health route is returning an object for body.restarts
which violates the BotHealth contract expecting RestartRecord[] and may also
reference a non-exported function getRestartData; update
src/api/routes/health.js to set body.restarts to an array fallback (e.g., empty
array) on error or when no data, and call the actual exported restart helper
(replace getRestartData with the real export such as getRestarts or a newly
exported getRestartData from src/utils/restartTracker.js); ensure you import the
correct symbol and that the success path assigns the restart array (restartInfo)
directly to body.restarts so isBotHealth(Array.isArray(...)) passes.

In `@src/api/ws/logStream.js`:
- Around line 165-176: The validateTicket function can throw for non-string
tickets and incorrectly allow non-numeric expiry; first ensure ticket is a
string (and secret present) before calling ticket.split by returning false if
typeof ticket !== 'string', then split and check parts length and that nonce,
expiry, hmac are non-empty strings; parse expiry with Number or parseInt and
explicitly check for NaN (e.g., const expNum = Number(expiry); if
(!Number.isFinite(expNum) || expNum <= Date.now()) return false) so only valid
numeric future expiries pass; apply the same hardened checks to the duplicate
parsing logic referenced around lines 201-203.

In `@src/transports/websocket.js`:
- Around line 108-109: Normalize the extracted info.message to a safe string
immediately after const { level, message, timestamp } = info so falsy values
like 0/false are preserved and non-serializable objects are converted to a
string; then use that normalizedMessage when building payloads and when calling
JSON.stringify inside the transport send logic (ensure try/catch around
JSON.stringify and, on failure, replace with a safe fallback string like
String(normalizedMessage) or a generic placeholder), and always invoke
callback() in both success and fallback/error branches; reference the variables
info, message, timestamp, level, normalizedMessage, payload and callback when
making these changes.

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 70-71: When a background refresh successfully updates health, also
clear any prior error so the stale error banner doesn't remain: in the
background refresh path where setHealth(payload) and setLastUpdatedAt(new
Date()) are called, add a call to setError(null) (the same state setter used for
errors) so successful background fetches clear existing errors; ensure this
change only runs on successful responses in the same handler that currently
updates health and lastUpdatedAt.

In `@web/src/components/dashboard/restart-history.tsx`:
- Around line 99-102: RestartHistory is reversing an array that getRestarts()
already orders DESC, causing the UI to show oldest first; remove the .reverse()
and instead explicitly sort health.restarts deterministically by timestamp
(descending) before slicing to MAX_RESTARTS so the list is reliably "most recent
first" (refer to RestartHistory, the restarts variable, getRestarts, and
MAX_RESTARTS; ensure sort compares restart.timestamp values to produce
descending order and then slice(0, MAX_RESTARTS)).

In `@web/src/components/dashboard/types.ts`:
- Around line 41-44: The type guard for BotHealth in types.ts currently checks
memory.heapUsed and memory.heapTotal but ignores the optional memory.rss, so a
non-number rss would slip through; update the guard around the mem/m variables
(the block using const mem = v.memory; const m = mem as Record<string, unknown>)
to also verify that if m.rss is present/defined then typeof m.rss === "number"
(return false otherwise), ensuring the optional rss field is validated when
provided.

In `@web/src/lib/log-ws.ts`:
- Around line 117-119: The connect() function can complete after enabled has
toggled and start a stale WebSocket because connectingRef only prevents
concurrent starts; add an attempt-token (e.g., increment a local counter on each
connect start stored on a ref like connectAttemptRef) and check that token after
each await (notably after fetchTicket()) and before creating the socket to
ensure the current attempt is still valid and enabled; also verify unmountedRef
and enabled after awaits and bail out if the token changed so stale connect()
runs cannot create a socket or mutate state.
- Around line 104-106: The fetch call for the short-lived HMAC ticket in
web/src/lib/log-ws.ts must disable caching; update the
fetch("/api/log-stream/ws-ticket") call to pass the request option cache:
"no-store" so browsers never return a stale ticket. Locate the fetch call within
the try block (the one assigning const res) and add the options object with
cache: "no-store" to the call, keeping the existing error/return behavior
intact.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d4e697 and a1d8b6d.

📒 Files selected for processing (12)
  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • src/index.js
  • src/transports/websocket.js
  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
  • web/src/components/dashboard/health-cards.tsx
  • web/src/components/dashboard/health-section.tsx
  • web/src/components/dashboard/log-viewer.tsx
  • web/src/components/dashboard/restart-history.tsx
  • web/src/components/dashboard/types.ts
  • web/src/lib/log-ws.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • tests/api/ws/logStream.test.js
  • src/index.js
  • tests/utils/restartTracker.test.js
  • src/transports/websocket.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • src/index.js
  • src/transports/websocket.js
tests/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Test files must achieve at least 80% code coverage on statements, branches, functions, and lines

Files:

  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
🧠 Learnings (1)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Pass structured metadata to Winston logging calls (e.g. `info('Message processed', { userId, channelId })`)

Applied to files:

  • src/api/ws/logStream.js
  • web/src/lib/log-ws.ts
  • src/transports/websocket.js
🧬 Code graph analysis (10)
src/api/routes/health.js (2)
src/api/index.js (1)
  • router (12-12)
src/api/ws/logStream.js (1)
  • queryLogs (234-234)
web/src/components/dashboard/health-section.tsx (3)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • isBotHealth (35-77)
web/src/components/dashboard/health-cards.tsx (1)
  • HealthCards (67-240)
web/src/components/dashboard/restart-history.tsx (1)
  • RestartHistory (99-155)
web/src/components/dashboard/restart-history.tsx (1)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • RestartRecord (2-7)
src/api/ws/logStream.js (3)
tests/api/ws/logStream.test.js (3)
  • nonce (15-15)
  • expiry (16-16)
  • hmac (17-17)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
src/api/routes/health.js (1)
  • queryLogs (76-76)
tests/api/ws/logStream.test.js (2)
src/api/ws/logStream.js (4)
  • nonce (171-171)
  • setupLogStream (53-84)
  • stopLogStream (327-349)
  • getAuthenticatedClientCount (357-359)
src/transports/websocket.js (1)
  • WebSocketTransport (26-161)
src/index.js (4)
src/logger.js (4)
  • warn (230-232)
  • addWebSocketTransport (291-303)
  • removeWebSocketTransport (310-315)
  • error (237-239)
src/db.js (3)
  • pool (13-13)
  • getPool (290-295)
  • client (84-84)
src/utils/restartTracker.js (2)
  • updateUptimeOnShutdown (70-91)
  • recordRestart (43-61)
src/api/server.js (1)
  • startServer (92-128)
tests/utils/restartTracker.test.js (1)
src/utils/restartTracker.js (10)
  • recordRestart (43-61)
  • updateUptimeOnShutdown (70-91)
  • getRestarts (100-131)
  • getLastRestart (139-142)
  • getStartedAt (149-151)
  • _resetState (158-161)
  • result (49-52)
  • result (102-108)
  • result (115-121)
  • rows (140-140)
web/src/lib/log-ws.ts (5)
tests/commands/purge.test.js (1)
  • id (44-44)
src/utils/debugFooter.js (1)
  • r (285-285)
src/utils/registerCommands.js (1)
  • rest (36-36)
src/api/ws/logStream.js (1)
  • logs (236-242)
src/utils/safeSend.js (1)
  • normalized (53-53)
web/src/components/dashboard/health-cards.tsx (3)
web/src/components/dashboard/types.ts (1)
  • BotHealth (10-33)
web/src/components/ui/card.tsx (4)
  • Card (76-76)
  • CardHeader (77-77)
  • CardContent (81-81)
  • CardTitle (79-79)
web/src/components/ui/skeleton.tsx (1)
  • Skeleton (12-12)
src/transports/websocket.js (1)
src/logger.js (1)
  • info (223-225)
🔇 Additional comments (17)
src/api/routes/health.js (2)

10-26: Tombstone pattern for getQueryLogs looks good.

The queryLogsFailed sentinel avoids retrying a permanently-unavailable import on every request. This addresses the earlier review feedback cleanly.


76-97: Graceful error handling for queryLogs — LGTM.

The try/catch around Promise.all and the null-safe fallbacks ({ lastHour: null, lastDay: null }) correctly degrade the health endpoint when the logs table is unavailable.

web/src/components/dashboard/restart-history.tsx (2)

50-76: Reason classification ordering is correct now.

Checking "restart" before "startup"/"start" prevents the substring false-positive, and normalized.startsWith("start") avoids matching mid-word. This addresses the earlier feedback.


129-147: Null-safe rendering for version and uptimeBefore — LGTM.

Lines 141 and 144 correctly handle nullable fields with ?? "—" and != null guards, matching the updated RestartRecord type.

tests/utils/restartTracker.test.js (2)

200-262: Good coverage of the 42P01 self-heal path.

Both the successful retry (Lines 200–230) and the retry-failure (Lines 232–262) branches are now tested, addressing the earlier review gap. The mock structure cleanly simulates the three-call sequence (SELECT fail → CREATE TABLE → SELECT retry).


17-26: makePool fragment-matching is pragmatic but worth noting.

The SQL-fragment-based matching works well here since each operation uses distinct SQL keywords. If future queries share fragments (e.g., two different FROM bot_restarts queries in one test), the first match wins silently. This is fine for now.

src/index.js (3)

53-60: Guarded BOT_VERSION read with graceful fallback — LGTM.

The try/catch around readFileSync with a 'unknown' default prevents a crash at module evaluation time if package.json is missing (e.g., stripped Docker image). This addresses the earlier review feedback.


266-272: Shutdown uptime logging now surfaces errors instead of swallowing them — LGTM.

The warn call with structured metadata (error, module) replaces the empty catch block, making shutdown incidents diagnosable.


421-434: Orphaned transport cleanup on startServer failure — LGTM.

Block-scoped wsTransport and the removeWebSocketTransport call in the catch block ensure the transport is properly removed from Winston and closed if the server fails to start.

web/src/components/dashboard/types.ts (2)

66-74: Per-element RestartRecord validation in isBotHealth — LGTM.

Each element is checked for correct types, including null-safe checks for version and uptimeBefore. This addresses the earlier feedback comprehensively.


51-55: Null-safe error count validation — LGTM.

lastHour and lastDay correctly accept null | number, matching the backend's fallback behavior when queryLogs is unavailable.

web/src/components/dashboard/health-cards.tsx (1)

67-239: Health cards implementation looks solid.

All the previously-flagged issues are addressed:

  • Ping boundary is inclusive at 300ms (Line 46).
  • CPU now shows an estimated utilization percentage with raw seconds sub-text (Lines 82–91, 213–219).
  • Error counts are null-safe with ?.toLocaleString() ?? "—" fallback (Lines 180–182, 197–199).
  • errorColor guard checks != null before calling (Lines 180, 197).
web/src/lib/log-ws.ts (2)

66-77: Metadata flattening is correctly implemented — LGTM.

The rawMeta is destructured out, spread into flatMeta alongside remaining fields, and empty objects are elided. This addresses the earlier nested-metadata issue.


222-241: Lifecycle cleanup is comprehensive — LGTM.

The cleanup resets unmountedRef, connectingRef, clears timers, nullifies WS handlers before closing, and sets status to disconnected. This addresses the earlier cleanup feedback.

web/src/components/dashboard/log-viewer.tsx (2)

51-128: Keyboard accessibility properly implemented — LGTM.

role="button", tabIndex={0}, aria-expanded, and onKeyDown (Enter/Space) are correctly applied only when metadata is present. The level badge width (min-w-[3rem]) prevents clipping of 4-character labels.


144-177: Auto-scroll and pause logic is well-structured.

The three-layer approach — paused state, userScrolledRef for manual scroll detection, and the 50px threshold — provides intuitive behavior. Resuming resets the scroll ref and uses a short timeout to ensure the scroll target is in the DOM.

tests/api/ws/logStream.test.js (1)

41-76: Queue/waiter synchronization fix looks solid.

The waiter identity-based cleanup on timeout (Line 68) plus clearTimeout on resolve (Line 63) correctly prevents stale waiters from consuming future messages.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (12)
web/src/lib/log-ws.ts (2)

105-106: ⚠️ Potential issue | 🟠 Major

Force fresh ticket fetches with cache: 'no-store'.

Line 105 uses default browser caching semantics. For short-lived auth tickets, stale cached responses can break WS auth.

Proposed fix
-      const res = await fetch("/api/log-stream/ws-ticket");
+      const res = await fetch('/api/log-stream/ws-ticket', { cache: 'no-store' });
For browser Fetch API, can GET responses be reused from cache by default, and is `cache: 'no-store'` the correct way to force a fresh response for short-lived auth tickets?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/log-ws.ts` around lines 105 - 106, The fetch call that obtains
the short-lived WS ticket (the await fetch("/api/log-stream/ws-ticket") in
web/src/lib/log-ws.ts) can return cached responses; update the call to force a
fresh network request by passing the Fetch option cache: 'no-store' (i.e., call
fetch with { cache: 'no-store' }) so the ticket is never served from browser
cache and stale WS auth is avoided; keep the existing res.ok check and null
return logic intact (locate the fetch in the function handling WS ticket
retrieval).

117-134: ⚠️ Potential issue | 🟠 Major

Guard connect() with an attempt token to prevent stale async reconnect races.

connectingRef blocks parallel starts, but it does not invalidate an older in-flight connect() after lifecycle transitions. A stale attempt can resume post-await fetchTicket() and create an outdated socket.

Proposed fix
   const unmountedRef = useRef(false);
   const connectingRef = useRef(false);
+  const connectAttemptRef = useRef(0);

   const connect = useCallback(async () => {
     if (unmountedRef.current || connectingRef.current) return;
     connectingRef.current = true;
+    const attempt = ++connectAttemptRef.current;

     const ticket = await fetchTicket();
-    if (!ticket || unmountedRef.current) {
+    if (attempt !== connectAttemptRef.current || unmountedRef.current) {
+      connectingRef.current = false;
+      return;
+    }
+    if (!ticket) {
       connectingRef.current = false;
       // Ticket fetch failed — retry with backoff instead of giving up
       if (!unmountedRef.current) {
         setStatus("reconnecting");
@@
     ws.onclose = () => {
-      if (unmountedRef.current) return;
+      if (unmountedRef.current || attempt !== connectAttemptRef.current) return;
       connectingRef.current = false;
       setStatus("reconnecting");
@@
     return () => {
       unmountedRef.current = true;
       connectingRef.current = false;
+      connectAttemptRef.current += 1;
       if (reconnectTimerRef.current) clearTimeout(reconnectTimerRef.current);
#!/bin/bash
# Verify whether an attempt-token guard exists for in-flight connect() invalidation.
rg -n "connectAttemptRef|attempt !== connectAttemptRef\\.current|const attempt = \\+\\+connectAttemptRef" web/src/lib/log-ws.ts

Also applies to: 207-217, 230-233

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/log-ws.ts` around lines 117 - 134, Add an “attempt token” guard
to invalidate stale async connect() runs: create a mutable ref (e.g.,
connectAttemptRef = useRef(0)), increment it at the start of connect() and
capture its value in a local const attempt; after each await (notably after
fetchTicket()) and before any side-effect (setting connectingRef, creating
sockets, scheduling reconnects via reconnectTimerRef), check if attempt ===
connectAttemptRef.current and return early if not; also increment
connectAttemptRef.current on unmount/cleanup to invalidate in-flight attempts
and apply the same attempt-check pattern to other connect entry points
referenced (the other connect() call sites mentioned) so stale async resumptions
cannot create outdated sockets.
src/api/routes/health.js (1)

99-109: ⚠️ Potential issue | 🟠 Major

Keep body.restarts JSON shape consistent on all paths.

Line 103 can return whatever getRestartData() returns, while Line 105 and Line 108 force an object fallback. This makes the payload contract unstable for dashboard consumers.

Proposed fix
     if (getRestartData) {
       try {
         const restartInfo = await getRestartData();
-        body.restarts = restartInfo;
+        body.restarts = Array.isArray(restartInfo)
+          ? restartInfo
+          : Array.isArray(restartInfo?.restarts)
+            ? restartInfo.restarts
+            : [];
       } catch {
-        body.restarts = { total: 0, last: null };
+        body.restarts = [];
       }
     } else {
-      body.restarts = { total: 0, last: null };
+      body.restarts = [];
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/routes/health.js` around lines 99 - 109, The health route sets
body.restarts to whatever getRestartData() returns on success but to a fixed
object on failure/absent, causing an unstable JSON contract; update the logic in
the handler around getRestartData so that after calling getRestartData() you
normalize/reduce restartInfo into a consistent shape (e.g., { total, last })
before assigning to body.restarts, and in the catch/else branches assign the
same shaped fallback ({ total: 0, last: null}); reference getRestartData,
restartInfo and body.restarts when making this change.
src/api/ws/logStream.js (1)

201-202: ⚠️ Potential issue | 🟠 Major

Validate msg.ticket type before calling validateTicket().

Line 201 accepts any truthy ticket, but validateTicket() assumes a string and uses .split('.'). Non-string payloads can throw and take the slow error path instead of clean auth rejection.

Proposed fix
-  if (!msg.ticket || !validateTicket(msg.ticket, process.env.BOT_API_SECRET)) {
+  const secret = process.env.BOT_API_SECRET;
+  if (
+    typeof msg.ticket !== 'string' ||
+    typeof secret !== 'string' ||
+    !validateTicket(msg.ticket, secret)
+  ) {
     warn('WebSocket auth failed', { reason: 'invalid ticket' });
     ws.close(4003, 'Authentication failed');
     return;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/ws/logStream.js` around lines 201 - 202, The auth check uses
msg.ticket directly but validateTicket(msg.ticket, ...) assumes a string (calls
.split), so guard against non-string tickets: in the WebSocket auth path
surrounding the if that calls validateTicket, first verify typeof msg.ticket ===
'string' (or coerce safely) and treat non-strings as invalid (call
warn('WebSocket auth failed', { reason: 'invalid ticket type' }) and
return/close). Update the conditional that currently references msg.ticket and
validateTicket to perform the type check before invoking validateTicket to avoid
throws from non-string payloads.
src/transports/websocket.js (1)

110-116: 🧹 Nitpick | 🔵 Trivial

Exclude known Winston internal keys from broadcast metadata.

Line 113 currently forwards all extra keys, which can include logger internals like splat. Filter these before sending to WS clients.

Proposed fix
-    const metadata = {};
+    const EXCLUDED_KEYS = new Set(['level', 'message', 'timestamp', 'splat']);
+    const metadata = {};
     for (const key of Object.keys(info)) {
-      if (key !== 'level' && key !== 'message' && key !== 'timestamp') {
+      if (!EXCLUDED_KEYS.has(key)) {
         metadata[key] = info[key];
       }
     }
#!/bin/bash
# Verify current metadata exclusion logic and Winston internal key handling.
rg -n "Object\\.keys\\(info\\)|splat|EXCLUDED_KEYS|metadata\\[key\\]" src/transports/websocket.js src/logger.js
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/transports/websocket.js` around lines 110 - 116, The metadata extraction
currently copies all non-level/message/timestamp keys from the info object (in
the websocket transport) and may forward Winston internals like splat; update
the extraction in the code that builds metadata from info so it filters out
known Winston internal keys (e.g., "splat", "level", "message", "timestamp" and
any other logger internals) before broadcasting to WS clients—implement this by
introducing a small EXCLUDED_KEYS set/array and checking key membership when
iterating Object.keys(info) (or by destructuring known allowed keys), ensuring
the variable metadata only contains safe user-provided fields.
web/src/components/dashboard/health-section.tsx (4)

12-18: ⚠️ Potential issue | 🟡 Minor

formatLastUpdated hardcodes "en-US" locale.

Pass undefined to let Intl.DateTimeFormat pick up the user's browser locale automatically.

♻️ Proposed fix
-  return new Intl.DateTimeFormat("en-US", {
+  return new Intl.DateTimeFormat(undefined, {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-section.tsx` around lines 12 - 18, The
function formatLastUpdated currently hardcodes the "en-US" locale; change the
Intl.DateTimeFormat instantiation in formatLastUpdated to pass undefined (new
Intl.DateTimeFormat(undefined, {...})) so the runtime uses the user's browser
locale automatically while keeping the existing options (hour, minute, second).

70-71: ⚠️ Potential issue | 🟡 Minor

Stale error banner remains visible after a successful refresh.

When a previous foreground fetch failed and error is set, a subsequent successful fetch updates health but never calls setError(null), so the error alert stays on screen alongside fresh data.

🐛 Proposed fix
       setHealth(payload);
+      setError(null);
       setLastUpdatedAt(new Date());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-section.tsx` around lines 70 - 71, The
success path that calls setHealth(payload) and setLastUpdatedAt(new Date()) must
also clear any prior error so the stale error banner is removed; update the same
success handler (where setHealth and setLastUpdatedAt are called) to call
setError(null) after successfully updating health (reference setHealth,
setLastUpdatedAt, and setError) so a previous foreground fetch failure does not
leave the error alert visible.

43-46: 🧹 Nitpick | 🔵 Trivial

Use router.replace("/login") instead of window.location.href.

Hard-assigning window.location.href bypasses the Next.js App Router, triggers a full page load, and breaks in SSR/test environments.

♻️ Proposed fix
+import { useRouter } from "next/navigation";

 export function HealthSection() {
+  const router = useRouter();
   // ...
       if (response.status === 401) {
-        window.location.href = "/login";
+        router.replace("/login");
         return;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-section.tsx` around lines 43 - 46, The
code is doing a hard redirect with window.location.href inside the 401 handling
(the block checking response.status === 401); replace that with a Next.js App
Router navigation call: import and call useRouter from 'next/navigation', get
const router = useRouter() in the component (e.g., inside the HealthSection
component), and call router.replace("/login") instead of setting
window.location.href; ensure the component is a client component or the redirect
runs client-side so useRouter is available.

77-83: ⚠️ Potential issue | 🔴 Critical

loading can get permanently stuck true when a background auto-refresh aborts an in-flight foreground fetch.

Scenario:

  1. User clicks RefreshsetLoading(true), controller A installed.
  2. The 60 s timer fires while A is in-flight → fetchHealth(true) aborts A and installs controller B.
  3. Controller A's catch catches AbortError and executes return — but finally still runs.
  4. A's finally: abortControllerRef.current === controllerB !== AsetLoading(false) skipped.
  5. B's finally: !backgroundRefreshfalsesetLoading(false) skipped.

The spinner never stops and the Refresh button stays permanently disabled until a page reload.

The fix is to track whether this invocation set loading and always clear it, regardless of which controller is current:

🐛 Proposed fix
  const fetchHealth = useCallback(async (backgroundRefresh = false) => {
    abortControllerRef.current?.abort();
    const controller = new AbortController();
    abortControllerRef.current = controller;
+   const didSetLoading = !backgroundRefresh;

    if (!backgroundRefresh) {
      setLoading(true);
      setError(null);
    }

    try {
      // ... unchanged ...
    } catch (fetchError) {
      if (fetchError instanceof DOMException && fetchError.name === "AbortError") return;
      setError(
        fetchError instanceof Error ? fetchError.message : "Failed to fetch health data",
      );
    } finally {
-     if (abortControllerRef.current === controller && !backgroundRefresh) {
+     if (didSetLoading) {
        setLoading(false);
      }
    }
  }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-section.tsx` around lines 77 - 83, The
loading state can get stuck because the finally block only clears loading when
the current controller matches and backgroundRefresh is false; change the logic
in the function that installs controller (use the local symbols
abortControllerRef, controller, backgroundRefresh, setLoading) to record whether
this invocation actually set loading (e.g., compute a local boolean like
didSetLoading = !backgroundRefresh then call setLoading(true) only if true) and
in the finally always clear loading for this invocation by calling
setLoading(false) when didSetLoading is true (instead of checking
abortControllerRef.current === controller), so each invocation that started
foreground loading will always clear it even if another controller replaced it.
web/src/components/dashboard/types.ts (1)

43-44: ⚠️ Potential issue | 🟡 Minor

memory.rss is still not validated in isBotHealth.

A non-number rss (e.g., the key present but set to a string) still passes the guard.

🛡️ Proposed fix
   if (typeof m.heapUsed !== "number" || typeof m.heapTotal !== "number") return false;
+  if (m.rss !== undefined && typeof m.rss !== "number") return false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/types.ts` around lines 43 - 44, The health check
in isBotHealth currently only validates heapUsed and heapTotal on mem (cast to
m) but does not validate memory.rss, so non-number rss values slip through;
update the guard inside isBotHealth (where mem is cast to m and
heapUsed/heapTotal are checked) to also verify typeof m.rss === "number" (or if
rss lives under a nested memory object, check typeof m.memory?.rss === "number")
and return false if rss is not a number so that heapUsed, heapTotal and rss are
all validated.
web/src/components/dashboard/restart-history.tsx (1)

100-102: ⚠️ Potential issue | 🟠 Major

.reverse() displays oldest restarts first, contradicting "most recent first".

The backend getRestarts() already returns records in timestamp DESC order. Calling .reverse() flips them to ascending, showing the oldest entry at the top despite the CardDescription claiming "most recent first."

🐛 Proposed fix
   const restarts: RestartRecord[] = health
-    ? [...health.restarts].reverse().slice(0, MAX_RESTARTS)
+    ? [...health.restarts]
+        .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp))
+        .slice(0, MAX_RESTARTS)
     : [];

Using an explicit sort is also more robust if the payload order ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/restart-history.tsx` around lines 100 - 102, The
current expression that constructs restarts uses
[...health.restarts].reverse().slice(0, MAX_RESTARTS) which reverses the
backend's timestamp DESC order and shows oldest-first; fix by removing the
reverse and instead ensure a robust descending-sort by timestamp (e.g., sort by
b.timestamp - a.timestamp or comparable date parse) then slice(0, MAX_RESTARTS);
update the expression that builds restarts (referencing RestartRecord,
health.restarts, and MAX_RESTARTS) so the UI consistently shows most recent
first.
web/src/components/dashboard/health-cards.tsx (1)

27-38: 🧹 Nitpick | 🔵 Trivial

formatUptime is duplicated between health-cards.tsx and restart-history.tsx with subtly different behaviour.

health-cards.tsx emits "0m" for inputs under one minute; restart-history.tsx emits "Xs". Extract a single canonical helper into a shared utility (e.g., web/src/lib/utils/time.ts) and import it in both components.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/components/dashboard/health-cards.tsx` around lines 27 - 38, Extract
the duplicated formatUptime function into one shared helper (exported from a new
util module) and update both components to import it; create a single canonical
implementation named formatUptime(seconds: number): string that returns "Xd Xh
Xm" and switches to "Xs" for durations under one minute (so it matches the
restart-history behavior), remove the local formatUptime definitions from
health-cards.tsx and restart-history.tsx, and ensure both components import the
new helper and use the exported function name formatUptime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/src/components/dashboard/health-cards.tsx`:
- Around line 86-91: The computed cpuPct (derived from cpuTotalSec and
health.uptime) can exceed 100% on multi-core systems; clamp the numeric
percentage to the 0–100 range before converting to a string so the UI shows a
bounded 0.0–100.0 value. Specifically, change the flow that calculates cpuPct to
compute a numeric pct = (cpuTotalSec / health.uptime) * 100, then apply a clamp
like Math.min(Math.max(pct, 0), 100), and finally call toFixed(1) on the clamped
value; update any code that expects cpuPct as a string accordingly. Ensure this
change references cpuTotalSec, health.uptime, and cpuPct so it’s easy to locate
in the component.

In `@web/src/components/dashboard/restart-history.tsx`:
- Around line 14-27: The formatTimestamp function currently hardcodes the
"en-US" locale causing non-US users to see US date formatting; update
formatTimestamp to pass undefined (or omit the locale) to Intl.DateTimeFormat so
it respects the runtime/user locale (consistent with formatLastUpdated in
health-section.tsx); locate the formatTimestamp function and change the
Intl.DateTimeFormat call to use undefined instead of "en-US" while keeping the
same options and the try/catch behavior.

---

Duplicate comments:
In `@src/api/routes/health.js`:
- Around line 99-109: The health route sets body.restarts to whatever
getRestartData() returns on success but to a fixed object on failure/absent,
causing an unstable JSON contract; update the logic in the handler around
getRestartData so that after calling getRestartData() you normalize/reduce
restartInfo into a consistent shape (e.g., { total, last }) before assigning to
body.restarts, and in the catch/else branches assign the same shaped fallback ({
total: 0, last: null}); reference getRestartData, restartInfo and body.restarts
when making this change.

In `@src/api/ws/logStream.js`:
- Around line 201-202: The auth check uses msg.ticket directly but
validateTicket(msg.ticket, ...) assumes a string (calls .split), so guard
against non-string tickets: in the WebSocket auth path surrounding the if that
calls validateTicket, first verify typeof msg.ticket === 'string' (or coerce
safely) and treat non-strings as invalid (call warn('WebSocket auth failed', {
reason: 'invalid ticket type' }) and return/close). Update the conditional that
currently references msg.ticket and validateTicket to perform the type check
before invoking validateTicket to avoid throws from non-string payloads.

In `@src/transports/websocket.js`:
- Around line 110-116: The metadata extraction currently copies all
non-level/message/timestamp keys from the info object (in the websocket
transport) and may forward Winston internals like splat; update the extraction
in the code that builds metadata from info so it filters out known Winston
internal keys (e.g., "splat", "level", "message", "timestamp" and any other
logger internals) before broadcasting to WS clients—implement this by
introducing a small EXCLUDED_KEYS set/array and checking key membership when
iterating Object.keys(info) (or by destructuring known allowed keys), ensuring
the variable metadata only contains safe user-provided fields.

In `@web/src/components/dashboard/health-cards.tsx`:
- Around line 27-38: Extract the duplicated formatUptime function into one
shared helper (exported from a new util module) and update both components to
import it; create a single canonical implementation named formatUptime(seconds:
number): string that returns "Xd Xh Xm" and switches to "Xs" for durations under
one minute (so it matches the restart-history behavior), remove the local
formatUptime definitions from health-cards.tsx and restart-history.tsx, and
ensure both components import the new helper and use the exported function name
formatUptime.

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 12-18: The function formatLastUpdated currently hardcodes the
"en-US" locale; change the Intl.DateTimeFormat instantiation in
formatLastUpdated to pass undefined (new Intl.DateTimeFormat(undefined, {...}))
so the runtime uses the user's browser locale automatically while keeping the
existing options (hour, minute, second).
- Around line 70-71: The success path that calls setHealth(payload) and
setLastUpdatedAt(new Date()) must also clear any prior error so the stale error
banner is removed; update the same success handler (where setHealth and
setLastUpdatedAt are called) to call setError(null) after successfully updating
health (reference setHealth, setLastUpdatedAt, and setError) so a previous
foreground fetch failure does not leave the error alert visible.
- Around line 43-46: The code is doing a hard redirect with window.location.href
inside the 401 handling (the block checking response.status === 401); replace
that with a Next.js App Router navigation call: import and call useRouter from
'next/navigation', get const router = useRouter() in the component (e.g., inside
the HealthSection component), and call router.replace("/login") instead of
setting window.location.href; ensure the component is a client component or the
redirect runs client-side so useRouter is available.
- Around line 77-83: The loading state can get stuck because the finally block
only clears loading when the current controller matches and backgroundRefresh is
false; change the logic in the function that installs controller (use the local
symbols abortControllerRef, controller, backgroundRefresh, setLoading) to record
whether this invocation actually set loading (e.g., compute a local boolean like
didSetLoading = !backgroundRefresh then call setLoading(true) only if true) and
in the finally always clear loading for this invocation by calling
setLoading(false) when didSetLoading is true (instead of checking
abortControllerRef.current === controller), so each invocation that started
foreground loading will always clear it even if another controller replaced it.

In `@web/src/components/dashboard/restart-history.tsx`:
- Around line 100-102: The current expression that constructs restarts uses
[...health.restarts].reverse().slice(0, MAX_RESTARTS) which reverses the
backend's timestamp DESC order and shows oldest-first; fix by removing the
reverse and instead ensure a robust descending-sort by timestamp (e.g., sort by
b.timestamp - a.timestamp or comparable date parse) then slice(0, MAX_RESTARTS);
update the expression that builds restarts (referencing RestartRecord,
health.restarts, and MAX_RESTARTS) so the UI consistently shows most recent
first.

In `@web/src/components/dashboard/types.ts`:
- Around line 43-44: The health check in isBotHealth currently only validates
heapUsed and heapTotal on mem (cast to m) but does not validate memory.rss, so
non-number rss values slip through; update the guard inside isBotHealth (where
mem is cast to m and heapUsed/heapTotal are checked) to also verify typeof m.rss
=== "number" (or if rss lives under a nested memory object, check typeof
m.memory?.rss === "number") and return false if rss is not a number so that
heapUsed, heapTotal and rss are all validated.

In `@web/src/lib/log-ws.ts`:
- Around line 105-106: The fetch call that obtains the short-lived WS ticket
(the await fetch("/api/log-stream/ws-ticket") in web/src/lib/log-ws.ts) can
return cached responses; update the call to force a fresh network request by
passing the Fetch option cache: 'no-store' (i.e., call fetch with { cache:
'no-store' }) so the ticket is never served from browser cache and stale WS auth
is avoided; keep the existing res.ok check and null return logic intact (locate
the fetch in the function handling WS ticket retrieval).
- Around line 117-134: Add an “attempt token” guard to invalidate stale async
connect() runs: create a mutable ref (e.g., connectAttemptRef = useRef(0)),
increment it at the start of connect() and capture its value in a local const
attempt; after each await (notably after fetchTicket()) and before any
side-effect (setting connectingRef, creating sockets, scheduling reconnects via
reconnectTimerRef), check if attempt === connectAttemptRef.current and return
early if not; also increment connectAttemptRef.current on unmount/cleanup to
invalidate in-flight attempts and apply the same attempt-check pattern to other
connect entry points referenced (the other connect() call sites mentioned) so
stale async resumptions cannot create outdated sockets.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d4e697 and a1d8b6d.

📒 Files selected for processing (12)
  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • src/index.js
  • src/transports/websocket.js
  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
  • web/src/components/dashboard/health-cards.tsx
  • web/src/components/dashboard/health-section.tsx
  • web/src/components/dashboard/log-viewer.tsx
  • web/src/components/dashboard/restart-history.tsx
  • web/src/components/dashboard/types.ts
  • web/src/lib/log-ws.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • src/api/routes/health.js
  • src/transports/websocket.js
  • tests/api/ws/logStream.test.js
  • src/api/ws/logStream.js
  • tests/utils/restartTracker.test.js
  • src/index.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/routes/health.js
  • src/transports/websocket.js
  • src/api/ws/logStream.js
  • src/index.js
tests/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Test files must achieve at least 80% code coverage on statements, branches, functions, and lines

Files:

  • tests/api/ws/logStream.test.js
  • tests/utils/restartTracker.test.js
🧠 Learnings (2)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Pass structured metadata to Winston logging calls (e.g. `info('Message processed', { userId, channelId })`)

Applied to files:

  • src/transports/websocket.js
  • web/src/lib/log-ws.ts
  • src/api/ws/logStream.js
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/commands/*.js : Use `parseDuration()` from `src/utils/duration.js` for duration-based commands (timeout, tempban, slowmode)

Applied to files:

  • web/src/components/dashboard/health-cards.tsx
🧬 Code graph analysis (8)
src/api/routes/health.js (1)
src/api/ws/logStream.js (1)
  • queryLogs (234-234)
web/src/components/dashboard/restart-history.tsx (3)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • RestartRecord (2-7)
web/src/components/ui/skeleton.tsx (1)
  • Skeleton (12-12)
web/src/components/ui/card.tsx (5)
  • Card (76-76)
  • CardHeader (77-77)
  • CardTitle (79-79)
  • CardDescription (80-80)
  • CardContent (81-81)
web/src/lib/log-ws.ts (1)
src/api/ws/logStream.js (1)
  • logs (236-242)
web/src/components/dashboard/health-cards.tsx (3)
web/src/components/dashboard/types.ts (1)
  • BotHealth (10-33)
web/src/components/ui/card.tsx (4)
  • Card (76-76)
  • CardHeader (77-77)
  • CardContent (81-81)
  • CardTitle (79-79)
web/src/components/ui/skeleton.tsx (1)
  • Skeleton (12-12)
web/src/components/dashboard/health-section.tsx (3)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • isBotHealth (35-77)
web/src/components/dashboard/health-cards.tsx (1)
  • HealthCards (67-240)
web/src/components/dashboard/restart-history.tsx (1)
  • RestartHistory (99-155)
tests/api/ws/logStream.test.js (1)
src/api/ws/logStream.js (3)
  • setupLogStream (53-84)
  • stopLogStream (327-349)
  • getAuthenticatedClientCount (357-359)
src/api/ws/logStream.js (3)
tests/api/ws/logStream.test.js (3)
  • nonce (15-15)
  • expiry (16-16)
  • hmac (17-17)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
src/api/routes/health.js (1)
  • queryLogs (76-76)
src/index.js (4)
src/logger.js (4)
  • warn (230-232)
  • addWebSocketTransport (291-303)
  • removeWebSocketTransport (310-315)
  • error (237-239)
src/db.js (3)
  • pool (13-13)
  • getPool (290-295)
  • client (84-84)
src/utils/restartTracker.js (2)
  • updateUptimeOnShutdown (70-91)
  • recordRestart (43-61)
src/api/server.js (1)
  • startServer (92-128)
🔇 Additional comments (13)
src/api/routes/health.js (1)

10-26: Lazy queryLogs loading is robust.

The tombstone pattern here avoids repeated failed imports and keeps the health endpoint resilient.

src/transports/websocket.js (1)

82-89: Search filtering now safely handles non-string messages.

Good defensive coercion before case-insensitive matching.

tests/utils/restartTracker.test.js (1)

200-262: Excellent coverage for the 42P01 self-heal branches.

Both retry-success and retry-failure paths are now explicitly tested, including logging assertions.

src/index.js (2)

266-272: Shutdown uptime path is now observable and safe.

Good change to log failures instead of silently swallowing shutdown-path errors.


421-433: Startup failure cleanup for WS transport is correctly handled.

Nice fix preventing orphaned transport instances when startServer fails.

src/api/ws/logStream.js (1)

327-348: stopLogStream() cleanup state reset looks good.

Resetting wsTransport and counters after closing the server keeps module state clean for restarts/tests.

tests/api/ws/logStream.test.js (1)

41-76: Message-queue waiter cleanup is correctly fixed.

Using waiter identity in timeout cleanup closes the stale-waiter hole and makes these WS tests more deterministic.

web/src/components/dashboard/log-viewer.tsx (2)

70-90: Keyboard accessibility for metadata expansion is well implemented.

The role, tabIndex, aria-expanded, and Enter/Space handling are all in place.


154-154: No action required. The "instant" value is a valid and standardized ScrollBehavior option per the W3C CSSOM View Module specification and MDN documentation. The three valid values are "auto", "smooth", and "instant", so the code at line 154 is correct as written.

Likely an incorrect or invalid review comment.

web/src/components/dashboard/types.ts (1)

35-77: LGTM — per-item RestartRecord validation and null-aware errors guards are now in place.

The outstanding items from previous rounds (per-element loop on v.restarts, null-safe lastHour/lastDay, nullable version/uptimeBefore) are all addressed correctly.

web/src/components/dashboard/restart-history.tsx (1)

29-42: LGTM — formatUptime dead-code fallback removed and reasonStyle ordering fixed.

web/src/components/dashboard/health-cards.tsx (1)

44-48: LGTM — ping colour boundary and errors null guards are now correctly handled.

Also applies to: 178-202

web/src/components/dashboard/health-section.tsx (1)

32-35: LGTM — setError(null) is now correctly guarded behind !backgroundRefresh, and setLoading(false) is likewise gated, eliminating the previous background-refresh flicker.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/api/server.js`:
- Around line 115-122: The call to setupLogStream inside the app.listen callback
can throw synchronously and prevent resolve(server) from running, leaving the
startServer Promise pending; wrap the setupLogStream(server,
options.wsTransport) invocation in a try/catch inside the listen callback (or
otherwise call it before resolving) and on catch call reject(err) (or emit the
error) so the Promise settles instead of hanging; ensure resolve(server) only
runs after successful setup and reference the setupLogStream call and the
startServer/app.listen callback when applying the change.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1d8b6d and 56e707c.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • src/api/server.js
  • src/index.js
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • src/api/server.js
  • src/index.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/server.js
  • src/index.js
🧠 Learnings (1)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Always use Winston for logging — import `{ info, warn, error }` from `../logger.js`

Applied to files:

  • src/index.js
🧬 Code graph analysis (1)
src/index.js (4)
src/logger.js (4)
  • warn (230-232)
  • addWebSocketTransport (291-303)
  • removeWebSocketTransport (310-315)
  • error (237-239)
src/db.js (3)
  • pool (13-13)
  • getPool (290-295)
  • client (84-84)
src/utils/restartTracker.js (2)
  • updateUptimeOnShutdown (70-91)
  • recordRestart (43-61)
src/api/server.js (1)
  • startServer (93-129)
🔇 Additional comments (6)
src/api/server.js (1)

12-12: LGTM!

src/index.js (5)

25-26: LGTM!


57-64: LGTM! Addresses the previously flagged unguarded readFileSync at module load.


324-326: LGTM!


270-276: LGTM! Addresses the previously flagged silent error suppression in the uptime-write path.


425-438: LGTM! Addresses the previously flagged orphaned wsTransport on startServer failure — cleanup is now correct and the block scope properly limits wsTransport's lifetime.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (1)
web/src/lib/log-ws.ts (1)

187-194: ⚠️ Potential issue | 🟠 Major

history replaces all logs — coordinate with server-side fix to avoid dropped entries.

setLogs(normalized) at line 193 replaces the entire log list. If the server registers the client for real-time broadcasting before sending history (as noted in the logStream.js review), any real-time entries received between auth_ok and history will be silently dropped. The server-side fix (registering with transport after sending history) is the correct place to address this; no client-side change needed if that's applied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/log-ws.ts` around lines 187 - 194, The "history" case currently
replaces the entire log list via setLogs(normalized), which can drop real-time
entries; coordinate with the server-side fix so the server registers the client
for real-time broadcasting only after sending history (do that in the transport
registration flow on the server). If the server-side fix cannot be applied
immediately, change the client handling in the "history" case to merge
normalized with existing logs (use setLogs(prev => merge-and-dedupe(prev,
normalized) by entry id), enforce MAX_LOGS, and reuse normalizeEntry, makeId,
and LogEntry for consistency).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/api/ws/logStream.js`:
- Around line 53-84: setupLogStream can be called multiple times and will leak
previous WebSocketServer and heartbeatTimer instances; add a guard at the start
of setupLogStream that checks existing globals (wss, heartbeatTimer) and either
calls stopLogStream() to clean them up before reinitializing or returns early
(whichever fits the intended semantics). Specifically, in setupLogStream
reference wss, heartbeatTimer, wsTransport and call the existing stopLogStream()
helper (or perform equivalent cleanup: close/terminate wss,
clearInterval(heartbeatTimer), null out wss/heartbeatTimer/wsTransport, and
cleanupClient for clients) before creating a new WebSocketServer, so repeated
invocations do not leak resources or start duplicate timers.
- Around line 224-249: Move the transport registration so the client is added
only after history is sent: call queryLogs(HISTORY_LIMIT), normalize and send
the history via sendJson(ws, {type: 'history', logs}) (or send empty logs on
error and log via logError), and only then call wsTransport.addClient(ws);
ensure this reordering applies in both success and catch paths and update any
related logging (info('WebSocket client authenticated', { totalClients:
authenticatedCount })) to reflect post-registration if needed; keep sendJson(ws,
{ type: 'auth_ok' }) where appropriate.

In `@src/transports/websocket.js`:
- Around line 112-118: The EXCLUDED_KEYS Set is currently created inside the
log() function on every call; move its declaration to module scope as a
top-level const (e.g., const EXCLUDED_KEYS = new
Set(['level','message','timestamp','splat'])) so log() reuses the same Set
instead of allocating a new one each time; update/remove the in-function
declaration and keep the metadata construction logic that iterates
Object.keys(info) unchanged so behavior is identical while avoiding repeated
allocations.

In `@tests/api/routes/health.test.js`:
- Around line 101-117: Add a failing-path test that verifies the health endpoint
returns null error counts when queryLogs throws: in
tests/api/routes/health.test.js create a new it block that stubs BOT_API_SECRET,
uses queryLogs.mockRejectedValueOnce(...) twice, builds the app via buildApp(),
calls GET /api/v1/health with the x-api-secret header, and asserts res.status is
200 and that res.body.errors.lastHour and res.body.errors.lastDay are null;
reference the existing queryLogs and buildApp symbols so the test mirrors the
happy-path test and confirms the catch branch in health.js.

In `@web/src/components/dashboard/health-cards.tsx`:
- Line 154: Replace hardcoded "en-US" in all toLocaleString calls within the
health-cards.tsx component so number formatting follows the user's runtime
locale; specifically update expressions like
health.discord.guilds.toLocaleString("en-US") (and the other occurrences on the
same file) to call toLocaleString() with no argument or
toLocaleString(undefined). Ensure you change every instance (the three places
that currently pass "en-US") so they behave consistently with the
Intl.DateTimeFormat change in health-section.tsx.

In `@web/src/components/dashboard/health-section.tsx`:
- Around line 29-86: The fetchHealth callback captures router but the
useCallback deps are empty; update the useCallback dependency array to include
router (the value returned by useRouter) so fetchHealth's closure stays
consistent—i.e., locate the fetchHealth function wrapped with useCallback and
add router to its dependency list (ensuring any ESLint/react-hooks warnings are
resolved while preserving existing logic like abortControllerRef, setLoading,
setError, setHealth, and setLastUpdatedAt which are stable).

---

Duplicate comments:
In `@web/src/lib/log-ws.ts`:
- Around line 187-194: The "history" case currently replaces the entire log list
via setLogs(normalized), which can drop real-time entries; coordinate with the
server-side fix so the server registers the client for real-time broadcasting
only after sending history (do that in the transport registration flow on the
server). If the server-side fix cannot be applied immediately, change the client
handling in the "history" case to merge normalized with existing logs (use
setLogs(prev => merge-and-dedupe(prev, normalized) by entry id), enforce
MAX_LOGS, and reuse normalizeEntry, makeId, and LogEntry for consistency).

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56e707c and 00c82b1.

📒 Files selected for processing (10)
  • src/api/routes/health.js
  • src/api/ws/logStream.js
  • src/transports/websocket.js
  • tests/api/routes/health.test.js
  • web/src/components/dashboard/health-cards.tsx
  • web/src/components/dashboard/health-section.tsx
  • web/src/components/dashboard/restart-history.tsx
  • web/src/components/dashboard/types.ts
  • web/src/lib/format-time.ts
  • web/src/lib/log-ws.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.js: Use ESM modules only — use import/export, never require()
Use node: protocol for Node.js builtins (e.g. import { readFileSync } from 'node:fs')
Always use semicolons
Use single quotes for strings
Use 2-space indentation
No TypeScript — use plain JavaScript with JSDoc comments for documentation

Files:

  • tests/api/routes/health.test.js
  • src/api/routes/health.js
  • src/transports/websocket.js
  • src/api/ws/logStream.js
tests/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Test files must achieve at least 80% code coverage on statements, branches, functions, and lines

Files:

  • tests/api/routes/health.test.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.js: Always use Winston for logging — import { info, warn, error } from ../logger.js
NEVER use console.log, console.warn, console.error, or any console.* method in src/ files
Pass structured metadata to Winston logging calls (e.g. info('Message processed', { userId, channelId }))
Use custom error classes from src/utils/errors.js for error handling
Always log errors with context before re-throwing
Use getConfig(guildId?) from src/modules/config.js to read config
Use setConfigValue(path, value, guildId?) from src/modules/config.js to update config at runtime
Use splitMessage() utility for messages exceeding Discord's 2000-character limit
Use safeSend() wrapper for outgoing Discord messages to sanitize mentions and enforce allowedMentions

Files:

  • src/api/routes/health.js
  • src/transports/websocket.js
  • src/api/ws/logStream.js
🧠 Learnings (2)
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/commands/*.js : Use `parseDuration()` from `src/utils/duration.js` for duration-based commands (timeout, tempban, slowmode)

Applied to files:

  • web/src/lib/format-time.ts
  • web/src/components/dashboard/health-cards.tsx
📚 Learning: 2026-02-25T02:39:33.506Z
Learnt from: CR
Repo: VolvoxLLC/volvox-bot PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-25T02:39:33.506Z
Learning: Applies to src/**/*.js : Pass structured metadata to Winston logging calls (e.g. `info('Message processed', { userId, channelId })`)

Applied to files:

  • src/transports/websocket.js
  • web/src/lib/log-ws.ts
  • src/api/ws/logStream.js
🧬 Code graph analysis (5)
web/src/components/dashboard/health-section.tsx (2)
web/src/components/dashboard/types.ts (2)
  • BotHealth (10-33)
  • isBotHealth (35-78)
web/src/components/dashboard/health-cards.tsx (1)
  • HealthCards (55-229)
src/api/routes/health.js (3)
src/api/ws/logStream.js (1)
  • queryLogs (235-235)
src/index.js (1)
  • pool (272-272)
src/db.js (1)
  • pool (13-13)
src/transports/websocket.js (2)
src/utils/modAction.js (1)
  • opts (46-56)
src/logger.js (1)
  • info (223-225)
web/src/lib/log-ws.ts (1)
src/api/ws/logStream.js (1)
  • logs (237-243)
src/api/ws/logStream.js (2)
tests/api/ws/logStream.test.js (3)
  • nonce (15-15)
  • expiry (16-16)
  • hmac (17-17)
src/logger.js (2)
  • info (223-225)
  • warn (230-232)
🔇 Additional comments (13)
src/api/routes/health.js (3)

10-26: LGTM — Tombstone pattern for lazy-loaded queryLogs is correct.

The queryLogsFailed flag prevents repeated dynamic import attempts after the first failure, addressing the earlier review feedback. The flow is clean: check tombstone → check cached → attempt import → cache or tombstone.


102-122: body.restarts shape is now consistently an array — previous review concern addressed.

All paths (success at line 108, empty pool at 115, catch at 118, and fallback at 121) return an array, matching the frontend contract.


86-89: No action required — queryLogs correctly returns full total count independent of row limit.

The implementation uses a separate COUNT(*) query (line 73) that applies only the WHERE clause filters (level, since, until, search) without the LIMIT parameter. The limit only affects the returned rows via the paginated data query (lines 78-80). This means the code correctly minimizes data transfer with limit: 1 while the total field accurately reflects the full count of all matching records.

src/transports/websocket.js (1)

26-162: Well-structured transport with good resilience patterns.

The zero-overhead early exit (line 103), JSON serialization fallback (lines 130-142), per-client error isolation (lines 149-151), and proper use of WebSocket.OPEN (line 146) are all solid. Prior review feedback has been addressed.

src/api/ws/logStream.js (2)

92-122: Good defensive practices in connection/cleanup lifecycle.

Auth timeout, pong-based heartbeat, .catch() on the async message handler (line 109), and the idempotent cleanupClient with Math.max(0, ...) guard are all well-implemented.

Also applies to: 278-294


165-188: Solid ticket validation with appropriate guards.

Type checks, NaN guard via Number.isFinite, timing-safe comparison, and try/catch around timingSafeEqual (which throws on length mismatch) are all correct.

web/src/lib/log-ws.ts (2)

89-251: Well-implemented reconnecting WebSocket hook with proper lifecycle guards.

The attempt-token pattern (line 121/126/217/242), exponential backoff with cap, ticket fetch retry on failure, filter reapplication after reconnect, and thorough cleanup on unmount are all solid. Prior review feedback has been comprehensively addressed.


54-77: Metadata flattening and normalization look correct.

The destructuring excludes top-level fields and metadata, then merges rawMeta with remaining rest fields. This addresses the previous review concern about nested meta.metadata.

web/src/lib/format-time.ts (1)

1-17: LGTM — Clean, correct duration formatter.

Logic is straightforward, the guard on line 14 ensures at least one component is always shown, and the constants are correct.

tests/api/routes/health.test.js (1)

10-17: Mock setup for optional modules is well-structured.

The logQuery.js mock provides a working default, and the restartTracker.js mock simulates module absence by throwing, which correctly exercises the top-level try/catch fallback in health.js.

web/src/components/dashboard/types.ts (1)

35-78: Comprehensive and correct runtime type guard.

All past review concerns (null-safe version/uptimeBefore, optional rss validation, per-item RestartRecord validation) are properly addressed. The guard correctly mirrors the interface definitions and handles nullable fields.

web/src/components/dashboard/restart-history.tsx (1)

85-142: Clean implementation with all prior feedback addressed.

Deterministic sort (line 88), null-safe rendering for version (line 129) and uptimeBefore (line 132), locale-aware timestamp formatting (line 17), and the correct reason-matching order (lines 40-59) are all in good shape.

web/src/components/dashboard/health-cards.tsx (1)

55-229: Everything else LGTM.

All previously raised issues have been properly resolved:

  • pingColor boundary inclusive at 300ms (line 34). ✅
  • cpuPct clamped to 0–100 with Math.min(Math.max(rawPct, 0), 100) (line 80). ✅
  • errors.lastHour/errors.lastDay null-guarded before errorColor and toLocaleString (lines 169–171, 186–188). ✅
  • formatUptime now imported from the shared @/lib/format-time module (line 21). ✅

@BillChirico BillChirico merged commit c3781ac into main Feb 25, 2026
1 of 5 checks passed
@BillChirico BillChirico deleted the feat/logs-and-health branch February 25, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Web dashboard: real-time bot logs and health monitoring

1 participant