
fix: on awareness crash prevention and memory leak in onStoreDocument #1032

Merged
janthurau merged 4 commits into ueberdosis:main from matteotarantino-algor:fix-onawareness-and-lock
Dec 9, 2025

Conversation


@matteotarantino-algor matteotarantino-algor commented Dec 9, 2025

Critical Bugs in @hocuspocus/extension-redis

Description

Two critical bugs in @hocuspocus/extension-redis v3.4.0 cause memory leaks and server crashes in production:

  1. Lock Memory Leak: Acquired locks are never stored in the internal map, preventing proper cleanup and causing Redis lock accumulation
  2. Awareness Crash: Publishing awareness updates with zero connections throws an ioredis error and crashes the server

Steps to reproduce the bug

Bug 1: Lock Memory Leak

  1. Configure Hocuspocus server with Redis extension and multiple instances
  2. Connect client and edit a document
  3. Save the document multiple times (triggers onStoreDocument)
  4. Monitor Redis locks and afterStoreDocument execution
  5. Observe: Locks acquired in onStoreDocument (line 224) are never released by afterStoreDocument (line 252) because locks.get(lockKey) returns undefined

Root cause: Missing this.locks.set(resource, { lock }) after acquiring the lock in onStoreDocument.
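The leak pattern can be illustrated with a minimal, self-contained sketch. `FakeLock`, the `locks` map, and the function names here are simplified stand-ins for the redlock objects and extension hooks, not the real extension API:

```typescript
// Stand-in for a redlock lock with a release() method.
class FakeLock {
  released = false;
  async release(): Promise<void> {
    this.released = true;
  }
}

// Entries are stored as { lock }, mirroring the extension's locks map.
const locks = new Map<string, { lock: FakeLock }>();

// Buggy onStoreDocument: acquires a lock but never stores it in the map.
async function onStoreDocumentBuggy(resource: string): Promise<FakeLock> {
  const lock = new FakeLock(); // stands in for this.redlock.acquire([resource], ttl)
  // BUG: missing locks.set(resource, { lock });
  return lock;
}

// afterStoreDocument: looks the lock up to release it -- but finds nothing,
// so the Redis lock lingers until its TTL expires.
async function afterStoreDocument(resource: string): Promise<boolean> {
  const entry = locks.get(resource);
  if (!entry) return false;
  await entry.lock.release();
  locks.delete(resource);
  return true;
}

async function demo() {
  const lock = await onStoreDocumentBuggy("hocuspocus:lock:doc");
  const released = await afterStoreDocument("hocuspocus:lock:doc");
  console.log(released, lock.released); // prints: false false -> the leak
}
demo();
```

Adding the single `locks.set(resource, { lock })` line after the acquire makes `afterStoreDocument` find and release the lock, which matches the production observation of `lockAttempts:1, lockReleases:0`.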

Bug 2: Awareness Update Crash (#1027)

  1. Configure Hocuspocus server with Redis extension
  2. Connect client to a document
  3. Disconnect the last client from the document
  4. Observe server crash with error:
    "Command queue state error"
    

Root cause: onAwarenessUpdate attempts to publish to Redis even when connections.size === 0, causing ioredis to fail.

Expected behavior

  1. Lock Management: Locks acquired in onStoreDocument should be stored in this.locks map and properly released in afterStoreDocument, preventing memory leaks
  2. Awareness Updates: Should skip Redis publish when no connections exist, preventing crashes on client disconnect

Environment

  • Operating system: macOS / Linux (production)
  • Hocuspocus version: 3.4.0
  • Redis version: AWS Valkey (Redis-compatible)
  • ioredis version: 5.8.2

Impact

Bug 1 (Lock Leak)

  • ⚠️ Severity: HIGH
  • Memory leak on Redis (locks accumulate until timeout)
  • Potential deadlocks when multiple instances try to save the same document
  • Confirmed in production logs: lockAttempts:1, lockReleases:0

Bug 2 (Awareness Crash)

  • 🔴 Severity: CRITICAL
  • Server crashes on every client disconnect
  • Affects all documents with active collaborators
  • Confirmed 14 occurrences in 5-minute test session

Proposed Fix

// packages/extension-redis/src/Redis.ts

async onStoreDocument({ documentName }: onStoreDocumentPayload) {
  const resource = this.lockKey(documentName);
  const ttl = this.configuration.lockTimeout;
  try {
    const lock = await this.redlock.acquire([resource], ttl);
    const oldLock = this.locks.get(resource);
    if (oldLock) {
      await oldLock.lock.release(); // release() is a method, and map entries are stored as { lock }
    }
    this.locks.set(resource, { lock }); // ✅ FIX: Add this line
  } catch (error) {
    // ... error handling
  }
}

  async onAwarenessUpdate({
    documentName,
    awareness,
    added,
    updated,
    removed,
    document,
  }: onAwarenessUpdatePayload) {
    // Skip the publish when no connections remain; publishing after the last
    // client disconnects makes ioredis throw (fixes
    // https://github.com/ueberdosis/hocuspocus/issues/1027).
    const connections = document?.connections.size ?? 0;
    if (connections === 0) {
      return;
    }
    const changedClients = added.concat(updated, removed);
    const message = new OutgoingMessage(documentName).createAwarenessUpdateMessage(
      awareness,
      changedClients
    );

    return this.pub.publish(this.pubKey(documentName), this.encodeMessage(message.toUint8Array()));
  }

@janthurau
Contributor

Thank you for your PR!

Could you reformat it with npm run lint? 🙏

- Fix BUG ueberdosis#1: Lock memory leak - store acquired lock in locks map
- Fix BUG ueberdosis#3: Awareness crash when no connections - check connections before publish

Fixes ueberdosis#1027

matteotarantino-algor commented Dec 9, 2025

@janthurau

I had Prettier configured differently in my local environment.
Unfortunately, running npm run lint shows many errors in unrelated files, which makes it impossible for me to revert these Prettier changes using your linter.
I temporarily disabled Prettier to make the change and avoid formatting noise.

Let me know if everything looks good. Otherwise, I can create a new branch and open a fresh pull request; now that Prettier is disabled locally, this shouldn't happen again.

Just tell me what you prefer!

@janthurau janthurau merged commit 67957ae into ueberdosis:main Dec 9, 2025
6 checks passed
