Fatal ioredis error: "Command queue state error" caused by Hocuspocus pub/sub messages #1027

@matteotarantino-algor

Description

We’re experiencing recurring crashes caused by an uncaught exception thrown inside ioredis, specifically this error:

{
    "level": "err",
    "message": "Process shutting down",
    "metadata": {
        "reason": "uncaughtException",
        "error": {
            "message": "Command queue state error. If you can reproduce this, please report it. Last reply: message,hocuspocus::69306d66030152471d5feac7,\u0019hocuspocus-10.0.29.163:33\u001869306d66030152471d5feac7\u0001\f\u0001\ufffd\ufffd\ufffd\ufffd\u000e\u0002\u0004null",
            "stack": "Error: Command queue state error. If you can reproduce this, please report it. Last reply: message,hocuspocus::69306d66030152471d5feac7,\u0019hocuspocus-10.0.29.163:33\u001869306d66030152471d5feac7\u0001\f\u0001\ufffd\ufffd\ufffd\ufffd\u000e\u0002\u0004null\n    at DataHandler.shiftCommand (/usr/src/app/hocuspocus-be/hocuspocus-be/node_modules/ioredis/built/DataHandler.js:180:27)\n    at DataHandler.returnReply (/usr/src/app/hocuspocus-be/hocuspocus-be/node_modules/ioredis/built/DataHandler.js:57:27)\n    at JavascriptRedisParser.returnReply (/usr/src/app/hocuspocus-be/hocuspocus-be/node_modules/ioredis/built/DataHandler.js:21:22)\n    at JavascriptRedisParser.execute (/usr/src/app/hocuspocus-be/hocuspocus-be/node_modules/redis-parser/lib/parser.js:544:14)\n    at TLSSocket.<anonymous> (/usr/src/app/hocuspocus-be/hocuspocus-be/node_modules/ioredis/built/DataHandler.js:26:20)\n    at TLSSocket.emit (node:events:518:28)\n    at TLSSocket.emit (node:domain:489:12)\n    at TLSSocket.Readable.read (node:internal/streams/readable:782:10)\n    at TLSSocket.Socket.read (node:net:777:39)\n    at flow (node:internal/streams/readable:1283:53)"
        }
    }
}

The error seems to originate from a pub/sub message (Last reply: message,hocuspocus::<docId>,…) being processed when the internal commandQueue is unexpectedly empty. This leads to a fatal error event emitted by ioredis, which then becomes an uncaughtException and crashes the Hocuspocus worker.

This is happening intermittently in production and results in the worker being force-restarted by our process manager.
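The path from an emitted error event to a process crash described above can be illustrated with Node's EventEmitter alone. This is a minimal sketch, not ioredis code: the emitter objects stand in for a Redis client, the error message is simulated, and `emitQueueStateError` is a hypothetical helper. It only shows the general Node behavior that emitting "error" with no listener attached throws, while an attached listener receives the error instead.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: emits a simulated error on a stand-in client and
// reports whether the emit call threw or was absorbed by a listener.
function emitQueueStateError(client: EventEmitter): string {
  try {
    client.emit("error", new Error("Command queue state error (simulated)"));
    return "handled";
  } catch {
    // With no "error" listener, EventEmitter throws the error from emit().
    // Inside a socket data callback nothing catches it, so it would surface
    // as an uncaughtException.
    return "thrown";
  }
}

// Stand-in client with no "error" listener.
const withoutListener = new EventEmitter();

// Stand-in client with an "error" listener that records what it receives.
const withListener = new EventEmitter();
const seen: string[] = [];
withListener.on("error", (err: Error) => {
  seen.push(err.message);
});

const results = {
  withoutListener: emitQueueStateError(withoutListener),
  withListener: emitQueueStateError(withListener),
};

console.log(results, seen);
```

If the clients used for pub/sub had an "error" listener attached, an error like this one would presumably be logged rather than crash the worker. That would hide the symptom only: the underlying queue state problem would remain, and whether those clients are reachable from application code is not established in this report.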

Environment

  • ioredis: 5.8.2
  • Redis: AWS ElastiCache Serverless (Valkey)
  • Hocuspocus: 3.4.0

Labels: bug