Skip to content

Conversation

@uriyage
Copy link
Contributor

@uriyage uriyage commented May 25, 2025

This commit fixes two issues in pubsubUnsubscribeChannel that could lead to memory corruption:

  1. When calculating the slot for a channel, we were using getKeySlot() which might use the current_client's slot if available. This is problematic when a client kills another client (e.g., via CLIENT KILL command) as the slot won't match the channel's actual slot.

  2. The found variable was not initialized to NULL, causing the serverAssert to potentially pass incorrectly when the hashtable lookup failed, leading to memory corruption in subsequent operations.

The fix:

  • Calculate the slot directly from the channel name using keyHashSlot() instead of relying on the current client's slot
  • Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill another client that is subscribed to a sharded pubsub channel during a transaction.

Crash log (After initializing the variable 'found' to null, without initialization, memory corruption could occur):

VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840

This commit fixes two issues in pubsubUnsubscribeChannel that could lead to
memory corruption:

1. The 'found' variable was not initialized to NULL, causing the serverAssert
   to potentially pass incorrectly when the hashtable lookup failed, leading
   to memory corruption in subsequent operations.

2. When calculating the slot for a channel, we were using getKeySlot() which
   might use the current_client's slot if available. This is problematic when
   a client kills another client (e.g., via CLIENT KILL command) as the slot
   won't match the channel's actual slot.

The fix:
- Initialize 'found' to NULL
- Calculate the slot directly from the channel name using keyHashSlot()
  instead of relying on the current client's slot

Added a test case that reproduces the issue by having one client kill another
client that is subscribed to a sharded pubsub channel during a transaction.

Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage uriyage force-pushed the fix_pubsub_unsubscribe_channel branch from 5b98c2b to 644c7f2 Compare May 25, 2025 09:16
@codecov
Copy link

codecov bot commented May 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.03%. Comparing base (053bf9e) to head (5d32f50).
Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2137      +/-   ##
============================================
- Coverage     71.18%   71.03%   -0.16%     
============================================
  Files           122      122              
  Lines         66046    66045       -1     
============================================
- Hits          47013    46912     -101     
- Misses        19033    19133     +100     
Files with missing lines Coverage Δ
src/pubsub.c 96.79% <100.00%> (ø)

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch (maybe state this is related to the fuzzer infrastructure you are about to present soon)

Overall LGTM, minor comments

Signed-off-by: Uri Yagelnik <[email protected]>
@ranshid ranshid merged commit bd5dcb2 into valkey-io:unstable May 25, 2025
50 of 51 checks passed
@ranshid ranshid added the release-notes This issue should get a line item in the release notes label May 25, 2025
@ranshid ranshid removed this from Valkey 7.2 May 25, 2025
@ranshid ranshid moved this to To be backported in Valkey 8.0 May 25, 2025
@ranshid ranshid moved this to To be backported in Valkey 8.1 May 25, 2025
@ranshid ranshid added the bug Something isn't working label May 25, 2025
hpatro pushed a commit to hpatro/valkey that referenced this pull request Jun 4, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.


The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.


Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Jun 4, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
hpatro pushed a commit that referenced this pull request Jun 9, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
hpatro pushed a commit that referenced this pull request Jun 11, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
@hpatro hpatro moved this from To be backported to 8.1.2 in Valkey 8.1 Jun 11, 2025
vitarb pushed a commit to vitarb/valkey that referenced this pull request Jun 12, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
@vitarb vitarb mentioned this pull request Jun 13, 2025
vitarb pushed a commit to vitarb/valkey that referenced this pull request Jun 13, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
shanwan1 pushed a commit to shanwan1/valkey that referenced this pull request Jun 13, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: shanwan1 <[email protected]>
@uriyage uriyage mentioned this pull request Jul 10, 2025
zuiderkwast pushed a commit to vitarb/valkey that referenced this pull request Aug 15, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
zuiderkwast pushed a commit to vitarb/valkey that referenced this pull request Aug 15, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
Signed-off-by: Viktor Söderqvist <[email protected]>
@zuiderkwast zuiderkwast moved this from To be backported to 8.0.5 in Valkey 8.0 Aug 18, 2025
zuiderkwast pushed a commit to vitarb/valkey that referenced this pull request Aug 21, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
Signed-off-by: Viktor Söderqvist <[email protected]>
zuiderkwast pushed a commit that referenced this pull request Aug 22, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
Signed-off-by: Viktor Söderqvist <[email protected]>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Sep 16, 2025
This commit fixes two issues in `pubsubUnsubscribeChannel` that could
lead to memory corruption:

1. When calculating the slot for a channel, we were using getKeySlot()
which might use the current_client's slot if available. This is
problematic when a client kills another client (e.g., via CLIENT KILL
command) as the slot won't match the channel's actual slot.

2. The `found` variable was not initialized to `NULL`, causing the
serverAssert to potentially pass incorrectly when the hashtable lookup
failed, leading to memory corruption in subsequent operations.

The fix:
- Calculate the slot directly from the channel name using keyHashSlot()
instead of relying on the current client's slot
- Initialize 'found' to NULL

Added a test case that reproduces the issue by having one client kill
another client that is subscribed to a sharded pubsub channel during a
transaction.

Crash log (After initializing the variable 'found' to null, without
initialization, memory corruption could occur):
```
VALKEY BUG REPORT START: Cut & paste starting from here ===
59707:M 24 May 2025 23:10:40.429 # === ASSERTION FAILED CLIENT CONTEXT ===
59707:M 24 May 2025 23:10:40.429 # client->flags = 108086391057154048
59707:M 24 May 2025 23:10:40.429 # client->conn = fd=11
59707:M 24 May 2025 23:10:40.429 # client->argc = 0
59707:M 24 May 2025 23:10:40.429 # === RECURSIVE ASSERTION FAILED ===
59707:M 24 May 2025 23:10:40.429 # ==> pubsub.c:348 'found' is not true

------ STACK TRACE ------

Backtrace:
0   valkey-server                       0x0000000104974054 _serverAssertWithInfo + 112
1   valkey-server                       0x000000010496c7fc pubsubUnsubscribeChannel + 268
2   valkey-server                       0x000000010496cea0 pubsubUnsubscribeAllChannelsInternal + 216
3   valkey-server                       0x000000010496c2e0 pubsubUnsubscribeShardAllChannels + 76
4   valkey-server                       0x000000010496c1d4 freeClientPubSubData + 60
5   valkey-server                       0x00000001048f3cbc freeClient + 792
6   valkey-server                       0x0000000104900870 clientKillCommand + 356
7   valkey-server                       0x00000001048d1790 call + 428
8   valkey-server                       0x000000010496ef4c execCommand + 872
9   valkey-server                       0x00000001048d1790 call + 428
10  valkey-server                       0x00000001048d3a44 processCommand + 5056
11  valkey-server                       0x00000001048fdc20 processCommandAndResetClient + 64
12  valkey-server                       0x00000001048fdeac processInputBuffer + 276
13  valkey-server                       0x00000001048f2ff0 readQueryFromClient + 148
14  valkey-server                       0x0000000104a182e8 callHandler + 60
15  valkey-server                       0x0000000104a1731c connSocketEventHandler + 488
16  valkey-server                       0x00000001048b5e80 aeProcessEvents + 820
17  valkey-server                       0x00000001048b6598 aeMain + 64
18  valkey-server                       0x00000001048dcecc main + 4084
19  dyld                                0x0000000186b34274 start + 2840
````

---------

Signed-off-by: Uri Yagelnik <[email protected]>
(cherry picked from commit bd5dcb2)
Signed-off-by: Viktor Söderqvist <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working release-notes This issue should get a line item in the release notes

Projects

Status: 8.0.5
Status: 8.1.2

Development

Successfully merging this pull request may close these issues.

2 participants