
fix: allow sharing RocketMQ MQClientInstance to avoid thread explosion #4311

Open
daguimu wants to merge 2 commits into alibaba:2025.1.x from daguimu:fix/rocketmq-share-client-instance-issue4291

@daguimu daguimu commented Apr 23, 2026

Problem

RocketMQUtils.getInstanceName appends System.nanoTime() to every generated client name, so every consumer and producer ends up with a unique instanceName and therefore a unique clientId (ip@instanceName). RocketMQ's MQClientManager.getOrCreateMQClientInstance creates a separate MQClientInstance per unique clientId, and each instance spawns ~20-24 threads (Netty EventLoop, PullMessageService, RebalanceService, ScheduledExecutor, etc.). With many stream bindings (e.g. 70+ consumers in the reported case) this multiplies into ~1,700 RocketMQ threads where one shared instance would be sufficient.
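The multiplication described above can be illustrated with a small, self-contained sketch. It is not the binder's code: the class name, the `|` layout of the unshared name, the IP address, and the use of a loop counter in place of `System.nanoTime()` are all assumptions made to keep the example deterministic. The figure of 24 threads per instance is the upper end of the range quoted above.

```java
import java.util.HashSet;
import java.util.Set;

final class ClientIdUniquenessSketch {

    // Hypothetical stand-in for the pre-fix naming scheme: a fresh suffix per call.
    // The exact separator layout is an assumption, not copied from RocketMQUtils.
    static String uniqueInstanceName(String identify, long pid, long nanoTime) {
        return identify + "|" + pid + "|" + nanoTime;
    }

    // The PR describes the clientId as ip@instanceName.
    static String clientId(String ip, String instanceName) {
        return ip + "@" + instanceName;
    }

    // Counts how many distinct clientIds a given number of bindings produces.
    static int distinctClientIds(int bindings) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < bindings; i++) {
            // The loop counter stands in for System.nanoTime() so the run is repeatable.
            ids.add(clientId("10.0.0.1", uniqueInstanceName("nameserver:9876", 4242L, i)));
        }
        return ids.size();
    }

    static int estimatedThreads(int instances, int threadsPerInstance) {
        return instances * threadsPerInstance;
    }

    public static void main(String[] args) {
        int instances = distinctClientIds(70);
        System.out.println(instances + " client instances, about "
                + estimatedThreads(instances, 24) + " threads");
    }
}
```

Running `main` prints `70 client instances, about 1680 threads`, which is in line with the ~1,700 threads reported.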

Root Cause

Consumers with different groups connecting to the same name server can safely share a single MQClientInstance — the consumerTable inside MQClientInstance is keyed by group. The nanoTime suffix defeats that sharing unconditionally, even when the binding topology would tolerate it.
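Why sharing is safe for distinct groups can be sketched with a simplified model of the two lookups involved. The classes below are hypothetical stand-ins, not RocketMQ's `MQClientManager` or `MQClientInstance`; they only mirror the two facts stated above: instances are looked up by clientId, and consumers inside an instance are keyed by group.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedClientSketch {

    // Simplified model of a client instance: one consumer slot per group.
    static final class ClientInstance {
        final Map<String, String> consumerTable = new ConcurrentHashMap<>();

        // Returns false when the group is already registered on this instance.
        boolean registerConsumer(String group, String consumer) {
            return consumerTable.putIfAbsent(group, consumer) == null;
        }
    }

    // Simplified model of the manager: one instance per distinct clientId.
    static final class ClientManager {
        final Map<String, ClientInstance> instances = new ConcurrentHashMap<>();

        ClientInstance getOrCreate(String clientId) {
            return instances.computeIfAbsent(clientId, id -> new ClientInstance());
        }
    }

    public static void main(String[] args) {
        ClientManager manager = new ClientManager();
        ClientInstance first = manager.getOrCreate("10.0.0.1@shared-name");
        ClientInstance second = manager.getOrCreate("10.0.0.1@shared-name");

        System.out.println(first == second);                          // true: same clientId, same instance
        System.out.println(first.registerConsumer("group-a", "c1"));  // true
        System.out.println(second.registerConsumer("group-b", "c2")); // true: a different group fits alongside
        System.out.println(second.registerConsumer("group-a", "c3")); // false: group already taken
    }
}
```

In this model a deterministic clientId yields one instance holding both groups, while a second consumer in the same group is rejected. This suggests sharing is appropriate only when each group appears once per JVM.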

Fix

  • Add an opt-in shareClientInstance boolean to RocketMQCommonProperties (default false, so behavior is unchanged unless the user sets it).
  • Add RocketMQUtils.getInstanceName(rpcHook, identify, shareClientInstance). When shareClientInstance is true, the nanoTime segment is omitted and the generated name becomes deterministic ([ak|]identify|pid), which lets RocketMQ reuse a single MQClientInstance across multiple consumers/producers in the same JVM.
  • Keep the existing two-argument getInstanceName as a thin delegate so any external callers and the existing default behavior are preserved.
  • Propagate shareClientInstance from RocketMQBinderConfigurationProperties to the consumer/producer properties inside mergeRocketMQProperties using OR-logic, so users can toggle the flag once at the binder level.
  • Route the flag through RocketMQConsumerFactory#initPushConsumer, RocketMQConsumerFactory#initPullConsumer, and RocketMQProduceFactory so every produced/consumed client honors it.
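The naming change in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the class name is hypothetical, a plain `accessKey` string replaces the `rpcHook` parameter, `ProcessHandle` (Java 9+) supplies the pid, and the position of the nanoTime segment in the unshared name is assumed.

```java
final class InstanceNameSketch {

    private static final String SEPARATOR = "|";

    // Builds [ak|]identify|pid, appending a nanoTime segment only when sharing is off.
    // accessKey is a simplification: the PR's method takes an RPC hook instead.
    static String getInstanceName(String accessKey, String identify, boolean shareClientInstance) {
        StringBuilder name = new StringBuilder();
        if (accessKey != null) {
            name.append(accessKey).append(SEPARATOR);
        }
        name.append(identify).append(SEPARATOR).append(ProcessHandle.current().pid());
        if (!shareClientInstance) {
            // Per-call suffix: keeps the historical behavior of one name per client.
            name.append(SEPARATOR).append(System.nanoTime());
        }
        return name.toString();
    }

    // Two-argument overload delegates to the unshared path, as the PR describes.
    static String getInstanceName(String accessKey, String identify) {
        return getInstanceName(accessKey, identify, false);
    }

    public static void main(String[] args) {
        System.out.println(getInstanceName(null, "nameserver:9876", true));
        System.out.println(getInstanceName("my-access-key", "nameserver:9876", true));
        System.out.println(getInstanceName(null, "nameserver:9876"));
    }
}
```

With sharing enabled, repeated calls for the same `identify` return the same string within one JVM, which is what allows the client lookup by clientId to hit an existing instance.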

Users who hit the thread explosion can now set:

spring:
  cloud:
    stream:
      rocketmq:
        binder:
          share-client-instance: true

or scope it per-binding on consumer/producer properties.

Tests Added

New unit tests in RocketMQUtilsTest:

| Change Point | Test |
| --- | --- |
| Default overload keeps pre-existing per-call uniqueness | getInstanceNameWithoutSharingProducesDistinctNames, getInstanceNameWithoutSharingIncludesNanoTimeSuffix |
| New shareClientInstance=true mode yields deterministic names | getInstanceNameWithSharingProducesDeterministicNameForSameIdentify, getInstanceNameWithSharingOmitsNanoTimeSuffix |
| Shared names still differentiate by identify | getInstanceNameWithSharingStillDifferentiatesByIdentify |
| Access-key segment still prepended when an AclClientRPCHook is supplied | getInstanceNameWithRpcHookPrependsAccessKey |
| Two-arg overload delegates to the unshared path | defaultOverloadDelegatesToUnsharedBehavior |
| mergeRocketMQProperties OR-logic for the new flag | mergeRocketMQPropertiesPropagatesShareClientInstanceFromBinder, mergeRocketMQPropertiesPreservesConsumerShareClientInstanceWhenBinderIsFalse, mergeRocketMQPropertiesDefaultLeavesShareClientInstanceFalse |

Impact

  • Default behavior is unchanged (shareClientInstance=false), so existing applications are unaffected.
  • Applications with many RocketMQ stream bindings can opt in to collapse per-binding MQClientInstance copies into one, recovering the ~20 threads each duplicate would spawn.
  • Changes are confined to the RocketMQ stream binder starter; no cross-module surface is altered.

Fixes #4291

daguimu added 2 commits April 23, 2026 23:10
Introduce an opt-in shareClientInstance flag on RocketMQ common
properties. When enabled, RocketMQUtils.getInstanceName omits the
per-call nanoTime suffix so consumers and producers connecting to
the same name server can reuse one MQClientInstance and its worker
threads instead of each spawning ~20 threads.

The flag defaults to false to preserve the historical per-client
isolation, and propagates from binder-level configuration down to
individual consumer/producer properties.

Fixes alibaba#4291
…antics

- Switch shareClientInstance from a primitive to @Nullable Boolean so
  an explicit consumer/producer-level false can override a binder-level
  true (important for ordered/transactional bindings that need isolated
  MQClientInstance).
- Rename isShareClientInstance()/setShareClientInstance(boolean) to
  getShareClientInstance()/setShareClientInstance(Boolean) to match the
  getXxx() convention already used by the other boolean fields in
  RocketMQCommonProperties.
- Change mergeRocketMQProperties to the default-only merge pattern
  already used for string fields: propagate the binder value only when
  the extension has not set one explicitly.
- Update consumer/producer factories to unbox via Boolean.TRUE.equals.
- Extend RocketMQUtilsTest with explicit-false-wins, producer merge,
  explicit-true preservation, and both-unset cases.
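
The difference between the earlier OR-logic and the default-only merge described in this commit can be sketched as below. The class and method names are hypothetical and the real mergeRocketMQProperties operates on property objects rather than bare values; only the tri-state semantics and the `Boolean.TRUE.equals` unboxing are taken from the commit message.

```java
final class ShareFlagMergeSketch {

    // Earlier behavior: a binder-level true could not be overridden per binding.
    static boolean mergeOr(boolean binderValue, boolean extensionValue) {
        return binderValue || extensionValue;
    }

    // Default-only merge: the binder value applies only when the
    // consumer/producer-level value was left unset (null).
    static Boolean mergeDefaultOnly(Boolean binderValue, Boolean extensionValue) {
        return extensionValue != null ? extensionValue : binderValue;
    }

    // Null-safe unboxing: an unset flag is treated as "do not share".
    static boolean isSharingEnabled(Boolean merged) {
        return Boolean.TRUE.equals(merged);
    }

    public static void main(String[] args) {
        // Binder says true, one binding explicitly opts out.
        System.out.println(mergeOr(true, false));                                    // true
        System.out.println(isSharingEnabled(mergeDefaultOnly(true, Boolean.FALSE))); // false
        // Binder says true, binding unset: the binder value is inherited.
        System.out.println(isSharingEnabled(mergeDefaultOnly(true, null)));          // true
        // Nothing set anywhere: sharing stays off.
        System.out.println(isSharingEnabled(mergeDefaultOnly(null, null)));          // false
    }
}
```

The first two lines of output show the case the commit calls out: with OR-logic an explicit false is lost, whereas the default-only merge lets an ordered or transactional binding keep its own isolated client.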
Development

Successfully merging this pull request may close these issues.

Consumers with different groups create separate MQClientInstance per binding, causing thread explosion