Skip to content

Comments

feat: add ConnectionManager to replace connections singleton for MilvusClient#3283

Open
XuanYang-cn wants to merge 1 commit intomilvus-io:masterfrom
XuanYang-cn:conn-manager
Open

feat: add ConnectionManager to replace connections singleton for MilvusClient#3283
XuanYang-cn wants to merge 1 commit intomilvus-io:masterfrom
XuanYang-cn:conn-manager

Conversation

@XuanYang-cn
Copy link
Contributor

Summary

  • Introduce a new ConnectionManager component with strategy pattern (RegularStrategy / GlobalStrategy) that provides connection pooling, lifecycle management, and UNAVAILABLE error recovery for MilvusClient, bypassing the legacy connections singleton
  • Integrate both MilvusClient and AsyncMilvusClient to use ConnectionManager / AsyncConnectionManager with shared connection deduplication by address|token, health checks on idle connections, and per-client db_name via CallContext
  • Add comprehensive test suite (132 tests, 100% coverage on connection_manager.py) and design documentation in docs/plans/

Design

See pymilvus-002-connection-manager-design.md for full design doc.

Key Architecture Decisions

  • Strategy pattern: RegularStrategy for direct connections, GlobalStrategy for global cluster topology discovery & primary routing via _GlobalStrategyMixin
  • on_unavailable() -> bool: Strategies decide whether recovery is needed; ConnectionManager._recover() is shared logic
  • Split-lock pattern (sync): Lock released before on_unavailable() network I/O, re-acquired and re-verified before _recover()
  • Fully async chain: AsyncConnectionManager uses asyncio.Lock, async error callbacks, and awaits ensure_channel_ready() on recovery
  • Connection deduplication: Shared by address|token key; db_name is per-client via BaseMilvusClient._generate_call_context()

Test plan

  • Unit tests for ConnectionConfig.from_uri() parsing (various URI formats, overrides, invalid schemes)
  • Singleton behavior for both sync and async managers
  • Shared connection deduplication and dedicated mode isolation
  • Health check on idle connections (>30s threshold)
  • UNAVAILABLE recovery for both RegularStrategy and GlobalStrategy
  • Global topology refresh and _GlobalStrategyMixin edge cases
  • Async error callback chain and concurrent recovery prevention
  • MilvusClient and AsyncMilvusClient integration with ConnectionManager
  • All 132 tests pass, lint clean, 100% coverage on connection_manager.py
  • Integration test with actual Milvus server

🤖 Generated with Claude Code

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: XuanYang-cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 90.75235% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.10%. Comparing base (921e74b) to head (820d123).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
pymilvus/milvus_client/async_milvus_client.py 61.97% 54 Missing ⚠️
pymilvus/decorators.py 33.33% 4 Missing ⚠️
pymilvus/client/async_grpc_handler.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3283      +/-   ##
==========================================
+ Coverage   76.46%   77.10%   +0.63%     
==========================================
  Files          63       64       +1     
  Lines       13321    13575     +254     
==========================================
+ Hits        10186    10467     +281     
+ Misses       3135     3108      -27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…usClient

Introduce a new ConnectionManager component that provides connection
pooling, lifecycle management, and strategy-based connection handling
for MilvusClient, bypassing the legacy `connections` singleton.

Key features:
- Strategy pattern: RegularStrategy for direct connections,
  GlobalStrategy for global cluster topology discovery & primary routing
- Connection deduplication by address|token key (shared mode)
- Dedicated connection mode for isolated connections
- Health check on idle connections (>30s): gRPC state + GetVersion probe
- UNAVAILABLE error recovery via strategy.on_unavailable() -> bool
- Split-lock pattern (sync) to avoid holding lock during network I/O
- Fully async chain for AsyncConnectionManager (asyncio.Lock, async
  error callbacks, async recovery with ensure_channel_ready)
- _GlobalStrategyMixin extracts shared topology logic for sync/async
- Background TopologyRefresher for global clusters (5min interval)
- Error callback chain: retry decorator -> handler._on_rpc_error ->
  ConnectionManager.handle_error -> _recover

Integration:
- MilvusClient.__init__ uses ConnectionManager.get_or_create()
- AsyncMilvusClient uses deferred _connect() via __aenter__
- db_name is per-client via CallContext, not per-handler
- Both clients register as WeakSet references on shared connections

Also includes:
- Design docs in docs/plans/ with template and README
- Comprehensive test suite (132 tests, 100% coverage on connection_manager.py)
- Removes dead code: GlobalStub class (superseded by GlobalStrategy),
  ConnectionConfig.secure field (never consumed by handlers)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: yangxuan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants