feat:support columnar search result to better performance #3214

Open
jac0626 wants to merge 3 commits into milvus-io:master from jac0626:feature/columnar-search-result

Conversation

@jac0626
Collaborator

@jac0626 jac0626 commented Jan 19, 2026

see #3213

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jac0626
To complete the pull request process, please assign longjiquan after the PR has been reviewed.
You can assign the PR to them by writing /assign @longjiquan in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello @jac0626, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant performance enhancement for search results by implementing a new ColumnarSearchResult class. This new class stores search data in a columnar format, allowing for lazy access and avoiding the overhead of eagerly creating numerous Python objects. The change is seamlessly integrated into the existing gRPC handlers, providing an option to retrieve results in this optimized format while maintaining full API compatibility with the previous row-based SearchResult. This feature aims to improve the efficiency of handling large search result sets by reducing memory footprint and speeding up initialization.

Highlights

  • New Columnar Search Result Implementation: Introduces ColumnarSearchResult as a high-performance alternative to the existing SearchResult, designed for efficiency in handling large result sets.
  • Performance Optimization: The new implementation utilizes columnar storage, lazy data access, and reduced object creation (O(1) initialization) to significantly improve performance and minimize memory usage, especially when only a subset of results is accessed.
  • API Compatibility: The ColumnarSearchResult is designed to be a drop-in replacement, maintaining full API compatibility with the original SearchResult's iteration patterns, field return types, and dict-like interfaces.
  • Integration into gRPC Handlers: Both synchronous (grpc_handler.py) and asynchronous (async_grpc_handler.py) gRPC search and hybrid search methods have been updated to conditionally return ColumnarSearchResult based on result_type or use_columnar parameters.
  • Comprehensive Testing: New test files (test_columnar_compatibility.py and test_columnar_search_result.py) have been added to ensure type compatibility, iteration behavior, dict-like interface, performance benefits, and cover various data types and edge cases for the new columnar result structure.
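
To make the columnar idea concrete, here is a minimal sketch, under stated assumptions, of a result that keeps parallel column lists and hands out lightweight row views only when a hit is accessed. The names `ColumnarHits` and `HitView` are hypothetical; this is not the PR's `ColumnarSearchResult`, and it leaves out multi-query grouping, vector types, and dynamic fields.

```python
from typing import Any, Dict, Iterator, Sequence


class ColumnarHits:
    """Hypothetical sketch: one query's hits stored as parallel columns.

    The constructor only keeps references to the column sequences, so its
    cost does not grow with the number of hits.
    """

    def __init__(self, ids: Sequence[Any], distances: Sequence[float],
                 fields: Dict[str, Sequence[Any]]) -> None:
        self._ids = ids
        self._distances = distances
        self._fields = fields  # field name -> column of values

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, i: int) -> "HitView":
        if not -len(self) <= i < len(self):
            raise IndexError("hit index out of range")
        return HitView(self, i % len(self))  # normalize negative indices

    def __iter__(self) -> Iterator["HitView"]:
        for i in range(len(self)):
            yield HitView(self, i)


class HitView:
    """Lightweight row view; values are read from the columns on access."""

    __slots__ = ("_owner", "_row")

    def __init__(self, owner: ColumnarHits, row: int) -> None:
        self._owner = owner
        self._row = row

    @property
    def id(self) -> Any:
        return self._owner._ids[self._row]

    @property
    def distance(self) -> float:
        return self._owner._distances[self._row]

    def __getitem__(self, name: str) -> Any:
        # dict-like access, mirroring hit["field"] on a row-based result
        if name == "id":
            return self.id
        if name == "distance":
            return self.distance
        return self._owner._fields[name][self._row]

    def to_dict(self) -> Dict[str, Any]:
        # A full row dict is built only when a caller explicitly asks for it
        entity = {k: col[self._row] for k, col in self._owner._fields.items()}
        return {"id": self.id, "distance": self.distance, "entity": entity}
```

For example, `ColumnarHits([1, 2, 3], [0.1, 0.2, 0.3], {"title": ["a", "b", "c"]})[1]["title"]` reads a single cell from the `title` column without materializing any other hit.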

@mergify mergify bot added the dco-passed label Jan 19, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces ColumnarSearchResult as a performance-optimized, drop-in replacement for SearchResult, focusing on lazy data access and reduced object creation. The implementation is well-done and includes an extensive and thorough test suite, which is excellent for ensuring compatibility and correctness. My review includes a few suggestions for improvement, mainly concerning a potential Liskov Substitution Principle violation in an accessor class, an opportunity to enhance performance when handling dynamic fields, and a note on the use of contextlib.suppress which could mask underlying issues.
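
The review's exact code context for the `contextlib.suppress` note is not shown here, so the following is only a generic, hypothetical illustration of why a broad suppression can mask bugs. The row layout with a `$meta` dict and both function names are assumptions. The broad version silences a missing key and a type error alike, while the narrow version maps only an absent key to `None`.

```python
import contextlib
from typing import Any, Dict, Optional


def read_meta_broad(row: Dict[str, Any], key: str) -> Optional[Any]:
    # Broad suppression: a KeyError for a missing key is silenced, but so is
    # a TypeError caused by a bug (e.g. $meta holding a non-dict value).
    value = None
    with contextlib.suppress(Exception):
        value = row["$meta"][key]
    return value


def read_meta_narrow(row: Dict[str, Any], key: str) -> Optional[Any]:
    # Narrow handling: only an absent key maps to None; malformed data raises.
    meta = row.get("$meta", {})
    if not isinstance(meta, dict):
        raise TypeError(f"$meta must be a dict, got {type(meta).__name__}")
    return meta.get(key)
```

With a corrupted row such as `{"$meta": "corrupt"}`, the broad version quietly returns `None`, whereas the narrow version surfaces the problem as a `TypeError`.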

@jac0626 jac0626 force-pushed the feature/columnar-search-result branch from 552ab0b to 1fe4346 on January 19, 2026 07:45
@jac0626 jac0626 changed the title feat:support columnar search result to better performance [WIP]feat:support columnar search result to better performance Jan 19, 2026
@jac0626 jac0626 force-pushed the feature/columnar-search-result branch from 76c71ab to da00c8a on January 19, 2026 07:56
@codecov

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 93.30922% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.27%. Comparing base (f90b685) to head (ad267f8).

Files with missing lines                     Patch %   Lines
pymilvus/client/columnar_search_result.py    93.71%    34 Missing ⚠️
pymilvus/orm/collection.py                   33.33%    2 Missing ⚠️
pymilvus/orm/future.py                       66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3214      +/-   ##
==========================================
+ Coverage   76.06%   76.27%   +0.21%     
==========================================
  Files          62       63       +1     
  Lines       13018    13559     +541     
==========================================
+ Hits         9902    10342     +440     
- Misses       3116     3217     +101     


@jac0626
Collaborator Author

jac0626 commented Jan 19, 2026

More work is needed in the future to improve the code's readability, maintainability, extensibility, and performance (@jac0626):

  • Extract base class - Dedupe shared logic with SearchResult (highlight, metadata parsing)
  • Replace if-elif chains - Use TypeHandler registry pattern in _bind_accessor()
  • Consolidate Accessor classes - Reduce boilerplate with generic/factory pattern
  • Unify dynamic field handling - Merge $meta path with static field path
  • Define Protocol/ABC - Formal interface for type checking compatibility
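
The registry and Protocol items above could be combined roughly as follows. This is a minimal sketch under stated assumptions: `Accessor`, `ListAccessor`, `FixedDimVectorAccessor`, `register`, and `bind_accessor` are hypothetical names, and the string type keys stand in for pymilvus's actual `DataType` values. A dictionary lookup replaces the if-elif chain, and a `typing.Protocol` gives accessors a formal interface for type checkers.

```python
from typing import Any, Callable, Dict, List, Protocol, Sequence


class Accessor(Protocol):
    """Formal interface every column accessor satisfies (structural typing)."""

    def get(self, row: int) -> Any: ...


class ListAccessor:
    def __init__(self, values: Sequence[Any]) -> None:
        self._values = values

    def get(self, row: int) -> Any:
        return self._values[row]


class FixedDimVectorAccessor:
    """Slices a flat float buffer into rows of length dim."""

    def __init__(self, flat: Sequence[float], dim: int) -> None:
        self._flat = flat
        self._dim = dim

    def get(self, row: int) -> List[float]:
        start = row * self._dim
        return list(self._flat[start:start + self._dim])


# Registry: field type name -> factory that builds an accessor from raw data.
AccessorFactory = Callable[[Dict[str, Any]], Accessor]
_REGISTRY: Dict[str, AccessorFactory] = {}


def register(type_name: str) -> Callable[[AccessorFactory], AccessorFactory]:
    def decorator(factory: AccessorFactory) -> AccessorFactory:
        _REGISTRY[type_name] = factory
        return factory
    return decorator


@register("INT64")
@register("VARCHAR")
def _scalar(raw: Dict[str, Any]) -> Accessor:
    return ListAccessor(raw["values"])


@register("FLOAT_VECTOR")
def _float_vector(raw: Dict[str, Any]) -> Accessor:
    return FixedDimVectorAccessor(raw["values"], raw["dim"])


def bind_accessor(type_name: str, raw: Dict[str, Any]) -> Accessor:
    # One lookup instead of an if-elif chain; supporting a new type only
    # requires another @register(...) factory.
    try:
        factory = _REGISTRY[type_name]
    except KeyError:
        raise ValueError(f"unsupported field type: {type_name}") from None
    return factory(raw)
```

One design consequence is that unsupported types fail in a single place (`bind_accessor`) rather than falling through to the end of a long conditional.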

@mergify mergify bot added the ci-passed label Jan 19, 2026
@jac0626 jac0626 changed the title [WIP]feat:support columnar search result to better performance feat:support columnar search result to better performance Jan 20, 2026
@XuanYang-cn
Contributor

#3208 made some changes to search_result that might need extra attention.

But let's not rush into this new feature. I believe there are some small changes we missed in the perf test, mainly engineering improvements. Let's not skip those engineering improvements; they are more likely to be released quickly and harmlessly.

And for a new feature like this one, we need a design, not a decision. In the design, I'd like these questions answered:

  1. Why do we choose to add a flag to search and make it completely compatible with the old search_result? Why not replace it?
  2. Why do we need a completely compatible result?
  3. Can we provide to_pandas, to_arrow, etc.? Can we make the new return results typed for better usability?
  4. Why do we choose to ignore the complex types, which are the most likely to benefit from this feature?

Anyway, let's discuss the design based on issue #3213, and then implement it based on the final design.

For the compatibility test (if we choose this approach): the new code should pass the OLD unit tests; new unit tests don't prove anything.

@jac0626
Collaborator Author

jac0626 commented Jan 22, 2026

I will work on the engineering improvements first, and then upload a design doc soon.

@jac0626 jac0626 force-pushed the feature/columnar-search-result branch 7 times, most recently from 32e8221 to 1a3c4a2 on January 29, 2026 08:14
@jac0626 jac0626 force-pushed the feature/columnar-search-result branch from 1a3c4a2 to 86573ae on January 29, 2026 08:27
@mergify mergify bot added needs-dco and removed dco-passed labels Jan 29, 2026
@jac0626 jac0626 force-pushed the feature/columnar-search-result branch from e7db13d to ad267f8 on January 29, 2026 09:47
@mergify mergify bot added dco-passed and removed needs-dco labels Jan 29, 2026
@jac0626
Collaborator Author

jac0626 commented Jan 29, 2026

@XuanYang-cn Thanks for the feedback!

Regarding #3208: already addressed. The columnar implementation has been updated to accommodate those changes.

On engineering improvements: done, see #3240. I have also investigated some other improvements, but they made little difference.

On design: I've prepared a design doc:

To answer your specific questions:

  1. Why compatible, not replace?
    The last version provided a flag; now we are doing a direct replacement: no flag, no dual versions. "Compatible" here means the new ColumnarSearchResult maintains the same API contract as the old SearchResult, so existing user code continues to work without changes.

  2. Return types:

    • For the standard iteration API (hit.id, hit['field']), return types are identical to the original.
    • For the new get_column() API
      • return_type="list": Works for all types.
      • return_type="numpy": Returns a native np.ndarray for numeric/vector types. For complex types (JSON, dynamic, sparse, etc.), where numpy offers no benefit, we will raise an error instead of returning inefficient object arrays.
  3. to_pandas / to_arrow:
    Not implemented yet. We can add these as follow-up work.

  4. Complex types:
    Fully supported: JSON, ARRAY, dynamic fields ($meta), and all vector types are covered.
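
The `get_column()` return-type rules described above can be sketched as follows. This is a hypothetical free-function version: the `columns` and `field_types` parameters, the string type names, and the choice of `TypeError` for unsupported types are assumptions, since the comment only says "we will raise an error". numpy is imported lazily so the `"list"` path has no numpy dependency.

```python
from typing import Any, Dict, Sequence

# Hypothetical set of field types that map cleanly onto a numeric ndarray.
NUMERIC_TYPES = {"BOOL", "INT8", "INT16", "INT32", "INT64",
                 "FLOAT", "DOUBLE", "FLOAT_VECTOR"}


def get_column(columns: Dict[str, Sequence[Any]], field_types: Dict[str, str],
               name: str, return_type: str = "list") -> Any:
    """Return one output field for all hits of a query.

    return_type="list" works for every type; return_type="numpy" is allowed
    only for numeric/vector types and raises for complex ones (JSON, dynamic,
    sparse, ...) instead of building an object-dtype array.
    """
    if name not in columns:
        raise KeyError(name)
    values = columns[name]
    if return_type == "list":
        return list(values)
    if return_type == "numpy":
        if field_types[name] not in NUMERIC_TYPES:
            raise TypeError(
                f"field {name!r} of type {field_types[name]} has no numeric "
                "numpy representation; use return_type='list'")
        import numpy as np  # lazy import: only the numpy path needs it
        return np.asarray(values)
    raise ValueError(f"unknown return_type: {return_type!r}")
```

Failing fast on complex types keeps `return_type="numpy"` honest: a caller who gets an ndarray can assume a numeric dtype rather than an `object` array of Python values.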

On testing:
Currently, patch-based compatibility tests (test_columnar_compat.py) verify that the new code passes the old SearchResult tests. Full validation would benefit from e2e testing against a real Milvus instance.
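
As a general illustration of the patch-based approach (not the contents of test_columnar_compat.py), the sketch below runs an unchanged "old" test case twice: once against the original class and once with the new class patched into the module attribute the test reads. `results_module` and both toy classes are hypothetical stand-ins for the pymilvus modules.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the module old tests import the result class from.
results_module = types.ModuleType("results_module")


class SearchResult:
    """Old row-based result: builds every hit eagerly."""

    def __init__(self, ids):
        self._rows = [{"id": i} for i in ids]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        return self._rows[i]


class ColumnarSearchResult:
    """New result under test: keeps ids as a column, builds rows on access."""

    def __init__(self, ids):
        self._ids = list(ids)

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, i):
        return {"id": self._ids[i]}


results_module.SearchResult = SearchResult


class OldSearchResultTests(unittest.TestCase):
    """An existing test written against results_module.SearchResult."""

    def test_len_and_index(self):
        r = results_module.SearchResult([7, 8, 9])
        self.assertEqual(len(r), 3)
        self.assertEqual(r[1]["id"], 8)


def run_old_tests_against(cls):
    # Swap the class the old tests look up, run them unchanged, then restore.
    with mock.patch.object(results_module, "SearchResult", cls):
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(OldSearchResultTests)
        result = unittest.TestResult()
        suite.run(result)
    return result.wasSuccessful()
```

Patching only takes effect if the old tests look the class up through the module attribute at call time; a name bound earlier with `from ... import SearchResult` would keep pointing at the original class.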

Let me know if you'd like me to update anything in the design doc!
