
Add Faceted Search Support#2

Merged
mjochum64 merged 2 commits into main from feature/faceted-search
Nov 8, 2025

Conversation

@mjochum64
Owner

Summary

Implements Faceted Search functionality for data exploration and aggregation, enabling users to analyze document distributions across field values.

What is Faceted Search?

Faceted search (also called faceted navigation) allows users to explore data by showing aggregated counts for different field values. This is extremely useful for:

  • Data exploration ("What categories do I have?")
  • Filtering guidance ("Show me all programming documents")
  • Analytics ("How many documents per author?")

Changes

Core Implementation

  • SolrClient (src/server/solr_client.py):

    • Extended search() method with facet_fields: Optional[List[str]] parameter
    • Automatically sets facet=true and facet.mincount=1 when facet_fields is provided
    • Returns Solr response including facet_counts.facet_fields
  • MCP Server (src/server/mcp_server.py):

    • Added facet_fields parameter to search tool
    • Updated tool description to mention faceting
    • Enhanced logging to show when facets are requested
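
The parameter handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/server/solr_client.py: `build_search_params` is a hypothetical helper name, and the parameter set (q, wt, rows, start) is assumed from the values the review below expects.

```python
from typing import Any, Dict, List, Optional


def build_search_params(
    query: str,
    rows: int = 10,
    start: int = 0,
    facet_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the query parameters for a Solr search."""
    params: Dict[str, Any] = {"q": query, "wt": "json", "rows": rows, "start": start}
    if facet_fields:
        # Faceting is switched on only when at least one field is requested.
        params["facet"] = "true"
        # A list value stands for one facet.field entry per requested field.
        params["facet.field"] = list(facet_fields)
        # Hide facet values that match no documents.
        params["facet.mincount"] = 1
    return params


params = build_search_params("*:*", facet_fields=["category", "author"])
plain_params = build_search_params("*:*")
```

Leaving the facet keys out entirely when no fields are requested keeps non-faceted requests identical to what they were before this change.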

Testing

  • Unit Tests (tests/test_server.py):

    • test_search_tool_with_facets: Tests MCP tool with facet parameters
    • test_solr_client_search_with_facets: Tests SolrClient faceting directly
    • All 7 tests passing ✅
  • Integration Testing:

    • Verified with MCP Inspector GUI
    • Tested with live Solr instance
    • Confirmed facet counts for category and author fields

Documentation

  • README.md:
    • Added "Faceted Search" to features list
    • New example: "Using faceted search"
    • Extended MCP Inspector guide with facet usage
    • Updated search tool description

Example Usage

Request

{
  "query": "*:*",
  "rows": 10,
  "facet_fields": ["category", "author"]
}

Response

{
  "response": {
    "numFound": 10,
    "docs": [...]
  },
  "facet_counts": {
    "facet_fields": {
      "category": ["programming", 3, "technology", 3, "database", 1, ...],
      "author": ["john", 2, "smith", 2, "alice", 1, ...]
    }
  }
}
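
Solr returns each facet as a flat list that alternates values and counts. A caller may prefer a mapping per field; the sketch below shows one way to convert it. `facet_counts_to_dicts` is a hypothetical helper, not part of this PR, and the sample data is the example above with the elided entries left out.

```python
from typing import Any, Dict


def facet_counts_to_dicts(solr_response: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: turn flat [value, count, ...] lists into dicts."""
    facet_fields = solr_response.get("facet_counts", {}).get("facet_fields", {})
    result: Dict[str, Dict[str, int]] = {}
    for field, flat in facet_fields.items():
        # Even positions hold the values, odd positions hold their counts.
        result[field] = dict(zip(flat[::2], flat[1::2]))
    return result


sample = {
    "facet_counts": {
        "facet_fields": {
            "category": ["programming", 3, "technology", 3, "database", 1],
            "author": ["john", 2, "smith", 2, "alice", 1],
        }
    }
}
counts = facet_counts_to_dicts(sample)
```

A response without a facet_counts key yields an empty mapping, so the same helper can be applied to non-faceted searches.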

Test Results

============================= test session starts ==============================
tests/test_server.py::test_search_solr_resource PASSED                   [ 14%]
tests/test_server.py::test_search_tool PASSED                            [ 28%]
tests/test_server.py::test_get_document_tool PASSED                      [ 42%]
tests/test_server.py::test_solr_client_search PASSED                     [ 57%]
tests/test_server.py::test_solr_client_get_document PASSED               [ 71%]
tests/test_server.py::test_search_tool_with_facets PASSED                [ 85%]
tests/test_server.py::test_solr_client_search_with_facets PASSED         [100%]

============================== 7 passed in 0.21s

Benefits for LLMs

Faceted search enables LLMs to:

  1. Explore: "What categories are in this dataset?"
  2. Analyze: "How many documents per category?"
  3. Guide searches: "Show me all programming documents"
  4. Understand data distribution: "Who are the most prolific authors?"

Future Enhancements

Potential follow-up features:

  • Facet ranges (for numeric/date fields)
  • Facet pivots (multi-dimensional analysis)
  • Facet queries (custom facet definitions)
  • Highlighting support
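
For orientation, the follow-up features listed above correspond to existing Solr request parameters. The sketch below only shows what such a parameter set might look like; none of it is implemented in this PR, `future_facet_params` is a hypothetical name, and the `published` field is an assumed date field that may not exist in the schema.

```python
from typing import Any, Dict


def future_facet_params() -> Dict[str, Any]:
    """Hypothetical sketch of Solr parameters for the follow-up features."""
    return {
        "facet": "true",
        # Facet ranges: bucket an assumed date field into one-year steps.
        "facet.range": "published",
        "facet.range.start": "NOW/YEAR-5YEARS",
        "facet.range.end": "NOW",
        "facet.range.gap": "+1YEAR",
        # Facet pivots: counts of authors nested within each category.
        "facet.pivot": "category,author",
        # Facet queries: count documents matching an arbitrary query.
        "facet.query": "author:john",
    }


extra_params = future_facet_params()
```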

🤖 Generated with Claude Code

**New Features:**
- Extend SolrClient.search() with facet_fields parameter
- Add facet support to MCP search tool
- Facet counts returned in response under facet_counts.facet_fields

**Implementation Details:**
- facet_fields accepts list of field names (e.g., ["category", "author"])
- Automatically sets facet=true and facet.mincount=1 in Solr query
- Response includes aggregated counts per field value

**Testing:**
- Added 2 new unit tests for faceted search
- Verified with MCP Inspector and live Solr instance
- All 7 unit tests passing

**Documentation:**
- Updated README.md with faceted search examples
- Added MCP Inspector usage guide for facets
- Updated feature list to highlight faceting capability

Example usage:
```json
{
  "query": "*:*",
  "rows": 10,
  "facet_fields": ["category", "author"]
}
```

Response includes facet_counts with category/author distributions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@gemini-code-assist

Summary of Changes

Hello @mjochum64, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates faceted search capabilities into the system, enabling more dynamic data exploration and aggregation. By extending the SolrClient and the MCP Server's search tool, users can now request aggregated counts for specified fields, which is invaluable for understanding data distributions, guiding searches, and performing basic analytics. The changes are thoroughly tested and documented, providing a robust and user-friendly enhancement to the existing search functionality.

Highlights

  • Faceted Search Implementation: Introduced comprehensive support for faceted search, allowing users to explore and aggregate data based on field values.
  • SolrClient Enhancement: The SolrClient.search() method was extended to accept an optional facet_fields parameter, automatically configuring Solr for faceting when provided.
  • MCP Server Integration: The search tool in the MCP Server now supports the facet_fields parameter, passing it to the Solr client and enhancing logging to reflect facet requests.
  • Testing and Documentation: New unit tests were added for both the SolrClient and the search tool to validate faceting functionality, and the README.md was updated with feature descriptions, examples, and usage instructions.

**Version Updates:**
- pyproject.toml: 1.0.0 → 1.2.0
- README.md: 1.1.0 → 1.2.0

**CHANGELOG Updates:**
- Added missing [1.1.0] entry for MCP 1.21.0 modernization
- Added [1.2.0] entry for Faceted Search feature

**Version History:**
- 1.0.0 (2025-04-26): Initial release
- 1.1.0 (2025-11-08): MCP 1.21.0 modernization
- 1.2.0 (2025-11-08): Faceted Search support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively introduces faceted search capabilities, a valuable feature for data exploration. The changes in the Solr client and MCP server are logical and well-implemented. The inclusion of new tests is also a great step. My review focuses primarily on enhancing the new tests to make them more robust and readable, ensuring they fully validate the new functionality.

Comment on lines +248 to +252
async def get(*args, **kwargs):
    return mock_response

mock_client = AsyncMock()
mock_client.__aenter__.return_value.get = get


high

This test checks the final output but misses a crucial validation: ensuring that the SolrClient constructs the correct HTTP request with faceting parameters. The current mock setup with a local get function makes it difficult to inspect the call arguments.

By refactoring to use a dedicated AsyncMock for the get method, you can then assert that httpx was called with the expected faceting parameters (facet=true, facet.field, etc.), making the test much more robust.

After applying the suggestion, you can add these assertions to the end of the test:

# Verify call parameters
get_mock.assert_called_once()
_, kwargs = get_mock.call_args
expected_params = {
    'q': '*:*',
    'wt': 'json',
    'rows': 10,
    'start': 0,
    'facet': 'true',
    'facet.field': ['category'],
    'facet.mincount': 1
}
assert kwargs['params'] == expected_params

Suggested change
    get_mock = AsyncMock(return_value=mock_response)
    mock_client = AsyncMock()
    mock_client.__aenter__.return_value.get = get_mock
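
A self-contained sketch of the pattern the reviewer describes: replacing the hand-written `get` coroutine with an `AsyncMock` so the call arguments can be inspected. `search_with_facets` is a hypothetical stand-in for `SolrClient.search()`, and the URL is a placeholder; neither is taken from the project.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def search_with_facets(client, base_url, facet_fields):
    """Hypothetical stand-in for SolrClient.search(); not the project's code."""
    params = {
        "q": "*:*",
        "wt": "json",
        "rows": 10,
        "start": 0,
        "facet": "true",
        "facet.field": list(facet_fields),
        "facet.mincount": 1,
    }
    response = await client.get(f"{base_url}/select", params=params)
    return response.json()


async def run_check():
    # The response object is a plain MagicMock because json() is called without await.
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "facet_counts": {"facet_fields": {"category": ["programming", 3]}}
    }
    # AsyncMock records every awaited call, so its arguments can be asserted on.
    get_mock = AsyncMock(return_value=mock_response)
    client = MagicMock()
    client.get = get_mock

    result = await search_with_facets(client, "http://localhost:8983/solr/docs", ["category"])

    get_mock.assert_called_once()
    _, kwargs = get_mock.call_args
    return result, kwargs["params"]


result, sent_params = asyncio.run(run_check())
```

With the original local `get` function the request parameters are never recorded, so a regression that dropped `facet.field` from the request would go unnoticed as long as the mocked response still contained facets.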

Comment on lines +218 to +220
category_facets = result["facet_counts"]["facet_fields"]["category"]
assert "programming" in category_facets
assert category_facets[category_facets.index("programming") + 1] == 3


medium

The current method of asserting facet counts by finding the index of the value and checking the next element in the list works, but it can be hard to read and maintain. A more readable and robust approach is to convert Solr's flat list response (['value1', count1, 'value2', count2]) into a dictionary for assertions. This makes the test's intent much clearer.

Suggested change
- category_facets = result["facet_counts"]["facet_fields"]["category"]
- assert "programming" in category_facets
- assert category_facets[category_facets.index("programming") + 1] == 3
+ category_facets_list = result["facet_counts"]["facet_fields"]["category"]
+ category_facets = dict(zip(category_facets_list[::2], category_facets_list[1::2]))
+ assert category_facets.get("programming") == 3

assert category_facets[category_facets.index("programming") + 1] == 3

# Verify ctx.info was called
assert mock_context.info.called


medium

This test verifies the output of the search tool, but it doesn't confirm that the facet_fields argument is correctly passed to the underlying solr_client.search method. Adding an assertion to check the arguments of the mocked call will make the test more thorough and prevent regressions.

    assert mock_context.info.called

    # Verify that the solr_client.search was called with the correct facet fields
    mock_context.request_context.lifespan_context.solr_client.search.assert_called_once_with(
        query="*:*",
        filter_query=None,
        sort=None,
        rows=10,
        start=0,
        facet_fields=["category", "author"]
    )
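
The same check as a runnable sketch, including a demonstration that the assertion fails when `facet_fields` is not forwarded. `search_tool` here is a hypothetical stand-in for the MCP tool, written only to exercise the mock; the keyword arguments mirror the ones in the suggestion above.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def search_tool(ctx, query, rows=10, start=0, facet_fields=None):
    """Hypothetical stand-in for the MCP search tool; not the project's code."""
    solr_client = ctx.request_context.lifespan_context.solr_client
    return await solr_client.search(
        query=query,
        filter_query=None,
        sort=None,
        rows=rows,
        start=start,
        facet_fields=facet_fields,
    )


mock_context = MagicMock()
search_mock = AsyncMock(return_value={"facet_counts": {"facet_fields": {}}})
mock_context.request_context.lifespan_context.solr_client.search = search_mock

asyncio.run(search_tool(mock_context, "*:*", facet_fields=["category", "author"]))

# Passes: the facet fields reached the client unchanged.
search_mock.assert_called_once_with(
    query="*:*",
    filter_query=None,
    sort=None,
    rows=10,
    start=0,
    facet_fields=["category", "author"],
)

# The same assertion with facet_fields=None raises, which is the regression it guards against.
try:
    search_mock.assert_called_once_with(
        query="*:*", filter_query=None, sort=None, rows=10, start=0, facet_fields=None
    )
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```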

Comment on lines +274 to +278
category_facets = result["facet_counts"]["facet_fields"]["category"]
assert "programming" in category_facets
assert "technology" in category_facets
assert category_facets[category_facets.index("programming") + 1] == 3
assert category_facets[category_facets.index("technology") + 1] == 3


medium

Similar to the other new test, the facet count assertions here can be made more readable and maintainable. Converting the flat list from Solr's response into a dictionary makes the assertions clearer and less brittle.

        category_facets_list = result["facet_counts"]["facet_fields"]["category"]
        category_facets = dict(zip(category_facets_list[::2], category_facets_list[1::2]))
        assert category_facets.get("programming") == 3
        assert category_facets.get("technology") == 3

@mjochum64 mjochum64 merged commit eb03640 into main Nov 8, 2025
1 check failed
@mjochum64 mjochum64 deleted the feature/faceted-search branch November 8, 2025 19:03