
Add documentation to dataframe query workflow and fix api rendering for bindings #11650

Merged: ntjohnson1 merged 12 commits into main from nick/query_data
Oct 29, 2025

ntjohnson1 (Member) commented Oct 24, 2025

Related

  • Closes RR-2251

What

  • Primary task: a first pass through query-data.md that gives an overview of how to start and connect to the OSS server and get to a dataframe, with pointers to DataFusion for follow-on work
    • Will file an issue to turn this into a snippet/tutorial so it becomes testable once we have a Python object for the server
  • Secondary task: render docs for DataframeQueryView and many other bindings exposed through catalog
    • The scope crept a lot, so I just did the manual class list; I will file a follow-on ticket with a proposal to fix this more globally
    1. Add lint codes so I can separate my ignores for the pyclass_eq and pyclass_module requirements
      1. The lint.py changes (which aren't great since I'm not an experienced parser writer, but don't seem awful)
    2. Add a pyclass module requirement to specify rerun_bindings.rerun_bindings; see "Add module definition to all pyclasses" #11268 for more context
      1. All the Rust file changes
    3. Fight mkdocs to render bindings properly
      1. The mkdocs-related files
      2. Adds the ability to generate our docs from the stubs alone, without the shared object being present at all. NOTE: The docs actually look better this way, but I added some accommodation to make them look less awful if the shared object is present.
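To make the `#[pyclass]` requirements above concrete: every `#[pyclass]` attribute is expected to carry both `eq` and `module = "rerun_bindings.rerun_bindings"`. The sketch below is illustrative only and is not the actual `scripts/lint.py`; the helper name `check_pyclass_line`, the regular expressions, and the error messages are assumptions, and it only inspects single-line attributes.

```python
import re

# Module path that every #[pyclass] is expected to declare, per this PR.
REQUIRED_MODULE = "rerun_bindings.rerun_bindings"

# Matches `#[pyclass]` or `#[pyclass(...)]` on a single line (hypothetical:
# attributes split across several lines are not handled).
PYCLASS_RE = re.compile(r"#\[pyclass(?:\((?P<args>.*)\))?\]")
MODULE_RE = re.compile(r'\bmodule\s*=\s*"' + re.escape(REQUIRED_MODULE) + '"')
EQ_RE = re.compile(r"\beq\b")


def check_pyclass_line(line: str) -> list[str]:
    """Return lint errors for a single line of Rust source (hypothetical helper)."""
    match = PYCLASS_RE.search(line)
    if match is None:
        return []  # not a pyclass attribute
    args = match.group("args") or ""  # bare `#[pyclass]` has no arguments
    errors = []
    if not EQ_RE.search(args):
        errors.append("#[pyclass] is missing `eq`")
    if not MODULE_RE.search(args):
        errors.append(f'#[pyclass] is missing `module = "{REQUIRED_MODULE}"`')
    return errors
```

Reporting the two problems as separate errors is what makes it possible to silence one requirement on a given line while still enforcing the other.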

Filed an issue to fix all the other classes missing from our rendered API docs: https://linear.app/rerun/issue/RR-2766/cover-remaining-classes-in-python-api-rendered-docs
and an issue with a proposal to avoid this problem going forward:
https://linear.app/rerun/issue/RR-2765/change-python-public-api-surface-pattern-to-support-easier-docs

github-actions bot commented Oct 24, 2025

Web viewer built successfully.

| Commit | Link | Manifest |
| --- | --- | --- |
| 31a9f2a | https://rerun.io/viewer/pr/11650 | +nightly +main |

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

github-actions bot commented Oct 24, 2025

Latest documentation preview deployed successfully.

| Commit | Link |
| --- | --- |
| 31a9f2a | https://landing-d6idazyq6-rerun.vercel.app/docs |

Note: This comment is updated whenever you push a commit.

ntjohnson1 added the sdk-python, deploy docs, include in changelog, 📖 documentation, and 🧑‍💻 dev experience labels and removed the sdk-python label Oct 24, 2025
github-actions bot commented:

Your changes cannot be automatically cherry-picked to docs-latest.

You should remove the deploy docs label and perform the cherry-pick manually after merging.

ntjohnson1 removed the deploy docs label Oct 24, 2025
ntjohnson1 requested a review from Copilot October 24, 2025 11:22

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds documentation for querying data via the Open Source Server and fixes API rendering for Python bindings classes. The changes ensure that bindings classes are properly documented by enforcing the module = "rerun_bindings.rerun_bindings" parameter in Rust #[pyclass] declarations and updating the linting infrastructure to support granular NOLINT directives.

Key Changes:

  • Enhanced the linting system to check for both eq and module parameters in #[pyclass] declarations
  • Added support for specific error codes in NOLINT directives (e.g., NOLINT: ignore[py-cls-eq])
  • Updated all Rust #[pyclass] declarations to include the required module parameter
  • Added new documentation page for querying data via Open Source Server
  • Improved mkdocs configuration to properly render bindings documentation
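For illustration, a code-specific directive such as `NOLINT: ignore[py-cls-eq]` could be interpreted as below. This is a minimal sketch, not the real lint.py logic: the grammar (comma-separated codes inside `ignore[...]`), the rule that a bare `NOLINT` silences every code, and the second code name `py-cls-module` used in the example are assumptions.

```python
import re

# Hypothetical grammar: a bare `NOLINT` silences every check on the line,
# while `NOLINT: ignore[code-a, code-b]` silences only the listed codes.
NOLINT_RE = re.compile(r"NOLINT(?::\s*ignore\[(?P<codes>[^\]]*)\])?")


def is_suppressed(line: str, code: str) -> bool:
    """Return True if `line` carries a NOLINT directive covering `code`."""
    match = NOLINT_RE.search(line)
    if match is None:
        return False  # no directive on this line
    codes = match.group("codes")
    if codes is None:
        return True  # bare NOLINT: suppress everything
    return code in {c.strip() for c in codes.split(",")}
```

Keeping the codes explicit means a line that deliberately opts out of `eq` is still checked for `module`, which is the separation the PR description asks for.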

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| scripts/lint.py | Expanded linting to enforce module parameter and support code-specific NOLINT directives |
| rerun_py/src/viewer.rs | Updated NOLINT syntax and added module parameter |
| rerun_py/src/python_bridge.rs | Added module parameter to multiple pyclass declarations |
| rerun_py/src/dataframe/schema.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/rrd.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/recording_view.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/recording.rs | Updated NOLINT syntax |
| rerun_py/src/catalog/task.rs | Added module parameter to pyclass declarations |
| rerun_py/src/catalog/table_entry.rs | Added module parameter |
| rerun_py/src/catalog/mod.rs | Added module parameter to enum |
| rerun_py/src/catalog/errors.rs | Updated exception module path |
| rerun_py/src/catalog/entry.rs | Added module parameter to multiple pyclass declarations |
| rerun_py/src/catalog/dataset_entry.rs | Added module parameter |
| rerun_py/src/catalog/datafusion_table.rs | Added module parameter |
| rerun_py/src/catalog/datafusion_catalog.rs | Added module parameter |
| rerun_py/src/catalog/dataframe_rendering.rs | Added module parameter |
| rerun_py/src/catalog/dataframe_query.rs | Added module parameter |
| rerun_py/src/catalog/catalog_client.rs | Added module parameter |
| rerun_py/rerun_bindings/rerun_bindings.pyi | Added docstrings for DatasetEntry and DataframeQueryView |
| rerun_py/mkdocs.yml | Added filters and enabled stub package finding |
| rerun_py/docs/gen_common_index.py | Added class list for catalog section and bindings support |
| docs/content/getting-started/data-out/query-data.md | New documentation page for querying via Open Source Server |


bllchmbrs (Contributor) left a comment

LGTM, made some nits.


<!-- TODO(RR-2818) add link to doc -->

# Open source server
bllchmbrs (Contributor): let's just nuke this little section

ntjohnson1 (Member Author): Nuke this section, as in "The open source server is still in development to reach API parity with the cloud offering and evolve with the cloud offering.", or nuke the whole top-level heading?

Comment on lines +70 to +77
```python
if "oss_demo" in client.dataset_names():
    dataset = client.get_dataset_entry(name="oss_demo")
else:
    dataset = client.create_dataset(
        name="oss_demo",
    )
```
bllchmbrs (Contributor): break this into 2:

  1. create dataset
  2. get a dataset
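Following the suggestion, the docs could present creating and fetching a dataset as two separate steps instead of one `if`/`else`. The sketch wraps each step in a small function; the function names and the `DATASET_NAME` constant are hypothetical, and it assumes only the `client.create_dataset(name=...)` and `client.get_dataset_entry(name=...)` calls that appear in the snippet under review. `client` stands for whatever catalog client the page has already connected.

```python
DATASET_NAME = "oss_demo"


def create_demo_dataset(client, name: str = DATASET_NAME):
    """Step 1: create the dataset on the server (done once)."""
    return client.create_dataset(name=name)


def get_demo_dataset(client, name: str = DATASET_NAME):
    """Step 2: fetch the existing dataset entry by name."""
    return client.get_dataset_entry(name=name)
```

With a connected client, `create_demo_dataset(client)` would run the first time, and later sessions would call `get_demo_dataset(client)`. Each step then reads on its own, without the existence check.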

jleibs (Member) left a comment

Scanned the bindings / gen_common_index / linter changes. Thanks for working through this.

ntjohnson1 merged commit 5862d3d into main Oct 29, 2025 (41 checks passed)
ntjohnson1 deleted the nick/query_data branch October 29, 2025 18:10
ntjohnson1 mentioned this pull request Oct 29, 2025
timsaucer pushed a commit that referenced this pull request Oct 29, 2025
### Related
#11650 changed our NOLINT syntax
around pyclass but didn't pull main before merge.

### What

Unbreaks CI.
ntjohnson1 added a commit that referenced this pull request Oct 30, 2025
### Related
#11650

### What

Updates the doc string for using_index_values since it was unclear to
me.
ntjohnson1 added a commit that referenced this pull request Oct 30, 2025
…or bindings (#11650)

* Closes RR-2251

ntjohnson1 added a commit that referenced this pull request Oct 30, 2025
ntjohnson1 added a commit that referenced this pull request Oct 30, 2025
ntjohnson1 added a commit that referenced this pull request Nov 3, 2025
…or bindings (#11650)

* Closes RR-2251
ntjohnson1 added a commit that referenced this pull request Nov 3, 2025
ntjohnson1 added a commit that referenced this pull request Nov 20, 2025
…isting (#11928)

### Related
#11650

### What
I hate seeing `<rerun_bindings.rerun_bindings.DataFusionTable object at 0x10d4220b0>` when I print things.
* Updated our lint.py to use git for file discovery. This is ~1000x
faster than the package walk. I don't have a way to confirm we aren't
missing new things now 😬; if it isn't correct by inspection, I could
try doing a set diff and manually inspecting the result.
* I also asked a robot to generate a new lint rule that checks for `__str__`
in our pymethods.
* Implemented `__str__` for a whole bunch of classes and added NOLINT when I got tired of it.
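The two lint changes above could look roughly like this: ask git for tracked files instead of walking the package tree, and flag `#[pymethods]` impl blocks that do not define `__str__`. This is a hypothetical sketch, not the merged code. The helper names are assumptions, and the brace matching is naive: it ignores braces inside strings and comments and expects a plain `impl Name {` header.

```python
import re
import subprocess
from pathlib import Path

PYMETHODS_RE = re.compile(r"#\[pymethods\]\s*impl\s+(?P<name>\w+)")


def tracked_files(root: Path, suffix: str = ".rs") -> list[Path]:
    """List files tracked by git under `root` that end with `suffix`."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],  # NUL-separated, so unusual file names are safe
        cwd=root, capture_output=True, text=True, check=True,
    ).stdout
    return [root / p for p in out.split("\0") if p.endswith(suffix)]


def pymethods_missing_str(source: str) -> list[str]:
    """Return the type names of `#[pymethods] impl` blocks without `fn __str__`."""
    missing = []
    for match in PYMETHODS_RE.finditer(source):
        start = source.index("{", match.end())  # opening brace of the impl block
        depth = 0
        end = start
        for end in range(start, len(source)):
            if source[end] == "{":
                depth += 1
            elif source[end] == "}":
                depth -= 1
                if depth == 0:
                    break  # found the matching closing brace
        body = source[start : end + 1]
        if not re.search(r"\bfn\s+__str__\b", body):
            missing.append(match.group("name"))
    return missing
```

Asking git for the file list is cheap because git already keeps an index of tracked paths, whereas a package walk has to visit every directory, including build output.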

Here is my minimal repro script to exercise most code paths:
```python
import rerun as rr
client = rr.catalog.CatalogClient("rerun://sandbox.redap.rerun.io")
print(client)
print(client.table_entries())
dataset = client.get_dataset(name="droid:raw")
print(dataset)
print(dataset.partition_table())
print(dataset.list_indexes())
first_partition = dataset.partition_ids()[0]
qv = dataset.dataframe_query_view(index="real_time", contents="/observation/gripper_position").filter_partition_id(first_partition)
print(qv)
print(rr.RecordingStream("foo"))
print(rr.GrpcSink("rerun+http://127.0.0.1:9876/proxy"))
print(rr.FileSink("out.rerun"))
```

Before
```console
CatalogClient(rerun://sandbox.redap.rerun.io:443)
[Entry(Table, '__entries')]
Entry(Dataset, 'droid:raw')
<rerun_bindings.rerun_bindings.DataFusionTable object at 0x10d4220b0>
[<rerun_bindings.rerun_bindings.IndexingResult object at 0x10d1e5b00>]
<rerun_bindings.rerun_bindings.DataframeQueryView object at 0x10d0e7660>
<rerun.recording_stream.RecordingStream object at 0x103232cf0>
<rerun_bindings.rerun_bindings.GrpcSink object at 0x10cff4540>
<rerun_bindings.rerun_bindings.FileSink object at 0x10cff4540>
```

After (a little verbose, but definitely better than above; we can
adjust as needed):

```console
CatalogClient(rerun://sandbox.redap.rerun.io:443)
[Entry(Table, '__entries')]
DatasetEntry(name='droid:raw', id='187514A5086F1F1D5906873e499c6436')
DataFusionTable(name='droid:raw_partition_table')
[IndexingResult(index='/camera/wrist/embedding:embeddings' on 'real_time': VectorIvfPq { num_sub_vectors: 16, metric: cosine })]
DataframeQueryView(
 dataset=DatasetEntry(name='droid:raw', id='187514A5086F1F1D5906873e499c6436'),
 query_expression=QueryExpression {
     view_contents: Some(
         ViewContentsSelector(
             {
                 /observation/gripper_position: None,
             },
         ),
     ),
     include_semantically_empty_columns: false,
     include_tombstone_columns: false,
     include_static_columns: Both,
     filtered_index: Some(
         "real_time",
     ),
     filtered_index_range: None,
     filtered_index_values: None,
     using_index_values: None,
     filtered_is_not_null: None,
     sparse_fill_strategy: None,
     selection: None,
 },
 partition_ids=["ILIAD_50aee79f_2023_07_12_20h_28m_36s"]
)
RecordingStream(Some(
    StoreInfo {
        store_id: StoreId(
            Recording,
            "foo",
            "935b03a47f834753b5329a70dc991556",
        ),
        cloned_from: None,
        store_source: PythonSdk(
            3.11.13,
        ),
        store_version: Some(
            CrateVersion {
                major: 0,
                minor: 28,
                patch: 0,
                meta: Some(
                    DevAlpha {
                        alpha: 1,
                        commit: None,
                    },
                ),
            },
        ),
        is_partial: false,
    },
))
GrpcSink(ProxyUri {
    origin: Origin {
        scheme: RerunHttp,
        host: Ipv4(
            127.0.0.1,
        ),
        port: 9876,
    },
})
FileSink("out.rerun")
```
ntjohnson1 pushed a commit that referenced this pull request Jan 15, 2026
…ngs` prefix in the Python docs (#12448)

Fix an issue which I believe was introduced in #11650 where some symbols
were displayed with the `rerun_bindings.rerun_bindings` prefix in the front
page tables of the Python docs:

[Screenshot of the affected front-page tables]

My fix appears to do the right thing based on `pixi run py-docs-serve`.
nikolausWest pushed a commit that referenced this pull request Jan 15, 2026
…ngs` prefix in the Python docs (#12448)

Wumpf pushed a commit that referenced this pull request Jan 15, 2026
…ngs` prefix in the Python docs (#12448)


Labels

🧑‍💻 dev experience, 📖 documentation, include in changelog
