Add documentation to dataframe query workflow and fix api rendering for bindings #11650
ntjohnson1 merged 12 commits into main
Pull Request Overview
This PR adds documentation for querying data via the Open Source Server and fixes API rendering for Python bindings classes. The changes ensure that bindings classes are properly documented by enforcing the `module = "rerun_bindings.rerun_bindings"` parameter in Rust `#[pyclass]` declarations and updating the linting infrastructure to support granular NOLINT directives.
Key Changes:
- Enhanced the linting system to check for both `eq` and `module` parameters in `#[pyclass]` declarations
- Added support for specific error codes in NOLINT directives (e.g., `NOLINT: ignore[py-cls-eq]`)
- Updated all Rust `#[pyclass]` declarations to include the required module parameter
- Added new documentation page for querying data via Open Source Server
- Improved mkdocs configuration to properly render bindings documentation
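To make the first two bullets concrete, here is a minimal sketch of how a line-based check for `eq` and `module` with code-specific NOLINT suppression could look. It is not the actual `scripts/lint.py` implementation: the function name, the regexes, and the `py-cls-module` code are assumptions for illustration (only `py-cls-eq` appears in the PR text).

```python
import re

# Required `#[pyclass]` parameters mapped to lint codes.
# ASSUMPTION: `py-cls-module` is a hypothetical code; only `py-cls-eq` is named in the PR.
REQUIRED_PARAMS = {"eq": "py-cls-eq", "module": "py-cls-module"}

# Matches `#[pyclass]` and `#[pyclass(...)]`, capturing the argument list if present.
PYCLASS_RE = re.compile(r"#\[pyclass\((?P<args>[^\]]*)\)\]|#\[pyclass\]")

# Matches a bare `NOLINT` or a code-specific `NOLINT: ignore[code-a, code-b]`.
NOLINT_RE = re.compile(r"NOLINT(?::\s*ignore\[(?P<codes>[^\]]*)\])?")


def lint_pyclass_line(line: str) -> list[str]:
    """Return the lint codes violated by a single line of Rust source."""
    match = PYCLASS_RE.search(line)
    if match is None:
        return []

    args = match.group("args") or ""
    # A parameter name is whatever precedes `=` (or stands alone) in each comma-separated item.
    present = {item.split("=")[0].strip() for item in args.split(",") if item.strip()}
    violations = [code for param, code in REQUIRED_PARAMS.items() if param not in present]

    nolint = NOLINT_RE.search(line)
    if nolint is None:
        return violations

    codes = nolint.group("codes")
    if codes is None:
        # A bare NOLINT suppresses every code on the line.
        return []

    suppressed = {code.strip() for code in codes.split(",")}
    return [code for code in violations if code not in suppressed]
```

With this shape, a declaration can opt out of one requirement while still being held to the other, which is the separation the code-specific directives are meant to provide.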
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/lint.py | Expanded linting to enforce module parameter and support code-specific NOLINT directives |
| rerun_py/src/viewer.rs | Updated NOLINT syntax and added module parameter |
| rerun_py/src/python_bridge.rs | Added module parameter to multiple pyclass declarations |
| rerun_py/src/dataframe/schema.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/rrd.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/recording_view.rs | Updated NOLINT syntax |
| rerun_py/src/dataframe/recording.rs | Updated NOLINT syntax |
| rerun_py/src/catalog/task.rs | Added module parameter to pyclass declarations |
| rerun_py/src/catalog/table_entry.rs | Added module parameter |
| rerun_py/src/catalog/mod.rs | Added module parameter to enum |
| rerun_py/src/catalog/errors.rs | Updated exception module path |
| rerun_py/src/catalog/entry.rs | Added module parameter to multiple pyclass declarations |
| rerun_py/src/catalog/dataset_entry.rs | Added module parameter |
| rerun_py/src/catalog/datafusion_table.rs | Added module parameter |
| rerun_py/src/catalog/datafusion_catalog.rs | Added module parameter |
| rerun_py/src/catalog/dataframe_rendering.rs | Added module parameter |
| rerun_py/src/catalog/dataframe_query.rs | Added module parameter |
| rerun_py/src/catalog/catalog_client.rs | Added module parameter |
| rerun_py/rerun_bindings/rerun_bindings.pyi | Added docstrings for DatasetEntry and DataframeQueryView |
| rerun_py/mkdocs.yml | Added filters and enabled stub package finding |
| rerun_py/docs/gen_common_index.py | Added class list for catalog section and bindings support |
| docs/content/getting-started/data-out/query-data.md | New documentation page for querying via Open Source Server |
`<!-- TODO(RR-2818) add link to doc -->`
`# Open source server`
let's just nuke this little section
Nuke this section as in "The open source server is still in development to reach API parity with the cloud offering and evolve with the cloud offering.", or nuke the whole top-level heading?
```python
if "oss_demo" in client.dataset_names():
    dataset = client.get_dataset_entry(name="oss_demo")
else:
    dataset = client.create_dataset(
        name="oss_demo",
    )
```
break this into 2.
- create dataset
- get a dataset
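One way the suggested split could look is sketched below. The two calls and the `oss_demo` name come from the snippet under review; the helper function names are hypothetical, and `client` is assumed to be an already-connected catalog client.

```python
def create_demo_dataset(client, name="oss_demo"):
    """Step 1: create a new dataset on the server (hypothetical helper)."""
    return client.create_dataset(name=name)


def get_demo_dataset(client, name="oss_demo"):
    """Step 2: fetch an existing dataset by name (hypothetical helper)."""
    return client.get_dataset_entry(name=name)
```

Keeping the two steps separate lets the documentation explain creation and lookup independently, instead of hiding both behind a single conditional.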
jleibs left a comment
Scanned the bindings / gen_common_index / linter changes. Thanks for working through this.
### Related
#11650 changed our NOLINT syntax around pyclass but didn't pull main before merge.
### What
Unbreaks CI.

### Related
#11650
### What
Updates the doc string for using_index_values since it was unclear to me.
…or bindings (#11650)

* Closes RR-2251
* Primary task: first pass through query-data.md to give an overview of how to start and connect to the OSS server and get to a dataframe, with pointers to datafusion for follow-on work
  * Will file an issue to make this into a snippet/tutorial so it is testable after we get a python object for the server
* Secondary task: Render docs for DataframeQueryView and lots of other bindings exposed through catalog
  * This scope creeped a lot, so I just did the manual class list; will file a follow-on ticket with a proposal to fix this more globally

1. Add lint codes so I can separate out my ignores for pyclass_eq and pyclass_module requirements
   1. The lint.py changes (which aren't great since I'm not an experienced parser writer, but don't seem awful)
3. Add a pyclass module requirement to specify `rerun_bindings.rerun_bindings`, see #11268 for more context
   1. All the rust file changes
4. Fight mkdocs to render bindings properly
   1. The mkdocs related files
   2. Adds the ability to generate our docs just off of the stubs without the shared object being present at all. NOTE: The docs actually look better this way, but I added some accommodation to make them look less awful if the shared object is present.

Filed an issue to fix all the other missing classes from our rendered api docs: https://linear.app/rerun/issue/RR-2766/cover-remaining-classes-in-python-api-rendered-docs
and an issue with a proposal to avoid this problem moving forward: https://linear.app/rerun/issue/RR-2765/change-python-public-api-surface-pattern-to-support-easier-docs
…isting (#11928)

### Related
#11650
### What
I hate seeing `<rerun_bindings.rerun_bindings.DataFusionTable object at 0x10d4220b0>` when I print things.

* Updated our lint.py to use git for file discovery. This is ~1000x faster than the package walk. Don't have a way to confirm we aren't missing new things now 😬 if not correct by inspection I guess I could try doing a set diff and manually inspecting
* I also asked a robot to generate a new lint rule to check for __str__ in our pymethods
* Implemented a whole bunch and added no lint when I got tired of it

Here is my minimal repro script to exercise most code paths:

```python
import rerun as rr

client = rr.catalog.CatalogClient("rerun://sandbox.redap.rerun.io")
print(client)
print(client.table_entries())
dataset = client.get_dataset(name="droid:raw")
print(dataset)
print(dataset.partition_table())
print(dataset.list_indexes())
first_partition = dataset.partition_ids()[0]
qv = dataset.dataframe_query_view(index="real_time", contents="/observation/gripper_position").filter_partition_id(first_partition)
print(qv)
print(rr.RecordingStream("foo"))
print(rr.GrpcSink("rerun+http://127.0.0.1:9876/proxy"))
print(rr.FileSink("out.rerun"))
```

Before

```console
CatalogClient(rerun://sandbox.redap.rerun.io:443)
[Entry(Table, '__entries')]
Entry(Dataset, 'droid:raw')
<rerun_bindings.rerun_bindings.DataFusionTable object at 0x10d4220b0>
[<rerun_bindings.rerun_bindings.IndexingResult object at 0x10d1e5b00>]
<rerun_bindings.rerun_bindings.DataframeQueryView object at 0x10d0e7660>
<rerun.recording_stream.RecordingStream object at 0x103232cf0>
<rerun_bindings.rerun_bindings.GrpcSink object at 0x10cff4540>
<rerun_bindings.rerun_bindings.FileSink object at 0x10cff4540>
```

After (a little verbose but definitely better than above and we can adjust as needed).

```console
CatalogClient(rerun://sandbox.redap.rerun.io:443)
[Entry(Table, '__entries')]
DatasetEntry(name='droid:raw', id='187514A5086F1F1D5906873e499c6436')
DataFusionTable(name='droid:raw_partition_table')
[IndexingResult(index='/camera/wrist/embedding:embeddings' on 'real_time': VectorIvfPq { num_sub_vectors: 16, metric: cosine })]
DataframeQueryView(
  dataset=DatasetEntry(name='droid:raw', id='187514A5086F1F1D5906873e499c6436'),
  query_expression=QueryExpression {
    view_contents: Some(
        ViewContentsSelector(
            {
                /observation/gripper_position: None,
            },
        ),
    ),
    include_semantically_empty_columns: false,
    include_tombstone_columns: false,
    include_static_columns: Both,
    filtered_index: Some(
        "real_time",
    ),
    filtered_index_range: None,
    filtered_index_values: None,
    using_index_values: None,
    filtered_is_not_null: None,
    sparse_fill_strategy: None,
    selection: None,
  },
  partition_ids=["ILIAD_50aee79f_2023_07_12_20h_28m_36s"]
)
RecordingStream(Some(
    StoreInfo {
        store_id: StoreId(
            Recording,
            "foo",
            "935b03a47f834753b5329a70dc991556",
        ),
        cloned_from: None,
        store_source: PythonSdk(
            3.11.13,
        ),
        store_version: Some(
            CrateVersion {
                major: 0,
                minor: 28,
                patch: 0,
                meta: Some(
                    DevAlpha {
                        alpha: 1,
                        commit: None,
                    },
                ),
            },
        ),
        is_partial: false,
    },
))
GrpcSink(ProxyUri {
    origin: Origin {
        scheme: RerunHttp,
        host: Ipv4(
            127.0.0.1,
        ),
        port: 9876,
    },
})
FileSink("out.rerun")
```
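The git-based file discovery mentioned in the first bullet could be sketched roughly as follows. This is not the code from `lint.py`: the function names and the default extension filter are assumptions, and only the standard `git ls-files` command is relied on.

```python
import subprocess
from pathlib import Path


def filter_paths(ls_files_output: str, extensions: tuple[str, ...]) -> list[Path]:
    """Keep only the listed paths whose suffix is one of `extensions`."""
    paths = (Path(line) for line in ls_files_output.splitlines() if line.strip())
    return [path for path in paths if path.suffix in extensions]


def tracked_files(repo_root: str, extensions: tuple[str, ...] = (".py", ".rs")) -> list[Path]:
    """List git-tracked files under `repo_root` instead of walking the directory tree."""
    result = subprocess.run(
        ["git", "ls-files"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when repo_root is not a repository
    )
    return filter_paths(result.stdout, extensions)
```

Plain `git ls-files` only reports tracked files, which is consistent with the worry above about missing new things; running `git ls-files --cached --others --exclude-standard` instead would also list untracked files that are not ignored.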
…ngs` prefix in the Python docs (#12448)

Fix an issue which I believe was introduced in #11650 where some symbols were displayed with the `rerun_bindings.rerun_bindings` prefix in the front page tables of the Python docs. My fix appears to do the right thing based on `pixi run py-docs-serve`.