Conversation
Pull request overview
This PR ports the core dataset/mission management logic from Python to a Rust-backed _core extension module (via PyO3/maturin), while keeping a thin Python API surface and updating tests to reflect the new on-disk state (.e4edm.db).
Changes:
- Introduces a Rust implementation for manifests, metadata IO, dataset operations, and manager/config persistence (SQLite).
- Reworks the Python modules (`core.py`, `data.py`, `exception.py`) into wrappers around the Rust `_core` module, and removes the `schema`-dependency-based metadata validation.
- Updates the Python tests to expect `.e4edm.db` and adjusts validation/duplicate assertions.
Reviewed changes
Copilot reviewed 17 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/lib.rs | PyO3 module exports (PyDataManager, PyDataset, PyMission) and Python exception mapping |
| src/dataset.rs | Main dataset lifecycle logic: create/load, stage/commit, validate, duplicate, zip, completeness checks |
| src/db.rs | SQLite storage layer for dataset and manager state |
| src/manifest.rs | Manifest hashing/serialization and validation helpers |
| src/metadata.rs | metadata.json read/write and JSON formatting helpers |
| src/manager.rs | Manager state persistence in config.db |
| src/errors.rs | Centralized Rust error types (mapped into Python exceptions) |
| e4e_data_management/core.py | Python DataManager wrapper over Rust _core |
| e4e_data_management/data.py | Slimmed Python Dataset wrapper + compatibility Manifest helper |
| e4e_data_management/exception.py | Switches Python exceptions to Rust-defined exception types |
| e4e_data_management/metadata.py | Removes schema validation; checks required keys only |
| tests/test_*.py | Updates expected state files (.e4edm.db) and validation patterns |
| Cargo.toml / Cargo.lock | Adds Rust crate + dependency lock for the new extension module |
| pyproject.toml | Moves to PEP 621 + maturin build backend; adds maturin dev dependency |
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Pull request overview
Copilot reviewed 23 out of 27 changed files in this pull request and generated 4 comments.
@copilot open a new pull request to apply changes based on the comments in this thread
…ir copy verification, release workflow outputs Co-authored-by: ccrutchf <[email protected]> Agent-Logs-Url: https://github.com/UCSD-E4E/e4e-data-management/sessions/fab26efe-bc66-4640-80a8-8b9fb1e588f4
fix: address review feedback on Rust port
@copilot can you review again and recreate comments if they still apply? I have resolved them to remove the noise.
Pull request overview
Copilot reviewed 24 out of 28 changed files in this pull request and generated 5 comments.
```rust
pub fn add_mission_committed_files(
    &self,
    mission_name: &str,
    files: &[String],
) -> Result<()> {
    for f in files {
        self.conn.execute(
            "INSERT OR IGNORE INTO mission_committed_files (mission_name, path) VALUES (?1, ?2)",
            params![mission_name, f],
        )?;
    }
    Ok(())
}
```
add_mission_committed_files() uses INSERT OR IGNORE, but (as currently defined) mission_committed_files doesn’t have a UNIQUE constraint that would trigger an “ignore”, so this will insert duplicates on repeated calls. Either add a UNIQUE constraint on (mission_name, path) and keep OR IGNORE, or change the write path to clear/rewrite committed files for a mission before inserting.
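The first option can be sketched with Python's stdlib `sqlite3` (the table and column names follow the snippet above; the full schema is an assumption): once `(mission_name, path)` carries a `UNIQUE` constraint, `INSERT OR IGNORE` actually has a conflict to ignore and repeated calls stay idempotent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With a UNIQUE constraint, INSERT OR IGNORE has a conflict to ignore.
conn.execute(
    "CREATE TABLE mission_committed_files ("
    "mission_name TEXT, path TEXT, "
    "UNIQUE (mission_name, path))"
)
for _ in range(3):  # repeated calls no longer insert duplicates
    conn.execute(
        "INSERT OR IGNORE INTO mission_committed_files (mission_name, path) "
        "VALUES (?, ?)",
        ("mission1", "data/file.bin"),
    )
count = conn.execute(
    "SELECT COUNT(*) FROM mission_committed_files").fetchone()[0]
print(count)  # 1
```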
```rust
for rel_path in manifest_data.keys() {
    let src = state.root.join(rel_path);
    let arc_path = format!("{}/{}", dataset_name, rel_path);
    zip.start_file(&arc_path, options)
        .map_err(|e| crate::errors::E4EError::Runtime(e.to_string()))?;
    let data = fs::read(&src)?;
    zip.write_all(&data)?;
}
```
create_zip() reads each file fully into memory (fs::read) before writing to the zip. For large datasets this can cause high memory usage or OOM. Prefer streaming the file into the zip writer (e.g., open File, wrap in BufReader, then std::io::copy into the zip entry) to keep memory bounded.
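In Rust this would be `std::io::copy` from a `BufReader` into the zip entry. A stdlib Python analogue of the same bounded-memory pattern (file names here are illustrative) streams the file in fixed-size chunks via `shutil.copyfileobj` instead of loading it whole:

```python
import os
import shutil
import tempfile
import zipfile

# Create a 1 MiB sample file to archive.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "large.bin")
with open(src, "wb") as f:
    f.write(b"x" * 1024 * 1024)

zip_path = os.path.join(tmpdir, "dataset.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    # Stream the file into the archive in 64 KiB chunks; peak memory
    # stays bounded regardless of the file size.
    with open(src, "rb") as fh, zf.open("dataset/large.bin", "w") as entry:
        shutil.copyfileobj(fh, entry, length=64 * 1024)

with zipfile.ZipFile(zip_path) as zf:
    size = zf.getinfo("dataset/large.bin").file_size
print(size)  # 1048576
```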
```rust
#[getter]
fn active_dataset(&mut self) -> PyResult<Option<PyDataset>> {
    match self.ensure_active_dataset() {
        Ok(ds) => Ok(Some(PyDataset { inner: ds.clone() })),
        Err(E4EError::Runtime(_)) => Ok(None),
        Err(e) => Err(e.into()),
    }
}

#[getter]
fn active_mission(&mut self) -> PyResult<Option<PyMission>> {
    let mission_name = match self.active_mission_name.clone() {
        Some(n) if !n.is_empty() => n,
        _ => return Ok(None),
    };
    match self.ensure_active_dataset() {
        Ok(ds) => {
            let mission = ds
                .missions
                .iter()
                .find(|m| m.record.name == mission_name)
                .map(|m| PyMission { inner: m.clone() });
            Ok(mission)
        }
        Err(E4EError::Runtime(_)) => Ok(None),
        Err(e) => Err(e.into()),
    }
}
```
active_dataset (and active_mission) treat any E4EError::Runtime(_) from ensure_active_dataset() as “not active” and return None. This can silently swallow real runtime failures from load_dataset_state (e.g., corrupted dataset, missing files) instead of surfacing them to Python callers. Consider introducing a dedicated error variant for “Dataset not active” (or otherwise distinguishing that case) and only mapping that specific error to None.
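The distinction being asked for can be sketched in Python with hypothetical exception names (in Rust it would be a dedicated `E4EError` variant): catch only the "not active" error and map it to `None`, while any other runtime failure propagates to the caller.

```python
class E4ERuntimeError(Exception):
    """Real runtime failure (corrupt dataset, missing files, ...)."""

class DatasetNotActiveError(E4ERuntimeError):
    """Dedicated variant: no dataset is currently active."""

def ensure_active_dataset(state: dict) -> str:
    # Hypothetical stand-in for the Rust ensure_active_dataset().
    if state.get("active") is None:
        raise DatasetNotActiveError("no active dataset")
    if state.get("corrupt"):
        raise E4ERuntimeError("dataset state is corrupted")
    return state["active"]

def active_dataset(state: dict):
    try:
        return ensure_active_dataset(state)
    except DatasetNotActiveError:
        return None  # only the "not active" case maps to None
    # Any other E4ERuntimeError propagates to the caller.

print(active_dataset({"active": None}))   # None
print(active_dataset({"active": "ds1"}))  # ds1
```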
```rust
// Persist config values
if let Some(ref name) = self.active_dataset_name {
    db.set_config("active_dataset", name)?;
} else {
    // Remove the key if no active dataset
    db.set_config("active_dataset", "")?;
}
if let Some(ref name) = self.active_mission_name {
    db.set_config("active_mission", name)?;
} else {
    db.set_config("active_mission", "")?;
}
```
The comment says “Remove the key if no active dataset”, but the code writes an empty string instead of deleting the config row. This mismatch is confusing, and load() will then return Some("") rather than None. Either implement/remove-config semantics in ManagerDb and actually delete the key, or consistently treat empty-string values as None in load() (and update the comment accordingly).
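The delete-the-key option can be sketched with Python's stdlib `sqlite3` (the `config` table layout and helper names are assumptions, not the PR's actual `ManagerDb`): `None` deletes the row, so a later load sees a missing key rather than `""`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")

def set_config(key, value):
    # Delete the row when value is None instead of storing "".
    if value is None:
        conn.execute("DELETE FROM config WHERE key = ?", (key,))
    else:
        conn.execute(
            "INSERT INTO config (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )

def get_config(key):
    row = conn.execute(
        "SELECT value FROM config WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None  # missing key loads as None, not ""

set_config("active_dataset", "ds1")
set_config("active_dataset", None)
print(get_config("active_dataset"))  # None
```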
```diff
 required_keys = {'timestamp', 'device', 'country', 'region', 'site', 'mission',
                  'properties', 'notes'}
 with open(directory.joinpath('metadata.json'), 'r', encoding='ascii') as handle:
     data = json.load(handle)
-metadata = metadata_schema.validate(data)
 missing = required_keys - set(data.keys())
 if missing:
     raise ValueError(f'metadata.json missing keys: {missing}')
 return Metadata(
-    timestamp=dt.datetime.fromisoformat(metadata['timestamp']),
-    device=metadata['device'],
-    country=metadata['country'],
-    region=metadata['region'],
-    site=metadata['site'],
-    mission=metadata['mission'],
-    properties=metadata['properties'],
-    notes=metadata['notes']
+    timestamp=dt.datetime.fromisoformat(data['timestamp']),
+    device=data['device'],
+    country=data['country'],
+    region=data['region'],
+    site=data['site'],
+    mission=data['mission'],
+    properties=data['properties'],
+    notes=data['notes']
 )
```
Metadata.load() now only checks for missing keys and no longer validates value types (e.g., ensuring properties is a dict, timestamp is a string, etc.). This can allow malformed metadata.json to load and fail later in less obvious ways. Consider adding basic type checks for the expected fields (or reintroducing a lightweight schema validation) so errors are caught with clearer messages.
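A lightweight version of those checks could look like this (the `validate_metadata` helper and the per-key type table are illustrative, not the PR's code): verify each required key's type and parse the timestamp eagerly so malformed files fail with a clear message at load time.

```python
import datetime as dt

# Hypothetical per-key type table for metadata.json validation.
EXPECTED_TYPES = {
    'timestamp': str,   # ISO-8601 string, parsed below
    'device': str,
    'country': str,
    'region': str,
    'site': str,
    'mission': str,
    'properties': dict,
    'notes': str,
}

def validate_metadata(data: dict) -> dict:
    missing = set(EXPECTED_TYPES) - set(data)
    if missing:
        raise ValueError(f'metadata.json missing keys: {missing}')
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(data[key], expected):
            raise TypeError(
                f'metadata.json key {key!r}: expected {expected.__name__}, '
                f'got {type(data[key]).__name__}')
    dt.datetime.fromisoformat(data['timestamp'])  # fail early on bad timestamps
    return data

good = {'timestamp': '2024-01-01T00:00:00', 'device': 'd', 'country': 'c',
        'region': 'r', 'site': 's', 'mission': 'm', 'properties': {},
        'notes': ''}
validate_metadata(good)  # passes silently
```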