Conversation
Pull request overview
This PR ports the core dataset/mission management logic from Python to a Rust-backed _core extension module (via PyO3/maturin), while keeping a thin Python API surface and updating tests to reflect the new on-disk state (.e4edm.db).
Changes:
- Introduces a Rust implementation for manifests, metadata IO, dataset operations, and manager/config persistence (SQLite).
- Reworks the Python modules (`core.py`, `data.py`, `exception.py`) into wrappers around the Rust `_core` module, and removes the `schema`-dependency-based metadata validation.
- Updates the Python tests to expect `.e4edm.db` and adjusts validation/duplicate assertions.
Reviewed changes
Copilot reviewed 17 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/lib.rs | PyO3 module exports (PyDataManager, PyDataset, PyMission) and Python exception mapping |
| src/dataset.rs | Main dataset lifecycle logic: create/load, stage/commit, validate, duplicate, zip, completeness checks |
| src/db.rs | SQLite storage layer for dataset and manager state |
| src/manifest.rs | Manifest hashing/serialization and validation helpers |
| src/metadata.rs | metadata.json read/write and JSON formatting helpers |
| src/manager.rs | Manager state persistence in config.db |
| src/errors.rs | Centralized Rust error types (mapped into Python exceptions) |
| e4e_data_management/core.py | Python DataManager wrapper over Rust _core |
| e4e_data_management/data.py | Slimmed Python Dataset wrapper + compatibility Manifest helper |
| e4e_data_management/exception.py | Switches Python exceptions to Rust-defined exception types |
| e4e_data_management/metadata.py | Removes schema validation; checks required keys only |
| tests/test_*.py | Updates expected state files (.e4edm.db) and validation patterns |
| Cargo.toml / Cargo.lock | Adds Rust crate + dependency lock for the new extension module |
| pyproject.toml | Moves to PEP 621 + maturin build backend; adds maturin dev dependency |
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Pull request overview
Copilot reviewed 23 out of 27 changed files in this pull request and generated 4 comments.
@copilot open a new pull request to apply changes based on the comments in this thread
…ir copy verification, release workflow outputs Co-authored-by: ccrutchf <[email protected]> Agent-Logs-Url: https://github.com/UCSD-E4E/e4e-data-management/sessions/fab26efe-bc66-4640-80a8-8b9fb1e588f4
fix: address review feedback on Rust port
@copilot can you review again and recreate comments if they still apply? I have resolved them to remove the noise.
Pull request overview
Copilot reviewed 24 out of 28 changed files in this pull request and generated 5 comments.
```rust
pub fn add_mission_committed_files(
    &self,
    mission_name: &str,
    files: &[String],
) -> Result<()> {
    for f in files {
        self.conn.execute(
            "INSERT OR IGNORE INTO mission_committed_files (mission_name, path) VALUES (?1, ?2)",
            params![mission_name, f],
        )?;
    }
    Ok(())
}
```
add_mission_committed_files() uses INSERT OR IGNORE, but (as currently defined) mission_committed_files doesn’t have a UNIQUE constraint that would trigger an “ignore”, so this will insert duplicates on repeated calls. Either add a UNIQUE constraint on (mission_name, path) and keep OR IGNORE, or change the write path to clear/rewrite committed files for a mission before inserting.
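The first option can be sketched with Python's stdlib `sqlite3` (the table and column names follow the snippet above; the full schema is an assumption): once `(mission_name, path)` carries a `UNIQUE` constraint, `INSERT OR IGNORE` actually has a conflict to ignore and repeated calls stay idempotent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With a UNIQUE constraint, INSERT OR IGNORE has a conflict to ignore.
conn.execute(
    "CREATE TABLE mission_committed_files ("
    "mission_name TEXT, path TEXT, "
    "UNIQUE (mission_name, path))"
)
for _ in range(3):  # repeated calls no longer insert duplicates
    conn.execute(
        "INSERT OR IGNORE INTO mission_committed_files (mission_name, path) "
        "VALUES (?, ?)",
        ("mission1", "data/file.bin"),
    )
count = conn.execute(
    "SELECT COUNT(*) FROM mission_committed_files").fetchone()[0]
print(count)  # 1
```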
```rust
for rel_path in manifest_data.keys() {
    let src = state.root.join(rel_path);
    let arc_path = format!("{}/{}", dataset_name, rel_path);
    zip.start_file(&arc_path, options)
        .map_err(|e| crate::errors::E4EError::Runtime(e.to_string()))?;
    let data = fs::read(&src)?;
    zip.write_all(&data)?;
}
```
create_zip() reads each file fully into memory (fs::read) before writing to the zip. For large datasets this can cause high memory usage or OOM. Prefer streaming the file into the zip writer (e.g., open File, wrap in BufReader, then std::io::copy into the zip entry) to keep memory bounded.
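In Rust this would be `std::io::copy` from a `BufReader` into the zip entry. A stdlib Python analogue of the same bounded-memory pattern (file names here are illustrative) streams the file in fixed-size chunks via `shutil.copyfileobj` instead of loading it whole:

```python
import os
import shutil
import tempfile
import zipfile

# Create a 1 MiB sample file to archive.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "large.bin")
with open(src, "wb") as f:
    f.write(b"x" * 1024 * 1024)

zip_path = os.path.join(tmpdir, "dataset.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    # Stream the file into the archive in 64 KiB chunks; peak memory
    # stays bounded regardless of the file size.
    with open(src, "rb") as fh, zf.open("dataset/large.bin", "w") as entry:
        shutil.copyfileobj(fh, entry, length=64 * 1024)

with zipfile.ZipFile(zip_path) as zf:
    size = zf.getinfo("dataset/large.bin").file_size
print(size)  # 1048576
```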
```rust
#[getter]
fn active_dataset(&mut self) -> PyResult<Option<PyDataset>> {
    match self.ensure_active_dataset() {
        Ok(ds) => Ok(Some(PyDataset { inner: ds.clone() })),
        Err(E4EError::Runtime(_)) => Ok(None),
        Err(e) => Err(e.into()),
    }
}

#[getter]
fn active_mission(&mut self) -> PyResult<Option<PyMission>> {
    let mission_name = match self.active_mission_name.clone() {
        Some(n) if !n.is_empty() => n,
        _ => return Ok(None),
    };
    match self.ensure_active_dataset() {
        Ok(ds) => {
            let mission = ds
                .missions
                .iter()
                .find(|m| m.record.name == mission_name)
                .map(|m| PyMission { inner: m.clone() });
            Ok(mission)
        }
        Err(E4EError::Runtime(_)) => Ok(None),
        Err(e) => Err(e.into()),
    }
}
```
active_dataset (and active_mission) treat any E4EError::Runtime(_) from ensure_active_dataset() as “not active” and return None. This can silently swallow real runtime failures from load_dataset_state (e.g., corrupted dataset, missing files) instead of surfacing them to Python callers. Consider introducing a dedicated error variant for “Dataset not active” (or otherwise distinguishing that case) and only mapping that specific error to None.
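The distinction being asked for can be sketched in Python with hypothetical exception names (in Rust it would be a dedicated `E4EError` variant): catch only the "not active" error and map it to `None`, while any other runtime failure propagates to the caller.

```python
class E4ERuntimeError(Exception):
    """Real runtime failure (corrupt dataset, missing files, ...)."""

class DatasetNotActiveError(E4ERuntimeError):
    """Dedicated variant: no dataset is currently active."""

def ensure_active_dataset(state: dict) -> str:
    # Hypothetical stand-in for the Rust ensure_active_dataset().
    if state.get("active") is None:
        raise DatasetNotActiveError("no active dataset")
    if state.get("corrupt"):
        raise E4ERuntimeError("dataset state is corrupted")
    return state["active"]

def active_dataset(state: dict):
    try:
        return ensure_active_dataset(state)
    except DatasetNotActiveError:
        return None  # only the "not active" case maps to None
    # Any other E4ERuntimeError propagates to the caller.

print(active_dataset({"active": None}))   # None
print(active_dataset({"active": "ds1"}))  # ds1
```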
```rust
// Persist config values
if let Some(ref name) = self.active_dataset_name {
    db.set_config("active_dataset", name)?;
} else {
    // Remove the key if no active dataset
    db.set_config("active_dataset", "")?;
}
if let Some(ref name) = self.active_mission_name {
    db.set_config("active_mission", name)?;
} else {
    db.set_config("active_mission", "")?;
}
```
The comment says “Remove the key if no active dataset”, but the code writes an empty string instead of deleting the config row. This mismatch is confusing, and load() will then return Some("") rather than None. Either implement/remove-config semantics in ManagerDb and actually delete the key, or consistently treat empty-string values as None in load() (and update the comment accordingly).
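The delete-the-key option can be sketched with Python's stdlib `sqlite3` (the `config` table layout and helper names are assumptions, not the PR's actual `ManagerDb`): `None` deletes the row, so a later load sees a missing key rather than `""`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")

def set_config(key, value):
    # Delete the row when value is None instead of storing "".
    if value is None:
        conn.execute("DELETE FROM config WHERE key = ?", (key,))
    else:
        conn.execute(
            "INSERT INTO config (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )

def get_config(key):
    row = conn.execute(
        "SELECT value FROM config WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None  # missing key loads as None, not ""

set_config("active_dataset", "ds1")
set_config("active_dataset", None)
print(get_config("active_dataset"))  # None
```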
```diff
 required_keys = {'timestamp', 'device', 'country', 'region', 'site', 'mission',
                  'properties', 'notes'}
 with open(directory.joinpath('metadata.json'), 'r', encoding='ascii') as handle:
     data = json.load(handle)
-metadata = metadata_schema.validate(data)
 missing = required_keys - set(data.keys())
 if missing:
     raise ValueError(f'metadata.json missing keys: {missing}')
 return Metadata(
-    timestamp=dt.datetime.fromisoformat(metadata['timestamp']),
-    device=metadata['device'],
-    country=metadata['country'],
-    region=metadata['region'],
-    site=metadata['site'],
-    mission=metadata['mission'],
-    properties=metadata['properties'],
-    notes=metadata['notes']
+    timestamp=dt.datetime.fromisoformat(data['timestamp']),
+    device=data['device'],
+    country=data['country'],
+    region=data['region'],
+    site=data['site'],
+    mission=data['mission'],
+    properties=data['properties'],
+    notes=data['notes']
 )
```
Metadata.load() now only checks for missing keys and no longer validates value types (e.g., ensuring properties is a dict, timestamp is a string, etc.). This can allow malformed metadata.json to load and fail later in less obvious ways. Consider adding basic type checks for the expected fields (or reintroducing a lightweight schema validation) so errors are caught with clearer messages.
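A lightweight version of those checks could look like this (the `validate_metadata` helper and the per-key type table are illustrative, not the PR's code): verify each required key's type and parse the timestamp eagerly so malformed files fail with a clear message at load time.

```python
import datetime as dt

# Hypothetical per-key type table for metadata.json validation.
EXPECTED_TYPES = {
    'timestamp': str,   # ISO-8601 string, parsed below
    'device': str,
    'country': str,
    'region': str,
    'site': str,
    'mission': str,
    'properties': dict,
    'notes': str,
}

def validate_metadata(data: dict) -> dict:
    missing = set(EXPECTED_TYPES) - set(data)
    if missing:
        raise ValueError(f'metadata.json missing keys: {missing}')
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(data[key], expected):
            raise TypeError(
                f'metadata.json key {key!r}: expected {expected.__name__}, '
                f'got {type(data[key]).__name__}')
    dt.datetime.fromisoformat(data['timestamp'])  # fail early on bad timestamps
    return data

good = {'timestamp': '2024-01-01T00:00:00', 'device': 'd', 'country': 'c',
        'region': 'r', 'site': 's', 'mission': 'm', 'properties': {},
        'notes': ''}
validate_metadata(good)  # passes silently
```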