Skip to content

feat: port to rust#108

Merged
ccrutchf merged 12 commits intomainfrom
feature/rust
Mar 28, 2026
Merged

feat: port to rust#108
ccrutchf merged 12 commits intomainfrom
feature/rust

Conversation

@ccrutchf
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports the core dataset/mission management logic from Python to a Rust-backed _core extension module (via PyO3/maturin), while keeping a thin Python API surface and updating tests to reflect the new on-disk state (.e4edm.db).

Changes:

  • Introduces a Rust implementation for manifests, metadata IO, dataset operations, and manager/config persistence (SQLite).
  • Reworks Python modules (core.py, data.py, exception.py) into wrappers around Rust _core, and removes the schema dependency-based metadata validation.
  • Updates Python tests to expect .e4edm.db and adjusts validation/duplicate assertions.

Reviewed changes

Copilot reviewed 17 out of 20 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/lib.rs PyO3 module exports (PyDataManager, PyDataset, PyMission) and Python exception mapping
src/dataset.rs Main dataset lifecycle logic: create/load, stage/commit, validate, duplicate, zip, completeness checks
src/db.rs SQLite storage layer for dataset and manager state
src/manifest.rs Manifest hashing/serialization and validation helpers
src/metadata.rs metadata.json read/write and JSON formatting helpers
src/manager.rs Manager state persistence in config.db
src/errors.rs Centralized Rust error types (mapped into Python exceptions)
e4e_data_management/core.py Python DataManager wrapper over Rust _core
e4e_data_management/data.py Slimmed Python Dataset wrapper + compatibility Manifest helper
e4e_data_management/exception.py Switches Python exceptions to Rust-defined exception types
e4e_data_management/metadata.py Removes schema validation; checks required keys only
tests/test_*.py Updates expected state files (.e4edm.db) and validation patterns
pyproject.toml Moves to PEP 621 + maturin build backend; adds maturin dev dependency
Cargo.toml / Cargo.lock Adds Rust crate + dependency lock for the new extension module

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ccrutchf and others added 2 commits March 21, 2026 22:14
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@ccrutchf ccrutchf requested a review from Copilot March 22, 2026 05:43
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 27 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ccrutchf
Copy link
Copy Markdown
Contributor Author

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 22, 2026

@ccrutchf I've opened a new pull request, #109, to work on those changes. Once the pull request is ready, I'll request review from you.

@ccrutchf
Copy link
Copy Markdown
Contributor Author

@copilot can you review again and recreate comments if they still apply? I have resolved them to remove the noise.

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 22, 2026

@ccrutchf I've opened a new pull request, #110, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 28 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +253 to +264
pub fn add_mission_committed_files(
&self,
mission_name: &str,
files: &[String],
) -> Result<()> {
for f in files {
self.conn.execute(
"INSERT OR IGNORE INTO mission_committed_files (mission_name, path) VALUES (?1, ?2)",
params![mission_name, f],
)?;
}
Ok(())
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_mission_committed_files() uses INSERT OR IGNORE, but (as currently defined) mission_committed_files doesn’t have a UNIQUE constraint that would trigger an “ignore”, so this will insert duplicates on repeated calls. Either add a UNIQUE constraint on (mission_name, path) and keep OR IGNORE, or change the write path to clear/rewrite committed files for a mission before inserting.

Copilot uses AI. Check for mistakes.
Comment on lines +630 to +637
for rel_path in manifest_data.keys() {
let src = state.root.join(rel_path);
let arc_path = format!("{}/{}", dataset_name, rel_path);
zip.start_file(&arc_path, options)
.map_err(|e| crate::errors::E4EError::Runtime(e.to_string()))?;
let data = fs::read(&src)?;
zip.write_all(&data)?;
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_zip() reads each file fully into memory (fs::read) before writing to the zip. For large datasets this can cause high memory usage or OOM. Prefer streaming the file into the zip writer (e.g., open File, wrap in BufReader, then std::io::copy into the zip entry) to keep memory bounded.

Copilot uses AI. Check for mistakes.
Comment on lines +313 to +339
#[getter]
fn active_dataset(&mut self) -> PyResult<Option<PyDataset>> {
match self.ensure_active_dataset() {
Ok(ds) => Ok(Some(PyDataset { inner: ds.clone() })),
Err(E4EError::Runtime(_)) => Ok(None),
Err(e) => Err(e.into()),
}
}

#[getter]
fn active_mission(&mut self) -> PyResult<Option<PyMission>> {
let mission_name = match self.active_mission_name.clone() {
Some(n) if !n.is_empty() => n,
_ => return Ok(None),
};
match self.ensure_active_dataset() {
Ok(ds) => {
let mission = ds
.missions
.iter()
.find(|m| m.record.name == mission_name)
.map(|m| PyMission { inner: m.clone() });
Ok(mission)
}
Err(E4EError::Runtime(_)) => Ok(None),
Err(e) => Err(e.into()),
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

active_dataset (and active_mission) treat any E4EError::Runtime(_) from ensure_active_dataset() as “not active” and return None. This can silently swallow real runtime failures from load_dataset_state (e.g., corrupted dataset, missing files) instead of surfacing them to Python callers. Consider introducing a dedicated error variant for “Dataset not active” (or otherwise distinguishing that case) and only mapping that specific error to None.

Copilot uses AI. Check for mistakes.
Comment on lines +64 to +75
// Persist config values
if let Some(ref name) = self.active_dataset_name {
db.set_config("active_dataset", name)?;
} else {
// Remove the key if no active dataset
db.set_config("active_dataset", "")?;
}
if let Some(ref name) = self.active_mission_name {
db.set_config("active_mission", name)?;
} else {
db.set_config("active_mission", "")?;
}
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says “Remove the key if no active dataset”, but the code writes an empty string instead of deleting the config row. This mismatch is confusing, and load() will then return Some("") rather than None. Either implement/remove-config semantics in ManagerDb and actually delete the key, or consistently treat empty-string values as None in load() (and update the comment accordingly).

Copilot uses AI. Check for mistakes.
Comment on lines +69 to 85
required_keys = {'timestamp', 'device', 'country', 'region', 'site', 'mission',
'properties', 'notes'}
with open(directory.joinpath('metadata.json'), 'r', encoding='ascii') as handle:
data = json.load(handle)
metadata = metadata_schema.validate(data)
missing = required_keys - set(data.keys())
if missing:
raise ValueError(f'metadata.json missing keys: {missing}')
return Metadata(
timestamp=dt.datetime.fromisoformat(metadata['timestamp']),
device=metadata['device'],
country=metadata['country'],
region=metadata['region'],
site=metadata['site'],
mission=metadata['mission'],
properties=metadata['properties'],
notes=metadata['notes']
timestamp=dt.datetime.fromisoformat(data['timestamp']),
device=data['device'],
country=data['country'],
region=data['region'],
site=data['site'],
mission=data['mission'],
properties=data['properties'],
notes=data['notes']
)
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata.load() now only checks for missing keys and no longer validates value types (e.g., ensuring properties is a dict, timestamp is a string, etc.). This can allow malformed metadata.json to load and fail later in less obvious ways. Consider adding basic type checks for the expected fields (or reintroducing a lightweight schema validation) so errors are caught with clearer messages.

Copilot uses AI. Check for mistakes.
@ccrutchf ccrutchf merged commit aa117d5 into main Mar 28, 2026
11 of 13 checks passed
@ccrutchf ccrutchf deleted the feature/rust branch March 28, 2026 00:31
ccrutchf added a commit that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants