tests: add unit tests for functions without direct test coverage #94

arnavsharma990 wants to merge 2 commits into mllam:main from
Conversation
Add focused unit tests for:

- `check_point_in_dataset`: test point exists / not exists / None cases
- `check_step`: test constant step matching/mismatching, non-constant step, edge cases
- `load_input_dataset`: test Zarr/netCDF loading and error handling
- `check_chunk_size`: test warning behavior for small/large chunks, missing dimensions
- `chunk_dataset`: test successful chunking and error handling

These tests follow existing test patterns and provide minimal but complete coverage for previously untested helper functions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Pull request overview
Adds new unit tests to cover several helper/ops functions that previously lacked direct test coverage, focusing on selection validation, dataset loading, and dataset chunking behavior.
Changes:
- Add tests for `check_point_in_dataset` and `check_step` edge cases and error paths.
- Add tests for `load_input_dataset` for Zarr and NetCDF inputs plus missing-file handling.
- Add tests for `check_chunk_size` warning behavior and `chunk_dataset` success/error handling.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `tests/test_selection_helpers.py` | Adds coverage for selection helper validation and step-size checking edge cases. |
| `tests/test_loading.py` | Adds coverage for loading datasets from Zarr/NetCDF and missing-path behavior. |
| `tests/test_chunking.py` | Adds coverage for chunk-size checking and dataset chunking error handling. |
tests/test_chunking.py
Outdated
    @pytest.fixture
    def large_dataset():
        """Create a dataset that will exceed chunk size warning."""
        # Create dataset with large chunks that exceed 1GB warning
        # Using float64 (8 bytes), need > 1GB / 8 = 134217728 elements
        # For simplicity, create a smaller but still large dataset
        size = 5000
        return xr.Dataset(
            {
                "large_var": (["x", "y"], np.random.random((size, size))),
            },
            coords={"x": range(size), "y": range(size)},
        )
`large_dataset` allocates a 5000x5000 float64 array (~200MB) during test collection/execution, which is likely to slow down or OOM CI. You can trigger the chunk-size warning without a huge dataset (the implementation only uses `chunks` and dtype), so this fixture should be removed or made tiny.
tests/test_chunking.py
Outdated
    def test_check_chunk_size_large_chunks(large_dataset, caplog):
        """Test check_chunk_size with large chunks (should warn)."""
        # Use chunks that will create large memory usage
        chunks = {"x": 1000, "y": 1000}
        check_chunk_size(large_dataset, chunks)
        # Should log a warning
        assert len(caplog.records) > 0
        assert "exceeds" in caplog.records[0].message.lower()
This warning test is currently inconsistent with the implementation:

- `check_chunk_size` computes `memory_usage` from the requested chunk sizes (the product of the values in `chunks`) and the dtype, so `{"x": 1000, "y": 1000}` is only ~8 MB and should not exceed the 1 GB threshold.
- `check_chunk_size` logs via `loguru.logger`, which `caplog` does not capture by default, so `caplog.records` will remain empty.

Adjust the test to (1) use chunk sizes whose product exceeds the threshold and (2) capture Loguru output using a Loguru sink (or a pytest plugin that bridges Loguru to stdlib logging).
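The arithmetic behind the first point can be sketched as follows (the 1 GB threshold and float64 itemsize are assumptions taken from the review's figures, and `chunk_memory_bytes` is an illustrative name, not the project's function):

```python
import math

ONE_GB = 1024 ** 3       # assumed warning threshold, per the review
FLOAT64_ITEMSIZE = 8     # bytes per float64 element


def chunk_memory_bytes(chunks, itemsize=FLOAT64_ITEMSIZE):
    # Footprint of one chunk: product of requested chunk sizes times itemsize.
    return math.prod(chunks.values()) * itemsize


# 1000 x 1000 float64 -> 8,000,000 bytes (~8 MB), below the 1 GB threshold
assert chunk_memory_bytes({"x": 1000, "y": 1000}) < ONE_GB
# 12000 x 12000 float64 -> 1,152,000,000 bytes (~1.07 GiB), above it
assert chunk_memory_bytes({"x": 12000, "y": 12000}) > ONE_GB
```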
tests/test_loading.py
Outdated
    import tempfile
    from pathlib import Path
tempfile and Path are imported but never used in this test module; please remove them to keep the test suite clean (and avoid failing linting if enabled).
    def test_load_input_dataset_netcdf(sample_dataset, tmp_path):
        """Test load_input_dataset with netCDF format."""
        nc_path = tmp_path / "test.nc"
        sample_dataset.to_netcdf(nc_path)
`sample_dataset.to_netcdf(...)` requires an optional NetCDF engine (typically `scipy`, `netCDF4`, or `h5netcdf`). The project dependencies don’t appear to include any of these, so this test may fail in CI depending on the environment. Consider using `pytest.importorskip(...)` for the chosen engine and specifying it explicitly (e.g., `engine="scipy"`), or adding an explicit test dependency to ensure NetCDF support is available.
tests/test_selection_helpers.py
Outdated
| """ | ||
| Unit tests for helper functions in ops.selection module. | ||
| """ | ||
| import datetime |
datetime is imported but unused in this test file; please remove it to avoid unused-import warnings / lint failures.
    def test_check_step_single_point_coordinate():
        """Test check_step with single point coordinate (edge case - will raise IndexError)."""
        # Create dataset with single time point (diff will be empty array)
        time_values = pd.date_range("2020-01-01", periods=1, freq="3H")
        ds = xr.Dataset(
            {"var": (["time"], [1])},
            coords={"time": time_values},
        )
        requested_step = pd.Timedelta(hours=3)
        # This will raise IndexError when trying to access all_steps[0] on empty array
        # This documents current behavior - could be improved to raise more descriptive error
        with pytest.raises(IndexError):
            check_step(requested_step, "time", ds)
This test locks in an `IndexError` for the single-point coordinate case, which is an implementation accident (coming from indexing `all_steps[0]` on an empty diff array). It would be more maintainable to update `check_step` to raise a descriptive `ValueError` when `ds[coord]` has fewer than 2 points, and assert that error type/message here instead.
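The suggested guard can be sketched on plain sequences (the function name `check_coord_step` and its messages are illustrative, not the project's `check_step`, which works on an xarray coordinate):

```python
def check_coord_step(values, requested_step):
    """Hypothetical guard mirroring the suggested fix.

    Raises a descriptive ValueError for short coordinates instead of
    letting an empty diff array surface as an IndexError.
    """
    if len(values) < 2:
        raise ValueError(
            "Coordinate has fewer than 2 points; cannot determine a step size."
        )
    # Pairwise differences stand in for the diff array in the real code.
    steps = [b - a for a, b in zip(values, values[1:])]
    if any(step != steps[0] for step in steps):
        raise ValueError("Coordinate does not have a constant step.")
    if steps[0] != requested_step:
        raise ValueError(
            f"Requested step {requested_step!r} does not match "
            f"coordinate step {steps[0]!r}."
        )
```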
- test_chunking.py:
  - Remove large_dataset fixture (inefficient 5000x5000 allocation)
  - Fix test_check_chunk_size_large_chunks to use chunk sizes that actually exceed 1GB threshold (12000x12000) instead of 1000x1000
  - Use loguru handler to capture logs instead of caplog (which doesn't capture loguru output)
  - Remove caplog from test_check_chunk_size_small_chunks
- test_loading.py:
  - Remove unused imports: tempfile and Path
  - Fix test_load_input_dataset_netcdf to use pytest.importorskip for netCDF4 engine and specify engine explicitly
- test_selection_helpers.py:
  - Remove unused datetime import
  - Update test_check_step_single_point_coordinate to expect ValueError instead of IndexError
- ops/selection.py:
  - Fix check_step to raise descriptive ValueError when coordinate has fewer than 2 points, instead of allowing IndexError

Co-authored-by: Cursor <cursoragent@cursor.com>
Please revert to using the default PR template here. Once you've done that you can tag me for a review :) thank you!

@leifdenby I've updated the PR to follow the default template.
Describe your changes
This PR adds unit tests for several functions that previously lacked direct test coverage in the dataset processing utilities.
The goal is to improve reliability and maintainability of the codebase by ensuring that key dataset validation and chunking functions are properly tested, including edge cases and error handling.
Tests were added for the following functions:
- `check_point_in_dataset` (including the `None` case)
- `check_step`
- `load_input_dataset`
- `check_chunk_size`
- `chunk_dataset`

These tests improve test coverage and help ensure dataset validation utilities behave correctly across different conditions.
Dependencies:
No new dependencies were introduced.
Issue Link
Improves test coverage for dataset validation utilities.
Type of change
Checklist before requesting a review
`pull` with the `--rebase` option if possible).

Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
Checklist for assignee