
tests: add unit tests for functions without direct test coverage #94

Open
arnavsharma990 wants to merge 2 commits into mllam:main from arnavsharma990:tests/add-missing-unit-tests

Conversation

@arnavsharma990 commented Feb 20, 2026

Describe your changes

This PR adds unit tests for several functions that previously lacked direct test coverage in the dataset processing utilities.

The goal is to improve reliability and maintainability of the codebase by ensuring that key dataset validation and chunking functions are properly tested, including edge cases and error handling.

Tests were added for the following functions:

  • check_point_in_dataset

    • verifies behavior when a point exists, does not exist, or is None
  • check_step

    • tests constant step matching and mismatching cases
    • tests non-constant steps and edge cases
  • load_input_dataset

    • validates correct loading of datasets from Zarr and NetCDF formats
    • tests error handling for invalid inputs
  • check_chunk_size

    • tests warning behavior for small and large chunk sizes
    • verifies behavior when dimensions are missing
  • chunk_dataset

    • tests successful dataset chunking
    • verifies error handling scenarios

These tests improve test coverage and help ensure dataset validation utilities behave correctly across different conditions.
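To illustrate the shape of these tests, here is a minimal sketch of the `check_point_in_dataset` cases. The helper below is a hypothetical stand-in operating on plain sequences, since the real function (which takes an xarray dataset) is not shown in this PR:

```python
# Hypothetical stand-in for check_point_in_dataset: the real function in
# this PR operates on an xarray dataset; plain sequences are used here so
# the sketch stays self-contained.
def check_point_in_dataset(point, values):
    if point is None:
        return False
    return point in values


def test_point_exists():
    assert check_point_in_dataset(3, [1, 2, 3])


def test_point_missing():
    assert not check_point_in_dataset(7, [1, 2, 3])


def test_point_none():
    assert not check_point_in_dataset(None, [1, 2, 3])
```

Each test pins down one of the three behaviors listed above (exists, does not exist, `None`), which keeps failures easy to localize.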

Dependencies:
No new dependencies were introduced.


Issue Link

Improves test coverage for dataset validation utilities.


Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form
  • I have requested a reviewer and an assignee

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section reflecting type of change:
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog
  • Once the PR is ready to be merged, squash commits and merge the PR.

Add focused unit tests for:
- check_point_in_dataset: test point exists/not exists/None cases
- check_step: test constant step matching/mismatching, non-constant step, edge cases
- load_input_dataset: test zarr/netCDF loading and error handling
- check_chunk_size: test warning behavior for small/large chunks, missing dimensions
- chunk_dataset: test successful chunking and error handling

These tests follow existing test patterns and provide minimal but complete
coverage for previously untested helper functions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings February 20, 2026 08:11

Copilot AI left a comment


Pull request overview

Adds new unit tests to cover several helper/ops functions that previously lacked direct test coverage, focusing on selection validation, dataset loading, and dataset chunking behavior.

Changes:

  • Add tests for check_point_in_dataset and check_step edge cases and error paths.
  • Add tests for load_input_dataset for Zarr and NetCDF inputs plus missing-file handling.
  • Add tests for check_chunk_size warning behavior and chunk_dataset success/error handling.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
tests/test_selection_helpers.py Adds coverage for selection helper validation and step-size checking edge cases.
tests/test_loading.py Adds coverage for loading datasets from Zarr/NetCDF and missing-path behavior.
tests/test_chunking.py Adds coverage for chunk-size checking and dataset chunking error handling.


Comment on lines +23 to +35
```python
@pytest.fixture
def large_dataset():
    """Create a dataset that will exceed chunk size warning."""
    # Create dataset with large chunks that exceed 1GB warning
    # Using float64 (8 bytes), need > 1GB / 8 = 134217728 elements
    # For simplicity, create a smaller but still large dataset
    size = 5000
    return xr.Dataset(
        {
            "large_var": (["x", "y"], np.random.random((size, size))),
        },
        coords={"x": range(size), "y": range(size)},
    )
```

Copilot AI Feb 20, 2026


large_dataset allocates a 5000x5000 float64 array (~200MB) during test collection/execution, which is likely to slow down or OOM CI. You can trigger the chunk-size warning without a huge dataset (the implementation only uses chunks and dtype), so this fixture should be removed or made tiny.

Comment on lines +46 to +53
```python
def test_check_chunk_size_large_chunks(large_dataset, caplog):
    """Test check_chunk_size with large chunks (should warn)."""
    # Use chunks that will create large memory usage
    chunks = {"x": 1000, "y": 1000}
    check_chunk_size(large_dataset, chunks)
    # Should log a warning
    assert len(caplog.records) > 0
    assert "exceeds" in caplog.records[0].message.lower()
```

Copilot AI Feb 20, 2026


This warning test is currently inconsistent with the implementation:

  • check_chunk_size computes memory_usage from the requested chunk sizes (the product of the values in chunks) and the dtype, so {"x": 1000, "y": 1000} is only ~8 MB and should not exceed the 1 GB threshold.
  • check_chunk_size logs via loguru.logger, which caplog does not capture by default, so caplog.records will remain empty.

Adjust the test to (1) use chunk sizes whose product exceeds the threshold and (2) capture Loguru output using a Loguru sink (or a pytest plugin that bridges Loguru to stdlib logging).
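The arithmetic behind the first point can be checked directly. This sketch assumes, as the comment describes, that the estimated chunk memory is simply the product of the requested chunk sizes times the dtype's item size:

```python
ITEMSIZE_FLOAT64 = 8  # bytes per float64 element
ONE_GIB = 1024 ** 3


def chunk_nbytes(chunks, itemsize=ITEMSIZE_FLOAT64):
    """Estimated memory of one chunk: product of chunk sizes times item size."""
    n = 1
    for size in chunks.values():
        n *= size
    return n * itemsize


small = chunk_nbytes({"x": 1000, "y": 1000})    # 8_000_000 bytes, ~8 MB
large = chunk_nbytes({"x": 12000, "y": 12000})  # 1_152_000_000 bytes, over 1 GiB
```

So `{"x": 1000, "y": 1000}` stays well under the threshold, while something on the order of 12000 x 12000 exceeds it.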

Comment on lines +4 to +6
```python
import tempfile
from pathlib import Path
```

Copilot AI Feb 20, 2026


tempfile and Path are imported but never used in this test module; please remove them to keep the test suite clean (and avoid failing linting if enabled).

Suggested change:

```diff
-import tempfile
-from pathlib import Path
```
Comment on lines +33 to +37
```python
def test_load_input_dataset_netcdf(sample_dataset, tmp_path):
    """Test load_input_dataset with netCDF format."""
    nc_path = tmp_path / "test.nc"
    sample_dataset.to_netcdf(nc_path)
```

Copilot AI Feb 20, 2026


sample_dataset.to_netcdf(...) requires an optional NetCDF engine (typically scipy, netCDF4, or h5netcdf). The project dependencies don’t appear to include any of these, so this test may fail in CI depending on the environment. Consider using pytest.importorskip(...) for the chosen engine and specifying it explicitly (e.g., engine="scipy"), or adding an explicit test dependency to ensure NetCDF support is available.

"""
Unit tests for helper functions in ops.selection module.
"""
import datetime

Copilot AI Feb 20, 2026


datetime is imported but unused in this test file; please remove it to avoid unused-import warnings / lint failures.

Suggested change:

```diff
-import datetime
```

Comment on lines +72 to +84
```python
def test_check_step_single_point_coordinate():
    """Test check_step with single point coordinate (edge case - will raise IndexError)."""
    # Create dataset with single time point (diff will be empty array)
    time_values = pd.date_range("2020-01-01", periods=1, freq="3H")
    ds = xr.Dataset(
        {"var": (["time"], [1])},
        coords={"time": time_values},
    )
    requested_step = pd.Timedelta(hours=3)
    # This will raise IndexError when trying to access all_steps[0] on empty array
    # This documents current behavior - could be improved to raise more descriptive error
    with pytest.raises(IndexError):
        check_step(requested_step, "time", ds)
```

Copilot AI Feb 20, 2026


This test locks in an IndexError for the single-point coordinate case, which is an implementation accident (coming from indexing all_steps[0] on an empty diff array). It would be more maintainable to update check_step to raise a descriptive ValueError when ds[coord] has fewer than 2 points, and assert that error type/message here instead.
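A guard along the lines the reviewer suggests could look like this. It is a simplified sketch: the real `check_step` takes a coordinate name and a dataset, which is reduced here to a plain sequence of values so the example stays self-contained.

```python
def check_step(requested_step, coord_values):
    """Hypothetical reworked check with an explicit length guard."""
    if len(coord_values) < 2:
        # Descriptive error instead of an accidental IndexError on the
        # empty diff array.
        raise ValueError(
            f"cannot infer step size from coordinate with "
            f"{len(coord_values)} point(s); at least 2 are required"
        )
    all_steps = [b - a for a, b in zip(coord_values, coord_values[1:])]
    if all_steps[0] != requested_step:
        raise ValueError(
            f"step {all_steps[0]!r} does not match requested {requested_step!r}"
        )
```

The test can then assert on the `ValueError` and its message, which keeps the contract explicit rather than locking in an implementation accident.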

- test_chunking.py:
  - Remove large_dataset fixture (inefficient 5000x5000 allocation)
  - Fix test_check_chunk_size_large_chunks to use chunk sizes that actually
    exceed 1GB threshold (12000x12000) instead of 1000x1000
  - Use loguru handler to capture logs instead of caplog (which doesn't
    capture loguru output)
  - Remove caplog from test_check_chunk_size_small_chunks

- test_loading.py:
  - Remove unused imports: tempfile and Path
  - Fix test_load_input_dataset_netcdf to use pytest.importorskip for
    netCDF4 engine and specify engine explicitly

- test_selection_helpers.py:
  - Remove unused datetime import
  - Update test_check_step_single_point_coordinate to expect ValueError
    instead of IndexError

- ops/selection.py:
  - Fix check_step to raise descriptive ValueError when coordinate has
    fewer than 2 points, instead of allowing IndexError

Co-authored-by: Cursor <cursoragent@cursor.com>
@leifdenby (Member)

Please revert to using the default PR template here. Once you've done that you can tag me for a review :) thank you!

@arnavsharma990 (Author)

@leifdenby I've updated the PR to follow the default template.
Please let me know if anything else should be adjusted. Thanks!
