Transect subset selection bug #95

@emiliom

PR #89 overhauled the interim transect subset selection approach initially implemented in v0.1.0-alpha. The intent was to implement the approach found in Matlab EchoPro, as described in #33.

I have found a bug that occurs in some circumstances. It emerged when rerunning the bootstrapping_walkthrough.ipynb notebook with the current main. PR #89 introduced a very small change to that notebook, changing the removal_percentage argument value in boot_obj.run_bootstrapping from 50.0 to 60.0. The notebook runs successfully when using 50.0, but boot_obj.run_bootstrapping raises a transect selection error when using 60.0.

The notebook transect_selection_workflow.ipynb currently runs successfully with transect subsetting applied, but the percentage it uses is also 50.0%. In addition, there is a transect subsetting test, test_transect_selection.py::test_transect_selection_output, that also runs successfully; rather than a percentage, it uses a preselected list of transect ids. So neither one exercises the failing case.

The error happens in computation/transect_results.py::set_adult_NASC, specifically in this statement:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1272-L1274
It's a KeyError (e.g., "KeyError: '[6] not in index'") raised when nasc_fraction_adult_df is missing a stratum number found in self.nasc_df.stratum_num. As described in #33, when a subset of transects is requested, the reduced transect set may not contain every possible stratum_num, so a scheme is needed to backfill values for the "missing" strata.

The scheme introduced in v0.1.0-alpha and described in #33 under "Current solution for missing strata" followed some simple rules Brandon and I devised to fill or interpolate missing strata. PR #89 replaced that strategy with the more involved (and possibly opaque?) scheme used in the Matlab EchoPro code. However, for transect_results.py::set_adult_NASC, it looks like the missing-strata scheme was overlooked.

The variable nasc_fraction_adult_df is created based on the strata found in self.bin_ds.len_age_dist_all, and its values are then populated by looping over those same strata. Therefore, when strata are missing from self.bin_ds.len_age_dist_all, they are never backfilled in nasc_fraction_adult_df, which leads to the error. Note: self.bin_ds is an Xarray Dataset that contains stratum_num as a dimension and coordinate variable; self.bin_ds.len_age_dist_all contains that dimension.
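To make the failure mode concrete, here is a minimal pandas sketch of the same kind of mismatch. It is not the actual set_adult_NASC code: the data, the "val" column name, and the helpers lookup_fraction and backfill_missing_strata are all hypothetical, and the constant fill value is only a placeholder, not the Matlab EchoPro scheme or the rules from #33.

```python
import pandas as pd

# Hypothetical stand-ins for the real objects. The NASC table still refers
# to stratum 6, but the transect subset left no length/age data for it.
nasc_df = pd.DataFrame(
    {"stratum_num": [1, 2, 6, 7], "NASC": [10.0, 20.0, 30.0, 40.0]}
)
nasc_fraction_adult_df = pd.DataFrame(
    {"val": [0.9, 0.8, 0.7]},
    index=pd.Index([1, 2, 7], name="stratum_num"),
)


def lookup_fraction(fraction_df, stratum_nums):
    """Label-based lookup; raises KeyError if any stratum is absent."""
    return fraction_df.loc[stratum_nums.to_list(), "val"].to_numpy()


# Stratum 6 is not in the index, so the lookup fails.
try:
    lookup_fraction(nasc_fraction_adult_df, nasc_df["stratum_num"])
    raised = False
except KeyError:
    raised = True


def backfill_missing_strata(fraction_df, all_strata, fill_value=0.0):
    """Add rows for strata that are absent (placeholder fill, an assumption)."""
    return fraction_df.reindex(all_strata, fill_value=fill_value)


# After backfilling, every stratum referenced by nasc_df can be looked up.
filled = backfill_missing_strata(
    nasc_fraction_adult_df, nasc_df["stratum_num"].unique()
)
adult_nasc = nasc_df["NASC"].to_numpy() * lookup_fraction(
    filled, nasc_df["stratum_num"]
)
```

Whatever scheme is chosen to compute the fill values, the point is that nasc_fraction_adult_df needs an entry for every stratum present in self.nasc_df.stratum_num before the lookup happens.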

This error has nothing to do with bootstrapping per se. But because bootstrapping generates several realizations of a transect subset, it more easily led to the error condition.
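A rough way to see why repeated realizations surface the problem is to count how often a random subset drops a stratum entirely. The sketch below uses only the standard library; the transect-to-stratum layout, the uniform random selection, and the function name are assumptions made for illustration and do not reproduce EchoPro's actual transect selection.

```python
import random


def count_realizations_missing_a_stratum(
    transect_strata, removal_percentage, n_realizations, seed=0
):
    """Count random subsets in which at least one stratum has no transect left."""
    rng = random.Random(seed)
    transects = sorted(transect_strata)
    all_strata = set(transect_strata.values())
    # Number of transects kept after removing the requested percentage.
    n_keep = round(len(transects) * (1 - removal_percentage / 100))

    n_missing = 0
    for _ in range(n_realizations):
        kept = rng.sample(transects, n_keep)
        kept_strata = {transect_strata[t] for t in kept}
        if kept_strata != all_strata:
            n_missing += 1
    return n_missing


# Hypothetical layout: 20 transects, with stratum 6 covered by only two of them.
transect_strata = {
    t: (6 if t in (19, 20) else 1 + (t - 1) % 5) for t in range(1, 21)
}

for pct in (50.0, 60.0):
    n_bad = count_realizations_missing_a_stratum(transect_strata, pct, 1000)
    print(f"removal_percentage={pct}: {n_bad} of 1000 realizations lose a stratum")
```

A single run with a fixed transect list either hits a missing stratum or it does not, whereas many realizations make it likely that at least one of them does.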

Labels: bug_in_python (Something in the Python implementation does not match what's in Matlab)
