@brandynlucca brandynlucca commented Jun 25, 2024

This PR adds features for ingesting and batch processing Echoview exports. The batch loader 1) writes a new xlsx file with the consolidated vertically integrated (NASC) exports for the age-1+ and age-2+ datasets and 2) directly updates the Survey class object so that it reads the updated data. These changes include:

New batch loader sub-module

  • The batch_load sub-module has been added to the utils sub-package. It is called when the Survey-class object is initialized, which adds new arguments to the initialization. Alternatively, the primary batch processing function, batch_read_echoview_exports(...), can be invoked independently of the Survey object.

Data parameterization changes

  • A new section in initialization_config.yml (nasc_exports) introducing various keys for parameterizing the outputs.
  • A new section in survey_year_****_config.yml (export_regions) that provides filepaths to echogram regions.
  • Two new mapping keys required for data validation (ECHOVIEW_EXPORT_MAP and REGION_EXPORT_MAP) have been added to core.py.
  • Mismatches in age-1+ and age-2+ region IDs, stratum numbers, and haul numbers necessitated a separator similar to NASC_all_ages vs NASC_no_age1. The *_all_ages and *_no_age1 suffixes have now been generalized as 'tags' appended to age group-specific values, which are then converted to shared column names (i.e., nasc, stratum_num, and haul_num) depending on the value of the exclude_age1 argument. This required some changes to the load sub-module in the utils sub-package.
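
To make the tag generalization concrete, here is a minimal sketch of the idea rather than the code in the load sub-module. The helper name `resolve_age_group_columns`, the lowercase tagged column names, and the plain-dict row layout are all assumptions made for illustration; only the tags, the shared column names, and the exclude_age1 argument come from the description above.

```python
# Hypothetical illustration only: the helper name and the row layout are
# assumptions, not the actual implementation in the load sub-module.
SHARED_COLUMNS = ("nasc", "stratum_num", "haul_num")


def resolve_age_group_columns(row, exclude_age1):
    """Map tagged age-group columns onto the shared column names."""
    # Pick the tag that matches the requested age group.
    tag = "no_age1" if exclude_age1 else "all_ages"
    resolved = {}
    for key, value in row.items():
        for shared in SHARED_COLUMNS:
            if key == f"{shared}_{tag}":
                # Tagged column for the selected age group: strip the tag.
                resolved[shared] = value
                break
        else:
            # Keep columns that carry no tag; drop the other age group's
            # tagged columns.
            if not any(key.startswith(f"{s}_") for s in SHARED_COLUMNS):
                resolved[key] = value
    return resolved


example_row = {
    "transect_num": 1,
    "nasc_all_ages": 120.5,
    "nasc_no_age1": 98.0,
    "stratum_num_all_ages": 3,
    "stratum_num_no_age1": 2,
    "haul_num_all_ages": 14,
    "haul_num_no_age1": 12,
}

age2_plus = resolve_age_group_columns(example_row, exclude_age1=True)
all_ages = resolve_age_group_columns(example_row, exclude_age1=False)
```

With exclude_age1=True the sketch keeps only the *_no_age1 values under the shared names; with exclude_age1=False it keeps the *_all_ages values instead, so downstream code can refer to nasc, stratum_num, and haul_num in both cases.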

Housekeeping changes

  • Small changes to .pre-commit-config.yaml and a new, associated .codespell-ignore-words.txt so that misspelled column names in data sources are tolerated by the spell check.

Note

  • There is a small outstanding issue concerning mismatches in the output of the batch loader (which matches the previous NASC construction operation in EchoPro) and the pre-generated consolidated export files we have received previously. This is being diagnosed, but appears to be due to differences in the source export files rather than anything within the code.

@brandynlucca brandynlucca requested a review from leewujung June 25, 2024 20:58
updates:
- [github.com/psf/black: 24.4.2 → 24.8.0](psf/black@24.4.2...24.8.0)
- [github.com/PyCQA/flake8: 7.1.0 → 7.1.1](PyCQA/flake8@7.1.0...7.1.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@brandynlucca brandynlucca merged commit 80d6bc1 into OSOceanAcoustics:main Aug 28, 2024
@brandynlucca brandynlucca deleted the WIP_nasc_directory_ingestion branch August 28, 2024 18:47
@brandynlucca brandynlucca restored the WIP_nasc_directory_ingestion branch October 16, 2024 05:04
@brandynlucca brandynlucca deleted the WIP_nasc_directory_ingestion branch December 5, 2024 21:49