Skip to content

Conversation

@brandynlucca
Copy link
Collaborator

This PR adds:

  • New reports generation module (echopop.extensions.FEATReports) which comprises a series of helper functions that all feed into a central FEATReports class
  • The associated *.xlsx files are produced at the defined directory path via FEATReports.generate(), which can be tuned to produce a subset of the reports if needed
  • Fleshed out docstring documentation
  • An example notebook (and therefore documentation) to demonstrate how FEATReports is used for generating these reports

@brandynlucca brandynlucca added documentation Improvements or additions to documentation enhancement New feature or request design high_priority FEAT labels Nov 16, 2024
@brandynlucca brandynlucca added this to the v0.4.2 (reports, plots) milestone Nov 16, 2024
@brandynlucca brandynlucca self-assigned this Nov 16, 2024
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brandynlucca : Thanks for the PR! Exciting to see these all in place!

I have a minor inline comment and a few structural ones below that hopefully are straightforward.

Module name

Change FEATReport.py to feat_report.py?

Report output folder

What do you think about making save_directory a required argument in .generate()? The current default is a separate folder report under the data_root_dir in *_config.yml, and there are a few other folders under this same root directory. I feel it is usually good practice to keep data and outputs separately located, so forcing people to provide a path to a folder may help facilitate that.

Report generation API

I think the current implementation with FEATReport being a separate class in echopop.extensions is a great arrangement. I only have one comment in the calling API -- Going through the example notebook, I felt there is a jump to have to bring in a separate FEATReports class after doing survey.SOME_FUNCTION multiple times.

What about keeping the same API format and creating a hook under the hood so that users can do something:

survey.generate_report(kind="FEAT", save_directory="REPORT_PATH", ...)

underneath this function you could pass self and all argument to the current FEATReport class to work in the same way.

Since the FEATReport only has 1 outward-facing function .generate(), I feel this would reduce one less thing users have to worry about, and allows other report types to be flexibly added in the future using the some survey.generate_report() shell but with completely function/class structures.

__METHOD_REFERENCE__

I think using lambda v: v.data to pass self.data (or one of the dict entries) as argument data in functions like aged_length_haul_counts_report, kriged_length_age_biomass_report, etc in __METHOD_REFERENCE__ can make maintenance down the road more difficult. The main thing is that it would require additional effort to see that some of the data argument are just FEATReport.data and some of the data argument are a dict entry of FEATReport.data, and FEATReport.data is nothing but the entire survey object.

Instead of this structure, I suggest the following:

  • remove the data argument and just get everything needed from self.data within each of the class methods (those specified as "function" in the current __METHOD_REFERENCE__), to avoid passing the same thing twice (the current self and data arguments). Things like data(self).input["biology"]["specimen_df"], is be pretty confusing -- isn't it the same as self.input["biology"]["specimen_df"]?
  • since Path(v.save_directory) is used in all "filename" entries in the current __METHOD_REFERENCE__, you can pass save_directory into each function

Take the first entry aged_length_haul_counts for example, you can have:

"aged_length_haul_counts": {
    "function": aged_length_haul_counts_report,
    "title": "Aged Length-Haul Counts ({SEX})",
    "filename": "aged_length_haul_counts_table.xlsx",
},

and in aged_length_haul_counts_report have the function signature

def aged_length_haul_counts_report(self, save_directory, title, filename)
   # assemble report path under this function from save_directory and filename

When calling it, the line can be modified slightly:

report_files = [v.get("function")(self, save_directory, **v) for _, v in report_methods.items()]

@brandynlucca
Copy link
Collaborator Author

brandynlucca commented Nov 20, 2024

Creating a tracking checklist:

  • In-line changes
  • Module name
  • Report output folder
  • Report generation API
  • METHOD_REFERENCE

@brandynlucca brandynlucca merged commit 1715598 into OSOceanAcoustics:main Nov 21, 2024
@brandynlucca brandynlucca deleted the report_generation branch December 5, 2024 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

design documentation Improvements or additions to documentation enhancement New feature or request FEAT high_priority

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants