-
Notifications
You must be signed in to change notification settings - Fork 5
Incorporate kriging methods and documentation #204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| kriged_results_output = ( | ||
| kriged_results | ||
| .merge( self.biology[ 'weight' ][ 'proportions' ][ 'sex_age_weight_proportions_df' ] , | ||
| on = [ 'stratum_num' , 'sex' ] ) | ||
| .assign( biomass_sex_age = lambda x: x.weight_sex_proportion_all * x.biomass_total ) | ||
| .loc[ : , [ 'centroid_latitude' , 'centroid_longitude' , 'stratum_num' , | ||
| 'cell_area_nmi2' , 'sex' , 'age' , 'biomass_unaged' , 'biomass_aged' , | ||
| 'biomass_total' , 'biomass_adult_cell_CV' , 'biomass_sex_age' ] ] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
df_tmp = (
kriged_results
.merge(
self.biology[ 'weight' ][ 'proportions' ][ 'sex_age_weight_proportions_df' ] ,
on = [ 'stratum_num' , 'sex' ]
)
)
kriged_results["biomass_sex_age"] = df_tmp["weight_sex_proportion_all"] * df_tmp["biomass_total"]
kriged_results_output = kriged_results[
[
'centroid_latitude' , 'centroid_longitude' , 'stratum_num' ,
'cell_area_nmi2' , 'sex' , 'age' , 'biomass_unaged' , 'biomass_aged' ,
'biomass_total' , 'biomass_adult_cell_CV' , 'biomass_sex_age'
]
]| self.statistics[ 'kriging' ].update( | ||
| { | ||
| 'apportioned_kriged_biomass_df': kriged_results_output , | ||
| } | ||
| ) No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious about if there's any performance difference between .update or direct assignment. It looks like that's not the case, just that with .update one could assign many keys at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @brandynlucca : Thanks for the draft PR!
I made some inline comments on: 1) vectoring operations rather than using lambda, and 2) breaking large single lines into separate blocks for readability.
I'd also suggest moving the blocks within the very long apportion_kriged_biomass method into multiple functions that sit outside of Survey. This will 1) improve code readability, and 2) enable you to write tests for each explicitly -- that's why even though some are just a few lines it may be good to break them out. This may also provide the opportunity to clean up the columns in the merged dataframes that are not going to be used later in the downstream steps.
Function names below are just placeholders, feel free to change them:
calc_weight_strata:
Lines 1291 to 1313 in 5c36ebe
| ### Calculate station weights -- stratified by strata | |
| weight_strata_station = ( | |
| pd.concat( [ | |
| haul_catch_filtered | |
| .groupby( 'stratum_num' ) | |
| .apply( lambda df: pd.Series( { 'stratum_weight': df.haul_weight.sum( ) } ) ) | |
| .reset_index( ) | |
| .assign( station = 1 ) , | |
| specimen_df_copy | |
| .groupby( [ 'stratum_num' ] ) | |
| .apply( lambda df: pd.Series( { 'stratum_weight': df.weight.sum( ) } ) ) | |
| .reset_index() | |
| .assign( station = 2 ) | |
| ] ) | |
| ) | |
| ### Calculate stratum weights | |
| weight_strata = ( | |
| weight_strata_station | |
| .groupby( [ 'stratum_num' ] ) | |
| .apply( lambda x: pd.Series( { 'weight_stratum_total': x.stratum_weight.sum( ) } ) ) | |
| .reset_index() | |
| ) |
calc_length_age_sex_stratum_weight:
Lines 1315 to 1333 in 5c36ebe
| ### Calculate summed length-age-sex weight bins across strata | |
| length_age_sex_stratum_weight = ( | |
| specimen_df_copy | |
| .dropna( how = 'any' ) | |
| .loc[ lambda x: x.sex != 'unsexed' ] | |
| .pipe( lambda df: pd.concat( [ df , | |
| df.copy( ).assign( sex = 'all' ) ] ) ) | |
| .bin_variable( age_intervals , 'age' ) | |
| .bin_variable( length_intervals , 'length' ) | |
| .groupby( [ 'stratum_num' , 'age_bin' , 'length_bin' , 'sex' ] ) | |
| .apply( lambda df: pd.Series( { 'summed_weight_all': df.weight.sum( ) , | |
| 'summed_weight_adult': df.loc[ df.age > 1 ].weight.sum( ) } ) ) | |
| .reset_index( ) | |
| .replace( np.nan , 0 ) | |
| .assign( total_weight_sex_all = lambda df: df.groupby( [ 'stratum_num' , 'sex' ] )[ 'summed_weight_all' ].transform( sum ) , | |
| total_weight_sex_adult = lambda df: df.groupby( [ 'stratum_num' , 'sex' ] )[ 'summed_weight_adult' ].transform( sum ) , | |
| proportion_weight_all = lambda df: df.summed_weight_all / df.total_weight_sex_all , | |
| proportion_weight_adult = lambda df: df.summed_weight_adult / df.total_weight_sex_adult ) | |
| ) |
calc_dist_weight_sum:
Lines 1335 to 1344 in 5c36ebe
| ### Normalize the age-length-sex indexed proportions | |
| dist_weight_sum = ( | |
| length_age_sex_stratum_weight | |
| .merge( weight_strata , on = [ 'stratum_num' ] ) | |
| .groupby( [ 'stratum_num' , 'sex' ] ) | |
| .apply( lambda df: pd.Series( { | |
| 'proportion_normalized': ( df.proportion_weight_all * ( df.summed_weight_all / df.weight_stratum_total ).sum( ) ).sum( ) | |
| } ) ) | |
| .reset_index( ) | |
| ) |
calc_age_proportions:
Lines 1346 to 1356 in 5c36ebe
| ### Calculate aged / unaged proportions | |
| aged_proportions = ( | |
| length_age_sex_stratum_weight | |
| .loc[ lambda df: df.sex != 'all' ] | |
| .groupby( [ 'stratum_num' ] ) | |
| .apply( lambda df: pd.Series( { 'weight_aged_total': df.summed_weight_all.sum( ) } ) ) | |
| .reset_index( ) | |
| .merge( weight_strata , on = [ 'stratum_num' ] ) | |
| .assign( proportion_aged_total = lambda x: x.weight_aged_total / x.weight_stratum_total , | |
| proportion_unaged_total = lambda x: np.round( 1.0 - x.proportion_aged_total , decimals = 10 ) ) | |
| ) |
calc_haul_sex_weights_normalized:
Lines 1358 to 1408 in 5c36ebe
| ### Calculate interpolated weights based on length bins for each sex per haul | |
| # Pull length-weight fitted values | |
| length_weight_fit = ( | |
| self.statistics[ 'length_weight' ][ 'length_weight_df' ] | |
| .assign( length_bin_value = lambda x: x[ 'length_bin' ].apply( lambda y: y.mid ) ) | |
| ) | |
| # Sum haul weights per sex per stratum | |
| haul_weights = ( | |
| length_df_copy | |
| .bin_variable( length_intervals , 'length' ) | |
| .loc[ lambda x: x.group == 'sexed' ] | |
| .pipe( lambda df: pd.concat( [ df , df.assign( sex = 'all' ) ] ) ) | |
| .merge( length_weight_fit , on = [ 'sex' , 'length_bin' ] ) | |
| .groupby( [ 'stratum_num' , 'haul_num' , 'sex' ] ) | |
| .apply( lambda df: pd.Series( { 'weight_interp': ( np.interp( df.length, | |
| length_weight_fit.loc[ lambda x: x.sex.isin( df.sex ) ][ 'length_bin_value' ] , | |
| length_weight_fit.loc[ lambda x: x.sex.isin( df.sex ) ][ 'weight_modeled' ] ) * | |
| df.length_count ).sum( ) } ) ) | |
| .groupby( [ 'stratum_num' , 'sex' ] ) | |
| .apply( lambda df: pd.Series( { 'summed_haul_weights': df.weight_interp.sum( ) } ) ) | |
| .reset_index( ) | |
| ) | |
| ### Normalize haul weights (Station 1) | |
| haul_sex_weights_normalized = ( | |
| haul_weights | |
| .merge( weight_strata_station.loc[ lambda x: x.station == 1 ] , | |
| on = 'stratum_num' ) | |
| .assign( weight_normalized_station_1 = lambda df: ( | |
| df | |
| .groupby( [ 'stratum_num' ] ) | |
| .apply( lambda strata: strata[ 'stratum_weight' ] * strata[ 'summed_haul_weights' ] / | |
| ( df[ ( df.stratum_num == strata.name ) & ( df.sex == 'male' ) ][ 'summed_haul_weights' ].iloc[ 0 ] + | |
| df[ ( df.stratum_num == strata.name ) & ( df.sex == 'female' ) ][ 'summed_haul_weights' ].iloc[ 0 ] ) ) | |
| .reset_index( drop = True ) ) ) | |
| .merge( weight_strata , on = [ 'stratum_num' ] ) | |
| .assign( proportion_normalized_station_1 = lambda df: ( | |
| df.weight_normalized_station_1 / df.weight_stratum_total | |
| ) ) | |
| .assign( sex_proportion = lambda df: ( | |
| df | |
| .groupby( [ 'stratum_num' ] ) | |
| .apply( lambda strata: strata[ 'proportion_normalized_station_1' ] / | |
| ( df[ ( df.stratum_num == strata.name ) & ( df.sex == 'male' ) ][ 'proportion_normalized_station_1' ].iloc[ 0 ] + | |
| df[ ( df.stratum_num == strata.name ) & ( df.sex == 'female' ) ][ 'proportion_normalized_station_1' ].iloc[ 0 ] ) ) | |
| .reset_index( drop = True ) ) ) | |
| .merge( aged_proportions.loc[ : , [ 'stratum_num' , 'proportion_unaged_total' ] ] , | |
| on = [ 'stratum_num' ] ) | |
| .assign( proportion_unaged_sex = lambda x: x.sex_proportion * x.proportion_unaged_total ) | |
| ) |
calc_grid_CV:
Lines 1437 to 1465 in 5c36ebe
| ### Calculate grid CV | |
| biomass_density_adult = ( | |
| self.biology[ 'population' ][ 'areal_density' ][ 'biomass_density_df' ] | |
| .loc[ lambda df: df.sex == 'all' ] | |
| .drop_duplicates( subset = [ 'transect_num' , 'latitude' , 'longitude' , 'stratum_num' ] ) | |
| ) | |
| ### --------------------------------------------------------------- | |
| ### !!! TODO: | |
| ### While `C0` produces a similar standard deviation as the original | |
| ### code, `Bn` does not. This discrepancy results in an order of | |
| ### magnitude change in the resulting `biomass_adult_cell_CV` estimate. | |
| ### The source of this error stems from `kriged_results.B_a_adult_mean` | |
| ### whereby the right-tail of the `B_a_adult_mean` distribution is | |
| ### substantially shorter than from values (`biomass_density_adult_mean`) | |
| ### produced in the original code. This therefore relates to Issue #202. | |
| ### --------------------------------------------------------------- | |
| C0 = np.std( biomass_density_adult.B_a , ddof = 1 ) ** 2 | |
| Bn = np.nansum( kriged_results.B_a_adult_mean * kriged_results.cell_area_nmi2 ) * 1e-9 | |
| kriged_results = ( | |
| kriged_results | |
| .assign( biomass_adult_cell_CV = lambda df: ( | |
| self.statistics[ 'kriging' ][ 'model_config' ][ 'A0' ] * | |
| np.sqrt( kriged_results.B_a_adult_prediction_variance * C0 ) * | |
| 1e-9 / | |
| Bn * | |
| np.sqrt( len( kriged_results.B_a_adult_prediction_variance ) ) | |
| ) ) | |
| ) |
|
I tracked the above part about factoring blocks into functions in #203. We can address them there. @brandynlucca : There are a few small comments (like paths) that can probably be addressed easily within this PR. Otherwise, feel free to merge this. Great that we have the kriging and downstream incorporated! |
|
I'll merge this now so that #206 that is built on top of this can be cleaned up and merged too. I will create another issue to track the vectorizing comments. |
This (draft) PR includes changes to the primary kriging methods and calculations up to estimating areal biomass densities and subsequent total biomass calculations as well as sex- and age-specific estimates. There are high-priority discrepancies in the final kriged coefficient of variation (CV) and areal biomass density estimates that require further investigation and changes (#202). The complete implementation of the stratified algorithm (i.e. Jolly Hampton, 1990) for the kriged results are forthcoming.