Skip to content

Conversation

@brandynlucca
Copy link
Collaborator

@brandynlucca brandynlucca commented Mar 7, 2024

This (draft) PR includes changes to the primary kriging methods and calculations up to estimating areal biomass densities and subsequent total biomass calculations as well as sex- and age-specific estimates. There are high-priority discrepancies in the final kriged coefficient of variation (CV) and areal biomass density estimates that require further investigation and changes (#202). The complete implementation of the stratified algorithm (i.e. Jolly Hampton, 1990) for the kriged results are forthcoming.

@brandynlucca brandynlucca self-assigned this Mar 7, 2024
@brandynlucca brandynlucca reopened this Mar 7, 2024
@brandynlucca brandynlucca marked this pull request as ready for review March 7, 2024 18:32
@brandynlucca brandynlucca marked this pull request as draft March 7, 2024 18:32
@brandynlucca brandynlucca changed the title Test-branch Incorporate kriging methods and documentation Mar 7, 2024
Comment on lines +1468 to +1476
kriged_results_output = (
kriged_results
.merge( self.biology[ 'weight' ][ 'proportions' ][ 'sex_age_weight_proportions_df' ] ,
on = [ 'stratum_num' , 'sex' ] )
.assign( biomass_sex_age = lambda x: x.weight_sex_proportion_all * x.biomass_total )
.loc[ : , [ 'centroid_latitude' , 'centroid_longitude' , 'stratum_num' ,
'cell_area_nmi2' , 'sex' , 'age' , 'biomass_unaged' , 'biomass_aged' ,
'biomass_total' , 'biomass_adult_cell_CV' , 'biomass_sex_age' ] ]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

df_tmp = (
    kriged_results
    .merge(
        self.biology[ 'weight' ][ 'proportions' ][ 'sex_age_weight_proportions_df' ] , 
        on = [ 'stratum_num' , 'sex' ]
    )
)
kriged_results["biomass_sex_age"] = df_tmp["weight_sex_proportion_all"] * df_tmp["biomass_total"]
kriged_results_output = kriged_results[
    [
        'centroid_latitude' , 'centroid_longitude' , 'stratum_num' , 
        'cell_area_nmi2' , 'sex' , 'age' , 'biomass_unaged' , 'biomass_aged' ,
        'biomass_total' , 'biomass_adult_cell_CV' , 'biomass_sex_age' 
    ]
]

Comment on lines +1479 to +1483
self.statistics[ 'kriging' ].update(
{
'apportioned_kriged_biomass_df': kriged_results_output ,
}
) No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious about if there's any performance difference between .update or direct assignment. It looks like that's not the case, just that with .update one could assign many keys at a time.

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @brandynlucca : Thanks for the draft PR!

I made some inline comments on: 1) vectoring operations rather than using lambda, and 2) breaking large single lines into separate blocks for readability.

I'd also suggest moving the blocks within the very long apportion_kriged_biomass method into multiple functions that sit outside of Survey. This will 1) improve code readability, and 2) enable you to write tests for each explicitly -- that's why even though some are just a few lines it may be good to break them out. This may also provide the opportunity to clean up the columns in the merged dataframes that are not going to be used later in the downstream steps.

Function names below are just placeholders, feel free to change them:

calc_weight_strata:

echopop/echopop/survey.py

Lines 1291 to 1313 in 5c36ebe

### Calculate station weights -- stratified by strata
weight_strata_station = (
pd.concat( [
haul_catch_filtered
.groupby( 'stratum_num' )
.apply( lambda df: pd.Series( { 'stratum_weight': df.haul_weight.sum( ) } ) )
.reset_index( )
.assign( station = 1 ) ,
specimen_df_copy
.groupby( [ 'stratum_num' ] )
.apply( lambda df: pd.Series( { 'stratum_weight': df.weight.sum( ) } ) )
.reset_index()
.assign( station = 2 )
] )
)
### Calculate stratum weights
weight_strata = (
weight_strata_station
.groupby( [ 'stratum_num' ] )
.apply( lambda x: pd.Series( { 'weight_stratum_total': x.stratum_weight.sum( ) } ) )
.reset_index()
)

calc_length_age_sex_stratum_weight:

echopop/echopop/survey.py

Lines 1315 to 1333 in 5c36ebe

### Calculate summed length-age-sex weight bins across strata
length_age_sex_stratum_weight = (
specimen_df_copy
.dropna( how = 'any' )
.loc[ lambda x: x.sex != 'unsexed' ]
.pipe( lambda df: pd.concat( [ df ,
df.copy( ).assign( sex = 'all' ) ] ) )
.bin_variable( age_intervals , 'age' )
.bin_variable( length_intervals , 'length' )
.groupby( [ 'stratum_num' , 'age_bin' , 'length_bin' , 'sex' ] )
.apply( lambda df: pd.Series( { 'summed_weight_all': df.weight.sum( ) ,
'summed_weight_adult': df.loc[ df.age > 1 ].weight.sum( ) } ) )
.reset_index( )
.replace( np.nan , 0 )
.assign( total_weight_sex_all = lambda df: df.groupby( [ 'stratum_num' , 'sex' ] )[ 'summed_weight_all' ].transform( sum ) ,
total_weight_sex_adult = lambda df: df.groupby( [ 'stratum_num' , 'sex' ] )[ 'summed_weight_adult' ].transform( sum ) ,
proportion_weight_all = lambda df: df.summed_weight_all / df.total_weight_sex_all ,
proportion_weight_adult = lambda df: df.summed_weight_adult / df.total_weight_sex_adult )
)

calc_dist_weight_sum:

echopop/echopop/survey.py

Lines 1335 to 1344 in 5c36ebe

### Normalize the age-length-sex indexed proportions
dist_weight_sum = (
length_age_sex_stratum_weight
.merge( weight_strata , on = [ 'stratum_num' ] )
.groupby( [ 'stratum_num' , 'sex' ] )
.apply( lambda df: pd.Series( {
'proportion_normalized': ( df.proportion_weight_all * ( df.summed_weight_all / df.weight_stratum_total ).sum( ) ).sum( )
} ) )
.reset_index( )
)

calc_age_proportions:

echopop/echopop/survey.py

Lines 1346 to 1356 in 5c36ebe

### Calculate aged / unaged proportions
aged_proportions = (
length_age_sex_stratum_weight
.loc[ lambda df: df.sex != 'all' ]
.groupby( [ 'stratum_num' ] )
.apply( lambda df: pd.Series( { 'weight_aged_total': df.summed_weight_all.sum( ) } ) )
.reset_index( )
.merge( weight_strata , on = [ 'stratum_num' ] )
.assign( proportion_aged_total = lambda x: x.weight_aged_total / x.weight_stratum_total ,
proportion_unaged_total = lambda x: np.round( 1.0 - x.proportion_aged_total , decimals = 10 ) )
)

calc_haul_sex_weights_normalized:

echopop/echopop/survey.py

Lines 1358 to 1408 in 5c36ebe

### Calculate interpolated weights based on length bins for each sex per haul
# Pull length-weight fitted values
length_weight_fit = (
self.statistics[ 'length_weight' ][ 'length_weight_df' ]
.assign( length_bin_value = lambda x: x[ 'length_bin' ].apply( lambda y: y.mid ) )
)
# Sum haul weights per sex per stratum
haul_weights = (
length_df_copy
.bin_variable( length_intervals , 'length' )
.loc[ lambda x: x.group == 'sexed' ]
.pipe( lambda df: pd.concat( [ df , df.assign( sex = 'all' ) ] ) )
.merge( length_weight_fit , on = [ 'sex' , 'length_bin' ] )
.groupby( [ 'stratum_num' , 'haul_num' , 'sex' ] )
.apply( lambda df: pd.Series( { 'weight_interp': ( np.interp( df.length,
length_weight_fit.loc[ lambda x: x.sex.isin( df.sex ) ][ 'length_bin_value' ] ,
length_weight_fit.loc[ lambda x: x.sex.isin( df.sex ) ][ 'weight_modeled' ] ) *
df.length_count ).sum( ) } ) )
.groupby( [ 'stratum_num' , 'sex' ] )
.apply( lambda df: pd.Series( { 'summed_haul_weights': df.weight_interp.sum( ) } ) )
.reset_index( )
)
### Normalize haul weights (Station 1)
haul_sex_weights_normalized = (
haul_weights
.merge( weight_strata_station.loc[ lambda x: x.station == 1 ] ,
on = 'stratum_num' )
.assign( weight_normalized_station_1 = lambda df: (
df
.groupby( [ 'stratum_num' ] )
.apply( lambda strata: strata[ 'stratum_weight' ] * strata[ 'summed_haul_weights' ] /
( df[ ( df.stratum_num == strata.name ) & ( df.sex == 'male' ) ][ 'summed_haul_weights' ].iloc[ 0 ] +
df[ ( df.stratum_num == strata.name ) & ( df.sex == 'female' ) ][ 'summed_haul_weights' ].iloc[ 0 ] ) )
.reset_index( drop = True ) ) )
.merge( weight_strata , on = [ 'stratum_num' ] )
.assign( proportion_normalized_station_1 = lambda df: (
df.weight_normalized_station_1 / df.weight_stratum_total
) )
.assign( sex_proportion = lambda df: (
df
.groupby( [ 'stratum_num' ] )
.apply( lambda strata: strata[ 'proportion_normalized_station_1' ] /
( df[ ( df.stratum_num == strata.name ) & ( df.sex == 'male' ) ][ 'proportion_normalized_station_1' ].iloc[ 0 ] +
df[ ( df.stratum_num == strata.name ) & ( df.sex == 'female' ) ][ 'proportion_normalized_station_1' ].iloc[ 0 ] ) )
.reset_index( drop = True ) ) )
.merge( aged_proportions.loc[ : , [ 'stratum_num' , 'proportion_unaged_total' ] ] ,
on = [ 'stratum_num' ] )
.assign( proportion_unaged_sex = lambda x: x.sex_proportion * x.proportion_unaged_total )
)

calc_grid_CV:

echopop/echopop/survey.py

Lines 1437 to 1465 in 5c36ebe

### Calculate grid CV
biomass_density_adult = (
self.biology[ 'population' ][ 'areal_density' ][ 'biomass_density_df' ]
.loc[ lambda df: df.sex == 'all' ]
.drop_duplicates( subset = [ 'transect_num' , 'latitude' , 'longitude' , 'stratum_num' ] )
)
### ---------------------------------------------------------------
### !!! TODO:
### While `C0` produces a similar standard deviation as the original
### code, `Bn` does not. This discrepancy results in an order of
### magnitude change in the resulting `biomass_adult_cell_CV` estimate.
### The source of this error stems from `kriged_results.B_a_adult_mean`
### whereby the right-tail of the `B_a_adult_mean` distribution is
### substantially shorter than from values (`biomass_density_adult_mean`)
### produced in the original code. This therefore relates to Issue #202.
### ---------------------------------------------------------------
C0 = np.std( biomass_density_adult.B_a , ddof = 1 ) ** 2
Bn = np.nansum( kriged_results.B_a_adult_mean * kriged_results.cell_area_nmi2 ) * 1e-9
kriged_results = (
kriged_results
.assign( biomass_adult_cell_CV = lambda df: (
self.statistics[ 'kriging' ][ 'model_config' ][ 'A0' ] *
np.sqrt( kriged_results.B_a_adult_prediction_variance * C0 ) *
1e-9 /
Bn *
np.sqrt( len( kriged_results.B_a_adult_prediction_variance ) )
) )
)

@leewujung
Copy link
Member

leewujung commented Mar 8, 2024

I tracked the above part about factoring blocks into functions in #203. We can address them there.

@brandynlucca : There are a few small comments (like paths) that can probably be addressed easily within this PR. Otherwise, feel free to merge this. Great that we have the kriging and downstream incorporated!

@leewujung
Copy link
Member

I'll merge this now so that #206 that is built on top of this can be cleaned up and merged too. I will create another issue to track the vectorizing comments.

@leewujung leewujung marked this pull request as ready for review March 9, 2024 17:10
@leewujung leewujung merged commit ba9f6f9 into OSOceanAcoustics:main Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants