Skip to content

Detailing where NaN/Inf values occur #180

@brandynlucca

Description

@brandynlucca

There are instances where certain functions, such as stratified_transect_statistic(...), incorporate numpy.nansum(...) and other similar functions that replace NaN values with 0. In some cases, this is necessary to complete the calculation. However, there are other cases where this is no longer the case and may create some confusion in terms of where to expect NaN values to occur. For instance, from stratified_transect_statistic(...):

            # Resample without replacement
            sel_inds = np.random.choice( np.arange( start , end ) , 
                                         num_transects_to_sample[ j ] , 
                                         replace=False )
            
            ### Define stratified weights
            stratified_weights = distance[ sel_inds ] / np.mean( distance[ sel_inds ] )

            ### Weighted value (e.g. biomass)
            value_distance_density = value[ sel_inds ] / distance[ sel_inds ]
            
            ### Compute mean and variance
            rho_j[ j ] = np.nansum( value[ sel_inds ] * stratified_weights ) / np.nansum( stratified_weights )
            var_j[ j ] = np.nansum( ( stratified_weights ** 2 * ( value_distance_density - rho_j[ j ] ) ** 2 ) ) / sample_dof[ j ]

Here it is noted in PR #176 that NaN values should only be relevant based on the distance input, which in this case is calculated via calculate_transect_distance(...). Since this function does not add a NaN to match the size of the original data, an NaN value should not be expected to be produced. Or if there is, then this should trigger an error state rather than letting the remaining transect_analysis(...) or stratified_summary(...) function/operation plod along like nothing is wrong. Although numpy.nansum slower than the standard sum, the size of the data probably means that this isn't a significant bottleneck. Nonetheless, it is probably overly conservative and could introduce cases where NaN values are replaced with unwanted 0 values that aren't accounted for and could influence downstream calculations (e.g. calculating the mean).

Metadata

Metadata

Assignees

Type

No type

Projects

Relationships

None yet

Development

No branches or pull requests

Issue actions