There are instances where certain functions, such as `stratified_transect_statistic(...)`, use `numpy.nansum(...)` and similar functions that replace NaN values with 0. In some cases this is necessary to complete the calculation, but in other cases it is not, and it can create confusion about where NaN values should be expected to occur. For instance, from `stratified_transect_statistic(...)`:
```python
# Resample without replacement
sel_inds = np.random.choice( np.arange( start , end ) ,
                             num_transects_to_sample[ j ] ,
                             replace=False )
### Define stratified weights
stratified_weights = distance[ sel_inds ] / np.mean( distance[ sel_inds ] )
### Weighted value (e.g. biomass)
value_distance_density = value[ sel_inds ] / distance[ sel_inds ]
### Compute mean and variance
rho_j[ j ] = np.nansum( value[ sel_inds ] * stratified_weights ) / np.nansum( stratified_weights )
var_j[ j ] = np.nansum( ( stratified_weights ** 2 * ( value_distance_density - rho_j[ j ] ) ** 2 ) ) / sample_dof[ j ]
```

Here it is noted in PR #176 that NaN values should only be relevant based on the `distance` input, which in this case is calculated via `calculate_transect_distance(...)`. Since that function does not insert NaN values to match the size of the original data, a NaN value should not be expected here. If one does appear, it should trigger an error state rather than letting the remaining `transect_analysis(...)` or `stratified_summary(...)` operations plod along as if nothing is wrong. Although `numpy.nansum` is slower than the standard sum, the size of the data probably means this isn't a significant bottleneck. Nonetheless, it is probably overly conservative and could introduce cases where NaN values are silently replaced with unwanted 0 values that aren't accounted for and could influence downstream calculations (e.g. calculating the mean).
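As a rough sketch of the alternative behavior described above, the snippet below validates the inputs up front and then uses plain `np.sum`, so any stray NaN either raises immediately or propagates visibly instead of being silently zeroed. This is an illustrative mock-up with toy data, not the package implementation; the up-front check and the `sel_inds.size - 1` degrees-of-freedom stand-in (in place of `sample_dof[ j ]`) are assumptions.

```python
import numpy as np

# Toy stand-ins for one stratum's transect values and distances; the names
# mirror the snippet above, but this is an illustrative sketch rather than
# the actual body of stratified_transect_statistic(...).
value = np.array([10.0, 12.0, 8.0, 11.0])
distance = np.array([2.0, 2.5, 1.5, 2.0])

# Proposed error state: fail loudly if `distance` (the output of
# calculate_transect_distance(...)) or `value` contains NaN, rather than
# letting np.nansum quietly treat it as 0 downstream.
if np.isnan(distance).any() or np.isnan(value).any():
    raise ValueError("NaN detected in transect inputs; aborting stratified calculation.")

# Resample without replacement, as in the original snippet
rng = np.random.default_rng(seed=1)
sel_inds = rng.choice(np.arange(value.size), 3, replace=False)

stratified_weights = distance[sel_inds] / np.mean(distance[sel_inds])
value_distance_density = value[sel_inds] / distance[sel_inds]

# With the validation above, plain np.sum is safe: a NaN that slipped through
# would yield a NaN result (np.sum([1.0, np.nan]) -> nan) instead of being
# masked the way np.nansum masks it (np.nansum([1.0, np.nan]) -> 1.0).
rho_j = np.sum(value[sel_inds] * stratified_weights) / np.sum(stratified_weights)
var_j = np.sum(stratified_weights ** 2 * (value_distance_density - rho_j) ** 2) / (sel_inds.size - 1)
```

Whether the check belongs at the start of `stratified_transect_statistic(...)` or inside `calculate_transect_distance(...)` itself is an open design question, but either placement would surface the problem at its source rather than in a downstream mean.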