Monte Carlo dropout uncertainty estimation #258

Draft: lazarusA wants to merge 3 commits into main from la/mc_dropout
lazarusA (Member) commented Apr 8, 2026

  • generate/save evaluation samples
  • get statistics

gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces Monte Carlo (MC) Dropout evaluation functionality, adding the Functors dependency to support model layer traversal. The review identifies several critical issues: the current use of compute_loss fails to capture predictions when train_mode is enabled, samples are lost when no file path is provided for in-memory evaluation, and the implementation suffers from significant I/O overhead due to opening and closing the JLD2 file within the sampling loop.

Comment on lines +15 to +25
loss_k, _, ŷ_k = compute_loss(
    ghm, ps, st_train,
    (x, (y, y_no_nan)),
    logging = LoggingLoss(
        train_mode = true,
        loss_types = loss_types,
        training_loss = training_loss,
        extra_loss = extra_loss,
        agg = agg
    )
)

critical

The compute_loss function returns an empty NamedTuple for the stats field (the third return value) when logging.train_mode is set to true. Since this loop explicitly sets train_mode = true to keep dropout active, ŷ_k will be empty, and the model's predictions will not be captured. This prevents MC Dropout from collecting the samples it needs for uncertainty estimation. Consider calling the model directly to obtain predictions while in training mode, or adjusting compute_loss to return predictions even when train_mode is true.
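
One way to picture the first option (calling the model directly) is the toy sketch below. It is not the PR's code: `toy_forward`, its weights, and the dropout rate are hypothetical stand-ins for `ghm`, and it uses only the Julia standard library. The point is just that each stochastic pass returns its predictions directly, so nothing depends on what a loss wrapper chooses to report.

```julia
using Random

# Hypothetical stand-in for a model with one dropout layer (not the PR's `ghm`).
function toy_forward(rng, W1, W2, x; p = 0.5, training = true)
    h = max.(W1 * x, 0)                       # hidden layer with ReLU
    if training
        mask = rand(rng, size(h)...) .> p     # keep each unit with probability 1 - p
        h = h .* mask ./ (1 - p)              # inverted-dropout rescaling
    end
    return W2 * h
end

rng = MersenneTwister(42)
W1, W2 = randn(rng, 8, 4), randn(rng, 1, 8)
x = rand(rng, 4, 16)

# Dropout stays active, and every pass yields a prediction array.
ŷ_samples = [toy_forward(rng, W1, W2, x; training = true) for _ in 1:20]
```

With `training = false` the same function is deterministic, which is the behaviour MC Dropout deliberately avoids during sampling.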

Comment on lines +14 to +27
for k in 1:n_samples
    loss_k, _, ŷ_k = compute_loss(
        ghm, ps, st_train,
        (x, (y, y_no_nan)),
        logging = LoggingLoss(
            train_mode = true,
            loss_types = loss_types,
            training_loss = training_loss,
            extra_loss = extra_loss,
            agg = agg
        )
    )
    _store_sample(file_path, train_or_val_name, ŷ_k, loss_k, k)
end

high

When file_path is nothing, the samples generated in the loop are not collected or returned. The _store_sample function returns the prediction and loss, but these values are ignored by the loop, and evaluate_mc_dropout returns nothing. If the intention is to allow in-memory evaluation, you should accumulate the results in a list and return them at the end of the function.
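
A minimal sketch of the in-memory path, under stated assumptions: `collect_mc_samples` and `noisy_pass` are hypothetical names, and `noisy_pass` merely imitates one stochastic forward pass by returning a `(ŷ, loss)` pair. The sketch accumulates every sample, returns them, and then derives per-output mean and standard deviation.

```julia
using Random, Statistics

# `sample_fn` is a hypothetical zero-argument function returning (ŷ, loss) for one pass.
function collect_mc_samples(sample_fn, n_samples)
    results = [sample_fn() for _ in 1:n_samples]
    preds = first.(results)                   # one prediction vector per sample
    losses = last.(results)                   # one scalar loss per sample
    return (; preds, losses)
end

rng = MersenneTwister(1)

# Placeholder for a stochastic forward pass: noisy predictions around 1.0.
function noisy_pass()
    ŷ = 1.0 .+ 0.1 .* randn(rng, 5)
    return ŷ, sum(abs2, ŷ .- 1.0)
end

out = collect_mc_samples(noisy_pass, 50)
stacked = reduce(hcat, out.preds)             # 5 × 50 matrix, one column per sample
μ = vec(mean(stacked; dims = 2))              # predictive mean per output
σ = vec(std(stacked; dims = 2))               # predictive spread per output
```

Returning a NamedTuple keeps the call site the same whether or not the samples are also written to disk.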



function _store_sample(file_path::String, name, ŷ, loss, sample)
    return jldopen(file_path, "a+") do file

medium

Opening and closing the JLD2 file with jldopen in append mode ("a+") inside the loop is inefficient. For a large number of samples, this will cause significant I/O overhead. It is recommended to open the file once before the loop starts and pass the open file handle to the storage function.
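
The suggested restructuring could look roughly like this, assuming JLD2 is installed. `store_sample!`, the group layout, and the placeholder predictions are hypothetical; only `jldopen` and the `file[key] = value` assignment are taken from JLD2 itself.

```julia
using JLD2

# Hypothetical helper: writes one sample into an already-open JLD2 file.
function store_sample!(file, name, ŷ, loss, k)
    file["$name/sample_$k/prediction"] = ŷ
    file["$name/sample_$k/loss"] = loss
    return nothing
end

path = joinpath(mktempdir(), "mc_samples.jld2")

jldopen(path, "a+") do file                   # opened once for the whole loop
    for k in 1:5
        ŷ_k = fill(Float64(k), 3)             # placeholder prediction
        store_sample!(file, "validation", ŷ_k, 0.1 * k, k)
    end
end

# Read one sample back to check the layout.
stored = jldopen(path, "r") do file
    file["validation/sample_3/prediction"]
end
```

The do-block form still closes the file if an iteration throws, so moving the open call out of the loop does not give up that safety.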

lazarusA added the enhancement label Apr 8, 2026

BernhardAhrens (Collaborator) commented

MC dropout does not make sense with globally estimated parameters. Let's go for ensembles for now.
