Loss NaN problem #142

@pi1ing

Hello, thank you for your great work.
I tried to train on the DOTA dataset with the default cfgs (backbone: resnet_50), but got training results like this:

************************************************************
2021-11-05 09:18:35: global_step:20  current_step:20
per_cost_time:4.518s
refine_cls_loss_stage3:0.000
cls_loss:1364.121
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:2.277
refine_cls_loss:741079.375
total_losses:742445.750

************************************************************
2021-11-05 09:18:44: global_step:40  current_step:40
per_cost_time:0.234s
refine_cls_loss_stage3:0.000
cls_loss:nan
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:nan
refine_cls_loss:nan
total_losses:nan
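To make this kind of failure easier to spot, here is a minimal Python sketch. It parses one logged block like the ones above into a dict and flags values that are non-finite or above a threshold. The function names and the 1e4 threshold are hypothetical choices for illustration, not part of the repository, and the parser assumes only the `name:value` layout shown in the log. At step 20, refine_cls_loss is already 741079.375, before everything turns to nan at step 40. Flagging large values may therefore catch the divergence earlier than a NaN check alone.

```python
from __future__ import annotations

import math


def parse_loss_block(block: str) -> dict[str, float]:
    """Parse 'name:value' lines from one logged block into floats.

    Only lines whose name contains 'loss' are kept, so the timestamp line and
    the per_cost_time line are skipped. float() accepts 'nan' and 'inf'.
    """
    losses: dict[str, float] = {}
    for line in block.strip().splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and "loss" in name:
            losses[name] = float(value)
    return losses


def non_finite(losses: dict[str, float]) -> list[str]:
    """Sorted names of losses that are nan or +/-inf."""
    return sorted(name for name, v in losses.items() if not math.isfinite(v))


def exploding(losses: dict[str, float], threshold: float = 1e4) -> list[str]:
    """Sorted names of finite losses whose magnitude exceeds a (hypothetical) threshold."""
    return sorted(
        name for name, v in losses.items() if math.isfinite(v) and abs(v) > threshold
    )


step_40 = """
2021-11-05 09:18:44: global_step:40  current_step:40
per_cost_time:0.234s
refine_cls_loss_stage3:0.000
cls_loss:nan
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:nan
refine_cls_loss:nan
total_losses:nan
"""

print(non_finite(parse_loss_block(step_40)))
# → ['cls_loss', 'refine_cls_loss', 'reg_loss', 'total_losses']
```

In an actual training loop you would more likely apply `non_finite` / `exploding` directly to the fetched loss values and stop (or save the offending batch) on the first hit, rather than parsing printed text.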

By the way, I have one GeForce RTX 3080 Ti, and the development environment uses the recommended Docker image. However, the first call to
_, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])
took 10 minutes to run. Is that normal?
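The 10-minute delay could be a one-off cost of the first step or a general slowdown. Here is a minimal Python sketch that times each call separately to tell the two apart. `time_steps` and `split_warmup` are hypothetical helpers. In the training script, `step_fn` would presumably wrap the `sess.run([train_op, global_step, summary_op])` call quoted above, which is an assumption about how the loop is structured. For reference, the log's per_cost_time drops from 4.518s at step 20 to 0.234s at step 40.

```python
import time
from typing import Callable, List, Tuple


def time_steps(step_fn: Callable[[], object], n_steps: int) -> List[float]:
    """Call step_fn n_steps times and return each call's wall-clock duration in seconds."""
    durations: List[float] = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return durations


def split_warmup(durations: List[float]) -> Tuple[float, float]:
    """Return (first-call time, mean time of the remaining calls)."""
    if len(durations) < 2:
        raise ValueError("need at least two timed steps")
    rest = durations[1:]
    return durations[0], sum(rest) / len(rest)


# Stand-in for a training step: slow on the first call only (purely illustrative).
calls = {"n": 0}


def fake_step() -> None:
    calls["n"] += 1
    time.sleep(0.05 if calls["n"] == 1 else 0.001)


first, steady = split_warmup(time_steps(fake_step, 5))
print(f"first step: {first:.3f}s, later steps: {steady:.3f}s on average")
```

If the first value dominates and later steps are fast, the delay is one-time setup work rather than a training-throughput problem. Possible sources include graph optimization, cuDNN autotuning, or GPU kernel compilation. Which of these applies here is not established by this issue.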
