
grad_norm is 0 when running SFT after continued pretraining #453

@dakongss123

Description


I first ran pretraining (PT) on my own data to produce a model, then ran full_sft on it. During the SFT run, grad_norm stays at 0 and the loss does not converge. However, running full_sft directly on the base model works fine. Which setting is wrong here?
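A grad_norm of exactly 0 during fine-tuning usually means no gradients are flowing at all, most often because parameters were frozen (`requires_grad=False`) or the loaded checkpoint left the optimizer with no live parameters. A minimal diagnostic sketch (hypothetical, not from this repo; `nn.Linear` stands in for the loaded PT model) that checks both conditions:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for the model loaded from the PT checkpoint

# 1) List parameters that are frozen; a non-empty list here would explain
#    a zero grad_norm during full SFT.
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print("frozen params:", frozen)

# 2) Run one forward/backward pass and compute the total gradient norm the
#    same way clip_grad_norm_ does; a healthy setup gives a value > 0.
x = torch.randn(4, 8)
loss = model(x).sum()
loss.backward()
total_norm = torch.norm(
    torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])
)
print("total grad norm:", total_norm.item())
```

If the frozen list is empty and the norm is positive on the base model but zero after loading the PT checkpoint, the next thing to check is whether the checkpoint's state_dict keys actually matched the model (a silent partial load with `strict=False` can leave weights untrained or detached).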


    Labels

    question (Further information is requested)
