
v2.5.0


@shibing624 shibing624 released this 07 Apr 12:18

Full-pipeline support for Qwen3/Qwen3.5:

PT (continued pre-training) — training/pretraining.py
SFT (supervised fine-tuning) — training/supervised_finetuning.py / training/supervised_finetuning_accelerate.py
RM (reward modeling) — training/reward_modeling.py
PPO (reinforcement learning) — training/ppo_training.py
DPO (direct preference optimization) — training/dpo_training.py
ORPO (odds ratio preference optimization) — training/orpo_training.py
GRPO (group relative policy optimization) — training/grpo_training.py
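For orientation, the core idea behind GRPO is to score each sampled completion against the mean and standard deviation of its own prompt's sampling group, rather than against a learned value function. A minimal illustrative sketch of that group-relative normalization (not the repo's actual implementation in training/grpo_training.py):

```python
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages for GRPO (illustrative sketch).

    group_rewards: rewards of all completions sampled for ONE prompt.
    Each completion's advantage is its reward normalized by the
    group's mean and (population) standard deviation.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

With a group of rewards [1.0, 2.0, 3.0], the middle completion gets an advantage of 0 and the other two get symmetric positive/negative advantages, so the policy is pushed toward the above-average completions within each group.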
DeepSpeed ZeRO-3 MoE support

For MoE-architecture models (Qwen3 MoE, Qwen3.5 MoE), all training scripts now auto-detect the MoE layers and call set_z3_leaf_modules, ensuring that MoE modules are not incorrectly partitioned under DeepSpeed ZeRO-3 distributed training. The existing Mixtral and DeepSeek-V3 MoE models remain supported.
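A minimal sketch of what such auto-detection can look like. The class-name set below is an assumption based on the transformers implementations of these models (verify against your installed transformers version); only `deepspeed.utils.set_z3_leaf_modules(model, classes)` is a real DeepSpeed API:

```python
# Assumed MoE block class names from transformers model implementations;
# adjust to match your transformers version.
MOE_LEAF_CLASS_NAMES = {
    "Qwen3MoeSparseMoeBlock",   # Qwen3 MoE (assumed name)
    "Qwen2MoeSparseMoeBlock",   # older Qwen MoE
    "MixtralSparseMoeBlock",    # Mixtral
    "DeepseekV3MoE",            # DeepSeek-V3 (assumed name)
}

def find_moe_leaf_classes(modules, names=MOE_LEAF_CLASS_NAMES):
    """Return the distinct module classes whose names mark an MoE block."""
    found = {type(m) for m in modules if type(m).__name__ in names}
    return sorted(found, key=lambda cls: cls.__name__)

# In a training script, one would then do (ZeRO-3 only):
#   leaf_classes = find_moe_leaf_classes(model.modules())
#   if leaf_classes:
#       from deepspeed.utils import set_z3_leaf_modules
#       set_z3_leaf_modules(model, leaf_classes)
```

Marking these classes as "leaf" modules tells ZeRO-3 to gather their parameters as a unit instead of sharding inside them, which avoids hangs/errors caused by the data-dependent expert routing.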

Project restructuring
The 38 loose files in the repository root have been reorganized by function into 5 subdirectories, improving the readability of this tutorial repo:

MedicalGPT/
├── training/ # core training scripts (PT/SFT/RM/PPO/DPO/ORPO/GRPO + templates)
├── scripts/ # one-click run scripts + DeepSpeed configs (zero1/2/3.json)
├── demo/ # inference, Gradio UI, FastAPI server, RAG QA
├── tools/ # LoRA merging, quantization, vocab extension, data conversion
└── notebooks/ # end-to-end Colab tutorial notebooks
All scripts are run from the project root, e.g.: bash scripts/run_sft.sh

Quick start
# SFT fine-tuning of Qwen3.5-2B
bash scripts/run_sft.sh
# GRPO training
bash scripts/run_grpo.sh
# Inference
python demo/inference.py --base_model Qwen/Qwen3.5-2B --interactive

What's Changed

  • Refactor model_kwargs and logging for clarity by @whaler404 in #470
  • Add MiniMax as LLM provider for roleplay data generation by @octo-patch in #481
  • Fix ppo_training.py: unbalanced multi-GPU memory allocation causing OOM by @Copilot in #484

New Contributors

Full Changelog: 2.4.0...2.5.0