v2.5.0
Full training pipeline adapted for Qwen3/Qwen3.5:
- PT (continued pretraining) — training/pretraining.py
- SFT (supervised fine-tuning) — training/supervised_finetuning.py / training/supervised_finetuning_accelerate.py
- RM (reward modeling) — training/reward_modeling.py
- PPO (reinforcement learning) — training/ppo_training.py
- DPO (direct preference optimization) — training/dpo_training.py
- ORPO (odds ratio preference optimization) — training/orpo_training.py
- GRPO (group relative policy optimization) — training/grpo_training.py
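Of the stages above, GRPO is the newest: it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A minimal sketch of that group-relative advantage computation (illustrative only, not the actual code in training/grpo_training.py):

```python
def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: z-score each reward within its group.

    Completions that beat the group mean get a positive advantage;
    worse-than-average completions get a negative one.
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in group_rewards]
```

Because the baseline comes from the group itself, no separate value network is needed, which is the main practical difference from PPO.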
DeepSpeed ZeRO-3 MoE support
For MoE-architecture models (Qwen3 MoE, Qwen3.5 MoE), all training scripts now auto-detect MoE modules and call set_z3_leaf_modules, ensuring that MoE modules are not incorrectly partitioned under DeepSpeed ZeRO-3 distributed training. Existing Mixtral and DeepSeek-V3 MoE models remain supported.
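The auto-detection can be pictured as a scan over module class names. A hedged sketch (the class names and the helper below are assumptions for illustration, not the project's actual code; the real scripts pass the result to `deepspeed.utils.set_z3_leaf_modules`):

```python
# Assumed MoE block class names following Transformers naming conventions;
# verify against your installed transformers version before relying on them.
MOE_LEAF_CLASS_NAMES = {
    "Qwen3MoeSparseMoeBlock",
    "MixtralSparseMoeBlock",
    "DeepseekV3MoE",
}

def find_moe_leaf_classes(model):
    """Return the set of module classes to mark as ZeRO-3 leaf modules.

    `model` only needs a torch-style `.modules()` iterator. Marking a class
    as a leaf makes ZeRO-3 gather and run the whole MoE block as one unit
    instead of sharding its experts' parameters mid-forward.
    """
    return {type(m) for m in model.modules()
            if type(m).__name__ in MOE_LEAF_CLASS_NAMES}
```

A training script would typically run this once after loading the model and, if the set is non-empty, call `deepspeed.utils.set_z3_leaf_modules(model, list(classes))` before DeepSpeed initialization.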
Project restructuring
The 38 files scattered in the repository root have been reorganized by function into 5 subdirectories, improving the readability of this tutorial repository:
```
MedicalGPT/
├── training/    # core training scripts (PT/SFT/RM/PPO/DPO/ORPO/GRPO + templates)
├── scripts/     # one-click run scripts + DeepSpeed configs (zero1/2/3.json)
├── demo/        # inference, Gradio UI, FastAPI service, RAG QA
├── tools/       # LoRA merging, quantization, vocabulary extension, data conversion
└── notebooks/   # end-to-end Colab tutorial notebooks
```
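The LoRA merge tool under tools/ conceptually folds the adapter back into the base weights as W' = W + (alpha / r) · B·A. A minimal pure-Python sketch of that update (illustrative only; the repository's tool operates on real checkpoints, typically via PEFT's merge_and_unload):

```python
def merge_lora_weight(W, A, B, alpha, r):
    """Fold a rank-r LoRA adapter into a base weight matrix.

    W: d_out x d_in base weight, A: r x d_in, B: d_out x r (nested lists).
    Returns W + (alpha / r) * (B @ A), the merged full-rank weight.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(d_in)]
        for i in range(d_out)
    ]
```

After merging, the adapter files are no longer needed at inference time, which is why merging usually precedes quantization in the tools/ workflow.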
All scripts are run from the project root, e.g.: `bash scripts/run_sft.sh`
Quick start
```shell
# SFT fine-tuning of Qwen3.5-2B
bash scripts/run_sft.sh
# GRPO training
bash scripts/run_grpo.sh
# Inference
python demo/inference.py --base_model Qwen/Qwen3.5-2B --interactive
```
What's Changed
- Refactor model_kwargs and logging for clarity by @whaler404 in #470
- Add MiniMax as LLM provider for roleplay data generation by @octo-patch in #481
- Fix ppo_training.py: unbalanced multi-GPU memory allocation causing OOM by @Copilot in #484
New Contributors
- @whaler404 made their first contribution in #470
- @octo-patch made their first contribution in #481
- @Copilot made their first contribution in #484
Full Changelog: 2.4.0...2.5.0