v2.5.0
Full training pipeline adapted for Qwen3/Qwen3.5:
- PT (continued pretraining) — training/pretraining.py
- SFT (supervised fine-tuning) — training/supervised_finetuning.py / training/supervised_finetuning_accelerate.py
- RM (reward modeling) — training/reward_modeling.py
- PPO (reinforcement learning) — training/ppo_training.py
- DPO (direct preference optimization) — training/dpo_training.py
- ORPO (odds ratio preference optimization) — training/orpo_training.py
- GRPO (group relative policy optimization) — training/grpo_training.py
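Of the stages above, GRPO is the newest: it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A minimal sketch of that group-relative advantage computation (illustrative only, not the actual code in training/grpo_training.py):

```python
def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: z-score each reward within its group.

    Completions that beat the group mean get a positive advantage;
    worse-than-average completions get a negative one.
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    std = (sum((r - mean) ** 2 for r in group_rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in group_rewards]
```

Because the baseline comes from the group itself, no separate value network is needed, which is the main practical difference from PPO.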
DeepSpeed ZeRO-3 MoE support
For MoE-architecture models (Qwen3 MoE, Qwen3.5 MoE), all training scripts now auto-detect MoE modules and call set_z3_leaf_modules, ensuring that MoE modules are not incorrectly partitioned under DeepSpeed ZeRO-3 distributed training. Existing Mixtral and DeepSeek-V3 MoE models remain supported.
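The auto-detection can be pictured as a scan over module class names. A hedged sketch (the class names and the helper below are assumptions for illustration, not the project's actual code; the real scripts pass the result to `deepspeed.utils.set_z3_leaf_modules`):

```python
# Assumed MoE block class names following Transformers naming conventions;
# verify against your installed transformers version before relying on them.
MOE_LEAF_CLASS_NAMES = {
    "Qwen3MoeSparseMoeBlock",
    "MixtralSparseMoeBlock",
    "DeepseekV3MoE",
}

def find_moe_leaf_classes(model):
    """Return the set of module classes to mark as ZeRO-3 leaf modules.

    `model` only needs a torch-style `.modules()` iterator. Marking a class
    as a leaf makes ZeRO-3 gather and run the whole MoE block as one unit
    instead of sharding its experts' parameters mid-forward.
    """
    return {type(m) for m in model.modules()
            if type(m).__name__ in MOE_LEAF_CLASS_NAMES}
```

A training script would typically run this once after loading the model and, if the set is non-empty, call `deepspeed.utils.set_z3_leaf_modules(model, list(classes))` before DeepSpeed initialization.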
Project restructuring
The 38 files scattered in the repository root have been reorganized by function into 5 subdirectories, improving the readability of this tutorial repository:
```
MedicalGPT/
├── training/    # core training scripts (PT/SFT/RM/PPO/DPO/ORPO/GRPO + templates)
├── scripts/     # one-click run scripts + DeepSpeed configs (zero1/2/3.json)
├── demo/        # inference, Gradio UI, FastAPI service, RAG QA
├── tools/       # LoRA merging, quantization, vocabulary extension, data conversion
└── notebooks/   # end-to-end Colab tutorial notebooks
```
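The LoRA merge tool under tools/ conceptually folds the adapter back into the base weights as W' = W + (alpha / r) · B·A. A minimal pure-Python sketch of that update (illustrative only; the repository's tool operates on real checkpoints, typically via PEFT's merge_and_unload):

```python
def merge_lora_weight(W, A, B, alpha, r):
    """Fold a rank-r LoRA adapter into a base weight matrix.

    W: d_out x d_in base weight, A: r x d_in, B: d_out x r (nested lists).
    Returns W + (alpha / r) * (B @ A), the merged full-rank weight.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(d_in)]
        for i in range(d_out)
    ]
```

After merging, the adapter files are no longer needed at inference time, which is why merging usually precedes quantization in the tools/ workflow.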
All scripts are run from the project root, e.g.: `bash scripts/run_sft.sh`
Quick start
```shell
# SFT fine-tuning of Qwen3.5-2B
bash scripts/run_sft.sh
# GRPO training
bash scripts/run_grpo.sh
# Inference
python demo/inference.py --base_model Qwen/Qwen3.5-2B --interactive
```
What's Changed
- Refactor model_kwargs and logging for clarity by @whaler404 in #470
- Add MiniMax as LLM provider for roleplay data generation by @octo-patch in #481
- Fix ppo_training.py: unbalanced multi-GPU memory allocation causing OOM by @Copilot in #484
New Contributors
- @whaler404 made their first contribution in #470
- @octo-patch made their first contribution in #481
- @Copilot made their first contribution in #484
Full Changelog: 2.4.0...2.5.0