3 changes: 1 addition & 2 deletions comps/finetuning/src/Dockerfile.xtune
@@ -34,8 +34,7 @@ ENV PATH=$PATH:/home/user/.local/bin
RUN cd /home/user/comps/finetuning/src/integrations/xtune && git config --global user.name "test" && git config --global user.email "test" && bash prepare_xtune.sh

RUN python -m pip install --upgrade pip setuptools peft && \
python -m pip install -r /home/user/comps/finetuning/src/requirements.txt && \
python -m pip install --no-deps transformers==4.45.0 datasets==2.21.0 accelerate==0.34.2 peft==0.12.0
python -m pip install -r /home/user/comps/finetuning/src/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home/user

143 changes: 133 additions & 10 deletions comps/finetuning/src/integrations/xtune/README.md
@@ -36,26 +36,21 @@ Run install_xtune.sh to prepare component.
conda create -n xtune python=3.10 -y
conda activate xtune
apt install -y rsync
# opens the web UI by default
bash prepare_xtune.sh
# run this instead to skip opening the web UI
# bash prepare_xtune.sh false
```

The commands below are already included in prepare_xtune.sh. You can skip them unless you want to update the libraries manually.

```bash
pip install -r requirements.txt
# If you want to run on an NVIDIA GPU:
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
# Otherwise, to run on Intel Arc A770:
# Refer to https://github.com/intel/intel-extension-for-pytorch for the latest command to update these libraries
python -m pip install torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi torchaudio==2.5.1+cxx11.abi --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

cd src/llamafactory/clip_finetune/dassl
python setup.py develop
cd ../../../..
pip install matplotlib
pip install -e ".[metrics]"
pip install --no-deps transformers==4.45.0 datasets==2.21.0 accelerate==0.34.2 peft==0.12.0
python -m pip install intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

### 2. Install xtune on docker
@@ -107,6 +102,13 @@ then make `dataset_info.json` in your dataset directory

## Fine-Tuning with LLaMA Board GUI (powered by [Gradio](https://github.com/gradio-app/gradio))

When run via `prepare_xtune.sh`, it automatically executes `ZE_AFFINITY_MASK=0 llamafactory-cli webui`.

Once you see "server start successfully" in the terminal, you can access the web UI at http://localhost:7860/.

After running `prepare_xtune.sh`, the UI components are documented in doc/ui_component.md.

```bash
# Run with A100:
CUDA_VISIBLE_DEVICES=0 llamafactory-cli webui
# Run with A770:
ZE_AFFINITY_MASK=0 llamafactory-cli webui
```

Then access the web UI at http://localhost:7860/.

## Fine-Tuning with Shell instead of GUI

Running `prepare_xtune.sh` downloads all related files and opens the web UI by default.

Run `bash prepare_xtune.sh false` to skip launching the web UI; you can then run fine-tuning from the shell.

Below are examples.

### CLIP

Please see the [doc](./doc/key_features_for_clip_finetune_tool.md) for how to configure its features.

```bash
cd src/llamafactory/clip_finetune
# Please see README.md in src/llamafactory/clip_finetune for details
```

### AdaCLIP

```bash
cd src/llamafactory/adaclip_finetune
# Please see README.md in src/llamafactory/adaclip_finetune for details
```

### DeepSeek-R1 Distillation (not a core feature)

Please see [doc](./doc/DeepSeek-R1_distillation_best_practice-v1.1.pdf) for details

#### Step 1: Download an existing CoT synthetic dataset from Hugging Face

Dataset link: https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B

#### Step 2: Convert to sharegpt format

```bash
cd data
```

Then run the following Python snippet:

```python
import json

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B")
dataset = dataset["train"]

# Filter the dataset; change the filter conditions according to your needs
dataset = dataset.filter(lambda example: len(example["response"]) <= 1024)

# Save in sharegpt format
with open("Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B-response1024.json", "w") as f:
    json.dump(list(dataset), f, ensure_ascii=False, indent=4)
```
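As a sanity check, the filter-and-serialize logic above can be mirrored on plain Python dicts; the miniature records below are hypothetical stand-ins for the real dataset rows, but the predicate and serialization shape are the same:

```python
import json

# Hypothetical miniature of the dataset: only the fields used above matter
records = [
    {"conversations": [{"from": "human", "value": "hi"}], "response": "short"},
    {"conversations": [{"from": "human", "value": "hi"}], "response": "x" * 2000},
]

# Same predicate as dataset.filter(...) above: keep responses of at most 1024 chars
kept = [r for r in records if len(r["response"]) <= 1024]

# Same serialization shape as json.dump(list(dataset), ...) above
serialized = json.dumps(kept, ensure_ascii=False, indent=4)
```

Only the first record survives the filter, so the output file would contain a JSON array with a single conversation.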

#### Step 3: Register the CoT dataset in LLaMA-Factory's `dataset_info.json`

```bash
cd data
vim dataset_info.json
```

Add the following entry (make sure the dataset file is placed under `xtune/data`):

```json
"deepseek-r1-distill-sample": {
  "file_name": "Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B-response1024.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations"
  }
}
```

#### Step 4: Run `accelerate config` to enable training with the XPU plugin

```text
accelerate config

For Single GPU:
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)? [yes/NO]:NO
Do you want to use XPU plugin to speed up training on XPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]:
Do you wish to use mixed precision?
bf16
For Multi-GPU with FSDP:
Which type of machine are you using?
multi-XPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: NO
Do you want to use XPU plugin to speed up training on XPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
Do you want to use FullyShardedDataParallel? [yes/NO]: yes
What should be your sharding strategy?
FULL_SHARD
Do you want to offload parameters and gradients to CPU? [yes/NO]: NO
What should be your auto wrap policy?
TRANSFORMER_BASED_WRAP
Do you want to use the model's `_no_split_modules` to wrap. Only applicable for Transformers [yes/NO]: yes
What should be your FSDP's backward prefetch policy?
BACKWARD_PRE
What should be your FSDP's state dict type?
SHARDED_STATE_DICT
Do you want to enable FSDP's forward prefetch policy? [yes/NO]: yes
Do you want to enable FSDP's `use_orig_params` feature? [YES/no]: yes
Do you want to enable CPU RAM efficient model loading? Only applicable for Transformers models. [YES/no]: yes
Do you want to enable FSDP activation checkpointing? [yes/NO]: yes
How many GPU(s) should be used for distributed training? [1]:2
Do you wish to use mixed precision?
bf16
```
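For reference, the single-GPU answers above end up in `~/.cache/huggingface/accelerate/default_config.yaml`. A rough sketch of the resulting file is shown below; the exact keys vary by accelerate version, so treat this as an assumption rather than the authoritative output:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
gpu_ids: all
machine_rank: 0
mixed_precision: bf16
num_machines: 1
num_processes: 1
use_cpu: false
```

If the generated file differs, prefer what `accelerate config` actually wrote; this sketch is only for orientation.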

#### Step 5: Run the training script as follows

```bash
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
MODEL_ID="microsoft/Phi-3-mini-4k-instruct"
EXP_NAME="Phi-3-mini-4k-instruct-r1-distill-finetuned"
DATASET_NAME="deepseek-r1-distill-sample"
export OUTPUT_DIR="where to put output"
accelerate launch src/train.py \
  --stage sft --do_train --use_fast_tokenizer \
  --new_special_tokens "<think>,</think>" --resize_vocab \
  --flash_attn auto \
  --model_name_or_path ${MODEL_ID} \
  --dataset ${DATASET_NAME} \
  --template phi \
  --finetuning_type lora --lora_rank 8 --lora_alpha 16 \
  --lora_target q_proj,v_proj,k_proj,o_proj \
  --additional_target lm_head,embed_tokens \
  --output_dir $OUTPUT_DIR --overwrite_cache --overwrite_output_dir \
  --warmup_steps 100 --weight_decay 0.1 \
  --per_device_train_batch_size 1 --gradient_accumulation_steps 4 \
  --ddp_timeout 9000 \
  --learning_rate 5e-6 --lr_scheduler_type cosine \
  --logging_steps 1 --save_steps 1000 --plot_loss \
  --num_train_epochs 3 --torch_empty_cache_steps 10 --bf16
```
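The batch-size flags above combine into an effective global batch size of `per_device_train_batch_size × gradient_accumulation_steps × num_processes`; a quick check of the arithmetic:

```python
# Values taken from the accelerate launch command above
per_device_train_batch_size = 1
gradient_accumulation_steps = 4

def effective_batch_size(num_processes: int) -> int:
    # Global batch size seen by the optimizer per update step
    return per_device_train_batch_size * gradient_accumulation_steps * num_processes

single_gpu = effective_batch_size(1)  # ONEAPI_DEVICE_SELECTOR pins one device
dual_gpu = effective_batch_size(2)    # the multi-XPU FSDP config above uses 2 GPUs
```

So the single-device run updates on 4 samples per step, and the two-GPU FSDP run on 8; scale `gradient_accumulation_steps` down if you raise the device count and want the same global batch size.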

## `Xtune` Examples

See screenshot of running CLIP and AdaCLIP finetune on Intel Arc A770 in README_XTUNE.md.