Fix docs of #767

Merged

nemonameless merged 2 commits into PaddlePaddle:develop from nemonameless:fix_docs_of

Oct 18, 2024

paddlemix/examples/internvl2/README.md

@@ -22,7 +22,7 @@ python paddlemix/examples/internvl2/chat_demo.py \
         --text "Please describe this image in detail."
     ```
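     Configurable parameters:
-      * `model_name_or_path`: the InternVL2 model name or weights path, which also supplies the tokenizer components; defaults to OpenGVLab/InternVL2-8B
+      * `model_name_or_path`: the InternVL2 model name or weights path, which also supplies the tokenizer components; defaults to OpenGVLab/InternVL2-8B, and OpenGVLab/InternVL2-2B can also be used
       * `image_path`: path to the input image
       * `text`: the user instruction, e.g. "Please describe this image in detail."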
@@ -34,7 +34,7 @@ python paddlemix/examples/internvl2/chat_demo_video.py \
         --text "Please describe this video in detail."
     ```
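     Configurable parameters:
-      * `model_name_or_path`: the InternVL2 model name or weights path, which also supplies the tokenizer components; defaults to OpenGVLab/InternVL2-8B
+      * `model_name_or_path`: the InternVL2 model name or weights path, which also supplies the tokenizer components; defaults to OpenGVLab/InternVL2-8B, and OpenGVLab/InternVL2-2B can also be used
       * `video_path`: path to the input video
       * `text`: the user instruction, e.g. "Please describe this video in detail."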
@@ Expand All @@
     The download links prepared by the PaddleMIX team are:
     ```
-    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar
+    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar # 50G
+    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/LLaVA/LLaVA-SFT.tar # 116G
     ```
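+    After downloading, extract the archives or symlink them under the PaddleMIX/ directory.
     The PaddleMIX team also provides a separate download link for the `chartqa` dataset alone, as a training example: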
     ```
     wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground/data/chartqa.tar
@@ Expand All @@
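     ### 4.2 Fine-tuning commands
+    Note: this is full-parameter fine-tuning with the vision encoder frozen and the LLM unfrozen for training. Fine-tuning the 2B model needs about 40G of GPU memory, and fine-tuning the 8B model needs about 80G.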
     ```bash
     # 1B
     sh paddlemix/examples/internvl2/shell/internvl2.0/2nd_finetune/internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_full.sh
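
The README line added above says the downloaded data can be extracted or symlinked under PaddleMIX/. As a minimal sketch of the symlink option, assuming the archive has already been extracted somewhere else (for example on a larger disk), the helper below links that directory into the repository root. The function name `link_into_repo` and the example paths are hypothetical and not part of PaddleMIX.

```python
from pathlib import Path


def link_into_repo(dataset_dir: str, repo_root: str) -> Path:
    """Create repo_root/<name> as a symlink to an extracted dataset directory.

    Returns the link path. An existing entry at that path is left untouched.
    """
    source = Path(dataset_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    link = Path(repo_root) / source.name
    if link.exists() or link.is_symlink():
        return link  # already extracted or linked here; do nothing
    link.symlink_to(source, target_is_directory=True)
    return link
```

For instance, `link_into_repo("/data/playground", "PaddleMIX")` would create `PaddleMIX/playground` pointing at `/data/playground`, so the relative `playground/...` paths used by the meta files resolve from the repository root.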

...2/shell/internvl2.0/2nd_finetune/internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_full.sh

@@ -11,7 +11,7 @@ export PYTHONPATH="${PYTHONPATH}:$(pwd)"
     export MASTER_PORT=34229
     export TF_CPP_MIN_LOG_LEVEL=3
-    OUTPUT_DIR='work_dirs/internvl_chat_v2_0/internvl2_1b_qwen2_0_5b_dynamic_res_2nd_finetune_full'
+    OUTPUT_DIR='work_dirs/internvl2-1B'
     if [ ! -d "$OUTPUT_DIR" ]; then
       mkdir -p "$OUTPUT_DIR"

...ell/internvl2.0/2nd_finetune/internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh

@@ -11,7 +11,7 @@ export PYTHONPATH="${PYTHONPATH}:$(pwd)"
     export MASTER_PORT=34229
     export TF_CPP_MIN_LOG_LEVEL=3
-    OUTPUT_DIR='work_dirs/internvl_chat_v2_0/internvl2_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full'
+    OUTPUT_DIR='work_dirs/internvl2-2B'
     if [ ! -d "$OUTPUT_DIR" ]; then
       mkdir -p "$OUTPUT_DIR"

...shell/internvl2.0/2nd_finetune/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full.sh

@@ -11,7 +11,7 @@ export PYTHONPATH="${PYTHONPATH}:$(pwd)"
     export MASTER_PORT=34229
     export TF_CPP_MIN_LOG_LEVEL=3
-    OUTPUT_DIR='work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_full'
+    OUTPUT_DIR='work_dirs/internvl2-8B'
     if [ ! -d "$OUTPUT_DIR" ]; then
       mkdir -p "$OUTPUT_DIR"

paddlemix/examples/minimonkey/README.md

@@ Expand Up @@
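     ## 4 Model fine-tuning
-    The SFT data uses 7 datasets from the 1.3M SFT dataset officially released by InternVL2: `llava_instruct_150k_zh`, `dvqa`, `chartqa`, `ai2d`, `docvqa`, `geoqa+`, and `synthdog_en`.
+    ### 4.1 Fine-tuning data preparation
+    The SFT data uses 6 datasets from the 1.3M SFT dataset officially released by InternVL2: `dvqa`, `chartqa`, `ai2d`, `docvqa`, `geoqa+`, and `synthdog_en`.
     The download links prepared by the PaddleMIX team are: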
     ```
-    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar
+    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar # 50G
     ```
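+    After downloading, extract the archive or symlink it under the PaddleMIX/ directory.
     The PaddleMIX team also provides a separate download link for the `chartqa` dataset alone, as a training example: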
     ```
     wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground/data/chartqa.tar
     wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground/opensource.tar
     ```
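     chartqa.tar must be downloaded and extracted under playground/data/, and opensource.tar must be downloaded and extracted under playground/. The opensource directory contains the jsonl annotation files.
+    ### 4.2 Fine-tuning commands
+    Note: this is full-parameter fine-tuning with the vision encoder frozen and the LLM unfrozen for training. Fine-tuning the 2B model needs about 40G of GPU memory.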
     ```bash
     sh paddlemix/examples/minimonkey/shell/internvl2.0/2nd_finetune/minimonkey_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh
     ```

paddlemix/examples/minimonkey/shell/data/minimonkey_finetune.json

@@ -1,11 +1,4 @@
     {
-      "llava_instruct_150k_zh": {
-        "root": "playground/data/coco/",
-        "annotation": "playground/opensource/llava_instruct_150k_zh.jsonl",
-        "data_augment": false,
-        "repeat_time": 1,
-        "length": 157712
-      },
       "dvqa_train_200k": {
         "root": "playground/data/dvqa/",
         "annotation": "playground/opensource/dvqa_train_200k.jsonl",

paddlemix/examples/minimonkey/shell/data/minimonkey_finetune_chartqa.json

@@ -0,0 +1,9 @@
+    {
+      "chartqa_train_18k": {
+        "root": "playground/data/chartqa/",
+        "annotation": "playground/opensource/chartqa_train_18k.jsonl",
+        "data_augment": false,
+        "repeat_time": 1,
+        "length": 18317
+      }
+    }
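
Each entry in a meta file like the one above names an image root, an annotation file, and a sample count. Before launching a long fine-tuning run it can help to confirm that these agree with what is on disk. The sketch below is one way to do that. It assumes that `length` equals the number of non-empty lines in the jsonl annotation file, which the diff does not state, and `check_meta` is a hypothetical helper rather than a PaddleMIX function.

```python
import json
from pathlib import Path

# Keys that appear in every entry shown in this diff.
REQUIRED_KEYS = {"root", "annotation", "data_augment", "repeat_time", "length"}


def check_meta(meta_path: str, base_dir: str = ".") -> list:
    """Return a list of human-readable problems found in a meta JSON file."""
    problems = []
    meta = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    for name, entry in meta.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
            continue
        root = Path(base_dir) / entry["root"]
        annotation = Path(base_dir) / entry["annotation"]
        if not root.is_dir():
            problems.append(f"{name}: image root not found: {root}")
        if not annotation.is_file():
            problems.append(f"{name}: annotation file not found: {annotation}")
            continue
        # Assumption: one sample per non-empty line of the jsonl file.
        with annotation.open(encoding="utf-8") as f:
            n_lines = sum(1 for line in f if line.strip())
        if n_lines != entry["length"]:
            problems.append(
                f"{name}: length is {entry['length']} but file has {n_lines} lines"
            )
    return problems
```

Run from the PaddleMIX/ directory, `check_meta("paddlemix/examples/minimonkey/shell/data/minimonkey_finetune_chartqa.json")` returns an empty list when the data layout matches the meta file, and otherwise lists what is missing or mismatched.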

...ll/internvl2.0/2nd_finetune/minimonkey_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full.sh

            
@@ -11,7 +11,7 @@ export PYTHONPATH="${PYTHONPATH}:$(pwd)"
     export MASTER_PORT=34229
     export TF_CPP_MIN_LOG_LEVEL=3
-    OUTPUT_DIR='work_dirs/minimonkey_2b_internlm2_1_8b_dynamic_res_2nd_finetune_full'
+    OUTPUT_DIR='work_dirs/minimonkey-2B'
     if [ ! -d "$OUTPUT_DIR" ]; then
       mkdir -p "$OUTPUT_DIR"
@@ -35,7 +35,7 @@ ${TRAINING_PYTHON} --log_dir ${OUTPUT_DIR}/paddle_distributed_logs \
       --conv_style "internlm2-chat" \
       --output_dir ${OUTPUT_DIR} \
       --logging_dir ${OUTPUT_DIR}/logs \
-      --meta_path "paddlemix/examples/minimonkey/shell/data/minimonkey_finetune.json" \
+      --meta_path "paddlemix/examples/minimonkey/shell/data/minimonkey_finetune_chartqa.json" \
       --overwrite_output_dir True \
       --force_image_size 448 \
       --max_dynamic_patch 12 \
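
The hunk above switches `--meta_path` from the full meta file to the chartqa-only one added in this PR. A single-dataset meta file of that shape could also be derived from a larger one rather than written by hand. The following is a minimal sketch under two assumptions: the full file contains an entry with the requested name (the diff does not show whether `minimonkey_finetune.json` includes `chartqa_train_18k`), and `write_subset_meta` is a hypothetical helper, not something the PR provides.

```python
import json
from pathlib import Path


def write_subset_meta(full_meta_path: str, names: list, out_path: str) -> dict:
    """Copy only the named dataset entries from a meta JSON file into a new file."""
    full = json.loads(Path(full_meta_path).read_text(encoding="utf-8"))
    unknown = [n for n in names if n not in full]
    if unknown:
        raise KeyError(f"datasets not in {full_meta_path}: {unknown}")
    # Keep the entries unchanged so root, annotation, and length stay consistent.
    subset = {n: full[n] for n in names}
    Path(out_path).write_text(json.dumps(subset, indent=2) + "\n", encoding="utf-8")
    return subset
```

Keeping the entries byte-for-byte identical to the full file means a quick single-dataset trial run and the full run read the same annotation and image paths.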

paddlemix/examples/qwen2_vl/README.md

            
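@@ -44,14 +44,14 @@ The SFT data uses 6 public datasets, including `dvqa`, `chartqa`, `ai2d`, `
     The download links prepared by the PaddleMIX team are:
     ```
-    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar
+    wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground.tar # 50G
     wget https://paddlenlp.bj.bcebos.com/datasets/paddlemix/playground/opensource_json.tar
     ```
-    opensource_json.tar must be downloaded and extracted under playground/. The opensource directory contains the jsonl annotation files.
+    opensource_json.tar must be downloaded and extracted under playground/. The opensource_json directory contains the annotation files in JSON format.
     ### 4.2 Fine-tuning commands
-    Note: this fine-tuning freezes the vision encoder and unfreezes the LLM for training. Fine-tuning the 2B model needs about 30G of GPU memory, and fine-tuning the 7B model needs about 75G.
+    Note: this is full-parameter fine-tuning with the vision encoder frozen and the LLM unfrozen for training. Fine-tuning the 2B model needs about 30G of GPU memory, and fine-tuning the 7B model needs about 75G.
     ```bash
     # 2B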

paddlemix/examples/qwen2_vl/configs/add_llavaov_doc_ocr.json

This file was deleted.
