
Commit 177bf52

test=document_fix (#35824)
1 parent 2192193 commit 177bf52

File tree

1 file changed (+24, -25)


python/paddle/distributed/fleet/launch.py

Lines changed: 24 additions & 25 deletions
@@ -400,33 +400,33 @@ def launch():

    Base Parameters:
-        - ``--log_dir``: The path for each process's log. e.g ``--log_dir=output_dir``. Default ``--log_dir=log``.
+        - ``--log_dir``: The path for each process's log, e.g., ``--log_dir=output_dir``. Default ``--log_dir=log``.

-        - ``--nproc_per_node``: The number of processes to launch on a node. In gpu training, it should be less or equal to the gpus number of you system(or you set by --gpus). And so each process can bound to one or average number of gpus. e.g ``--nproc_per_node=8``
+        - ``--nproc_per_node``: The number of processes to launch on a node. In gpu training, it should be less than or equal to the number of gpus on your system (or the number you set with --gpus), e.g., ``--nproc_per_node=8``.

-        - ``--run_mode``: run mode of job, can be:collective/ps/ps-heter. e.g ``--run_mode=ps``. Default ``--run_mode=collective``.
+        - ``--run_mode``: The run mode of the job, one of collective/ps/ps-heter, e.g., ``--run_mode=ps``. Default ``--run_mode=collective``.

-        - ``--gpus``: It's for gpu training. e.g ``--gpus=0,1,2,3`` will launch four training processes each bound to one gpu.
+        - ``--gpus``: For gpu training, e.g., ``--gpus=0,1,2,3`` will launch four training processes, each bound to one gpu.

        - ``--selected_gpus``: gpus aliases, recommend to use ``--gpus``.

-        - ``--xpus``: It's for xpu training if xpu is available. e.g ``--xpus=0,1,2,3``.
+        - ``--xpus``: For xpu training, if xpu is available, e.g., ``--xpus=0,1,2,3``.

        - ``--selected_xpus``: xpus aliases, recommend to use ``--xpus``.

-        - ``training_script``: The full path to the single GPU training program/script to be launched in parallel, followed by all the arguments for the training script. e.g ``traing.py``
+        - ``training_script``: The full path to the single GPU training program/script to be launched in parallel, followed by all the arguments for the training script, e.g., ``train.py``.

-        - ``training_script_args``: The args of training_script. e.g ``--lr=0.1``
+        - ``training_script_args``: The arguments of training_script, e.g., ``--lr=0.1``.

    Collective Parameters:
-        - ``--ips``: Paddle cluster nodes ips, e.g ``--ips=192.168.0.16,192.168.0.17``. Default ``--ips=127.0.0.1``.
+        - ``--ips``: The ips of the Paddle cluster nodes, e.g., ``--ips=192.168.0.16,192.168.0.17``. Default ``--ips=127.0.0.1``.

    Parameter-Server Parameters:
-        - ``--servers``: User defined servers ip:port, e.g ``--servers="192.168.0.16:6170,192.168.0.17:6170"``
+        - ``--servers``: User-defined servers ip:port, e.g., ``--servers="192.168.0.16:6170,192.168.0.17:6170"``

-        - ``--workers``: User defined workers ip:port, e.g ``--workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172"``
+        - ``--workers``: User-defined workers ip:port, e.g., ``--workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172"``

-        - ``--heter_workers``: User defined heter workers ip:port, e.g ``--heter_workers="192.168.0.16:6172,192.168.0.17:6172"``
+        - ``--heter_workers``: User-defined heter workers ip:port, e.g., ``--heter_workers="192.168.0.16:6172,192.168.0.17:6172"``

        - ``--worker_num``: Number of workers (It recommend to set when in the emulated distributed environment using single node)
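For reference, a single-node launch combining the base parameters documented above might look like the following sketch; the gpu list, log directory, and learning rate are illustrative values, not taken from this diff:

    # Hypothetical invocation combining the base parameters above:
    # 8 processes, one per gpu, with per-process logs written under ./output_dir.
    python -m paddle.distributed.launch \
        --log_dir=output_dir \
        --nproc_per_node=8 \
        --gpus=0,1,2,3,4,5,6,7 \
        train.py --lr=0.1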
@@ -437,17 +437,14 @@ def launch():
        - ``--http_port``: Gloo http Port

    Elastic Parameters:
-        - ``--elastic_server``: etcd server host:port, e.g ``--elastic_server=127.0.0.1:2379``
+        - ``--elastic_server``: The etcd server host:port, e.g., ``--elastic_server=127.0.0.1:2379``

-        - ``--job_id``: job unique id, e.g ``--job_id=job1``
+        - ``--job_id``: The unique job id, e.g., ``--job_id=job1``

-        - ``--np``: job pod/node number, e.g ``--np=2``
-
-        - ``--scale``: scale np, not be used now!
+        - ``--np``: The number of job pods/nodes, e.g., ``--np=2``

        - ``--host``: bind host, default to POD_IP env.

-        - ``--force``: update np force, not be used now!

    Returns:
        ``None``
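The docstring adds no example for the elastic parameters; a minimal sketch, assuming an etcd instance is reachable at the address shown in the ``--elastic_server`` description:

    # Hypothetical elastic launch: 2 pods/nodes coordinated through etcd.
    # The etcd address, job id, and np value reuse the illustrative values above.
    python -m paddle.distributed.launch \
        --elastic_server=127.0.0.1:2379 \
        --job_id=job1 \
        --np=2 \
        train.py --lr=0.01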
@@ -456,15 +453,17 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash1

-            # For single node training using 4 gpus
+            # For training on a single node using 4 gpus.

            python -m paddle.distributed.launch --gpus=0,1,2,3 train.py --lr=0.01

    Examples 2 (collective, multi node):
        .. code-block:: bash
            :name: code-block-example-bash2

-            # For multiple node training such as two node:192.168.0.16, 192.168.0.17
+            # The --gpus and --ips parameters must be consistent on each node.
+
+            # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17.

            # On 192.168.0.16:
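The per-node commands themselves fall outside this hunk; following the pattern of Example 1 and the consistency note added above, each node would presumably run something like this sketch (assumed, not shown in the diff):

    # Run on 192.168.0.16 and, unchanged, on 192.168.0.17:
    python -m paddle.distributed.launch \
        --gpus=0,1,2,3 \
        --ips=192.168.0.16,192.168.0.17 \
        train.py --lr=0.01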
@@ -477,15 +476,15 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash3

-            # The emulated distributed environment using single node, 2 server and 4 worker
+            # To simulate a distributed environment on a single node, with 2 servers and 4 workers.

            python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01

    Examples 4 (ps, cpu, multi node):
        .. code-block:: bash
            :name: code-block-example-bash4

-            # For multiple node training such as two node:192.168.0.16, 192.168.0.17 with 2 servers and total 4 workers
+            # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, where each node runs 1 server and 2 workers.

            # On 192.168.0.16:
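Again the commands sit outside the hunk; a sketch of what each node would run, reusing the ip:port values from the ``--servers`` and ``--workers`` descriptions above (assumed, not shown in the diff):

    # Run the same command on both nodes:
    python -m paddle.distributed.launch \
        --servers="192.168.0.16:6170,192.168.0.17:6170" \
        --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" \
        train.py --lr=0.01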
@@ -499,7 +498,7 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash5

-            # The emulated distributed environment using single node, 2 server and 4 worker, each worker use single gpu
+            # To simulate a distributed environment on a single node, with 2 servers and 4 workers, where each worker uses a single gpu.

            export CUDA_VISIBLE_DEVICES=0,1,2,3
            python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01
@@ -508,7 +507,7 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash6

-            # For multiple node training such as two node:192.168.0.16, 192.168.0.17 with 2 servers and total 4 workers
+            # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, where each node runs 1 server and 2 workers.

            # On 192.168.0.16:
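For this gpu variant, a sketch of the per-node command would differ from Example 4 only in pinning one gpu per worker on each node; the device indices are illustrative and the command is assumed, not shown in the diff:

    # Sketch: same server/worker layout as Example 4, with 2 gpus per node.
    export CUDA_VISIBLE_DEVICES=0,1
    python -m paddle.distributed.launch \
        --servers="192.168.0.16:6170,192.168.0.17:6170" \
        --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" \
        train.py --lr=0.01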
@@ -524,7 +523,7 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash7

-            # The emulated distributed environment using single node, 2 server and 4 worker, two worker use gpu, two worker use cpu
+            # To simulate a distributed environment on a single node, with 2 servers and 4 workers, where two workers use gpu and two use cpu.

            export CUDA_VISIBLE_DEVICES=0,1
            python -m paddle.distributed.launch --server_num=2 --worker_num=2 --heter_worker_num=2 train.py --lr=0.01
@@ -533,7 +532,7 @@ def launch():
        .. code-block:: bash
            :name: code-block-example-bash8

-            # For multiple node training such as two node:192.168.0.16, 192.168.0.17 with 2 servers and total 4 workers
+            # For training on multiple nodes, e.g., 192.168.0.16 and 192.168.0.17, where each node runs 1 server, 1 gpu worker, and 1 cpu worker.

            # On 192.168.0.16:
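A sketch of the per-node command for this heterogeneous case, reusing the ``--heter_workers`` example value from the parameter list above (assumed, not shown in the diff):

    # Sketch: each node runs 1 server, 1 worker, and 1 heter worker,
    # matching Example 8's layout; the gpu index is illustrative.
    export CUDA_VISIBLE_DEVICES=0
    python -m paddle.distributed.launch \
        --servers="192.168.0.16:6170,192.168.0.17:6170" \
        --workers="192.168.0.16:6171,192.168.0.17:6171" \
        --heter_workers="192.168.0.16:6172,192.168.0.17:6172" \
        train.py --lr=0.01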
