| `<dest>` is the path on the cloud. The form must be like `/pfs/$DATACENTER/home/$USER`. `$DATACENTER` is the from configuration where you setup at `~/.paddle/config` | ||
| - `paddlecloud pfs cp <local src> [<local src> ... ] <remote dest>`: Upload a file | ||
| - `paddlecloud pfs cp <remote src> [<remote src> ... ] <local dest>`: Download a file | ||
| - `paddlecloud pfs ls <remote dir>`: List files under `<remote dir>`. |
There was a problem hiding this comment.
The simplified command list is missing one:
paddlecloud pfs rm <remote>
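To make the upload/download distinction in the `pfs cp` lines above concrete, here is a minimal Python sketch of how a client might tell the two apart. It assumes a path is remote exactly when it matches the `/pfs/$DATACENTER/home/$USER` form quoted in the doc; the names `is_remote` and `cp_direction` are hypothetical and are not part of the actual client.

```python
import re

# Assumption: a remote path always has the form /pfs/<datacenter>/home/<user>[/...].
_REMOTE_PATH = re.compile(r"^/pfs/[^/]+/home/[^/]+(?:/.*)?$")


def is_remote(path: str) -> bool:
    """Return True if the path looks like a cloud path (hypothetical helper)."""
    return _REMOTE_PATH.match(path) is not None


def cp_direction(sources: list, dest: str) -> str:
    """Classify a cp invocation as 'upload' or 'download'.

    Mixed local/remote sources, or local-to-local copies, are rejected.
    """
    if not sources:
        raise ValueError("at least one source is required")
    remote_sources = [is_remote(src) for src in sources]
    if is_remote(dest) and not any(remote_sources):
        return "upload"
    if not is_remote(dest) and all(remote_sources):
        return "download"
    raise ValueError("sources must be all local (upload) or all remote (download)")
```

A `pfs rm` subcommand could reuse `is_remote` to refuse to delete anything that is not a cloud path.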
doc/client_design.md
Outdated
| | /v1/submit | POST | see [above](#client-commands) | | ||
| | /v1/jobs | GET | see [above](#client-commands) | | ||
| | /v1/quota | GET | see [above](#client-commands) | | ||
| | /v1/pfs/cp | POST | see [above](#client-commands) | |
There was a problem hiding this comment.
This references the design doc under Paddle.
doc/client_design.md
Outdated
|
|
||
| # Goals: | ||
|
|
||
| Developers using PaddlePadle Cloud can use this command-line client for conviniece of manage cloud Deep-Learning jobs, including: |
There was a problem hiding this comment.
command-line client => command-line interface
There was a problem hiding this comment.
conviniece of manage cloud Deep-Learning jobs => managing cloud Deep-Learning jobs conveniently?
There was a problem hiding this comment.
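There is a spelling error here: it should be "convenience". As for "command-line client", I think that term is correct. This really is a client rather than an interface; an interface is either something called by a program or a server-side API.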
doc/client_design.md
Outdated
| Developers using PaddlePadle Cloud can use this command-line client for conviniece of manage cloud Deep-Learning jobs, including: | ||
|
|
||
| - Submitting a PaddlePaddle cluster training job | ||
| - List jobs that is currently running. |
doc/client_design.md
Outdated
|
|
||
| - Submitting a PaddlePaddle cluster training job | ||
| - List jobs that is currently running. | ||
| - List all job history that have been submited. |
There was a problem hiding this comment.
List each job history that has been submitted?
There was a problem hiding this comment.
We need to list all the jobs at once, not one by one, so "all" is correct.
doc/client_design.md
Outdated
| - `-m --memory`: Memory resource each trainer will use. Defaults to 1Gi. | ||
| - `-s --pservers`: Number of parameter servers. Defaults equal to `-p` | ||
| - `-u --pscpu`: Parameter server CPU resource. Defaults to 1. | ||
| - `-y --psmemory`: Parameter server momory resource. Defaults to 1Gi. |
There was a problem hiding this comment.
I think it's better to show all suffixes, like:
a plain integer using one of these suffixes: Ei, Pi, Ti, Gi, Mi, Ki
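For illustration, a minimal Python sketch of parsing such a value. It assumes the suffixes carry their usual binary (power-of-two) meaning, as in Kubernetes resource quantities; `parse_memory` is a hypothetical helper, and whether the client validates `-m` and `-y` locally at all is an assumption.

```python
# Binary (power-of-two) suffixes, in the sense used by Kubernetes quantities.
_BINARY_SUFFIXES = {
    "Ki": 2**10,
    "Mi": 2**20,
    "Gi": 2**30,
    "Ti": 2**40,
    "Pi": 2**50,
    "Ei": 2**60,
}


def parse_memory(quantity: str) -> int:
    """Convert a string such as '10Gi' into a number of bytes.

    Hypothetical helper: only a plain integer with an optional
    binary suffix is accepted; anything else raises ValueError.
    """
    text = quantity.strip()
    for suffix, factor in _BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            number = text[: -len(suffix)]
            break
    else:
        # No suffix matched: treat the whole string as a byte count.
        number, factor = text, 1
    if not number.isdigit():
        raise ValueError(f"invalid memory quantity: {quantity!r}")
    return int(number) * factor
```

With this, the default `1Gi` resolves to 1073741824 bytes.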
| A Sample job package may contain files like below: | ||
|
|
||
| ``` | ||
| job_word_emb/ |
There was a problem hiding this comment.
I don't think putting train.py and the data directory at the same level is a good idea.
As its first step, paddlecloud submit collects the trainer package into a tar file. If the data directory sits at the same level, we would have to run tar -czvf --exclude ./data ..., and the data directory may have any name, so I think this layout is better:
job_word_emb/
|-- module
|   |-- trainer.py
|   |-- dict1.pickle
|   `-- my_topo.py
`-- data
    |-- train
    |   |-- train.txt-000
    |   ...
    `-- test
        |-- test.txt-000
        ...
There was a problem hiding this comment.
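Why can't the data be uploaded at submit time? That way users would not have to upload the data separately.
Also, if tar is used for the sake of compression, you could use a gzip library directly: compress in memory while reading the files and upload the result directly, with no need to generate an intermediate temporary file.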
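A rough Python sketch of the in-memory packaging idea from this thread, using only the standard library. It is a simplification: the whole archive is held in a `BytesIO` buffer rather than streamed, the upload step is left out, and `pack_job` with its `exclude` parameter is hypothetical (the real client may be written in another language).

```python
import io
import os
import tarfile


def pack_job(package_dir: str, exclude: tuple = ("data",)) -> bytes:
    """Return a gzip-compressed tar archive of package_dir, built in memory.

    Top-level entries whose names appear in `exclude` are skipped, so a
    large data directory is not re-packed on every submit.
    """
    buffer = io.BytesIO()
    root = os.path.basename(os.path.normpath(package_dir))

    def skip_excluded(info: tarfile.TarInfo):
        # Archive member names look like "<root>/<top-level entry>/...".
        parts = info.name.split("/")
        if len(parts) > 1 and parts[1] in exclude:
            return None  # returning None drops the entry (and its subtree)
        return info

    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        archive.add(package_dir, arcname=root, filter=skip_excluded)
    return buffer.getvalue()
```

No temporary file is written; the returned bytes could be passed straight to whatever upload call the client uses.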
There was a problem hiding this comment.
For the record, the reason: checking the data on every submit is slow, and trainer.py changes far more often than the data does.
doc/client_design.md
Outdated
|
|
||
| ## Reference | ||
|
|
||
| - `paddlecloud submit [options] [package path]`: submit job to PaddlePaddle Cloud |
There was a problem hiding this comment.
Could the client be called pcloud or pcloudctl? The introduction earlier in the doc could then use that name consistently as well.
doc/client_design.md
Outdated
|
|
||
| ```bash | ||
| # upload training data to cloud, which may be very large | ||
| $ paddlecloud cp -r ./job_word_emb/data /pfs/datacenter1/home/user1/job_word_emb |
There was a problem hiding this comment.
The text above says paddlecloud pfs cp ...; which form should be used?
Also, should the paddlecloud arguments follow one consistent format, paddlecloud [SUB_MODULE] [TYPE] [FLAGS]? For example:
paddlecloud submit => paddlecloud job submit
paddlecloud logs => paddlecloud job logs
paddlecloud jobs => paddlecloud job ls
There was a problem hiding this comment.
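Subcommands usually don't have that many levels; see https://github.com/google/subcommands
They all take the form paddlecloud [subcommand] [options...] [args...], with only two levels.
To view the help for a subcommand:
paddlecloud help [subcommand]
For pfs, if possible, a plain paddlecloud cp ... would also be better.
Compare the kubectl commands, which are likewise designed around subcommands: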
Basic Commands (Beginner):
  create         Create a resource by filename or stdin
  expose         Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
  run            Run a particular image on the cluster
  set            Set specific features on objects

Basic Commands (Intermediate):
  get            Display one or many resources
  explain        Documentation of resources
  edit           Edit a resource on the server
  delete         Delete resources by filenames, stdin, resources and names, or by resources and label selector

Deploy Commands:
  rollout        Manage a deployment rollout
  rolling-update Perform a rolling update of the given ReplicationController
  scale          Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
  autoscale      Auto-scale a Deployment, ReplicaSet, or ReplicationController
  ...
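As an illustration of the two-level `paddlecloud [subcommand] [options...] [args...]` shape, here is a small Python `argparse` sketch. It is not the client's implementation: the subcommand names are taken from this thread, while the `dest` names (`trainers`, `memory`), the defaults, and the top-level `cp` are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level CLI: one program name, one layer of verb subcommands."""
    parser = argparse.ArgumentParser(prog="paddlecloud")
    subcommands = parser.add_subparsers(dest="command", required=True)

    submit = subcommands.add_parser("submit", help="submit a training job")
    submit.add_argument("package", help="path to the job package")
    # Flag letters follow the doc under review; dest names are hypothetical.
    submit.add_argument("-p", dest="trainers", type=int, default=1)
    submit.add_argument("-m", "--memory", dest="memory", default="1Gi")

    subcommands.add_parser("jobs", help="list jobs")

    logs = subcommands.add_parser("logs", help="show logs of a job")
    logs.add_argument("job")

    # Assumption: cp promoted to a top-level verb, as suggested in this thread.
    cp = subcommands.add_parser("cp", help="copy files to or from the cloud")
    cp.add_argument("paths", nargs="+", help="one or more sources, then the destination")

    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Running the script with `submit -h` prints per-subcommand help, which covers the same need as `paddlecloud help [subcommand]`.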
There was a problem hiding this comment.
If we go with subcommands, verbs do look more reasonable. How about changing pfs to match as well? @gongweibao
Yancey0623
left a comment
There was a problem hiding this comment.
LGTM except one tiny comment!
| # upload training data to cloud, which may be very large | ||
| $ paddlecloud pfs cp -r ./job_word_emb/data /pfs/datacenter1/home/user1/job_word_emb | ||
| # submit a v1 paddle training job | ||
| $ paddlecloud submit ./job_word_emb -p 4 -c 2 -m 10Gi -t modules/train.py |
There was a problem hiding this comment.
paddlecloud submit ./job_word_emb -p 4 -c 2 -m 10Gi -t modules/train.py
=>
paddlecloud submit ./job_word_emb -p 4 -c 2 -m 10Gi -t train.py
* minor style change
  Signed-off-by: zhangzhengyuan <zhangzhengyuan0604@gmail.com>
* style change
  Signed-off-by: zhangzhengyuan <zhangzhengyuan0604@gmail.com>
* use stricter test in Reconcile
  Signed-off-by: zhangzhengyuan <zhangzhengyuan0604@gmail.com>
* Revert "style change"
  This reverts commit ac60260dd328fb2ceb78d5fda67ff739f86f8469.
Fix #16