Mxnet executor survey #4298
Conversation
helinwang left a comment
Nice!
> - The Executor builds the Graph, including inserting backward operators / copy operators; it also runs InferShape / InferType and allocates memory (note that when the input data size changes, a new Executor must be obtained by re-binding)
> - The Executor has a RunOps method, which pushes the operators into the Engine one by one
Are Executor and Engine the same thing?
No, they are not.
The interface Mxnet exposes to users is Symbol. The Executor's job is to parse a Symbol into a Graph and run several optimization passes over that Graph. The Executor also provides a RunOps interface that executes every Op in the Graph.
The Engine, by contrast, is a fairly general data-dependency engine: its main job is to analyze the data dependencies among Ops, which allows some parallel optimization. Mxnet's Engine is designed to be general and standalone, so it can be reused in other scenarios that need data-dependency analysis.
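For concreteness, a minimal sketch of the Symbol → Executor → Engine flow, assuming the standard mxnet Python API (layer sizes and shapes here are made up for illustration):

```python
import mxnet as mx

# Declare a Symbol: the user-facing, declarative description of the network.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')
net = mx.sym.SoftmaxOutput(data=fc, name='softmax')

# Bind to an Executor for a fixed input shape; this is where the Graph is
# built, InferShape/InferType run, and memory is allocated.
exe = net.simple_bind(ctx=mx.cpu(), data=(32, 100))

# forward/backward push the Ops into the Engine, which runs them
# asynchronously; waitall() blocks until everything pushed has finished.
exe.forward(is_train=True, data=mx.random.uniform(0, 1, shape=(32, 100)))
exe.backward()
mx.nd.waitall()
```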
Got it!
The design where the Executor (or call it a converter) does the Graph optimization and the Engine runs the Graph feels very reasonable.
I think the Engine should simply run the optimized Graph directly. The Executor here looks more like an optimizer, so giving the Executor a RunOps interface does not seem right to me; it couples the Executor and the Engine together.
The Executor's RunOps interface only pushes the Ops of the Graph into the Engine in order; the Ops execute asynchronously inside the Engine, so the coupling is actually not that tight. The Engine has no Run interface; it only analyzes the data dependencies of the pushed operations and launches the Ops whose dependencies have been satisfied.
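To make that division of labor concrete, here is a toy dependency engine in the same spirit (purely illustrative; this is not mxnet's actual Engine API): operations are pushed with the variables they read and write, and an operation runs only once everything it reads has been produced.

```python
class ToyEngine:
    """Toy stand-in for a data-dependency engine (illustration only)."""

    def __init__(self):
        self.ready = set()    # variables that have already been written
        self.pending = []     # (fn, reads, writes) whose reads are not yet satisfied

    def push(self, fn, reads, writes):
        """Register an operation; it runs whenever its read set is satisfied."""
        self.pending.append((fn, set(reads), set(writes)))
        self._dispatch()

    def _dispatch(self):
        progress = True
        while progress:
            progress = False
            for op in list(self.pending):
                fn, reads, writes = op
                if reads <= self.ready:      # all inputs are available
                    fn()
                    self.ready |= writes
                    self.pending.remove(op)
                    progress = True


engine = ToyEngine()
state = {}
# Pushed first, but it only runs after both 'a' and 'b' have been written.
engine.push(lambda: state.update(c=state['a'] + state['b']),
            reads={'a', 'b'}, writes={'c'})
engine.push(lambda: state.update(a=1), reads=set(), writes={'a'})
engine.push(lambda: state.update(b=2), reads=set(), writes={'b'})
print(state['c'])   # 3
```

The order of pushes does not matter; execution order is driven purely by the declared read/write dependencies, which is the property the Engine exploits for parallelism.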
> Mxnet exposes imperative interfaces for operating on input data and parameters, which is very clear and easy to understand
> - Loading input data and initializing/loading/saving parameters are essentially set/load/save operations on variables; operating on them directly is simpler than going through an Operator
It is indeed simpler. But if the core of training is not expressed with OPs, scheduling distributed training becomes harder. For example, if parameter initialization is not an OP, does every node have to initialize the same parameter once? If it is an OP, that OP can be assigned to one particular node, so only one node initializes the parameter.
Mxnet manages parameters in a unified way: a kvstore stores them, and a distributed version is also provided. I don't quite follow the part about one particular node doing the parameter initialization. The kvstore is quite friendly for parameter operations; you can use imperative set/load/save commands on it directly.
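For reference, the imperative kvstore usage looks roughly like this (standard mxnet Python API; the key and shape are arbitrary):

```python
import mxnet as mx

kv = mx.kv.create('local')          # 'dist_sync' / 'dist_async' for the distributed version
shape = (2, 3)
kv.init(3, mx.nd.ones(shape))       # initialize key 3 ("set")
kv.push(3, mx.nd.ones(shape) * 8)   # send a new value / gradient
a = mx.nd.zeros(shape)
kv.pull(3, out=a)                   # read the stored value back
print(a.asnumpy())
```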
In multi-machine training, one machine has to be designated to do the initialization. If initialization is expressed as an OP, a scheduling system can assign the initialization OP to some node. If it is not expressed as an OP, the program needs some "hack" to pick a node for initialization, for example letting the node with trainerID == "0" do it.
That's true. Of course our scheduling system can also support this naive policy, defaulting to the node with trainerID=0 being responsible for initialization.
@QiJune Once there is a scheduling system there is no need to look for trainerID=0; any trainer can be picked. In a distributed system, without special handling you cannot tell which node is the trainerID=0 node (all nodes are treated equally).
> Mxnet exposes imperative interfaces for operating on input data and parameters, which is very clear and easy to understand
> - Loading input data and initializing/loading/saving parameters are essentially set/load/save operations on variables; operating on them directly is simpler than going through an Operator
> - Updating a parameter is essentially reading the variable, computing, and then assigning the result back to the same memory; adding this to the Graph as an Operator introduces cycles, which is bad for optimization
Even if the parameter update is an OP, there is not necessarily a cycle; it depends on how the graph is defined. Our graph does have cycles, but TF's does not: TF's parameter OP has only outputs and no inputs, so there is no cycle. What it outputs is a handle to the parameter (e.g. a memory pointer), not the parameter's contents.
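A sketch of that point using the TF 1.x graph API (illustrative only; the gradient is a stand-in constant): the Variable op has no inputs and yields a reference to the state, and the update op consumes that reference, so no cycle is introduced.

```python
import tensorflow as tf   # TF 1.x style graph API

w = tf.Variable(tf.zeros([10]))        # no inputs; outputs a reference to the state
grad = tf.ones([10])                   # stand-in for a computed gradient
update = tf.assign_sub(w, 0.1 * grad)  # consumes the reference; no edge back into w's producers

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(update)
    print(sess.run(w))                 # every entry becomes -0.1
```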
Right, but cycles in the Graph still cause problems, and people generally avoid introducing them. The parameter-update logic really is somewhat different from a network's forward/backward.
Whatever the design, one issue that deserves close attention here is how to overlap computation with communication. Parameter update is mostly communication cost, and the network's backward pass can be fully parallelized with the parameter update layer by layer, as sketched below.
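A minimal sketch of that overlap idea, assuming a hypothetical `layers` list holding (name, gradient, weight) per layer and an already-initialized kvstore `kv`: because push/pull are asynchronous, issuing them per layer right after that layer's backward step lets the communication run underneath the remaining computation.

```python
# Hypothetical per-layer loop: push/pull return immediately, and the priority
# hint encourages the engine to schedule earlier-issued transfers first, so
# communication overlaps with the backward pass of the remaining layers.
for index, (name, grad, weight) in enumerate(layers):
    kv.push(name, grad, priority=-index)        # send this layer's gradient
    kv.pull(name, out=weight, priority=-index)  # fetch the updated weight
```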
"Overlapping computation with communication" can simply be handled by the scheduler.
```python
self._params_dirty = True
if self._update_on_kvstore:
    _update_params_on_kvstore(self._exec_group.param_arrays,
```
A few things I am curious about:
- In the single-machine multi-GPU case, does the kvstore keep the parameters in GPU memory or CPU memory?
- Does the kvstore itself perform the parameter arithmetic, or does some compute device read the parameters out of the kvstore, compute the updated values, write them back, and then the other devices read them? If the latter, does one device compute all parameter updates, or is the work spread across devices?
- For single-machine multi-GPU, several modes are provided: the parameters can live in CPU memory or in GPU memory.
- The kvstore stores the model's global parameters, and each device also keeps its own local copy. In the CPU-memory case, the gradients produced on each device are first copied to the CPU, averaged there, and then broadcast back to every device. How the GPU-memory case works needs further reading of the source code to confirm.
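A small sketch of the aggregation behavior, following the kvstore tutorial (standard mxnet API; shapes and the number of devices are arbitrary): a list of per-device gradients pushed under one key is summed in the store, and `mx.kv.create('device')` keeps that aggregation in GPU memory instead of CPU memory.

```python
import mxnet as mx

kv = mx.kv.create('local')        # aggregate in CPU memory; 'device' aggregates on the GPUs
shape = (2, 3)
kv.init(3, mx.nd.ones(shape))

# One gradient per device; the list pushed under a single key is summed first.
grads = [mx.nd.ones(shape) for _ in range(4)]
kv.push(3, grads)

out = mx.nd.zeros(shape)
kv.pull(3, out=out)
print(out.asnumpy())              # 4.0 everywhere: the four gradients were summed
```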
> In the CPU-memory case, the gradients produced on each device are first copied to the CPU, averaged there, and then broadcast back to every device

If, in the multi-machine case, all gradients are copied to the same machine for the optimization step, that becomes a network-throughput bottleneck.
For single-machine multi-GPU this kind of gradient aggregation is still acceptable. I have not yet looked at how the multi-machine case is implemented.
> 4. The C++ executor builds a graph; NNVM has a dedicated place_device pass that traverses every node of the Graph and sets its device information, and inserts a copy operator whenever an edge crosses devices
Interesting. I am curious how data loading gets assigned to specific machines?
I don't quite understand that sentence.
Note that mxnet currently only supports cross-device placement within a single machine, not across machines.
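For reference, single-machine cross-device placement is expressed roughly like this (standard mxnet API as far as I understand it; the layer sizes and the group-to-context mapping are made up): operators are tagged with a ctx_group, bind maps each group to a context, and the place_device pass inserts copy operators on edges that cross devices.

```python
import mxnet as mx

with mx.AttrScope(ctx_group='dev1'):
    data = mx.sym.Variable('data')
    fc1 = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
with mx.AttrScope(ctx_group='dev2'):
    fc2 = mx.sym.FullyConnected(data=fc1, num_hidden=10, name='fc2')

# A copy operator is inserted automatically on the fc1 -> fc2 edge because
# the two groups are mapped to different contexts.
exe = fc2.simple_bind(ctx=mx.cpu(),
                      group2ctx={'dev1': mx.cpu(0), 'dev2': mx.cpu(1)},
                      data=(32, 100))
```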
I meant multi-machine model parallelism. In that case not every machine needs to read data, so I am curious how Mxnet, if it supports this, designates which machines read data and which do not.
> - Once the Symbol has been written, it is bound to an Executor
> - The Executor builds the Graph, including inserting backward operators / copy operators; it also runs InferShape / InferType and allocates memory (note that when the input data size changes, a new Executor must be obtained by re-binding)
> when the input data size changes, a new Executor must be obtained by re-binding

Our current design does not fit symbol very well; in mxnet a symbol is equivalent to an expression, and every node carries the information of the entire graph.
```
 * Symbol acts as an interface for building graphs from different components
 * like Variable, Functor and Group. Symbol is also exported to python front-end
 * (while Graph is not) to enable quick test and deployment. Conceptually,
 * symbol is the final operation of a graph and thus including all the information
 * required (the graph) to evaluate its output value.
```
Yes, so we need to figure out as soon as possible how our Graph is defined and how the high-level python API configures the network.
> 4. update
> Module provides an update method that is responsible for optimizing the parameters and updating the ones stored on the kvstore:
Module is only a compatibility replacement that wraps the symbolic interface into the older-style API.
There are two problems here: 1) the kvstore abstraction is not friendly to sparse parameters; 2) packaging the parameter-update logic as an updater makes it hard for users to customize the optimization algorithm.
Below are mxnet's two corresponding ways of updating parameters, on the parameter server and locally, respectively.
```python
def _update_params_on_kvstore(param_arrays, grad_arrays, kvstore, param_names):
    """Perform update of param_arrays from grad_arrays on kvstore."""
    for index, pair in enumerate(zip(param_arrays, grad_arrays)):
        arg_list, grad_list = pair
        if grad_list[0] is None:
            continue
        name = param_names[index]
        # push gradient, priority is negative index
        kvstore.push(name, grad_list, priority=-index)
        # pull back the weights
        kvstore.pull(name, arg_list, priority=-index)


def _update_params(param_arrays, grad_arrays, updater, num_device,
                   kvstore=None, param_names=None):
    """Perform update of param_arrays from grad_arrays not on kvstore."""
    for i, pair in enumerate(zip(param_arrays, grad_arrays)):
        arg_list, grad_list = pair
        if grad_list[0] is None:
            continue
        index = i
        if kvstore:
            name = param_names[index]
            # push gradient, priority is negative index
            kvstore.push(name, grad_list, priority=-index)
            # pull back the sum gradients, to the same locations.
            kvstore.pull(name, grad_list, priority=-index)
        for k, p in enumerate(zip(arg_list, grad_list)):
            # faked an index here, to make optimizer create diff
            # state for the same index but on diff devs, TODO(mli)
            # use a better solution later
            w, g = p
            updater(index*num_device+k, g, w)
```
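For the on-kvstore path, the usage is roughly the following (standard mxnet API; the numbers are only for illustration): once an optimizer is registered on the store, `push` applies the update inside the store and `pull` returns the updated weights.

```python
import mxnet as mx

kv = mx.kv.create('local')
kv.set_optimizer(mx.optimizer.SGD(learning_rate=0.1))

shape = (2, 3)
kv.init(0, mx.nd.ones(shape))   # weight w = 1
kv.push(0, mx.nd.ones(shape))   # gradient g = 1; the store applies w := w - 0.1 * g
w = mx.nd.zeros(shape)
kv.pull(0, out=w)
print(w.asnumpy())              # roughly 0.9 everywhere
```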
Worth thinking through more deeply: if the parameter update is described as an Op, wouldn't sparse updates and user-defined update algorithms also become inconvenient?
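For comparison, the updater-style customization being discussed looks roughly like this (a sketch following the kvstore tutorial; `_set_updater` is a semi-private helper, so treat this as illustrative):

```python
import mxnet as mx

def sgd_update(key, grad, weight):
    # grad is the value just pushed; weight is the value stored under this key
    weight[:] -= 0.1 * grad

kv = mx.kv.create('local')
kv._set_updater(sgd_update)
kv.init(0, mx.nd.ones((2, 3)))
kv.push(0, mx.nd.ones((2, 3)))   # triggers sgd_update on the stored weight
out = mx.nd.zeros((2, 3))
kv.pull(0, out=out)              # roughly 0.9 everywhere
```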
@QiJune If it is described with OPs, users can customize it freely, and the converter automatically assigns the relevant OPs to the responsible pserver workers.
This is a better place for review.
We have had a lot of discussion in #4031. The designs of TensorFlow and Mxnet are both instructive. I have made an Mxnet executor survey for reference and further discussion.