Incorporate cudnn_lstm into LSTM api #27217
Conversation
test=develop
Thanks for your contribution!
```python
params = self.parameters()
shape = [np.prod(param.shape) for param in params]
# for static-graph, append coalesce_tensor into startup program
with fluid.program_guard(fluid.default_startup_program(),
```
- The large tensor that coalesce_tensor concatenates is currently not guaranteed to be contiguous in memory; this may need to change.
- If cudnn lstm gains support for being called with a list of small weights, is it still necessary to hold a fused tensor on the python side? (This may need testing.)
Thanks. Done.
Added a use_align attribute to coalesce_tensor_op. The default keeps the original behavior of reserving a full aligned chunk for each small tensor; optionally, the small tensors can be stored contiguously when fused.
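To make the effect of the new attribute concrete, here is a minimal sketch (not the Paddle source; the function name and the 256-byte alignment are illustrative assumptions) of how `use_align` changes the offsets of small tensors inside the fused buffer:

```python
# Sketch only: how coalesce_tensor's use_align attribute could affect the
# placement of small tensors inside one fused chunk. Names and the alignment
# value are assumptions for illustration, not the actual Paddle implementation.
def fused_offsets(sizes, use_align=True, alignment=256):
    """Return the start offset of each small tensor inside the fused buffer."""
    offsets, cursor = [], 0
    for size in sizes:
        offsets.append(cursor)
        if use_align:
            # default: reserve each tensor's full chunk, padded to the alignment
            cursor += (size + alignment - 1) // alignment * alignment
        else:
            # optional: pack the small tensors contiguously
            cursor += size
    return offsets

print(fused_offsets([100, 300, 50], use_align=True))   # [0, 256, 768]
print(fused_offsets([100, 300, 50], use_align=False))  # [0, 100, 400]
```

The contiguous layout is what a fused cudnn weight buffer would expect, while the aligned layout preserves each parameter's independently usable chunk.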
```python
attrs={"copy_data": True,
       "dtype": params[0].dtype})
```

```python
def _cudnn_impl(self, inputs, initial_states, sequence_length):
```
A small suggestion: once cudnn_lstm_op gets a python (functional) interface, switch to calling that interface here.
Thanks. This will be adjusted once cudnn_lstm_op supports a functional interface.
```python
self.time_major = time_major
self.num_layers = num_layers
self.state_components = 1
if activation == "tanh":
```
Suggestion: unify the __init__ bodies of the three RNN networks (SimpleRNN, GRU, LSTM) into one, placed in RNNMixin. The three __init__ methods are currently almost identical, so distinguishing by cls with minor modifications should be enough.
If necessary, RNNMixin could also be renamed RNNBase, since it is not currently public.
Thanks.
RNNMixin has been renamed RNNBase, and SimpleRNN, GRU, and LSTM have been adjusted to call RNNBase.
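The shape of that refactor can be sketched as follows. This is an illustrative outline only, not the actual Paddle code: the `mode` string, attribute names, and constructor signatures are assumptions standing in for the real implementation.

```python
# Illustrative sketch (not the Paddle source): the nearly identical __init__
# bodies of SimpleRNN, GRU, and LSTM collapse into a shared RNNBase,
# distinguished by a mode string. All names here are assumptions.
class RNNBase:
    def __init__(self, mode, input_size, hidden_size, num_layers=1):
        self.mode = mode
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # LSTM carries (h, c) states; SimpleRNN and GRU carry a single h
        self.state_components = 2 if mode == "LSTM" else 1

class SimpleRNN(RNNBase):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__("RNN_TANH", input_size, hidden_size, num_layers)

class GRU(RNNBase):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__("GRU", input_size, hidden_size, num_layers)

class LSTM(RNNBase):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__("LSTM", input_size, hidden_size, num_layers)

print(LSTM(16, 32, 2).state_components)   # 2
print(SimpleRNN(16, 32).state_components)  # 1
```

Keeping the per-class constructors as thin wrappers preserves the public API while the shared logic lives in one place.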
test=develop
Add optional init_h and init_c gradient for cudnn_lstm_op. test=develop
test=develop
…_program. test=develop
test=develop
test=develop
Test code:

```python
import paddle
import paddle.fluid as fluid
import numpy as np

paddle.disable_static()
paddle.manual_seed(123)
x = paddle.randn((4, 10, 16))
x.stop_gradient = False
prev_h = paddle.randn((4, 4, 32))
prev_c = paddle.randn((4, 4, 32))
seq_len = paddle.to_tensor(np.array([10, 6, 8, 5]))
mask = fluid.layers.sequence_mask(seq_len, maxlen=10, dtype=prev_h.dtype)
mask = paddle.unsqueeze(mask, [2])
rnn = paddle.nn.LSTM(16, 32, 2, direction="bidirectional")
y, (h, c) = rnn(x, (prev_h, prev_c), seq_len)
y = y * mask
loss = paddle.mean(y)
loss.backward()
optimizer = paddle.optimizer.Adam(learning_rate=0.1, parameters=rnn.parameters())
optimizer.step()
print(rnn[0].cell_fw.weight_hh)
```

Multi-GPU:

```python
import paddle
import paddle.fluid as fluid
import paddle.distributed as dist
import numpy as np

def train():
    paddle.disable_static()
    paddle.distributed.init_parallel_env()
    paddle.manual_seed(123)
    x = paddle.randn((4, 10, 16))
    x.stop_gradient = False
    prev_h = paddle.randn((4, 4, 32))
    prev_c = paddle.randn((4, 4, 32))
    seq_len = paddle.to_tensor(np.array([10, 6, 8, 5]))
    mask = fluid.layers.sequence_mask(seq_len, maxlen=10, dtype=prev_h.dtype)
    mask = paddle.unsqueeze(mask, [2])
    rnn = paddle.nn.LSTM(16, 32, 2, direction="bidirectional")  # , dropout=0.0)
    dp_layer = paddle.DataParallel(rnn)
    y, (h, c) = dp_layer(x, (prev_h, prev_c), seq_len)
    y = y * mask
    loss = paddle.mean(y)
    loss = dp_layer.scale_loss(loss)
    loss.backward()
    dp_layer.apply_collective_grads()
    optimizer = paddle.optimizer.Adam(learning_rate=0.1, parameters=rnn.parameters())
    optimizer.step()
    print(dp_layer._layers[0].cell_fw.weight_hh)

if __name__ == '__main__':
    dist.spawn(train, nprocs=2)
```

Saving for inference works by modifying the python api:

```python
import paddle
import paddle.fluid as fluid
import paddle.distributed as dist
import numpy as np

class Net(paddle.nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.rnn1 = paddle.nn.LSTM(
            16, 32, 2, direction="bidirectional", dropout=0.1)

    def forward(self, input):
        return self.rnn1(input)

def train():
    paddle.disable_static()
    paddle.distributed.init_parallel_env()
    paddle.manual_seed(123)
    np.random.seed(123)
    x_np = np.random.rand(4, 10, 16).astype("float32")
    x = paddle.to_tensor(x_np)
    x.stop_gradient = False
    prev_h = paddle.randn((4, 4, 32))
    prev_c = paddle.randn((4, 4, 32))
    seq_len = paddle.to_tensor(np.array([10, 6, 8, 5]))
    mask = fluid.layers.sequence_mask(seq_len, maxlen=10, dtype=prev_h.dtype)
    mask = paddle.unsqueeze(mask, [2])
    rnn = Net()
    dp_layer = paddle.DataParallel(rnn)
    y, (h, c) = dp_layer(x)
    y = y * mask
    loss = paddle.mean(y)
    loss = dp_layer.scale_loss(loss)
    loss.backward()
    dp_layer.apply_collective_grads()
    optimizer = paddle.optimizer.Adam(learning_rate=0.1, parameters=rnn.parameters())
    optimizer.step()
    dp_layer.eval()
    y, (h, c) = dp_layer(x)
    print(y)
    dp_layer.train()
    if dist.get_rank() == 0:
        rnn = paddle.jit.to_static(
            rnn, [paddle.static.InputSpec(shape=[None, None, 16])])
        print(rnn.forward.concrete_program.main_program)
        paddle.jit.save(rnn, "./infer")
        paddle.enable_static()
        place = fluid.CPUPlace() if not fluid.is_compiled_with_cuda(
        ) else fluid.CUDAPlace(0)
        new_scope = fluid.Scope()
        with fluid.scope_guard(new_scope):
            exe = fluid.Executor(place)
            [inference_program, feed_target_names, fetch_targets] = (
                fluid.io.load_inference_model(
                    dirname="./", executor=exe, model_filename="infer.pdmodel",
                    params_filename="infer.pdiparams"))
            results = exe.run(inference_program,
                              feed={feed_target_names[0]: x_np.astype("float32")},
                              fetch_list=fetch_targets)
            print(results)
            print(y.numpy() == results[0])  # eval and infer results match

if __name__ == '__main__':
    dist.spawn(train, nprocs=2)
```
jzhang533
left a comment
- SimpleRNN & SimpleRNNCell have an `activation` parameter; the formulas and descriptions in the docs should be updated accordingly.
- The argument order `num_layers=1, activation="tanh", direction="forward", dropout=0., time_major=False` would be better as `num_layers, direction, time_major, dropout, activation`: the first three are all shape-related and easier to understand together, while `activation` is unique to SimpleRNN and fits best at the end. The docs could also suggest that users pass these as keyword arguments.
- When `direction="bidirectional"`, concat is the only merge mode we provide; this should be noted in that parameter's description.
- Line 1165 has a typo.
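The suggested argument order can be sketched like this. The function below is a hypothetical stand-in, not the final API; only the ordering idea comes from the review:

```python
# Sketch of the suggested argument order (an assumption, not the final API):
# shape-related args first (num_layers, direction, time_major), then dropout,
# with the SimpleRNN-specific activation last, passed as a keyword argument.
def simple_rnn_init(input_size, hidden_size, num_layers=1,
                    direction="forward", time_major=False,
                    dropout=0.0, activation="tanh"):
    return dict(input_size=input_size, hidden_size=hidden_size,
                num_layers=num_layers, direction=direction,
                time_major=time_major, dropout=dropout,
                activation=activation)

# called with keyword arguments, as the review recommends
cfg = simple_rnn_init(16, 32, num_layers=2, direction="bidirectional",
                      activation="relu")
print(cfg["activation"])  # relu
```

Putting `activation` last also keeps the shared prefix of the signature identical across SimpleRNN, GRU, and LSTM.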
ZeyuChen
left a comment
The kernel name cudnn_lstm needs discussion, to avoid complicating the future unification of cpu and gpu kernel deployment.
```python
}
```

```python
self._helper.append_op(
    type="cudnn_lstm", inputs=inputs, outputs=outputs, attrs=attrs)
```
If the op name here is fixed as cudnn_lstm, will that make it hard to unify with a future cpu kernel name, given that cudnn is not a cpu concept?
Would a name like lstm_v2 be more appropriate?
test=develop
test=develop
test=develop
@jzhang533 Done, thanks.
Force-pushed from c5b4be9 to 0f8a455
test=develop
Force-pushed from 3315512 to 8ec21a0
…tatic graph usage. test=develop
test=develop
test=develop
test=develop
zhiqiu
left a comment
flatten_parameters could add a dygraph branch in the future.
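A very rough sketch of that suggestion follows. Everything below besides the idea itself is a hypothetical assumption (the function name mirrors the real one, but the bodies and return values are placeholders, not Paddle code):

```python
# Hypothetical sketch of the suggestion: flatten_parameters could branch on
# the execution mode, fusing weights eagerly in dygraph rather than recording
# an op for the startup program. All details here are illustrative assumptions.
def flatten_parameters(params, in_dygraph_mode):
    if in_dygraph_mode:
        # dygraph branch: fuse the weight values immediately
        return [v for p in params for v in p]
    # static-graph branch: record a fusion op to run later (placeholder tuple)
    return ("coalesce_tensor", params)

print(flatten_parameters([[1, 2], [3]], in_dygraph_mode=True))  # [1, 2, 3]
```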
Squashed commits (also cherry-picked as "Incorporate cudnn_lstm into LSTM api (#27217)"):

* Incorporate cudnn_lstm into LSTM api. test=develop
* Make coalesce_tensor support alignment optionally. test=develop
* Reorganize RNN apis. test=develop
* Fix cudnn rnn layout conversion. test=develop
* Add sequence_length support for RNN cudnn implement. Add optional init_h and init_c gradient for cudnn_lstm_op. test=develop
* Use create_parameter for rnn cudnn impl. test=develop
* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program. test=develop
* Update RNN api unittest to use set_device. test=develop
* Fix set_place for unit tests of RNN apis. test=develop
* Fix use_align in coalesce_tensor_op. test=develop
* Adjust RNN apis arguments according to comments. test=develop
* Polish documents for SimpleRNN apis. test=develop
* Refine random seed in cudnn_lstm_op. Expose rnn params from sublayers to RNN. test=develop
* Fix RNN saving for jit.save. Refine cudnn_lstm dropout behavior. test=develop
* Fix doc of GRU. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test. test=develop
* Remove updates on cudnn_lstm temporarily. test=develop
* Use ShareDataWith to avoid copying for cudnn_lstm_op test. test=develop
* Refine random seed in cudnn_lstm_op. test=develop
* Fix test_lstm by adjust ConcreteProgram buffer getter. test=develop
* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage. test=develop
* Remove W input for cudnn_lstm to pass unused_var_check. test=develop
* Add test_predict for RNN unit tests coverage. test=develop
* Fix code style of rnn. test=develop
* Fix F.rnn usage in rnn.py. test=develop

Follow-up: Fix test_lstm unittest failed and Add more unittest (#28029)

* fix test_lstm unittest failed
* add more unittest
* modify cmakelist
* fix judgement

Co-authored-by: Aurelius84 <[email protected]>
PR types
New features
PR changes
Others
Describe
Incorporate cudnn_lstm into LSTM api
RNNMixin is renamed to RNNBase, and the params from RNNCell are exposed on RNNBase.