Conversation
[WIP] Add seq2seq model for fluid.
fluid/machine_translation.py
Outdated
| parser = argparse.ArgumentParser(description=__doc__)
| parser.add_argument(
|     "--word_vector_dim",

| import distutils.util

| import paddle.v2 as paddle
| import paddle.v2.fluid as fluid
Since this benchmark serves as demo code, we'd like to have only

| import paddle.v2 as paddle
| import paddle.v2.fluid as fluid

just like TensorFlow's

| import tensorflow as tf

and nothing else.
fluid/machine_translation.py
Outdated
| help="The dictionary capacity. Dictionaries of source sequence and "
|     "target dictionary have same capacity. (default: %(default)d)")
| parser.add_argument(
|     "--pass_number",
fluid/machine_translation.py
Outdated
| type=str,
| default='train',
| choices=['train', 'infer'],
| help="Do training or inference. (default: %(default)s)")
fluid/machine_translation.py
Outdated
| target_dict_dim,
| is_generating=False,
| beam_size=3,
| max_length=250):
Leave the default values of max_length and beam_size to argparse.
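The suggestion above, sketched as a plain argparse setup (a minimal sketch; `build_parser` is a hypothetical helper name, the flag names and defaults are taken from this PR):

```python
import argparse

def build_parser():
    # Defaults live in argparse, not in the network-construction function,
    # so there is a single source of truth for beam_size and max_length.
    parser = argparse.ArgumentParser(description="seq2seq benchmark")
    parser.add_argument(
        "--beam_size", type=int, default=3,
        help="Beam size used for generation. (default: %(default)d)")
    parser.add_argument(
        "--max_length", type=int, default=250,
        help="Max sequence length when generating. (default: %(default)d)")
    return parser

args = build_parser().parse_args([])  # beam_size=3, max_length=250
```

The network function would then take `args.beam_size` and `args.max_length` as plain parameters with no defaults of its own.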
fluid/machine_translation.py
Outdated
| """Construct a seq2seq network."""
| feeding_list = ["source_sequence", "target_sequence", "label_sequence"]

| def bi_lstm_encoder(input_seq, size):
Maybe we need a comment here: the LSTM unit has four sets of weights (input gate, forget gate, output gate, and cell candidate), so the projection size has to be multiplied by 4.
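The factor of 4 can be illustrated independently of Paddle (a minimal NumPy sketch with hypothetical names; the real layer does the same split internally): the projection feeds all four gates at once, so its width is `4 * hidden_size`, which PyTorch calls the gate size.

```python
import numpy as np

hidden_size = 512                # width of one LSTM unit's hidden state
gate_size = 4 * hidden_size      # input, forget, output, cell-candidate gates

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                # a single input vector (dim 64)
W = rng.standard_normal((gate_size, 64))   # the fc projection of width size * 4

proj = W @ x                               # shape (4 * hidden_size,)
# The LSTM layer later slices this projection into the four per-gate parts:
i, f, o, c = np.split(proj, 4)
assert i.shape == f.shape == o.shape == c.shape == (hidden_size,)
```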
Add detailed comments.
fluid/machine_translation.py
Outdated
| size=size * 4,
| act='tanh')
| forward, _ = fluid.layers.dynamic_lstm(
|     input=input_forward_proj, size=size * 4)
Double-check whether dynamic_lstm needs the factor of 4; I roughly remember that it has been done inside the LSTM layer.
Naming it gate_size is a good idea:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L27
> Double-check whether dynamic_lstm needs the factor of 4; I roughly remember that it has been done inside the LSTM layer.

> Naming it gate_size is a good idea.

Agree.
| default=16,
| help="The sequence number of a batch data. (default: %(default)d)")
| parser.add_argument(
|     "--dict_size",
This value is determined by the dataset; should it be an argument?

Seems this is an argument:
fluid/machine_translation.py
Outdated
| "--max_length",
| type=int,
| default=250,
| help="The max length of sequence when doing generation. "
fluid/machine_translation.py
Outdated
| "--batch_size",
| type=int,
| default=16,
| help="The sequence number of a batch data. (default: %(default)d)")

| "--encoder_size",
| type=int,
| default=512,
| help="The size of encoder bi-rnn unit. (default: %(default)d)")
I think both are OK, but size is shorter.
| "--decoder_size",
| type=int,
| default=512,
| help="The size of decoder rnn unit. (default: %(default)d)")
I think both are OK, but size is shorter.
fluid/machine_translation.py
Outdated
| "--use_gpu",
| type=distutils.util.strtobool,
| default=True,
| help="Whether use gpu. (default: %(default)d)")

| def lstm_decoder_with_attention(target_embedding, encoder_vec, encoder_proj,
|                                 decoder_boot, decoder_size):
|     def simple_attention(encoder_vec, encoder_proj, decoder_state):
The attention mechanism is wrong. Where is the 'tanh' operation that appears in the original formula?
I didn't catch your point; why is tanh necessary for attention? There are several kinds of attention mechanisms. Please refer to https://github.com/PaddlePaddle/Paddle/blob/9bfa3013891cf3da832307894acff919d6705cee/python/paddle/trainer_config_helpers/networks.py#L1400
https://github.com/PaddlePaddle/Paddle/blob/9bfa3013891cf3da832307894acff919d6705cee/python/paddle/trainer_config_helpers/networks.py#L1473

Here, the mixed_layer performs tanh. And for the attention mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate", tanh is used. Is this what you want to implement?
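The tanh under discussion is the one inside Bahdanau-style additive attention: the score comes from a tanh of the combined projections, not from a purely linear combination. A minimal NumPy sketch of that formulation (names such as `additive_attention` are hypothetical, not this PR's code):

```python
import numpy as np

def additive_attention(encoder_proj, decoder_state, W_d, v):
    """Bahdanau attention: score_t = v . tanh(encoder_proj_t + W_d @ s)."""
    # encoder_proj: (T, d) encoder states already projected by W_e
    # decoder_state: (d,) current decoder hidden state s
    hidden = np.tanh(encoder_proj + W_d @ decoder_state)  # (T, d), the tanh step
    scores = hidden @ v                                   # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                              # softmax over time steps
    return weights

rng = np.random.default_rng(0)
T, d = 5, 8
w = additive_attention(rng.standard_normal((T, d)),
                       rng.standard_normal(d),
                       rng.standard_normal((d, d)),
                       rng.standard_normal(d))
assert w.shape == (T,) and np.isclose(w.sum(), 1.0)
```

Dropping the `np.tanh` call gives the linear-activation variant being debated above; both produce a valid attention distribution, which is why the thread settles it as a consistency choice rather than a correctness one.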
Why do you think it's wrong to apply a linear activation?
To keep things consistent, I will apply tanh in the next PR. Thanks.
| fetch_outs = exe.run(
|     inference_program,
|     feed=dict(zip(*[feeding_list, (src_seq, trg_seq, lbl_seq)])),
Please fix this issue; even a plain dict literal is better than the *zip() function sugar.
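The two spellings of the feed are equivalent, which a small sketch makes concrete (`feeding_list` is taken from this PR; the sequence values here are placeholders):

```python
feeding_list = ["source_sequence", "target_sequence", "label_sequence"]
src_seq, trg_seq, lbl_seq = [1], [2], [3]   # placeholder values, not real data

# The PR's version: unpack-then-zip, harder to read at a glance.
feed_zipped = dict(zip(*[feeding_list, (src_seq, trg_seq, lbl_seq)]))

# The suggested version: a plain dict literal, same result.
feed_plain = {
    "source_sequence": src_seq,
    "target_sequence": trg_seq,
    "label_sequence": lbl_seq,
}

assert feed_zipped == feed_plain
```

Even the intermediate `dict(zip(feeding_list, (src_seq, trg_seq, lbl_seq)))` without the `*[...]` wrapper would already be clearer than the original.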
Resolves #55
Resolves #22