Conversation
lcy-seso
left a comment
This design is clear to me, thank you.
```python
with whl.loop(exec_immediately=True):
    time_ = whl.loop_var(time)
    to_stop = pd.less_than(time_, num_steps)
    whl.break_if(to_stop)
```
The design doc (the dynamic RNN part; this alone still cannot support beam search) is clear to me. Thanks for the doc.
I just want to share some of my understanding; maybe you can help me check it.
**The minimum features required for dynamic RNN training**
For a no-padding dynamic RNN training (generation will be more complicated), I think three components are the minimum requirements:

- `TensorArray` as input and output, which sorts the input batch and re-orders the sorted output to its original order. This helps to achieve a no-padding RNN.
- A user-defined step function (a sub-graph) which describes the computation the RNN performs in a single time step.
  - The step function is a required parameter of the `while_loop` operator.
  - If the step function takes `previous_state` as its input, it is a recurrent unit; otherwise it acts like the `map` function in Python.
- The `while_loop` operator.
  - I guess `while_loop` is very much like an executor?
  - If the step function does not take `previous_state` as its input, `while_loop` just applies the function to every item in a `TensorArray` and returns a `TensorArray`.
  - For a dynamic RNN forward pass: first, the step function is iteratively executed over the entire input `TensorArray`; second, the condition check is executed to determine whether to stop expanding the step function (a graph).
  - The framework takes the responsibility to construct the backward graph based on the expansion steps of the forward computation.
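To check my understanding of the two step-function modes, here is a plain-Python sketch (not the Paddle API; `while_loop`, `step_fn`, and `init_state` are illustrative names, and a `TensorArray` is modeled as a plain list):

```python
def while_loop(step_fn, inputs, init_state=None):
    """Apply step_fn over every item of `inputs`.

    Without init_state this degenerates to Python's map();
    with init_state, step_fn also receives the previous state,
    making it a recurrent unit.
    """
    outputs = []
    state = init_state
    for x in inputs:
        if state is None:
            outputs.append(step_fn(x))          # map-like behavior
        else:
            out, state = step_fn(x, state)      # recurrent behavior
            outputs.append(out)
    return outputs

# map-like: no previous_state
squares = while_loop(lambda x: x * x, [1, 2, 3])

# recurrent: a running sum carried as the state
step = lambda x, s: (x + s, x + s)
prefix_sums = while_loop(step, [1, 2, 3], init_state=0)
```

Here `squares` is `[1, 4, 9]` and `prefix_sums` is `[1, 3, 6]`, which matches the map-vs-recurrent distinction above.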
**Some things I haven't thought through very carefully yet**
- As I understand it, `while_loop` is a dynamic operator, am I right?
  - This means `while_loop` accepts an iterable data input (the `TensorArray`) and dynamically iterates over the input rather than pre-expanding the entire graph (into an expanded feed-forward network).
- What will happen if the `while_loop` operator is nested twice, or even more than that (for short-term goals, like `RecurrentLayerGroup` for nested sequences)?
- Can two `TensorArray`s work together? For example, one `TensorArray` packs a sequence and returns an index map, and other `TensorArray`s (more than one) pack other sequences using this index map. This is useful for attention and NTM models.
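For the last question, here is a plain-Python sketch of what I mean (not the Paddle API; `pack`, `pack_with`, and `unpack` are illustrative names): one "TensorArray" packs sequences by descending length and returns an index map, a second one reuses that map, and the map restores the original order afterwards.

```python
def pack(sequences):
    """Sort sequences by descending length; return the sorted batch + index map."""
    order = sorted(range(len(sequences)),
                   key=lambda i: len(sequences[i]), reverse=True)
    return [sequences[i] for i in order], order

def pack_with(sequences, order):
    """Pack another batch of sequences with a previously computed index map."""
    return [sequences[i] for i in order]

def unpack(sorted_outputs, order):
    """Restore outputs of the sorted batch to the original order."""
    restored = [None] * len(sorted_outputs)
    for src, dst in enumerate(order):
        restored[dst] = sorted_outputs[src]
    return restored

src = [[1], [2, 3, 4], [5, 6]]
packed, order = pack(src)                           # order == [1, 2, 0]
aligned = pack_with([['a'], ['b'], ['c']], order)   # second array, same map
assert unpack(packed, order) == src                 # original order restored
```

If the framework exposed the index map like this, an attention or NTM model could keep several `TensorArray`s aligned with one sort.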
**For beam generation, and even beam training**
I think one of the most difficult things in beam search is that we have to dynamically construct the beam at every time step, which involves operators like scatter, gather, k-max-score, sequence trim, and so on.
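To make the point concrete, here is a minimal plain-Python sketch of one beam-expansion step, showing the k-max-score selection and the gather that rebuild the beam. The names (`beam_step`, `expand_fn`) are illustrative, not Paddle operators:

```python
import heapq

def beam_step(beams, expand_fn, k):
    """One beam-search expansion step.

    beams:      list of (score, token_list) pairs.
    expand_fn:  maps a token_list to {next_token: logprob}.
    Returns the k best expanded beams (the "k max score" + gather).
    """
    candidates = []
    for score, tokens in beams:
        for tok, logp in expand_fn(tokens).items():
            candidates.append((score + logp, tokens + [tok]))
    # keep only the top-k candidates by accumulated score
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

# toy model: always offers token 'a' (logprob -0.1) and 'b' (logprob -2.0)
expand = lambda tokens: {'a': -0.1, 'b': -2.0}
beams = [(0.0, [])]
for _ in range(2):
    beams = beam_step(beams, expand, k=2)
# best beam after two steps is ['a', 'a'] with score -0.2
```

The hard part the comment alludes to is that in a graph framework this gather/top-k must run inside the loop body, so the beam's shape changes from step to step.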
```python
# whl.loop(), whl() will be called
with whl.loop(exec_immediately=True):
    time_ = whl.loop_var(time)
    to_stop = pd.less_than(time_, num_steps)
```
Iterating over the input sequence can serve as the default condition check for dynamic RNN training/testing.
For beam training/generation, a more complicated condition is required.
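A quick sketch of the contrast (plain Python; `training_cond`, `beam_cond`, and `finished_flags` are illustrative names, not Paddle operators): training only needs to compare the time step against the input length, while beam generation must also stop once every beam has finished.

```python
def training_cond(time, num_steps):
    """Default check for dynamic RNN training: iterate over the input."""
    return time < num_steps

def beam_cond(time, max_len, finished_flags):
    """Beam generation also stops once every beam has emitted <eos>."""
    return time < max_len and not all(finished_flags)

assert training_cond(3, 5)                  # still inside the input
assert not training_cond(5, 5)              # input exhausted
assert beam_cond(3, 10, [True, False])      # one beam still alive
assert not beam_cond(3, 10, [True, True])   # all beams finished early
```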