Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
- Coverage   85.87%   83.31%    -2.56%
==========================================
  Files          52       55       +3
  Lines        6909     7517     +608
==========================================
+ Hits         5933     6263     +330
- Misses        976     1254     +278
Continue to review full report at Codecov.
@@ -43,7 +44,7 @@ Some metrics for the unconditional generated text
| topk=40 | 0.4291 | 0.9666 | 0.0 |
I think we previously had the results for t=0.9; we should remove that row.
    mx.np.zeros_like(probs)
)
# choose the borderline prob
p_prob = mx.np.min(masked_probs, axis=2, keepdims=True)
Is it possible to use exactly the same implementation as https://gist.github.com/thomwolf/1a5a29f6962089e871b94cbd09daf317?
I'm referring to the part in which they choose not to mask the top-1 probability:
sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
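For reference, a minimal NumPy sketch of the gist's masking logic (illustrative only, not the PR's code; the function name top_p_filter is hypothetical, and mx.np would need equivalents of take_along_axis/put_along_axis):

```python
import numpy as np

def top_p_filter(logits, top_p=0.9):
    """Mask logits outside the nucleus; the shift guarantees top-1 survives."""
    sorted_indices = np.argsort(-logits, axis=-1)  # indices of a descending sort
    sorted_logits = np.take_along_axis(logits, sorted_indices, axis=-1)
    exp = np.exp(sorted_logits - sorted_logits.max(axis=-1, keepdims=True))
    cumulative_probs = np.cumsum(exp / exp.sum(axis=-1, keepdims=True), axis=-1)
    # Tokens whose cumulative probability exceeds top_p fall outside the nucleus.
    sorted_indices_to_remove = cumulative_probs > top_p
    # Shift the mask right by one so the first token past the threshold is kept,
    # i.e. the top-1 probability is never masked out.
    sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].copy()
    sorted_indices_to_remove[..., 0] = False
    # Scatter the mask back to the original (unsorted) token order.
    indices_to_remove = np.zeros_like(sorted_indices_to_remove)
    np.put_along_axis(indices_to_remove, sorted_indices,
                      sorted_indices_to_remove, axis=-1)
    return np.where(indices_to_remove, -np.inf, logits)
```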
Sorry for the confusion. I see that both sort and argsort are implemented, but I don't see a way to get both values and indices in one call. The use of topk(k=-1), which assumes the returned values are sorted, appears to be undocumented, which is a bit of a concern.
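If helpful, one workaround that avoids relying on topk(k=-1): a single argsort call gives the indices, and a gather then gives the values from the same ordering. A sketch in NumPy terms (mx.np aims to mirror this API, though whether take_along_axis is available there would need checking):

```python
import numpy as np

probs = np.array([[0.1, 0.5, 0.2, 0.2]])
sorted_indices = np.argsort(-probs, axis=-1)   # indices of a descending sort
# Gather the values with the same indices, so values and indices
# are guaranteed to come from one consistent ordering.
sorted_probs = np.take_along_axis(probs, sorted_indices, axis=-1)
```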
@szha Would you help review?

@szha Would you take a look?
    probs >= p_prob,
    probs,
    mx.np.zeros_like(probs)
)
The major difference between the current implementation and the original PyTorch-based implementation is that when sampling_topp < max(probs), it is not clear which probability will be picked.
The PyTorch-based implementation always keeps the token that is most probable.
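To make the edge case concrete, a small worked example with illustrative numbers (already sorted in descending order): with sampling_topp = 0.5 and probs = [0.6, 0.3, 0.1], no prefix of the cumulative sum stays within the budget, so a bare threshold test flags even the top-1 token; the shifted mask from the gist keeps exactly that token.

```python
import numpy as np

probs = np.array([0.6, 0.3, 0.1])    # illustrative, sorted descending
top_p = 0.5                          # sampling_topp < max(probs)

cumsum = np.cumsum(probs)            # [0.6, 0.9, 1.0]
remove = cumsum > top_p              # [True, True, True]: even top-1 is flagged
# The right-shift keeps the first token past the threshold, so the most
# probable token is always retained, matching the PyTorch-based behavior.
remove[1:] = remove[:-1].copy()
remove[0] = False                    # [False, True, True]
print(np.where(remove, 0.0, probs))  # [0.6 0.  0. ]
```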
Description
Fix the bug in top-p sampling mentioned in the issue.
Checklist
Essentials
cc @dmlc/gluon-nlp-team