Describe the bug
I have been trying to use lm-evaluation-harness with gpt-neox/eval.py. AFAIK the other request types besides generate_until work fine. With generate_until, I run into the following assertion failure (in the position embedding module) after a couple of examples have been processed:
`assert seq_len <= self.max_seq_len`
In my testing, the model is about to generate (say) token 48. I have verified that token_index_to_generate in gpt-neox/megatron/text_generation_utils.py is in fact 48. But somehow RotaryEmbedding is asked to produce an embedding for position 1025, which is beyond the model's maximum sequence length.
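To make the arithmetic concrete, here is a minimal standalone sketch (not the actual gpt-neox code, and the numbers are illustrative) of the failure mode I suspect: a position offset carried over from a previous prompt instead of being reset, so even a short prompt ends up asking the rotary embedding for positions past max_seq_len.

```python
# Toy reproduction of the suspected failure mode; FakeRotary and the numbers
# (1024, 1000, 48) are made up for illustration only.
MAX_SEQ_LEN = 1024


class FakeRotary:
    """Stand-in for RotaryEmbedding; only the length check matters here."""

    def __init__(self, max_seq_len):
        self.max_seq_len = max_seq_len

    def forward(self, seq_len):
        # Mirrors the check that fails for me.
        assert seq_len <= self.max_seq_len, f"{seq_len} > {self.max_seq_len}"


rotary = FakeRotary(MAX_SEQ_LEN)

# Prompt 1: generation runs close to the context limit.
position = 0
for _ in range(1000):
    position += 1
    rotary.forward(position)

# Prompt 2: if cached state is NOT cleared, positions keep counting up from
# 1000 even though token_index_to_generate restarts at a small value (48 in
# my run), so the check trips after a couple dozen tokens.
try:
    for _ in range(48):
        position += 1  # a cleared cache would restart this near the prompt length
        rotary.forward(position)
except AssertionError as err:
    print("assertion tripped:", err)  # e.g. "1025 > 1024"
```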
To Reproduce
I will fill in reproducible configs later. Currently, I'm using a model with a custom config (but trained with gpt-neox) and evaluating on a QA dataset (where eval-harness uses generate_until).
Proposed solution
I suspect the issue is caused by a missing clear_cache() between batches of data. Adding model.module.clear_cache() at the start of gpt-neox/megatron/text_generation_utils.py:stream_tokens seems to fix it on my side.
I am unsure whether this is the right place for the call and whether it is a complete fix. The same clear_cache operation seems to be invoked in generate_samples_interactive but not in generate_samples_from_prompt. A sketch of the change is below.
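For reference, a rough sketch of where I added the call (the signature here is abridged — the real stream_tokens takes more arguments — and only the inserted line is shown):

```python
# megatron/text_generation_utils.py (abridged sketch, not a full patch)
def stream_tokens(neox_args, model, context_tokens, **kwargs):
    # Reset any cached inference state carried over from the previous prompt,
    # mirroring what generate_samples_interactive already does.
    model.module.clear_cache()
    # ... rest of stream_tokens unchanged ...
```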
Environment (please complete the following information):
- GPUs: 1x A6000
- Configs: https://github.com/aflah02/gpt-neox/blob/olmo-support/configs/hubble/1_1B.yml