from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

print(logits)
print(torch.topk(logits, k=5))  # top-5 logits per position along the vocab dim
This is my code, and the output is:

No other model produces logit values this large; the Pythia-410m model has maximum values of ~10. Is there a bug in the way the logits are computed?
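
For reference, here is a minimal sketch of the comparison (the checkpoint list covers just the two sizes mentioned, and the prompt is the same one as above); it prints the largest-magnitude logit for each model:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

prompt = "Hello, my dog is cute"
for name in ["EleutherAI/pythia-160m", "EleutherAI/pythia-410m"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits
    # Report the largest-magnitude logit for this checkpoint.
    print(f"{name}: max |logit| = {logits.abs().max().item():.2f}")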