Headers report lower rate limits than expected? #176899
              
Unanswered

rabo-unumed asked this question in Models
Replies: 2 comments (both marked as off-topic)

Did some experimentation, and the token limit definitely seems to be per minute.
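For what it's worth, here is a rough sketch of how that experiment could be run, assuming the endpoint also returns an `x-ratelimit-remaining-tokens` header (not confirmed in this thread) and using a hypothetical `send_chat_request` helper that performs one chat completion request:

```python
# Rough sketch of the "is the token limit per minute?" experiment:
# send one request, record x-ratelimit-remaining-tokens, wait a bit
# over a minute, send another, and see whether the count has reset.
# Assumes the endpoint returns x-ratelimit-remaining-tokens, and
# send_chat_request is a hypothetical helper that performs one chat
# completion POST and returns the requests.Response.
import time
import requests

def remaining_tokens(resp: requests.Response) -> str | None:
    return resp.headers.get("x-ratelimit-remaining-tokens")

def check_window(send_chat_request) -> None:
    before = remaining_tokens(send_chat_request())
    time.sleep(65)  # just over one minute
    after = remaining_tokens(send_chat_request())
    print(f"remaining right after a request: {before}")
    print(f"remaining ~65s later:            {after}")
    # If the second value is back near x-ratelimit-limit-tokens,
    # the window is per minute.
```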
                  
Original question (rabo-unumed):
I also raised this at https://github.com/Azure/github-models/issues/22, but thought it would be good to ask here as well, since it's more of a general question about rate limits.
I am using gpt-4.1-mini through the GitHub Models organization inference endpoint, and we have activated paid usage.
I keep hitting rate limit errors, and from the header info I can retrieve, it appears I only have 150,000 tokens (per minute, presumably):

'x-ratelimit-limit-tokens': '150000'

I cannot find any mention of that being the official rate limit, neither here https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?tabs=REST nor here https://docs.github.com/en/github-models/use-github-models/prototyping-with-ai-models#rate-limits.
Are the listings outdated? Am I misinterpreting the header limit? Is it the limit for a smaller time unit than a minute?
I tried gpt-5-mini as well and got a larger limit, 'x-ratelimit-limit-tokens': '500000', which only adds to the confusion.
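For reference, here is a minimal sketch of how the headers above can be retrieved; the exact org endpoint URL, the openai/gpt-4.1-mini model ID, and the GITHUB_TOKEN environment variable are assumptions, not details confirmed in this thread:

```python
# Minimal sketch: send one chat completion request to the GitHub Models
# org inference endpoint and print the x-ratelimit-* response headers.
# The URL shape, the "your-org" placeholder, the model ID, and the
# GITHUB_TOKEN environment variable are assumptions, not confirmed here.
import os
import requests

ORG = "your-org"  # hypothetical organization slug
url = f"https://models.github.ai/orgs/{ORG}/inference/chat/completions"

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)

print("status:", resp.status_code)
# Dump every rate-limit related header, e.g. x-ratelimit-limit-tokens.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")
```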