Headers report lower rate limits than expected? #176899
              
Unanswered

rabo-unumed asked this question in Models
Replies: 2 comments (both marked as off-topic)

Did some experimentation, and the token limit definitely seems to be per minute.
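For what it's worth, here is a rough sketch of how that experiment could be run, assuming the endpoint also returns an `x-ratelimit-remaining-tokens` header (not confirmed in this thread) and using a hypothetical `send_chat_request` helper that performs one chat completion request:

```python
# Rough sketch of the "is the token limit per minute?" experiment:
# send one request, record x-ratelimit-remaining-tokens, wait a bit
# over a minute, send another, and see whether the count has reset.
# Assumes the endpoint returns x-ratelimit-remaining-tokens, and
# send_chat_request is a hypothetical helper that performs one chat
# completion POST and returns the requests.Response.
import time
import requests

def remaining_tokens(resp: requests.Response) -> str | None:
    return resp.headers.get("x-ratelimit-remaining-tokens")

def check_window(send_chat_request) -> None:
    before = remaining_tokens(send_chat_request())
    time.sleep(65)  # just over one minute
    after = remaining_tokens(send_chat_request())
    print(f"remaining right after a request: {before}")
    print(f"remaining ~65s later:            {after}")
    # If the second value is back near x-ratelimit-limit-tokens,
    # the window is per minute.
```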
                  
Original question (rabo-unumed):
I also raised this at https://github.com/Azure/github-models/issues/22, but thought it would be good to ask here as well, since it's more of a general question about rate limits.
I am using gpt-4.1-mini through the GitHub Models organization inference endpoint, and we have activated paid usage.
I keep hitting rate limit errors, and from the header info I can retrieve, it appears I only have 150,000 tokens (per minute, presumably):

'x-ratelimit-limit-tokens': '150000'

I cannot find any mention of that being the official rate limit, neither here https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quotas-limits?tabs=REST nor here https://docs.github.com/en/github-models/use-github-models/prototyping-with-ai-models#rate-limits.
Are the listings outdated? Am I misinterpreting the header limit? Is it the limit for a smaller time unit than a minute?
I tried gpt-5-mini as well and got a larger limit, 'x-ratelimit-limit-tokens': '500000', which only adds to the confusion.
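For reference, here is a minimal sketch of how the headers above can be retrieved; the exact org endpoint URL, the openai/gpt-4.1-mini model ID, and the GITHUB_TOKEN environment variable are assumptions, not details confirmed in this thread:

```python
# Minimal sketch: send one chat completion request to the GitHub Models
# org inference endpoint and print the x-ratelimit-* response headers.
# The URL shape, the "your-org" placeholder, the model ID, and the
# GITHUB_TOKEN environment variable are assumptions, not confirmed here.
import os
import requests

ORG = "your-org"  # hypothetical organization slug
url = f"https://models.github.ai/orgs/{ORG}/inference/chat/completions"

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
    timeout=30,
)

print("status:", resp.status_code)
# Dump every rate-limit related header, e.g. x-ratelimit-limit-tokens.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(f"{name}: {value}")
```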