-
Notifications
You must be signed in to change notification settings - Fork 314
feat(webhook): add rate limiting to webhook endpoint #1210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(webhook): add rate limiting to webhook endpoint #1210
Conversation
b46b8e9 to
a1fdd60
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1210 +/- ##
==========================================
- Coverage 63.34% 63.33% -0.02%
==========================================
Files 23 23
Lines 3165 3175 +10
==========================================
+ Hits 2005 2011 +6
- Misses 1050 1054 +4
Partials 110 110 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Resolved the things mentioned in the review. |
docs/configuration/webhook.md
Outdated
| |`HARBOR_WEBHOOK_SECRET` |`--harbor-webhook-secret`| | ||
| |`QUAY_WEBHOOK_SECRET` |`--quay-webhook-secret`| | ||
| |`ENABLE_WEBHOOK_RATELIMIT`|`--enable-webhook-ratelimit`| | ||
| |`WEBHOOK_RATELIMIT_ALLOWED_`|`--webhook-ratelimit-allowed`| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra _ at the end of the env variable name.
|
Sorry about that, tried to get that done quick before leaving the office fixed again. |
|
I was wondering why you would roll up your own rate limiter, especially given the fact that we already use the rate limiter from uber within image updater |
I did not realize this and haven't really touched the registry scanner. I'll refactor this PR tomorrow thanks for the advice I should definitely look more at the packages used next time 🤦♂️ |
|
Good catch, Jan. I should've noticed that earlier. Yes, let's stick to uber ratelimiter. |
|
I updated the rate limiting functionality to use the uber package now. This one is slightly different and it allows for a certain amount of requests per second instead of a certain amount in a window. It also blocks until it is able to make the request so I set it to always be enabled as well because I can see this bringing benefits in not overloading resources if a CI pipeline pushed a bunch of images that had notifications set up for Image Updater. |
931da0f to
678f7ec
Compare
cmd/webhook.go
Outdated
| webhookCmd.Flags().StringVar(&webhookCfg.GHCRSecret, "ghcr-webhook-secret", env.GetStringVal("GHCR_WEBHOOK_SECRET", ""), "Secret for validating GitHub Container Registry webhooks") | ||
| webhookCmd.Flags().StringVar(&webhookCfg.QuaySecret, "quay-webhook-secret", env.GetStringVal("QUAY_WEBHOOK_SECRET", ""), "Secret for validating Quay webhooks") | ||
| webhookCmd.Flags().StringVar(&webhookCfg.HarborSecret, "harbor-webhook-secret", env.GetStringVal("HARBOR_WEBHOOK_SECRET", ""), "Secret for validating Harbor webhooks") | ||
| webhookCmd.Flags().IntVar(&webhookCfg.RateLimitNumAllowedRequests, "webhook-ratelimit-allowed", env.ParseNumFromEnv("WEBHOOK_RATELIMIT_ALLOWED", 20, 0, math.MaxInt), "The number of allowed requests in a window for webhook rate limiting") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the description needs update once we decide on the time unit.
| runCmd.Flags().StringVar(&webhookCfg.QuaySecret, "quay-webhook-secret", env.GetStringVal("QUAY_WEBHOOK_SECRET", ""), "Secret for validating Quay webhooks") | ||
| runCmd.Flags().StringVar(&webhookCfg.HarborSecret, "harbor-webhook-secret", env.GetStringVal("HARBOR_WEBHOOK_SECRET", ""), "Secret for validating Harbor webhooks") | ||
| runCmd.Flags().IntVar(&webhookCfg.RateLimitNumAllowedRequests, "webhook-ratelimit-allowed", env.ParseNumFromEnv("WEBHOOK_RATELIMIT_ALLOWED", 20, 0, math.MaxInt), "The number of allowed requests in a window for webhook rate limiting") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the description needs update once we decide on the time unit.
cmd/run.go
Outdated
| log.Infof("Starting webhook server on port %d", webhookCfg.Port) | ||
| webhookServer = webhook.NewWebhookServer(webhookCfg.Port, handler, cfg.KubeClient, argoClient) | ||
|
|
||
| webhookServer.RateLimiter = ratelimit.New(webhookCfg.RateLimitNumAllowedRequests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the default time unit being second, the lowest one can get is 1. For many users, I guess 1 request per second is still very frequent for image updates. I'm wondering if we could use a larger time unit, like hour or day? What do you think?
How about make 0 mean disabled, and make it the default, so users will need to explicitly opt in to ratelimiter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some quick searches we can use a longer unit of time. One thing that I'm thinking is that the request will be blocked until the rate limiter says it can proceed. So I think having a really long interval might make it so the request will be blocked until the interval of the rate limit passes even if nothing is being updated which IMO will make it a bit too close to being like polling and at that point just use polling.
To combat this maybe we can have it add one request back in when an update is finished so the rate limiting kind of controls how many requests can be done at one time which I would assume the limit to that would be how much resources a user wants to dedicate to image updater. Not sure if I'm thinking about this right but let me know what you think about this?
I do like the idea of it being opt in which could combat the issue I mentioned above. If a user opts in to the rate limiter then they would accept the fact that things might have to wait until the interval is up if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, really longer interval can take long to reset. So day is probably too long, and hour looks like a good middle ground.
I'm hestitant to customize the limiting logic in the ratelimiter lib. The built-in slack option in uber ratelimiter should address above concerns to some degree. And this slack option is on by default and defaults to 10 allowances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the hour interval I think that would work great.
I'll add in the enable flags and slack option by the end of the week. Thanks for the feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sound great! I think we don't need to expose the slack option as a flag. Just using the default slack mechanism in the lib should suffice.
|
I made the adjustments to the PR. One thing I also made an adjustment to is where the rate limit happens. I moved it to right before the updates happen because if the request gets held up on the rate limit then there may be a chance the container registry times it out and then disables the webhook. I know that Quay has it so if the webhook notification fails to send 3 time it will disable the webhook and that could cause some annoyance on the user. LMK what you think about this. |
Signed-off-by: Christopher Coco <[email protected]>
…fests to include them Signed-off-by: Christopher Coco <[email protected]> fix manifests Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: Christopher Coco <[email protected]>
…ment Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: Christopher Coco <[email protected]>
…telimit Signed-off-by: Christopher Coco <[email protected]> fix liniting Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: cjcocokrisp <[email protected]>
Signed-off-by: cjcocokrisp <[email protected]>
47a9004 to
6390298
Compare
Signed-off-by: cjcocokrisp <[email protected]>
) Signed-off-by: Christopher Coco <[email protected]> Signed-off-by: cjcocokrisp <[email protected]> (cherry picked from commit e6c82f2) Signed-off-by: Denis Karpelevich <[email protected]>
This PR aims to add rate limiting to the webhook endpoint so it can not be spammed with requests. This is accomplished with a sliding window approach where timestamps of requests are stored and then when you make a request it is checked how many requests were made in the specified window from the time of your request. If the window was fixed you could burst the requests at the end of the window and then at the start of the window which could cause overloading problems. The sliding window approach makes it so this is not a problem. The downside to this approach however, is that it requires more memory to store each individual timestamp. Because container registries could have rotating IP addresses this can be a problem due to information of an IP that won't make any more requests will still be stored. So to fix this I added a clean up interval that can be set that will clean any clients that have not been seen in a while.
I looked into some packages that were available and found the rate and uber/ratelimit package but after doing research into what strategies are used for rate limiting it seemed like it wasn't the most difficult to implement.
I will add the documentation for this in a separate PR.