Skip to content

Conversation

@cjcocokrisp
Copy link
Contributor

This PR aims to add rate limiting to the webhook endpoint so it can not be spammed with requests. This is accomplished with a sliding window approach where timestamps of requests are stored and then when you make a request it is checked how many requests were made in the specified window from the time of your request. If the window was fixed you could burst the requests at the end of the window and then at the start of the window which could cause overloading problems. The sliding window approach makes it so this is not a problem. The downside to this approach however, is that it requires more memory to store each individual timestamp. Because container registries could have rotating IP addresses this can be a problem due to information of an IP that won't make any more requests will still be stored. So to fix this I added a clean up interval that can be set that will clean any clients that have not been seen in a while.

I looked into some packages that were available and found the rate and uber/ratelimit package but after doing research into what strategies are used for rate limiting it seemed like it wasn't the most difficult to implement.

I will add the documentation for this in a separate PR.

@cjcocokrisp cjcocokrisp changed the title Feat/webhook extras feat(webhook): add rate limiting to webhook endpoint Aug 7, 2025
@cjcocokrisp cjcocokrisp force-pushed the feat/webhook-extras branch 2 times, most recently from b46b8e9 to a1fdd60 Compare August 7, 2025 16:07
@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2025

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.33%. Comparing base (e816c49) to head (6390298).

Files with missing lines Patch % Lines
cmd/run.go 25.00% 3 Missing ⚠️
cmd/webhook.go 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1210      +/-   ##
==========================================
- Coverage   63.34%   63.33%   -0.02%     
==========================================
  Files          23       23              
  Lines        3165     3175      +10     
==========================================
+ Hits         2005     2011       +6     
- Misses       1050     1054       +4     
  Partials      110      110              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cjcocokrisp
Copy link
Contributor Author

Resolved the things mentioned in the review.

|`HARBOR_WEBHOOK_SECRET` |`--harbor-webhook-secret`|
|`QUAY_WEBHOOK_SECRET` |`--quay-webhook-secret`|
|`ENABLE_WEBHOOK_RATELIMIT`|`--enable-webhook-ratelimit`|
|`WEBHOOK_RATELIMIT_ALLOWED_`|`--webhook-ratelimit-allowed`|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra _ at the end of the env variable name.

@cjcocokrisp
Copy link
Contributor Author

Sorry about that, tried to get that done quick before leaving the office fixed again.

@jannfis
Copy link
Contributor

jannfis commented Aug 8, 2025

I was wondering why you would roll up your own rate limiter, especially given the fact that we already use the rate limiter from uber within image updater

@cjcocokrisp
Copy link
Contributor Author

I was wondering why you would roll up your own rate limiter, especially given the fact that we already use the rate limiter from uber within image updater

I did not realize this and haven't really touched the registry scanner. I'll refactor this PR tomorrow thanks for the advice I should definitely look more at the packages used next time 🤦‍♂️

@chengfang
Copy link
Collaborator

Good catch, Jan. I should've noticed that earlier. Yes, let's stick to uber ratelimiter.

@cjcocokrisp
Copy link
Contributor Author

I updated the rate limiting functionality to use the uber package now. This one is slightly different and it allows for a certain amount of requests per second instead of a certain amount in a window. It also blocks until it is able to make the request so I set it to always be enabled as well because I can see this bringing benefits in not overloading resources if a CI pipeline pushed a bunch of images that had notifications set up for Image Updater.

cmd/webhook.go Outdated
webhookCmd.Flags().StringVar(&webhookCfg.GHCRSecret, "ghcr-webhook-secret", env.GetStringVal("GHCR_WEBHOOK_SECRET", ""), "Secret for validating GitHub Container Registry webhooks")
webhookCmd.Flags().StringVar(&webhookCfg.QuaySecret, "quay-webhook-secret", env.GetStringVal("QUAY_WEBHOOK_SECRET", ""), "Secret for validating Quay webhooks")
webhookCmd.Flags().StringVar(&webhookCfg.HarborSecret, "harbor-webhook-secret", env.GetStringVal("HARBOR_WEBHOOK_SECRET", ""), "Secret for validating Harbor webhooks")
webhookCmd.Flags().IntVar(&webhookCfg.RateLimitNumAllowedRequests, "webhook-ratelimit-allowed", env.ParseNumFromEnv("WEBHOOK_RATELIMIT_ALLOWED", 20, 0, math.MaxInt), "The number of allowed requests in a window for webhook rate limiting")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the description needs update once we decide on the time unit.

runCmd.Flags().StringVar(&webhookCfg.QuaySecret, "quay-webhook-secret", env.GetStringVal("QUAY_WEBHOOK_SECRET", ""), "Secret for validating Quay webhooks")
runCmd.Flags().StringVar(&webhookCfg.HarborSecret, "harbor-webhook-secret", env.GetStringVal("HARBOR_WEBHOOK_SECRET", ""), "Secret for validating Harbor webhooks")
runCmd.Flags().IntVar(&webhookCfg.RateLimitNumAllowedRequests, "webhook-ratelimit-allowed", env.ParseNumFromEnv("WEBHOOK_RATELIMIT_ALLOWED", 20, 0, math.MaxInt), "The number of allowed requests in a window for webhook rate limiting")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the description needs update once we decide on the time unit.

cmd/run.go Outdated
log.Infof("Starting webhook server on port %d", webhookCfg.Port)
webhookServer = webhook.NewWebhookServer(webhookCfg.Port, handler, cfg.KubeClient, argoClient)

webhookServer.RateLimiter = ratelimit.New(webhookCfg.RateLimitNumAllowedRequests)
Copy link
Collaborator

@chengfang chengfang Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the default time unit being second, the lowest one can get is 1. For many users, I guess 1 request per second is still very frequent for image updates. I'm wondering if we could use a larger time unit, like hour or day? What do you think?

How about make 0 mean disabled, and make it the default, so users will need to explicitly opt in to ratelimiter?

Copy link
Contributor Author

@cjcocokrisp cjcocokrisp Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some quick searches we can use a longer unit of time. One thing that I'm thinking is that the request will be blocked until the rate limiter says it can proceed. So I think having a really long interval might make it so the request will be blocked until the interval of the rate limit passes even if nothing is being updated which IMO will make it a bit too close to being like polling and at that point just use polling.

To combat this maybe we can have it add one request back in when an update is finished so the rate limiting kind of controls how many requests can be done at one time which I would assume the limit to that would be how much resources a user wants to dedicate to image updater. Not sure if I'm thinking about this right but let me know what you think about this?

I do like the idea of it being opt in which could combat the issue I mentioned above. If a user opts in to the rate limiter then they would accept the fact that things might have to wait until the interval is up if that makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, really longer interval can take long to reset. So day is probably too long, and hour looks like a good middle ground.

I'm hestitant to customize the limiting logic in the ratelimiter lib. The built-in slack option in uber ratelimiter should address above concerns to some degree. And this slack option is on by default and defaults to 10 allowances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the hour interval I think that would work great.

I'll add in the enable flags and slack option by the end of the week. Thanks for the feedback!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound great! I think we don't need to expose the slack option as a flag. Just using the default slack mechanism in the lib should suffice.

@cjcocokrisp
Copy link
Contributor Author

I made the adjustments to the PR.

One thing I also made an adjustment to is where the rate limit happens. I moved it to right before the updates happen because if the request gets held up on the rate limit then there may be a chance the container registry times it out and then disables the webhook.

I know that Quay has it so if the webhook notification fails to send 3 time it will disable the webhook and that could cause some annoyance on the user. LMK what you think about this.

@chengfang chengfang force-pushed the feat/webhook-extras branch from 47a9004 to 6390298 Compare August 15, 2025 02:07
@chengfang chengfang merged commit e6c82f2 into argoproj-labs:master Aug 15, 2025
11 checks passed
dkarpele pushed a commit to dkarpele/argocd-image-updater that referenced this pull request Aug 26, 2025
)

Signed-off-by: Christopher Coco <[email protected]>
Signed-off-by: cjcocokrisp <[email protected]>

(cherry picked from commit e6c82f2)
Signed-off-by: Denis Karpelevich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants