Description
Since yesterday, there's been a small but noticeable uptick in 502 Bad Gateway responses from API backends. Some level of 502 responses is normal (when the underlying API is actually down), but the uptick seems to be coming from APIs that are in fact up, so this looks like a problem at our layer.
By my estimates, this is affecting around 0.2%-0.25% of API traffic, so it's not super prevalent, but it's still something we need to fix to prevent intermittent issues for API consumers.
I'm still investigating the root cause, but I believe this is somehow related to keepalive connections to API backends getting closed prematurely. Yesterday we made some changes to how our AWS network environment is set up (using a NAT Gateway so we can better scale out the service while retaining our static egress IPs), so I believe the issue is related to that change, but I'm still getting to the bottom of it.
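To illustrate the suspected failure mode, here is a minimal sketch of how an idle keepalive connection can go stale when a hop in between (such as a NAT device) drops idle flows, and how a proxy might guard against reusing it. This is one possible mechanism, not a confirmed root cause. The `PooledConnection` type, the `is_safe_to_reuse` helper, and the safety margin are hypothetical names and values invented for this example. The 350-second figure is the idle timeout AWS documents for NAT gateways, but treat it as an assumption to check against the actual environment.

```python
from dataclasses import dataclass

# Assumption: the intermediate hop silently drops flows that have been idle
# this long. 350 s is the documented AWS NAT gateway idle timeout; verify it.
ASSUMED_IDLE_TIMEOUT_SECONDS = 350.0

# Hypothetical safety margin so we stop reusing a connection well before
# the intermediate hop might have forgotten about it.
SAFETY_MARGIN_SECONDS = 50.0


@dataclass
class PooledConnection:
    """Hypothetical record for a keepalive connection held in a proxy pool."""
    backend: str
    last_used: float  # timestamp (seconds) of the last request on this connection


def is_safe_to_reuse(
    conn: PooledConnection,
    now: float,
    idle_timeout: float = ASSUMED_IDLE_TIMEOUT_SECONDS,
    margin: float = SAFETY_MARGIN_SECONDS,
) -> bool:
    """Return True if the connection has been idle for less than the
    assumed idle timeout minus the safety margin."""
    idle_for = now - conn.last_used
    return idle_for < idle_timeout - margin


if __name__ == "__main__":
    conn = PooledConnection(backend="api.example.com:443", last_used=1000.0)
    # Idle for 100 s: still within the window, so reuse is considered safe.
    print(is_safe_to_reuse(conn, now=1100.0))
    # Idle for 400 s: past the assumed timeout, so open a fresh connection
    # instead of risking a reset that would surface as a 502.
    print(is_safe_to_reuse(conn, now=1400.0))
```

If this turns out to be the cause, the equivalent fix in a real proxy would be to set its upstream keepalive idle timeout below the intermediate hop's idle timeout, rather than hand-rolling a check like this.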