Investigate uptick in 502 Bad Gateway responses (~0.001% requests) #446

Since yesterday, there's been a small but noticeable uptick in 502 Bad Gateway responses from API backends. Some level of 502 responses is normal (when the underlying API is actually down), but this uptick seems to be coming from APIs that are in fact up, so the problem is at our layer.

By my estimates, this is affecting around 0.2%-0.25% of API traffic, so it's not super prevalent, but it's still something we need to fix to prevent intermittent issues for API consumers.

I'm still investigating the root cause, but I believe this is somehow related to keepalive connections to API backends getting closed prematurely. Yesterday we changed how our AWS network environment is set up (using a NAT Gateway so we can better scale out the service while retaining our static egress IPs), so I believe the issue is related to that change, but I'm still getting to the bottom of it.
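One plausible mechanism, not yet confirmed: AWS documents that a connection through a NAT Gateway times out after 350 seconds of inactivity. If a proxy keeps an idle keepalive connection to a backend longer than that and then reuses it, the NAT may have already dropped the mapping, so the request fails and can surface as a 502. The sketch below illustrates the usual mitigation of never reusing a connection that has been idle close to that limit. `IdleAwarePool`, `max_idle`, and the 300-second margin are all hypothetical names and values for illustration. They are not part of our codebase or of any library.

```python
import time
from collections import deque
from typing import Callable, Deque, Generic, Tuple, TypeVar

T = TypeVar("T")

# AWS documents a 350-second idle timeout for connections through a NAT Gateway.
NAT_IDLE_TIMEOUT_S = 350.0
# Hypothetical safety margin below the NAT timeout; an assumption, not a tested value.
DEFAULT_MAX_IDLE_S = 300.0


class IdleAwarePool(Generic[T]):
    """Hypothetical keepalive pool that never reuses a connection idle past max_idle."""

    def __init__(self, factory: Callable[[], T], close: Callable[[T], None],
                 max_idle: float = DEFAULT_MAX_IDLE_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_idle >= NAT_IDLE_TIMEOUT_S:
            raise ValueError("max_idle must be shorter than the NAT idle timeout")
        self._factory = factory
        self._close = close
        self._max_idle = max_idle
        self._clock = clock
        # Idle connections paired with the time they were released, oldest on the left.
        self._idle: Deque[Tuple[T, float]] = deque()

    def acquire(self) -> T:
        now = self._clock()
        # Try the most recently released connection first.
        while self._idle:
            conn, released_at = self._idle.pop()
            if now - released_at < self._max_idle:
                return conn
            # Too old: the NAT may already have dropped its mapping, so close it.
            self._close(conn)
        # Nothing fresh is available, so open a new connection.
        return self._factory()

    def release(self, conn: T) -> None:
        # Record when the connection went idle so acquire() can judge its age.
        self._idle.append((conn, self._clock()))
```

With an injected clock, a connection reused after 100 seconds comes back from the pool, while one left idle for 400 seconds is closed and replaced by a fresh connection. In an nginx-based proxy, the comparable knob would be the `keepalive_timeout` directive inside the `upstream` block, set below the NAT Gateway's idle timeout.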
