In light of #129 and #130, I think the overall way we're handling DNS needs to be revisited at some point. I'm hoping we've nailed down most of these weird edge-cases, so our current system can keep humming along, but the way we're handling DNS lookups for API backends does seem over-complicated.
A quick summary of the current approach: we have a process that resolves the hostnames of all API backend servers. When it detects a change, it writes a new nginx config file referencing the IP addresses those hostnames currently resolve to, and then reloads nginx (so nginx only ever references IPs, not hostnames).
This approach has become more complicated over time, because we try to stick with the current IP for a given host rather than switching to a new IP as soon as we see it. This is done predominantly to deal with backends like Akamai or ELBs, where the DNS resolves to a rotating collection of IPs. If we immediately switched to each newly seen IP in these cases, we would basically be reloading nginx every minute or two, due to the super-short TTL on those domains and the rotation of IPs. So generally speaking, we respect DNS TTLs, except for some of these CDN services, where the DNS seems to resolve to a rotating (but stable) collection of active IPs.
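To make the "stick with the current IP" idea concrete, here's a minimal sketch of that kind of logic. This is not our actual implementation: the `StickyBackend` class, the `observe` method, and the `max_age` window are all hypothetical names made up for illustration, and it assumes we keep using the current IP as long as it has shown up in a DNS answer recently.

```python
class StickyBackend:
    """Hypothetical sketch: keep the current IP while DNS has returned it recently."""

    def __init__(self, max_age=300.0):
        self.max_age = max_age      # assumed window (seconds) an IP counts as "still active"
        self.current_ip = None      # the IP the nginx config would currently reference
        self.last_seen = {}         # ip -> timestamp of the last DNS answer containing it

    def observe(self, resolved_ips, now):
        """Record one DNS answer. Return True if the config would need rewriting."""
        for ip in resolved_ips:
            self.last_seen[ip] = now

        # Forget IPs that have not appeared in any answer within the window.
        self.last_seen = {
            ip: seen for ip, seen in self.last_seen.items()
            if now - seen <= self.max_age
        }

        # The current IP is still part of the rotating pool: no reload.
        if self.current_ip in self.last_seen:
            return False

        # Nothing usable was seen recently: keep whatever we had.
        if not self.last_seen:
            return False

        # The current IP dropped out of the pool: pick a replacement deterministically.
        self.current_ip = min(self.last_seen)
        return True
```

With something like this, a backend whose DNS rotates between `10.0.0.1` and `10.0.0.2` every minute would only trigger a reload when the IP in use has been missing from every answer for longer than `max_age`.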
All of this complexity is obviously bad, since it's led to these weird edge-case bugs recently, so I'd like to streamline it.
Here are some additional random thoughts, ideas, and reasons we ended up where we are:
- The reason for all this DNS resolving junk is that most proxies don't actually support resolving DNS updates internally. Basically they only resolve domain names during startup, and then they will never change the IP the backend points to until the proxy is completely restarted. With more ELB type backends cropping up, it's a frequently requested feature, but until recently it's been mostly non-existent in the main open source proxy options.
- nginx just recently started supporting live DNS resolution in their commercial nginx plus offering. This is tempting, but it would slightly complicate the fully open source nature of API Umbrella. Still, it's something we should consider, at least for api.data.gov's use case.
- Our own work on this started when we were using haproxy for proxying. This feature is also on their radar, but still not implemented there. I do have some desire not to tie us too heavily to nginx-specific features (or any one proxy's features), since I would like to consider switching back to haproxy for routing once they implement backend keep-alives (maybe in the next version).
- nginx does support live DNS resolution without reloads in the open source version of nginx, but only if you don't use upstreams. This might be the best candidate for solving this. The downside is we can't use upstreams, so we could only use this when the backend is a single domain (so we couldn't support API Umbrella load balancing between two servers in this case). This largely seems reasonable to me, since these super dynamic backend domains are only used when the domain is handling load balancing itself. However, we also lose out on some additional configuration options that upstreams provide, so we'd need to make sure we're not dependent on those.
- If we stick with our current approach, it could be greatly simplified if we just always took the latest IP we've seen and reloaded nginx then. The main reason these frequent reloads are problematic now is that our Rails web app is served out of the same nginx instance. So reloading nginx causes lots of Rails reloads, which causes slow response times as things spin back up. With the big upgrade happening in #123 (API Umbrella upgrade for production site), the instances of nginx have been split up, so this is no longer the case. This might make frequent reloads of nginx okay, but we'd need to do more testing to make sure this wouldn't negatively impact active connections.
- If we fully embrace nginx as our proxy server, we could potentially do some interesting stuff to embed the DNS logic into the server with Lua and something like lua-resty-dns. However, even then, I'm not sure that would work, since the nginx lua upstream plugin doesn't currently support modifying the upstreams on the fly (but there's talk of that happening).
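For the "stick with our current approach, but always take the latest IPs" option above, a rough sketch of the simplified logic might look like the following. The `render_upstream` and `sync` helpers and the `api_backend` upstream name are hypothetical, and it assumes IPv4 addresses and that a reload is only triggered when the rendered config text actually changes.

```python
def render_upstream(name, ips, port=80):
    """Render an nginx upstream block for the given IPs (IPv4 assumed)."""
    # Sort so that DNS answers in a different order render identical text.
    servers = "".join(f"    server {ip}:{port};\n" for ip in sorted(ips))
    return f"upstream {name} {{\n{servers}}}\n"


def sync(name, ips, previous_config, port=80):
    """Return (new_config, reload_needed) for the latest DNS answer."""
    new_config = render_upstream(name, ips, port)
    # Only signal a reload when the rendered config really differs.
    return new_config, new_config != previous_config


config, reload_needed = sync("api_backend", ["10.0.0.2", "10.0.0.1"], "")
# reload_needed is True here, since there was no previous config.
```

The caller would write `config` to disk and reload nginx whenever `reload_needed` is true. Sorting the IPs avoids reloads caused purely by answer ordering, but a rotating pool that returns different subsets would still reload frequently, which is the trade-off that needs testing.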