Implement a custom DNS resolver for Gitaly

Quang-Minh Nguyen requested to merge qmnguyen0711/implement-dns-resolver into master

For #4529 (closed). For more information, please read this comment and the following thread.

gRPC ships with a built-in DNS resolver. It works well in most scenarios, but it has some drawbacks:

  • After the first resolution, the resolver does not refresh the list of addresses until the client connection actively triggers it. The client connection does so only when it detects that some of its subchannels are permanently unavailable. As long as the client connection is stable, the client is not aware of new hosts added to the cluster via DNS service discovery. This behavior leads to unexpected stickiness and workload skew, especially after a failover.
  • Support for SRV records is in an awkward state. This record type is only supported when the grpclb load-balancing strategy is enabled. Unfortunately, that strategy is deprecated, and it does not behave the way we expected. In the short term, we would like to use the round-robin strategy. In the longer term, we may have a custom strategy for a Raft-based cluster. Thus, SRV service discovery will be crucial in the future.
  • The resolver looks for a service config in a TXT record, if any. While this option is convenient for a generic gRPC setup, it does not make sense for Gitaly, so we should get rid of it.

This commit implements a custom DNS resolver. This resolver has the following major features:

  • Resolve DNS service discovery via A records
  • Periodically refresh the DNS (5 minutes by default)
  • Update DNS state only if it detects real changes
  • Support logging
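To illustrate the "periodic refresh" and "update only on real changes" features, here is a minimal sketch using only the standard library. It is not the code in this merge request: `lookupFunc`, `sameAddrs`, and `watch` are hypothetical names, the lookup is faked so the example runs without a network, and treating a reordered answer as "no change" is an assumption about what "real changes" means.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// lookupFunc is a hypothetical stand-in for an A-record lookup.
type lookupFunc func(ctx context.Context, host string) ([]string, error)

// sameAddrs reports whether two address lists hold the same entries,
// ignoring order. The inputs are copied, so they are not mutated.
func sameAddrs(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// watch resolves host immediately and then once per interval. It calls
// update only when the resolved set differs from the last one pushed.
func watch(ctx context.Context, host string, interval time.Duration, lookup lookupFunc, update func([]string)) {
	var last []string
	first := true
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		addrs, err := lookup(ctx, host)
		if err == nil && (first || !sameAddrs(last, addrs)) {
			last, first = addrs, false
			update(addrs)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	// Fake DNS answers: the second is a reordering, the third a real change.
	answers := [][]string{
		{"10.0.0.1", "10.0.0.2"},
		{"10.0.0.2", "10.0.0.1"},
		{"10.0.0.1", "10.0.0.3"},
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	i := 0
	lookup := func(ctx context.Context, host string) ([]string, error) {
		addrs := answers[i]
		i++
		if i == len(answers) {
			cancel() // stop the watcher after the last fake answer
		}
		return addrs, nil
	}
	watch(ctx, "gitaly.consul.internal", 10*time.Millisecond, lookup, func(addrs []string) {
		fmt.Println("update:", addrs)
	})
}
```

With the fake answers above, `update` fires for the first and third answers only; the reordered second answer is ignored. In a real resolver the `update` callback would be where the new address list is handed to the client connection.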

Service discovery via SRV records is not supported in this version, in order to keep backward compatibility with Ruby clients.

gRPC uses the target's scheme to determine which resolver to use. The built-in DNS resolver registers itself under the "dns" scheme. Ideally, this resolver would use a different scheme. However, Ruby and other c-ares-based clients don't support custom resolvers, and at GitLab the gRPC target configuration is shared between components. To stay compatible across clients, this resolver intentionally replaces the built-in resolver under the same "dns" scheme.

In theory, I could stub the whole DNS lookup operation. However, I really don't want to stub too much. To test the real DNS behavior, I bring up a real DNS server with this package. It serves DNS over UDP. The answers returned from this server are controlled by the test.

Architecture

flowchart TD
        Target["dns://8.8.8.8:53/gitaly.consul.internal"]--Pick by dns scheme\nOr grpc.WithResolvers--> dnsresolver.Builder
        dnsresolver.Builder--> dnsresolver.Resolver
	subgraph ClientConn
            dnsresolver.Resolver -.Refresh.-> dnsresolver.Resolver
	    dnsresolver.Resolver -- Update state --> LoadBalancer
		LoadBalancer --> SubChannel1
		LoadBalancer --> SubChannel2
		LoadBalancer --> SubChannel3
		SubChannel1 -. Report .-> LoadBalancer
		SubChannel2 -. Report .-> LoadBalancer
		SubChannel3 -. Report .-> LoadBalancer
	end
	subgraph Gitaly
		Gitaly1
		Gitaly2
		Gitaly3
	end
	SubChannel1 -- TCP --> Gitaly1
	SubChannel2 -- TCP --> Gitaly2
	SubChannel3 -- TCP --> Gitaly3
        dnsresolver.Resolver --> net.Resolver
        net.Resolver -.If specify authority.-> Authority[Authority Nameserver\n8.8.8.8:53]
        net.Resolver -..-> Authority2[OS's configured nameserver]
        net.Resolver -..-> /etc/resolv.conf

Note: While the above figure is specific to grpc-go, grpc-core follows a very similar flow.

In general, when a client performs grpc.Dial, the target URL must be resolved by a resolver. gRPC ships with several built-in resolvers, including the DNS resolver. It also provides a powerful framework for building custom resolvers. Given the problems stated in the section above, I decided to build one. A resolver consists of two main parts: Builder and Resolver.

Builder creates a resolver object. A builder handles a particular scheme. At module loading time, the builder must register itself with a global resolver registry. Users can also use grpc.WithResolvers to supply a builder without modifying the global registry. When the client connection resolves the target, it uses the target's scheme to pick the correct builder. It then uses the builder object to create a Resolver object. Every client connection maintains one resolver object.
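The scheme-based selection described above can be pictured with a toy model. This is not grpc-go code: `builder`, `register`, and `pick` are hypothetical names, and the model assumes two behaviors as I understand them from grpc-go, namely that the last builder registered for a scheme wins in the global registry, and that builders supplied per connection are consulted before the global registry.

```go
package main

import "fmt"

// builder is a simplified stand-in for a resolver builder. A real builder
// would also construct a resolver; here it only carries a scheme and a label.
type builder struct {
	scheme string
	name   string
}

// globalRegistry models the process-wide registry keyed by scheme.
var globalRegistry = map[string]builder{}

// register adds a builder to the global registry. Registering a second
// builder under the same scheme replaces the first one.
func register(b builder) {
	globalRegistry[b.scheme] = b
}

// pick returns the builder for a scheme, checking per-connection builders
// first and falling back to the global registry.
func pick(scheme string, perConn []builder) (builder, bool) {
	for _, b := range perConn {
		if b.scheme == scheme {
			return b, true
		}
	}
	b, ok := globalRegistry[scheme]
	return b, ok
}

func main() {
	register(builder{scheme: "dns", name: "built-in"})
	register(builder{scheme: "dns", name: "custom (global)"})

	b, _ := pick("dns", nil)
	fmt.Println(b.name) // custom (global)

	b, _ = pick("dns", []builder{{scheme: "dns", name: "custom (per-connection)"}})
	fmt.Println(b.name) // custom (per-connection)
}
```

The first lookup shows why registering under the same "dns" scheme replaces the built-in resolver for every connection in the process, while the second shows the per-connection alternative that leaves the global registry untouched.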

Resolver resolves the target URL on behalf of the client connection. The result is passed to the client connection via the UpdateState API. In this implementation, the DNS resolver starts a goroutine that periodically re-resolves the target URL. The client connection can also trigger an early resolution if it detects a connectivity change, such as a connection interruption. Under the hood, the Resolver delegates the actual name resolution to the standard library's net.Resolver. Depending on the runtime platform, the standard resolver does plenty of things. Eventually, it needs to reach a DNS nameserver via UDP. By default, that nameserver is the one configured by the runtime OS. Clients can instead specify the nameserver address in the target URL (8.8.8.8:53, for example).
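As a rough illustration of the last point, the sketch below shows one way to point the standard net.Resolver at an authority nameserver taken from the target URL. It is a sketch under assumptions, not the implementation in this merge request: `parseTarget` and `newResolver` are hypothetical helpers, the 5-second dial timeout is arbitrary, and the endpoint is assumed to carry no port.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/url"
	"strings"
	"time"
)

// parseTarget splits a target such as dns://8.8.8.8:53/gitaly.consul.internal
// into the optional authority (nameserver address) and the host to resolve.
func parseTarget(target string) (authority, host string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "dns" {
		return "", "", fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

// newResolver returns the default resolver when no authority is given.
// Otherwise it returns a resolver whose DNS queries are all dialed to the
// authority instead of the OS-configured nameserver.
func newResolver(authority string) *net.Resolver {
	if authority == "" {
		return net.DefaultResolver
	}
	return &net.Resolver{
		// The Dial hook is only honored by Go's built-in DNS resolver.
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, authority)
		},
	}
}

func main() {
	authority, host, err := parseTarget("dns://8.8.8.8:53/gitaly.consul.internal")
	if err != nil {
		panic(err)
	}
	fmt.Println("authority:", authority)
	fmt.Println("host:", host)

	r := newResolver(authority)
	fmt.Println("custom nameserver:", r != net.DefaultResolver)
}
```

A caller would then run something like `r.LookupHost(ctx, host)` with a context deadline. The example stops short of that so it can run without network access.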

Edited by Quang-Minh Nguyen