At Clever Cloud, we are working on Sōzu, an HTTP reverse proxy that can change its configuration at runtime, without restarting the process. Why would we need that, you might ask?
In our architecture, all the applications sit behind a few load balancers with public IP addresses to which our clients point their DNS entries. We used HAProxy for these load balancers, a well-known HTTP reverse proxy. Like a lot of other companies, our solution to serve a lot of applications, each with many backend servers, is to generate a configuration file for HAProxy and ask the current process to spawn new workers to handle the new configuration.
Unfortunately, this approach can lose new TCP connections on configuration updates: the old workers might stop accepting connections, but their listen queue might still contain new connections that will not be transferred to the new workers. Worse: if you change the configuration again while the old workers have not stopped handling their current connections, they would be killed, and the live connections would stop right there. Or the processes could just pile up and hog resources.
We use an immutable infrastructure approach. For every new version of an application, instead of modifying the backend servers in place, we spawn new ones. This means that we must update the HAProxy configuration to route an application to the new backend servers. And so, we must update the global configuration on every new commit from any of our clients. With multiple configuration changes every second, we're bound to lose some connections.
Hot configuration reloading VS restarting processes
That's why we set out to build a new reverse proxy that could handle configuration changes without losing connections.
First, we decided configuration updates should happen without restarts. The ability to change the configuration of a running process is essential because launching new workers, transferring the current state between them, and removing old workers gracefully has a high impact on server load and latency. You're essentially doubling the server's resource usage, and launching a lot of new processes every time.
This is a cost we still have to pay for executable upgrades, but those happen less frequently. For that case, we provide ways to upgrade without downtime, either automatically or step by step, where you decide when to launch each step.
Shared and synchronized configuration VS data locality
To handle configuration changes, Sōzu uses a master/workers architecture, with the master receiving configuration changes and transmitting them to the workers. Each worker has its own copy of the configuration: if workers used something like a shared memory segment for this, we would need to add cross-process synchronization, make sure that accessing this data is safe, fiddle with pointers, etc. The whole configuration state is not very large (certificates and keys are the biggest part), so we can keep a copy in every process. This makes for better data locality and is easier to handle overall.
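As a rough illustration of this layout, here is a minimal Python sketch in which plain objects stand in for processes. The `Master` and `Worker` classes and the key/value shape of a change are assumptions made for the example, not Sōzu's actual code; in a real deployment the workers would be separate processes and the changes would travel over a channel.

```python
import copy


class Worker:
    """Stands in for a worker process holding a private configuration copy."""

    def __init__(self, name, initial_config):
        self.name = name
        # Deep copy: nothing is shared with the master or with other workers.
        self.config = copy.deepcopy(initial_config)

    def apply(self, key, value):
        # A value of None means "remove this entry".
        if value is None:
            self.config.pop(key, None)
        else:
            self.config[key] = copy.deepcopy(value)


class Master:
    """Receives configuration changes and forwards them to every worker."""

    def __init__(self, config, worker_count):
        self.config = copy.deepcopy(config)
        self.workers = [
            Worker(f"worker-{i}", config) for i in range(worker_count)
        ]

    def update(self, key, value):
        # Keep the master's own view current, then forward the change.
        if value is None:
            self.config.pop(key, None)
        else:
            self.config[key] = copy.deepcopy(value)
        for worker in self.workers:
            worker.apply(key, value)
```

Because every worker owns its copy, reading the configuration on the request path needs no lock. The price is that each change is applied once per worker.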
Configuration changes: push VS pull
To get configuration updates, there are basically two solutions:
- The proxy polls new configuration from files or from tools like Kubernetes, etcd, Consul…
- The proxy exposes a communication channel to get the new configuration
The first solution is essentially what Traefik, the reverse proxy built in Go, does. We chose the second solution because we think it is not the proxy's responsibility to communicate with those tools. After all, it is just a pipe: it should not have to understand all the existing configuration protocols. So we expose a channel, and we build tools to bridge between the configuration backend and Sōzu.
That way, we do not impose the configuration format, the proxy binary stays small, and anyone can write their own tool to drive it, in any language they want.
The channel is a Unix socket. We chose this over exposing a TCP port on localhost because anybody on the machine could connect to a TCP port, while access to a Unix socket is protected by filesystem permissions.
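To make the permission point concrete, here is a minimal Python sketch that binds a Unix socket and restricts it to its owner. The function names and the `proxy.sock` path are hypothetical, and this is not how Sōzu itself sets up its socket.

```python
import os
import socket
import stat
import tempfile


def open_command_socket(path: str) -> socket.socket:
    """Bind a Unix stream socket at `path`, accessible to its owner only."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    os.chmod(path, 0o600)  # read/write for the owner, nothing for others
    server.listen(1)
    return server


def demo() -> int:
    """Create the socket in a private temporary directory, return its mode."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "proxy.sock")  # hypothetical path
        server = open_command_socket(path)
        try:
            return stat.S_IMODE(os.stat(path).st_mode)
        finally:
            server.close()
```

On Linux, connecting to a Unix socket requires write permission on the socket file, so other users are refused. Note that the file briefly exists with umask-derived permissions between `bind` and `chmod`; a stricter variant would place the socket in a directory that only the owner can enter.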
The protocol is quite simple: JSON objects representing the proxy orders and their answers, separated by null characters. Writing new tools is as easy as writing directly to a file, you can even make tools in bash. If you want more control, we provide libraries in Rust to wrap this channel. Other libraries will appear soon to do it easily from other languages.
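The framing described above can be sketched in a few lines of Python. The field names in the example order (`id`, `type`, `app_id`, `address`) are placeholders chosen for illustration and may not match Sōzu's actual message schema.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one message as JSON followed by a null byte."""
    # json.dumps escapes control characters, so the payload itself
    # never contains a raw null byte.
    return json.dumps(message).encode("utf-8") + b"\0"


class MessageReader:
    """Reassembles null-separated JSON messages from arbitrary chunks."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every message completed so far."""
        self._buffer += chunk
        # Everything before the last separator is complete; the remainder
        # stays in the buffer until more bytes arrive.
        *frames, self._buffer = self._buffer.split(b"\0")
        return [json.loads(frame) for frame in frames if frame]


# Hypothetical order, for illustration only.
order = {"id": "1", "type": "ADD_BACKEND", "app_id": "app1",
         "address": "10.0.0.2:8080"}
wire = encode_message(order)
```

A socket read can return half a message or several messages at once, which is why the reader keeps a buffer instead of parsing each chunk on its own.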
Working with configuration diffs VS replacing the configuration
An easy way to implement runtime configuration changes might be to replace the whole configuration every time there's a change, and start handling the new routing right away. We chose another way: Sōzu works with configuration diffs. The messages you send through the Unix socket contain information like "add this specific certificate", "add this backend server to this application", "remove this domain name for this application". This is useful because when you replace the whole configuration at once, you lose information.
You might have to remove some OpenSSL contexts still storing old certificates. Or you might want to know when you can drop a backend server: if you tell Sōzu to remove a backend server for an application, it will first tell you that it acknowledges the change and will stop routing new traffic to this server, but it will also tell you if there are still connections open to this server. It will notify you once the connections are gone, so you can safely drop that server.
This also means configuration changes are smaller: instead of loading a complete configuration, you just send a few small messages.
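Here is a minimal Python sketch of applying such diffs to an in-memory state. The order types and field names are assumptions made for the example, not Sōzu's real message set.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyState:
    backends: dict = field(default_factory=dict)   # app id -> set of "ip:port"
    frontends: dict = field(default_factory=dict)  # app id -> set of hostnames


def apply_order(state: ProxyState, order: dict) -> None:
    """Apply one small change instead of replacing the whole state."""
    kind, app = order["type"], order["app_id"]
    if kind == "ADD_BACKEND":
        state.backends.setdefault(app, set()).add(order["address"])
    elif kind == "REMOVE_BACKEND":
        state.backends.get(app, set()).discard(order["address"])
    elif kind == "ADD_FRONTEND":
        state.frontends.setdefault(app, set()).add(order["hostname"])
    elif kind == "REMOVE_FRONTEND":
        state.frontends.get(app, set()).discard(order["hostname"])
    else:
        raise ValueError(f"unknown order type: {kind}")


# Deploying a new version of "app1": add the new server, remove the old one.
state = ProxyState()
apply_order(state, {"type": "ADD_FRONTEND", "app_id": "app1",
                    "hostname": "example.com"})
apply_order(state, {"type": "ADD_BACKEND", "app_id": "app1",
                    "address": "10.0.0.1:8080"})
apply_order(state, {"type": "ADD_BACKEND", "app_id": "app1",
                    "address": "10.0.0.2:8080"})
apply_order(state, {"type": "REMOVE_BACKEND", "app_id": "app1",
                    "address": "10.0.0.1:8080"})
```

Each order names exactly what changes, so the proxy knows which backend is going away and can track its remaining connections. A full replacement would only show the end result.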
To accommodate this, the configuration protocol is more than request-response. There are three possible answers: Error, Ok or Processing. If the proxy answers Processing, the actual result will come later. It can send further Processing messages to keep you posted on the number of current connections or other issues, and eventually sends Ok or Error.
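On the client side, this means reading answers until a final one arrives. Below is a minimal Python sketch of that loop; the `id`, `status` and `connections` fields and the upper-case status strings are assumptions for the example rather than the exact wire format.

```python
from typing import Callable, Iterable, Optional


def wait_for_result(
    answers: Iterable[dict],
    order_id: str,
    on_progress: Optional[Callable[[dict], None]] = None,
) -> dict:
    """Consume answers until the given order gets a final Ok or Error."""
    for answer in answers:
        if answer.get("id") != order_id:
            continue  # answer to another order, not ours
        status = answer["status"]
        if status == "PROCESSING":
            # Intermediate update, e.g. connections still open on a backend.
            if on_progress is not None:
                on_progress(answer)
        elif status in ("OK", "ERROR"):
            return answer
        else:
            raise ValueError(f"unexpected status: {status}")
    raise ConnectionError("channel closed before a final answer arrived")


# Simulated answers to a hypothetical "remove backend" order with id "42".
answers = [
    {"id": "41", "status": "OK"},
    {"id": "42", "status": "PROCESSING", "connections": 2},
    {"id": "42", "status": "PROCESSING", "connections": 1},
    {"id": "42", "status": "OK"},
]
progress = []
final = wait_for_result(answers, "42", progress.append)
```

Here `final` is the last Ok answer, and `progress` holds the two Processing updates received while the backend's connections drained.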
Design goals for a tool with hot reconfiguration
We worked for a while on this and had time to explore the requirements for a tool with hot reconfiguration. There are three important points:
- You have to bake runtime reconfiguration in from the beginning. You cannot retrofit it correctly into an existing system by messing with backend config switches or other hacks.
- Do runtime reconfiguration, not process restarts. You might get away with restarts or blue/green deployments if your configuration does not change often and connections are short lived. Our experience shows that it rarely works out that way.
- Work with configuration diffs instead of replacing the configuration. Otherwise, you lose information that matters to your infrastructure.
We hope this architecture will make it easy to build long-lived and reliable systems, and we hope people will build awesome tools around Sōzu.