Falling for Rust

If you have ever talked to me, or looked at my Twitter feed, you may have noticed that I campaign loudly for the Rust programming language. I am not going to stop. In fact, I will only get louder!

I got involved with the language a while ago, when I was looking for a safe alternative to C for my research on the VideoLAN project. It was an interesting time, since the language changed often: I basically had to rewrite most of my code every two weeks. But since the release of Rust 1.0 in May 2015, it has gone from an experimental tool to a serious contender among systems programming languages.

I'm no longer the only one talking about Rust at the office, and the funny thing is that I'm not even the first one to put Rust in production: Marc-Antoine started rewriting some of our management tools from Bash to Rust, and they worked so well that they quickly ended up running on the platform.

Why we are betting on Rust for the future

We have always used heterogeneous technologies to manage the platform, mainly because we must know how they behave in order to host them properly. We test and use everything, and running them on our platform gives us the long-term view.

For us, Rust delivered substantial benefits from the beginning, and showed that it fits the way we run production code.

Static binaries

Like C, C++ and Go, Rust can build single-file binaries that you upload to a server and run directly, without any installed dependencies except a libc. This is significant: when you want to reduce the base disk size and the boot time of a virtual machine, installing a lot of dependencies, like a Python interpreter and its standard library, quickly takes its toll.
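As a sketch of what that looks like in practice on Linux, the standard musl target produces a fully static binary; the program itself is just a placeholder:

```rust
// A trivial program; the interesting part is how it is built.
//
// With the target installed (rustup target add x86_64-unknown-linux-musl),
// running `cargo build --release --target x86_64-unknown-linux-musl` produces
// a fully static binary: copy that single file into a minimal VM image and
// run it, with no installed dependencies, not even a system libc.
fn main() {
    println!("hello from a single static binary");
}
```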

We built an immutable infrastructure: we do not modify virtual machines directly; instead, we modify the base image and start new machines from scratch. So we're not too concerned about updating a dynamic library separately from the executable that loads it.

Reliability

Memory safety is one of the biggest arguments for Rust, but it goes further than that: there is a strong emphasis on providing safe patterns everywhere. As an example, data is immutable by default; you need to explicitly add "mut" to a variable to be able to change it. Most APIs return an "Option" or "Result" type to safely wrap an actual result or an error, and there are lots of ways to manipulate them easily. Ignoring an error means explicitly adding an "unwrap()" call to that function's result, which turns the error into an immediate, visible panic.
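Here is a minimal sketch of those patterns; the function and the values are made up for illustration:

```rust
// A made-up example: parsing a port number from a configuration value.
fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.parse::<u16>()
}

fn main() {
    // Immutable by default: without "mut", reassigning raw would not compile.
    let raw = "8080";

    // Handle the error explicitly...
    match parse_port(raw) {
        Ok(port) => println!("listening on port {}", port),
        Err(e) => eprintln!("invalid port {:?}: {}", raw, e),
    }

    // ...or opt into a crash, visibly, with unwrap(): this line panics because
    // the input is not a valid port, and the call is easy to spot in a review.
    let port = parse_port("not-a-port").unwrap();
    println!("{}", port);
}
```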

All this means that by default, you write Rust code that should not crash, and that should not modify data unless explicitly stated. Add in a good type system to correctly represent and check your assumptions. And some unit tests, because we know type systems don't replace them (although you don't need to write as many unit tests as in languages with simpler type systems).
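A unit test on that kind of Result-returning function costs very little to write; again, the names are only illustrative:

```rust
fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.parse::<u16>()
}

#[cfg(test)]
mod tests {
    use super::parse_port;

    #[test]
    fn accepts_a_valid_port() {
        assert_eq!(parse_port("8080"), Ok(8080));
    }

    #[test]
    fn rejects_garbage() {
        // The type system forces callers to see the error; the test checks
        // that the error actually happens for bad input.
        assert!(parse_port("not-a-port").is_err());
    }
}
```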

You then get code that rarely fails, and in which you can easily find the parts that break assumptions (for example, by grepping for "unsafe" and "unwrap"). There will still be bugs, but you will spend more time on functionality than on plumbing.

Stability

Rust has no garbage collection. Most of the time, you won't care about this: in most languages, the comfort of a garbage collector outweighs the usual concerns, like performance. Not having to think about memory allocation lets you write code without fear of most memory vulnerabilities, and avoids most instances of memory leaks (sadly, not all of them).

The real issue shows up in production: the garbage collector disturbs the program's behaviour by regularly running a task that walks through memory to decide what to deallocate. Of course, it's not always as simple as mark and sweep, but the GC will still take CPU time and introduce latency in other tasks. Under intense memory pressure (say, a lot of requests to handle), keeping lots of unusable memory around or pausing other tasks can trigger catastrophic failures. When the issue appears, people typically spend a lot of time tweaking GC settings or rewriting code to avoid it. Either way, it's a lot of work.

Rust avoids garbage collection by tracking ownership at compile time: the compiler knows exactly where memory is allocated and where it is freed. You get the same safety benefits without the runtime cost. The result? An application with predictable CPU and RAM usage. Typically, the RAM usage graphs will be flat, compared to the sawtooth graphs you get with garbage collectors. Predictability is a key feature for stable production systems: you can make more assumptions about the runtime behaviour and plan resource usage accordingly.
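A small sketch of what that means in practice: deallocation points come from ownership and scopes, so they are known before the program ever runs.

```rust
struct Buffer {
    data: Vec<u8>,
}

// Drop runs at a point the compiler determines from ownership: when the owner
// goes out of scope, not whenever a collector decides to scan the heap.
impl Drop for Buffer {
    fn drop(&mut self) {
        println!("freeing {} bytes", self.data.len());
    }
}

fn handle_request() {
    let buf = Buffer { data: vec![0u8; 1024 * 1024] };
    println!("using {} bytes", buf.data.len());
    // buf is freed right here, at the end of the function, every single time.
}

fn main() {
    // Memory usage stays flat across iterations: allocate, use, free, repeat.
    for _ in 0..3 {
        handle_request();
    }
}
```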

On that point, you get another benefit: it is easy to set boundaries on resource usage. If I know roughly how much memory I need for X concurrent requests, and I know the capacity of my server, I can put a soft limit on the number of concurrent requests, at which I set up an alert, and a hard limit, because I know that past that point the server will simply stop answering properly.
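As a purely illustrative calculation (the numbers below are assumptions, not measurements from our platform):

```rust
// Illustrative capacity planning: replace these constants with real
// measurements of your own application.
const SERVER_RAM_BYTES: u64 = 4 * 1024 * 1024 * 1024; // 4 GiB available for the service
const PER_REQUEST_BYTES: u64 = 2 * 1024 * 1024;       // ~2 MiB per in-flight request

fn main() {
    // Past the hard limit, the server can no longer answer properly.
    let hard_limit = SERVER_RAM_BYTES / PER_REQUEST_BYTES;
    // Alert well before that, so there is time to react or scale out.
    let soft_limit = hard_limit * 80 / 100;

    println!("soft limit (alert): {} concurrent requests", soft_limit);
    println!("hard limit (refuse): {} concurrent requests", hard_limit);
}
```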

Not worrying about it

For us, these benefits make a strong case for Rust as a reliable building block for a production platform. It's the piece of code we don't have to worry about, the one that lets everything around it run safely.

You can do it, too!

Right now, we are replacing small parts of the infrastructure, and we will soon unveil more interesting tools. In the meantime, we have added Rust support, in beta, to our platform: you can now deploy Rust web applications in a few commands.

Go ahead and put Rust in production!
