Yesterday two issues affecting CPUs have been released to the public.
TL;DR: the attacks are named Meltdown and Spectre. They allow reading the memory of the OS or of other processes, to steal secrets or get information for other exploits. A part of the solution can greatly affect performance of running code. In particular, this attack allows to easily cross container boundaries, and in some cases (not our case) even VM boundaries.
In addition to servers, consumer machines are affected, especially through browsers, so you should definitely update your operating system as well as your browsers.
What it means for Clever Cloud users
Your applications will be (or already have been) automatically restarted (just like any other maintenance deployments). The addons will be patched and restarted in place in the following hours. This will generate limited downtime on addons (usually around a minute, depending on the addon start up time).
In addition to restarting virtual machines, we will also need to restart physical machines, as the attacks theoretically allows VM boundaries crossing. This attack is not usable (yet?) on Clever Cloud due to our virtualization choices and our OS hardening, but we will deploy patches preemptively. Physical machines updates will take place in the following days and will not impact applications. We are currently working on finding the best solution for addons, but it will definitely incur additional downtime for addons.
The patches, while mitigating the issues, also come with performance regressions. It heavily depends on the workload as well as the exact CPU model. The CPUs we use are among the less affected by the performance issues, but a slowdown of at least 5% is to be expected.
The Meltdown attack and the Spectre categories of attack are related to a performance feature of modern processors: branch prediction and speculative execution. Meltdown shows that when an instruction can cause a trap, like the privilege check for user → kernel access), the processor will perform speculative execution: it starts executing the code in case there’s no trap, but rollbacks if there was a trap. This attack happens at the boundary between user code and kernel. Before the processor has completely checked that we have the authorization to run privileged code, it starts executing it. When it turns out we were not authorized, it rolls back the results of that code, but not completely, it can leave some data in the cache. Combined with a technique called “cache timing attack”, it is then possible to guess the content of the data that was loaded in cache, bit by bit. Branch prediction has a related behaviour: when encountering a branch (example: an if/else expression), the processor will start executing one of the branches before it calculates the condition, to avoid waiting too much. It guesses which side of the condition is most likely thanks to its branch predictor. Spectre uses branch prediction to cause speculative execution to read out of a buffer’s bounds (among other consequences) in the kernel or another process, then guess the results from the cache.
The Meltdown attack is specific to Intel processors, it allows reading from the OS’s memory. There are patches available (the kPTI feature, also named KAISER https://lkml.org/lkml/2017/12/4/709). Those patches have a great impact on syscall performance (https://www.phoronix.com/scan.php?page=article&item=linux-415-x86pti&num=1), with programs running 5% to 30% slower depending on the workload. The Intel Haswell processors with the PCID (Process Context Identifiers) feature get the lowest performance hit (5%). We use those processors on Clever Cloud.
Spectre affects processors from Intel, AMD and ARM, it allows reading from the memory of other processes. It looks more like a new attack category, for which we will have to fix the issue individually in each affected software. The only global solution for Spectre is a radical change in processor architecture, and this is unlikely to happen soon. We will follow closely any new related vulnerability and promptly patch our infrastructure.
For further information
- Papers and explanations about Meltdown and Spectre: https://spectreattack.com/
- Proofs of concept from Google’s Project Zero team: https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html
- French twitter thread explaining the attacks: https://twitter.com/fenarinarsa/status/948697105996156928
- English twitter thread explaining the attacks: https://twitter.com/nicoleperlroth/status/948684376249962496