Security is a process, not a reaction
Wake up. Check the news. There is a new OpenSSL vulnerability: the world is on fire. That vulnerability was published a week ago. Panic. Patch everything in a hurry. Break production. Panic^2.
If this sounds familiar, you are probably running a web application of some kind. Maybe your whole business depends on it. Maybe you didn't hear about the latest world-on-fire vulnerability. Panic.
How do you keep up with security issues when everything is happening so fast? Which parts of your technical stack are the most at risk? Is the customer data safe? Do you really need to care?
At Clever Cloud, we support many languages and databases, running on hundreds of machines. Our core business is executing code we didn't write on our own infrastructure.
This has an interesting effect on security management: there is always an issue somewhere. Vulnerabilities appear every day. You are lucky if they are not "0-day vulnerabilities": flaws published without notifying the developers first, so no fix is available at publication time. How do we handle security calmly when we should actually be running around screaming?
Our approach to security comes from the way we run our systems. You cannot manage hundreds of machines without automation and well-defined processes. Every action on our infrastructure must either be cheap to perform or have a big impact.
People see security as a huge cost because of the work it implies:
- unclear risk and impact on the business
- time spent tracking new vulnerabilities for various applications
- unclear result of updating code (will it stop working? Will it break other applications on the same machine?)
You want to reduce that cost and make security management easier and easier, until it is just part of the daily job.
Defining your risk budget
Calculating the risk takes some time at first, since you need to teach your team how a threat model works and how to update it. The threat model is a description of your system used to evaluate the cost of an attack:
- targets: user data, intellectual property, machines
- entry points: web server, internal WiFi
- weaknesses: unpatched application, SQL injection, key employees falling victim to phishing
With this model, you estimate how difficult it is to exploit each weakness, which access level it grants, and where an attacker can go from there. In the end, you get a list of issues in your system, ordered by impact and ease of exploitation. Typically, if an automated script can steal your whole database, fix it immediately.
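A threat model does not need special tooling to get started. A minimal sketch in Python, assuming a made-up 1-to-5 scale for impact and ease of exploitation and scoring each weakness as their product (one common convention, not the only one), could rank the examples above like this. The `Weakness` class, the scales and the scores are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Weakness:
    name: str
    entry_point: str
    target: str
    impact: int  # assumed scale: 1 (minor) to 5 (the whole business is at stake)
    ease: int    # assumed scale: 1 (skilled, targeted attacker) to 5 (an automated script)

    @property
    def score(self) -> int:
        # Illustrative scoring: impact times ease of exploitation.
        return self.impact * self.ease


def prioritize(weaknesses: list[Weakness]) -> list[Weakness]:
    """Order weaknesses by score, highest first; ties go to the higher impact."""
    return sorted(weaknesses, key=lambda w: (w.score, w.impact), reverse=True)


def fix_immediately(w: Weakness) -> bool:
    # The rule of thumb from the text: a script can do it, and it hurts badly.
    return w.ease == 5 and w.impact == 5


model = [
    Weakness("unpatched application", "web server", "machines", impact=4, ease=3),
    Weakness("SQL injection", "web server", "user data", impact=5, ease=5),
    # "email" is an assumed entry point for phishing; the text only lists web server and WiFi.
    Weakness("phishing of key employees", "email", "intellectual property", impact=4, ease=2),
]

for w in prioritize(model):
    flag = "  <- fix now" if fix_immediately(w) else ""
    print(f"{w.score:>2}  {w.name} ({w.entry_point} -> {w.target}){flag}")
```

The numbers matter less than the fact that everybody on the team scores issues the same way, so the ordering can be discussed and revised.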
That model is the baseline everybody will use to evaluate security issues. It makes the risk real, not something you can handwave away by saying "we can take that risk". It is something you can plan and budget for.
Staying up to date with security news
Once you have a model, you need to keep it up to date with current news. Maybe requiring Java applets in your clients' browsers is not such a good idea anymore. Maybe your ad network is now serving malware (as a side note, to drastically reduce malware infections at your company, install ad blockers everywhere, trust me on this).
Following security news can look like a daunting task, but you can simplify it with good sources:
- avoid news websites: they write long articles, they want you to panic, and they rarely provide usable solutions
- follow security mailing lists. There are generalist ones, and more specific ones, like your distribution's security announcement list or the security lists of the projects you use. Some public disclosure lists even carry 0-day vulnerabilities when they are first published
- Twitter is still a good source of information on vulnerabilities, since people share quickly. If you see security people suddenly buzzing in your timeline, you should pay attention. There are good lists of people to follow to get you started, though each has its own focus, so you may not be interested in everything
- keep up with new versions of your software and their dependencies. Use your package manager and project-specific mailing lists, and subscribe to their GitHub feeds
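Most package managers can report outdated packages for you. The check they perform boils down to comparing what you run against the first version that fixes a known issue. A sketch, with hypothetical package names and versions, and assuming plain numeric dotted versions:

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically, not as strings.

    Assumes purely numeric dotted versions; '1.0.2g' or '3.0.0-rc1' would raise ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def outdated(installed: dict[str, str], fixed_in: dict[str, str]) -> list[str]:
    """Return installed packages older than the first version fixing a known issue."""
    return sorted(
        name
        for name, version in installed.items()
        if name in fixed_in and parse_version(version) < parse_version(fixed_in[name])
    )


# Hypothetical inventory and advisory data, for illustration only.
installed = {"libfoo": "1.4.2", "libbar": "2.0.1", "webframework": "3.1.0"}
fixed_in = {"libfoo": "1.4.3", "webframework": "3.0.9"}

print(outdated(installed, fixed_in))
```

Real version schemes (letter suffixes, pre-release tags) need a proper parser, which is one more reason to lean on your package manager's own tooling rather than a homemade script.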
Tracking security news becomes a simple process:
- check the mailing lists, see if you use any of the applications mentioned
- check your dependencies: anything new? Any security issues mentioned?
- check Twitter: is the world on fire?
Be careful, though. Twitter is often on fire, and security experts like to jump on a new vulnerability and dissect it at length, even when no information is available yet. Not every vulnerability needs attention right now; some may not even apply to your particular usage of the software. Don't panic (yet).
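The first step of that routine, checking whether a mailing list post concerns you, can be partly automated: keep an inventory of the software you run and match advisory subjects against it. A rough sketch; the inventory and advisory subjects are invented for illustration, and token matching on names will both miss some advisories and flag some irrelevant ones, so treat the output as a filter for human review, not a verdict.

```python
import re


def relevant_advisories(subjects: list[str], inventory: set[str]) -> dict[str, list[str]]:
    """Keep only advisories whose subject mentions software we run.

    Matching is on whole lowercase tokens: "OpenSSL:" matches "openssl",
    but "openssl-fips" does not, so some advisories will slip through.
    """
    hits: dict[str, list[str]] = {}
    for subject in subjects:
        tokens = set(re.findall(r"[a-z0-9_+-]+", subject.lower()))
        matched = sorted(name for name in inventory if name.lower() in tokens)
        if matched:
            hits[subject] = matched
    return hits


# Invented inventory and advisory subjects, for illustration only.
inventory = {"openssl", "nginx", "postgresql"}
subjects = [
    "OpenSSL: example memory disclosure issue",
    "ImageMagick: example code execution issue",
    "nginx and OpenSSL: example combined issue",
]

for subject, names in relevant_advisories(subjects, inventory).items():
    print(f"check {', '.join(names)}: {subject}")
```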
Taking the time to check security issues regularly makes security part of your daily or weekly process. Applying a security patch becomes just another item to raise at your morning stand-up meeting (or whatever process your team uses).
Note that the person tracking the vulnerability might not be the one fixing it. When I first learned about the Logjam flaw, I was about to board a plane for a 10-hour flight. Notify the team by SMS or Slack, get an acknowledgment from someone, then go to sleep.
Reducing the risk of code updates
Here lies the huge cost of security: any code change in production is a potential liability. It brings no value to the customer, can introduce bugs or even crash the whole system (please make backups and test them regularly).
But this cost is not limited to security. It applies to your whole business. If modifying the production environment is complex and error-prone, bugfixes come rarely. New versions come in huge chunks of code that will break things, and huge lists of changes may even require service downtime.
The point of our job at Clever Cloud is to make new deployments fast and painless, and that has shaped our whole approach to security. If you can start and remove a new instance of your application in seconds, you get huge benefits:
- staging environments to test updates
- replacing huge, risky updates with small increments
- applications can be completely independent. Updating the company's WordPress blog will not affect the SaaS application
This is how we do code updates now: when a project's dependency gets a new version to fix a security issue, just redeploy the application. When there's a security patch for the Linux kernel, apply the patch, redeploy all the virtual machines, move on.
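Redeploying everything does not have to mean redeploying everything at once. Here is a sketch of a rolling redeploy in small batches that halts as soon as a batch fails its health check, so the remaining instances stay on the previous version. The `redeploy` and `healthy` callbacks are hypothetical placeholders for whatever your platform provides; this is not a description of Clever Cloud's internal tooling.

```python
from typing import Callable


def rolling_redeploy(
    instances: list[str],
    redeploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    batch_size: int = 2,
) -> list[str]:
    """Redeploy instances a small batch at a time.

    Raises RuntimeError on the first batch with an unhealthy instance, so the
    instances not yet reached keep running the previous version.
    Returns the list of redeployed instances when everything went fine.
    """
    done: list[str] = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for instance in batch:
            redeploy(instance)  # hypothetical: boot a patched instance, retire the old one
        failed = [instance for instance in batch if not healthy(instance)]
        if failed:
            raise RuntimeError(f"halting rollout, unhealthy instances: {failed}")
        done.extend(batch)
    return done


if __name__ == "__main__":
    fleet = [f"vm-{n}" for n in range(1, 6)]
    rolling_redeploy(
        fleet,
        redeploy=lambda vm: print("redeploying", vm),
        healthy=lambda vm: True,  # stand-in for a real health check
    )
```

Small batches are the "small increments" idea applied to the rollout itself: a bad patch breaks two instances, not the whole fleet.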
We do not run around with our hair on fire. It is just a basic loop of:
- get notified of a vulnerability
- see if it applies
- see if there's a patch (or if you can develop one quickly)
- apply the patch
- redeploy the applications
- go make yourself a nice tea
We have good examples of this:
- The recent CVE-2016-0728 is a local privilege escalation in the Linux kernel's keyring handling, something we need to take seriously. We read the advisory, wrote a patch, tested it and deployed it within a few hours. Most Linux distributions took days to publish updated packages.
- In the same way, the infamous Heartbleed bug was fixed quickly. One of our clients came to us hours later asking if we knew about it: "oh, that's the reason my applications were redeployed in the middle of the night"
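The notify, check, patch, redeploy loop is simple enough to write down as code. A sketch, where the `Vulnerability` fields are assumptions about what you would record while triaging, and the `ESCALATE` branch is an assumed fallback for the case the loop does not cover (no packaged fix, and none you can write quickly):

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "does not apply: note it and move on"
    PATCH_AND_REDEPLOY = "apply the patch, redeploy, go make a nice tea"
    ESCALATE = "no patch yet: discuss mitigations with the team"  # assumed fallback


@dataclass
class Vulnerability:
    identifier: str
    applies_to_us: bool      # does it affect the versions and features we actually use?
    distro_package: bool     # has the distribution shipped a fixed package?
    can_patch_quickly: bool  # can we write and test our own patch in a few hours?


def triage(v: Vulnerability) -> Action:
    if not v.applies_to_us:
        return Action.IGNORE
    if v.distro_package or v.can_patch_quickly:
        return Action.PATCH_AND_REDEPLOY
    return Action.ESCALATE


# CVE-2016-0728 as described above: distribution packages were days away,
# but writing and testing our own patch took a few hours.
keyring_bug = Vulnerability("CVE-2016-0728", applies_to_us=True, distro_package=False, can_patch_quickly=True)
print(keyring_bug.identifier, "->", triage(keyring_bug).value)
```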
When deploying new versions of an application is easy, the cost of code changes drops. The operational risk becomes tiny compared to the security risk, and you can update everything fast. You no longer have an excuse to keep unpatched systems.
Following those tips to set up your security process will improve your operations as well. With a systematic approach, you know your application better, you can see the cost of managing issues and take action.
There is still a lot to talk about, like training for incidents, defining operations procedures, or setting up your infrastructure for easy deployments. That last item, though, we can handle for you right now.