Keeping containers safe

In a previous post I showed that if you stop treating containers as if they were VMs then container security is easy.

Now we need to look at how to keep the contents of containers safe.

In general there are three steps:

  1. Build good containers
  2. Scan existing containers
  3. Replace bad containers

Build good containers

This should just be an extension of your existing source control process; your CI/CD process; your “test driven” processes. Whatever you use :-)

A container won’t magically make an application secure, but if you build them with “known good” code and only put things inside the container that need to be there then you’ve made a good start.

Some guidelines follow; they should just be common sense good coding practices anyway. They apply equally to traditional compute environments on physical machines, on VMs and in containers. Control your builds!

Don’t create a whole copy of the OS to put inside the container

Some PaaS systems (eg Cloud Foundry) will provide the necessary infra via “stemcells”. If you’re using docker then you might want to use a baseline to layer on top of. If you’re building a container from scratch then only put in code that is essential. If code isn’t present then it can’t be exploited. Minimise the footprint; minimise the attack surface.

Don’t pull code directly from the internet

In March we saw how one person removing code from npm caused build systems from all over the internet to fail. Your builds must be repeatable, and in order to do that you must control the dependencies.

Another risk of pulling from external repos is that if you don’t define your dependencies to the specific version then new code may be introduced upstream and break your app. Or, worse, rogue code gets introduced that creates a backdoor.
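As a rough illustration of “define your dependencies to the specific version”, here is a minimal sketch of a build-time check that rejects floating dependencies. It assumes a pip-style requirements file purely for the sake of example; the function name and the sample package list are made up, and a real check would need to cover whatever package manager you actually use.

```python
import re

# Accepts only "name==exact.version" (optionally with extras); anything
# else, including ranges and wildcards, is treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9._+!-]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


# Illustrative input only.
example = """\
requests==2.31.0
flask>=2.0      # floating: upstream can change under you
leftpad
"""

if __name__ == "__main__":
    print(unpinned_requirements(example))
```

Failing the build when this list is non-empty is one way to make sure nothing new sneaks in from upstream without you noticing.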

Similarly, if you use a binary repo (eg docker) then don’t have build time dependencies on it; bring the objects in-house.

This may mean you have your own internal repo and code against that. You’ll also need to maintain this repo, but at least you get the chance to scan the code, verify the licensing agreements, etc. Do you want your code to be BSD licensed? Better verify none of the code you’re using is GPL’d.

Code scan

Everything you build should be scanned. Binary objects should be scanned. Results should be compared to known CVEs and to good coding practices (eg those from OWASP). Good tools can identify bad coding practices that could lead to exploits.

Don’t rely on these, of course. You should always strive to write good code. But scanning can be a useful backstop to catch mistakes.

Automated deployment

The resulting image should go through your normal test/QA processes, and the passing image should then be pushed into your “registry”. Production containers should only be created from the approved registry, never from ad-hoc code.
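The “approved registry only” rule is easy to enforce in the deployment tooling. Here is a minimal sketch of such a gate; the registry host name is a hypothetical placeholder and the function is mine, not part of any particular orchestration product.

```python
# Hypothetical host name for the in-house registry of approved images.
APPROVED_REGISTRY = "registry.internal.example"


def is_from_approved_registry(image_ref, approved=APPROVED_REGISTRY):
    """True only when the image reference names the approved registry host."""
    host, sep, rest = image_ref.partition("/")
    # Require an explicit "host/path" form and an exact host match, so
    # look-alike hosts and bare names such as "ubuntu:latest" are refused.
    return bool(sep) and bool(rest) and host == approved


if __name__ == "__main__":
    for ref in ("registry.internal.example/team/app:1.4", "ubuntu:latest"):
        print(ref, is_from_approved_registry(ref))
```

The exact host comparison matters: a prefix match would happily accept a look-alike host that merely starts with the right name.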

Scan existing container images

You’ve just jumped through a hundred hoops to build your good container image. You can relax, right? Nope.

Unfortunately code has bugs in it. I know that’ll come as a shock! That code you scanned earlier; that library you depend on; that OS you based your image on… a new bug has been found in it.

Fortunately you have your registry of good containers. You should keep rescanning these containers. When something like shellshock appears you will be able to identify what images are vulnerable.

The action you take may depend on the risk. Perhaps you can block any new instances from being created from this image. Maybe you’ll just flag it and make the owner responsible for fixing within 90 days. You may be able to automate repair (eg vendor provided an OS level fix; patch the base image, auto-rebuild the dependent images).
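To make the rescan-then-act idea concrete, here is a minimal sketch. It assumes you already have a package inventory for each image in the registry and a feed of advisories; the data structures, the advisory ID, the versions and the two action names are all invented for illustration, and a real scanner does far more than exact version matching.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Advisory:
    advisory_id: str         # hypothetical ID, not a real CVE number
    package: str
    bad_versions: frozenset  # exact versions known to be vulnerable
    severity: str            # "low", "medium" or "high"


def rescan(registry, advisories):
    """Map image name -> advisories matching that image's package inventory."""
    findings = {}
    for image, packages in registry.items():
        hits = [a for a in advisories if packages.get(a.package) in a.bad_versions]
        if hits:
            findings[image] = hits
    return findings


def action_for(hits):
    """Pick an action from the worst severity found (an example policy only)."""
    worst = max(SEVERITY_RANK[a.severity] for a in hits)
    return "block-new-instances" if worst == SEVERITY_RANK["high"] else "flag-owner"


# Made-up inventory and advisory feed.
registry = {
    "web:1.0": {"bash": "4.3", "openssl": "1.0.2"},
    "batch:2.1": {"bash": "4.4"},
}
advisories = [Advisory("EXAMPLE-0001", "bash", frozenset({"4.3"}), "high")]

if __name__ == "__main__":
    for image, hits in rescan(registry, advisories).items():
        print(image, action_for(hits), [a.advisory_id for a in hits])
```

Run on a schedule against the whole registry, something like this turns “a new bug has been found” into a list of images and a decision for each.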

What you do here is very specific to your environment and risk exposure, but it starts with knowing there’s a bug to be fixed. That’s what this repetitive scanning does.

Replace bad running containers

It’s all well and good fixing your image registry, but what about all those containers that are running? You need to know where each image is running so that you can shut down and replace the bad image with the repaired one.

Your “orchestration layer” of the automated deployment should do this for you; as it builds a running container it should track the container (where it was deployed; what image it came from). When the container is destroyed it needs to track that as well. Now, when your image has been corrected you can push out the good version, remove the bad version. You can also report on known bad running images in order to identify your existing risk.
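The bookkeeping described above can be sketched in a few lines. This is only an in-memory toy with invented names; a real orchestration layer would persist this state and feed it from its own create and destroy events.

```python
class DeploymentTracker:
    """Remember which image each running container came from, and where it runs."""

    def __init__(self):
        self._running = {}  # container id -> (image, host)

    def started(self, container_id, image, host):
        """Record a container at creation time."""
        self._running[container_id] = (image, host)

    def stopped(self, container_id):
        """Forget a container once it has been destroyed."""
        self._running.pop(container_id, None)

    def running_from(self, bad_images):
        """List (container id, image, host) for containers built from bad images."""
        return sorted(
            (cid, image, host)
            for cid, (image, host) in self._running.items()
            if image in bad_images
        )


if __name__ == "__main__":
    tracker = DeploymentTracker()
    tracker.started("c1", "web:1.0", "host-a")
    tracker.started("c2", "web:1.1", "host-b")
    tracker.started("c3", "web:1.0", "host-c")
    tracker.stopped("c3")
    print(tracker.running_from({"web:1.0"}))
```

The report from `running_from` is both your replacement work list and your measure of existing risk.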

This is very important in a container world simply due to scale. You may have a million containers running around; you can’t keep track of them all… but your orchestration layer can! Or your service discovery layer. Or some other tool. Something has to track your containers.

Summary

Most of this is identical to the processes you should already be following for good code deployments. Containers don’t bring much new to this, although they may change how you do it. Containers also encourage the development of 12 factor applications.

The challenge, of course, is that many enterprises don’t follow these processes. They allow a wild west development style and call it “DevOps” and “agile”; they focus on speed of delivery and sacrifice many of the controls that are necessary.

Take time, now, to build the management tooling and automation of these steps. Then you can genuinely build an environment where your developers can get code live using whatever named process they like. Your automation will help encourage good security, good code design, and also remove hindrances to deployment.