The People Problem

“To summarise the summary of the summary; people are a problem” - Douglas Adams, The Restaurant At The End Of The Universe

In a traditional compute environment we may have a lot of controls. There may be a lot of audit regulations. Organisations create a lot of processes and procedures. Want to log in to a Unix machine? Better have an approved account, with the right authorisations. DMZ machines may require 2FA. Want to become root? Off to a password vault for break-glass (and in some environments, such as those regulated by the Monetary Authority of Singapore, keystroke logging may be involved!) with management signing off on the activity afterwards.

That’s a lot of effort just to log in to a machine and make a change.

Changes… I hope you had an approved change record for that activity. Do we need Tripwire to verify only the right files were touched? How can we verify you didn’t read “supersecret.txt” while you had root access?

So let’s not do that. Let’s not log in at all. If you can’t log in then you don’t need all of that technology and process, and all the fun that involves. Script everything. Automate everything. Hands-off deployment.

“If you have to SSH into your servers, then your automation has failed” - Rich Adams, https://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started/

This is how we need to manage a cloudy environment. We should automate pretty much everything. Ideally our physical servers cannot be reached directly. Starting from bare metal, we should be able to build our servers, our network, our storage, our management plane and the complete cloud environment via automation, in a few hours. No human login needed.

This automation also allows us to change the way we operate. We never patch a machine; we rebuild it from scratch. The automation brings in the latest version of the packages, and the fresh deployment is our fully patched server. If a problem such as Heartbleed or Shellshock were to occur again, we would be in a position to redeploy all of the servers in a rolling upgrade within hours.
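As a rough illustration of "rebuild, don't patch", here is a minimal Python sketch of a rolling replacement. Everything in it is an assumption made for illustration: `rolling_rebuild`, `fake_rebuild` and the `name@image` naming are hypothetical, and a real version would call your own provisioning tooling and health-check each batch before moving on.

```python
from typing import Callable, List


def rolling_rebuild(
    servers: List[str],
    rebuild: Callable[[str], str],
    batch_size: int = 2,
) -> List[str]:
    """Replace servers one batch at a time so most of the fleet keeps serving."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    fleet = list(servers)
    for start in range(0, len(fleet), batch_size):
        end = min(start + batch_size, len(fleet))
        for i in range(start, end):
            # Destroy and recreate from a fresh image; never patch in place.
            fleet[i] = rebuild(fleet[i])
        # A real rollout would wait here until the new batch is healthy.
    return fleet


def fake_rebuild(name: str) -> str:
    # Hypothetical stand-in for "build a new server from the latest image".
    return name.split("@")[0] + "@image-v2"


if __name__ == "__main__":
    old_fleet = [f"web{i}@image-v1" for i in range(5)]
    print(rolling_rebuild(old_fleet, fake_rebuild))
```

The batch size is the knob that trades speed against capacity: a bigger batch finishes sooner but takes more servers out of service at once.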

This is also the only way to scale. If you’re building a dynamic elastic environment (if you’re not, then why are you using cloud?) then you may need to build 1000 servers that run for 2 hours then destroy them again. How will you do that without automation?
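To make the "build 1000 servers, use them, destroy them" idea concrete, here is a small Python sketch. It assumes a hypothetical in-memory `FakeCloud` class in place of a real provider API, and `ephemeral_fleet` is an invented name; the only point is that teardown is part of the same automation as creation, so nothing is left running by accident.

```python
from contextlib import contextmanager
from typing import Iterator, List, Set


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider API."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self._next_id = 0

    def create(self) -> str:
        self._next_id += 1
        server_id = f"srv-{self._next_id}"
        self.running.add(server_id)
        return server_id

    def destroy(self, server_id: str) -> None:
        self.running.discard(server_id)


@contextmanager
def ephemeral_fleet(cloud: FakeCloud, count: int) -> Iterator[List[str]]:
    """Create `count` servers, hand them to the caller, then always destroy them."""
    servers: List[str] = []
    try:
        for _ in range(count):
            servers.append(cloud.create())
        yield servers
    finally:
        # Runs even if the job (or a partial build) fails.
        for server_id in servers:
            cloud.destroy(server_id)


if __name__ == "__main__":
    cloud = FakeCloud()
    with ephemeral_fleet(cloud, 1000) as fleet:
        print(len(fleet), "servers running")
    print(len(cloud.running), "servers left behind")
```

No human could do this by hand 1000 times without forgetting a server somewhere; the automation doesn't forget.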

This is just the next step in a trend. When I started doing this stuff 30 years ago, servers had a shared, well-known root password. DBAs knew this password. Some “trusted” developers knew this password. SAs would log in as root and stay logged in for weeks at a time. Slowly we removed access, starting with “root”; the developers screamed that they couldn’t deploy their programs because they needed root to do it; the DBAs screamed that they couldn’t extend their databases because they needed root to do it; the SAs screamed that they couldn’t do… pretty much anything, because they are root. And now we’re at the stage where no one even has a login!

And if you can’t log in, you can’t break it so easily. We’ve also removed a massive attack vector (no server login means the accounts can’t be brute-forced) and so reduced the risk.

By removing human access from the servers we have simplified the environment and made it easier to secure.

And now we’ve got this concept, why stop there? Let’s retrofit these controls into the traditional compute environment as well. Automate application delivery; automate patching; automate… everything!