One of the “hot” things around today is the concept of Site Reliability Engineering (SRE). I’m gonna be slightly provocative and state that this is not a new thing; we were doing this 30 years ago. Indeed, these concepts go back to where we were when I started out in this industry. Although, to be fair, there is one new factor.

History

Now I’ll be the first to say that my take on history is very much biased by my personal experiences, and how I worked.
One of the exciting parts of the “new world” of cloud is the ability to green field solutions. We don’t have the legacy requirements and so we’re free to do what we want. Or so the evangelists would have you believe.

The past lingers on

The reality is that many people are closer to a brown field environment. The organisation their team is embedded into has a tonne of reporting (“is your machine patched?
This blog post is gonna be a little different; it’s more philosophical than most of what I write. It arose out of a question a friend asked: “When does a biological AI become life?” My friend asked me this because he felt that SciFi must have covered this topic, and I’ve read and watched more than my fair share :-)

Remove the limitations

Now, I found the restriction to “biological AI” unnecessarily limiting.
I’ve spent the past far-too-many years working in the finance industry, in mega-banks and card processors. These companies are traditionally very worried about information security. It’s not to say they always do it well (everyone makes a mistake), but it leads to a conservative attitude. These types of companies end up creating a massive set of standards and procedures to protect themselves. “Thou MUST do this. Thou MUST do that.
Even after all this time I hear statements like “Oh, we can just run our code in the cloud”. This is the core of the lift and shift school of cloud usage. And these people are perfectly correct; they can just run their stuff in the cloud. But it won’t work so well. I’ve previously written about lift and shift issues, but here I want to focus on the “resiliency” issue.
This is an odd post for me. I’m terrible as a manager. I’m terrible as a team leader. I think I’m good as a teacher and mentor, but that’s a different role. Lead by example, teach what I know, learn when I can. I’ve definitely not been in the military. And yet I’m about to write about effective leadership… or maybe bad leadership. Finally I get to see The Last Jedi.
Whenever a new “critical” vulnerability is found, the cry goes out across the land: “Patch! Patch! Patch!” Whenever a major incident is caused by known vulnerabilities the question is always “Why didn’t they patch? We’ve known about this for months! They should have patched!” Sometimes this is valid criticism, and learning why the organisation wasn’t patched can lead to some insights into failure modes.
Unless you’ve been living under a rock, you will have heard of two panic panic panic bugs, known as Meltdown and Spectre. People are panicking about them because they are CPU level issues that may impact almost every modern CPU around. Meltdown is largely Intel specific, but Spectre affects Intel, AMD, and potentially others (Red Hat claims POWER and zSeries are impacted).

What is the problem?

In short, modern CPUs may execute instructions out of order, especially when the order doesn’t matter.
“To summarise the summary of the summary; people are a problem” - Douglas Adams, The Restaurant At The End Of The Universe

The above quote is one of my favourite jokes (I’ve used it in a previous post); it highlights how people can complicate any situation. We can try to avoid this by automating as much as possible but, at the end of the day, there’s always a human involved somewhere; even if it’s the team that manages the automation!
It’s a fairly common design in enterprise networks: a three tier network architecture, with firewalls between the tiers. Typically these layers are split up with variations of the following names:

- Presentation Layer (Web)
- Application Layer (App)
- Data (or storage) Layer (Data)

Typically you may have additional tooling in front of each layer; e.g. a load balancer, a web application firewall, data loss protection tools, intrusion detection tools, database activity monitoring…