Secure your cloud

The Russians are coming! The Russians are coming!

I got asked another question; I’m going to paraphrase it for this blog entry.

Given the Russian invasion of Ukraine and the response of other nations (sanctions, asset confiscation, withdrawal of services, isolation of the Russian banking system…) there is a chance of increased cyber attacks against Western banking infrastructure in retaliation. How can we be 100% sure our cloud environments are secure from this?

Firstly, I want to dispel the “100%” myth. No security is 100%. Your on-prem environment isn’t 100% secure. The cost would become prohibitive and potentially greater than the value of the assets you’re protecting. Instead you perform an analysis and reduce the risks to a level you are comfortable with, at a price you’re willing to pay. These risk reduction strategies may include preventative controls, detective controls, impact mitigation strategies and more.

So the question, really, is “Are we sure our controls and strategies are good enough?”

Now, while there haven’t yet been any confirmed reports of high-threat attacks against financial institutions (going by the CISA reports), the UK’s GCHQ has held a roundtable to discuss the threat to critical infrastructure.

So what makes the cloud different? That’s a massive topic because “cloud” is too generic a term. I’m not going to talk about SaaS offerings like Workday, Salesforce and the like (although these are important and may contain sensitive data, so make sure they are controlled!).

And that includes O365! If you’ve outsourced your Exchange environment to Microsoft (and who hasn’t? It’s convenient) this could be a critical dependency… can you still operate if your email is down? Your Teams chat is down? Your very phone system (integrated into Teams) is down?

But that’s not where I’m going; instead I’ll focus on primary cloud application environments such as AWS, Azure or GCP.

Shadow IT

Infrastructure sitting on-prem automatically gains a level of protection by being behind corporate firewalls. It can’t be reached from the outside (“you can’t hack what you can’t reach”) and it shouldn’t be able to make outgoing connections; firewalls control egress as well as ingress!

And your processes had better require CMDB registration before firewalls will be opened (“no you can’t talk to the database from the app tier; firewall blocked!”; “no you can’t talk to the web proxy!”).

Even if the server isn’t in your CMDB it gains these protections.

But cloud Shadow IT… now a whole environment could be spun up and exposed to the internet. This may not (should not!) have full access to on-prem resources and so may have a limited blast radius, but it may receive a data feed from on-prem services and so store or process sensitive data.

And these environments won’t gain any of the traditional on-prem protections. They’re open, exposed… and we may not even know they’re there!

There are services you can sign up for that scan the internet and try to work out the owner (eg based on the TLS certificate information… if it has an OU=yourbank then it may assign that to YourBank). They’re not always very accurate initially, and so have a cleanup cost. But once it’s clean any new “Hey, we just found this!” change should trigger a review… is it another false-positive or did someone spin up shadow IT?
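
To make the attribution idea concrete, here’s a minimal sketch (my own illustration, not any particular vendor’s method) that pulls a host’s TLS certificate and extracts the O=/OU= fields. The scanned address is hypothetical, and it uses the third-party cryptography package so it can read self-signed certificates too.

```python
import socket
import ssl

from cryptography import x509
from cryptography.x509.oid import NameOID


def cert_org_fields(host: str, port: int = 443) -> list[str]:
    """Grab a host's TLS certificate and return its O= and OU= fields."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # shadow IT often has self-signed certs,
    ctx.verify_mode = ssl.CERT_NONE  # so don't insist on a valid chain
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    cert = x509.load_der_x509_certificate(der)
    fields = []
    for oid in (NameOID.ORGANIZATION_NAME, NameOID.ORGANIZATIONAL_UNIT_NAME):
        fields += [attr.value for attr in cert.subject.get_attributes_for_oid(oid)]
    return fields


# A hypothetical address from a scan feed; if a field matches "YourBank",
# flag the host for review.
print(cert_org_fields("203.0.113.10"))
```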

Borders

Which leads into knowing your borders. Again, on-prem datacenters have traditionally been built with a “hard shell and soft chewy center”. We have a well defined border. But with the cloud… ah, there are a lot more borders than you may think. And they may change! Last year I even showed that Alexa could be used to exfiltrate data!

This may be the worst part of the cloud; stuff is now exposed to the internet that, really, shouldn’t be. This may include data stores, control planes and more. Since these things may be exposed to the hostile internet, access control, auditing and monitoring become critical.

Data Stores

Here’s another area where the cloud is different. Once again, traditional on-prem stores gain a level of protection just by being on-prem. The cloud… this may expose your data stores directly to the internet! Indeed a number of years ago there was almost a new breach notification every week because someone had left data in an open S3 bucket. Yay.

Cloud service providers (CSPs) have matured since then and a number of these services now have private endpoints that are exposed only to your VPC, but you need to verify that every data store you have configured (S3, RDS, DynamoDB…) is set up securely. If any are exposed to the internet then you need to verify they are properly restricted via IAM policies!
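
For the S3 case, here’s a minimal boto3 sketch of the kind of check I mean. It only looks at each bucket’s “block public access” settings, so treat it as a starting point rather than a complete exposure audit.

```python
import boto3
from botocore.exceptions import ClientError

# Flag any bucket that doesn't block all forms of public access.
s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        cfg = s3.get_public_access_block(Bucket=name)[
            "PublicAccessBlockConfiguration"
        ]
        fully_blocked = all(cfg.values())  # all four settings must be True
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            fully_blocked = False          # no configuration at all: worst case
        else:
            raise
    if not fully_blocked:
        print(f"REVIEW: {name} does not block all public access")
```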

Another area where the cloud differs is activity monitoring; your traditional DAM (Database Activity Monitoring) solution may not work, because it may require an agent installation on the server or similar. We can be limited by what the CSP’s service provides… and this is variable and different across CSPs, and even different between offerings within a CSP! If there is a breach (eg bad IAM policies, or a pivot from a trusted server) can you track what data the attacker has seen?
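
On AWS, the closest native substitute I know of is CloudTrail data events, which log object-level reads and writes. A hedged sketch (the trail name and bucket ARN are placeholders), and not a full DAM replacement:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Log GetObject/PutObject etc. for every object in a sensitive bucket, so
# post-breach you can reconstruct what data an attacker actually touched.
cloudtrail.put_event_selectors(
    TrailName="my-trail",  # placeholder: an existing trail
    EventSelectors=[{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            "Values": ["arn:aws:s3:::my-sensitive-bucket/"],  # placeholder
        }],
    }],
)
```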

But it’s not all complications. Just by being separated from the primary desktop environment, cloud data stores are less susceptible to things like ransomware or wiperware. These typically get into an environment via the desktop, and local disks and NAS mounted volumes may quickly be encrypted or have their data erased. Your cloud block storage and cloud databases are sufficiently removed. These types of attack are a common tactic, and CISA claims that Sandworm (aka Voodoo Bear), suspected of being associated with the GRU, has new malware. This organisation has a history of ransomware and wiperware, including NotPetya.

Control Plane

And here is where we start to wince. The control plane is possibly the most sensitive part of an infrastructure; you can create VMs, destroy VMs, look at their data, modify permissions… lots of stuff.

Your on-prem control plane may even be on a separate network segment because it’s so sensitive. But the cloud one? That’s exposed to the internet and cannot be changed. It’s designed to be accessible everywhere.

The best you can do, here, is to ensure your access control policies are tightly locked down, and that auditing and monitoring are in place.
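
As a small example of the monitoring half, here’s a sketch that pulls the last day of console logins from CloudTrail; in practice you’d feed these into your SIEM rather than print them.

```python
import datetime

import boto3

cloudtrail = boto3.client("cloudtrail")

# Who has been logging in to the control plane over the last 24 hours?
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}
    ],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(days=1),
)
for event in events["Events"]:
    print(event["EventTime"], event.get("Username", "?"))
```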

IAM

For a lot of CSP native services, IAM is the core of access control. It controls everything. Who can access what. What resources are exposed. Where access is allowed from.

If you mess up your IAM policies, you’re asking for a world of hurt.

And it’s not simple. Looking at an AWS IAM policy is like staring into the abyss; do it too long and it’ll stare back at you and you’ll start seeing the whole world as a JSON policy!

Fortunately tooling now exists for this. Even better, the API nature of the cloud means that access need not be so static. We can move from the traditional “minimum necessary access” through “just in time access” all the way to “zero standing access”. You want access to the cloud control plane? Jump through these hoops, do some MFA, and you’ll get the access you need… and we’ll remove it in 2 hours’ time. In AWS you don’t even need an account; just assume a role.
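
A sketch of what the AWS end of that flow can look like. The role ARN, MFA serial and token code are placeholders, a real JIT system would grant and revoke the role itself rather than rely on pre-existing trust, and the role’s maximum session duration must allow the 2-hour window.

```python
import boto3

sts = boto3.client("sts")

# Jump through the hoops: MFA-gated role assumption with time-boxed credentials.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/BreakGlassAdmin",  # placeholder
    RoleSessionName="jit-access-demo",
    DurationSeconds=7200,  # credentials expire after 2 hours
    SerialNumber="arn:aws:iam::123456789012:mfa/alice",        # placeholder
    TokenCode="123456",    # the MFA code the user just typed
)["Credentials"]

# Do the actual work with the short-lived credentials.
ec2 = boto3.client(
    "ec2",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```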

As with networking (“you can’t attack what you can’t reach”), you can’t log in to accounts without access.

This tooling can also help with IAM role analysis; are people over-permissioned? Are people granted permissions who never use the account? Can we define RBAC models?
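
One concrete way to answer the “never use it” question on AWS is IAM’s service-last-accessed report; a sketch, with a placeholder user ARN:

```python
import time

import boto3

iam = boto3.client("iam")

# Ask IAM which services this principal is allowed to use but never touches.
job = iam.generate_service_last_accessed_details(
    Arn="arn:aws:iam::123456789012:user/alice"  # placeholder
)
while True:
    report = iam.get_service_last_accessed_details(JobId=job["JobId"])
    if report["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(1)

for svc in report["ServicesLastAccessed"]:
    if "LastAuthenticated" not in svc:  # permission granted, never used
        print("never used:", svc["ServiceName"])
```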

I don’t expect many companies are at this stage, today (when we’re supposedly worried), but it’s a target they should be looking at.

But even without this zero standing access control, we already know how to control and monitor IAM access. It’s a critical area, considering what it controls (including the control plane!), so we’ve put controls in place.

Network Segmentation

Now here is another area where the cloud can be better than on-prem. Because of the Infrastructure as Code (IaC) nature of a cloud environment it’s very easy to create macro segmentation between applications. Instead of the nice chewy center of your traditional datacenter we now have a honeycomb. East-West traffic can be controlled.

This helps control the blast radius if an application is compromised (again, you can’t attack what you can’t reach). But, further, it makes it easier to shut down access. If an application with access into your core services (e.g. the mainframe) gets compromised it is easy to block all traffic to/from that application. We’ve now contained the problem, prevented further infiltration into the environment and blocked further data exfiltration.

The out-of-band nature of the control plane also now means you can easily snapshot the service and preserve it for forensic analysis.

The cloud can make the whole DFIR process easier than on-prem. Suspicious behaviour can even trigger pre-emptive snapshotting and activity tracking before a breach occurs.
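
Here’s a sketch of what containment plus evidence capture can look like on AWS; the instance ID and the deny-all “quarantine” security group are placeholders you’d have prepared in advance.

```python
import boto3

ec2 = boto3.client("ec2")


def quarantine(instance_id: str, quarantine_sg: str) -> None:
    # Swap every security group for the deny-all quarantine group, cutting
    # the compromised application off from the rest of the environment.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg])

    # Snapshot the attached volumes out-of-band for forensic analysis.
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    instance = desc["Reservations"][0]["Instances"][0]
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping["Ebs"]["VolumeId"]
        ec2.create_snapshot(VolumeId=volume_id, Description=f"DFIR {instance_id}")


quarantine("i-0abc123def4567890", "sg-00000000000000000")  # placeholders
```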

Modern security tooling

The cloud makes it possible to do security in new ways.

For a traditional VM in your datacenter you need to do a credentialed scan of the environment to see what’s really going on. This may be remotely initiated (eg via ssh), or use an agent. And now we have to worry about coverage (are my agents on every machine? Are they working?). Which requires an accurate CMDB. How frequently is that updated? Are decomm’d assets removed? What about machines that are powered down?

The cloud, though… it can tell you an accurate “as of now” inventory of resources you have, and their state. And you can snapshot the disks (whether the instance is running or not) and scan the snapshot for malware or insecure software or dodgy entries in log files or…
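
The inventory side really is that direct; a minimal sketch of the “as of now” view on AWS, with no CMDB lag, just the API:

```python
import boto3

ec2 = boto3.client("ec2")

# Enumerate every instance in the region, running or not, as of right now.
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            print(
                inst["InstanceId"],
                inst["State"]["Name"],
                inst["InstanceType"],
                inst.get("Platform", "linux"),  # "windows" if set
            )
```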

Spin up a new VM and it gets scanned, without you even needing to worry about agents!

Add in information from network flow logs and you can start to build connectivity graphs between assets, and help determine the actual risk of a vulnerability.
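
A toy sketch of the graph idea, assuming the flow logs have already been parsed into (source, destination, port) tuples; real tooling would do this at scale with proper graph storage.

```python
from collections import defaultdict

# Hypothetical records parsed from VPC flow logs.
flows = [
    ("10.0.0.8", "10.0.1.5", 443),   # load balancer -> app
    ("10.0.1.5", "10.0.2.9", 5432),  # app -> database
]

graph = defaultdict(set)
for src, dst, port in flows:
    graph[src].add((dst, port))


def reachable_from(node, seen=None):
    """Everything a compromised node has been observed talking to,
    directly or transitively: a crude blast radius estimate."""
    seen = seen if seen is not None else set()
    for dst, _port in graph.get(node, ()):
        if dst not in seen:
            seen.add(dst)
            reachable_from(dst, seen)
    return seen


print(reachable_from("10.0.0.8"))  # {'10.0.1.5', '10.0.2.9'}
```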

Of course it’s not a panacea. Snapshot scans can’t see inside the running box, so fileless malware won’t be detected. They also can’t read images from filesystems they don’t understand. They can’t read locally encrypted disks (eg via dm-crypt). And they can’t see process execution. You still need agent/credentialed scans for that. The best of both worlds is using the “ambient” security at the CSP layer, enhanced with data from agents.

But just having the ability to scan something because it is there is a massive change in security. It’s now “ambient” to the environment; something that just happens with no additional effort.

CVSS isn’t necessarily good

Many risk controls are based around a CVE. “Oh no; this has a 9.5 rating! Fix now!” But this isn’t necessarily an accurate definition of your actual risk. “Yes, log4j is sitting on this box, in a directory called /opt/myapp_oldversion with a permission of 000. But I’m not at risk.”

This new tooling now lets us look at vulnerabilities in a different way. Let’s say we have an Apache RCE. We find this on 10,000 machines. Do we need to patch them all ASAP? Well, we can look at the risk. Internet exposed services are clearly “fix yesterday”. Non-internet facing ones are still at risk (eg insider attack) but could be lowered in priority.

But how do we know if something is internet exposed? The connectivity graph can help here; “yep, this goes through a firewall, through a WAF, through a load balancer… even though it only has an internal IP address, it’s internet exposed!”
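
As a toy illustration of exposure-aware prioritisation (the multipliers are made up for the example, not from any standard):

```python
def priority(cvss: float, internet_exposed: bool, reaches_core: bool) -> float:
    """Same CVSS score, different urgency depending on context."""
    score = cvss
    if internet_exposed:
        score *= 2.0  # "fix yesterday"
    if reaches_core:
        score *= 1.5  # a pivot from here hurts more
    return score


findings = [
    ("web-frontend", 9.8, True, False),
    ("batch-worker", 9.8, False, False),
]
for host, cvss, exposed, core in findings:
    print(host, priority(cvss, exposed, core))
```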

I’m not sure any tooling is really that good, at the moment, and even the modern tools can’t see inside non-native configurations (e.g. if you have a Palo Alto virtual appliance then the tooling has no idea about connectivity). So our scope to automate this is more limited.

But even if we do come up with a good risk score for each instance based on the best data, is this the right thing?

There are a number of known vulnerabilities without a CVE at all. And, of course, there’s always a zero day.

Using CVEs and risk scores to target remediation activities and thus lower your risk is good. But it’s not, in my opinion, sufficient. It’s just another part of a defence-in-depth strategy, along with firewalls, WAFs, auditing and the rest.

Your threat intelligence feeds tell you that there’s a new attack on a component; your ambient security tools tell you we have a lot of vulnerable components; we program the WAF to block traffic matching detected patterns and then use risk scoring to prioritize remediation.

I would argue that reporting risk based purely on CVSS scores is doing your organisation a dis-service.

Are we there yet? Probably not! But the new tools I have at my disposal give me more visibility into my cloud environment than I’ve ever had on-prem!

Attacks on the CSP itself

Of course, this has all been looking at attacks aimed at the company itself.

We also need to consider attacks on the CSP. If AWS infrastructure gets breached then all of the controls I mentioned above don’t necessarily mean a thing.

This isn’t an idle thought, either. Last year Azure CosmosDB had a vulnerability (ChaosDB) which could have exposed data for years. Azure container instances had a flaw that could have led to cluster admin permissions. And, Azure Functions had an exploit.

To the best of my knowledge, all potential CSP exploits were responsibly disclosed and fixed before data had been exposed.

But what if our Russian friends found an AWS zero-day? What is our exposure?

This isn’t easily measurable. The CSPs, themselves, have cyber teams to detect actual exploits and infiltration. What we’re doing as the customer of the CSP, they’re also doing on the underlying infrastructure.

Is it 100% perfect? Obviously not.

CSPs are a high value target for attackers; break into one and you have potential access to thousands of victims. But they’re not the only people in this privileged position. Traditional MSPs have, previously, been used as a route to attack their customers. Every VPN you have to a provider, a partner, or a client is an attack vector. Even traditional on-prem equipment (eg firewalls) has had vulnerabilities.

Conclusion

In some places cloud is better and new technology solutions can make security easier and more pervasive than traditional on-prem solutions. In other places we’ve opened new borders, lost traditional protections, exposed critical interfaces to the internet; these all need solutions that on-prem didn’t need because the risk profile is now different.

The security controls and processes are different as a result.

But the cloud isn’t new. I’ve been working with the public cloud for 6 years and I’m a relative newbie compared to many. These are all known problems and you should have solutions already in place; even if they’re just detective controls.

I suspect most breaches will occur at the application layer, and that’s the same on-prem or in the cloud.

So am I 100% confident our cloud environments are secure from a state sponsored cyber attack? No. But I’m as confident in them as I am in our on-prem environments!