Resiliency

Can't Patch, Won't Patch

Whenever a new “critical” vulnerability is found, the cry goes out across the land: Patch! Patch! Patch! Whenever a major incident is caused by a known vulnerability, the question is always: “Why didn’t they patch? We’ve known about this for months! They should have patched!” Sometimes this is valid criticism, and learning why the organisation wasn’t patched can yield useful insights into its failure modes.

Bottlenecks and SPOFs

If you’ve ever built an enterprise-level system, you’ll be aware of the need for performance and resiliency. You may run performance tests on your application; you may have a backup server in a second data center; you may even run regular “Disaster Recovery” (DR) tests. And yet, despite all these efforts, your application fails in unexpected ways or isn’t as resilient as you planned: your primary server dies, and the DR server doesn’t work properly.