Something I’ve been pushing (and this is pretty much a truism amongst anyone who’s looked at “Cloud”) is the idea of automation. It doesn’t matter if you’re just treating the cloud as an outsourced datacenter or if you’re doing full 12-factor dynamically scalable apps. Automation is the key to consistency and control.
So, ideally, this means your automation system is the “single point of truth” for your estate. Whether you use chef or (saints preserve us) cfengine, your configuration file explicitly defines your target state. You can learn everything from that.
But is this true?
It’s nice in theory but, as is always the case, practice may be different.
Your source of truth may contradict itself.
With cfengine this is easy to see; one promise could say “X is true” and another promise could say “!X is true”. cfengine will complain that these rules don’t converge (assuming anyone reads the logs) and your server is left in an unknown state. This case is simple.
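To make the simple case concrete, here is a minimal sketch of a check for contradictory declarations. It is not how cfengine works internally; it assumes you can flatten your rules into hypothetical `(source, setting, value)` tuples, and the file and setting names are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(promises):
    """Return {setting: {value: [sources]}} for settings promised 2+ different values.

    `promises` is an iterable of (source, setting, value) tuples. This flat
    shape is an assumption for the sketch, not a real cfengine format.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for source, setting, value in promises:
        seen[setting][value].append(source)
    # A setting is contradictory when more than one distinct value is promised.
    return {
        setting: {value: sources for value, sources in values.items()}
        for setting, values in seen.items()
        if len(values) > 1
    }


# Hypothetical rules: two files disagree about root logins.
promises = [
    ("base.cf", "sshd_permit_root_login", "no"),
    ("legacy.cf", "sshd_permit_root_login", "yes"),
    ("base.cf", "ntp_enabled", "yes"),
]

for setting, values in find_conflicts(promises).items():
    print(f"CONFLICT {setting}: {values}")
```

Running something like this before deployment means the contradiction gets caught even if nobody reads the logs.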
But there’s a more subtle failure mode.
Let’s say we use ansible to build our environment. The build process calls a sequence of playbooks to take your machine from raw state through to final configuration. So far, so good.
Now let’s say each playbook should be in its own git repo; after all, the playbook that installs and configures apache doesn’t really need to impact the playbook for postfix. It makes sense to separate out these playbooks into different areas; different teams may be responsible; different access controls can be applied (you don’t want the SMTP team to impact your web servers).
OK, that’s a contrived example, but you can see how it goes; the team building out your Postgres database automation shouldn’t necessarily have the ability to change the configuration of your OpenLDAP servers.
But here’s where things get complicated…
Sometimes there is overlap. Your apache automation may configure the addresses of your single sign-on servers. Your nginx configuration may require the same data. If they’re in different repos, then how do you keep them consistent?
Your single point of truth (“this is the single sign-on server”) may not be consistent.
There’s no simple answer to this. How you factor your code repositories, how you factor your automation, how you build systems will evolve over time. But be aware: if you define a variable (“single sign-on server”) for one playbook, maybe it’s also useful elsewhere? Should it live in a global namespace?
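One way to spot candidates for a global namespace is to look for variables that more than one repo defines. The sketch below assumes each repo’s variables have already been loaded into a plain dict (for example from its vars files); the repo names, variable names and hostnames are all hypothetical.

```python
def shared_variables(repo_vars):
    """Map each variable defined in 2+ repos to {repo: value}."""
    by_name = {}
    for repo, variables in repo_vars.items():
        for name, value in variables.items():
            by_name.setdefault(name, {})[repo] = value
    return {name: defs for name, defs in by_name.items() if len(defs) > 1}


def disagreements(repo_vars):
    """Subset of shared variables whose values differ between repos.

    Values are assumed to be hashable (strings, numbers) for this sketch.
    """
    return {
        name: defs
        for name, defs in shared_variables(repo_vars).items()
        if len(set(defs.values())) > 1
    }


# Hypothetical per-repo variables; both web server repos name the SSO server.
repo_vars = {
    "apache-playbook": {"sso_server": "sso1.example.com", "doc_root": "/var/www"},
    "nginx-playbook": {"sso_server": "sso2.example.com", "worker_processes": "4"},
}

print(shared_variables(repo_vars))  # defined in more than one place
print(disagreements(repo_vars))     # defined in more than one place AND inconsistent
```

Anything in the first result is a candidate for moving somewhere shared; anything in the second is a source of truth that already contradicts itself.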
I spotted this in my own tooling. I have a script that will build my DNS and DHCP configuration. Given an entry in a config file it will build A, AAAA and PTR records for the machine.
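For illustration, the DNS half of that idea might look roughly like this. It is a sketch rather than my actual script: the function name, the tuple layout and the example host are assumptions, and it leans on Python’s standard `ipaddress` module to derive the reverse name.

```python
import ipaddress


def dns_records(hostname, addresses):
    """Build (name, type, data) tuples: A/AAAA plus a matching PTR per address."""
    fqdn = hostname.rstrip(".") + "."
    records = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        rtype = "A" if ip.version == 4 else "AAAA"
        # Forward record: name -> address.
        records.append((fqdn, rtype, str(ip)))
        # Reverse record: in-addr.arpa / ip6.arpa name -> hostname.
        records.append((ip.reverse_pointer + ".", "PTR", fqdn))
    return records


# Hypothetical config entry: one host with an IPv4 and an IPv6 address.
for name, rtype, data in dns_records("web1.example.com", ["192.0.2.10", "2001:db8::10"]):
    print(f"{name}\tIN\t{rtype}\t{data}")
```

Because the forward and reverse records come from the same entry, they can’t drift apart, which is the whole point of having one source of truth.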
I noticed, today, that one of my domains isn’t controlled this way. It has an A and AAAA record that’s hard-coded. I’m sure this will bite me in the bum down the line (when the primary server fails and I need to fail over to a secondary). Will I remember this? Or should I fix my automation? The answer is obvious…