In this glorious new world I’ve been writing about, applications are non-persistent. They spin up and are destroyed at will. They have no state in them. They can be rebuilt, scaled out, migrated, replaced, and your application shouldn’t notice… if written properly!
But applications are pointless if they don’t have data to work on. In traditional compute an app is associated with a machine (or set of machines). These machines have filesystems. We can write data there. If we want to share data between machines we can use something like NFS. It’s very easy to persist data.
In our new dynamically scalable app migrating world we don’t have this.
So where do we store our data?
The standard answer is to use an external datastore, such as MySQL or CouchDB or an object store (typically presented with an Amazon S3 compatible API, even if not actually using Amazon S3). Your application doesn’t persist any data; these resources are attached (or bound) to your app so they can be used.
Even for users of databases this may require a change in behaviour; you might write out all your important data into the database but write out logs and performance data to the filesystem. You can’t do that any more; everything you want to keep needs to be stored in the database or S3 store. And that requires a code rewrite.
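If your logs currently land on the local filesystem, part of that rewrite can be as small as shipping them to the object store before the container goes away. A minimal sketch using the AWS CLI — the bucket name and paths here are invented for illustration:

```shell
# Hypothetical: push a log file to an S3-compatible store instead of
# relying on the container's filesystem surviving.
# "my-log-bucket" and the file paths are made-up names for illustration.
aws s3 cp /var/log/myapp/app.log \
    "s3://my-log-bucket/myapp/$(hostname)-$(date +%s).log"
```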
But I’ve never been a fan of that. Why can’t we treat a filesystem as if it was another attached resource? This data could also be shared between instances of the app.
With docker we have some of this ability with the `-v` flag. Let’s create a directory and share it across two running instances:
In terminal window 1:
```
tty1$ sudo mkdir -p /export/myapp
tty1$ docker run --rm -it -v /export/myapp:/myapp --name inst1 centos
[root@cd0c4a2b0055 /]# ls /myapp
[root@cd0c4a2b0055 /]# echo world > /myapp/hello
[root@cd0c4a2b0055 /]#
```
In window 2:
```
tty2$ docker run --rm -it -v /export/myapp:/myapp --name inst2 centos
[root@988151662ef1 /]# cat /myapp/hello
world
[root@988151662ef1 /]#
```
We can shut down both containers, and they’ll be destroyed (because of the `--rm` flag), but the data will persist on the host:
```
[root@cd0c4a2b0055 /]# exit
tty1$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
tty1$ ls /export/myapp/
hello
tty1$ docker run --rm -it -v /export/myapp:/myapp --name inst3 centos
[root@7ed3b5e955f6 /]# cat /myapp/hello
world
```
So here’s an easy way of persisting data in a manner that developers are already used to.
It’s not that easy
There are a number of problems with this model. First and most important is data security. Because the data is present on the host, anyone with access to the host might be able to read it. This is why I’ve recommended that production container servers treat the parent OS with high access restrictions; as restrictive as your hypervisors in a traditional VM.
There may be problems with SELinux or other labelling systems; the directory we created didn’t have any of the right labels, so if the OS was set to enforcing mode then access to this directory may be rejected (indeed, I had to do a `setenforce 0` for these tests).
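A less drastic alternative than `setenforce 0`: assuming a reasonably recent docker, the `:z` volume suffix asks docker to relabel the shared directory for you, or you can label it yourself on the host. A sketch:

```shell
# The :z suffix relabels the volume so containers can use it under SELinux
# (use :Z instead for a private, per-container label):
docker run --rm -it -v /export/myapp:/myapp:z --name inst1 centos

# Alternatively, label the host directory directly; the type name varies
# by distro (svirt_sandbox_file_t traditionally, container_file_t on
# newer systems):
sudo chcon -Rt svirt_sandbox_file_t /export/myapp
```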
We’ve only looked at a single host; in a real-world environment your app may spin up and down across dozens of different servers, so you may need your persistent datastore to come from an NFS server or similar. You might just mount that at the host level. Docker also allows for different backends to be used with the `-v` flag; it can talk directly to an NFS server, for example. That’s pretty powerful!
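As a sketch of that, the local volume driver has NFS options: create a named volume backed by an NFS export and attach it by name. The server address and export path below are invented for illustration:

```shell
# Create a named volume backed by an NFS export; 192.168.1.10 and
# /export/myapp are hypothetical values.
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/export/myapp \
  myapp-data

# Attach it by name rather than by host path:
docker run --rm -it -v myapp-data:/myapp --name inst1 centos
```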
A lot of this docker functionality is documented in a tutorial.
You also need your orchestration tool to be able to support this configuration and start up your containers with the right flags. Mesos, for example, supports persistent volumes; Kubernetes supports similar.
There’s nothing that says this has to be docker only; for example, if you use systemd-nspawn to manage your containers then there is a `--bind` option to do similar stuff.
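For instance, a bind mount with systemd-nspawn might look like this (the container root path is hypothetical):

```shell
# Bind the host directory into an nspawn container, much like docker's -v:
sudo systemd-nspawn -D /var/lib/machines/myapp --bind=/export/myapp:/myapp
```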
Even Amazon are in this game with Elastic File System, which presents your storage as an NFSv4.1-accessible filesystem that you can mount onto your EC2 server. This means you can take the persistence out of your AMI and put it into EFS; the result is something very similar to the docker examples earlier.
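A sketch of what that mount looks like from the EC2 side — the filesystem ID and region here are made up; substitute your own EFS DNS name:

```shell
# Mount an EFS filesystem over NFSv4.1; fs-12345678 and us-east-1 are
# example values only.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```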
We don’t have to use storage tools such as S3 to keep to a 12-factor design. It’s perfectly possible to keep your standard filesystem semantics while keeping your application layer immutable, and all the rest of the goodness.
This can even make operations easier; that same persistent volume could be shared read-only with an “operations application”; your operations team can read the logs and analyse performance statistics on demand (only spinning up that app when needed) without needing to access the production application container.
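That read-only sharing is just the `:ro` volume suffix; a minimal sketch:

```shell
# The operations container sees the same data but cannot modify it;
# any write inside /myapp fails with "Read-only file system".
docker run --rm -it -v /export/myapp:/myapp:ro --name opsapp centos
```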