Persistent Applications · Ramblings of a Unix Geek

23 Apr 2017, 16:54

container / docker / public cloud

A while ago I wrote about some of the technology basics that can be used for data persistence. Apparently this is becoming a big issue, so I’m revisiting the topic from another direction.

Why does this matter?

In essence, an application is a way of changing data from one state to another; “I charge $100 to my credit card” fires off a number of applications that result in my account being debited and the merchant being credited. Similarly, when my employer pays me a million bucks (hey, it could happen!) there’s another set of changes.

At the end of the day, a large proportion of applications operate on some form of persistent data.

Persistence on non-persistent HTTP

Web developers are used to dealing with a lack of persistence. After all, HTTP is stateless: every request stands on its own, and the server keeps no memory of the previous one. Developers learned to deal with this.

In the very early days this was done with hidden fields in a form. At each step of a multi-step process, the web page itself carried forward the state from the previous steps.
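As a rough illustration of the hidden-field technique, here is a minimal Python sketch. The function `render_step` and the field names are hypothetical. Every value collected so far is written back into the next page as a hidden input, so the browser, not the server, carries the state between requests:

```python
from html import escape

def render_step(state: dict[str, str], next_field: str) -> str:
    """Render the next step of a multi-step form (hypothetical helper).

    Each value gathered in earlier steps is echoed back as a hidden
    input, so it is resubmitted together with the new field.
    """
    hidden = "\n".join(
        f'  <input type="hidden" name="{escape(k)}" value="{escape(v)}">'
        for k, v in state.items()
    )
    return (
        '<form method="post">\n'
        f"{hidden}\n"
        f'  <input type="text" name="{escape(next_field)}">\n'
        '  <input type="submit" value="Next">\n'
        "</form>"
    )

# Step 2 of a checkout: the name from step 1 travels inside the page itself.
print(render_step({"name": "Alice"}, "address"))
```

The client can edit hidden fields freely, so anything carried this way has to be revalidated on the server when the final step is submitted.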

When cookies came along, some of this information was stored in the cookie. Then came the concept of a session ID: the data is persisted on the server, and the client just identifies the session. The languages grew to help hide this complexity (e.g. PHP session IDs).
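A minimal sketch of the session-ID model in Python, using only the standard library. The dict-based store, the cookie name `SESSIONID` and the helper functions are assumptions for illustration (PHP’s own default cookie name is PHPSESSID, and its default handler stores sessions as files). The browser holds only an opaque ID; the data stays on the server:

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Server-side session store: session ID -> that session's data.
# A dict keeps the sketch self-contained; real stacks use files on
# disk (PHP's default), a database, or a cache.
SESSIONS: dict[str, dict] = {}

COOKIE_NAME = "SESSIONID"  # hypothetical name

def start_session() -> tuple[str, str]:
    """Create an empty session; return its ID and a Set-Cookie value."""
    sid = secrets.token_urlsafe(16)  # random, hard-to-guess identifier
    SESSIONS[sid] = {}
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = sid
    cookie[COOKIE_NAME]["httponly"] = True  # hide it from page scripts
    return sid, cookie[COOKIE_NAME].OutputString()

def load_session(cookie_header: str) -> Optional[dict]:
    """Find the session named in an incoming Cookie header, if any."""
    morsel = SimpleCookie(cookie_header).get(COOKIE_NAME)
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)

sid, set_cookie = start_session()
SESSIONS[sid]["cart"] = ["book"]             # state stays on the server
print(load_session(f"{COOKIE_NAME}={sid}"))  # {'cart': ['book']}
```

Because `SESSIONS` lives in one server process, a second server without access to it would not recognise the session.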

When we expanded to multiple servers behind load balancers, the load balancer learned how to interpret the session cookies and headers and direct each session’s traffic to the same node.
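Load balancers achieve this “sticky session” behaviour in several ways, for example by inserting their own cookie or by hashing a session identifier. Here is a minimal Python sketch of the hashing variant only; the backend names are hypothetical:

```python
import hashlib

BACKENDS = ["app1:8080", "app2:8080", "app3:8080"]  # hypothetical nodes

def pick_backend(session_id: str, backends: list[str] = BACKENDS) -> str:
    """Route a request to a backend based on its session ID.

    A stable hash (not Python's built-in hash(), which is randomized
    per process for strings) means every request carrying the same ID
    lands on the same node.
    """
    digest = hashlib.sha256(session_id.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

print(pick_backend("abc123"))
```

The weakness is easy to see: if a node reboots, the sessions pinned to it lose their state, and with plain modulo hashing, changing the list of backends also reshuffles many sessions.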

Today this type of persistence over a non-persistent transport is considered normal. We know the failure modes (e.g. a server reboots and the sessions held on it are lost), and the smart developer can handle them.

Non-persistent containers

This web development model grows nicely into the modern container world. Containers can easily have a temporary writable overlay /tmp, so your existing PHP app can maintain its state database for the lifetime of the container. We may restart containers more frequently, though, so a smart developer might use a distributed store such as etcd; now it doesn’t matter which container processes the request, because every container has access to the transaction state. This small change in development actually makes life easier: the load balancer doesn’t need to understand state, because any container instance will do, and so the least loaded instance can be used.
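A minimal sketch of this stateless pattern in Python. The in-memory `SharedStore` class only stands in for a distributed store such as etcd (it is not the etcd client API), and the container names, key layout and load figures are hypothetical. Because state lives in the shared store, the balancer is free to pick the least loaded instance for every request:

```python
from dataclasses import dataclass
from typing import Optional

class SharedStore:
    """In-memory stand-in for a distributed store such as etcd.
    Every container holds a reference to the same instance, mimicking
    one cluster that all containers reach over the network."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

@dataclass
class Container:
    name: str
    store: SharedStore
    active_requests: int = 0  # current load, as the balancer sees it

    def handle(self, session_id: str, item: str) -> str:
        """Add an item to the session's cart; nothing is kept locally."""
        key = f"/sessions/{session_id}/cart"  # hypothetical key layout
        cart = self.store.get(key)
        cart = f"{cart},{item}" if cart else item
        self.store.put(key, cart)
        return cart

def least_loaded(pool: list[Container]) -> Container:
    """Any instance will do, so the balancer just picks the least busy."""
    return min(pool, key=lambda c: c.active_requests)

store = SharedStore()
pool = [Container("c1", store, 3), Container("c2", store, 0), Container("c3", store, 5)]

first = least_loaded(pool)   # c2
first.handle("s42", "book")
pool[1].active_requests = 9  # c2 gets busy...
second = least_loaded(pool)  # ...so c1 serves the next request
print(second.name, second.handle("s42", "pen"))  # c1 book,pen
```

The read-modify-write in `handle` is not safe if two containers update the same session at once; against a real etcd cluster you would use one of its transactions (compare-and-swap) instead of a plain get followed by a put.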

Non-web apps: lift and shift

The world isn’t so clear-cut for non-web apps. Many of these apps inherently assume a persistent infrastructure layer. Containerizing them may require a code change, or else using the “machine container” pattern, also known as “container as a lightweight VM”. With a small amount of effort the app may be rewritten to work with an attached persistent store, or to use an external persistence layer (database, S3 store, something else). The lift-and-shift approach gives the fewest benefits of the container model; you might want to reconsider why you want to containerize this app.
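One way to approach that rework, sketched in Python under the assumption that the app’s state can be reduced to key/value blobs: hide persistence behind a small interface, so the same code can write to an attached volume today and to an external service later. `StateStore`, `VolumeStore`, `make_store`, the `STATE_DIR` environment variable and the default path are all hypothetical names; an S3 or database backend would be another class implementing the same two methods:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Protocol

class StateStore(Protocol):
    """What the app needs from persistence, and nothing more."""
    def save(self, key: str, data: bytes) -> None: ...
    def load(self, key: str) -> Optional[bytes]: ...

class VolumeStore:
    """State kept on a persistent volume mounted into the container."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, data: bytes) -> None:
        tmp = self.root / f"{key}.tmp"
        tmp.write_bytes(data)
        # Rename into place so readers never see a half-written file.
        tmp.replace(self.root / key)

    def load(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

def make_store() -> StateStore:
    """Choose the backend from configuration, not code. STATE_DIR is a
    hypothetical variable naming where the volume was mounted."""
    return VolumeStore(os.environ.get("STATE_DIR", "/var/lib/myapp"))

# Demo: two container lifetimes sharing one volume directory.
with tempfile.TemporaryDirectory() as vol:
    VolumeStore(vol).save("counter", b"42")
    print(VolumeStore(vol).load("counter"))  # b'42'
```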

Databases

A database is just a special case of a non-web app. We could attach a persistent backend store (the equivalent of the -v flag to docker run) when the container is spun up, and treat it as a cluster build. We can then replace containers at will: each new container reconnects to the existing backend store and acts as if it were a recovering database rejoining the cluster.
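A toy Python sketch of why reattaching the store is enough. `ToyKV` is hypothetical and far simpler than any real database: every write is appended to a log file inside its data directory (the directory you would mount with -v), and a newly started instance pointed at the same directory replays that log to rebuild its state, loosely like a database recovering from its write-ahead log:

```python
import json
import os
import tempfile
from typing import Optional

class ToyKV:
    """Toy key-value store whose only durable state is an append-only
    log inside data_dir (think: the directory mounted with -v)."""

    def __init__(self, data_dir: str) -> None:
        os.makedirs(data_dir, exist_ok=True)
        self.log_path = os.path.join(data_dir, "wal.jsonl")
        self.data: dict[str, str] = {}
        # Recovery: replay every logged write, in order.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def put(self, key: str, value: str) -> None:
        # Log first and force it to disk, then apply in memory, so an
        # acknowledged write survives the container being thrown away.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

# Demo: the first "container" writes and is discarded; a new one
# started against the same directory recovers the data.
with tempfile.TemporaryDirectory() as volume:
    old = ToyKV(volume)
    old.put("balance", "100")
    del old
    new = ToyKV(volume)
    print(new.get("balance"))  # 100
```

Clustered databases add replication and membership on top of this, so a rejoining node also catches up on writes it missed; the sketch covers only the local recovery step.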

Summary

Not everything needs to be placed in a container; first ask whether it is the right technology for the job. That said, containers are flexible, and there’s more than one way to use them. A PaaS, such as Cloud Foundry or Apprenda, pushes for the twelve-factor approach to development. This gets a lot of headlines, but it’s not the only approach. An existing web app can normally be ported easily to an immutable container model (just with a writable /tmp). A small amount of rework can allow a more traditional app to work. Or we can use the “lightweight VM” approach and persist data.

I find the “containers, containers, containers” mantra a little off-putting. For decades I’ve maintained “use the right technology for the job”, whether that’s a Windows machine, a Linux machine or even a mainframe from the dinosaur pool (they have their uses!). The same goes for physical, virtual or container, and for the type of container model: there is no one-size-fits-all solution. Pick what is right for your use case.