When Development is Production

Infrastructure is always critical

It’s an article of faith that the development process starts in the part of the network set aside for development work. From there the code may move to the QA area for QA testing, the UAT area for user acceptance testing, and finally the production area.

That statement almost looks like a truism; development work is done in DEV.

So a corporate network may be divided along dev/uat/prod lines, with firewalls between them so that development code can’t impact production services. After all, we don’t want some development code that’s doing application schema updates to accidentally drop the production database!

But this is not always true.

Development is production

The development team themselves expect the platform infrastructure to be available so they can do their work. If the VMware team decides to test a new patch set in the DEV environment and brings down the development cluster then they’re gonna have a lot of annoyed developers banging on the door. Depending on the severity of the outage this could cause application delivery timelines to slip and contracted delivery dates to be broken, with actual financial impact to the company.

Further, some of the core infrastructure may be shared between environments. The SAN arrays may have LUNs assigned to DEV machines and to PROD machines; networking switches (due to VLANs and trunks, inter-datacenter links) may share paths.

So if you’re going to do something that may impact core infrastructure or large numbers of development machines, you don’t start in DEV… you start even earlier.
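As a rough illustration of that rule, here is a minimal Python sketch that decides where a change should first be tested. The Component type, the LARGE_FLEET threshold and the example inventory entries are all hypothetical, invented for this sketch; a real inventory would come from whatever CMDB or asset database you already have.

```python
from dataclasses import dataclass

# Assumption: "large numbers of machines" means 50 or more. Pick your own number.
LARGE_FLEET = 50


@dataclass(frozen=True)
class Component:
    """A piece of infrastructure and what depends on it (hypothetical model)."""
    name: str
    environments: frozenset[str]  # e.g. {"DEV", "PROD"} for a shared SAN array
    machines: int                 # how many machines a change could touch


def first_test_environment(component: Component) -> str:
    """Return where a change to this component should be tested first."""
    shared = len(component.environments) > 1
    if shared or component.machines >= LARGE_FLEET:
        # Shared core infrastructure or a large fleet: start earlier than DEV.
        return "LAB"
    return "DEV"


# Hypothetical inventory entries, for illustration only.
san_array = Component("san-array", frozenset({"DEV", "PROD"}), 10)
dev_cluster = Component("dev-vmware-cluster", frozenset({"DEV"}), 400)
build_box = Component("single-build-box", frozenset({"DEV"}), 1)

for c in (san_array, dev_cluster, build_box):
    print(c.name, "->", first_test_environment(c))
```

The point of the sketch is only that the decision depends on who relies on the component, not on which network segment it happens to sit in.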

The lab is for more than just Proof of Concept tests

A lot of people think of “lab environments” as places to do tests. “Hey, we want to test out some piece of software, see if it does what we want. Let’s spin up a proof of concept in the lab.”

And, yes, that’s the right place to do it. Labs are typically isolated from the main networks, and you need jumphosts or bastion hosts to get to the machines. They will have separate network infrastructure, separate Active Directory domains. They’re probably not as well managed and domain admin privileges may be easy to get. This leads to some levels of instability.

But despite this, the lab is also where sustained engineering processes start.

Sustained engineering

Products have a life cycle. They get introduced to the company, tested, deployed, updated, patched, and then (if you’re lucky) sunsetted and shut down.

The introduction happens in the lab, as part of the aforementioned Proof of Concept. Then the product is rolled out to DEV…

But the lab testing doesn’t stop there. Updates should also start in the lab. Let’s say you want to test a new version of the anti-virus package that you’ve been using for 4 years; this should start in the lab. You want to test a new data collection command to be deployed to the existing tooling; this new command should start life in the lab. You want to add a new Ansible/Chef/Puppet/CFEngine rule? Start in the lab.

What this means is the lab is a persistent environment. You may destroy and rebuild components of it so the thing doesn’t diverge too far from what is deployed to the main networks, but you’ll always have your AV suite, your data collection tools, your automation tools in the lab.

You then need to deploy to the rest of the network, which may be done in a risk-managed phased approach.

For example, the lab may not have all the systems your tools need to talk to (HR systems may not be there). So you need to perform integration tests outside of the lab. This should be in an “engineering QA” environment; effectively your area of the DEV environment, made to look as close to production as possible. Only when these tests have passed can you start to deploy across the environment…with caution.

You need to look at what impact an outage may have; clearly production impact is worst, but is an outage in the DEV environment more or less impactful than one in QA/UAT? One may be customer facing… but may not be used so often. You may also want to look at regional sizes; perhaps deploying to APAC/DEV first will let you see how things work in the “real world” while having the least impact. Then EMEA/DEV, then AMER/DEV…

Once that’s settled in (maybe a week) you could do {APAC,EMEA,AMER}/UAT… and then a week later, {APAC,EMEA,AMER}/PROD. Any deployment issues should be sorted by the time we get to PROD!
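To make the phasing concrete, here is a minimal Python sketch that turns the example above into a dated list of deployment waves. The Wave type, the build_rollout function and the one-day gap between the regional DEV waves are assumptions made for this sketch; the text only says “maybe a week” between DEV, UAT and PROD, so treat every number as a knob to tune.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Region order from the example in the text: smallest impact first.
REGIONS = ["APAC", "EMEA", "AMER"]


@dataclass(frozen=True)
class Wave:
    """One deployment step: a start date and the targets it covers."""
    start: date
    targets: tuple[str, ...]


def build_rollout(first_day: date,
                  region_gap_days: int = 1,  # assumption: gap between DEV regions
                  soak_days: int = 7) -> list[Wave]:  # "maybe a week" per the text
    waves = []
    day = first_day
    # DEV goes one region at a time so problems show up where impact is lowest.
    for i, region in enumerate(REGIONS):
        day = first_day + timedelta(days=i * region_gap_days)
        waves.append(Wave(day, (f"{region}/DEV",)))
    # UAT and PROD each cover all regions, after a soak period.
    for env in ("UAT", "PROD"):
        day = day + timedelta(days=soak_days)
        waves.append(Wave(day, tuple(f"{r}/{env}" for r in REGIONS)))
    return waves


waves = build_rollout(date(2024, 1, 1))
for wave in waves:
    print(wave.start.isoformat(), ", ".join(wave.targets))
```

A plan like this is only a schedule; each wave should still be gated on the previous one having actually settled in without issues.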

Summary

I look at these DEV/UAT/PROD segments this way:

  • DEV is a production environment supporting application development
  • UAT is a production environment supporting application testing
  • PROD is a production environment supporting application production

The key is that all of these are production. If you are providing services to these environments then they are not your DEV/UAT area; these are your production areas, and you need to treat them as such. If you’re bringing in new technology that can impact large numbers of machines (e.g. a new data collection tool, a new automation tool) then you don’t do your testing in DEV; that’s your production environment.

Don’t break production… whether it’s supporting DEV or not!