I’ve recently had to do a bunch of research on containers – Docker specifically – and virtualization in general. It started with someone who had obviously drunk the kool-aid: “I can use Docker for EVERYTHING!”
No, seriously. Someone was actually advocating for using Docker for all their virtualization challenges. I knew next to nothing about Docker and I just couldn’t support that. I wasn’t coming from an informational position of strength though. I doubt I’m coming from that place now, but at least I’m closer. I know I’m going to get in trouble here, but here is what I noted.
So what is a Container anyway?
A Container encapsulates an application. This is as opposed to a virtual machine, which encapsulates an application and its operating system.
A hypervisor can run multiple virtual machines, each with its own operating system. Docker containers, by contrast, share the kernel of the host they run on – which is why Docker runs on Linux and its containers all run Linux.
Docker is one of several container technologies, and by far the most famous – it can feel like the only one right now (although Microsoft is heading in that direction too). There are others, though, such as Virtuozzo. Docker has the corporate support (from companies such as Mesosphere) and the open community you need to work with it.
Containers are good for small repeatable units
Let’s say you have a web site – maybe based on the latest ASP.NET, or maybe based on Node.js. You’ve done an awesome job. Now you want to scale it and one of the first things you want to do is handle more connections. The natural thing to do is to run multiple copies of the server. But you don’t want to run the same number of copies all the time. You want to rapidly spin up copies when the load gets high and spin them down again when the load drops off.
You’ve got a good case for containers.
Creating a new copy of a container is a light-weight task. In my tests it took seconds at most, and I suspect even that was down to the speed and resources of my underlying host. Spinning up a new copy of a virtual machine can take minutes by comparison – replicating the operating system disk takes the bulk of that time.
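To make that concrete, here is a minimal sketch of the scale-up/scale-down workflow, assuming Docker is installed and a stateless web image has already been built – the image name `mysite` and the port numbers are illustrative, not from a real project:

```shell
# Spin up three copies of a (hypothetical) stateless web image on
# different host ports. Each start typically completes in about a second.
for i in 1 2 3; do
  docker run -d --name "web$i" -p "808$i:80" mysite
done

# When the load drops off, tear the extra copies down just as quickly:
docker rm -f web2 web3
```

Put a load balancer in front of those ports and you have the rapid elasticity described above.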
Containers are not good for stateful applications
This is the bit that I’m probably going to get in trouble for. I think it’s a bad idea to run stateful applications, like a database, in a container. You can (and people do), but that doesn’t mean you should.
That’s because the whole idea of containers is that you can run multiple copies of the same thing. If the thing has state, you are losing a lot of the value from containerizing in the first place. It would be like running VMware and putting one virtual machine on the server. You can – it doesn’t mean you should.
Of course, you can mount an external disk onto a Docker container and use that for the data store. This gives you the ability to transition the container seamlessly to another machine by bringing down one and bringing up the other. But then the state is stored externally – not internal to the container.
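As a sketch of that pattern, assuming Docker is installed – the host path and container name here are illustrative, and the official `postgres` image keeps its data under `/var/lib/postgresql/data`:

```shell
# Keep the state on the host, not inside the container:
docker run -d --name db -v /srv/pgdata:/var/lib/postgresql/data postgres

# The container itself is now disposable -- replace it and the data survives:
docker rm -f db
docker run -d --name db -v /srv/pgdata:/var/lib/postgresql/data postgres
```

The container stays stateless and replaceable; the state lives on the host (or a shared disk), which is exactly the point.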
You can build containers in a build process
As a sometimes developer, I love this part. I can create a task in my Gulpfile that builds a Docker image (at least, I can if I am developing on Linux). This makes for a great workflow. Developers can be assured of running a golden image – the same as everyone else. If you have the same source files, the same image will result. If you are a developer on a team with QA, then QA can encapsulate a problem, freeze the container and pass it to the developer for diagnosis. The “works on my machine” problem of the support and QA process shrinks significantly.
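As an illustration of what that build task produces, a minimal Dockerfile for a Node.js site might look something like this – the file names (`server.js`) and port are assumptions for the sketch:

```dockerfile
# Minimal, illustrative Dockerfile for a Node.js web application
FROM node
WORKDIR /app
COPY package.json /app/
RUN npm install
COPY . /app
EXPOSE 3000
CMD ["node", "server.js"]
```

The build itself is then just `docker build -t mysite .`, which is easy to shell out to from a Gulp task.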
That’s as opposed to virtualization. If you are doing the same process in hypervisor-land, you have to set up an operating system as well, which can introduce environment drift that has nothing to do with your application. Technologies like PowerShell DSC, Chef, Puppet, AutomatedLab, Packer, Vagrant and Skytap all try to alleviate this drift – in different ways and with different results. Containers isolate the developer from the problem.
How does this relate to microservices?
One of the enterprise architectures currently being espoused is to split an application into a number of independent pieces that are tied together with simple, network-based APIs such as REST and JSON. Assuming each independent piece does one thing and handles state appropriately, it can be scaled independently from the rest of the application. Each independent piece could be a REST-based API – a microservice. Then other applications can use several of these microservices to produce a bigger workflow.
Microservices are an ideal fit for containers, but the two are really orthogonal concerns – you can have either without the other.
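As a sketch of how containerized microservices might be wired together with Docker Compose (the image names here are hypothetical):

```yaml
# docker-compose.yml -- a stateless front end talking to one microservice
web:
  image: mysite/web
  ports:
    - "80:3000"
  links:
    - auth
auth:
  image: mysite/auth
```

Each service can then be scaled independently, which is the whole promise of the architecture.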
What about performance?
There are several reports on both sides of the fence here. My own tests indicate that – given the same hardware and the same number of containers / servers – the performance is pretty much identical; the actual differences I measured were less than a percentage point. You may find some subtle changes, but you are architecting your application for scale anyway, so a couple of percentage points isn’t really significant.
So are containers good?
It depends on your application.
It depends on your expectation.
Let’s take two examples.
If you are writing a new web application and intend to use a PaaS database (such as Azure SQL as a Service or Amazon RDS) and a suite of microservices for authentication (like Auth0), email delivery (like SendGrid) and others (maybe mobile notifications, maybe IoT integration – who knows), then yes – you should definitely be investigating containers.
If your application depends on large databases, major integration work with other enterprise pieces, or state stored on disk, then either plan to re-architect your application or resign yourself to the fact that you will be using virtualization.
How do I get started with Containers?
Install an Ubuntu 14.04 virtual machine, install Docker and start using it. I’ll be demonstrating a build of my Node.js application, including a Docker build, soon.
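On a fresh Ubuntu 14.04 machine, the short version looks like this – `docker.io` is the name Ubuntu’s own repositories use for the Docker package at the time of writing:

```shell
sudo apt-get update
sudo apt-get install -y docker.io

# Verify the installation -- this pulls a tiny test image and runs it:
sudo docker run hello-world
```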