I’ve been talking to some smart people about deployment. First a little background. One of my colleagues was working on a project that ultimately didn’t bear fruit. It was a system for continuous delivery, and involved reacting to git push by building and shipping to production. But it felt as if the problem shouldn’t be separated from provisioning, and from setting up a development environment, and so these things got folded in, and the effort became a boil-the-ocean project that had to be set aside.
During this process I came to appreciate my colleague’s point of view on topics such as how software should be deployed and how it should be designed to run in its environment. A lot of this is encapsulated in The Twelve Factors. In fact, I actually created and deployed to a Heroku app, and experienced firsthand why people love Heroku.
At Velocity this week, we’re talking a lot about resilience, operations, and so on. Introducing change is often one of the things that exposes failure modes in our products, and this fragility is probably more likely to show up during deployments than almost any other time. But there’s not a lot of discussion at Velocity about deployment — that vital part of the engineering process where we take code from development and mutate our production environment to include it.
I started asking some people about this, because I need to resume the deployment effort at my own company. A few people gave me pointers to prior art to look at, but some other people told me it’s not a solved problem for them either. I’ll include some of that information in this post.
But first I want to write my current thoughts about this subject, to get it on record and to stimulate more conversation. Later I’ll follow up after I’ve learned more.
It’s a little difficult for me to organize my thoughts coherently, so I’ll just drop a list on you:
- I want continuous deployment because if it’s not a part of the culture at the company, we’ll engineer ourselves into a corner away from it and it’ll get more and more difficult to ship code rapidly later. Tools are vital; rules can’t overcome the natural reluctance (or just lack of incentive) to push code to production. I also want continuous deployment for a few reasons that might be obvious to readers. First, small, incremental changes are a lot less risky. Second, it is a positive feedback cycle. Third, code that’s written but not serving customers is inventory that I’ve paid for but not benefited from (and neither have customers). The cost of this inventory is very real; this is a philosophy expressed well in Eli Goldratt’s book The Goal.
- Code should be deployed when it’s merged to master/HEAD and all tests pass. It’s a good question whether it should be deployed completely automatically, or whether it’s good to let people batch together some changes. I favor the latter. We might not want to deploy every typo fix. We don’t want the batches to accumulate, though, or deployment gets really scary and risky. Visibility into whose changes are in a batch, and what they are, is important for this scenario. Approval by all involved is also important.
- I don’t want to build all of this infrastructure myself. I want to use external providers as much as possible. I prefer to buy or rent rather than build, because I won’t do the job as well, and it’s not my core business. I don’t want to engage in “undifferentiated heavy lifting,” to quote someone smart at Netflix.
- However, there’s a tension here. External providers must be a convenience, not in the critical path. If one or more external providers are down, that can’t be a hard block on a deployment. The last thing I want is to have downtime I can’t fix because someone else has downtime too. As an example, I want to continue to use GitHub and CircleCI, but I don’t want to make them SPOFs. But if I have an alternate, less-used deployment route, that’s also a problem; there should only be one deployment system, or the fallback will fail when I need it. I think the solution is to make GitHub and CircleCI trigger deployment, but only as one possible source of triggers.
- Deploying binaries with restarts is very different from source code deployments, and there are other types of deployments that need to be considered as well. Deployment to stateless resources (a web server) is a lot simpler and less risky than deploying to something that is stateful, or affects something stateful such as a database server. There’s also the matter of migrations. In my experience with lots of large companies, migration tools are simplistic, and I’ve never seen them scale beyond toy applications. But taking migrations out-of-band means the system is not completely self-documenting, and may not be runnable unless some change or other requirement is satisfied, which can only be performed and verified by a human. These are concerns I don’t know how to resolve.
- Although it’s tempting to put manifests (Procfile) and include provisioning (and even scaling) in the deployment process, I think it’s better to put a strong barrier between those. Otherwise we’ll end up with a hairball that can’t be dealt with separately. System provisioning and configuration to prepare an environment to be deployed into is not part of deployment. Similarly, there needs to be some thought about a service directory to register and mutate the state of the overall system, such as taking apps in and out of proxies and load balancers before, during, and after deployments. That might need to be part of deployment, or the provisioning, or both.
- Most of the services I’ve seen for deployment want to imagine that the world is all on Heroku, where a deployment is a git push. Unfortunately, as nice as that is, it isn’t going to work. The other thing many of them offer is “we’ll run your Capistrano jobs,” but that’s also not workable, because allowing external hands to poke into our systems is not an option. Agent-based deployment is preferable. I have good experience with this, even with self-upgrading agents. There are some companies (Distelli) that do something reasonable here.
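To make the "one deployment system, many possible triggers" idea from the list above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `Trigger` and `Deployer`, the source labels, and the batching behavior are hypothetical, not an existing tool or anything I have built. It only shows that a CI webhook and a manual command can feed the same queue, that a revision is accepted only if its tests passed, and that a batch ships through a single deploy function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Trigger:
    """A request to deploy one revision (hypothetical structure)."""
    source: str         # where the request came from, e.g. "ci" or "manual-cli"
    revision: str       # the commit to deploy
    tests_passed: bool  # whether the test suite passed for this revision


@dataclass
class Deployer:
    """The single deployment path; triggers are interchangeable inputs."""
    deploy: Callable[[List[str]], None]  # the one function that actually ships code
    pending: List[str] = field(default_factory=list)

    def submit(self, trigger: Trigger) -> bool:
        """Queue a revision if its tests passed; return whether it was accepted."""
        if not trigger.tests_passed:
            return False
        if trigger.revision not in self.pending:
            self.pending.append(trigger.revision)
        return True

    def release(self) -> List[str]:
        """Ship the whole pending batch at once and clear the queue."""
        batch, self.pending = self.pending, []
        if batch:
            self.deploy(batch)
        return batch
```

In this sketch the external providers are only one way to call `submit`; if they are down, a person can call it directly and the same `release` path runs, so there is no separate, rarely exercised fallback.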
Thoughts from other people I talked to include:
We think about this basically all the time at [company]. The difficulty is that organizations build their own because there’s a likely corollary to Conway’s Law here: deployment and development infrastructure are context sensitive, so therefore organization sensitive.
And Jez Humble offered this:
I’m running a conference which talks about continuous delivery, lean UX, devops and related stuff: http://flowcon.org/flowcon-sanfran-2013/schedule/index.jsp
I look forward to your thoughts and links to further study. Thanks!
Something interesting happened after I published my ultimate notebook and journal face-off blog post a couple of months ago. I received an email from a company called Grandluxe, asking if I’d like to receive some stationery products in hopes that if I liked them, I’d write a review of them. I had never heard of them before, but they’ve been making paper products for 68 years, and apparently are trying to break out of the Asian market into international territory.
I agreed, and they sent me quite a large box of notebooks. They also sent me their full product line catalog. So you could say that, as a notebook nerd, I’m in heaven!
I really like the products. (I’ve already disclosed my bias. But I’ll do it again. I was given these free of charge.)
Monologue Ruled Notebook
They sent me an entire set of the Monologue ruled notebook in A5 size, assorted attractive colors. It is very similar to a Moleskine-class notebook. It has a slightly softer and more “grippy” cover, but otherwise there’s little difference. I had enough to give one to my wife, to each of my coworkers, and keep the prettiest color for myself.
I received three small (A6) and one A5 in this product line. The paper is a little smoother and the edges are gilded in gold and silver colors. The covers are charcoal black with silvery/goldish undertones. They are hard and stiff, not soft and flexible like the Monologue, and have a knurled or bumpy feel. The spine is embossed with the product name in gilding to match the edge of the sheets. The A5 has a pseudo-reptile-skin pattern on the binding. These would make a great gift or promotional item, because they look really sophisticated. Note that the colors in the image below are a little exaggerated; in reality the gold/silver tones are more subtle.
This item doesn’t seem to be available on their online store yet, but you can find more details and images here.
This is a slim notebook with a flexible cover (thick paper, not rigid at all) that comes in combination packs. Each pack has one ruled notebook and one blank notebook with perforated pages for easy tear-out.
This item doesn’t seem to be available on their online store yet, but you can find more details and images here.
I recently stumbled on an interesting system of journaling / note-taking called Bullet Journal. Although it’s a little bit over-the-top in some ways, and sometimes feels like being told the obvious or told to follow a system that seems unlikely to hold together in the “real world,” it is fun to see that other people enjoy taking notes on paper as much as I do. Worth a read.
There’s an embarrassment of riches to choose from when it comes to journals and notebooks, and my bookshelf is stockpiled with a couple of years’ worth now that I’ve done this epic series. But in today’s digital world, can you ever get too much of good old paper and pen? Enjoy the simpler pleasures… I hope this blog has helped with that a little bit.
I have followed the “Use the Index, Luke!” blog for a while. Today Markus wrote that (I’ll paraphrase) MongoDB disgraces NoSQL the same way that MySQL disgraces SQL. I agree with a lot of this, actually, although I’m not sure I’d put it so strongly. People often like products for good reasons, and to think that legions of developers are stupid or ill-educated is suspect, in my opinion.
But that wasn’t what I meant to write about. I wanted to point out something about the blog post that’s a little outdated. He wrote, and this time I’ll quote, “MySQL is rather poor at joining because is only supports nested loops joins. Most other SQL database implement the hash join and sort/merge join algorithms too.”
It’s no longer true that MySQL doesn’t support these, and hasn’t been for a while, depending on which version of MySQL you look at. What’s slightly unfortunate, in my opinion, is that MySQL doesn’t call out in the documentation that they’re actually implemented. MySQL documentation talks about Multi-Range Read, Block Nested-Loop, and Batched Key Access join “optimizations.”
Functionally, these are closely related to combinations of hash and sort-merge join algorithms, and really represent mixtures of features from them combined in different ways, depending on the exact query. Most “sophisticated” RDBMSs also implement a lot of subtle variations — edge-case optimizations are really worthwhile. It is rarely as cut-and-dried as pure hash-join or sort-merge join. And in the end, there is always — always — iteration over rows to match them up, regardless of the data structure used, regardless of the RDBMS. MySQL happens to call these variations “nested loop join optimizations” and similar phrases, but that’s what they are in other RDBMSs too.
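To illustrate the point that every join algorithm ends up iterating over rows, here is a small Python sketch of three textbook strategies: a plain nested-loop join, a block nested-loop join, and a hash join. These are generic versions written for illustration, with made-up function names and a tiny `block_size`; they are not MySQL’s implementation and make no claim about how its Block Nested-Loop or Batched Key Access optimizations work internally.

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[int, str]            # (join key, payload)
Joined = Tuple[int, str, str]    # (join key, outer payload, inner payload)


def nested_loop_join(outer: List[Row], inner: List[Row]) -> List[Joined]:
    """For every outer row, scan every inner row and keep key matches."""
    return [(ok, ov, iv) for ok, ov in outer for ik, iv in inner if ok == ik]


def block_nested_loop_join(outer: List[Row], inner: List[Row],
                           block_size: int = 2) -> List[Joined]:
    """Buffer a block of outer rows, then scan the inner table once per block."""
    result: List[Joined] = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        for ik, iv in inner:          # one inner scan per block, not per row
            for ok, ov in block:
                if ok == ik:
                    result.append((ok, ov, iv))
    return result


def hash_join(outer: List[Row], inner: List[Row]) -> List[Joined]:
    """Build a hash table on the inner rows, then probe it for each outer row."""
    table = defaultdict(list)
    for ik, iv in inner:
        table[ik].append(iv)
    return [(ok, ov, iv) for ok, ov in outer for iv in table.get(ok, [])]
```

All three return the same set of matched rows; they differ only in how many times the inner input is scanned and in what data structure is used to find matches, which is why the line between “nested loop with optimizations” and “hash join” can be blurrier than the names suggest.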
MySQL does very well on many types of joins for which sort-merge and hash-join algorithms are designed. See, for example, this blog post and this one and also this one on MariaDB’s further optimizations.
I think the MySQL documentation could help a little by calling things names that normal users understand. The names we see in the documentation are really reflective of how the optimizer internals gurus think about the algorithms, in my opinion. I think the names describe the implementation, not the end result. I’d suggest phrasing it differently for general consumption by the DBA public. Perhaps something like “sort-merge join implemented with a _____ algorithm.” Or perhaps — and I will admit I don’t keep the details fresh in my mind so I’m not the one to ask for the right answer — perhaps the algorithms MySQL uses really aren’t as related or comparable as I think they are, and a different type of explanation is in order. But I bet a lot of DBAs from SQL Server and Oracle Database backgrounds would find it helpful to have an explanation in familiar terms. (This concludes my free and probably unwanted advice to the MySQL docs team!)