Continuous Delivery - Going from rags to riches
18 Jun 2016
My last year and a half has been spent guiding a team, working on a legacy codebase, to a level of maturity where they can deliver reliably.
We took a series of small steps, working within the constraints we had, and slowly worked our way towards a higher degree of continuous delivery maturity.
My aim is to show the first few small steps we took, and to show that applying CD practices can be done in an iterative manner, with each iteration building on what was done before.
The team was in a chaotic stage. They were responsible for delivering an API for the main mobile app, built on the Microsoft .NET stack. They had just migrated their code from TFS to Git, and were starting to use JIRA for rudimentary tracking.
There was no CI or automation. One of the things I observed during the early days was the chaos around deployments. Every other deployment was rolled back. The testing team would take a couple of days to test a release. The definition of done was a release thrown over the wall to the testing team.
Introduce a basic level of monitoring
Introducing monitoring early on helps the team quantify their response to outages. Not every outage needs the whole team to drop what they are doing and work on it. The team becomes proactive in dealing with outages, which gives them a bit more breathing room.
We introduced NewRelic, which gave us out of the box performance and error monitoring and alerting. We hooked it up to PagerDuty to get alerted as soon as issues occur.
Before the introduction of NewRelic we had no visibility of how the API was performing in production, and didn’t even know if it was working properly. The only visibility we had of production issues was when customer support calls increased above the usual level.
This gave us a tiny start on CD. The team knew what was going on in production, and were eager to take ownership, without waiting for someone to assign a task to them. We were able to quantify what we were dealing with. We added more monitoring tools later on.
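Conceptually, the alerting we got from NewRelic and PagerDuty boils down to watching an error rate over a sliding window of requests and paging when it crosses a threshold. Here is a minimal sketch of that idea in Python; `ErrorRateMonitor` and the `alert` hook are hypothetical names for illustration, not NewRelic’s API:

```python
from collections import deque

class ErrorRateMonitor:
    """Track recent request outcomes and alert on a high error rate."""

    def __init__(self, window=100, threshold=0.05, alert=print):
        self.samples = deque(maxlen=window)  # most recent request outcomes
        self.threshold = threshold           # e.g. alert above 5% errors
        self.alert = alert                   # hook that would page the team

    def error_rate(self):
        if not self.samples:
            return 0.0
        return self.samples.count(False) / len(self.samples)

    def record(self, ok):
        """Record one request outcome; alert once the window is full."""
        self.samples.append(ok)
        rate = self.error_rate()
        if len(self.samples) == self.samples.maxlen and rate > self.threshold:
            self.alert(f"error rate {rate:.1%} above {self.threshold:.0%}")
```

Even a crude threshold like this changes the conversation: instead of waiting for support calls, the team sees the problem first and can decide how urgently to respond.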
Visibility and limiting work in progress
Get the team to work on only one thing at a time. A team in a chaotic state needs to get into the rhythm of finishing work in progress and delivering it. It’s important to stress a “Ship it” mindset early on, and to increase the WIP limits only when the team can deliver reliably.
When a team is in a chaotic stage, they are already juggling multiple priorities, and don’t have the time to focus on doing one thing well. Limiting work in progress helps the team focus on what they are doing. I still recommend keeping WIP limits very low even when the team is doing well.
Visualise the work the team is doing. Put up the classic card wall, use electronic tracking tools such as JIRA purely for tracking.
A physical Kanban board in the team area empowers the team to take ownership of what they do. They can show what they are working on, and the simple act of working the board generates tangible artefacts.
Electronic task boards are a hindrance at this stage of a team’s maturity. They are usually owned by someone outside the team, so the team doesn’t have a sense of ownership of their own process.
Visibility and limiting work in progress allows the team to be clear to everyone else involved on what they are dealing with.
Introduce CI as soon as possible
It’s not necessary to build the whole pipeline, and an automated pipeline can co-exist with manual steps. The aim should be to convert existing manual steps into automated steps run by a CI server, pulling code from version control.
The first step on our CI pipeline was only a compile step. Getting to this stage was tough. It involved chasing down dependencies that were not checked in or documented. We then used the artefacts generated by the build server to do manual deploys.
We did this first because, even though the team was using version control, deployment artefacts were built on a developer’s machine. Moreover, the artefacts could only be built on a couple of key developers’ machines. Builds had to wait for someone to generate them.
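That first step really was just “compile, then keep the artefacts”. A minimal sketch of such a step as a Python script a CI server could run; the paths are hypothetical and the compile command is parameterised (a real .NET build would invoke msbuild here):

```python
import shutil
import subprocess
from pathlib import Path

def build(compile_cmd, output_dir, artefact_dir):
    """Run the compile step; on success, archive artefacts for manual deploys."""
    result = subprocess.run(compile_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Fail the CI job loudly, with the compiler output attached.
        raise RuntimeError(f"build failed:\n{result.stderr}")
    artefact_dir = Path(artefact_dir)
    artefact_dir.mkdir(parents=True, exist_ok=True)
    for f in Path(output_dir).iterdir():
        if f.is_file():
            shutil.copy(f, artefact_dir)  # manual deploys pick these up
    return artefact_dir
```

The point is not the script itself but where it runs: once the CI server produces the artefacts from version control, builds no longer depend on a particular developer’s machine.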
When you don’t have anything else, put everyone together
Communication is key during the early stages of helping a team. Good communication builds trust between different members of the team, and a team in a dysfunctional state has low-trust communication. Focus on building that trust.
Encourage pairing between testers and developers. This creates a tight feedback loop between a developer and a tester, and helps test code even before it is committed. Keep this tight feedback loop until an automated suite of tests is in place; I recommend keeping the close collaboration even after.
What helped us in the early days was sitting together in the same area. We had a tester on the team, who had deep knowledge about the product, and knew all the quirks to test for when doing regression tests. We didn’t have the communication overhead of waiting for someone outside of the team to do a task. I encouraged the team to talk to each other, and move away from using JIRA tickets as the primary communication channel.
Amplify the good things
Even when a team is in a chaotic state there are good practices. Learn to leverage these good practices as a building block for better things. They may not be perfect, but it’s a foundation to build on.
We were lucky to have a meticulous tester, who had built his own suite of tests, even when the developers did not have any reliable tests. The tester used to run his suite of tests after every release. We used these tests as the basis for our first automated regression test suite.
We converted the rudimentary tests into a simple suite of BDD style tests. The tests weren’t perfect, but these were the tests that gave us a little bit more confidence that our system wasn’t broken after a release.
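The converted tests followed a plain given/when/then shape. A sketch of that style in Python (not our actual tests), using a hypothetical `ApiClient` stub in place of the real API:

```python
class ApiClient:
    """Stand-in for a thin HTTP client around the API (hypothetical)."""

    def __init__(self):
        self._logged_in = False

    def log_in(self, user, password):
        self._logged_in = (user == "tester" and password == "secret")

    def get_profile(self):
        return {"status": 200 if self._logged_in else 401}

def test_logged_in_user_can_fetch_profile():
    # Given a user with valid credentials
    api = ApiClient()
    api.log_in("tester", "secret")
    # When they request their profile
    response = api.get_profile()
    # Then the API responds successfully
    assert response["status"] == 200
```

Writing the tester’s manual checks down in this form meant anyone on the team could read, run, and extend them, which is what made them the seed of the automated regression suite.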
Focus on learning
The practices above should serve one purpose: to give the team enough slack time to learn. This is where the real change towards Continuous Delivery maturity happens. Encourage pair programming early. Talk about books, and show examples of how things can be done better.
It’s easy to fall into the trap of focussing on the code, but keep in mind that the code is an expression of how the individuals on a team think.
Give the team space to experiment and support them even when experiments fail.
Starting with these small steps, we were able, almost a year later, to deliver reliably. We didn’t fix all the problems in the code. It was still gnarly in places.
We had an automated regression test suite, covering all our key scenarios, that ran on every commit. We were able to commit to master and have that change deployed to production within the hour, and we rarely had a broken build.
Think of all the CD practices as a toolbox. You can’t have a CD pipeline from day one, nor should the team’s focus be on building the perfect CD pipeline. Focus on educating the team to have a quality- and delivery-focussed mindset.
Iterate on what you already have. The small initial steps can be force multipliers.
Here is a selection of books that have helped me.