Lessons learned building a deployment pipeline with Docker, Docker-Compose and Rancher (Part 3)
John Patterson (@cantrobot) and Chris Lunsford run This End Out, an
operations and infrastructure services company. You can find them
online at www.thisendout.com and on Twitter @thisendout.

Update: All four parts of the series are now live:
- Part 1: Getting started with CI/CD and Docker
- Part 2: Moving to Compose blueprints
- Part 3: Adding Rancher for Orchestration
- Part 4: Completing the Cycle with Service Discovery

In this installment of our series, we’ll explore how we came to
Rancher, detailing how it solved some issues around deploying and
managing containers. If you recall from part 2 of our series, we
migrated our application deployments to Docker Compose and established
deployment jobs for our applications. This made it easy for developers
to change their application deployment logic and let operations see
when an application was deployed. However, there are some outstanding
issues with this setup.
Challenges We Faced with Docker-Compose
First, operations have to schedule all services manually. The deployer
has to decide which host to deploy an application to, which means the
deployer must keep track of the available resources on every host. Also,
if a host or container fails, the operator is responsible for
re-deploying the application. In practice, this means that hosts are
often unbalanced, and services experience longer downtime after failure.
Second, it’s difficult to get information about the state of your
services. As an example, consider a common question asked by operators,
project managers, and developers alike: “Which version of application x
is deployed in staging?” With manual scheduling, finding the answer
often involved directly messaging a favorite ops engineer and having
them log into a server to run docker ps. This is where Rancher
provided a huge benefit: information about deployed services was easily
accessible to everyone without requiring an ad-hoc request to
operations.

Before landing on Rancher, we tried other solutions that provided
interfaces into a Docker host or cluster. One of the biggest burdens
that many other solutions did not address was multi-environment
management. With 8 environments running various workloads, we needed a
unified way to manage the cluster without having to visit 8 different
services. Also, we wanted to give developers free rein to modify the
development environments, knowing we could rebuild them at a moment’s
notice. However, for production, we wanted to provide them with limited
read-only access. A central management plane for all environments using
a role-based access control (RBAC) model thus became desirable. We
started looking at Rancher mainly because of how easy it was to set up.
Rancher Met the Challenge
Within the span of half a day, Rancher was up and running using AWS
ELB, ElastiCache, RDS and our existing Docker hosts. Having the ability
to easily configure authentication was also a big plus. We won’t go
into the details of deploying Rancher itself, as the documentation
already describes how to do this.
Instead, we’ll pick up right after initial setup and explain our
migration from the existing setup (as described in parts one and two of
this series). Let’s start by creating our environments. To keep it
simple, we’ll set up development (dev), staging (stage), and production
(prod). Each environment has existing Docker hosts running on top of
Ubuntu and configured by an in-house Ansible playbook that installs
Docker and our monitoring agent and makes a few organization-specific
changes. Rancher allows you to add existing hosts to each environment
by running a single command that registers the Docker host with the
internal Rancher server.

(Figure: Adding a Rancher Host)

Typically, adding hosts requires a few clicks in the web UI and running
an environment-specific, generated command on each end system. However,
using the Rancher API, we are able to automate this step with Ansible.
For the curious, the relevant section of our playbook is below (mostly
adapted from the logic in Hussein Galas’ repo):

    - name: install dependencies for uri module
      apt: name=python-httplib2 update_cache=yes

    - name: check if the rancher-agent is running
      command: docker ps --filter 'name=rancher-agent'
      register: containers

    - name: get registration command from rancher
      uri:
        method: GET
        user: "{{ RANCHER_API_KEY }}"
        password: "{{ RANCHER_SECRET_KEY }}"
        force_basic_auth: yes
        status_code: 200
        url: "https://rancher.abc.net/v1/projects/{{ RANCHER_PROJECT_ID }}/registrationtokens"
        return_content: yes
        validate_certs: yes
      register: rancher_token_url
      when: "'rancher-agent' not in containers.stdout"

    - name: register the host machine with rancher
      shell: >
        docker run -d --privileged
        -v /var/run/docker.sock:/var/run/docker.sock
        {{ rancher_token_url.json['data'][0]['image'] }}
        {{ rancher_token_url.json['data'][0]['command'].split() | last}}
      when: "'rancher-agent' not in containers.stdout"

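To make the Jinja expressions in the last task more concrete, here is a
small Python sketch of the same extraction. The sample response is
hypothetical: it models only the two fields the playbook reads (the
image and the command of the first token), and the image tag and
registration URL are placeholders, not real values. The function name
build_agent_command is ours, for illustration only.

```python
import shlex

# Hypothetical, trimmed-down response from the registrationtokens
# endpoint. Only the two fields the playbook reads are modeled here;
# the image tag and URL are placeholders.
sample_response = {
    "data": [
        {
            "image": "rancher/agent:vX.Y.Z",
            "command": "sudo docker run -d --privileged "
                       "-v /var/run/docker.sock:/var/run/docker.sock "
                       "rancher/agent:vX.Y.Z "
                       "https://rancher.abc.net/v1/scripts/EXAMPLE_TOKEN",
        }
    ]
}


def build_agent_command(response):
    """Mirror the playbook: use the image and the last word of the command."""
    token = response["data"][0]
    image = token["image"]
    # Equivalent to `command.split() | last` in the Jinja expression.
    registration_url = token["command"].split()[-1]
    return [
        "docker", "run", "-d", "--privileged",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        image,
        registration_url,
    ]


if __name__ == "__main__":
    print(shlex.join(build_agent_command(sample_response)))
```

The point of taking only the last word is that the playbook rebuilds
the docker run invocation itself and reuses just the registration URL
from the generated command.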
With our environments created and hosts registered, let’s take a look at
how to integrate our deployment workflow with Rancher. On each Docker
host, there are some containers already running, deployed by Ansible via
a Jenkins job. Out of the box, Rancher provides the ability to:
- Manage existing containers (ex. start, stop, edit, view logs, launch
  an interactive shell)
- Access information about running and stopped containers (ex. image,
  entry point, command, port mappings, environment variables)
- View resource utilization on a host and container level (ex. CPU,
  memory, disk, network)
(Figure: Standalone Containers)

Immediately, having done nothing but register our hosts, we now have
visibility into the state of all our containers in each environment.
The best part is we can share this information with our other teams by
giving them limited permissions in each environment. Having this
visibility eliminates the need for operators to log into the Docker
hosts to interrogate them manually, and we reduce the number of
requests for environmental information by providing limited access to
the various teams. For example, granting the development team read-only
access to our environments has helped build a bridge between them and
the operations team. Both teams now feel more empowered and connected to
the state of the environment. Troubleshooting has become a joint
venture instead of a one-way, synchronous information flow, which has
reduced the overall time spent resolving issues that crop up.

With our existing Docker hosts added, and after having read a great
series on Jenkins and Rancher, we decided the next area to improve was
our existing deployment pipelines, modifying them to use
rancher-compose instead of Ansible calling Docker Compose. Before we
dive in, however, there are a couple of things to know about Rancher
stacks, scheduling, Docker Compose, and rancher-compose.

Stacks and Services: Rancher makes a distinction between standalone
containers (those deployed outside of Rancher or in a one-off capacity
through the Rancher UI) and stacks and services. Simply put, stacks are
groups of services, and services are the containers required to make up
an application (more on this later). Standalone containers are manually
scheduled.

Scheduling: The previous deployment techniques required the operator to
make decisions about which hosts a container should run on. In the case
of the deployment script, it was whichever host the operator ran the
script on. In the case of the Ansible playbook, it was the host(s) or
groups passed to the Jenkins job. Either way, it required the operator
to make decisions, typically based on very little information, that
could be detrimental to the deployment (what if the host is maxed out
on CPU utilization?). Clustering solutions such as Docker Swarm,
Kubernetes, Mesos, and Rancher all implement schedulers to solve this
problem. Schedulers interrogate information about a group of hosts that
are candidates for being targeted for an action. The scheduler
gradually reduces the list based on default or custom requirements,
like CPU utilization and (anti-)affinity rules (ex. do not deploy two
of the same container on the same host). As an operator performing a
deployment, this makes my life much easier since the scheduler can do
these calculations much faster and more accurately than I can
(especially during late-night deployment windows). Out of the box,
Rancher provides a scheduler when deploying services via stacks.
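
As a rough illustration of the filtering idea described above, the
following Python sketch narrows a list of hosts by CPU utilization and
a simple anti-affinity rule, then picks the least-loaded survivor. This
is not how Rancher's scheduler is implemented; the Host class, the
schedule function, the 0.8 CPU threshold, and the sample hosts are all
assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_utilization: float                      # fraction between 0.0 and 1.0
    services: set = field(default_factory=set)  # services already running here


def schedule(hosts, service, max_cpu=0.8):
    """Pick a host for `service`, or return None if no host qualifies."""
    # Step 1: drop hosts that are already too busy.
    candidates = [h for h in hosts if h.cpu_utilization <= max_cpu]
    # Step 2: anti-affinity, never place two copies of a service on one host.
    candidates = [h for h in candidates if service not in h.services]
    if not candidates:
        return None
    # Step 3: among the remaining hosts, prefer the least-loaded one.
    return min(candidates, key=lambda h: h.cpu_utilization)


if __name__ == "__main__":
    hosts = [
        Host("host-a", 0.95),
        Host("host-b", 0.40, {"java-service-1"}),
        Host("host-c", 0.55),
    ]
    chosen = schedule(hosts, "java-service-1")
    print(chosen.name if chosen else "no host available")
```

With the sample data, host-a is rejected for CPU utilization and host-b
by the anti-affinity rule, which leaves host-c as the only candidate.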

Docker Compose: Rancher uses Docker Compose to create stacks and define
services. Since we already converted our services to use Docker Compose
files, we can easily create stacks in Rancher. Stacks can be created
manually from the UI or via the CLI through the rancher-compose
utility.

Rancher Compose: Rancher Compose is a utility that allows us to manage
our stacks and services per environment in Rancher via the CLI. It also
provides access to additional Rancher features by way of a
rancher-compose.yml file. This is purely a supplemental file, and is
not a replacement for docker-compose.yml. In a rancher-compose.yml file
you can define, for example:
- An upgrade strategy per service
- Health checks per service
- Desired scale per service
These are all very useful features in Rancher that aren’t available
through Docker Compose or the Docker daemon. For a full list of features
offered by Rancher Compose, you can browse the documentation.

We can easily migrate our services to deploy as Rancher stacks by
updating the existing deployment job to use Rancher Compose instead of
Ansible. We are then able to remove the DESTINATION parameter, but we
kept VERSION to use when interpolating our docker-compose.yml. Below is
a snippet of the shell logic we use in our Jenkins deployment:

    export RANCHER_URL=http://rancher.abc.net/
    export RANCHER_ACCESS_KEY=…
    export RANCHER_SECRET_KEY=…

    if [ -f docker/docker-compose.yml ]; then
      docker_dir=docker
    elif [ -f /opt/abc/dockerfiles/java-service-1/docker-compose.yml ]; then
      docker_dir=/opt/abc/dockerfiles/java-service-1
    else
      echo "No docker-compose.yml found. Can't continue!"
      exit 1
    fi

    if ! [ -f ${docker_dir}/rancher-compose.yml ]; then
      echo "No rancher-compose.yml found. Can't continue!"
      exit 1
    fi

    /usr/local/bin/rancher-compose --verbose \
      -f ${docker_dir}/docker-compose.yml \
      -r ${docker_dir}/rancher-compose.yml \
      up -d --upgrade

Stepping through the snippet, we see that:
- We define how to access our Rancher server via environment variables
- We locate the docker-compose.yml file, or exit the job with an error
- We locate the rancher-compose.yml file, or exit the job with an error
- We run rancher-compose, telling it to detach rather than block and
  output logs (-d) and to upgrade a service if it already exists
  (--upgrade)
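
For readers who prefer to see the same pre-flight checks outside of
shell, the steps above can be sketched in Python. This is only an
illustrative port, not what our Jenkins job runs; the function names
find_compose_dir and build_rancher_compose_command are hypothetical,
and the sketch assumes the Rancher URL and keys are already exported in
the environment.

```python
from pathlib import Path


def find_compose_dir(candidates):
    """Return the first directory that contains a docker-compose.yml."""
    for directory in candidates:
        if (Path(directory) / "docker-compose.yml").is_file():
            return Path(directory)
    raise FileNotFoundError("No docker-compose.yml found. Can't continue!")


def build_rancher_compose_command(docker_dir):
    """Build the rancher-compose argument list for `docker_dir`."""
    docker_dir = Path(docker_dir)
    rancher_file = docker_dir / "rancher-compose.yml"
    # Fail early, like the shell version, if the supplemental file is missing.
    if not rancher_file.is_file():
        raise FileNotFoundError("No rancher-compose.yml found. Can't continue!")
    return [
        "/usr/local/bin/rancher-compose", "--verbose",
        "-f", str(docker_dir / "docker-compose.yml"),
        "-r", str(rancher_file),
        "up", "-d", "--upgrade",
    ]


if __name__ == "__main__":
    try:
        compose_dir = find_compose_dir(
            ["docker", "/opt/abc/dockerfiles/java-service-1"]
        )
        print(" ".join(build_rancher_compose_command(compose_dir)))
    except FileNotFoundError as err:
        raise SystemExit(str(err))
```

The returned list could be passed to subprocess.run(cmd, check=True) so
that a failed deployment fails the job, which is the behavior the shell
version gets from its exit codes.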
You can see that, for the most part, the logic has stayed the same; the
biggest difference being the use of rancher-compose instead of the
Ansible deployment playbook and the addition of a rancher-compose.yml
file for each of our services. For our java-service-1 application, the
docker-compose and rancher-compose files now look like:

docker-compose.yml

    java-service-1:
      image: registry.abc.net/java-service-1:${VERSION}
      container_name: java-service-1
      expose:
        - 8080
      ports:
        - 8080:8080

rancher-compose.yml

    java-service-1:
      scale: 3

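To show what interpolating VERSION into the compose file amounts to,
here is a minimal Python sketch that substitutes the ${VERSION}
placeholder. It only illustrates the substitution; it is not how
rancher-compose performs it internally. The render_compose helper is
hypothetical, and the version value 1.0.1 is just an example.

```python
from string import Template

COMPOSE_TEMPLATE = """\
java-service-1:
  image: registry.abc.net/java-service-1:${VERSION}
  container_name: java-service-1
  expose:
    - 8080
  ports:
    - 8080:8080
"""


def render_compose(template_text, env):
    """Substitute ${NAME} placeholders; raise KeyError if one is missing."""
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    # In the Jenkins job, VERSION comes from the job parameter.
    print(render_compose(COMPOSE_TEMPLATE, {"VERSION": "1.0.1"}))
```

Failing loudly on a missing variable is a deliberate choice in this
sketch: deploying an image with an empty tag is harder to diagnose than
a job that stops before it starts.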
With our deployment jobs created, let’s review our deployment workflow.
- A developer makes a change to code and pushes that change to git
- Jenkins begins unit testing the code and notifies a downstream job
  on success
- The downstream job builds and pushes a Docker image with the new
  code artifact to our private Docker registry
- A deployment ticket is created with the application and version
  number to be deployed to an environment

      DEPLOY-111:
        App: JavaService1, branch "release/1.0.1"
        Environment: Production

- The deployment engineer runs the deployment Jenkins job for the
  application, providing the version number as a parameter
- Rancher Compose runs, either creating or upgrading the stack in the
  environment and, after the desired scale has been reached, concludes
  the job
- The deployment engineer and developer verify the service manually
- The deployment engineer confirms the upgrade in the Rancher UI
Key Takeaways
With Rancher managing our service deployments, we benefit from
built-in scheduling, scaling, healing, upgrades, and rollbacks for very
little effort on our part. Also, the migration from an Ansible
deployment to Rancher was minimal, only requiring the addition of a
rancher-compose.yml. However, having Rancher handle the scheduling of
our containers means it becomes harder for us to keep track of where our
applications are running. For example, since we no longer decide where
the java-service-1 application runs, a load balancer for that
application cannot rely on a static IP for the backend. We need to give
our applications a way to discover each other.

Lastly, in our java-service-1 application, we are exposing and
explicitly binding port 8080 to the Docker host running our container.
If another service binding that same port were to be scheduled on the
same host, it would fail to start. A person making scheduling decisions
can easily work around this. However, we need to inform our scheduler
to avoid this scenario.

In the last part of our series, we will explore the ways we mitigated
these new pain points through the use of affinity rules, host labels,
service discovery, and smarter upgrades and rollbacks. Go to Part 4>>

In the meantime, please download a free copy of “Continuous Integration
and Deployment with Docker and Rancher”, a detailed eBook that walks
through leveraging containers throughout your CI/CD process.