Canary Releases with Rancher Continuous Delivery
Rancher Continuous Delivery, available since Rancher version 2.5.x, brings the ability to perform GitOps at scale on Rancher-managed clusters. Continuous Delivery, powered by Fleet, allows users to manage the state of their clusters using a GitOps-based approach.
A canary release is a popular technique in which software developers release a new version of an application to a subset of users; based on metrics such as availability, latency or custom metrics, the release can then be scaled up to serve more users.
In this blog, we’ll explore using Continuous Delivery to perform canary releases for your application workloads.
The actual canary release will be performed by a project named Flagger. Flagger works as a Kubernetes operator. It allows users to specify a custom object that informs Flagger to watch a deployment and create additional primary and canary deployments. As part of this blog, we’ll use Flagger with Istio as the service mesh.
In a nutshell, when we create a deployment, Flagger clones it to a primary deployment and amends the service associated with the original deployment to point to this new primary deployment. The original deployment itself gets scaled down to 0.
Flagger uses Istio VirtualServices to perform the actual canary release. When a new version of the app is deployed, Flagger scales the original deployment back up to its original spec and points a canary service at it.
Now a percentage of traffic gets routed to this canary service. Based on predefined metrics, Flagger routes more and more traffic to the canary service. Once the canary passes the analysis at the maximum traffic weight, the primary deployment is recreated with the same spec as the original deployment.
Next, the VirtualService is updated to route 100 percent of traffic back to the primary service. After this traffic switch, the original deployment is scaled back down to 0 and the Flagger operator waits for and monitors subsequent deployment updates.
Get Started with Flagger and Perform a Canary Release
To get started with Flagger, we will perform the following:
- Set up monitoring and Istio
- Set up Flagger and flagger-loadtest
- Deploy a demo application and perform a canary release
1. Set Up Monitoring and Istio
To set up monitoring and istio, we will set up a couple of ClusterGroups in Continuous Delivery.

monitoring:
apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: monitoring
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      monitoring: enabled
istio:
apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: istio
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      istio: enabled
Now we’ll set up our monitoring and istio GitRepos to use these ClusterGroups.

monitoring repo:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: monitoring
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - monitoring
  - monitoring-crd
  repo: https://github.com/ibrokethecloud/core-bundles
  targets:
  - clusterGroup: monitoring
istio repo:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: istio
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - istio
  - kiali
  repo: https://github.com/ibrokethecloud/core-bundles
  targets:
  - clusterGroup: istio
To trigger the deployment, we’ll assign a cluster to these ClusterGroups by applying the desired labels, as shown below.
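As a minimal sketch, the labels can be applied with kubectl against the Fleet cluster objects (the cluster name your-cluster is a placeholder for your registered cluster in the fleet-default namespace); the same labels can also be assigned through the Rancher UI:

# Placeholder cluster name; replace with your Fleet cluster object's name
kubectl label clusters.fleet.cattle.io your-cluster -n fleet-default monitoring=enabled
kubectl label clusters.fleet.cattle.io your-cluster -n fleet-default istio=enabled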
In a few minutes, the monitoring and istio apps should be installed on the specified cluster.
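To verify, we can check that the pods have come up in the namespaces these bundles use (cattle-monitoring-system and istio-system, per the chart defaults referenced in this blog):

kubectl get pods -n cattle-monitoring-system
kubectl get pods -n istio-system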
2. Set Up Flagger and flagger-loadtest
As part of installing Flagger, we will also install flagger-loadtest to help generate requests on our workload.
Note: flagger-loadtest is only needed for this demo. In a real-world scenario, we assume that your application will serve real traffic, and Flagger will use the metrics from that traffic to drive the switch.
We will set up a ClusterGroup canary as follows:
apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: canary
  namespace: fleet-default
spec:
  selector:
    matchLabels:
      canary: enabled
Now we can set up the flagger GitRepo to consume this ClusterGroup:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: flagger
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - flagger
  - flagger-loadtest
  repo: https://github.com/ibrokethecloud/user-bundles
  targets:
  - clusterGroup: canary
As we saw earlier, to trigger the deployment we will assign the cluster to the canary ClusterGroup by applying the matching label, as shown below.
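Again, as a sketch with a placeholder cluster name:

# Placeholder cluster name, as before
kubectl label clusters.fleet.cattle.io your-cluster -n fleet-default canary=enabled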
In a few minutes, the Flagger and flagger-loadtest Helm charts will be deployed to this cluster.
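We can confirm the rollout with kubectl; the deployment name and namespaces below are inferred from the fleet.yaml shown below and the load-test webhook URL used later in this blog:

# Names inferred from this blog's fleet.yaml and webhook URL
kubectl -n istio-system get deploy flagger
kubectl -n loadtester get pods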
Note that Flagger copies all the labels and annotations from the source deployment to the canary and primary deployments. Continuous Delivery uses labels on objects to reconcile and identify which underlying Bundle they belong to. Flagger trips this up: in the default setup, Continuous Delivery will report additional primary and canary deployments that are not in the GitRepo.
To avoid this, the includeLabelPrefix setting in the Flagger Helm chart is set to dummy, which instructs Flagger to copy only labels whose prefix is dummy. This works around the Continuous Delivery reconciliation logic.
The fleet.yaml looks like this (note the diff section, which instructs Continuous Delivery to ignore the listed fields when comparing cluster state to Git, so runtime changes to the Flagger deployment are not flagged as modifications):
defaultNamespace: istio-system
helm:
  releaseName: flagger
  repo: https://flagger.app
  chart: flagger
  version: 1.6.2
  values:
    crd.create: true
    meshProvider: istio
    metricsServer: http://rancher-monitoring-prometheus.cattle-monitoring-system:9090
    includeLabelPrefix: dummy
diff:
  comparePatches:
  - apiVersion: apps/v1
    kind: Deployment
    name: flagger
    namespace: istio-system
    operations:
    - {"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}
    - {"op": "remove", "path": "/spec/template/spec/containers/0/volumeMounts"}
    - {"op": "remove", "path": "/spec/template/spec/volumes"}
With all the base services set up, we are ready to deploy our workload.
3. Deploy a Demo Application and Perform a Canary Release
Now we’ll add the canary-demo-app GitRepo to target the canary ClusterGroup:
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: canary-demo-app
  namespace: fleet-default
spec:
  branch: master
  insecureSkipTLSVerify: false
  paths:
  - canary-demo-app
  repo: https://github.com/ibrokethecloud/user-bundles
  targets:
  - clusterGroup: canary
This will trigger the deployment of the demo app to the canary-demo namespace.
▶ kubectl get deployment
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           0/0     0            0           80s
fleet-simple-app-primary   1/1     1            1           80s
The Canary object controlling the behavior of the release is as follows:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: fleet-simple-app
  namespace: canary-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fleet-simple-app
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.loadtester/
      timeout: 5s
      metadata:
        type: cmd
        cmd: "hey -z 1m -q 10 -c 2 http://fleet-simple-app-canary.canary-demo:8080"
The key item here is the load-test webhook, which generates enough traffic for Flagger to collect the metrics it needs to start switching traffic.
We should also be able to see the status of the canary object as follows:
▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Initialized   0        2021-03-22T06:25:17Z
We can now trigger a canary release by updating the GitRepo for canary-demo-app with a new version of the image for the deployment, as sketched below.
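As an illustration, the change is just an image tag bump committed to the repo. The manifest path and image name below are hypothetical placeholders; the actual values live in the deployment manifest under the canary-demo-app path:

# canary-demo-app/deployment.yaml (illustrative snippet)
spec:
  template:
    spec:
      containers:
      - name: fleet-simple-app
        image: example/fleet-simple-app:v2   # hypothetical tag, bumped from v1

Once the commit lands on the master branch, Continuous Delivery applies it and Flagger detects the change to the target deployment.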
In a few minutes, we should see the original deployment scaled up with the new image from the GitRepo. In addition, the canary object moves to a Progressing state and the weight of the canary release changes.
▶ kubectl get deploy
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           1/1     1            1           6m5s
fleet-simple-app-primary   1/1     1            1           6m5s

▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Progressing   0        2021-03-22T06:30:17Z

▶ kubectl get canary
NAME               STATUS        WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Progressing   10       2021-03-22T06:31:17Z
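A convenient way to follow the whole progression is a standard kubectl watch (nothing specific to this setup):

kubectl -n canary-demo get canary -w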
The progressing canary also corresponds to the changing weights in the Istio VirtualService.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  creationTimestamp: "2021-03-22T06:25:17Z"
  generation: 2
  managedFields:
  - apiVersion: networking.istio.io/v1alpha3
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .: {}
          k:{"uid":"6ae2a7f1-6949-484b-ab48-c385e9827a11"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        .: {}
        f:gateways: {}
        f:hosts: {}
        f:http: {}
    manager: flagger
    operation: Update
    time: "2021-03-22T06:25:17Z"
  name: fleet-simple-app
  namespace: canary-demo
  ownerReferences:
  - apiVersion: flagger.app/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Canary
    name: fleet-simple-app
    uid: 6ae2a7f1-6949-484b-ab48-c385e9827a11
  resourceVersion: "10783"
  uid: b5aaaf34-7b16-4ba9-972c-b60756943da8
spec:
  gateways:
  - mesh
  hosts:
  - fleet-simple-app
  http:
  - route:
    - destination:
        host: fleet-simple-app-primary
      weight: 90
    - destination:
        host: fleet-simple-app-canary
      weight: 10
In a bit, we should see Flagger promoting the canary release and the primary deployment being switched to the new version.
▶ kubectl get canary
NAME               STATUS      WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Promoting   0        2021-03-22T06:37:17Z

▶ kubectl get pods
NAME                                        READY   STATUS    RESTARTS   AGE
fleet-simple-app-64cd54dfd-tkk8v            2/2     Running   0          9m2s
fleet-simple-app-primary-854d4d84b5-qgfc8   2/2     Running   0          74s
This is followed by finalization of the deployment, where we should see the original deployment being scaled down.
▶ kubectl get canary
NAME               STATUS       WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Finalising   0        2021-03-22T06:38:17Z

▶ kubectl get pods
NAME                                        READY   STATUS        RESTARTS   AGE
fleet-simple-app-64cd54dfd-tkk8v            2/2     Terminating   0          9m53s
fleet-simple-app-primary-854d4d84b5-qgfc8   2/2     Running       0          2m5s

▶ kubectl get deploy
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
fleet-simple-app           0/0     0            0           15m
fleet-simple-app-primary   1/1     1            1           15m
After this, the canary object should report that the release succeeded:
▶ kubectl get canary
NAME               STATUS      WEIGHT   LASTTRANSITIONTIME
fleet-simple-app   Succeeded   0        2021-03-22T06:39:17Z
That’s it! In this blog, we’ve shown you how Continuous Delivery can leverage third-party tools like Flagger to perform canary releases for your workloads. What tools are you using for continuous delivery? Head over to the SUSE & Rancher Community and join the conversation!