Prometheus Metric Federation with Thanos
Prometheus is a CNCF graduated project for monitoring and alerting, and one of the most widely used monitoring tools in the Kubernetes ecosystem. Rancher users can get started with Prometheus quickly through the built-in monitoring stack.
Prometheus stores its metrics in a time series database on local disk. Local storage is bounded by the size of that disk, which limits how much metric data, and therefore how much history, Prometheus can retain.
Prometheus, however, can integrate with remote systems for writing and reading metrics through the remote_write and remote_read configuration directives, and it supports a wide range of remote endpoints and storage integrations.
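For readers new to these directives, here is a minimal sketch of how they look in a standalone prometheus.yml. The URLs and the name are hypothetical placeholders; the actual endpoint paths depend on the remote system you integrate with. With Rancher monitoring you do not edit this file by hand, because the Prometheus operator generates it from the chart values.

```yaml
# prometheus.yml (fragment). Hypothetical endpoints; substitute your remote system's URLs.
remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"
    # Optional, but must be unique among remote_write entries if set.
    name: example-remote-write

remote_read:
  - url: "https://remote-storage.example.com/api/v1/read"
    # false (the default): skip remote reads for time ranges
    # that local storage should already fully cover.
    read_recent: false
```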
In this blog, we will explore a quick and easy way to integrate Rancher monitoring with a remote endpoint: Thanos receive. Thanos is an open source, highly available Prometheus setup with long-term metric storage capabilities. You can use this solution to federate metrics across all of your Prometheus instances and run central Grafana dashboards on top of Thanos.
Important Note: As part of metric federation, project and cluster metrics will leave the boundaries of the Rancher management plane. Cluster administrators must ensure that appropriate access control mechanisms are in place to restrict access to this metric store.
Installing Thanos
For this blog, we can set up Thanos on a Kubernetes cluster quickly using kube-thanos.
We will need the following components:
- Thanos store gateway
- Thanos receiver
- Thanos querier
- Object storage
The solution will look something like this:
       Tenant's Premise       |       Provider Premise
                              |
                              |                                +----------------------+
                              |         +--------------------->|    Object Storage    |
                              |         |                      +----------+-----------+
                              |         | S3 API                          ^
                              |         |                                 | S3 API
                              |         |                      +----------+-----------+
                              |         |                      | Thanos Store Gateway |
                              |         |                      +----------+-----------+
                              |         |                                 ^
                              |         |                                 | Store API
+------------+ Remote Write   |   +-----+-----------+ Store API   +-------+--------+
| Prometheus +------------------->| Thanos Receiver |<------------+ Thanos Querier |
+------------+                |   +-----------------+             +-------+--------+
                              |                                           ^
+------------+                |              PromQL                       |
|    User    +------------------------------------------------------------+
+------------+                |
Source: Thanos receive proposal
Thanos supports several object storage configurations.
We will use MinIO, an S3-compatible object store, as our object storage. Define the object storage configuration in a secret named thanos-objectstorage, under the key thanos.yaml; the secret must be in the same namespace as your Thanos deployment.
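The thanos-config.yaml file looks like this:
# Thanos object storage configuration for an S3-compatible store (MinIO).
# Replace the ${...} placeholders with your MinIO details.
type: s3
config:
  bucket: thanos
  # host:port of the MinIO service, without an http:// or https:// scheme.
  endpoint: ${minio-endpoint}
  access_key: ${minio-access-key}
  secret_key: ${minio-secret-key}
  # true connects over plain HTTP; set to false if MinIO serves TLS.
  insecure: true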
You can then create the secret in the Thanos namespace (thanos in this example; adjust it to match your deployment):
kubectl create secret generic thanos-objectstorage --namespace thanos --from-file=thanos.yaml="$PATH_TO_CONFIG"/thanos-config.yaml
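If you prefer to manage the secret declaratively, a roughly equivalent Kubernetes manifest is sketched below. It assumes the thanos namespace and keeps the same secret name and thanos.yaml key as the kubectl command; the ${...} values remain placeholders for your MinIO details.

```yaml
# Sketch of a declarative equivalent of the kubectl command above.
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos   # assumption: the namespace of your Thanos deployment
type: Opaque
stringData:
  # stringData takes plain text; Kubernetes stores it base64-encoded under data.
  thanos.yaml: |
    type: s3
    config:
      bucket: thanos
      endpoint: ${minio-endpoint}
      access_key: ${minio-access-key}
      secret_key: ${minio-secret-key}
      insecure: true
```

Once the placeholders are filled in, the manifest contains your MinIO credentials, so avoid committing it to version control.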
Configuring Rancher Monitoring
The Prometheus version packaged with the Rancher monitoring operator already supports remote_read and remote_write integrations.
The extra settings for the Thanos receive endpoint can be passed through the advanced options when installing Rancher monitoring. We specify a name for the remote_write entry, which the Prometheus remote_write specification requires to be unique among remote write configs:
# Name of the remote write config, which if specified must be unique among remote write configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote write configs.
[ name: <string> ]
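As a sketch, the advanced YAML for the monitoring chart might contain a remoteWrite entry like the one below. It assumes that rancher-monitoring exposes the kube-prometheus-stack value prometheus.prometheusSpec.remoteWrite, and the receiver URL is a hypothetical placeholder. Thanos receive accepts remote write requests on /api/v1/receive (on port 19291 by default), but how you expose it to the monitored clusters (Ingress, load balancer, or an in-cluster service) depends on your setup.

```yaml
# Hypothetical rancher-monitoring values fragment (kube-prometheus-stack layout assumed).
prometheus:
  prometheusSpec:
    remoteWrite:
      # Placeholder URL for the Thanos receive remote-write endpoint.
      - url: https://thanos-receive.example.com/api/v1/receive
        # Unique name for this remote write config; it appears in
        # Prometheus remote-write metrics and logs.
        name: thanos-receive
```

If Thanos runs in the same cluster as Prometheus, an in-cluster service address on port 19291 could be used instead; kubectl get svc -n thanos lists the services your Thanos deployment created (assuming the thanos namespace).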
Once you’ve deployed monitoring, you should be able to see your metrics using the Thanos querier.
The stored metrics will also be available in the object storage.
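Because the Thanos querier serves the Prometheus HTTP query API, Grafana can use it as an ordinary Prometheus data source for the central dashboards mentioned in the introduction. Below is a minimal sketch of a file for Grafana's provisioning/datasources directory. The querier service name, namespace, and port are assumptions; check them with kubectl get svc -n thanos.

```yaml
# Hypothetical Grafana data source provisioning file, e.g. provisioning/datasources/thanos.yaml
apiVersion: 1
datasources:
  - name: Thanos
    # The querier speaks the Prometheus query API, so the built-in prometheus type works.
    type: prometheus
    # Grafana's backend, not the browser, sends the queries.
    access: proxy
    # Assumed in-cluster address of the Thanos querier service.
    url: http://thanos-query.thanos.svc.cluster.local:9090
    isDefault: true
```

Pointing Grafana at the querier rather than at individual Prometheus instances is what provides the global view: the querier fans each PromQL query out to both the receiver and the store gateway.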
Since all cluster metrics will be available in this Thanos installation, the owner of the workloads needs to ensure that access to the workloads and their metrics is appropriately secured.
Conclusion
Using the remote write capability of Rancher monitoring (Prometheus) together with the Thanos receiver, you can achieve long-term metric storage and a global view of metrics across multiple clusters in a few easy steps.