NeuVector prometheus exporter target down with error "context deadline exceeded"
This document (000021520) is provided subject to the disclaimer at the end of this document.
Environment
- NeuVector
- NeuVector Prometheus and exporter
Situation
After deploying the exporter and Prometheus pods, the nv-exporter target is shown as down in Prometheus with the error "context deadline exceeded".
Resolution
The "context deadline exceeded" error is generic and can have several causes. In this case, it indicates that Prometheus could not scrape metrics from the exporter service on the "/metrics" endpoint within the scrape timeout.
This can happen if:
- The Prometheus and nv-exporter deployments have very low resource requests/limits set.
- The neuvector-controller-pods are under high load [CPU/memory consumption, performance issues]
- The cluster nodes are loaded [high CPU consumption, load average, memory, etc.]
- The cluster nodes are experiencing network latency
- DNS resolution to the Kubernetes service is affected [check the CNI pod logs or run curl/wget tests from the pod and node]
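The first three items in the list above can be checked from a workstation with cluster access. The following is a minimal sketch, not an official procedure: the "neuvector" namespace is taken from this article, the function names are illustrative, and `kubectl top` only works if metrics-server is installed in the cluster.

```shell
# Sketch only: function names are illustrative; NS defaults to the
# namespace used in this article.
NS="${NS:-neuvector}"

# Print the CPU/memory requests and limits configured on each deployment.
nv_show_resources() {
  kubectl -n "$NS" get deployments \
    -o custom-columns='NAME:.metadata.name,REQUESTS:.spec.template.spec.containers[*].resources.requests,LIMITS:.spec.template.spec.containers[*].resources.limits'
}

# Print current pod and node usage (requires metrics-server).
nv_show_usage() {
  kubectl -n "$NS" top pods
  kubectl top nodes
}
```

Running `nv_show_resources` followed by `nv_show_usage` gives a quick view of whether the exporter, Prometheus, or controller pods are running close to their limits.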
To verify, open a shell in the prometheus-deployment pod and check whether the endpoint is reachable:
# kubectl -n neuvector exec -it prometheus-deployment-xxxxx -- sh
# time wget --spider http://neuvector-svc-prometheus-exporter.neuvector.svc.cluster.local:8068/metrics
If the wget command reports "remote file exists", the connection was successful. Because the command is prefixed with `time`, the output also shows how long the request to the endpoint took.
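A single measurement can be misleading when the load varies, so it may help to repeat the test a few times and look at the slowest attempt. The helper below is a sketch under assumptions: `probe_endpoint` is an illustrative name, it is meant to be pasted into the shell inside the pod, and it only measures whole seconds because it relies on `date +%s`.

```shell
# Sketch only: probe_endpoint is an illustrative helper, not a NeuVector tool.
# Usage: probe_endpoint URL [ATTEMPTS]
probe_endpoint() {
  url=$1
  attempts=${2:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    start=$(date +%s)
    # --spider checks that the resource exists without saving it.
    if wget --spider "$url" >/dev/null 2>&1; then
      result=ok
    else
      result=failed
    fi
    end=$(date +%s)
    echo "attempt $i: $result in $((end - start))s"
    i=$((i + 1))
  done
}
```

For example, `probe_endpoint http://neuvector-svc-prometheus-exporter.neuvector.svc.cluster.local:8068/metrics 5` prints one line per attempt with the elapsed time in seconds.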
If the request takes more than 10 seconds, add the "scrape_timeout" parameter to the Prometheus configuration and increase its value.
By default, "scrape_timeout" is set to 10 seconds. Set it to at least the time observed in the wget test above. Prometheus rejects a configuration in which "scrape_timeout" is greater than "scrape_interval", so raise "scrape_interval" as well if necessary:
scrape_configs:
  - job_name: prometheus
    scrape_interval: 10s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: nv-exporter
    scrape_interval: 60s   # must be greater than or equal to scrape_timeout
    scrape_timeout: 35s    # add this line in the Prometheus ConfigMap
    static_configs:
      - targets: ["neuvector-svc-prometheus-exporter.neuvector:8068"]
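One possible way to apply the change is sketched below. It rests on assumptions that should be checked against your cluster: the deployment is assumed to be named "prometheus-deployment" (inferred from the pod name used in this article), and the ConfigMap name is a placeholder because it is not stated here; `kubectl -n neuvector get configmaps` lists the candidates.

```shell
# Sketch only. NS follows this article; DEPLOYMENT is an assumed name and
# CONFIGMAP is a placeholder to replace with the name used in your cluster.
NS="${NS:-neuvector}"
DEPLOYMENT="${DEPLOYMENT:-prometheus-deployment}"
CONFIGMAP="${CONFIGMAP:-<prometheus-configmap-name>}"

# Open the ConfigMap in an editor, then restart Prometheus so that it
# starts with the updated configuration, and wait for the rollout.
nv_apply_scrape_config() {
  kubectl -n "$NS" edit configmap "$CONFIGMAP" &&
  kubectl -n "$NS" rollout restart deployment "$DEPLOYMENT" &&
  kubectl -n "$NS" rollout status deployment "$DEPLOYMENT"
}
```

After the rollout completes, the nv-exporter target should return to the "up" state on the Prometheus targets page once the next scrape succeeds.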
Note: This issue is environment/cluster-specific and can be caused by any of the factors mentioned in this article. If any other issues are experienced, please open a support case with SUSE Support.
Cause
The "context deadline exceeded" error in Prometheus usually means that an operation, such as a query or a target scrape, took too long to complete and exceeded the allowed timeout. This can happen for several reasons:
- Long or Complex Queries: If a query is too complex or returns a large amount of data, it can exceed the timeout.
- High Load: High load on the Prometheus server can cause delays in processing queries.
- Resource Constraints: Limited CPU or memory resources can slow down query execution.
- Network Latency: If Prometheus is querying data from remote endpoints, network latency can contribute to this error.
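Prometheus records a `scrape_duration_seconds` series and an `up` series for every target, which can help show how close the scrapes are to the timeout. The sketch below queries them through the Prometheus HTTP API. It assumes the API is reachable on localhost:9090 (for example through a `kubectl port-forward` to the Prometheus pod), that `curl` is available, and that the job is named "nv-exporter" as in this article; `prom_query` is an illustrative helper name.

```shell
# Sketch only: assumes the Prometheus API is reachable at PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query against the HTTP API and print the JSON reply.
# -G sends the URL-encoded data as a query string on a GET request.
prom_query() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query 'scrape_duration_seconds{job="nv-exporter"}'` returns the duration of the most recent scrape, and `prom_query 'up{job="nv-exporter"}'` returns 1 when the last scrape succeeded and 0 when it failed.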
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021520
- Creation Date: 01-Aug-2024
- Modified Date: 02-Aug-2024
- SUSE NeuVector
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com