NeuVector Prometheus exporter target down with error "context deadline exceeded"

This document (000021520) is provided subject to the disclaimer at the end of this document.

Environment

NeuVector
NeuVector Prometheus and exporter

Situation

To scrape NeuVector metrics, Prometheus and the NeuVector exporter must be deployed alongside the NeuVector pods, following the guidelines mentioned here.
After the exporter and Prometheus pods are deployed, the nv-exporter target is reported as down with the following error:
[Screenshot "Capture (2).PNG": nv-exporter target down with error "context deadline exceeded"]

Resolution

"context deadline exceeded" is a generic error that can have several causes. In this case, it indicates that Prometheus could not scrape metrics from the exporter service's "/metrics" endpoint within the configured timeout.

This can happen if:

The Prometheus and nv-exporter deployments have very low resource requests/limits set.
The neuvector-controller-pods are under high load [CPU/memory consumption, performance issues]
The cluster nodes are heavily loaded [high CPU consumption, load average, memory, etc.]
The cluster nodes are experiencing network latency
DNS resolution of the Kubernetes service is affected [check the CNI pod logs or run curl/wget tests from the pod and node]
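
The checks listed above can be sketched with a few kubectl calls. This is a minimal sketch, not an official procedure: the Deployment name `prometheus-deployment` is an assumption (adjust it to your manifests), the helper function names are hypothetical, and `kubectl top` only works if metrics-server is installed in the cluster.

```shell
NS="${NS:-neuvector}"                               # namespace used in this article
PROM_DEPLOY="${PROM_DEPLOY:-prometheus-deployment}" # assumed Deployment name
SVC="${SVC:-neuvector-svc-prometheus-exporter}"     # exporter Service name from this article

# Print the resource requests/limits of each container in a Deployment.
show_resources() {
  kubectl -n "$NS" get deployment "$1" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'
}

# Print current CPU/memory usage of nodes and NeuVector pods (needs metrics-server).
show_usage() {
  kubectl top nodes
  kubectl -n "$NS" top pods
}

# Confirm that the exporter Service has endpoints behind it.
show_endpoints() {
  kubectl -n "$NS" get endpoints "$SVC"
}
```

For example, `show_resources "$PROM_DEPLOY"` prints the requests and limits of the Prometheus containers; pass the exporter's Deployment name instead to inspect the exporter.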

To confirm, open a shell in the prometheus-deployment pod and check whether the endpoint is reachable:

# kubectl -n neuvector exec -it prometheus-deployment-xxxxx -- sh

time wget --spider http://neuvector-svc-prometheus-exporter.neuvector.svc.cluster.local:8068/metrics

If the command prints "remote file exists", the connection succeeded. Because the command is prefixed with `time`, the output also shows how long the request to the endpoint took.
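
To see how the response time varies, the same check can be repeated a few times from inside the pod. The following is a minimal sketch, assuming a POSIX shell with `date` and a wget build that supports `-q`, `-T` and `-O`; the function names `fetch_seconds` and `max_of` are hypothetical. It downloads the response body instead of using `--spider`, because a Prometheus scrape also reads the full response.

```shell
# Exporter URL as used in this article; override via the URL environment variable.
URL="${URL:-http://neuvector-svc-prometheus-exporter.neuvector.svc.cluster.local:8068/metrics}"

# Print how many whole seconds one fetch of "$1" takes (wget gives up after 60s).
# The function's exit status is wget's exit status.
fetch_seconds() {
  start=$(date +%s)
  wget -q -T 60 -O /dev/null "$1"
  rc=$?
  end=$(date +%s)
  echo $((end - start))
  return $rc
}

# Print the largest of the non-negative integer arguments.
max_of() {
  max=0
  for n in "$@"; do
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  echo "$max"
}

# Usage (run inside the pod): slowest of three fetches, in seconds.
# max_of "$(fetch_seconds "$URL")" "$(fetch_seconds "$URL")" "$(fetch_seconds "$URL")"
```

The slowest observed time is the more useful number when choosing a timeout, since a single fast response can hide intermittent slowness.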

If the request takes longer than 10 seconds, add the "scrape_timeout" parameter to the nv-exporter job in the Prometheus configuration and raise its value.
By default, "scrape_timeout" is 10 seconds. Set it to at least the time observed in the wget test, as shown below. Prometheus rejects a configuration in which "scrape_timeout" is greater than "scrape_interval", so raise "scrape_interval" as well if needed:

scrape_configs:
  - job_name: prometheus
    scrape_interval: 10s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: nv-exporter
    scrape_interval: 60s                # must not be lower than scrape_timeout
    scrape_timeout: 35s                 # add this line in the Prometheus ConfigMap
    static_configs:
      - targets: ["neuvector-svc-prometheus-exporter.neuvector:8068"]
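
One way to apply the configuration change is sketched below. The ConfigMap name `prometheus-cm` and the Deployment name `prometheus-deployment` are assumptions, and `apply_scrape_change` is a hypothetical helper; check the actual names with `kubectl -n neuvector get configmap,deployment` first. The restart makes the new pod start with the edited configuration.

```shell
NS="${NS:-neuvector}"                      # namespace used in this article
CM="${CM:-prometheus-cm}"                  # assumed ConfigMap name
DEPLOY="${DEPLOY:-prometheus-deployment}"  # assumed Deployment name

# Edit the Prometheus ConfigMap interactively, then restart the Deployment
# and wait until the new pod is ready.
apply_scrape_change() {
  kubectl -n "$NS" edit configmap "$CM" &&
  kubectl -n "$NS" rollout restart deployment "$DEPLOY" &&
  kubectl -n "$NS" rollout status deployment "$DEPLOY"
}
```

After the rollout completes, the nv-exporter target on the Prometheus targets page should show the new timeout taking effect on the next scrape.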

Note: This issue is environment- and cluster-specific and can be caused by any of the factors mentioned in this article. If other issues are experienced, please open a support case with SUSE Support.

Cause

The "context deadline exceeded" error in Prometheus usually means that an operation took too long and exceeded the allowed timeout; here, the operation is the scrape of the exporter's "/metrics" endpoint. This can happen for several reasons:

Long or Complex Queries: If a query is too complex or returns a large amount of data, it can exceed the timeout.
High Load: High load on the Prometheus server can cause delays in processing queries.
Resource Constraints: Limited CPU or memory resources can slow down query execution.
Network Latency: If Prometheus is querying data from remote endpoints, network latency can contribute to this error.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

Document ID: 000021520
Creation Date: 01-Aug-2024
Modified Date: 02-Aug-2024
- SUSE NeuVector
