Monitoring Your SUSE Virtualization Environment

Share
Share

In today’s dynamic IT landscapes, gaining deep visibility into your hybrid cloud infrastructure is key to ensuring optimal performance and rapid troubleshooting of your virtual machines (VMs). If you’re looking for ways to modernize your infrastructure and reduce costs associated with solutions like VMware and Nutanix, this guide will walk you through how to set up comprehensive monitoring for your SUSE Virtualization environment, which offers containerized workloads in datacenters, public clouds, AI, and at the edge. From installing the SUSE Observability Agent to creating custom dashboards and scraping specialized KubeVirt metrics, you’ll gain insights into how to operate VMs in a cloud-native virtualization environment.

Why Monitor SUSE Virtualization?

Monitoring your SUSE Virtualization cluster allows you to:

  • Gain visibility: Track the performance of your virtual machines (VMs) and underlying Kubernetes infrastructure.
  • Troubleshoot quickly: Identify issues before they impact production.
  • Customize insights: Create tailored views and dashboards focusing on the metrics that matter most to your operations.

While you can monitor your cluster using the SUSE Observability Agent as you would with any standard Kubernetes setup, integrating additional KubeVirt metrics offers deeper insights into your virtualization environment.

Prerequisites

Before you start, ensure you have the following in place:

  • SUSE Observability Instance: A running instance to collect and display your metrics.
  • SUSE Virtualization Cluster: Your virtualization environment must be up and running.
  • Environment Configuration: Make sure your .env file is correctly configured (refer to the main README for setup details).

Quick Connectivity Check

Confirm that you can connect to your cluster by running:

cd suse/virtualization/vmi-metrics
eval $(task common:shell-env)
kubectl get nodes

Step 1: Installing the SUSE Observability Agent

 

Create a StackPack Instance

The first step involves setting up a Kubernetes StackPack instance. The instance name is determined by your CLUSTER_NAME environment variable. Simply run:

task install-stackpack-instance

Deploy the Agent

Deploy the SUSE Observability Agent to start collecting metrics:

task common:deploy-observability-agent

After deployment, log in to your SUSE Observability instance. You should now see your cluster synced and begin receiving data.

Step 2: Creating Custom Views for Virtual Machines

 

SUSE Observability provides the flexibility to create custom views so you can focus on key components of your virtualization environment. This is particularly useful for isolating data on virtual machines and their controllers.

Define Custom Views

Using STQL queries, you can filter the data for different components:

  • VirtControllers:
    (label IN ("kubevirt.io:virt-control") AND type = "pod")
    
  • VirtualMachines:
    (label IN ("kubevirt.io:virt-launcher") AND type = "pod")
    

💡 Pro Tip: Star a view to pin it to the left menu for quick access.

For more details, check out the custom view documentation.

Step 3: Scraping KubeVirt Metrics with OpenTelemetry

 

Your SUSE Virtualization environment exposes metrics through a Kubernetes service called kubevirt-prometheus-metrics. These metrics can be scraped and sent to SUSE Observability using the OpenTelemetry collector.

For a complete list of available metrics, refer to the KubeVirt metrics documentation.

Sample OpenTelemetry Collector Configuration

Below is a sample configuration for the Prometheus receiver to scrape KubeVirt metrics:

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
        - job_name: 'kubevirt-metrics'
          scrape_interval: 30s
          tls_config:
            insecure_skip_verify: true
          scheme: https
          kubernetes_sd_configs:
            - role: endpoints
              selectors:
                - role: endpoints
                  field: "metadata.name=kubevirt-prometheus-metrics"

Deploy the preconfigured SUSE Observability OpenTelemetry collector with:

task deploy-kubevirt-otel-collector

Once deployed, navigate to the Metrics page in SUSE Observability. All collected KubeVirt metrics begin with the prefix kubevirt_.

KubeVirt Metrics

Step 4: Adding Custom Dashboard Charts

 

Visualizing your VM metrics is essential for effective monitoring. SUSE Observability allows you to create custom charts that can be adapted from pre-existing Grafana dashboards, such as the Harvester Grafana Dashboard and the KubeVirt Monitoring Dashboard.

Sample Charts Overview

Below are examples of charts you might consider, grouped by their function:

Control Plane Charts

Name PromQL Query
VMI Creation Time histogram_quantile(0.95, sum(rate(kubevirt_vmi_phase_transition_time_from_creation_seconds_bucket{instance=~"$instance"}[5m])) by (phase, le))
VMI Start Rate sum(rate(kubevirt_vmi_phase_transition_time_from_creation_seconds_count{phase="Running", instance=~"$instance"}[5m])) by (instance)
VMI Phase Transition Latency histogram_quantile(0.95, sum(rate(kubevirt_vmi_phase_transition_time_seconds_bucket{phase="Failed", instance=~"$instance"}[5m])) by (le,phase))
VMI Count (approx.) sum(increase(kubevirt_vmi_phase_transition_time_from_creation_seconds_count{phase="Running", instance=~"$instance"}[20m])) by (instance)
Work Queue – Add Rate sum(rate(kubevirt_workqueue_adds_total{job=~".*kubevirt.*", instance=~"$instance"}[1m])) by (instance, name)
Work Queue – Depth kubevirt_workqueue_depth{job=~".*kubevirt.*", instance=~"$instance"}
Work Queue – Queue Duration histogram_quantile(0.99, sum(rate(kubevirt_workqueue_queue_duration_seconds_bucket{job=~".*kubevirt.*", instance=~"$instance"}[1m])) by (instance, name, le))
Work Queue – Work Duration histogram_quantile(0.99, sum(rate(kubevirt_workqueue_work_duration_seconds_bucket{job=~".*kubevirt.*", instance=~"$instance"}[1m])) by (instance, name, le))
Work Queue – Unfinished Work kubevirt_workqueue_unfinished_work_seconds{job=~".*kubevirt.*", instance=~"$instance"}
Work Queue – Retry Rate rate(kubevirt_workqueue_retries_total{instance=~"$instance"}[1m])
Work Queue – Longest Running Processor kubevirt_workqueue_longest_running_processor_seconds{job=~".*kubevirt.*", instance=~"$instance"}

Virtual Machine Instance Charts

Name PromQL Query
CPU Usage topk(${count}, (avg(rate(kubevirt_vmi_vcpu_seconds[5m])) by (domain, name)))
Memory Usage topk(${count}, ((kubevirt_vmi_memory_available_bytes - kubevirt_vmi_memory_unused_bytes) / kubevirt_vmi_memory_available_bytes))
Storage Read Traffic Bytes topk(${count}, (irate(kubevirt_vmi_storage_read_traffic_bytes_total[5m])))
Storage Write Traffic Bytes topk(${count}, (irate(kubevirt_vmi_storage_write_traffic_bytes_total[5m])))
Network Receive Bits topk(${count}, (irate(kubevirt_vmi_network_receive_bytes_total[5m])*8))
Network Transmit Bits topk(${count}, (irate(kubevirt_vmi_network_transmit_bytes_total[5m])*8))
Network Receive Packets (2h) topk(${count}, (delta(kubevirt_vmi_network_receive_packets_total[2h])))
Network Transmit Packets (2h) topk(${count}, (delta(kubevirt_vmi_network_transmit_packets_total[2h])))
IO Traffic irate(kubevirt_vmi_storage_write_traffic_bytes_total{namespace="$namespace", name="$vm"}[5m])
IO Time irate(kubevirt_vmi_storage_write_times_ms_total{namespace="$namespace", name="$vm"}[5m])
IOPS irate(kubevirt_vmi_storage_iops_write_total{namespace="$namespace", name="$vm"}[5m])

Converting PromQL Queries for SUSE Observability

SUSE Observability uses STQL queries to bind metrics to specific topology components. The key is mapping metric tags to your component’s tags.

Example Conversion

Given the Grafana PromQL query:

kubevirt_workqueue_unfinished_work_seconds{job=~".*kubevirt.*", instance=~"$instance"}

The equivalent query for a SUSE Observability chart would be:

kubevirt_workqueue_unfinished_work_seconds{k8s_cluster_name="${tags.cluster-name}", k8s_pod_name="${name}"}

Completed Chart Definition

Here’s an example of a complete chart configuration:

- id: -120
  name: Unfinished Work
  queries:
    - expression: kubevirt_workqueue_unfinished_work_seconds{k8s_cluster_name="${tags.cluster-name}", k8s_pod_name="${name}"}
      alias: '${name}'
  scope:  type in ("pod") and label = "kubevirt.io:virt-controller"
  identifier: urn:stackpack:kubevirt:shared:metric-binding:work-queue-unfinished-work
  unit: s
  chartType: line
  priority: high
  enabled: true
  layout:
    metricPerspective:
      tab: Virtualization
      section: Work Queue
    componentSummary:
      weight: 3
  _type: MetricBinding

Applying Your Chart Definitions

 

Once you have defined your custom charts, apply them to your SUSE Observability instance with:

task upload-metric-bindings

Now, when you navigate to a virt-controller or virt-launcher pod’s metrics page, you will see the new Virtualization tab showcasing your custom dashboards.

Virtual Controller Dashboard Virtual Launcher Dashboard

By following these steps, you’ve set up an advanced monitoring environment for your SUSE Virtualization cluster. From deploying the SUSE Observability Agent and configuring custom views to scraping specialized KubeVirt metrics and creating tailored dashboards, you now have the tools to maintain peak performance and swiftly address issues within your virtual infrastructure.

Monitoring like this ensures that you capture all the critical details of your SUSE Virtualization environment, enabling you to keep a close eye on both your virtual machines and the underlying Kubernetes infrastructure. Happy monitoring! 🚀

The 258% ROI of SUSE Rancher Prime with Virtualization*

SUSE Rancher Prime with Virtualization delivers a compelling return on investment. Through IT cost savings, staff productivity gains, performance improvements and business enablement, IDC projects that interviewed organizations will achieve benefits worth an annual average of $3.4 million, which result in a three-year ROI of 258%.*

Download the full IDC paper to learn how SUSE Rancher Prime with Virtualization can transform your cloud native journey with business value.

You can find his github repo for this blog here

* IDC Business Value White Paper, sponsored by SUSE, The Business Value of SUSE Rancher Prime with Virtualization, February 2025 | IDC #US52704924

Share
(Visited 1 times, 1 visits today)
Avatar photo
301 views
rnaidoo@suse.com Ravan Naidoo is a dedicated IT professional with a rich history spanning nearly three decades. Beginning his career as a mainframe developer, he advanced through various roles including engineering, consultancy, and innovation leadership. Now a Domain Solution Architect at SUSE, Ravan leverages his extensive expertise in DevOps and Observability to help organizations navigate the evolving world of application management and cloud-native deployments.