openATTIC logs errors: Failed to run "ceph.tasks.get_rbd_performance_data"

This document (7021202) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Enterprise Storage 4

Situation

Many errors similar to the following are logged to "/var/log/openattic/openattic.log":

2017-07-28 11:22:11,151 1856 runsystemd INFO taskqueue.models#run_once - Running 2005: ceph.tasks.get_rbd_performance_data with [u'56a2cbx-56fb-3c88-83ec-49bd54gs5c28', u'pool_name', u'image_name'], {}. Estimated: None
2017-07-28 11:22:11,224 1856 runsystemd ERROR taskqueue.models#run_once - Failed to run "ceph.tasks.get_rbd_performance_data with [u'56a2cbx-56fb-3c88-83ec-49bd54gs5c28', u'pool_name', u'image_name'], {}" created "2017-07-28 11:22:07.633303"
Traceback (most recent call last):
  File "/usr/share/openattic/taskqueue/models.py", line 79, in run_once
    res = task.run_once()
  File "/usr/share/openattic/taskqueue/models.py", line 239, in run_once
    res = self.wrapper.call_now(*self.args, **self.kwargs)
  File "/usr/share/openattic/taskqueue/models.py", line 316, in call_now
    return self._orig_func(*args, **kwargs)
  File "/usr/share/openattic/ceph/tasks.py", line 74, in get_rbd_performance_data
    disk_usage = api.image_disk_usage(pool_name, image_name)
  File "/usr/share/openattic/ceph/librados.py", line 880, in image_disk_usage
    '--pool', pool_name, '--image', name, '--format', 'json'])
  File "/usr/lib64/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['rbd', 'disk-usage', '--cluster', 'ceph', '--pool', u'pool_name', '--image', u'image_name', '--format', 'json']' returned non-zero exit status 2
2017-07-28 11:22:11,228 1856 runsystemd INFO taskqueue.models#finish_task - Task finished: Command '['rbd', 'disk-usage', '--cluster', 'ceph', '--pool', u'pool_name', '--image', u'image_name', '--format', 'json']' returned non-zero exit status 2
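
The failure can be reproduced manually by running the same command that openATTIC executes (the pool and image names below are the anonymized placeholders from the log above). If the pool or image no longer exists, the command fails with a non-zero exit status:

rbd disk-usage --cluster ceph --pool pool_name --image image_name --format json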

Resolution

There are two possible solutions:

A. Run the command "oaconfig install" again from the admin node. This will also re-create all the Icinga checks.
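
For example, as root on the admin node:

oaconfig install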

B. Remove the stale configuration files from "/etc/icinga/conf.d/". The RBD (RADOS Block Device) configuration files follow this naming pattern:

cephrbd_<fsid>_<pool name>_<RBD name>.cfg

For example, if the cluster fsid is "70e9c50b-e375-37c7-a35b-1af02442b751" and the pool and image for which the errors are logged are named "testpool" and "testimage", remove the following stale file from the "/etc/icinga/conf.d/" directory:

cephrbd_70e9c50b-e375-37c7-a35b-1af02442b751_testpool_testimage.cfg
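
The file-name components can be cross-checked with standard Ceph tools. A minimal sketch, using the example names above ("testpool" and "testimage" are hypothetical):

# Print the cluster fsid
ceph fsid
# List the images that still exist in the pool
rbd ls --pool testpool
# List all RBD check configuration files, then remove the stale one
ls /etc/icinga/conf.d/cephrbd_*.cfg
rm /etc/icinga/conf.d/cephrbd_70e9c50b-e375-37c7-a35b-1af02442b751_testpool_testimage.cfg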

After removing the relevant files, restart the following two services:

systemctl restart icinga.service npcd.service
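
To verify that both services are running again:

systemctl status icinga.service npcd.service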

Cause

When pools and their images are removed from a cluster, the corresponding Icinga configuration files may not be properly removed from the Icinga configuration. openATTIC then continues to schedule performance-data tasks for images that no longer exist, and the underlying "rbd disk-usage" call fails as shown in the log above.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7021202
  • Creation Date: 15-Aug-2017
  • Modified Date: 03-Mar-2020
  • SUSE Enterprise Storage
