Introduction

ceph-medic is a very simple tool that runs against a Ceph cluster to detect common issues that might prevent correct functionality. It requires non-interactive SSH access to accounts that can sudo without a password prompt.

Usage

The basic usage of ceph-medic is to perform checks against a ceph cluster to identify potential issues with its installation or configuration. To do this, run the following command:

ceph-medic --inventory /path/to/hosts --ssh-config /path/to/ssh_config check

Inventory

ceph-medic needs to know the nodes that exist in your ceph cluster before it can perform checks. The inventory (or hosts file) is a typical Ansible inventory file and will be used to inform ceph-medic of the nodes in your cluster and their respective roles. The following standard host groups are supported by ceph-medic: mons, osds, rgws, mdss, mgrs and clients. An example hosts file would look like:

[mons]
mon0
mon1

[osds]
osd0

[mgrs]
mgr0

The location of the hosts file can be passed to ceph-medic with the --inventory CLI option (e.g., ceph-medic --inventory /path/to/hosts).

If the --inventory option is not defined, ceph-medic will first look in the current working directory for a file named hosts. If the file does not exist, it will look for /etc/ansible/hosts to be used as the inventory.

Note

Defining the inventory location is also possible via the config file under the [global] section.
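For example, mirroring the key format used for --log-path later in this document (treat the exact key name as an assumption and verify it against your version of ceph-medic):

[global]
--inventory = /path/to/hosts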

Inventory for Containers

Containerized deployments are also supported via docker and podman. As with bare-metal deployments, an inventory file is required. If the cluster was deployed with ceph-ansible, you may reuse that existing inventory, as shown in the example below.

To configure ceph-medic to connect to a containerized cluster, the global section of the configuration must set deployment_type to either docker or podman. For example:

[global]
deployment_type = podman
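
If the cluster was deployed with ceph-ansible, that inventory can be reused directly when running checks; for instance (the path below is only a placeholder):

ceph-medic --inventory /path/to/ceph-ansible/hosts check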

Inventory for Container Platforms

Both kubernetes and openshift platforms host containers on remote nodes, but allow ceph-medic to connect and retrieve information from a central location. To configure ceph-medic to connect to a platform, the global section of the configuration must set deployment_type to either kubernetes, which uses the kubectl command, or openshift, which uses the oc command. For example:

[global]
deployment_type = openshift

When using openshift or kubernetes as a deployment type, there is no requirement to define a hosts file. The hosts are generated dynamically by calling out to the platform and retrieving the pods. Once the pods are identified, they are grouped by daemon type (osd, mgr, rgw, mon, etc.).
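
Because the host list is discovered through kubectl or oc, a check against a container platform can run without an inventory file or SSH configuration. A minimal sketch, assuming kubectl (or oc) is already configured to reach the cluster:

[global]
deployment_type = kubernetes

followed simply by:

ceph-medic check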

SSH Config

All nodes in your hosts file must be configured to provide non-interactive SSH access to accounts that can sudo without a password prompt.

Note

This is the same SSH configuration required by Ansible. If you’ve used ceph-ansible to deploy your cluster, your nodes are most likely already configured for this type of SSH access. If that is the case, it is easiest to use the same user that performed the initial deployment.

To provide your SSH config, use the --ssh-config flag and give it a path to a file that defines your SSH configuration. For example, a file like the following is used to connect to a cluster composed of Vagrant VMs:

Host mon0
  HostName 127.0.0.1
  User vagrant
  Port 2200
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/andrewschoen/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL

Host osd0
  HostName 127.0.0.1
  User vagrant
  Port 2201
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no
  PasswordAuthentication no
  IdentityFile /Users/andrewschoen/.vagrant.d/insecure_private_key
  IdentitiesOnly yes
  LogLevel FATAL
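
To confirm that a node meets these requirements before running checks, non-interactive SSH and passwordless sudo can be tested directly; a quick sketch (the host name and config path are placeholders):

ssh -F /path/to/ssh_config mon0 'sudo -n true' && echo 'non-interactive sudo OK'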

Note

SSH configuration is not needed when using kubernetes or openshift as the deployment type.

Logging

By default, ceph-medic writes a complete log to the current working directory. This log file is more verbose than the output displayed on the terminal. To change where these logs are created, modify the default value of --log-path in ~/.cephmedic.conf.

Running checks

To perform checks against your cluster, use the check subcommand. This runs a series of general checks as well as checks specific to each daemon. Sample output from this command looks like:

ceph-medic --ssh-config vagrant_ssh_config check
Host: mgr0                  connection: [connected  ]
Host: mon0                  connection: [connected  ]
Host: osd0                  connection: [connected  ]
Collection completed!

=======================  Starting remote check session  ========================
Version: 0.0.1    Cluster Name: "test"
Total hosts: [3]
OSDs:    1    MONs:    1     Clients:    0
MDSs:    0    RGWs:    0     MGRs:       1

================================================================================

---------- managers ----------
 mgr0

------------ osds ------------
 osd0

------------ mons ------------
 mon0

17 passed, 0 errors, on 4 hosts

Logging can also be configured in the cephmedic.conf file, under the [global] section:

[global]
--log-path = .

To ensure that cluster checks run properly, at least one monitor node should have administrative privileges.
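
One quick way to confirm this is to run a read-only status command on a monitor node and check that it can reach the cluster; for example (the host name and config path are placeholders):

ssh -F /path/to/ssh_config mon0 'sudo ceph health'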