Notice

This document is for a development version of Ceph.

Monitoring a Cluster

After you have a running cluster, you can use the ceph tool to monitor your cluster. Monitoring a cluster typically involves checking OSD status, monitor status, placement group status, and metadata server status.

Using the Command Line

Interactive Mode

To run the ceph tool in interactive mode, type ceph at the command line with no arguments. For example:

ceph

health
status
quorum_status
mon stat

Non-default Paths

If you specified non-default locations for your configuration or keyring when you install the cluster, you may specify their locations to the ceph tool by running the following command:

ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster’s Status

After you start your cluster, and before you start reading and/or writing data, you should check your cluster’s status.

To check a cluster’s status, run the following command:

ceph status

Alternatively, you can run the following command:

ceph -s

In interactive mode, this operation is performed by typing status and pressing Enter:

status

Ceph will print the cluster status. For example, a tiny Ceph “demonstration cluster” that is running one instance of each service (monitor, manager, and OSD) might print the following:

cluster:
  id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum a,b,c
  mgr: x(active)
  mds: cephfs_a-1/1/1 up  {0=a=up:active}, 2 up:standby
  osd: 3 osds: 3 up, 3 in

data:
  pools:   2 pools, 16 pgs
  objects: 21 objects, 2.19K
  usage:   546 GB used, 384 GB / 931 GB avail
  pgs:     16 active+clean

How Ceph Calculates Data Usage

The usage value reflects the actual amount of raw storage used. The xxx GB / xxx GB value means the amount available (the lesser number) of the overall storage capacity of the cluster. The notional number reflects the size of the stored data before it is replicated, cloned or snapshotted. Therefore, the amount of data actually stored typically exceeds the notional amount stored, because Ceph creates replicas of the data and may also use storage capacity for cloning and snapshotting.

Watching a Cluster

Each daemon in the Ceph cluster maintains a log of events, and the Ceph cluster itself maintains a cluster log that records high-level events about the entire Ceph cluster. These events are logged to disk on monitor servers (in the default location /var/log/ceph/ceph.log), and they can be monitored via the command line.

To follow the cluster log, run the following command:

ceph -w

Ceph will print the status of the system, followed by each log message as it is added. For example:

cluster:
  id:     477e46f1-ae41-4e43-9c8f-72c918ab0a20
  health: HEALTH_OK

services:
  mon: 3 daemons, quorum a,b,c
  mgr: x(active)
  mds: cephfs_a-1/1/1 up  {0=a=up:active}, 2 up:standby
  osd: 3 osds: 3 up, 3 in

data:
  pools:   2 pools, 16 pgs
  objects: 21 objects, 2.19K
  usage:   546 GB used, 384 GB / 931 GB avail
  pgs:     16 active+clean


2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available

Instead of printing log lines as they are added, you might want to print only the most recent lines. Run ceph log last [n] to see the most recent n lines from the cluster log.

Monitoring Health Checks

Ceph continuously runs various Health Checks. When a health check fails, this failure is reflected in the output of ceph status and ceph health. The cluster log receives messages that indicate when a check has failed and when the cluster has recovered.

For example, when an OSD goes down, the health section of the status output is updated as follows:

health: HEALTH_WARN
        1 osds down
        Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded

At the same time, cluster log messages are emitted to record the failure of the health checks:

2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster’s return to a healthy state:

2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy

Network Performance Checks

Ceph OSDs send heartbeat ping messages to each other in order to monitor daemon availability and network performance. If a single delayed response is detected, this might indicate nothing more than a busy OSD. But if multiple delays between distinct pairs of OSDs are detected, this might indicate a failed network switch, a NIC failure, or a layer 1 failure.

By default, a heartbeat time that exceeds 1 second (1000 milliseconds) raises a health check (a HEALTH_WARN). For example:

HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)

In the output of the ceph health detail command, you can see which OSDs are experiencing delays and how long the delays are. The output of ceph health detail is limited to ten lines. Here is an example of the output you can expect from the ceph health detail command:

[WRN] OSD_SLOW_PING_TIME_BACK: Slow OSD heartbeats on back (longest 1118.001ms)
    Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.1 [dc1,rack1] 1118.001 msec possibly improving
    Slow OSD heartbeats on back from osd.0 [dc1,rack1] to osd.2 [dc1,rack2] 1030.123 msec
    Slow OSD heartbeats on back from osd.2 [dc1,rack2] to osd.1 [dc1,rack1] 1015.321 msec
    Slow OSD heartbeats on back from osd.1 [dc1,rack1] to osd.0 [dc1,rack1] 1010.456 msec

To see more detail and to collect a complete dump of network performance information, use the dump_osd_network command. This command is usually sent to a Ceph Manager Daemon, but it can be used to collect information about a specific OSD’s interactions by sending it to that OSD. The default threshold for a slow heartbeat is 1 second (1000 milliseconds), but this can be overridden by providing a number of milliseconds as an argument.

To show all network performance data with a specified threshold of 0, send the following command to the mgr:

ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0

{
    "threshold": 0,
    "entries": [
        {
            "last update": "Wed Sep  4 17:04:49 2019",
            "stale": false,
            "from osd": 2,
            "to osd": 0,
            "interface": "front",
            "average": {
                "1min": 1.023,
                "5min": 0.860,
                "15min": 0.883
            },
            "min": {
                "1min": 0.818,
                "5min": 0.607,
                "15min": 0.607
            },
            "max": {
                "1min": 1.164,
                "5min": 1.173,
                "15min": 1.544
            },
            "last": 0.924
        },
        {
            "last update": "Wed Sep  4 17:04:49 2019",
            "stale": false,
            "from osd": 2,
            "to osd": 0,
            "interface": "back",
            "average": {
                "1min": 0.968,
                "5min": 0.897,
                "15min": 0.830
            },
            "min": {
                "1min": 0.860,
                "5min": 0.563,
                "15min": 0.502
            },
            "max": {
                "1min": 1.171,
                "5min": 1.216,
                "15min": 1.456
            },
            "last": 0.845
        },
        {
            "last update": "Wed Sep  4 17:04:48 2019",
            "stale": false,
            "from osd": 0,
            "to osd": 1,
            "interface": "front",
            "average": {
                "1min": 0.965,
                "5min": 0.811,
                "15min": 0.850
            },
            "min": {
                "1min": 0.650,
                "5min": 0.488,
                "15min": 0.466
            },
            "max": {
                "1min": 1.252,
                "5min": 1.252,
                "15min": 1.362
            },
        "last": 0.791
    },
    ...

Muting Health Checks

Health checks can be muted so that they have no effect on the overall reported status of the cluster. For example, if the cluster has raised a single health check and then you mute that health check, then the cluster will report a status of HEALTH_OK. To mute a specific health check, use the health check code that corresponds to that health check (see Health Checks), and run the following command:

ceph health mute <code>

For example, to mute an OSD_DOWN health check, run the following command:

ceph health mute OSD_DOWN

Mutes are reported as part of the short and long form of the ceph health command’s output. For example, in the above scenario, the cluster would report:

ceph health

HEALTH_OK (muted: OSD_DOWN)

ceph health detail

HEALTH_OK (muted: OSD_DOWN)
(MUTED) OSD_DOWN 1 osds down
    osd.1 is down

A mute can be removed by running the following command:

ceph health unmute <code>

For example:

ceph health unmute OSD_DOWN

A “health mute” can have a TTL (Time To Live) associated with it: this means that the mute will automatically expire after a specified period of time. The TTL is specified as an optional duration argument, as seen in the following examples:

ceph health mute OSD_DOWN 4h    # mute for 4 hours
ceph health mute MON_DOWN 15m   # mute for 15 minutes

Normally, if a muted health check is resolved (for example, if the OSD that raised the OSD_DOWN health check in the example above has come back up), the mute goes away. If the health check comes back later, it will be reported in the usual way.

It is possible to make a health mute “sticky”: this means that the mute will remain even if the health check clears. For example, to make a health mute “sticky”, you might run the following command:

ceph health mute OSD_DOWN 1h --sticky   # ignore any/all down OSDs for next hour

Most health mutes disappear if the unhealthy condition that triggered the health check gets worse. For example, suppose that there is one OSD down and the health check is muted. In that case, if one or more additional OSDs go down, then the health mute disappears. This behavior occurs in any health check with a threshold value.

Checking a Cluster’s Usage Stats

To check a cluster’s data usage and data distribution among pools, use the df command. This option is similar to Linux’s df command. Run the following command:

ceph df

The output of ceph df resembles the following:

--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    5.4 PiB  1.2 PiB  4.3 PiB   4.3 PiB      78.58
ssd     22 TiB   19 TiB  2.7 TiB   2.7 TiB      12.36
TOTAL  5.5 PiB  1.2 PiB  4.3 PiB   4.3 PiB      78.32

--- POOLS ---
POOL                         ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                         11     1  558 MiB      141  1.6 GiB      0    5.8 TiB
cephfs_meta                  13  1024  166 GiB   14.59M  499 GiB   2.74    5.8 TiB
cephfs_data                  14  1024      0 B    1.17G      0 B      0    5.8 TiB
cephfsECvol                  19  2048  2.8 PiB    1.81G  3.5 PiB  83.79    561 TiB
.nfs                         20    32  9.7 KiB       61  118 KiB      0    5.8 TiB
testbench                    71    32   12 GiB    3.14k   37 GiB      0    234 TiB
default.rgw.buckets.data     76  2048  482 TiB  132.09M  643 TiB  47.85    526 TiB
.rgw.root                    97     1  1.4 KiB        4   48 KiB      0    5.8 TiB
default.rgw.log              98   256  3.6 KiB      209  408 KiB      0    5.8 TiB
default.rgw.control          99     1      0 B        8      0 B      0    5.8 TiB
default.rgw.meta            100   128  3.8 KiB       20  194 KiB      0    5.8 TiB
default.rgw.buckets.index   101   256  4.2 MiB       33   13 MiB      0    5.8 TiB
default.rgw.buckets.non-ec  102   128  5.6 MiB       13   17 MiB      0    5.8 TiB
kubedata                    104   256   63 GiB   17.65k  188 GiB   0.03    234 TiB
kubemeta                    105   256  241 MiB      166  724 MiB      0    5.8 TiB

CLASS: Statistics for each CRUSH device class present, for example, ssd and hdd.
SIZE: The amount of storage capacity managed by the cluster.
AVAIL: The amount of free space available in the cluster.
USED: The amount of raw storage consumed by user data (excluding BlueStore’s database).
RAW USED: The amount of raw storage consumed by user data, internal overhead, and reserved capacity.
%RAW USED: The percentage of raw storage used. Watch this number in conjunction with backfillfull ratio and near full ratio to be forewarned when your cluster approaches the fullness thresholds. See Storage Capacity.

Additional information may be displayed by invoking as below:

ceph df detail

The output now resembles the below example:

--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    5.4 PiB  1.2 PiB  4.3 PiB   4.3 PiB      78.58
ssd     22 TiB   19 TiB  2.7 TiB   2.7 TiB      12.36
TOTAL  5.5 PiB  1.2 PiB  4.3 PiB   4.3 PiB      78.32

--- POOLS ---
POOL                         ID   PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)   (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
.mgr                         11     1  558 MiB  558 MiB      0 B      141  1.6 GiB  1.6 GiB      0 B      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
cephfs_meta                  13  1024  166 GiB  206 MiB  166 GiB   14.59M  499 GiB  618 MiB  498 GiB   2.74    5.8 TiB            N/A          N/A    N/A         0 B          0 B
cephfs_data                  14  1024      0 B      0 B      0 B    1.17G      0 B      0 B      0 B      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
cephfsECvol                  19  2048  2.8 PiB  2.8 PiB   17 KiB    1.81G  3.5 PiB  3.5 PiB   21 KiB  83.79    561 TiB            N/A          N/A    N/A         0 B          0 B
.nfs                         20    32  9.7 KiB  2.2 KiB  7.5 KiB       61  118 KiB   96 KiB   22 KiB      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
testbench                    71    32   12 GiB   12 GiB  2.3 KiB    3.14k   37 GiB   37 GiB  6.9 KiB      0    234 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.data     76  2048  482 TiB  482 TiB      0 B  132.09M  643 TiB  643 TiB      0 B  47.85    526 TiB            N/A          N/A    N/A     312 MiB      623 MiB
.rgw.root                    97     1  1.4 KiB  1.4 KiB      0 B        4   48 KiB   48 KiB      0 B      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.log              98   256  3.6 KiB  3.6 KiB      0 B      209  408 KiB  408 KiB      0 B      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.control          99     1      0 B      0 B      0 B        8      0 B      0 B      0 B      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.meta            100   128  3.8 KiB  3.2 KiB    671 B       20  194 KiB  192 KiB  2.0 KiB      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.index   101   256  4.2 MiB      0 B  4.2 MiB       33   13 MiB      0 B   13 MiB      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
default.rgw.buckets.non-ec  102   128  5.6 MiB      0 B  5.6 MiB       13   17 MiB      0 B   17 MiB      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B
kubedata                    104   256   63 GiB   63 GiB      0 B   17.65k  188 GiB  188 GiB      0 B   0.03    234 TiB            N/A       20 TiB    N/A         0 B          0 B
kubemeta                    105   256  241 MiB  241 MiB  278 KiB      166  723 MiB  722 MiB  833 KiB      0    5.8 TiB            N/A          N/A    N/A         0 B          0 B

POOLS:

The POOLS section of the output provides a list of pools and the notional usage of each pool. This section of the output DOES NOT reflect replicas, clones, or snapshots. For example, if you store an object with 1 MB of data, then the notional usage will be 1 MB, but the actual usage might be 2 MB or more depending on the number of replicas, clones, and snapshots.

ID: The number of the specific node within the pool.
STORED: The actual amount of data that the user has stored in a pool. This is similar to the USED column in earlier versions of Ceph, but the calculations (for BlueStore!) are more precise (in that gaps are properly handled).
- (DATA): Usage for RBD (RADOS Block Device), CephFS file data, and RGW (RADOS Gateway) object data.
- (OMAP): Key-value pairs. Used primarily by CephFS and RGW (RADOS Gateway) for metadata storage.
OBJECTS: The notional number of objects stored per pool (that is, the number of objects other than replicas, clones, or snapshots).
USED: The space allocated for a pool over all OSDs. This includes space for replication, space for allocation granularity, and space for the overhead associated with erasure-coding. Compression savings and object-content gaps are also taken into account. However, BlueStore’s database is not included in the amount reported under USED.
- (DATA): Object usage for RBD (RADOS Block Device), CephFS file data, and RGW (RADOS Gateway) object data.
- (OMAP): Object key-value pairs. Used primarily by CephFS and RGW (RADOS Gateway) for metadata storage.
%USED: The notional percentage of storage used per pool.
MAX AVAIL: An estimate of the notional amount of data that can be written to this pool.
QUOTA OBJECTS: The number of quota objects.
QUOTA BYTES: The number of bytes in the quota objects.
DIRTY: The number of objects in the cache pool that have been written to the cache pool but have not yet been flushed to the base pool. This field is available only when cache tiering is in use.
USED COMPR: The amount of space allocated for compressed data. This includes compressed data in addition to all of the space required for replication, allocation granularity, and erasure- coding overhead.
UNDER COMPR: The amount of data that has passed through compression (summed over all replicas) and that is worth storing in a compressed form.

Note

The numbers in the POOLS section are notional. They do not include the number of replicas, clones, or snapshots. As a result, the sum of the USED and %USED amounts in the POOLS section of the output will not be equal to the sum of the USED and %USED amounts in the RAW section of the output.

Note

The MAX AVAIL value is a complicated function of the replication or the kind of erasure coding used, the CRUSH rule that maps storage to devices, the utilization of those devices, and the configured mon_osd_full_ratio setting.

Checking OSD Status

To check if OSDs are up and in, run the following command:

ceph osd stat

Alternatively, you can run the following command:

ceph osd dump

To view OSDs according to their position in the CRUSH map, run the following command:

ceph osd tree

To print out a CRUSH tree that displays a host, its OSDs, whether the OSDs are up, and the weight of the OSDs, run the following command:

#ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
 -1       3.00000 pool default
 -3       3.00000 rack mainrack
 -2       3.00000 host osd-host
  0   ssd 1.00000         osd.0             up  1.00000 1.00000
  1   ssd 1.00000         osd.1             up  1.00000 1.00000
  2   ssd 1.00000         osd.2             up  1.00000 1.00000

See Monitoring OSDs and PGs.

Checking Monitor Status

If your cluster has multiple monitors, then you need to perform certain “monitor status” checks. After starting the cluster and before reading or writing data, you should check quorum status. A quorum must be present when multiple monitors are running to ensure proper functioning of your Ceph cluster. Check monitor status regularly in order to ensure that all of the monitors are running.

To display the monitor map, run the following command:

ceph mon stat

Alternatively, you can run the following command:

ceph mon dump

To check the quorum status for the monitor cluster, run the following command:

ceph quorum_status

Ceph returns the quorum status. For example, a Ceph cluster that consists of three monitors might return the following:

{ "election_epoch": 10,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
    "a",
    "b",
    "c"],
  "quorum_leader_name": "a",
  "monmap": { "epoch": 1,
      "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
      "modified": "2011-12-12 13:28:27.505520",
      "created": "2011-12-12 13:28:27.505520",
      "features": {"persistent": [
            "kraken",
            "luminous",
            "mimic"],
    "optional": []
      },
      "mons": [
            { "rank": 0,
              "name": "a",
              "addr": "127.0.0.1:6789/0",
          "public_addr": "127.0.0.1:6789/0"},
            { "rank": 1,
              "name": "b",
              "addr": "127.0.0.1:6790/0",
          "public_addr": "127.0.0.1:6790/0"},
            { "rank": 2,
              "name": "c",
              "addr": "127.0.0.1:6791/0",
          "public_addr": "127.0.0.1:6791/0"}
           ]
  }
}

Checking MDS Status

Metadata servers provide metadata services for CephFS. Metadata servers have two sets of states: up | down and active | inactive. To check if your metadata servers are up and active, run the following command:

ceph mds stat

To display details of the metadata servers, run the following command:

ceph fs dump

Checking Placement Group States

Placement groups (PGs) map objects to OSDs. PGs are monitored in order to ensure that they are active and clean. See Monitoring OSDs and PGs.

Using the Admin Socket

The Ceph admin socket allows you to query a daemon via a socket interface. By default, Ceph sockets reside under /var/run/ceph. To access a daemon via the admin socket, log in to the host that is running the daemon and run one of the two following commands:

ceph daemon {daemon-name}
ceph daemon {path-to-socket-file}

For example, the following commands are equivalent to each other:

ceph daemon osd.0 foo
ceph daemon /var/run/ceph/ceph-osd.0.asok foo

There are two methods of running admin socket commands: (1) using ceph daemon as described above, which bypasses the monitor and assumes a direct login to the daemon’s host, and (2) using the ceph tell {daemon-type}.{id} command, which is relayed by monitors and does not require access to the daemon’s host.

Use the raise command to send a signal to a daemon, as if by running kill -X {daemon.pid}. When run via ceph tell it allows signalling a daemon without access to its host:

ceph daemon {daemon-name} raise HUP
ceph tell {daemon-type}.{id} raise -9

To view the available admin-socket commands, run the following command:

ceph daemon {daemon-name} help

Admin-socket commands enable you to view and set your configuration at runtime. For more on viewing your configuration, see Viewing Runtime Settings.

Messenger Status

Ceph daemons and librados clients support an admin socket command messenger dump that surfaces a snapshot of runtime information about connections, sockets, bound addresses, and kernel TCP stats (via tcp(7) TCP_INFO).

Note

The queried messenger needs to lock the connection data structures for the time it takes to create the snapshot. This lock’s duration is in the order of tens of milliseconds. This might interfere with normal operation. Use the dumpcontents argument to limit data structures dumped.

Examples

When a command is issued without specifying a messenger to dump, the list of available messengers is returned:

ceph tell osd.0 messenger dump

{
   "messengers": [
       "client",
       "cluster",
       "hb_back_client",
       "hb_back_server",
       "hb_front_client",
       "hb_front_server",
       "ms_objecter",
       "temp_mon_client"
   ]
 }

The client and cluster messengers correspond to the configured client / cluster network (see Network Configuration Reference). Messengers with hb_ prefix are part of the heartbeat system.

List all current connections on the client messenger:

ceph tell osd.0 messenger dump client \
    | jq -r '.messenger.connections[].async_connection |
        [.conn_id, .socket_fd, .worker_id,
            if .status.connected then "connected" else "disconnected" end,
            .state,
            "\(.peer.type).\(.peer.entity_name.id).\(.peer.id)",
            .protocol.v2.con_mode, .protocol.v2.crypto.rx, .protocol.v2.compression.rx] |
        @tsv'

   102     0       connected       STATE_CONNECTION_ESTABLISHED    client.admin.6407       crc     PLAIN   UNCOMPRESSED
   99      1       connected       STATE_CONNECTION_ESTABLISHED    client.rgw.8000.4473    crc     PLAIN   UNCOMPRESSED
   89      1       connected       STATE_CONNECTION_ESTABLISHED    mgr..-1 secure  AES-128-GCM     UNCOMPRESSED
    101     2       connected       STATE_CONNECTION_ESTABLISHED    client.rgw.8000.4483    crc     PLAIN   UNCOMPRESSED
     86      2       connected       STATE_CONNECTION_ESTABLISHED    mon..-1 secure  AES-128-GCM     UNCOMPRESSED
   102     0       connected       STATE_CONNECTION_ESTABLISHED    client.admin.6383       crc     PLAIN   UNCOMPRESSED

Print active connections and their TCP round trip time and retransmission counters:

ceph tell osd.0 messenger dump client --tcp-info \
    | jq -r '.messenger.connections[].async_connection |
        select(.status.connected) |
        select(.peer.type != "client") |
        [.conn_id, .socket_fd, .worker_id,
            "\(.peer.type).\(.peer.global_id)",
            .tcp_info.tcpi_rtt_us, .tcp_info.tcpi_rttvar_us, .tcp_info.tcpi_total_retrans] |
        @tsv'

248     89      1       mgr.0   863     1677    0
3       86      2       mon.0   230     278     0

Tracking Data Availability Score of a Cluster

Ceph internally tracks the data availability of each pool in a cluster. To check the data availability score of each pool in a cluster, the following command can be invoked:

ceph osd pool availability-status

Example output:

POOL         UPTIME  DOWNTIME  NUMFAILURES  MTBF  MTTR  SCORE     AVAILABLE
rbd             2m     21s           1        2m   21s  0.888889     1
.mgr             86s     0s          0        0s   0s        1       1
cephfs.a.meta    77s     0s          0        0s   0s        1       1
cephfs.a.data    76s     0s          0        0s   0s        1       1

The time values above are rounded for readability. To see the exact second values, use the option --format with json or json-pretty value.

A pool is considered unavailable when at least one PG in the pool becomes inactive or there is at least one unfound object in the pool. Otherwise the pool is considered available. Depending on the current and previous state of the pool we update uptime and downtime values:

Previous State	Current State	Uptime Update	Downtime Update
Available	Available	+diff time	no update
Available	Unavailable	+diff time	no update
Unavailable	Available	+diff time	no update
Unavailable	Unavailable	no update	+diff time

From the updated uptime and downtime values, we calculate the Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR) for each pool. The availability score is then calculated by finding the ratio of MTBF to the total time.

The score is updated every one second. Transient changes to pools that occur and are reverted between successive updates will not be captured. It is possible to configure this interval with a command of the following form:

ceph config set mon pool_availability_update_interval 2

This will set the update interval to two seconds. Please note that it is not possible to set this interval less than the config value set for paxos_propose_interval.

This feature is off by default. To turn the feature on, the enable_availability_tracking config option can be set to true.

ceph config set mon enable_availability_tracking true

While the feature is turned off, the last calculated score will be preserved. The score will again start updating once the feature is turned on again.

It’s also possible to clear the data availability score for a specific pool if needed with a command of the following form:

ceph osd pool clear-availability-status <pool-name>

Note

Clearing a score is not allowed if the feature itself is disabled.

Brought to you by the Ceph Foundation

The Ceph Documentation is a community resource funded and hosted by the non-profit Ceph Foundation. If you would like to support this and our other efforts, please consider joining now.