Metrics

The Ceph Object Gateway uses perf counters to track metrics. The counters can be labeled (see Labeled Perf Counters). When counters are labeled, they are stored in caches specific to the Ceph Object Gateway.

These metrics can be sent to the time series database Prometheus to visualize cluster-wide usage data over time, for example the number of S3 put operations on a specific bucket.

Op Metrics

The following metrics related to S3 or Swift operations are tracked per Ceph Object Gateway.

Radosgw Op Metrics

Name                Type      Description
put_obj_ops         Counter   Number of put operations
put_obj_bytes       Counter   Number of bytes put
put_obj_lat         Gauge     Total latency of put operations
get_obj_ops         Counter   Number of get operations
get_obj_bytes       Counter   Number of bytes from get requests
get_obj_lat         Gauge     Total latency of get operations
del_obj_ops         Counter   Number of delete object operations
del_obj_bytes       Counter   Number of bytes deleted
del_obj_lat         Gauge     Total latency of delete object operations
del_bucket_ops      Counter   Number of delete bucket operations
del_bucket_lat      Gauge     Total latency of delete bucket operations
copy_obj_ops        Counter   Number of copy object operations
copy_obj_bytes      Counter   Number of bytes copied
copy_obj_lat        Gauge     Total latency of copy object operations
list_obj_ops        Counter   Number of list object operations
list_obj_lat        Gauge     Total latency of list object operations
list_buckets_ops    Counter   Number of list bucket operations
list_buckets_lat    Gauge     Total latency of list bucket operations

There are three different sections in the output of the counter dump and counter schema commands that show the op metrics and their information. The sections are rgw_op, rgw_op_per_user, and rgw_op_per_bucket.

The counters in the rgw_op section reflect the totals of each op metric for a given Ceph Object Gateway. The counters in the rgw_op_per_user and rgw_op_per_bucket sections are labeled counters of op metrics for a user or bucket respectively.
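
Both commands are issued through the Ceph Object Gateway's admin socket. A minimal sketch, assuming a default-style socket path (the path below is a placeholder; actual paths vary by deployment):

# Dump the current values of all counters, including labeled ones.
ceph daemon /var/run/ceph/ceph-client.rgw.hostname.asok counter dump

# Show each counter's type and description.
ceph daemon /var/run/ceph/ceph-client.rgw.hostname.asok counter schema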

Information about op metrics can be seen in the rgw_op sections of the output of the counter schema command.

To view op metrics in the Ceph Object Gateway, go to the rgw_op sections of the output of the counter dump command:

"rgw_op": [
    {
        "labels": {},
        "counters": {
            "put_obj_ops": 2,
            "put_obj_bytes": 5327,
            "put_obj_lat": {
                "avgcount": 2,
                "sum": 2.818064835,
                "avgtime": 1.409032417
            },
            "get_obj_ops": 5,
            "get_obj_bytes": 5325,
            "get_obj_lat": {
                "avgcount": 2,
                "sum": 0.003000069,
                "avgtime": 0.001500034
            },
            ...
            "list_buckets_ops": 1,
            "list_buckets_lat": {
                "avgcount": 1,
                "sum": 0.002300000,
                "avgtime": 0.002300000
            }
        }
    },
]

Op Metrics Labels

Op metrics can also be tracked per user or per bucket. These metrics are exported to Prometheus with labels like Bucket = {name} or User = {userid}:

"rgw_op_per_bucket": [
    ...
    {
        "labels": {
            "Bucket": "bucket1"
        },
        "counters": {
            "put_obj_ops": 2,
            "put_obj_bytes": 5327,
            "put_obj_lat": {
                "avgcount": 2,
                "sum": 2.818064835,
                "avgtime": 1.409032417
            },
            "get_obj_ops": 5,
            "get_obj_bytes": 5325,
            "get_obj_lat": {
                "avgcount": 2,
                "sum": 0.003000069,
                "avgtime": 0.001500034
            },
            ...
            "list_buckets_ops": 1,
            "list_buckets_lat": {
                "avgcount": 1,
                "sum": 0.002300000,
                "avgtime": 0.002300000
            }
        }
    },
    ...
]

RGW multi-tenancy allows buckets and users of the same name to be used simultaneously. If a user or bucket lies under a tenant, a label for the tenant in the form Tenant = {tenantid} is added to the metric.
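
When scraped by ceph-exporter, each tracked user or bucket surfaces as a label on the exported time series. The sample below is only a sketch: the metric name shown is an assumption, and the exact names exported depend on the ceph-exporter version:

# Hypothetical exposition-format output for a bucket under a tenant;
# the metric name is illustrative, not the exporter's guaranteed name.
ceph_rgw_op_per_bucket_put_obj_ops{Bucket="bucket1", Tenant="tenant1"} 2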

In a large system with many users and buckets, it may not be tractable to export all metrics to Prometheus. For that reason, the collection of these labeled metrics is disabled by default.

Once enabled, the working set of tracked users and buckets is constrained to limit memory and database usage. As a result, the collection of these labeled metrics will not always be reliable.

User & Bucket Counter Caches

To track op metrics by user, the Ceph Object Gateway config value rgw_user_counters_cache must be set to true.

To track op metrics by bucket, the Ceph Object Gateway config value rgw_bucket_counters_cache must be set to true.

These config values are set with the ceph config set command, for example:
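
# Enable the per-user and per-bucket op metrics caches on all RGW daemons.
ceph config set client.rgw rgw_user_counters_cache true
ceph config set client.rgw rgw_bucket_counters_cache true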

Since the op metrics are labeled perf counters, they live in memory. If the Ceph Object Gateway is restarted or crashes, all counters in the Ceph Object Gateway, whether in a cache or not, are lost.

User & Bucket Counter Cache Size & Eviction

Both rgw_user_counters_cache_size and rgw_bucket_counters_cache_size can be used to set the number of entries in each cache.
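
For example, to double the default size of the bucket counters cache (20000 is an illustrative value; size the caches to your deployment):

ceph config set client.rgw rgw_bucket_counters_cache_size 20000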

Counters are evicted from a cache once the number of counters in the cache exceeds the cache size config variable. The counters evicted are the least recently used (LRU).

For example, if the number of buckets exceeded rgw_bucket_counters_cache_size by 1 and the counters labeled bucket1 were the least recently updated, the counters for bucket1 would be evicted from the cache. If S3 operations tracked by the op metrics were done on bucket1 after eviction, all of the metrics in the cache for bucket1 would restart at 0.

Cache sizing depends on a number of factors, including:

  1. Number of users in the cluster

  2. Number of buckets in the cluster

  3. Memory usage of the Ceph Object Gateway

  4. Disk and memory usage of Prometheus

To help calculate the Ceph Object Gateway's memory usage of a cache, note that each cache entry, encompassing all of the op metrics, is 1360 bytes. This is an estimate and subject to change if metrics are added to or removed from the op metrics list.
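
At the default cache size of 10000 entries, a full cache therefore consumes roughly 10000 × 1360 bytes ≈ 13.6 MB of memory; enabling both the user and bucket caches at their defaults roughly doubles that.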

Sending Metrics to Prometheus

To get metrics from a Ceph Object Gateway into the time series database Prometheus, the ceph-exporter daemon must be running and configured to scrape the Ceph Object Gateway's admin socket.

The ceph-exporter daemon scrapes the Ceph Object Gateway’s admin socket at a regular interval, defined by the config variable exporter_stats_period.
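
For example, to have ceph-exporter scrape the admin sockets every 10 seconds instead of the default 5:

ceph config set global exporter_stats_period 10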

Prometheus has a configurable interval at which it scrapes the exporter (see: https://prometheus.io/docs/prometheus/latest/configuration/configuration/).
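
A minimal prometheus.yml scrape job for ceph-exporter might look like the following sketch; the target host is a placeholder, and 9926 is assumed here as ceph-exporter's default port:

scrape_configs:
  - job_name: 'ceph-exporter'
    scrape_interval: 15s
    static_configs:
      # Replace with the host running ceph-exporter.
      - targets: ['ceph-exporter-host:9926']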

Config Reference

The following rgw op metrics-related settings can be set via ceph config set client.rgw CONFIG_VARIABLE VALUE.

rgw_user_counters_cache

Enable an RGW perf counters cache for counters with a user label.

type: bool
default: false
see also: rgw_user_counters_cache_size

rgw_user_counters_cache_size

Number of labeled perf counters the user perf counters cache can store.

type: uint
default: 10000
see also: rgw_user_counters_cache

rgw_bucket_counters_cache

Enable an RGW perf counters cache for counters with a bucket label.

type: bool
default: false
see also: rgw_bucket_counters_cache_size

rgw_bucket_counters_cache_size

Number of labeled perf counters the bucket perf counters cache can store.

type: uint
default: 10000
see also: rgw_bucket_counters_cache

The following notable ceph-exporter settings can be set via ceph config set global CONFIG_VARIABLE VALUE.

exporter_stats_period

Time to wait (in seconds) before scraping the admin sockets again.

type: int
default: 5
