Notice

This document is for a development version of Ceph.

Metrics 

The Ceph Object Gateway uses perf counters to track metrics. Counters can be labeled (labeled perf counters). Labeled counters are stored in caches specific to the Ceph Object Gateway.

These metrics can be sent to the time series database Prometheus to visualize a cluster-wide view of usage data over time (for example, the number of S3 put operations on a specific bucket).

Op Metrics 

The following metrics related to S3 or Swift operations are tracked per Ceph Object Gateway.

Radosgw Op Metrics
Name	Type	Description
put_obj_ops	Counter	Number of put operations
put_obj_bytes	Counter	Number of bytes put
put_obj_lat	Gauge	Total latency of put operations
get_obj_ops	Counter	Number of get operations
get_obj_bytes	Counter	Number of bytes from get requests
get_obj_lat	Gauge	Total latency of get operations
del_obj_ops	Counter	Number of delete object operations
del_obj_bytes	Counter	Number of bytes deleted
del_obj_lat	Gauge	Total latency of delete object operations
del_bucket_ops	Counter	Number of delete bucket operations
del_bucket_lat	Gauge	Total latency of delete bucket operations
copy_obj_ops	Counter	Number of copy object operations
copy_obj_bytes	Counter	Number of bytes copied
copy_obj_lat	Gauge	Total latency of copy object operations
list_obj_ops	Counter	Number of list object operations
list_obj_lat	Gauge	Total latency of list object operations
list_buckets_ops	Counter	Number of list bucket operations
list_buckets_lat	Gauge	Total latency of list bucket operations

There are three different sections in the output of the counter dump and counter schema commands that show the op metrics and their information. The sections are rgw_op, rgw_op_per_user, and rgw_op_per_bucket.

The counters in the rgw_op section reflect the totals of each op metric for a given Ceph Object Gateway. The counters in the rgw_op_per_user and rgw_op_per_bucket sections are labeled counters of op metrics for a user or bucket respectively.

Information about op metrics can be seen in the rgw_op sections of the output of the counter schema command.

To view op metrics in the Ceph Object Gateway, go to the rgw_op sections of the output of the counter dump command:

"rgw_op": [
    {
        "labels": {},
        "counters": {
            "put_obj_ops": 2,
            "put_obj_bytes": 5327,
            "put_obj_lat": {
                "avgcount": 2,
                "sum": 2.818064835,
                "avgtime": 1.409032417
            },
            "get_obj_ops": 5,
            "get_obj_bytes": 5325,
            "get_obj_lat": {
                "avgcount": 2,
                "sum": 0.003000069,
                "avgtime": 0.001500034
            },
            ...
            "list_buckets_ops": 1,
            "list_buckets_lat": {
                "avgcount": 1,
                "sum": 0.002300000,
                "avgtime": 0.002300000
            }
        }
    },
]
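As a minimal sketch of how this output might be consumed, the Python snippet below parses a trimmed copy of the rgw_op section and recomputes the average put latency as sum / avgcount. The helper name average_latency is hypothetical, and the sample assumes the dump has been saved as a JSON object whose rgw_op key holds the list shown above.

```python
import json

# Trimmed sample shaped like the rgw_op section above (values copied from it).
SAMPLE_DUMP = json.loads("""
{
    "rgw_op": [
        {
            "labels": {},
            "counters": {
                "put_obj_ops": 2,
                "put_obj_bytes": 5327,
                "put_obj_lat": {
                    "avgcount": 2,
                    "sum": 2.818064835,
                    "avgtime": 1.409032417
                }
            }
        }
    ]
}
""")


def average_latency(dump, section, metric):
    """Return sum / avgcount for a latency metric in the first entry of a section.

    Hypothetical helper for illustration; returns 0.0 when no samples exist.
    """
    lat = dump[section][0]["counters"][metric]
    if lat["avgcount"] == 0:
        return 0.0
    return lat["sum"] / lat["avgcount"]


put_avg = average_latency(SAMPLE_DUMP, "rgw_op", "put_obj_lat")
print(f"average put latency: {put_avg:.9f} s")
```

The result should agree with the avgtime field reported in the dump, up to rounding in the last digit.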

Op Metrics Labels 

Op metrics can also be tracked per-user or per-bucket. These metrics are exported to Prometheus with labels like Bucket = {name} or User = {userid}:

"rgw_op_per_bucket": [
    ...
    {
        "labels": {
            "Bucket": "bucket1"
        },
        "counters": {
            "put_obj_ops": 2,
            "put_obj_bytes": 5327,
            "put_obj_lat": {
                "avgcount": 2,
                "sum": 2.818064835,
                "avgtime": 1.409032417
            },
            "get_obj_ops": 5,
            "get_obj_bytes": 5325,
            "get_obj_lat": {
                "avgcount": 2,
                "sum": 0.003000069,
                "avgtime": 0.001500034
            },
            ...
            "list_buckets_ops": 1,
            "list_buckets_lat": {
                "avgcount": 1,
                "sum": 0.002300000,
                "avgtime": 0.002300000
            }
        }
    },
    ...
]
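To illustrate how the labels identify an entry, the following Python sketch looks up the counters for one bucket in a list shaped like the rgw_op_per_bucket section. The function find_labeled_counters is a hypothetical helper, and the bucket2 entry and its values are invented for the example.

```python
def find_labeled_counters(section_entries, label_key, label_value):
    """Return the counters dict of the first entry whose labels match, else None."""
    for entry in section_entries:
        if entry.get("labels", {}).get(label_key) == label_value:
            return entry["counters"]
    return None


# Sample entries shaped like the rgw_op_per_bucket section above.
# "bucket2" and its values are made up for illustration.
per_bucket = [
    {"labels": {"Bucket": "bucket1"},
     "counters": {"put_obj_ops": 2, "put_obj_bytes": 5327, "get_obj_ops": 5}},
    {"labels": {"Bucket": "bucket2"},
     "counters": {"put_obj_ops": 7, "put_obj_bytes": 1024, "get_obj_ops": 0}},
]

bucket1 = find_labeled_counters(per_bucket, "Bucket", "bucket1")
missing = find_labeled_counters(per_bucket, "Bucket", "no-such-bucket")
print(bucket1["put_obj_ops"], missing)
```

A None result only means that the bucket has no entry in the dump at that moment; it does not by itself show that the bucket has never been used.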

RGW multi-tenancy allows buckets and users with the same name to exist simultaneously under different tenants. If a user or bucket belongs to a tenant, a label of the form Tenant = {tenantid} is added to the metric.

In a large system with many users and buckets, it may not be tractable to export all metrics to Prometheus. For that reason, the collection of these labeled metrics is disabled by default.

Once enabled, the working set of tracked users and buckets is constrained to limit memory and database usage. As a result, these labeled metrics are not always complete: the counters for a user or bucket that drops out of the working set are discarded and restart from 0.

User & Bucket Counter Caches 

To track op metrics by user, the Ceph Object Gateway config value rgw_user_counters_cache must be set to true.

To track op metrics by bucket, the Ceph Object Gateway config value rgw_bucket_counters_cache must be set to true.

These config values are set in Ceph via the command ceph config set client.rgw rgw_{user,bucket}_counters_cache true
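The brace form rgw_{user,bucket}_counters_cache stands for two separate settings. As a small sketch, the Python helper below spells out the two resulting commands. The name counters_cache_commands is hypothetical; the helper only builds and prints the commands, and running them is left to the operator.

```python
import shlex


def counters_cache_commands(user=True, bucket=True, who="client.rgw"):
    """Build `ceph config set` argument lists that enable the counter caches.

    Hypothetical helper: it only builds the commands and does not run them.
    """
    commands = []
    if user:
        commands.append(["ceph", "config", "set", who, "rgw_user_counters_cache", "true"])
    if bucket:
        commands.append(["ceph", "config", "set", who, "rgw_bucket_counters_cache", "true"])
    return commands


cmds = counters_cache_commands()
for argv in cmds:
    # Print a copy-pasteable shell line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```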

Since the op metrics are labeled perf counters, they live in memory. If the Ceph Object Gateway is restarted or crashes, all counters in the Ceph Object Gateway, whether in a cache or not, are lost.
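Because a restart sets every counter back to 0, anything that compares two samples of a counter has to tolerate the value going down. The sketch below shows one common way to handle this, similar in spirit to the counter-reset handling that Prometheus applies in functions such as rate(). The function counter_increase is a hypothetical helper, not part of Ceph.

```python
def counter_increase(previous, current):
    """Return how much a monotonic counter grew between two samples.

    If the value went down, assume the counter was reset (for example by a
    gateway restart) and count only what accumulated since the reset.
    """
    if current < previous:
        return current
    return current - previous


normal = counter_increase(10, 25)      # counter kept growing
after_reset = counter_increase(25, 3)  # counter restarted from 0
print(normal, after_reset)
```

Operations performed between the last sample and the reset are not recoverable with this approach, so the computed increase is a lower bound.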

User & Bucket Counter Cache Size & Eviction 

Both rgw_user_counters_cache_size and rgw_bucket_counters_cache_size can be used to set the number of entries in each cache.

Counters are evicted from a cache once the number of counters in the cache is greater than the cache size config variable. The counters that are evicted are the least recently used (LRU).

For example, if the number of buckets exceeded rgw_bucket_counters_cache_size by 1 and the counters with label bucket1 were the least recently updated, the counters for bucket1 would be evicted from the cache. If S3 operations tracked by the op metrics were then performed on bucket1, all of the metrics in the cache for bucket1 would start again at 0.
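The eviction behavior can be simulated with a toy LRU cache. This Python sketch is an illustration built on collections.OrderedDict, not the Ceph Object Gateway's implementation; the class LruCounterCache and its single op counter per label are assumptions made to keep the example short.

```python
from collections import OrderedDict


class LruCounterCache:
    """Toy LRU cache mapping a label (e.g. bucket name) to an op counter."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def record_op(self, label):
        # A label that is absent (new or previously evicted) starts again at 0.
        count = self.entries.pop(label, 0) + 1
        self.entries[label] = count  # re-insert as most recently used
        # Evict from the least recently used end once the size is exceeded.
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)
        return count


cache = LruCounterCache(max_entries=2)
cache.record_op("bucket1")
cache.record_op("bucket1")          # bucket1 count is now 2
cache.record_op("bucket2")
cache.record_op("bucket3")          # cache is over size: bucket1 (LRU) is evicted
evicted = "bucket1" not in cache.entries
restarted = cache.record_op("bucket1")  # counts restart from 0, so this returns 1
print(evicted, restarted)
```

Note that recording the new bucket1 operation in turn pushes out bucket2, which shows how a cache that is too small for the active set of buckets keeps resetting counters.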

Cache sizing can depend on a number of factors. These factors include:

Number of users in the cluster
Number of buckets in the cluster
Memory usage of the Ceph Object Gateway
Disk and memory usage of Prometheus

To help calculate the Ceph Object Gateway’s memory usage of a cache, note that each cache entry, encompassing all of the op metrics, is 1360 bytes. This is an estimate and is subject to change if metrics are added to or removed from the op metrics list.
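As a worked example of this estimate, the sketch below multiplies the per-entry size by the cache size. With the default size of 10000 entries, one full cache comes to 13,600,000 bytes (about 13.6 MB), and the user and bucket caches together to about 27.2 MB per gateway. The function name cache_memory_bytes is hypothetical, and the figure ignores any overhead of the cache structure itself.

```python
ENTRY_BYTES = 1360  # per-entry estimate quoted above; may change with the metric list


def cache_memory_bytes(cache_size, entry_bytes=ENTRY_BYTES):
    """Estimate the memory used by one full counters cache."""
    return cache_size * entry_bytes


one_cache = cache_memory_bytes(10000)   # default cache size
both_caches = 2 * one_cache             # user cache plus bucket cache
print(f"{one_cache / 1e6:.1f} MB per cache, {both_caches / 1e6:.1f} MB for both")
```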

Sending Metrics to Prometheus 

To get metrics from a Ceph Object Gateway into the time series database Prometheus, the ceph-exporter daemon must be running and configured to scrape the Ceph Object Gateway’s admin socket.

The ceph-exporter daemon scrapes the Ceph Object Gateway’s admin socket at a regular interval, defined by the config variable exporter_stats_period.

Prometheus has a configurable interval in which it scrapes the exporter (see: https://prometheus.io/docs/prometheus/latest/configuration/configuration/).

Config Reference 

The following settings related to RGW op metrics can be set via ceph config set client.rgw CONFIG_VARIABLE VALUE.

rgw_user_counters_cache

enable a rgw perf counters cache for counters with user label

type: bool
default: false
see also: rgw_user_counters_cache_size

rgw_user_counters_cache_size

Number of labeled perf counters the user perf counters cache can store

type: uint
default: 10000
see also: rgw_user_counters_cache

rgw_bucket_counters_cache

enable a rgw perf counters cache for counters with bucket label

type: bool
default: false
see also: rgw_bucket_counters_cache_size

rgw_bucket_counters_cache_size

Number of labeled perf counters the bucket perf counters cache can store

type: uint
default: 10000
see also: rgw_bucket_counters_cache

The following notable ceph-exporter related settings can be set via ceph config set global CONFIG_VARIABLE VALUE.

exporter_stats_period

Time to wait before sending requests again to exporter server (seconds)

type: int
default: 5
