Notice
This document is for a development version of Ceph.
ceph_test_rados — Model-Based RADOS Stress Test
ceph_test_rados is a model-based integration test that verifies the
data correctness of the RADOS layer under stress. It maintains an in-memory
model of expected object data and metadata, and compares it against the
actual object data returned by RADOS after every read, detecting data
corruption, snapshot inconsistencies, and attribute mismatches.
Note
This is not a performance benchmark. For throughput and latency
measurement, use rados bench. ceph_test_rados is a
correctness verifier.
How It Works
1. Initialization: Creates --objects initial objects via write (or append for EC pools).
2. Stress loop: Generates a randomized stream of up to --max-ops operations, each selected by weighted probability from the --op arguments.
3. Verification: Every read dispatches 3 pipelined reads and compares data, xattrs, and omap entries against the in-memory model.
4. Completion: Prints the error count and per-operation-type statistics to stderr.
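The model-based approach can be illustrated with a minimal Python sketch. It is not the tool's implementation: `run_model_test` and `CorruptingStore` are hypothetical names, a plain dict stands in for the pool, and only read, write, and delete are modeled (no snapshots, xattrs, or omap).

```python
import random


class CorruptingStore(dict):
    """Hypothetical faulty backend: every read returns altered data."""

    def get(self, key, default=None):
        value = super().get(key, default)
        return value + b"!" if value is not None else value


def run_model_test(store, max_ops=1000, objects=50, seed=0):
    """Drive `store` with random ops; count reads that disagree with the model."""
    rng = random.Random(seed)
    model = {}   # expected state: object name -> bytes (absent means deleted)
    errors = 0
    names = [f"obj{i}" for i in range(objects)]

    def payload():
        return bytes(rng.getrandbits(8) for _ in range(16))

    # Initialization: the initial writes count toward max_ops.
    for name in names[:max_ops]:
        model[name] = store[name] = payload()

    # Stress loop: mutate store and model together, verify on every read.
    for _ in range(max(0, max_ops - objects)):
        name = rng.choice(names)
        op = rng.choice(["read", "write", "delete"])
        if op == "write":
            model[name] = store[name] = payload()
        elif op == "delete":
            model.pop(name, None)
            store.pop(name, None)
        elif store.get(name) != model.get(name):
            errors += 1
    return errors


print(run_model_test({}))                  # a well-behaved store yields no errors
print(run_model_test(CorruptingStore()))   # a faulty store is detected on reads
```

The key design point is that every mutation is applied to both the store and the model, so any later read that disagrees can only be explained by the store misbehaving.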
Architecture
The tool is built from several components:
- TestRados.cc: CLI parsing, main(), and the WeightedTestGenerator, which selects operations by weight.
- RadosModel.h: The RadosTestContext (in-memory model) and all 26 TestOp subclasses (ReadOp, WriteOp, SnapCreateOp, etc.).
- Object.h / Object.cc: Content generators (VarLenGenerator, AppendGenerator) and the ObjectDesc model that tracks layered object contents across snapshots.
- TestOpStat.h: Per-operation-type latency statistics collector.
Synopsis
ceph_test_rados
--op <read|write|write_excl|writesame|delete|snap_create|snap_remove|
rollback|setattr|rmattr|watch|copy_from|hit_set_list|is_dirty|
undirty|cache_flush|cache_try_flush|cache_evict|append|append_excl|
set_redirect|unset_redirect|chunk_read|tier_promote|tier_flush|
set_chunk|tier_evict> <weight>
[--op <operation_type> <weight> ...]
[--pool <pool_name>]
[--max-ops <op_count>]
[--objects <object_count>]
[--max-in-flight <max_concurrent>]
[--size <max_size_bytes>]
[--min-stride-size <bytes>]
[--max-stride-size <bytes>]
[--max-seconds <seconds>]
[--ec-pool]
[--no-omap]
[--no-sparse]
[--pool-snaps]
[--balance-reads]
[--localize-reads]
[--offlen_randomization_ratio <0-100>]
[--write-fadvise-dontneed]
[--max-attr-len <bytes>]
[--set_redirect]
[--set_chunk]
[--low_tier_pool <pool_name>]
[--enable_dedup]
[--dedup_chunk_algo <fastcdc|fixcdc>]
[--dedup_chunk_size <bytes>]
[--timestamps]
At least one --op with a positive weight is required.
Core Parameters
--pool <name>
    Target RADOS pool (must already exist). Default: rbd.
--max-ops <n>
    Maximum number of operations to execute (including initial object writes). Default: 1000.
--objects <n>
    Number of distinct objects to create and test against. Must satisfy max_in_flight * 2 <= objects. Default: 50.
--max-in-flight <n>
    Maximum concurrent asynchronous operations. Default: 16.
--max-seconds <n>
    Wall-clock time limit in seconds. 0 means unlimited (run until --max-ops is exhausted). Default: 0.
Object Geometry
--size <n>
    Maximum object size in bytes. Actual sizes are randomized within approximately [size/2, size]. Default: 4000000 (~3.8 MiB).
--min-stride-size <n>
    Minimum write stride in bytes. Must be < --max-stride-size and <= --size. Default: size / 10.
--max-stride-size <n>
    Maximum write stride in bytes. Must be > --min-stride-size and <= --size. Default: size / 5.
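The defaults and constraints listed under Core Parameters and Object Geometry can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not part of ceph_test_rados; the function name and the use of integer division for the stride defaults are assumptions.

```python
def resolve_geometry(size=4000000, min_stride=None, max_stride=None,
                     objects=50, max_in_flight=16):
    """Apply the documented defaults, then list any violated constraints."""
    # Defaults from the text: size / 10 and size / 5 (integer division is an assumption).
    min_stride = size // 10 if min_stride is None else min_stride
    max_stride = size // 5 if max_stride is None else max_stride

    problems = []
    if max_in_flight * 2 > objects:
        problems.append("max_in_flight * 2 must be <= objects")
    if min_stride >= max_stride:
        problems.append("min-stride-size must be < max-stride-size")
    if max_stride > size:
        problems.append("max-stride-size must be <= size")
    return min_stride, max_stride, problems


print(resolve_geometry())             # documented defaults, no problems
print(resolve_geometry(objects=20))   # 16 * 2 > 20, so one problem is reported
```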
Pool Type and Behavior
--ec-pool
    Indicates that the target is an erasure-coded pool that does not support overwrites. Must appear before any --op arguments.

    Note
    This is largely a legacy parameter. When Ceph originally introduced EC pools, they did not support partial overwrites or sparse reads. Today, if an EC pool supports overwrites (e.g., via BlueStore), you should not use this flag, so that ceph_test_rados can test partial overwrites. In the Teuthology QA suite, setting erasure_code_use_overwrites: true prevents the test runner from passing this flag.

    Using this flag has the following effects:
    - Implicitly sets --no-sparse.
    - Initial object creation writes use append mode instead of write.
    - Overwrite operations (write, write_excl, writesame) are disallowed and will cause startup validation to fail.
--no-omap
    Disable omap operations. Automatically set if the pool does not support omap (auto-detected at startup).
--no-sparse
    Disable sparse reads (use full reads only). Automatically set when --ec-pool is used.
--pool-snaps
    Use pool-level snapshots instead of self-managed snapshots.
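The effects of --ec-pool described above can be summarized in a short sketch. The helper below is hypothetical and only mirrors the documented rules; it does not reproduce the tool's actual startup validation.

```python
# Overwrite-style operations that the text lists as disallowed with --ec-pool.
EC_DISALLOWED_OPS = {"write", "write_excl", "writesame"}


def effective_ec_settings(op_weights, ec_pool, no_sparse=False):
    """Return (no_sparse, initial_write_mode, rejected_ops) for a planned run."""
    if not ec_pool:
        return no_sparse, "write", []
    rejected = sorted(op for op in op_weights if op in EC_DISALLOWED_OPS)
    # --ec-pool implies --no-sparse and append-mode initial writes.
    return True, "append", rejected


print(effective_ec_settings({"read": 100, "append": 100}, ec_pool=True))
print(effective_ec_settings({"read": 100, "write": 50}, ec_pool=True))
```

A non-empty third element corresponds to a configuration that the text says would fail startup validation.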
Read Routing
--balance-reads
    Set LIBRADOS_OPERATION_BALANCE_READS on read operations, allowing reads from any replica.
--localize-reads
    Set LIBRADOS_OPERATION_LOCALIZE_READS on read operations, preferring the closest replica.
--offlen_randomization_ratio <n>
    Percentage chance (0–100) that a read uses a randomized offset instead of reading from offset 0. Default: 50.
Write Behavior
--write-fadvise-dontneed
    Set the write_fadvise_dontneed flag on the pool, advising the OSD backend not to cache written data.
--max-attr-len <n>
    Maximum generated xattr length in bytes. Default: 20000.
Manifest and Tiering
--set_redirect
    Enable redirect manifest testing. Requires --low_tier_pool.
--set_chunk
    Enable chunk-based manifest testing. Requires --low_tier_pool.
--low_tier_pool <name>
    Low-tier pool for redirect/chunk/dedup operations. Must be a different pool from --pool to avoid a known race condition. Required when --set_redirect or --set_chunk is set.
Deduplication
--enable_dedup
    Enable deduplication testing. Requires --dedup_chunk_algo and --dedup_chunk_size. Configures the pool with SHA-256 fingerprinting and the specified chunking algorithm.
--dedup_chunk_algo <algorithm>
    Chunking algorithm: fastcdc or fixcdc.
--dedup_chunk_size <size>
    Chunk size for content-defined chunking (e.g., 131072).
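The flag dependencies in the Manifest and Tiering and Deduplication sections can be expressed as a small checker. This is an illustrative sketch only: `check_tiering_flags` is a hypothetical helper, and the messages are not the tool's own error output.

```python
def check_tiering_flags(pool="rbd", low_tier_pool=None, set_redirect=False,
                        set_chunk=False, enable_dedup=False,
                        dedup_chunk_algo=None, dedup_chunk_size=None):
    """List violations of the documented manifest and dedup flag dependencies."""
    problems = []
    if (set_redirect or set_chunk) and low_tier_pool is None:
        problems.append("--low_tier_pool is required")
    if low_tier_pool is not None and low_tier_pool == pool:
        problems.append("--low_tier_pool must differ from --pool")
    if enable_dedup:
        if dedup_chunk_algo not in ("fastcdc", "fixcdc"):
            problems.append("--dedup_chunk_algo must be fastcdc or fixcdc")
        if dedup_chunk_size is None:
            problems.append("--dedup_chunk_size is required")
    return problems


# Mirrors a dedup-style invocation: no problems expected.
print(check_tiering_flags(pool="testpool", low_tier_pool="low_tier",
                          set_chunk=True, enable_dedup=True,
                          dedup_chunk_algo="fastcdc", dedup_chunk_size=131072))
```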
Output
--timestamps
    Prefix each output line with a coarse timestamp.
Operation Types
Operations are specified via --op <name> <weight>. Weights are
relative: an operation with weight 100 is twice as likely as one with
weight 50.
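To make the meaning of relative weights concrete, the following sketch draws operations with Python's `random.choices`. It only illustrates weighted selection in general; `pick_ops` is a hypothetical helper and is not how WeightedTestGenerator is implemented.

```python
import random
from collections import Counter


def pick_ops(op_weights, count, seed=0):
    """Draw `count` operation names with probability proportional to weight."""
    rng = random.Random(seed)
    names = list(op_weights)
    weights = [op_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


counts = Counter(pick_ops({"read": 100, "delete": 50}, 30000))
# The ratio approaches 2 as the number of draws grows.
print(counts["read"] / counts["delete"])
```

Only the ratios matter: weights of 2 and 1 would give the same distribution as 100 and 50.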
| Name | Valid with --ec-pool | Description |
|---|---|---|
| read | Yes | Read and verify object data, xattrs, and omap against the model. |
| write | No | Random-offset partial write. |
| write_excl | No | Random-offset partial write that asserts the object already exists (…). |
| writesame | No | Write same data pattern across an extent. |
| delete | Yes | Delete an object. |
| snap_create | Yes | Create a snapshot (quiesces in-flight ops first). |
| snap_remove | Yes | Remove a snapshot. |
| rollback | Yes | Roll back an object to a previous snapshot. |
| setattr | Yes | Set random xattrs (and omap if supported). |
| rmattr | Yes | Remove random xattrs (and omap if supported). |
| watch | Yes | Establish a watch, self-notify, wait for callback. |
| copy_from | Yes | Server-side copy between objects in the pool. |
| hit_set_list | Yes | List HitSet entries. |
| is_dirty | Yes | Check object dirty state (cache tier). |
| undirty | Yes | Mark object clean (cache tier). |
| cache_flush | Yes | Flush object from cache tier (blocking). |
| cache_try_flush | Yes | Try to flush object from cache tier (non-blocking). |
| cache_evict | Yes | Evict object from cache tier. |
| append | Yes | Append data to an object. |
| append_excl | Yes | Append data that asserts the object already exists. |
| set_redirect | Yes | Set redirect manifest to low-tier pool. |
| unset_redirect | Yes | Remove redirect manifest. |
| chunk_read | Yes | Read and verify a chunk from a manifest object. |
| tier_promote | Yes | Promote object from lower tier. |
| tier_flush | Yes | Flush object to backing tier. |
| set_chunk | Yes | Set chunk manifest (requires …). |
| tier_evict | Yes | Evict object to backing tier. |
Environment Variables
CEPH_CLIENT_ID
    Client ID for the librados connection. If unset, connects as the default client.

Standard Ceph environment variables (CEPH_CONF, CEPH_KEYRING, etc.) are respected.
Teuthology Integration
The tool is typically invoked via the rados Teuthology task defined
in qa/tasks/rados.py. The task creates pools, translates YAML
configuration into CLI arguments, and manages the process lifecycle.
Example YAML configuration:
tasks:
- rados:
    clients: [client.0]
    ops: 400000
    max_seconds: 600
    objects: 1024
    size: 16384
    op_weights:
      read: 100
      write: 100
      delete: 50
      snap_create: 50
      snap_remove: 50
      rollback: 50
Workload examples are in qa/suites/rados/thrash*/workloads/.
Note
The Teuthology wrapper automatically splits write and append
weights into regular and _excl halves. This does not happen at
the CLI level: specify both variants explicitly when invoking the
binary directly.
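When calling the binary directly, the split has to be done by hand. The sketch below shows one way to turn an op_weights mapping into --op arguments with explicit _excl variants. It is a hypothetical helper, not the code in qa/tasks/rados.py, and the rounding rule for odd weights is an assumption.

```python
def op_args(op_weights, split_excl=True):
    """Build a flat list of --op arguments from a name -> weight mapping."""
    args = []
    for name, weight in op_weights.items():
        if split_excl and name in ("write", "append"):
            # Assumption: divide the weight between the regular and _excl
            # variants, giving any odd remainder to the _excl variant.
            half = weight // 2
            args += ["--op", name, str(half)]
            args += ["--op", f"{name}_excl", str(weight - half)]
        else:
            args += ["--op", name, str(weight)]
    return args


print(" ".join(op_args({"read": 100, "write": 100, "delete": 50})))
```

The resulting list can be appended to a command line that already carries --pool and the other options.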
Examples
Basic replicated pool test:
ceph_test_rados \
--pool testpool \
--max-ops 10000 \
--objects 500 \
--max-in-flight 16 \
--size 4000000 \
--op read 100 \
--op write 100 \
--op delete 10
EC pool (without allow_ec_overwrites) with snapshots:
ceph_test_rados \
--ec-pool \
--pool my-ec-pool \
--max-ops 4000 \
--objects 50 \
--pool-snaps \
--op read 100 \
--op append 100 \
--op delete 50 \
--op snap_create 50 \
--op snap_remove 50 \
--op rollback 50
Deduplication test:
ceph_test_rados \
--pool testpool \
--low_tier_pool low_tier \
--set_chunk \
--enable_dedup \
--dedup_chunk_algo fastcdc \
--dedup_chunk_size 131072 \
--max-ops 1500 \
--objects 50 \
--op read 100 \
--op write 50 \
--op set_chunk 30 \
--op tier_promote 10
Exit Status
The tool aborts immediately (via ceph_abort()) and dumps core
if any data verification error (e.g., mismatched object content or
corrupt metadata) is detected during a read.
If no verification errors occur and the time limit or operation count is exhausted, the tool exits cleanly with status 0.
Exit status 1 indicates a startup validation failure (such as incompatible arguments).
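A wrapper script can map these outcomes to readable labels. The sketch below assumes that ceph_abort() ends the process with SIGABRT, and relies on the POSIX conventions that Python's subprocess module reports death by signal N as -N while shells report 128 + N. `classify_exit` is a hypothetical helper.

```python
import signal


def classify_exit(returncode):
    """Map a process return code to the outcomes described in this section."""
    # Assumption: an abort surfaces as SIGABRT (-6 via subprocess, 134 via a shell).
    if returncode in (-signal.SIGABRT, 128 + signal.SIGABRT):
        return "verification failure (aborted)"
    if returncode == 0:
        return "clean run"
    if returncode == 1:
        return "startup validation failure"
    return "unexpected exit status"


print(classify_exit(0))
print(classify_exit(1))
print(classify_exit(-signal.SIGABRT))
```

In practice the argument would come from `subprocess.run([...]).returncode` for a ceph_test_rados invocation.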
Source Files
- src/test/osd/TestRados.cc: CLI parsing and main loop
- src/test/osd/RadosModel.h: Test context and operation classes
- src/test/osd/Object.h: Content generation and verification model
- src/test/osd/TestOpStat.h: Operation statistics
- qa/tasks/rados.py: Teuthology task wrapper