ceph_test_rados — Model-Based RADOS Stress Test ================================================ ``ceph_test_rados`` is a model-based integration test that verifies the data correctness of the RADOS layer under stress. It maintains an in-memory model of expected object data and metadata, and compares it against the actual object data returned by RADOS after every read, detecting data corruption, snapshot inconsistencies, and attribute mismatches. .. note:: This is **not** a performance benchmark. For throughput and latency measurement, use ``rados bench``. ``ceph_test_rados`` is a *correctness verifier*. How It Works ------------ 1. **Initialization**: Creates ``--objects`` initial objects via write (or append for EC pools). 2. **Stress loop**: Generates a randomized stream of up to ``--max-ops`` operations, each selected by weighted probability from the ``--op`` arguments. 3. **Verification**: Every read dispatches 3 pipelined reads and compares data, xattrs, and omap entries against the in-memory model. 4. **Completion**: Prints the error count and per-operation-type statistics to stderr. Architecture ~~~~~~~~~~~~ The tool is built from several components: - ``TestRados.cc`` — CLI parsing, ``main()``, and the ``WeightedTestGenerator`` which selects operations by weight. - ``RadosModel.h`` — The ``RadosTestContext`` (in-memory model) and all 26 ``TestOp`` subclasses (``ReadOp``, ``WriteOp``, ``SnapCreateOp``, etc.). - ``Object.h`` / ``Object.cc`` — Content generators (``VarLenGenerator``, ``AppendGenerator``) and the ``ObjectDesc`` model that tracks layered object contents across snapshots. - ``TestOpStat.h`` — Per-operation-type latency statistics collector. Synopsis -------- :: ceph_test_rados --op [--op ...] [--pool ] [--max-ops ] [--objects ] [--max-in-flight ] [--size ] [--min-stride-size ] [--max-stride-size ] [--max-seconds ] [--ec-pool] [--no-omap] [--no-sparse] [--pool-snaps] [--balance-reads] [--localize-reads] [--offlen_randomization_ratio <0-100>] [--write-fadvise-dontneed] [--max-attr-len ] [--set_redirect] [--set_chunk] [--low_tier_pool ] [--enable_dedup] [--dedup_chunk_algo ] [--dedup_chunk_size ] [--timestamps] At least one ``--op`` with a positive weight is required. Core Parameters --------------- ``--pool `` Target RADOS pool (must already exist). Default: ``rbd``. ``--max-ops `` Maximum number of operations to execute (including initial object writes). Default: ``1000``. ``--objects `` Number of distinct objects to create and test against. Must satisfy ``max_in_flight * 2 <= objects``. Default: ``50``. ``--max-in-flight `` Maximum concurrent asynchronous operations. Default: ``16``. ``--max-seconds `` Wall-clock time limit in seconds. ``0`` means unlimited (run until ``--max-ops`` is exhausted). Default: ``0``. Object Geometry --------------- ``--size `` Maximum object size in bytes. Actual sizes are randomized within approximately ``[size/2, size]``. Default: ``4000000`` (~3.8 MiB). ``--min-stride-size `` Minimum write stride in bytes. Must be < ``--max-stride-size`` and <= ``--size``. Default: ``size / 10``. ``--max-stride-size `` Maximum write stride in bytes. Must be > ``--min-stride-size`` and <= ``--size``. Default: ``size / 5``. Pool Type and Behavior ---------------------- ``--ec-pool`` Indicates that the target is an erasure-coded pool **that does not support overwrites**. **Must appear before any** ``--op`` **arguments.** .. note:: This is largely a legacy parameter. When Ceph originally introduced EC pools, they did not support partial overwrites or sparse reads. Today, if an EC pool supports overwrites (e.g., via BlueStore), you should *not* use this flag, so that ``ceph_test_rados`` can test partial overwrites. In the Teuthology QA suite, setting ``erasure_code_use_overwrites: true`` prevents the test runner from passing this flag. Using this flag has the following effects: 1. Implicitly sets ``--no-sparse``. 2. Initial object creation writes use ``append`` mode instead of ``write``. 3. Overwrite operations (``write``, ``write_excl``, ``writesame``) are disallowed and will cause startup validation to fail. ``--no-omap`` Disable omap operations. Automatically set if the pool does not support omap (auto-detected at startup). ``--no-sparse`` Disable sparse reads (use full reads only). Automatically set when ``--ec-pool`` is used. ``--pool-snaps`` Use pool-level snapshots instead of self-managed snapshots. Read Routing ------------ ``--balance-reads`` Set ``LIBRADOS_OPERATION_BALANCE_READS`` on read operations, allowing reads from any replica. ``--localize-reads`` Set ``LIBRADOS_OPERATION_LOCALIZE_READS`` on read operations, preferring the closest replica. ``--offlen_randomization_ratio `` Percentage chance (0–100) that a read uses a randomized offset instead of reading from offset 0. Default: ``50``. Write Behavior -------------- ``--write-fadvise-dontneed`` Set the ``write_fadvise_dontneed`` flag on the pool, advising the OSD backend not to cache written data. ``--max-attr-len `` Maximum generated xattr length in bytes. Default: ``20000``. Manifest and Tiering -------------------- ``--set_redirect`` Enable redirect manifest testing. Requires ``--low_tier_pool``. ``--set_chunk`` Enable chunk-based manifest testing. Requires ``--low_tier_pool``. ``--low_tier_pool `` Low-tier pool for redirect/chunk/dedup operations. Must be a different pool from ``--pool`` to avoid a known race condition. Required when ``--set_redirect`` or ``--set_chunk`` is set. Deduplication ------------- ``--enable_dedup`` Enable deduplication testing. Requires ``--dedup_chunk_algo`` and ``--dedup_chunk_size``. Configures the pool with SHA-256 fingerprinting and the specified chunking algorithm. ``--dedup_chunk_algo `` Chunking algorithm: ``fastcdc`` or ``fixcdc``. ``--dedup_chunk_size `` Chunk size for content-defined chunking (e.g., ``131072``). Output ------ ``--timestamps`` Prefix each output line with a coarse timestamp. Operation Types --------------- Operations are specified via ``--op ``. Weights are relative: an operation with weight 100 is twice as likely as one with weight 50. .. list-table:: :header-rows: 1 :widths: 20 10 70 * - Name - Valid with --ec-pool - Description * - ``read`` - Yes - Read and verify object data, xattrs, and omap against the model. * - ``write`` - No - Random-offset partial write. * - ``write_excl`` - No - Random-offset partial write that asserts the object already exists (``assert_exists()``) as part of the transaction. * - ``writesame`` - No - Write same data pattern across an extent. * - ``delete`` - Yes - Delete an object. * - ``snap_create`` - Yes - Create a snapshot (quiesces in-flight ops first). * - ``snap_remove`` - Yes - Remove a snapshot. * - ``rollback`` - Yes - Roll back an object to a previous snapshot. * - ``setattr`` - Yes - Set random xattrs (and omap if supported). * - ``rmattr`` - Yes - Remove random xattrs (and omap if supported). * - ``watch`` - Yes - Establish a watch, self-notify, wait for callback. * - ``copy_from`` - Yes - Server-side copy between objects in the pool. * - ``hit_set_list`` - Yes - List HitSet entries. * - ``is_dirty`` - Yes - Check object dirty state (cache tier). * - ``undirty`` - Yes - Mark object clean (cache tier). * - ``cache_flush`` - Yes - Flush object from cache tier (blocking). * - ``cache_try_flush`` - Yes - Try to flush object from cache tier (non-blocking). * - ``cache_evict`` - Yes - Evict object from cache tier. * - ``append`` - Yes - Append data to an object. * - ``append_excl`` - Yes - Append data that asserts the object already exists. * - ``set_redirect`` - Yes - Set redirect manifest to low-tier pool. * - ``unset_redirect`` - Yes - Remove redirect manifest. * - ``chunk_read`` - Yes - Read and verify a chunk from a manifest object. * - ``tier_promote`` - Yes - Promote object from lower tier. * - ``tier_flush`` - Yes - Flush object to backing tier. * - ``set_chunk`` - Yes - Set chunk manifest (requires ``--enable_dedup``). * - ``tier_evict`` - Yes - Evict object to backing tier. Environment Variables --------------------- ``CEPH_CLIENT_ID`` Client ID for the librados connection. If unset, connects as the default client. Standard Ceph environment variables (``CEPH_CONF``, ``CEPH_KEYRING``, etc.) are respected. Teuthology Integration ---------------------- The tool is typically invoked via the ``rados`` Teuthology task defined in ``qa/tasks/rados.py``. The task creates pools, translates YAML configuration into CLI arguments, and manages the process lifecycle. Example YAML configuration:: tasks: - rados: clients: [client.0] ops: 400000 max_seconds: 600 objects: 1024 size: 16384 op_weights: read: 100 write: 100 delete: 50 snap_create: 50 snap_remove: 50 rollback: 50 Workload examples are in ``qa/suites/rados/thrash*/workloads/``. .. note:: The Teuthology wrapper automatically splits ``write`` and ``append`` weights into regular and ``_excl`` halves. This does not happen at the CLI level: specify both variants explicitly when invoking the binary directly. Examples -------- Basic replicated pool test:: ceph_test_rados \ --pool testpool \ --max-ops 10000 \ --objects 500 \ --max-in-flight 16 \ --size 4000000 \ --op read 100 \ --op write 100 \ --op delete 10 EC pool (without allow_ec_overwrites) with snapshots:: ceph_test_rados \ --ec-pool \ --pool my-ec-pool \ --max-ops 4000 \ --objects 50 \ --pool-snaps \ --op read 100 \ --op append 100 \ --op delete 50 \ --op snap_create 50 \ --op snap_remove 50 \ --op rollback 50 Deduplication test:: ceph_test_rados \ --pool testpool \ --low_tier_pool low_tier \ --set_chunk \ --enable_dedup \ --dedup_chunk_algo fastcdc \ --dedup_chunk_size 131072 \ --max-ops 1500 \ --objects 50 \ --op read 100 \ --op write 50 \ --op set_chunk 30 \ --op tier_promote 10 Exit Status ----------- The tool will immediately panic (via ``ceph_abort()``) and dump core if any data verification errors (e.g., mismatching object content, corrupt metadata) are detected during reads. If no bugs are hit and the execution time/op count is exhausted, the tool will exit cleanly with status **0**. Exit status **1** indicates a startup validation failure (such as incompatible arguments). Source Files ------------ - ``src/test/osd/TestRados.cc`` — CLI parsing and main loop - ``src/test/osd/RadosModel.h`` — Test context and operation classes - ``src/test/osd/Object.h`` — Content generation and verification model - ``src/test/osd/TestOpStat.h`` — Operation statistics - ``qa/tasks/rados.py`` — Teuthology task wrapper