Ceph File System Scrub

CephFS provides the cluster admin (operator) with a way to check the consistency of a file system via a set of scrub commands. Scrub can be classified into two parts:

Forward Scrub: The scrub operation starts at the root of the file system (or a subdirectory) and examines everything that can be reached in the hierarchy to ensure consistency.
Backward Scrub: The scrub operation examines every RADOS object in the file system pools and maps it back to the file system hierarchy.

This document details the commands used to initiate and control forward scrub (referred to as scrub hereafter).

Warning

CephFS forward scrubs are started and manipulated on rank 0. All scrub commands must be directed at rank 0.

Initiate File System Scrub

To start a scrub operation for a directory tree, run a command of the following form:

ceph tell mds.<fsname>:0 scrub start <path> [scrubopts] [tag]

Here, scrubopts is a comma-delimited list of options (recursive, force, or repair) and tag is an optional custom string tag (the default is a generated UUID). An example command is:

ceph tell mds.cephfs:0 scrub start / recursive
{
    "return_code": 0,
    "scrub_tag": "6f0d204c-6cfd-4300-9e02-73f382fd23c1",
    "mode": "asynchronous"
}

Recursive scrub is asynchronous (as indicated by mode in the output above). Poll an asynchronous scrub with scrub status to determine its state.
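The command form above can be assembled programmatically. The following is a minimal sketch, not part of Ceph: the helper name `build_scrub_start` and the `KNOWN_SCRUB_OPTS` tuple are hypothetical, and the option list is limited to the options named in this document. It only builds the argument list; it does not contact a cluster.

```python
# Options named in this document; this sketch rejects anything else.
KNOWN_SCRUB_OPTS = ("recursive", "force", "repair", "scrub_mdsdir")


def build_scrub_start(fsname, path, opts=(), tag=None):
    """Return the argv list for `ceph tell mds.<fsname>:0 scrub start ...`."""
    unknown = [o for o in opts if o not in KNOWN_SCRUB_OPTS]
    if unknown:
        raise ValueError(f"unknown scrub option(s): {unknown}")
    # Scrub commands must be directed at rank 0, hence the fixed ":0".
    argv = ["ceph", "tell", f"mds.{fsname}:0", "scrub", "start", path]
    if opts:
        argv.append(",".join(opts))
    if tag is not None:
        # The tag is positional and follows scrubopts, so it cannot be
        # passed without at least one option in this sketch.
        if not opts:
            raise ValueError("a tag requires at least one scrub option")
        argv.append(tag)
    return argv


print(" ".join(build_scrub_start("cephfs", "/", ["recursive"])))
```

The returned list can be handed to `subprocess.run` on a host that has the `ceph` CLI and suitable credentials.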

The scrub tag is used to differentiate scrubs. It is also used to mark each inode’s first data object in the default data pool (where the backtrace information is stored) with a scrub_tag extended attribute whose value is the tag. You can verify that an inode was scrubbed by inspecting this extended attribute with the RADOS utilities.
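As an illustration of the verification step, the sketch below builds a `rados getxattr` command for an inode's first data object. It assumes the usual CephFS object naming (the inode number in hexadecimal followed by `.00000000`); the helper names and the pool name `cephfs_data` are hypothetical, so substitute the default data pool of your file system.

```python
def first_data_object(ino):
    """Name of an inode's first data object: hex inode number, block index 0."""
    return f"{ino:x}.00000000"


def build_getxattr(pool, ino, attr="scrub_tag"):
    """Return the argv list that reads an extended attribute from that object."""
    return ["rados", "-p", pool, "getxattr", first_data_object(ino), attr]


# "cephfs_data" is an assumed pool name used only for illustration.
print(" ".join(build_getxattr("cephfs_data", 0x100000001F7)))
```

If the inode was scrubbed, the command is expected to print the tag of the scrub that last visited it.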

Scrubs work for multiple active MDS (multiple ranks). The scrub is managed by rank 0 and distributed across MDS as appropriate.

Monitor (ongoing) File System Scrubs

The status of ongoing scrubs can be monitored and polled using the scrub status command. This command lists ongoing scrubs (identified by tag) along with the path and options used to initiate each scrub:

ceph tell mds.cephfs:0 scrub status
{
    "status": "scrub active (85 inodes in the stack)",
    "scrubs": {
        "6f0d204c-6cfd-4300-9e02-73f382fd23c1": {
            "path": "/",
            "options": "recursive"
        }
    }
}
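For scripting, the status string in the output above can be split into a state and an inode count. This is a minimal sketch that assumes the string keeps the `<state> (<N> inodes in the stack)` shape shown in this document; the function name `parse_scrub_status` is hypothetical, and any string that does not match is returned unchanged as the state.

```python
import json
import re

# Matches strings such as "scrub active (85 inodes in the stack)".
_STATUS_RE = re.compile(r"^(?P<state>.+?) \((?P<count>\d+) inodes in the stack\)$")


def parse_scrub_status(raw):
    """Parse the JSON printed by `scrub status` into a small dict."""
    doc = json.loads(raw)
    status = doc.get("status", "")
    match = _STATUS_RE.match(status)
    return {
        "state": match.group("state") if match else status,
        "inodes_in_stack": int(match.group("count")) if match else None,
        "scrubs": doc.get("scrubs", {}),
    }


sample = '{"status": "scrub active (85 inodes in the stack)", "scrubs": {}}'
print(parse_scrub_status(sample))
```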

status shows the number of inodes scheduled to be scrubbed at a given point in time, so it can change between successive scrub status invocations. In addition, a high-level summary of the scrub operation (including the operation state and the paths on which scrub was triggered) is displayed in ceph status:

ceph status
[...]

task status:
  scrub status:
      mds.0: active [paths:/]

[...]

A scrub is complete when it no longer shows up in this list (although that may change in future releases). Any damage will be reported via cluster health warnings.
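The completion rule above can be turned into a polling loop. The sketch below assumes that "this list" means the scrubs map printed by scrub status, and it treats a scrub as complete once its tag is absent from that map (which, as noted, may change in future releases). The names `wait_for_scrub` and `ceph_scrub_status` are hypothetical; the status source is passed in as a callable so the loop can be exercised without a cluster.

```python
import json
import subprocess
import time


def ceph_scrub_status(fsname="cephfs"):
    """Fetch raw `scrub status` JSON from rank 0 (needs the ceph CLI)."""
    argv = ["ceph", "tell", f"mds.{fsname}:0", "scrub", "status"]
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def wait_for_scrub(tag, get_status, interval=5.0, timeout=3600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until `tag` leaves the scrubs map; return False on timeout."""
    deadline = clock() + timeout
    while True:
        status = json.loads(get_status())
        if tag not in status.get("scrubs", {}):
            return True  # the scrub no longer shows up in the list
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call such as `wait_for_scrub(tag, ceph_scrub_status)` would block until the scrub with that tag finishes or the timeout expires. Damage found by the scrub is still reported through cluster health warnings, not through this loop.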

Control (ongoing) File System Scrubs

Pause: Pausing ongoing scrub operations results in no new or pending inodes being scrubbed after in-flight RADOS ops (for the inodes that are currently being scrubbed) finish:

ceph tell mds.cephfs:0 scrub pause
{
    "return_code": 0
}

The scrub status after pausing reflects the paused state. At this point, initiating new scrub operations (via scrub start) only queues the inodes for scrub:

ceph tell mds.cephfs:0 scrub status
{
    "status": "PAUSED (66 inodes in the stack)",
    "scrubs": {
        "6f0d204c-6cfd-4300-9e02-73f382fd23c1": {
            "path": "/",
            "options": "recursive"
        }
    }
}

Resume: Resuming kick-starts a paused scrub operation:

ceph tell mds.cephfs:0 scrub resume
{
    "return_code": 0
}

Abort: Aborting ongoing scrub operations removes pending inodes from the scrub queue (thereby aborting the scrub) after in-flight RADOS ops (for the inodes that are currently being scrubbed) finish:
```
ceph tell mds.cephfs:0 scrub abort
{
    "return_code": 0
}
```
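The pause and resume commands above pair naturally around a maintenance step. The sketch below wraps them in a context manager so that resume is issued even if the wrapped step raises. The name `scrub_paused` is hypothetical, and `run` is any callable that accepts an argv list; nothing here is part of Ceph itself.

```python
from contextlib import contextmanager


@contextmanager
def scrub_paused(run, fsname="cephfs"):
    """Pause scrubs on rank 0 for the duration of the with-block."""
    target = f"mds.{fsname}:0"
    run(["ceph", "tell", target, "scrub", "pause"])
    try:
        yield
    finally:
        # Always resume, even when the body of the with-block fails.
        run(["ceph", "tell", target, "scrub", "resume"])


# Dry run: record the commands instead of executing them.
issued = []
with scrub_paused(issued.append):
    pass
print(issued)
```

Passing `functools.partial(subprocess.run, check=True)` as `run` would execute the commands on a host with the `ceph` CLI. In-flight RADOS ops still finish after the pause is requested, so the pause does not take effect instantly.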

Damages

The types of damage that can be reported and repaired by File System Scrub are:

DENTRY : Inode’s dentry is missing.
DIR_FRAG : Inode’s directory fragment(s) are missing.
BACKTRACE : Inode’s backtrace in the data pool is corrupted.

The MDS damage types named above can be repaired by running a command of the following form:

ceph tell mds.<fsname>:0 scrub start /path recursive,repair,force

If scrub is able to repair the damage, the corresponding entry is automatically removed from the damage table.

Note

A scrub invoked with the repair option can identify a damaged hard link but not repair it.
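One way to confirm that a repair scrub removed entries from the damage table is to compare the table before and after the scrub. The sketch below assumes the table is listed with `ceph tell mds.<fsname>:0 damage ls` and that the command prints a JSON list whose entries carry an `id` field; both the field name and the sample entries are assumptions for illustration, and `repaired_entries` is a hypothetical helper.

```python
import json


def repaired_entries(before_raw, after_raw):
    """Return damage entries present before the scrub but absent after it."""
    before = {entry["id"]: entry for entry in json.loads(before_raw)}
    after_ids = {entry["id"] for entry in json.loads(after_raw)}
    return [entry for dmg_id, entry in before.items() if dmg_id not in after_ids]


# Hypothetical damage listings captured before and after a repair scrub.
before = '[{"id": 1, "damage_type": "backtrace"}, {"id": 2, "damage_type": "dentry"}]'
after = '[{"id": 2, "damage_type": "dentry"}]'
print(repaired_entries(before, after))
```

Entries that remain in the second listing were not repaired by the scrub and need further investigation.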

Evaluate Strays Using Recursive Scrub

To evaluate strays (that is, to purge stray directories in ~mdsdir), run a command of the following form:

ceph tell mds.<fsname>:0 scrub start ~mdsdir recursive

~mdsdir is not enqueued by default when scrubbing at the CephFS root. To perform stray evaluation at root, run scrub with the scrub_mdsdir and recursive options:

ceph tell mds.<fsname>:0 scrub start / recursive,scrub_mdsdir

Dump Stray Folder Content

To dump stray folder content on a specific MDS, run a command of the following form:

ceph tell mds.<fsname>:0 dump stray
{
    "strays": [
        {
            "ino": "0x100000001f7",
            "stray_prior_path": "/dir/dir1",
            "client_caps": [
                {
                    "client_id": 4156,
                    "pending": "pAsLsXsFscr",
                    "issued": "pAsLsXsFscr",
                    "wanted": "-",
                    "last_sent": 3
                }
            ],
            "loner": -1,
            "want_loner": -1,
            "mds_caps_wanted": [],
            "is_subvolume": false
        }
    ]
}
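The dump above can be condensed into one record per stray, which helps when looking for clients that still hold capabilities on unlinked inodes. This is a minimal sketch that relies only on the fields shown in the sample output; the function name `summarize_strays` is hypothetical.

```python
import json


def summarize_strays(raw):
    """Reduce `dump stray` JSON to inode, prior path, and cap-holding clients."""
    doc = json.loads(raw)
    return [
        {
            "ino": int(stray["ino"], 16),  # "0x..." string to integer
            "prior_path": stray["stray_prior_path"],
            "clients": [cap["client_id"] for cap in stray.get("client_caps", [])],
        }
        for stray in doc.get("strays", [])
    ]


sample = """
{"strays": [{"ino": "0x100000001f7", "stray_prior_path": "/dir/dir1",
             "client_caps": [{"client_id": 4156}]}]}
"""
for entry in summarize_strays(sample):
    print(hex(entry["ino"]), entry["prior_path"], entry["clients"])
```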
