Notice

This document is for a development version of Ceph.

CephFS Snapshot Mirroring

CephFS supports asynchronous, push-based replication of snapshots to a remote CephFS file system by using the cephfs-mirror tool. A snapshot is synchronized by first mirroring the snapshot data and then creating, on the corresponding directory of the remote file system, a snapshot with the same name as the source snapshot.

Requirements

The primary (local) and secondary (remote) Ceph cluster versions should be Pacific or later.

Creating Users

Start by creating a Ceph user (on the primary/local cluster) for the cephfs-mirror daemon. This user requires write capability on the metadata pool to create RADOS objects (index objects) for watch/notify operations, and read capability on the data pool(s):

$ ceph auth get-or-create client.mirror mon 'profile cephfs-mirror' mds 'allow r' osd 'allow rw tag cephfs metadata=*, allow r tag cephfs data=*' mgr 'allow r'

Create a Ceph user for each file system peer (on the secondary/remote cluster). This user needs to have full capabilities on the MDS (to take snapshots) and the OSDs:

$ ceph fs authorize <fs_name> client.mirror_remote / rwps

This user will be supplied as part of the peer specification when adding a peer.

Starting Mirror Daemon

The mirror daemon can be managed by systemd, although in most cases it is best to use the cephadm interface (see below). To enable and start the daemon with systemd, run:

$ systemctl enable cephfs-mirror@mirror
$ systemctl start cephfs-mirror@mirror

The cephfs-mirror daemon can be run in the foreground by using:

$ cephfs-mirror --id mirror --cluster site-a -f

Note

The user specified here is mirror, the creation of which is described in the Creating Users section.

Multiple cephfs-mirror daemons may be deployed for concurrent synchronization and high availability. Mirror daemons share the synchronization load using a simple M/N policy, where M is the number of directories and N is the number of cephfs-mirror daemons.
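To make the M/N arithmetic concrete, the following bash sketch spreads M = 10 directories across N = 3 daemons in round-robin order, so each daemon ends up with roughly M/N (here 3 or 4) directories. The round-robin rule, the directory paths, and the daemon names are assumptions for illustration only; the mirroring module's own assignment and rebalancing logic may differ.

```shell
#!/usr/bin/env bash
# Illustration only: NOT the mirroring module's assignment code.
# Directory paths and daemon names are hypothetical.
dirs=(/d0 /d1 /d2 /d3 /d4 /d5 /d6 /d7 /d8 /d9)   # M = 10
daemons=(mirror-a mirror-b mirror-c)             # N = 3
load=(0 0 0)                                     # directories per daemon

for i in "${!dirs[@]}"; do
  n=$(( i % ${#daemons[@]} ))                    # round-robin slot
  load[n]=$(( load[n] + 1 ))
  echo "${dirs[i]} -> ${daemons[n]}"
done

# Each daemon ends up with floor(M/N) or ceil(M/N) directories.
for n in "${!daemons[@]}"; do
  echo "${daemons[n]}: ${load[n]} directories"
done
```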

When cephadm is used to manage a Ceph cluster, cephfs-mirror daemons can be deployed by running the following command:

ceph orch apply cephfs-mirror

To deploy multiple mirror daemons, run a command of the following form:

ceph orch apply cephfs-mirror --placement=<placement-spec>

For example, to deploy three cephfs-mirror daemons on different hosts, run a command of the following form:

$ ceph orch apply cephfs-mirror --placement="3 host1,host2,host3"

Interface

The Manager mirroring module provides interfaces for managing directory snapshot mirroring. These are (mostly) wrappers around monitor commands for managing file system mirroring, and they are the recommended control interface.

Mirroring Module

The mirroring module is responsible for assigning directories to mirror daemons for synchronization. Multiple mirror daemons can be spawned to achieve concurrency in directory snapshot synchronization. When mirror daemons are spawned (or terminated), the mirroring module discovers the modified set of mirror daemons and rebalances directory assignments across the new set, thus providing high availability.

Note

Deploying a single mirror daemon is recommended. Running multiple daemons is untested.

The following file types are supported by mirroring:

Regular files (-)
Directories (d)
Symbolic links (l)

Other file types are ignored by mirroring, so they will not be present on the peer even after a successful synchronization.
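Before adding a directory, it can be useful to know which entries would be skipped. The following bash sketch uses find to list everything that is not a regular file, directory, or symbolic link. The helper name list_unmirrored is hypothetical, and the demo runs against a scratch directory; on a real system you would point it at a client mount of the primary file system (for example, a hypothetical /mnt/cephfs/d0/d1/d2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print entries that are not a regular file (-),
# directory (d) or symbolic link (l), i.e. entries mirroring ignores.
list_unmirrored() {
  find "$1" ! -type f ! -type d ! -type l -print
}

# Self-contained demo on a scratch directory instead of CephFS.
demo=$(mktemp -d)
echo data > "$demo/file.txt"   # regular file: mirrored
mkdir "$demo/subdir"           # directory: mirrored
ln -s file.txt "$demo/link"    # symbolic link: mirrored
mkfifo "$demo/pipe"            # FIFO: ignored by mirroring
list_unmirrored "$demo"        # prints only the FIFO path
```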

The mirroring module is disabled by default. To enable the mirroring module, run the following command:

ceph mgr module enable mirroring

The mirroring module provides a family of commands that can be used to control the mirroring of directory snapshots. To add or remove directories, mirroring must be enabled for a given file system. To enable mirroring for a given file system, run a command of the following form:

ceph fs snapshot mirror enable <fs_name>

Note

“Mirroring module” commands are prefixed with fs snapshot mirror. This distinguishes them from “monitor commands”, which are prefixed with fs mirror. Enabling mirroring by using monitor commands will cause the mirror daemon to enter the “failed” state because the cephfs_mirror index object is absent. Always use the module commands to enable mirroring.

To disable mirroring for a given file system, run a command of the following form:

ceph fs snapshot mirror disable <fs_name>

After mirroring is enabled, add a peer to which directory snapshots are to be mirrored. Peers are specified in the <client>@<cluster> format, which is referred to elsewhere in this document as the remote_cluster_spec. Each peer is assigned a unique ID (UUID) when it is added. See the Creating Users section for instructions that describe how to create Ceph users for mirroring.

To add a peer, run a command of the following form:

ceph fs snapshot mirror peer_add <fs_name> <remote_cluster_spec> [<remote_fs_name>] [<remote_mon_host>] [<cephx_key>]

<remote_cluster_spec> is of the format client.<id>@<cluster_name>.

<remote_fs_name> is optional, and defaults to the same value as <fs_name> (on the remote cluster).

For this command to succeed, the remote cluster’s Ceph configuration and user keyring must be available in the primary cluster. For example, if a user named client.mirror_remote is created on the remote cluster with rwps permissions for the remote file system named remote_fs (see Creating Users), and the remote cluster is named remote_ceph (that is, the remote cluster configuration file is named remote_ceph.conf on the primary cluster), run the following command to add the remote file system as a peer to the primary file system primary_fs:

ceph fs snapshot mirror peer_add primary_fs client.mirror_remote@remote_ceph remote_fs

To avoid having to maintain the remote cluster configuration file and remote Ceph user keyring in the primary cluster, users can bootstrap a peer (which stores the relevant remote cluster details in the Monitor config store on the primary cluster). See the Bootstrap Peers section.

The peer_add command supports passing the remote cluster Monitor address and the user key. However, bootstrapping a peer is the recommended way to add a peer.

Note

Only a single peer is currently supported. The peer_add command is deprecated and will be removed in a future release. Use the peer_bootstrap command instead.

To remove a peer, run a command of the following form:

ceph fs snapshot mirror peer_remove <fs_name> <peer_uuid>

To list file system mirror peers, run a command of the following form:

ceph fs snapshot mirror peer_list <fs_name>

To configure a directory for mirroring, run a command of the following form:

ceph fs snapshot mirror add <fs_name> <path>

To list the configured directories, run a command of the following form:

ceph fs snapshot mirror ls <fs_name>

To stop mirroring directory snapshots, run a command of the following form:

ceph fs snapshot mirror remove <fs_name> <path>

Only absolute directory paths are allowed.

Paths are normalized by the mirroring module. This means that /a/b/../b is equivalent to /a/b. Paths always start from the CephFS file-system root and not from the host system mount point.

For example:

$ mkdir -p /d0/d1/d2
$ ceph fs snapshot mirror add cephfs /d0/d1/d2
{}
$ ceph fs snapshot mirror add cephfs /d0/d1/../d1/d2
Error EEXIST: directory /d0/d1/d2 is already tracked
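The normalization described above is purely lexical. As an illustration (not the mirroring module's code), the following bash function resolves . and .. components without touching any file system, which matches the rule that paths are interpreted relative to the CephFS root rather than a host mount point. The function name normalize_path is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lexical normalization of an absolute path.
# Never consults a file system, so symlinks and mounts are irrelevant.
normalize_path() {
  local out="" comp
  local IFS=/
  set -f                          # do not glob-expand path components
  for comp in $1; do
    case $comp in
      ''|.) ;;                    # skip empty and "." components
      ..)   out=${out%/*} ;;      # drop the last kept component
      *)    out="$out/$comp" ;;
    esac
  done
  set +f
  printf '%s\n' "${out:-/}"
}

normalize_path /d0/d1/../d1/d2    # /d0/d1/d2
normalize_path /a/b/../b          # /a/b
```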

After a directory is added for mirroring, its subdirectories and ancestor directories cannot also be added for mirroring:

$ ceph fs snapshot mirror add cephfs /d0/d1
Error EINVAL: /d0/d1 is a ancestor of tracked path /d0/d1/d2
$ ceph fs snapshot mirror add cephfs /d0/d1/d2/d3
Error EINVAL: /d0/d1/d2/d3 is a subtree of tracked path /d0/d1/d2
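The rule shown above can be expressed as a simple prefix comparison on normalized paths. The following bash sketch is illustrative only: check_new_path and its messages are hypothetical and are not Ceph's error strings. Appending a trailing slash before comparing keeps /d0/d10 from being mistaken for a subtree of /d0/d1.

```shell
#!/usr/bin/env bash
# Hypothetical helper. $1 = candidate path, $2 = already-tracked path.
# Both are assumed to be normalized absolute paths.
check_new_path() {
  local new="${1%/}/" tracked="${2%/}/"   # compare with a trailing slash
  if [[ $new == "$tracked" ]]; then
    echo "conflict: $1 is already tracked"
  elif [[ $tracked == "$new"* ]]; then
    echo "conflict: $1 is an ancestor of tracked path $2"
  elif [[ $new == "$tracked"* ]]; then
    echo "conflict: $1 is a subtree of tracked path $2"
  else
    echo "ok: $1 can be added"
  fi
}

check_new_path /d0/d1 /d0/d1/d2
check_new_path /d0/d1/d2/d3 /d0/d1/d2
check_new_path /d0/d10 /d0/d1
```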

The Mirroring Status section contains information about the commands for checking the directory mapping (to mirror daemons) and for checking the directory distribution.

Bootstrap Peers

Adding a peer via the peer_add subcommand requires the peer cluster configuration and user keyring to be available in the primary cluster (Manager host and hosts running the mirror daemon). This can be avoided by bootstrapping and importing a peer token. Peer bootstrap involves creating a bootstrap token on the peer cluster via:

$ ceph fs snapshot mirror peer_bootstrap create <fs_name> <client_entity> <site-name>

For example:

$ ceph fs snapshot mirror peer_bootstrap create backup_fs client.mirror_remote site-remote
{"token": "eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ=="}

site-name is a user-defined string that identifies the remote file system. In the context of the peer_add subcommand, site-name corresponds to the cluster name in the remote_cluster_spec.

Import the bootstrap token in the primary cluster via:

$ ceph fs snapshot mirror peer_bootstrap import <fs_name> <token>

For example:

$ ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ==
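Judging from the example above, a bootstrap token is base64-encoded JSON that carries the remote FSID, file system name, user, site name, cephx key, and Monitor addresses. Assuming that encoding holds for your token, the following bash sketch decodes it for inspection before import. The decoded output contains a secret key, so handle it with the same care as a keyring.

```shell
#!/usr/bin/env bash
# Decode the example bootstrap token from this section for inspection.
# Assumption: tokens are base64-encoded JSON, as this example is.
token='eyJmc2lkIjogIjBkZjE3MjE3LWRmY2QtNDAzMC05MDc5LTM2Nzk4NTVkNDJlZiIsICJmaWxlc3lzdGVtIjogImJhY2t1cF9mcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcGVlcl9ib290c3RyYXAiLCAic2l0ZV9uYW1lIjogInNpdGUtcmVtb3RlIiwgImtleSI6ICJBUUFhcDBCZ0xtRmpOeEFBVnNyZXozai9YYUV0T2UrbUJEZlJDZz09IiwgIm1vbl9ob3N0IjogIlt2MjoxOTIuMTY4LjAuNTo0MDkxOCx2MToxOTIuMTY4LjAuNTo0MDkxOV0ifQ=='

decoded=$(printf '%s' "$token" | base64 -d)
printf '%s\n' "$decoded"   # JSON including the cephx key: keep it private
```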

Snapshot Mirroring

To initiate snapshot mirroring, create a snapshot of the configured directory in the primary cluster:

$ mkdir -p /d0/d1/d2/.snap/snap1

Mirroring Status

The CephFS mirroring module provides a mirror daemon status interface for checking the status of mirror daemons:

$ ceph fs snapshot mirror daemon status
[
  {
    "daemon_id": 284167,
    "filesystems": [
      {
        "filesystem_id": 1,
        "name": "a",
        "directory_count": 1,
        "peers": [
          {
            "uuid": "02117353-8cd1-44db-976b-eb20609aa160",
            "remote": {
              "client_name": "client.mirror_remote",
              "cluster_name": "ceph",
              "fs_name": "backup_fs",
              "mon_host": "[v2:192.168.64.5:40183,v1:192.168.64.5:40184]",
              "fsid": "5682c8e5-50cd-4cfd-b75c-5354dcdda487"
            },
            "stats": {
              "failure_count": 1,
              "recovery_count": 0
            }
          }
        ]
      }
    ]
  }
]

One entry is displayed per mirror daemon instance, along with information such as configured peers and basic stats. The peer information includes the remote file system name (fs_name), the cluster’s Monitor addresses (mon_host), and the cluster FSID (fsid). For more detailed stats, use the admin socket interface as described below.

CephFS mirror daemons provide admin socket commands for querying mirror status. To list the available mirror status commands, run:

$ ceph --admin-daemon /path/to/mirror/daemon/admin/socket help
{
    ....
    ....
    "fs mirror status cephfs@360": "get filesystem mirror status",
    ....
    ....
}

Commands beginning with fs mirror status provide mirror status for mirror-enabled file systems. Note that cephfs@360 is in the format filesystem-name@filesystem-id. This format is required because mirror daemons are notified asynchronously about file system mirror status, and a file system can be deleted and recreated with the same name.
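Because the admin socket commands take a single filesystem-name@filesystem-id argument, scripts sometimes need the two parts separately. The following bash sketch splits the argument on its last @; split_fs_spec is a hypothetical helper, not part of Ceph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "name@id" into $fs_name and $fs_id.
split_fs_spec() {
  local spec=$1
  fs_name=${spec%@*}    # everything before the last '@'
  fs_id=${spec##*@}     # everything after the last '@'
}

split_fs_spec cephfs@360
echo "name=$fs_name id=$fs_id"   # name=cephfs id=360
```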

This command currently provides minimal information regarding mirror status:

$ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@360
{
  "rados_inst": "192.168.0.5:0/1476644347",
  "peers": {
      "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
          "remote": {
              "client_name": "client.mirror_remote",
              "cluster_name": "site-a",
              "fs_name": "backup_fs"
          }
      }
  },
  "snap_dirs": {
      "dir_count": 1
  }
}

The peers section in the command output above shows the peer information including the unique peer-id (UUID) and specification. The peer-id is required when removing an existing peer as mentioned in the Mirroring Module section.

Commands beginning with fs mirror peer status provide peer synchronization status. The command takes parameters in the format filesystem-name@filesystem-id peer-uuid:

$ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
{
  "/d0": {
      "state": "idle",
      "last_synced_snap": {
          "id": 120,
          "name": "snap1",
          "sync_duration": 3,
          "sync_time_stamp": "274900.558797s",
          "sync_bytes": 52428800
      },
      "snaps_synced": 2,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  }
}

Synchronization stats including snaps_synced, snaps_deleted and snaps_renamed are reset on daemon restart and/or when a directory is reassigned to another mirror daemon (when multiple mirror daemons are deployed).
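The fields under last_synced_snap can be combined into a rough throughput figure. The following worked example uses the values from the sample peer status output above and assumes that sync_duration is reported in seconds.

```shell
#!/usr/bin/env bash
# Worked example with the sample last_synced_snap values shown above.
# Assumption: sync_duration is in seconds.
sync_bytes=52428800   # last_synced_snap.sync_bytes
sync_duration=3       # last_synced_snap.sync_duration

mib=$(( sync_bytes / 1048576 ))          # 50 MiB transferred
bps=$(( sync_bytes / sync_duration ))    # integer bytes per second
mibps=$(awk -v b="$sync_bytes" -v d="$sync_duration" \
  'BEGIN { printf "%.2f", b / d / 1048576 }')
echo "${mib} MiB in ${sync_duration} s: ~${bps} B/s (~${mibps} MiB/s)"
```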

A directory can be in one of the following states:

idle: The directory is currently not being synchronized.
syncing: The directory is currently being synchronized.
failed: The directory has hit the upper limit of consecutive failures.

When a directory is being synchronized, the mirror daemon marks it as syncing, and fs mirror peer status shows the snapshot being synchronized under current_syncing_snap:

$ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
{
  "/d0": {
      "state": "syncing",
      "current_syncing_snap": {
          "id": 121,
          "name": "snap2"
      },
      "last_synced_snap": {
          "id": 120,
          "name": "snap1",
          "sync_duration": 3,
          "sync_time_stamp": "274900.558797s",
          "sync_bytes": 52428800
      },
      "snaps_synced": 2,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  }
}

When synchronization completes, the mirror daemon marks the directory as idle again.

When a directory experiences a configured number of consecutive synchronization failures, the mirror daemon marks it as failed. Synchronization of failed directories is retried. The number of consecutive failures before a directory is marked as failed is controlled by the cephfs_mirror_max_consecutive_failures_per_directory configuration option (default: 10), and the retry interval for failed directories is controlled by the cephfs_mirror_retry_failed_directories_interval configuration option (default: 60 seconds).

For example, adding a regular file for synchronization results in a failed status:

$ ceph fs snapshot mirror add cephfs /f0
$ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
{
  "/d0": {
      "state": "idle",
      "last_synced_snap": {
          "id": 121,
          "name": "snap2",
          "sync_duration": 5,
          "sync_time_stamp": "500900.600797s",
          "sync_bytes": 78643200
      },
      "snaps_synced": 3,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  },
  "/f0": {
      "state": "failed",
      "snaps_synced": 0,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  }
}

This behavior allows a user to add a non-existent directory for synchronization. The mirror daemon marks such a directory as failed and retries it (less frequently). When the directory is created, the mirror daemon clears the failed state upon successful synchronization.

Adding a new snapshot (that is, a new directory in the .snap directory) on the remote file system results in a failed status for the corresponding configured directory. For example, on the remote file system, run either of the following:

$ ceph fs subvolume snapshot create cephfs subvol1 snap2 group1
or
$ mkdir /d0/.snap/snap2

$ ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@360 a2dc7784-e7a1-4723-b103-03ee8d8768f8
{
  "/d0": {
      "state": "failed",
      "failure_reason": "snapshot 'snap2' has invalid metadata",
      "last_synced_snap": {
          "id": 120,
          "name": "snap1",
          "sync_duration": 3,
          "sync_time_stamp": "274900.558797s"
      },
      "snaps_synced": 2,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  },
  "/f0": {
      "state": "failed",
      "snaps_synced": 0,
      "snaps_deleted": 0,
      "snaps_renamed": 0
  }
}

When the snapshot or the directory is removed from the remote filesystem, the mirror daemon will clear the failed state upon successful synchronization of the pending snapshots, if any.

Note

Setting snap-schedule on the remote file system for directories that are being mirrored will cause the mirror daemon to report errors like invalid metadata.

Note

Treat the remote file system as read-only. CephFS does not enforce this, but with appropriately restricted MDS caps, users will not be able to snapshot directories in the remote file system.

When mirroring is disabled, the respective fs mirror status command for the file system will not show up in command help.

Metrics

CephFS exports mirroring metrics as Labeled Perf Counters, which are consumed by the OCP/ODF Dashboard to monitor geo-replication. These metrics can be used to measure the progress of cephfs-mirror synchronization. CephFS exports the following mirroring metrics, which can be displayed with the counter dump command.

Mirror Status Metrics
Name	Type	Description
mirroring_peers	Gauge	The number of peers involved in mirroring
directory_count	Gauge	The total number of directories being synchronized
mirrored_filesystems	Gauge	The total number of filesystems which are mirrored
mirror_enable_failures	Counter	The number of failures to enable mirroring

Replication Metrics
Name	Type	Description
snaps_synced	Counter	The total number of snapshots successfully synchronized
sync_bytes	Counter	The total number of bytes synchronized
sync_failures	Counter	The total number of failed snapshot synchronizations
snaps_deleted	Counter	The total number of snapshots deleted
snaps_renamed	Counter	The total number of snapshots renamed
avg_sync_time	Gauge	The average time taken by all snapshot synchronizations
last_synced_start	Gauge	The sync start time of the last synced snapshot
last_synced_end	Gauge	The sync end time of the last synced snapshot
last_synced_duration	Gauge	The time duration of the last synchronization
last_synced_bytes	Counter	The total number of bytes synchronized for the last synced snapshot

Configuration Options

cephfs_mirror_max_concurrent_directory_syncs

Maximum number of directory snapshots that can be crawled concurrently by the cephfs-mirror daemon. Controls the number of synchronization crawler threads. Note that the crawler threads also perform entry operations such as directory creation, file deletion, and snapshot deletion and renaming.

type: uint
runtime updatable: true
default: 3
min: 1

cephfs_mirror_max_datasync_threads

Specifies the maximum number of worker threads in the CephFS mirror data synchronization thread pool. These threads process file synchronization tasks produced by crawler threads for mirrored directory snapshots.

type: uint
runtime updatable: true
default: 6
min: 1

cephfs_mirror_distribute_datasync_threads

Controls how datasync worker threads are scheduled when multiple snapshots are queued for synchronization. When enabled, worker threads are distributed fairly across active snapshots, preventing a single large snapshot from monopolizing all available threads and starving other snapshots. When disabled, datasync threads process one snapshot until completion before switching to another, which can improve throughput for individual large snapshots but may increase latency for other queued snapshots. Enabling this option improves responsiveness and reduces starvation in environments where multiple directories are configured to be mirrored.

type: bool
runtime updatable: true
default: true

cephfs_mirror_datasync_files_per_batch

Defines the maximum number of files a data synchronization thread processes for a specific snapshot before yielding the thread to re-check scheduling logic. This applies only when cephfs_mirror_distribute_datasync_threads is enabled. The batch size determines the granularity of thread distribution: smaller batches let threads rotate between snapshots more frequently, while larger batches improve throughput by minimizing the overhead of thread reassignment.

type: uint
runtime updatable: true
default: 64
min: 1
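To illustrate how the batch size shapes fairness, the following bash sketch models a single datasync worker rotating between two queued snapshots and processing at most 64 files (the default batch size) from each before moving on. This is a simplified model based on the option descriptions above, not the daemon's scheduler; the snapshot names and file counts are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model, NOT the daemon's scheduler: one worker rotates
# between queued snapshots, taking at most BATCH files from each.
BATCH=64                   # default cephfs_mirror_datasync_files_per_batch
names=(snapA snapB)        # hypothetical snapshots
remaining=(200 100)        # hypothetical files left to sync per snapshot

order=""
while :; do
  progressed=0
  for i in "${!names[@]}"; do
    (( remaining[i] > 0 )) || continue
    n=$(( remaining[i] < BATCH ? remaining[i] : BATCH ))
    remaining[i]=$(( remaining[i] - n ))
    order+="${names[i]}:$n "          # record which snapshot got a batch
    progressed=1
  done
  (( progressed )) || break           # every snapshot is done
done
echo "$order"
```

With cephfs_mirror_distribute_datasync_threads disabled, the same model would instead process all 200 files of snapA before starting snapB.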

cephfs_mirror_action_update_interval

Interval in seconds to process pending mirror update actions.

type: secs
runtime updatable: true
default: 2
min: 1

cephfs_mirror_restart_mirror_on_blocklist_interval

Interval in seconds to restart blocklisted mirror instances. Setting to zero (0) disables restarting blocklisted instances.

type: secs
runtime updatable: true
default: 30
min: 0

cephfs_mirror_max_snapshot_sync_per_cycle

Maximum number of snapshots to mirror when a directory is picked up for mirroring by worker threads.

type: uint
runtime updatable: true
default: 3
min: 1

cephfs_mirror_directory_scan_interval

Interval in seconds to scan configured directories for snapshot mirroring.

type: uint
runtime updatable: true
default: 10
min: 1

cephfs_mirror_max_consecutive_failures_per_directory

Number of consecutive snapshot synchronization failures after which a directory is marked as “failed”. Failed directories are retried for synchronization less frequently.

type: uint
runtime updatable: true
default: 10
min: 0

cephfs_mirror_retry_failed_directories_interval

Interval in seconds to retry synchronization for failed directories.

type: uint
runtime updatable: true
default: 60
min: 1

cephfs_mirror_restart_mirror_on_failure_interval

Interval in seconds to restart failed mirror instances. Setting to zero (0) disables restarting failed mirror instances.

type: secs
runtime updatable: true
default: 20
min: 0

cephfs_mirror_mount_timeout

Timeout in seconds for the cephfs-mirror daemon to mount the primary or secondary (remote) Ceph file system. Setting this to a higher value could cause the mirror daemon to stall when mounting a file system if the cluster is not reachable. This option overrides the usual client_mount_timeout.

type: secs
runtime updatable: true
default: 10
min: 0

cephfs_mirror_perf_stats_prio

The daemon will send perf counter data to the manager daemon if the priority is not lower than mgr_stats_threshold.

type: int
runtime updatable: true
default: 5
allowed range: [0, 11]

cephfs_mirror_blockdiff_min_file_size

Defines the minimum file size, in bytes, required for CephFS mirroring to use block-level delta synchronization instead of a full file copy. When a file’s size is greater than this threshold, the mirroring engine attempts to synchronize only the block extents modified between snapshots. For files smaller than or equal to this value, a full file copy is performed instead, because a block-level diff may not provide meaningful performance benefits for small files.

type: size
runtime updatable: true
default: 16Mi
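The size rule for cephfs_mirror_blockdiff_min_file_size can be written as a single comparison. The following bash sketch is illustrative only (sync_mode is a hypothetical helper); it encodes the documented strict greater-than comparison against the 16Mi default. Because the description says the engine only attempts a block-level diff for larger files, a result of blockdiff here means the file is a candidate, not a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented size rule.
THRESHOLD=$(( 16 * 1024 * 1024 ))   # default 16Mi = 16777216 bytes

sync_mode() {  # $1 = file size in bytes
  if (( $1 > THRESHOLD )); then
    echo blockdiff                  # candidate for block-level delta sync
  else
    echo full-copy                  # small files are copied in full
  fi
}

sync_mode 4096        # full-copy
sync_mode 16777216    # full-copy (equal to the threshold)
sync_mode 16777217    # blockdiff
```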

Re-adding Peers

When re-adding (reassigning) a peer to a file system in another cluster, ensure that all mirror daemons have stopped synchronization to the peer. This can be checked via the fs mirror status admin socket command (the Peer UUID should not show up in the command output). Also, it is recommended to purge synchronized directories from the peer before re-adding it to another file system (especially those directories which might exist in the new primary file system). This is not required if re-adding a peer to the same primary file system it was earlier synchronized from.
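The check described above can be scripted. The following bash sketch does a plain text match for a peer UUID in fs mirror status output; peer_still_listed is a hypothetical helper, and the embedded sample is the output shown in the Mirroring Status section. On a live system, capture the output with the fs mirror status admin socket command shown in that section, and repeat the check on every mirror daemon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the quoted UUID appears in the output.
peer_still_listed() {  # $1 = "fs mirror status" output, $2 = peer UUID
  printf '%s\n' "$1" | grep -qF "\"$2\""
}

# Sample output from the Mirroring Status section.
status='{
  "rados_inst": "192.168.0.5:0/1476644347",
  "peers": {
      "a2dc7784-e7a1-4723-b103-03ee8d8768f8": {
          "remote": {
              "client_name": "client.mirror_remote",
              "cluster_name": "site-a",
              "fs_name": "backup_fs"
          }
      }
  },
  "snap_dirs": {
      "dir_count": 1
  }
}'

if peer_still_listed "$status" a2dc7784-e7a1-4723-b103-03ee8d8768f8; then
  echo "peer still listed by this daemon: do not re-add yet"
else
  echo "peer not listed by this daemon"
fi
```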

Multi-threaded snapshot sync

CephFS mirroring uses a multi-threaded architecture to improve synchronization performance. The workload is split into two distinct thread pools: a crawler thread pool, which manages snapshot crawling, and a data synchronization thread pool, which handles concurrent file transfers. Users can fine-tune these operations with the following configuration options:

- cephfs_mirror_max_concurrent_directory_syncs: controls the number of snapshots crawled concurrently.
- cephfs_mirror_max_datasync_threads: controls the total number of threads available for data synchronization.

For more information, see https://tracker.ceph.com/issues/73452
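For example, assuming the mirror daemon runs as client.mirror (as in the Creating Users section), both pools could be resized at runtime with ceph config, since both options are runtime updatable. The values below are arbitrary examples, not recommendations; daemons deployed by cephadm may run under a different entity name, so adjust client.mirror accordingly.

```shell
# Assumption: the mirror daemon's entity is client.mirror.
# Example values only; both options are runtime updatable.
ceph config set client.mirror cephfs_mirror_max_concurrent_directory_syncs 4
ceph config set client.mirror cephfs_mirror_max_datasync_threads 12

# Confirm the stored values.
ceph config get client.mirror cephfs_mirror_max_concurrent_directory_syncs
ceph config get client.mirror cephfs_mirror_max_datasync_threads
```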

Brought to you by the Ceph Foundation

The Ceph Documentation is a community resource funded and hosted by the non-profit Ceph Foundation.