Notice

This document is for a development version of Ceph.

Cloud Sync Module

New in version Mimic.

This module syncs zone data to a remote cloud service. The sync is unidirectional; data is not synced back from the remote zone. The goal of this module is to enable syncing data to multiple cloud providers. The currently supported cloud providers are those that are compatible with AWS (S3).

User credentials for the remote cloud object store service need to be configured. Since many cloud services impose limits on the number of buckets that each user can create, the mapping of source objects and buckets is configurable. It is possible to configure different targets to different buckets and bucket prefixes. Note that source ACLs will not be preserved. It is possible to map permissions of specific source users to specific destination users.

Due to API limitations there is no way to preserve original object modification time and ETag. The cloud sync module stores these as metadata attributes on the destination objects.

Cloud Sync Tier Type Configuration

Trivial Configuration:

{
  "connection": {
    "access_key": <access>,
    "secret": <secret>,
    "endpoint": <endpoint>,
    "host_style": <path | virtual>,
  },
  "acls": [ { "type": <id | email | uri>,
              "source_id": <source_id>,
              "dest_id": <dest_id> } ... ],
  "target_path": <target_path>,
}

Non Trivial Configuration:

{
  "default": {
    "connection": {
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style" <path | virtual>,
    },
    "acls": [
    {
      "type" : <id | email | uri>,   #  optional, default is id
      "source_id": <id>,
      "dest_id": <id>
    } ... ]
    "target_path": <path> # optional
  },
  "connections": [
      {
        "connection_id": <id>,
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style" <path | virtual>,  # optional
      } ... ],
  "acl_profiles": [
      {
        "acls_id": <id>, # acl mappings
        "acls": [ {
            "type": <id | email | uri>,
            "source_id": <id>,
            "dest_id": <id>
          } ... ]
      }
  ],
  "profiles": [
      {
       "source_bucket": <source>,
       "connection_id": <connection_id>,
       "acls_id": <mappings_id>,
       "target_path": <dest>,          # optional
      } ... ],
}

Note

Trivial configuration can coincide with the non-trivial one.

  • connection (container)

Represents a connection to the remote cloud service. Contains connection_id, access_key, secret, endpoint, and host_style.

  • access_key (string)

The remote cloud access key that will be used for a specific connection.

  • secret (string)

The secret key for the remote cloud service.

  • endpoint (string)

URL of remote cloud service endpoint.

  • host_style (path | virtual)

Type of host style to be used when accessing remote cloud endpoint (default: path).

  • acls (array)

Contains a list of acl_mappings.

  • acl_mapping (container)

Each acl_mapping structure contains type, source_id, and dest_id. These will define the ACL mutation that will be done on each object. An ACL mutation allows converting source user id to a destination id.

  • type (id | email | uri)

ACL type: id defines user id, email defines user by email, and uri defines user by uri (group).

  • source_id (string)

ID of user in the source zone.

  • dest_id (string)

ID of user in the destination.

  • target_path (string)

A string that defines how the target path is created. The target path specifies a prefix to which the source object name is appended. The target path configurable can include any of the following variables: - sid: unique string that represents the sync instance ID - zonegroup: the zonegroup name - zonegroup_id: the zonegroup ID - zone: the zone name - zone_id: the zone id - bucket: source bucket name - owner: source bucket owner ID

For example: target_path = rgwx-${zone}-${sid}/${owner}/${bucket}

  • acl_profiles (array)

An array of acl_profile.

  • acl_profile (container)

Each profile contains acls_id (string) that represents the profile, and acls array that holds a list of acl_mappings.

  • profiles (array)

A list of profiles. Each profile contains the following: - source_bucket: either a bucket name, or a bucket prefix (if ends with *) that defines the source bucket(s) for this profile - target_path: as defined above - connection_id: ID of the connection that will be used for this profile - acls_id: ID of ACLs profile that will be used for this profile

S3 Specific Configurables:

Currently cloud sync will only work with backends that are compatible with AWS S3. There are a few configurables that can be used to tweak its behavior when accessing these cloud services:

{
  "multipart_sync_threshold": {object_size},
  "multipart_min_part_size": {part_size}
}
  • multipart_sync_threshold (integer)

Objects this size or larger will be synced to the cloud using multipart upload.

  • multipart_min_part_size (integer)

Minimum parts size to use when syncing objects using multipart upload.

How to Configure

See Multi-Site for how to multisite config instructions. The cloud sync module requires a creation of a new zone. The zone tier type needs to be defined as cloud:

# radosgw-admin zone create --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --endpoints={http://fqdn}[,{http://fqdn}]
                            --tier-type=cloud

The tier configuration can be then done using the following command

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config={key}={val}[,{key}={val}]

The key in the configuration specifies the config variable that needs to be updated, and the val specifies its new value. Nested values can be accessed using period. For example:

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=connection.access_key={key},connection.secret={secret}

Configuration array entries can be accessed by specifying the specific entry to be referenced enclosed in square brackets, and adding new array entry can be done by using []. Index value of -1 references the last entry in the array. At the moment it is not possible to create a new entry and reference it again at the same command. For example, creating a new profile for buckets starting with {prefix}:

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=profiles[].source_bucket={prefix}'*'

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id}

An entry can be removed by using --tier-config-rm={key}.