Cloud Sync Module

New in version Mimic.

This sync module syncs zone data to a remote cloud service. The sync is unidirectional: data is not synced back from the remote zone. The aim of this sync module is to make it possible to sync data to multiple cloud providers. Currently, the supported cloud providers are those that are compatible with AWS (S3).

A user for the remote cloud object store service needs to be configured. Sync operations will be done under that specified user. Since different cloud services impose limits on the number of buckets that each user can create, the source objects and buckets are mapped to different (configurable) buckets and objects. It is possible to configure different targets for different buckets and bucket prefixes. In addition, source ACLs will not be preserved. It is possible to map permissions of specific source users to specific destination users.

Due to API limitations, there is no way to preserve the original object's modification time and ETag. The cloud sync module stores these as separate metadata attributes on the destination objects.

Cloud Sync Tier Type Configuration

Trivial Configuration:

{
  "connection": {
    "access_key": <access>,
    "secret": <secret>,
    "endpoint": <endpoint>,
    "host_style": <path | virtual>,
  },
  "acls": [ { "type": <id | email | uri>,
              "source_id": <source_id>,
              "dest_id": <dest_id> } ... ],
  "target_path": <target_path>,
}
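
As an illustration only, the template above can be filled in as follows. This is a minimal sketch: the credentials, endpoint, and user IDs are hypothetical placeholders, and the Python wrapper is used only to check that the result is well-formed JSON (the template itself uses placeholders and trailing commas, so it is not valid JSON as written).

```python
import json

# Every value below is a hypothetical placeholder, not a working credential.
trivial_config = {
    "connection": {
        "access_key": "EXAMPLEACCESSKEY",
        "secret": "example-secret",
        "endpoint": "https://s3.example.com",
        "host_style": "path",  # or "virtual"
    },
    "acls": [
        # Map the source user "alice" to the destination user "remote-alice".
        {"type": "id", "source_id": "alice", "dest_id": "remote-alice"},
    ],
    "target_path": "rgwx-${zone}-${sid}/${owner}/${bucket}",
}

# Serialize to JSON to confirm the structure is well-formed.
config_json = json.dumps(trivial_config, indent=2)
print(config_json)
```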

Non Trivial Configuration:

{
  "default": {
    "connection": {
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style": <path | virtual>,
    },
    "acls": [
    {
      "type" : <id | email | uri>,   #  optional, default is id
      "source_id": <id>,
      "dest_id": <id>
    } ... ],
    "target_path": <path> # optional
  },
  "connections": [
      {
        "connection_id": <id>,
        "access_key": <access>,
        "secret": <secret>,
        "endpoint": <endpoint>,
        "host_style": <path | virtual>,  # optional
      } ... ],
  "acl_profiles": [
      {
        "acls_id": <id>, # acl mappings
        "acls": [ {
            "type": <id | email | uri>,
            "source_id": <id>,
            "dest_id": <id>
          } ... ]
      }
  ],
  "profiles": [
      {
       "source_bucket": <source>,
       "connection_id": <connection_id>,
       "acls_id": <mappings_id>,
       "target_path": <dest>,          # optional
      } ... ],
}

Note

Trivial configuration can coincide with the non-trivial one.

  • connection (container)

Represents a connection to the remote cloud service. Contains connection_id, access_key, secret, endpoint, and host_style.

  • access_key (string)

The remote cloud access key that will be used for a specific connection.

  • secret (string)

The secret key for the remote cloud service.

  • endpoint (string)

URL of remote cloud service endpoint.

  • host_style (path | virtual)

Type of host style to be used when accessing remote cloud endpoint (default: path).

  • acls (array)

Contains a list of acl_mappings.

  • acl_mapping (container)

Each acl_mapping structure contains type, source_id, and dest_id. These will define the ACL mutation that will be done on each object. An ACL mutation allows converting source user id to a destination id.

  • type (id | email | uri)

ACL type: id defines user id, email defines user by email, and uri defines user by uri (group).

  • source_id (string)

ID of user in the source zone.

  • dest_id (string)

ID of user in the destination.

  • target_path (string)

A string that defines how the target path is created. The target path specifies a prefix to which the source object name is appended. The target path can include any of the following variables:

- sid: unique string that represents the sync instance ID
- zonegroup: the zonegroup name
- zonegroup_id: the zonegroup ID
- zone: the zone name
- zone_id: the zone ID
- bucket: source bucket name
- owner: source bucket owner ID

For example: target_path = rgwx-${zone}-${sid}/${owner}/${bucket}

  • acl_profiles (array)

An array of acl_profile.

  • acl_profile (container)

Each profile contains acls_id (string) that represents the profile, and acls array that holds a list of acl_mappings.

  • profiles (array)

A list of profiles. Each profile contains the following:

- source_bucket: either a bucket name, or a bucket prefix (if it ends with *) that defines the source bucket(s) for this profile
- target_path: as defined above
- connection_id: ID of the connection that will be used for this profile
- acls_id: ID of the ACLs profile that will be used for this profile
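
The profile matching and target_path expansion described above can be sketched as follows. This is an illustration, not the module's implementation: the function names are hypothetical, the precedence rule (an exact bucket name wins, otherwise the longest matching prefix) is an assumption, and Python's string.Template is used only because it happens to accept the same ${name} syntax.

```python
from string import Template


def match_profile(bucket, profiles):
    """Return the profile whose source_bucket matches `bucket`, or None.

    Assumption: an exact name match wins; otherwise the longest prefix wins.
    """
    best, best_len = None, -1
    for profile in profiles:
        pattern = profile["source_bucket"]
        if pattern.endswith("*"):
            prefix = pattern[:-1]
            if bucket.startswith(prefix) and len(prefix) > best_len:
                best, best_len = profile, len(prefix)
        elif pattern == bucket:
            return profile
    return best


def expand_target_path(target_path, variables):
    """Substitute ${sid}, ${zone}, ${bucket}, ... in a target_path string."""
    return Template(target_path).substitute(variables)


# Hypothetical profiles and variable values, for illustration only.
profiles = [
    {"source_bucket": "logs-*", "connection_id": "conn-a", "acls_id": "acls-a"},
    {"source_bucket": "images", "connection_id": "conn-b", "acls_id": "acls-b"},
]
variables = {
    "sid": "a1b2",
    "zone": "cloud-zone",
    "owner": "alice",
    "bucket": "logs-2024",
}

profile = match_profile("logs-2024", profiles)
path = expand_target_path("rgwx-${zone}-${sid}/${owner}/${bucket}", variables)
print(profile["connection_id"], path)
```

A bucket that matches no profile returns None in this sketch; falling back to the default section of the configuration in that case is also an assumption.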

S3 Specific Configurables:

Currently, cloud sync only works with backends that are compatible with AWS S3. There are a few configurables that can be used to tweak its behavior when accessing these cloud services:

{
  "multipart_sync_threshold": {object_size},
  "multipart_min_part_size": {part_size}
}

  • multipart_sync_threshold (integer)

Objects this size or larger will be synced to the cloud using multipart upload.

  • multipart_min_part_size (integer)

Minimum part size to use when syncing objects using multipart upload.
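
How the two settings interact can be sketched as below. This is a rough illustration under stated assumptions: sizes are taken to be in bytes, plan_upload is a hypothetical helper, and the part count assumes every part is exactly multipart_min_part_size, whereas the module may choose larger parts.

```python
import math


def plan_upload(object_size, multipart_sync_threshold, multipart_min_part_size):
    """Decide whether an object would be synced with multipart upload.

    Objects at or above the threshold use multipart upload. The part count
    is an estimate that assumes each part is exactly the minimum part size.
    """
    if object_size < multipart_sync_threshold:
        return {"multipart": False, "parts": 1}
    parts = math.ceil(object_size / multipart_min_part_size)
    return {"multipart": True, "parts": max(parts, 1)}


MiB = 1024 * 1024  # hypothetical sizes below, chosen for illustration

small = plan_upload(8 * MiB, 32 * MiB, 16 * MiB)
large = plan_upload(100 * MiB, 32 * MiB, 16 * MiB)
print(small, large)
```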

How to Configure

See Multisite Configuration for instructions on configuring multisite. The cloud sync module requires the creation of a new zone. The zone tier type needs to be defined as cloud:

# radosgw-admin zone create --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --endpoints={http://fqdn}[,{http://fqdn}] \
                            --tier-type=cloud

The tier configuration can then be done using the following command:

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config={key}={val}[,{key}={val}]

The key in the configuration specifies the config variable that needs to be updated, and the val specifies its new value. Nested values can be accessed using a period. For example:

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=connection.access_key={key},connection.secret={secret}

Configuration array entries can be accessed by specifying the entry to be referenced in square brackets, and a new array entry can be added by using []. An index value of -1 references the last entry in the array. At the moment, it is not possible to create a new entry and reference it again in the same command. For example, to create a new profile for buckets starting with {prefix}:

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=profiles[].source_bucket={prefix}'*'

# radosgw-admin zone modify --rgw-zonegroup={zone-group-name} \
                            --rgw-zone={zone-name} \
                            --tier-config=profiles[-1].connection_id={conn_id},profiles[-1].acls_id={acls_id}

An entry can be removed by using --tier-config-rm={key}.
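
To make the key syntax above concrete, the following sketch applies key=val pairs to a nested Python dictionary, handling periods, [] (append a new entry), and [N] (including -1 for the last entry). It is only a model of the addressing rules described here, not how radosgw-admin is implemented; set_tier_config and apply_tier_config are hypothetical names, and the comma splitting is naive (values containing commas are not handled).

```python
import re

# One key component: a name, optionally followed by [] or [N].
_TOKEN = re.compile(r"^(\w+)(\[(-?\d+)?\])?$")


def set_tier_config(config, key, value):
    """Set `value` at the position named by `key` inside `config`."""
    node = config
    tokens = key.split(".")
    for i, token in enumerate(tokens):
        m = _TOKEN.match(token)
        if m is None:
            raise ValueError(f"bad key component: {token!r}")
        name, has_brackets, index = m.group(1), m.group(2), m.group(3)
        last = i == len(tokens) - 1
        if has_brackets is None:
            # Plain nested field, e.g. connection.access_key
            if last:
                node[name] = value
            else:
                node = node.setdefault(name, {})
            continue
        array = node.setdefault(name, [])
        if index is None:
            array.append({})  # "[]" creates a new entry
            pos = len(array) - 1
        else:
            pos = int(index)  # "[-1]" refers to the last entry
        if last:
            array[pos] = value
        else:
            node = array[pos]
    return config


def apply_tier_config(config, arg):
    """Apply a comma-separated key=val list (naive split on ',' and '=')."""
    for pair in arg.split(","):
        key, _, value = pair.partition("=")
        set_tier_config(config, key, value)
    return config


# Hypothetical values, mirroring the two-step profile example above.
config = {}
apply_tier_config(config, "connection.access_key=KEY,connection.secret=SECRET")
apply_tier_config(config, "profiles[].source_bucket=logs-*")
apply_tier_config(config, "profiles[-1].connection_id=conn-a,profiles[-1].acls_id=acls-a")
print(config)
```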