Full RGW Object Dedup:
Add a radosgw-admin command to collect and report deduplication stats
Note
This utility doesn’t perform dedup and doesn’t make any change to the existing system and will only collect statistics and report them.
Admin commands:
radosgw-admin dedup stats:Collects & displays last dedup statistics
radosgw-admin dedup pause:Pauses active dedup session (dedup resources are not released)
radosgw-admin dedup resume:Resumes a paused dedup session
radosgw-admin dedup abort:Aborts active dedup session and release all resources used by it
radosgw-admin dedup estimateStarts a new dedup estimate session (aborting first existing session if exists)
Skipped Objects:
Dedup Estimates skips the following objects:
Objects smaller than 4MB (unless they are multipart)
Objects with different placement rules
Objects with different pools
Objects with different same storage-classes
The Dedup process itself (which will be released later) will also skip compressed and user-encrypted objects, but the estimate process will accept them (since we don’t have access to that information during the estimate process)
Estimate Processing:
The Dedup Estimate process collects all the needed information directly from the bucket-indices reading one full bucket-index object with 1000’s of entries at a time.
The Bucket-Indices objects are sharded between the participating members so every bucket-index object is read exactly one time. The sharding allow processing to scale almost linearly spliting the load evenly between the participating members.
The Dedup Estimate process does not access the objects themselves (data/metadata) which means its processing time won’t be affected by the underlined media storing the objects (SSD/HDD) since the bucket-indices are virtually always stored on a fast medium (SSD with heavy memory caching)
Memory Usage:
RGW Obj Count
Memory
____1M____4M___16M___64M__256M_1024M( 1G)_4096M( 4G)16384M(16G) ___8MB__16MB__32MB__64MB_128MB_256MB_512MB1024MB
Brought to you by the Ceph Foundation
The Ceph Documentation is a community resource funded and hosted by the non-profit Ceph Foundation. If you would like to support this and our other efforts, please consider joining now.