Notice

This document is for a development version of Ceph.

Beginner’s Guide

The purpose of A Beginner’s Guide to Ceph is to make Ceph comprehensible.

Ceph is a clustered and distributed storage manager. If that’s too cryptic, then just think of Ceph as a computer program that stores data and uses a network to make sure that there is a backup copy of the data.

Components of Ceph

Storage Interfaces

Ceph offers several “storage interfaces”, which is another way of saying “ways of storing data”. These storage interfaces include:

  • CephFS (a file system)

  • RBD (block devices)

  • RGW (an object store)

Deep down, though, all three of these are really RADOS object stores. CephFS and RBD are just presenting themselves as file systems and block devices.

Storage Manager: What is It?

Ceph is a clustered and distributed storage manager that offers data redundancy. This sentence might be too cryptic for first-time readers of the Ceph Beginner’s Guide, so let’s explain all of the terms in it:

  • Storage manager. Ceph is a storage manager. This means that Ceph is software that helps storage resources store data. Storage resources come in several forms: hard disk drives (HDD), solid-state drives (SSD), magnetic tape, floppy disks, punched tape, Hollerith-style punch cards, and magnetic drum memory are all forms of storage resources. In this beginner’s guide, we’ll focus on hard disk drives (HDD) and solid-state drives (SSD).

  • Clustered and distributed storage manager. Ceph is a clustered and distributed storage manager. This means that the storage manager is deployed not on a single server but on several servers that work together as a system: the stored data and the infrastructure that supports it are spread across multiple servers rather than centralized in a single server. To better understand what “distributed” means in this context, it helps to describe what it is not: it is not a system like a traditional enterprise storage array, which exposes a single logical disk over the network in a 1:1 (one-to-one) mapping.

  • Data redundancy. Ceph offers data redundancy. This means that a second copy of your data is kept somewhere, so that the data is not lost if one copy becomes unavailable.

Ceph Monitor

The Ceph Monitor is one of the daemons essential to the functioning of a Ceph cluster. Monitors know the location of all the data in the Ceph cluster. Monitors maintain maps of the cluster state, and those maps make it possible for Ceph daemons to work together. These maps include the Monitor map, the OSD map, the MDS map, and the CRUSH map. At least three Monitors are required for the cluster to be resistant to Monitor failures. A majority of the Monitors must be in the “up” state in order for them to reach quorum. Quorum is a state that is necessary for a Ceph cluster to work properly.
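The majority rule explains why three Monitors is the usual minimum. The following is a minimal sketch of the arithmetic, not a Ceph command: quorum_size and tolerated_failures are hypothetical helper names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): the smallest number of Monitors
# that forms a strict majority of a cluster with the given Monitor count.
quorum_size() {
    local monitors=$1
    echo $(( monitors / 2 + 1 ))
}

# Hypothetical helper: how many Monitors can be down while a majority
# of the Monitors is still up.
tolerated_failures() {
    local monitors=$1
    echo $(( monitors - (monitors / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "$n Monitors: quorum needs $(quorum_size "$n"), tolerates $(tolerated_failures "$n") down"
done
```

With one or two Monitors, no Monitor can fail without losing the majority. With three Monitors, one can fail, and with five, two can fail.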

Manager

The Ceph Manager is one of the daemons essential to the functioning of the Ceph cluster. Managers are in charge of various Ceph cluster management and monitoring tasks that are provided by Manager modules. These tasks include orchestration, the Ceph dashboard web GUI, balancing the data and load evenly in the Ceph cluster, keeping track of runtime metrics, and providing connectivity to non-native clients. Offloading less-critical, resource-intensive tasks from Monitors to Managers simplifies scaling the Ceph cluster.

OSD

Object Storage Daemons (OSDs) store objects.

An OSD is a process that runs on a storage server. The OSD is responsible for managing a single unit of storage, which is usually a single disk.

Pools

A pool is an abstraction in which objects are stored. Ceph offers and supports two types of data protection, replication and erasure coding, and the method of data protection is set at the pool level: each pool is designated as either “replicated” or “erasure coded”. For comparison, here is how another storage product defines the term: “A storage pool is a collection of storage volumes. A storage volume is the basic unit of storage, such as allocated space on a disk or a single tape cartridge. The server uses the storage volumes to store backed-up, archived, or space-managed files.” (IBM Tivoli Storage Manager, Version 7.1, “Storage Pools”)
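The two types of data protection differ in how much raw disk space they consume for the same amount of data. The sketch below illustrates the arithmetic only. It assumes a replicated pool that keeps a given number of full copies, and an erasure-coded pool that splits each object into k data chunks plus m coding chunks. The function names and the example numbers (3 copies, k=4, m=2) are hypothetical and are not taken from this guide.

```shell
#!/usr/bin/env bash
# Raw space (in GB) consumed by a replicated pool that keeps `copies`
# full copies of every object. Hypothetical helper, for illustration only.
replicated_raw() {
    local data_gb=$1 copies=$2
    awk -v d="$data_gb" -v c="$copies" 'BEGIN { printf "%.1f\n", d * c }'
}

# Raw space (in GB) consumed by an erasure-coded pool with k data chunks
# and m coding chunks per object. Hypothetical helper, for illustration only.
erasure_coded_raw() {
    local data_gb=$1 k=$2 m=$3
    awk -v d="$data_gb" -v k="$k" -v m="$m" 'BEGIN { printf "%.1f\n", d * (k + m) / k }'
}

replicated_raw 100 3       # prints 300.0
erasure_coded_raw 100 4 2  # prints 150.0
```

Under these assumptions, storing 100 GB takes 300 GB of raw space with three-way replication and 150 GB with a 4+2 erasure-coded layout.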

Placement Groups

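Placement groups (PGs) are subdivisions of a pool. Each object in a pool belongs to one placement group, and each placement group is mapped to a set of OSDs.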
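One way to picture how an object ends up in a placement group is to hash the object's name and take the result modulo the number of placement groups in the pool. The sketch below is a simplified illustration of that idea only: it uses the POSIX cksum utility as a stand-in hash, which is an assumption made for this example and is not the hash function that Ceph itself uses. The name pg_for_object is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an object name to a placement group number in
# the range 0..pg_count-1. cksum is a stand-in hash chosen for illustration;
# it is NOT the hash that Ceph uses.
pg_for_object() {
    local object_name=$1 pg_count=$2
    local checksum
    # cksum prints "<crc> <byte count>" when reading from standard input;
    # keep only the CRC.
    checksum=$(printf '%s' "$object_name" | cksum | cut -d ' ' -f 1)
    echo $(( checksum % pg_count ))
}

pg_for_object "my-object" 32
```

The same object name always yields the same placement group number, so any client can work out where an object belongs without consulting a central lookup table.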

MDS

A metadata server (MDS) is necessary for the proper functioning of CephFS. See Deploy CephFS and Ceph File System.

Vstart Cluster Installation and Configuration Procedure

  1. Clone the ceph/ceph repository:

    git clone git@github.com:ceph/ceph
    
  2. Move into the ceph directory that the clone created. You will know that you are in the correct directory if it contains the file do_cmake.sh:

    cd ceph
    
  3. Update the submodules in the ceph/ceph repository:

    git submodule update --init --recursive --progress
    
  4. Run install-deps.sh from within the ceph directory:

    ./install-deps.sh
    
  5. Install the python3-routes package:

    apt install python3-routes
    
  6. Run the do_cmake.sh script:

    ./do_cmake.sh
    
  7. The do_cmake.sh script creates a build/ directory. Move into the build/ directory:

    cd build
    
  8. Use ninja to build the development environment:

    ninja -j3
    

    Note

    This step takes a long time to run. The ninja -j3 command kicks off a process consisting of 2289 steps. This step took over three hours when I ran it on an Intel NUC with an i7 in September of 2024.

  9. Install the Ceph development environment:

    ninja install
    

    This step does not take as long as the previous step.

  10. Build the vstart cluster:

    ninja vstart
    
  11. Start the vstart cluster:

    ../src/vstart.sh --debug --new -x --localhost --bluestore
    

    Note

    Run this command from within the ceph/build directory.
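Once vstart.sh finishes, a quick way to confirm that the cluster is running is to ask it for its status. The sketch below assumes that the build placed the ceph command-line tool at bin/ceph inside the build directory, as vstart-based development builds normally do. check_vstart_cluster is a hypothetical helper name and is not part of Ceph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status of the vstart cluster.
# Assumes the current directory is ceph/build and that the build produced
# the ceph command-line tool at ./bin/ceph.
check_vstart_cluster() {
    if [ -x ./bin/ceph ]; then
        # "ceph -s" prints a summary of the cluster status.
        ./bin/ceph -s
    else
        echo "bin/ceph not found: run this from the ceph/build directory" >&2
        return 1
    fi
}
```

Call check_vstart_cluster from the ceph/build directory. When you are done with the cluster, running ../src/stop.sh from the same directory stops the vstart daemons.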
