OSD Scenarios

Several scenarios are supported. They differ mainly in the Ceph tooling used to provision OSDs, but the choice of scenario can also affect how devices are configured to create an OSD.

Supported values for the required osd_scenario variable are:

  • collocated
  • non-collocated
  • lvm

Since the Ceph mimic release, it is preferred to use the lvm scenario that uses the ceph-volume provisioning tool. Any other scenario will cause deprecation warnings.

All the scenarios mentioned above support both containerized and non-containerized clusters. As a reminder, a containerized cluster is deployed by setting containerized_deployment to True.
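As a minimal sketch, a containerized deployment using the lvm scenario might combine the two settings as follows. Only the two variables named above are shown; the remaining OSD settings are omitted and would still be needed.

```yaml
# Sketch only: selects the lvm scenario for a containerized cluster.
osd_scenario: lvm
containerized_deployment: true
```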

lvm

This OSD scenario uses ceph-volume to create OSDs, primarily using LVM, and is only available when the Ceph release is luminous or newer.

It is the preferred method of provisioning OSDs.

It is enabled with the following setting:

osd_scenario: lvm

Other (optional) supported settings:

  • osd_objectstore: Set the Ceph objectstore for the OSD. Available options are filestore or bluestore. You can only select bluestore if the Ceph release is luminous or greater. Defaults to bluestore if unset.
  • dmcrypt: Enable Ceph’s encryption on OSDs using dmcrypt. Defaults to false if unset.
  • osds_per_device: Provision more than one OSD per device. Defaults to 1 if unset.
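The optional settings above could be combined as in the following sketch. The values are illustrative assumptions (in particular osds_per_device: 2), the device paths are placeholders, and the devices option is the one described under Simple configuration.

```yaml
# Sketch only: illustrative values, placeholder device paths.
osd_scenario: lvm
osd_objectstore: bluestore   # the default, written out explicitly
dmcrypt: true                # encrypt the OSDs (defaults to false)
osds_per_device: 2           # assumed value: two OSDs per listed device
devices:
  - /dev/sda
  - /dev/sdb
```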

Simple configuration

With this approach, most of the decisions on how devices are configured to provision an OSD are made by the Ceph tooling (ceph-volume lvm batch in this case). There is almost no room to modify how the OSD is composed given an input of devices.

To use this configuration, the devices option must be populated with the raw device paths that will be used to provision the OSDs.

Note

Raw devices must be “clean”: no gpt partition table and no logical volumes may be present.

For example, for a node that has /dev/sda and /dev/sdb intended for Ceph usage, the configuration would be:

osd_scenario: lvm
devices:
  - /dev/sda
  - /dev/sdb

In the above case, if both devices are spinning drives, 2 OSDs would be created, each with its own collocated journal.

Other provisioning strategies are possible, by mixing spinning and solid state devices, for example:

osd_scenario: lvm
devices:
  - /dev/sda
  - /dev/sdb
  - /dev/nvme0n1

Similar to the initial example, this would end up producing 2 OSDs, but data would be placed on the slower spinning drives (/dev/sda and /dev/sdb) and journals would be placed on the faster solid state device /dev/nvme0n1. The ceph-volume tool describes this in detail in the “batch” subcommand section.

Other (optional) supported settings:

  • crush_device_class: Sets the CRUSH device class for all OSDs created with this method (it is not possible to have a per-OSD CRUSH device class using the simple configuration approach). Values must be a string, like crush_device_class: "ssd"
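For instance, a sketch that applies one CRUSH device class to every OSD created from devices could look like this. The device paths are placeholders and "ssd" is simply the value used in the description above.

```yaml
# Sketch only: every OSD created from these devices gets the same class.
osd_scenario: lvm
crush_device_class: "ssd"
devices:
  - /dev/sda
  - /dev/sdb
```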

Advanced configuration

This configuration is useful when more granular control is wanted when setting up devices and how they should be arranged to provision an OSD. It requires an existing setup of volume groups and logical volumes (ceph-volume will not create these).

To use this configuration, the lvm_volumes option must be populated with logical volumes and volume groups. Additionally, absolute paths to partitions can be used for journal, block.db, and block.wal.

Note

This configuration uses ceph-volume lvm create to provision OSDs.

Supported lvm_volumes configuration settings:

  • data: The logical volume name or full path to a raw device (an LV will be created using 100% of the raw device)
  • data_vg: The volume group name, required if data is a logical volume.
  • crush_device_class: CRUSH device class name for the resulting OSD. This allows setting the device class for each OSD individually, unlike the global crush_device_class that sets it for all OSDs.

Note

If you wish to set the crush_device_class for the OSDs when using devices, you must set it using the global crush_device_class option as shown above. There is no way to define a specific CRUSH device class per OSD when using devices, as there is with lvm_volumes.
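By contrast, a per-OSD device class with lvm_volumes might be sketched as follows. The logical volume and volume group names are placeholders that would have to exist already, and the class names "hdd" and "ssd" are illustrative choices.

```yaml
# Sketch only: placeholder LV/VG names; each entry carries its own class.
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    crush_device_class: "hdd"
  - data: data-lv2
    data_vg: data-vg2
    crush_device_class: "ssd"
```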

filestore objectstore variables:

  • journal: The logical volume name or full path to a partition.
  • journal_vg: The volume group name, required if journal is a logical volume.

Warning

Each entry must be unique; duplicate values are not allowed.

bluestore objectstore variables:

  • db: The logical volume name or full path to a partition.
  • db_vg: The volume group name, required if db is a logical volume.
  • wal: The logical volume name or full path to a partition.
  • wal_vg: The volume group name, required if wal is a logical volume.

Note

These bluestore variables are optional optimizations: Bluestore’s db and wal only benefit when they are placed on faster devices. It is possible to create a bluestore OSD with a single raw device.

Warning

Each entry must be unique; duplicate values are not allowed.

bluestore example using raw devices:

osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: /dev/sda
  - data: /dev/sdb

Note

Volume groups and logical volumes will be created in this case, utilizing 100% of the devices.

bluestore example with logical volumes:

osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
  - data: data-lv2
    data_vg: data-vg2

Note

Volume groups and logical volumes must exist.

bluestore example defining wal and db logical volumes:

osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    db: db-lv1
    db_vg: db-vg1
    wal: wal-lv1
    wal_vg: wal-vg1
  - data: data-lv2
    data_vg: data-vg2
    db: db-lv2
    db_vg: db-vg2
    wal: wal-lv2
    wal_vg: wal-vg2

Note

Volume groups and logical volumes must exist.
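bluestore example with db and wal on partitions:

This is a hedged sketch of the partition-path form described in the settings list: when db or wal is given as a full path to a partition, no db_vg or wal_vg is supplied. The partition paths are placeholders, and it is assumed here that the partitions have been created beforehand.

```yaml
# Sketch only: placeholder names and paths.
osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    db: /dev/sdc1    # full path to a partition, so no db_vg
    wal: /dev/sdd1   # full path to a partition, so no wal_vg
```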

filestore example with logical volumes:

osd_objectstore: filestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    journal: journal-lv1
    journal_vg: journal-vg1
  - data: data-lv2
    data_vg: data-vg2
    journal: journal-lv2
    journal_vg: journal-vg2

Note

Volume groups and logical volumes must exist.
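filestore example with journals on partitions:

A sketch of the same idea for filestore, where journal is a full path to a partition and journal_vg is therefore left out. The partition paths are placeholders, assumed to exist already, and each one is used only once because entries must be unique.

```yaml
# Sketch only: placeholder names and paths.
osd_objectstore: filestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    journal: /dev/sdc1   # full path to a partition, so no journal_vg
  - data: data-lv2
    data_vg: data-vg2
    journal: /dev/sdc2
```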

collocated

Warning

This scenario is deprecated in the Ceph mimic release, and fully removed in newer releases. It is recommended to use the lvm scenario instead.

This OSD scenario uses ceph-disk to create OSDs with collocated journals from raw devices.

Use osd_scenario: collocated to enable this scenario. This scenario also has the following required configuration options:

  • devices

This scenario has the following optional configuration options:

  • osd_objectstore: defaults to filestore if not set. Available options are filestore or bluestore. You can only select bluestore if the Ceph release is luminous or greater.
  • dmcrypt: defaults to false if not set.

This scenario supports encrypting your OSDs by setting dmcrypt: True.

If osd_objectstore: filestore is enabled, both ‘ceph data’ and ‘ceph journal’ partitions will be stored on the same device.

If osd_objectstore: bluestore is enabled, ‘ceph data’, ‘ceph block’, ‘ceph block.db’ and ‘ceph block.wal’ will be stored on the same device. The device will get 2 partitions:

  • One for ‘data’, called ‘ceph data’
  • One for ‘ceph block’, ‘ceph block.db’, ‘ceph block.wal’ called ‘ceph block’

Example of what you will get:

[root@ceph-osd0 ~]# blkid /dev/sda*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="9c43e346-dd6e-431f-92d8-cbed4ccb25f6" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="749c71c9-ed8f-4930-82a7-a48a3bcdb1c7"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="e6ca3e1d-4702-4569-abfa-e285de328e9d"

An example of using the collocated OSD scenario with encryption would look like:

osd_scenario: collocated
dmcrypt: true
devices:
  - /dev/sda
  - /dev/sdb

non-collocated

Warning

This scenario is deprecated in the Ceph mimic release, and fully removed in newer releases. It is recommended to use the lvm scenario instead.

This OSD scenario uses ceph-disk to create OSDs from raw devices with journals that exist on a dedicated device.

Use osd_scenario: non-collocated to enable this scenario. This scenario also has the following required configuration options:

  • devices

This scenario has the following optional configuration options:

  • dedicated_devices: defaults to devices if not set
  • osd_objectstore: defaults to filestore if not set. Available options are filestore or bluestore. You can only select bluestore if the Ceph release is luminous or greater.
  • dmcrypt: defaults to false if not set.

This scenario supports encrypting your OSDs by setting dmcrypt: True.

If osd_objectstore: filestore is enabled, ‘ceph data’ and ‘ceph journal’ partitions will be stored on different devices:

  • ‘ceph data’ will be stored on the device listed in devices
  • ‘ceph journal’ will be stored on the device listed in dedicated_devices

As an example, imagine devices was declared like this:

devices:
  - /dev/sda
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd

And dedicated_devices was declared like this:

dedicated_devices:
  - /dev/sdf
  - /dev/sdf
  - /dev/sdg
  - /dev/sdg

This will result in the following mapping:

  • /dev/sda will have /dev/sdf1 as a journal
  • /dev/sdb will have /dev/sdf2 as a journal
  • /dev/sdc will have /dev/sdg1 as a journal
  • /dev/sdd will have /dev/sdg2 as a journal
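Put together, the filestore configuration behind this mapping might be sketched as follows. The comments restate the mapping listed above: the entries of devices and dedicated_devices are paired by position.

```yaml
# Sketch only: the two lists are read side by side, entry by entry.
osd_scenario: non-collocated
osd_objectstore: filestore
devices:
  - /dev/sda   # journal on /dev/sdf1
  - /dev/sdb   # journal on /dev/sdf2
  - /dev/sdc   # journal on /dev/sdg1
  - /dev/sdd   # journal on /dev/sdg2
dedicated_devices:
  - /dev/sdf
  - /dev/sdf
  - /dev/sdg
  - /dev/sdg
```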

If osd_objectstore: bluestore is enabled, both ‘ceph block.db’ and ‘ceph block.wal’ partitions will be stored on a dedicated device.

So the following will happen:

  • The devices listed in devices will get 2 partitions, one for ‘block’ and one for ‘data’. ‘data’ is only 100MB in size and does not store any of your data; it only holds Ceph metadata. ‘block’ will store all your actual data.
  • The devices in dedicated_devices will get 1 partition for RocksDB DB, called ‘block.db’, and one for RocksDB WAL, called ‘block.wal’.

By default, dedicated_devices will represent block.db.

Example of what you will get:

[root@ceph-osd0 ~]# blkid /dev/sd*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="c6821801-2f21-4980-add0-b7fc8bd424d5" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="f2cc6fa8-5b41-4428-8d3f-6187453464d0"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="ea454807-983a-4cf2-899e-b2680643bc1c"
/dev/sdb: PTTYPE="gpt"
/dev/sdb1: PARTLABEL="ceph block.db" PARTUUID="af5b2d74-4c08-42cf-be57-7248c739e217"
/dev/sdb2: PARTLABEL="ceph block.wal" PARTUUID="af3f8327-9aa9-4c2b-a497-cf0fe96d126a"

When osd_objectstore: bluestore is enabled (and only then), finer device granularity is available by setting the bluestore_wal_devices config option.

By default, if bluestore_wal_devices is empty, it will get the content of dedicated_devices. If set, then you will have a dedicated partition on a specific device for block.wal.

Example of what you will get:

[root@ceph-osd0 ~]# blkid /dev/sd*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="39241ae9-d119-4335-96b3-0898da8f45ce" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="961e7313-bdb7-49e7-9ae7-077d65c4c669"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="bff8e54e-b780-4ece-aa16-3b2f2b8eb699"
/dev/sdb: PTTYPE="gpt"
/dev/sdb1: PARTLABEL="ceph block.db" PARTUUID="0734f6b6-cc94-49e9-93de-ba7e1d5b79e3"
/dev/sdc: PTTYPE="gpt"
/dev/sdc1: PARTLABEL="ceph block.wal" PARTUUID="824b84ba-6777-4272-bbbd-bfe2a25cecf3"

An example of using the non-collocated OSD scenario with encryption, bluestore and dedicated wal devices would look like:

osd_scenario: non-collocated
osd_objectstore: bluestore
dmcrypt: true
devices:
  - /dev/sda
  - /dev/sdb
dedicated_devices:
  - /dev/sdc
  - /dev/sdc
bluestore_wal_devices:
  - /dev/sdd
  - /dev/sdd