OSD Scenarios¶
Several scenarios are supported. They differ mainly in the Ceph tooling required to provision OSDs, but also in how devices are configured to create an OSD.
Supported values for the required osd_scenario variable are:
- collocated
- non-collocated
- lvm
Since the Ceph mimic release, the preferred scenario is lvm, which uses the ceph-volume provisioning tool. Any other scenario will cause deprecation warnings.
All the scenarios mentioned above support both containerized and non-containerized clusters. As a reminder, deploying a containerized cluster can be done by setting containerized_deployment to True.
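For example (a minimal sketch, assuming the setting lives in your group_vars):

containerized_deployment: true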
lvm¶
This OSD scenario uses ceph-volume to create OSDs, primarily using LVM, and is only available when the Ceph release is luminous or newer. It is the preferred method of provisioning OSDs.
It is enabled with the following setting:
osd_scenario: lvm
Other (optional) supported settings:
- osd_objectstore: Set the Ceph objectstore for the OSD. Available options are filestore or bluestore. You can only select bluestore when the Ceph release is luminous or greater. Defaults to bluestore if unset.
- dmcrypt: Enable Ceph's encryption on OSDs using dmcrypt. Defaults to false if unset.
- osds_per_device: Provision more than 1 OSD (the default if unset) per device.
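For example, these optional settings could be combined with the lvm scenario as follows (an illustrative sketch only; the values shown are not recommendations):

osd_scenario: lvm
osd_objectstore: bluestore
dmcrypt: true
osds_per_device: 2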
Simple configuration¶
With this approach, most of the decisions on how devices are configured to provision an OSD are made by the Ceph tooling (ceph-volume lvm batch in this case). There is almost no room to modify how the OSD is composed given an input of devices.
To use this configuration, the devices option must be populated with the raw device paths that will be used to provision the OSDs.
Note
Raw devices must be “clean”, without a GPT partition table or logical volumes present.
For example, for a node that has /dev/sda and /dev/sdb intended for Ceph usage, the configuration would be:
osd_scenario: lvm
devices:
- /dev/sda
- /dev/sdb
In the above case, if both devices are spinning drives, 2 OSDs would be created, each with its own collocated journal.
Other provisioning strategies are possible, by mixing spinning and solid state devices, for example:
osd_scenario: lvm
devices:
- /dev/sda
- /dev/sdb
- /dev/nvme0n1
Similar to the initial example, this would end up producing 2 OSDs, but data would be placed on the slower spinning drives (/dev/sda and /dev/sdb) and journals would be placed on the faster solid state device /dev/nvme0n1.
The ceph-volume tool describes this in detail in the “batch” subcommand section of its documentation.
Other (optional) supported settings:
- crush_device_class: Sets the CRUSH device class for all OSDs created with this method (it is not possible to have a per-OSD CRUSH device class using the simple configuration approach). Values must be a string, like crush_device_class: "ssd".
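For example, a simple configuration that puts every OSD created from the listed devices into the same CRUSH device class could look like this (the device paths are placeholders):

osd_scenario: lvm
crush_device_class: "ssd"
devices:
- /dev/sda
- /dev/sdb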
Advanced configuration¶
This configuration is useful when you want more granular control over how devices are set up and arranged to provision an OSD. It requires an existing setup of volume groups and logical volumes (ceph-volume will not create these).
To use this configuration, the lvm_volumes option must be populated with logical volumes and volume groups. Additionally, absolute paths to partitions can be used for journal, block.db, and block.wal.
Note
This configuration uses ceph-volume lvm create to provision OSDs.
Supported lvm_volumes configuration settings:
- data: The logical volume name or full path to a raw device (an LV will be created using 100% of the raw device).
- data_vg: The volume group name, required if data is a logical volume.
- crush_device_class: CRUSH device class name for the resulting OSD. This allows setting the device class for each OSD, unlike the global crush_device_class that sets it for all OSDs.
Note
If you wish to set the crush_device_class for the OSDs when using devices, you must set it using the global crush_device_class option as shown above. There is no way to define a specific CRUSH device class per OSD when using devices like there is for lvm_volumes.
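By contrast, a per-OSD CRUSH device class can be expressed with lvm_volumes, for example (a sketch; the volume group and logical volume names are placeholders and must already exist):

osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    crush_device_class: "hdd"
  - data: data-lv2
    data_vg: data-vg2
    crush_device_class: "ssd"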
filestore objectstore variables:
- journal: The logical volume name or full path to a partition.
- journal_vg: The volume group name, required if journal is a logical volume.
Warning
Each entry must be unique; duplicate values are not allowed.
bluestore objectstore variables:
- db: The logical volume name or full path to a partition.
- db_vg: The volume group name, required if db is a logical volume.
- wal: The logical volume name or full path to a partition.
- wal_vg: The volume group name, required if wal is a logical volume.
Note
These bluestore variables are optional optimizations. Bluestore’s db and wal will only benefit from faster devices. It is possible to create a bluestore OSD with a single raw device.
Warning
Each entry must be unique; duplicate values are not allowed.
bluestore example using raw devices:
osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
- data: /dev/sda
- data: /dev/sdb
Note
Volume groups and logical volumes will be created in this case, utilizing 100% of the devices.
bluestore example with logical volumes:
osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
  - data: data-lv2
    data_vg: data-vg2
Note
Volume groups and logical volumes must exist.
bluestore example defining wal and db logical volumes:
osd_objectstore: bluestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    db: db-lv1
    db_vg: db-vg1
    wal: wal-lv1
    wal_vg: wal-vg1
  - data: data-lv2
    data_vg: data-vg2
    db: db-lv2
    db_vg: db-vg2
    wal: wal-lv2
    wal_vg: wal-vg2
Note
Volume groups and logical volumes must exist.
filestore example with logical volumes:
osd_objectstore: filestore
osd_scenario: lvm
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    journal: journal-lv1
    journal_vg: journal-vg1
  - data: data-lv2
    data_vg: data-vg2
    journal: journal-lv2
    journal_vg: journal-vg2
Note
Volume groups and logical volumes must exist.
collocated¶
Warning
This scenario is deprecated in the Ceph mimic release, and fully removed in newer releases. It is recommended to use the lvm scenario instead.
This OSD scenario uses ceph-disk to create OSDs with collocated journals from raw devices.
Use osd_scenario: collocated to enable this scenario. This scenario also has the following required configuration options:
- devices
This scenario has the following optional configuration options:
- osd_objectstore: defaults to filestore if not set. Available options are filestore or bluestore. You can only select bluestore if the Ceph release is luminous or greater.
- dmcrypt: defaults to false if not set.
This scenario supports encrypting your OSDs by setting dmcrypt: True.
If osd_objectstore: filestore is enabled, both ‘ceph data’ and ‘ceph journal’ partitions will be stored on the same device.
If osd_objectstore: bluestore is enabled, ‘ceph data’, ‘ceph block’, ‘ceph block.db’, and ‘ceph block.wal’ will be stored on the same device. The device will get 2 partitions:
- One for ‘data’, called ‘ceph data’
- One for ‘ceph block’, ‘ceph block.db’, and ‘ceph block.wal’, called ‘ceph block’
Example of what you will get:
[root@ceph-osd0 ~]# blkid /dev/sda*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="9c43e346-dd6e-431f-92d8-cbed4ccb25f6" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="749c71c9-ed8f-4930-82a7-a48a3bcdb1c7"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="e6ca3e1d-4702-4569-abfa-e285de328e9d"
An example of using the collocated OSD scenario with encryption would look like:
osd_scenario: collocated
dmcrypt: true
devices:
- /dev/sda
- /dev/sdb
non-collocated¶
Warning
This scenario is deprecated in the Ceph mimic release, and fully removed in newer releases. It is recommended to use the lvm scenario instead.
This OSD scenario uses ceph-disk to create OSDs from raw devices with journals that exist on a dedicated device.
Use osd_scenario: non-collocated to enable this scenario. This scenario also has the following required configuration options:
- devices
This scenario has the following optional configuration options:
- dedicated_devices: defaults to devices if not set.
- osd_objectstore: defaults to filestore if not set. Available options are filestore or bluestore. You can only select bluestore when the Ceph release is luminous or greater.
- dmcrypt: defaults to false if not set.
This scenario supports encrypting your OSDs by setting dmcrypt: True.
If osd_objectstore: filestore is enabled, ‘ceph data’ and ‘ceph journal’ partitions will be stored on different devices:
- ‘ceph data’ will be stored on the device listed in devices
- ‘ceph journal’ will be stored on the device listed in dedicated_devices
Let’s take an example. Imagine devices was declared like this:
devices:
- /dev/sda
- /dev/sdb
- /dev/sdc
- /dev/sdd
And dedicated_devices was declared like this:
dedicated_devices:
- /dev/sdf
- /dev/sdf
- /dev/sdg
- /dev/sdg
This will result in the following mapping:
- /dev/sda will have /dev/sdf1 as a journal
- /dev/sdb will have /dev/sdf2 as a journal
- /dev/sdc will have /dev/sdg1 as a journal
- /dev/sdd will have /dev/sdg2 as a journal
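Putting the two lists together, the complete configuration producing that mapping (with the default filestore objectstore) would look like:

osd_scenario: non-collocated
devices:
- /dev/sda
- /dev/sdb
- /dev/sdc
- /dev/sdd
dedicated_devices:
- /dev/sdf
- /dev/sdf
- /dev/sdg
- /dev/sdg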
If osd_objectstore: bluestore is enabled, both ‘ceph block.db’ and ‘ceph block.wal’ partitions will be stored on a dedicated device.
So the following will happen:
- The devices listed in devices will get 2 partitions, one for ‘block’ and one for ‘data’. ‘data’ is only 100MB and does not store any of your data; it is just a bunch of Ceph metadata. ‘block’ will store all your actual data.
- The devices in dedicated_devices will get 1 partition for the RocksDB DB, called ‘block.db’, and one for the RocksDB WAL, called ‘block.wal’.
By default, dedicated_devices will be used for block.db.
Example of what you will get:
[root@ceph-osd0 ~]# blkid /dev/sd*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="c6821801-2f21-4980-add0-b7fc8bd424d5" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="f2cc6fa8-5b41-4428-8d3f-6187453464d0"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="ea454807-983a-4cf2-899e-b2680643bc1c"
/dev/sdb: PTTYPE="gpt"
/dev/sdb1: PARTLABEL="ceph block.db" PARTUUID="af5b2d74-4c08-42cf-be57-7248c739e217"
There is more device granularity for bluestore (only when osd_objectstore: bluestore is enabled) via the bluestore_wal_devices config option. By default, bluestore_wal_devices is empty. If set, you will have a dedicated partition on a specific device for block.wal.
Example of what you will get:
[root@ceph-osd0 ~]# blkid /dev/sd*
/dev/sda: PTTYPE="gpt"
/dev/sda1: UUID="39241ae9-d119-4335-96b3-0898da8f45ce" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="961e7313-bdb7-49e7-9ae7-077d65c4c669"
/dev/sda2: PARTLABEL="ceph block" PARTUUID="bff8e54e-b780-4ece-aa16-3b2f2b8eb699"
/dev/sdb: PTTYPE="gpt"
/dev/sdb1: PARTLABEL="ceph block.db" PARTUUID="0734f6b6-cc94-49e9-93de-ba7e1d5b79e3"
/dev/sdc: PTTYPE="gpt"
/dev/sdc1: PARTLABEL="ceph block.wal" PARTUUID="824b84ba-6777-4272-bbbd-bfe2a25cecf3"
An example of using the non-collocated OSD scenario with encryption, bluestore, and dedicated wal devices would look like:
osd_scenario: non-collocated
osd_objectstore: bluestore
dmcrypt: true
devices:
- /dev/sda
- /dev/sdb
dedicated_devices:
- /dev/sdc
- /dev/sdc
bluestore_wal_devices:
- /dev/sdd
- /dev/sdd