Notice

This document is for a development version of Ceph.

ceph-mgr orchestrator modules

Warning

This is developer documentation, describing Ceph internals that are only relevant to people writing ceph-mgr orchestrator modules.

In this context, orchestrator refers to some external service that provides the ability to discover devices and create Ceph services. This includes external projects such as Rook.

An orchestrator module is a ceph-mgr module (ceph-mgr module developer’s guide) which implements common management operations using a particular orchestrator.

Orchestrator modules subclass the Orchestrator class. This class is an interface: it only provides method definitions to be implemented by subclasses. The purpose of defining this common interface for different orchestrators is to enable common UI code, such as the dashboard, to work with various backends.

digraph G {
    subgraph cluster_1 {
        volumes [label="mgr/volumes"]
        rook [label="mgr/rook"]
        dashboard [label="mgr/dashboard"]
        orchestrator_cli [label="mgr/orchestrator"]
        orchestrator [label="Orchestrator Interface"]
        cephadm [label="mgr/cephadm"]
        label = "ceph-mgr";
    }
    volumes -> orchestrator
    dashboard -> orchestrator
    orchestrator_cli -> orchestrator
    orchestrator -> rook -> rook_io
    orchestrator -> cephadm
    rook_io [label="Rook"]
    rankdir="TB";
}

Behind all the abstraction, the purpose of orchestrator modules is simple: enable Ceph to do things like discover available hardware, create and destroy OSDs, and run MDS and RGW services.

A tutorial is not included here: for full and concrete examples, see the existing implemented orchestrator modules in the Ceph source tree.

Glossary

Stateful service

a daemon that uses local storage, such as OSD or mon.

Stateless service

a daemon that doesn’t use any local storage, such as an MDS, RGW, nfs-ganesha, iSCSI gateway.

Label

arbitrary string tags that may be applied by administrators to hosts. Typically administrators use labels to indicate which hosts should run which kinds of service. Labels are advisory (from human input) and do not guarantee that hosts have particular physical capabilities.

Drive group

collection of block devices with common/shared OSD formatting (typically one or more SSDs acting as journals/dbs for a group of HDDs).

Placement

choice of which host is used to run a service.

Key Concepts

The underlying orchestrator remains the source of truth for information about whether a service is running, what is running where, which hosts are available, etc. Orchestrator modules should avoid taking any internal copies of this information, and read it directly from the orchestrator backend as much as possible.

Bootstrapping hosts and adding them to the underlying orchestration system is outside the scope of Ceph’s orchestrator interface. Ceph can only work on hosts when the orchestrator is already aware of them.

Where possible, placement of stateless services should be left up to the orchestrator.

Completions and batching

All methods that read or modify the state of the system can potentially be long running. Therefore the module needs to schedule those operations.

Each orchestrator module implements its own underlying mechanisms for completions. This might involve running the underlying operations in threads, or batching the operations up before later executing in one go in the background. If implementing such a batching pattern, the module would do no work on any operation until it appeared in a list of completions passed into process.
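The batching pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the interface of any existing module: `PendingOperation`, `BatchingBackend`, `submit`, and the exact shape of `process` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class PendingOperation:
    """Hypothetical completion: wraps a deferred call and its outcome."""
    func: Callable[[], Any]
    result: Any = None
    error: Optional[Exception] = None
    done: bool = False


class BatchingBackend:
    """Hypothetical module skeleton that defers all work until process()."""

    def submit(self, func: Callable[[], Any]) -> PendingOperation:
        # No work happens here: the caller only receives a handle.
        return PendingOperation(func)

    def process(self, completions: List[PendingOperation]) -> None:
        # Execute every operation that has not run yet, in one go.
        for op in completions:
            if op.done:
                continue
            try:
                op.result = op.func()
            except Exception as exc:
                # Record the failure on the handle instead of raising.
                op.error = exc
            op.done = True


backend = BatchingBackend()
ops = [backend.submit(lambda: "host1 added"),
       backend.submit(lambda: 1 / 0)]
backend.process(ops)
```

After `process` returns, the first handle carries a result and the second carries the exception, so one failing operation does not prevent the rest of the batch from running.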

Error Handling

The main goal of error handling within orchestrator modules is to provide debug information to assist users when dealing with deployment errors.

class orchestrator.OrchestratorError(msg, errno=-22, event_kind_subject=None)

General orchestrator specific error.

Used for deployment, configuration or user errors.

It’s not intended for programming errors or orchestrator internal errors.

class orchestrator.NoOrchestrator(msg='No orchestrator configured (try `ceph orch set backend`)')

No orchestrator is configured.

class orchestrator.OrchestratorValidationError(msg, errno=-22, event_kind_subject=None)

Raised when an orchestrator doesn’t support a specific feature.

In detail, orchestrators need to explicitly deal with different kinds of errors:

  1. No orchestrator configured

    See NoOrchestrator.

  2. An orchestrator doesn’t implement a specific method.

    For example, an Orchestrator doesn’t support add_host.

    In this case, a NotImplementedError is raised.

  3. Missing features within implemented methods.

    For example, optional parameters to a command that the backend does not support (such as the hosts field of the Orchestrator.apply_mon() command with the rook backend).

    See OrchestratorValidationError.

  4. Input validation errors

    The orchestrator module and other calling modules are supposed to provide meaningful error messages.

    See OrchestratorValidationError.

  5. Errors when actually executing commands

    The resulting Completion should contain an error string that assists in understanding the problem. In addition, Completion.is_errored() returns True.

  6. Invalid configuration in the orchestrator modules

    This can be handled similarly to case 5.

All other errors are unexpected orchestrator issues and should therefore raise an exception that is then logged to the mgr log file. If there is a completion object at that point, Completion.result() may contain an error message.
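As a rough illustration of how a calling module might distinguish these cases, the sketch below uses stand-in exception classes that mirror the signatures documented above. The inheritance of `OrchestratorValidationError` from `OrchestratorError` is an assumption, and `call_orchestrator`, `NOT_SUPPORTED`, and the two example functions are hypothetical names introduced for this example.

```python
from typing import Callable, Tuple

NOT_SUPPORTED = -95  # illustrative return code, chosen for this example


class OrchestratorError(Exception):
    """Stand-in mirroring the documented (msg, errno=-22) signature."""

    def __init__(self, msg: str, errno: int = -22):
        super().__init__(msg)
        self.errno = errno


class OrchestratorValidationError(OrchestratorError):
    """Stand-in; subclassing OrchestratorError is an assumption."""


def call_orchestrator(func: Callable[[], str]) -> Tuple[int, str]:
    """Hypothetical wrapper mapping expected errors to (retval, message)."""
    try:
        return 0, func()
    except NotImplementedError:
        # Case 2: the backend does not implement the method at all.
        return NOT_SUPPORTED, "This orchestrator does not support this operation"
    except OrchestratorError as e:
        # Cases 3 and 4: unsupported feature or invalid input.
        return e.errno, str(e)
    # Anything else is unexpected and is left to propagate and be logged.


def add_host_unsupported() -> str:
    raise NotImplementedError()


def apply_mon_with_hosts() -> str:
    raise OrchestratorValidationError("hosts field is not supported")
```

Only the expected error types are caught; programming errors still surface as ordinary exceptions, which matches the intent that unexpected issues end up in the mgr log.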

Excluded functionality

  • Ceph’s orchestrator interface is not a general purpose framework for managing Linux servers – it is deliberately constrained to manage the Ceph cluster’s services only.

  • Multipathed storage is not handled (multipathing is unnecessary for Ceph clusters). Each drive is assumed to be visible only on a single host.

Host management

Orchestrator.add_host(host_spec)

Add a host to the orchestrator inventory.

Parameters

host_spec – specification of the host to add (see HostSpec)

Return type

OrchResult[str]

Orchestrator.remove_host(host)

Remove a host from the orchestrator inventory.

Parameters

host (str) – hostname

Return type

OrchResult[str]

Orchestrator.get_hosts()

Report the hosts in the cluster.

Return type

OrchResult[List[HostSpec]]

Returns

list of HostSpec

Orchestrator.update_host_addr(host, addr)

Update a host’s address

Parameters
  • host (str) – hostname

  • addr (str) – address (dns name or IP)

Return type

OrchResult[str]

Orchestrator.add_host_label(host, label)

Add a host label

Return type

OrchResult[str]

Orchestrator.remove_host_label(host, label)

Remove a host label

Return type

OrchResult[str]

class orchestrator.HostSpec(hostname, addr=None, labels=None, status=None, location=None)

Information about hosts, comparable to the output of kubectl get nodes.

Devices

Orchestrator.get_inventory(host_filter=None, refresh=False)

Returns the device inventory as produced by ceph-volume inventory.

Return type

OrchResult[List[InventoryHost]]

Returns

list of InventoryHost

class orchestrator.InventoryFilter(labels=None, hosts=None)

When fetching inventory, use this filter to avoid unnecessarily scanning the whole estate.

Typical use:

  • filter by host when presenting a UI workflow for configuring a particular server.

  • filter by label when not all of the estate is Ceph servers, and we want to only learn about the Ceph servers.

  • filter by label when we are interested particularly in e.g. OSD servers.
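A minimal sketch of how a backend might apply such a filter before scanning is shown below. The `InventoryFilter` class here is a stand-in that mirrors only the documented constructor arguments; the `Host` record, the `select_hosts` helper, and the rule that a host matches when it carries any of the requested labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InventoryFilter:
    """Stand-in mirroring the documented (labels, hosts) arguments."""
    labels: Optional[List[str]] = None
    hosts: Optional[List[str]] = None


@dataclass
class Host:
    """Hypothetical host record used only for this example."""
    hostname: str
    labels: List[str] = field(default_factory=list)


def select_hosts(all_hosts: List[Host],
                 flt: Optional[InventoryFilter]) -> List[str]:
    """Return the hostnames to scan; no filter means scan everything."""
    selected = []
    for h in all_hosts:
        if flt and flt.hosts is not None and h.hostname not in flt.hosts:
            continue
        # Assumed semantics: one shared label is enough to match.
        if flt and flt.labels is not None and not set(flt.labels) & set(h.labels):
            continue
        selected.append(h.hostname)
    return selected


estate = [Host("osd1", ["osd"]), Host("osd2", ["osd"]), Host("web1")]
osd_only = select_hosts(estate, InventoryFilter(labels=["osd"]))
```

Filtering before the scan, rather than after, is what avoids touching hosts that are not part of the Ceph deployment.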

class ceph.deployment.inventory.Devices(devices)

A container for Device instances with reporting

class ceph.deployment.inventory.Device(path, sys_api=None, available=None, rejected_reasons=None, lvs=None, device_id=None, lsm_data=None)

Placement

A Placement Specification defines the placement of daemons of a specific service.

In general, stateless services do not require any specific placement rules as they can run anywhere that sufficient system resources are available. However, some orchestrators may not include the functionality to choose a location in this way. Optionally, you can specify a location when creating a stateless service.

class ceph.deployment.service_spec.PlacementSpec(label=None, hosts=None, count=None, count_per_host=None, host_pattern=None)

For APIs that need to specify a host subset

classmethod from_string(arg)

A single integer is parsed as a count:

>>> PlacementSpec.from_string('3')
PlacementSpec(count=3)

A list of names is parsed as host specifications:

>>> PlacementSpec.from_string('host1 host2')
PlacementSpec(hosts=[HostPlacementSpec(hostname='host1', network='', name=''), HostPlacementSpec(hostname='host2', network='', name='')])

You can also prefix the hosts with a count as follows:

>>> PlacementSpec.from_string('2 host1 host2')
PlacementSpec(count=2, hosts=[HostPlacementSpec(hostname='host1', network='', name=''), HostPlacementSpec(hostname='host2', network='', name='')])

You can specify labels using label:<label>:

>>> PlacementSpec.from_string('label:mon')
PlacementSpec(label='mon')

Labels also support a count:

>>> PlacementSpec.from_string('3 label:mon')
PlacementSpec(count=3, label='mon')

fnmatch is also supported:

>>> PlacementSpec.from_string('data[1-3]')
PlacementSpec(host_pattern='data[1-3]')

>>> PlacementSpec.from_string(None)
PlacementSpec()

Return type

PlacementSpec
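To make the grammar in these examples concrete, here is a deliberately simplified parser sketch. It is not the implementation of `PlacementSpec.from_string`: `SimplePlacement` and `parse_placement` are hypothetical names, hosts are kept as plain strings rather than `HostPlacementSpec` objects, and network or name qualifiers on hosts are not handled.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SimplePlacement(NamedTuple):
    """Hypothetical, reduced stand-in for PlacementSpec."""
    count: Optional[int] = None
    label: Optional[str] = None
    hosts: Tuple[str, ...] = ()
    host_pattern: Optional[str] = None


def parse_placement(arg: Optional[str]) -> SimplePlacement:
    """Parse the subset of the placement grammar shown in the examples."""
    if not arg:
        return SimplePlacement()
    tokens = arg.split()
    count = None
    if tokens[0].isdigit():
        # An optional leading integer is the daemon count.
        count = int(tokens.pop(0))
    label, pattern, hosts = None, None, []
    for tok in tokens:
        if tok.startswith("label:"):
            label = tok[len("label:"):]
        elif re.search(r"[*?\[\]]", tok):
            # Wildcard characters mark the token as an fnmatch pattern.
            pattern = tok
        else:
            hosts.append(tok)
    return SimplePlacement(count, label, tuple(hosts), pattern)
```

For instance, `parse_placement('2 host1 host2')` yields a count of 2 with both hostnames, while `parse_placement('label:mon')` sets only the label.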

host_pattern: Optional[str]

fnmatch patterns to select hosts. Can also be a single host.
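As an illustration of how such a pattern selects hosts, the following sketch uses Python's standard `fnmatch` module. The `match_hosts` helper and the `known` host list are hypothetical and exist only for this example; how an orchestrator backend actually resolves the pattern may differ.

```python
from fnmatch import fnmatch
from typing import List


def match_hosts(host_pattern: str, hostnames: List[str]) -> List[str]:
    """Return the hostnames selected by an fnmatch-style pattern."""
    return [h for h in hostnames if fnmatch(h, host_pattern)]


known = ["data1", "data2", "data3", "data4", "mon1"]
selected = match_hosts("data[1-3]", known)
```

A pattern without wildcard characters, such as `mon1`, matches only the host with exactly that name, which is why a single hostname is also a valid value.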

pretty_str()

>>> ps = PlacementSpec(...)  # For all placement specs:
>>> PlacementSpec.from_string(ps.pretty_str()) == ps

Return type

str

Services

class orchestrator.ServiceDescription(spec, container_image_id=None, container_image_name=None, rados_config_location=None, service_url=None, last_refresh=None, created=None, deleted=None, size=0, running=0, events=None, virtual_ip=None, ports=[])

For responding to queries about the status of a particular service, stateful or stateless.

This is not about health or performance monitoring of services: it’s about letting the orchestrator tell Ceph whether and where a service is scheduled in the cluster. When an orchestrator tells Ceph “it’s running on host123”, that’s not a promise that the process is literally up this second, it’s a description of where the orchestrator has decided the service should run.

class ceph.deployment.service_spec.ServiceSpec(service_type, service_id=None, placement=None, count=None, config=None, unmanaged=False, preview_only=False, networks=None)

Details of service creation.

Request to the orchestrator for a cluster of daemons such as MDS, RGW, iSCSI gateway, MONs, MGRs, or Prometheus.

This structure is supposed to be enough information to start the services.

Orchestrator.describe_service(service_type=None, service_name=None, refresh=False)

Describe a service (of any kind) that is already configured in the orchestrator. For example, when viewing an OSD in the dashboard we might like to also display information about the orchestrator’s view of the service (like the kubernetes pod ID).

When viewing a CephFS filesystem in the dashboard, we would use this to display the pods being currently run for MDS daemons.

Return type

OrchResult[List[ServiceDescription]]

Returns

list of ServiceDescription objects.

Orchestrator.service_action(action, service_name)

Perform an action (start/stop/restart/redeploy/reconfig) on a service (i.e., all daemons providing the logical service).

Parameters
  • action (str) – one of “start”, “stop”, “restart”, “redeploy”, “reconfig”

  • service_name (str) – service_type + ‘.’ + service_id (e.g. “mon”, “mgr”, “mds.mycephfs”, “rgw.realm.zone”, …)

Return type

OrchResult
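The naming rule for `service_name` given in the parameter list can be sketched as a small helper. `make_service_name` is a hypothetical function written for this example, not part of the orchestrator interface.

```python
from typing import Optional


def make_service_name(service_type: str,
                      service_id: Optional[str] = None) -> str:
    """Join type and id with a dot; services without an id use the bare type."""
    if service_id:
        return f"{service_type}.{service_id}"
    return service_type


names = [
    make_service_name("mon"),
    make_service_name("mds", "mycephfs"),
    make_service_name("rgw", "realm.zone"),
]
```

The resulting strings correspond to the examples listed for the `service_name` parameter.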

Orchestrator.remove_service(service_name)

Remove a service (a collection of daemons).

Return type

OrchResult[str]

Returns

None

Daemons

Orchestrator.list_daemons(service_name=None, daemon_type=None, daemon_id=None, host=None, refresh=False)

Describe a daemon (of any kind) that is already configured in the orchestrator.

Return type

OrchResult[List[DaemonDescription]]

Returns

list of DaemonDescription objects.

Orchestrator.remove_daemons(names)

Remove specific daemon(s).

Return type

OrchResult[List[str]]

Returns

None

Orchestrator.daemon_action(action, daemon_name, image=None)

Perform an action (start/stop/restart/redeploy/reconfig) on a daemon.

Parameters
  • action (str) – one of “start”, “stop”, “restart”, “redeploy”, “reconfig”

  • daemon_name (str) – name of daemon

  • image (Optional[str]) – Container image when redeploying that daemon

Return type

OrchResult

OSD management

Orchestrator.create_osds(drive_group)

Create one or more OSDs within a single Drive Group.

The principal argument here is the drive_group member of OsdSpec: other fields are advisory/extensible for any finer-grained OSD feature enablement (choice of backing store, compression/encryption, etc).

Return type

OrchResult[str]

Instructs the orchestrator to enable or disable either the ident or the fault LED.

Parameters
  • ident_fault (str) – either "ident" or "fault"

  • on (bool) – True = on.

  • locations (List[DeviceLightLoc]) – See orchestrator.DeviceLightLoc

Return type

OrchResult[List[str]]

class orchestrator.DeviceLightLoc(host, dev, path)

Describes a specific device on a specific host. Used for enabling or disabling LEDs on devices.

hostname as in orchestrator.Orchestrator.get_hosts()

device_id: e.g. ABC1234DEF567-1R1234_ABC8DE0Q.

See ceph osd metadata | jq '.[].device_ids'

OSD Replacement

See Replacing an OSD for the underlying process.

Replacing OSDs is fundamentally a two-stage process, as users need to physically replace drives. The orchestrator therefore exposes both stages separately.

Phase one is a call to Orchestrator.remove_daemons() with destroy=True in order to mark the OSD as destroyed.

Phase two is a call to Orchestrator.create_osds() with a Drive Group with DriveGroupSpec.osd_id_claims set to the destroyed OSD ids.

Services

Orchestrator.add_daemon(spec)

Create daemon(s) for unmanaged services

Return type

OrchResult[List[str]]

Orchestrator.apply_mon(spec)

Update mon cluster

Return type

OrchResult[str]

Orchestrator.apply_mgr(spec)

Update mgr cluster

Return type

OrchResult[str]

Orchestrator.apply_mds(spec)

Update MDS cluster

Return type

OrchResult[str]

Orchestrator.apply_rbd_mirror(spec)

Update rbd-mirror cluster

Return type

OrchResult[str]

class ceph.deployment.service_spec.RGWSpec(service_type='rgw', service_id=None, placement=None, rgw_realm=None, rgw_zone=None, rgw_frontend_port=None, rgw_frontend_ssl_certificate=None, rgw_frontend_type=None, unmanaged=False, ssl=False, preview_only=False, config=None, networks=None, subcluster=None)

Settings to configure a (multisite) Ceph RGW

Orchestrator.apply_rgw(spec)

Update RGW cluster

Return type

OrchResult[str]

class ceph.deployment.service_spec.NFSServiceSpec(service_type='nfs', service_id=None, placement=None, unmanaged=False, preview_only=False, config=None, networks=None, pool=None, namespace=None, port=None)

Orchestrator.apply_nfs(spec)

Update NFS cluster

Return type

OrchResult[str]

Upgrades

Orchestrator.upgrade_available()

Report on what versions are available to upgrade to

Return type

OrchResult

Returns

List of strings

Orchestrator.upgrade_start(image, version)

Return type

OrchResult[str]

Orchestrator.upgrade_status()

If an upgrade is currently underway, report on where we are in the process, or if some error has occurred.

Return type

OrchResult[UpgradeStatusSpec]

Returns

UpgradeStatusSpec instance

class orchestrator.UpgradeStatusSpec

Utility

Orchestrator.available()

Report whether we can talk to the orchestrator. This is the place to give the user a meaningful message if the orchestrator isn’t running or can’t be contacted.

This method may be called frequently (e.g. every page load to conditionally display a warning banner), so make sure it’s not too expensive. It’s okay to give a slightly stale status (e.g. based on a periodic background ping of the orchestrator) if that’s necessary to make this method fast.

Note

True doesn’t mean that the desired functionality is actually available in the orchestrator. For example, this won’t work as expected:

>>> if OrchestratorClientMixin().available()[0]:  # wrong.
...     OrchestratorClientMixin().get_hosts()

Returns

  • boolean representing whether the module is available/usable

  • string describing any error

  • dict containing any module specific information

Return type

Tuple[bool, str, Dict[str, Any]]
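One way to keep `available()` cheap is to cache the last known status and refresh it only when it has grown stale. The sketch below refreshes lazily on call rather than through a background ping, and `CachedAvailability`, `probe`, and `max_age` are hypothetical names introduced for this example; it is an assumption-laden illustration, not how any existing module does it.

```python
import time
from typing import Any, Callable, Dict, Tuple

Status = Tuple[bool, str, Dict[str, Any]]


class CachedAvailability:
    """Hypothetical helper that keeps available() cheap by caching a probe."""

    def __init__(self, probe: Callable[[], Status], max_age: float = 30.0):
        self._probe = probe          # potentially slow call to the backend
        self._max_age = max_age      # seconds a cached status stays valid
        self._cached: Status = (False, "not checked yet", {})
        self._checked_at = float("-inf")

    def available(self) -> Status:
        now = time.monotonic()
        if now - self._checked_at > self._max_age:
            try:
                self._cached = self._probe()
            except Exception as exc:
                # Turn connection problems into a user-facing message.
                self._cached = (False, f"Cannot reach orchestrator: {exc}", {})
            self._checked_at = now
        return self._cached


calls = []


def healthy_probe() -> Status:
    calls.append(1)
    return (True, "", {})


checker = CachedAvailability(healthy_probe, max_age=60.0)
first = checker.available()
second = checker.available()  # served from the cache, no second probe
```

Repeated calls within `max_age` seconds return the cached tuple, so a page that calls `available()` on every load does not contact the backend each time.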

Orchestrator.get_feature_set()

Describes which methods this orchestrator implements

Note

True doesn’t mean that the desired functionality is actually possible in the orchestrator. For example, this won’t work as expected:

>>> api = OrchestratorClientMixin()
>>> if api.get_feature_set()['get_hosts']['available']:  # wrong.
...     api.get_hosts()

It’s better to ask for forgiveness instead:

>>> try:
...     OrchestratorClientMixin().get_hosts()
... except (OrchestratorError, NotImplementedError):
...     ...

Return type

Dict[str, dict]

Returns

Dict of API method names to {'available': True or False}
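A feature set of this shape could be derived by checking which interface methods a subclass overrides. The following sketch shows that idea with stand-in classes; `BaseOrchestrator` and `ReadOnlyBackend` are hypothetical, only two methods are listed, and the real interface may compute the result differently.

```python
from typing import Dict


class BaseOrchestrator:
    """Stand-in interface with two methods left unimplemented."""

    def get_hosts(self):
        raise NotImplementedError()

    def add_host(self, host_spec):
        raise NotImplementedError()

    def get_feature_set(self) -> Dict[str, dict]:
        # A method counts as available when the subclass overrides it.
        names = ["get_hosts", "add_host"]
        return {
            name: {
                "available":
                    getattr(type(self), name) is not getattr(BaseOrchestrator, name)
            }
            for name in names
        }


class ReadOnlyBackend(BaseOrchestrator):
    """Hypothetical backend that can list hosts but not add them."""

    def get_hosts(self):
        return ["host1"]


features = ReadOnlyBackend().get_feature_set()
```

Because the check only detects overrides, an entry marked available can still fail at call time, which is the reason for the note above about catching exceptions instead.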

Client Modules

class orchestrator.OrchestratorClientMixin

A module that inherits from OrchestratorClientMixin can directly call all Orchestrator methods without manually calling remote.

Every interface method from Orchestrator is converted into a stub method that internally calls OrchestratorClientMixin._oremote()

>>> class MyModule(OrchestratorClientMixin):
...    def func(self):
...        completion = self.add_host('somehost')  # calls `_oremote()`
...        self.log.debug(completion.result)

Note

Orchestrator implementations should not inherit from OrchestratorClientMixin, because OrchestratorClientMixin redirects all methods to the “real” implementation of the orchestrator.

>>> import mgr_module
>>> class MyImplementation(mgr_module.MgrModule, Orchestrator):
...     def __init__(self, *args, **kwargs):
...         super().__init__(*args, **kwargs)
...         self.orch_client = OrchestratorClientMixin()
...         self.orch_client.set_mgr(self.mgr)

set_mgr(mgr)

Usable in the Dashboard, which uses a global mgr.

Return type

None