Notice
This document is for a development version of Ceph.
Crash Module
The crash module collects information about daemon crashdumps and stores it in the Ceph cluster for later analysis.
Enabling
The crash module is enabled with:
ceph mgr module enable crash
The crash upload key is generated with:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash'
On each node, you should store this key in /etc/ceph/ceph.client.crash.keyring.
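The two commands above can be combined into a short setup sketch. The function name setup_crash_key is hypothetical, and the sketch assumes it is run with admin credentials; -o is the ceph CLI's output-file option. Copying the resulting keyring to the other nodes is left to whatever file distribution method you already use.

```shell
# Hypothetical helper: enable the crash module and write the upload key
# to a keyring file (default: the path named in this document).
setup_crash_key() {
    keyring="${1:-/etc/ceph/ceph.client.crash.keyring}"
    # Stop early if the module cannot be enabled.
    ceph mgr module enable crash || return 1
    # Create the key if it does not exist yet and save it to the keyring file.
    ceph auth get-or-create client.crash \
        mon 'profile crash' mgr 'profile crash' -o "$keyring"
}
```

Calling setup_crash_key with no argument writes to the default path; pass another path as the first argument to write elsewhere.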
Automated collection
Daemon crashdumps are written to /var/lib/ceph/crash by default; this can be configured with the option ‘crash dir’. Crash directories are named by time and date plus a randomly generated UUID, and contain a metadata file ‘meta’ and a recent log file. The “crash_id” recorded in the metadata matches the directory name.
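To make the layout concrete, the following sketch walks a crash directory and prints each subdirectory name next to the crash_id found in its meta file. The function name list_crash_ids is hypothetical, and the sketch assumes that meta stores the ID as a JSON string field named "crash_id"; it uses sed rather than a JSON parser, so treat it as a quick inspection aid only.

```shell
# Hypothetical helper: print "<directory name> <crash_id from meta>"
# for every crash directory under the given path.
# Assumes meta contains a JSON string field named "crash_id".
list_crash_ids() {
    crash_dir="${1:-/var/lib/ceph/crash}"
    for meta in "$crash_dir"/*/meta; do
        # Skip the unexpanded glob when no crash directories exist.
        [ -f "$meta" ] || continue
        dir_name=$(basename "$(dirname "$meta")")
        # Extract the first "crash_id" value found in the file.
        crash_id=$(sed -n 's/.*"crash_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$meta" | head -n 1)
        printf '%s %s\n' "$dir_name" "$crash_id"
    done
}
```

If the two columns differ for some entry, that directory's metadata is worth inspecting by hand.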
These crashes can be automatically submitted and persisted in the monitors’ storage by using ceph-crash.service. It watches the crashdump directory and uploads new crash dumps with ceph crash post.
ceph-crash tries some authentication names: client.crash.$hostname, client.crash and client.admin.
To upload successfully with ceph crash post, the name in use needs suitable permissions (mon profile crash and mgr profile crash), and its keyring needs to be in /etc/ceph.
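The fallback over authentication names can be sketched as a loop. This is an illustration of the idea, not the ceph-crash implementation: the function name post_crash is hypothetical, and using uname -n for $hostname is an assumption. The -n option is the ceph CLI's way to select the client name.

```shell
# Sketch only: try each authentication name in turn until one upload works.
# Assumes "uname -n" yields the hostname used in client.crash.$hostname.
post_crash() {
    meta="$1"
    for name in "client.crash.$(uname -n)" client.crash client.admin; do
        if ceph -n "$name" crash post -i "$meta"; then
            echo "posted $meta as $name"
            return 0
        fi
    done
    echo "could not post $meta with any authentication name" >&2
    return 1
}
```

A failure for every name usually points at a missing keyring in /etc/ceph or at missing crash profile permissions.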
Commands
ceph crash post -i <metafile>
Save a crash dump. The metadata file is a JSON blob stored in the crash dir as meta. As usual, the ceph command can be invoked with -i -, and will read from stdin.
ceph crash rm <crashid>
Remove a specific crash dump.
ceph crash ls
List the timestamp/uuid crashids for all new and archived crash info.
ceph crash ls-new
List the timestamp/uuid crashids for all new crash info.
ceph crash stat
Show a summary of saved crash info grouped by age.
ceph crash info <crashid>
Show all details of a saved crash.
ceph crash prune <keep>
Remove saved crashes older than ‘keep’ days. <keep> must be an integer.
ceph crash archive <crashid>
Archive a crash report so that it is no longer considered for the RECENT_CRASH health check and does not appear in the crash ls-new output (it will still appear in the crash ls output).
ceph crash archive-all
Archive all new crash reports.
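The commands above can be wrapped into a few small helpers for routine triage. All function names here are hypothetical, and the sketch uses only the commands listed in this section.

```shell
# Hypothetical helpers wrapping the crash commands listed above.

# Post a metadata file by feeding it to stdin ("-i -").
post_meta_stdin() {
    ceph crash post -i - < "$1"
}

# Show a crash's details, then archive it so that it no longer counts
# toward RECENT_CRASH and no longer appears in "ceph crash ls-new".
review_and_archive() {
    crash_id="$1"
    ceph crash info "$crash_id" || return 1
    ceph crash archive "$crash_id"
}

# Prune crashes older than <keep> days, rejecting non-integer input
# before it reaches the cluster.
prune_crashes() {
    keep="$1"
    case "$keep" in
        ''|*[!0-9]*)
            echo "keep must be a whole number of days, got '$keep'" >&2
            return 2
            ;;
    esac
    ceph crash prune "$keep"
}
```

A typical session would run ceph crash ls-new, call review_and_archive on each ID of interest, and finish with something like prune_crashes 90.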
Options
mgr/crash/warn_recent_interval
[default: 2 weeks] controls what constitutes “recent” for the purposes of raising the RECENT_CRASH health warning.
mgr/crash/retain_interval
[default: 1 year] controls how long crash reports are retained by the cluster before they are automatically purged.
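These options can be changed with ceph config set. The sketch below is a hypothetical helper, set_crash_interval_days, that takes a number of days; it assumes the option values are stored in seconds, which this document does not state, so check the current value with ceph config get before relying on it.

```shell
# Hypothetical helper: set a crash module interval from a number of days.
# Assumes the option value is expressed in seconds.
set_crash_interval_days() {
    option="$1"   # warn_recent_interval or retain_interval
    days="$2"
    case "$days" in
        # Reject empty, non-numeric, and zero-prefixed values
        # (a leading zero would be read as octal in shell arithmetic).
        ''|*[!0-9]*|0?*)
            echo "days must be a whole number without leading zeros" >&2
            return 2
            ;;
    esac
    ceph config set mgr "mgr/crash/$option" "$((days * 86400))"
}
```

For example, set_crash_interval_days warn_recent_interval 7 would shorten the “recent” window to one week under the stated assumption.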