The crash module collects information about daemon crashdumps and stores it in the Ceph cluster for later analysis.
The crash module is enabled with:
ceph mgr module enable crash
The crash upload key is generated with:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash'
On each node, you should store this key in
Daemon crashdumps are written to
/var/lib/ceph/crash by default; this can
be configured with the option ‘crash dir’. Crash directories are named by
time, date, and a randomly generated UUID, and contain a metadata file
‘meta’ and a recent log file, which share the same “crash_id”.
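The directory layout described above can be inspected with a few lines of Python. This is a minimal sketch, not part of Ceph: the function name load_crash_metas is hypothetical, and the only things it takes from the text are the default path, the per-crash directory, and the JSON file named meta. It makes no assumption about which fields the JSON contains.

```python
import json
from pathlib import Path


def load_crash_metas(crash_dir="/var/lib/ceph/crash"):
    """Return {directory name: parsed 'meta' JSON} for each crash directory.

    Hypothetical helper for illustration; it only reads files.
    """
    metas = {}
    root = Path(crash_dir)
    if not root.is_dir():
        return metas
    for entry in sorted(root.iterdir()):
        meta_path = entry / "meta"
        # Skip anything that is not a directory holding a 'meta' file.
        if not meta_path.is_file():
            continue
        try:
            metas[entry.name] = json.loads(meta_path.read_text())
        except (OSError, json.JSONDecodeError):
            # An unreadable or half-written meta file is skipped, not fatal.
            continue
    return metas
```

Calling load_crash_metas() with no argument reads the default location; pass another path if ‘crash dir’ has been changed.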
These crashes can be automatically submitted and persisted in the monitors’
storage by using
It watches the crashdump directory and uploads new crashdumps with
ceph crash post.
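The watch-and-upload behaviour can be approximated by a single scanning pass that is called repeatedly. The sketch below is an illustration under stated assumptions, not the actual uploader: upload_pass is a hypothetical name, the in-memory set used to remember what was already posted is an assumption made for brevity, and the only Ceph command used is the documented form ceph crash post -i <metafile>. The run parameter exists so the command runner can be swapped out.

```python
import subprocess
from pathlib import Path


def upload_pass(crash_dir, already_posted, run=subprocess.run):
    """Post each not-yet-seen crash 'meta' file once; return the names posted.

    `already_posted` is a set kept by the caller (an assumption of this
    sketch; the real uploader may track state differently).
    """
    posted = []
    root = Path(crash_dir)
    if not root.is_dir():
        return posted
    for entry in sorted(root.iterdir()):
        meta_path = entry / "meta"
        if entry.name in already_posted or not meta_path.is_file():
            continue
        # Documented command form: ceph crash post -i <metafile>
        result = run(["ceph", "crash", "post", "-i", str(meta_path)])
        if result.returncode == 0:
            already_posted.add(entry.name)
            posted.append(entry.name)
    return posted
```

A caller would typically invoke upload_pass in a loop with a sleep between passes; the polling period is a free choice in this sketch.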
ceph-crash tries some authentication names:
To upload successfully with
ceph crash post, these names need
suitable permissions:
mon profile crash and
mgr profile crash
and a keyring needs to be in
ceph crash post -i <metafile>
Save a crash dump. The metadata file is a JSON blob stored in the crash
directory as ‘meta’. As usual, the ceph command can be invoked with
and will read from stdin.
ceph crash rm <crashid>
Remove a specific crash dump.
ceph crash ls
List the timestamp/uuid crashids for all new and archived crash info.
ceph crash ls-new
List the timestamp/uuid crashids for all new crash info.
ceph crash stat
Show a summary of saved crash info grouped by age.
ceph crash info <crashid>
Show all details of a saved crash.
ceph crash prune <keep>
Remove saved crashes older than ‘keep’ days. <keep> must be an integer.
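To make the "older than ‘keep’ days" rule concrete, the following toy function selects which entries a prune would target. It is a sketch under assumptions: select_for_prune is a hypothetical name, timestamps are supplied by the caller as timezone-aware datetime objects, and the strict comparison at the cutoff is a choice of this sketch rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone


def select_for_prune(timestamps, keep, now=None):
    """Return the crash ids whose timestamp is more than `keep` days old.

    `timestamps` maps crash id -> timezone-aware datetime (toy input).
    """
    # Reject bools explicitly: isinstance(True, int) is True in Python.
    if isinstance(keep, bool) or not isinstance(keep, int):
        raise TypeError("keep must be an integer number of days")
    if keep < 0:
        raise ValueError("keep must not be negative")
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep)
    return sorted(cid for cid, ts in timestamps.items() if ts < cutoff)
```

Passing a non-integer such as "7" raises TypeError, mirroring the requirement that <keep> be an integer.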
ceph crash archive <crashid>
Archive a crash report so that it is no longer considered for the
RECENT_CRASH health check and does not appear in the
crash ls-new output (it will still appear in the
crash ls output).
ceph crash archive-all
Archive all new crash reports.
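The relationship between crash ls, crash ls-new, crash archive, and crash archive-all can be modelled in a few lines. This is a toy in-memory model for illustration only; the class CrashStore and its methods are hypothetical and do not reflect how the module stores reports.

```python
class CrashStore:
    """Toy model: every crash is either new or archived."""

    def __init__(self, crash_ids):
        # Maps crash id -> archived flag; all crashes start as new.
        self._archived = {cid: False for cid in crash_ids}

    def ls(self):
        """All crashes, new and archived."""
        return sorted(self._archived)

    def ls_new(self):
        """Only crashes that have not been archived."""
        return sorted(c for c, done in self._archived.items() if not done)

    def archive(self, crash_id):
        """Mark one crash as archived; it stays visible in ls()."""
        if crash_id not in self._archived:
            raise KeyError(crash_id)
        self._archived[crash_id] = True

    def archive_all(self):
        """Mark every crash as archived."""
        for cid in self._archived:
            self._archived[cid] = True
```

In this model, archiving never removes an entry: it only moves it out of the ls_new() view, which is the behaviour the commands above describe.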
mgr/crash/warn_recent_interval [default: 2 weeks] controls what constitutes “recent” for the purposes of raising the
mgr/crash/retain_interval [default: 1 year] controls how long crash reports are retained by the cluster before they are automatically purged.
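A small helper can build the command that changes either interval. This sketch rests on two assumptions that should be checked against the cluster's own documentation: that the options are set with ceph config set mgr <option> <value>, and that the value is a duration in seconds. The function name interval_command is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# The two option names come from the text above.
CRASH_INTERVAL_OPTIONS = ("warn_recent_interval", "retain_interval")


def interval_command(option, days):
    """Build an argv list that sets a crash interval given in days.

    Assumes the value is expressed in seconds and applied through
    `ceph config set mgr ...`; both are assumptions of this sketch.
    """
    if option not in CRASH_INTERVAL_OPTIONS:
        raise ValueError(f"unknown crash interval option: {option}")
    seconds = int(days * SECONDS_PER_DAY)
    return ["ceph", "config", "set", "mgr", f"mgr/crash/{option}", str(seconds)]
```

The returned list can be handed to subprocess.run; building it separately makes the value easy to review before anything is changed on the cluster.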