crimson¶
Crimson is the code name of crimson-osd, which is the next generation ceph-osd. It targets fast networking devices, fast storage devices by leveraging state of the art technologies like DPDK and SPDK, for better performance. And it will keep the support of HDDs and low-end SSDs via BlueStore. Crismon will try to be backward compatible with classic OSD.
Building Crimson¶
Crismon is not enabled by default. To enable it:
$ WITH_SEASTAR=true ./install-deps.sh
$ mkdir build && cd build
$ cmake -DWITH_SEASTAR=ON ..
Please note, ASan is enabled by default if crimson is built from a source cloned using git.
Also, Seastar uses its own lockless allocator which does not play well with
the alien threads. So, to use alienstore / bluestore backend, you might want to
pass -DSeastar_CXX_FLAGS=-DSEASTAR_DEFAULT_ALLOCATOR
to cmake
when
configuring this project to use the libc allocator, like:
$ cmake -DWITH_SEASTAR=ON -DSeastar_CXX_FLAGS=-DSEASTAR_DEFAULT_ALLOCATOR ..
Running Crimson¶
As you might expect, crimson is not featurewise on par with its predecessor yet.
object store backend¶
At the moment crimson-osd
offers two object store backends:
CyanStore: CyanStore is modeled after memstore in classic OSD.
AlienStore: AlienStore is short for Alienized BlueStore.
Seastore is still under active development.
daemonize¶
Unlike ceph-osd
, crimson-osd
does daemonize itself even if the
daemonize
option is enabled. Because, to read this option, crimson-osd
needs to ready its config sharded service, but this sharded service lives
in the seastar reactor. If we fork a child process and exit the parent after
starting the Seastar engine, that will leave us with a single thread which is
the replica of the thread calls fork(). This would unnecessarily complicate
the code, if we would have tackled this problem in crimson.
Since a lot of GNU/Linux distros are using systemd nowadays, which is able to
daemonize the application, there is no need to daemonize by ourselves. For
those who are using sysvinit, they can use start-stop-daemon
for daemonizing
crimson-osd
. If this is not acceptable, we can whip up a helper utility
to do the trick.
logging¶
Currently, crimson-osd
uses the logging utility offered by Seastar. see
src/common/dout.h
for the mapping between different logging levels to
the severity levels in Seastar. For instance, the messages sent to derr
will be printed using logger::error()
, and the messages with debug level
over 20
will be printed using logger::trace()
.
ceph |
seastar |
< 0 |
error |
0 |
warn |
[1, 5) |
info |
[5, 20] |
debug |
> 20 |
trace |
Please note, crimson-osd
does not send the logging message to specified log_file
. It writes
the logging messages to stdout and/or syslog. Again, this behavior can be
changed using --log-to-stdout
and --log-to-syslog
command line
options. By default, log-to-stdout
is enabled, and the latter disabled.
vstart.sh¶
To facilitate the development of crimson, following options would be handy when
using vstart.sh
,
--crimson
start
crimson-osd
instead ofceph-osd
--nodaemon
do not daemonize the service
--redirect-output
redirect the stdout and stderr of service to
out/$type.$num.stdout
.--osd-args
pass extra command line options to crimson-osd or ceph-osd. It’s quite useful for passing Seastar options to crimson-osd. For instance, you could use
--osd-args "--memory 2G"
to set the memory to use. Please refer the output of:crimson-osd --help-seastar
for more Seastar specific command line options.
--memstore
use the CyanStore as the object store backend.
--bluestore
use the AlienStore as the object store backend. This is the default setting, if not specified otherwise.
So, a typical command to start a single-crimson-node cluster is:
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
--without-dashboard --memstore \
--crimson --nodaemon --redirect-output \
--osd-args "--memory 4G"
Where we assign 4 GiB memory, a single thread running on core-0 to crimson-osd.
You could stop the vstart cluster using:
$ ../src/stop.sh --crimson
CBT Based Testing¶
We can use cbt for performing perf tests:
$ git checkout master
$ make crimson-osd
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml
$ git checkout yet-another-pr
$ make crimson-osd
$ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/yap ../src/test/crimson/cbt/radosbench_4K_read.yaml
$ ~/dev/cbt/compare.py -b /tmp/baseline -a /tmp/yap -v
19:48:23 - INFO - cbt - prefill/gen8/0: bandwidth: (or (greater) (near 0.05)):: 0.183165/0.186155 => accepted
19:48:23 - INFO - cbt - prefill/gen8/0: iops_avg: (or (greater) (near 0.05)):: 46.0/47.0 => accepted
19:48:23 - WARNING - cbt - prefill/gen8/0: iops_stddev: (or (less) (near 0.05)):: 10.4403/6.65833 => rejected
19:48:23 - INFO - cbt - prefill/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.340868/0.333712 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: bandwidth: (or (greater) (near 0.05)):: 0.190447/0.177619 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: iops_avg: (or (greater) (near 0.05)):: 48.0/45.0 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: iops_stddev: (or (less) (near 0.05)):: 6.1101/9.81495 => accepted
19:48:23 - INFO - cbt - prefill/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.325163/0.350251 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: bandwidth: (or (greater) (near 0.05)):: 1.24654/1.22336 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: iops_avg: (or (greater) (near 0.05)):: 319.0/313.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.0497733/0.0509029 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: bandwidth: (or (greater) (near 0.05)):: 1.22717/1.11372 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: iops_avg: (or (greater) (near 0.05)):: 314.0/285.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted
19:48:23 - WARNING - cbt - 1 tests failed out of 16
Where we compile and run the same test against two branches. One is master
, another is yet-another-pr
branch.
And then we compare the test results. Along with every test case, a set of rules is defined to check if we have
performance regressions when comparing two set of test results. If a possible regression is found, the rule and
corresponding test results are highlighted.
Hacking Crimson¶
Seastar Documents¶
See Seastar Tutorial . Or build a browsable version and start an HTTP server:
$ cd seastar
$ ./configure.py --mode debug
$ ninja -C build/debug docs
$ python3 -m http.server -d build/debug/doc/html
You might want to install pandoc
and other dependencies beforehand.
Debugging Crimson¶
Debugging with GDB¶
The tips for debugging Scylla also apply to Crimson.
Human-readable backtraces with addr2line¶
When a seastar application crashes, it leaves us with a serial of addresses, like:
Segmentation fault.
Backtrace:
0x00000000108254aa
0x00000000107f74b9
0x00000000105366cc
0x000000001053682c
0x00000000105d2c2e
0x0000000010629b96
0x0000000010629c31
0x00002a02ebd8272f
0x00000000105d93ee
0x00000000103eff59
0x000000000d9c1d0a
/lib/x86_64-linux-gnu/libc.so.6+0x000000000002409a
0x000000000d833ac9
Segmentation fault
seastar-addr2line
offered by Seastar can be used to decipher these
addresses. After running the script, it will be waiting for input from stdin,
so we need to copy and paste the above addresses, then send the EOF by inputting
control-D
in the terminal:
$ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd
0x00000000108254aa
0x00000000107f74b9
0x00000000105366cc
0x000000001053682c
0x00000000105d2c2e
0x0000000010629b96
0x0000000010629c31
0x00002a02ebd8272f
0x00000000105d93ee
0x00000000103eff59
0x000000000d9c1d0a
0x00000000108254aa
[Backtrace #0]
seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1136
seastar::print_with_backtrace(seastar::backtrace_buffer&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1157
seastar::print_with_backtrace(char const*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1164
seastar::sigsegv_action() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5119
seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5105
seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5101
?? ??:0
seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5418
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5)
main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1)
Please note, seastar-addr2line
is able to extract the addresses from
the input, so you can also paste the log messages like:
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e78dbc
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e7f0
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e8b8
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: 0x0000000000e3e985
2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr: /lib64/libpthread.so.0+0x0000000000012dbf
Unlike classic OSD, crimson does not print a human-readable backtrace when it handles fatal signals like SIGSEGV or SIGABRT. And it is more complicated when it comes to a stripped binary. So before planting a signal handler for those signals in crimson, we could to use script/ceph-debug-docker.sh to parse the addresses in the backtrace:
# assuming you are under the source tree of ceph
$ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8
....
[root@3deb50a8ad51 ~]# wget -q https://raw.githubusercontent.com/scylladb/seastar/master/scripts/seastar-addr2line
[root@3deb50a8ad51 ~]# dnf install -q -y file
[root@3deb50a8ad51 ~]# python3 seastar-addr2line -e /usr/bin/crimson-osd
# paste the backtrace here