CephFS Directory Entry Name Normalization and Case Folding

CephFS allows configuring directory trees to normalize and possibly case fold directory entry names. This is typically a useful property for file systems exported by gateways like Samba which enforce a case-insensitive view of the file system, typically with performance penalties on file systems which are not case-insensitive.

The following virtual extended attributes control the character mapping rules for directory entries:

  • ceph.dir.casesensitive: A boolean setting for the case sensitivity of the directory. If true, case fold the directory entry names.

  • ceph.dir.normalization: A string setting for the type of Unicode normalization to apply for directory entry names. Currently the normalization forms D (nfd), C (nfc), KD (nfkd), and KC (nfkc) are understood by the client.

  • ceph.dir.encoding: A string setting for the encoding to use and enforce for directory entry names. The default and presently only supported encoding is UTF-8 (utf8).

There is also a convenience virtual extended attribute that is useful for getting the JSON encoding of the case sensitivity, normalization, and encoding configurations:

  • ceph.dir.charmap: The complete character mapping configuration for a directory.

It can also be used to remove all settings and restore the default CephFS behavior for directory entry names: uninterpreted bytes without / that are NUL terminated.

Note the following restrictions on manipulating any of these extended attributes:

  • The directory must be empty.

  • The directory must not be part of a snapshot.

New subdirectories created under a directory with a charmap configuration will inherit (copy) the parent’s configuration.

Note

You can remove a charmap on a subdirectory which inherited the configuration so long as the preconditions apply: it is empty and not part of an existing snapshot.

Normalization

The ceph.dir.normalization attribute accepts the following normalization forms:

  • nfd: Form D (Canonical Decomposition)

  • nfc: Form C (Canonical Decomposition, followed by Canonical Composition)

  • nfkd: Form KD (Compatibility Decomposition)

  • nfkc: Form KC (Compatibility Decomposition, followed by Canonical Composition)

The default normalization for a character mapping configuration is nfd.

Note

For more information about Unicode normalization forms, please see Unicode normalization standard documents.

Whenever a directory entry name is generated during path traversal or lookup, the client will apply the normalization to the name before submitting any operation to the MDS. On the MDS side, the directory entry names which are stored are only these normalized names.

For example, to set the normalization on a directory:

$ setfattr -n ceph.dir.normalization -v "" foo/

$ getfattr -n ceph.dir.charmap foo/
# file: foo/
ceph.dir.charmap="{\"casesensitive\":true,\"normalization\":\"nfd\",\"encoding\":\"utf8\"}"

$ getfattr -n ceph.dir.normalization foo/
# file: foo/
ceph.dir.normalization="nfd"

Note

Setting the empty string will cause the MDS to pick the default normalization.

All character mapping configurations must have a normalization enabled. Removing the normalization will cause the default to be restored:

$ setfattr -n ceph.dir.normalization -v nfc foo/
$ getfattr -n ceph.dir.normalization foo/
# file: foo/
ceph.dir.normalization="nfc"

$ setfattr -x ceph.dir.normalization foo/
$ getfattr -n ceph.dir.normalization foo/
# file: foo/
ceph.dir.normalization="nfd"

To remove normlization on a directory, you must remove the ceph.dir.charmap configuration.

Note

The MDS maintains an alternate_name metadata (also used for encryption) for directory entries which allows the client to persist the original un-normalized name used by the application. The MDS does not interpret this metadata in any way; it’s only used by clients to reconstruct the original name of the directory entry.

Case Folding

The ceph.dir.casesensitive attribute accepts a boolean value. By default, names are case-sensitive (as normal in a POSIX file system). Setting this value to false will make the directory (and its children) case-insensitive.

Case folding requires that names are also normalized. By default, after setting a directory to be case-insensitive, the charmap will be:

$ setfattr -n ceph.dir.casesensitive -v 0 foo/
$ getfattr -n ceph.dir.casesensitive foo/
# file: foo/
ceph.dir.casesensitive="0"

$ getfattr -n ceph.dir.charmap foo/
# file: foo/
ceph.dir.charmap="{\"casesensitive\":false,\"normalization\":\"nfd\",\"encoding\":\"utf8\"}"

Note that setting the case sensitivity on a directory will cause the default normalization to be selected.

Note

Normalization is applied before case folding. The directory entry name used by the MDS is the case folded and normalized name.

Removing Character Mapping

If a directory is empty and not part of a snapshot, the charmap can be removed:

$ setfattr -x ceph.dir.charmap foo/

One can confirm that this restores the normal CephFS behavior:

$ getfattr -n ceph.dir.charmap foo/
foo/: ceph.dir.charmap: No such attribute

If the attribute does not exist, then there is no character mapping for the directory. Note that a (future) child or parent directory may have a charmap configuration but it will have no effect on this directory. A charmap configuration is only inherited at directory creation.

Note

The default charmap includes normalization that cannot be disabled. The only way to turn off this functionality is by removing this charmap virtual extended attribute.

Restricting Incompatible Client Access

The MDS protects access to directory trees with a charmap via a new client feature bit. The MDS will not allow a client that does not understand the charmap feature to modify a directory with a charmap configuration except to unlink files or remove subdirectories.

You can also require that all clients understand the charmap feature to use the file system at all:

ceph fs required_client_features <fs_name> add charmap

Note

The kernel driver does not understand the charmap feature and probably will not because existing kernel libraries have opinionated case folding and normalization forms. For this reason, adding charmap to the required client features is not recommended.

Permissions

As with other CephFS virtual extended atributes, a client may only set the charmap configuration on a directory with the p MDS auth cap. Viewing the configuration does not require this cap.

Brought to you by the Ceph Foundation

The Ceph Documentation is a community resource funded and hosted by the non-profit Ceph Foundation. If you would like to support this and our other efforts, please consider joining now.