diff options
author | Dave Chinner <david@fromorbit.com> | 2023-04-14 07:11:43 +1000 |
---|---|---|
committer | Dave Chinner <dchinner@redhat.com> | 2023-04-14 07:11:43 +1000 |
commit | 798352cb25d2c27affbb5c733ed28430057228ca (patch) | |
tree | 7e9b83bc23e01b7d8dd4f6fb0ab32b6c29a09686 /fs/xfs/libxfs | |
parent | b89116c2fb3f6627e6b6ef1ccc96919603aa0315 (diff) | |
parent | 7ba83850ca2691865713b307ed001bde5fddb084 (diff) |
Merge tag 'fix-asciici-bugs-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next
xfs: fix ascii-ci problems, then kill it [v2]
Last week, I was fiddling around with the metadump name obfuscation code
while writing a debugger command to generate directories full of names
that all have the same hash name. I had a few questions about how well
all that worked with ascii-ci mode, and discovered a nasty discrepancy
between the kernel and glibc's implementations of the tolower()
function.
I discovered that I could create a directory that is large enough to
require separate leaf index blocks. The hashes stored in the dabtree
use the ascii-ci specific hash function, which uses a library function
to convert the name to lowercase before hashing. If the kernel and C
library's versions of tolower do not behave exactly identically,
xfs_ascii_ci_hashname will not produce the same results for the same
inputs. xfs_repair will deem the leaf information corrupt and rebuild
the directory. After that, lookups in the kernel will fail because the
hash index doesn't work.
The kernel's tolower function will convert extended ascii uppercase
letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
a-with-umlaut), whereas glibc's will only do that if you force LANG to
ascii. Tiny embedded libc implementations just plain won't do it at
all, and the result is a mess. Stabilize the behavior of the hash
function by encoding the name transformation function in libxfs, add it
to the selftest, and fix all the userspace tools, none of which handle
this transformation correctly.
The v1 series generated a /lot/ of discussion, in which several things
became very clear: (1) Linus is not enamored of case folding of any
kind; (2) Dave and Christoph don't seem to agree on whether the feature
is supposed to work for 7-bit ascii or latin1; (3) it trashes UTF8
encoded names if those happen to show up; and (4) I don't want to
maintain this mess any longer than I have to. Kill it in 2030.
v2: rename the functions to make it clear we're moving away from the
letters t, o, l, o, w, e, and r; and deprecate the whole feature once
we've fixed the bugs and added tests.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Diffstat (limited to 'fs/xfs/libxfs')
-rw-r--r-- | fs/xfs/libxfs/xfs_dir2.c | 5 | ||||
-rw-r--r-- | fs/xfs/libxfs/xfs_dir2.h | 31 |
2 files changed, 34 insertions, 2 deletions
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index 92bac3373f1f..f5462fd582d5 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -64,7 +64,7 @@ xfs_ascii_ci_hashname( int i; for (i = 0, hash = 0; i < name->len; i++) - hash = tolower(name->name[i]) ^ rol32(hash, 7); + hash = xfs_ascii_ci_xfrm(name->name[i]) ^ rol32(hash, 7); return hash; } @@ -85,7 +85,8 @@ xfs_ascii_ci_compname( for (i = 0; i < len; i++) { if (args->name[i] == name[i]) continue; - if (tolower(args->name[i]) != tolower(name[i])) + if (xfs_ascii_ci_xfrm(args->name[i]) != + xfs_ascii_ci_xfrm(name[i])) return XFS_CMP_DIFFERENT; result = XFS_CMP_CASE; } diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index dd39f17dd9a9..19af22a16c41 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -248,4 +248,35 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo, struct xfs_dir2_data_hdr *hdr); bool xfs_dir2_namecheck(const void *name, size_t length); +/* + * The "ascii-ci" feature was created to speed up case-insensitive lookups for + * a Samba product. Because of the inherent problems with CI and UTF-8 + * encoding, etc, it was decided that Samba would be configured to export + * latin1/iso 8859-1 encodings as that covered >90% of the target markets for + * the product. Hence the "ascii-ci" casefolding code could be encoded into + * the XFS directory operations and remove all the overhead of casefolding from + * Samba. + * + * To provide consistent hashing behavior between the userspace and kernel, + * these functions prepare names for hashing by transforming specific bytes + * to other bytes. Robustness with other encodings is not guaranteed. + */ +static inline bool xfs_ascii_ci_need_xfrm(unsigned char c) +{ + if (c >= 0x41 && c <= 0x5a) /* A-Z */ + return true; + if (c >= 0xc0 && c <= 0xd6) /* latin A-O with accents */ + return true; + if (c >= 0xd8 && c <= 0xde) /* latin O-Y with accents */ + return true; + return false; +} + +static inline unsigned char xfs_ascii_ci_xfrm(unsigned char c) +{ + if (xfs_ascii_ci_need_xfrm(c)) + c -= 'A' - 'a'; + return c; +} + #endif /* __XFS_DIR2_H__ */ |