aboutsummaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2022-10-05Merge tag 'sound-6.1-rc1' of ↵Linus Torvalds1-0/+44
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "The majority of changes are ASoC drivers (SOF, Intel, AMD, Mediatek, Qualcomm, TI, Apple Silicon, etc), while we see a few small fixes in ALSA / ASoC core side, too. Here are highlights: Core: - A new string helper parse_int_array_user() and cleanups with it - Continued cleanup of memory allocation helpers - PCM core optimization and hardening - Continued ASoC core code cleanups ASoC: - Improvements to the SOF IPC4 code, especially around trace - Support for AMD Rembrant DSPs, AMD Pink Sardine ACP 6.2, Apple Silicon systems, Everest ES8326, Intel Sky Lake and Kaby Lake, Mediatek MT8186 support, NXP i.MX8ULP DSPs, Qualcomm SC8280XP, SM8250 and SM8450 and Texas Instruments SRC4392 HD- and USB-audio: - Cleanups for unification of hda-ext bus - HD-audio HDMI codec driver cleanups - Continued endpoint management fixes for USB-audio - New quirks as usual" * tag 'sound-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (422 commits) ALSA: hda: Fix position reporting on Poulsbo ALSA: hda/hdmi: Don't skip notification handling during PM operation ASoC: rockchip: i2s: use regmap_read_poll_timeout_atomic to poll I2S_CLR ASoC: dt-bindings: Document audio OF graph dai-tdm-slot-num dai-tdm-slot-width props ASoC: qcom: fix unmet direct dependencies for SND_SOC_QDSP6 ALSA: usb-audio: Fix potential memory leaks ALSA: usb-audio: Fix NULL dererence at error path ASoC: mediatek: mt8192-mt6359: Set the driver name for the card ALSA: hda/realtek: More robust component matching for CS35L41 ASoC: Intel: sof_rt5682: remove SOF_RT1015_SPEAKER_AMP_100FS flag ASoC: nau8825: Add TDM support ASoC: core: clarify the driver name initialization ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe ASoC: wcd-mbhc-v2: Revert "ASoC: wcd-mbhc-v2: use pm_runtime_resume_and_get()" ASoC: mediatek: mt8186: Fix spelling mistake "slect" -> "select" ALSA: hda/realtek: Add quirk for HP Zbook Firefly 14 G9 model ALSA: asihpi - Remove unused struct hpi_subsys_response ...
2022-10-04Merge tag 'net-next-6.1' of ↵Linus Torvalds2-4/+57
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ...
2022-10-04Merge branch 'for-6.1-hash-pointer-init' into for-linusPetr Mladek1-21/+26
2022-10-03once: rename _SLOW to _SLEEPABLEJason A. Donenfeld1-5/+5
The _SLOW designation wasn't really descriptive of anything. This is meant to be called from process context when it's possible to sleep. So name this more aptly _SLEEPABLE, which better fits its intended use. Fixes: 62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts") Cc: Christophe Leroy <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-03Merge tag 'hardening-v6.1-rc1' of ↵Linus Torvalds6-59/+329
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: "Most of the collected changes here are fixes across the tree for various hardening features (details noted below). The most notable new feature here is the addition of the memcpy() overflow warning (under CONFIG_FORTIFY_SOURCE), which is the next step on the path to killing the common class of "trivially detectable" buffer overflow conditions (i.e. on arrays with sizes known at compile time) that have resulted in many exploitable vulnerabilities over the years (e.g. BleedingTooth). This feature is expected to still have some undiscovered false positives. It's been in -next for a full development cycle and all the reported false positives have been fixed in their respective trees. All the known-bad code patterns we could find with Coccinelle are also either fixed in their respective trees or in flight. The commit message in commit 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()") for the feature has extensive details, but I'll repeat here that this is a warning _only_, and is not intended to actually block overflows (yet). The many patches fixing array sizes and struct members have been landing for several years now, and we're finally able to turn this on to find any remaining stragglers. Summary: Various fixes across several hardening areas: - loadpin: Fix verity target enforcement (Matthias Kaehlcke). - zero-call-used-regs: Add missing clobbers in paravirt (Bill Wendling). - CFI: clean up sparc function pointer type mismatches (Bart Van Assche). - Clang: Adjust compiler flag detection for various Clang changes (Sami Tolvanen, Kees Cook). - fortify: Fix warnings in arch-specific code in sh, ARM, and xen. Improvements to existing features: - testing: improve overflow KUnit test, introduce fortify KUnit test, add more coverage to LKDTM tests (Bart Van Assche, Kees Cook). - overflow: Relax overflow type checking for wider utility. New features: - string: Introduce strtomem() and strtomem_pad() to fill a gap in strncpy() replacement needs. - um: Enable FORTIFY_SOURCE support. - fortify: Enable run-time struct member memcpy() overflow warning" * tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (27 commits) Makefile.extrawarn: Move -Wcast-function-type-strict to W=1 hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero sparc: Unbreak the build x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled x86/paravirt: clean up typos and grammaros fortify: Convert to struct vs member helpers fortify: Explicitly check bounds are compile-time constants x86/entry: Work around Clang __bdos() bug ARM: decompressor: Include .data.rel.ro.local fortify: Adjust KUnit test for modular build sh: machvec: Use char[] for section boundaries kunit/memcpy: Avoid pathological compile-time string size lib: Improve the is_signed_type() kunit test LoadPin: Require file with verity root digests to have a header dm: verity-loadpin: Only trust verity targets with enforcement LoadPin: Fix Kconfig doc about format of file with verity digests um: Enable FORTIFY_SOURCE lkdtm: Update tests for memcpy() run-time warnings fortify: Add run-time WARN for cross-field memcpy() fortify: Use SIZE_MAX instead of (size_t)-1 ...
2022-10-03kmsan: disable strscpy() optimization under KMSANAlexander Potapenko1-0/+8
Disable the efficient 8-byte reading under KMSAN to avoid false positives. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kmsan: add tests for KMSANAlexander Potapenko1-0/+12
The testing module triggers KMSAN warnings in different cases and checks that the errors are properly reported, using console probes to capture the tool's output. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kmsan: add iomap supportAlexander Potapenko1-0/+44
Functions from lib/iomap.c interact with hardware, so KMSAN must ensure that: - every read function returns an initialized value - every write function checks values before sending them to hardware. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kmsan: disable instrumentation of unsupported common kernel codeAlexander Potapenko1-0/+3
EFI stub cannot be linked with KMSAN runtime, so we disable instrumentation for it. Instrumenting kcov, stackdepot or lockdep leads to infinite recursion caused by instrumentation hooks calling instrumented code again. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kmsan: add KMSAN runtime coreAlexander Potapenko2-0/+51
For each memory location KernelMemorySanitizer maintains two types of metadata: 1. The so-called shadow of that location - а byte:byte mapping describing whether or not individual bits of memory are initialized (shadow is 0) or not (shadow is 1). 2. The origins of that location - а 4-byte:4-byte mapping containing 4-byte IDs of the stack traces where uninitialized values were created. Each struct page now contains pointers to two struct pages holding KMSAN metadata (shadow and origins) for the original struct page. Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata creation, addressing, copying and checking. mm/kmsan/report.c performs error reporting in the cases an uninitialized value is used in a way that leads to undefined behavior. KMSAN compiler instrumentation is responsible for tracking the metadata along with the kernel memory. mm/kmsan/instrumentation.c provides the implementation for instrumentation hooks that are called from files compiled with -fsanitize=kernel-memory. To aid parameter passing (also done at instrumentation level), each task_struct now contains a struct kmsan_task_state used to track the metadata of function parameters and return values for that task. Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN. The KMSAN_SANITIZE:=n Makefile directive can be used to completely disable KMSAN instrumentation for certain files. Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly created stack memory initialized. Users can also use functions from include/linux/kmsan-checks.h to mark certain memory regions as uninitialized or initialized (this is called "poisoning" and "unpoisoning") or check that a particular region is initialized. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03instrumented.h: allow instrumenting both sides of copy_from_user()Alexander Potapenko2-4/+8
Introduce instrument_copy_from_user_before() and instrument_copy_from_user_after() hooks to be invoked before and after the call to copy_from_user(). KASAN and KCSAN will be only using instrument_copy_from_user_before(), but for KMSAN we'll need to insert code after copy_from_user(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03stackdepot: reserve 5 extra bits in depot_stack_handle_tAlexander Potapenko1-5/+24
Some users (currently only KMSAN) may want to use spare bits in depot_stack_handle_t. Let them do so by adding @extra_bits to __stack_depot_save() to store arbitrary flags, and providing stack_depot_get_extra_bits() to retrieve those flags. Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN does not intend to store additional information in the stack handle. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ilya Leoshkevich <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03mm/hmm/test: use char dev with struct device to get device nodeMika Penttilä1-3/+10
HMM selftests use an in-kernel pseudo device to emulate device memory. The pseudo device registers a major device range for two or four pseudo device instances. User space has a script that reads /proc/devices in order to find the assigned major number, and sends that to mknod(1), once for each node. Change this to properly use cdev and struct device APIs. Delete the /proc/devices parsing from the user-space test script, now that it is unnecessary. Also, delete an unused field in struct dmirror_device: devmem. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mika Penttilä <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kasan: move tests to mm/kasan/Andrey Konovalov3-1596/+0
Move KASAN tests to mm/kasan/ to keep the test code alongside the implementation. Link: https://lkml.kernel.org/r/676398f0aeecd47d2f8e3369ea0e95563f641a36.1662416260.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kasan: add another use-after-free testAndrey Konovalov1-0/+24
Add a new use-after-free test that checks that KASAN detects use-after-free when another object was allocated in the same slot. This test is mainly relevant for the tag-based modes, which do not use quarantine. Once [1] is resolved, this test can be extended to check that the stack traces in the report point to the proper kmalloc/kfree calls. [1] https://bugzilla.kernel.org/show_bug.cgi?id=212203 Link: https://lkml.kernel.org/r/0659cfa15809dd38faa02bc0a59d0b5dbbd81211.1662411800.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Acked-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Peter Collingbourne <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03kasan: drop CONFIG_KASAN_TAGS_IDENTIFYAndrey Konovalov1-8/+0
Drop CONFIG_KASAN_TAGS_IDENTIFY and related code to simplify making changes to the reporting code. The dropped functionality will be restored in the following patches in this series. Link: https://lkml.kernel.org/r/4c66ba98eb237e9ed9312c19d423bbcf4ecf88f8.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Peter Collingbourne <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-10-03once: add DO_ONCE_SLOW() for sleepable contextsEric Dumazet1-0/+30
Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy <[email protected]> Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <[email protected]> Cc: Willy Tarreau <[email protected]> Tested-by: Christophe Leroy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03zstd: Fixing mixed module-builtin objectsAlexey Kardashevskiy4-12/+27
With CONFIG_ZSTD_COMPRESS=m and CONFIG_ZSTD_DECOMPRESS=y we end up in a situation when files from lib/zstd/common/ are compiled once to be linked later for ZSTD_DECOMPRESS (build-in) and ZSTD_COMPRESS (module) even though CFLAGS are different for builtins and modules. So far somehow this was not a problem but enabling LLVM LTO exposes the problem as: ld.lld: error: linking module flags 'Code Model': IDs have conflicting values in 'lib/built-in.a(zstd_common.o at 5868)' and 'ld-temp.o' This particular conflict is caused by KBUILD_CFLAGS=-mcmodel=medium vs. KBUILD_CFLAGS_MODULE=-mcmodel=large , modules use the large model on POWERPC as explained at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Makefile?h=v5.18-rc4#n127 but the current use of common files is wrong anyway. This works around the issue by introducing a zstd_common module with shared code. Signed-off-by: Alexey Kardashevskiy <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2022-10-01lib/bitmap: add tests for for_each() loopsYury Norov1-1/+243
We have a test for test_for_each_set_clump8 only. Add basic tests for the others. Signed-off-by: Yury Norov <[email protected]>
2022-10-01lib/find_bit: add find_next{,_and}_bit_wrapYury Norov1-9/+3
The helper is better optimized for the worst case: in case of empty cpumask, current code traverses 2 * size: next = cpumask_next_and(prev, src1p, src2p); if (next >= nr_cpu_ids) next = cpumask_first_and(src1p, src2p); At bitmap level we can stop earlier after checking 'size + offset' bits. Signed-off-by: Yury Norov <[email protected]>
2022-09-30lib: stackinit: update reference to kunit-toolTales Aparecida1-1/+1
Replace URL with an updated path to the full Documentation page Signed-off-by: Tales Aparecida <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2022-09-30lib: overflow: update reference to kunit-toolTales Aparecida1-1/+1
Replace URL with an updated path to the full Documentation page Signed-off-by: Tales Aparecida <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2022-09-30kunit: add kunit.enable to enable/disable KUnit testJoe Fradley3-0/+39
This patch adds the kunit.enable module parameter that will need to be set to true in addition to KUNIT being enabled for KUnit tests to run. The default value is true giving backwards compatibility. However, for the production+testing use case the new config option KUNIT_DEFAULT_ENABLED can be set to N requiring the tester to opt-in by passing kunit.enable=1 to the kernel. Signed-off-by: Joe Fradley <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2022-09-29sbitmap: fix lockup while swappingHugh Dickins1-1/+1
Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting") is a big improvement: without it, I had to revert to before commit 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup") to avoid the high system time and freezes which that had introduced. Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy swapping (kernel builds in low memory) on another: soon locking up in sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling around with waitqueue_active() but wait_cnt 0 . Here is a backtrace, showing the common pattern of outer sbitmap_queue_wake_up() interrupted before setting wait_cnt 0 back to wake_batch (in some cases other CPUs are idle, in other cases they're spinning for a lock in dd_bio_merge()): sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag < __blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request < scsi_end_request < scsi_io_completion < scsi_finish_command < scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq < __irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt < _raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up < sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag < __blk_mq_free_request < blk_mq_free_request < dd_bio_merge < blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio < __submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct < submit_bio < __swap_writepage < swap_writepage < pageout < shrink_folio_list < evict_folios < lru_gen_shrink_lruvec < shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages < __alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio < do_anonymous_page < __handle_mm_fault < handle_mm_fault < do_user_addr_fault < exc_page_fault < asm_exc_page_fault See how the process-context sbitmap_queue_wake_up() has been interrupted, after bringing wait_cnt down to 0 (and in this example, after doing its wakeups), before advancing wake_index and refilling wake_cnt: an interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck. I have almost no grasp of all the possible sbitmap races, and their consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0, so it is better if sbq_wake_ptr() skips on to the next ws in that case: which fixes the lockup and shows no adverse consequence for me. The check for wait_cnt being 0 is obviously racy, and ultimately can lead to lost wakeups: for example, when there is only a single waitqueue with waiters. However, lost wakeups are unlikely to matter in these cases, and a proper fix requires redesign (and benchmarking) of the batched wakeup code: so let's plug the hole with this bandaid for now. Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Keith Busch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+3
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-29lib/vsprintf: Initialize vsprintf's pointer hash once the random core is ready.Sebastian Andrzej Siewior1-19/+27
The printk code invokes vnsprintf in order to compute the complete string before adding it into its buffer. This happens in an IRQ-off region which leads to a warning on PREEMPT_RT in the random code if the format strings contains a %p for pointer printing. This happens because the random core acquires locks which become sleeping locks on PREEMPT_RT which must not be acquired with disabled interrupts and or preemption disabled. By default the pointers are hashed which requires a random value on the first invocation (either by printk or another user which comes first. One could argue that there is no need for printk to disable interrupts during the vsprintf() invocation which would fix the just mentioned problem. However printk itself can be invoked in a context with disabled interrupts which would lead to the very same problem. Move the initialization of ptr_key into a worker and schedule it from subsys_initcall(). This happens early but after the workqueue subsystem is ready. Use get_random_bytes() to retrieve the random value if the RNG core is ready, otherwise schedule a worker in two seconds and try again. Another advantage is that it removes a lock from the vsprintf() code path. It prevents a possible deadlock when printk("%p", ptr) is called under the lock taken in get_random_bytes(). Reported-by: Mike Galbraith <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Jason A. Donenfeld <[email protected]> Reviewed-by: Petr Mladek <[email protected]> [[email protected]: Added a note about the it prevented a possible deadlock in printk().] Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-29lib/vsprintf: Remove static_branch_likely() from __ptr_to_hashval().Sebastian Andrzej Siewior1-11/+8
Using static_branch_likely() to signal that ptr_key has been filled is a bit much given that it is not a fast path. Replace static_branch_likely() with bool for condition and a memory barrier for ptr_key. Suggested-by: Petr Mladek <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra5-29/+49
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <[email protected]>
2022-09-28Kbuild: add Rust supportMiguel Ojeda1-0/+34
Having most of the new files in place, we now enable Rust support in the build system, including `Kconfig` entries related to Rust, the Rust configuration printer and a few other bits. Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Tested-by: Nick Desaulniers <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Co-developed-by: Alex Gaynor <[email protected]> Signed-off-by: Alex Gaynor <[email protected]> Co-developed-by: Finn Behrens <[email protected]> Signed-off-by: Finn Behrens <[email protected]> Co-developed-by: Adam Bratschi-Kaye <[email protected]> Signed-off-by: Adam Bratschi-Kaye <[email protected]> Co-developed-by: Wedson Almeida Filho <[email protected]> Signed-off-by: Wedson Almeida Filho <[email protected]> Co-developed-by: Michael Ellerman <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Co-developed-by: Sven Van Asbroeck <[email protected]> Signed-off-by: Sven Van Asbroeck <[email protected]> Co-developed-by: Gary Guo <[email protected]> Signed-off-by: Gary Guo <[email protected]> Co-developed-by: Boris-Chengbiao Zhou <[email protected]> Signed-off-by: Boris-Chengbiao Zhou <[email protected]> Co-developed-by: Boqun Feng <[email protected]> Signed-off-by: Boqun Feng <[email protected]> Co-developed-by: Douglas Su <[email protected]> Signed-off-by: Douglas Su <[email protected]> Co-developed-by: Dariusz Sosnowski <[email protected]> Signed-off-by: Dariusz Sosnowski <[email protected]> Co-developed-by: Antonio Terceiro <[email protected]> Signed-off-by: Antonio Terceiro <[email protected]> Co-developed-by: Daniel Xu <[email protected]> Signed-off-by: Daniel Xu <[email protected]> Co-developed-by: Björn Roy Baron <[email protected]> Signed-off-by: Björn Roy Baron <[email protected]> Co-developed-by: Martin Rodriguez Reboredo <[email protected]> Signed-off-by: Martin Rodriguez Reboredo <[email protected]> Signed-off-by: Miguel Ojeda <[email protected]>
2022-09-28vsprintf: add new `%pA` format specifierGary Guo1-0/+13
This patch adds a format specifier `%pA` to `vsprintf` which formats a pointer as `core::fmt::Arguments`. Doing so allows us to directly format to the internal buffer of `printf`, so we do not have to use a temporary buffer on the stack to pre-assemble the message on the Rust side. This specifier is intended only to be used from Rust and not for C, so `checkpatch.pl` is intentionally unchanged to catch any misuse. Reviewed-by: Kees Cook <[email protected]> Acked-by: Petr Mladek <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Co-developed-by: Alex Gaynor <[email protected]> Signed-off-by: Alex Gaynor <[email protected]> Co-developed-by: Wedson Almeida Filho <[email protected]> Signed-off-by: Wedson Almeida Filho <[email protected]> Signed-off-by: Gary Guo <[email protected]> Co-developed-by: Miguel Ojeda <[email protected]> Signed-off-by: Miguel Ojeda <[email protected]>
2022-09-26mm: reduce noise in show_mem for lowmem allocationsMichal Hocko1-2/+2
While discussing early DMA pool pre-allocation failure with Christoph [1] I have realized that the allocation failure warning is rather noisy for constrained allocations like GFP_DMA{32}. Those zones are usually not populated on all nodes very often as their memory ranges are constrained. This is an attempt to reduce the ballast that doesn't provide any relevant information for those allocation failures investigation. Please note that I have only compile tested it (in my default config setup) and I am throwing it mostly to see what people think about it. [1] http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix build] [[email protected]: fix it for mapletree] [[email protected]: update it for Michal's update] [[email protected]: fix arch/powerpc/xmon/xmon.c] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix arch/sparc/kernel/setup_32.c] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26mm: remove vmacacheLiam R. Howlett1-8/+0
By using the maple tree and the maple tree state, the vmacache is no longer beneficial and is complicating the VMA code. Remove the vmacache to reduce the work in keeping it up to date and code complexity. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26lib/test_maple_tree: add testing for maple treeLiam R. Howlett1-0/+38307
This is a test suite that uses the radix test infrastructure. It has been split into its own commit to allow for easier review of the maple tree code. The testing includes: - Allocation of nodes - gfp flag allocation checks - Expansion & contraction of tree - preallocation checks - tree navigation by next/prev - tree navigation by iterators (mas_for_each, etc) - Number of nodes for a given number of entries - Generic tree construction tests - Addition and removal of entries in forward and reverse numerical indexes - gap searching both forward and reverse - Combining gaps by overwriting entries in different ways - splitting right-most node - splitting left-most node - overwriting multiple slots - overwriting across different levels of the tree - overwriting the middle of a tree - causing a 3-way split up to the root by overwriting the last slot and first slot of different nodes and spanning different levels - RCU stress testing of the tree with threads - Duplication of the tree by entry count - Tests which were generated by fuzzers have been added. - A large number of tests which come from recording crashing in a VM and reconstructing the tree (see check_erase2_set()) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26Maple Tree: add new data structureLiam R. Howlett3-1/+7146
Patch series "Introducing the Maple Tree" The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. Davidlor said : Yes I like the maple tree, and at this stage I don't think we can ask for : more from this series wrt the MM - albeit there seems to still be some : folks reporting breakage. Fundamentally I see Liam's work to (re)move : complexity out of the MM (not to say that the actual maple tree is not : complex) by consolidating the three complimentary data structures very : much worth it considering performance does not take a hit. This was very : much a turn off with the range locking approach, which worst case scenario : incurred in prohibitive overhead. Also as Liam and Matthew have : mentioned, RCU opens up a lot of nice performance opportunities, and in : addition academia[1] has shown outstanding scalability of address spaces : with the foundation of replacing the locked rbtree with RCU aware trees. A similar work has been discovered in the academic press https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Sheer coincidence. We designed our tree with the intention of solving the hardest problem first. Upon settling on a b-tree variant and a rough outline, we researched ranged based b-trees and RCU b-trees and did find that article. So it was nice to find reassurances that we were on the right path, but our design choice of using ranges made that paper unusable for us. This patch (of 70): The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. There is additional BUG_ON() calls added within the tree, most of which are in debug code. These will be replaced with a WARN_ON() call in the future. There is also additional BUG_ON() calls within the code which will also be reduced in number at a later date. These exist to catch things such as out-of-range accesses which would crash anyways. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: David Howells <[email protected]> Tested-by: Sven Schnelle <[email protected]> Tested-by: Yu Zhao <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-09-26cpumask: add cpumask_nth_{,and,andnot}Yury Norov1-15/+13
Add cpumask_nth_{,and,andnot} as wrappers around corresponding find functions, and use it in cpumask_local_spread(). Signed-off-by: Yury Norov <[email protected]>
2022-09-26lib/bitmap: remove bitmap_ord_to_posYury Norov1-33/+3
Now that we have find_nth_bit(), we can drop bitmap_ord_to_pos(). Signed-off-by: Yury Norov <[email protected]>
2022-09-26lib/bitmap: add tests for find_nth_bit()Yury Norov2-2/+63
Add functional and performance tests for find_nth_bit(). Signed-off-by: Yury Norov <[email protected]>
2022-09-26lib: add find_nth{,_and,_andnot}_bit()Yury Norov1-0/+44
Kernel lacks for a function that searches for Nth bit in a bitmap. Usually people do it like this: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; We can do it more efficiently, if we: 1. find a word containing Nth bit, using hweight(); and 2. find the bit, using a helper fns(), that works similarly to __ffs() and ffz(). fns() is implemented as a simple loop. For x86_64, there's PDEP instruction to do that: ret = clz(pdep(1 << idx, num)). However, for large bitmaps the most of improvement comes from using hweight(), so I kept fns() simple. New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark: find_nth_bit: 7154190 ns, 16411 iterations for_each_bit: 505493126 ns, 16315 iterations With all that, a family of 3 new functions is added, and used where appropriate in the following patches. Signed-off-by: Yury Norov <[email protected]>
2022-09-26lib/bitmap: add bitmap_weight_and()Yury Norov1-9/+21
The function calculates Hamming weight of (bitmap1 & bitmap2). Now we have to do like this: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); This requires additional memory, adds pressure on alloc subsystem, and way less cache-friendly than just: weight = bitmap_weight_and(map1, map2, nbits); The following patches apply it for cpumask functions. Signed-off-by: Yury Norov <[email protected]>
2022-09-26lib/bitmap: don't call __bitmap_weight() in kernel codeYury Norov1-1/+1
__bitmap_weight() is not to be used directly in the kernel code because it's a helper for bitmap_weight(). Switch everything to bitmap_weight(). Signed-off-by: Yury Norov <[email protected]>
2022-09-24Makefile.debug: re-enable debug info for .S filesNick Desaulniers1-1/+3
Alexey reported that the fraction of unknown filename instances in kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down to assembler defined symbols, which regressed as a result of: commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") In that commit, I allude to restoring debug info for assembler defined symbols in a follow up patch, but it seems I forgot to do so in commit a66049e2cf0e ("Kbuild: make DWARF version a choice") Link: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=31bf18645d98b4d3d7357353be840e320649a67d Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1") Reported-by: Alexey Alexandrov <[email protected]> Reported-by: Bill Wendling <[email protected]> Reported-by: Greg Thelen <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Suggested-by: Masahiro Yamada <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2022-09-23lib/sg_pool: change module_init(sg_pool_init) to subsys_initcallMasahiro Yamada1-14/+2
sg_alloc_table_chained() is called by several drivers, but if it is called before sg_pool_init(), it results in a NULL pointer dereference in sg_pool_alloc(). Since commit 9b1d6c895002 ("lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c"), we rely on module_init(sg_pool_init) is invoked before other module_init calls but this assumption is fragile. I slightly changed the link order while refactoring Kbuild, then uncovered this issue. I should keep the current link order, but depending on a specific call order among module_init is so fragile. We usually define the init order by specifying *_initcall correctly, or delay the driver probing by returning -EPROBE_DEFER. Change module_initcall() to subsys_initcall(), and also delete the pointless module_exit() because lib/sg_pool.c is always compiled as built-in. (CONFIG_SG_POOL is bool) Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/all/[email protected]/ Fixes: 9b1d6c895002 ("lib: scatterlist: move SG pool code from SCSI driver to lib/sg_pool.c") Reported-by: Guenter Roeck <[email protected]> Reported-by: Marek Szyprowski <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Tested-by: Marek Szyprowski <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-09-21lib/find_bit: optimize find_next_bit() functionsYury Norov1-49/+70
Over the past couple years, the function _find_next_bit() was extended with parameters that modify its behavior to implement and- zero- and le- flavors. The parameters are passed at compile time, but current design prevents a compiler from optimizing out the conditionals. As find_next_bit() API grows, I expect that more parameters will be added. Current design would require more conditional code in _find_next_bit(), which would bloat the helper even more and make it barely readable. This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds a set of wrappers, so that the compile-time optimizations become possible. The common logic is moved to the new macro, and all flavors may be generated by providing a FETCH macro parameter, like in this example: #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ... find_next_xornot_and_bit(addr1, addr2, addr3, size, start) { return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx], /* nop */, size, start); } The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx. MUNGE is here to support _le code generation for BE builds. May be empty. I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top of 6.0-rc2 + this series. The results for kvm/x86_64 are: v6.0-rc2 Optimized Difference Z-score Random dense bitmap ns ns ns % find_next_bit: 787735 670546 117189 14.9 3.97 find_next_zero_bit: 777492 664208 113284 14.6 10.51 find_last_bit: 830925 687573 143352 17.3 2.35 find_first_bit: 3874366 3306635 567731 14.7 1.84 find_first_and_bit: 40677125 37739887 2937238 7.2 1.36 find_next_and_bit: 347865 304456 43409 12.5 1.35 Random sparse bitmap find_next_bit: 19816 14021 5795 29.2 6.10 find_next_zero_bit: 1318901 1223794 95107 7.2 1.41 find_last_bit: 14573 13514 1059 7.3 6.92 find_first_bit: 1313321 1249024 64297 4.9 1.53 find_first_and_bit: 8921 8098 823 9.2 4.56 find_next_and_bit: 9796 7176 2620 26.7 5.39 Where the statistics is significant (z-score > 3), the improvement is ~15%. According to the bloat-o-meter, the Image size is 10-11K less: x86_64/defconfig: add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177) arm64/defconfig: add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948) Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Yury Norov <[email protected]>
2022-09-21lib/find_bit: create find_first_zero_bit_le()Yury Norov1-0/+16
find_first_zero_bit_le() is an alias to find_next_zero_bit_le(), despite that 'next' is known to be slower than 'first' version. Now that we have common FIND_FIRST_BIT() macro helper, it's trivial to implement find_first_zero_bit_le() as a real function. Reviewed-by: Valentin Schneider <[email protected]> Signed-off-by: Yury Norov <[email protected]>
2022-09-21lib/find_bit: introduce FIND_FIRST_BIT() macroYury Norov1-25/+24
Now that we have many flavors of find_first_bit(), and expect even more, it's better to have one macro that generates optimal code for all and makes maintaining of slightly different functions simpler. The logic common to all versions is moved to the new macro, and all the flavors are generated by providing an FETCH macro-parameter, like in this example: #define FIND_FIRST_BIT(FETCH, MUNGE, size) ... find_first_ornot_and_bit(addr1, addr2, addr3, size) { return FIND_FIRST_BIT(addr1[idx] | ~addr2[idx] & addr3[idx], /* nop */, size); } The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx. MUNGE is here to support _le code generation for BE builds. May be empty. Suggested-by: Linus Torvalds <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Signed-off-by: Yury Norov <[email protected]>
2022-09-20lib/cpumask: add FORCE_NR_CPUS config optionYury Norov1-0/+9
The size of cpumasks is hard-limited by compile-time parameter NR_CPUS, but defined at boot-time when kernel parses ACPI/DT tables, and stored in nr_cpu_ids. In many practical cases, number of CPUs for a target is known at compile time, and can be provided with NR_CPUS. In that case, compiler may be instructed to rely on NR_CPUS as on actual number of CPUs, not an upper limit. It allows to optimize many cpumask routines and significantly shrink size of the kernel image. This patch adds FORCE_NR_CPUS option to teach the compiler to rely on NR_CPUS and enable corresponding optimizations. If FORCE_NR_CPUS=y, kernel will not set nr_cpu_ids at boot, but only check that the actual number of possible CPUs is equal to NR_CPUS, and WARN if that doesn't hold. The new option is especially useful in embedded applications because kernel configurations are unique for each SoC, the number of CPUs is constant and known well, and memory limitations are typically harder. For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB: add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300) Signed-off-by: Yury Norov <[email protected]>
2022-09-19flex_proportions: Disable preemption entering the write section.Sebastian Andrzej Siewior1-0/+2
The seqcount fprop_global::sequence is not associated with a lock. The write section (fprop_new_period()) is invoked from a timer and since the softirq is preemptible on PREEMPT_RT it is possible to preempt the write section which is not desited. Disable preemption around the write section on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-19mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()Thomas Gleixner1-0/+3
Some places in the VM code expect interrupts disabled, which is a valid expectation on non-PREEMPT_RT kernels, but does not hold on RT kernels in some places because the RT spinlock substitution does not disable interrupts. To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places, provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and PREEMPT_RT=n. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-09-14fortify: Adjust KUnit test for modular buildKees Cook1-2/+1
A much better "unknown size" string pointer is available directly from struct test, so use that instead of a global that isn't shared with modules. Reported-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/lkml/YyCOHOchVuE/[email protected] Fixes: 875bfd5276f3 ("fortify: Add KUnit test for FORTIFY_SOURCE internals") Cc: [email protected] Build-tested-by: Nathan Chancellor <[email protected]> Reviewed-by: David Gow <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2022-09-13ASoC: Merge tag 'v6.0-rc4' into asoc-6.1Mark Brown6-31/+48
Linux 6.0-rc4 so we can test on BeagleBone again.