Age Commit message Author Files Lines
2022-01-20Merge branch 'stmmac-fixes'David S. Miller1-16/+26
Yuji Ishikawa says: ==================== net: stmmac: dwmac-visconti: Fix bit definitions and clock configuration for RMII mode This series is a fix for RMII/MII operation mode of the dwmac-visconti driver. It is composed of two parts: * 1/2: fix constant definitions for cleared bits in ETHER_CLK_SEL register * 2/2: fix configuration of ETHER_CLK_SEL register for running in RMII operation mode. net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL v1 -> v2: - added Fixes tag to commit message net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode v1 -> v2: - added Fixes tag to commit message ==================== Signed-off-by: David S. Miller <[email protected]>
2022-01-20net: stmmac: dwmac-visconti: Fix clock configuration for RMII modeYuji Ishikawa1-11/+21
Bit pattern of the ETHER_CLK_SEL register for RMII/MII mode should be fixed. Also, some control bits should be modified with a specific sequence. Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver") Signed-off-by: Yuji Ishikawa <[email protected]> Reviewed-by: Nobuhiro Iwamatsu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-20net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SELYuji Ishikawa1-5/+5
just 0 should be used to represent cleared bits * ETHER_CLK_SEL_DIV_SEL_20 * ETHER_CLK_SEL_TX_CLK_EXT_SEL_IN * ETHER_CLK_SEL_RX_CLK_EXT_SEL_IN * ETHER_CLK_SEL_TX_CLK_O_TX_I * ETHER_CLK_SEL_RMII_CLK_SEL_IN Fixes: b38dd98ff8d0 ("net: stmmac: Add Toshiba Visconti SoCs glue driver") Signed-off-by: Yuji Ishikawa <[email protected]> Reviewed-by: Nobuhiro Iwamatsu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
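To illustrate why a constant that stands for a cleared field should be plain 0, here is a minimal sketch. The `DEMO_*` names, the bit positions, and the `demo_compose()` helper are all hypothetical and are not taken from the dwmac-visconti driver; the sketch only shows what happens when a "cleared" option is OR-ed into a register value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1-bit field at bit 3 of a clock-select register. */
#define DEMO_BIT(n)               (UINT32_C(1) << (n))
#define DEMO_RMII_CLK_SEL_RX_C    DEMO_BIT(3) /* field set */
#define DEMO_RMII_CLK_SEL_IN      0           /* field cleared: plain 0 */

/* Hypothetical mistake: the "cleared" option is given a non-zero value. */
#define DEMO_RMII_CLK_SEL_IN_BAD  DEMO_BIT(0)

/* Combine a base register value with one option constant. */
static uint32_t demo_compose(uint32_t base, uint32_t option)
{
    return base | option;
}
```

With the plain-0 definition, `demo_compose(0x10, DEMO_RMII_CLK_SEL_IN)` leaves the base value untouched, whereas the non-zero variant silently sets bit 0, which belongs to some other field.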
2022-01-20ipv6_tunnel: Rate limit warning messagesIdo Schimmel1-4/+4
The warning messages can be invoked from the data path for every packet transmitted through an ip6gre netdev, leading to high CPU utilization. Fix that by rate limiting the messages. Fixes: 09c6bbf090ec ("[IPV6]: Do mandatory IPv6 tunnel endpoint checks in realtime") Reported-by: Maksym Yaremchuk <[email protected]> Tested-by: Maksym Yaremchuk <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
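The general idea of rate limiting a message can be sketched in user space as follows. This is only an illustration of the technique under assumed names (`struct demo_ratelimit`, `demo_ratelimit_ok()`); it is not the kernel's ratelimit implementation and not the code changed by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most 'burst' messages per 'interval' ticks. */
struct demo_ratelimit {
    uint64_t interval;     /* window length, in arbitrary ticks */
    unsigned burst;        /* messages allowed per window */
    uint64_t window_start; /* tick at which the current window began */
    unsigned printed;      /* messages already emitted in this window */
};

/* Return true if a message may be emitted at time 'now'. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rl, uint64_t now)
{
    /* Start a fresh window once the previous one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return false; /* budget for this window is used up */
    rl->printed++;
    return true;
}
```

A caller on a hot path would wrap its warning in `if (demo_ratelimit_ok(&rl, now))`, so that a per-packet condition produces a bounded number of log lines instead of one per packet.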
2022-01-20ethtool: Fix link extended state for big endianMoshe Tal1-1/+1
The link extended sub-states are assigned as an enum, which is integer-sized, but read from a union as u8. This works for small values on little endian systems, but on big endian it always gives 0. Fix the variable in the union to match the enum size. Fixes: ecc31c60240b ("ethtool: Add link extended state") Signed-off-by: Moshe Tal <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Reviewed-by: Amit Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
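The size mismatch described above can be reproduced independently of the host's byte order. The sketch below uses hypothetical helpers (`demo_store32()`, `demo_read_as_u8()`) that lay out a 32-bit value in an explicit byte order and then read the first byte, which is what a u8 union member overlaying a 32-bit slot effectively does.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value into buf using an explicit byte order. */
static void demo_store32(uint8_t buf[4], uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        /* Big endian puts the most significant byte first. */
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        buf[i] = (uint8_t)(v >> shift);
    }
}

/* Read the first byte, as a u8 member sharing the slot's address would. */
static uint8_t demo_read_as_u8(const uint8_t buf[4])
{
    return buf[0];
}
```

For a small value such as 5, the little-endian layout yields 5 from the first byte, while the big-endian layout yields 0 because the first byte is the most significant one. Making the reader the same width as the writer removes the dependence on byte order.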
2022-01-20net: phy: broadcom: hook up soft_reset for BCM54616SRobert Hancock1-0/+1
A problem was encountered with the Bel-Fuse 1GBT-SFP05 SFP module (which is a 1 Gbps copper module operating in SGMII mode with an internal BCM54616S PHY device) using the Xilinx AXI Ethernet MAC core, where the module would work properly on the initial insertion or boot of the device, but after the device was rebooted, the link would either only come up at 100 Mbps speeds or go up and down erratically. I found no meaningful changes in the PHY configuration registers between the working and non-working boots, but the status registers seemed to have a lot of error indications set on the SERDES side of the device on the non-working boot. I suspect the problem is that whatever happens on the SGMII link when the device is rebooted and the FPGA logic gets reloaded ends up putting the module's onboard PHY into a bad state. Since commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") the genphy_soft_reset call is not made automatically by the PHY core unless the callback is explicitly specified in the driver structure. For most of these Broadcom devices, there is probably a hardware reset that gets asserted to reset the PHY during boot, however for SFP modules (where the BCM54616S is commonly found) no such reset line exists, so if the board keeps the SFP cage powered up across a reboot, it will end up with no reset occurring during reboots. Hook up the genphy_soft_reset callback for BCM54616S to ensure that a PHY reset is performed before the device is initialized. This appears to fix the issue with erratic operation after a reboot with this SFP module. Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") Signed-off-by: Robert Hancock <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-20net: sched: Clarify error message when qdisc kind is unknownVictor Nogueira1-1/+1
When adding a tc rule with a qdisc kind that is not supported or not compiled into the kernel, the kernel emits the following error: "Error: Specified qdisc not found.". Found via tdc testing when ETS qdisc was not compiled in and it was not obvious right away what the message meant without looking at the kernel code. Change the error message to be more explicit and say the qdisc kind is unknown. Signed-off-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-20net: fix information leakage in /proc/net/ptypeCongyu Liu3-1/+5
In one net namespace, after creating a packet socket without binding it to a device, users in other net namespaces can observe the new `packet_type` added by this packet socket by reading `/proc/net/ptype` file. This is minor information leakage as packet socket is namespace aware. Add a net pointer in `packet_type` to keep the net namespace of the corresponding packet socket. In `ptype_seq_show`, this net pointer must be checked when it is not NULL. Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.") Signed-off-by: Congyu Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-01-20Merge tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds91-414/+1041
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bpf. Quite a handful of old regression fixes but most of those are pre-5.16. Current release - regressions: - fix memory leaks in the skb free deferral scheme if upper layer protocols are used, i.e. in-kernel TCP readers like TLS Current release - new code bugs: - nf_tables: fix NULL check typo in _clone() functions - change the default to y for Vertexcom vendor Kconfig - a couple of fixes to incorrect uses of ref tracking - two fixes for constifying netdev->dev_addr Previous releases - regressions: - bpf: - various verifier fixes mainly around register offset handling when passed to helper functions - fix mount source displayed for bpffs (none -> bpffs) - bonding: - fix extraction of ports for connection hash calculation - fix bond_xmit_broadcast return value when some devices are down - phy: marvell: add Marvell specific PHY loopback - sch_api: don't skip qdisc attach on ingress, prevent ref leak - htb: restore minimal packet size handling in rate control - sfp: fix high power modules without diagnostic monitoring - mscc: ocelot: - don't let phylink re-enable TX PAUSE on the NPI port - don't dereference NULL pointers with shared tc filters - smsc95xx: correct reset handling for LAN9514 - cpsw: avoid alignment faults by taking NET_IP_ALIGN into account - phy: micrel: use kszphy_suspend/_resume for irq aware devices, avoid races with the interrupt Previous releases - always broken: - xdp: check prog type before updating BPF link - smc: resolve various races around abnormal connection termination - sit: allow encapsulated IPv6 traffic to be delivered locally - axienet: fix init/reset handling, add missing barriers, read the right status words, stop queues correctly - add missing dev_put() in sock_timestamping_bind_phc() Misc: - ipv4: prevent accidentally passing RTO_ONLINK to ip_route_output_key_hash() by sanitizing flags - ipv4: avoid quadratic behavior in netns dismantle - stmmac: dwmac-oxnas: add support for OX810SE - fsl: xgmac_mdio: add workaround for erratum A-009885" * tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits) ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys ipv4: avoid quadratic behavior in netns dismantle net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses dt-bindings: net: Document fsl,erratum-a009885 net/fsl: xgmac_mdio: Add workaround for erratum A-009885 net: mscc: ocelot: fix using match before it is set net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind() net: axienet: increase default TX ring size to 128 net: axienet: fix for TX busy handling net: axienet: fix number of TX ring slots for available check net: axienet: Fix TX ring slot available check net: axienet: limit minimum TX ring size net: axienet: add missing memory barriers net: axienet: reset core on initialization prior to MDIO access net: axienet: Wait for PhyRstCmplt after core reset net: axienet: increase reset timeout bpf, selftests: Add ringbuf memory type confusion test ...
2022-01-20Merge branch 'akpm' (patches from Andrew)Linus Torvalds77-873/+826
Merge more updates from Andrew Morton: "55 patches. Subsystems affected by this patch series: percpu, procfs, sysctl, misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2, hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan" * emailed patches from Andrew Morton <[email protected]>: (55 commits) lib: remove redundant assignment to variable ret ubsan: remove CONFIG_UBSAN_OBJECT_SIZE kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB btrfs: use generic Kconfig option for 256kB page size limit arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB configs: introduce debug.config for CI-like setup delayacct: track delays from memory compact Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact delayacct: cleanup flags in struct task_delay_info and functions use it delayacct: fix incomplete disable operation when switch enable to disable delayacct: support swapin delay accounting for swapping without blkio panic: remove oops_id panic: use error_report_end tracepoint on warnings fs/adfs: remove unneeded variable make code cleaner FAT: use io_schedule_timeout() instead of congestion_wait() hfsplus: use struct_group_attr() for memcpy() region nilfs2: remove redundant pointer sbufs fs/binfmt_elf: use PT_LOAD p_align values for static PIE const_structs.checkpatch: add frequently used ops structs ...
2022-01-20lib: remove redundant assignment to variable retColin Ian King1-2/+0
The variable ret is being assigned a value that is never read. If the for-loop is entered then ret is immediately re-assigned a new value. If the for-loop is not executed ret is never read. The assignment is redundant and can be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20ubsan: remove CONFIG_UBSAN_OBJECT_SIZEKees Cook3-36/+0
The object-size sanitizer is redundant to -Warray-bounds, and inappropriately performs its checks at run-time when all information needed for the evaluation is available at compile-time, making it quite difficult to use: https://bugzilla.kernel.org/show_bug.cgi?id=214861 With -Warray-bounds almost enabled globally, it doesn't make sense to keep this around. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Michal Marek <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTRMarco Elver2-1/+3
Until recent versions of GCC and Clang, it was not possible to disable KCOV instrumentation via a function attribute. The relevant function attribute was introduced in 540540d06e9d9 ("kcov: add __no_sanitize_coverage to fix noinstr for all architectures"). x86 was the first architecture to want a working noinstr, and at the time no compiler support for the attribute existed yet. Therefore, commit 0f1441b44e823 ("objtool: Fix noinstr vs KCOV") introduced the ability to NOP __sanitizer_cov_*() calls in .noinstr.text. However, this doesn't work for other architectures like arm64 and s390 that want a working noinstr per ARCH_WANTS_NO_INSTR. At the time of 0f1441b44e823, we didn't yet have ARCH_WANTS_NO_INSTR, but now we can move the Kconfig dependency checks to the generic KCOV option. KCOV will be available if: - architecture does not care about noinstr, OR - we have objtool support (like on x86), OR - GCC is 12.0 or newer, OR - Clang is 13.0 or newer. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KBNathan Chancellor1-0/+1
Commit b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K") disabled btrfs for configurations that used a 256kB page size. However, it did not fully solve the problem because CONFIG_TEST_KMOD selects CONFIG_BTRFS, which does not account for the dependency. This results in a Kconfig warning and the failed BUILD_BUG_ON error returning. WARNING: unmet direct dependencies detected for BTRFS_FS Depends on [n]: BLOCK [=y] && !PPC_256K_PAGES && !PAGE_SIZE_256KB [=y] Selected by [m]: - TEST_KMOD [=m] && RUNTIME_TESTING_MENU [=y] && m && MODULES [=y] && NETDEVICES [=y] && NET_CORE [=y] && INET [=y] && BLOCK [=y] To resolve this, add CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency of CONFIG_TEST_KMOD so there is no more invalid configuration or build errors. Link: https://lkml.kernel.org/r/[email protected] Fixes: b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K") Signed-off-by: Nathan Chancellor <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Chris Mason <[email protected]> Cc: David Sterba <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Nick Desaulniers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20btrfs: use generic Kconfig option for 256kB page size limitNathan Chancellor1-2/+1
Use the newly introduced CONFIG_PAGE_SIZE_LESS_THAN_256KB to describe the dependency introduced by commit b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K"). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nathan Chancellor <[email protected]> Acked-by: David Sterba <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: kernel test robot <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Nick Desaulniers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KBNathan Chancellor1-0/+4
Patch series "Fix CONFIG_TEST_KMOD with 256kB page size". The kernel test robot reported a build error [1] from a failed assertion in fs/btrfs/inode.c with a hexagon randconfig that includes CONFIG_PAGE_SIZE_256KB. This error is the same one that was addressed by commit b05fbcc36be1 ("btrfs: disable build on platforms having page size 256K") but CONFIG_TEST_KMOD selects CONFIG_BTRFS without having the "page size less than 256kB dependency", which results in the error reappearing. The first patch introduces CONFIG_PAGE_SIZE_LESS_THAN_256KB by splitting it off from CONFIG_PAGE_SIZE_LESS_THAN_64KB, which was introduced in commit 1f0e290cc5fd ("arch: Add generic Kconfig option indicating page size smaller than 64k") for a similar reason in 5.16-rc3. The second patch uses that configuration option for CONFIG_BTRFS to reduce duplication. The third patch resolves the build error by adding CONFIG_PAGE_SIZE_LESS_THAN_256KB as a dependency to CONFIG_TEST_KMOD so that CONFIG_BTRFS does not get enabled under that invalid configuration. [1]: https://lore.kernel.org/r/[email protected]/ This patch (of 3): btrfs requires a page size smaller than 256kB. To use that dependency in other places, introduce CONFIG_PAGE_SIZE_LESS_THAN_256KB and reuse that dependency in CONFIG_PAGE_SIZE_LESS_THAN_64KB. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nathan Chancellor <[email protected]> Cc: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: David Sterba <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Nick Desaulniers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20configs: introduce debug.config for CI-like setupQian Cai1-0/+105
Some general debugging features like kmemleak, KASAN, lockdep, UBSAN etc help fix many viruses like a microscope. On the other hand, those features are scattered around and mixed up with more situational debugging options, making them difficult to consume properly. This could help amplify the general debugging/testing efforts and help establish sensible default values for those options across the board. This could also help different distros to collaborate on maintaining debug-flavored kernels. The config is based on years' experiences running daily CI inside the largest enterprise Linux distro company to seek regressions on linux-next builds on different bare-metal and virtual platforms. It can be used for example, $ make ARCH=arm64 defconfig debug.config Since KASAN and KCSAN can't be enabled together, we will need to create a separate one for KCSAN later as well. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Cc: Marco Elver <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Naresh Kamboju <[email protected]> Cc: "Stephen Rothwell" <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20delayacct: track delays from memory compactwangyong5-2/+59
Delay accounting does not track the delay of memory compact. When there is not enough free memory, tasks can spend a amount of their time waiting for compact. To get the impact of tasks in direct memory compact, measure the delay when allocating memory through memory compact. Also update tools/accounting/getdelays.c: / # ./getdelays_next -di -p 304 print delayacct stats ON printing IO accounting PID 304 CPU count real total virtual total delay total delay average 277 780000000 849039485 18877296 0.068ms IO count delay total delay average 0 0 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 5 11088812685 2217ms THRASHING count delay total delay average 0 0 0ms COMPACT count delay total delay average 3 72758 0ms watch: read=0, write=0, cancelled_write=0 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: wangyong <[email protected]> Reviewed-by: Jiang Xuexin <[email protected]> Reviewed-by: Zhang Wenya <[email protected]> Reviewed-by: Yang Yang <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compactwangyong1-28/+27
Add thrashing page cache and direct compact related descriptions and update the usage of getdelays userspace utility. The following patches modifications have been updated: https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/1638619795-71451-1-git-send-email- [email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: wangyong <[email protected]> Reviewed-by: Yang Yang <[email protected]> Reported-by: Zeal Robot <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20delayacct: cleanup flags in struct task_delay_info and functions use itYang Yang1-17/+0
Flags in struct task_delay_info are used to distinguish between swapin and blkio delay accounting. But after patch "delayacct: support swapin delay accounting for swapping without blkio", there is no need to do that, since swapin and blkio delay accounting use their own functions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Zeal Robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20delayacct: fix incomplete disable operation when switch enable to disableYang Yang1-0/+18
When a task is created after delayacct is enabled, the kernel does all kinds of delay accounting for that task. The problem is that if the user disables delayacct by setting /proc/sys/kernel/task_delayacct to zero, only blkio delay accounting is disabled. Now disable all kinds of delay accounting when /proc/sys/kernel/task_delayacct is set to zero. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Reported-by: Zeal Robot <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20delayacct: support swapin delay accounting for swapping without blkioYang Yang4-41/+43
Currently delayacct accounts swapin delay only for swapping that cause blkio. If we use zram for swapping, tools/accounting/getdelays can't get any SWAP delay. It's useful to get zram swapin delay information, for example to adjust compress algorithm or /proc/sys/vm/swappiness. Reference to PSI, it accounts any kind of swapping by doing its work in swap_readpage(), no matter whether swapping causes blkio. Let delayacct do the similar work. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Reported-by: Zeal Robot <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20panic: remove oops_idSebastian Andrzej Siewior1-18/+1
The oops id has been added as part of the end of trace marker for the kerneloops.org project. The id is used to automatically identify duplicate submissions of the same report. Identical looking reports with a different id can be considered as the same oops occurred again. The early initialisation of the oops_id can create a warning if the random core is not yet fully initialized. On PREEMPT_RT it is problematic if the id is initialized on demand from non preemptible context. The kernel oops project is not available since 2017. Remove the oops_id and use 0 in the output in case parsers rely on it. Link: https://bugs.debian.org/953172 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20panic: use error_report_end tracepoint on warningsMarco Elver2-3/+7
Introduce the error detector "warning" to the error_report event and use the error_report_end tracepoint at the end of a warning report. This allows in-kernel tests but also userspace to more easily determine if a warning occurred without polling kernel logs. [[email protected]: add comma to enum list, per Andy] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Wei Liu <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: John Ogness <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Alexander Popov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20fs/adfs: remove unneeded variable make code cleanerMinghao Chi1-3/+1
Return value directly instead of taking this in a variable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Minghao Chi <[email protected]> Reported-by: Zeal Robot <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20FAT: use io_schedule_timeout() instead of congestion_wait()NeilBrown1-2/+3
congestion_wait() in this context is just a sleep - block devices do not support congestion signalling any more. The goal for this wait, which was introduced in commit ae78bf9c4f5f ("[PATCH] add -o flush for fat") is to wait for any recently written data to get to storage. We currently have no direct mechanism to do this, so a simple wait that behaves identically to the current congestion_wait() is the best we can do. This is a step towards removing congestion_wait() Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: NeilBrown <[email protected]> Acked-by: OGAWA Hirofumi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20hfsplus: use struct_group_attr() for memcpy() regionKees Cook2-6/+10
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark the "info" region (containing struct DInfo and struct DXInfo structs) in struct hfsplus_cat_folder and struct hfsplus_cat_file that are written into directly, so the compiler can correctly reason about the expected size of the writes. "pahole" shows no size nor member offset changes to struct hfsplus_cat_folder nor struct hfsplus_cat_file. "objdump -d" shows no object code changes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Zhen Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
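The grouping idea can be sketched with a simplified stand-in for the kernel's `struct_group()` helper. The `DEMO_STRUCT_GROUP` macro, `struct demo_entry`, and `demo_copy_info()` below are hypothetical illustrations (C11 anonymous structs and unions are assumed), not the hfsplus definitions. The members stay reachable by their own names while also forming one named sub-struct, so a copy can target exactly that region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for struct_group(): the members are declared twice
 * inside a union, once anonymously (direct access) and once as a named
 * sub-struct (whole-group access). Both views share the same storage.
 */
#define DEMO_STRUCT_GROUP(NAME, ...) \
    union { \
        struct { __VA_ARGS__ }; \
        struct { __VA_ARGS__ } NAME; \
    }

/* Hypothetical record with an "info" region spanning two fields. */
struct demo_entry {
    uint16_t type;
    DEMO_STRUCT_GROUP(info,
        uint32_t user_info;
        uint32_t finder_info;
    );
    uint32_t text_encoding;
};

/* Copy only the grouped region; the size comes from the group itself. */
static void demo_copy_info(struct demo_entry *dst, const struct demo_entry *src)
{
    memcpy(&dst->info, &src->info, sizeof(dst->info));
}
```

Because the destination of the `memcpy()` is the named group rather than the first field, a bounds-checking compiler sees a write that fits its target object instead of one that spills from `user_info` into its neighbor.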
2022-01-20nilfs2: remove redundant pointer sbufsColin Ian King1-2/+2
Pointer sbufs is being assigned a value but it's not being used later on. The pointer is redundant and can be removed. Cleans up scan-build static analysis warning: fs/nilfs2/page.c:203:8: warning: Although the value stored to 'sbufs' is used in the enclosing expression, the value is never actually read from 'sbufs' [deadcode.DeadStores] sbh = sbufs = page_buffers(src); Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20fs/binfmt_elf: use PT_LOAD p_align values for static PIEH.J. Lu1-2/+2
Extend commit ce81bb256a22 ("fs/binfmt_elf: use PT_LOAD p_align values for suitable start address") which fixed PIE binaries built with -Wl,-z,max-page-size=0x200000, to cover static PIE binaries. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=215275 Tested by verifying static PIE binaries with -Wl,-z,max-page-size=0x200000 loading. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: H.J. Lu <[email protected]> Cc: Chris Kennelly <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Song Liu <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: Fangrui Song <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20const_structs.checkpatch: add frequently used ops structsRikard Falkeborn1-0/+23
Add commonly used structs (>50 instances) which are always or almost always const. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rikard Falkeborn <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20checkpatch: improve Kconfig help testJoe Perches1-26/+26
The Kconfig help test erroneously counts patch context lines as part of the help text. Fix that and improve the message block output. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joe Perches <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Dwaipayan Ray <[email protected]> Cc: Lukas Bulwahn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20checkpatch: relax regexp for COMMIT_LOG_LONG_LINEJerome Forissier1-1/+1
One exception to the COMMIT_LOG_LONG_LINE rule is a file path followed by ':'. That is typically some sort of diagnostic message from a compiler or a build tool, in which case we don't want to wrap the lines but keep the message unmodified. The regular expression used to match this pattern currently doesn't accept absolute paths or + characters. This can result in false positives as in the following (out-of-tree) example: ... /home/jerome/work/optee_repo_qemu/build/../toolchains/aarch32/bin/arm-linux-gnueabihf-ld.bfd: /home/jerome/work/toolchains-gcc10.2/aarch32/bin/../lib/gcc/arm-none-linux-gnueabihf/10.2.1/../../../../arm-none-linux-gnueabihf/lib/libstdc++.a(eh_alloc.o): in function `__cxa_allocate_exception': /tmp/dgboter/bbs/build03--cen7x86_64/buildbot/cen7x86_64--arm-none-linux-gnueabihf/build/src/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:284: undefined reference to `malloc' ... Update the regular expression to match the above paths. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jerome Forissier <[email protected]> Acked-by: Joe Perches <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Dwaipayan Ray <[email protected]> Cc: Lukas Bulwahn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  lib/test_meminit: destroy cache in kmem_cache_alloc_bulk() test  Andrey Konovalov  1  -0/+1
Make do_kmem_cache_size_bulk() destroy the cache it creates. Link: https://lkml.kernel.org/r/aced20a94bf04159a139f0846e41d38a1537debb.1640018297.git.andreyknvl@google.com Fixes: 03a9349ac0e0 ("lib/test_meminit: add a kmem_cache_alloc_bulk() test") Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  uuid: remove licence boilerplate text from the header  Andy Shevchenko  1  -9/+0
Remove licence boilerplate text from the UAPI header. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  uuid: discourage people from using UAPI header in new code  Andy Shevchenko  1  -0/+1
Discourage people from using UAPI header in new code by adding a note. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  kunit: replace kernel.h with the necessary inclusions  Andy Shevchenko  1  -1/+1
When kernel.h is used in the headers it adds a lot to the dependency hell, especially when circular dependencies are involved. Replace the kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Brendan Higgins <[email protected]> Tested-by: Brendan Higgins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  test_hash.c: refactor into kunit  Isabella Basso  3  -143/+81
Use KUnit framework to make tests more easily integrable with CIs. Even though these tests are not yet properly written as unit tests this change should help in debugging. Also remove kernel messages (i.e. through pr_info) as KUnit handles all debugging output and let it handle module init and exit details. Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: David Gow <[email protected]> Reported-by: kernel test robot <[email protected]> Tested-by: David Gow <[email protected]> Co-developed-by: Augusto Durães Camargo <[email protected]> Signed-off-by: Augusto Durães Camargo <[email protected]> Co-developed-by: Enzo Ferreira <[email protected]> Signed-off-by: Enzo Ferreira <[email protected]> Signed-off-by: Isabella Basso <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Daniel Latypov <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  lib/Kconfig.debug: properly split hash test kernel entries  Isabella Basso  2  -4/+13
Split TEST_HASH so that each entry only has one file. Note that there's no stringhash test file, but actually <linux/stringhash.h> tests are performed in lib/test_hash.c. Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: David Gow <[email protected]> Tested-by: David Gow <[email protected]> Signed-off-by: Isabella Basso <[email protected]> Cc: Augusto Durães Camargo <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Daniel Latypov <[email protected]> Cc: Enzo Ferreira <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: kernel test robot <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  test_hash.c: split test_hash_init  Isabella Basso  1  -12/+54
Split up test_hash_init so that it calls each test more explicitly insofar as it is possible without rewriting the entire file. This aims at improving readability. Split tests performed on string_or as they don't interfere with those performed in hash_or. Also separate pr_info calls about skipped tests as they're not part of the tests themselves, but only warn about (un)defined arch-specific hash functions. Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: David Gow <[email protected]> Tested-by: David Gow <[email protected]> Signed-off-by: Isabella Basso <[email protected]> Cc: Augusto Durães Camargo <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Daniel Latypov <[email protected]> Cc: Enzo Ferreira <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: kernel test robot <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  test_hash.c: split test_int_hash into arch-specific functions  Isabella Basso  1  -29/+62
Split the test_int_hash function to keep its main loop separate from arch-specific chunks, which are only compiled as needed. This aims at improving readability. Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: David Gow <[email protected]> Tested-by: David Gow <[email protected]> Signed-off-by: Isabella Basso <[email protected]> Cc: Augusto Durães Camargo <[email protected]> Cc: Brendan Higgins <[email protected]> Cc: Daniel Latypov <[email protected]> Cc: Enzo Ferreira <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: kernel test robot <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  hash.h: remove unused define directive  Isabella Basso  4  -33/+4
Patch series "test_hash.c: refactor into KUnit", v3.

We refactored the lib/test_hash.c file into KUnit as part of the student group LKCAMP [1] introductory hackathon for kernel development.

This test was pointed to our group by Daniel Latypov [2], so its full conversion into a pure KUnit test was our goal in this patch series, but we ran into many problems relating to it not being split as unit tests, which complicated matters a bit, as the reasoning behind the original tests is quite cryptic for those unfamiliar with hash implementations.

Some interesting developments we'd like to highlight are:

- In patch 1/5 we noticed that there was an unused define directive that could be removed.
- In patch 4/5 we noticed how stringhash and hash tests are all under the lib/test_hash.c file, which might cause some confusion, and we also broke those kernel config entries up.

Overall KUnit developments have been made in the other patches in this series:

In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as to make it more compatible with the KUnit style, whilst preserving the original idea of the maintainer who designed it (i.e. George Spelvin), which might be undesirable for unit tests, but we assume it is enough for a first patch.

This patch (of 5):

Currently, there exist hash_32() and __hash_32() functions, which were introduced in a patch [1] targeting architecture specific optimizations. These functions can be overridden on a per-architecture basis to achieve such optimizations. They must set their corresponding define directive (HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header files can deal with these overrides properly.

As the supported 32-bit architectures that have their own hash function implementation (i.e. m68k, Microblaze, H8/300, pa-risc) have only been making use of the (more general) __hash_32() function (which only lacks a right shift operation when compared to the hash_32() function), remove the define directive corresponding to the arch-specific hash_32() implementation.

[1] https://lore.kernel.org/lkml/[email protected]/

[[email protected]: hash_32_generic() becomes hash_32()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: David Gow <[email protected]>
Tested-by: David Gow <[email protected]>
Co-developed-by: Augusto Durães Camargo <[email protected]>
Signed-off-by: Augusto Durães Camargo <[email protected]>
Co-developed-by: Enzo Ferreira <[email protected]>
Signed-off-by: Enzo Ferreira <[email protected]>
Signed-off-by: Isabella Basso <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Daniel Latypov <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
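The relationship between the two functions ("only lacks a right shift") can be illustrated with a userspace sketch. It assumes the generic multiplicative hash with the 32-bit golden-ratio constant; the `sketch_` names are hypothetical and stand in for __hash_32() and hash_32() respectively.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit golden-ratio multiplier used by the generic multiplicative hash. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Stand-in for __hash_32(): multiply only, no shift. This is the part an
 * architecture would override with an optimized implementation. */
static inline uint32_t sketch_hash_32_noshift(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

/* Stand-in for hash_32(): the same product, reduced to the top `bits`
 * bits, which are the best-mixed ones. */
static inline uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32); /* a shift by 32 would be undefined */
	return sketch_hash_32_noshift(val) >> (32 - bits);
}
```

Because the shifted variant is a thin wrapper around the unshifted one, overriding only the unshifted function is enough for an architecture to speed up both, which is why a separate override hook for the shifted variant is not needed.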
2022-01-20  lib/list_debug.c: print more list debugging context in __list_del_entry_valid()  Zhen Lei  1  -4/+4
Currently, the entry->prev and entry->next are considered to be valid as long as they are not LIST_POISON{1|2}. However, the memory may be corrupted. The prev->next is invalid probably because 'prev' is invalid, not because prev->next's content is illegal.

Unfortunately, the printk and its subfunctions will modify the registers that hold the 'prev' and 'next', and we don't see this valuable information in the BUG context. So print the contents of 'entry->prev' and 'entry->next'.

Here's an example:

list_del corruption. prev->next should be c0ecbf74, but was c08410dc
kernel BUG at lib/list_debug.c:53!
... ...
PC is at __list_del_entry_valid+0x58/0x98
LR is at __list_del_entry_valid+0x58/0x98
psr: 60000093
sp : c0ecbf30  ip : 00000000  fp : 00000001
r10: c08410d0  r9 : 00000001  r8 : c0825e0c
r7 : 20000013  r6 : c08410d0  r5 : c0ecbf74  r4 : c0ecbf74
r3 : c0825d08  r2 : 00000000  r1 : df7ce6f4  r0 : 00000044
... ...
Stack: (0xc0ecbf30 to 0xc0ecc000)
bf20:                                     c0ecbf74 c0164fd0 c0ecbf70 c0165170
bf40: c0eca000 c0840c00 c0840c00 c0824500 c0825e0c c0189bbc c088f404 60000013
bf60: 60000013 c0e85100 000004ec 00000000 c0ebcdc0 c0ecbf74 c0ecbf74 c0825d08
bf80: c0e807c0 c018965c 00000000 c013f2a0 c0e807c0 c013f154 00000000 00000000
bfa0: 00000000 00000000 00000000 c01001b0 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
(__list_del_entry_valid) from (__list_del_entry+0xc/0x20)
(__list_del_entry) from (finish_swait+0x60/0x7c)
(finish_swait) from (rcu_gp_kthread+0x560/0xa20)
(rcu_gp_kthread) from (kthread+0x14c/0x15c)
(kthread) from (ret_from_fork+0x14/0x24)

At first, I thought prev->next was overwritten. Later, I carefully analyzed the RCU code and the disassembly code. The error occurred when deleting a node from the list rcu_state.gp_wq. The System.map shows that the address of rcu_state is c0840c00. Then I used gdb to obtain the offset of rcu_state.gp_wq.task_list.

(gdb) p &((struct rcu_state *)0)->gp_wq.task_list
$1 = (struct list_head *) 0x4dc

Again:
list_del corruption. prev->next should be c0ecbf74, but was c08410dc
c08410dc = c0840c00 + 0x4dc = &rcu_state.gp_wq.task_list

Because rcu_state.gp_wq has at most one node, I can guess that "prev = &rcu_state.gp_wq.task_list". In other scenarios I may not be so lucky and may be unable to work out the value of 'prev'.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zhen Lei <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
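The idea of reporting 'prev' and 'next' alongside the mismatching pointer can be shown with a small userspace sketch of a doubly linked list check. The function name and the exact message wording are assumptions made for illustration, not a copy of lib/list_debug.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal userspace stand-in for the kernel's circular list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Check that the neighbours of `entry` still point back at it. On a
 * mismatch, print the neighbour's address too, so the value is in the
 * log even if the registers holding it are clobbered by the printing. */
static bool sketch_list_del_entry_valid(const struct list_head *entry)
{
	const struct list_head *prev, *next;

	assert(entry && entry->prev && entry->next);
	prev = entry->prev;
	next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p. (prev=%p)\n",
			(const void *)entry, (const void *)prev->next,
			(const void *)prev);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p. (next=%p)\n",
			(const void *)entry, (const void *)next->prev,
			(const void *)next);
		return false;
	}
	return true;
}
```

With the extra field in the message, the gdb and System.map detour in the example above would not be needed to learn which node 'prev' was.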
2022-01-20  list: introduce list_is_head() helper and re-use it in list.h  Andy Shevchenko  1  -14/+22
Introduce list_is_head() in a similar (*) way to how it's done for list_entry_is_head(). Make use of it in list.h. *) it's done as an inline function and not a macro, to be aligned with the other list_is_*() APIs; while at it, make all three have the same style. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Cc: Heikki Krogerus <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
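A minimal userspace sketch of such a helper follows, assuming the usual circular list where the head is a sentinel node. The `sketch_` names are hypothetical; the counting loop only shows where a head test typically appears.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's circular list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* True when the cursor `list` is the list head (sentinel) itself. */
static inline bool sketch_list_is_head(const struct list_head *list,
				       const struct list_head *head)
{
	return list == head;
}

/* Count entries by walking until the cursor wraps around to the head. */
static unsigned int sketch_list_count(const struct list_head *head)
{
	const struct list_head *pos;
	unsigned int n = 0;

	assert(head && head->next);
	for (pos = head->next; !sketch_list_is_head(pos, head); pos = pos->next)
		n++;
	return n;
}
```

Writing the test as an inline function rather than a macro gives type checking on both arguments and matches the style of the other list_is_*() predicates mentioned in the commit message.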
2022-01-20  kstrtox: uninline everything  Alexey Dobriyan  1  -0/+12
I've made the mistake of looking into lib/kstrtox.o code generation.

The only function remotely performance critical is _parse_integer() (via /proc/*/map_files/*), everything else is not.

Uninline everything, shrink lib/kstrtox.o by ~20%!

Space savings on x86_64:

add/remove: 0/0 grow/shrink: 0/23 up/down: 0/-1269 (-1269 !!!)
Function                    old     new   delta
kstrtoull                    16      13      -3
kstrtouint                   59      48     -11
kstrtou8                     60      49     -11
kstrtou16                    61      50     -11
_kstrtoul                    46      35     -11
kstrtoull_from_user          95      83     -12
kstrtoul_from_user           95      83     -12
kstrtoll                     93      80     -13
kstrtouint_from_user        124      83     -41
kstrtou8_from_user          125      83     -42
kstrtou16_from_user         126      83     -43
kstrtos8                    101      50     -51
kstrtos16                   102      51     -51
kstrtoint                   100      49     -51
_kstrtol                     93      35     -58
kstrtobool_from_user        156      75     -81
kstrtoll_from_user          165      83     -82
kstrtol_from_user           165      83     -82
kstrtoint_from_user         172      83     -89
kstrtos8_from_user          173      83     -90
kstrtos16_from_user         174      83     -91
_parse_integer              136      10    -126
_kstrtoull                  308     101    -207
Total: Before=3421236, After=3419967, chg -0.04%

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  get_maintainer: don't remind about no git repo when --nogit is used  Randy Dunlap  1  -1/+1
When --nogit is used with scripts/get_maintainer.pl, the script spews 4 lines of unnecessary information (noise). Do not print those lines when --nogit is specified.

This change removes the printing of these 4 lines:

  ./scripts/get_maintainer.pl: No supported VCS found. Add --nogit to options?
  Using a git repository produces better results.
  Try Linus Torvalds' latest git repository using:
  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)  Davidlohr Bueso  1  -8/+8
PRIO_PGRP needs the tasklist_lock mainly to serialize vs setpgid(2), to protect against any concurrent change_pid(PIDTYPE_PGID) that can move the task from one hlist to another while iterating. However, the remaining cases can rely on RCU alone: PRIO_PROCESS only does the task lookup and never iterates over the tasklist, and we already have an rcu-aware stable pointer. PRIO_USER is already racy vs setuid(2), so with creds being rcu protected, we can end up seeing stale data. When removing the tasklist_lock there can be a race with (i) fork, but this is benign as the child's nice is inherited and the new task is not observable by the user yet either, hence the return semantics do not differ; and (ii) exit, which is a small window and can cause us to miss a task which was removed from the list and had the highest nice. Similarly, change the buggy do_each_thread/while_each_thread combo in PRIO_USER for the rcu-safe for_each_process_thread flavor, which doesn't make use of next_thread/p->thread_group. [[email protected]: coding style fixes] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  kthread: dynamically allocate memory to store kthread's full name  Yafang Shao  3  -2/+34
When I was implementing a new per-cpu kthread cfs_migration, I found that its comm "cfs_migration/%u" is truncated due to the limitation of TASK_COMM_LEN. For example, the per-cpu threads on CPU10~19 all have the same name "cfs_migration/1", which will confuse the user. This issue is not critical, because we can get the corresponding CPU from the task's Cpus_allowed. But for kthreads corresponding to other hardware devices, it is not easy to get the detailed device info from the task comm, for example:

    jbd2/nvme0n1p2-
    xfs-reclaim/sdf

Currently there are many truncated kthreads:

    rcu_tasks_kthre
    rcu_tasks_rude_
    rcu_tasks_trace
    poll_mpt3sas0_s
    ext4-rsv-conver
    xfs-reclaim/sd{a, b, c, ...}
    xfs-blockgc/sd{a, b, c, ...}
    xfs-inodegc/sd{a, b, c, ...}
    audit_send_repl
    ecryptfs-kthrea
    vfio-irqfd-clea
    jbd2/nvme0n1p2-
    ...

We can shorten these names to work around this problem, but that may not be applicable to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for example: it is a nice name, and it is not a good idea to shorten it.

One possible way to fix this issue is extending the task comm size, but as task->comm is used in lots of places, that may cause some potential buffer overflows. Another, more conservative approach is introducing a new pointer to store the kthread's full name if it is truncated, which won't introduce too much overhead as it is in the non-critical path. In the end we decided to use the second approach. See also the discussions in this thread:
https://lore.kernel.org/lkml/[email protected]/

After this change, the full names of these truncated kthreads will be displayed via /proc/[pid]/comm:

    rcu_tasks_kthread
    rcu_tasks_rude_kthread
    rcu_tasks_trace_kthread
    poll_mpt3sas0_statu
    ext4-rsv-conversion
    xfs-reclaim/sdf1
    xfs-blockgc/sdf1
    xfs-inodegc/sdf1
    audit_send_reply
    ecryptfs-kthread
    vfio-irqfd-cleanup
    jbd2/nvme0n1p2-8

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
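Both the truncation and the "extra pointer" approach can be reproduced in userspace. The sketch below is not the kernel implementation: `struct sketch_thread` and both functions are hypothetical, and malloc() stands in for whatever allocator the kernel uses. Only the 16-byte limit is taken from TASK_COMM_LEN.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define TASK_COMM_LEN 16 /* 15 visible characters plus the NUL */

/* Hypothetical userspace stand-in for a kernel thread descriptor. */
struct sketch_thread {
	char comm[TASK_COMM_LEN]; /* fixed-size name, possibly truncated */
	char *full_name;          /* allocated only when comm was truncated */
};

/* Format the name into comm; if it did not fit, also keep a heap copy
 * of the complete name. Returns 0 on success, -1 on failure. */
static int sketch_set_name(struct sketch_thread *t, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(t && fmt);
	t->full_name = NULL;

	va_start(args, fmt);
	len = vsnprintf(t->comm, sizeof(t->comm), fmt, args);
	va_end(args);
	if (len < 0)
		return -1;

	if ((size_t)len >= sizeof(t->comm)) {
		t->full_name = malloc((size_t)len + 1);
		if (!t->full_name)
			return -1;
		va_start(args, fmt);
		vsnprintf(t->full_name, (size_t)len + 1, fmt, args);
		va_end(args);
	}
	return 0;
}

/* Prefer the full name when one was stored. */
static const char *sketch_get_name(const struct sketch_thread *t)
{
	assert(t);
	return t->full_name ? t->full_name : t->comm;
}
```

Calling sketch_set_name() with "cfs_migration/%u" for CPUs 10 and 19 leaves the same 15 characters in both comm buffers, while sketch_get_name() still tells the two threads apart. Keeping comm at its fixed size means existing users of the 16-byte buffer are unaffected, which is the conservative property the commit message argues for.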
2022-01-20  tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN  Yafang Shao  3  -8/+13
As the sched:sched_switch tracepoint args are derived from the kernel, we had better keep them the same as in the kernel. So the macro TASK_COMM_LEN is converted to an enum, so that all BPF programs can get it through BTF. A BPF program which wants to use TASK_COMM_LEN should include the header vmlinux.h. Regarding test_stacktrace_map and test_tracepoint, as the types defined in linux/bpf.h are also defined in vmlinux.h, we don't need to include linux/bpf.h again. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Michal Miroslaw <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Al Viro <[email protected]> Cc: Kees Cook <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Dennis Dalessandro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
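The difference between a macro and an enum constant matters here because a `#define` disappears after preprocessing, whereas an enumerator is a named constant the compiler can record in type information. A minimal sketch, in which the struct layout is a hypothetical illustration loosely modeled on the tracepoint arguments and not copied from the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* An enumerator, unlike "#define TASK_COMM_LEN 16", survives into the
 * compiler's type/debug information, so tooling can look it up by name. */
enum {
	TASK_COMM_LEN = 16,
};

static_assert(TASK_COMM_LEN == 16, "sketch assumes the 16-byte comm limit");

/* Hypothetical argument layout using the named constant instead of an
 * open-coded 16. */
struct sketch_sched_switch_args {
	char prev_comm[TASK_COMM_LEN];
	int prev_pid;
	char next_comm[TASK_COMM_LEN];
	int next_pid;
};

/* Size of one comm field, derived from the type rather than hard-coded. */
static size_t sketch_comm_size(void)
{
	return sizeof(((struct sketch_sched_switch_args *)0)->prev_comm);
}
```

An enumerator is still an integer constant expression, so it remains usable as an array bound exactly as the macro was.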
2022-01-20  tools/bpf/bpftool/skeleton: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm  Yafang Shao  1  -2/+2
bpf_probe_read_kernel_str() adds a NUL terminator to dst, so we no longer need to care whether the dst size is big enough. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Michal Miroslaw <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Al Viro <[email protected]> Cc: Kees Cook <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Dennis Dalessandro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20  samples/bpf/test_overhead_kprobe_kern: replace bpf_probe_read_kernel with bpf_probe_read_kernel_str to get task comm  Yafang Shao  3  -9/+11
bpf_probe_read_kernel_str() adds a NUL terminator to dst, so we no longer need to care whether the dst size is big enough. This patch also replaces the hard-coded 16 with TASK_COMM_LEN to make it grepable. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yafang Shao <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Michal Miroslaw <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Al Viro <[email protected]> Cc: Kees Cook <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Dennis Dalessandro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
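Why a string-aware read is safer than a fixed-size byte copy can be shown with a userspace analogue. `sketch_read_str()` below is a hypothetical stand-in, not the BPF helper: it copies at most size - 1 bytes, always terminates dst, and returns the length including the NUL, which is the behaviour this sketch assumes for the helper.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a NUL-terminated string into dst, writing at most `size` bytes
 * and always leaving dst terminated. Returns the number of bytes written
 * including the NUL, or 0 when there is no room at all. */
static long sketch_read_str(char *dst, size_t size, const char *src)
{
	size_t i;

	assert(dst && src);
	if (size == 0)
		return 0;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return (long)(i + 1);
}
```

A raw copy of a fixed number of bytes into a smaller destination drops the terminator whenever the source is at least as long as the destination, so later string operations on the result can run past the buffer. With the string-aware variant the destination may hold a shortened name, but it is always a valid C string.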