aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-27objtool: Add CONFIG_HAVE_UACCESS_VALIDATIONJosh Poimboeuf4-2/+9
Allow an arch specify that it has objtool uaccess validation with CONFIG_HAVE_UACCESS_VALIDATION. For now, doing so unconditionally selects CONFIG_OBJTOOL. Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/d393d5e2fe73aec6e8e41d5c24f4b6fe8583f2d8.1650384225.git.jpoimboe@redhat.com
2022-05-27x86/mm: Use PAGE_ALIGNED(x) instead of IS_ALIGNED(x, PAGE_SIZE)Fanjun Kong1-4/+4
The <linux/mm.h> already provides the PAGE_ALIGNED() macro. Let's use this macro instead of IS_ALIGNED() and passing PAGE_SIZE directly. No change in functionality. [ mingo: Tweak changelog. ] Signed-off-by: Fanjun Kong <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-27x86: Fix all occurences of the "the the" typoBo Liu3-3/+3
Rather than waiting for the bots to fix these one-by-one, fix all occurences of "the the" throughout arch/x86. Signed-off-by: Bo Liu <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Paolo Bonzini <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-27perf/core: Remove unused local variableHaowen Bai1-1/+0
Drop LIST_HEAD() where the variable it declares is never used. Compiler probably never warned us, because the LIST_HEAD() initializer is technically 'usage'. [ mingo: Tweak changelog. ] Signed-off-by: Haowen Bai <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller4-25/+23
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following contain more Netfilter fixes for net: 1) syzbot warning in nfnetlink bind, from Florian. 2) Refetch conntrack after __nf_conntrack_confirm(), from Florian Westphal. 3) Move struct nf_ct_timeout back at the bottom of the ctnl_time, to where it before recent update, also from Florian. 4) Add NL_SET_BAD_ATTR() to nf_tables netlink for proper set element commands error reporting. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-05-27netfilter: nf_tables: set element extended ACK reporting supportPablo Neira Ayuso1-3/+9
Report the element that causes problems via netlink extended ACK for set element commands. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-27netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exitFlorian Westphal1-2/+3
syzbot reports: BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42 [..] list_del include/linux/list.h:148 [inline] cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617 No reproducer so far. Looking at recent changes in this area its clear that the free_head must not be at the end of the structure because nf_ct_timeout structure has variable size. Reported-by: <[email protected]> Fixes: 78222bacfca9 ("netfilter: cttimeout: decouple unlink and free on netns destruction") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-27netfilter: conntrack: re-fetch conntrack after insertionFlorian Westphal1-1/+6
In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry. This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger. Reported-by: <[email protected]> Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-27netfilter: nfnetlink: fix warn in nfnetlink_unbindFlorian Westphal1-19/+5
syzbot reports following warn: WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694 The syzbot generated program does this: socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3 setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0 ... which triggers 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)' check. Instead of counting, just enable reporting for every bind request and check if we still have listeners on unbind. While at it, also add the needed bounds check on nfnl_group2type[] access. Reported-by: <[email protected]> Reported-by: <[email protected]> Fixes: 2794cdb0b97b ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-05-27xen: switch gnttab_end_foreign_access() to take a struct page pointerJuergen Gross11-32/+28
Instead of a virtual kernel address use a pointer of the associated struct page as second parameter of gnttab_end_foreign_access(). Most users have that pointer available already and are creating the virtual address from it, risking problems in case the memory is located in highmem. gnttab_end_foreign_access() itself won't need to get the struct page from the address again. Suggested-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2022-05-27Merge tag 'timers-v5.19-rc1' of ↵Thomas Gleixner18-98/+22
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clockevent/clocksource driver updates from Daniel Lezcano: - Add Mediatek MT8186 DT bindings (Allen-KH Cheng) - Remove dead code corresponding of the IXP4xx board removal (Linus Walleij) - Add CLOCK_EVT_FEAT_C3STOP flag for the RISC-V SBI timer (Samuel Holland) - Do not return an error if there are multiple definitions of the sp804 timers in the DT (Andre Przywara) - Add the missing SPDX identifier (Thomas Gleixner) - Remove an unncessary NULL check as it is done right before at probe time for the timer-ti-dm (Dan Carpenter) - Fix the irq_of_parse_and_map() return code check on onexas-nps (Krzysztof Kozlowski) Link: https://lore.kernel.org/lkml/[email protected]
2022-05-27kbuild: replace $(if A,A,B) with $(or A,B) in scripts/Makefile.modpostMasahiro Yamada1-1/+1
Similar cleanup to commit 5c8166419acf ("kbuild: replace $(if A,A,B) with $(or A,B)"). Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
2022-05-27modpost: squash if...else-if in find_elf_symbol2()Masahiro Yamada1-7/+3
if ((addr - sym->st_value) < distance) { distance = addr - sym->st_value; near = sym; } else if ((addr - sym->st_value) == distance) { near = sym; } is equivalent to: if (addr - sym->st_value <= distance) { distance = addr - sym->st_value; near = sym; } (The else-if block can overwrite 'distance' with the same value). Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
2022-05-27modpost: reuse ARRAY_SIZE() macro for section_mismatch()Masahiro Yamada3-6/+6
Move ARRAY_SIZE() from file2alias.c to modpost.h to reuse it in section_mismatch(). Also, move the variable 'check' inside the for-loop. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
2022-05-27modpost: remove the unused argument of check_sec_ref()Masahiro Yamada1-3/+2
check_sec_ref() does not use the first parameter 'mod'. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
2022-05-27modpost: fix undefined behavior of is_arm_mapping_symbol()Masahiro Yamada1-1/+2
The return value of is_arm_mapping_symbol() is unpredictable when "$" is passed in. strchr(3) says: The strchr() and strrchr() functions return a pointer to the matched character or NULL if the character is not found. The terminating null byte is considered part of the string, so that if c is specified as '\0', these functions return a pointer to the terminator. When str[1] is '\0', strchr("axtd", str[1]) is not NULL, and str[2] is referenced (i.e. buffer overrun). Test code --------- char str1[] = "abc"; char str2[] = "ab"; strcpy(str1, "$"); strcpy(str2, "$"); printf("test1: %d\n", is_arm_mapping_symbol(str1)); printf("test2: %d\n", is_arm_mapping_symbol(str2)); Result ------ test1: 0 test2: 1 Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]>
2022-05-27modpost: fix removing numeric suffixesAlexander Lobakin1-1/+1
With the `-z unique-symbol` linker flag or any similar mechanism, it is possible to trigger the following: ERROR: modpost: "param_set_uint.0" [vmlinux] is a static EXPORT_SYMBOL The reason is that for now the condition from remove_dot(): if (m && (s[n + m] == '.' || s[n + m] == 0)) which was designed to test if it's a dot or a '\0' after the suffix is never satisfied. This is due to that `s[n + m]` always points to the last digit of a numeric suffix, not on the symbol next to it (from a custom debug print added to modpost): param_set_uint.0, s[n + m] is '0', s[n + m + 1] is '\0' So it's off-by-one and was like that since 2014. Fix this for the sake of any potential upcoming features, but don't bother stable-backporting, as it's well hidden -- apart from that LD flag, it can be triggered only with GCC LTO which never landed upstream. Fixes: fcd38ed0ff26 ("scripts: modpost: fix compilation warning") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2022-05-27um: Fix out-of-bounds read in LDT setupVincent Whitchurch1-2/+4
syscall_stub_data() expects the data_count parameter to be the number of longs, not bytes. ================================================================== BUG: KASAN: stack-out-of-bounds in syscall_stub_data+0x70/0xe0 Read of size 128 at addr 000000006411f6f0 by task swapper/1 CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0+ #18 Call Trace: show_stack.cold+0x166/0x2a7 __dump_stack+0x3a/0x43 dump_stack_lvl+0x1f/0x27 print_report.cold+0xdb/0xf81 kasan_report+0x119/0x1f0 kasan_check_range+0x3a3/0x440 memcpy+0x52/0x140 syscall_stub_data+0x70/0xe0 write_ldt_entry+0xac/0x190 init_new_ldt+0x515/0x960 init_new_context+0x2c4/0x4d0 mm_init.constprop.0+0x5ed/0x760 mm_alloc+0x118/0x170 0x60033f48 do_one_initcall+0x1d7/0x860 0x60003e7b kernel_init+0x6e/0x3d4 new_thread_handler+0x1e7/0x2c0 The buggy address belongs to stack of task swapper/1 and is located at offset 64 in frame: init_new_ldt+0x0/0x960 This frame has 2 objects: [32, 40) 'addr' [64, 80) 'desc' ================================================================== Fixes: 858259cf7d1c443c83 ("uml: maintain own LDT entries") Signed-off-by: Vincent Whitchurch <[email protected]> Cc: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27um: chan_user: Fix winch_tramp() return valueJohannes Berg1-4/+5
The previous fix here was only partially correct, it did result in returning a proper error value in case of error, but it also clobbered the pid that we need to return from this function (not just zero for success). As a result, it returned 0 here, but later this is treated as a pid and used to kill the process, but since it's now 0 we kill(0, SIGKILL), which makes UML kill itself rather than just the helper thread. Fix that and make it more obvious by using a separate variable for the pid. Fixes: ccf1236ecac4 ("um: fix error return code in winch_tramp()") Reported-and-tested-by: Nathan Chancellor <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Cc: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27um: virtio_uml: Fix broken device handling in time-travelJohannes Berg1-10/+23
If a device implementation crashes, virtio_uml will mark it as dead by calling virtio_break_device() and scheduling the work that will remove it. This still seems like the right thing to do, but it's done directly while reading the message, and if time-travel is used, this is in the time-travel handler, outside of the normal Linux machinery. Therefore, we cannot acquire locks or do normal "linux-y" things because e.g. lockdep will be confused about the context. Move handling this situation out of the read function and into the actual IRQ handler and response handling instead, so that in the case of time-travel we don't call it in the wrong context. Chances are the system will still crash immediately, since the device implementation crashing may also cause the time- travel controller to go down, but at least all of that now happens without strange warnings from lockdep. Fixes: c8177aba37ca ("um: time-travel: rework interrupt handling in ext mode") Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27um: line: Use separate IRQs per lineJohannes Berg6-33/+29
Today, all possible serial lines (ssl*=) as well as all possible consoles (con*=) each share a single interrupt (with a fixed number) with others of the same type. Now, if you have two lines, say ssl0 and ssl1, and one of them is connected to an fd you cannot read (e.g. a file), but the other gets a read interrupt, then both of them get the interrupt since it's shared. Then, the read() call will return EOF, since it's a file being written and there's nothing to read (at least not at the current offset, at the end). Unfortunately, this is treated as a read error, and we close this line, losing all the possible output. It might be possible to work around this and make the IRQ sharing work, however, now that we have dynamically allocated IRQs that are easy to use, simply use that to achieve separating between the events; then there's no interrupt for that line and we never attempt the read in the first place, thus not closing the line. This manifested itself in the wifi hostap/hwsim tests where the parallel script communicates via one serial console and the kernel messages go to another (a file) and sending data on the communication console caused the kernel messages to stop flowing into the file. Reported-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: anton ivanov <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_registerMiaoqian Lin1-0/+1
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when done. mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register(). We don't need the device node after it. Add missing of_node_put() to avoid refcount leak. Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") Signed-off-by: Miaoqian Lin <[email protected]> Reviewed-by: Marek Behún <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-27um: Enable ARCH_HAS_GCOV_PROFILE_ALLVincent Whitchurch1-0/+1
Enable ARCH_HAS_GCOV_PROFILE_ALL so that CONFIG_GCOV_PROFILE_ALL can be selected on UML. I didn't need to explicitly disable GCOV on anything to get this to work on the configs I tested. Signed-off-by: Vincent Whitchurch <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27um: Use asm-generic/dma-mapping.hJohannes Berg1-0/+1
If DMA (PCI over virtio) is enabled, then some drivers may enable CONFIG_DMA_OPS as well, and then we pull in the x86 definition of get_arch_dma_ops(), which uses the dma_ops symbol, which isn't defined. Since we don't have real DMA ops nor any kind of IOMMU fix this in the simplest possible way: pull in the asm-generic file instead of inheriting the x86 one. It's not clear why those drivers that do (e.g. VDPA) "select DMA_OPS", and if they'd even work with this, but chances are nobody will be wanting to do that anyway, so fixing the build failure is good enough. Reported-by: Randy Dunlap <[email protected]> Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Johannes Berg <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Randy Dunlap <[email protected]> Cc: [email protected] Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaksMiaoqian Lin1-1/+2
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release the refcount in error case. Add missing of_node_put() to avoid refcount leak. Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-27um: daemon: Make default socket configurableJohannes Berg2-1/+9
Even if daemon network is deprecated, some configurations may still use it (e.g. Debian), and not want to default to the /tmp/uml.ctl socket location. Allow configuring the default socket location. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2022-05-27net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()Dan Carpenter1-0/+3
The "fsp->location" variable comes from user via ethtool_get_rxnfc(). Check that it is valid to prevent an out of bounds read. Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-05-27scripts/kallsyms: update usage message of the kallsyms programYuntao Wang1-1/+1
The kallsyms program supports --absolute-percpu option but does not display it in the usage message, fix it. Signed-off-by: Yuntao Wang <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2022-05-27kbuild: Fix include path in scripts/Makefile.modpostJing Leng1-2/+1
When building an external module, if users don't need to separate the compilation output and source code, they run the following command: "make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)" and "$(src)" are the same. If they need to separate them, they run "make -C $(KERNEL_SRC_DIR) O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to prevent compilation failure. So the kernel should change the included path to avoid the copy operation. Signed-off-by: Jing Leng <[email protected]> [masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way, but this patch is a nice clean up anyway.] Signed-off-by: Masahiro Yamada <[email protected]>
2022-05-27um: xterm: Make default terminal emulator configurableJohannes Berg3-3/+13
Make the default terminal emulator configurable so e.g. Debian can set it to x-terminal-emulator instead of the current default of xterm. Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]>
2022-05-26Merge tag 'for-5.19/dm-changes' of ↵Linus Torvalds15-287/+409
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set. This change improves DM's hipri bio polling (REQ_POLLED) performance by 7 - 20% depending on the system. - Update DM core to use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. - Various DM core changes to reduce bio-based DM overhead and simplify IO accounting. - Fundamental DM core improvements to dm_io reference counting and the elimination of using bio_split()+bio_chain() -- instead DM's bio-based IO accounting is updated to account that a split occurred. - Improve DM core's abnormal bio processing to do less work. - Improve DM core's hipri polling support to use a single list rather than an hlist. - Update DM core to pass NULL bdev to bio_alloc_clone() so that initialization that isn't useful for DM can be elided. - Add cond_resched to DM stats' various loops that loop over all entries. - Fix incorrect error code return from DM integrity's constructor. - Make DM crypt's printing of the key constant-time. - Update bio-based DM multipath to provide high-resolution timer to the Historical Service Time (HST) path selector. * tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm: pass NULL bdev to bio_alloc_clone dm cache metadata: remove unnecessary variable in __dump_mapping dm mpath: provide high-resolution timer to HST for bio-based dm crypt: make printing of the key constant-time dm integrity: fix error code in dm_integrity_ctr() dm stats: add cond_resched when looping over entries dm: improve abnormal bio processing dm: simplify bio-based IO accounting further dm: put all polled dm_io instances into a single list dm: improve dm_io reference counting dm: don't grab target io reference in dm_zone_map_bio dm: improve bio splitting and associated IO accounting dm: switch to bdev based IO accounting interfaces dm: pass dm_io instance to dm_io_acct directly dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct dm: use bio_sectors in dm_aceept_partial_bio dm: simplify basic targets dm: conditionally enable branching for less used features dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio dm: move hot dm_io members to same cacheline as dm_target_io ...
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds88-2002/+1968
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-26Merge tag 'hardening-v5.19-rc1-fix1' of ↵Linus Torvalds6-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fix from Kees Cook: "This fixes an unlucky build race condition when using the GCC plugins, noticed by a few folks. - Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)" * tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: use KERNELVERSION for plugin version
2022-05-26Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds30-426/+985
Pull nfsd updates from Chuck Lever: "We introduce 'courteous server' in this release. Previously NFSD would purge open and lock state for an unresponsive client after one lease period (typically 90 seconds). Now, after one lease period, another client can open and lock those files and the unresponsive client's lease is purged; otherwise if the unresponsive client's open and lock state is uncontended, the server retains that open and lock state for up to 24 hours, allowing the client's workload to resume after a lengthy network partition. A longstanding issue with NFSv4 file creation is also addressed. Previously a file creation can fail internally, returning an error to the client, but leave the newly created file in place as an artifact. The file creation code path has been reorganized so that internal failures and race conditions are less likely to result in an unwanted file creation. A fault injector has been added to help exercise paths that are run during kernel metadata cache invalidation. These caches contain information maintained by user space about exported filesystems. Many of our test workloads do not trigger cache invalidation. There is one patch that is needed to support PREEMPT_RT and a fix for an ancient 'sleep while spin-locked' splat that seems to have become easier to hit since v5.18-rc3" * tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits) NFSD: nfsd_file_put() can sleep NFSD: Add documenting comment for nfsd4_release_lockowner() NFSD: Modernize nfsd4_release_lockowner() NFSD: Fix possible sleep during nfsd4_release_lockowner() nfsd: destroy percpu stats counters after reply cache shutdown nfsd: Fix null-ptr-deref in nfsd_fill_super() nfsd: Unregister the cld notifier when laundry_wq create failed SUNRPC: Use RMW bitops in single-threaded hot paths NFSD: Clean up the show_nf_flags() macro NFSD: Trace filecache opens NFSD: Move documenting comment for nfsd4_process_open2() NFSD: Fix whitespace NFSD: Remove dprintk call sites from tail of nfsd4_open() NFSD: Instantiate a struct file when creating a regular NFSv4 file NFSD: Clean up nfsd_open_verified() NFSD: Remove do_nfsd_create() NFSD: Refactor NFSv4 OPEN(CREATE) NFSD: Refactor NFSv3 CREATE NFSD: Refactor nfsd_create_setattr() NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() ...
2022-05-26net: sched: fixed barrier to prevent skbuff sticking in qdisc backlogVincent Ray1-28/+8
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit() does not provide any ordering guarantee as test_bit() is not an atomic operation. This, added to the fact that the spin_trylock() call at the beginning of qdisc_run_begin() does not guarantee acquire semantics if it does not grab the lock, makes it possible for the following statement : if (test_bit(__QDISC_STATE_MISSED, &qdisc->state)) to be executed before an enqueue operation called before qdisc_run_begin(). As a result the following race can happen : CPU 1 CPU 2 qdisc_run_begin() qdisc_run_begin() /* true */ set(MISSED) . /* returns false */ . . /* sees MISSED = 1 */ . /* so qdisc not empty */ . __qdisc_run() . . . pfifo_fast_dequeue() ----> /* may be done here */ . | . clear(MISSED) | . . | . smp_mb __after_atomic(); | . . | . /* recheck the queue */ | . /* nothing => exit */ | enqueue(skb1) | . | qdisc_run_begin() | . | spin_trylock() /* fail */ | . | smp_mb__before_atomic() /* not enough */ | . ---- if (test_bit(MISSED)) return false; /* exit */ In the above scenario, CPU 1 and CPU 2 both try to grab the qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the bypass code path, where it emits its skb then calls __qdisc_run(). CPU1 fails, sets MISSED and goes down the traditionnal enqueue() + dequeue() code path. But when executing qdisc_run_begin() for the second time, after enqueuing its skbuff, it sees the MISSED bit still set (by itself) and consequently chooses to exit early without setting it again nor trying to grab the spinlock again. Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue and found it empty, so it returned. At the end of the sequence, we end up with skb1 enqueued in the backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set, and no __netif_schedule() called made. skb1 will now linger in the qdisc until somebody later performs a full __qdisc_run(). Associated to the bypass capacity of the qdisc, and the ability of the TCP layer to avoid resending packets which it knows are still in the qdisc, this can lead to serious traffic "holes" in a TCP connection. We fix this by replacing the smp_mb__before_atomic() / test_bit() / set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin() by a single test_and_set_bit() call, which is more concise and enforces the needed memory barriers. Fixes: 89837eb4b246 ("net: sched: add barrier to ensure correct ordering for lockless qdisc") Signed-off-by: Vincent Ray <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-26net: lan966x: check devm_of_phy_get() for -EDEFER_PROBEMichael Walle1-2/+7
At the moment, if devm_of_phy_get() returns an error the serdes simply isn't set. While it is bad to ignore an error in general, there is a particular bug that network isn't working if the serdes driver is compiled as a module. In that case, devm_of_phy_get() returns -EDEFER_PROBE and the error is silently ignored. The serdes is optional, it is not there if the port is using RGMII, in which case devm_of_phy_get() returns -ENODEV. Rearrange the error handling so that -ENODEV will be handled but other error codes will abort the probing. Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Signed-off-by: Michael Walle <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2-9/+12
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix UAF when creating non-stateful expression in set. 2) Set limit cost when cloning expression accordingly, from Phil Sutter. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_limit: Clone packet limits' cost value netfilter: nf_tables: disallow non-stateful expression in sets earlier ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-26tracing: Fix comments for event_trigger_separate_filter()sunliming1-4/+4
The parameter name in comments of event_trigger_separate_filter() is inconsistent with actual parameter name, fix it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: sunliming <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26x86/traceponit: Fix comment about irq vector tracepointssunliming1-3/+0
Commit: 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely") removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c, but left related comment. So that the comment become anachronistic. Just remove the comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: sunliming <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26x86,tracing: Remove unused headerssunliming1-3/+0
Commit 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely") removed the tracing IDT from the file arch/x86/kernel/tracepoint.c, but left the related headers unused, remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: sunliming <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26ftrace: Clean up hash direct_functions on register failuresSong Liu1-3/+2
We see the following GPF when register_ftrace_direct fails: [ ] general protection fault, probably for non-canonical address \ 0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [...] [ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70 [ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...] [ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206 [ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b [ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0 [ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001 [ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0 [ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000 [ ] FS: 00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000 [ ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0 [ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ ] Call Trace: [ ] <TASK> [ ] register_ftrace_direct+0x54/0x290 [ ] ? render_sigset_t+0xa0/0xa0 [ ] bpf_trampoline_update+0x3f5/0x4a0 [ ] ? 0xffffffffa00a3000 [ ] bpf_trampoline_link_prog+0xa9/0x140 [ ] bpf_tracing_prog_attach+0x1dc/0x450 [ ] bpf_raw_tracepoint_open+0x9a/0x1e0 [ ] ? find_held_lock+0x2d/0x90 [ ] ? lock_release+0x150/0x430 [ ] __sys_bpf+0xbd6/0x2700 [ ] ? lock_is_held_type+0xd8/0x130 [ ] __x64_sys_bpf+0x1c/0x20 [ ] do_syscall_64+0x3a/0x80 [ ] entry_SYSCALL_64_after_hwframe+0x44/0xae [ ] RIP: 0033:0x7fa9421defa9 [ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...] [ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9 [ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011 [ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000 [ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001 [ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001 [ ] </TASK> [ ] Modules linked in: klp_vm(OK) [ ] ---[ end trace 0000000000000000 ]--- One way to trigger this is: 1. load a livepatch that patches kernel function xxx; 2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now); 3. repeat #2 => gpf. This is because the entry is added to direct_functions, but not removed. Fix this by remove the entry from direct_functions when register_ftrace_direct fails. Also remove the last trailing space from ftrace.c, so we don't have to worry about it anymore. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 763e34e74bb7 ("ftrace: Add register_ftrace_direct()") Signed-off-by: Song Liu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing: Fix comments of create_filter()sunliming1-1/+1
The name in comments of parameter "filter_string" in function create_filter is annotated as "filter_str", just fix it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: sunliming <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing: Disable kcov on trace_preemptirq.cCongyu Liu1-0/+4
Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file to reduce random code coverage. Link: https://lkml.kernel.org/r/[email protected] Acked-by: Dmitry Vyukov <[email protected]> Signed-off-by: Congyu Liu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing: Initialize integer variable to prevent garbage return valueGautam Menghani1-1/+1
Initialize the integer variable to 0 to fix the clang scan warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 8993665abcce ("tracing/boot: Support multiple handlers for per-event histogram") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Gautam Menghani <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26ftrace: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26ftrace: Remove return value of ftrace_arch_modify_*()Li kunyu6-28/+13
All instances of the function ftrace_arch_modify_prepare() and ftrace_arch_modify_post_process() return zero. There's no point in checking their return value. Just have them be void functions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Li kunyu <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing: Cleanup code by removing init "char *name"liqiong1-3/+1
The pointer is assigned to "type->name" anyway. no need to initialize with "preemption". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: liqiong <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing: Change "char *" string form to "char []"liqiong2-2/+2
The "char []" string form declares a single variable. It is better than "char *" which creates two variables in the final assembly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: liqiong <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQDaniel Bristot de Oliveira1-0/+2
There is no need to wakeup the timerlat/ thread if stop tracing is hit at the timerlat's IRQ handler. Return before waking up timerlat's thread. Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org Cc: Juri Lelli <[email protected]> Cc: Clark Williams <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-05-26tracing/timerlat: Print stacktrace in the IRQ handler if neededDaniel Bristot de Oliveira2-2/+16
If print_stack and stop_tracing_us are set, and stop_tracing_us is hit with latency higher than or equal to print_stack, print the stack at the IRQ handler as it is useful to define the root cause for the IRQ latency. Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org Cc: Juri Lelli <[email protected]> Cc: Clark Williams <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>