aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-07-25tcp: Fix data-races around sk_pacing_rate.Kuniyuki Iwashima1-2/+2
While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 43e122b014c9 ("tcp: refine pacing rate determination") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25net: mld: fix reference count leak in mld_{query | report}_work()Taehee Yoo1-6/+8
mld_{query | report}_work() processes queued events. If there are too many events in the queue, it re-queue a work. And then, it returns without in6_dev_put(). But if queuing is failed, it should call in6_dev_put(), but it doesn't. So, a reference count leak would occur. THREAD0 THREAD1 mld_report_work() spin_lock_bh() if (!mod_delayed_work()) in6_dev_hold(); spin_unlock_bh() spin_lock_bh() schedule_delayed_work() spin_unlock_bh() Script to reproduce(by Hangbin Liu): ip netns add ns1 ip netns add ns2 ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip -n ns1 link add veth0 type veth peer name veth0 netns ns2 ip -n ns1 link set veth0 up ip -n ns2 link set veth0 up for i in `seq 50`; do for j in `seq 100`; do ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0 ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0 done done modprobe -r veth ip -a netns del splat looks like: unregister_netdevice: waiting for veth0 to become free. Usage count = 2 leaked reference. ipv6_add_dev+0x324/0xec0 addrconf_notify+0x481/0xd10 raw_notifier_call_chain+0xe3/0x120 call_netdevice_notifiers+0x106/0x160 register_netdevice+0x114c/0x16b0 veth_newlink+0x48b/0xa50 [veth] rtnl_newlink+0x11a2/0x1a40 rtnetlink_rcv_msg+0x63f/0xc00 netlink_rcv_skb+0x1df/0x3e0 netlink_unicast+0x5de/0x850 netlink_sendmsg+0x6c9/0xa90 ____sys_sendmsg+0x76a/0x780 __sys_sendmsg+0x27c/0x340 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Tested-by: Hangbin Liu <[email protected]> Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Taehee Yoo <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25random: handle archrandom with multiple longsJason A. Donenfeld11-188/+116
The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this. On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts. So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated. Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390. Acked-by: Heiko Carstens <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Mark Rutland <[email protected]> Acked-by: Michael Ellerman <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-07-25net: macsec: fix potential resource leak in macsec_add_rxsa() and ↵Jianglei Nie1-2/+2
macsec_add_txsa() init_rx_sa() allocates relevant resource for rx_sa->stats and rx_sa-> key.tfm with alloc_percpu() and macsec_alloc_tfm(). When some error occurs after init_rx_sa() is called in macsec_add_rxsa(), the function released rx_sa with kfree() without releasing rx_sa->stats and rx_sa-> key.tfm, which will lead to a resource leak. We should call macsec_rxsa_put() instead of kfree() to decrease the ref count of rx_sa and release the relevant resource if the refcount is 0. The same bug exists in macsec_add_txsa() for tx_sa as well. This patch fixes the above two bugs. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Jianglei Nie <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25Merge branch 'macsec-config-issues'David S. Miller1-10/+19
Sabrina Dubroca says: ==================== macsec: fix config issues The patch adding netlink support for XPN (commit 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)")) introduced several issues, including a kernel panic reported at [1]. Reproducing those bugs with upstream iproute is limited, since iproute doesn't currently support XPN. I'm also working on this. [1] https://bugzilla.kernel.org/show_bug.cgi?id=208315 ==================== Signed-off-by: David S. Miller <[email protected]>
2022-07-25macsec: always read MACSEC_SA_ATTR_PN as a u64Sabrina Dubroca1-3/+3
Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a u32, sometimes forced into a u64 without checking the actual length of the attribute. Instead, we can use nla_get_u64 everywhere, which will read up to 64 bits into a u64, capped by the actual length of the attribute coming from userspace. This fixes several issues: - the check in validate_add_rxsa doesn't work with 32-bit attributes - the checks in validate_add_txsa and validate_upd_sa incorrectly reject X << 32 (with X != 0) Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25macsec: limit replay window size with XPNSabrina Dubroca1-4/+12
IEEE 802.1AEbw-2013 (section 10.7.8) specifies that the maximum value of the replay window is 2^30-1, to help with recovery of the upper bits of the PN. To avoid leaving the existing macsec device in an inconsistent state if this test fails during changelink, reuse the cleanup mechanism introduced for HW offload. This wasn't needed until now because macsec_changelink_common could not fail during changelink, as modifying the cipher suite was not allowed. Finally, this must happen after handling IFLA_MACSEC_CIPHER_SUITE so that secy->xpn is set. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25macsec: fix error message in macsec_add_rxsa and _txsaSabrina Dubroca1-2/+2
The expected length is MACSEC_SALT_LEN, not MACSEC_SA_ATTR_SALT. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25macsec: fix NULL deref in macsec_add_rxsaSabrina Dubroca1-1/+2
Commit 48ef50fa866a added a test on tb_sa[MACSEC_SA_ATTR_PN], but nothing guarantees that it's not NULL at this point. The same code was added to macsec_add_txsa, but there it's not a problem because validate_add_txsa checks that the MACSEC_SA_ATTR_PN attribute is present. Note: it's not possible to reproduce with iproute, because iproute doesn't allow creating an SA without specifying the PN. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208315 Reported-by: Frantisek Sumsal <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}Marc Zyngier2-28/+29
Even if we are now able to tell the kernel to avoid exposing SVE/SME from the command line, we still have a couple of places where we unconditionally access the ZCR_EL1 (resp. SMCR_EL1) registers. On systems with broken firmwares, this results in a crash even if arm64.nosve (resp. arm64.nosme) was passed on the command-line. To avoid this, only update cpuinfo_arm64::reg_{zcr,smcr} once we have computed the sanitised version for the corresponding feature registers (ID_AA64PFR0 for SVE, and ID_AA64PFR1 for SME). This results in some minor refactoring. Reviewed-by: Mark Brown <[email protected]> Reviewed-by: Peter Collingbourne <[email protected]> Tested-by: Peter Collingbourne <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2022-07-25Merge branch 'for-next/boot' into for-next/coreWill Deacon24-612/+750
* for-next/boot: (34 commits) arm64: fix KASAN_INLINE arm64: Add an override for ID_AA64SMFR0_EL1.FA64 arm64: Add the arm64.nosve command line option arm64: Add the arm64.nosme command line option arm64: Expose a __check_override primitive for oddball features arm64: Allow the idreg override to deal with variable field width arm64: Factor out checking of a feature against the override into a macro arm64: Allow sticky E2H when entering EL1 arm64: Save state of HCR_EL2.E2H before switch to EL1 arm64: Rename the VHE switch to "finalise_el2" arm64: mm: fix booting with 52-bit address space arm64: head: remove __PHYS_OFFSET arm64: lds: use PROVIDE instead of conditional definitions arm64: setup: drop early FDT pointer helpers arm64: head: avoid relocating the kernel twice for KASLR arm64: kaslr: defer initialization to initcall where permitted arm64: head: record CPU boot mode after enabling the MMU arm64: head: populate kernel page tables with MMU and caches on arm64: head: factor out TTBR1 assignment into a macro arm64: idreg-override: use early FDT mapping in ID map ...
2022-07-25Merge branch 'for-next/cpufeature' into for-next/coreWill Deacon8-9/+41
* for-next/cpufeature: arm64/hwcap: Support FEAT_EBF16 arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long arm64/hwcap: Document allocation of upper bits of AT_HWCAP arm64: trap implementation defined functionality in userspace
2022-07-25Merge branch 'for-next/vdso' into for-next/coreWill Deacon6-6/+48
* for-next/vdso: arm64: vdso32: Add DWARF_DEBUG arm64: vdso32: Shuffle .ARM.exidx section above ELF_DETAILS arm64: compat: Move sigreturn32.S to .rodata section arm64: vdso*: place got/plt sections in .rodata arm64: vdso32: add ARM.exidx* sections arm64: compat: Move kuser32.S to .rodata section arm64: vdso32: enable orphan handling for VDSO arm64: vdso32: put ELF related sections in the linker script arm64: vdso: enable orphan handling for VDSO arm64: vdso: put ELF related sections in the linker script
2022-07-25Merge branch 'for-next/sysregs' into for-next/coreWill Deacon17-315/+457
* for-next/sysregs: (28 commits) arm64/sysreg: Convert ID_AA64ZFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AA64SMFR0_EL1 to automatic generation arm64/sysreg: Convert LORID_EL1 to automatic generation arm64/sysreg: Convert LORC_EL1 to automatic generation arm64/sysreg: Convert LORN_EL1 to automatic generation arm64/sysreg: Convert LOREA_EL1 to automatic generation arm64/sysreg: Convert LORSA_EL1 to automatic generation arm64/sysreg: Convert ID_AA64ISAR2_EL1 to automatic generation arm64/sysreg: Convert ID_AA64ISAR1_EL1 to automatic generation arm64/sysreg: Convert GMID to automatic generation arm64/sysreg: Convert DCZID_EL0 to automatic generation arm64/sysreg: Convert CTR_EL0 to automatic generation arm64/sysreg: Add _EL1 into ID_AA64ISAR2_EL1 definition names arm64/sysreg: Add _EL1 into ID_AA64ISAR1_EL1 definition names arm64/sysreg: Remove defines for RPRES enumeration arm64/sysreg: Standardise naming for ID_AA64ZFR0_EL1 fields arm64/sysreg: Standardise naming for ID_AA64SMFR0_EL1 enums arm64/sysreg: Standardise naming for WFxT defines arm64/sysreg: Make BHB clear feature defines match the architecture arm64/sysreg: Align pointer auth enumeration defines with architecture ...
2022-07-25Merge branch 'for-next/stacktrace' into for-next/coreWill Deacon2-24/+80
* for-next/stacktrace: arm64: Copy the task argument to unwind_state arm64: Split unwind_init() arm64: stacktrace: use non-atomic __set_bit arm64: kasan: do not instrument stacktrace.c
2022-07-25Merge branch 'for-next/sme' into for-next/coreWill Deacon5-11/+37
* for-next/sme: arm64/fpsimd: Remove duplicate SYS_SVCR read arm64/signal: Clean up SVE/SME feature checking inconsistency arm64/sme: Expose SMIDR through sysfs
2022-07-25Merge branch 'for-next/perf' into for-next/coreWill Deacon23-105/+1993
* for-next/perf: drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node() docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING drivers/perf: hisi: add driver for HNS3 PMU drivers/perf: hisi: Add description for HNS3 PMU driver drivers/perf: riscv_pmu_sbi: perf format perf/arm-cci: Use the bitmap API to allocate bitmaps drivers/perf: riscv_pmu: Add riscv pmu pm notifier perf: hisi: Extract hisi_pmu_init perf/marvell_cn10k: Fix TAD PMU register offset perf/marvell_cn10k: Remove useless license text when SPDX-License-Identifier is already used arm64: cpufeature: Allow different PMU versions in ID_DFR0_EL1 perf/arm-cci: fix typo in comment drivers/perf:Directly use ida_alloc()/free() drivers/perf: Directly use ida_alloc()/free()
2022-07-25Merge branch 'for-next/mte' into for-next/coreWill Deacon8-44/+13
* for-next/mte: arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags
2022-07-25Merge branch 'for-next/mm' into for-next/coreWill Deacon4-1/+20
* for-next/mm: arm64: enable THP_SWAP for arm64
2022-07-25Merge branch 'for-next/misc' into for-next/coreWill Deacon11-68/+36
* for-next/misc: arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: numa: Don't check node against MAX_NUMNODES arm64: mm: Remove assembly DMA cache maintenance wrappers arm64/mm: Define defer_reserve_crashkernel() arm64: fix oops in concurrently setting insn_emulation sysctls arm64: Do not forget syscall when starting a new thread. arm64: boot: add zstd support
2022-07-25Merge branch 'for-next/kpti' into for-next/coreWill Deacon7-130/+176
* for-next/kpti: arm64: correct the effect of mitigations off on kpti arm64: entry: simplify trampoline data page arm64: mm: install KPTI nG mappings with MMU enabled arm64: kpti-ng: simplify page table traversal logic
2022-07-25Merge branch 'for-next/kcsan' into for-next/coreWill Deacon3-11/+20
* for-next/kcsan: arm64: kcsan: Support detecting more missing memory barriers asm-generic: Add memory barrier dma_mb()
2022-07-25Merge branch 'for-next/irqflags-nmi' into for-next/coreWill Deacon4-3/+5
* for-next/irqflags-nmi: arm64: select TRACE_IRQFLAGS_NMI_SUPPORT arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic
2022-07-25Merge branch 'for-next/ioremap' into for-next/coreWill Deacon12-127/+90
* for-next/ioremap: arm64: Add HAVE_IOREMAP_PROT support arm64: mm: Convert to GENERIC_IOREMAP mm: ioremap: Add ioremap/iounmap_allowed() mm: ioremap: Setup phys_addr of struct vm_struct mm: ioremap: Use more sensible name in ioremap_prot() ARM: mm: kill unused runtime hook arch_iounmap()
2022-07-25Merge branch 'for-next/extable' into for-next/coreWill Deacon5-88/+111
* for-next/extable: arm64: extable: cleanup redundant extable type EX_TYPE_FIXUP arm64: extable: move _cond_extable to _cond_uaccess_extable arm64: extable: make uaaccess helper use extable type EX_TYPE_UACCESS_ERR_ZERO arm64: asm-extable: add asm uacess helpers arm64: asm-extable: move data fields arm64: extable: add new extable type EX_TYPE_KACCESS_ERR_ZERO support
2022-07-25Merge branch 'for-next/errata' into for-next/coreWill Deacon5-2/+76
* for-next/errata: arm64: errata: Remove AES hwcap for COMPAT tasks arm64: errata: Add Cortex-A510 to the repeat tlbi list
2022-07-25Merge branch 'for-next/docs' into for-next/coreWill Deacon1-6/+4
* for-next/docs: Documentation/arm64: update memory layout table.
2022-07-25Merge branch 'for-next/cpuidle' into for-next/coreWill Deacon4-54/+2
* for-next/cpuidle: arm64: cpuidle: remove generic cpuidle support cpuidle: cpuidle-arm: remove arm64 support
2022-07-25s390/qeth: Fix typo 'the the' in commentSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25net: ipa: Fix typo 'the the' in commentSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25nfp: bpf: Fix typo 'the the' in commentSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-25Merge branch irq/misc-5.20 into irq/irqchip-nextMarc Zyngier10-15/+31
* irq/misc-5.20: : . : Misc IRQ changes for 5.20: : : - Let irq_set_chip_handler_name_locked() take a const struct irq_chip * : : - Convert the ocelot irq_chip to being immutable (depends on the above) : : - Tidy-up the NOMAP irqdomain API variant : : - Teach action_show() to use for_each_action_of_desc() : : - Check ioremap() return value in the MIPS GIC driver : : - Move MMP driver init function declarations into the common .h : : - The obligatory typo fixes : . irqchip/mmp: Declare init functions in common header file irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() genirq: Use for_each_action_of_desc in actions_show() irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains irqdomain: Report irq number for NOMAP domains irqchip/gic-v3: Fix comment typo pinctrl: ocelot: Make irq_chip immutable genirq: Allow irq_set_chip_handler_name_locked() to take a const irq_chip Signed-off-by: Marc Zyngier <[email protected]>
2022-07-25irqchip/mmp: Declare init functions in common header fileBen Dooks4-3/+6
The functions icu_init_irq and mmp2_init_icu are exported from this code, so declare them in the header file to avoid the following sparse warnings: drivers/irqchip/irq-mmp.c:248:13: warning: symbol 'icu_init_irq' was not declared. Should it be static? drivers/irqchip/irq-mmp.c:271:13: warning: symbol 'mmp2_init_icu' was not declared. Should it be static? Signed-off-by: Ben Dooks <[email protected]> [maz: fixup commit message] Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-25x86/purgatory: Omit use of bin2cMasahiro Yamada5-10/+17
The .incbin assembler directive is much faster than bin2c + $(CC). Do similar refactoring as in 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"). Please note the .quad directive matches to size_t in C (both 8 byte) because the purgatory is compiled only for the 64-bit kernel. (KEXEC_FILE depends on X86_64). Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-25x86/purgatory: Hard-code obj-y in MakefileMasahiro Yamada1-1/+1
arch/x86/Kbuild guards the entire purgatory/ directory, and CONFIG_KEXEC_FILE is bool type. $(CONFIG_KEXEC_FILE) is always 'y' when this directory is being built. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-25dt-bindings: arm: at91: add lan966 pcb8309 boardHoratiu Vultur1-2/+4
Add documentation for Microchip LAN9662 PCB8309. Signed-off-by: Horatiu Vultur <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Claudiu Beznea <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-25nvme-pci: Crucial P2 has bogus namespace idsTobias Gruetzmacher1-0/+2
This adds a quirk for the Crucial P2. Signed-off-by: Tobias Gruetzmacher <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-07-25Merge tag 'v5.19-rc7' into fixesMichael Ellerman1356-10097/+17734
Merge v5.19-rc7 into fixes to bring in: d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
2022-07-24selftests/io_uring: test zerocopy sendPavel Begunkov3-0/+737
Add selftests for io_uring zerocopy sends and io_uring's notification infrastructure. It's largely influenced by msg_zerocopy and uses it on the receive side. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/03d5ec78061cf52db420f88ed0b48eb8f47ce9f7.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: enable managed frags with register buffersPavel Begunkov1-1/+55
io_uring's registered buffers infra has a good performant way of pinning pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely register buffer backed. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: add zc notification flush requestsPavel Begunkov2-0/+39
Overlay notification control onto IORING_OP_RSRC_UPDATE (former IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is not zero, it also copies sqe->arg as a new tag for all flushed notifications. Note, it doesn't flush a notification of a slot if there was no requests attached to it (since last flush or registration). Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/df13e2363400682a73dd9e71c3b990b8d1ff0333.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: rename IORING_OP_FILES_UPDATEPavel Begunkov4-9/+33
IORING_OP_FILES_UPDATE will be a more generic opcode serving different resource types, rename it into IORING_OP_RSRC_UPDATE and add subtype handling. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/0a907133907d9af3415a8a7aa1802c6aa97c03c6.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: flush notifiers after sendzcPavel Begunkov6-12/+31
Allow to flush notifiers as a part of sendzc request by setting IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will flush the used [active] notifier. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: sendzc with fixed buffersPavel Begunkov2-6/+29
Allow zerocopy sends to use fixed buffers. There is an optimisation for this case, the network layer don't need to reference the pages, see SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed buffers until the notifier is released. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com [axboe: fold in 32-bit pointer cast warning fix] Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: allow to pass addr into sendzcPavel Begunkov2-3/+17
Allow to specify an address to zerocopy sends making it more like sendto(2). Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: account locked pages for non-fixed zcPavel Begunkov2-0/+7
Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec based zerocopy sends. Do the accounting on the io_uring side. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: wire send zc request typePavel Begunkov4-0/+117
Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx and the buffers are safe to reuse only when the used notification is flushed and completes. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: add notification slot registrationPavel Begunkov4-0/+72
Let the userspace to register and unregister notification slots. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/a0aa8161fe3ebb2a4cc6e5dbd0cffb96e6881cf5.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: add rsrc referencing for notifiersPavel Begunkov3-3/+15
In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/3cd7a01d26837945b6982fa9cf15a63230f2ed4f.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-07-24io_uring: complete notifiers in twPavel Begunkov2-3/+22
We need a task context to post CQEs but using wq is too expensive. Try to complete notifiers using task_work and fall back to wq if fails. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/089799ab665b10b78fdc614ae6d59fa7ef0d5f91.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>