aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-06bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()Kent Overstreet1-0/+2
We're using mutex_lock() inside a wait_event() conditional - prepare_to_wait() has already flipped task state, so potentially blocking ops need annotation. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-06net: fix out-of-bounds access in ops_initThadeu Lima de Souza Cascardo1-3/+10
net_alloc_generic is called by net_alloc, which is called without any locking. It reads max_gen_ptrs, which is changed under pernet_ops_rwsem. It is read twice, first to allocate an array, then to set s.len, which is later used to limit the bounds of the array access. It is possible that the array is allocated and another thread is registering a new pernet ops, increments max_gen_ptrs, which is then used to set s.len with a larger than allocated length for the variable array. Fix it by reading max_gen_ptrs only once in net_alloc_generic. If max_gen_ptrs is later incremented, it will be caught in net_assign_generic. Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]> Fixes: 073862ba5d24 ("netns: fix net_alloc_generic()") Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06Merge branch 'add-tcp-fraglist-gro-support'Paolo Abeni6-69/+325
Felix Fietkau says: ==================== Add TCP fraglist GRO support When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated. One case where that's currently unavoidable is when routing packets over PPPoE. Performance improves significantly when using fraglist GRO implemented in the same way as for UDP. When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established socket in the same netns as the receiving device. While this may not cover all relevant use cases in multi-netns configurations, it should be good enough for most configurations that need this. Here's a measurement of running 2 TCP streams through a MediaTek MT7622 device (2-core Cortex-A53), which runs NAT with flow offload enabled from one ethernet port to PPPoE on another ethernet port + cake qdisc set to 1Gbps. rx-gro-list off: 630 Mbit/s, CPU 35% idle rx-gro-list on: 770 Mbit/s, CPU 40% idle Changes since v4: - add likely() to prefer the non-fraglist path in check Changes since v3: - optimize __tcpv4_gso_segment_csum - add unlikely() - reorder dev_net/skb_gro_network_header calls after NETIF_F_GRO_FRAGLIST check - add support for ipv6 nat - drop redundant pskb_may_pull check Changes since v2: - create tcp_gro_header_pull helper function to pull tcp header only once - optimize __tcpv4_gso_segment_list_csum, drop obsolete flags check Changes since v1: - revert bogus tcp flags overwrite on segmentation - fix kbuild issue with !CONFIG_IPV6 - only perform socket lookup for the first skb in the GRO train Changes since RFC: - split up patches - handle TCP flags mutations ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: add heuristic for enabling TCP fraglist GROFelix Fietkau2-0/+67
When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated. One case where that's currently unavoidable is when routing packets over PPPoE. Performance improves significantly when using fraglist GRO implemented in the same way as for UDP. When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established socket in the same netns as the receiving device. While this may not cover all relevant use cases in multi-netns configurations, it should be good enough for most configurations that need this. Here's a measurement of running 2 TCP streams through a MediaTek MT7622 device (2-core Cortex-A53), which runs NAT with flow offload enabled from one ethernet port to PPPoE on another ethernet port + cake qdisc set to 1Gbps. rx-gro-list off: 630 Mbit/s, CPU 35% idle rx-gro-list on: 770 Mbit/s, CPU 40% idle Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: create tcp_gro_header_pull helper functionFelix Fietkau3-27/+50
Pull the code out of tcp_gro_receive in order to access the tcp header from tcp4/6_gro_receive. Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: create tcp_gro_lookup helper functionFelix Fietkau2-16/+26
This pulls the flow port matching out of tcp_gro_receive, so that it can be reused for the next change, which adds the TCP fraglist GRO heuristic. Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: add code for TCP fraglist GROFelix Fietkau2-0/+30
This implements fraglist GRO similar to how it's handled in UDP, however no functional changes are added yet. The next change adds a heuristic for using fraglist GRO instead of regular GRO. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: add support for segmenting TCP fraglist GSO packetsFelix Fietkau2-0/+125
Preparation for adding TCP fraglist GRO support. It expects packets to be combined in a similar way as UDP fraglist GSO packets. For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO. Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: move skb_gro_receive_list from udp to coreFelix Fietkau3-27/+28
This helper function will be used for TCP fraglist GRO support Acked-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06net: microchip: lan743x: Reduce PTP timeout on HW failureRengarajan S2-1/+2
The PTP_CMD_CTL is a self clearing register which controls the PTP clock values. In the current implementation driver waits for a duration of 20 sec in case of HW failure to clear the PTP_CMD_CTL register bit. This timeout of 20 sec is very long to recognize a HW failure, as it is typically cleared in one clock(<16ns). Hence reducing the timeout to 1 sec would be sufficient to conclude if there is any HW failure observed. The usleep_range will sleep somewhere between 1 msec to 20 msec for each iteration. By setting the PTP_CMD_CTL_TIMEOUT_CNT to 50 the max timeout is extended to 1 sec. Signed-off-by: Rengarajan S <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-05-06netfilter: conntrack: dccp: try not to drop skb in conntrackJason Xing1-2/+2
It would be better not to drop skb in conntrack unless we have good alternatives. So we can treat the result of testing skb's header pointer as nf_conntrack_tcp_packet() does. Signed-off-by: Jason Xing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-06netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router DiscoveryLinus Lüssing2-1/+4
So far Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) would be marked as INVALID for IPv6, even if they are in fact intact and adhering to RFC4286. This broke MRA reception and by that multicast reception on IPv6 multicast routers in a Proxmox managed setup, where Proxmox would install a rule like "-m conntrack --ctstate INVALID -j DROP" at the top of the FORWARD chain with br-nf-call-ip6tables enabled by default. Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery already, fix this issue by excluding MRD from connection tracking handling as MRD always uses predefined multicast destinations for its messages, too. This changes the ct-state for ICMPv6 MRD messages from INVALID to UNTRACKED. This issue was found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-06netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handlerPablo Neira Ayuso1-5/+1
Originally, device name used to be stored in the basechain, but it is not the case anymore. Remove check for NETDEV_CHANGENAME. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-06netfilter: nf_tables: skip transaction if update object is not implementedPablo Neira Ayuso1-2/+6
Turn update into noop as a follow up for: 9fedd894b4e1 ("netfilter: nf_tables: fix unexpected EOPNOTSUPP error") instead of adding a transaction object which is simply discarded at a later stage of the commit protocol. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-06Merge tag 'rtw-next-2024-05-04-v2' of https://github.com/pkshih/rtwKalle Valo81-8438/+8371
rtw-next patches for v6.10 Major changes are listed as below rtl8xxxu: - remove rtl8xxxu_ prefix from filename - cleanup includes of header files rtlwifi: - adjust code to share with coming support of rtl8192du rtw89: - complete features of new WiFi 7 chip 8922AE including BT-coexistence and WoWLAN - use BIOS ACPI settings to set TX power and channels
2024-05-05Linux 6.9-rc7Linus Torvalds1-1/+1
2024-05-05epoll: be better about file lifetimesLinus Torvalds1-1/+37
epoll can call out to vfs_poll() with a file pointer that may race with the last 'fput()'. That would make f_count go down to zero, and while the ep->mtx locking means that the resulting file pointer tear-down will be blocked until the poll returns, it means that f_count is already dead, and any use of it won't actually get a reference to the file any more: it's dead regardless. Make sure we have a valid ref on the file pointer before we call down to vfs_poll() from the epoll routines. Link: https://lore.kernel.org/lkml/[email protected]/ Reported-by: [email protected] Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-05-05Merge tag 'edac_urgent_for_v6.9_rc7' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix error logging and check user-supplied data when injecting an error in the versal EDAC driver * tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/versal: Do not log total error counts EDAC/versal: Check user-supplied data before injecting an error EDAC/versal: Do not register for NOC errors
2024-05-05Merge tag 'powerpc-6.9-4' of ↵Linus Torvalds3-8/+15
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix incorrect delay handling in the plpks (keystore) code - Fix a panic when an LPAR boots with a frozen PE Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna Jain. * tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE powerpc/pseries: make max polling consistent for longer H_CALLs
2024-05-05Merge tag 'x86-urgent-2024-05-05' of ↵Linus Torvalds9-67/+64
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Remove the broken vsyscall emulation code from the page fault code - Fix kexec crash triggered by certain SEV RMP table layouts - Fix unchecked MSR access error when disabling the x2APIC via iommu=off * tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove broken vsyscall emulation code from the page fault code x86/apic: Don't access the APIC when disabling x2APIC x86/sev: Add callback to apply RMP table fixups for kexec x86/e820: Add a new e820 table update helper
2024-05-05Merge tag 'irq-urgent-2024-05-05' of ↵Linus Torvalds1-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix suspicious RCU usage in __do_softirq()" * tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: softirq: Fix suspicious RCU usage in __do_softirq()
2024-05-05Merge tag 'char-misc-6.9-rc7' of ↵Linus Torvalds13-25/+118
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc/other driver fixes and new device ids for 6.9-rc7 that resolve some reported problems. Included in here are: - iio driver fixes - mei driver fix and new device ids - dyndbg bugfix - pvpanic-pci driver bugfix - slimbus driver bugfix - fpga new device id All have been in linux-next with no reported problems" * tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: slimbus: qcom-ngd-ctrl: Add timeout for wait operation dyndbg: fix old BUG_ON in >control parser misc/pvpanic-pci: register attributes via pci_driver fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card mei: me: add lunar lake point M DID mei: pxp: match against PCI_CLASS_DISPLAY_OTHER iio:imu: adis16475: Fix sync mode setting iio: accel: mxc4005: Reset chip on probe() and resume() iio: accel: mxc4005: Interrupt handling fixes dt-bindings: iio: health: maxim,max30102: fix compatible check iio: pressure: Fixes SPI support for BMP3xx devices iio: pressure: Fixes BME280 SPI driver data
2024-05-05Merge tag 'usb-6.9-rc7' of ↵Linus Torvalds15-79/+147
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB driver fixes for reported problems for 6.9-rc7. Included in here are: - usb core fixes for found issues - typec driver fixes for reported problems - usb gadget driver fixes for reported problems - xhci build fixes - dwc3 driver fixes for reported issues All of these have been in linux-next this past week with no reported problems" * tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: Check for port partner validity before consuming it usb: typec: tcpm: enforce ready state when queueing alt mode vdm usb: typec: tcpm: unregister existing source caps before re-registration usb: typec: tcpm: clear pd_event queue in PORT_RESET usb: typec: tcpm: queue correct sop type in tcpm_queue_vdm_unlocked usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device usb: ohci: Prevent missed ohci interrupts usb: typec: qcom-pmic: fix pdphy start() error handling usb: typec: qcom-pmic: fix use-after-free on late probe errors usb: gadget: f_fs: Fix a race condition when processing setup packets. USB: core: Fix access violation during port device removal usb: dwc3: core: Prevent phy suspend during init usb: xhci-plat: Don't include xhci.h usb: gadget: uvc: use correct buffer size when parsing configfs lists usb: gadget: composite: fix OS descriptors w_value logic usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete
2024-05-05Merge tag 'input-for-v6.9-rc6' of ↵Linus Torvalds2-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - a new ID for ASUS ROG RAIKIRI controllers added to xpad driver - amimouse driver structure annotated with __refdata to prevent section mismatch warnings. * tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: amimouse - mark driver struct with __refdata to prevent section mismatch Input: xpad - add support for ASUS ROG RAIKIRI
2024-05-05Merge tag 'probes-fixes-v6.9-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - probe-events: Fix memory leak in parsing probe argument. There is a memory leak (forget to free an allocated buffer) in a memory allocation failure path. Fix it to jump to the correct error handling code. * tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body()
2024-05-05Merge tag 'trace-v6.9-rc6-2' of ↵Linus Torvalds5-59/+210
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing and tracefs fixes from Steven Rostedt: - Fix RCU callback of freeing an eventfs_inode. The freeing of the eventfs_inode from the kref going to zero freed the contents of the eventfs_inode and then used kfree_rcu() to free the inode itself. But the contents should also be protected by RCU. Switch to a call_rcu() that calls a function to free all of the eventfs_inode after the RCU synchronization. - The tracing subsystem maps its own descriptor to a file represented by eventfs. The freeing of this descriptor needs to know when the last reference of an eventfs_inode is released, but currently there is no interface for that. Add a "release" callback to the eventfs_inode entry array that allows for freeing of data that can be referenced by the eventfs_inode being opened. Then increment the ref counter for this descriptor when the eventfs_inode file is created, and decrement/free it when the last reference to the eventfs_inode is released and the file is removed. This prevents races between freeing the descriptor and the opening of the eventfs file. - Fix the permission processing of eventfs. The change to make the permissions of eventfs default to the mount point but keep track of when changes were made had a side effect that could cause security concerns. When the tracefs is remounted with a given gid or uid, all the files within it should inherit that gid or uid. But if the admin had changed the permission of some file within the tracefs file system, it would not get updated by the remount. This caused the kselftest of file permissions to fail the second time it is run. The first time, all changes would look fine, but the second time, because the changes were "saved", the remount did not reset them. Create a link list of all existing tracefs inodes, and clear the saved flags on them on a remount if the remount changes the corresponding gid or uid fields. This also simplifies the code by removing the distinction between the toplevel eventfs and an instance eventfs. They should both act the same. They were different because of a misconception due to the remount not resetting the flags. Now that remount resets all the files and directories to default to the root node if a uid/gid is specified, it makes the logic simpler to implement. * tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Have "events" directory get permissions from its parent eventfs: Do not treat events directory different than other directories eventfs: Do not differentiate the toplevel events directory tracefs: Still use mount point as default permissions for instances tracefs: Reset permissions on remount if permissions are options eventfs: Free all of the eventfs_inode after RCU eventfs/tracing: Add callback for release of an eventfs_inode
2024-05-05Merge tag 'dma-mapping-6.9-2024-05-04' of ↵Linus Torvalds1-0/+1
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - fix the combination of restricted pools and dynamic swiotlb (Will Deacon) * tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y
2024-05-05Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds7-8/+60
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes: - Avoid a deadlock in the Qualcomm clk driver by making the regulator which supplies the GDSC optional - Restore RPM clks on Qualcomm msm8976 by setting num_clks - Fix Allwinner H6 CPU rate changing logic to avoid system crashes by temporarily reparenting the CPU clk to something that isn't being changed - Set a MIPI PLL min/max rate on Allwinner A64 to fix blank screens on some devices - Revert back to of_match_device() in the Samsung clkout driver to get the match data based on the parent device's compatible string" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: samsung: Revert "clk: Use device_get_match_data()" clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI clk: sunxi-ng: common: Support minimum and maximum rate clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change clk: qcom: smd-rpm: Restore msm8976 num_clk clk: qcom: gdsc: treat optional supplies as optional
2024-05-05Merge branch 'gve-queue-api'David S. Miller11-406/+489
Shailend Chand says: ==================== gve: Implement queue api Following the discussion on https://patchwork.kernel.org/project/linux-media/patch/[email protected]/, the queue api defined by Mina is implemented for gve. The first patch is just Mina's introduction of the api. The rest of the patches make surgical changes in gve to enable it to work correctly with only a subset of queues present (thus far it had assumed that either all queues are up or all are down). The final patch has the api implementation. Changes since v1: clang warning fixes, kdoc warning fix, and addressed review comments. ==================== Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Alloc and free QPLs with the ringsShailend Chand7-331/+171
Every tx and rx ring has its own queue-page-list (QPL) that serves as the bounce buffer. Previously we were allocating QPLs for all queues before the queues themselves were allocated and later associating a QPL with a queue. This is avoidable complexity: it is much more natural for each queue to allocate and free its own QPL. Moreover, the advent of new queue-manipulating ndo hooks make it hard to keep things as is: we would need to transfer a QPL from an old queue to a new queue, and that is unpleasant. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Account for stopped queues when reading NIC statsShailend Chand1-6/+35
We now account for the fact that the NIC might send us stats for a subset of queues. Without this change, gve_get_ethtool_stats might make an invalid access on the priv->stats_report->stats array. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Reset Rx ring state in the ring-stop funcsShailend Chand2-30/+120
This does not fix any existing bug. In anticipation of the ndo queue api hooks that alloc/free/start/stop a single Rx queue, the already existing per-queue stop functions are being made more robust. Specifically for this use case: rx_queue_n.stop() + rx_queue_n.start() Note that this is not the use case being used in devmem tcp (the first place these new ndo hooks would be used). There the usecase is: new_queue.alloc() + old_queue.stop() + new_queue.start() + old_queue.free() Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Avoid rescheduling napi if on wrong cpuShailend Chand2-2/+32
In order to make possible the implementation of per-queue ndo hooks, gve_turnup was changed in a previous patch to account for queues already having some unprocessed descriptors: it does a one-off napi_schdule to handle them. If conditions of consistent high traffic persist in the immediate aftermath of this, the poll routine for a queue can be "stuck" on the cpu on which the ndo hooks ran, instead of the cpu its irq has affinity with. This situation is exacerbated by the fact that the ndo hooks for all the queues are invoked on the same cpu, potentially causing all the napi poll routines to be residing on the same cpu. A self correcting mechanism in the poll method itself solves this problem. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Make gve_turnup work for nonempty queuesShailend Chand1-0/+14
gVNIC has a requirement that all queues have to be quiesced before any queue is operated on (created or destroyed). To enable the implementation of future ndo hooks that work on a single queue, we need to evolve gve_turnup to account for queues already having some unprocessed descriptors in the ring. Say rxq 4 is being stopped and started via the queue api. Due to gve's requirement of quiescence, queues 0 through 3 are not processing their rings while queue 4 is being toggled. Once they are made live, these queues need to be poked to cause them to check their rings for descriptors that were written during their brief period of quiescence. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Make gve_turn(up|down) ignore stopped queuesShailend Chand1-0/+10
Currently the queues are either all live or all dead, toggling from one state to the other via the ndo open and stop hooks. The future addition of single-queue ndo hooks changes this, and thus gve_turnup and gve_turndown should evolve to account for a state where some queues are live and some aren't. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Add adminq funcs to add/remove a single Rx queueShailend Chand2-27/+54
This allows for implementing future ndo hooks that act on a single queue. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05gve: Make the GQ RX free queue funcs idempotentShailend Chand1-10/+19
Although this is not fixing any existing double free bug, making these functions idempotent allows for a simpler implementation of future ndo hooks that act on a single queue. Tested-by: Mina Almasry <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-05queue_api: define queue apiMina Almasry2-0/+34
This API enables the net stack to reset the queues used for devmem TCP. Signed-off-by: Mina Almasry <[email protected]> Signed-off-by: Shailend Chand <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-05-04ksmbd: do not grant v2 lease if parent lease key and epoch are not setNamjae Jeon1-5/+9
This patch fix xfstests generic/070 test with smb2 leases = yes. cifs.ko doesn't set parent lease key and epoch in create context v2 lease. ksmbd suppose that parent lease and epoch are vaild if data length is v2 lease context size and handle directory lease using this values. ksmbd should hanle it as v1 lease not v2 lease if parent lease key and epoch are not set in create context v2 lease. Cc: [email protected] Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-05-04ksmbd: use rwsem instead of rwlock for lease breakNamjae Jeon5-38/+30
lease break wait for lease break acknowledgment. rwsem is more suitable than unlock while traversing the list for parent lease break in ->m_op_list. Cc: [email protected] Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-05-04ksmbd: avoid to send duplicate lease break notificationsNamjae Jeon1-6/+15
This patch fixes generic/011 when enable smb2 leases. if ksmbd sends multiple notifications for a file, cifs increments the reference count of the file but it does not decrement the count by the failure of queue_work. So even if the file is closed, cifs does not send a SMB2_CLOSE request. Cc: [email protected] Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-05-04ksmbd: off ipv6only for both ipv4/ipv6 bindingNamjae Jeon1-0/+4
ΕΛΕΝΗ reported that ksmbd binds to the IPV6 wildcard (::) by default for ipv4 and ipv6 binding. So IPV4 connections are successful only when the Linux system parameter bindv6only is set to 0 [default value]. If this parameter is set to 1, then the ipv6 wildcard only represents any IPV6 address. Samba creates different sockets for ipv4 and ipv6 by default. This patch off sk_ipv6only to support IPV4/IPV6 connections without creating two sockets. Cc: [email protected] Reported-by: ΕΛΕΝΗ ΤΖΑΒΕΛΛΑ <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-05-04wifi: rtlwifi: 8192d: initialize rate_mask in rtl92de_update_hal_rate_mask()Ping-Ke Shih1-1/+1
le32p_replace_bits() only updates partial bits of rate_mask, and GCC warns below. Set initial value to avoid warnings, and prevent random value of missed bits (bit 6 of rate_mask.macid_and_short_gi). In file included from ./include/linux/fortify-string.h:5, from ./include/linux/string.h:369, from ./include/linux/bitmap.h:13, from ./include/linux/cpumask.h:13, from ./include/linux/sched.h:16, from drivers/net/wireless/realtek/rtlwifi/rtl8192d/../wifi.h:9, from drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:4: In function 'le32p_replace_bits', inlined from 'rtl92de_update_hal_rate_mask.isra' at drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:986:2: ./include/linux/bitfield.h:189:15: warning: 'rate_mask' is used uninitialized [-Wuninitialized] 189 | *p = (*p & ~to(field)) | type##_encode_bits(val, field); \ | ^~ ./include/linux/bitfield.h:196:9: note: in expansion of macro '____MAKE_OP' 196 | ____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu) \ | ^~~~~~~~~~~ ./include/linux/bitfield.h:201:1: note: in expansion of macro '__MAKE_OP' 201 | __MAKE_OP(32) | ^~~~~~~~~ drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c: In function 'rtl92de_update_hal_rate_mask.isra': drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:863:37: note: 'rate_mask' declared here 863 | struct rtl92d_rate_mask_h2c rate_mask; | ^~~~~~~~~ Compile tested only. Fixes: 014bba73b525 ("wifi: rtlwifi: Adjust rtl8192d-common for USB") Cc: Bitterblue Smith <[email protected]> Signed-off-by: Ping-Ke Shih <[email protected]> Link: https://msgid.link/[email protected]
2024-05-04eventfs: Have "events" directory get permissions from its parentSteven Rostedt (Google)1-6/+24
The events directory gets its permissions from the root inode. But this can cause an inconsistency if the instances directory changes its permissions, as the permissions of the created directories under it should inherit the permissions of the instances directory when directories under it are created. Currently the behavior is: # cd /sys/kernel/tracing # chgrp 1002 instances # mkdir instances/foo # ls -l instances/foo [..] -r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 18:55 current_tracer -rw-r----- 1 root lkp 0 May 1 18:55 error_log drwxr-xr-x 1 root root 0 May 1 18:55 events --w------- 1 root lkp 0 May 1 18:55 free_buffer drwxr-x--- 2 root lkp 0 May 1 18:55 options drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu -rw-r----- 1 root lkp 0 May 1 18:55 set_event All the files and directories under "foo" has the "lkp" group except the "events" directory. That's because its getting its default value from the mount point instead of its parent. Have the "events" directory make its default value based on its parent's permissions. That now gives: # ls -l instances/foo [..] -rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb -r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:16 current_tracer -rw-r----- 1 root lkp 0 May 1 21:16 error_log drwxr-xr-x 1 root lkp 0 May 1 21:16 events --w------- 1 root lkp 0 May 1 21:16 free_buffer drwxr-x--- 2 root lkp 0 May 1 21:16 options drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu -rw-r----- 1 root lkp 0 May 1 21:16 set_event Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04eventfs: Do not treat events directory different than other directoriesSteven Rostedt (Google)1-15/+1
Treat the events directory the same as other directories when it comes to permissions. The events directory was considered different because it's dentry is persistent, whereas the other directory dentries are created when accessed. But the way tracefs now does its ownership by using the root dentry's permissions as the default permissions, the events directory can get out of sync when a remount is performed setting the group and user permissions. Remove the special case for the events directory on setting the attributes. This allows the updates caused by remount to work properly as well as simplifies the code. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04eventfs: Do not differentiate the toplevel events directorySteven Rostedt (Google)2-25/+11
The toplevel events directory is really no different than the events directory of instances. Having the two be different caused inconsistencies and made it harder to fix the permissions bugs. Make all events directories act the same. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04tracefs: Still use mount point as default permissions for instancesSteven Rostedt (Google)1-2/+25
If the instances directory's permissions were never change, then have it and its children use the mount point permissions as the default. Currently, the permissions of instance directories are determined by the instance directory's permissions itself. But if the tracefs file system is remounted and changes the permissions, the instance directory and its children should use the new permission. But because both the instance directory and its children use the instance directory's inode for permissions, it misses the update. To demonstrate this: # cd /sys/kernel/tracing/ # mkdir instances/foo # ls -ld instances/foo drwxr-x--- 5 root root 0 May 1 19:07 instances/foo # ls -ld instances drwxr-x--- 3 root root 0 May 1 18:57 instances # ls -ld current_tracer -rw-r----- 1 root root 0 May 1 18:57 current_tracer # mount -o remount,gid=1002 . # ls -ld instances drwxr-x--- 3 root root 0 May 1 18:57 instances # ls -ld instances/foo/ drwxr-x--- 5 root root 0 May 1 19:07 instances/foo/ # ls -ld current_tracer -rw-r----- 1 root lkp 0 May 1 18:57 current_tracer Notice that changing the group id to that of "lkp" did not affect the instances directory nor its children. It should have been: # ls -ld current_tracer -rw-r----- 1 root root 0 May 1 19:19 current_tracer # ls -ld instances/foo/ drwxr-x--- 5 root root 0 May 1 19:25 instances/foo/ # ls -ld instances drwxr-x--- 3 root root 0 May 1 19:19 instances # mount -o remount,gid=1002 . # ls -ld current_tracer -rw-r----- 1 root lkp 0 May 1 19:19 current_tracer # ls -ld instances drwxr-x--- 3 root lkp 0 May 1 19:19 instances # ls -ld instances/foo/ drwxr-x--- 5 root lkp 0 May 1 19:25 instances/foo/ Where all files were updated by the remount gid update. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04tracefs: Reset permissions on remount if permissions are optionsSteven Rostedt (Google)3-2/+99
There's an inconsistency with the way permissions are handled in tracefs. Because the permissions are generated when accessed, they default to the root inode's permission if they were never set by the user. If the user sets the permissions, then a flag is set and the permissions are saved via the inode (for tracefs files) or an internal attribute field (for eventfs). But if a remount happens that specify the permissions, all the files that were not changed by the user gets updated, but the ones that were are not. If the user were to remount the file system with a given permission, then all files and directories within that file system should be updated. This can cause security issues if a file's permission was updated but the admin forgot about it. They could incorrectly think that remounting with permissions set would update all files, but miss some. For example: # cd /sys/kernel/tracing # chgrp 1002 current_tracer # ls -l [..] -rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb -rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb -r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:25 current_tracer -rw-r----- 1 root root 0 May 1 21:25 dynamic_events -r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info -r--r----- 1 root root 0 May 1 21:25 enabled_functions Where current_tracer now has group "lkp". # mount -o remount,gid=1001 . # ls -l -rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb -rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb -r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb -rw-r----- 1 root lkp 0 May 1 21:25 current_tracer -rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events -r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info -r--r----- 1 root tracing 0 May 1 21:25 enabled_functions Everything changed but the "current_tracer". Add a new link list that keeps track of all the tracefs_inodes which has the permission flags that tell if the file/dir should use the root inode's permission or not. Then on remount, clear all the flags so that the default behavior of using the root inode's permission is done for all files and directories. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04eventfs: Free all of the eventfs_inode after RCUSteven Rostedt (Google)1-9/+16
The freeing of eventfs_inode via a kfree_rcu() callback. But the content of the eventfs_inode was being freed after the last kref. This is dangerous, as changes are being made that can access the content of an eventfs_inode from an RCU loop. Instead of using kfree_rcu() use call_rcu() that calls a function to do all the freeing of the eventfs_inode after a RCU grace period has expired. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-05-04eventfs/tracing: Add callback for release of an eventfs_inodeSteven Rostedt (Google)3-2/+36
Synthetic events create and destroy tracefs files when they are created and removed. The tracing subsystem has its own file descriptor representing the state of the events attached to the tracefs files. There's a race between the eventfs files and this file descriptor of the tracing system where the following can cause an issue: With two scripts 'A' and 'B' doing: Script 'A': echo "hello int aaa" > /sys/kernel/tracing/synthetic_events while : do echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable done Script 'B': echo > /sys/kernel/tracing/synthetic_events Script 'A' creates a synthetic event "hello" and then just writes zero into its enable file. Script 'B' removes all synthetic events (including the newly created "hello" event). What happens is that the opening of the "enable" file has: { struct trace_event_file *file = inode->i_private; int ret; ret = tracing_check_open_get_tr(file->tr); [..] But deleting the events frees the "file" descriptor, and a "use after free" happens with the dereference at "file->tr". The file descriptor does have a reference counter, but there needs to be a way to decrement it from the eventfs when the eventfs_inode is removed that represents this file descriptor. Add an optional "release" callback to the eventfs_entry array structure, that gets called when the eventfs file is about to be removed. This allows for the creating on the eventfs file to increment the tracing file descriptor ref counter. When the eventfs file is deleted, it can call the release function that will call the put function for the tracing file descriptor. This will protect the tracing file from being freed while a eventfs file that references it is being opened. Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Tze-nan wu <[email protected]> Tested-by: Tze-nan Wu (吳澤南) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>