Age | Commit message (Collapse) | Author | Files | Lines |
|
We're using mutex_lock() inside a wait_event() conditional -
prepare_to_wait() has already flipped task state, so potentially
blocking ops need annotation.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
net_alloc_generic is called by net_alloc, which is called without any
locking. It reads max_gen_ptrs, which is changed under pernet_ops_rwsem. It
is read twice, first to allocate an array, then to set s.len, which is
later used to limit the bounds of the array access.
It is possible that the array is allocated and another thread is
registering a new pernet ops, increments max_gen_ptrs, which is then used
to set s.len with a larger than allocated length for the variable array.
Fix it by reading max_gen_ptrs only once in net_alloc_generic. If
max_gen_ptrs is later incremented, it will be caught in net_assign_generic.
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Fixes: 073862ba5d24 ("netns: fix net_alloc_generic()")
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Felix Fietkau says:
====================
Add TCP fraglist GRO support
When forwarding TCP after GRO, software segmentation is very expensive,
especially when the checksum needs to be recalculated.
One case where that's currently unavoidable is when routing packets over
PPPoE. Performance improves significantly when using fraglist GRO
implemented in the same way as for UDP.
When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
socket in the same netns as the receiving device. While this may not
cover all relevant use cases in multi-netns configurations, it should be
good enough for most configurations that need this.
Here's a measurement of running 2 TCP streams through a MediaTek MT7622
device (2-core Cortex-A53), which runs NAT with flow offload enabled from
one ethernet port to PPPoE on another ethernet port + cake qdisc set to
1Gbps.
rx-gro-list off: 630 Mbit/s, CPU 35% idle
rx-gro-list on: 770 Mbit/s, CPU 40% idle
Changes since v4:
- add likely() to prefer the non-fraglist path in check
Changes since v3:
- optimize __tcpv4_gso_segment_csum
- add unlikely()
- reorder dev_net/skb_gro_network_header calls after NETIF_F_GRO_FRAGLIST
check
- add support for ipv6 nat
- drop redundant pskb_may_pull check
Changes since v2:
- create tcp_gro_header_pull helper function to pull tcp header only once
- optimize __tcpv4_gso_segment_list_csum, drop obsolete flags check
Changes since v1:
- revert bogus tcp flags overwrite on segmentation
- fix kbuild issue with !CONFIG_IPV6
- only perform socket lookup for the first skb in the GRO train
Changes since RFC:
- split up patches
- handle TCP flags mutations
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When forwarding TCP after GRO, software segmentation is very expensive,
especially when the checksum needs to be recalculated.
One case where that's currently unavoidable is when routing packets over
PPPoE. Performance improves significantly when using fraglist GRO
implemented in the same way as for UDP.
When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established
socket in the same netns as the receiving device. While this may not
cover all relevant use cases in multi-netns configurations, it should be
good enough for most configurations that need this.
Here's a measurement of running 2 TCP streams through a MediaTek MT7622
device (2-core Cortex-A53), which runs NAT with flow offload enabled from
one ethernet port to PPPoE on another ethernet port + cake qdisc set to
1Gbps.
rx-gro-list off: 630 Mbit/s, CPU 35% idle
rx-gro-list on: 770 Mbit/s, CPU 40% idle
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Pull the code out of tcp_gro_receive in order to access the tcp header
from tcp4/6_gro_receive.
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This pulls the flow port matching out of tcp_gro_receive, so that it can be
reused for the next change, which adds the TCP fraglist GRO heuristic.
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This implements fraglist GRO similar to how it's handled in UDP, however
no functional changes are added yet. The next change adds a heuristic for
using fraglist GRO instead of regular GRO.
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Preparation for adding TCP fraglist GRO support. It expects packets to be
combined in a similar way as UDP fraglist GSO packets.
For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO.
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This helper function will be used for TCP fraglist GRO support
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The PTP_CMD_CTL is a self clearing register which controls the PTP clock
values. In the current implementation driver waits for a duration of 20
sec in case of HW failure to clear the PTP_CMD_CTL register bit. This
timeout of 20 sec is very long to recognize a HW failure, as it is
typically cleared in one clock(<16ns). Hence reducing the timeout to 1 sec
would be sufficient to conclude if there is any HW failure observed. The
usleep_range will sleep somewhere between 1 msec to 20 msec for each
iteration. By setting the PTP_CMD_CTL_TIMEOUT_CNT to 50 the max timeout
is extended to 1 sec.
Signed-off-by: Rengarajan S <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
It would be better not to drop skb in conntrack unless we have good
alternatives. So we can treat the result of testing skb's header
pointer as nf_conntrack_tcp_packet() does.
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
So far Multicast Router Advertisements and Multicast Router
Solicitations from the Multicast Router Discovery protocol (RFC4286)
would be marked as INVALID for IPv6, even if they are in fact intact
and adhering to RFC4286.
This broke MRA reception and by that multicast reception on
IPv6 multicast routers in a Proxmox managed setup, where Proxmox
would install a rule like "-m conntrack --ctstate INVALID -j DROP"
at the top of the FORWARD chain with br-nf-call-ip6tables enabled
by default.
Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery
already, fix this issue by excluding MRD from connection tracking
handling as MRD always uses predefined multicast destinations
for its messages, too. This changes the ct-state for ICMPv6 MRD messages
from INVALID to UNTRACKED.
This issue was found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Originally, device name used to be stored in the basechain, but it is
not the case anymore. Remove check for NETDEV_CHANGENAME.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Turn update into noop as a follow up for:
9fedd894b4e1 ("netfilter: nf_tables: fix unexpected EOPNOTSUPP error")
instead of adding a transaction object which is simply discarded at a
later stage of the commit protocol.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
rtw-next patches for v6.10
Major changes are listed as below
rtl8xxxu:
- remove rtl8xxxu_ prefix from filename
- cleanup includes of header files
rtlwifi:
- adjust code to share with coming support of rtl8192du
rtw89:
- complete features of new WiFi 7 chip 8922AE including BT-coexistence
and WoWLAN
- use BIOS ACPI settings to set TX power and channels
|
|
|
|
epoll can call out to vfs_poll() with a file pointer that may race with
the last 'fput()'. That would make f_count go down to zero, and while
the ep->mtx locking means that the resulting file pointer tear-down will
be blocked until the poll returns, it means that f_count is already
dead, and any use of it won't actually get a reference to the file any
more: it's dead regardless.
Make sure we have a valid ref on the file pointer before we call down to
vfs_poll() from the epoll routines.
Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: [email protected]
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix error logging and check user-supplied data when injecting an
error in the versal EDAC driver
* tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/versal: Do not log total error counts
EDAC/versal: Check user-supplied data before injecting an error
EDAC/versal: Do not register for NOC errors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix incorrect delay handling in the plpks (keystore) code
- Fix a panic when an LPAR boots with a frozen PE
Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna
Jain.
* tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
powerpc/pseries: make max polling consistent for longer H_CALLs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Remove the broken vsyscall emulation code from
the page fault code
- Fix kexec crash triggered by certain SEV RMP
table layouts
- Fix unchecked MSR access error when disabling
the x2APIC via iommu=off
* tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove broken vsyscall emulation code from the page fault code
x86/apic: Don't access the APIC when disabling x2APIC
x86/sev: Add callback to apply RMP table fixups for kexec
x86/e820: Add a new e820 table update helper
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix suspicious RCU usage in __do_softirq()"
* tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
softirq: Fix suspicious RCU usage in __do_softirq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc/other driver fixes and new device ids
for 6.9-rc7 that resolve some reported problems.
Included in here are:
- iio driver fixes
- mei driver fix and new device ids
- dyndbg bugfix
- pvpanic-pci driver bugfix
- slimbus driver bugfix
- fpga new device id
All have been in linux-next with no reported problems"
* tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
slimbus: qcom-ngd-ctrl: Add timeout for wait operation
dyndbg: fix old BUG_ON in >control parser
misc/pvpanic-pci: register attributes via pci_driver
fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
mei: me: add lunar lake point M DID
mei: pxp: match against PCI_CLASS_DISPLAY_OTHER
iio:imu: adis16475: Fix sync mode setting
iio: accel: mxc4005: Reset chip on probe() and resume()
iio: accel: mxc4005: Interrupt handling fixes
dt-bindings: iio: health: maxim,max30102: fix compatible check
iio: pressure: Fixes SPI support for BMP3xx devices
iio: pressure: Fixes BME280 SPI driver data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for reported problems for
6.9-rc7. Included in here are:
- usb core fixes for found issues
- typec driver fixes for reported problems
- usb gadget driver fixes for reported problems
- xhci build fixes
- dwc3 driver fixes for reported issues
All of these have been in linux-next this past week with no reported
problems"
* tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: Check for port partner validity before consuming it
usb: typec: tcpm: enforce ready state when queueing alt mode vdm
usb: typec: tcpm: unregister existing source caps before re-registration
usb: typec: tcpm: clear pd_event queue in PORT_RESET
usb: typec: tcpm: queue correct sop type in tcpm_queue_vdm_unlocked
usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device
usb: ohci: Prevent missed ohci interrupts
usb: typec: qcom-pmic: fix pdphy start() error handling
usb: typec: qcom-pmic: fix use-after-free on late probe errors
usb: gadget: f_fs: Fix a race condition when processing setup packets.
USB: core: Fix access violation during port device removal
usb: dwc3: core: Prevent phy suspend during init
usb: xhci-plat: Don't include xhci.h
usb: gadget: uvc: use correct buffer size when parsing configfs lists
usb: gadget: composite: fix OS descriptors w_value logic
usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a new ID for ASUS ROG RAIKIRI controllers added to xpad driver
- amimouse driver structure annotated with __refdata to prevent section
mismatch warnings.
* tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: amimouse - mark driver struct with __refdata to prevent section mismatch
Input: xpad - add support for ASUS ROG RAIKIRI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fix from Masami Hiramatsu:
- probe-events: Fix memory leak in parsing probe argument.
There is a memory leak (forget to free an allocated buffer) in a
memory allocation failure path. Fix it to jump to the correct error
handling code.
* tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing and tracefs fixes from Steven Rostedt:
- Fix RCU callback of freeing an eventfs_inode.
The freeing of the eventfs_inode from the kref going to zero freed
the contents of the eventfs_inode and then used kfree_rcu() to free
the inode itself. But the contents should also be protected by RCU.
Switch to a call_rcu() that calls a function to free all of the
eventfs_inode after the RCU synchronization.
- The tracing subsystem maps its own descriptor to a file represented
by eventfs. The freeing of this descriptor needs to know when the
last reference of an eventfs_inode is released, but currently there
is no interface for that.
Add a "release" callback to the eventfs_inode entry array that allows
for freeing of data that can be referenced by the eventfs_inode being
opened. Then increment the ref counter for this descriptor when the
eventfs_inode file is created, and decrement/free it when the last
reference to the eventfs_inode is released and the file is removed.
This prevents races between freeing the descriptor and the opening of
the eventfs file.
- Fix the permission processing of eventfs.
The change to make the permissions of eventfs default to the mount
point but keep track of when changes were made had a side effect that
could cause security concerns. When the tracefs is remounted with a
given gid or uid, all the files within it should inherit that gid or
uid. But if the admin had changed the permission of some file within
the tracefs file system, it would not get updated by the remount.
This caused the kselftest of file permissions to fail the second time
it is run. The first time, all changes would look fine, but the
second time, because the changes were "saved", the remount did not
reset them.
Create a link list of all existing tracefs inodes, and clear the
saved flags on them on a remount if the remount changes the
corresponding gid or uid fields.
This also simplifies the code by removing the distinction between the
toplevel eventfs and an instance eventfs. They should both act the
same. They were different because of a misconception due to the
remount not resetting the flags. Now that remount resets all the
files and directories to default to the root node if a uid/gid is
specified, it makes the logic simpler to implement.
* tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Have "events" directory get permissions from its parent
eventfs: Do not treat events directory different than other directories
eventfs: Do not differentiate the toplevel events directory
tracefs: Still use mount point as default permissions for instances
tracefs: Reset permissions on remount if permissions are options
eventfs: Free all of the eventfs_inode after RCU
eventfs/tracing: Add callback for release of an eventfs_inode
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- fix the combination of restricted pools and dynamic swiotlb
(Will Deacon)
* tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes:
- Avoid a deadlock in the Qualcomm clk driver by making the regulator
which supplies the GDSC optional
- Restore RPM clks on Qualcomm msm8976 by setting num_clks
- Fix Allwinner H6 CPU rate changing logic to avoid system crashes by
temporarily reparenting the CPU clk to something that isn't being
changed
- Set a MIPI PLL min/max rate on Allwinner A64 to fix blank screens
on some devices
- Revert back to of_match_device() in the Samsung clkout driver to
get the match data based on the parent device's compatible string"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: Revert "clk: Use device_get_match_data()"
clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI
clk: sunxi-ng: common: Support minimum and maximum rate
clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change
clk: qcom: smd-rpm: Restore msm8976 num_clk
clk: qcom: gdsc: treat optional supplies as optional
|
|
Shailend Chand says:
====================
gve: Implement queue api
Following the discussion on
https://patchwork.kernel.org/project/linux-media/patch/[email protected]/,
the queue api defined by Mina is implemented for gve.
The first patch is just Mina's introduction of the api. The rest of the
patches make surgical changes in gve to enable it to work correctly with
only a subset of queues present (thus far it had assumed that either all
queues are up or all are down). The final patch has the api
implementation.
Changes since v1: clang warning fixes, kdoc warning fix, and addressed
review comments.
====================
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Every tx and rx ring has its own queue-page-list (QPL) that serves as
the bounce buffer. Previously we were allocating QPLs for all queues
before the queues themselves were allocated and later associating a QPL
with a queue. This is avoidable complexity: it is much more natural for
each queue to allocate and free its own QPL.
Moreover, the advent of new queue-manipulating ndo hooks make it hard to
keep things as is: we would need to transfer a QPL from an old queue to
a new queue, and that is unpleasant.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We now account for the fact that the NIC might send us stats for a
subset of queues. Without this change, gve_get_ethtool_stats might make
an invalid access on the priv->stats_report->stats array.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This does not fix any existing bug. In anticipation of the ndo queue api
hooks that alloc/free/start/stop a single Rx queue, the already existing
per-queue stop functions are being made more robust. Specifically for
this use case: rx_queue_n.stop() + rx_queue_n.start()
Note that this is not the use case being used in devmem tcp (the first
place these new ndo hooks would be used). There the usecase is:
new_queue.alloc() + old_queue.stop() + new_queue.start() + old_queue.free()
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In order to make possible the implementation of per-queue ndo hooks,
gve_turnup was changed in a previous patch to account for queues already
having some unprocessed descriptors: it does a one-off napi_schdule to
handle them. If conditions of consistent high traffic persist in the
immediate aftermath of this, the poll routine for a queue can be "stuck"
on the cpu on which the ndo hooks ran, instead of the cpu its irq has
affinity with.
This situation is exacerbated by the fact that the ndo hooks for all the
queues are invoked on the same cpu, potentially causing all the napi
poll routines to be residing on the same cpu.
A self correcting mechanism in the poll method itself solves this
problem.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
gVNIC has a requirement that all queues have to be quiesced before any
queue is operated on (created or destroyed). To enable the
implementation of future ndo hooks that work on a single queue, we need
to evolve gve_turnup to account for queues already having some
unprocessed descriptors in the ring.
Say rxq 4 is being stopped and started via the queue api. Due to gve's
requirement of quiescence, queues 0 through 3 are not processing their
rings while queue 4 is being toggled. Once they are made live, these
queues need to be poked to cause them to check their rings for
descriptors that were written during their brief period of quiescence.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently the queues are either all live or all dead, toggling from one
state to the other via the ndo open and stop hooks. The future addition
of single-queue ndo hooks changes this, and thus gve_turnup and
gve_turndown should evolve to account for a state where some queues are
live and some aren't.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This allows for implementing future ndo hooks that act on a single
queue.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Although this is not fixing any existing double free bug, making these
functions idempotent allows for a simpler implementation of future ndo
hooks that act on a single queue.
Tested-by: Mina Almasry <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This API enables the net stack to reset the queues used for devmem TCP.
Signed-off-by: Mina Almasry <[email protected]>
Signed-off-by: Shailend Chand <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch fix xfstests generic/070 test with smb2 leases = yes.
cifs.ko doesn't set parent lease key and epoch in create context v2 lease.
ksmbd suppose that parent lease and epoch are vaild if data length is
v2 lease context size and handle directory lease using this values.
ksmbd should hanle it as v1 lease not v2 lease if parent lease key and
epoch are not set in create context v2 lease.
Cc: [email protected]
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
lease break wait for lease break acknowledgment.
rwsem is more suitable than unlock while traversing the list for parent
lease break in ->m_op_list.
Cc: [email protected]
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
This patch fixes generic/011 when enable smb2 leases.
if ksmbd sends multiple notifications for a file, cifs increments
the reference count of the file but it does not decrement the count by
the failure of queue_work.
So even if the file is closed, cifs does not send a SMB2_CLOSE request.
Cc: [email protected]
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
ΕΛΕΝΗ reported that ksmbd binds to the IPV6 wildcard (::) by default for
ipv4 and ipv6 binding. So IPV4 connections are successful only when
the Linux system parameter bindv6only is set to 0 [default value].
If this parameter is set to 1, then the ipv6 wildcard only represents
any IPV6 address. Samba creates different sockets for ipv4 and ipv6
by default. This patch off sk_ipv6only to support IPV4/IPV6 connections
without creating two sockets.
Cc: [email protected]
Reported-by: ΕΛΕΝΗ ΤΖΑΒΕΛΛΑ <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
le32p_replace_bits() only updates partial bits of rate_mask, and GCC warns
below. Set initial value to avoid warnings, and prevent random value of
missed bits (bit 6 of rate_mask.macid_and_short_gi).
In file included from ./include/linux/fortify-string.h:5,
from ./include/linux/string.h:369,
from ./include/linux/bitmap.h:13,
from ./include/linux/cpumask.h:13,
from ./include/linux/sched.h:16,
from drivers/net/wireless/realtek/rtlwifi/rtl8192d/../wifi.h:9,
from drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:4:
In function 'le32p_replace_bits',
inlined from 'rtl92de_update_hal_rate_mask.isra' at drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:986:2:
./include/linux/bitfield.h:189:15: warning: 'rate_mask' is used uninitialized [-Wuninitialized]
189 | *p = (*p & ~to(field)) | type##_encode_bits(val, field); \
| ^~
./include/linux/bitfield.h:196:9: note: in expansion of macro '____MAKE_OP'
196 | ____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu) \
| ^~~~~~~~~~~
./include/linux/bitfield.h:201:1: note: in expansion of macro '__MAKE_OP'
201 | __MAKE_OP(32)
| ^~~~~~~~~
drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c: In function 'rtl92de_update_hal_rate_mask.isra':
drivers/net/wireless/realtek/rtlwifi/rtl8192d/hw_common.c:863:37: note: 'rate_mask' declared here
863 | struct rtl92d_rate_mask_h2c rate_mask;
| ^~~~~~~~~
Compile tested only.
Fixes: 014bba73b525 ("wifi: rtlwifi: Adjust rtl8192d-common for USB")
Cc: Bitterblue Smith <[email protected]>
Signed-off-by: Ping-Ke Shih <[email protected]>
Link: https://msgid.link/[email protected]
|
|
The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as the permissions of the created directories under it should
inherit the permissions of the instances directory when directories under
it are created.
Currently the behavior is:
# cd /sys/kernel/tracing
# chgrp 1002 instances
# mkdir instances/foo
# ls -l instances/foo
[..]
-r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 18:55 current_tracer
-rw-r----- 1 root lkp 0 May 1 18:55 error_log
drwxr-xr-x 1 root root 0 May 1 18:55 events
--w------- 1 root lkp 0 May 1 18:55 free_buffer
drwxr-x--- 2 root lkp 0 May 1 18:55 options
drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu
-rw-r----- 1 root lkp 0 May 1 18:55 set_event
All the files and directories under "foo" has the "lkp" group except the
"events" directory. That's because its getting its default value from the
mount point instead of its parent.
Have the "events" directory make its default value based on its parent's
permissions. That now gives:
# ls -l instances/foo
[..]
-rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb
-r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:16 current_tracer
-rw-r----- 1 root lkp 0 May 1 21:16 error_log
drwxr-xr-x 1 root lkp 0 May 1 21:16 events
--w------- 1 root lkp 0 May 1 21:16 free_buffer
drwxr-x--- 2 root lkp 0 May 1 21:16 options
drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu
-rw-r----- 1 root lkp 0 May 1 21:16 set_event
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Treat the events directory the same as other directories when it comes to
permissions. The events directory was considered different because it's
dentry is persistent, whereas the other directory dentries are created
when accessed. But the way tracefs now does its ownership by using the
root dentry's permissions as the default permissions, the events directory
can get out of sync when a remount is performed setting the group and user
permissions.
Remove the special case for the events directory on setting the
attributes. This allows the updates caused by remount to work properly as
well as simplifies the code.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The toplevel events directory is really no different than the events
directory of instances. Having the two be different caused
inconsistencies and made it harder to fix the permissions bugs.
Make all events directories act the same.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
If the instances directory's permissions were never change, then have it
and its children use the mount point permissions as the default.
Currently, the permissions of instance directories are determined by the
instance directory's permissions itself. But if the tracefs file system is
remounted and changes the permissions, the instance directory and its
children should use the new permission.
But because both the instance directory and its children use the instance
directory's inode for permissions, it misses the update.
To demonstrate this:
# cd /sys/kernel/tracing/
# mkdir instances/foo
# ls -ld instances/foo
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 18:57 current_tracer
# mount -o remount,gid=1002 .
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 18:57 instances
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:07 instances/foo/
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 18:57 current_tracer
Notice that changing the group id to that of "lkp" did not affect the
instances directory nor its children. It should have been:
# ls -ld current_tracer
-rw-r----- 1 root root 0 May 1 19:19 current_tracer
# ls -ld instances/foo/
drwxr-x--- 5 root root 0 May 1 19:25 instances/foo/
# ls -ld instances
drwxr-x--- 3 root root 0 May 1 19:19 instances
# mount -o remount,gid=1002 .
# ls -ld current_tracer
-rw-r----- 1 root lkp 0 May 1 19:19 current_tracer
# ls -ld instances
drwxr-x--- 3 root lkp 0 May 1 19:19 instances
# ls -ld instances/foo/
drwxr-x--- 5 root lkp 0 May 1 19:25 instances/foo/
Where all files were updated by the remount gid update.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
There's an inconsistency with the way permissions are handled in tracefs.
Because the permissions are generated when accessed, they default to the
root inode's permission if they were never set by the user. If the user
sets the permissions, then a flag is set and the permissions are saved via
the inode (for tracefs files) or an internal attribute field (for
eventfs).
But if a remount happens that specify the permissions, all the files that
were not changed by the user gets updated, but the ones that were are not.
If the user were to remount the file system with a given permission, then
all files and directories within that file system should be updated.
This can cause security issues if a file's permission was updated but the
admin forgot about it. They could incorrectly think that remounting with
permissions set would update all files, but miss some.
For example:
# cd /sys/kernel/tracing
# chgrp 1002 current_tracer
# ls -l
[..]
-rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root root 0 May 1 21:25 dynamic_events
-r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root root 0 May 1 21:25 enabled_functions
Where current_tracer now has group "lkp".
# mount -o remount,gid=1001 .
# ls -l
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events
-r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root tracing 0 May 1 21:25 enabled_functions
Everything changed but the "current_tracer".
Add a new link list that keeps track of all the tracefs_inodes which has
the permission flags that tell if the file/dir should use the root inode's
permission or not. Then on remount, clear all the flags so that the
default behavior of using the root inode's permission is done for all
files and directories.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The freeing of eventfs_inode via a kfree_rcu() callback. But the content
of the eventfs_inode was being freed after the last kref. This is
dangerous, as changes are being made that can access the content of an
eventfs_inode from an RCU loop.
Instead of using kfree_rcu() use call_rcu() that calls a function to do
all the freeing of the eventfs_inode after a RCU grace period has expired.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done
Script 'B':
echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.
Script 'B' removes all synthetic events (including the newly created
"hello" event).
What happens is that the opening of the "enable" file has:
{
struct trace_event_file *file = inode->i_private;
int ret;
ret = tracing_check_open_get_tr(file->tr);
[..]
But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode is removed
that represents this file descriptor.
Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creating on the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.
This will protect the tracing file from being freed while a eventfs file
that references it is being opened.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <[email protected]>
Tested-by: Tze-nan Wu (吳澤南) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|