Age | Commit message (Collapse) | Author | Files | Lines |
|
Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.
Use a new profile to provide for future differences between SFs and PFs.
The mr_cache not used for non-PF functions, so it is excluded from the
new profile.
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Bodong Wang <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin,
fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table.
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to
avoid build errors (Reinette Chatre)
- Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid
broken USB after hotplug or suspend/resume (Basavaraj Natikar)
- Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring)
* tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix use-after-free in pci_bus_release_domain_nr()
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
|
|
Simplify internal verifier log API down to bpf_vlog_init() and
bpf_vlog_finalize(). The former handles input arguments validation in
one place and makes it easier to change it. The latter subsumes -ENOSPC
(truncation) and -EFAULT handling and simplifies both caller's code
(bpf_check() and btf_parse()).
For btf_parse(), this patch also makes sure that verifier log
finalization happens even if there is some error condition during BTF
verification process prior to normal finalization step.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add output-only log_true_size and btf_log_true_size field to
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return
the size of log buffer necessary to fit in all the log contents at
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size log buffer correctly, but could be used by
users directly, if necessary, as well.
This patch plumbs all this through the code, taking into account actual
bpf_attr size provided by user to determine if these new fields are
expected by users. And if they are, set them from kernel on return.
We refactory btf_parse() function to accommodate this, moving attr and
uattr handling inside it. The rest is very straightforward code, which
is split from the logging accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Change how we do accounting in BPF_LOG_FIXED mode and adopt log->end_pos
as *logical* log position. This means that we can go beyond physical log
buffer size now and be able to tell what log buffer size should be to
fit entire log contents without -ENOSPC.
To do this for BPF_LOG_FIXED mode, we need to remove a short-circuiting
logic of not vsnprintf()'ing further log content once we filled up
user-provided buffer, which is done by bpf_verifier_log_needed() checks.
We modify these checks to always keep going if log->level is non-zero
(i.e., log is requested), even if log->ubuf was NULL'ed out due to
copying data to user-space, or if entire log buffer is physically full.
We adopt bpf_verifier_vlog() routine to work correctly with
log->ubuf == NULL condition, performing log formatting into temporary
kernel buffer, doing all the necessary accounting, but just avoiding
copying data out if buffer is full or NULL'ed out.
With these changes, it's now possible to do this sort of determination of
log contents size in both BPF_LOG_FIXED and default rolling log mode.
We need to keep in mind bpf_vlog_reset(), though, which shrinks log
contents after successful verification of a particular code path. This
log reset means that log->end_pos isn't always increasing, so to return
back to users what should be the log buffer size to fit all log content
without causing -ENOSPC even in the presence of log resetting, we need
to keep maximum over "lifetime" of logging. We do this accounting in
bpf_vlog_update_len_max() helper.
A related and subtle aspect is that with this logical log->end_pos even in
BPF_LOG_FIXED mode we could temporary "overflow" buffer, but then reset
it back with bpf_vlog_reset() to a position inside user-supplied
log_buf. In such situation we still want to properly maintain
terminating zero. We will eventually return -ENOSPC even if final log
buffer is small (we detect this through log->len_max check). This
behavior is simpler to reason about and is consistent with current
behavior of verifier log. Handling of this required a small addition to
bpf_vlog_reset() logic to avoid doing put_user() beyond physical log
buffer dimensions.
Another issue to keep in mind is that we limit log buffer size to 32-bit
value and keep such log length as u32, but theoretically verifier could
produce huge log stretching beyond 4GB. Instead of keeping (and later
returning) 64-bit log length, we cap it at UINT_MAX. Current UAPI makes
it impossible to specify log buffer size bigger than 4GB anyways, so we
don't really loose anything here and keep everything consistently 32-bit
in UAPI. This property will be utilized in next patch.
Doing the same determination of maximum log buffer for rolling mode is
trivial, as log->end_pos and log->start_pos are already logical
positions, so there is nothing new there.
These changes do incidentally fix one small issue with previous logging
logic. Previously, if use provided log buffer of size N, and actual log
output was exactly N-1 bytes + terminating \0, kernel logic coun't
distinguish this condition from log truncation scenario which would end
up with truncated log contents of N-1 bytes + terminating \0 as well.
But now with log->end_pos being logical position that could go beyond
actual log buffer size, we can distinguish these two conditions, which
we do in this patch. This plays nicely with returning log_size_actual
(implemented in UAPI in the next patch), as we can now guarantee that if
user takes such log_size_actual and provides log buffer of that exact
size, they will not get -ENOSPC in return.
All in all, all these changes do conceptually unify fixed and rolling
log modes much better, and allow a nice feature requested by users:
knowing what should be the size of the buffer to avoid -ENOSPC.
We'll plumb this through the UAPI and the code in the next patch.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently, if user-supplied log buffer to collect BPF verifier log turns
out to be too small to contain full log, bpf() syscall returns -ENOSPC,
fails BPF program verification/load, and preserves first N-1 bytes of
the verifier log (where N is the size of user-supplied buffer).
This is problematic in a bunch of common scenarios, especially when
working with real-world BPF programs that tend to be pretty complex as
far as verification goes and require big log buffers. Typically, it's
when debugging tricky cases at log level 2 (verbose). Also, when BPF program
is successfully validated, log level 2 is the only way to actually see
verifier state progression and all the important details.
Even with log level 1, it's possible to get -ENOSPC even if the final
verifier log fits in log buffer, if there is a code path that's deep
enough to fill up entire log, even if normally it would be reset later
on (there is a logic to chop off successfully validated portions of BPF
verifier log).
In short, it's not always possible to pre-size log buffer. Also, what's
worse, in practice, the end of the log most often is way more important
than the beginning, but verifier stops emitting log as soon as initial
log buffer is filled up.
This patch switches BPF verifier log behavior to effectively behave as
rotating log. That is, if user-supplied log buffer turns out to be too
short, verifier will keep overwriting previously written log,
effectively treating user's log buffer as a ring buffer. -ENOSPC is
still going to be returned at the end, to notify user that log contents
was truncated, but the important last N bytes of the log would be
returned, which might be all that user really needs. This consistent
-ENOSPC behavior, regardless of rotating or fixed log behavior, allows
to prevent backwards compatibility breakage. The only user-visible
change is which portion of verifier log user ends up seeing *if buffer
is too small*. Given contents of verifier log itself is not an ABI,
there is no breakage due to this behavior change. Specialized tools that
rely on specific contents of verifier log in -ENOSPC scenario are
expected to be easily adapted to accommodate old and new behaviors.
Importantly, though, to preserve good user experience and not require
every user-space application to adopt to this new behavior, before
exiting to user-space verifier will rotate log (in place) to make it
start at the very beginning of user buffer as a continuous
zero-terminated string. The contents will be a chopped off N-1 last
bytes of full verifier log, of course.
Given beginning of log is sometimes important as well, we add
BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows
tools like veristat to request first part of verifier log, if necessary.
BPF_LOG_FIXED flag is also a simple and straightforward way to check if
BPF verifier supports rotating behavior.
On the implementation side, conceptually, it's all simple. We maintain
64-bit logical start and end positions. If we need to truncate the log,
start position will be adjusted accordingly to lag end position by
N bytes. We then use those logical positions to calculate their matching
actual positions in user buffer and handle wrap around the end of the
buffer properly. Finally, right before returning from bpf_check(), we
rotate user log buffer contents in-place as necessary, to make log
contents contiguous. See comments in relevant functions for details.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
kernel/bpf/verifier.c file is large and growing larger all the time. So
it's good to start splitting off more or less self-contained parts into
separate files to keep source code size (somewhat) somewhat under
control.
This patch is a one step in this direction, moving some of BPF verifier log
routines into a separate kernel/bpf/log.c. Right now it's most low-level
and isolated routines to append data to log, reset log to previous
position, etc. Eventually we could probably move verifier state
printing logic here as well, but this patch doesn't attempt to do that
yet.
Subsequent patches will add more logic to verifier log management, so
having basics in a separate file will make sure verifier.c doesn't grow
more with new changes.
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Lorenz Bauer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Drivers call netdev_tx_completed_queue() right before
netif_txq_maybe_wake(). If BQL is enabled netdev_tx_completed_queue()
should issue a memory barrier, so we can depend on that separating
the stop check from the consumer index update, instead of adding
another barrier in netif_txq_maybe_wake().
This matters more than the barriers on the xmit path, because
the wake condition is almost always true. So we issue the
consumer side barrier often.
Wrap netdev_tx_completed_queue() in a local helper to issue
the barrier even if BQL is disabled. Keep the same semantics
as netdev_tx_completed_queue() (barrier only if bytes != 0)
to make it clear that the barrier is conditional.
Plus since macro gets pkt/byte counts as arguments now -
we can skip waking if there were no packets completed.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
A lot of drivers follow the same scheme to stop / start queues
without introducing locks between xmit and NAPI tx completions.
I'm guessing they all copy'n'paste each other's code.
The original code dates back all the way to e1000 and Linux 2.6.19.
Smaller drivers shy away from the scheme and introduce a lock
which may cause deadlocks in netpoll.
Provide macros which encapsulate the necessary logic.
The macros do not prevent false wake ups, the extra barrier
required to close that race is not worth it. See discussion in:
https://lore.kernel.org/all/[email protected]/
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull virtio fixes from Michael Tsirkin:
"Some last minute fixes - most of them for regressions"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim_net: complete the initialization before register the device
vdpa/mlx5: Add and remove debugfs in setup/teardown driver
tools/virtio: fix typo in README instructions
vhost-scsi: Fix crash during LUN unmapping
vhost-scsi: Fix vhost_scsi struct use after free
virtio-blk: fix ZBD probe in kernels without ZBD support
virtio-blk: fix to match virtio spec
|
|
This fixes errors like bellow when LE Connection times out since that
is actually not a controller error:
Bluetooth: hci0: Opcode 0x200d failed: -110
Bluetooth: hci0: request failed to create LE connection: err -110
Instead the code shall properly detect if -ETIMEDOUT is returned and
send HCI_OP_LE_CREATE_CONN_CANCEL to give up on the connection.
Link: https://github.com/bluez/bluez/issues/340
Fixes: 8e8b92ee60de ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull in pre-requisite patches from Guenter Roeck to constify
pointers to hwmon_channel_info.
* 'hwmon-const' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: constify pointers to hwmon_channel_info
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) fixes from Dan Williams:
"Several fixes for driver startup regressions that landed during the
merge window as well as some older bugs.
The regressions were due to a lack of testing with what the CXL
specification calls Restricted CXL Host (RCH) topologies compared to
the testing with Virtual Host (VH) CXL topologies. A VH topology is
typical PCIe while RCH topologies map CXL endpoints as Root Complex
Integrated endpoints. The impact is some driver crashes on startup.
This merge window also added compatibility for range registers (the
mechanism that CXL 1.1 defined for mapping memory) to treat them like
HDM decoders (the mechanism that CXL 2.0 defined for mapping
Host-managed Device Memory). That work collided with the new region
enumeration code that was tested with CXL 2.0 setups, and fails with
crashes at startup.
Lastly, the DOE (Data Object Exchange) implementation for retrieving
an ACPI-like data table from CXL devices is being reworked for v6.4.
Several fixes fell out of that work that are suitable for v6.3.
All of this has been in linux-next for a while, and all reported
issues [1] have been addressed.
Summary:
- Fix several issues with region enumeration in RCH topologies that
can trigger crashes on driver startup or shutdown.
- Fix CXL DVSEC range register compatibility versus region
enumeration that leads to startup crashes
- Fix CDAT endiannes handling
- Fix multiple buffer handling boundary conditions
- Fix Data Object Exchange (DOE) workqueue usage vs
CONFIG_DEBUG_OBJECTS warn splats"
Link: http://lore.kernel.org/r/[email protected] [1]
* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/hdm: Extend DVSEC range register emulation for region enumeration
cxl/hdm: Limit emulation to the number of range registers
cxl/region: Move coherence tracking into cxl_region_attach()
cxl/region: Fix region setup/teardown for RCDs
cxl/port: Fix find_cxl_root() for RCDs and simplify it
cxl/hdm: Skip emulation when driver manages mem_enable
cxl/hdm: Fix double allocation of @cxlhdm
PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
cxl/pci: Handle excessive CDAT length
cxl/pci: Handle truncated CDAT entries
cxl/pci: Handle truncated CDAT header
cxl/pci: Fix CDAT retrieval on big endian
|
|
There was a sort of rush surrounding commit 88c0a6b503b7 ("net: create a
netdev notifier for DSA to reject PTP on DSA master"), due to a desire
to convert DSA's attempt to deny TX timestamping on a DSA master to
something that doesn't block the kernel-wide API conversion from
ndo_eth_ioctl() to ndo_hwtstamp_set().
What was required was a mechanism that did not depend on ndo_eth_ioctl(),
and what was provided was a mechanism that did not depend on
ndo_eth_ioctl(), while at the same time introducing something that
wasn't absolutely necessary - a new netdev notifier.
There have been objections from Jakub Kicinski that using notifiers in
general when they are not absolutely necessary creates complications to
the control flow and difficulties to maintainers who look at the code.
So there is a desire to not use notifiers.
In addition to that, the notifier chain gets called even if there is no
DSA in the system and no one is interested in applying any restriction.
Take the model of udp_tunnel_nic_ops and introduce a stub mechanism,
through which net/core/dev_ioctl.c can call into DSA even when
CONFIG_NET_DSA=m.
Compared to the code that existed prior to the notifier conversion, aka
what was added in commits:
- 4cfab3566710 ("net: dsa: Add wrappers for overloaded ndo_ops")
- 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers")
this is different because we are not overloading any struct
net_device_ops of the DSA master anymore, but rather, we are exposing a
rather specific functionality which is orthogonal to which API is used
to enable it - ndo_eth_ioctl() or ndo_hwtstamp_set().
Also, what is similar is that both approaches use function pointers to
get from built-in code to DSA.
There is no point in replicating the function pointers towards
__dsa_master_hwtstamp_validate() once for every CPU port (dev->dsa_ptr).
Instead, it is sufficient to introduce a singleton struct dsa_stubs,
built into the kernel, which contains a single function pointer to
__dsa_master_hwtstamp_validate().
I find this approach preferable to what we had originally, because
dev->dsa_ptr->netdev_ops->ndo_do_ioctl() used to require going through
struct dsa_port (dev->dsa_ptr), and so, this was incompatible with any
attempts to add any data encapsulation and hide DSA data structures from
the outside world.
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four small fixes, all in drivers. They're all one or two lines except
for the ufs one, but that's a simple revert of a previous feature"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Fix the ACPI backlight override mechanism for the cases when
acpi_backlight=video is set through the kernel command line or a DMI
quirk and add backlight quirks for Apple iMac14,1 and iMac14,2 and
Lenovo ThinkPad W530 (Hans de Goede)"
* tag 'acpi-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Add acpi_backlight=video quirk for Lenovo ThinkPad W530
ACPI: video: Add acpi_backlight=video quirk for Apple iMac14,1 and iMac14,2
ACPI: video: Make acpi_backlight=video work independent from GPU driver
ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()
|
|
HWmon core receives an array of pointers to hwmon_channel_info and it
does not modify it, thus it can be array of const pointers for safety.
This allows drivers to make them also const.
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
When arp_validate is set to 2, 3, or 6, validation is performed for
backup slaves as well. As stated in the bond documentation, validation
involves checking the broadcast ARP request sent out via the active
slave. This helps determine which slaves are more likely to function in
the event of an active slave failure.
However, when the target is an IPv6 address, the NS message sent from
the active interface is not checked on backup slaves. Additionally,
based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr
when checking the NS message.
Note that when checking the NS message, the destination address is a
multicast address. Therefore, we must convert the target address to
solicited multicast in the bond_get_targets_ip6() function.
Prior to the fix, the backup slaves had a mii status of "down", but
after the fix, all of the slaves' mii status was updated to "UP".
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Reviewed-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
WED version 2 (on MT7986 and later) can offload flows originating from
wireless devices.
In order to make that work, ndo_setup_tc needs to be implemented on the
netdevs. This adds the required code to offload flows coming in from WED,
while keeping track of the incoming wed index used for selecting the
correct PPE device.
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
pci_msix_can_alloc_dyn() is not declared when CONFIG_PCI_MSI is disabled.
There is no existing user of pci_msix_can_alloc_dyn() but work is in
progress to change this. This work encounters the following error when
CONFIG_PCI_MSI is disabled:
drivers/vfio/pci/vfio_pci_intrs.c:427:21: error: implicit declaration of function 'pci_msix_can_alloc_dyn' [-Werror=implicit-function-declaration]
Provide definition for pci_msix_can_alloc_dyn() in preparation for users
that need to compile when CONFIG_PCI_MSI is disabled.
[bhelgaas: Also reported by Arnd Bergmann <[email protected]> in
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c; added his Fixes: line]
Fixes: fb0a6a268dcd ("net/mlx5: Provide external API for allocating vectors")
Fixes: 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Link: https://lore.kernel.org/r/310ecc4815dae4174031062f525245f0755c70e2.1680119924.git.reinette.chatre@intel.com
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Cc: [email protected] # v6.2+
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and can.
Current release - regressions:
- wifi: mac80211:
- fix potential null pointer dereference
- fix receiving mesh packets in forwarding=0 networks
- fix mesh forwarding
Current release - new code bugs:
- virtio/vsock: fix leaks due to missing skb owner
Previous releases - regressions:
- raw: fix NULL deref in raw_get_next().
- sctp: check send stream number after wait_for_sndbuf
- qrtr:
- fix a refcount bug in qrtr_recvmsg()
- do not do DEL_SERVER broadcast after DEL_CLIENT
- wifi: brcmfmac: fix SDIO suspend/resume regression
- wifi: mt76: fix use-after-free in fw features query.
- can: fix race between isotp_sendsmg() and isotp_release()
- eth: mtk_eth_soc: fix remaining throughput regression
- eth: ice: reset FDIR counter in FDIR init stage
Previous releases - always broken:
- core: don't let netpoll invoke NAPI if in xmit context
- icmp: guard against too small mtu
- ipv6: fix an uninit variable access bug in __ip6_make_skb()
- wifi: mac80211: fix the size calculation of
ieee80211_ie_len_eht_cap()
- can: fix poll() to not report false EPOLLOUT events
- eth: gve: secure enough bytes in the first TX desc for all TCP
pkts"
* tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net: stmmac: check fwnode for phy device before scanning for phy
net: stmmac: Add queue reset into stmmac_xdp_open() function
selftests: net: rps_default_mask.sh: delete veth link specifically
net: fec: make use of MDIO C45 quirk
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
gve: Secure enough bytes in the first TX desc for all TCP pkts
netlink: annotate lockless accesses to nlk->max_recvmsg_len
ethtool: reset #lanes when lanes is omitted
ping: Fix potentail NULL deref for /proc/net/icmp.
raw: Fix NULL deref in raw_get_next().
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
net: stmmac: fix up RX flow hash indirection table when setting channels
net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
ipv6: Fix an uninit variable access bug in __ip6_make_skb()
...
|
|
Make sure unaligned descriptors that straddle the end of the UMEM are
considered invalid. Currently, descriptor validation is broken for
zero-copy mode which only checks descriptors at page granularity.
For example, descriptors in zero-copy mode that overrun the end of the
UMEM but not a page boundary are (incorrectly) considered valid. The
UMEM boundary check needs to happen before the page boundary and
contiguity checks in xp_desc_crosses_non_contig_pg(). Do this check in
xp_unaligned_validate_desc() instead like xp_check_unaligned() already
does.
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Kal Conley <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"These are minor fixes to address false-positive build warnings:
Some of the less common I/O accessors are missing __force casts and
cause sparse warnings for their implied byteswap, and a recent change
to __generic_cmpxchg_local() causes a warning about constant integer
truncation"
* tag 'asm-generic-fixes-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: avoid __generic_cmpxchg_local warnings
asm-generic/io.h: suppress endianness warnings for relaxed accessors
asm-generic/io.h: suppress endianness warnings for readq() and writeq()
|
|
Recent attempt to ensure PREROUTING hook is executed again when a
decrypted ipsec packet received on a bridge passes through the network
stack a second time broke the physdev match in INPUT hook.
We can't discard the nf_bridge info strct from sabotage_in hook, as
this is needed by the physdev match.
Keep the struct around and handle this with another conditional instead.
Fixes: 2b272bb558f1 ("netfilter: br_netfilter: disable sabotage_in hook after first suppression")
Reported-and-tested-by: Farid BENAMROUCHE <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Functions for searching module kallsyms should have non-empty
definitions only if CONFIG_MODULES=y and CONFIG_KALLSYMS=y. Until now,
only CONFIG_MODULES check was used for many of these, which may have
caused complilation errors on some configs.
This patch moves all relevant functions under the correct configs.
Fixes: bd5314f8dd2d ("kallsyms, bpf: Move find_kallsyms_symbol_value out of internal header")
Signed-off-by: Viktor Malik <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This patch adds support for platform-specific reset logic in the
stmmac driver. Some SoCs require a different reset mechanism than
the standard dwmac IP reset. To support these platforms, a new function
pointer 'fix_soc_reset' is added to the plat_stmmacenet_data structure.
The stmmac_reset in hwif.h is modified to call the 'fix_soc_reset'
function if it exists. This enables the driver to use the platform-specific
reset logic when necessary.
Signed-off-by: Shenwei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[[email protected]: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix timerlat notification, as it was not triggering the notify to
users when a new max latency was hit.
- Do not trigger max latency if the tracing is off.
When tracing is off, the ring buffer is not updated, it does not make
sense to notify when there's a new max latency detected by the
tracer, as why that latency happened is not available. The tracing
logic still runs when the ring buffer is disabled, but it should not
be triggering notifications.
- Fix race on freeing the synthetic event "last_cmd" variable by adding
a mutex around it.
- Fix race between reader and writer of the ring buffer by adding
memory barriers. When the writer is still on the reader page it must
have its content visible on the buffer before it moves the commit
index that the reader uses to know how much content is on the page.
- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
_RET_IP_, which gets broken if it is not inlined.
- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
The field formats of trace events are calculated by using
sizeof(type) and other means by what is passed into the structure
macros like __field(). The __field() macro is only meant for atom
types like int, long, short, pointer, etc. It is not meant for
arrays.
The code will currently compile with arrays, but then the format
produced will be inaccurate, and user space parsing tools will break.
Two bugs have already been fixed, now add code that will make the
kernel fail to build if another trace event includes this buggy field
format.
- Fix boot up snapshot code:
Boot snapshots were triggering when not even asked for on the kernel
command line. This was caused by two bugs:
1) It would trigger a snapshot on any instance if one was created
from the kernel command line.
2) The error handling would only affect the top level instance.
So the fact that a snapshot was done on a instance that didn't
allocate a buffer triggered a warning written into the top level
buffer, and worse yet, disabled the top level buffer.
- Fix memory leak that was caused when an error was logged in a trace
buffer instance, and then the buffer instance was removed.
The allocated error log messages still needed to be freed.
* tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Free error logs of tracing instances
tracing: Fix ftrace_boot_snapshot command line logic
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
tracing: Error if a trace event has an array for a __field()
tracing/osnoise: Fix notify new tracing_max_latency
tracing/timerlat: Notify new max thread latency
ftrace: Mark get_lock_parent_ip() __always_inline
ring-buffer: Fix race while reader and writer are on the same page
tracing/synthetic: Fix races on freeing last_cmd
|
|
The arguments passed to the trace events are of type unsigned int,
however the signature of the events used __le32 parameters.
I may be missing the point here, but sparse flagged this and it
does seem incorrect to me.
net/qrtr/ns.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/qrtr.h):
./include/trace/events/qrtr.h:11:1: warning: cast to restricted __le32
./include/trace/events/qrtr.h:11:1: warning: restricted __le32 degrades to integer
./include/trace/events/qrtr.h:11:1: warning: restricted __le32 degrades to integer
... (a lot more similar warnings)
net/qrtr/ns.c:115:47: expected restricted __le32 [usertype] service
net/qrtr/ns.c:115:47: got unsigned int service
net/qrtr/ns.c:115:61: warning: incorrect type in argument 2 (different base types)
... (a lot more similar warnings)
Fixes: dfddb54043f0 ("net: qrtr: Add tracepoint support")
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Dae R. Jeong reported a NULL deref in raw_get_next() [0].
It seems that the repro was running these sequences in parallel so
that one thread was iterating on a socket that was being freed in
another netns.
unshare(0x40060200)
r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
socket$inet_icmp_raw(0x2, 0x3, 0x1)
pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we
use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
sockets. However, we should use spinlock for slow paths to avoid
the NULL deref.
Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
is not reused during iteration in the grace period. In fact, the
lockless readers do not check the nulls marker with get_nulls_value().
So, SOCK_RAW should use hlist instead of hlist_nulls.
Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
fast paths and sk_for_each() and spinlock for /proc/net/raw.
[0]:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
seq_read+0x224/0x320 fs/seq_file.c:162
pde_read fs/proc/inode.c:316 [inline]
proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
vfs_read+0x31e/0xd30 fs/read_write.c:468
ksys_pread64 fs/read_write.c:665 [inline]
__do_sys_pread64 fs/read_write.c:675 [inline]
__se_sys_pread64 fs/read_write.c:672 [inline]
__x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x478d29
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
RIP: 0010:sock_net include/net/sock.h:649 [inline]
RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU")
Reported-by: syzbot <[email protected]>
Reported-by: Dae R. Jeong <[email protected]>
Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
btf_nested_type_is_trusted() tries to find a struct member at corresponding offset.
It works for flat structures and falls apart in more complex structs with nested structs.
The offset->member search is already performed by btf_struct_walk() including nested structs.
Reuse this work and pass {field name, field btf id} into btf_nested_type_is_trusted()
instead of offset to make BTF_TYPE_SAFE*() logic more robust.
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: David Vernet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Remove unused arguments from btf_struct_access() callback.
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: David Vernet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Pull kvm fixes from Paolo Bonzini:
"PPC:
- Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
s390:
- Fix handling of external interrupts in protected guests
x86:
- Resample the pending state of IOAPIC interrupts when unmasking them
- Fix usage of Hyper-V "enlightened TLB" on AMD
- Small fixes to real mode exceptions
- Suppress pending MMIO write exits if emulator detects exception
Documentation:
- Fix rST syntax"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
docs: kvm: x86: Fix broken field list
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
KVM: s390: pv: fix external interruption loop not always detected
KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
KVM: x86: Suppress pending MMIO write exits if emulator detects exception
KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
KVM: irqfd: Make resampler_list an RCU list
KVM: SVM: Flush Hyper-V TLB when required
|
|
Code that passes a 32-bit constant into cmpxchg() produces a harmless
sparse warning because of the truncation in the branch that is not taken:
fs/erofs/zdata.c: note: in included file (through /home/arnd/arm-soc/arch/arm/include/asm/cmpxchg.h, /home/arnd/arm-soc/arch/arm/include/asm/atomic.h, /home/arnd/arm-soc/include/linux/atomic.h, ...):
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:29:33: warning: cast truncates bits from constant value (5f0ecafe becomes fe)
include/asm-generic/cmpxchg-local.h:30:42: warning: cast truncates bits from constant value (5f0edead becomes ad)
include/asm-generic/cmpxchg-local.h:33:34: warning: cast truncates bits from constant value (5f0ecafe becomes cafe)
include/asm-generic/cmpxchg-local.h:34:44: warning: cast truncates bits from constant value (5f0edead becomes dead)
This was reported as a regression to Matt's recent __generic_cmpxchg_local
patch, though this patch only added more warnings on top of the ones
that were already there.
Rewording the truncation to use an explicit bitmask instead of a cast
to a smaller type avoids the warning but otherwise leaves the code
unchanged.
I had another look at why the cast is even needed for atomic_cmpxchg(),
and as Matt describes the problem here is that atomic_t contains a
signed 'int', but cmpxchg() takes an 'unsigned long' argument, and
converting between the two leads to a 64-bit sign-extension of
negative 32-bit atomics.
I checked the other implementations of arch_cmpxchg() and did not find
any others that run into the same problem as __generic_cmpxchg_local(),
but it's easy to be on the safe side here and always convert the
signed int into an unsigned int when calling arch_cmpxchg(), as this
will work even when any of the arch_cmpxchg() implementations run
into the same problem.
Fixes: 624654152284 ("locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value")
Reviewed-by: Matt Evans <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Copy the forced type casts from the normal MMIO accessors to suppress
the sparse warnings that point out __raw_readl() returns a native endian
word (just like readl()).
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
Commit c1d55d50139b ("asm-generic/io.h: Fix sparse warnings on
big-endian architectures") missed fixing the 64-bit accessors.
Arnd explains in the attached link why the casts are necessary, even if
__raw_readq() and __raw_writeq() do not take endian-specific types.
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Arnd Bergmann <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
The merged patch series to support zoned block devices in virtio-blk
is not the most up to date version. The merged patch can be found at
https://lore.kernel.org/linux-block/[email protected]/
but the latest and reviewed version is
https://lore.kernel.org/linux-block/[email protected]/
The reason is apparently that the correct mailing lists and
maintainers were not copied.
The differences between the two are mostly cleanups, but there is one
change that is very important in terms of compatibility with the
approved virtio-zbd specification.
Before it was approved, the OASIS virtio spec had a change in
VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
current virtio-blk driver code. In the running code, the status is
the first byte of the in-header that is followed by some pad bytes
and the u64 that carries the sector at which the data has been written
to the zone back to the driver, aka the append sector.
This layout turned out to be problematic for implementing in QEMU and
the request status byte has been eventually made the last byte of the
in-header. The current code doesn't expect that and this causes the
append sector value always come as zero to the block layer. This needs
to be fixed ASAP.
Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
Cc: [email protected]
Signed-off-by: Dmitry Fomichev <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
A __field() in the TRACE_EVENT() macro is used to set up the fields of the
trace event data. It is for single storage units (word, char, int,
pointer, etc) and not for complex structures or arrays. Unfortunately,
there's nothing preventing the build from accepting:
__field(int, arr[5]);
from building. It will turn into a array value. This use to work fine, as
the offset and size use to be determined by the macro using the field name,
but things have changed and the offset and size are now determined by the
type. So the above would only be size 4, and the next field will be
located 4 bytes from it (instead of 20).
The proper way to declare static arrays is to use the __array() macro.
Instead of __field(int, arr[5]) it should be __array(int, arr, 5).
Add some macro tricks to the building of a trace event from the
TRACE_EVENT() macro such that __field(int, arr[5]) will fail to build. A
comment by the failure will explain why the build failed.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Reported-by: Douglas RAILLARD <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
|
|
If the compiler decides not to inline this function then preemption
tracing will always show an IP inside the preemption disabling path and
never the function actually calling preempt_{enable,disable}.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()")
Signed-off-by: John Keeping <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
This adds support for the EDPD PHY tunable.
Per default EDPD is disabled in interrupt mode, the tunable can be used
to override this, e.g. if the link partner doesn't use EDPD.
The interval to check for energy can be chosen between 1000ms and
2000ms. Note that this value consists of the 1000ms phylib interval
for state machine runs plus the time to wait for energy being detected.
v2:
- consider that phylib core holds phydev->lock when calling the
phy tunable hooks
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The fact that PTP 2-step TX timestamping is broken on DSA switches if
the master also timestamps the same packets is documented by commit
f685e609a301 ("net: dsa: Deny PTP on master if switch supports it").
We attempt to help the users avoid shooting themselves in the foot by
making DSA reject the timestamping ioctls on an interface that is a DSA
master, and the switch tree beneath it contains switches which are aware
of PTP.
The only problem is that there isn't an established way of intercepting
ndo_eth_ioctl calls, so DSA creates avoidable burden upon the network
stack by creating a struct dsa_netdevice_ops with overlaid function
pointers that are manually checked from the relevant call sites. There
used to be 2 such dsa_netdevice_ops, but now, ndo_eth_ioctl is the only
one left.
There is an ongoing effort to migrate driver-visible hardware timestamping
control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set()
model, but DSA actively prevents that migration, since dsa_master_ioctl()
is currently coded to manually call the master's legacy ndo_eth_ioctl(),
and so, whenever a network device driver would be converted to the new
API, DSA's restrictions would be circumvented, because any device could
be used as a DSA master.
The established way for unrelated modules to react on a net device event
is via netdevice notifiers. So we create a new notifier which gets
called whenever there is an attempt to change hardware timestamping
settings on a device.
Finally, there is another reason why a netdev notifier will be a good
idea, besides strictly DSA, and this has to do with PHY timestamping.
With ndo_eth_ioctl(), all MAC drivers must manually call
phy_has_hwtstamp() before deciding whether to act upon SIOCSHWTSTAMP,
otherwise they must pass this ioctl to the PHY driver via
phy_mii_ioctl().
With the new ndo_hwtstamp_set() API, it will be desirable to simply not
make any calls into the MAC device driver when timestamping should be
performed at the PHY level.
But there exist drivers, such as the lan966x switch, which need to
install packet traps for PTP regardless of whether they are the layer
that provides the hardware timestamps, or the PHY is. That would be
impossible to support with the new API.
The proposal there, too, is to introduce a netdev notifier which acts as
a better cue for switching drivers to add or remove PTP packet traps,
than ndo_hwtstamp_set(). The one introduced here "almost" works there as
well, except for the fact that packet traps should only be installed if
the PHY driver succeeded to enable hardware timestamping, whereas here,
we need to deny hardware timestamping on the DSA master before it
actually gets enabled. This is why this notifier is called "PRE_", and
the notifier that would get used for PHY timestamping and packet traps
would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept, for
example NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER do the same thing.
In expectation of future netlink UAPI, we also pass a non-NULL extack
pointer to the netdev notifier, and we make DSA populate it with an
informative reason for the rejection. To avoid making it go to waste, we
make the ioctl-based dev_set_hwtstamp() create a fake extack and print
the message to the kernel log.
Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jakub Kicinski suggested that we may want to add new UAPI for
controlling hardware timestamping through netlink in the future, and in
that case, we will be limited to the struct hwtstamp_config that is
currently passed in fixed binary format through the SIOCGHWTSTAMP and
SIOCSHWTSTAMP ioctls. It would be good if new kernel code already
started operating on an extensible kernel variant of that structure,
similar in concept to struct kernel_ethtool_coalesce vs struct
ethtool_coalesce.
Since struct hwtstamp_config is in include/uapi/linux/net_tstamp.h, here
we introduce include/linux/net_tstamp.h which shadows that other header,
but also includes it, so that existing includers of this header work as
before. In addition to that, we add the definition for the kernel-only
structure, and a helper which translates all fields by manual copying.
I am doing a manual copy in order to not force the alignment (or type)
of the fields of struct kernel_hwtstamp_config to be the same as of
struct hwtstamp_config, even though now, they are the same.
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 7dafc3e007918384c8693ff8d70381b5c1e9c247.
This patch introduced a regression [1] where hba->pwr_info is used before
being initialized, which could create issues in ufshcd_scale_gear(). Revert
it until a better solution is found.
[1] https://lore.kernel.org/all/CAGaU9a_PMZhqv+YJ0r3w-hJMsR922oxW6Kg59vw+oen-NZ6Otw@mail.gmail.com
Signed-off-by: Adrien Thierry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Stanley Chu <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
napi_id is read by GRO and drivers to mark skbs, and it currently
sits at the end of the structure, in a mostly unused cache line.
Move it up into a hole, and separate the clearly control path
fields from the important ones.
Before:
struct napi_struct {
struct list_head poll_list; /* 0 16 */
long unsigned int state; /* 16 8 */
int weight; /* 24 4 */
int defer_hard_irqs_count; /* 28 4 */
long unsigned int gro_bitmask; /* 32 8 */
int (*poll)(struct napi_struct *, int); /* 40 8 */
int poll_owner; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct net_device * dev; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct gro_list gro_hash[8]; /* 64 192 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct sk_buff * skb; /* 256 8 */
struct list_head rx_list; /* 264 16 */
int rx_count; /* 280 4 */
/* XXX 4 bytes hole, try to pack */
struct hrtimer timer; /* 288 64 */
/* XXX last struct has 4 bytes of padding */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
struct list_head dev_list; /* 352 16 */
struct hlist_node napi_hash_node; /* 368 16 */
/* --- cacheline 6 boundary (384 bytes) --- */
unsigned int napi_id; /* 384 4 */
/* XXX 4 bytes hole, try to pack */
struct task_struct * thread; /* 392 8 */
/* size: 400, cachelines: 7, members: 17 */
/* sum members: 388, holes: 3, sum holes: 12 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 16 bytes */
};
After:
struct napi_struct {
struct list_head poll_list; /* 0 16 */
long unsigned int state; /* 16 8 */
int weight; /* 24 4 */
int defer_hard_irqs_count; /* 28 4 */
long unsigned int gro_bitmask; /* 32 8 */
int (*poll)(struct napi_struct *, int); /* 40 8 */
int poll_owner; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct net_device * dev; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct gro_list gro_hash[8]; /* 64 192 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct sk_buff * skb; /* 256 8 */
struct list_head rx_list; /* 264 16 */
int rx_count; /* 280 4 */
unsigned int napi_id; /* 284 4 */
struct hrtimer timer; /* 288 64 */
/* XXX last struct has 4 bytes of padding */
/* --- cacheline 5 boundary (320 bytes) was 32 bytes ago --- */
struct task_struct * thread; /* 352 8 */
struct list_head dev_list; /* 360 16 */
struct hlist_node napi_hash_node; /* 376 16 */
/* size: 392, cachelines: 7, members: 17 */
/* sum members: 388, holes: 1, sum holes: 4 */
/* paddings: 1, sum paddings: 4 */
/* forced alignments: 1 */
/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Maintainer update for S390 IOMMU driver
- A fix for the set_platform_dma_ops() call-back in the Exynos
IOMMU driver
- Intel VT-d fixes from Lu Baolu:
- Fix a lockdep splat
- Fix a supplement of the specification
- Fix a warning in perfmon code
* tag 'iommu-fixes-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
iommu/vt-d: Allow zero SAGAW if second-stage not supported
iommu/vt-d: Remove unnecessary locking in intel_irq_remapping_alloc()
iommu/exynos: Fix set_platform_dma_ops() callback
MAINTAINERS: Update s390-iommu driver maintainer information
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter updates for net-next
1. No need to disable BH in nfnetlink proc handler, freeing happens
via call_rcu.
2. Expose classid in nfetlink_queue, from Eric Sage.
3. Fix nfnetlink message description comments, from Matthieu De Beule.
4. Allow removal of offloaded connections via ctnetlink, from Paul Blakey.
* tag 'nf-next-2023-03-30' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: ctnetlink: Support offloaded conntrack entry deletion
netfilter: Correct documentation errors in nf_tables.h
netfilter: nfnetlink_queue: enable classid socket info retrieval
netfilter: nfnetlink_log: remove rcu_bh usage
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|