| Age | Commit message (Collapse) | Author | Files | Lines |
|
A bus_dma_region necessarily stores both CPU and DMA base addresses for
a range, so there's no need to also store the difference between them.
Signed-off-by: Robin Murphy <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Send credit update message when SO_RCVLOWAT is updated and it is bigger
than number of bytes in rx queue. It is needed, because 'poll()' will
wait until number of bytes in rx queue will be not smaller than
O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup
for tx/rx is possible: sender waits for free space and receiver is
waiting data in 'poll()'.
Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set
'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the
transport doesn't need to do it.
Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-12-13
Preparation for mlx5e socket direct feature.
Socket direct will allow multiple PF devices attached to different
NUMA nodes but sharing the same physical port.
The following series is a small refactoring series in preparation
to support socket direct in the following submission.
Highlights:
- Define required device registers and bits related to socket direct
- Flow steering re-arrangements
- Generalize TX objects (TISs) and store them in a common object, will
be useful in the next series for per function object management.
- Decouple raw CQ objects from their parent netdev priv
- Prepare devcom for Socket Direct device group discovery.
Please see the individual patches for more information.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
In order to optimize the data transfer, let's use the async DMA operation
for writing (queuing) data to the host.
In the async path, the completion event for the transfer ring will only be
sent to the host when the controller driver notifies the MHI stack of the
actual transfer completion using the callback (mhi_ep_skb_completion)
supplied in "struct mhi_ep_buf_info".
Also to accommodate the async operation, the transfer ring read offset
(ring->rd_offset) is cached in the "struct mhi_ep_chan" and updated locally
to let the stack queue further ring items to the controller driver. But the
actual read offset of the transfer ring will only be updated in the
completion callback.
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
* add (and fix) certificate for regdb handover to Chen-Yu Tsai
* fix rfkill GPIO handling
* a few driver (iwlwifi, mt76) crash fixes
* logic fixes in the stack
* tag 'wireless-2023-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: fix certs build to not depend on file order
wifi: mt76: fix crash with WED rx support enabled
wifi: iwlwifi: pcie: avoid a NULL pointer dereference
wifi: mac80211: mesh_plink: fix matches_local logic
wifi: mac80211: mesh: check element parsing succeeded
wifi: mac80211: check defragmentation succeeded
wifi: mac80211: don't re-add debugfs during reconfig
net: rfkill: gpio: set GPIO direction
wifi: mac80211: check if the existing link config remains unchanged
wifi: cfg80211: Add my certificate
wifi: iwlwifi: pcie: add another missing bh-disable for rxq->lock
wifi: ieee80211: don't require protected vendor action frames
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Correct spelling as reported by codespell.
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/ethernet/intel/iavf/iavf_ethtool.c
3a0b5a2929fd ("iavf: Introduce new state machines for flow director")
95260816b489 ("iavf: use iavf_schedule_aq_request() helper")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/broadcom/bnxt/bnxt.c
c13e268c0768 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic")
c2f8063309da ("bnxt_en: Refactor RX VLAN acceleration logic.")
a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
1c7fd6ee2fe4 ("bnxt_en: Rename some macros for the P5 chips")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
bd6781c18cb5 ("bnxt_en: Fix wrong return value check in bnxt_close_nic()")
84793a499578 ("bnxt_en: Skip nic close/open when configuring tstamp filters")
https://lore.kernel.org/all/[email protected]/
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled")
cecf44ea1a1f ("net/mlx5: Allow sync reset flow when BF MGT interface device is present")
https://lore.kernel.org/all/[email protected]/
No adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - regressions:
- tcp: fix tcp_disordered_ack() vs usec TS resolution
Current release - new code bugs:
- dpll: sanitize possible null pointer dereference in
dpll_pin_parent_pin_set()
- eth: octeon_ep: initialise control mbox tasks before using APIs
Previous releases - regressions:
- io_uring/af_unix: disable sending io_uring over sockets
- eth: mlx5e:
- TC, don't offload post action rule if not supported
- fix possible deadlock on mlx5e_tx_timeout_work
- eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close
- eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()
- eth: ena: fix DMA syncing in XDP path when SWIOTLB is on
- eth: team: fix use-after-free when an option instance allocation
fails
Previous releases - always broken:
- neighbour: don't let neigh_forced_gc() disable preemption for long
- net: prevent mss overflow in skb_segment()
- ipv6: support reporting otherwise unknown prefix flags in
RTM_NEWPREFIX
- tcp: remove acked SYN flag from packet in the transmit queue
correctly
- eth: octeontx2-af:
- fix a use-after-free in rvu_nix_register_reporters
- fix promisc mcam entry action
- eth: dwmac-loongson: make sure MDIO is initialized before use
- eth: atlantic: fix double free in ring reinit logic"
* tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
net: atlantic: fix double free in ring reinit logic
appletalk: Fix Use-After-Free in atalk_ioctl
net: stmmac: Handle disabled MDIO busses from devicetree
net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
dpaa2-switch: do not ask for MDB, VLAN and FDB replay
dpaa2-switch: fix size of the dma_unmap
net: prevent mss overflow in skb_segment()
vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
MIPS: dts: loongson: drop incorrect dwmac fallback compatible
stmmac: dwmac-loongson: drop useless check for compatible fallback
stmmac: dwmac-loongson: Make sure MDIO is initialized before use
tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
net: ena: Fix XDP redirection error
net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
net: ena: Fix xdp drops handling due to multibuf packets
net: ena: Destroy correct number of xdp queues upon failure
net: Remove acked SYN flag from packet in the transmit queue correctly
qed: Fix a potential use-after-free in qed_cxt_tables_alloc
...
|
|
Add way to query the children of a particular mount. This is a more
flexible way to iterate the mount tree than having to parse
/proc/self/mountinfo.
Lookup the mount by the new 64bit mount ID. If a mount needs to be
queried based on path, then statx(2) can be used to first query the
mount ID belonging to the path.
Return an array of new (64bit) mount ID's. Without privileges only
mounts are listed which are reachable from the task's root.
Folded into this patch are several later improvements. Keeping them
separate would make the history pointlessly confusing:
* Recursive listing of mounts is the default now (cf. [1]).
* Remove explicit LISTMOUNT_UNREACHABLE flag (cf. [1]) and fail if mount
is unreachable from current root. This also makes permission checking
consistent with statmount() (cf. [3]).
* Start listing mounts in unique mount ID order (cf. [2]) to allow
continuing listmount() from a midpoint.
* Allow to continue listmount(). The @request_mask parameter is renamed
and to @param to be usable by both statmount() and listmount().
If @param is set to a mount id then listmount() will continue listing
mounts from that id on. This allows listing mounts in multiple
listmount invocations without having to resize the buffer. If @param
is zero then the listing starts from the beginning (cf. [4]).
* Don't return EOVERFLOW, instead return the buffer size which allows to
detect a full buffer as well (cf. [4]).
Signed-off-by: Miklos Szeredi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Ian Kent <[email protected]>
Link: https://lore.kernel.org/r/[email protected] [1] (folded)
Link: https://lore.kernel.org/r/[email protected] [2] (folded)
Link: https://lore.kernel.org/r/[email protected] [3] (folded)
Link: https://lore.kernel.org/r/[email protected] [4] (folded)
[Christian Brauner <[email protected]>: various smaller fixes]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Immutable branch between pdx86 amd wbrf branch and wifi / amdgpu due for the v6.8 merge window
platform-drivers-x86-amd-wbrf-v6.8-1: v6.7-rc1 + AMD WBRF support
for merging into the wifi subsys and amdgpu driver for 6.8.
|
|
These callbacks can be implemented by the controller drivers to perform
async read/write operation that increases the throughput.
For aiding the async operation, a completion callback is also introduced.
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
In the preparation for adding async API support, let's rename the existing
APIs to read_sync() and write_sync() to make it explicit that these APIs
are used for synchronous read/write.
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
In the preparation of DMA async support, let's pass the parameters to
read_from_host() and write_to_host() APIs using mhi_ep_buf_info structure.
No functional change.
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
Use slab allocator for allocating the memory for objects used frequently
and are of fixed size. This reduces the overheard associated with
kmalloc().
Suggested-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
Allow the user to set the symmetric Toeplitz hash function via:
# ethtool -X eth0 hfunc toeplitz symmetric-xor
The driver will reject any new RSS configuration if a field other than
(IP src/dst and L4 src/dst ports) is requested for hashing.
The symmetric RSS will not be supported on PFs not advertising the ADV RSS
Offload flag (ADV_RSS_SUPPORT()), for example the E700 series (i40e).
Reviewed-by: Madhu Chittim <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Refactor the driver to use a communication data structure for RSS
config. To do so we introduce the new ice_rss_hash_cfg struct, and then
pass it as an argument to several functions.
Also introduce enum ice_rss_cfg_hdr_type to specify a more granular and
flexible RSS configuration:
ICE_RSS_OUTER_HEADERS - take outer layer as RSS input set
ICE_RSS_INNER_HEADERS - take inner layer as RSS input set
ICE_RSS_INNER_HEADERS_W_OUTER_IPV4 - take inner layer as RSS input set for
packet with outer IPV4
ICE_RSS_INNER_HEADERS_W_OUTER_IPV6 - take inner layer as RSS input set for
packet with outer IPV6
ICE_RSS_ANY_HEADERS - try with outer first then inner (same as the
behaviour without this change)
Finally, move the virtchnl_rss_algorithm enum to be with the other RSS
related structures in the virtchnl.h file.
There should be no functional change due to this patch.
Reviewed-by: Wojciech Drewek <[email protected]>
Signed-off-by: Qi Zhang <[email protected]>
Co-developed-by: Jesse Brandeburg <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Co-developed-by: Ahmed Zaki <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Symmetric RSS hash functions are beneficial in applications that monitor
both Tx and Rx packets of the same flow (IDS, software firewalls, ..etc).
Getting all traffic of the same flow on the same RX queue results in
higher CPU cache efficiency.
A NIC that supports "symmetric-xor" can achieve this RSS hash symmetry
by XORing the source and destination fields and pass the values to the
RSS hash algorithm.
The user may request RSS hash symmetry for a specific algorithm, via:
# ethtool -X eth0 hfunc <hash_alg> symmetric-xor
or turn symmetry off (asymmetric) by:
# ethtool -X eth0 hfunc <hash_alg>
The specific fields for each flow type should then be specified as usual
via:
# ethtool -N|-U eth0 rx-flow-hash <flow_type> s|d|f|n
Reviewed-by: Wojciech Drewek <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add the RSS context parameters to struct ethtool_rxfh_param and use the
get/set_rxfh to handle the RSS contexts as well.
This is part 2/2 of the fix suggested in [1]:
- Add a rss_context member to the argument struct and a capability
like cap_link_lanes_supported to indicate whether driver supports
rss contexts, then you can remove *et_rxfh_context functions,
and instead call *et_rxfh() with a non-zero rss_context.
Link: https://lore.kernel.org/netdev/[email protected]/ [1]
CC: Jesse Brandeburg <[email protected]>
CC: Tony Nguyen <[email protected]>
CC: Marcin Wojtas <[email protected]>
CC: Russell King <[email protected]>
CC: Sunil Goutham <[email protected]>
CC: Geetha sowjanya <[email protected]>
CC: Subbaraya Sundeep <[email protected]>
CC: hariprasad <[email protected]>
CC: Saeed Mahameed <[email protected]>
CC: Leon Romanovsky <[email protected]>
CC: Edward Cree <[email protected]>
CC: Martin Habets <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.
This is part 1/2 of the fix, as suggested in [1]:
- First simplify the code by always providing a pointer to all params
(indir, key and func); the fact that some of them may be NULL seems
like a weird historic thing or a premature optimization.
It will simplify the drivers if all pointers are always present.
- Then make the functions take a dev pointer, and a pointer to a
single struct wrapping all arguments. The set_* should also take
an extack.
Link: https://lore.kernel.org/netdev/[email protected]/ [1]
Suggested-by: Jakub Kicinski <[email protected]>
Suggested-by: Jacob Keller <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Some devices(eg. SDX75) take longer than expected (default, 8 seconds) to
set ready after reboot. Hence add optional ready timeout parameter and pass
the appropriate timeout value to mhi_poll_reg_field() to wait enough for
device ready as part of power up sequence.
Signed-off-by: Qiang Yu <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Manivannan Sadhasivam <[email protected]>
|
|
This driver does not have any in-tree users but is passing a
legacy GPIO number through platform data.
Convert it to use a GPIO descriptor, new users or outoftree
users can easily be implemented using GPIO descriptor tables
or software nodes.
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
The driver supports passing some GPIO lines for rows and columns
through the driver data, but there is no in-kernel user of this.
Further the use seems convoluted because the GPIO lines are unused
in the driver, then explicitly free:ed when removing it without
being requested when probing it, which is assymetric and just
a recepie for disaster.
Remove the support for these unused GPIOs, if need be support can
be reestablished in an organized fashion using GPIO descriptors.
Signed-off-by: Linus Walleij <[email protected]>
Reviewed-by: Tony Lindgren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
The Navpoint driver uses a GPIO line, convert this to use
a GPIO descriptor. There are no in-kernel users but out of tree
users can easily be added or converted using a GPIO descriptor
table as with numerous other drivers.
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
To support multiple users referencing the same fragment,
'pp_frag_count' is renamed to 'pp_ref_count', transitioning pp pages
from fragment management to reference count management after draining
based on the suggestion from [1].
The idea is that the concept of fragmenting exists before the page is
drained, and all related functions retain their current names.
However, once the page is drained, its management shifts to being
governed by 'pp_ref_count'. Therefore, all functions associated with
that lifecycle stage of a pp page are renamed.
[1]
http://lore.kernel.org/netdev/[email protected]
Signed-off-by: Liang Chen <[email protected]>
Reviewed-by: Yunsheng Lin <[email protected]>
Reviewed-by: Ilias Apalodimas <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow up commit 9690ae604290 ("ethtool: add header/data split
indication") and add the set part of Ethtool's header split, i.e.
ability to enable/disable header split via the Ethtool Netlink
interface. This might be helpful to optimize the setup for particular
workloads, for example, to avoid XDP frags, and so on.
A driver should advertise ``ETHTOOL_RING_USE_TCP_DATA_SPLIT`` in its
ops->supported_ring_params to allow doing that. "Unknown" passed from
the userspace when the header split is supported means the driver is
free to choose the preferred state.
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The transport interface send (TIS) object is responsible for performing
all transport related operations of the transmit side. Messages from
Send Queues get segmented and transmitted by the TIS including all
transport required implications, e.g. in the case of large send offload,
the TIS is responsible for the segmentation.
These are stateless objects and can be used by multiple netdevs (e.g.
representors) who share the same core device.
Providing the TISes as a service from the core layer to the netdev layer
reduces the number of replecated TIS objects (in case of multiple
netdevs), and will ease the transition to netdev with multiple mdevs.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
MPIR register allows to query the PCIe indexes
and Socket-Direct related parameters.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Multiple device caps and features are required to support
single netdev Socket-Direct.
Add them here in preparation for the feature implementation.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Move part of the genphy_c45_pma_read_abilities() code to a separate
function.
Some PHYs do not implement PMA/PMD status 2 register (Register 1.8) but
do implement PMA/PMD extended ability register (Register 1.11). To make
use of it, we need to be able to access this part of code separately.
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Implement the newly added .xmo_rx_vlan_tag() hint function.
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Larysa Zaremba <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
__vlan_hwaccel_get_tag() is used in veth XDP hints implementation,
its return value (-EINVAL if skb is not VLAN tagged) is passed to bpf code,
but XDP hints specification requires drivers to return -ENODATA, if a hint
cannot be provided for a particular packet.
Solve this inconsistency by changing error return value of
__vlan_hwaccel_get_tag() from -EINVAL to -ENODATA, do the same thing to
__vlan_get_tag(), because this function is supposed to follow the same
convention. This, in turn, makes -ENODATA the only non-zero value
vlan_get_tag() can return. We can do this with no side effects, because
none of the users of the 3 above-mentioned functions rely on the exact
value.
Suggested-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Larysa Zaremba <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Parse uid and gid in bpf_parse_param() so that they can be passed in as
the `data` parameter when mount() bpffs. This will be useful when we
want to control which user/group has the control to the mounted bpffs,
otherwise a separate chown() call will be needed.
Signed-off-by: Jie Jiang <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Immutable branch between pdx86 amd wbrf branch and wifi / amdgpu due for the v6.8 merge window
platform-drivers-x86-amd-wbrf-v6.8-1: v6.7-rc1 + AMD WBRF support
for merging into the wifi subsys and amdgpu driver for 6.8.
Signed-off-by: Alex Deucher <[email protected]>
|
|
'fixes.2023.12.13a', 'rcu-tasks.2023.12.12b' and 'srcu.2023.12.13a' into rcu-merge.2023.12.13a
|
|
The brief summary in the docstring for function list_next_or_null_rcu()
states that the function is supposed to provide the "first" member of a
list, whereas in truth it returns the next member.
Change the docstring so it describes what the function actually does.
Signed-off-by: Philipp Stanner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>
|
|
It is claimed that srcu_read_lock_nmisafe() NMI-safe. However it
triggers a lockdep if used from NMI because lockdep expects a deadlock
since nothing disables NMIs while the lock is acquired.
This is because commit f0f44752f5f61 ("rcu: Annotate SRCU's update-side
lockdep dependencies") annotates synchronize_srcu() as a write lock
usage. This helps to detect a deadlocks such as
srcu_read_lock();
synchronize_srcu();
srcu_read_unlock();
The side effect is that the lock srcu_struct now has a USED usage in normal
contexts, so it conflicts with a USED_READ usage in NMI. But this shouldn't
cause a real deadlock because the write lock usage from synchronize_srcu()
is a fake one and only used for read/write deadlock detection.
Use a try-lock annotation for srcu_read_lock_nmisafe() to avoid lockdep
complains if used from NMI.
Fixes: f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Boqun Feng <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>
|
|
There are a few quirks around using lazy wake for poll unconditionally,
and one of them is related the EPOLLEXCLUSIVE. Those may trigger
exclusive wakeups, which wake a limited number of entries in the wait
queue. If that wake number is less than the number of entries someone is
waiting for (and that someone is also using DEFER_TASKRUN), then we can
get stuck waiting for more entries while we should be processing the ones
we already got.
If we're doing exclusive poll waits, flag the request as not being
compatible with lazy wakeups.
Reported-by: Pavel Begunkov <[email protected]>
Fixes: 6ce4a93dbb5b ("io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups")
Signed-off-by: Jens Axboe <[email protected]>
|
|
This function was added with a8df7b1ab70b ("leds: add led_trigger_rename
function") 11 yrs ago, but it has no users. So remove it.
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lee Jones <[email protected]>
|
|
The clear and set pattern is commonly used for accessing PCI config,
move the helper pci_clear_and_set_dword() from aspm.c into PCI header.
In addition, rename to pci_clear_and_set_config_dword() to retain the
"config" information and match the other accessors.
No functional change intended.
Signed-off-by: Shuai Xue <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Tested-by: Ilkka Koskinen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma")
and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the
Vendor ID to linux/pci_ids.h so that it can shared by several drivers
later.
Signed-off-by: Shuai Xue <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]> # pci_ids.h
Tested-by: Ilkka Koskinen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The _store callbacks of the trip point temperature and hysteresis sysfs
attributes invoke thermal_notify_tz_trip_change() to send a notification
regarding the trip point change, but when trip points are updated by the
platform firmware, trip point change notifications are not sent.
To make the behavior after a trip point change more consistent,
modify all of the 3 places where trip point temperature is updated
to use a new function called thermal_zone_set_trip_temp() for this
purpose and make that function call thermal_notify_tz_trip_change().
Note that trip point hysteresis can only be updated via sysfs and
trip_point_hyst_store() calls thermal_notify_tz_trip_change() already,
so this code path need not be changed.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
|
|
There is no in-kernel function to get the status register of a tty device
like the TIOCMGET ioctl returns to userspace. Create a new function,
tty_get_tiocm(), to obtain the status register that other portions of the
kernel can call if they need this information, and move the existing
internal tty_tiocmget() function to use this interface.
Signed-off-by: Florian Eckert <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lee Jones <[email protected]>
|
|
Add 2.5G, 5G and 10G as available speeds to the netdev LED trigger.
Signed-off-by: Daniel Golle <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/99e7d3304c6bba7f4863a4a80764a869855f2085.1701143925.git.daniel@makrotopia.org
Signed-off-by: Lee Jones <[email protected]>
|
|
Mode supported is currently reported to the user exactly the same, as
the current mode. That's because mode changing is not implemented.
Remove the leftover mode_supported() op and use mode_get() to fill up
the supported mode exposed to user.
One, if even, mode changing is going to be introduced, this could be
very easily taken back. In the meantime, prevent drivers form
implementing this in wrong way (as for example recent netdevsim
implementation attempt intended to do).
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since commit 7c41cdcd3bbe ("OPP: Simplify the over-designed pstate <->
level dance"), there is no longer any users of the
pm_genpd_opp_to_performance_state() API. Let's therefore drop it and its
corresponding ->opp_to_performance_state() callback, which also no longer
has any users.
Signed-off-by: Ulf Hansson <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
|
|
In the effort to reduce zombie memcgs [1], it was discovered that the
memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically,
instead of rotating them to the tail of the current generation
(MEMCG_LRU_TAIL) for a second attempt, it moves them to the next
generation (MEMCG_LRU_YOUNG) after the first attempt.
Not applying enough pressure on offlined memcgs can cause them to build
up, and this can be particularly harmful to memory-constrained systems.
On Pixel 8 Pro, launching apps for 50 cycles:
Before After Change
Zombie memcgs 45 35 -22%
[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: T.J. Mercier <[email protected]>
Tested-by: T.J. Mercier <[email protected]>
Cc: Charan Teja Kalla <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jaroslav Pulchart <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru:
try to stop at high watermarks"), it was discovered that the memcg LRU can
breach the thrashing protection imposed by min_ttl_ms.
Before the memcg LRU:
kswapd()
shrink_node_memcgs()
mem_cgroup_iter()
inc_max_seq() // always hit a different memcg
lru_gen_age_node()
mem_cgroup_iter()
check the timestamp of the oldest generation
After the memcg LRU:
kswapd()
shrink_many()
restart:
iterate the memcg LRU:
inc_max_seq() // occasionally hit the same memcg
if raced with lru_gen_rotate_memcg():
goto restart
lru_gen_age_node()
mem_cgroup_iter()
check the timestamp of the oldest generation
Specifically, when the restart happens in shrink_many(), it needs to stick
with the (memcg LRU) generation it began with. In other words, it should
neither re-read memcg_lru->seq nor age an lruvec of a different
generation. Otherwise it can hit the same memcg multiple times without
giving lru_gen_age_node() a chance to check the timestamp of that memcg's
oldest generation (against min_ttl_ms).
[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <[email protected]>
Tested-by: T.J. Mercier <[email protected]>
Cc: Charan Teja Kalla <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jaroslav Pulchart <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Unmapped folios accessed through file descriptors can be underprotected.
Those folios are added to the oldest generation based on:
1. The fact that they are less costly to reclaim (no need to walk the
rmap and flush the TLB) and have less impact on performance (don't
cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
client use cases like Android, its apps parse configuration files
and store the data in heap (anon); for server use cases like MySQL,
it reads from InnoDB files and holds the cached data for tables in
buffer pools (anon).
However, the oldest generation can be very short lived, and if so, it
doesn't provide the PID controller with enough time to respond to a surge
of refaults. (Note that the PID controller uses weighted refaults and
those from evicted generations only take a half of the whole weight.) In
other words, for a short lived generation, the moving average smooths out
the spike quickly.
To fix the problem:
1. For folios that are already on LRU, if they can be beyond the
tracking range of tiers, i.e., five accesses through file
descriptors, move them to the second oldest generation to give them
more time to age. (Note that tiers are used by the PID controller
to statistically determine whether folios accessed multiple times
through file descriptors are worth protecting.)
2. When adding unmapped folios to LRU, adjust the placement of them so
that they are not too close to the tail. The effect of this is
similar to the above.
On Android, launching 55 apps sequentially:
Before After Change
workingset_refault_anon 25641024 25598972 0%
workingset_refault_file 115016834 106178438 -8%
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: Charan Teja Kalla <[email protected]>
Tested-by: Kalesh Singh <[email protected]>
Cc: T.J. Mercier <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Jaroslav Pulchart <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The cleanup tasks of kdamond threads including reset of corresponding
DAMON context's ->kdamond field and decrease of global nr_running_ctxs
counter is supposed to be executed by kdamond_fn(). However, commit
0f91d13366a4 ("mm/damon: simplify stop mechanism") made neither
damon_start() nor damon_stop() ensure the corresponding kdamond has
started the execution of kdamond_fn().
As a result, the cleanup can be skipped if damon_stop() is called fast
enough after the previous damon_start(). Especially the skipped reset
of ->kdamond could cause a use-after-free.
Fix it by waiting for start of kdamond_fn() execution from
damon_start().
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0f91d13366a4 ("mm/damon: simplify stop mechanism")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Jakub Acs <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Jakub Acs <[email protected]>
Cc: <[email protected]> # 5.15.x
Signed-off-by: Andrew Morton <[email protected]>
|