| Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-07-24
1) Generalize devcom implementation to be independent of number of ports
or device's GUID.
2) Save memory on command interface statistics.
3) General code cleanups
* tag 'mlx5-updates-2023-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix
net/mlx5: Make mlx5_eswitch_load/unload_vport() static
net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static
net/mlx5: Remove pointless devlink_rate checks
net/mlx5: Don't check vport->enabled in port ops
net/mlx5e: Make flow classification filters static
net/mlx5e: Remove duplicate code for user flow
net/mlx5: Allocate command stats with xarray
net/mlx5: split mlx5_cmd_init() to probe and reload routines
net/mlx5: Remove redundant cmdif revision check
net/mlx5: Re-organize mlx5_cmd struct
net/mlx5e: E-Switch, Allow devcom initialization on more vports
net/mlx5e: E-Switch, Register devcom device with switch id key
net/mlx5: Devcom, Infrastructure changes
net/mlx5: Use shared code for checking lag is supported
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
accept_ra_min_rtr_lft only considered the lifetime of the default route
and discarded entire RAs accordingly.
This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and
applies the value to individual RA sections; in particular, router
lifetime, PIO preferred lifetime, and RIO lifetime. If any of those
lifetimes are lower than the configured value, the specific RA section
is ignored.
In order for the sysctl to be useful to Android, it should really apply
to all lifetimes in the RA, since that is what determines the minimum
frequency at which RAs must be processed by the kernel. Android uses
hardware offloads to drop RAs for a fraction of the minimum of all
lifetimes present in the RA (some networks have very frequent RAs (5s)
with high lifetimes (2h)). Despite this, we have encountered networks
that set the router lifetime to 30s which results in very frequent CPU
wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the
WiFi firmware) entirely on such networks, it seems better to ignore the
misconfigured routers while still processing RAs from other IPv6 routers
on the same network (i.e. to support IoT applications).
The previous implementation dropped the entire RA based on router
lifetime. This turned out to be hard to expand to the other lifetimes
present in the RA in a consistent manner; dropping the entire RA based
on RIO/PIO lifetimes would essentially require parsing the whole thing
twice.
Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft")
Cc: Lorenzo Colitti <[email protected]>
Signed-off-by: Patrick Rohr <[email protected]>
Reviewed-by: Maciej Żenczykowski <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Reap the benefits of easier iteration thanks to the xarray.
Convert just the genetlink ones, those are easier to test.
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Iterating over the netdev hash table for netlink dumps is hard.
Dumps are done in "chunks" so we need to save the position
after each chunk, so we know where to restart from. Because
netdevs are stored in a hash table we remember which bucket
we were in and how many devices we dumped.
Since we don't hold any locks across the "chunks" - devices may
come and go while we're dumping. If that happens we may miss
a device (if device is deleted from the bucket we were in).
We indicate to user space that this may have happened by setting
NLM_F_DUMP_INTR. User space is supposed to dump again (I think)
if it sees that. Somehow I doubt most user space gets this right..
To illustrate let's look at an example:
System state:
start: # [A, B, C]
del: B # [A, C]
with the hash table we may dump [A, B], missing C completely even
tho it existed both before and after the "del B".
Add an xarray and use it to allocate ifindexes. This way we
can iterate ifindexes in order, without the worry that we'll
skip one. We may still generate a dump of a state which "never
existed", for example for a set of values and sequence of ops:
System state:
start: # [A, B]
add: C # [A, C, B]
del: B # [A, C]
we may generate a dump of [A], if C got an index between A and B.
System has never been in such state. But I'm 90% sure that's perfectly
fine, important part is that we can't _miss_ devices which exist before
and after. User space which wants to mirror kernel's state subscribes
to notifications and does periodic dumps so it will know that C exists
from the notification about its creation or from the next dump
(next dump is _guaranteed_ to include C, if it doesn't get removed).
To avoid any perf regressions keep the hash table for now. Most
net namespaces have very few devices and microbenchmarking 1M lookups
on Skylake I get the following results (not counting loopback
to number of devs):
#devs | hash | xa | delta
2 | 18.3 | 20.1 | + 9.8%
16 | 18.3 | 20.1 | + 9.5%
64 | 18.3 | 26.3 | +43.8%
128 | 20.4 | 26.3 | +28.6%
256 | 20.0 | 26.4 | +32.1%
1024 | 26.6 | 26.7 | + 0.2%
8192 |541.3 | 33.5 | -93.8%
No surprises since the hash table has 256 entries.
The microbenchmark scans indexes in order, if the pattern is more
random xa starts to win at 512 devices already. But that's a lot
of devices, in practice.
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for a performance problem in QubesOS, adding a way to drain the
queue of grants experiencing delayed unmaps faster
- A patch enabling the use of static event channels from user mode,
which was omitted when introducing supporting static event channels
- A fix for a problem where Xen related code didn't check properly for
running in a Xen environment, resulting in a WARN splat
* tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: speed up grant-table reclaim
xen/evtchn: Introduce new IOCTL to bind static evtchn
xenbus: check xen_domain in xenbus_probe_initcall
|
|
Pull block fixes from Jens Axboe:
"A few fixes that should go into the current kernel release, mainly:
- Set of fixes for dasd (Stefan)
- Handle interruptible waits returning because of a signal for ublk
(Ming)"
* tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux:
ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV
ublk: fail to recover device if queue setup is interrupted
ublk: fail to start device if queue setup is interrupted
block: Fix a source code comment in include/uapi/linux/blkzoned.h
s390/dasd: print copy pair message only for the correct error
s390/dasd: fix hanging device after request requeue
s390/dasd: use correct number of retries for ERP requests
s390/dasd: fix hanging device after quiesce/resume
|
|
Pull drm fixes from Dave Airlie:
"Regular scheduled fixes, msm and amdgpu leading the way, with some
i915 and a single misc fbdev, all seems fine.
fbdev:
- remove unused function
amdgpu:
- gfxhub partition fix
- Fix error handling in psp_sw_init()
- SMU13 fix
- DCN 3.1 fix
- DCN 3.2 fix
- Fix for display PHY programming sequence
- DP MST error handling fix
- GFX 9.4.3 fix
amdkfd:
- GFX11 trap handling fix
i915:
- Use shmem for dpt objects
- Fix an error handling path in igt_write_huge()
msm:
- display:
- Fix to correct the UBWC programming for decoder version 4.3 seen
on SM8550
- Add the missing flush and fetch bits for DMA4 and DMA5 SSPPs.
- Fix to drop the unused dpu_core_perf_data_bus_id enum from the
code
- Drop the unused dsi_phy_14nm_17mA_regulators from QCM 2290 DSI
cfg.
- gpu:
- Fix warn splat for newer devices without revn
- Remove name/revn for a690.. we shouldn't be populating these for
newer devices, for consistency, but it slipped through review
- Fix a6xx gpu snapshot BINDLESS_DATA size (was listed in bytes
instead of dwords, causing AHB faults on a6xx gen4/a660-family)
- Disallow submit with fence id 0"
* tag 'drm-fixes-2023-07-28' of git://anongit.freedesktop.org/drm/drm: (22 commits)
drm/msm: Disallow submit with fence id 0
drm/amdgpu: Restore HQD persistent state register
drm/amd/display: Unlock on error path in dm_handle_mst_sideband_msg_ready_event()
drm/amd/display: Exit idle optimizations before attempt to access PHY
drm/amd/display: Don't apply FIFO resync W/A if rdivider = 0
drm/amd/display: Guard DCN31 PHYD32CLK logic against chip family
drm/amd/smu: use AverageGfxclkFrequency* to replace previous GFX Curr Clock
drm/amd: Fix an error handling mistake in psp_sw_init()
drm/amdgpu: Fix infinite loop in gfxhub_v1_2_xcc_gart_enable (v2)
drm/amdkfd: fix trap handling work around for debugging
drm/fb-helper: Remove unused inline function drm_fb_helper_defio_init()
drm/i915: Fix an error handling path in igt_write_huge()
drm/i915/dpt: Use shmem for dpt objects
drm/msm: Fix hw_fence error path cleanup
drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb()
drm/msm/adreno: Fix snapshot BINDLESS_DATA size
drm/msm/a690: Remove revn and name
drm/msm/adreno: Fix warn splat for devices without revn
drm/msm/dsi: Drop unused regulators from QCM2290 14nm DSI PHY config
drm/msm/dpu: drop enum dpu_core_perf_data_bus_id
...
|
|
Also add support to pass topdir to ynl-regen.sh (Jakub) and call
it from the makefile to update the UAPI headers.
Signed-off-by: Stanislav Fomichev <[email protected]>
Co-developed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Now both the physical path and the emulated path can support an IO page
table replacement. Call iommufd_device_replace/iommufd_access_replace(),
when vdev->iommufd_attached is true.
Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header.
Link: https://lore.kernel.org/r/b5f01956ff161f76aa52c95b0fa1ad6eaca95c4a.1690523699.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Alex Williamson <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Taking advantage of the new iommufd_access_change_ioas_id helper, add an
iommufd_access_replace() API for the VFIO emulated pathway to use.
Link: https://lore.kernel.org/r/a3267b924fd5f45e0d3a1dd13a9237e923563862.1690523699.git.nicolinc@nvidia.com
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Add an inversed variant of str_read_write(), i.e. str_write_read().
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Benjamin Tissoires <[email protected]>
|
|
If the dev is known (e.g. a devm-based sys_off_handler is used), it can
be passed to the handler's callback to have it available there.
Otherwise, cb_data might be set to the dev in most of the cases.
Reviewed-by: Dmitry Osipenko <[email protected]>
Signed-off-by: Benjamin Bara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Lee Jones <[email protected]>
|
|
Add extack info for IPv6 address add/delete, which would be useful for
users to understand the problem without having to read kernel code.
Suggested-by: Beniamino Galvani <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Support matching V4L2 async sub-devices based on particular fwnode
endpoint. This makes it possible to instantiate multiple V4L2 sub-devices
based on given fwnode endpoints from a single device, based on driver
needs.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
When the v4l2-async framework was introduced, the use case for it was to
connect a camera sensor with a parallel receiver. Both tended to be rather
simple devices with a single connection between them.
The framework has been since improved in multiple ways but there are
limitations that have remained, for instance the assumption an async
sub-device is connected towards a single notifier and via a single link
only.
This patch enables connecting a sub-device to one or more notifiers
simultaneously, with one or more connections per notifier. The notifier
information is moved from the sub-device to the connection and the
connections in sub-device are no longer a pointer but a linked list.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Add v4l2_async_connection_unique() function for obtaining a struct
v4l2_async_connection, typically allocated by drivers together with their
own information on an external sub-device.
The relation between connections and sub-devices still remains 1:1 but
this code becomes more complex when the relation soon changes.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
This patch re-arranges internal V4L2 async lists for preparation of
supporting multiple connections per sub-device as well as cleaning up used
lists.
The list of unbound V4L2 sub-devices shall be maintained for the purpose of
listing those sub-devices only, not for their bindin status. Also, the V4L2
async connections now have, instead of two list entries, a single list
entry in the notifier's list, be that either waiting or done lists, while
the notifier's asc_list is removed.
The one-to-one relation between a sub-device and a connection is still
maintained in this patch.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Rename v4l2_async_subdev as v4l2_async_connection, in order to
differentiate between the sub-devices and their connections: one
sub-device can have many connections but the V4L2 async framework has so
far allowed just a single one. Connections in this context will later
translate into either MC ancillary or data links.
This patch prepares changing that relation by changing existing users of
v4l2_async_subdev to switch to v4l2_async_connection. Async sub-devices
themselves will not be needed anymore
Additionally, __v4l2_async_nf_add_subdev() has been renamed
__v4l2_async_nf_add_connection().
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
The naming of list heads and list entries is confusing as they're named
similarly. Use _list for list head and _entry for list entries.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
The async match type is a struct field now, rename V4L2_ASYNC_MATCH_*
macros as V4L2_ASYNC_MATCH_TYPE_* instead.
This patch has been produced by:
git grep -l V4L2_ASYNC_MATCH_ -- drivers/media/ drivers/staging/media/ \
include/ Documentation/|xargs perl -i -pe \
's/V4L2_ASYNC_MATCH_\K/TYPE_/g'
so it must be correct.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Reviewed-by: Laurent Pinchart <[email protected]>
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Make V4L2 async match information a struct, making it easier to use it
elsewhere outside the scope of struct v4l2_async_subdev.
Also remove an obsolete comment --- none of these fields are supposed to
be touched by drivers.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Reviewed-by: Laurent Pinchart <[email protected]>
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Remove an unneeded declaration for struct fwnode_handle.
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
The v4l2_async_nf_parse_fwnode_endpoints() function, part of
v4l2-fwnode.c, was a helper meant to register one async sub-dev for each
fwnode endpoint of a device.
The function is marked as deprecated in the documentation and is actually
not used anywhere anymore. Drop it and remove the helper function
v4l2_async_nf_fwnode_parse_endpoint() from v4l2-fwnode.c.
This change allows to make the helper function
__v4l2_async_nf_add_connection() visibility private to v4l2-async.c so
that there is no risk drivers can mistakenly use it.
[Sakari Ailus: Small fixups on top.]
Signed-off-by: Jacopo Mondi <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Tested-by: Philipp Zabel <[email protected]> # imx6qp
Tested-by: Niklas Söderlund <[email protected]> # rcar + adv746x
Reviewed-by: Laurent Pinchart <[email protected]>
Tested-by: Aishwarya Kothari <[email protected]> # Apalis i.MX6Q with TC358743
Tested-by: Lad Prabhakar <[email protected]> # Renesas RZ/G2L SMARC
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
Some architectures allow partial SMT states at boot time, ie. when not all
SMT threads are brought online.
To support that the SMT code needs to know the maximum number of SMT
threads, and also the currently configured number.
The architecture code knows the max number of threads, so have the
architecture code pass that value to cpu_smt_set_num_threads(). Note that
although topology_max_smt_threads() exists, it is not configured early
enough to be used here. As architecture, like PowerPC, allows the threads
number to be set through the kernel command line, also pass that value.
[ ldufour: Slightly reword the commit message ]
[ ldufour: Rename cpu_smt_check_topology and add a num_threads argument ]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In order to export the cpuhp_smt_control enum as part of the interface
between generic and architecture code, the architecture code needs to
include asm/topology.h.
But that leads to circular header dependencies. So split the enum and
related declarations into a separate header.
[ ldufour: Reworded the commit's description ]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add can_rx_offload_get_echo_skb_queue_tail(). This function addds the
echo skb at the end of rx-offload the queue. This is intended for
devices without timestamp support.
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
can_rx_offload_get_echo_skb_queue_timestamp()
Rename the rx_offload_get_echo_skb() function to
can_rx_offload_get_echo_skb_queue_timestamp(), since it inserts the
echo skb into the rx-offload queue sorted by timestamp.
This is a preparation for adding
can_rx_offload_get_echo_skb_queue_tail(), which adds the echo skb to
the end of the queue. This is intended for devices that do not support
timestamps.
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
The priv variable is _always_ of type (struct stmmac_priv *), so let's
stop using (void *) since it isn't abstracting anything.
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 3a99f121fe0b ("firmware: qcom: scm: Introduce pas_metadata
context") left out the `extern` specifier for the API it introduced, so
add it.
Signed-off-by: Guru Das Srinagesh <[email protected]>
Link: https://lore.kernel.org/r/bce25c8e215f7cfc7b0780d6965d09f5efe1cc5f.1690503893.git.quic_gurus@quicinc.com
Signed-off-by: Bjorn Andersson <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter updates for net-next
1. silence a harmless warning for CONFIG_NF_CONNTRACK_PROCFS=n builds,
from Zhu Wang.
2, 3:
Allow NLA_POLICY_MASK to be used with BE16/BE32 types, and replace a few
manual checks with nla_policy based one in nf_tables, from myself.
4: cleanup in ctnetlink to validate while parsing rather than
using two steps, from Lin Ma.
5: refactor boyer-moore textsearch by moving a small chunk to
a helper function, rom Jeremy Sowden.
* tag 'nf-next-23-07-27' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
lib/ts_bm: add helper to reduce indentation and improve readability
netfilter: conntrack: validate cta_ip via parsing
netfilter: nf_tables: use NLA_POLICY_MASK to test for valid flag options
netlink: allow be16 and be32 types in all uint policy checks
nf_conntrack: fix -Wunused-const-variable=
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Typical misuse of
nla_parse_nested(array, XXX_MAX, ...);
array must be declared as
struct nlattr *array[XXX_MAX + 1];
v2: Based on feedbacks from Ido Schimmel and Zahari Doychev,
I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
definitions.
syzbot reported:
BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014
CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0x163/0x540 mm/kasan/report.c:475
kasan_report+0x175/0x1b0 mm/kasan/report.c:588
kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187
__asan_memset+0x23/0x40 mm/kasan/shadow.c:84
__nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
__nla_parse+0x40/0x50 lib/nlattr.c:700
nla_parse_nested include/net/netlink.h:1262 [inline]
fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718
fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884
fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666
tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline]
tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068
rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365
netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f54c6150759
Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0
R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001
</TASK>
The buggy address belongs to stack of task syz-executor296/5014
and is located at offset 32 in frame:
fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374
This frame has 1 object:
[32, 56) 'nla_cfm_opt'
The buggy address belongs to the virtual mapping at
[ffffc90003a08000, ffffc90003a11000) created by:
copy_process+0x5c8/0x4290 kernel/fork.c:2330
Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Simon Horman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Zahari Doychev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Handle new insns properly in bpf_jit_blind_insn() function.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A single patch to remove an unused function.
Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/dqvxednqyab5t7gvwvcq72x6yu7ug5gusmhpgs3kq6z7pf3co6@ofr6s7547gbe
|
|
These declarations is not used after ipx protocol removed.
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is not used, so can remove it.
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, netfilter.
Current release - regressions:
- core: fix splice_to_socket() for O_NONBLOCK socket
- af_unix: fix fortify_panic() in unix_bind_bsd().
- can: raw: fix lockdep issue in raw_release()
Previous releases - regressions:
- tcp: reduce chance of collisions in inet6_hashfn().
- netfilter: skip immediate deactivate in _PREPARE_ERROR
- tipc: stop tipc crypto on failure in tipc_node_create
- eth: igc: fix kernel panic during ndo_tx_timeout callback
- eth: iavf: fix potential deadlock on allocation failure
Previous releases - always broken:
- ipv6: fix bug where deleting a mngtmpaddr can create a new
temporary address
- eth: ice: fix memory management in ice_ethtool_fdir.c
- eth: hns3: fix the imp capability bit cannot exceed 32 bits issue
- eth: vxlan: calculate correct header length for GPE
- eth: stmmac: apply redundant write work around on 4.xx too"
* tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
tipc: stop tipc crypto on failure in tipc_node_create
af_unix: Terminate sun_path when bind()ing pathname socket.
tipc: check return value of pskb_trim()
benet: fix return value check in be_lancer_xmit_workarounds()
virtio-net: fix race between set queues and probe
net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
splice, net: Fix splice_to_socket() for O_NONBLOCK socket
net: fec: tx processing does not call XDP APIs if budget is 0
mptcp: more accurate NL event generation
selftests: mptcp: join: only check for ip6tables if needed
tools: ynl-gen: fix parse multi-attr enum attribute
tools: ynl-gen: fix enum index in _decode_enum(..)
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
igc: Fix Kernel Panic during ndo_tx_timeout callback
net: dsa: qca8k: fix mdb add/del case with 0 VID
net: dsa: qca8k: fix broken search_and_del
net: dsa: qca8k: fix search_and_insert wrong handling of new rule
net: dsa: qca8k: enable use_single_write for qca8xxx
...
|
|
Command stats is an array with more than 2K entries, which amounts to
~180KB. This is way more than actually needed, as only ~190 entries
are being used.
Therefore, replace the array with xarray.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Downstream patch will split mlx5_cmd_init() to probe and reload
routines. As a preparation, organize mlx5_cmd struct so that any
field that will be used in the reload routine are grouped at new
nested struct.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When seq_show_option_n() is used, it is for non-string memory that
happens to be printable bytes. As such, we must use memcpy() to copy the
bytes and then explicitly NUL-terminate the result.
Cc: Andy Shevchenko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Muchun Song <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
gcc gets confused when -ftrivial-auto-var-init=pattern is used on sparse
bit fields such as 'struct spi_mem_op', which caused the previous false
positive warning about an uninitialized variable:
drivers/mtd/spi-nor/spansion.c: error: 'op' is used uninitialized [-Werror=uninitialized]
In fact, the variable is fully initialized and gcc does not see it being
used, so the warning is entirely bogus. The problem appears to be
a misoptimization in the initialization of single bit fields when the
rest of the bytes are not initialized.
A previous workaround added another initialization, which ended up
shutting up the warning in spansion.c, though it apparently still happens
in other files as reported by Peter Foley in the gcc bugzilla. The
workaround of adding a fake initialization seems particularly bad
because it would set values that can never be correct but prevent the
compiler from warning about actually missing initializations.
Revert the broken workaround and instead pad the structure to only
have bitfields that add up to full bytes, which should avoid this
behavior in all drivers.
I also filed a new bug against gcc with what I found, so this can
hopefully be addressed in future gcc releases. At the moment, only
gcc-12 and gcc-13 are affected.
Cc: Peter Foley <[email protected]>
Cc: Pedro Falcato <[email protected]>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110743
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108402
Link: https://godbolt.org/z/efMMsG1Kx
Fixes: 420c4495b5e56 ("mtd: spi-nor: spansion: make sure local struct does not contain garbage")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Mark Brown <[email protected]>
Acked-by: Tudor Ambarus <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into arm/fixes
Memory controller drivers - fixes for v6.5
Two fixes are needed for Tegra194 memory controllers caused by the same
Tegra PCI commit merged in v6.5-rc1. The Tegra PCI requires now
interconnect from the memory controller, which was set only for
Tegra234, but not for Tegra194, causing probe deferrals. Expose some
dummy interconnect provider for Tegra194, to satisfy PCI driver needs.
* tag 'memory-controller-drv-fixes-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
memory: tegra: make icc_set_bw return zero if BWMGR not supported
memory: tegra: Add dummy implementation on Tegra194
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
The corgi_lcd_limit_intensity() function is called from platform
and defined in a driver, but the driver does not see the declaration:
drivers/video/backlight/corgi_lcd.c:434:6: error: no previous prototype for 'corgi_lcd_limit_intensity' [-Werror=missing-prototypes]
434 | void corgi_lcd_limit_intensity(int limit)
Move the prototype into a header that can be included from both
sides to shut up the warning.
Reviewed-by: Daniel Thompson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
__NLA_IS_BEINT_TYPE(tp) isn't useful. NLA_BE16/32 are identical to
NLA_U16/32, the only difference is that it tells the netlink validation
functions that byteorder conversion might be needed before comparing
the value to the policy min/max ones.
After this change all policy macros that can be used with UINT types,
such as NLA_POLICY_MASK() can also be used with NLA_BE16/32.
This will be used to validate nf_tables flag attributes which
are in bigendian byte order.
Signed-off-by: Florian Westphal <[email protected]>
|
|
This registers the new fchmodat2 syscall in most places as nuber 452,
with alpha being the exception where it's 562. I found all these sites
by grepping for fspick, which I assume has found me everything.
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Message-Id: <a677d521f048e4ca439e7080a5328f21eb8e960e.1689092120.git.legion@kernel.org>
Signed-off-by: Christian Brauner <[email protected]>
|
|
On the userspace side fchmodat(3) is implemented as a wrapper
function which implements the POSIX-specified interface. This
interface differs from the underlying kernel system call, which does not
have a flags argument. Most implementations require procfs [1][2].
There doesn't appear to be a good userspace workaround for this issue
but the implementation in the kernel is pretty straight-forward.
The new fchmodat2() syscall allows to pass the AT_SYMLINK_NOFOLLOW flag,
unlike existing fchmodat.
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35
[2] https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Message-Id: <f2a846ef495943c5d101011eebcf01179d0c7b61.1689092120.git.legion@kernel.org>
[brauner: pre reviews, do flag conversion in do_fchmodat() directly]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
|