| Age | Commit message (Collapse) | Author | Files | Lines |
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ff_device.
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
Fix spelling of "Structure".
Fix multiple kernel-doc warnings:
clk-provider.h:269: warning: Function parameter or member 'recalc_rate' not described in 'clk_ops'
clk-provider.h:468: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_with_accuracy_parent_data'
clk-provider.h:468: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_with_accuracy_parent_data'
clk-provider.h:482: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_parent_accuracy'
clk-provider.h:482: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_parent_accuracy'
clk-provider.h:687: warning: Function parameter or member 'flags' not described in 'clk_divider'
clk-provider.h:1164: warning: Function parameter or member 'flags' not described in 'clk_fractional_divider'
clk-provider.h:1164: warning: Function parameter or member 'approximation' not described in 'clk_fractional_divider'
clk-provider.h:1213: warning: Function parameter or member 'flags' not described in 'clk_multiplier'
Fixes: 9fba738a53dd ("clk: add duty cycle support")
Fixes: b2476490ef11 ("clk: introduce the common clock framework")
Fixes: 2d34f09e79c9 ("clk: fixed-rate: Add support for specifying parents via DT/pointers")
Fixes: f5290d8e4f0c ("clk: asm9260: use parent index to link the reference clock")
Fixes: 9d9f78ed9af0 ("clk: basic clock hardware types")
Fixes: e2d0e90fae82 ("clk: new basic clk type for fractional divider")
Fixes: f2e0a53271a4 ("clk: Add a basic multiplier clock")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Michael Turquette <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>
|
|
Check cpu_mitigations_off() first to avoid calling capable() if it is off.
This can avoid unnecessary audit log.
Fixes: bc5bc309db45 ("bpf: Inherit system settings for CPU security mitigations")
Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/CAEf4Bza6UVUWqcWQ-66weZ-nMDr+TFU3Mtq=dumZFD-pSqU7Ow@mail.gmail.com/
Link: https://lore.kernel.org/bpf/[email protected]
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
This PR is collected from
https://lore.kernel.org/all/[email protected]
This series from Patrisious extends mlx5 to support IPsec packet offload
in multiport devices (MPV, see [1] for more details).
These devices have single flow steering logic and two netdev interfaces,
which require extra logic to manage IPsec configurations as they performed
on netdevs.
[1] https://lore.kernel.org/linux-rdma/[email protected]/
* 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Handle IPsec steering upon master unbind/bind
net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic
net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic
net/mlx5: Add create alias flow table function to ipsec roce
net/mlx5: Implement alias object allow and create functions
net/mlx5: Add alias flow table bits
net/mlx5: Store devcom pointer inside IPsec RoCE
net/mlx5: Register mlx5e priv to devcom in MPV mode
RDMA/mlx5: Send events from IB driver about device affiliation state
net/mlx5: Introduce ifc bits for migration in a chunk mode
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.
A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.
These first part of the system call performs two increments that do not race.
do_syscall_64()
__do_sys_io_uring_enter()
io_submit_sqes()
io_openat_prep()
__io_openat_prep()
getname()
getname_flags() /* update 1 (increment) */
__audit_getname() /* update 2 (increment) */
The openat op is queued to an io_uring worker thread which starts the
opportunity for a race. The system call exit performs one decrement.
do_syscall_64()
syscall_exit_to_user_mode()
syscall_exit_to_user_mode_prepare()
__audit_syscall_exit()
audit_reset_context()
putname() /* update 3 (decrement) */
The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.
io_wqe_worker()
io_worker_handle_work()
io_wq_submit_work()
io_issue_sqe()
io_openat()
io_openat2()
do_filp_open()
path_openat()
__audit_inode() /* update 4 (increment) */
putname() /* update 5 (decrement) */
__audit_uring_exit()
audit_reset_context()
putname() /* update 6 (decrement) */
The fix is to change the refcnt member of struct audit_names
from int to atomic_t.
kernel BUG at fs/namei.c:262!
Call Trace:
...
? putname+0x68/0x70
audit_reset_context.part.0.constprop.0+0xe1/0x300
__audit_uring_exit+0xda/0x1c0
io_issue_sqe+0x1f3/0x450
? lock_timer_base+0x3b/0xd0
io_wq_submit_work+0x8d/0x2b0
? __try_to_del_timer_sync+0x67/0xa0
io_worker_handle_work+0x17c/0x2b0
io_wqe_worker+0x10a/0x350
Cc: [email protected]
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash <[email protected]>
Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
The merge commit 92716869375b ("Merge branch 'br-flush-filtering'")
added support for FDB flushing in bridge driver. The following patches
will extend VXLAN driver to support FDB flushing as well. The netlink
message for bulk delete is shared between the drivers. With the existing
implementation, there is no way to prevent user from flushing with
attributes that are not supported per driver. For example, when VNI will
be added, user will not get an error for flush FDB entries in bridge
with VNI, although this attribute is not relevant for bridge.
As preparation for support of FDB flush in VXLAN driver, move the policy
to be handled in bridge driver, later a new policy for VXLAN will be
added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(),
as this field is relevant only for bridge.
Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The semantic of chip_data is a bit surprising as it's cleared when
pwm_put() is called. Also there is a big overlap with the standard driver
data.
All drivers were adapted to not make use of chip_data any more, so it can
go away.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Instead of requiring each driver to care for assigning the owner member
of struct pwm_ops, handle that implicitly using a macro. Note that the
owner member has to be moved to struct pwm_chip, as the ops structure
usually lives in read-only memory and so cannot be modified.
The upside is that new low level drivers cannot forget the assignment and
save one line each. The pwm-crc driver didn't assign .owner, that's not
a problem in practice though as the driver cannot be compiled as a
module.
Acked-by: Andy Shevchenko <[email protected]> # Intel LPSS
Reviewed-by: Florian Fainelli <[email protected]> # pwm-{bcm,brcm}*.c
Acked-by: Jernej Skrabec <[email protected]> # sun4i
Acked-by: Andi Shyti <[email protected]>
Acked-by: Nobuhiro Iwamatsu <[email protected]> # pwm-visconti
Acked-by: Heiko Stuebner <[email protected]> # pwm-rockchip
Acked-by: Michael Walle <[email protected]> # pwm-sl28cpld
Acked-by: Neil Armstrong <[email protected]> # pwm-meson
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
Let's start adding getters for the opaque struct gpio_device. Start with
a function allowing to retrieve the base GPIO number.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Linus Walleij <[email protected]>
|
|
Accessing struct gpio_chip backing a GPIO device is only allowed for the
actual providers of that chip.
Similarly to how we introduced gpio_device_find() in order to replace
the abused gpiochip_find(), let's introduce a counterpart to
gpiod_to_chip() that returns a reference to the GPIO device owning the
descriptor. This is done in order to later remove gpiod_to_chip()
entirely.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Peter Rosin <[email protected]>
Acked-by: Linus Walleij <[email protected]>
|
|
There are users in the kernel who need to retrieve the address of the
struct device backing the GPIO device. Currently they needlessly poke in
the internals of GPIOLIB. Add a dedicated getter function.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Peter Rosin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Linus Walleij <[email protected]>
|
|
Typo trasmitters -> transmitters.
Signed-off-by: Mika Westerberg <[email protected]>
Signed-off-by: Gil Fine <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, the core is msm and amdgpu with some scattered fixes
across vmwgfx, panel and the core stuff.
atomic-helper:
- Relax checks for unregistered connectors
dma-buf:
- Work around race condition when retrieving fence timestamp
gem:
- Avoid OOB access in BO memory range
panel:
- boe-tv101wun-ml6: Fix flickering
simpledrm:
- Fix error output
vwmgfx:
- Fix size calculation in texture-state code
- Ref GEM BOs in surfaces
msm:
- PHY/link training reset fix
- msm8998 - correct highest bank bit
- skip video mode if timing engine disabled
- check irq_of_parse_and_map return code
- add new lines to some prints
- fail atomic check for max mdp clk test
amdgpu:
- Seamless boot fix
- Fix TTM BO resource check
- SI fix for doorbell handling"
* tag 'drm-fixes-2023-10-13' of git://anongit.freedesktop.org/drm/drm:
drm/tiny: correctly print `struct resource *` on error
drm: Do not overrun array in drm_gem_get_pages()
drm/atomic-helper: relax unregistered connector check
drm/panel: boe-tv101wum-nl6: Completely pull GPW to VGL before TP term
drm/amdgpu: fix SI failure due to doorbells allocation
drm/amdgpu: add missing NULL check
drm/amd/display: Don't set dpms_off for seamless boot
drm/vmwgfx: Keep a gem reference to user bos in surfaces
drm/vmwgfx: fix typo of sizeof argument
drm/msm/dpu: fail dpu_plane_atomic_check() based on mdp clk limits
dma-buf: add dma_fence_timestamp helper
drm/msm/dp: Add newlines to debug printks
drm/msm/dpu: change _dpu_plane_calc_bw() to use u64 to avoid overflow
drm/msm/dsi: fix irq_of_parse_and_map() error checking
drm/msm/dsi: skip the wait for video mode done if not applicable
drm/msm/mdss: fix highest-bank-bit for msm8998
drm/msm/dp: do not reinitialize phy unless retry during link training
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
* atomic-helper: Relax checks for unregistered connectors
* dma-buf: Work around race condition when retrieving fence timestamp
* gem: Avoid OOB access in BO memory range
* panel:
* boe-tv101wun-ml6: Fix flickering
* simpledrm: Fix error output
* vwmgfx:
* Fix size calculation in texture-state code
* Ref GEM BOs in surfaces
Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20231012111638.GA25037@linux-uq9g
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- In cgroup1, the `tasks` file could have duplicate pids which can
trigger a warning in seq_file. Fix it by removing duplicate items
after sorting
- Comment update
* tag 'cgroup-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Fix incorrect css_set_rwsem reference in comment
cgroup: Remove duplicates in cgroup v1 tasks file
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
kernel/bpf/verifier.c
829955981c55 ("bpf: Fix verifier log for async callback return values")
a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw")
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN and BPF.
We have a regression in TC currently under investigation, otherwise
the things that stand off most are probably the TCP and AF_PACKET
fixes, with both issues coming from 6.5.
Previous releases - regressions:
- af_packet: fix fortified memcpy() without flex array.
- tcp: fix crashes trying to free half-baked MTU probes
- xdp: fix zero-size allocation warning in xskq_create()
- can: sja1000: always restart the tx queue after an overrun
- eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp
- eth: nfp: avoid rmmod nfp crash issues
- eth: octeontx2-pf: fix page pool frag allocation warning
Previous releases - always broken:
- mctp: perform route lookups under a RCU read-side lock
- bpf: s390: fix clobbering the caller's backchain in the trampoline
- phy: lynx-28g: cancel the CDR check work item on the remove path
- dsa: qca8k: fix qca8k driver for Turris 1.x
- eth: ravb: fix use-after-free issue in ravb_tx_timeout_work()
- eth: ixgbe: fix crash with empty VF macvlan list"
* tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
rswitch: Fix imbalance phy_power_off() calling
rswitch: Fix renesas_eth_sw_remove() implementation
octeontx2-pf: Fix page pool frag allocation warning
nfc: nci: assert requested protocol is valid
af_packet: Fix fortified memcpy() without flex array.
net: tcp: fix crashes trying to free half-baked MTU probes
net/smc: Fix pos miscalculation in statistics
nfp: flower: avoid rmmod nfp crash issues
net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read
ethtool: Fix mod state of verbose no_mask bitset
net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()
mctp: perform route lookups under a RCU read-side lock
net: skbuff: fix kernel-doc typos
s390/bpf: Fix unwinding past the trampoline
s390/bpf: Fix clobbering the caller's backchain in the trampoline
net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestamp
net/smc: Fix dependency of SMC on ISM
ixgbe: fix crash with empty VF macvlan list
net/mlx5e: macsec: use update_pn flag instead of PN comparation
net: phy: mscc: macsec: reject PN update requests
...
|
|
This simplifies the macro and makes it easy to add the new seqprop's
with 2 or more args.
Plus this way we do not lose the type info, the (void*) type cast is
no longer needed.
And the latter reveals the problem: a lot of seqcount_t helpers pass
the "const seqcount_t *s" argument to __seqprop_ptr(seqcount_t *s)
but (before this patch) "(void *)(s)" masked the problem.
So this patch changes __seqprop_ptr() and __seqprop_##lockname##_ptr()
to accept the "const LOCKNAME *s" argument. This is not nice either,
they need to drop the constness on return because these helpers are used
by both the readers and writers, but at least it is clear what's going on.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
1. Kill the "lockmember" argument. It is always s->lock plus
__seqprop_##lockname##_sequence() already uses s->lock and
ignores "lockmember".
2. Kill the "lock_acquire" argument. __seqprop_##lockname##_sequence()
can use the same "lockbase" prefix for _lock and _unlock.
Apart from line numbers, gcc -E outputs the same code.
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Namhyung reported that bd2756811766 ("perf: Rewrite core context handling")
regresses context switch overhead when perf-cgroup is in use together
with 'slow' PMUs like uncore.
Specifically, perf_cgroup_switch()'s perf_ctx_disable() /
ctx_sched_out() etc.. all iterate the full list of active PMUs for
that CPU, even if they don't have cgroup events.
Previously there was cgrp_cpuctx_list which linked the relevant PMUs
together, but that got lost in the rework. Instead of re-instruducing
a similar list, let the perf_event_pmu_context iteration skip those
that do not have cgroup events. This avoids growing multiple versions
of the perf_event_pmu_context iteration.
Measured performance (on a slightly different patch):
Before)
$ taskset -c 0 ./perf bench sched pipe -l 10000 -G AAA,BBB
# Running 'sched/pipe' benchmark:
# Executed 10000 pipe operations between two processes
Total time: 0.901 [sec]
90.128700 usecs/op
11095 ops/sec
After)
$ taskset -c 0 ./perf bench sched pipe -l 10000 -G AAA,BBB
# Running 'sched/pipe' benchmark:
# Executed 10000 pipe operations between two processes
Total time: 0.065 [sec]
6.560100 usecs/op
152436 ops/sec
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Namhyung Kim <[email protected]>
Debugged-by: Namhyung Kim <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Since commit f0d9a5f17575 ("cgroup: make css_set_rwsem a spinlock
and rename it to css_set_lock"), css_set_rwsem has been replaced by
css_set_lock. That commit, however, missed the css_set_rwsem reference
in include/linux/cgroup-defs.h. Fix that by changing it to css_set_lock
as well.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(),
so filesystems could use it with a custom option separator callback.
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
|
|
One of the ways of looking up GPIO devices is using their fwnode.
Provide a helper for that to avoid every user implementing their
own matching function.
Reviewed-by: Dipen Patel <[email protected]>
Tested-by: Dipen Patel <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
Correct spelling of "beginning".
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: [email protected]
Reviewed-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Now that napi_schedule return a bool, we can drop napi_reschedule that
does the same exact function. The function comes from a very old commit
bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct
net_device") and the purpose is actually deprecated in favour of
different logic.
Convert every user of napi_reschedule to napi_schedule.
Signed-off-by: Christian Marangi <[email protected]>
Acked-by: Jeff Johnson <[email protected]> # ath10k
Acked-by: Nick Child <[email protected]> # ibm
Acked-by: Marc Kleine-Budde <[email protected]> # for can/dev/rx-offload.c
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Change napi_schedule to return a bool on NAPI successful schedule.
This might be useful for some driver to do additional steps after a
NAPI has been scheduled.
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Christian Marangi <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
These hooks allows intercepting connect(), getsockname(),
getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
socket hooks get write access to the address length because the
address length is not fixed when dealing with unix sockets and
needs to be modified when a unix socket address is modified by
the hook. Because abstract socket unix addresses start with a
NUL byte, we cannot recalculate the socket address in kernelspace
after running the hook by calculating the length of the unix socket
path using strlen().
These hooks can be used when users want to multiplex syscall to a
single unix socket to multiple different processes behind the scenes
by redirecting the connect() and other syscalls to process specific
sockets.
We do not implement support for intercepting bind() because when
using bind() with unix sockets with a pathname address, this creates
an inode in the filesystem which must be cleaned up. If we rewrite
the address, the user might try to clean up the wrong file, leaking
the socket in the filesystem where it is never cleaned up. Until we
figure out a solution for this (and a use case for intercepting bind()),
we opt to not allow rewriting the sockaddr in bind() calls.
We also implement recvmsg() support for connected streams so that
after a connect() that is modified by a sockaddr hook, any corresponding
recmvsg() on the connected socket can also be modified to make the
connected program think it is connected to the "intended" remote.
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Daan De Meyer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
As prep for adding unix socket support to the cgroup sockaddr hooks,
let's propagate the sockaddr length back to the caller after running
a bpf cgroup sockaddr hook program. While not important for AF_INET or
AF_INET6, the sockaddr length is important when working with AF_UNIX
sockaddrs as the size of the sockaddr cannot be determined just from the
address family or the sockaddr's contents.
__cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as
an input/output argument. After running the program, the modified sockaddr
length is stored in the uaddrlen pointer.
Signed-off-by: Daan De Meyer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota regression fix from Jan Kara.
* tag 'fs_for_v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix slow quotaoff
|
|
Rename arch_has_sparse_bitmaps to arch_has_sparse_bitmasks to ensure
consistent terminology throughout resctrl.
Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Maciej Wieczor-Retman <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Reviewed-by: Peter Newman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Peter Newman <[email protected]>
Link: https://lore.kernel.org/r/e330fcdae873ef1a831e707025a4b70fa346666e.1696934091.git.maciej.wieczor-retman@intel.com
|
|
For in-kernel consumers one cannot readily assign a user (eg when
running from a workqueue), so the normal key search permissions
cannot be applied.
This patch exports the 'key_lookup()' function for a simple lookup
of keys without checking for permissions.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Acked-by: David Howells <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
Implement a function to select the preferred PSK for TLS.
A 'retained' PSK should be preferred over a 'generated' PSK,
and SHA-384 should be preferred to SHA-256.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
Register a '.nvme' keyring to hold keys for TLS and DH-HMAC-CHAP and
add a new config option NVME_KEYRING.
We need a separate keyring for NVMe as the configuration is done
via individual commands (eg for configfs), and the usual per-session
or per-process keyrings can't be used.
Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
The struct spi_message can be embedded into another structures.
With that the flexible array might be problematic as sparse
complains about it, although there is no real issue in the code
because when the message is embedded it doesn't use flexible array
member. That memeber is a private to spi_message_alloc() API, so
move it to that API in a form of an inherited data type.
Reported-by: Marc Kleine-Budde <[email protected]>
Fixes: 75e308ffc4f0 ("spi: Use struct_size() helper"))
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Marc Kleine-Budde <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
Enable unprivileged sandboxes to create their own binfmt_misc mounts.
This is based on Laurent's work in [1] but has been significantly
reworked to fix various issues we identified in earlier versions.
While binfmt_misc can currently only be mounted in the initial user
namespace, binary types registered in this binfmt_misc instance are
available to all sandboxes (Either by having them installed in the
sandbox or by registering the binary type with the F flag causing the
interpreter to be opened right away). So binfmt_misc binary types are
already delegated to sandboxes implicitly.
However, while a sandbox has access to all registered binary types in
binfmt_misc a sandbox cannot currently register its own binary types
in binfmt_misc. This has prevented various use-cases some of which were
already outlined in [1] but we have a range of issues associated with
this (cf. [3]-[5] below which are just a small sample).
Extend binfmt_misc to be mountable in non-initial user namespaces.
Similar to other filesystem such as nfsd, mqueue, and sunrpc we use
keyed superblock management. The key determines whether we need to
create a new superblock or can reuse an already existing one. We use the
user namespace of the mount as key. This means a new binfmt_misc
superblock is created once per user namespace creation. Subsequent
mounts of binfmt_misc in the same user namespace will mount the same
binfmt_misc instance. We explicitly do not create a new binfmt_misc
superblock on every binfmt_misc mount as the semantics for
load_misc_binary() line up with the keying model. This also allows us to
retrieve the relevant binfmt_misc instance based on the caller's user
namespace which can be done in a simple (bounded to 32 levels) loop.
Similar to the current binfmt_misc semantics allowing access to the
binary types in the initial binfmt_misc instance we do allow sandboxes
access to their parent's binfmt_misc mounts if they do not have created
a separate binfmt_misc instance.
Overall, this will unblock the use-cases mentioned below and in general
will also allow to support and harden execution of another
architecture's binaries in tight sandboxes. For instance, using the
unshare binary it possible to start a chroot of another architecture and
configure the binfmt_misc interpreter without being root to run the
binaries in this chroot and without requiring the host to modify its
binary type handlers.
Henning had already posted a few experiments in the cover letter at [1].
But here's an additional example where an unprivileged container
registers qemu-user-static binary handlers for various binary types in
its separate binfmt_misc mount and is then seamlessly able to start
containers with a different architecture without affecting the host:
root [lxc monitor] /var/snap/lxd/common/lxd/containers f1
1000000 \_ /sbin/init
1000000 \_ /lib/systemd/systemd-journald
1000000 \_ /lib/systemd/systemd-udevd
1000100 \_ /lib/systemd/systemd-networkd
1000101 \_ /lib/systemd/systemd-resolved
1000000 \_ /usr/sbin/cron -f
1000103 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
1000000 \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1000104 \_ /usr/sbin/rsyslogd -n -iNONE
1000000 \_ /lib/systemd/systemd-logind
1000000 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
1000107 \_ dnsmasq --conf-file=/dev/null -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --liste
1000000 \_ [lxc monitor] /var/lib/lxc f1-s390x
1100000 \_ /usr/bin/qemu-s390x-static /sbin/init
1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-journald
1100000 \_ /usr/bin/qemu-s390x-static /usr/sbin/cron -f
1100103 \_ /usr/bin/qemu-s390x-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-ac
1100000 \_ /usr/bin/qemu-s390x-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
1100104 \_ /usr/bin/qemu-s390x-static /usr/sbin/rsyslogd -n -iNONE
1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-logind
1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/0 115200,38400,9600 vt220
1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/1 115200,38400,9600 vt220
1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/2 115200,38400,9600 vt220
1100000 \_ /usr/bin/qemu-s390x-static /sbin/agetty -o -p -- \u --noclear --keep-baud pts/3 115200,38400,9600 vt220
1100000 \_ /usr/bin/qemu-s390x-static /lib/systemd/systemd-udevd
[1]: https://lore.kernel.org/all/[email protected]
[2]: https://discuss.linuxcontainers.org/t/binfmt-misc-permission-denied
[3]: https://discuss.linuxcontainers.org/t/lxd-binfmt-support-for-qemu-static-interpreters
[4]: https://discuss.linuxcontainers.org/t/3-1-0-binfmt-support-service-in-unprivileged-guest-requires-write-access-on-hosts-proc-sys-fs-binfmt-misc
[5]: https://discuss.linuxcontainers.org/t/qemu-user-static-not-working-4-11
Link: https://lore.kernel.org/r/[email protected] (origin)
Link: https://lore.kernel.org/r/[email protected] (v1)
Cc: Sargun Dhillon <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Henning Schild <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Laurent Vivier <[email protected]>
Cc: [email protected]
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
/* v2 */
- Serge Hallyn <[email protected]>:
- Use GFP_KERNEL_ACCOUNT for userspace triggered allocations when a
new binary type handler is registered.
- Christian Brauner <[email protected]>:
- Switch authorship to me. I refused to do that earlier even though
Laurent said I should do so because I think it's genuinely bad form.
But by now I have changed so many things that it'd be unfair to
blame Laurent for any potential bugs in here.
- Add more comments that explain what's going on.
- Rename functions while changing them to better reflect what they are
doing to make the code easier to understand.
- In the first version when a specific binary type handler was removed
either through a write to the entry's file or all binary type
handlers were removed by a write to the binfmt_misc mount's status
file all cleanup work happened during inode eviction.
That includes removal of the relevant entries from entry list. While
that works fine I disliked that model after thinking about it for a
bit. Because it means that there was a window were someone has
already removed a or all binary handlers but they could still be
safely reached from load_misc_binary() when it has managed to take
the read_lock() on the entries list while inode eviction was already
happening. Again, that perfectly benign but it's cleaner to remove
the binary handler from the list immediately meaning that ones the
write to then entry's file or the binfmt_misc status file returns
the binary type cannot be executed anymore. That gives stronger
guarantees to the user.
|
|
This adds a new optional field to struct iio_info to allow drivers to
specify a label for the event. This is useful for cases where there are
many events or the event attribute name is not descriptive enough or
where an event doesn't have any other attributes.
The implementation is based on the existing label support for channels.
So either all events of a device have a label attribute or none do.
Signed-off-by: David Lechner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Cameron <[email protected]>
|
|
Provide common prototypes for arch_register_cpu() and
arch_unregister_cpu(). These are called by acpi_processor.c, with weak
versions, so the prototype for this is already set. It is generally not
necessary for function prototypes to be conditional on preprocessor macros.
Some architectures (e.g. Loongarch) are missing the prototype for this, and
rather than add it to Loongarch's asm/cpu.h, do the job once for everyone.
Since this covers everyone, remove the now unnecessary prototypes in
asm/cpu.h, and therefore remove the 'static' from one of ia64's
arch_register_cpu() definitions.
[ tglx: Bring back the ia64 part and remove the ACPI prototypes ]
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Implements the USB part of Intel USB-I2C/GPIO/SPI adapter device
named "La Jolla Cove Adapter" (LJCA).
The communication between the various LJCA module drivers and the
hardware will be muxed/demuxed by this driver. Three modules (
I2C, GPIO, and SPI) are supported currently.
Each sub-module of LJCA device is identified by type field within
the LJCA message header.
The sub-modules of LJCA can use ljca_transfer() to issue a transfer
between host and hardware. And ljca_register_event_cb is exported
to LJCA sub-module drivers for hardware event subscription.
The minimum code in ASL that covers this board is
Scope (\_SB.PCI0.DWC3.RHUB.HS01)
{
Device (GPIO)
{
Name (_ADR, Zero)
Name (_STA, 0x0F)
}
Device (I2C)
{
Name (_ADR, One)
Name (_STA, 0x0F)
}
Device (SPI)
{
Name (_ADR, 0x02)
Name (_STA, 0x0F)
}
}
Signed-off-by: Wentong Wu <[email protected]>
Reviewed-by: Sakari Ailus <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Reviewed-by: Oliver Neukum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Although there is a kfree_skb_reason() helper function that can be used to
find the reason why this skb is dropped, but most callers didn't increase
one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
For the users, people are more concerned about why the dropped in ip
is increasing.
Introduce netdev_core_stats_inc() for trace the caller of
dev_core_stats_*_inc().
Also, add __code to netdev_core_stats_alloc(), as it's called with small
probability. And add noinline make sure netdev_core_stats_inc was never
inlined.
Signed-off-by: Yajun Deng <[email protected]>
Suggested-by: Alexander Lobakin <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Correct punctuation and drop an extraneous word.
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
VMAs are skipped if there is no recent fault activity but this represents
a chicken-and-egg problem as there may be no fault activity if the PTEs
are never updated to trap NUMA hints. There is an indirect reliance on
scanning to be forced early in the lifetime of a task but this may fail
to detect changes in phase behaviour. Force inactive VMAs to be scanned
when all other eligible VMAs have been updated within the same scan
sequence.
Test results in general look good with some changes in performance, both
negative and positive, depending on whether the additional scanning and
faulting was beneficial or not to the workload. The autonuma benchmark
workload NUMA01_THREADLOCAL was picked for closer examination. The workload
creates two processes with numerous threads and thread-local storage that
is zero-filled in a loop. It exercises the corner case where unrelated
threads may skip VMAs that are thread-local to another thread and still
has some VMAs that inactive while the workload executes.
The VMA skipping activity frequency with and without the patch:
6.6.0-rc2-sched-numabtrace-v1
=============================
649 reason=scan_delay
9,094 reason=unsuitable
48,915 reason=shared_ro
143,919 reason=inaccessible
193,050 reason=pid_inactive
6.6.0-rc2-sched-numabselective-v1
=============================
146 reason=seq_completed
622 reason=ignore_pid_inactive
624 reason=scan_delay
6,570 reason=unsuitable
16,101 reason=shared_ro
27,608 reason=inaccessible
41,939 reason=pid_inactive
Note that with the patch applied, the PID activity is ignored
(ignore_pid_inactive) to ensure a VMA with some activity is completely
scanned. In addition, a small number of VMAs are scanned when no other
eligible VMA is available during a single scan window (seq_completed).
The number of times a VMA is skipped due to no PID activity from the
scanning task (pid_inactive) drops dramatically. It is expected that
this will increase the number of PTEs updated for NUMA hinting faults
as well as hinting faults but these represent PTEs that would otherwise
have been missed. The tradeoff is scan+fault overhead versus improving
locality due to migration.
On a 2-socket Cascade Lake test machine, the time to complete the
workload is as follows;
6.6.0-rc2 6.6.0-rc2
sched-numabtrace-v1 sched-numabselective-v1
Min elsp-NUMA01_THREADLOCAL 174.22 ( 0.00%) 117.64 ( 32.48%)
Amean elsp-NUMA01_THREADLOCAL 175.68 ( 0.00%) 123.34 * 29.79%*
Stddev elsp-NUMA01_THREADLOCAL 1.20 ( 0.00%) 4.06 (-238.20%)
CoeffVar elsp-NUMA01_THREADLOCAL 0.68 ( 0.00%) 3.29 (-381.70%)
Max elsp-NUMA01_THREADLOCAL 177.18 ( 0.00%) 128.03 ( 27.74%)
The time to complete the workload is reduced by almost 30%:
6.6.0-rc2 6.6.0-rc2
sched-numabtrace-v1 sched-numabselective-v1 /
Duration User 91201.80 63506.64
Duration System 2015.53 1819.78
Duration Elapsed 1234.77 868.37
In this specific case, system CPU time was not increased but it's not
universally true.
From vmstat, the NUMA scanning and fault activity is as follows;
6.6.0-rc2 6.6.0-rc2
sched-numabtrace-v1 sched-numabselective-v1
Ops NUMA base-page range updates 64272.00 26374386.00
Ops NUMA PTE updates 36624.00 55538.00
Ops NUMA PMD updates 54.00 51404.00
Ops NUMA hint faults 15504.00 75786.00
Ops NUMA hint local faults % 14860.00 56763.00
Ops NUMA hint local percent 95.85 74.90
Ops NUMA pages migrated 1629.00 6469222.00
Both the number of PTE updates and hint faults is dramatically
increased. While this is superficially unfortunate, it represents
ranges that were simply skipped without the patch. As a result
of the scanning and hinting faults, many more pages were also
migrated but as the time to completion is reduced, the overhead
is offset by the gain.
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Raghavendra K T <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
NUMA Balancing skips VMAs when the current task has not trapped a NUMA
fault within the VMA. If the VMA is skipped then mm->numa_scan_offset
advances and a task that is trapping faults within the VMA may never
fully update PTEs within the VMA.
Force tasks to update PTEs for partially scanned PTEs. The VMA will
be tagged for NUMA hints by some task but this removes some of the
benefit of tracking PID activity within a VMA. A follow-on patch
will mitigate this problem.
The test cases and machines evaluated did not trigger the corner case so
the performance results are neutral with only small changes within the
noise from normal test-to-test variance. However, the next patch makes
the corner case easier to trigger.
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Raghavendra K T <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through IOMMUFD. These domains have a few different properties
compares to kernel owned domains:
- They may be PAGING domains, but created with special parameters.
For instance aperture size changes/number of levels, different
IOPTE formats, or other things necessary to make a vIOMMU work
- We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
to make the cgroup sandbox stronger
- Device-specialty domains, such as NESTED domains can be created by
IOMMUFD.
The new op clearly says the domain is being created by IOMMUFD, that the
domain is intended for userspace use, and it provides a way to pass user
flags or a driver specific uAPI structure to customize the created domain
to exactly what the vIOMMU userspace driver requires.
iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional PAGING
domain.
This new op for now is only supposed to be used by IOMMUFD, hence no
wrapper for it. IOMMUFD would call the callback directly. As for domain
free, IOMMUFD would use iommu_domain_free().
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Co-developed-by: Nicolin Chen <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
This makes it harder for accidental or malicious changes to
sockfs_xattr_handlers at runtime.
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: [email protected]
Signed-off-by: Wedson Almeida Filho <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
The MIPI I3C spec refers to a Provisioned ID, since it is (sometimes)
provisioned at device manufacturing.
Signed-off-by: Matt Johnston <[email protected]>
Acked-by: Rob Herring <[email protected]>
Reviewed-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
NUMA balancing skips or scans VMAs for a variety of reasons. In preparation
for completing scans of VMAs regardless of PID access, trace the reasons
why a VMA was skipped. In a later patch, the tracing will be used to track
if a VMA was forcibly scanned.
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
::next_pid_reset => ::pids_active_reset
The access_pids[] field name is somewhat ambiguous as no PIDs are accessed.
Similarly, it's not clear that next_pid_reset is related to access_pids[].
Rename the fields to more accurately reflect their purpose.
[ mingo: Rename in the comments too. ]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Document the intended usage of the fields.
[ mingo: Reformatted to take less vertical space & tidied it up. ]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|