Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix the module compression with xz so the in-kernel decompressor
works
- Document a kconfig idiom to express an optional dependency between
modules
- Make modpost, when W=1 is given, detect broken drivers that reference
.exit.* sections
- Remove unused code
* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove stale code for 'source' symlink in packaging scripts
modpost: Don't let "driver"s reference .exit.*
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
modpost: add missing else to the "of" check
Documentation: kbuild: explain handling optional dependencies
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
resolve a much reported regression in the -rc series that has also
propagated back to the stable releases. Sorry for the delay, lots of
conference travel for a few weeks put me very far behind in patch
wrangling.
It has been reported by many to resolve the reported problem, and has
been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
reported regressions:
- revert a n_gsm change that ended up causing problems
- 8250_port fix for irq data
both have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
|
|
Similar to the change in commit 0bdf399342c5("net: Avoid address
overwrite in kernel_connect"), BPF hooks run on bind may rewrite the
address passed to kernel_bind(). This change
1) Makes a copy of the bind address in kernel_bind() to insulate
callers.
2) Replaces direct calls to sock->ops->bind() in net with kernel_bind()
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
commit 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect")
ensured that kernel_connect() will not overwrite the address parameter
in cases where BPF connect hooks perform an address rewrite. This change
replaces direct calls to sock->ops->connect() in net with kernel_connect()
to make these call safe.
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Including the transhdrlen in length is a problem when the packet is
partially filled (e.g. something like send(MSG_MORE) happened previously)
when appending to an IPv4 or IPv6 packet as we don't want to repeat the
transport header or account for it twice. This can happen under some
circumstances, such as splicing into an L2TP socket.
The symptom observed is a warning in __ip6_append_data():
WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
that occurs when MSG_SPLICE_PAGES is used to append more data to an already
partially occupied skbuff. The warning occurs when 'copy' is larger than
the amount of data in the message iterator. This is because the requested
length includes the transport header length when it shouldn't. This can be
triggered by, for example:
sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
bind(sfd, ...); // ::1
connect(sfd, ...); // ::1 port 7
send(sfd, buffer, 4100, MSG_MORE);
sendfile(sfd, dfd, NULL, 1024);
Fix this by only adding transhdrlen into the length if the write queue is
empty in l2tp_ip6_sendmsg(), analogously to how UDP does things.
l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds
the UDP packet itself.
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Willem de Bruijn <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: David Ahern <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a kerneldoc build warning fix, add SRSO mitigation for
AMD-derived Hygon processors, and fix a SGX kernel crash in the page
fault handler that can trigger when ksgxd races to reclaim the SECS
special page, by making the SECS page unswappable"
* tag 'x86-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Resolves SECS reclaim vs. page fault for EAUG race
x86/srso: Add SRSO mitigation for Hygon processors
x86/kgdb: Fix a kerneldoc warning when build with W=1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix a spurious kernel warning during CPU hotplug events that may
trigger when timer/hrtimer softirqs are pending, which are otherwise
hotplug-safe and don't merit a warning"
* tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Tag (hr)timer softirq as hotplug safe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a RT tasks related lockup/live-lock during CPU offlining"
* tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Fix live lock between select_fallback_rq() and RT push
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Misc fixes: work around an AMD microcode bug on certain models, and
fix kexec kernel PMI handlers on AMD systems that get loaded on older
kernels that have an unexpected register state"
* tag 'perf-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Do not WARN() on every IRQ
perf/x86/amd/core: Fix overflow reset on hotplug
|
|
n->output field can be read locklessly, while a writer
might change the pointer concurrently.
Add missing annotations to prevent load-store tearing.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While looking at a related syzbot report involving neigh_periodic_work(),
I found that I forgot to add an annotation when deleting an
RCU protected item from a list.
Readers use rcu_deference(*np), we need to use either
rcu_assign_pointer() or WRITE_ONCE() on writer side
to prevent store tearing.
I use rcu_assign_pointer() to have lockdep support,
this was the choice made in neigh_flush_dev().
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since commit d8131c2965d5 ("kbuild: remove $(MODLIB)/source symlink"),
modules_install does not create the 'source' symlink.
Remove the stale code from builddeb and kernel.spec.
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
bluetooth pull request for net:
- Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Fix handling of listen for ISO unicast
- Fix build warnings
- Fix leaking content of local_codecs
- Add shutdown function for QCA6174
- Delete unused hci_req_prepare_suspend() declaration
- Fix hci_link_tx_to RCU lock usage
- Avoid redundant authentication
Signed-off-by: David S. Miller <[email protected]>
|
|
The second parameter of stmmac_pltfr_init() needs the pointer of
"struct plat_stmmacenet_data". So, correct the parameter typo when calling the
function.
Otherwise, it may cause this alignment exception when doing suspend/resume.
[ 49.067201] CPU1 is up
[ 49.135258] Internal error: SP/PC alignment exception: 000000008a000000 [#1] PREEMPT SMP
[ 49.143346] Modules linked in: soc_imx9 crct10dif_ce polyval_ce nvmem_imx_ocotp_fsb_s400 polyval_generic layerscape_edac_mod snd_soc_fsl_asoc_card snd_soc_imx_audmux snd_soc_imx_card snd_soc_wm8962 el_enclave snd_soc_fsl_micfil rtc_pcf2127 rtc_pcf2131 flexcan can_dev snd_soc_fsl_xcvr snd_soc_fsl_sai imx8_media_dev(C) snd_soc_fsl_utils fuse
[ 49.173393] CPU: 0 PID: 565 Comm: sh Tainted: G C 6.5.0-rc4-next-20230804-05047-g5781a6249dae #677
[ 49.183721] Hardware name: NXP i.MX93 11X11 EVK board (DT)
[ 49.189190] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 49.196140] pc : 0x80800052
[ 49.198931] lr : stmmac_pltfr_resume+0x34/0x50
[ 49.203368] sp : ffff800082f8bab0
[ 49.206670] x29: ffff800082f8bab0 x28: ffff0000047d0ec0 x27: ffff80008186c170
[ 49.213794] x26: 0000000b5e4ff1ba x25: ffff800081e5fa74 x24: 0000000000000010
[ 49.220918] x23: ffff800081fe0000 x22: 0000000000000000 x21: 0000000000000000
[ 49.228042] x20: ffff0000001b4010 x19: ffff0000001b4010 x18: 0000000000000006
[ 49.235166] x17: ffff7ffffe007000 x16: ffff800080000000 x15: 0000000000000000
[ 49.242290] x14: 00000000000000fc x13: 0000000000000000 x12: 0000000000000000
[ 49.249414] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082f8b8c0
[ 49.256538] x8 : 0000000000000008 x7 : 0000000000000001 x6 : 000000005f54a200
[ 49.263662] x5 : 0000000001000000 x4 : ffff800081b93680 x3 : ffff800081519be0
[ 49.270786] x2 : 0000000080800052 x1 : 0000000000000000 x0 : ffff0000001b4000
[ 49.277911] Call trace:
[ 49.280346] 0x80800052
[ 49.282781] platform_pm_resume+0x2c/0x68
[ 49.286785] dpm_run_callback.constprop.0+0x74/0x134
[ 49.291742] device_resume+0x88/0x194
[ 49.295391] dpm_resume+0x10c/0x230
[ 49.298866] dpm_resume_end+0x18/0x30
[ 49.302515] suspend_devices_and_enter+0x2b8/0x624
[ 49.307299] pm_suspend+0x1fc/0x348
[ 49.310774] state_store+0x80/0x104
[ 49.314258] kobj_attr_store+0x18/0x2c
[ 49.318002] sysfs_kf_write+0x44/0x54
[ 49.321659] kernfs_fop_write_iter+0x120/0x1ec
[ 49.326088] vfs_write+0x1bc/0x300
[ 49.329485] ksys_write+0x70/0x104
[ 49.332874] __arm64_sys_write+0x1c/0x28
[ 49.336783] invoke_syscall+0x48/0x114
[ 49.340527] el0_svc_common.constprop.0+0xc4/0xe4
[ 49.345224] do_el0_svc+0x38/0x98
[ 49.348526] el0_svc+0x2c/0x84
[ 49.351568] el0t_64_sync_handler+0x100/0x12c
[ 49.355910] el0t_64_sync+0x190/0x194
[ 49.359567] Code: ???????? ???????? ???????? ???????? (????????)
[ 49.365644] ---[ end trace 0000000000000000 ]---
Fixes: 97117eb51ec8 ("net: stmmac: platform: provide stmmac_pltfr_init()")
Signed-off-by: Clark Wang <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.
There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.
The dual rule that drivers must not reference .init.* is implemented
since commit 0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.
Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
Remove the left-over of commit e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Paul Gortmaker <[email protected]>
|
|
Without this 'else' statement, an "usb" name goes into two handlers:
the first/previous 'if' statement _AND_ the for-loop over 'devtable',
but the latter is useless as it has no 'usb' device_id entry anyway.
Tested with allmodconfig before/after patch; no changes to *.mod.c:
git checkout v6.6-rc3
make -j$(nproc) allmodconfig
make -j$(nproc) olddefconfig
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/before
# apply patch
make -j$(nproc)
find . -name '*.mod.c' | cpio -pd /tmp/after
diff -r /tmp/before/ /tmp/after/
# no difference
Fixes: acbef7b76629 ("modpost: fix module autoloading for OF devices with generic compatible property")
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These are the latest bug fixes that have come up in the soc tree. Most
of these are fairly minor. Most notably, the majority of changes this
time are not for dts files as usual.
- Updates to the addresses of the broadcom and aspeed entries in the
MAINTAINERS file.
- Defconfig updates to address a regression on samsung and a build
warning from an unknown Kconfig symbol
- Build fixes for the StrongARM and Uniphier platforms
- Code fixes for SCMI and FF-A firmware drivers, both of which had a
simple bug that resulted in invalid data, and a lesser fix for the
optee firmware driver
- Multiple fixes for the recently added loongson/loongarch "guts" soc
driver
- Devicetree fixes for RISC-V on the startfive platform, addressing
issues with NOR flash, usb and uart.
- Multiple fixes for NXP i.MX8/i.MX9 dts files, fixing problems with
clock, gpio, hdmi settings and the Makefile
- Bug fixes for i.MX firmware code and the OCOTP soc driver
- Multiple fixes for the TI sysc bus driver
- Minor dts updates for TI omap dts files, to address boot time
warnings and errors"
* tag 'soc-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
MAINTAINERS: Fix Florian Fainelli's email address
arm64: defconfig: enable syscon-poweroff driver
ARM: locomo: fix locomolcd_power declaration
soc: loongson: loongson2_guts: Remove unneeded semicolon
soc: loongson: loongson2_guts: Convert to devm_platform_ioremap_resource()
soc: loongson: loongson_pm2: Populate children syscon nodes
dt-bindings: soc: loongson,ls2k-pmc: Allow syscon-reboot/syscon-poweroff as child
soc: loongson: loongson_pm2: Drop useless of_device_id compatible
dt-bindings: soc: loongson,ls2k-pmc: Use fallbacks for ls2k-pmc compatible
soc: loongson: loongson_pm2: Add dependency for INPUT
arm64: defconfig: remove CONFIG_COMMON_CLK_NPCM8XX=y
ARM: uniphier: fix cache kernel-doc warnings
MAINTAINERS: aspeed: Update Andrew's email address
MAINTAINERS: aspeed: Update git tree URL
firmware: arm_ffa: Don't set the memory region attributes for MEM_LEND
arm64: dts: imx: Add imx8mm-prt8mm.dtb to build
arm64: dts: imx8mm-evk: Fix hdmi@3d node
soc: imx8m: Enable OCOTP clock for imx8mm before reading registers
arm64: dts: imx8mp-beacon-kit: Fix audio_pll2 clock
arm64: dts: imx8mp: Fix SDMA2/3 clocks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Make sure 32-bit applications using user events have aligned access
when running on a 64-bit kernel.
- Add cond_resched in the loop that handles converting enums in
print_fmt string is trace events.
- Fix premature wake ups of polling processes in the tracing ring
buffer. When a task polls waiting for a percentage of the ring buffer
to be filled, the writer still will wake it up at every event. Add
the polling's percentage to the "shortest_full" list to tell the
writer when to wake it up.
- For eventfs dir lookups on dynamic events, an event system's only
event could be removed, leaving its dentry with no children. This is
totally legitimate. But in eventfs_release() it must not access the
children array, as it is only allocated when the dentry has children.
* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Test for dentries array allocated in eventfs_release()
tracing/user_events: Align set_bit() address for all archs
tracing: relax trace_event_eval_update() execution with cond_resched()
ring-buffer: Update "shortest_full" in polling
|
|
The dcache_dir_open_wrapper() could be called when a dynamic event is
being deleted leaving a dentry with no children. In this case the
dlist->dentries array will never be allocated. This needs to be checked
for in eventfs_release(), otherwise it will trigger a NULL pointer
dereference.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Mark Rutland <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Fixes: ef36b4f92868 ("eventfs: Remember what dentries were created on dir open")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
All architectures should use a long aligned address passed to set_bit().
User processes can pass either a 32-bit or 64-bit sized value to be
updated when tracing is enabled when on a 64-bit kernel. Both cases are
ensured to be naturally aligned, however, that is not enough. The
address must be long aligned without affecting checks on the value
within the user process which require different adjustments for the bit
for little and big endian CPUs.
Add a compat flag to user_event_enabler that indicates when a 32-bit
value is being used on a 64-bit kernel. Long align addresses and correct
the bit to be used by set_bit() to account for this alignment. Ensure
compat flags are copied during forks and used during deletion clears.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Cc: [email protected]
Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement")
Reported-by: Clément Léger <[email protected]>
Suggested-by: Clément Léger <[email protected]>
Signed-off-by: Beau Belgrave <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.
Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Clément Léger <[email protected]>
Tested-by: Atish Patra <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
It was discovered that the ring buffer polling was incorrectly stating
that read would not block, but that's because polling did not take into
account that reads will block if the "buffer-percent" was set. Instead,
the ring buffer polling would say reads would not block if there was any
data in the ring buffer. This was incorrect behavior from a user space
point of view. This was fixed by commit 42fb0a1e84ff by having the polling
code check if the ring buffer had more data than what the user specified
"buffer percent" had.
The problem now is that the polling code did not register itself to the
writer that it wanted to wait for a specific "full" value of the ring
buffer. The result was that the writer would wake the polling waiter
whenever there was a new event. The polling waiter would then wake up, see
that there's not enough data in the ring buffer to notify user space and
then go back to sleep. The next event would wake it up again.
Before the polling fix was added, the code would wake up around 100 times
for a hackbench 30 benchmark. After the "fix", due to the constant waking
of the writer, it would wake up over 11,0000 times! It would never leave
the kernel, so the user space behavior was still "correct", but this
definitely is not the desired effect.
To fix this, have the polling code add what it's waiting for to the
"shortest_full" variable, to tell the writer not to wake it up if the
buffer is not as full as it expects to be.
Note, after this fix, it appears that the waiter is now woken up around 2x
the times it was before (~200). This is a tremendous improvement from the
11,000 times, but I will need to spend some time to see why polling is
more aggressive in its wakeups than the read blocking code.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Julia Lawall <[email protected]>
Tested-by: Julia Lawall <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Add the missing endianness conversion when sending the enable request so
that the driver will work also on a hypothetical big-endian machine.
This issue was reported by sparse.
Fixes: 29e8142b5623 ("power: supply: Introduce Qualcomm PMIC GLINK power supply")
Cc: [email protected] # 6.3
Cc: Bjorn Andersson <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sebastian Reichel <[email protected]>
|
|
qcom_battmgr_update_request.battery_id is written to using cpu_to_le32()
and should be of type __le32, just like all other 32bit integer requests
for qcom_battmgr.
Cc: [email protected] # 6.3
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: 29e8142b5623 ("power: supply: Introduce Qualcomm PMIC GLINK power supply")
Reviewed-by: Johan Hovold <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sebastian Reichel <[email protected]>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix the narea calculation in swiotlb initialization (Ross Lagerwall)
- fix the check whether a device has used swiotlb (Petr Tesarik)
* tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix the check whether a device has used software IO TLB
swiotlb: use the calculated number of areas
|
|
Pull iomap fixes from Darrick Wong:
- Handle a race between writing and shrinking block devices by
returning EIO
- Fix a typo in a comment
* tag 'iomap-6.6-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Spelling s/preceeding/preceding/g
iomap: add a workaround for racy i_size updates on block devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Usual business: a driver fix, a DT fix, a minor core fix"
* tag 'i2c-for-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: npcm7xx: Fix callback completion ordering
i2c: mux: Avoid potential false error message in i2c_mux_add_adapter
dt-bindings: i2c: mxs: Pass ref and 'unevaluatedProperties: false'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a possible NULL pointer dereference in the error path of
acpi_video_bus_add() resulting from recent changes (Dinghao Liu)"
* tag 'acpi-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Fix NULL pointer dereference in acpi_video_bus_add()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix arch_stack_walk_reliable(), used by live patching
- Fix powerpc selftests to work with run_kselftest.sh
Thanks to Joe Lawrence and Petr Mladek.
* tag 'powerpc-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix emit_tests to work with run_kselftest.sh
powerpc/stacktrace: Fix arch_stack_walk_reliable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix NFSv4 READ corner case
* tag 'nfsd-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set
|
|
Commit d52b59315bf5 ("bpf: Adjust size_index according to the value of
KMALLOC_MIN_SIZE") uses KMALLOC_MIN_SIZE to adjust size_index, but as
reported by Nathan, the adjustment is not enough, because
__kmalloc_minalign() also decides the minimal alignment of slab object
as shown in new_kmalloc_cache() and its value may be greater than
KMALLOC_MIN_SIZE (e.g., 64 bytes vs 8 bytes under a riscv QEMU VM).
Instead of invoking __kmalloc_minalign() in bpf subsystem to find the
maximal alignment, just using kmalloc_size_roundup() directly to get the
corresponding slab object size for each allocation size. If these two
sizes are unmatched, adjust size_index to select a bpf_mem_cache with
unit_size equal to the object_size of the underlying slab cache for the
allocation size.
Fixes: 822fb26bdb55 ("bpf: Add a hint to allocated objects.")
Reported-by: Nathan Chancellor <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]/
Signed-off-by: Hou Tao <[email protected]>
Tested-by: Emil Renner Berthing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Pull smb client fix from Steve French:
"Fix for password freeing potential oops (also for stable)"
* tag '6.6-rc3-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
fs/smb/client: Reset password pointer to NULL
|
|
The kunit_action_platform_driver_unregister is added with
&fake_platform_driver as ctx, but the kunit_release_action is called
pdev as ctx. Fix that by replacing it with &fake_platform_driver.
Fixes: 4f2b0b583baa ("drm/tests: helpers: Switch to kunit actions")
Signed-off-by: Arthur Grillo <[email protected]>
Reviewed-by: Maíra Canal <[email protected]>
Acked-by: Maxime Ripard <[email protected]>
Signed-off-by: Maíra Canal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
In the plpar_hcall trace code, currently we use r0
to store the value of r4. But this value is not
used subsequently in the code. Hence remove this unused
save to r0 in plpar_hcall and plpar_hcall9
Suggested-by: Naveen N Rao <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
In powerpc pseries system, below behaviour is observed while
enabling tracing on hcall:
# cd /sys/kernel/debug/tracing/
# cat events/powerpc/hcall_exit/enable
0
# echo 1 > events/powerpc/hcall_exit/enable
# ls
-bash: fork: Bad address
Above is from power9 lpar with latest kernel. Past this, softlockup
is observed. Initially while attempting via perf_event_open to
use "PERF_TYPE_TRACEPOINT", kernel panic was observed.
perf config used:
================
memset(&pe[1],0,sizeof(struct perf_event_attr));
pe[1].type=PERF_TYPE_TRACEPOINT;
pe[1].size=96;
pe[1].config=0x26ULL; /* 38 raw_syscalls/sys_exit */
pe[1].sample_type=0; /* 0 */
pe[1].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1f */
pe[1].inherit=1;
pe[1].precise_ip=0; /* arbitrary skid */
pe[1].wakeup_events=0;
pe[1].bp_type=HW_BREAKPOINT_EMPTY;
pe[1].config1=0x1ULL;
Kernel panic logs:
==================
Kernel attempted to read user page (8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000008
Faulting instruction address: 0xc0000000004c2814
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: nfnetlink bonding tls rfkill sunrpc dm_service_time dm_multipath pseries_rng xts vmx_crypto xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ibmvfc scsi_transport_fc ibmveth dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 0 PID: 1431 Comm: login Not tainted 6.4.0+ #1
Hardware name: IBM,8375-42A POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.30 (VL950_892) hv:phyp pSeries
NIP page_remove_rmap+0x44/0x320
LR wp_page_copy+0x384/0xec0
Call Trace:
0xc00000001416e400 (unreliable)
wp_page_copy+0x384/0xec0
__handle_mm_fault+0x9d4/0xfb0
handle_mm_fault+0xf0/0x350
___do_page_fault+0x48c/0xc90
hash__do_page_fault+0x30/0x70
do_hash_fault+0x1a4/0x330
data_access_common_virt+0x198/0x1f0
--- interrupt: 300 at 0x7fffae971abc
git bisect tracked this down to below commit:
'commit baa49d81a94b ("powerpc/pseries: hvcall stack frame overhead")'
This commit changed STACK_FRAME_OVERHEAD (112 ) to
STACK_FRAME_MIN_SIZE (32 ) since 32 bytes is the minimum size
for ELFv2 stack. With the latest kernel, when running on ELFv2,
STACK_FRAME_MIN_SIZE is used to allocate stack size.
During plpar_hcall_trace, first call is made to HCALL_INST_PRECALL
which saves the registers and allocates new stack frame. In the
plpar_hcall_trace code, STK_PARAM is accessed at two places.
1. To save r4: std r4,STK_PARAM(R4)(r1)
2. To access r4 back: ld r12,STK_PARAM(R4)(r1)
HCALL_INST_PRECALL precall allocates a new stack frame. So all
the stack parameter access after the precall, needs to be accessed
with +STACK_FRAME_MIN_SIZE. So the store instruction should be:
std r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
If the "std" is not updated with STACK_FRAME_MIN_SIZE, we will
end up with overwriting stack contents and cause corruption.
But instead of updating 'std', we can instead remove it since
HCALL_INST_PRECALL already saves it to the correct location.
similarly load instruction should be:
ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
Fix the load instruction to correctly access the stack parameter
with +STACK_FRAME_MIN_SIZE and remove the store of r4 since the
precall saves it correctly.
Cc: [email protected] # v6.2+
Fixes: baa49d81a94b ("powerpc/pseries: hvcall stack frame overhead")
Co-developed-by: Naveen N Rao <[email protected]>
Signed-off-by: Naveen N Rao <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.6
There's quite a lot of changes here, but a lot of them are simple quirks
or device IDs rather than actual fixes. The fixes that are here are all
quite device specific and relatively minor.
|
|
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period. They failed because failing to take __kexec_lock.
=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======
The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively. So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.
Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically. This doesn't impact the usage of __kexec_lock.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <[email protected]>
Tested-by: Eric DeVolder <[email protected]>
Reviewed-by: Eric DeVolder <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Cc: Sourabh Jain <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
hugetlb_reparenting_test.sh that may cause error
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).
The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).
Error Example:
awk: not an option: -e
Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
specified
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, kernel
should attempt to migrate all existing pages, and return -EIO if there is
misplaced or unmovable page. Then commit 6f4576e3687b ("mempolicy: apply
page table walker on queue_pages_range()") messed up the return value and
didn't break VMA scan early ianymore when MPOL_MF_STRICT alone. The
return value problem was fixed by commit a7f40cfe3b7a ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified"), but it broke
the VMA walk early if unmovable page is met, it may cause some pages are
not migrated as expected.
The code should conceptually do:
if (MPOL_MF_MOVE|MOVEALL)
scan all vmas
try to migrate the existing pages
return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
scan all vmas
try to migrate the existing pages
return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
check the ranges in test_walk, EFAULT without mbind_range() if discontig.
Fixed the behavior.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Rafael Aquini <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]> [4.9+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
When CONFIG_DAMON_VADDR_KUNIT_TEST=y and making CONFIG_DEBUG_KMEMLEAK=y
and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y, the below memory leak is detected.
Since commit 9f86d624292c ("mm/damon/vaddr-test: remove unnecessary
variables"), the damon_destroy_ctx() is removed, but still call
damon_new_target() and damon_new_region(), the damon_region which is
allocated by kmem_cache_alloc() in damon_new_region() and the damon_target
which is allocated by kmalloc in damon_new_target() are not freed. And
the damon_region which is allocated in damon_new_region() in
damon_set_regions() is also not freed.
So use damon_destroy_target to free all the damon_regions and damon_target.
unreferenced object 0xffff888107c9a940 (size 64):
comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
60 c7 9c 07 81 88 ff ff f8 cb 9c 07 81 88 ff ff `...............
backtrace:
[<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff8881079cc740 (size 56):
comm "kunit_try_catch", pid 1069, jiffies 4294670592 (age 732.761s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<ffffffff819c82be>] damon_test_apply_three_regions1+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107c9ac40 (size 64):
comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
a0 cc 9c 07 81 88 ff ff 78 a1 76 07 81 88 ff ff ........x.v.....
backtrace:
[<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff8881079ccc80 (size 56):
comm "kunit_try_catch", pid 1071, jiffies 4294670595 (age 732.843s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<ffffffff819c851e>] damon_test_apply_three_regions2+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888107c9af40 (size 64):
comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 06 00 00 00 6b 6b 6b 6b ............kkkk
20 a2 76 07 81 88 ff ff b8 a6 76 07 81 88 ff ff .v.......v.....
backtrace:
[<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776a200 (size 56):
comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.011s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 14 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<ffffffff819c7d91>] damon_do_test_apply_three_regions.constprop.0+0xd1/0x3e0
[<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776a740 (size 56):
comm "kunit_try_catch", pid 1073, jiffies 4294670597 (age 733.025s)
hex dump (first 32 bytes):
3d 00 00 00 00 00 00 00 3f 00 00 00 00 00 00 00 =.......?.......
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
[<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
[<ffffffff819c877e>] damon_test_apply_three_regions3+0x21e/0x260
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff888108038240 (size 64):
comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 03 00 00 00 6b 6b 6b 6b ............kkkk
48 ad 76 07 81 88 ff ff 98 ae 76 07 81 88 ff ff H.v.......v.....
backtrace:
[<ffffffff817e0167>] kmalloc_trace+0x27/0xa0
[<ffffffff819c11cf>] damon_new_target+0x3f/0x1b0
[<ffffffff819c7d55>] damon_do_test_apply_three_regions.constprop.0+0x95/0x3e0
[<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
unreferenced object 0xffff88810776ad28 (size 56):
comm "kunit_try_catch", pid 1075, jiffies 4294670600 (age 733.022s)
hex dump (first 32 bytes):
05 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 ................
6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b kkkkkkkk....kkkk
backtrace:
[<ffffffff819bc492>] damon_new_region+0x22/0x1c0
[<ffffffff819bfcc2>] damon_set_regions+0x4c2/0x8e0
[<ffffffff819c7dbb>] damon_do_test_apply_three_regions.constprop.0+0xfb/0x3e0
[<ffffffff819c898d>] damon_test_apply_three_regions4+0x1cd/0x210
[<ffffffff829fce6a>] kunit_generic_run_threadfn_adapter+0x4a/0x90
[<ffffffff81237cf6>] kthread+0x2b6/0x380
[<ffffffff81097add>] ret_from_fork+0x2d/0x70
[<ffffffff81003791>] ret_from_fork_asm+0x11/0x20
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9f86d624292c ("mm/damon/vaddr-test: remove unnecessary variables")
Signed-off-by: Jinjie Ruan <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.
Now, reverting alone 86327e8eb94c is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f as
well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.
Note to backporters to stable trees. a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore. Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).
[[email protected]: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net [1]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jeremi Piotrowski <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Tejun heo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
While stress-testing zswap a memory corruption was happening when writing
back pages. __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap, this was because if the store fails
the old entry isn't removed from the tree. This change removes duplicate
entries in zswap_store before the actual attempt.
[[email protected]: add a warning and a comment, per Johannes]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Nhat Pham <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working. But given the
brokeness is only observable after commit 8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Axel Rasmussen <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]> [6.5+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Fix set_huge_pte_at() panic on arm64", v2.
This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic. The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory. This test (and the uffd poison feature) was merged for
v6.5-rc7.
Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.
Description of Bug
==================
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():
static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}
Fix
===
The simplest fix would have been to revert the dodgy cleanup commit
18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.
So instead, I've added a huge page size parameter to set_huge_pte_at().
This means that the arm64 code has the size in all cases. It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.
I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64). I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.
This patch (of 2):
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at(). Provide for this by
adding an `unsigned long sz` parameter to the function. This follows the
same pattern as huge_pte_clear().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc). The actual arm64 bug will be fixed in a separate
commit.
No behavioral changes intended.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <[email protected]> [vmalloc change]
Cc: Alexandre Ghiti <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]> [6.5+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
When updating the maple tree iterator to avoid rewalks, an issue was
introduced when shifting beyond the limits. This can be seen by trying to
go to the previous address of 0, which would set the maple node to
MAS_NONE and keep the range as the last entry.
Subsequent calls to mas_find() would then search upwards from mas->last
and skip the value at mas->index/mas->last. This showed up as a bug in
mprotect which skips the actual VMA at the current range after attempting
to go to the previous VMA from 0.
Since MAS_NONE may already be set when searching for a value that isn't
contained within a node, changing the handling of MAS_NONE in mas_find()
would make the code more complicated and error prone. Furthermore, there
was no way to tell which limit was hit, and thus which action to take
(next or the entry at the current range).
This solution is to add two states to track what happened with the
previous iterator action. This allows for the expected behaviour of the
next command to return the correct item (either the item at the range
requested, or the next/previous).
Tests are also added and updated accordingly.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Link: https://lore.kernel.org/linux-mm/[email protected]/
Fixes: 39193685d585 ("maple_tree: try harder to keep active node with mas_prev()")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: Pedro Falcato <[email protected]>
Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b
Closes: https://bugs.archlinux.org/task/79656
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "maple_tree: Fix mas_prev() state regression".
Pedro Falcato retported an mprotect regression [1] which was bisected back
to the iterator changes for maple tree. Root cause analysis showed the
mas_prev() running off the end of the VMA space (previous from 0) followed
by mas_find(), would skip the first value.
This patchset introduces maple state underflow/overflow so the sequence of
calls on the maple state will return what the user expects.
Users who encounter this bug may see mprotect(), userfaultfd_register(),
and mlock() fail on VMAs mapped with address 0.
This patch (of 2):
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam R. Howlett <[email protected]>
Cc: Pedro Falcato <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|