Age | Commit message (Collapse) | Author | Files | Lines |
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Merge thermal driver fixes for 6.10-rc5 from Daniel Lezcano:
"- Remove the filtered mode for mt8188 as it is not supported on this
platform (Julien Panis)
- Fail in case the golden temperature is zero as that means the efuse
data is not correctly set (Julien Panis)"
* tag 'thermal-v6.10-rc4' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
|
|
VMWARE_HYPERCALL alternative will not work as intended without VMware guest code
initialization.
[ bp: note that this doesn't reproduce with newer gccs so it must be
something gcc-9-specific. ]
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexey Makhalov <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It's not possible to use the joiner at the same time with eDP MSO. When
a panel needs MSO, it's not optional, so MSO trumps joiner.
v3: Only change intel_dp_has_joiner(), leave debugfs alone (Ville)
Fixes: bc71194e8897 ("drm/i915/edp: enable eDP MSO during link training")
Cc: <[email protected]> # v5.13+
Cc: Ville Syrjala <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1668
Reviewed-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
(cherry picked from commit 8b5a92ca24eb96bb71e2a55e352687487d87687f)
Signed-off-by: Jani Nikula <[email protected]>
|
|
Luis has been reporting an assert failure when freeing an inode
cluster during inode inactivation for a while. The assert looks
like:
XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs]
RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs
RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7
RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0
RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b
R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000
R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80
FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs
xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs
xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs
xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs
__xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs
xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs
xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs
xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs
process_one_work (kernel/workqueue.c:3231)
worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2))
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
And occurs when the the inode precommit handlers is attempt to look
up the inode cluster buffer to attach the inode for writeback.
The trail of logic that I can reconstruct is as follows.
1. the inode is clean when inodegc runs, so it is not
attached to a cluster buffer when precommit runs.
2. #1 implies the inode cluster buffer may be clean and not
pinned by dirty inodes when inodegc runs.
3. #2 implies that the inode cluster buffer can be reclaimed
by memory pressure at any time.
4. The assert failure implies that the cluster buffer was
attached to the transaction, but not marked done. It had
been accessed earlier in the transaction, but not marked
done.
5. #4 implies the cluster buffer has been invalidated (i.e.
marked stale).
6. #5 implies that the inode cluster buffer was instantiated
uninitialised in the transaction in xfs_ifree_cluster(),
which only instantiates the buffers to invalidate them
and never marks them as done.
Given factors 1-3, this issue is highly dependent on timing and
environmental factors. Hence the issue can be very difficult to
reproduce in some situations, but highly reliable in others. Luis
has an environment where it can be reproduced easily by g/531 but,
OTOH, I've reproduced it only once in ~2000 cycles of g/531.
I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag
on the cluster buffers, even though they may not be initialised. The
reasons why I think this is safe are:
1. A buffer cache lookup hit on a XBF_STALE buffer will
clear the XBF_DONE flag. Hence all future users of the
buffer know they have to re-initialise the contents
before use and mark it done themselves.
2. xfs_trans_binval() sets the XFS_BLI_STALE flag, which
means the buffer remains locked until the journal commit
completes and the buffer is unpinned. Hence once marked
XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only
context that can access the freed buffer is the currently
running transaction.
3. #2 implies that future buffer lookups in the currently
running transaction will hit the transaction match code
and not the buffer cache. Hence XBF_STALE and
XFS_BLI_STALE will not be cleared unless the transaction
initialises and logs the buffer with valid contents
again. At which point, the buffer will be marked marked
XBF_DONE again, so having XBF_DONE already set on the
stale buffer is a moot point.
4. #2 also implies that any concurrent access to that
cluster buffer will block waiting on the buffer lock
until the inode cluster has been fully freed and is no
longer an active inode cluster buffer.
5. #4 + #1 means that any future user of the disk range of
that buffer will always see the range of disk blocks
covered by the cluster buffer as not done, and hence must
initialise the contents themselves.
6. Setting XBF_DONE in xfs_ifree_cluster() then means the
unlinked inode precommit code will see a XBF_DONE buffer
from the transaction match as it expects. It can then
attach the stale but newly dirtied inode to the stale
but newly dirtied cluster buffer without unexpected
failures. The stale buffer will then sail through the
journal and do the right thing with the attached stale
inode during unpin.
Hence the fix is just one line of extra code. The explanation of
why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and
complex....
Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock")
Signed-off-by: Dave Chinner <[email protected]>
Tested-by: Luis Chamberlain <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
The gpio in "reg_usdhc2_vmmc" should be 7 instead of 19.
Cc: [email protected]
Fixes: 307fd14d4b14 ("arm64: dts: imx: add imx8qm mek support")
Reviewed-by: Peng Fan <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"On parisc we have suffered since years from random segfaults which
seem to have been triggered due to cache inconsistencies. Those
segfaults happened more often on machines with PA8800 and PA8900 CPUs,
which have much bigger caches than the earlier machines.
Dave Anglin has worked over the last few weeks to fix this bug. His
patch has been successfully tested by various people on various
machines and with various kernels (6.6, 6.8 and 6.9), and the debian
buildd servers haven't shown a single random segfault with this patch.
Since the cache handling has been reworked, the patch is slightly
bigger than I would like in this stage, but the greatly improved
stability IMHO justifies the inclusion now"
* tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Try to fix random segmentation faults in package builds
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two fixes to correctly report i2c functionality, ensuring that
I2C_FUNC_SLAVE is reported when a device operates solely as a slave
interface"
* tag 'i2c-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: Fix the functionality flags of the slave-only interface
i2c: at91: Fix the functionality flags of the slave-only interface
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.10-rc4.
Included in here are:
- thunderbolt debugfs bugfix
- USB typec bugfixes
- kcov usb bugfix
- xhci bugfixes
- usb-storage bugfix
- dt-bindings bugfix
- cdc-wdm log message spam bugfix
All of these, except for the last cdc-wdm log level change, have been
in linux-next for a while with no reported problems. The cdc-wdm
bugfix has been tested by syzbot and proved to fix the reported cpu
lockup issues when the log is constantly spammed by a broken device"
* tag 'usb-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
xhci: Handle TD clearing for multiple streams case
xhci: Apply broken streams quirk to Etron EJ188 xHCI host
xhci: Apply reset resume quirk to Etron EJ188 xHCI host
xhci: Set correct transferred length for cancelled bulk transfers
usb-storage: alauda: Check whether the media is initialized
usb: typec: ucsi: Ack also failed Get Error commands
kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
dt-bindings: usb: realtek,rts5411: Add missing "additionalProperties" on child nodes
usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps
USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected
usb: typec: ucsi: glink: increase max ports for x1e80100
Revert "usb: chipidea: move ci_ulpi_init after the phy initialization"
thunderbolt: debugfs: Fix margin debugfs node creation condition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes that resolve som
reported problems. Included in here are:
- n_tty lookahead buffer bugfix
- WARN_ON() removal where it was not needed
- 8250_dw driver bugfixes
- 8250_pxa bugfix
- sc16is7xx Kconfig fixes for reported build issues
All of these have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: drop debugging WARN_ON_ONCE() from uart_write()
serial: sc16is7xx: re-add Kconfig SPI or I2C dependency
serial: sc16is7xx: rename Kconfig CONFIG_SERIAL_SC16IS7XX_CORE
serial: port: Don't block system suspend even if bytes are left to xmit
serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
serial: 8250_dw: Revert "Move definitions to the shared header"
serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw
tty: n_tty: Fix buffer offsets when lookahead is used
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
"Here is a single staging driver fix, for the vc04 driver. It resolves
a reported problem that showed up in the merge window set of changes.
It's been in linux-next for over a week with no reported problems"
* tag 'staging-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vchiq_debugfs: Fix NPD in vchiq_dump_state
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and sysfs fixes from Greg KH:
"Here are three small changes for 6.10-rc4 that resolve reported
problems, and finally drop an unused api call. These are:
- removal of devm_device_add_groups(), all the callers of this are
finally gone after the 6.10-rc1 merge (changes came in through
different trees), so it's safe to remove.
- much reported sysfs build error fixed up for systems that did not
have sysfs enabled
- driver core sync issue fix for a many reported issue over the years
that no one really paid much attention to, until Dirk finally
tracked down the real issue and made the "obviously correct and
simple" fix for it.
All of these have been in linux-next for over a week with no reported
problems"
* tag 'driver-core-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: core: synchronize really_probe() and dev_uevent()
sysfs: Unbreak the build around sysfs_bin_attr_simple_read()
driver core: remove devm_device_add_groups()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of small char/misc and iio driver fixes for
6.10-rc4. Included in here are the following:
- iio driver fixes for a bunch of reported problems.
- mei driver fixes for a number of reported issues.
- amiga parport driver build fix.
- .editorconfig fix that was causing lots of unintended whitespace
changes to happen to files when they were being edited. Unless we
want to sweep the whole tree and remove all trailing whitespace at
once, this is needed for the .editorconfig file to be able to be
used at all. This change is required because the original
submitters never touched older files in the tree.
- jfs bugfix for a buffer overflow
The jfs bugfix is in here as I didn't know where else to put it, and
it's been ignored for a while as the filesystem seems to be abandoned
and I'm tired of seeing the same issue reported in multiple places.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (25 commits)
.editorconfig: remove trim_trailing_whitespace option
jfs: xattr: fix buffer overflow for invalid xattr
misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
parport: amiga: Mark driver struct with __refdata to prevent section mismatch
mei: vsc: Fix wrong invocation of ACPI SID method
mei: vsc: Don't stop/restart mei device during system suspend/resume
mei: me: release irq in mei_me_pci_resume error path
mei: demote client disconnect warning on suspend to debug
iio: inkern: fix channel read regression
iio: imu: inv_mpu6050: stabilized timestamping in interrupt
iio: adc: ad7173: Fix sampling frequency setting
iio: adc: ad7173: Clear append status bit
iio: imu: inv_icm42600: delete unneeded update watermark call
iio: imu: inv_icm42600: stabilized timestamp in interrupt
iio: invensense: fix odr switching to same value
iio: adc: ad7173: Remove index from temp channel
iio: adc: ad7173: Add ad7173_device_info names
iio: adc: ad7173: fix buffers enablement for ad7176-2
iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
"Fix a bug where the SCSI Removable Media Bit (RMB) was incorrectly set
for hot-plug capable (and eSATA) ports.
The RMB bit means that the media is removable (e.g. floppy or CD-ROM),
not that the device server is removable. If the RMB bit is set, SCSI
will set the removable media sysfs attribute.
If the removable media sysfs attribute is set on a device,
GNOME/udisks will automatically mount the device on boot.
We only want to set the SCSI RMB bit (and thus the removable media
sysfs attribute) for devices where the ATA removable media device bit
is set"
* tag 'ata-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-scsi: Set the RMB bit only for removable media devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix two issues with MI300 address translation logic
* tag 'edac_urgent_for_v6.10_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
RAS/AMD/ATL: Fix MI300 bank hash
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
- Update tracepoints events introduced in v6.10-rc1 so that it includes
the numeric identifier of host card in which the event happens
- replace wiki URL with the current website URL in Kconfig
* tag 'firewire-fixes-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: record card index in bus_reset_handle tracepoints event
firewire: core: record card index in tracepoinrts events derived from bus_reset_arrange_template
firewire: core: record card index in async_phy_inbound tracepoints event
firewire: core: record card index in async_phy_outbound_complete tracepoints event
firewire: core: record card index in async_phy_outbound_initiate tracepoints event
firewire: core: record card index in tracepoinrts events derived from async_inbound_template
firewire: core: record card index in tracepoinrts events derived from async_outbound_initiate_template
firewire: core: record card index in tracepoinrts events derived from async_outbound_complete_template
firewire: fix website URL in Kconfig
|
|
trigger the default trigger"
Commit 66601a29bb23 ("leds: class: If no default trigger is given, make
hw_control trigger the default trigger") causes ledtrig-netdev to get
set as default trigger on various network LEDs.
This causes users to hit a pre-existing AB-BA deadlock issue in
ledtrig-netdev between the LED-trigger locks and the rtnl mutex,
resulting in hung tasks in kernels >= 6.9.
Solving the deadlock is non trivial, so for now revert the change to
set the hw_control trigger as default trigger, so that ledtrig-netdev
no longer gets activated automatically for various network LEDs.
The netdev trigger is not needed because the network LEDs are usually under
hw-control and the netdev trigger tries to leave things that way so setting
it as the active trigger for the LED class device is a no-op.
Fixes: 66601a29bb23 ("leds: class: If no default trigger is given, make hw_control trigger the default trigger")
Reported-by: Genes Lists <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reported-by: Johannes Wüller <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Lee Jones <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
SODIMM 17 can be used as an edge triggered interrupt supplied from an
off board source.
Enable hysteresis on the pinmuxing to increase immunity against noise
on the signal.
Fixes: 60f01b5b5c7d ("arm64: dts: imx8mm-verdin: update iomux configuration")
Signed-off-by: Max Krummenacher <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
|
|
Similarly to other Lenovo laptops these also have a dual speaker
setup with a shared amplifier.
The model also seems to have a conflicting PCI SSID with the codec SSID for
the Legion Y9000X 2022 IAH7. Only tested on the Yoga Pro 7, as I don't have
access to the other laptop.
Signed-off-by: Gergely Meszaros <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Two fixes from Jean aim to correctly report i2c functionality,
specifically ensuring that I2C_FUNC_SLAVE is reported when a
device operates solely as a slave interface.
|
|
translation
The currently used normalized address format is not applicable to all
MI300 systems. This leads to incorrect results during address
translation.
Drop the fixed layout and construct the normalized address from system
settings.
Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Headset microphone do not work out of the box with this laptop. This
quirk fixes it. Zihao Wang specified the wrong subsystem id in his patch.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 3b79954fd00d ("ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers")
Signed-off-by: Ajrat Makhmutov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Introduce numa_valid_node(nid) that verifies that nid is a valid node ID
and use that instead of comparing nid parameter with either NUMA_NO_NODE
or MAX_NUMNODES.
This makes the checks for valid node IDs consistent and more robust and
allows to get rid of multiple WARNings.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
|
|
The usdhc2 port is connected to the microSD slot. The presence of the
'no-sdio' property prevents Wifi SDIO cards, such as CMP9010-X-EVB [1]
to be detected.
Remove the 'no-sdio' property so that SDIO cards could also work.
[1] https://www.nxp.com/products/wireless-connectivity/wi-fi-plus-bluetooth-plus-802-15-4/cmp9010-x-evb-iw416-usd-interface-evaluation-board:CMP9010-X-EVB
Fixes: e37907bd8294 ("arm64: dts: freescale: add i.MX93 11x11 EVK basic support")
Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
|
|
Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group().
It looks up `stt` from tablefd, but then continues to use it after doing
fdput() on the returned fd. After the fdput() the tablefd is free to be
closed by another thread. The close calls kvm_spapr_tce_release() and
then release_spapr_tce_table() (via call_rcu()) which frees `stt`.
Although there are calls to rcu_read_lock() in
kvm_spapr_tce_attach_iommu_group() they are not sufficient to prevent
the UAF, because `stt` is used outside the locked regions.
With an artifcial delay after the fdput() and a userspace program which
triggers the race, KASAN detects the UAF:
BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505
CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1
Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
Call Trace:
dump_stack_lvl+0xb4/0x108 (unreliable)
print_report+0x2b4/0x6ec
kasan_report+0x118/0x2b0
__asan_load4+0xb8/0xd0
kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
kvm_vfio_set_attr+0x524/0xac0 [kvm]
kvm_device_ioctl+0x144/0x240 [kvm]
sys_ioctl+0x62c/0x1810
system_call_exception+0x190/0x440
system_call_vectored_common+0x15c/0x2ec
...
Freed by task 0:
...
kfree+0xec/0x3e0
release_spapr_tce_table+0xd4/0x11c [kvm]
rcu_core+0x568/0x16a0
handle_softirqs+0x23c/0x920
do_softirq_own_stack+0x6c/0x90
do_softirq_own_stack+0x58/0x90
__irq_exit_rcu+0x218/0x2d0
irq_exit+0x30/0x80
arch_local_irq_restore+0x128/0x230
arch_local_irq_enable+0x1c/0x30
cpuidle_enter_state+0x134/0x5cc
cpuidle_enter+0x6c/0xb0
call_cpuidle+0x7c/0x100
do_idle+0x394/0x410
cpu_startup_entry+0x60/0x70
start_secondary+0x3fc/0x410
start_secondary_prolog+0x10/0x14
Fix it by delaying the fdput() until `stt` is no longer in use, which
is effectively the entire function. To keep the patch minimal add a call
to fdput() at each of the existing return paths. Future work can convert
the function to goto or __cleanup style cleanup.
With the fix in place the test case no longer triggers the UAF.
Reported-by: Al Viro <[email protected]>
Closes: https://lore.kernel.org/all/20240610024437.GA1464458@ZenIV/
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]
|
|
Pull xfs fix from Chandan Babu:
"Ensure xfs incore superblock's allocated inode counter, free inode
counter, and free data block counter are all zero or positive when
they are copied over from xfs_mount->m_[icount,ifree,fdblocks]
respectively"
* tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: make sure sb_fdblocks is non-negative
|
|
Pull smb server fixes from Steve French:
"Two small smb3 server fixes:
- set xatttr fix
- pathname parsing check fix"
* tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix missing use of get_write in in smb2_set_ea()
ksmbd: move leading slash check to smb2_get_name()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix the 8 bytes get_user() logic on x86-32
- Fix build bug that creates weird & mistaken target directory under
arch/x86/
* tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Don't add the EFI stub to targets, again
x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix boot-time warning in tick_setup_device()"
* tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
|
|
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
metadata of the current task into a per-CPU variable. However, the
kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
KCOV objects.
If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
to get interrupted and kcov_remote_start() is called, it ultimately leads
to kcov_remote_stop() NOT restoring the original KCOV reference. So when
the task exits, all registered remote KCOV handles remain active forever.
The most uncomfortable effect (at least for syzkaller) is that the bug
prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If
we obtain it in the parent process and then e.g. drop some
capabilities and continuously fork to execute individual programs, at
some point current->kcov of the forked process is lost,
kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls
calls from subsequent forks fail.
And, yes, the efficiency is also affected if we keep on losing remote
kcov objects.
a) kcov_remote_map keeps on growing forever.
b) (If I'm not mistaken), we're also not freeing the memory referenced
by kcov->area.
Fix it by introducing a special kcov_mode that is assigned to the task
that owns a KCOV remote object. It makes kcov_mode_enabled() return true
and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
and write_comp_data().
[[email protected]: replace WRITE_ONCE() with an ordinary assignment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Signed-off-by: Aleksandr Nogikh <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When testing shmem swapin, I encountered the warning below on my machine.
The reason is that replacing an old shmem folio with a new one causes
mem_cgroup_migrate() to clear the old folio's memcg data. As a result,
the old folio cannot get the correct memcg's lruvec needed to remove
itself from the LRU list when it is being freed. This could lead to
possible serious problems, such as LRU list crashes due to holding the
wrong LRU lock, and incorrect LRU statistics.
To fix this issue, we can fallback to use the mem_cgroup_replace_folio()
to replace the old shmem folio.
[ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960
[ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff)
[ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000
[ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
[ 5241.100338] ------------[ cut here ]------------
[ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150
[...]
[ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150
[ 5241.100376] sp : ffff80008b38b930
[...]
[ 5241.100398] Call trace:
[ 5241.100399] folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100401] __page_cache_release+0x90/0x300
[ 5241.100404] __folio_put+0x50/0x108
[ 5241.100406] shmem_replace_folio+0x1b4/0x240
[ 5241.100409] shmem_swapin_folio+0x314/0x528
[ 5241.100411] shmem_get_folio_gfp+0x3b4/0x930
[ 5241.100412] shmem_fault+0x74/0x160
[ 5241.100414] __do_fault+0x40/0x218
[ 5241.100417] do_shared_fault+0x34/0x1b0
[ 5241.100419] do_fault+0x40/0x168
[ 5241.100420] handle_pte_fault+0x80/0x228
[ 5241.100422] __handle_mm_fault+0x1c4/0x440
[ 5241.100424] handle_mm_fault+0x60/0x1f0
[ 5241.100426] do_page_fault+0x120/0x488
[ 5241.100429] do_translation_fault+0x4c/0x68
[ 5241.100431] do_mem_abort+0x48/0xa0
[ 5241.100434] el0_da+0x38/0xc0
[ 5241.100436] el0t_64_sync_handler+0x68/0xc0
[ 5241.100437] el0t_64_sync+0x14c/0x150
[ 5241.100439] ---[ end trace 0000000000000000 ]---
[[email protected]: remove less helpful comments, per Matthew]
Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com
Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration")
Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Macro RANDOM_ORVALUE was used to make sure the pgtable entry will be
populated with !none data in clear tests.
The RANDOM_ORVALUE tried to cover mostly all the bits in a pgtable entry,
even if there's no discussion on whether all the bits will be vaild. Both
S390 and PPC64 have their own masks to avoid touching some bits. Now it's
the turn for x86_64.
The issue is there's a recent report from Mikhail Gavrilov showing that
this can cause a warning with the newly added pte set check in commit
8430557fc5 on writable v.s. userfaultfd-wp bit, even though the check
itself was valid, the random pte is not. We can choose to mask more bits
out.
However the need to have such random bits setup is questionable, as now
it's already guaranteed to be true on below:
- For pte level, the pgtable entry will be installed with value from
pfn_pte(), where pfn points to a valid page. Hence the pte will be
!none already if populated with pfn_pte().
- For upper-than-pte level, the pgtable entry should contain a directory
entry always, which is also !none.
All the cases look like good enough to test a pxx_clear() helper. Instead
of extending the bitmask, drop the "set random bits" trick completely. Add
some warning guards to make sure the entries will be !none before clear().
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8430557fc584 ("mm/page_table_check: support userfault wr-protect entries")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Mikhail Gavrilov <[email protected]>
Link: https://lore.kernel.org/r/CABXGCsMB9A8-X+Np_Q+fWLURYL_0t3Y-MdoNabDM-Lzk58-DGA@mail.gmail.com
Tested-by: Mikhail Gavrilov <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Gavin Shan <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The large folio is mapped with folio size(not greater PMD_SIZE) aligned
virtual address during the pagefault, ie, 'addr = ALIGN_DOWN(vmf->address,
nr_pages * PAGE_SIZE)' in do_anonymous_page(). But after the mremap(),
the virtual address only requires PAGE_SIZE alignment. Also pte is moved
to new in move_page_tables(), then traversal of the new pte in the
numa_rebuild_large_mapping() could hit the following issue,
Unable to handle kernel paging request at virtual address 00000a80c021a788
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00002040341a6000
[00000a80c021a788] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
...
CPU: 76 PID: 15187 Comm: git Kdump: loaded Tainted: G W 6.10.0-rc2+ #209
Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.79 08/21/2021
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : numa_rebuild_large_mapping+0x338/0x638
lr : numa_rebuild_large_mapping+0x320/0x638
sp : ffff8000b41c3b00
x29: ffff8000b41c3b30 x28: ffff8000812a0000 x27: 00000000000a8000
x26: 00000000000000a8 x25: 0010000000000001 x24: ffff20401c7170f0
x23: 0000ffff33a1e000 x22: 0000ffff33a76000 x21: ffff20400869eca0
x20: 0000ffff33976000 x19: 00000000000000a8 x18: ffffffffffffffff
x17: 0000000000000000 x16: 0000000000000020 x15: ffff8000b41c36a8
x14: 0000000000000000 x13: 205d373831353154 x12: 5b5d333331363732
x11: 000000000011ff78 x10: 000000000011ff10 x9 : ffff800080273f30
x8 : 000000320400869e x7 : c0000000ffffd87f x6 : 00000000001e6ba8
x5 : ffff206f3fb5af88 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : fffffdffc0000000 x0 : 00000a80c021a780
Call trace:
numa_rebuild_large_mapping+0x338/0x638
do_numa_page+0x3e4/0x4e0
handle_pte_fault+0x1bc/0x238
__handle_mm_fault+0x20c/0x400
handle_mm_fault+0xa8/0x288
do_page_fault+0x124/0x498
do_translation_fault+0x54/0x80
do_mem_abort+0x4c/0xa8
el0_da+0x40/0x110
el0t_64_sync_handler+0xe4/0x158
el0t_64_sync+0x188/0x190
Fix it by making the start and end not only within the vma range, but also
within the page table range.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d2136d749d76 ("mm: support multi-size THP numa balancing")
Signed-off-by: Kefeng Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and
if DEBUG_VM were off, then pages would be lost on a local list.
Our convention is that if migrate_pages() reports complete success (0),
then the migratepages list will be empty; but if it reports an error or
some pages remaining, then its caller must putback_movable_pages().
There's a new case in which migrate_pages() has been reporting complete
success, but returning with pages left on the migratepages list: when
migrate_pages_batch() successfully split a folio on the deferred list, but
then the "Failure isn't counted" call does not dispose of them all.
Since that block is expecting the large folio to have been counted as 1
failure already, and since the return code is later adjusted to success
whenever the returned list is found empty, the simple way to fix this
safely is to count splitting the deferred folio as "a failure".
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
KTAP parsers interpret the output of ksft_test_result_*() as being the
name of the test. The map_fixed_noreplace test uses a dynamically
allocated base address for the mmap()s that it tests and currently
includes this in the test names that it logs so the test names that are
logged are not stable between runs. It also uses multiples of PAGE_SIZE
which mean that runs for kernels with different PAGE_SIZE configurations
can't be directly compared. Both these factors cause issues for CI
systems when interpreting and displaying results.
Fix this by replacing the current test names with fixed strings describing
the intent of the mappings that are logged, the existing messages with the
actual addresses and sizes are retained as diagnostic prints to aid in
debugging.
Link: https://lkml.kernel.org/r/20240605-kselftest-mm-fixed-noreplace-v1-1-a235db8b9be9@kernel.org
Fixes: 4838cf70e539 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output")
Signed-off-by: Mark Brown <[email protected]>
Reviewed-by: Ryan Roberts <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't
have proper documentation. This led to a lot of confusion, especially
about whether or not memfd created with the MFD_NOEXEC_SEAL flag is
sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea is
to make it easier to use memfd in the most common way, which is NOEXEC +
F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl vm.noexec to help
existing applications move to a more secure way of using memfd.
Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1], Those
are based on the viewpoint that each flag is an atomic unit, which is a
reasonable assumption. However, MFD_NOEXEC_SEAL was designed with the
intent of promoting the most secure method of using memfd, therefore a
combination of multiple functionalities into one bit.
Furthermore, the MFD_NOEXEC_SEAL has been added for more than one year,
and multiple applications and distributions have backported and utilized
it. Altering ABI now presents a degree of risk and may lead to
disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code to
use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd. And
old-application might break, that is by-design, in such a system vm.noexec
= 0 shall be used. Also no backward compatibility problem.
I propose to include this documentation patch to assist in clarifying the
semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future
confusion.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/[email protected]/
[[email protected]: updates per Randy]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jeff Xu <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Barnabás Pőcze <[email protected]>
Cc: Daniel Verkamp <[email protected]>
Cc: David Rheinsberg <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jorge Lucangeli Obes <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
default
An ASLR regression was noticed [1] and tracked down to file-mapped areas
being backed by THP in recent kernels. The 21-bit alignment constraint
for such mappings reduces the entropy for randomizing the placement of
64-bit library mappings and breaks ASLR completely for 32-bit libraries.
The reported issue is easily addressed by increasing vm.mmap_rnd_bits and
vm.mmap_rnd_compat_bits. This patch just provides a simple way to set
ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values
allowed by the architecture at build time.
[1] https://zolutal.github.io/aslrnt/
[[email protected]: default to `y' if 32-bit, per Rafael]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1854bc6e2420 ("mm/readahead: Align file mappings for non-DAX")
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Samuel Holland <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data. To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.
Tested with GCC versions 14.1.0 and 13.2.0.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Oberparleiter <[email protected]>
Reported-by: Allison Henderson <[email protected]>
Reported-by: Chuck Lever III <[email protected]>
Tested-by: Chuck Lever <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
kernel_wait4() doesn't sleep and returns -EINTR if there is no
eligible child and signal_pending() is true.
That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
return false and avoid a busy-wait loop.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Rachel Menge <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Boqun Feng <[email protected]>
Tested-by: Wei Fu <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Allen Pais <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Mike Christie <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Zqiang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When I did a large folios split test, a WARNING "[ 5059.122759][ T166]
Cannot split file folio to non-0 order" was triggered. But the test cases
are only for anonmous folios. while mapping_large_folio_support() is only
reasonable for page cache folios.
In split_huge_page_to_list_to_order(), the folio passed to
mapping_large_folio_support() maybe anonmous folio. The folio_test_anon()
check is missing. So the split of the anonmous THP is failed. This is
also the same for shmem_mapping(). We'd better add a check for both. But
the shmem_mapping() in __split_huge_page() is not involved, as for
anonmous folios, the end parameter is set to -1, so (head[i].index >= end)
is always false. shmem_mapping() is not called.
Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon
mapping, So we can detect the wrong use more easily.
THP folios maybe exist in the pagecache even the file system doesn't
support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE is
enabled, khugepaged will try to collapse read-only file-backed pages to
THP. But the mapping does not actually support multi order large folios
properly.
Using /sys/kernel/debug/split_huge_pages to verify this, with this patch,
large anon THP is successfully split and the warning is ceased.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Reviewed-by: Barry Song <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Ran Xiaokai <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: xu xin <[email protected]>
Cc: Yang Yang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
put_page_tag_ref() should be called only when get_page_tag_ref() returns a
valid reference because only in that case get_page_tag_ref() enters RCU
read section while put_page_tag_ref() will call rcu_read_unlock() even if
the provided reference is NULL. Fix pgalloc_tag_get() which does not
follow this rule causing RCU imbalance. Add a warning in
put_page_tag_ref() to catch any future mistakes.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: cc92eba1c88b ("mm: fix non-compound multi-order memory accounting in __free_pages")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Acked-by: Vlastimil Babka <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Memory allocation profiling is trying to register sysctl interface even
when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined.
Prevent that by skipping sysctl registration for such configurations.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 22d407b164ff ("lib: add allocation tagging support for memory allocation profiling")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Acked-by: Vlastimil Babka <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
I haven't had the bandwidth to review vmalloc patches recently and I
suspect I won't be able to do so consistently moving forwards, so I think
it's best if I remove myself as reviewer for the time being.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lorenzo Stoakes <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There was insufficient review and no agreement that this is the right
approach.
There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.
Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.
... especially because the code can also *corrupt* urelated memory because
kernel_init_pages(page, folio_nr_pages(folio));
Could end up writing outside of the actual folio if we work with a tail
page.
Let's revert it. Once there is agreement that this is the right approach,
the issues were fixed and there was reasonable review and proper testing,
we can consider it again.
[1] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: David Wang <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Lance Yang <[email protected]>
Closes: https://lkml.kernel.org/r/[email protected]
Acked-by: Lance Yang <[email protected]>
Cc: York Jasper Niebuhr <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Not all pages may apply to pgtable check. One example is ZONE_DEVICE
pages: they map PFNs directly, and they don't allocate page_ext at all
even if there's struct page around. One may reference
devm_memremap_pages().
When both ZONE_DEVICE and page-table-check enabled, then try to map some
dax memories, one can trigger kernel bug constantly now when the kernel
was trying to inject some pfn maps on the dax device:
kernel BUG at mm/page_table_check.c:55!
While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
fault resolutions, skip all the checks if page_ext doesn't even exist in
pgtable checker, which applies to ZONE_DEVICE but maybe more.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: df4e817b7108 ("mm: page table check")
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
'-Warray-bounds' is already disabled for gcc-10+. Now that we've merged
bitmap_{read,write), I see the following error when building the kernel
with gcc-9.4 (Ubuntu 20.04.4 LTS) for x86_64 allmodconfig:
drivers/pinctrl/pinctrl-cy8c95x0.c: In function `cy8c95x0_read_regs_mask.isra.0':
include/linux/bitmap.h:756:18: error: array subscript [1, 288230376151711744] is outside array bounds of `long unsigned int[1]' [-Werror=array-bounds]
756 | value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
| ~~~^~~~~~~~~~~
The immediate reason is that the commit b44759705f7d ("bitmap: make
bitmap_{get,set}_value8() use bitmap_{read,write}()") switched the
bitmap_get_value8() to an alias of bitmap_read(); the same for 'set'.
Now; the code that triggers Warray-bounds, calls the function like this:
#define MAX_BANK 8
#define BANK_SZ 8
#define MAX_LINE (MAX_BANK * BANK_SZ)
DECLARE_BITMAP(tval, MAX_LINE); // 64-bit map: unsigned long tval[1]
read_val |= bitmap_get_value8(tval, i * BANK_SZ) & ~bits;
bitmap_read() is implemented such that it may conditionally dereference a
pointer beyond the boundary like this:
unsigned long offset = start % BITS_PER_LONG;
unsigned long space = BITS_PER_LONG - offset;
if (space >= nbits)
return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
return (value_low >> offset) | (value_high << space);
In case of bitmap_get_value8(), it's impossible to violate the boundary
because 'space >= nbits' is never the true for byte-aligned 8-bit access.
So, this is clearly a false-positive.
The same type of false-positives break my allmodconfig build in many
places. gcc-8, is clear, however.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b44759705f7d ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()")
Signed-off-by: Yury Norov <[email protected]>
Cc: Alexander Lobakin <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Yoann Congal <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
bdev->bd_super has been removed and commit 8887b94d9322 change the usage
from bdev->bd_super to b_assoc_map->host->i_sb. Since ocfs2 hasn't set
bh->b_assoc_map, it will trigger NULL pointer dereference when calling
into ocfs2_abort_trigger().
Actually this was pointed out in history, see commit 74e364ad1b13. But
I've made a mistake when reviewing commit 8887b94d9322 and then
re-introduce this regression.
Since we cannot revive bdev in buffer head, so fix this issue by
initializing all types of ocfs2 triggers when fill super, and then get the
specific ocfs2 trigger from ocfs2_caching_info when access journal.
[[email protected]: v2]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging")
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Heming Zhao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]> [6.6+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
bdev->bd_super has been removed and commit 8887b94d9322 change the usage
from bdev->bd_super to b_assoc_map->host->i_sb. This introduces the
following NULL pointer dereference in ocfs2_journal_dirty() since
b_assoc_map is still not initialized. This can be easily reproduced by
running xfstests generic/186, which simulate no more credits.
[ 134.351592] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 134.355341] RIP: 0010:ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
...
[ 134.365071] Call Trace:
[ 134.365312] <TASK>
[ 134.365524] ? __die_body+0x1e/0x60
[ 134.365868] ? page_fault_oops+0x13d/0x4f0
[ 134.366265] ? __pfx_bit_wait_io+0x10/0x10
[ 134.366659] ? schedule+0x27/0xb0
[ 134.366981] ? exc_page_fault+0x6a/0x140
[ 134.367356] ? asm_exc_page_fault+0x26/0x30
[ 134.367762] ? ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
[ 134.368305] ? ocfs2_journal_dirty+0x13d/0x160 [ocfs2]
[ 134.368837] ocfs2_create_new_meta_bhs.isra.51+0x139/0x2e0 [ocfs2]
[ 134.369454] ocfs2_grow_tree+0x688/0x8a0 [ocfs2]
[ 134.369927] ocfs2_split_and_insert.isra.67+0x35c/0x4a0 [ocfs2]
[ 134.370521] ocfs2_split_extent+0x314/0x4d0 [ocfs2]
[ 134.371019] ocfs2_change_extent_flag+0x174/0x410 [ocfs2]
[ 134.371566] ocfs2_add_refcount_flag+0x3fa/0x630 [ocfs2]
[ 134.372117] ocfs2_reflink_remap_extent+0x21b/0x4c0 [ocfs2]
[ 134.372994] ? inode_update_timestamps+0x4a/0x120
[ 134.373692] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[ 134.374545] ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[ 134.375393] ocfs2_reflink_remap_blocks+0xe4/0x4e0 [ocfs2]
[ 134.376197] ocfs2_remap_file_range+0x1de/0x390 [ocfs2]
[ 134.376971] ? security_file_permission+0x29/0x50
[ 134.377644] vfs_clone_file_range+0xfe/0x320
[ 134.378268] ioctl_file_clone+0x45/0xa0
[ 134.378853] do_vfs_ioctl+0x457/0x990
[ 134.379422] __x64_sys_ioctl+0x6e/0xd0
[ 134.379987] do_syscall_64+0x5d/0x170
[ 134.380550] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 134.381231] RIP: 0033:0x7fa4926397cb
[ 134.381786] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48
[ 134.383930] RSP: 002b:00007ffc2b39f7b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 134.384854] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa4926397cb
[ 134.385734] RDX: 00007ffc2b39f7f0 RSI: 000000004020940d RDI: 0000000000000003
[ 134.386606] RBP: 0000000000000000 R08: 00111a82a4f015bb R09: 00007fa494221000
[ 134.387476] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 134.388342] R13: 0000000000f10000 R14: 0000558e844e2ac8 R15: 0000000000f10000
[ 134.389207] </TASK>
Fix it by only aborting transaction and journal in ocfs2_journal_dirty()
now, and leave ocfs2_abort() later when detecting an aborted handle,
e.g. start next transaction. Also log the handle details in this case.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging")
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Heming Zhao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]> [6.6+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Fix the invalid BT shutdown GPIO (gpio1_io3 not gpio4_io16)
Fixes: 716ced308234 ("arm64: dts: freescale: Add imx8mp-venice-gw73xx-2x")
Signed-off-by: Tim Harvey <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
|
|
The kmemleak code sometimes complains about the following leak:
unreferenced object 0xffff8000102e0000 (size 32768):
comm "swapper/0", pid 1, jiffies 4294937323 (age 71.240s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000db9a88a3>] __vmalloc_node_range+0x324/0x450
[<00000000ff8903a4>] __vmalloc_node+0x90/0xd0
[<000000001a06634f>] arm64_efi_rt_init+0x64/0xdc
[<0000000007826a8d>] do_one_initcall+0x178/0xac0
[<0000000054a87017>] do_initcalls+0x190/0x1d0
[<00000000308092d0>] kernel_init_freeable+0x2c0/0x2f0
[<000000003e7b99e0>] kernel_init+0x28/0x14c
[<000000002246af5b>] ret_from_fork+0x10/0x20
The memory object in this case is for efi_rt_stack_top and is allocated
in an initcall. So this is certainly a false positive. Mark the object
as not a leak to quash it.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|