Age | Commit message (Collapse) | Author | Files | Lines |
|
Various elements are parsed with a requirement to have an
exact size, when really we should only check that they have
the minimum size that we need. Check only that and therefore
ignore any additional data that they might carry.
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.cd101f8040a4.Iadf0e9b37b100c6c6e79c7b298cc657c2be9151a@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
Apparently we never clear these values, so they'll remain set
since the setting of them is conditional. Clear the values in
the relevant other cases.
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.316e32d136a9.I2a12e51814258e1e1b526103894f4b9f19a91c8d@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
If cfg80211_pmsr_process_abort() moves all the PMSR requests that
need to be freed into a local list before aborting and freeing them.
As a result, it is possible that cfg80211_pmsr_complete() will run in
parallel and free the same PMSR request.
Fix it by freeing the request in cfg80211_pmsr_complete() only if it
is still in the original pmsr list.
Cc: [email protected]
Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Avraham Stern <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.1fbef57e269a.I00294bebdb0680b892f8d1d5c871fd9dbe785a5e@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
If all net/wireless/certs/*.hex files are deleted, the build
will hang at this point since the 'cat' command will have no
arguments. Do "echo | cat - ..." so that even if the "..."
part is empty, the whole thing won't hang.
Cc: [email protected]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c989056c3664.Ic3b77531d00b30b26dcd69c64e55ae2f60c3f31e@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
We need to skip sampling if the next sample time is after jiffies, not before.
This patch fixes an issue where in some cases only very little sampling (or none
at all) is performed, leading to really bad data rates
Fixes: 80d55154b2f8 ("mac80211: minstrel_ht: significantly redesign the rate probing strategy")
Cc: [email protected]
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
On systems without any specific PMU driver support registered, running
perf record causes Oops.
The relevant portion from call trace:
BUG: Kernel NULL pointer dereference on read at 0x00000040
Faulting instruction address: 0xc0021f0c
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PREEMPT CMPCPRO
SAF3000 DIE NOTIFICATION
CPU: 0 PID: 442 Comm: null_syscall Not tainted 5.13.0-rc6-s3k-dev-01645-g7649ee3d2957 #5164
NIP: c0021f0c LR: c00e8ad8 CTR: c00d8a5c
NIP perf_instruction_pointer+0x10/0x60
LR perf_prepare_sample+0x344/0x674
Call Trace:
perf_prepare_sample+0x7c/0x674 (unreliable)
perf_event_output_forward+0x3c/0x94
__perf_event_overflow+0x74/0x14c
perf_swevent_hrtimer+0xf8/0x170
__hrtimer_run_queues.constprop.0+0x160/0x318
hrtimer_interrupt+0x148/0x3b0
timer_interrupt+0xc4/0x22c
Decrementer_virt+0xb8/0xbc
During perf record session, perf_instruction_pointer() is called to
capture the sample IP. This function in core-book3s accesses
ppmu->flags. If a platform specific PMU driver is not registered, ppmu
is set to NULL and accessing its members results in a crash. Fix this
crash by checking if ppmu is set.
Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: [email protected] # v5.11+
Reported-by: Christophe Leroy <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Tested-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.13-2021-06-16:
amdgpu:
- GFX9 and 10 powergating fixes
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Pull kvm fixes from Paolo Bonzini:
"Miscellaneous bugfixes.
The main interesting one is a NULL pointer dereference reported by
syzkaller ("KVM: x86: Immediately reset the MMU context when the SMM
flag is cleared")"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Fix kvm_check_cap() assertion
KVM: x86/mmu: Calculate and check "full" mmu_role for nested MMU
KVM: X86: Fix x86_emulator slab cache leak
KVM: SVM: Call SEV Guest Decommission if ASID binding fails
KVM: x86: Immediately reset the MMU context when the SMM flag is cleared
KVM: x86: Fix fall-through warnings for Clang
KVM: SVM: fix doc warnings
KVM: selftests: Fix compiling errors when initializing the static structure
kvm: LAPIC: Restore guard to prevent illegal APIC register access
|
|
The source (&dcbx_info->operational.params) and dest
(&p_hwfn->p_dcbx_info->set.config.params) are both struct qed_dcbx_params
(560 bytes), not struct qed_dcbx_admin_params (564 bytes), which is used
as the memcpy() size.
However it seems that struct qed_dcbx_operational_params
(dcbx_info->operational)'s layout matches struct qed_dcbx_admin_params
(p_hwfn->p_dcbx_info->set.config)'s 4 byte difference (3 padding, 1 byte
for "valid").
On the assumption that the size is wrong (rather than the source structure
type), adjust the memcpy() size argument to be 4 bytes smaller and add
a BUILD_BUG_ON() to validate any changes to the structure sizes.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
when usbnet transmit a skb, eem fixup it in eem_tx_fixup(),
if skb_copy_expand() failed, it return NULL,
usbnet_start_xmit() will have no chance to free original skb.
fix it by free orginal skb in eem_tx_fixup() first,
then check skb clone status, if failed, return NULL to usbnet.
Fixes: 9f722c0978b0 ("usbnet: CDC EEM support (v5)")
Signed-off-by: Linyu Yuan <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-06-16
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
My local syzbot instance hit memory leak in
mkiss_open()[1]. The problem was in missing
free_netdev() in mkiss_close().
In mkiss_open() netdevice is allocated and then
registered, but in mkiss_close() netdevice was
only unregistered, but not freed.
Fail log:
BUG: memory leak
unreferenced object 0xffff8880281ba000 (size 4096):
comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
hex dump (first 32 bytes):
61 78 30 00 00 00 00 00 00 00 00 00 00 00 00 00 ax0.............
00 27 fa 2a 80 88 ff ff 00 00 00 00 00 00 00 00 .'.*............
backtrace:
[<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
[<ffffffff8706e7e8>] alloc_netdev_mqs+0x98/0xe80
[<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
[<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
[<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
[<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
[<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
[<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
[<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak
unreferenced object 0xffff8880141a9a00 (size 96):
comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
hex dump (first 32 bytes):
e8 a2 1b 28 80 88 ff ff e8 a2 1b 28 80 88 ff ff ...(.......(....
98 92 9c aa b0 40 02 00 00 00 00 00 00 00 00 00 .....@..........
backtrace:
[<ffffffff8709f68b>] __hw_addr_create_ex+0x5b/0x310
[<ffffffff8709fb38>] __hw_addr_add_ex+0x1f8/0x2b0
[<ffffffff870a0c7b>] dev_addr_init+0x10b/0x1f0
[<ffffffff8706e88b>] alloc_netdev_mqs+0x13b/0xe80
[<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
[<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
[<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
[<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
[<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
[<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
[<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak
unreferenced object 0xffff8880219bfc00 (size 512):
comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
hex dump (first 32 bytes):
00 a0 1b 28 80 88 ff ff 80 8f b1 8d ff ff ff ff ...(............
80 8f b1 8d ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
[<ffffffff8706eec7>] alloc_netdev_mqs+0x777/0xe80
[<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
[<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
[<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
[<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
[<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
[<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
[<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
BUG: memory leak
unreferenced object 0xffff888029b2b200 (size 256):
comm "syz-executor.1", pid 11443, jiffies 4295046091 (age 17.660s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81a27201>] kvmalloc_node+0x61/0xf0
[<ffffffff8706f062>] alloc_netdev_mqs+0x912/0xe80
[<ffffffff84e64192>] mkiss_open+0xb2/0x6f0 [1]
[<ffffffff842355db>] tty_ldisc_open+0x9b/0x110
[<ffffffff84236488>] tty_set_ldisc+0x2e8/0x670
[<ffffffff8421f7f3>] tty_ioctl+0xda3/0x1440
[<ffffffff81c9f273>] __x64_sys_ioctl+0x193/0x200
[<ffffffff8911263a>] do_syscall_64+0x3a/0xb0
[<ffffffff89200068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 815f62bf7427 ("[PATCH] SMP rewrite of mkiss")
Signed-off-by: Pavel Skripkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If an error occurs after a 'pci_enable_pcie_error_reporting()' call, it
must be undone by a corresponding 'pci_disable_pcie_error_reporting()'
call, as already done in the remove function.
Fixes: d6b6d9877878 ("be2net: use PCIe AER capability")
Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Somnath Kotur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
KVM_CHECK_EXTENSION ioctl can return any negative value on error,
and not necessarily -1. Change the assertion to reflect that.
Signed-off-by: Fuad Tabba <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota and fanotify fixes from Jan Kara:
"A fixup finishing disabling of quotactl_path() syscall (I've missed
archs using different way to declare syscalls) and a fix of an fd leak
in error handling path of fanotify"
* tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: finish disable quotactl_path syscall
fanotify: fix copy_event_to_user() fid error clean up
|
|
The Cypress CY7C65632 appears to have an issue with auto suspend and
detecting devices, not too dissimilar to the SMSC 5534B hub. It is
easiest to reproduce by connecting multiple mass storage devices to
the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts
result in the devices not being detected. It is however possible to
make them appear using lsusb -v.
Disabling autosuspend for this hub resolves the issue.
Fixes: 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub")
Cc: [email protected]
Signed-off-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Fix GICv3 NMI handling where an IRQ could be mistakenly handled
as a NMI, with disatrous effects
Link: https://lore.kernel.org/r/[email protected]
|
|
Consider we have a using block group on zoned btrfs.
|<- ZU ->|<- used ->|<---free--->|
`- Alloc offset
ZU: Zone unusable
Marking the block group read-only will migrate the zone unusable bytes
to the read-only bytes. So, we will have this.
|<- RO ->|<- used ->|<--- RO --->|
RO: Read only
When marking it back to read-write, btrfs_dec_block_group_ro()
subtracts the above "RO" bytes from the
space_info->bytes_readonly. And, it moves the zone unusable bytes back
and again subtracts those bytes from the space_info->bytes_readonly,
leading to negative bytes_readonly.
This can be observed in the output as eg.:
Data, single: total=512.00MiB, used=165.21MiB, zone_unusable=16.00EiB
Data, single: total=536870912, used=173256704, zone_unusable=18446744073603186688
This commit fixes the issue by reordering the operations.
Link: https://github.com/naota/linux/issues/37
Reported-by: David Sterba <[email protected]>
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
CC: [email protected] # 5.12+
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Reset only the index part of the mkey and keep the variant part. On
devlink reload, driver recreates mkeys, so the mkey index may change.
Trying to preserve the variant part of the mkey, driver mistakenly
merged the mkey index with current value. In case of a devlink reload,
current value of index part is dirty, so the index may be corrupted.
Fixes: 54c62e13ad76 ("{IB,net}/mlx5: Setup mkey variant before mr create command invocation")
Signed-off-by: Aya Levin <[email protected]>
Signed-off-by: Amir Tzin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Running devlink reload command for port in switchdev mode cause
resources to corrupt: driver can't release allocated EQ and reclaim
memory pages, because "rdma" auxiliary device had add CQs which blocks
EQ from deletion.
Erroneous sequence happens during reload-down phase, and is following:
1. detach device - suspends auxiliary devices which support it, destroys
others. During this step "eth-rep" and "rdma-rep" are destroyed,
"eth" - suspended.
2. disable SRIOV - moves device to legacy mode; as part of disablement -
rescans drivers. This step adds "rdma" auxiliary device.
3. destroy EQ table - <failure>.
Driver shouldn't create any device during unload flows. To handle that
implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset
on device attach. If flag is set do no-op on drivers rescan.
Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Dmytro Linkin <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Decapsulation L3 on small inner packets which are less than
64 Bytes was done incorrectly. In small packets there is an
extra padding added in L2 which should not be included in L3
length. The issue was that after decapL3 the extra L2 padding
caused an update on the L3 length.
To avoid this issue the new header is pushed to the beginning
of the packet (offset 0) which should not cause a HW reparse
and update the L3 length.
Fixes: c349b4137cfd ("net/mlx5: DR, Add STEv1 modify header logic")
Reviewed-by: Erez Shitrit <[email protected]>
Reviewed-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Alex Vesker <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When auxiliary bus autoprobe is disabled and SF is in ACTIVE state,
on SF port deletion it transitions from ACTIVE->ALLOCATED->INVALID.
When VHCA event handler queries the state, it is already transition
to INVALID state.
In this scenario, event handler missed to delete the SF device.
Fix it by deleting the SF when SF state is INVALID.
Fixes: 90d010b8634b ("net/mlx5: SF, Add auxiliary device support")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
E-switch should be able to set the GUID of host PF vport.
Currently it returns an error. This results in below error
when user attempts to configure MAC address of the PF of an
external controller.
$ devlink port function set pci/0000:03:00.0/196608 \
hw_addr 00:00:00:11:22:33
mlx5_core 0000:03:00.0: mlx5_esw_set_vport_mac_locked:1876:(pid 6715):\
"Failed to set vport 0 node guid, err = -22.
RDMA_CM will not function properly for this VF."
Check for zero vport is no longer needed.
Fixes: 330077d14de1 ("net/mlx5: E-switch, Supporting setting devlink port function mac address")
Signed-off-by: Yuval Avnery <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Bodong Wang <[email protected]>
Reviewed-by: Alaa Hleihel <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
External controller PF's MAC address is not read from the device during
vport setup. Fail to read this results in showing all zeros to user
while the factory programmed MAC is a valid value.
$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}
Hence, read it when enabling a vport.
After the fix,
$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "98:03:9b:a0:60:11"
}
}
}
}
Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Bodong Wang <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Alaa Hleihel <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The device can be requested to be attached despite being not probed.
This situation is possible if devlink reload races with module removal,
and the following kernel panic is an outcome of such race.
mlx5_core 0000:00:09.0: firmware version: 4.7.9999
mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
BUG: unable to handle page fault for address: fffffffffffffff0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 3218067 P4D 3218067 PUD 321a067 PMD 0
Oops: 0000 [#1] SMP KASAN NOPTI
CPU: 7 PID: 250 Comm: devlink Not tainted 5.12.0-rc2+ #2836
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5_attach_device+0x80/0x280 [mlx5_core]
Code: f8 48 c1 e8 03 42 80 3c 38 00 0f 85 80 01 00 00 48 8b 45 68 48 8d 78 f0 48 89 fe 48 c1 ee 03 42 80 3c 3e 00 0f 85 70 01 00 00 <48> 8b 40 f0 48 85 c0 74 0d 48 89 ef ff d0 85 c0 0f 85 84 05 0e 00
RSP: 0018:ffff8880129675f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff827407f1
RDX: 1ffff110011336cf RSI: 1ffffffffffffffe RDI: fffffffffffffff0
RBP: ffff888008e0c000 R08: 0000000000000008 R09: ffffffffa0662ee7
R10: fffffbfff40cc5dc R11: 0000000000000000 R12: ffff88800ea002e0
R13: ffffed1001d459f7 R14: ffffffffa05ef4f8 R15: dffffc0000000000
FS: 00007f51dfeaf740(0000) GS:ffff88806d5c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff0 CR3: 000000000bc82006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
mlx5_load_one+0x117/0x1d0 [mlx5_core]
devlink_reload+0x2d5/0x520
? devlink_remote_reload_actions_performed+0x30/0x30
? mutex_trylock+0x24b/0x2d0
? devlink_nl_cmd_reload+0x62b/0x1070
devlink_nl_cmd_reload+0x66d/0x1070
? devlink_reload+0x520/0x520
? devlink_nl_pre_doit+0x64/0x4d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
? mutex_lock_io_nested+0x1130/0x1130
? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
? security_capable+0x51/0x90
genl_rcv_msg+0x27f/0x4a0
? genl_get_cmd+0x3c0/0x3c0
? lock_acquire+0x1a9/0x6d0
? devlink_reload+0x520/0x520
? lock_release+0x6c0/0x6c0
netlink_rcv_skb+0x11d/0x340
? genl_get_cmd+0x3c0/0x3c0
? netlink_ack+0x9f0/0x9f0
? lock_release+0x1f9/0x6c0
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
? netlink_attachskb+0x730/0x730
? _copy_from_iter_full+0x178/0x650
? __alloc_skb+0x113/0x2b0
netlink_sendmsg+0x6f1/0xbd0
? netlink_unicast+0x700/0x700
? netlink_unicast+0x700/0x700
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x193/0x240
? __x64_sys_getpeername+0xb0/0xb0
? copy_page_range+0x2300/0x2300
? __up_read+0x1a1/0x7b0
? do_user_addr_fault+0x219/0xdc0
__x64_sys_sendto+0xdd/0x1b0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f51dffb514a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007ffcaef22e78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f51dffb514a
RDX: 0000000000000030 RSI: 000055750daf2440 RDI: 0000000000000003
RBP: 000055750daf2410 R08: 00007f51e0081200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: mlx5_core(-) ptp pps_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [last unloaded: mlx5_ib]
CR2: fffffffffffffff0
---[ end trace 7789831bfe74fa42 ]---
Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In the case of the failure to execute mlx5_core_set_hca_defaults(),
we used wrong goto label to execute error unwind flow.
Fixes: 5bef709d76a2 ("net/mlx5: Enable host PF HCA after eswitch is initialized")
Reviewed-by: Saeed Mahameed <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When a AP queue is switched to soft offline, all pending
requests are purged out of the pending requests list and
'received' by the upper layer like zcrypt device drivers.
This is also done for requests which are already enqueued
into the firmware queue. A request in a firmware queue
may eventually produce an response message, but there is
no waiting process any more. However, the response was
counted with the queue_counter and as this counter was
reset to 0 with the offline switch, the pending response
caused the queue_counter to get negative. The next request
increased this counter to 0 (instead of 1) which caused
the ap code to assume there is nothing to receive and so
the response for this valid request was never tried to
fetch from the firmware queue.
This all caused a queue to not work properly after a
switch offline/online and in the end processes to hang
forever when trying to send a crypto request after an
queue offline/online switch cicle.
Fixed by a) making sure the counter does not drop below 0
and b) on a successful enqueue of a message has at least
a value of 1.
Additionally a warning is emitted, when a reply can't get
assigned to a waiting process. This may be normal operation
(process had timeout or has been killed) but may give a
hint that something unexpected happened (like this odd
behavior described above).
Signed-off-by: Harald Freudenberger <[email protected]>
Cc: [email protected]
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC.
Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC.
Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally reading across neighboring array fields.
The memcpy() is copying the entire structure, not just the first array.
Adjust the source argument so the compiler can do appropriate bounds
checking.
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
udpgro_fwd.sh contains many bash specific operators ("[[", "local -r"),
but it's using /bin/sh; in some distro /bin/sh is mapped to /bin/dash,
that doesn't support such operators.
Force the test to use /bin/bash explicitly and prevent false positive
test failures.
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While unix_may_send(sk, osk) is called while osk is locked, it appears
unix_release_sock() can overwrite unix_peer() after this lock has been
released, making KCSAN unhappy.
Changing unix_release_sock() to access/change unix_peer()
before lock is released should fix this issue.
BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock
write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1:
unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558
unix_release+0x2f/0x50 net/unix/af_unix.c:859
__sock_release net/socket.c:599 [inline]
sock_close+0x6c/0x150 net/socket.c:1258
__fput+0x25b/0x4e0 fs/file_table.c:280
____fput+0x11/0x20 fs/file_table.c:313
task_work_run+0xae/0x130 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302
do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0:
unix_may_send net/unix/af_unix.c:189 [inline]
unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff888167905400 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
veth.sh is a shell script that uses /bin/sh; some distro (Ubuntu for
example) use dash as /bin/sh and in this case the test reports the
following error:
# ./veth.sh: 21: local: -r: bad variable name
# ./veth.sh: 21: local: -r: bad variable name
This happens because dash doesn't support the option "-r" with local.
Moreover, in case of missing bpf object, the script is exiting -1, that
is an illegal number for dash:
exit: Illegal number: -1
Change the script to be compatible both with bash and dash and prevent
the errors above.
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Eric Dumazet says:
====================
net/packet: annotate data races
KCSAN sent two reports about data races in af_packet.
Nothing serious, but worth fixing.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Like prior patch, we need to annotate lockless accesses to po->ifindex
For instance, packet_getname() is reading po->ifindex (twice) while
another thread is able to change po->ifindex.
KCSAN reported:
BUG: KCSAN: data-race in packet_do_bind / packet_getname
write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
__sys_bind+0x200/0x290 net/socket.c:1637
__do_sys_bind net/socket.c:1648 [inline]
__se_sys_bind net/socket.c:1646 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1646
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
__sys_getsockname+0x10e/0x1a0 net/socket.c:1887
__do_sys_getsockname net/socket.c:1902 [inline]
__se_sys_getsockname net/socket.c:1899 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show()
can read po->num without holding a lock. This means other threads
can change po->num at the same time.
KCSAN complained about this known fact [1]
Add READ_ONCE()/WRITE_ONCE() to address the issue.
[1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg
write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
__sys_bind+0x200/0x290 net/socket.c:1637
__do_sys_bind net/socket.c:1648 [inline]
__se_sys_bind net/socket.c:1646 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1646
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
packet_snd net/packet/af_packet.c:2899 [inline]
packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmsg+0x1ed/0x270 net/socket.c:2433
__do_sys_sendmsg net/socket.c:2442 [inline]
__se_sys_sendmsg net/socket.c:2440 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000 -> 0x1200
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-06-16
this is a pull request of 4 patches for net/master.
The first patch is by Oleksij Rempel and fixes a Use-after-Free found
by syzbot in the j1939 stack.
The next patch is by Tetsuo Handa and fixes hung task detected by
syzbot in the bcm, raw and isotp protocols.
Norbert Slusarek's patch fixes a infoleak in bcm's struct
bcm_msg_head.
Pavel Skripkin's patch fixes a memory leak in the mcba_usb driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
BUG: memory leak
unreferenced object 0xffff888101bc4c00 (size 32):
comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................
backtrace:
[<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline]
[<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline]
[<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline]
[<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095
[<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416
[<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline]
[<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423
[<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857
[<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117
[<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline]
[<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline]
[<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125
[<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47
[<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In commit 24803f38a5c0 ("igmp: do not remove igmp souce list info when set
link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed,
because it was also called in igmpv3_clear_delrec().
Rough callgraph:
inetdev_destroy
-> ip_mc_destroy_dev
-> igmpv3_clear_delrec
-> ip_mc_clear_src
-> RCU_INIT_POINTER(dev->ip_ptr, NULL)
However, ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't
release in_dev->mc_list->sources. And RCU_INIT_POINTER() assigns the
NULL to dev->ip_ptr. As a result, in_dev cannot be obtained through
inetdev_by_index() and then in_dev->mc_list->sources cannot be released
by ip_mc_del1_src() in the sock_close. Rough call sequence goes like:
sock_close
-> __sock_release
-> inet_release
-> ip_mc_drop_socket
-> inetdev_by_index
-> ip_mc_leave_src
-> ip_mc_del_src
-> ip_mc_del1_src
So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free
in_dev->mc_list->sources.
Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info ...")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Chengyang Fan <[email protected]>
Acked-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Joakim Zhang says:
====================
net: fixes for fec ptp
Small fixes for fec ptp.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit da722186f654 ("net: fec: set GPR bit on suspend by DT configuration.")
refactor the fec_devtype, need adjust ptp driver accordingly.
Fixes: da722186f654 ("net: fec: set GPR bit on suspend by DT configuration.")
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add clock rate zero check to fix coverity issue of "divide by 0".
Fixes: commit 85bd1798b24a ("net: fec: fix spin_lock dead lock")
Signed-off-by: Fugang Duan <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit 46a8b29c6306 ("net: usb: fix memory leak in smsc75xx_bind")
fails to clean up the work scheduled in smsc75xx_reset->
smsc75xx_set_multicast, which leads to use-after-free if the work is
scheduled to start after the deallocation. In addition, this patch
also removes a dangling pointer - dev->data[0].
This patch calls cancel_work_sync to cancel the scheduled work and set
the dangling pointer to NULL.
Fixes: 46a8b29c6306 ("net: usb: fix memory leak in smsc75xx_bind")
Signed-off-by: Dongliang Mu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Platform drivers may call stmmac_probe_config_dt() to parse dt, could
call stmmac_remove_config_dt() in error handing after dt parsed, so need
disable clocks in stmmac_remove_config_dt().
Go through all platforms drivers which use stmmac_probe_config_dt(),
none of them disable clocks manually, so it's safe to disable them in
stmmac_remove_config_dt().
Fixes: commit d2ed0a7755fe ("net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks")
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Merge misc fixes from Andrew Morton:
"18 patches.
Subsystems affected by this patch series: mm (memory-failure, swap,
slub, hugetlb, memory-failure, slub, thp, sparsemem), and coredump"
* emailed patches from Andrew Morton <[email protected]>:
mm/sparse: fix check_usemap_section_nr warnings
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
mm/thp: fix page_address_in_vma() on file THP tails
mm/thp: fix vma_address() if virtual address below file offset
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: make is_huge_zero_pmd() safe and quicker
mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
mm, thp: use head page in __migration_entry_wait()
mm/slub.c: include swab.h
crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
mm/memory-failure: make sure wait for page writeback in memory_failure
mm/hugetlb: expand restore_reserve_on_error functionality
mm/slub: actually fix freelist pointer vs redzoning
mm/slub: fix redzoning for small allocations
mm/slub: clarify verification reporting
mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare
mm,hwpoison: fix race with hugetlb page allocation
|
|
I see a "virt_to_phys used for non-linear address" warning from
check_usemap_section_nr() on arm64 platforms.
In current implementation of NODE_DATA, if CONFIG_NEED_MULTIPLE_NODES=y,
pglist_data is dynamically allocated and assigned to node_data[].
For example, in arch/arm64/include/asm/mmzone.h:
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[(nid)])
If CONFIG_NEED_MULTIPLE_NODES=n, pglist_data is defined as a global
variable named "contig_page_data".
For example, in include/linux/mmzone.h:
extern struct pglist_data contig_page_data;
#define NODE_DATA(nid) (&contig_page_data)
If CONFIG_DEBUG_VIRTUAL is not enabled, __pa() can handle both
dynamically allocated linear addresses and symbol addresses. However,
if (CONFIG_DEBUG_VIRTUAL=y && CONFIG_NEED_MULTIPLE_NODES=n) we can see
the "virt_to_phys used for non-linear address" warning because that
&contig_page_data is not a linear address on arm64.
Warning message:
virt_to_phys used for non-linear address: (contig_page_data+0x0/0x1c00)
WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x58/0x68
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.13.0-rc1-00074-g1140ab592e2e #3
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
Call trace:
__virt_to_phys+0x58/0x68
check_usemap_section_nr+0x50/0xfc
sparse_init_nid+0x1ac/0x28c
sparse_init+0x1c4/0x1e0
bootmem_init+0x60/0x90
setup_arch+0x184/0x1f0
start_kernel+0x78/0x488
To fix it, create a small function to handle both translation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miles Chen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Kazu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount() however
it may miss the failure when head page is unmapped but other subpage is
mapped. Then the second DEBUG_VM BUG() that check total mapcount would
catch it. This may incur some confusion.
As this is not a fatal issue, so consolidate the two DEBUG_VM checks
into one VM_WARN_ON_ONCE_PAGE().
[1] https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted. This commit fixes that, but not
in the way I hoped.
The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.
However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap. Set that aside.
The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end. Nice. But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage. It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too). Set that aside.
Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.
This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details. Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but
safe.
Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not. Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: fc127da085c2 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jue Wang <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Anon THP tails were already supported, but memory-failure may need to
use page_address_in_vma() on file THP tails, which its page->mapping
check did not permit: fix it.
hughd adds: no current usage is known to hit the issue, but this does
fix a subtle trap in a general helper: best fixed in stable sooner than
later.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Jue Wang <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wang Yugui <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|