Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull drm fixes from Dave Airlie:
"A bunch of misc fixes across the board.
amdgpu is the usual bulk with a revert and other fixes, nouveau has a
race fix that was causing a UAF that was hard hanging systems,
otherwise some qaic, bridge and radeon.
amdgpu:
- GFX9 preemption fixes
- Add missing radeon secondary PCI ID
- vblflash fixes
- SMU 13 fix
- VCN 4.0 fix
- Re-enable TOPDOWN flag for large BAR systems to fix regression
- eDP fix
- PSR hang fix
- DPIA fix
radeon:
- fbdev client warning fix
qaic:
- leak fix
- null ptr deref fix
nouveau:
- use-after-free caused by fence race fix
- runtime pm fix
- NULL ptr checks
bridge:
- ti-sn65dsi86: Avoid possible buffer overflow"
* tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drm: (21 commits)
nouveau: fix client work fence deletion race
drm/amd/display: limit DPIA link rate to HBR3
drm/amd/display: fix the system hang while disable PSR
drm/amd/display: edp do not add non-edid timings
Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
drm/radeon: Disable outputs when releasing fbdev client
drm/amd/pm: workaround for compute workload type on some skus
drm/amd: Tighten permissions on VBIOS flashing attributes
drm/amd: Make sure image is written to trigger VBIOS image update flow
drm/amdgpu: add missing radeon secondary PCI ID
drm/amdgpu: Implement gfx9 patch functions for resubmission
drm/amdgpu: Modify indirect buffer packages for resubmission
drm/amdgpu: Program gds backup address as zero if no gds allocated
drm/nouveau: add nv_encoder pointer check for NULL
drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
drm/nouveau/dp: check for NULL nv_connector->native_mode
drm/bridge: ti-sn65dsi86: Avoid possible buffer overflow
drm/nouveau: don't detect DSM for non-NVIDIA device
accel/qaic: Fix NULL pointer deref in qaic_destroy_drm_device()
...
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes maybe in time for v6.4-rc7:
- qaic leak and null deref fix.
- Fix runtime pm in nouveau.
- Fix array overflow in ti-sn65dsi86 pwm chip handling.
- Assorted null check fixes in nouveau.
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
ASO query can be scheduled in atomic context as such it can't use usleep.
Use udelay as recommended in Documentation/timers/timers-howto.rst.
Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really
need to hold lock while modifying flow steering rules to drop traffic.
That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit()
work will be canceled anyway and won't run in parallel.
Fixes: b2f7b01d36a9 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Previously during mlx5e_ipsec_handle_event the driver tried to execute
an operation that could sleep, while holding a spinlock, which caused
the kernel panic mentioned below.
Move the function call that can sleep outside of the spinlock context.
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x6c
__schedule_bug.cold+0x42/0x4e
schedule_debug.constprop.0+0xe0/0x118
__schedule+0x59/0x58a
? __mod_timer+0x2a1/0x3ef
schedule+0x5e/0xd4
schedule_timeout+0x99/0x164
? __pfx_process_timeout+0x10/0x10
__wait_for_common+0x90/0x1da
? __pfx_schedule_timeout+0x10/0x10
wait_func+0x34/0x142 [mlx5_core]
mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core]
cmd_exec+0x1fe/0x325 [mlx5_core]
mlx5_cmd_do+0x22/0x50 [mlx5_core]
mlx5_cmd_exec+0x1c/0x40 [mlx5_core]
mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core]
mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core]
? wake_affine+0x62/0x1f8
mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core]
process_one_work+0x1e2/0x3e6
? __pfx_worker_thread+0x10/0x10
worker_thread+0x54/0x3ad
? __pfx_worker_thread+0x10/0x10
kthread+0xda/0x101
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x37
</TASK>
BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012 last function: mlx5e_ipsec_handle_event [mlx5_core]
CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G W 6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2
Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022
Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x6c
process_one_work.cold+0x2b/0x3c
? __pfx_worker_thread+0x10/0x10
worker_thread+0x54/0x3ad
? __pfx_worker_thread+0x10/0x10
kthread+0xda/0x101
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x37
</TASK>
BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000
Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events")
Signed-off-by: Patrisious Haddad <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete()
and another is .xdo_dev_policy_free(). This separation allows delayed release so
"ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface
can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context.
BUG: scheduling while atomic: swapper/7/0/0x00000100
Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x33/0x50
__schedule_bug+0x4e/0x60
__schedule+0x5d5/0x780
? __mod_timer+0x286/0x3d0
schedule+0x50/0x90
schedule_timeout+0x7c/0xf0
? __bpf_trace_tick_stop+0x10/0x10
__wait_for_common+0x88/0x190
? usleep_range_state+0x90/0x90
cmd_exec+0x42e/0xb40 [mlx5_core]
mlx5_cmd_do+0x1e/0x40 [mlx5_core]
mlx5_cmd_exec+0x18/0x30 [mlx5_core]
mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core]
del_hw_fte+0x60/0x120 [mlx5_core]
mlx5_del_flow_rules+0xec/0x270 [mlx5_core]
? default_send_IPI_single_phys+0x26/0x30
mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core]
mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core]
xfrm_policy_destroy+0x5a/0xb0
xfrm4_dst_destroy+0x7b/0x100
dst_destroy+0x37/0x120
rcu_core+0x2d6/0x540
__do_softirq+0xcd/0x273
irq_exit_rcu+0x82/0xb0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:default_idle+0x13/0x20
Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00
RSP: 0018:ffff888100843ee0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000
RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec
RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
default_idle_call+0x30/0xb0
do_idle+0x1c1/0x1d0
cpu_startup_entry+0x19/0x20
start_secondary+0xfe/0x120
secondary_startup_64_no_verify+0xf3/0xfb
</TASK>
bad: scheduling from the idle thread!
Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic")
Signed-off-by: Leon Romanovsky <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The kernel IRQ system needs the irq affinity notifier to be clear
before attempting to free the irq, see WARN_ON log below.
On a normal driver unload we don't have this issue since we do the
complete cleanup of the irq resources.
To fix this, put the important resources cleanup in a helper function
and use it in both normal driver unload and shutdown flows.
[ 4497.498434] ------------[ cut here ]------------
[ 4497.498726] WARNING: CPU: 0 PID: 9 at kernel/irq/manage.c:2034 free_irq+0x295/0x340
[ 4497.499193] Modules linked in:
[ 4497.499386] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.4.0-rc4+ #10
[ 4497.499876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 4497.500518] Workqueue: events do_poweroff
[ 4497.500849] RIP: 0010:free_irq+0x295/0x340
[ 4497.501132] Code: 85 c0 0f 84 1d ff ff ff 48 89 ef ff d0 0f 1f 00 e9 10 ff ff ff 0f 0b e9 72 ff ff ff 49 8d 7f 28 ff d0 0f 1f 00 e9 df fd ff ff <0f> 0b 48 c7 80 c0 008
[ 4497.502269] RSP: 0018:ffffc90000053da0 EFLAGS: 00010282
[ 4497.502589] RAX: ffff888100949600 RBX: ffff88810330b948 RCX: 0000000000000000
[ 4497.503035] RDX: ffff888100949600 RSI: ffff888100400490 RDI: 0000000000000023
[ 4497.503472] RBP: ffff88810330c7e0 R08: ffff8881004005d0 R09: ffffffff8273a260
[ 4497.503923] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881009ae000
[ 4497.504359] R13: ffff8881009ae148 R14: 0000000000000000 R15: ffff888100949600
[ 4497.504804] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[ 4497.505302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4497.505671] CR2: 00007fce98806298 CR3: 000000000262e005 CR4: 0000000000370ef0
[ 4497.506104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4497.506540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4497.507002] Call Trace:
[ 4497.507158] <TASK>
[ 4497.507299] ? free_irq+0x295/0x340
[ 4497.507522] ? __warn+0x7c/0x130
[ 4497.507740] ? free_irq+0x295/0x340
[ 4497.507963] ? report_bug+0x171/0x1a0
[ 4497.508197] ? handle_bug+0x3c/0x70
[ 4497.508417] ? exc_invalid_op+0x17/0x70
[ 4497.508662] ? asm_exc_invalid_op+0x1a/0x20
[ 4497.508926] ? free_irq+0x295/0x340
[ 4497.509146] mlx5_irq_pool_free_irqs+0x48/0x90
[ 4497.509421] mlx5_irq_table_free_irqs+0x38/0x50
[ 4497.509714] mlx5_core_eq_free_irqs+0x27/0x40
[ 4497.509984] shutdown+0x7b/0x100
[ 4497.510184] pci_device_shutdown+0x30/0x60
[ 4497.510440] device_shutdown+0x14d/0x240
[ 4497.510698] kernel_power_off+0x30/0x70
[ 4497.510938] process_one_work+0x1e6/0x3e0
[ 4497.511183] worker_thread+0x49/0x3b0
[ 4497.511407] ? __pfx_worker_thread+0x10/0x10
[ 4497.511679] kthread+0xe0/0x110
[ 4497.511879] ? __pfx_kthread+0x10/0x10
[ 4497.512114] ret_from_fork+0x29/0x50
[ 4497.512342] </TASK>
Fixes: 9c2d08010963 ("net/mlx5: Free irqs only on shutdown callback")
Signed-off-by: Saeed Mahameed <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
|
|
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
variable was passed as its HW action data, resulting in attempt to
free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
Fixes: 4781df92f4da ("net/mlx5: DR, Move STEv0 modify header logic")
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Alex Vesker <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In some cases, steering might need to use SW-created action in
FW table, which results in wrong packet reformat being used:
mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
status bad resource(0×5), syndrome (0xf2ff71)
This patch adds support for usage of SW-created packet reformat (encap)
actions in FW tables, and adds clear error flow for attempt to use
SW-created modify header on FW tables.
Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Erez Shitrit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.
Fix it by adding it back.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commits add hardware miss support to tc action. But if
the rules can't be offloaded, the pointers are null and system
will panic when accessing them.
Fix it by checking null pointer.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commit missed setting napi_id on XSK RQs, it only affected
regular RQs. Add the missing part to support socket busy polling on XSK
RQs.
Fixes: a2740f529da2 ("net/mlx5e: xsk: Set napi_id to support busy polling")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Cc: Tariq Toukan <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes. The only change to the core code is for a
minor race in ALSA OSS sequencer, and the rest are all device-specific
fixes (regression fixes and a usual quirk)"
* tag 'sound-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add quirk flag for HEM devices to enable native DSD playback
ALSA: usb-audio: Fix broken resume due to UAC3 power state
ALSA: seq: oss: Fix racy open/close of MIDI devices
ASoC: tegra: Fix Master Volume Control
ALSA: hda/realtek: Add a quirk for Compaq N14JP6
firmware: cs_dsp: Log correct region name in bin error messages
|
|
Move allocation code down to avoid memory leak.
Fixes: 29f54745f245 ("iommu/amd: Add missing domain type checks")
Signed-off-by: Su Hui <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jerry Snitselaar <[email protected]>
Reviewed-by: Vasant Hegde <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and netfilter.
Selftests excluded - we have 58 patches and diff of +442/-199, which
isn't really small but perhaps with the exception of the WiFi locking
change it's old(ish) bugs.
We have no known problems with v6.4.
The selftest changes are rather large as MPTCP folks try to apply
Greg's guidance that selftest from torvalds/linux should be able to
run against stable kernels.
Last thing I should call out is the DCCP/UDP-lite deprecation notices.
We are fairly sure those are dead, but if we're wrong reverting them
back in won't be fun.
Current release - regressions:
- wifi:
- cfg80211: fix double lock bug in reg_wdev_chan_valid()
- iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
Current release - new code bugs:
- handshake: remove fput() that causes use-after-free
Previous releases - regressions:
- sched: cls_u32: fix reference counter leak leading to overflow
- sched: cls_api: fix lockup on flushing explicitly created chain
Previous releases - always broken:
- nf_tables: integrate pipapo into commit protocol
- nf_tables: incorrect error path handling with NFT_MSG_NEWRULE, fix
dangling pointer on failure
- ping6: fix send to link-local addresses with VRF
- sched: act_pedit: parse L3 header for L4 offset, the skb may not
have the offset saved
- sched: act_ct: fix promotion of offloaded unreplied tuple
- sched: refuse to destroy an ingress and clsact Qdiscs if there are
lockless change operations in flight
- wifi: mac80211: fix handful of bugs in multi-link operation
- ipvlan: fix bound dev checking for IPv6 l3s mode
- eth: enetc: correct the indexes of highest and 2nd highest TCs
- eth: ice: fix XDP memory leak when NIC is brought up and down
Misc:
- add deprecation notices for UDP-lite and DCCP
- selftests: mptcp: skip tests not supported by old kernels
- sctp: handle invalid error codes without calling BUG()"
* tag 'net-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
dccp: Print deprecation notice.
udplite: Print deprecation notice.
octeon_ep: Add missing check for ioremap
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: ethernet: stmicro: stmmac: fix possible memory leak in __stmmac_open
net: tipc: resize nlattr array to correct size
sfc: fix XDP queues mode with legacy IRQ
net: macsec: fix double free of percpu stats
net: lapbether: only support ethernet devices
MAINTAINERS: add reviewers for SMC Sockets
s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit()
net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames
net/sched: cls_api: Fix lockup on flushing explicitly created chain
ice: Fix ice module unload
net/handshake: remove fput() that causes use-after-free
selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step
net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM thinp discard performance regression introduced during this
merge window where DM core was splitting large discards every 128K
(max_sectors_kb) rather than every 64M (discard_max_bytes).
- Extend DM core LOCKFS fix, made during 6.4 merge, to also fix race
between do_mount and dm's do_suspend (in addition to the earlier
fix's do_mount race with dm's do_resume).
- Fix DM thin metadata operations to first check if the thin-pool is in
"fail_io" mode; otherwise UAF can occur.
- Fix DM thinp's call to __blkdev_issue_discard to use GFP_NOIO rather
than GFP_NOWAIT (__blkdev_issue_discard cannot handle NULL return
from bio_alloc).
* tag 'for-6.4/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: use op specific max_sectors when splitting abnormal io
dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard
dm thin metadata: check fail_io before using data_sm
dm: don't lock fs when the map is NULL during suspend or resume
|
|
Pull rdma fixes from Jason Gunthorpe:
"This is an unusually large bunch of bug fixes for the later rc cycle,
rxe and mlx5 both dumped a lot of things at once. rxe continues to fix
itself, and mlx5 is fixing a bunch of "queue counters" related bugs.
There is one highly notable bug fix regarding the qkey. This small
security check was missed in the original 2005 implementation and it
allows some significant issues.
Summary:
- Two rtrs bug fixes for error unwind bugs
- Several rxe bug fixes:
* Incorrect Rx packet validation
* Using memory without a refcount
* Syzkaller found use before initialization
* Regression fix for missing locking with the tasklet conversion
from this merge window
- Have bnxt report the correct link properties to userspace, this was
a regression in v6.3
- Several mlx5 bug fixes:
* Kernel crash triggerable by userspace for the RAW ethernet
profile
* Defend against steering refcounting issues created by userspace
* Incorrect change of QP port affinity parameters in some LAG
configurations
- Fix mlx5 Q counters:
* Do not over allocate Q counters to allow userspace to use the
full port capacity
* Kernel crash triggered by eswitch due to mis-use of Q counters
* Incorrect mlx5_device for Q counters in some LAG configurations
- Properly implement the IBA spec restricting privileged qkeys to
root
- Always an error when reading from a disassociated device's event
queue
- isert bug fixes:
* Avoid a deadlock with the CM handler and CM ID destruction
* Correct list corruption due to incorrect locking
* Fix a use after free around connection tear down"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix rxe_cq_post
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
RDMA/mlx5: Fix affinity assignment
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
RDMA/uverbs: Restrict usage of privileged QKEYs
RDMA/cma: Always set static rate to 0 for RoCE
RDMA/mlx5: Fix Q-counters query in LAG mode
RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters
RDMA/mlx5: Fix Q-counters per vport allocation
RDMA/mlx5: Create an indirect flow table for steering anchor
RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/bnxt_re: Fix reporting active_{speed,width} attributes
RDMA/rxe: Fix ref count error in check_rkey()
RDMA/rxe: Fix packet length checks
RDMA/rtrs: Fix rxe_dealloc_pd warning
RDMA/rtrs: Fix the last iu->buf leak in err path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more driver specific fixes.
The DesignWare fix is for an issue introduced by conversion to the
chip select accessor functions and is pretty important but the other
two are less severe"
* tag 'spi-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dw: Replace incorrect spi_get_chipselect with set
spi: fsl-dspi: avoid SCK glitches with continuous transfers
spi: cadence-quadspi: Add missing check for dma_set_mask
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"The set of regulators described for the Qualcomm PM8550 just seems to
have been completely wrong and would likely not have worked at all if
anything tried to actually configure anything except for enabling and
disabling at runtime"
* tag 'regulator-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: qcom-rpmh: Fix regulators for PM8550
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"Another fix for the maple tree cache, Takashi noticed that unlike
other caches the maple tree cache didn't check for read only registers
before trying to sync which would result in spurious syncs for read
only registers where we don't have a default.
This was due to the check being open coded in the caches, we now check
in the shared 'does this register need sync' function so that is fixed
for this and future caches"
* tag 'regmap-fix-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: regcache: Don't sync read-only registers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A fix for dvb-core to avoid a race condition during DVB board
registration"
* tag 'media/v6.4-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "media: dvb-core: Fix use-after-free on race condition at dvb_frontend"
|
|
This seems to have existed for ever but is now more apparant after
commit 9bff18d13473 ("drm/ttm: use per BO cleanup workers")
My analysis: two threads are running, one in the irq signalling the
fence, in dma_fence_signal_timestamp_locked, it has done the
DMA_FENCE_FLAG_SIGNALLED_BIT setting, but hasn't yet reached the
callbacks.
The second thread in nouveau_cli_work_ready, where it sees the fence is
signalled, so then puts the fence, cleanups the object and frees the
work item, which contains the callback.
Thread one goes again and tries to call the callback and causes the
use-after-free.
Proposed fix: lock the fence signalled check in nouveau_cli_work_ready,
so either the callbacks are done or the memory is freed.
Reviewed-by: Karol Herbst <[email protected]>
Fixes: 11e451e74050 ("drm/nouveau: remove fence wait code from deferred client work handler")
Cc: [email protected]
Signed-off-by: Dave Airlie <[email protected]>
Link: https://lore.kernel.org/dri-devel/[email protected]/
|
|
Add check for ioremap() and return the error if it fails in order to
guarantee the success of ioremap().
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Jiasheng Jiang <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix a possible memory leak in __stmmac_open when stmmac_init_phy fails.
It's also needed to free everything allocated by stmmac_setup_dma_desc
and not just the dma_conf struct.
Drop free_dma_desc_resources from __stmmac_open and correctly call
free_dma_desc_resources on each user of __stmmac_open on error.
Reported-by: Jose Abreu <[email protected]>
Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
Signed-off-by: Christian Marangi <[email protected]>
Cc: [email protected]
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jose Abreu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Split abnormal IO in terms of the corresponding operation specific
max_sectors (max_discard_sectors, max_secure_erase_sectors or
max_write_zeroes_sectors).
This fixes a significant dm-thinp discard performance regression that
was introduced with commit e2dd8aca2d76 ("dm bio prison v1: improve
concurrent IO performance"). Relative to discard: max_discard_sectors
is used instead of max_sectors; which fixes excessive discard splitting
(e.g. max_sectors=128K vs max_discard_sectors=64M).
Tested by discarding an 1 Petabyte dm-thin device:
lvcreate -V 1125899906842624B -T test/pool -n thin
time blkdiscard /dev/test/thin
Before this fix (splitting discards every 128K): ~116m
After this fix (splitting discards every 64M) : 0m33.460s
Reported-by: Zorro Lang <[email protected]>
Fixes: 06961c487a33 ("dm: split discards further if target sets max_discard_granularity")
Requires: 13f6facf3fae ("dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE")
Fixes: e2dd8aca2d76 ("dm bio prison v1: improve concurrent IO performance")
Signed-off-by: Mike Snitzer <[email protected]>
|
|
issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite
its code assuming bio_alloc() always succeeds.
Commit 3dba53a958a75 ("dm thin: use __blkdev_issue_discard for async
discard support") clearly shows where things went bad:
Before commit 3dba53a958a75, dm-thin.c's open-coded
__blkdev_issue_discard_async() properly handled using GFP_NOWAIT.
Unfortunately __blkdev_issue_discard() doesn't and it was missed
during review.
Cc: [email protected]
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Must check pmd->fail_io before using pmd->data_sm since
pmd->data_sm may be destroyed by other processes.
P1(kworker) P2(message)
do_worker
process_prepared
process_prepared_discard_passdown_pt2
dm_pool_dec_data_range
pool_message
commit
dm_pool_commit_metadata
↓
// commit failed
metadata_operation_failed
abort_transaction
dm_pool_abort_metadata
__open_or_format_metadata
↓
dm_sm_disk_open
↓
// open failed
// pmd->data_sm is NULL
dm_sm_dec_blocks
↓
// try to access pmd->data_sm --> UAF
As shown above, if dm_pool_commit_metadata() and
dm_pool_abort_metadata() fail in pool_message process, kworker may
trigger UAF.
Fixes: be500ed721a6 ("dm space maps: improve performance with inc/dec on ranges of blocks")
Cc: [email protected]
Signed-off-by: Li Lingfeng <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
As described in commit 38d11da522aa ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().
This commit preserves the fix from commit 38d11da522aa but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount(). It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
Fixes: 38d11da522aa ("dm: don't lock fs when the map is NULL in process of resume")
Signed-off-by: Li Lingfeng <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
|
|
Since commit 955fb8719efb ("thermal/intel/intel_soc_dts_iosf: Use Intel
TCC library") intel_soc_dts_iosf is reporting the wrong temperature.
The driver expects tj_max to be in milli-degrees-celcius but after
the switch to the TCC library this is now in degrees celcius so
instead of e.g. 90000 it is set to 90 causing a temperature 45
degrees below tj_max to be reported as -44910 milli-degrees
instead of as 45000 milli-degrees.
Fix this by adding back the lost factor of 1000.
Fixes: 955fb8719efb ("thermal/intel/intel_soc_dts_iosf: Use Intel TCC library")
Reported-by: Bernhard Krug <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Zhang Rui <[email protected]>
Cc: 6.3+ <[email protected]> # 6.3+
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The addition of might_sleep() to down_timeout() caused the latter to
enable interrupts unconditionally in some cases, which in turn broke
the ACPI S3 wakeup path in acpi_suspend_enter(), where down_timeout()
is called by acpi_disable_all_gpes() via acpi_ut_acquire_mutex().
Namely, if CONFIG_DEBUG_ATOMIC_SLEEP is set, might_sleep() causes
might_resched() to be used and if CONFIG_PREEMPT_VOLUNTARY is set,
this triggers __cond_resched() which may call preempt_schedule_common(),
so __schedule() gets invoked and it ends up with enabled interrupts (in
the prev == next case).
Now, enabling interrupts early in the S3 wakeup path causes the kernel
to crash.
Address this by modifying acpi_suspend_enter() to disable GPEs without
attempting to acquire the sleeping lock which is not needed in that code
path anyway.
Fixes: 99409b935c9a ("locking/semaphore: Add might_sleep() to down_*() family")
Reported-by: Srinivas Pandruvada <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: 5.15+ <[email protected]> # 5.15+
|
|
Now spi_geni_grab_gpi_chan() errors are correctly reported, the
-EPROBE_DEFER error should be returned from probe in case the
GPI dma driver is built as module and/or not probed yet.
Fixes: b59c122484ec ("spi: spi-geni-qcom: Add support for GPI dma")
Fixes: 6532582c353f ("spi: spi-geni-qcom: fix error handling in spi_geni_grab_gpi_chan()")
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://lore.kernel.org/r/20230615-topic-sm8550-upstream-fix-spi-geni-qcom-probe-v2-1-670c3d9e8c9c@linaro.org
Signed-off-by: Mark Brown <[email protected]>
|
|
In systems without MSI-X capabilities, xdp_txq_queues_mode is calculated
in efx_allocate_msix_channels, but when enabling MSI-X fails, it was not
changed to a proper default value. This was leading to the driver
thinking that it has dedicated XDP queues, when it didn't.
Fix it by setting xdp_txq_queues_mode to the correct value if the driver
fallbacks to MSI or legacy IRQ mode. The correct value is
EFX_XDP_TX_QUEUES_BORROWED because there are no XDP dedicated queues.
The issue can be easily visible if the kernel is started with pci=nomsi,
then a call trace is shown. It is not shown only with sfc's modparam
interrupt_mode=2. Call trace example:
WARNING: CPU: 2 PID: 663 at drivers/net/ethernet/sfc/efx_channels.c:828 efx_set_xdp_channels+0x124/0x260 [sfc]
[...skip...]
Call Trace:
<TASK>
efx_set_channels+0x5c/0xc0 [sfc]
efx_probe_nic+0x9b/0x15a [sfc]
efx_probe_all+0x10/0x1a2 [sfc]
efx_pci_probe_main+0x12/0x156 [sfc]
efx_pci_probe_post_io+0x18/0x103 [sfc]
efx_pci_probe.cold+0x154/0x257 [sfc]
local_pci_probe+0x42/0x80
Fixes: 6215b608a8c4 ("sfc: last resort fallback for lack of xdp tx queues")
Reported-by: Yanghang Liu <[email protected]>
Signed-off-by: Íñigo Huguet <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Inside macsec_add_dev() we free percpu macsec->secy.tx_sc.stats and
macsec->stats on some of the memory allocation failure paths. However, the
net_device is already registered to that moment: in macsec_newlink(), just
before calling macsec_add_dev(). This means that during unregister process
its priv_destructor - macsec_free_netdev() - will be called and will free
the stats again.
Remove freeing percpu stats inside macsec_add_dev() because
macsec_free_netdev() will correctly free the already allocated ones. The
pointers to unallocated stats stay NULL, and free_percpu() treats that
correctly.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support")
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Fedor Pchelkin <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It probbaly makes no sense to support arbitrary network devices
for lapbether.
syzbot reported:
skbuff: skb_under_panic: text:ffff80008934c100 len:44 put:40 head:ffff0000d18dd200 data:ffff0000d18dd1ea tail:0x16 end:0x140 dev:bond1
kernel BUG at net/core/skbuff.c:200 !
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 5643 Comm: dhcpcd Not tainted 6.4.0-rc5-syzkaller-g4641cff8e810 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : skb_panic net/core/skbuff.c:196 [inline]
pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
lr : skb_panic net/core/skbuff.c:196 [inline]
lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
sp : ffff8000973b7260
x29: ffff8000973b7270 x28: ffff8000973b7360 x27: dfff800000000000
x26: ffff0000d85d8150 x25: 0000000000000016 x24: ffff0000d18dd1ea
x23: ffff0000d18dd200 x22: 000000000000002c x21: 0000000000000140
x20: 0000000000000028 x19: ffff80008934c100 x18: ffff8000973b68a0
x17: 0000000000000000 x16: ffff80008a43bfbc x15: 0000000000000202
x14: 0000000000000000 x13: 0000000000000001 x12: 0000000000000001
x11: 0000000000000201 x10: 0000000000000000 x9 : f22f7eb937cced00
x8 : f22f7eb937cced00 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff8000973b6b78 x4 : ffff80008df9ee80 x3 : ffff8000805974f4
x2 : 0000000000000001 x1 : 0000000100000201 x0 : 0000000000000086
Call trace:
skb_panic net/core/skbuff.c:196 [inline]
skb_under_panic+0x13c/0x140 net/core/skbuff.c:210
skb_push+0xf0/0x108 net/core/skbuff.c:2409
ip6gre_header+0xbc/0x738 net/ipv6/ip6_gre.c:1383
dev_hard_header include/linux/netdevice.h:3137 [inline]
lapbeth_data_transmit+0x1c4/0x298 drivers/net/wan/lapbether.c:257
lapb_data_transmit+0x8c/0xb0 net/lapb/lapb_iface.c:447
lapb_transmit_buffer+0x178/0x204 net/lapb/lapb_out.c:149
lapb_send_control+0x220/0x320 net/lapb/lapb_subr.c:251
lapb_establish_data_link+0x94/0xec
lapb_device_event+0x348/0x4e0
notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93
raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461
__dev_notify_flags+0x2bc/0x544
dev_change_flags+0xd0/0x15c net/core/dev.c:8643
devinet_ioctl+0x858/0x17e4 net/ipv4/devinet.c:1150
inet_ioctl+0x2ac/0x4d8 net/ipv4/af_inet.c:979
sock_do_ioctl+0x134/0x2dc net/socket.c:1201
sock_ioctl+0x4ec/0x858 net/socket.c:1318
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:856
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: aa1803e6 aa1903e7 a90023f5 947730f5 (d4210000)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Martin Schiller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch prevents the system from crashing when unloading the ISM module.
How to reproduce: Attach an ISM device and execute 'rmmod ism'.
Error-Log:
- Trying to free already-free IRQ 0
- WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
After calling ism_dev_exit() for each ISM device in the exit routine,
pci_unregister_driver() will execute ism_remove() for each ISM device.
Because ism_remove() also calls ism_dev_exit(),
free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
device. This results in a crash with the error
'Trying to free already-free IRQ'.
In the exit routine, it is enough to call pci_unregister_driver()
because it ensures that ism_dev_exit() is called once per
ISM device.
Cc: <[email protected]> # 6.3+
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Julian Ruess <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The DEV_MAC_MAXLEN_CFG register contains a 16-bit value - up to 65535.
Plus 2 * VLAN_HLEN (4), that is up to 65543.
The picos_per_byte variable is the largest when "speed" is lowest -
SPEED_10 = 10. In that case it is (1000000L * 8) / 10 = 800000.
Their product - 52434400000 - exceeds 32 bits, which is a problem,
because apparently, a multiplication between two 32-bit factors is
evaluated as 32-bit before being assigned to a 64-bit variable.
In fact it's a problem for any MTU value larger than 5368.
Cast one of the factors of the multiplication to u64 to force the
multiplication to take place on 64 bits.
Issue found by Coverity.
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Clearing the interrupt scheme before PFR reset,
during the removal routine, could cause the hardware
errors and possibly lead to system reboot, as the PF
reset can cause the interrupt to be generated.
Place the call for PFR reset inside ice_deinit_dev(),
wait until reset and all pending transactions are done,
then call ice_clear_interrupt_scheme().
This introduces a PFR reset to multiple error paths.
Additionally, remove the call for the reset from
ice_load() - it will be a part of ice_unload() now.
Error example:
[ 75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
[ 77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 77.571418] {1}[Hardware Error]: event severity: recoverable
[ 77.571459] {1}[Hardware Error]: Error 0, type: recoverable
[ 77.571500] {1}[Hardware Error]: section_type: PCIe error
[ 77.571540] {1}[Hardware Error]: port_type: 4, root port
[ 77.571580] {1}[Hardware Error]: version: 3.0
[ 77.571615] {1}[Hardware Error]: command: 0x0547, status: 0x4010
[ 77.571661] {1}[Hardware Error]: device_id: 0000:c9:02.0
[ 77.571703] {1}[Hardware Error]: slot: 25
[ 77.571736] {1}[Hardware Error]: secondary_bus: 0xca
[ 77.571773] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
[ 77.571821] {1}[Hardware Error]: class_code: 060400
[ 77.571858] {1}[Hardware Error]: bridge: secondary_status: 0x2800, control: 0x0013
[ 77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[ 77.572870] pcieport 0000:c9:02.0: [21] ACSViol (First)
[ 77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
[ 77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 77.691738] {2}[Hardware Error]: event severity: recoverable
[ 77.691971] {2}[Hardware Error]: Error 0, type: recoverable
[ 77.692192] {2}[Hardware Error]: section_type: PCIe error
[ 77.692403] {2}[Hardware Error]: port_type: 4, root port
[ 77.692616] {2}[Hardware Error]: version: 3.0
[ 77.692825] {2}[Hardware Error]: command: 0x0547, status: 0x4010
[ 77.693032] {2}[Hardware Error]: device_id: 0000:c9:02.0
[ 77.693238] {2}[Hardware Error]: slot: 25
[ 77.693440] {2}[Hardware Error]: secondary_bus: 0xca
[ 77.693641] {2}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
[ 77.693853] {2}[Hardware Error]: class_code: 060400
[ 77.694054] {2}[Hardware Error]: bridge: secondary_status: 0x0800, control: 0x0013
[ 77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
[ 77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
[ 77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
[ 77.719390] pcieport 0000:c9:02.0: [21] ACSViol (First)
[ 77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Signed-off-by: Jakub Buchocki <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)
This series contains updates to igc and igb drivers.
Husaini clears Tx rings when interface is brought down for igc.
Vinicius disables PTM and PCI busmaster when removing igc driver.
Alex adds error check and path for NVM read error on igb.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igb: fix nvm.ops.read() error handling
igc: Fix possible system crash when loading module
igc: Clean the TX buffer and TX descriptor ring
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A couple of straggler fixes, mostly in the stack:
- fix fragmentation for multi-link related elements
- fix callback copy/paste error
- fix multi-link locking
- remove double-locking of wiphy mutex
- transmit only on active links, not all
- activate links in the correct order
- don't remove links that weren't added
- disable soft-IRQs for LQ lock in iwlwifi
* tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
wifi: mac80211: fragment per STA profile correctly
wifi: mac80211: Use active_links instead of valid_links in Tx
wifi: cfg80211: remove links only on AP
wifi: mac80211: take lock before setting vif links
wifi: cfg80211: fix link del callback to call correct handler
wifi: mac80211: fix link activation settings order
wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The kernel test robot reported sparse warnings regarding incorrect type
assignment for __be16 variables in bsg loopback path.
Change the flagged lines to use the be16_to_cpu() and cpu_to_be16() macros
appropriately.
Signed-off-by: Justin Tee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
In the error exits in target_setup_session(), if a branch is taken to
free_sess: transport_free_session() may call to target_free_cmd_counter()
and then fall through to call target_free_cmd_counter() a second time.
This can, and does, sometimes cause seg faults since the data field in
cmd_cnt->refcnt has been freed in the first call.
Fix this problem by simply returning after the call to
transport_free_session(). The second call is redundant for those cases.
Fixes: 4edba7e4a8f3 ("scsi: target: Move cmd counter allocation")
Signed-off-by: Bob Pearson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Hyper-V synthetic SCSI devices do not support the MAINTENANCE_IN SCSI
command, so scsi_report_opcode() always fails, resulting in messages like
this:
hv_storvsc <guid>: tag#205 cmd 0xa3 status: scsi 0x2 srb 0x86 hv 0xc0000001
The recently added support for command duration limits calls
scsi_report_opcode() four times as each device comes online, which
significantly increases the number of messages logged in a system with many
disks.
Fix the problem by always marking Hyper-V synthetic SCSI devices as not
supporting scsi_report_opcode(). With this setting, the MAINTENANCE_IN SCSI
command is not issued and no messages are logged.
Signed-off-by: Michael Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Fix the I/O hang that arises because of the MSIx vector not having a mapped
online CPU upon receiving completion.
SCSI cmds take the blk_mq route, which is setup during init. Reserved cmds
fetch the vector_no from mq_map after init is complete. Before init, they
have to use 0 - as per the norm.
Reviewed-by: Gilbert Wu <[email protected]>
Signed-off-by: Sagar Biradar <[email protected]>
Reviewed-by: John Garry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
sparse points out an embarrasing bug in an older patch of mine,
which uses the register offset instead of an __iomem pointer:
drivers/clk/pxa/clk-pxa3xx.c:167:9: sparse: sparse: Using plain integer as NULL pointer
Unlike sparse, gcc and clang ignore this bug and fail to warn
because a literal '0' is considered a valid representation of
a NULL pointer.
Fixes: 3c816d950a49 ("ARM: pxa: move clk register definitions to driver")
Cc: [email protected]
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>
|
|
As reported by Thomas Voegtle <[email protected]>, sometimes a DVB card does
not initialize properly booting Linux 6.4-rc4. This is not always, maybe
in 3 out of 4 attempts.
After double-checking, the root cause seems to be related to the
UAF fix, which is causing a race issue:
[ 26.332149] tda10071 7-0005: found a 'NXP TDA10071' in cold state, will try to load a firmware
[ 26.340779] tda10071 7-0005: downloading firmware from file 'dvb-fe-tda10071.fw'
[ 989.277402] INFO: task vdr:743 blocked for more than 491 seconds.
[ 989.283504] Not tainted 6.4.0-rc5-i5 #249
[ 989.288036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 989.295860] task:vdr state:D stack:0 pid:743 ppid:711 flags:0x00004002
[ 989.295865] Call Trace:
[ 989.295867] <TASK>
[ 989.295869] __schedule+0x2ea/0x12d0
[ 989.295877] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 989.295881] schedule+0x57/0xc0
[ 989.295884] schedule_preempt_disabled+0xc/0x20
[ 989.295887] __mutex_lock.isra.16+0x237/0x480
[ 989.295891] ? dvb_get_property.isra.10+0x1bc/0xa50
[ 989.295898] ? dvb_frontend_stop+0x36/0x180
[ 989.338777] dvb_frontend_stop+0x36/0x180
[ 989.338781] dvb_frontend_open+0x2f1/0x470
[ 989.338784] dvb_device_open+0x81/0xf0
[ 989.338804] ? exact_lock+0x20/0x20
[ 989.338808] chrdev_open+0x7f/0x1c0
[ 989.338811] ? generic_permission+0x1a2/0x230
[ 989.338813] ? link_path_walk.part.63+0x340/0x380
[ 989.338815] ? exact_lock+0x20/0x20
[ 989.338817] do_dentry_open+0x18e/0x450
[ 989.374030] path_openat+0xca5/0xe00
[ 989.374031] ? terminate_walk+0xec/0x100
[ 989.374034] ? path_lookupat+0x93/0x140
[ 989.374036] do_filp_open+0xc0/0x140
[ 989.374038] ? __call_rcu_common.constprop.91+0x92/0x240
[ 989.374041] ? __check_object_size+0x147/0x260
[ 989.374043] ? __check_object_size+0x147/0x260
[ 989.374045] ? alloc_fd+0xbb/0x180
[ 989.374048] ? do_sys_openat2+0x243/0x310
[ 989.374050] do_sys_openat2+0x243/0x310
[ 989.374052] do_sys_open+0x52/0x80
[ 989.374055] do_syscall_64+0x5b/0x80
[ 989.421335] ? __task_pid_nr_ns+0x92/0xa0
[ 989.421337] ? syscall_exit_to_user_mode+0x20/0x40
[ 989.421339] ? do_syscall_64+0x67/0x80
[ 989.421341] ? syscall_exit_to_user_mode+0x20/0x40
[ 989.421343] ? do_syscall_64+0x67/0x80
[ 989.421345] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 989.421348] RIP: 0033:0x7fe895d067e3
[ 989.421349] RSP: 002b:00007fff933c2ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 989.421351] RAX: ffffffffffffffda RBX: 00007fff933c2c10 RCX: 00007fe895d067e3
[ 989.421352] RDX: 0000000000000802 RSI: 00005594acdce160 RDI: 00000000ffffff9c
[ 989.421353] RBP: 0000000000000802 R08: 0000000000000000 R09: 0000000000000000
[ 989.421353] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001
[ 989.421354] R13: 00007fff933c2ca0 R14: 00000000ffffffff R15: 00007fff933c2c90
[ 989.421355] </TASK>
This reverts commit 6769a0b7ee0c3b31e1b22c3fadff2bfb642de23f.
Fixes: 6769a0b7ee0c ("media: dvb-core: Fix use-after-free on race condition at dvb_frontend")
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
A recent patch replaced a tasklet execution of cq->comp_handler by a
direct call. While this made sense it let changes to cq->notify state be
unprotected and assumed that the cq completion machinery and the ulp done
callbacks were reentrant. The result is that in some cases completion
events can be lost. This patch moves the cq->comp_handler call inside of
the spinlock in rxe_cq_post which solves both issues. This is compatible
with the matching code in the request notify verb.
Fixes: 78b26a335310 ("RDMA/rxe: Remove tasklet call from rxe_cq.c")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The call to mmc_request_done() can schedule, so it must not be called
from irq context. Wake the irq thread if it needs to be called, and let
its existing logic do its work.
Fixes the following kernel bug, which appears when running an RT patched
kernel on the AmLogic Meson AXG A113X SoC:
[ 11.111407] BUG: scheduling while atomic: kworker/0:1H/75/0x00010001
[ 11.111438] Modules linked in:
[ 11.111451] CPU: 0 PID: 75 Comm: kworker/0:1H Not tainted 6.4.0-rc3-rt2-rtx-00081-gfd07f41ed6b4-dirty #1
[ 11.111461] Hardware name: RTX AXG A113X Linux Platform Board (DT)
[ 11.111469] Workqueue: kblockd blk_mq_run_work_fn
[ 11.111492] Call trace:
[ 11.111497] dump_backtrace+0xac/0xe8
[ 11.111510] show_stack+0x18/0x28
[ 11.111518] dump_stack_lvl+0x48/0x60
[ 11.111530] dump_stack+0x18/0x24
[ 11.111537] __schedule_bug+0x4c/0x68
[ 11.111548] __schedule+0x80/0x574
[ 11.111558] schedule_loop+0x2c/0x50
[ 11.111567] schedule_rtlock+0x14/0x20
[ 11.111576] rtlock_slowlock_locked+0x468/0x730
[ 11.111587] rt_spin_lock+0x40/0x64
[ 11.111596] __wake_up_common_lock+0x5c/0xc4
[ 11.111610] __wake_up+0x18/0x24
[ 11.111620] mmc_blk_mq_req_done+0x68/0x138
[ 11.111633] mmc_request_done+0x104/0x118
[ 11.111644] meson_mmc_request_done+0x38/0x48
[ 11.111654] meson_mmc_irq+0x128/0x1f0
[ 11.111663] __handle_irq_event_percpu+0x70/0x114
[ 11.111674] handle_irq_event_percpu+0x18/0x4c
[ 11.111683] handle_irq_event+0x80/0xb8
[ 11.111691] handle_fasteoi_irq+0xa4/0x120
[ 11.111704] handle_irq_desc+0x20/0x38
[ 11.111712] generic_handle_domain_irq+0x1c/0x28
[ 11.111721] gic_handle_irq+0x8c/0xa8
[ 11.111735] call_on_irq_stack+0x24/0x4c
[ 11.111746] do_interrupt_handler+0x88/0x94
[ 11.111757] el1_interrupt+0x34/0x64
[ 11.111769] el1h_64_irq_handler+0x18/0x24
[ 11.111779] el1h_64_irq+0x64/0x68
[ 11.111786] __add_wait_queue+0x0/0x4c
[ 11.111795] mmc_blk_rw_wait+0x84/0x118
[ 11.111804] mmc_blk_mq_issue_rq+0x5c4/0x654
[ 11.111814] mmc_mq_queue_rq+0x194/0x214
[ 11.111822] blk_mq_dispatch_rq_list+0x3ac/0x528
[ 11.111834] __blk_mq_sched_dispatch_requests+0x340/0x4d0
[ 11.111847] blk_mq_sched_dispatch_requests+0x38/0x70
[ 11.111858] blk_mq_run_work_fn+0x3c/0x70
[ 11.111865] process_one_work+0x17c/0x1f0
[ 11.111876] worker_thread+0x1d4/0x26c
[ 11.111885] kthread+0xe4/0xf4
[ 11.111894] ret_from_fork+0x10/0x20
Fixes: 51c5d8447bd7 ("MMC: meson: initial support for GX platforms")
Cc: [email protected]
Signed-off-by: Martin Hundebøll <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
6.4.0-rc5 #1 Not tainted
-----------------------------------------------------
kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7
and this task is already holding:
ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
which would create a new lock dependency:
(&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(&sta->rate_ctrl_lock){+.-.}-{2:2}
etc. etc. etc.
Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
enough to pacify lockdep, but changing them all on pers.lock has worked.
Fixes: a8938bc881d2 ("wifi: iwlwifi: mvm: Add locking to the rate read flow")
Signed-off-by: Hugh Dickins <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|