Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting the driver fix from Intel, mentioned
by Jakub in the previous networking pull.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
TFO
- batman-adv: fix RCU race at module unload time
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not
confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset
- fix the corner case with may_goto and jump to the 1st insn
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM
transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became bigger than usual, as it receives a pile of pending ASoC
fixes. Most of changes are for device-specific issues while there are
a few core fixes that are all rather trivial:
- DMA-engine sync fixes
- Continued MIDI2 conversion fixes
- Various ASoC Intel SOF fixes
- A series of ASoC topology fixes for memory handling
- AMD ACP fix, curing a recent regression, too
- Platform / codec-specific fixes for mediatek, atmel, realtek, etc"
* tag 'sound-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (40 commits)
ASoC: rt5645: fix issue of random interrupt from push-button
ALSA: seq: Fix missing MSB in MIDI2 SPP conversion
ASoC: amd: yc: Fix non-functional mic on ASUS M5602RA
ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook 645/665 G11.
ALSA: hda/realtek: Fix conflicting quirk for PCI SSID 17aa:3820
ALSA: dmaengine_pcm: terminate dmaengine before synchronize
ALSA: hda/relatek: Enable Mute LED on HP Laptop 15-gw0xxx
ALSA: PCM: Allow resume only for suspended streams
ALSA: seq: Fix missing channel at encoding RPN/NRPN MIDI2 messages
ASoC: mediatek: mt8195: Add platform entry for ETDM1_OUT_BE dai link
ASoC: fsl-asoc-card: set priv->pdev before using it
ASoC: amd: acp: move chip->flag variable assignment
ASoC: amd: acp: remove i2s configuration check in acp_i2s_probe()
ASoC: amd: acp: add a null check for chip_pdev structure
ASoC: Intel: soc-acpi: mtl: fix speaker no sound on Dell SKU 0C64
ASoC: q6apm-lpass-dai: close graph on prepare errors
ASoC: cs35l56: Disconnect ASP1 TX sources when ASP1 DAI is hooked up
ASoC: topology: Fix route memory corruption
ASoC: rt722-sdca-sdw: add debounce time for type detection
ASoC: SOF: sof-audio: Skip unprepare for in-use widgets on error rollback
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains two Netfilter fixes for net:
Patch #1 fixes CONFIG_SYSCTL=n for a patch coming in the previous PR
to move the sysctl toggle to enable SRv6 netfilter hooks from
nf_conntrack to the core, from Jianguo Wu.
Patch #2 fixes a possible pointer leak to userspace due to insufficient
validation of NFT_DATA_VALUE.
Linus found this pointer leak to userspace via zdi-disclosures@ and
forwarded the notice to Netfilter maintainers, he appears as reporter
because whoever found this issue never approached Netfilter
maintainers neither via security@ nor in private.
netfilter pull request 24-06-27
* tag 'nf-24-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
netfilter: fix undefined reference to 'netfilter_lwtunnel_*' when CONFIG_SYSCTL=n
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), callback function adev_release
calls kfree(madev). We shouldn't call kfree(madev) again
in the error handling path. Set 'madev' to NULL.
Fixes: a69839d4327d ("net: mana: Add support for auxiliary device")
Signed-off-by: Ma Ke <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Kuniyuki Iwashima says:
====================
af_unix: Fix bunch of MSG_OOB bugs and add new tests.
This series rewrites the selftest for AF_UNIX MSG_OOB and fixes
bunch of bugs that AF_UNIX behaves differently compared to TCP.
Note that the test discovered few more bugs in TCP side, which
will be fixed in another series.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
To catch regression, let's check ioctl(SIOCATMARK) after every
send() and recv() calls.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Even if OOB data is recv()ed, ioctl(SIOCATMARK) must return 1 when the
OOB skb is at the head of the receive queue and no new OOB data is queued.
Without fix:
# RUN msg_oob.no_peek.oob ...
# msg_oob.c:305:oob:Expected answ[0] (0) == oob_head (1)
# oob: Test terminated by assertion
# FAIL msg_oob.no_peek.oob
not ok 2 msg_oob.no_peek.oob
With fix:
# RUN msg_oob.no_peek.oob ...
# OK msg_oob.no_peek.oob
ok 2 msg_oob.no_peek.oob
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When OOB data is in recvq, we can detect it with epoll by checking
EPOLLPRI.
This patch add checks for EPOLLPRI after every send() and recv() in
all test cases.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When data is sent with MSG_OOB, SIGURG is sent to a process if the
receiver socket has set its owner to the process by ioctl(FIOSETOWN)
or fcntl(F_SETOWN).
This patch adds SIGURG check after every send(MSG_OOB) call.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When SO_OOBINLINE is enabled on a socket, MSG_OOB can be recv()ed
without MSG_OOB flag, and ioctl(SIOCATMARK) will behaves differently.
This patch adds some test cases for SO_OOBINLINE.
Note the new test cases found two bugs in TCP.
1) After reading OOB data with non-inline mode, we can re-read
the data by setting SO_OOBINLINE.
# RUN msg_oob.no_peek.inline_oob_ahead_break ...
# msg_oob.c:146:inline_oob_ahead_break:AF_UNIX :world
# msg_oob.c:147:inline_oob_ahead_break:TCP :oworld
# OK msg_oob.no_peek.inline_oob_ahead_break
ok 14 msg_oob.no_peek.inline_oob_ahead_break
2) The head OOB data is dropped if SO_OOBINLINE is disabled
if a new OOB data is queued.
# RUN msg_oob.no_peek.inline_ex_oob_drop ...
# msg_oob.c:171:inline_ex_oob_drop:AF_UNIX :x
# msg_oob.c:172:inline_ex_oob_drop:TCP :y
# msg_oob.c:146:inline_ex_oob_drop:AF_UNIX :y
# msg_oob.c:147:inline_ex_oob_drop:TCP :Resource temporarily unavailable
# OK msg_oob.no_peek.inline_ex_oob_drop
ok 17 msg_oob.no_peek.inline_ex_oob_drop
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Currently, recv() is stopped at a consumed OOB skb even if a new
OOB skb is queued and we can ignore the old OOB skb.
>>> from socket import *
>>> c1, c2 = socket(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'hellowor', MSG_OOB)
8
>>> c2.recv(1, MSG_OOB) # consume OOB data stays at middle of recvq.
b'r'
>>> c1.send(b'ld', MSG_OOB)
2
>>> c2.recv(10) # recv() stops at the old consumed OOB
b'hellowo' # should be 'hellowol'
manage_oob() should not stop recv() at the old consumed OOB skb if
there is a new OOB data queued.
Note that TCP behaviour is apparently wrong in this test case because
we can recv() the same OOB data twice.
Without fix:
# RUN msg_oob.no_peek.ex_oob_ahead_break ...
# msg_oob.c:138:ex_oob_ahead_break:AF_UNIX :hellowo
# msg_oob.c:139:ex_oob_ahead_break:Expected:hellowol
# msg_oob.c:141:ex_oob_ahead_break:Expected ret[0] (7) == expected_len (8)
# ex_oob_ahead_break: Test terminated by assertion
# FAIL msg_oob.no_peek.ex_oob_ahead_break
not ok 11 msg_oob.no_peek.ex_oob_ahead_break
With fix:
# RUN msg_oob.no_peek.ex_oob_ahead_break ...
# msg_oob.c:146:ex_oob_ahead_break:AF_UNIX :hellowol
# msg_oob.c:147:ex_oob_ahead_break:TCP :helloworl
# OK msg_oob.no_peek.ex_oob_ahead_break
ok 11 msg_oob.no_peek.ex_oob_ahead_break
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
While testing, I found some weird behaviour on the TCP side as well.
For example, TCP drops the preceding OOB data when queueing a new
OOB data if the old OOB data is at the head of recvq.
# RUN msg_oob.no_peek.ex_oob_drop ...
# msg_oob.c:146:ex_oob_drop:AF_UNIX :x
# msg_oob.c:147:ex_oob_drop:TCP :Resource temporarily unavailable
# msg_oob.c:146:ex_oob_drop:AF_UNIX :y
# msg_oob.c:147:ex_oob_drop:TCP :Invalid argument
# OK msg_oob.no_peek.ex_oob_drop
ok 9 msg_oob.no_peek.ex_oob_drop
# RUN msg_oob.no_peek.ex_oob_drop_2 ...
# msg_oob.c:146:ex_oob_drop_2:AF_UNIX :x
# msg_oob.c:147:ex_oob_drop_2:TCP :Resource temporarily unavailable
# OK msg_oob.no_peek.ex_oob_drop_2
ok 10 msg_oob.no_peek.ex_oob_drop_2
This patch allows AF_UNIX's MSG_OOB implementation to produce different
results from TCP when operations are guarded with tcp_incompliant{}.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Let's say a socket send()s "hello" with MSG_OOB and "world" without flags,
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX)
>>> c1.send(b'hello', MSG_OOB)
5
>>> c1.send(b'world')
5
and its peer recv()s "hell" and "o".
>>> c2.recv(10)
b'hell'
>>> c2.recv(1, MSG_OOB)
b'o'
Now the consumed OOB skb stays at the head of recvq to return a correct
value for ioctl(SIOCATMARK), which is broken now and fixed by a later
patch.
Then, if peer issues recv() with MSG_DONTWAIT, manage_oob() returns NULL,
so recv() ends up with -EAGAIN.
>>> c2.setblocking(False) # This causes -EAGAIN even with available data
>>> c2.recv(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 11] Resource temporarily unavailable
However, next recv() will return the following available data, "world".
>>> c2.recv(5)
b'world'
When the consumed OOB skb is at the head of the queue, we need to fetch
the next skb to fix the weird behaviour.
Note that the issue does not happen without MSG_DONTWAIT because we can
retry after manage_oob().
This patch also adds a test case that covers the issue.
Without fix:
# RUN msg_oob.no_peek.ex_oob_break ...
# msg_oob.c:134:ex_oob_break:AF_UNIX :Resource temporarily unavailable
# msg_oob.c:135:ex_oob_break:Expected:ld
# msg_oob.c:137:ex_oob_break:Expected ret[0] (-1) == expected_len (2)
# ex_oob_break: Test terminated by assertion
# FAIL msg_oob.no_peek.ex_oob_break
not ok 8 msg_oob.no_peek.ex_oob_break
With fix:
# RUN msg_oob.no_peek.ex_oob_break ...
# OK msg_oob.no_peek.ex_oob_break
ok 8 msg_oob.no_peek.ex_oob_break
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
After consuming OOB data, recv() reading the preceding data must break at
the OOB skb regardless of MSG_PEEK.
Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
not compliant with TCP.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX)
>>> c1.send(b'hello', MSG_OOB)
5
>>> c1.send(b'world')
5
>>> c2.recv(1, MSG_OOB)
b'o'
>>> c2.recv(9, MSG_PEEK) # This should return b'hell'
b'hellworld' # even with enough buffer.
Let's fix it by returning NULL for consumed skb and unlinking it only if
MSG_PEEK is not specified.
This patch also adds test cases that add recv(MSG_PEEK) before each recv().
Without fix:
# RUN msg_oob.peek.oob_ahead_break ...
# msg_oob.c:134:oob_ahead_break:AF_UNIX :hellworld
# msg_oob.c:135:oob_ahead_break:Expected:hell
# msg_oob.c:137:oob_ahead_break:Expected ret[0] (9) == expected_len (4)
# oob_ahead_break: Test terminated by assertion
# FAIL msg_oob.peek.oob_ahead_break
not ok 13 msg_oob.peek.oob_ahead_break
With fix:
# RUN msg_oob.peek.oob_ahead_break ...
# OK msg_oob.peek.oob_ahead_break
ok 13 msg_oob.peek.oob_ahead_break
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
AF_UNIX's MSG_OOB functionality lacked thorough testing, and we found
some bizarre behaviour.
The new selftest validates every MSG_OOB operation against TCP as a
reference implementation.
This patch adds only a few tests with basic send() and recv() that
do not fail.
The following patches will add more test cases for SO_OOBINLINE, SIGURG,
EPOLLPRI, and SIOCATMARK.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
test_unix_oob.c does not fully cover AF_UNIX's MSG_OOB functionality,
thus there are discrepancies between TCP behaviour.
Also, the test uses fork() to create message producer, and it's not
easy to understand and add more test cases.
Let's remove test_unix_oob.c and rewrite a new test.
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from
qdisc->dev_queue->dev <NULL> ->name
This situation simulated from bunch of veths and Bluetooth disconnection
and reconnection.
During qdisc initialization, qdisc was being set to noop_queue.
In veth_init_queue, the initial tx_num was reduced back to one,
causing the qdisc reset to be called with noop, which led to the kernel
panic.
I've attached the GitHub gist link that C converted syz-execprogram
source code and 3 log of reproduced vmcore-dmesg.
https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740
Yeoreum and I use two fuzzing tool simultaneously.
One process with syz-executor : https://github.com/google/syzkaller
$ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \
-enable=none -collide=false log1
The other process with perf fuzzer:
https://github.com/deater/perf_event_tests/tree/master/fuzzer
$ perf_event_tests/fuzzer/perf_fuzzer
I think this will happen on the kernel version.
Linux kernel version +v6.7.10, +v6.8, +v6.9 and it could happen in v6.10.
This occurred from 51270d573a8d. I think this patch is absolutely
necessary. Previously, It was showing not intended string value of name.
I've reproduced 3 time from my fedora 40 Debug Kernel with any other module
or patched.
version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug
[ 5287.164555] veth0_vlan: left promiscuous mode
[ 5287.164929] veth1_macvtap: left promiscuous mode
[ 5287.164950] veth0_macvtap: left promiscuous mode
[ 5287.164983] veth1_vlan: left promiscuous mode
[ 5287.165008] veth0_vlan: left promiscuous mode
[ 5287.165450] veth1_macvtap: left promiscuous mode
[ 5287.165472] veth0_macvtap: left promiscuous mode
[ 5287.165502] veth1_vlan: left promiscuous mode
…
[ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state
[ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state
[ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0
[ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state
…
[ 5298.002798] bridge_slave_0: left promiscuous mode
[ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
[ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
[ 5298.320207] bond0 (unregistering): Released all slaves
[ 5298.354296] hsr_slave_0: left promiscuous mode
[ 5298.360750] hsr_slave_1: left promiscuous mode
[ 5298.374889] veth1_macvtap: left promiscuous mode
[ 5298.374931] veth0_macvtap: left promiscuous mode
[ 5298.374988] veth1_vlan: left promiscuous mode
[ 5298.375024] veth0_vlan: left promiscuous mode
[ 5299.109741] team0 (unregistering): Port device team_slave_1 removed
[ 5299.185870] team0 (unregistering): Port device team_slave_0 removed
…
[ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1
[ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9
[ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9
….
[ 5301.075531] team0: Port device team_slave_1 added
[ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5301.085588] bridge_slave_0: entered allmulticast mode
[ 5301.085800] bridge_slave_0: entered promiscuous mode
[ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state
…
[ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.193481] hsr_slave_0: entered promiscuous mode
[ 5301.204425] hsr_slave_1: entered promiscuous mode
[ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.210185] Cannot create hsr debugfs directory
[ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.255934] team0: Port device team_slave_0 added
[ 5301.256480] team0: Port device team_slave_1 added
[ 5301.256948] team0: Port device team_slave_0 added
…
[ 5301.435928] hsr_slave_0: entered promiscuous mode
[ 5301.446029] hsr_slave_1: entered promiscuous mode
[ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.455884] Cannot create hsr debugfs directory
[ 5301.502664] hsr_slave_0: entered promiscuous mode
[ 5301.513675] hsr_slave_1: entered promiscuous mode
[ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.526164] Cannot create hsr debugfs directory
[ 5301.563662] hsr_slave_0: entered promiscuous mode
[ 5301.576129] hsr_slave_1: entered promiscuous mode
[ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.580270] Cannot create hsr debugfs directory
[ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137]
[ 5301.595877] Mem abort info:
[ 5301.595881] ESR = 0x0000000096000006
[ 5301.595885] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5301.595889] SET = 0, FnV = 0
[ 5301.595893] EA = 0, S1PTW = 0
[ 5301.595896] FSC = 0x06: level 2 translation fault
[ 5301.595900] Data abort info:
[ 5301.595903] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 5301.595907] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 5301.595911] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5301.595915] [dfff800000000026] address between user and kernel address ranges
[ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP
…
[ 5301.596076] CPU: 2 PID: 102769 Comm:
syz-executor.3 Kdump: loaded Tainted:
G W ------- --- 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1
[ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA,
BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 5301.596085] pc : strnlen+0x40/0x88
[ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596124] sp : ffff8000beef6b40
[ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001
[ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00
[ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140
[ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8
[ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006
[ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5
[ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec
[ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200
[ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1
[ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130
[ 5301.596166] Call trace:
[ 5301.596175] strnlen+0x40/0x88
[ 5301.596179] trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596182] perf_trace_qdisc_reset+0xb0/0x538
[ 5301.596184] __traceiter_qdisc_reset+0x68/0xc0
[ 5301.596188] qdisc_reset+0x43c/0x5e8
[ 5301.596190] netif_set_real_num_tx_queues+0x288/0x770
[ 5301.596194] veth_init_queues+0xfc/0x130 [veth]
[ 5301.596198] veth_newlink+0x45c/0x850 [veth]
[ 5301.596202] rtnl_newlink_create+0x2c8/0x798
[ 5301.596205] __rtnl_newlink+0x92c/0xb60
[ 5301.596208] rtnl_newlink+0xd8/0x130
[ 5301.596211] rtnetlink_rcv_msg+0x2e0/0x890
[ 5301.596214] netlink_rcv_skb+0x1c4/0x380
[ 5301.596225] rtnetlink_rcv+0x20/0x38
[ 5301.596227] netlink_unicast+0x3c8/0x640
[ 5301.596231] netlink_sendmsg+0x658/0xa60
[ 5301.596234] __sock_sendmsg+0xd0/0x180
[ 5301.596243] __sys_sendto+0x1c0/0x280
[ 5301.596246] __arm64_sys_sendto+0xc8/0x150
[ 5301.596249] invoke_syscall+0xdc/0x268
[ 5301.596256] el0_svc_common.constprop.0+0x16c/0x240
[ 5301.596259] do_el0_svc+0x48/0x68
[ 5301.596261] el0_svc+0x50/0x188
[ 5301.596265] el0t_64_sync_handler+0x120/0x130
[ 5301.596268] el0t_64_sync+0x194/0x198
[ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842)
[ 5301.596285] SMP: stopping secondary CPUs
[ 5301.597053] Starting crashdump kernel...
[ 5301.597057] Bye!
After applying our patch, I didn't find any kernel panic errors.
We've found a simple reproducer
# echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable
# ip link add veth0 type veth peer name veth1
Error: Unknown device type.
However, without our patch applied, I tested upstream 6.10.0-rc3 kernel
using the qdisc_reset event and the ip command on my qemu virtual machine.
This 2 commands makes always kernel panic.
Linux version: 6.10.0-rc3
[ 0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty
(paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld
version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024
Kernel panic message:
[ 615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 615.237250] Dumping ftrace buffer:
[ 615.237679] (ftrace buffer empty)
[ 615.238097] Modules linked in: veth crct10dif_ce virtio_gpu
virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can
xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga
uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev
videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can
omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi
lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap
stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br
dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp
llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio
ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6
Jun 22 02:36:5[3 6k152.62-4sm98k4-0k]v kCePUr:n e1l :P IUDn:a b4le6
8t oC ohmma: nidpl eN oketr nteali nptaedg i6n.g1 0re.0q-urecs3t- 0at0
1v6i4r-tgu4a4le fa2d0dbraeeds0se-dir tyd f#f2f08
615.252376] Hardware name: linux,dummy-virt (DT)
[ 615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 615.254433] pc : strnlen+0x6c/0xe0
[ 615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[ 615.256088] sp : ffff800080b269a0
[ 615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27:
0000000000000001
[ 615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24:
ffffc070f619cf60
[ 615.259020] x23: 0000000000000128 x22: 0000000000000138 x21:
dfff800000000000
[ 615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18:
ffffc070f448b800
[ 615.261454] x17: 0000000000000000 x16: 0000000000000001 x15:
ffffc070f4ba2a90
[ 615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12:
1ffff00010164d72
[ 615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 :
ffffc070e85d6184
[ 615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 :
000000001504a6d3
[ 615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 :
0000000000000000
[ 615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 :
0000000000000000
[ 615.268747] Call trace:
[ 615.269180] strnlen+0x6c/0xe0
[ 615.269767] trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[ 615.270716] trace_event_raw_event_qdisc_reset+0xe8/0x4e8
[ 615.271667] __traceiter_qdisc_reset+0xa0/0x140
[ 615.272499] qdisc_reset+0x554/0x848
[ 615.273134] netif_set_real_num_tx_queues+0x360/0x9a8
[ 615.274050] veth_init_queues+0x110/0x220 [veth]
[ 615.275110] veth_newlink+0x538/0xa50 [veth]
[ 615.276172] __rtnl_newlink+0x11e4/0x1bc8
[ 615.276944] rtnl_newlink+0xac/0x120
[ 615.277657] rtnetlink_rcv_msg+0x4e4/0x1370
[ 615.278409] netlink_rcv_skb+0x25c/0x4f0
[ 615.279122] rtnetlink_rcv+0x48/0x70
[ 615.279769] netlink_unicast+0x5a8/0x7b8
[ 615.280462] netlink_sendmsg+0xa70/0x1190
Yeoreum and I don't know if the patch we wrote will fix the underlying
cause, but we think that priority is to prevent kernel panic happening.
So, we're sending this patch.
Fixes: 51270d573a8d ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string")
Link: https://lore.kernel.org/lkml/[email protected]/t/
Cc: [email protected]
Tested-by: Yunseong Kim <[email protected]>
Signed-off-by: Yunseong Kim <[email protected]>
Signed-off-by: Yeoreum Yun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"13 hotfixes, 7 are cc:stable.
All are MM related apart from a MAINTAINERS update. There is no
identifiable theme here - just singleton patches in various places"
* tag 'mm-hotfixes-stable-2024-06-26-17-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/memory: don't require head page for do_set_pmd()
mm/page_alloc: Separate THP PCP into movable and non-movable categories
nfs: drop the incorrect assertion in nfs_swap_rw()
mm/migrate: make migrate_pages_batch() stats consistent
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
selftests/mm:fix test_prctl_fork_exec return failure
mm: convert page type macros to enum
ocfs2: fix DIO failure due to insufficient transaction credits
kasan: fix bad call to unpoison_slab_object
mm: handle profiling for fake memory allocations during compaction
mm/slab: fix 'variable obj_exts set but not used' warning
/proc/pid/smaps: add mseal info for vma
mm: fix incorrect vbq reference in purge_fragmented_block
|
|
register store validation for NFT_DATA_VALUE is conditional, however,
the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This
only requires a new helper function to infer the register type from the
set datatype so this conditional check can be removed. Otherwise,
pointer to chain object can be leaked through the registers.
Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two patches to fix kworker name formatting"
* tag 'wq-for-6.10-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Increase worker desc's length to 32
workqueue: Refactor worker ID formatting and make wq_worker_comm() use full ID string
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.10
A relatively large batch of updates, largely due to the long interval
since I last sent fixes due to various travel and holidays. There's a
lot of driver specific fixes and quirks in here, none of them too major,
and also some fixes for recently introduced memory safety issues in the
topology code.
|
|
Modify register setting sequence of enabling inline command
to fix issue of random interrupt from push-button.
Signed-off-by: Jack Yu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
The conversion of SPP to MIDI2 UMP called a wrong function, and the
secondary argument wasn't taken. As a result, MSB of SPP was always
zero. Fix to call the right function.
Fixes: e9e02819a98a ("ALSA: seq: Automatic conversion of UMP events")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Add the following Telit FN912 compositions:
0x3000: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag)
T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 8 Spd=480 MxCh= 0
D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=3000 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FN912
S: SerialNumber=92c4c4d8
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x3001: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb
T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 7 Spd=480 MxCh= 0
D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=3001 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FN912
S: SerialNumber=92c4c4d8
C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Signed-off-by: Daniele Palmas <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The Vivobook S 16X IPS needs a quirks-table entry for the internal microphone to function properly.
Signed-off-by: Vyacheslav Frantsishko <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
HP EliteBook 645/665 G11 needs ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to
make mic-mute/audio-mute working.
Signed-off-by: Dirk Su <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
The recent fix for Lenovo IdeaPad 330-17IKB replaced the quirk entry,
and this eventually breaks the existing quirk for Lenovo Yoga Duet 7
13ITL6 equipped with the same PCI SSID 17aa:3820.
For applying a proper quirk for each model, check the codec SSID
additionally. Fortunately Yoga Duet has a different codec SSID,
0x17aa3802.
(Interestingly, 17aa:3802 has another conflict of SSID between another
Yoga model vs 14IRP8 which we had to work around similarly.)
Fixes: b1fd0d1285b1 ("ALSA: hda/realtek: Enable headset mic on IdeaPad 330-17IKB 81DM")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Testing determined that the recent commit 9e046bb111f1 ("tcp: clear
tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does
not always ensure retrans_stamp is 0 after a TFO payload retransmit.
If transmit completion for the SYN+data skb happens after the client
TCP stack receives the SYNACK (which sometimes happens), then
retrans_stamp can erroneously remain non-zero for the lifetime of the
connection, causing a premature ETIMEDOUT later.
Testing and tracing showed that the buggy scenario is the following
somewhat tricky sequence:
+ Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO
cookie + data in a single packet in the syn_data skb. It hands the
syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially,
it then reuses the same original (non-clone) syn_data skb,
transforming it by advancing the seq by one byte and removing the
FIN bit, and enques the resulting payload-only skb in the
sk->tcp_rtx_queue.
+ Client sets retrans_stamp to the start time of the three-way
handshake.
+ Cookie mismatches or server has TFO disabled, and server only ACKs
SYN.
+ tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears
retrans_stamp.
+ Since the client SYN was acked but not the payload, the TFO failure
code path in tcp_rcv_fastopen_synack() tries to retransmit the
payload skb. However, in some cases the transmit completion for the
clone of the syn_data (which had SYN + TFO cookie + data) hasn't
happened. In those cases, skb_still_in_host_queue() returns true
for the retransmitted TFO payload, because the clone of the syn_data
skb has not had its tx completetion.
+ Because skb_still_in_host_queue() finds skb_fclone_busy() is true,
it sets the TSQ_THROTTLED bit and the retransmit does not happen in
the tcp_rcv_fastopen_synack() call chain.
+ The tcp_rcv_fastopen_synack() code next implicitly assumes the
retransmit process is finished, and sets retrans_stamp to 0 to clear
it, but this is later overwritten (see below).
+ Later, upon tx completion, tcp_tsq_write() calls
tcp_xmit_retransmit_queue(), which puts the retransmit in flight and
sets retrans_stamp to a non-zero value.
+ The client receives an ACK for the retransmitted TFO payload data.
+ Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to
make tcp_ack_is_dubious() true and make us call
tcp_fastretrans_alert() and reach a code path that clears
retrans_stamp, retrans_stamp stays nonzero.
+ Later, if there is a TLP, RTO, RTO sequence, then the connection
will suffer an early ETIMEDOUT due to the erroneously ancient
retrans_stamp.
The fix: this commit refactors the code to have
tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of
tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and
call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and
tcp_rcv_fastopen_synack() share code in this way because in both cases
we get a packet indicating non-congestion loss (MTU reduction or TFO
failure) and thus in both cases we want to retransmit as many packets
as cwnd allows, without reducing cwnd. And given that retransmits will
set retrans_stamp to a non-zero value (and may do so in a later
calling context due to TSQ), we also want to enter CA_Loss so that we
track when all retransmitted packets are ACked and clear retrans_stamp
when that happens (to ensure later recurring RTOs are using the
correct retrans_stamp and don't declare ETIMEDOUT prematurely).
Fixes: 9e046bb111f1 ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()")
Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If we're not in a NAPI softirq context, we need to be careful
about how we call napi_consume_skb(), specifically we need to
call it with budget==0 to signal to it that we're not in a
safe context.
This was found while running some configuration stress testing
of traffic and a change queue config loop running, and this
curious note popped out:
[ 4371.402645] BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/20545
[ 4371.402897] caller is napi_skb_cache_put+0x16/0x80
[ 4371.403120] CPU: 25 PID: 20545 Comm: ethtool Kdump: loaded Tainted: G OE 6.10.0-rc3-netnext+ #8
[ 4371.403302] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021
[ 4371.403460] Call Trace:
[ 4371.403613] <TASK>
[ 4371.403758] dump_stack_lvl+0x4f/0x70
[ 4371.403904] check_preemption_disabled+0xc1/0xe0
[ 4371.404051] napi_skb_cache_put+0x16/0x80
[ 4371.404199] ionic_tx_clean+0x18a/0x240 [ionic]
[ 4371.404354] ionic_tx_cq_service+0xc4/0x200 [ionic]
[ 4371.404505] ionic_tx_flush+0x15/0x70 [ionic]
[ 4371.404653] ? ionic_lif_qcq_deinit.isra.23+0x5b/0x70 [ionic]
[ 4371.404805] ionic_txrx_deinit+0x71/0x190 [ionic]
[ 4371.404956] ionic_reconfigure_queues+0x5f5/0xff0 [ionic]
[ 4371.405111] ionic_set_ringparam+0x2e8/0x3e0 [ionic]
[ 4371.405265] ethnl_set_rings+0x1f1/0x300
[ 4371.405418] ethnl_default_set_doit+0xbb/0x160
[ 4371.405571] genl_family_rcv_msg_doit+0xff/0x130
[...]
I found that ionic_tx_clean() calls napi_consume_skb() which calls
napi_skb_cache_put(), but before that last call is the note
/* Zero budget indicate non-NAPI context called us, like netpoll */
and
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
Those are pretty big hints that we're doing it wrong. We can pass a
context hint down through the calls to let ionic_tx_clean() know what
we're doing so it can call napi_consume_skb() correctly.
Fixes: 386e69865311 ("ionic: Make use napi_consume_skb")
Signed-off-by: Shannon Nelson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The switch global port interrupt mask, REG_SW_PORT_INT_MASK__4, is
defined as 0x001C in ksz9477_reg.h. The designers used 32-bit value in
anticipation for increase of port count in future product but currently
the maximum port count is 7 and the effective value is 0x7F in register
0x001F. Each port has its own interrupt mask and is defined as 0x#01F.
It uses only 4 bits for different interrupts.
The developer who implemented the current interrupt mechanism in the
switch driver noticed there are similarities between the mechanism to
mask port interrupts in global interrupt and individual interrupts in
each port and so used the same code to handle these interrupts. He
updated the code to use the new macro REG_SW_PORT_INT_MASK__1 which is
defined as 0x1F in ksz_common.h but he forgot to update the 32-bit write
to 8-bit as now the mask registers are 0x1F and 0x#01F.
In addition all KSZ switches other than the KSZ9897/KSZ9893 and LAN937X
families use only 8-bit access and so this common code will eventually
be changed to accommodate them.
Fixes: e1add7dd6183 ("net: dsa: microchip: use common irq routines for girq and pirq")
Signed-off-by: Tristram Ha <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When dmaengine supports pause function, in suspend state,
dmaengine_pause() is called instead of dmaengine_terminate_async(),
In end of playback stream, the runtime->state will go to
SNDRV_PCM_STATE_DRAINING, if system suspend & resume happen
at this time, application will not resume playback stream, the
stream will be closed directly, the dmaengine_terminate_async()
will not be called before the dmaengine_synchronize(), which
violates the call sequence for dmaengine_synchronize().
This behavior also happens for capture streams, but there is no
SNDRV_PCM_STATE_DRAINING state for capture. So use
dmaengine_tx_status() to check the DMA status if the status is
DMA_PAUSED, then call dmaengine_terminate_async() to terminate
dmaengine before dmaengine_synchronize().
Signed-off-by: Shengjiu Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
This HP Laptop uses ALC236 codec with COEF 0x07 controlling
the mute LED. Enable existing quirk for this device.
Signed-off-by: Aivaz Latypov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
snd_pcm_resume() should bail out if the stream isn't in a suspended
state. Otherwise it'd allow doubly resume.
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
The conversion from the legacy event to MIDI2 UMP for RPN and NRPN
missed the setup of the channel number, resulting in always the
channel 0. Fix it.
Fixes: e9e02819a98a ("ALSA: seq: Automatic conversion of UMP events")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
When bonding is configured in BOND_MODE_BROADCAST mode, if two identical
SYN packets are received at the same time and processed on different CPUs,
it can potentially create the same sk (sock) but two different reqsk
(request_sock) in tcp_conn_request().
These two different reqsk will respond with two SYNACK packets, and since
the generation of the seq (ISN) incorporates a timestamp, the final two
SYNACK packets will have different seq values.
The consequence is that when the Client receives and replies with an ACK
to the earlier SYNACK packet, we will reset(RST) it.
========================================================================
This behavior is consistently reproducible in my local setup,
which comprises:
| NETA1 ------ NETB1 |
PC_A --- bond --- | | --- bond --- PC_B
| NETA2 ------ NETB2 |
- PC_A is the Server and has two network cards, NETA1 and NETA2. I have
bonded these two cards using BOND_MODE_BROADCAST mode and configured
them to be handled by different CPU.
- PC_B is the Client, also equipped with two network cards, NETB1 and
NETB2, which are also bonded and configured in BOND_MODE_BROADCAST mode.
If the client attempts a TCP connection to the server, it might encounter
a failure. Capturing packets from the server side reveals:
10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
localhost > 10.10.10.10.45182: Flags [S.], seq 2967855116,
localhost > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
localhost > 10.10.10.10.45182: Flags [R], seq 2967855117,
Two SYNACKs with different seq numbers are sent by localhost,
resulting in an anomaly.
========================================================================
The attempted solution is as follows:
Add a return value to inet_csk_reqsk_queue_hash_add() to confirm if the
ehash insertion is successful (Up to now, the reason for unsuccessful
insertion is that a reqsk for the same connection has already been
inserted). If the insertion fails, release the reqsk.
Due to the refcnt, Kuniyuki suggests also adding a return value check
for the DCCP module; if ehash insertion fails, indicating a successful
insertion of the same connection, simply release the reqsk as well.
Simultaneously, In the reqsk_queue_hash_req(), the start of the
req->rsk_timer is adjusted to be after successful insertion.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: luoxuanqiang <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Below is a summary of how the driver stores a reference to an skb during
transmit:
tx_buff[free_map[consumer_index]]->skb = new_skb;
free_map[consumer_index] = IBMVNIC_INVALID_MAP;
consumer_index ++;
Where variable data looks like this:
free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3]
consumer_index^
tx_buff == [skb=null, skb=<ptr>, skb=<ptr>, skb=null, skb=null]
The driver has checks to ensure that free_map[consumer_index] pointed to
a valid index but there was no check to ensure that this index pointed
to an unused/null skb address. So, if, by some chance, our free_map and
tx_buff lists become out of sync then we were previously risking an
skb memory leak. This could then cause tcp congestion control to stop
sending packets, eventually leading to ETIMEDOUT.
Therefore, add a conditional to ensure that the skb address is null. If
not then warn the user (because this is still a bug that should be
patched) and free the old pointer to prevent memleak/tcp problems.
Signed-off-by: Nick Child <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The requirement that the head page be passed to do_set_pmd() was added in
commit ef37b2ea08ac ("mm/memory: page_add_file_rmap() ->
folio_add_file_rmap_[pte|pmd]()") and prevents pmd-mapping in the
finish_fault() and filemap_map_pages() paths if the page to be inserted is
anything but the head page for an otherwise suitable vma and pmd-sized
page.
Matthew said:
: We're going to stop using PMDs to map large folios unless the fault is
: within the first 4KiB of the PMD. No idea how many workloads that
: affects, but it only needs to be backported as far as v6.8, so we may
: as well backport it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ef37b2ea08ac ("mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()")
Signed-off-by: Andrew Bresticker <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations") no longer differentiates the migration type of
pages in THP-sized PCP list, it's possible that non-movable allocation
requests may get a CMA page from the list, in some cases, it's not
acceptable.
If a large number of CMA memory are configured in system (for example, the
CMA memory accounts for 50% of the system memory), starting a virtual
machine with device passthrough will get stuck. During starting the
virtual machine, it will call pin_user_pages_remote(..., FOLL_LONGTERM,
...) to pin memory. Normally if a page is present and in CMA area,
pin_user_pages_remote() will migrate the page from CMA area to non-CMA
area because of FOLL_LONGTERM flag. But if non-movable allocation
requests return CMA memory, migrate_longterm_unpinnable_pages() will
migrate a CMA page to another CMA page, which will fail to pass the check
in check_and_migrate_movable_pages() and cause migration endless.
Call trace:
pin_user_pages_remote
--__gup_longterm_locked // endless loops in this function
----_get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target
This problem will also have a negative impact on CMA itself. For example,
when CMA is borrowed by THP, and we need to reclaim it through cma_alloc()
or dma_alloc_coherent(), we must move those pages out to ensure CMA's
users can retrieve that contigous memory. Currently, CMA's memory is
occupied by non-movable pages, meaning we can't relocate them. As a
result, cma_alloc() is more likely to fail.
To fix the problem above, we add one PCP list for THP, which will not
introduce a new cacheline for struct per_cpu_pages. THP will have 2 PCP
lists, one PCP list is used by MOVABLE allocation, and the other PCP list
is used by UNMOVABLE allocation. MOVABLE allocation contains GPF_MOVABLE,
and UNMOVABLE allocation contains GFP_UNMOVABLE and GFP_RECLAIMABLE.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS
swap-space"), we can plug multiple pages then unplug them all together.
That means iov_iter_count(iter) could be way bigger than PAGE_SIZE, it
actually equals the size of iov_iter_npages(iter, INT_MAX).
Note this issue has nothing to do with large folios as we don't support
THP_SWPOUT to non-block devices.
[[email protected]: figure out the cause and correct the commit message]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2282679fb20b ("mm: submit multipage write for SWP_FS_OPS swap-space")
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Closes: https://lore.kernel.org/linux-mm/[email protected]/
Reviewed-by: Martin Wege <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Steve French <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Chuanhua Han <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Chris Li <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
As Ying pointed out in [1], stats->nr_thp_failed needs to be updated to
avoid stats inconsistency between MIGRATE_SYNC and MIGRATE_ASYNC when
calling migrate_pages_batch().
Because if not, when migrate_pages_batch() is called via
migrate_pages(MIGRATE_ASYNC), nr_thp_failed will not be increased and when
migrate_pages_batch() is called via migrate_pages(MIGRATE_SYNC*),
nr_thp_failed will be increase in migrate_pages_sync() by
stats->nr_thp_failed += astats.nr_thp_split.
[1] https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Zi Yan <[email protected]>
Suggested-by: "Huang, Ying" <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yin Fengwei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Git hosting for the test suite has been migrated from Gitlab to Codeberg,
given the "less hostile environment".
Link: https://lkml.kernel.org/r/[email protected]
Link: https://codeberg.org/jarkko/linux-tpmdd-test
Signed-off-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
After calling fork() in test_prctl_fork_exec(), the global variable
ksm_full_scans_fd is initialized to 0 in the child process upon entering
the main function of ./ksm_functional_tests.
In the function call chain test_child_ksm() -> __mmap_and_merge_range ->
ksm_merge-> ksm_get_full_scans, start_scans = ksm_get_full_scans() will
return an error. Therefore, the value of ksm_full_scans_fd needs to be
initialized before calling test_child_ksm in the child process.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: aigourensheng <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Changing PG_slab from a page flag to a page type in commit 46df8e73a4a3
("mm: free up PG_slab") in has the unintended consequence of removing the
PG_slab constant from kernel debuginfo. The commit does add the value to
the vmcoreinfo note, which allows debuggers to find the value without
hardcoding it. However it's most flexible to continue representing the
constant with an enum. To that end, convert the page type fields into an
enum. Debuggers will now be able to detect that PG_slab's type has
changed from enum pageflags to enum pagetype.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 46df8e73a4a3 ("mm: free up PG_slab")
Signed-off-by: Stephen Brennan <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hao Ge <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Vishal Moola (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The code in ocfs2_dio_end_io_write() estimates number of necessary
transaction credits using ocfs2_calc_extend_credits(). This however does
not take into account that the IO could be arbitrarily large and can
contain arbitrary number of extents.
Extent tree manipulations do often extend the current transaction but not
in all of the cases. For example if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents. Once that happens a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error. This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.
To fix the issue make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().
Heming Zhao said:
------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"
PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
#0 machine_kexec at ffffffff8c069932
#1 __crash_kexec at ffffffff8c1338fa
#2 panic at ffffffff8c1d69b9
#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
#11 dio_complete at ffffffff8c2b9fa7
#12 do_blockdev_direct_IO at ffffffff8c2bc09f
#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
#14 generic_file_direct_write at ffffffff8c1dcf14
#15 __generic_file_write_iter at ffffffff8c1dd07b
#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
#17 aio_write at ffffffff8c2cc72e
#18 kmem_cache_alloc at ffffffff8c248dde
#19 do_io_submit at ffffffff8c2ccada
#20 do_syscall_64 at ffffffff8c004984
#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Reviewed-by: Heming Zhao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 29d7355a9d05 ("kasan: save alloc stack traces for mempool") messed
up one of the calls to unpoison_slab_object: the last two arguments are
supposed to be GFP flags and whether to init the object memory.
Fix the call.
Without this fix, __kasan_mempool_unpoison_object provides the object's
size as GFP flags to unpoison_slab_object, which can cause LOCKDEP reports
(and probably other issues).
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 29d7355a9d05 ("kasan: save alloc stack traces for mempool")
Signed-off-by: Andrey Konovalov <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
During compaction isolated free pages are marked allocated so that they
can be split and/or freed. For that, post_alloc_hook() is used inside
split_map_pages() and release_free_list(). split_map_pages() marks free
pages allocated, splits the pages and then lets
alloc_contig_range_noprof() free those pages. release_free_list() marks
free pages and immediately frees them. This usage of post_alloc_hook()
affect memory allocation profiling because these functions might not be
called from an instrumented allocator, therefore current->alloc_tag is
NULL and when debugging is enabled (CONFIG_MEM_ALLOC_PROFILING_DEBUG=y)
that causes warnings. To avoid that, wrap such post_alloc_hook() calls
into an instrumented function which acts as an allocator which will be
charged for these fake allocations. Note that these allocations are very
short lived until they are freed, therefore the associated counters should
usually read 0.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Sourav Panda <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
slab_post_alloc_hook() uses prepare_slab_obj_exts_hook() to obtain
slabobj_ext object. Currently the only user of slabobj_ext object in this
path is memory allocation profiling, therefore when it's not enabled this
object is not needed. This also generates a warning when compiling with
CONFIG_MEM_ALLOC_PROFILING=n. Move the code under this configuration to
fix the warning. If more slabobj_ext users appear in the future, the code
will have to be changed back to call prepare_slab_obj_exts_hook().
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Cc: Kent Overstreet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add sl in /proc/pid/smaps to indicate vma is sealed
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Jeff Xu <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Adhemerval Zanella <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jorge Lucangeli Obes <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Stephen Röttger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
xa_for_each() in _vm_unmap_aliases() loops through all vbs. However,
since commit 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks
xarray") the vb from xarray may not be on the corresponding CPU
vmap_block_queue. Consequently, purge_fragmented_block() might use the
wrong vbq->lock to protect the free list, leading to vbq->free breakage.
Incorrect lock protection can exhaust all vmalloc space as follows:
CPU0 CPU1
+--------------------------------------------+
| +--------------------+ +-----+ |
+--> | |---->| |------+
| CPU1:vbq free_list | | vb1 |
+--- | |<----| |<-----+
| +--------------------+ +-----+ |
+--------------------------------------------+
_vm_unmap_aliases() vb_alloc()
new_vmap_block()
xa_for_each(&vbq->vmap_blocks, idx, vb)
--> vb in CPU1:vbq->freelist
purge_fragmented_block(vb)
spin_lock(&vbq->lock) spin_lock(&vbq->lock)
--> use CPU0:vbq->lock --> use CPU1:vbq->lock
list_del_rcu(&vb->free_list) list_add_tail_rcu(&vb->free_list, &vbq->free)
__list_del(vb->prev, vb->next)
next->prev = prev
+--------------------+
| |
| CPU1:vbq free_list |
+---| |<--+
| +--------------------+ |
+----------------------------+
__list_add(new, head->prev, head)
+--------------------------------------------+
| +--------------------+ +-----+ |
+--> | |---->| |------+
| CPU1:vbq free_list | | vb2 |
+--- | |<----| |<-----+
| +--------------------+ +-----+ |
+--------------------------------------------+
prev->next = next
+--------------------------------------------+
|----------------------------+ |
| +--------------------+ | +-----+ |
+--> | |--+ | |------+
| CPU1:vbq free_list | | vb2 |
+--- | |<----| |<-----+
| +--------------------+ +-----+ |
+--------------------------------------------+
Here’s a list breakdown. All vbs, which were to be added to
‘prev’, cannot be used by list_for_each_entry_rcu(vb, &vbq->free,
free_list) in vb_alloc(). Thus, vmalloc space is exhausted.
This issue affects both erofs and f2fs, the stacktrace is as follows:
erofs:
[<ffffffd4ffb93ad4>] __switch_to+0x174
[<ffffffd4ffb942f0>] __schedule+0x624
[<ffffffd4ffb946f4>] schedule+0x7c
[<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24
[<ffffffd4ffb962ec>] __mutex_lock+0x374
[<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14
[<ffffffd4ffb95954>] mutex_lock+0x24
[<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44
[<ffffffd4fef25908>] alloc_vmap_area+0x2e0
[<ffffffd4fef24ea0>] vm_map_ram+0x1b0
[<ffffffd4ff1b46f4>] z_erofs_lz4_decompress+0x278
[<ffffffd4ff1b8ac4>] z_erofs_decompress_queue+0x650
[<ffffffd4ff1b8328>] z_erofs_runqueue+0x7f4
[<ffffffd4ff1b66a8>] z_erofs_read_folio+0x104
[<ffffffd4feeb6fec>] filemap_read_folio+0x6c
[<ffffffd4feeb68c4>] filemap_fault+0x300
[<ffffffd4fef0ecac>] __do_fault+0xc8
[<ffffffd4fef0c908>] handle_mm_fault+0xb38
[<ffffffd4ffb9f008>] do_page_fault+0x288
[<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40
[<ffffffd4fec39c78>] do_mem_abort+0x58
[<ffffffd4ffb8c3e4>] el0_ia+0x70
[<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0
[<ffffffd4fec11588>] ret_to_user[jt]+0x0
f2fs:
[<ffffffd4ffb93ad4>] __switch_to+0x174
[<ffffffd4ffb942f0>] __schedule+0x624
[<ffffffd4ffb946f4>] schedule+0x7c
[<ffffffd4ffb947cc>] schedule_preempt_disabled+0x24
[<ffffffd4ffb962ec>] __mutex_lock+0x374
[<ffffffd4ffb95998>] __mutex_lock_slowpath+0x14
[<ffffffd4ffb95954>] mutex_lock+0x24
[<ffffffd4fef2900c>] reclaim_and_purge_vmap_areas+0x44
[<ffffffd4fef25908>] alloc_vmap_area+0x2e0
[<ffffffd4fef24ea0>] vm_map_ram+0x1b0
[<ffffffd4ff1a3b60>] f2fs_prepare_decomp_mem+0x144
[<ffffffd4ff1a6c24>] f2fs_alloc_dic+0x264
[<ffffffd4ff175468>] f2fs_read_multi_pages+0x428
[<ffffffd4ff17b46c>] f2fs_mpage_readpages+0x314
[<ffffffd4ff1785c4>] f2fs_readahead+0x50
[<ffffffd4feec3384>] read_pages+0x80
[<ffffffd4feec32c0>] page_cache_ra_unbounded+0x1a0
[<ffffffd4feec39e8>] page_cache_ra_order+0x274
[<ffffffd4feeb6cec>] do_sync_mmap_readahead+0x11c
[<ffffffd4feeb6764>] filemap_fault+0x1a0
[<ffffffd4ff1423bc>] f2fs_filemap_fault+0x28
[<ffffffd4fef0ecac>] __do_fault+0xc8
[<ffffffd4fef0c908>] handle_mm_fault+0xb38
[<ffffffd4ffb9f008>] do_page_fault+0x288
[<ffffffd4ffb9ed64>] do_translation_fault[jt]+0x40
[<ffffffd4fec39c78>] do_mem_abort+0x58
[<ffffffd4ffb8c3e4>] el0_ia+0x70
[<ffffffd4ffb8c260>] el0t_64_sync_handler[jt]+0xb0
[<ffffffd4fec11588>] ret_to_user[jt]+0x0
To fix this, introducee cpu within vmap_block to record which this vb
belongs to.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Zhaoyang Huang <[email protected]>
Suggested-by: Hailong.Liu <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2024-06-24
We've added 12 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 412 insertions(+), 16 deletions(-).
The main changes are:
1) Fix a BPF verifier issue validating may_goto with a negative offset,
from Alexei Starovoitov.
2) Fix a BPF verifier validation bug with may_goto combined with jump to
the first instruction, also from Alexei Starovoitov.
3) Fix a bug with overrunning reservations in BPF ring buffer,
from Daniel Borkmann.
4) Fix a bug in BPF verifier due to missing proper var_off setting related
to movsx instruction, from Yonghong Song.
5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(),
from Daniil Dulov.
* tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: Remove WARN() from __xdp_reg_mem_model()
selftests/bpf: Add tests for may_goto with negative offset.
bpf: Fix may_goto with negative offset.
selftests/bpf: Add more ring buffer test coverage
bpf: Fix overrunning reservations in ringbuf
selftests/bpf: Tests with may_goto and jumps to the 1st insn
bpf: Fix the corner case with may_goto and jump to the 1st insn.
bpf: Update BPF LSM maintainer list
bpf: Fix remap of arena.
selftests/bpf: Add a few tests to cover
bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
bpf: Add missed var_off setting in set_sext32_default_val()
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|