Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A set of small fixes:
- Repair the ktime_get_coarse() functions so they actually deliver
what they are supposed to: tick granular time stamps. The current
code missed to add the accumulated nanoseconds part of the
timekeeper so the resulting granularity was 1 second.
- Prevent the tracer from infinitely recursing into time getter
functions in the arm architectured timer by marking these functions
notrace
- Fix a trivial compiler warning caused by wrong qualifier ordering"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Repair ktime_get_coarse*() granularity
clocksource/drivers/arm_arch_timer: Don't trace count reader functions
clocksource/drivers/timer-ti-dm: Change to new style declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two small fixes for RAS:
- Use a proper search algorithm to find the correct element in the
CEC array. The replacement was a better choice than fixing the
crash causes by the original search function with horrible duct
tape.
- Move the timer based decay function into thread context so it can
actually acquire the mutex which protects the CEC array to prevent
corruption"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS/CEC: Convert the timer callback to a workqueue
RAS/CEC: Fix binary search function
|
|
If mtu probing is enabled tcp_mtu_probing() could very well end up
with a too small MSS.
Use the new sysctl tcp_min_snd_mss to make sure MSS search
is performed in an acceptable range.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Jonathan Lemon <[email protected]>
Cc: Jonathan Looney <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Bruce Curtis <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some TCP peers announce a very small MSS option in their SYN and/or
SYN/ACK messages.
This forces the stack to send packets with a very high network/cpu
overhead.
Linux has enforced a minimal value of 48. Since this value includes
the size of TCP options, and that the options can consume up to 40
bytes, this means that each segment can include only 8 bytes of payload.
In some cases, it can be useful to increase the minimal value
to a saner value.
We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility
reasons.
Note that TCP_MAXSEG socket option enforces a minimal value
of (TCP_MIN_MSS). David Miller increased this minimal value
in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.")
from 64 to 88.
We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
CVE-2019-11479 -- tcp mss hardcoded to 48
Signed-off-by: Eric Dumazet <[email protected]>
Suggested-by: Jonathan Looney <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Bruce Curtis <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflow 32bit counters.
TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non malicious splitting
of retransmit queue.
A new SNMP counter is added to monitor how many times TCP
did not allow to split an skb if the allowance was exceeded.
Note that this counter might increase in the case applications
use SO_SNDBUF socket option to lower sk_sndbuf.
CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
socket is already using more than half the allowed space
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Jonathan Looney <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Cc: Bruce Curtis <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :
BUG_ON(tcp_skb_pcount(skb) < pcount);
This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48
An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.
This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.
Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.
CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Jonathan Looney <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Bruce Curtis <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2019-06-15
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) fix stack layout of JITed x64 bpf code, from Alexei.
2) fix out of bounds memory access in bpf_sk_storage, from Arthur.
3) fix lpm trie walk, from Jonathan.
4) fix nested bpf_perf_event_output, from Matt.
5) and several other fixes.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit ef7bfa84725d891bbdb88707ed55b2cbf94942bb.
Russell King espressed some strong opposition to this
change, explaining that this is trying to make phylink
behave outside of how it has been designed.
Signed-off-by: David S. Miller <[email protected]>
|
|
BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
they do not increment bpf_prog_active while executing.
This enables three levels of nesting, to support
- a kprobe or raw tp or perf event,
- another one of the above that irq context happens to call, and
- another one in nmi context
(at most one of which may be a kprobe or perf event).
Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Signed-off-by: Matt Mullins <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
bpf_sk_storage maps use multiple spin locks to reduce contention.
The number of locks to use is determined by the number of possible CPUs.
With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
When updating elements, the correct lock is determined with hash_ptr().
Calling hash_ptr() with 0 bits is undefined behavior, as it does:
x >> (64 - bits)
Using the value results in an out of bounds memory access.
In my case, this manifested itself as a page fault when raw_spin_lock_bh()
is called later, when running the self tests:
./tools/testing/selftests/bpf/test_verifier 773 775
[ 16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
Force the minimum number of locks to two.
Signed-off-by: Arthur Fabre <[email protected]>
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
shut down and there is nothing left to read.
This fixes the following bug:
1) Peer sends SHUTDOWN(RDWR).
2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
3) read() returns -ENOTCONN until close() is called, then returns 0.
Signed-off-by: Stephen Barber <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We get this regression when using RTL8366RB as part of a bridge
with OpenWrt:
WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
switchdev_port_attr_set_now+0x80/0xa4
lan0: Commit of attribute (id=7) failed.
(...)
realtek-smi switch lan0: failed to initialize vlan filtering on this port
This is because it is trying to disable VLAN filtering
on VLAN0, as we have forgot to add 1 to the port number
to get the right VLAN in rtl8366_vlan_filtering(): when
we initialize the VLAN we associate VLAN1 with port 0,
VLAN2 with port 1 etc, so we need to add 1 to the port
offset.
Fixes: d8652956cf37 ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The phy_state field of phylink should carry only valid information
especially when this can be passed to the .mac_config callback.
Update the an_enabled field with the autoneg state in the
phylink_phy_change function.
Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Ioana Ciornei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
- fix a couple of Mellanox driver enumeration issues
- fix ASUS laptop regression with backlight
- fix Dell computers that got a wrong mode (tablet versus laptop) after
resume
* tag 'platform-drivers-x86-v5.2-3' of git://git.infradead.org/linux-platform-drivers-x86:
platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
platform/x86: intel-vbtn: Report switch events when event wakes device
platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 5.2-rc5
Nothing major, just some small gadget fixes, usb-serial new device
ids, a few new quirks, and some small fixes for some regressions that
have been found after the big 5.2-rc1 merge.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: Make sure an alt mode exist before getting its partner
usb: gadget: udc: lpc32xx: fix return value check in lpc32xx_udc_probe()
usb: gadget: dwc2: fix zlp handling
usb: dwc2: Set actual frame number for completed ISOC transfer for none DDMA
usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
usb: phy: mxs: Disable external charger detect in mxs_phy_hw_init()
usb: dwc2: Fix DMA cache alignment issues
usb: dwc2: host: Fix wMaxPacketSize handling (fix webcam regression)
USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
USB: usb-storage: Add new ID to ums-realtek
usb: typec: ucsi: ccg: fix memory leak in do_flash
USB: serial: option: add Telit 0x1260 and 0x1261 compositions
USB: serial: pl2303: add Allied Telesis VT-Kit3
USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression introduced by our 32-bit KASAN support, which
broke booting on machines with "bootx" early debugging enabled.
A fix for a bug which broke kexec on 32-bit, introduced by changes to
the 32-bit STRICT_KERNEL_RWX support in v5.1.
Finally two fixes going to stable for our THP split/collapse handling,
discovered by Nick. The first fixes random crashes and/or corruption
in guests under sufficient load.
Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu
Malaterre"
* tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
powerpc/64s: Fix THP PMD collapse serialisation
powerpc: Fix kexec failure on book3s/32
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Out of range read of stack trace output
- Fix for NULL pointer dereference in trace_uprobe_create()
- Fix to a livepatching / ftrace permission race in the module code
- Fix for NULL pointer dereference in free_ftrace_func_mapper()
- A couple of build warning clean ups
* tag 'trace-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()
module: Fix livepatch/ftrace module text permissions race
tracing/uprobe: Fix obsolete comment on trace_uprobe_create()
tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()
tracing: Make two symbols static
tracing: avoid build warning with HAVE_NOP_MCOUNT
tracing: Fix out-of-range read in trace_stack_print()
|
|
Build failure was introduced by the commit identified below,
due to missed macro expension leading to wrong called function's name.
arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
Makefile:1052: recipe for target 'vmlinux' failed
The called function should be kvmppc_handler_8_0x01B(). This patch fixes it.
Reported-by: Paul Mackerras <[email protected]>
Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Use r10 instead of r9 to calculate CPU offset as r9 contains
the value from SRR1 which is used later.
Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The patch referenced below moved the loading of segment registers
out of load_up_mmu() in order to do it earlier in the boot sequence.
However, the secondary CPU still needs it to be done when loading up
the MMU.
Reported-by: Erhard F. <[email protected]>
Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Adric Blake reported the following warning during suspend-resume:
Enabling non-boot CPUs ...
x86: Booting SMP configuration:
smpboot: Booting Node 0 Processor 1 APIC 0x2
unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
Call Trace:
intel_set_tfa
intel_pmu_cpu_starting
? x86_pmu_dead_cpu
x86_pmu_starting_cpu
cpuhp_invoke_callback
? _raw_spin_lock_irqsave
notify_cpu_starting
start_secondary
secondary_startup_64
microcode: sig=0x806ea, pf=0x80, revision=0x96
microcode: updated to revision 0xb4, date = 2019-04-01
CPU1 is up
The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
by microcode. The log above shows that the microcode loader callback
happens after the PMU restoration, leading to the conjecture that
because the microcode hasn't been updated yet, that MSR is not present
yet, leading to the #GP.
Add a microcode loader-specific hotplug vector which comes before
the PERF vectors and thus executes earlier and makes sure the MSR is
present.
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Adric Blake <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This has an unusually high density of tricky fixes:
- task_get_css() could deadlock when it races against a dying cgroup.
- cgroup.procs didn't list thread group leaders with live threads.
This could mislead readers to think that a cgroup is empty when
it's not. Fixed by making PROCS iterator include dead tasks. I made
a couple mistakes making this change and this pull request contains
a couple follow-up patches.
- When cpusets run out of online cpus, it updates cpusmasks of member
tasks in bizarre ways. Joel improved the behavior significantly"
* 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: restore sanity to cpuset_cpus_allowed_fallback()
cgroup: Fix css_task_iter_advance_css_set() cset skip condition
cgroup: css_task_iter_skip()'d iterators must be advanced before accessed
cgroup: Include dying leaders with live threads in PROCS iterations
cgroup: Implement css_task_iter_skip()
cgroup: Call cgroup_release() before __exit_signal()
docs cgroups: add another example size for hugetlb
cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
|
|
Pull drm fixes from Daniel Vetter:
"Nothing unsettling here, also not aware of anything serious still
pending.
The edid override regression fix took a bit longer since this seems to
be an area with an overabundance of bad options. But the fix we have
now seems like a good path forward.
Next week it should be back to Dave.
Summary:
- fix regression on amdgpu on SI
- fix edid override regression
- driver fixes: amdgpu, i915, mediatek, meson, panfrost
- fix writecombine for vmap in gem-shmem helper (used by panfrost)
- add more panel quirks"
* tag 'drm-fixes-2019-06-14' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/amdgpu: return 0 by default in amdgpu_pm_load_smu_firmware
drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported()
drm: add fallback override/firmware EDID modes workaround
drm/edid: abstract override/firmware EDID retrieval
drm/i915/perf: fix whitelist on Gen10+
drm/i915/sdvo: Implement proper HDMI audio support for SDVO
drm/i915: Fix per-pixel alpha with CCS
drm/i915/dmc: protect against reading random memory
drm/i915/dsi: Use a fuzzy check for burst mode clock check
drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
drm/panfrost: Require the simple_ondemand governor
drm/panfrost: make devfreq optional again
drm/gem_shmem: Use a writecombine mapping for ->vaddr
drm: panel-orientation-quirks: Add quirk for GPD MicroPC
drm: panel-orientation-quirks: Add quirk for GPD pocket2
drm/meson: fix G12A primary plane disabling
drm/meson: fix primary plane disabling
drm/meson: fix G12A HDMI PLL settings for 4K60 1000/1001 variations
drm/mediatek: call mtk_dsi_stop() after mtk_drm_crtc_atomic_disable()
drm/mediatek: clear num_pipes when unbind driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
"Fix rounding error in gfs2_iomap_page_prepare"
* tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix rounding error in gfs2_iomap_page_prepare
|
|
Eric Dumazet says:
====================
tcp: add three static keys
Recent addition of per TCP socket rx/tx cache brought
regressions for some workloads, as reported by Feng Tang.
It seems better to make them opt-in, before we adopt better
heuristics.
The last patch adds high_order_alloc_disable sysctl
to ask TCP sendmsg() to exclusively use order-0 allocations,
as mm layer has specific optimizations.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
>From linux-3.7, (commit 5640f7685831 "net: use a per task frag
allocator") TCP sendmsg() has preferred using order-3 allocations.
While it gives good results for most cases, we had reports
that heavy uses of TCP over loopback were hitting a spinlock
contention in page allocations/freeing.
This commits adds a sysctl so that admins can opt-in
for order-0 allocations. Hopefully mm layer might optimize
order-3 allocations in the future since it could give us
a nice boost (see 8 lines of following benchmark)
The following benchmark shows a win when more than 8 TCP_STREAM
threads are running (56 x86 cores server in my tests)
for thr in {1..30}
do
sysctl -wq net.core.high_order_alloc_disable=0
T0=`./super_netperf $thr -H 127.0.0.1 -l 15`
sysctl -wq net.core.high_order_alloc_disable=1
T1=`./super_netperf $thr -H 127.0.0.1 -l 15`
echo $thr:$T0:$T1
done
1: 49979: 37267
2: 98745: 76286
3: 141088: 110051
4: 177414: 144772
5: 197587: 173563
6: 215377: 208448
7: 241061: 234087
8: 267155: 263373
9: 295069: 297402
10: 312393: 335213
11: 340462: 368778
12: 371366: 403954
13: 412344: 443713
14: 426617: 473580
15: 474418: 507861
16: 503261: 538539
17: 522331: 563096
18: 532409: 567084
19: 550824: 605240
20: 525493: 641988
21: 564574: 665843
22: 567349: 690868
23: 583846: 710917
24: 588715: 736306
25: 603212: 763494
26: 604083: 792654
27: 602241: 796450
28: 604291: 797993
29: 611610: 833249
30: 577356: 841062
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Feng Tang reported a performance regression after introduction
of per TCP socket tx/rx caches, for TCP over loopback (netperf)
There is high chance the regression is caused by a change on
how well the 32 KB per-thread page (current->task_frag) can
be recycled, and lack of pcp caches for order-3 pages.
I could not reproduce the regression myself, cpus all being
spinning on the mm spinlocks for page allocs/freeing, regardless
of enabling or disabling the per tcp socket caches.
It seems best to disable the feature by default, and let
admins enabling it.
MM layer either needs to provide scalable order-3 pages
allocations, or could attempt a trylock on zone->lock if
the caller only attempts to get a high-order page and is
able to fallback to order-0 ones in case of pressure.
Tests run on a 56 cores host (112 hyper threads)
- 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath
- 35.49% queued_spin_lock_slowpath
- 18.18% get_page_from_freelist
- __alloc_pages_nodemask
- 18.18% alloc_pages_current
skb_page_frag_refill
sk_page_frag_refill
tcp_sendmsg_locked
tcp_sendmsg
inet_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
do_syscall_64
entry_SYSCALL_64_after_hwframe
__libc_send
+ 17.31% __free_pages_ok
+ 31.43% swapper [kernel.vmlinux] [k] intel_idle
+ 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
+ 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
+ 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 0.68% netperf [kernel.vmlinux] [k] skb_release_data
+ 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked
0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave
Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Feng Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Instead of relying on rps_needed, it is safer to use a separate
static key, since we do not want to enable TCP rx_skb_cache
by default. This feature can cause huge increase of memory
usage on hosts with millions of sockets.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert proc_dointvec_minmax_bpf_stats() into a more generic
helper, since we are going to use jump labels more often.
Note that sysctl_bpf_stats_enabled is removed, since
it is no longer needed/used.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For better consistency of synthetic NIC names, we set the probe mode to
PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
channel offer sequence.
Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Current flower mask creating code assumes that temporary mask that is used
when inserting new filter is stack allocated. To prevent race condition
with data patch synchronize_rcu() is called every time fl_create_new_mask()
replaces temporary stack allocated mask. As reported by Jiri, this
increases runtime of creating 20000 flower classifiers from 4 seconds to
163 seconds. However, this design is no longer necessary since temporary
mask was converted to be dynamically allocated by commit 2cddd2014782
("net/sched: cls_flower: allocate mask dynamically in fl_change()").
Remove synchronize_rcu() calls from mask creation code. Instead, refactor
fl_change() to always deallocate temporary mask with rcu grace period.
Fixes: 195c234d15c9 ("net: sched: flower: handle concurrent mask insertion")
Reported-by: Jiri Pirko <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Tested-by: Jiri Pirko <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When building with CONFIG_NET_DSA_REALTEK_SMI and CONFIG_REALTEK_PHY
enabled as loadable modules, we see the following warning:
warning: same module names found:
drivers/net/phy/realtek.ko
drivers/net/dsa/realtek.ko
Rework so the driver name is realtek-smi instead of realtek.
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Based on comments from Xin, even after fixes for our recent syzbot
report of cookie memory leaks, its possible to get a resend of an INIT
chunk which would lead to us leaking cookie memory.
To ensure that we don't leak cookie memory, free any previously
allocated cookie first.
Change notes
v1->v2
update subsystem tag in subject (davem)
repeat kfree check for peer_random and peer_hmacs (xin)
v2->v3
net->sctp
also free peer_chunks
v3->v4
fix subject tags
v4->v5
remove cut line
Signed-off-by: Neil Horman <[email protected]>
Reported-by: [email protected]
CC: Marcelo Ricardo Leitner <[email protected]>
CC: Xin Long <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If some of the switch ports were not listed in the device tree, due to
being unused, the ksz_mib_read_work function ended up accessing a NULL
dp->slave pointer and causing an oops. Skip checking statistics for any
unused ports.
Fixes: 7c6ff470aa867f53 ("net: dsa: microchip: add MIB counter reading support")
Signed-off-by: Robert Hancock <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Reinhard Speyerer says:
====================
qmi_wwan: fix QMAP handling
This series addresses the following issues observed when using the
QMAP support of the qmi_wwan driver:
1. The QMAP code in the qmi_wwan driver is based on the CodeAurora
GobiNet driver ([1], [2]) which does not process QMAP padding
in the RX path correctly. This causes qmimux_rx_fixup() to pass
incorrect data to the IP stack when padding is used.
2. qmimux devices currently lack proper network device usage statistics.
3. RCU stalls on device disconnect with QMAP activated like this
# echo Y > /sys/class/net/wwan0/qmi/raw_ip
# echo 1 > /sys/class/net/wwan0/qmi/add_mux
# echo 2 > /sys/class/net/wwan0/qmi/add_mux
# echo 3 > /sys/class/net/wwan0/qmi/add_mux
have been observed in certain setups:
[ 2273.676593] option1 ttyUSB16: GSM modem (1-port) converter now disconnected from ttyUSB16
[ 2273.676617] option 6-1.2:1.0: device disconnected
[ 2273.676774] WARNING: CPU: 1 PID: 141 at kernel/rcu/tree_plugin.h:342 rcu_note_context_switch+0x2a/0x3d0
[ 2273.676776] Modules linked in: option qmi_wwan cdc_mbim cdc_ncm qcserial cdc_wdm usb_wwan sierra sierra_net usbnet mii edd coretemp iptable_mangle ip6_tables iptable_filter ip_tables cdc_acm dm_mod dax iTCO_wdt evdev iTCO_vendor_support sg ftdi_sio usbserial e1000e ptp pps_core i2c_i801 ehci_pci button lpc_ich i2c_core mfd_core uhci_hcd ehci_hcd rtc_cmos usbcore usb_common sd_mod fan ata_piix thermal
[ 2273.676817] CPU: 1 PID: 141 Comm: kworker/1:1 Not tainted 4.19.38-rsp-1 #1
[ 2273.676819] Hardware name: Not Applicable Not Applicable /CX-GS/GM45-GL40 , BIOS V1.11 03/23/2011
[ 2273.676828] Workqueue: usb_hub_wq hub_event [usbcore]
[ 2273.676832] EIP: rcu_note_context_switch+0x2a/0x3d0
[ 2273.676834] Code: 55 89 e5 57 56 89 c6 53 83 ec 14 89 45 f0 e8 5d ff ff ff 89 f0 64 8b 3d 24 a6 86 c0 84 c0 8b 87 04 02 00 00 75 7a 85 c0 7e 7a <0f> 0b 80 bf 08 02 00 00 00 0f 84 87 00 00 00 e8 b2 e2 ff ff bb dc
[ 2273.676836] EAX: 00000001 EBX: f614bc00 ECX: 00000001 EDX: c0715b81
[ 2273.676838] ESI: 00000000 EDI: f18beb40 EBP: f1a3dc20 ESP: f1a3dc00
[ 2273.676840] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010002
[ 2273.676842] CR0: 80050033 CR2: b7e97230 CR3: 2f9c4000 CR4: 000406b0
[ 2273.676843] Call Trace:
[ 2273.676847] ? preempt_count_add+0xa5/0xc0
[ 2273.676852] __schedule+0x4e/0x4f0
[ 2273.676855] ? __queue_work+0xf1/0x2a0
[ 2273.676858] ? _raw_spin_lock_irqsave+0x14/0x40
[ 2273.676860] ? preempt_count_add+0x52/0xc0
[ 2273.676862] schedule+0x33/0x80
[ 2273.676865] _synchronize_rcu_expedited+0x24e/0x280
[ 2273.676867] ? rcu_accelerate_cbs_unlocked+0x70/0x70
[ 2273.676871] ? wait_woken+0x70/0x70
[ 2273.676873] ? rcu_accelerate_cbs_unlocked+0x70/0x70
[ 2273.676875] ? _synchronize_rcu_expedited+0x280/0x280
[ 2273.676877] synchronize_rcu_expedited+0x22/0x30
[ 2273.676881] synchronize_net+0x25/0x30
[ 2273.676885] dev_deactivate_many+0x133/0x230
[ 2273.676887] ? preempt_count_add+0xa5/0xc0
[ 2273.676890] __dev_close_many+0x4d/0xc0
[ 2273.676892] ? skb_dequeue+0x40/0x50
[ 2273.676895] dev_close_many+0x5d/0xd0
[ 2273.676898] rollback_registered_many+0xbf/0x4c0
[ 2273.676901] ? raw_notifier_call_chain+0x1a/0x20
[ 2273.676904] ? call_netdevice_notifiers_info+0x23/0x60
[ 2273.676906] ? netdev_master_upper_dev_get+0xe/0x70
[ 2273.676908] rollback_registered+0x1f/0x30
[ 2273.676911] unregister_netdevice_queue+0x47/0xb0
[ 2273.676915] qmimux_unregister_device+0x1f/0x30 [qmi_wwan]
[ 2273.676917] qmi_wwan_disconnect+0x5d/0x90 [qmi_wwan]
...
[ 2273.677001] ---[ end trace 0fcc5f88496b485a ]---
[ 2294.679136] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2294.679140] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P141
[ 2294.679144] rcu: (detected by 0, t=21002 jiffies, g=265857, q=8446)
[ 2294.679148] kworker/1:1 D 0 141 2 0x80000000
In addition the permitted QMAP mux_id value range is extended for
compatibility with ip(8) and the rmnet driver.
Reinhard
[1]: https://portland.source.codeaurora.org/patches/quic/gobi
[2]: https://portland.source.codeaurora.org/quic/qsdk/oss/lklm/gobinet/
====================
Tested-by: Daniele Palmas <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Permit mux_id values up to 254 to be used in qmimux_register_device()
for compatibility with ip(8) and the rmnet driver.
Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <[email protected]>
Signed-off-by: Reinhard Speyerer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Switch qmimux_unregister_device() and qmi_wwan_disconnect() to
use unregister_netdevice_queue() and unregister_netdevice_many()
instead of unregister_netdevice(). This avoids RCU stalls which
have been observed on device disconnect in certain setups otherwise.
Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <[email protected]>
Signed-off-by: Reinhard Speyerer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add proper network device usage statistics for qmimux devices
instead of reporting all-zero values for them.
Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <[email protected]>
Signed-off-by: Reinhard Speyerer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The QMAP code in the qmi_wwan driver is based on the CodeAurora GobiNet
driver which does not process QMAP padding in the RX path correctly.
Add support for QMAP padding to qmimux_rx_fixup() according to the
description of the rmnet driver.
Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <[email protected]>
Signed-off-by: Reinhard Speyerer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"A single bug fix for hpsa.
The user visible consequences aren't clear, but the ioaccel2 raid
acceleration may misfire on the malformed request assuming the payload
is big enough to require chaining (more than 31 sg entries)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hpsa: correct ioaccel2 chaining
|
|
Pull block fixes from Jens Axboe:
- Remove references to old schedulers for the scheduler switching and
blkio controller documentation (Andreas)
- Kill duplicate check for report zone for null_blk (Chaitanya)
- Two bcache fixes (Coly)
- Ensure that mq-deadline is selected if zoned block device is enabled,
as we need that to support them (Damien)
- Fix io_uring memory leak (Eric)
- ps3vram fallout from LBDAF removal (Geert)
- Redundant blk-mq debugfs debugfs_create return check cleanup (Greg)
- Extend NOPLM quirk for ST1000LM024 drives (Hans)
- Remove error path warning that can now trigger after the queue
removal/addition fixes (Ming)
* tag 'for-linus-20190614' of git://git.kernel.dk/linux-block:
block/ps3vram: Use %llu to format sector_t after LBDAF removal
libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
bcache: fix stack corruption by PRECEDING_KEY()
blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests
blkio-controller.txt: Remove references to CFQ
block/switching-sched.txt: Update to blk-mq schedulers
null_blk: remove duplicate check for report zone
blk-mq: no need to check return value of debugfs_create functions
io_uring: fix memory leak of UNIX domain socket inode
block: force select mq-deadline for zoned block devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has two simple but wanted driver fixes for you"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: pca-platform: Fix GPIO lookup code
i2c: acorn: fix i2c warning
|
|
Since commit 177366bf7ceb the %rbp stopped pointing to %rbp of the
previous stack frame. That broke frame pointer based stack unwinding.
This commit is a partial revert of it.
Note that the location of tail_call_cnt is fixed, since the verifier
enforces MAX_BPF_STACK stack size for programs with tail calls.
Fixes: 177366bf7ceb ("bpf: change x86 JITed program stack layout")
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The 5.1 mount system rework changed the smackfsdef mount option to
smackfsdefault. This fixes the regression by making smackfsdef treated
the same way as smackfsdefault.
Also fix the smack_param_specs[] to have "smack" prefixes on all the
names. This isn't visible to a user unless they either:
(a) Try to mount a filesystem that's converted to the internal mount API
and that implements the ->parse_monolithic() context operation - and
only then if they call security_fs_context_parse_param() rather than
security_sb_eat_lsm_opts().
There are no examples of this upstream yet, but nfs will probably want
to do this for nfs2 or nfs3.
(b) Use fsconfig() to configure the filesystem - in which case
security_fs_context_parse_param() will be called.
This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix
on the options, but smack_fs_context_parse_param() does not.
Fixes: c3300aaf95fb ("smack: get rid of match_token()")
Fixes: 2febd254adc4 ("smack: Implement filesystem context security hooks")
Cc: [email protected]
Reported-by: Jose Bollo <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Casey Schaufler <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net
uses RCU to detect it has setup the resources for tx. The assumption
accidentally broke when introducing bulk queue in devmap.
Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
Reported-by: David Ahern <[email protected]>
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
dev_map_free() forgot to free bulk queue when freeing its entries.
Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
dev_map_free() waits for flush_needed bitmap to be empty in order to
ensure all flush operations have completed before freeing its entries.
However the corresponding clear_bit() was called before using the
entries, so the entries could be used after free.
All access to the entries needs to be done before clearing the bit.
It seems commit a5e2da6e9787 ("bpf: netdev is never null in
__dev_map_flush") accidentally changed the clear_bit() and memory access
order.
Note that the problem happens only in __dev_map_flush(), not in
dev_map_flush_old(). dev_map_flush_old() is called only after nulling
out the corresponding netdev_map entry, so dev_map_free() never frees
the entry thus no such race happens there.
Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush")
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.
This issue can be reproduced as follow (it may be covered by compiler
optimization sometime):
/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 206.952402] Mem abort info:
[ 206.952819] ESR = 0x96000006
[ 206.955326] Exception class = DABT (current EL), IL = 32 bits
[ 206.955844] SET = 0, FnV = 0
[ 206.956272] EA = 0, S1PTW = 0
[ 206.956652] Data abort info:
[ 206.957320] ISV = 0, ISS = 0x00000006
[ 206.959271] CM = 0, WnR = 0
[ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[ 206.964953] Internal error: Oops: 96000006 [#1] SMP
[ 206.971122] Dumping ftrace buffer:
[ 206.973677] (ftrace buffer empty)
[ 206.975258] Modules linked in:
[ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[ 206.978955] Hardware name: linux,dummy-virt (DT)
[ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[ 206.980874] lr : ftrace_count_free+0x68/0x80
[ 206.982539] sp : ffff0000182f3ab0
[ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[ 206.983632] x27: ffff000013054b40 x26: 0000000000000001
[ 206.984000] x25: ffff00001385f000 x24: 0000000000000000
[ 206.984394] x23: ffff000013453000 x22: ffff000013054000
[ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[ 206.986575] x19: ffff000013872c30 x18: 0000000000000000
[ 206.987111] x17: 0000000000000000 x16: 0000000000000000
[ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[ 206.987850] x13: 000000000017430e x12: 0000000000000580
[ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[ 206.992557] Call trace:
[ 206.993101] free_ftrace_func_mapper+0x2c/0x118
[ 206.994827] ftrace_count_free+0x68/0x80
[ 206.995238] release_probe+0xfc/0x1d0
[ 206.995555] register_ftrace_function_probe+0x4a8/0x868
[ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180
[ 206.996330] ftrace_dump_callback+0x50/0x70
[ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8
[ 206.997157] ftrace_filter_write+0x44/0x60
[ 206.998971] __vfs_write+0x64/0xf0
[ 206.999285] vfs_write+0x14c/0x2f0
[ 206.999591] ksys_write+0xbc/0x1b0
[ 206.999888] __arm64_sys_write+0x3c/0x58
[ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0
[ 207.000607] el0_svc_handler+0x144/0x1c8
[ 207.000916] el0_svc+0x8/0xc
[ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[ 207.010126] Kernel panic - not syncing: Fatal exception
[ 207.011322] SMP: stopping secondary CPUs
[ 207.013956] Dumping ftrace buffer:
[ 207.014595] (ftrace buffer empty)
[ 207.015632] Kernel Offset: disabled
[ 207.017187] CPU features: 0x002,20006008
[ 207.017985] Memory Limit: none
[ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Li <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
It's possible for livepatch and ftrace to be toggling a module's text
permissions at the same time, resulting in the following panic:
BUG: unable to handle page fault for address: ffffffffc005b1d9
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:apply_relocate_add+0xbe/0x14c
Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
klp_init_object_loaded+0x10f/0x219
? preempt_latency_start+0x21/0x57
klp_enable_patch+0x662/0x809
? virt_to_head_page+0x3a/0x3c
? kfree+0x8c/0x126
patch_init+0x2ed/0x1000 [livepatch_test02]
? 0xffffffffc0060000
do_one_initcall+0x9f/0x1c5
? kmem_cache_alloc_trace+0xc4/0xd4
? do_init_module+0x27/0x210
do_init_module+0x5f/0x210
load_module+0x1c41/0x2290
? fsnotify_path+0x3b/0x42
? strstarts+0x2b/0x2b
? kernel_read+0x58/0x65
__do_sys_finit_module+0x9f/0xc3
? __do_sys_finit_module+0x9f/0xc3
__x64_sys_finit_module+0x1a/0x1c
do_syscall_64+0x52/0x61
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The above panic occurs when loading two modules at the same time with
ftrace enabled, where at least one of the modules is a livepatch module:
CPU0 CPU1
klp_enable_patch()
klp_init_object_loaded()
module_disable_ro()
ftrace_module_enable()
ftrace_arch_code_modify_post_process()
set_all_modules_text_ro()
klp_write_object_relocations()
apply_relocate_add()
*patches read-only code* - BOOM
A similar race exists when toggling ftrace while loading a livepatch
module.
Fix it by ensuring that the livepatch and ftrace code patching
operations -- and their respective permissions changes -- are protected
by the text_mutex.
Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com
Reported-by: Johannes Erdfelt <[email protected]>
Fixes: 444d13ff10fb ("modules: add ro_after_init support")
Acked-by: Jessica Yu <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
Commit 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for
uprobe events") cleaned up the usage of trace_uprobe_create(), and the
function has been no longer used for removing uprobe/uretprobe.
Link: http://lkml.kernel.org/r/[email protected]
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|