Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"A fix for an older bug that has started to show up during testing
(because of an updated test for rename exchange).
It's an in-memory corruption caused by local variable leaking out of
the function scope"
* tag 'for-5.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix log context list corruption after rename exchange operation
|
|
These are the Foxconn-branded variants of the Dell DW5821e modules,
same USB layout as those.
The QMI interface is exposed in USB configuration #1:
P: Vendor=0489 ProdID=e0b4 Rev=03.18
S: Manufacturer=FII
S: Product=T77W968 LTE
S: SerialNumber=0123456789ABCDEF
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I: If#=0x1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
Signed-off-by: Aleksander Morgado <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-11-13
this is a pull request of 9 patches for net/master, hopefully for the v5.4
release cycle.
All nine patches are by Oleksij Rempel and fix locking and use-after-free bugs
in the j1939 stack found by the syzkaller syzbot.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2019-11-13
1) Fix a page memleak on xfrm state destroy.
2) Fix a refcount imbalance if a xfrm_state
gets invaild during async resumption.
From Xiaodong Xu.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
On a system without KVM_COMPAT, we prevent IOCTLs from being issued
by a compat task. Although this prevents most silly things from
happening, it can still confuse a 32bit userspace that is able
to open the kvm device (the qemu test suite seems to be pretty
mad with this behaviour).
Take a more radical approach and return a -ENODEV to the compat
task.
Reported-by: Peter Maydell <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When applying commit 7a5ee6edb42e ("KVM: X86: Fix initialization of MSR
lists"), it forgot to reset the three MSR lists number varialbes to 0
while removing the useless conditionals.
Fixes: 7a5ee6edb42e (KVM: X86: Fix initialization of MSR lists)
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Glibc-2.30 gained gettid() wrapper, selftests fail to compile:
lib/assert.c:58:14: error: static declaration of ‘gettid’ follows non-static declaration
58 | static pid_t gettid(void)
| ^~~~~~
In file included from /usr/include/unistd.h:1170,
from include/test_util.h:18,
from lib/assert.c:10:
/usr/include/bits/unistd_ext.h:34:16: note: previous declaration of ‘gettid’ was here
34 | extern __pid_t gettid (void) __THROW;
| ^~~~~~
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If a huge page is recovered (and becomes no executable) while another
thread is executing it, the resulting contention on mmu_lock can cause
latency spikes. Disabling recovery for PREEMPT_RT kernels fixes this
issue.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The datasheet of V3s (and various other chips) wrote
that TCON0_DCLK_DIV can be >= 1 if only dclk is used,
and must >= 6 if dclk1 or dclk2 is used. As currently
neither dclk1 nor dclk2 is used (no writes to these
bits), let's set minimal division to 1.
If this minimal division is 6, some common dot clock
frequencies can't be produced (e.g. 30MHz will not be
possible and will fallback to 25MHz), which is
obviously not an expected behaviour.
Signed-off-by: Yunhao Tian <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/MN2PR08MB57905AD8A00C08DA219377C989760@MN2PR08MB5790.namprd08.prod.outlook.com/
|
|
rdtgroup_cpus_write() and mkdir_rdt_prepare() call
rdtgroup_kn_lock_live() -> kernfs_to_rdtgroup() to get 'rdtgrp', and
then call the rdt_last_cmd_{clear,puts,...}() functions which will check
if rdtgroup_mutex is held/requires its caller to hold rdtgroup_mutex.
But if 'rdtgrp' returned from kernfs_to_rdtgroup() is NULL,
rdtgroup_mutex is not held and calling rdt_last_cmd_{clear,puts,...}()
will result in a self-incurred, potential lockdep warning.
Remove the rdt_last_cmd_{clear,puts,...}() calls in these two paths.
Just returning error should be sufficient to report to the user that the
entry doesn't exist any more.
[ bp: Massage. ]
Fixes: 94457b36e8a5 ("x86/intel_rdt: Add diagnostics when writing the cpus file")
Fixes: cfd0f34e4cd5 ("x86/intel_rdt: Add diagnostics when making directories")
Signed-off-by: Xiaochen Shen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When the CC variable contains quotes, e.g. when using
ccache (make CC="ccache <compiler>"), this script always
fails, so CONFIG_RELR is never enabled, even when the
toolchain supports this feature. Removing the /dev/null
redirect and invoking the script manually shows the issue:
$ CC='/usr/bin/ccache clang' ./scripts/tools-support-relr.sh
./scripts/tools-support-relr.sh: 7: ./scripts/tools-support-relr.sh: /usr/bin/ccache clang: not found
Fix this by un-quoting the variables.
Before:
$ make ARCH=arm64 CC='/usr/bin/ccache clang' LD=ld.lld \
NM=llvm-nm OBJCOPY=llvm-objcopy defconfig
$ grep RELR .config
CONFIG_ARCH_HAS_RELR=y
With this change:
$ make ARCH=arm64 CC='/usr/bin/ccache clang' LD=ld.lld \
NM=llvm-nm OBJCOPY=llvm-objcopy defconfig
$ grep RELR .config
CONFIG_TOOLS_SUPPORT_RELR=y
CONFIG_ARCH_HAS_RELR=y
CONFIG_RELR=y
Fixes: 5cf896fb6be3 ("arm64: Add support for relocating the kernel with RELR relocations")
Reported-by: Dmitry Golovin <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Masahiro Yamada <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/769
Cc: Peter Collingbourne <[email protected]>
Signed-off-by: Ilie Halip <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
|
|
If the nullity check for `substream->runtime` is outside of the lock
region, it is possible to have a null runtime in the critical section
if snd_pcm_detach_substream is called right before the lock.
Signed-off-by: paulhsia <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
While output urb's snd_complete_urb() is executing, calling
prepare_outbound_urb() may cause endpoint stopped before
prepare_outbound_urb() returns and result in next urb submitted
to stopped endpoint. usb-audio driver cannot re-use it afterwards as
the urb is still hold by usb stack.
This change checks EP_FLAG_RUNNING flag after prepare_outbound_urb() again
to let snd_complete_urb() know the endpoint already stopped and does not
submit next urb. Below kind of error will be fixed:
[ 213.153103] usb 1-2: timeout: still 1 active urbs on EP #1
[ 213.164121] usb 1-2: cannot submit urb 0, error -16: unknown error
Signed-off-by: Henry Lin <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
j1939_session_destroy() and __j1939_priv_release() should be called only
if session, ecu or socket are not linked or used by any one else. If at
least one of these resources is linked, then the reference counting is
broken somewhere.
This warning will be triggered before KASAN will do, and will make it
easier to debug initial issue. This works on platforms without KASAN
support.
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
j1939_can_recv() can be called in parallel with socket release. In this
case sk_release and sk_destruct can be done earlier than
j1939_can_recv() is processed.
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
hrtimer_try_to_cancel() instead of hrtimer_cancel()
This part of the code protected by lock used in the hrtimer as well.
Using hrtimer_cancel() will trigger dead lock.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
We link the socket to the session to be able provide socket specific
notifications. For example messages over error queue.
We need to keep the socket held, while we have a reference to it.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
only once
j1939_session_cancel() was modifying session->state without protecting
it by locks and without checking actual state of the session.
This patch moves j1939_tp_set_rxtimeout() into j1939_session_cancel()
and adds the missing locking.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
j1939_sk_sendmsg()
j1939_sk_sendmsg() should be protected by lock_sock() to avoid race with
j1939_sk_bind() and j1939_sk_release().
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
This patch avoids a NULL pointer deref crash if ndev->ml_priv is NULL.
Reported-by: [email protected]
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
This patch delays the j1939_priv_put() until the socket is destroyed via
the sk_destruct callback, to avoid use-after-free problems.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
In j1939 we need our own struct sock::sk_destruct callback. Export the
generic af_can can_sock_destruct() that allows us to chain-call it.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Oleksij Rempel <[email protected]>
|
|
It looks like a "static inline" has been missed in front
of the empty definition of perf_cgroup_switch() under
certain configurations.
Fixes the following sparse warning:
kernel/events/core.c:1035:1: warning: symbol 'perf_cgroup_switch' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit:
313ccb9615948 ("perf: Allocate context task_ctx_data for child event")
makes the inherit path skip over the current event in case of task_ctx_data
allocation failure. This, however, is inconsistent with allocation failures
in perf_event_alloc(), which would abort the fork.
Correct this by returning an error code on task_ctx_data allocation
failure and failing the fork in that case.
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit
ab43762ef0109 ("perf: Allow normal events to output AUX data")
added 'aux_output' bit to the attribute structure, which relies on AUX
events and grouping, neither of which is supported for the kernel events.
This notwithstanding, attempts have been made to use it in the kernel
code, suggesting the necessity of an explicit hard -EINVAL.
Fix this by rejecting attributes with aux_output set for kernel events.
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A comment is in a wrong place in perf_event_create_kernel_counter().
Fix that.
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit
f733c6b508bc ("perf/core: Fix inheritance of aux_output groups")
adds a NULL pointer dereference in case inherit_group() races with
perf_release(), which causes the below crash:
> BUG: kernel NULL pointer dereference, address: 000000000000010b
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 3b203b067 P4D 3b203b067 PUD 3b2040067 PMD 0
> Oops: 0000 [#1] SMP KASAN
> CPU: 0 PID: 315 Comm: exclusive-group Tainted: G B 5.4.0-rc3-00181-g72e1839403cb-dirty #878
> RIP: 0010:perf_get_aux_event+0x86/0x270
> Call Trace:
> ? __perf_read_group_add+0x3b0/0x3b0
> ? __kasan_check_write+0x14/0x20
> ? __perf_event_init_context+0x154/0x170
> inherit_task_group.isra.0.part.0+0x14b/0x170
> perf_event_init_task+0x296/0x4b0
Fix this by skipping over events that are getting closed, in the
inheritance path.
Signed-off-by: Alexander Shishkin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Fixes: f733c6b508bc ("perf/core: Fix inheritance of aux_output groups")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
While discussing uncore event scheduling, I noticed we do not in fact
seem to dis-allow making uncore-cgroup events. Such events make no
sense what so ever because the cgroup is a CPU local state where
uncore counts across a number of CPUs.
Disallow them.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The setup_dpio() function tries to allocate a number of channels equal
to the number of CPUs online. When there are not enough DPCON objects
already probed, the function will return EPROBE_DEFER. When this
happens, the already allocated channels are not freed. This results in
the incapacity of properly probing the next time around.
Fix this by freeing the channels on the error path.
Fixes: d7f5a9d89a55 ("dpaa2-eth: defer probe on object allocate")
Signed-off-by: Ioana Ciornei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The device md->input is used after it is released. Setting the device
data to NULL is unnecessary as the device is never used again. Instead,
md->input should be assigned NULL to avoid accessing the freed memory
accidently. Besides, checking md->si against NULL is superfluous as it
points to a variable address, which cannot be NULL.
Signed-off-by: Pan Bian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
The driver for F54 just polls the status and doesn't even have a IRQ
handler registered. Make sure to disable all F54 IRQs, so we don't crash
the kernel on a nonexistent handler.
Signed-off-by: Lucas Stach <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
This went into staging in rc7. It turns out that was a mistake, and
apparently it wasn't even supposed to go there at all, but be introduced
as a regular filesystem.
We don't try to sneak in whole new filesystems this late in the rc, just
delete the whole thing, and it can be re-introduced as a proper patch
with proper acks from actual filesystem people instead of some odd
late-rc staging back-door.
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hans de Goede <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull kvm fixes from Paolo Bonzini:
"Fix unwinding of KVM_CREATE_VM failure, VT-d posted interrupts,
DAX/ZONE_DEVICE, and module unload/reload"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
KVM: VMX: Introduce pi_is_pir_empty() helper
KVM: VMX: Do not change PID.NDST when loading a blocked vCPU
KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts
KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON
KVM: X86: Fix initialization of MSR lists
KVM: fix placement of refcount initialization
KVM: Fix NULL-ptr deref after kvm_create_vm fails
|
|
If an SMC socket is immediately terminated after a non-blocking connect()
has been called, a memory leak is possible.
Due to the sock_hold move in
commit 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
an extra sock_put() is needed in smc_connect_work(), if the internal
TCP socket is aborted and cancels the sk_stream_wait_connect() of the
connect worker.
Reported-by: [email protected]
Fixes: 301428ea3708 ("net/smc: fix refcounting for non-blocking connect()")
Signed-off-by: Ursula Braun <[email protected]>
Signed-off-by: Karsten Graul <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
drm-intel-fixes
gvt-fixes-2019-11-12
- Fix dmabuf reference drop (Pan)
Signed-off-by: Rodrigo Vivi <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Since CNP it's possible for rawclk to have two different values, 19.2
and 24 MHz. If the value indicated by SFUSE_STRAP register is different
from the power on default for PCH_RAWCLK_FREQ, we'll end up having a
mismatch between the rawclk hardware and software states after
suspend/resume. On previous platforms this used to work by accident,
because the power on defaults worked just fine.
Update the rawclk also on resume. The natural place to do this would be
intel_modeset_init_hw(), however VLV/CHV need it done before
intel_power_domains_init_hw(). Thus put it there even if it feels
slightly out of place.
v2: Call intel_update_rawclck() in intel_power_domains_init_hw() for all
platforms (Ville).
Reported-by: Shawn Lee <[email protected]>
Cc: Shawn Lee <[email protected]>
Cc: Ville Syrjala <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Tested-by: Shawn Lee <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 59ed05ccdded5eb18ce012eff3d01798ac8535fa)
Cc: <[email protected]> # v4.15+
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner:
"The performance deterioration departement is not proud at all of
presenting the seventh installment of speculation mitigations and
hardware misfeature workarounds:
1) TSX Async Abort (TAA) - 'The Annoying Affair'
TAA is a hardware vulnerability that allows unprivileged
speculative access to data which is available in various CPU
internal buffers by using asynchronous aborts within an Intel TSX
transactional region.
The mitigation depends on a microcode update providing a new MSR
which allows to disable TSX in the CPU. CPUs which have no
microcode update can be mitigated by disabling TSX in the BIOS if
the BIOS provides a tunable.
Newer CPUs will have a bit set which indicates that the CPU is not
vulnerable, but the MSR to disable TSX will be available
nevertheless as it is an architected MSR. That means the kernel
provides the ability to disable TSX on the kernel command line,
which is useful as TSX is a truly useful mechanism to accelerate
side channel attacks of all sorts.
2) iITLB Multihit (NX) - 'No eXcuses'
iTLB Multihit is an erratum where some Intel processors may incur
a machine check error, possibly resulting in an unrecoverable CPU
lockup, when an instruction fetch hits multiple entries in the
instruction TLB. This can occur when the page size is changed
along with either the physical address or cache type. A malicious
guest running on a virtualized system can exploit this erratum to
perform a denial of service attack.
The workaround is that KVM marks huge pages in the extended page
tables as not executable (NX). If the guest attempts to execute in
such a page, the page is broken down into 4k pages which are
marked executable. The workaround comes with a mechanism to
recover these shattered huge pages over time.
Both issues come with full documentation in the hardware
vulnerabilities section of the Linux kernel user's and administrator's
guide.
Thanks to all patch authors and reviewers who had the extraordinary
priviledge to be exposed to this nuisance.
Special thanks to Borislav Petkov for polishing the final TAA patch
set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and
providing also the backports to stable kernels for those!"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
Documentation: Add ITLB_MULTIHIT documentation
kvm: x86: mmu: Recovery of shattered NX large pages
kvm: Add helper function for creating VM worker threads
kvm: mmu: ITLB_MULTIHIT mitigation
cpu/speculation: Uninline and export CPU mitigations helpers
x86/cpu: Add Tremont to the cpu vulnerability whitelist
x86/bugs: Add ITLB_MULTIHIT bug infrastructure
x86/tsx: Add config options to set tsx=on|off|auto
x86/speculation/taa: Add documentation for TSX Async Abort
x86/tsx: Add "auto" option to the tsx= cmdline parameter
kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
x86/speculation/taa: Add sysfs reporting for TSX Async Abort
x86/speculation/taa: Add mitigation for TSX Async Abort
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
x86/cpu: Add a helper function x86_read_arch_cap_msr()
x86/msr: Add the IA32_TSX_CTRL MSR
|
|
Some Coffee Lake platforms have a skewed HPET timer once the SoCs entered
PC10, which in consequence marks TSC as unstable because HPET is used as
watchdog clocksource for TSC.
Harry Pan tried to work around it in the clocksource watchdog code [1]
thereby creating a circular dependency between HPET and TSC. This also
ignores the fact, that HPET is not only unsuitable as watchdog clocksource
on these systems, it becomes unusable in general.
Disable HPET on affected platforms.
Suggested-by: Feng Tang <[email protected]>
Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Link: https://lore.kernel.org/lkml/[email protected]/ [1]
Link: https://lkml.kernel.org/r/[email protected]
|
|
__bio_try_merge_page() may merge a page to bio without bio_full() check
and cause bi_size overflow.
The overflow typically ends up with sd_init_command() warning on zero
segment request with call trace like this:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:scsi_init_io+0x156/0x180
RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
Call Trace:
sd_init_command+0x326/0xb40 [sd_mod]
scsi_queue_rq+0x502/0xaa0
? blk_mq_get_driver_tag+0xe7/0x120
blk_mq_dispatch_rq_list+0x256/0x5a0
? elv_rb_del+0x24/0x30
? deadline_remove_request+0x7b/0xc0
blk_mq_do_dispatch_sched+0xa3/0x140
blk_mq_sched_dispatch_requests+0xfb/0x170
__blk_mq_run_hw_queue+0x81/0x130
blk_mq_run_work_fn+0x1b/0x20
process_one_work+0x179/0x390
worker_thread+0x4f/0x3e0
kthread+0x105/0x140
? max_active_store+0x80/0x80
? kthread_bind+0x20/0x20
ret_from_fork+0x35/0x40
---[ end trace f9036abf5af4a4d3 ]---
blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
XFS (sdd1): writeback error on sector 2875552
__bio_try_merge_page() should check the overflow before actually doing
merge.
Fixes: 07173c3ec276c ("block: enable multipage bvecs")
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Jun'ichi Nomura <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis. For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup(). But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
to avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.
This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().
Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()). But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.
[*] http://lkml.kernel.org/r/[email protected]
Reported-by: Adam Borowski <[email protected]>
Analyzed-by: David Hildenbrand <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: [email protected]
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Streamline the PID.PIR check and change its call sites to use
the newly added helper.
Suggested-by: Liran Alon <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
(wakeup_handler()) is responsible to kick (unblock) any vCPU on that
linked list that now has pending posted interrupts.
While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
will cause vmx_vcpu_pi_put() to set PID.SN. If later the vCPU will be
scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
PID.SN but will also *overwrite PID.NDST to this different pCPU*.
Instead of keeping it with original pCPU which vCPU had entered block
phase on.
This results in an issue because when a posted interrupt is delivered, as
the wakeup_handler() will be executed and fail to find blocked vCPU on
its per pCPU linked list of all vCPUs that are blocked on this pCPU.
Which is due to the vCPU being placed on a *different* per pCPU
linked list i.e. the original pCPU in which it entered block phase.
The regression is introduced by commit c112b5f50232 ("KVM: x86:
Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
avoiding changing PID.NDST when loading a blocked vCPU.
Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Tested-by: Nathan Ni <[email protected]>
Co-developed-by: Liran Alon <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Commit 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
introduced vmx_dy_apicv_has_pending_interrupt() in order to determine
if a vCPU have a pending posted interrupt. This routine is used by
kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule
on pCPU instead of a vCPU doing busy loop.
vmx_dy_apicv_has_pending_interrupt() determines if a
vCPU has a pending posted interrupt solely based on PID.ON. However,
when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause
raised posted interrupts to only set bit in PID.PIR without setting
PID.ON (and without sending notification vector), as depicted in VT-d
manual section 5.2.3 "Interrupt-Posting Hardware Operation".
Therefore, checking PID.ON is insufficient to determine if a vCPU has
pending posted interrupts and instead we should also check if there is
some bit set on PID.PIR if PID.SN=1.
Fixes: 17e433b54393 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
Reviewed-by: Jagannathan Raman <[email protected]>
Co-developed-by: Liran Alon <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The Outstanding Notification (ON) bit is part of the Posted Interrupt
Descriptor (PID) as opposed to the Posted Interrupts Register (PIR).
The latter is a bitmap for pending vectors.
Reviewed-by: Joao Martins <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The three MSR lists(msrs_to_save[], emulated_msrs[] and
msr_based_features[]) are global arrays of kvm.ko, which are
adjusted (copy supported MSRs forward to override the unsupported MSRs)
when insmod kvm-{intel,amd}.ko, but it doesn't reset these three arrays
to their initial value when rmmod kvm-{intel,amd}.ko. Thus, at the next
installation, kvm-{intel,amd}.ko will do operations on the modified
arrays with some MSRs lost and some MSRs duplicated.
So define three constant arrays to hold the initial MSR lists and
initialize msrs_to_save[], emulated_msrs[] and msr_based_features[]
based on the constant arrays.
Cc: [email protected]
Reviewed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Chenyi Qiang <[email protected]>
[Remove now useless conditionals. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
An ESP packet could be decrypted in async mode if the input handler for
this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
reference in skb is held. Later xfrm_input() will be invoked again to
resume the processing.
If the transform state is still valid it would continue to release the
device reference and there won't be a problem; however if the transform
state is not valid when async resumption happens, the packet will be
dropped while the device reference is still being held.
When the device is deleted for some reason and the reference to this
device is not properly released, the kernel will keep logging like:
unregister_netdevice: waiting for ppp2 to become free. Usage count = 1
The issue is observed when running IPsec traffic over a PPPoE device based
on a bridge interface. By terminating the PPPoE connection on the server
end for multiple times, the PPPoE device on the client side will eventually
get stuck on the above warning message.
This patch will check the async mode first and continue to release device
reference in async resumption, before it is dropped due to invalid state.
v2: Do not assign address family from outer_mode in the transform if the
state is invalid
v3: Release device reference in the error path instead of jumping to resume
Fixes: 4ce3dbe397d7b ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
Signed-off-by: Xiaodong Xu <[email protected]>
Reported-by: Bo Chen <[email protected]>
Tested-by: Bo Chen <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
Currently we make sequence == 0 be the same as sequence == 1, but that's
not super useful if the intent is really to have a timeout that's just
a pure timeout.
If the user passes in sqe->off == 0, then don't apply any sequence logic
to the request, let it purely be driven by the timeout specified.
Reported-by: 李通洲 <[email protected]>
Reviewed-by: 李通洲 <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
A cast to 'time_t' was accidentally left in place during the
conversion of __do_adjtimex() to 64-bit timestamps, so the
resulting value is incorrectly truncated.
Remove the cast so the 64-bit time gets propagated correctly.
Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Fix coccinelle warning:
./drivers/net/phy/mdio_bus.c:67:5-12: ERROR: PTR_ERR applied after initialization to constant on line 62
./drivers/net/phy/mdio_bus.c:68:5-12: ERROR: PTR_ERR applied after initialization to constant on line 62
Fix this by using IS_ERR before PTR_ERR
Reported-by: Hulk Robot <[email protected]>
Fixes: 71dd6c0dff51 ("net: phy: add support for reset-controller")
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
I2C communication errors (-EREMOTEIO) during the IRQ handler of nxp-nci
result in a NULL pointer dereference at the moment:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 355 Comm: irq/137-nxp-nci Not tainted 5.4.0-rc6 #1
RIP: 0010:skb_queue_tail+0x25/0x50
Call Trace:
nci_recv_frame+0x36/0x90 [nci]
nxp_nci_i2c_irq_thread_fn+0xd1/0x285 [nxp_nci_i2c]
? preempt_count_add+0x68/0xa0
? irq_forced_thread_fn+0x80/0x80
irq_thread_fn+0x20/0x60
irq_thread+0xee/0x180
? wake_threads_waitq+0x30/0x30
kthread+0xfb/0x130
? irq_thread_check_affinity+0xd0/0xd0
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x40
Afterward the kernel must be rebooted to work properly again.
This happens because it attempts to call nci_recv_frame() with skb == NULL.
However, unlike nxp_nci_fw_recv_frame(), nci_recv_frame() does not have any
NULL checks for skb, causing the NULL pointer dereference.
Change the code to call only nxp_nci_fw_recv_frame() in case of an error.
Make sure to log it so it is obvious that a communication error occurred.
The error above then becomes:
nxp-nci_i2c i2c-NXP1001:00: NFC: Read failed with error -121
nci: __nci_request: wait_for_completion_interruptible_timeout failed 0
nxp-nci_i2c i2c-NXP1001:00: NFC: Read failed with error -121
Fixes: 6be88670fc59 ("NFC: nxp-nci_i2c: Add I2C support to NXP NCI driver")
Signed-off-by: Stephan Gerhold <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|