|
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall entry work. This confuses audit and ptrace. For example:
$ ./tools/testing/selftests/x86/syscall_arg_fault_32
...
strace: pid 264258: entering, ptrace_syscall_info.op == 2
...
This is a minimal fix intended for ease of backporting. A more
complete cleanup is coming.
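The imbalance is about the pairing below (a conceptual sketch using the generic entry helpers; this is not the actual control flow of the fix):
  /* audit/ptrace only make sense when these stay paired: */
  nr = syscall_enter_from_user_mode(regs, nr);  /* entry work */
  /* ... invoke the syscall ... */
  syscall_exit_to_user_mode(regs);              /* exit work assumes the entry work ran */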
Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/8c82296ddf803b91f8d1e5eac89e5803ba54ab0e.1614884673.git.luto@kernel.org
|
|
The ORC unwinder attempts to fall back to frame pointers when ORC data
is missing for a given instruction. It sets state->error, but then
tries to keep going as a best-effort type of thing. That may result in
further warnings if the unwinder gets lost.
Until we have some way to register generated code with the unwinder,
missing ORC will be expected, and occasionally going off the rails will
also be expected. So don't warn about it.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Ivan Babrou <[email protected]>
Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com
|
|
KASAN reserves "redzone" areas between stack frames in order to detect
stack overruns. A read or write to such an area triggers a KASAN
"stack-out-of-bounds" BUG.
Normally, the ORC unwinder stays in-bounds and doesn't access the
redzone. But sometimes it can't find ORC metadata for a given
instruction. This can happen for code which is missing ORC metadata, or
for generated code. In such cases, the unwinder attempts to fall back
to frame pointers, as a best-effort type thing.
This fallback often works, but when it doesn't, the unwinder can get
confused and go off into the weeds into the KASAN redzone, triggering
the aforementioned KASAN BUG.
But in this case, the unwinder's confusion is actually harmless and
working as designed. It already has checks in place to prevent
off-stack accesses, but those checks get short-circuited by the KASAN
BUG. And a BUG is a lot more disruptive than a harmless unwinder
warning.
Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack
accesses. This finishes the job started by commit 881125bfe65b
("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only
partially fixed the issue.
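The pattern, applied to the unwinder's stack accesses, looks roughly like this (a sketch shown on one representative helper; names follow the existing unwinder code):
  static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
                              unsigned long *val)
  {
          if (!stack_access_ok(state, addr, sizeof(long)))
                  return false;

          /* KASAN checking suppressed: a stray read into a redzone now
           * just makes the unwinder bail instead of triggering a BUG. */
          *val = READ_ONCE_NOCHECK(*(unsigned long *)addr);
          return true;
  }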
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reported-by: Ivan Babrou <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Ivan Babrou <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com
|
|
Supplying a PID/TID for large PEBS requires flushing the PEBS buffer
on a context switch.
For normal LBRs, a context switch can flip the address space, and LBR
entries are not tagged with an identifier, so the LBR has to be wiped,
even for per-CPU events.
For LBR call stack, the stack has to be saved/restored during a context
switch.
Set PERF_ATTACH_SCHED_CB for events using large PEBS or LBR.
Fixes: 9c964efa4330 ("perf/x86/intel: Drain the PEBS buffer during context switches")
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Sometimes the PMU internal buffers have to be flushed for per-CPU events
during a context switch, e.g., for large PEBS. Otherwise, the perf tool may
report samples in locations that do not belong to the process in which the
samples are processed, because PEBS does not tag samples with PID/TID.
The current code only flushes the buffers for per-task events; it doesn't
check per-CPU events.
Add a new event state flag, PERF_ATTACH_SCHED_CB, to indicate that the
PMU internal buffers have to be flushed for this event during a context
switch.
Add sched_cb_entry and perf_sched_cb_usages back to track the PMUs/cpuctxs
that need to be flushed.
This patch only needs to invoke sched_task() for per-CPU events;
per-task events are already handled in perf_event_context_sched_in/out.
Fixes: 9c964efa4330 ("perf/x86/intel: Drain the PEBS buffer during context switches")
Reported-by: Gabriel Marin <[email protected]>
Originally-by: Namhyung Kim <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Provided the target address of an R_X86_64_PC32 relocation is aligned,
the low two bits should be invariant between the relative and absolute
value.
It turns out the address is not aligned and things go sideways; ensure we
transfer the bits in the absolute form when fixing up the key address.
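A rough sketch of the idea (simplified; the struct/macro usage here is an assumption, not the actual patch):
  /* site->key is a PC-relative s32 whose low two bits double as flags.
   * Form the relative value from the flag-free address, then carry the
   * flag bits over from the absolute form, so an unaligned target can
   * no longer corrupt them. */
  static void fixup_site_key(struct static_call_site *site, unsigned long key)
  {
          unsigned long flags = key & STATIC_CALL_SITE_FLAGS;

          site->key = ((long)(key & ~STATIC_CALL_SITE_FLAGS) -
                       (long)&site->key) | flags;
  }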
Fixes: 73f44fe19d35 ("static_call: Allow module use without exposing static_call_key")
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The function sync_runqueues_membarrier_state() should copy the
membarrier state from the @mm received as parameter to each runqueue
currently running tasks using that mm.
However, the use of smp_call_function_many() skips the current runqueue,
which is unintended. Replace it with a call to on_each_cpu_mask().
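The change amounts to something like the following (the IPI helper and mask names are assumptions):
  /* Before: smp_call_function_many() deliberately skips the calling CPU. */
  smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);

  /* After: on_each_cpu_mask() also runs the function on the local CPU. */
  on_each_cpu_mask(tmpmask, ipi_sync_rq_state, mm, true);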
Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Reported-by: Nadav Amit <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected] # 5.4.x+
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that we have set_affinity_pending::stop_pending to indicate if a
stopper is in progress, and we have the guarantee that if that stopper
exists, it will (eventually) complete our @pending, we can simplify the
refcount scheme by no longer counting the stopper thread.
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Consider:
sched_setaffinity(p, X); sched_setaffinity(p, Y);
Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.
This causes stopper list corruption.
Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.
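Conceptually the guard looks like this (a simplified sketch, not the exact patch):
  /* Only queue a stopper if one isn't already in flight for this pending. */
  if (!pending->stop_pending) {
          pending->stop_pending = true;
          stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
                              &pending->arg, &pending->stop_work);
  }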
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When the purpose of migration_cpu_stop() is to migrate the task to
'any' valid CPU, don't migrate the task when it's already running on a
valid CPU.
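In other words, roughly (a simplified sketch, not the exact patch):
  if (dest_cpu < 0) {
          /* 'any' CPU requested: if the task already runs somewhere
           * allowed, there is nothing to do. */
          if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
                  goto out;
          dest_cpu = cpumask_any_distribute(&p->cpus_mask);
  }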
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The SCA_MIGRATE_ENABLE and task_running() cases are almost identical,
collapse them to avoid further duplication.
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When affine_move_task() issues a migration_cpu_stop(), the purpose of
that function is to complete that @pending, not any random other
p->migration_pending that might have gotten installed since.
This realization much simplifies migration_cpu_stop() and allows
further necessary steps to fix all this as it provides the guarantee
that @pending's stopper will complete @pending (and not some random
other @pending).
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When affine_move_task(p) is called on a running task @p, which is not
otherwise already changing affinity, we'll first set
p->migration_pending and then do:
stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
This then gets us to migration_cpu_stop() running on the CPU that was
previously running our victim task @p.
If we find that our task is no longer on that runqueue (this can
happen because of a concurrent migration due to load-balance etc.),
then we'll end up at the:
} else if (dest_cpu < 0 || pending) {
branch. Which we'll take because we set pending earlier. Here we first
check if the task @p has already satisfied the affinity constraints,
if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
onto the CPU that is now hosting our task @p:
stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
&pending->arg, &pending->stop_work);
Except, we've never initialized pending->arg, which will be all 0s.
This then results in running migration_cpu_stop() on the next CPU with
arg->p == NULL, which gives the by now obvious result of fireworks.
The cure is to change affine_move_task() to always use pending->arg,
furthermore we can use the exact same pattern as the
SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
completion anyway, no point in adding yet another completion in
stop_one_cpu().
This then gives a clear distinction between the two
migration_cpu_stop() use cases:
- sched_exec() / migrate_task_to() : arg->pending == NULL
- affine_move_task() : arg->pending != NULL;
And we can have it ignore p->migration_pending when !arg->pending. Any
stop work from sched_exec() / migrate_task_to() is in addition to stop
works from affine_move_task(), which will be sufficient to issue the
completion.
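A sketch of the resulting pattern (simplified; the field names are assumptions):
  /* Always hand the stopper the fully initialized pending->arg and wait
   * on pending->done, as the SCA_MIGRATE_ENABLE path already does. */
  pending->arg = (struct migration_arg) {
          .task     = p,
          .dest_cpu = dest_cpu,
          .pending  = pending,
  };
  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
                      &pending->arg, &pending->stop_work);
  wait_for_completion(&pending->done);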
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The mute and mic-mute LEDs on HP ZBook Studio G5 are controlled via
GPIO bits 0x10 and 0x20, respectively, and we need the extra setup for
those.
As similar code is already present for other HP models but with
different GPIO pins, this patch factors out the common helper code and
applies those GPIO values for each model.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211893
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
When walking the page tables at a given level, and if the start
address for the range isn't aligned for that level, we propagate
the misalignment on each iteration at that level.
This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.
Properly aligning the address before the next iteration addresses
this issue.
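The fix boils down to snapping the walk address back to a level-sized boundary before stepping to the next entry, roughly (a sketch; the helper name is an assumption):
  /* Do not propagate a misaligned start address to later iterations. */
  data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
  data->addr += kvm_granule_size(level);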
Cc: [email protected]
Reported-by: Howard Zhang <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Jia He <[email protected]>
Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
It looks like we have broken firmware out there that wrongly advertises
a GICv2 compatibility interface, despite the CPUs not being able to deal
with it.
To work around this, check that the CPU initialising KVM is actually able
to switch to MMIO instead of system registers, and use that as a
precondition to enable GICv2 compatibility in KVM.
Note that the detection happens on a single CPU. If the firmware is
lying *and* the CPUs are asymmetric, all hope is lost anyway.
Reported-by: Shameerali Kolothum Thodi <[email protected]>
Tested-by: Shameer Kolothum <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
As we are about to report a bit more information to the rest of
the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more
explicit __vgic_v3_get_gic_config().
No functional change.
Tested-by: Shameer Kolothum <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When running under a nesting hypervisor, it isn't guaranteed that
the virtual HW will include a PMU. In which case, let's not try
to access the PMU registers in the world switch, as that'd be
deadly.
Reported-by: Andre Przywara <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Alexandru Elisei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
We currently find out about the presence of a HW PMU (or the handling
of that PMU by perf, which amounts to the same thing) in a fairly
roundabout way, by checking the number of counters available to perf.
That's good enough for now, but we will soon need to find out about
that on paths where perf is out of reach (in the world switch).
Instead, let's turn kvm_arm_support_pmu_v3() into a static key.
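The pattern is the usual static-key one, along these lines (a sketch; the key name is an assumption based on the description):
  DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);

  static inline bool kvm_arm_support_pmu_v3(void)
  {
          /* Cheap enough to be used from the world switch. */
          return static_branch_likely(&kvm_arm_pmu_available);
  }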
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Alexandru Elisei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When panicking from the nVHE hyp and restoring the host context, x29 is
expected to hold a pointer to the host context. This wasn't being done,
so fix it to make sure there's a valid pointer to the host context being
used.
Rather than passing a boolean indicating whether or not the host context
should be restored, instead pass the pointer to the host context. NULL
is passed to indicate that no context should be restored.
Fixes: a2e102e20fd6 ("KVM: arm64: nVHE: Handle hyp panics")
Cc: [email protected]
Signed-off-by: Andrew Scull <[email protected]>
[maz: partial rewrite to fit 5.12-rc1]
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest
context") tracks the currently running vCPU, clearing the pointer to
NULL on exit from a guest.
Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the
kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS
code to go off into the weeds when it saves the DISR assuming that the
CPU context is embedded in a struct vCPU.
Leave x1 alone and use x3 as a temporary register instead when clearing
the vCPU on the guest exit path.
Cc: Marc Zyngier <[email protected]>
Cc: Andrew Scull <[email protected]>
Cc: <[email protected]>
Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The nVHE KVM hyp drains and disables the SPE buffer, before
entering the guest, as the EL1&0 translation regime
is going to be loaded with that of the guest.
But this operation is performed way too late, because:
- The owning translation regime of the SPE buffer
is transferred to EL2 (MDCR_EL2_E2PB == 0).
- The guest Stage 1 is loaded.
Thus the flush could use the host EL1 virtual address, but walk the
EL2 translations instead of the host EL1 ones when writing out any
cached data.
Fix this by moving the SPE buffer handling early enough.
The restore path is doing the right thing.
Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow")
Cc: [email protected]
Cc: Christoffer Dall <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Reviewed-by: Alexandru Elisei <[email protected]>
Signed-off-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Sparse warnings removed:
warning: Using plain integer as NULL pointer
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Message-Id: <20210305180816.GA488770@LEGION>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Pull rdma fixes from Jason Gunthorpe:
"Nothing special here, though Bob's regression fixes for rxe would have
made it before the rc cycle had there not been such strong winter
weather!
- Fix corner cases in the rxe reference counting cleanup that are
causing regressions in blktests for SRP
- Two kdoc fixes so W=1 is clean
- Missing error return in error unwind for mlx5
- Wrong lock type nesting in IB CM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
RDMA/rxe: Fix missed IB reference counting in loopback
RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
RDMA/mlx5: Set correct kernel-doc identifier
IB/mlx5: Add missing error code
RDMA/rxe: Fix missing kconfig dependency on CRYPTO
RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugins fixes from Kees Cook:
"Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have
been reported a couple times now by static analyzers, so best to get
them fixed to reduce the noise. :)
- Fix coding style issues (Jason Yan)"
* tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: latent_entropy: remove unneeded semicolon
gcc-plugins: structleak: remove unneeded variable 'ret'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Rate-limit ECC warnings (Dmitry Osipenko)
- Fix error path check for NULL (Tetsuo Handa)
* tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Rate-limit "uncorrectable error in header" message
pstore: Fix warning in pstore_kill_sb()
|
|
netif_device_attach() will unpause the queues so we can't call
it before __alx_open(). This went undetected until
commit b0999223f224 ("alx: add ability to allocate and free
alx_napi structures"), but now, if the stack tries to xmit immediately
on resume before __alx_open(), we'll crash on the NAPI being null:
BUG: kernel NULL pointer dereference, address: 0000000000000198
CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0
0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
FS: 0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
i915 0000:00:02.0: vblank wait timed out on crtc 0
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
Call Trace:
dev_hard_start_xmit+0xc7/0x1e0
sch_direct_xmit+0x10f/0x310
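The resume path therefore has to finish __alx_open() before re-attaching the device, roughly (a simplified sketch, not the actual patch):
  static int alx_resume(struct device *dev)
  {
          struct alx_priv *alx = dev_get_drvdata(dev);
          int err = 0;

          rtnl_lock();
          if (netif_running(alx->dev)) {
                  err = __alx_open(alx, true);    /* allocate NAPI first */
                  if (err)
                          goto unlock;
          }
          netif_device_attach(alx->dev);          /* only now allow xmit */
  unlock:
          rtnl_unlock();
          return err;
  }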
Cc: <[email protected]> # 4.9+
Fixes: bc2bebe8de8e ("alx: remove WoL support")
Reported-by: Zbynek Michl <[email protected]>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
Signed-off-by: Jakub Kicinski <[email protected]>
Tested-by: Zbynek Michl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Trim all 4 bytes of the received FCS, not just 2 of them. Leaving 2
bytes of the FCS on the frame breaks DSA tailing tag drivers.
Fixes: a8db76d40e4d ("lan743x: boost performance on cpu archs w/o dma cache snooping")
Signed-off-by: George McCollister <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Fix DM verity target's optional Forward Error Correction (FEC) for
Reed-Solomon roots that are unaligned to block size"
* tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm verity: fix FEC for RS roots unaligned to block size
dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size
|
|
When using jumbo packets and overrunning the rx queue with NAPI enabled,
the following sequence is observed in gfar_add_rx_frag:
| lstatus | | skb |
t | lstatus, size, flags | first | len, data_len, *ptr |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST | 0 | 9600, 8000, f554c12e |
12 | 10000640, 1600, INTERRUPT | 0 | 8000, 6400, f554c12e |
11 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, f554c12e |
10 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, f554c12e |
09 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST | 0 | 1600, 0, f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, f554c12e |
06 | 1c000080, 128, INTERRUPT LAST FIRST | 1 | 0, 0, abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST | 0 | 8000, 6400, c5a57780 |
04 | 10000640, 1600, INTERRUPT | 0 | 6400, 4800, c5a57780 |
03 | 10000640, 1600, INTERRUPT | 0 | 4800, 3200, c5a57780 |
02 | 10000640, 1600, INTERRUPT | 0 | 3200, 1600, c5a57780 |
01 | 10000640, 1600, INTERRUPT | 0 | 1600, 0, c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST | 1 | 0, 0, c5a57780 |
So at t=7 a new packet is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packet starts at t=8. This results in skb->len exceeding size for the LAST
fragment at t=13, and thus a negative fragment size being added to the skb.
This then crashes:
kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c
This patch fixes this by checking the computed LAST fragment size, so a
negative-sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.
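The check amounts to something like this in the fragment-add path (a simplified sketch, not the actual patch):
  /* The LAST descriptor carries the total frame length, so the final
   * fragment's size is that length minus what the skb already holds;
   * bail out (and let the caller drop the frame) if an overrun mixed
   * two frames together and that difference would go negative. */
  if (last) {
          if (size < skb->len)
                  return false;
          size -= skb->len;
  }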
Signed-off-by: Michael Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
RXMAC_BC_FRM_CNT_COUNT is added to mp->rx_bcasts twice in a row
in niu_xmac_interrupt(). Remove the second addition.
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
"len > sp->mtu" checked twice in a row in sp_encaps().
Remove the second check.
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The (0xBAF70000 & 0x00FFF000) << 6 should be (0xf70 << 18).
Fixes: 561535b0f239 ("r8169: fix OCP access on RTL8117")
Signed-off-by: Hayes Wang <[email protected]>
Acked-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
fix semicolon.cocci warning:
tools/testing/selftests/net/ipsec.c:1788:2-3: Unneeded semicolon
Signed-off-by: Xu Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ibmvnic_remove locks multiple spinlocks while disabling interrupts:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock_irqsave(&adapter->rwi_lock, flags);
As reported by coccinelle, the second _irqsave() overwrites the value
saved in 'flags' by the first _irqsave(). Therefore, when the second
_irqrestore() comes, the value in 'flags' is not valid - the value saved
by the first _irqsave() has been lost.
This likely leads to IRQs remaining disabled. So remove the second
_irqsave():
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock(&adapter->rwi_lock);
Generated by: ./scripts/coccinelle/locks/flags.cocci
./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
ERROR: nested lock+irqsave that reuses flags from line 5404.
Fixes: 4a41c421f367 ("ibmvnic: serialize access to work queue on remove")
Signed-off-by: Junlin Yang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to use put_unaligned() when writing the 32-bit DOI value
in cipso_v4_gentag_hdr() to avoid an unaligned memory access.
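In other words, something like (a sketch; the offset assumes the usual CIPSO header layout of type, length, then the 4-byte DOI):
  /* buf may not be 4-byte aligned, so avoid a direct 32-bit store. */
  put_unaligned(htonl(doi_def->doi), (__be32 *)&buf[2]);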
v2: unneeded type cast removed as Ondrej Mosnacek suggested.
Signed-off-by: Sergey Nazarov <[email protected]>
Acked-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull block fixes from Jens Axboe:
- NVMe fixes:
- more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal
Terjan)
- fix a hwmon error return (Daniel Wagner)
- fix the keep alive timeout initialization (Martin George)
- ensure the model_number can't be changed on a used subsystem
(Max Gurtovoy)
- rsxx missing -EFAULT on copy_to_user() failure (Dan)
- rsxx remove unused linux.h include (Tian)
- kill unused RQF_SORTED (Jean)
- updated outdated BFQ comments (Joseph)
- revert work-around commit for bd_size_lock, since we removed the
offending user in this merge window (Damien)
* tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block:
nvmet: model_number must be immutable once set
nvme-fabrics: fix kato initialization
nvme-hwmon: Return error code when registration fails
nvme-pci: add quirks for Lexar 256GB SSD
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
rsxx: Return -EFAULT if copy_to_user() fails
block/bfq: update comments and default value in docs for fifo_expire
rsxx: remove unused including <linux/version.h>
block: Drop leftover references to RQF_SORTED
block: revert "block: fix bd_size_lock use"
|
|
Issue seen when enumerating multiple Intel mGbE interfaces in EHL.
[ 6.898141] intel-eth-pci 0000:00:1d.2: enabling device (0000 -> 0002)
[ 6.900971] intel-eth-pci 0000:00:1d.2: Fail to register stmmac-clk
[ 6.906434] intel-eth-pci 0000:00:1d.2: User ID: 0x51, Synopsys ID: 0x52
We fix it by making the clock name unique, following the format
stmmac-pci_name(pci_dev), so that we can differentiate the clocks for
these Intel mGbE interfaces on the EHL platform as follows:
/sys/kernel/debug/clk/stmmac-0000:00:1d.1
/sys/kernel/debug/clk/stmmac-0000:00:1d.2
/sys/kernel/debug/clk/stmmac-0000:00:1e.4
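Roughly (a sketch; the rate argument is an assumption):
  char clk_name[20];

  /* e.g. "stmmac-0000:00:1d.2" - unique per PCI function */
  snprintf(clk_name, sizeof(clk_name), "stmmac-%s", pci_name(pdev));
  plat->stmmac_clk = clk_register_fixed_rate(&pdev->dev, clk_name, NULL, 0,
                                             plat->clk_ptp_rate);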
Fixes: 58da0cfa6cf1 ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
Signed-off-by: Wong Vee Khee <[email protected]>
Signed-off-by: Voon Weifeng <[email protected]>
Co-developed-by: Ong Boon Leong <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For the Intel mGbE controller, the MAC VLAN filter delete operation will
time out if the serdes power-down sequence happens first during driver
remove(), with the message below.
[82294.764958] intel-eth-pci 0000:00:1e.4 eth2: stmmac_dvr_remove: removing driver
[82294.778677] intel-eth-pci 0000:00:1e.4 eth2: Timeout accessing MAC_VLAN_Tag_Filter
[82294.779997] intel-eth-pci 0000:00:1e.4 eth2: failed to kill vid 0081/0
[82294.947053] intel-eth-pci 0000:00:1d.2 eth1: stmmac_dvr_remove: removing driver
[82295.002091] intel-eth-pci 0000:00:1d.1 eth0: stmmac_dvr_remove: removing driver
Therefore, delay the serdes power-down until after unregister_netdev(),
which triggers the VLAN filter delete.
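So the remove() ordering becomes (a simplified sketch; the callback signature is assumed from the description):
  unregister_netdev(ndev);        /* VLAN filter delete still reaches the MAC */
  if (priv->plat->serdes_powerdown)
          priv->plat->serdes_powerdown(ndev, priv->plat->bsp_priv);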
Fixes: b9663b7ca6ff ("net: stmmac: Enable SERDES power up/down sequence")
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When iavf_process_config() fails, iavf_init_get_resources() does not
assign an error return code.
To fix this bug, assign err the return value of
iavf_process_config(), and then check err.
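I.e. (a sketch; the error label is an assumption):
  err = iavf_process_config(adapter);
  if (err)
          goto err_alloc;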
Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When bdx_read_mac() fails, bdx_probe() does not assign an error
return code.
To fix this bug, assign -EFAULT to err as the error return code.
Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull io_uring fixes from Jens Axboe:
"A bit of a mix between fallout from the worker change, cleanups and
reductions now possible from that change, and fixes in general. In
detail:
- Fully serialize manager and worker creation, fixing races due to
that.
- Clean up some naming that had gone stale.
- SQPOLL fixes.
- Fix race condition around task_work rework that went into this
merge window.
- Implement unshare. Used for when the original task does unshare(2)
or setuid/seteuid and friends, drops the original workers and forks
new ones.
- Drop the only remaining piece of state shuffling we had left, which
was cred. Move it into issue instead, and we can drop all of that
code too.
- Kill f_op->flush() usage. That was such a nasty hack that we had
out of necessity, we no longer need it.
- Following from ->flush() removal, we can also drop various bits of
ctx state related to SQPOLL and cancelations.
- Fix an issue with IOPOLL retry, which originally was fallout from a
filemap change (removing iov_iter_revert()), but uncovered an issue
with iovec re-import too late.
- Fix an issue with system suspend.
- Use xchg() for fallback work, instead of cmpxchg().
- Properly destroy io-wq on exec.
- Add create_io_thread() core helper, and use that in io-wq and
io_uring. This allows us to remove various silly completion events
related to thread setup.
- A few error handling fixes.
This should be the grunt of fixes necessary for the new workers, next
week should be quieter. We've got a pending series from Pavel on
cancelations, and how tasks and rings are indexed. Outside of that,
should just be minor fixes. Even with these fixes, we're still killing
a net ~80 lines"
* tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
io_uring: don't restrict issue_flags for io_openat
io_uring: make SQPOLL thread parking saner
io-wq: kill hashed waitqueue before manager exits
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
io_uring: don't keep looping for more events if we can't flush overflow
io_uring: move to using create_io_thread()
kernel: provide create_io_thread() helper
io_uring: reliably cancel linked timeouts
io_uring: cancel-match based on flags
io-wq: ensure all pending work is canceled on exit
io_uring: ensure that threads freeze on suspend
io_uring: remove extra in_idle wake up
io_uring: inline __io_queue_async_work()
io_uring: inline io_req_clean_work()
io_uring: choose right tctx->io_wq for try cancel
io_uring: fix -EAGAIN retry with IOPOLL
io-wq: fix error path leak of buffered write hash map
io_uring: remove sqo_task
io_uring: kill sqo_dead and sqo submission halting
io_uring: ignore double poll add on the same waitqueue head
...
|
|
This patch fixes a bug where the moderation config is not
reapplied after calling mlx4_en_reset_config(). For example, when
turning on rx timestamping, mlx4_en_reset_config() is called,
causing the NIC to forget its previous moderation config.
This fix is in phase with a previous fix:
commit 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss
after set_ringparam is called")
Tested: Before this patch, on a host with NIC using mlx4, run
netserver and stream TCP to the host at full utilization.
$ sar -I SUM 1
INTR intr/s
14:03:56 sum 48758.00
After rx hwtstamp is enabled:
$ sar -I SUM 1
14:10:38 sum 317771.00
We see that moderation is not working properly and the NIC issued 7x
more interrupts.
After the patch, and turned on rx hwtstamp, the rate of interrupts
is as expected:
$ sar -I SUM 1
14:52:11 sum 49332.00
Fixes: 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
Signed-off-by: Kevin(Yudong) Yang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
CC: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the usage of device links in the runtime PM core code and
update the DTPM (Dynamic Thermal Power Management) feature added
recently.
Specifics:
- Make the runtime PM core code avoid attempting to suspend supplier
devices before updating the PM-runtime status of a consumer to
'suspended' (Rafael Wysocki).
- Fix DTPM (Dynamic Thermal Power Management) root node
initialization and label that feature as EXPERIMENTAL in Kconfig
(Daniel Lezcano)"
* tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap/drivers/dtpm: Add the experimental label to the option description
powercap/drivers/dtpm: Fix root node initialization
PM: runtime: Update device status before letting suppliers suspend
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the empty stubs of some helper functions used when CONFIG_ACPI is
not set actually match those functions (Andy Shevchenko)"
* tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: Constify is_acpi_node() and friends (part 2)
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2021-03-04
The following pull-request contains BPF updates for your *net* tree.
We've added 7 non-merge commits during the last 4 day(s) which contain
a total of 9 files changed, 128 insertions(+), 40 deletions(-).
The main changes are:
1) Fix 32-bit cmpxchg, from Brendan.
2) Fix atomic+fetch logic, from Ilya.
3) Fix usage of bpf_csum_diff in selftests, from Yauheni.
====================
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix a sleeping-while-atomic issue in the AMD IOMMU code
- Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
driver
- Fix status code definitions for Intel VT-d
- Fix IO Page Fault issue in Tegra IOMMU driver
* tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Fix status code for Allocate/Free PASID command
iommu: Don't use lazy flush for untrusted device
iommu/tegra-smmu: Fix mc errors on tegra124-nyan
iommu/amd: Fix sleeping in atomic in increase_address_space()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"More regression fixes and stabilization.
Regressions:
- zoned mode
- count zone sizes in wider int types
- fix space accounting for read-only block groups
- subpage: fix page tail zeroing
Fixes:
- fix spurious warning when remounting with free space tree
- fix warning when creating a directory with smack enabled
- ioctl checks for qgroup inheritance when creating a snapshot
- qgroup
- fix missing unlock on error path in zero range
- fix amount of released reservation on error
- fix flushing from unsafe context with open transaction,
potentially deadlocking
- minor build warning fixes"
* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: do not account freed region of read-only block group as zone_unusable
btrfs: zoned: use sector_t for zone sectors
btrfs: subpage: fix the false data csum mismatch error
btrfs: fix warning when creating a directory with smack enabled
btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
btrfs: export and rename qgroup_reserve_meta
btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
btrfs: fix spurious free_space_tree remount warning
btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
btrfs: ref-verify: use 'inline void' keyword ordering
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Another batch of graph and video-interfaces schema conversions
- Drop DT header symlink for dropped C6X arch
- Fix bcm2711-hdmi schema error
* tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: media: Use graph and video-interfaces schemas, round 2
dts: drop dangling c6x symlink
dt-bindings: bcm2711-hdmi: Fix broken schema
|