Age | Commit message (Collapse) | Author | Files | Lines |
|
While vxlan doesn't need any extra tailroom, the lowerdev might need it. In
that case, copy it over to reduce the chance for additional (re)allocations
in the transmit path.
Signed-off-by: Sven Eckelmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
It was observed that sending data via batadv over vxlan (on top of
wireguard) reduced the performance massively compared to raw ethernet or
batadv on raw ethernet. A check of perf data showed that the
vxlan_build_skb was calling all the time pskb_expand_head to allocate
enough headroom for:
min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
+ VXLAN_HLEN + iphdr_len;
But the vxlan_config_apply only requested needed headroom for:
lowerdev->hard_header_len + VXLAN6_HEADROOM or VXLAN_HEADROOM
So it completely ignored the needed_headroom of the lower device. The first
caller of net_dev_xmit could therefore never make sure that enough headroom
was allocated for the rest of the transmit path.
Cc: Annika Wickert <[email protected]>
Signed-off-by: Sven Eckelmann <[email protected]>
Tested-by: Annika Wickert <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
there is kernel panic in inet_twsk_free() while chtls
module unload when socket is in TIME_WAIT state because
sk_prot_creator was not preserved on connection socket.
Fixes: cc35c88ae4db ("crypto : chtls - CPL handler definition")
Signed-off-by: Udai Sharma <[email protected]>
Signed-off-by: Vinay Kumar Yadav <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending
delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work
may block waiting for delete_work_func to complete, and delete_work_func may
block trying to acquire the inode glock in gfs2_inode_lookup.
Reported-by: Alexander Aring <[email protected]>
Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: [email protected] # v5.8+
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
This patch fixes a potential use-after-free bug in
cifs_echo_request().
For instance,
thread 1
--------
cifs_demultiplex_thread()
clean_demultiplex_info()
kfree(server)
thread 2 (workqueue)
--------
apic_timer_interrupt()
smp_apic_timer_interrupt()
irq_exit()
__do_softirq()
run_timer_softirq()
call_timer_fn()
cifs_echo_request() <- use-after-free in server ptr
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
CC: Stable <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
A customer has reported that several files in their multi-threaded app
were left with size of 0 because most of the read(2) calls returned
-EINTR and they assumed no bytes were read. Obviously, they could
have fixed it by simply retrying on -EINTR.
We noticed that most of the -EINTR on read(2) were due to real-time
signals sent by glibc to process wide credential changes (SIGRT_1),
and its signal handler had been established with SA_RESTART, in which
case those calls could have been automatically restarted by the
kernel.
Let the kernel decide to whether or not restart the syscalls when
there is a signal pending in __smb_send_rqst() by returning
-ERESTARTSYS. If it can't, it will return -EINTR anyway.
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
CC: Stable <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
In the slow path of __rb_reserve_next() a nested event(s) can happen
between evaluating the timestamp delta of the current event and updating
write_stamp via local_cmpxchg(); in this case the delta is not valid
anymore and it should be set to 0 (same timestamp as the interrupting
event), since the event that we are currently processing is not the last
event in the buffer.
Link: https://lkml.kernel.org/r/X8IVJcp1gRE+FJCJ@xps-13-7390
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: [email protected]
Link: https://lwn.net/Articles/831207
Fixes: a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
The write stamp, used to calculate deltas between events, was updated with
the stale "ts" value in the "info" structure, and not with the updated "ts"
variable. This caused the deltas between events to be inaccurate, and when
crossing into a new sub buffer, had time go backwards.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
Reported-by: "J. Avila" <[email protected]>
Tested-by: Daniel Mentz <[email protected]>
Tested-by: Will McVicker <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
UL in the definition of SYS_TFSR_EL1_TF1 was misspelled causing
compilation issues when trying to implement in kernel MTE async
mode.
Fix the macro correcting the typo.
Note: MTE async mode will be introduced with a future series.
Fixes: c058b1c4a5ea ("arm64: mte: system register definitions")
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
In debug_exception_enter() and debug_exception_exit() we trace hardirqs
on/off while RCU isn't guaranteed to be watching, and we don't save and
restore the hardirq state, and so may return with this having changed.
Handle this appropriately with new entry/exit helpers which do the bare
minimum to ensure this is appropriately maintained, without marking
debug exceptions as NMIs. These are placed in entry-common.c with the
other entry/exit helpers.
In future we'll want to reconsider whether some debug exceptions should
be NMIs, but this will require a significant refactoring, and for now
this should prevent issues with lockdep and RCU.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marins <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Exceptions which can be taken at (almost) any time are consdiered to be
NMIs. On arm64 that includes:
* SDEI events
* GICv3 Pseudo-NMIs
* Kernel stack overflows
* Unexpected/unhandled exceptions
... but currently debug exceptions (BRKs, breakpoints, watchpoints,
single-step) are not considered NMIs.
As these can be taken at any time, kernel features (lockdep, RCU,
ftrace) may not be in a consistent kernel state. For example, we may
take an NMI from the idle code or partway through an entry/exit path.
While nmi_enter() and nmi_exit() handle most of this state, notably they
don't save/restore the lockdep state across an NMI being taken and
handled. When interrupts are enabled and an NMI is taken, lockdep may
see interrupts become disabled within the NMI code, but not see
interrupts become enabled when returning from the NMI, leaving lockdep
believing interrupts are disabled when they are actually disabled.
The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
shortly be moved to the generic entry code. As we can't use either yet,
we copy the x86 approach in arm64-specific helpers. All the NMI
entrypoints are marked as noinstr to prevent any instrumentation
handling code being invoked before the state has been corrected.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
There are periods in kernel mode when RCU is not watching and/or the
scheduler tick is disabled, but we can still take exceptions such as
interrupts. The arm64 exception handlers do not account for this, and
it's possible that RCU is not watching while an exception handler runs.
The x86/generic entry code handles this by ensuring that all (non-NMI)
kernel exception handlers call irqentry_enter() and irqentry_exit(),
which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
the generic entry code, and already hadnle the user<->kernel transitions
elsewhere, so we add new kernel<->kernel transition helpers alog the
lines of the generic entry code.
Since we now track interrupts becoming masked when an exception is
taken, local_daif_inherit() is modified to track interrupts becoming
re-enabled when the original context is inherited. To balance the
entry/exit paths, each handler masks all DAIF exceptions before
exit_to_kernel_mode().
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle
sequences), or when the lockdep hardirqs transiently out-of-sync with
the hardware state (e.g. in the middle of local_irq_enable()). To
correctly handle these cases, we'll need to save/restore this state
across some exceptions taken from EL1.
A series of subsequent patches will update EL1 exception handlers to
handle this. In preparation for this, and to avoid dependencies between
those patches, this patch adds two new fields to struct pt_regs so that
exception handlers can track this state.
Note that this is placed in pt_regs as some entry/exit sequences such as
el1_irq are invoked from assembly, which makes it very difficult to add
a separate structure as with the irqentry_state used by x86. We can
separate this once more of the exception logic is moved to C. While the
fields only need to be bool, they are both made u64 to keep pt_regs
16-byte aligned.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE
will WARN() at boot time that interrupts are enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.
The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things so to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:
1) When we call __context_tracking_user_enter() HW IRQs must be disabled
and lockdep must be up-to-date with this.
2) Userspace should be treated as having IRQs enabled from the PoV of
both lockdep and tracing.
3) As context_tracking_user_enter() stops RCU from watching, we cannot
use RCU after calling it.
4) IRQ flag tracing and lockdep have state that must be manipulated
before RCU is disabled.
... with similar constraints applying for user->kernel transitions, with
the ordering reversed.
The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.
Note that:
* The accounting for debug exceptions from userspace now happens in
el0_dbg() and ret_to_user(), so this is removed from
debug_exception_enter() and debug_exception_exit(). As
user_exit_irqoff() wakes RCU, the userspace-specific check is removed.
* The accounting for syscalls now happens in el0_svc(),
el0_svc_compat(), and ret_to_user(), so this is removed from
el0_svc_common(). This does not adversely affect the workaround for
erratum 1463225, as this does not depend on any of the state tracking.
* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
need to inform lockdep and tracing. Here a trace_hardirqs_off() is
sufficient and safe as we have not yet exited kernel context and RCU
is usable.
* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
needs to check for the latter.
* EL0 SError handling will be dealt with in a subsequent patch, as this
needs to be treated as an NMI.
Prior to this patch, booting an appropriately-configured kernel would
result in spats as below:
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
| check_flags.part.54+0x1dc/0x1f0
| lock_is_held_type+0x10c/0x188
| rcu_read_lock_sched_held+0x70/0x98
| __context_tracking_enter+0x310/0x350
| context_tracking_enter.part.3+0x5c/0xc8
| context_tracking_user_enter+0x6c/0x80
| finish_ret_to_user+0x2c/0x13cr
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.
In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.
Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing function, e.g.
enter_from_user_mode().
The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
In a subsequent patch ret_to_user will need to make a C function call
(in some configurations) which may clobber x0-x18 at the start of the
finish_ret_to_user block, before enable_step_tsk consumes the flags
loaded into x1.
In preparation for this, let's load the flags into x19, which is
preserved across C function calls. This avoids a redundant reload of the
flags and ensures we operate on a consistent shapshot regardless.
There should be no functional change as a result of this patch. At this
point of the entry/exit paths we only need to preserve x28 (tsk) and the
sp, and x19 is free for this use.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
In later patches we'll want to extend enter_from_user_mode() and add a
corresponding exit_to_user_mode(). As these will be common for all
entries/exits from userspace, it'd be better for these to live in
entry-common.c with the rest of the entry logic.
This patch moves enter_from_user_mode() into entry-common.c. As with
other functions in entry-common.c it is marked as noinstr (which
prevents all instrumentation, tracing, and kprobes) but there are no
other functional changes.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(),
but they're still subject to other instrumentation which may rely on
lockdep/rcu/context-tracking being up-to-date, and may cause nested
exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt
exceptions registers which have not yet been read.
Prevent this by marking all functions in entry-common.c as noinstr to
prevent compiler instrumentation. This also blacklists the functions for
tracing and kprobes, so we don't need to handle that separately.
Functions elsewhere will be dealt with in subsequent patches.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Core code disables RCU when calling arch_cpu_idle(), so it's not safe
for arch_cpu_idle() or its calees to be instrumented, as the
instrumentation callbacks may attempt to use RCU or other features which
are unsafe to use in this context.
Mark them noinstr to prevent issues.
The use of local_irq_enable() in arch_cpu_idle() is similarly
problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch
queued in the tip tree addresses that case.
Reported-by: Marco Elver <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
In el0_svc_common() we unmask exceptions before we call user_exit(), and
so there's a window where an IRQ or debug exception can be taken while
RCU is not watching. In do_debug_exception() we account for this in via
debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we
call trace functions which rely on RCU before we have a guarantee that
RCU is watching.
Let's avoid this by having el0_svc_common() exit userspace before
unmasking exceptions, matching what we do for all other EL0 entry paths.
We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ
flags while we're sure exceptions are masked in DAIF.
The workaround for Cortex-A76 erratum 1463225 may trigger a debug
exception before this point, but the debug code invoked in this case is
safe even when RCU is not watching.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: James Morse <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
If kvaser_pciefd_bus_on() failed, we should call close_candev() to avoid
reference leak.
Fixes: 26ad340e582d3 ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
In the error handling in c_can_power_up(), there are two bugs:
1) c_can_pm_runtime_get_sync() will increase usage counter if device is not
empty. Forgetting to call c_can_pm_runtime_put_sync() will result in a
reference leak here.
2) c_can_reset_ram() operation will set start bit when enable is true. We
should clear it in the error handling.
We fix it by adding c_can_pm_runtime_put_sync() for 1), and
c_can_reset_ram(enable is false) for 2) in the error handling.
Fixes: 8212003260c60 ("can: c_can: Add d_can suspend resume support")
Fixes: 52cde85acc23f ("can: c_can: Add d_can raminit support")
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mkl: return "0" instead of "ret"]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
Losing arbitration is normal in a CAN-bus network, it means that a higher
priority frame is being send and the pending message will be retried later.
Hence most driver only increment arbitration_lost, but the sun4i driver also
incremeants tx_error, causing errors to be reported on a normal functioning
CAN-bus. So stop counting them as errors.
Fixes: 0738eff14d81 ("can: Allwinner A10/A20 CAN Controller support - Kernel module")
Signed-off-by: Jeroen Hofstee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mkl: split into two seperate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
Losing arbitration is normal in a CAN-bus network, it means that a higher
priority frame is being send and the pending message will be retried later.
Hence most driver only increment arbitration_lost, but the sja1000 driver also
incremeants tx_error, causing errors to be reported on a normal functioning
CAN-bus. So stop counting them as errors.
Fixes: 8935f57e68c4 ("can: sja1000: fix network statistics update")
Signed-off-by: Jeroen Hofstee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mkl: split into two seperate patches]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
clk_disable_unprepare()
The clocks mcan_class->cclk and mcan_class->hclk are not prepared by any call
during tcan4x5x_can_probe(), so remove erroneous clk_disable_unprepare() on
them.
Fixes: 5443c226ba91 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Link: http://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two fixes for irqchip drivers:
- Save and restore the GICV3 ITS state unconditionally on
suspend/resume to handle firmware which fails to do so.
- Use the correct index into the fwspec parameters to read the irq
trigger type in the EXIU chip driver"
* tag 'irq-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend
irqchip/exiu: Fix the index of fwspec for IRQ type
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"More EFI fixes forwarded from Ard Biesheuvel:
- revert efivarfs kmemleak fix again - it was a false positive
- make CONFIG_EFI_EARLYCON depend on CONFIG_EFI explicitly so it does
not pull in other dependencies unnecessarily if CONFIG_EFI is not
set
- defer attempts to load SSDT overrides from EFI vars until after the
efivar layer is up"
* tag 'efi-urgent-for-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: EFI_EARLYCON should depend on EFI
efivarfs: revert "fix memory leak in efivarfs_create()"
efi/efivars: Set generic ops before loading SSDT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A couple of urgent fixes which accumulated this last week:
- Two resctrl fixes to prevent refcount leaks when manipulating the
resctrl fs (Xiaochen Shen)
- Correct prctl(PR_GET_SPECULATION_CTRL) reporting (Anand K Mistry)
- A fix to not lose already seen MCE severity which determines
whether the machine can recover (Gabriele Paoloni)"
* tag 'x86_urgent_for_v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Do not overwrite no_way_out if mce_end() fails
x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak
x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"I've collected a handful of fixes over the past few weeks:
- A fix to un-break the build-id argument to the vDSO build, which is
necessary for the LLVM linker.
- A fix to initialize the jump label subsystem, without which it (and
all the stuff that uses it) doesn't actually function.
- A fix to include <asm/barrier.h> from <vdso/processor.h>, without
which some drivers won't compile"
* tag 'riscv-for-linus-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: fix barrier() use in <vdso/processor.h>
RISC-V: Add missing jump label initialization
riscv: Explicitly specify the build id style in vDSO Makefile again
|
|
GPIO_ACTIVE_x flags are not correct in the context of interrupt flags.
These are simple defines so they could be used in DTS but they will not
have the same meaning:
1. GPIO_ACTIVE_HIGH = 0 = IRQ_TYPE_NONE
2. GPIO_ACTIVE_LOW = 1 = IRQ_TYPE_EDGE_RISING
Correct the interrupt flags, assuming the author of the code wanted same
logical behavior behind the name "ACTIVE_xxx", this is:
ACTIVE_LOW => IRQ_TYPE_LEVEL_LOW
ACTIVE_HIGH => IRQ_TYPE_LEVEL_HIGH
Fixes: a1a8b4594f8d ("NFC: pn544: i2c: Add DTS Documentation")
Fixes: 6be88670fc59 ("NFC: nxp-nci_i2c: Add I2C support to NXP NCI driver")
Fixes: e3b329221567 ("dt-bindings: can: tcan4x5x: Update binding to use interrupt property")
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Marc Kleine-Budde <[email protected]> # for tcan4x5x.txt
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Dany Madden says:
====================
ibmvnic: assorted bug fixes
Assorted fixes for ibmvnic originated from "[PATCH net 00/15] ibmvnic:
assorted bug fixes" sent by Lijun Pan.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Reduce the wait time for Command Response Queue response from 30 seconds
to 20 seconds, as recommended by VIOS and Power Hypervisor teams.
Fixes: bd0b672313941 ("ibmvnic: Move login and queue negotiation into ibmvnic_open")
Fixes: 53da09e92910f ("ibmvnic: Add set_link_state routine for setting adapter link state")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Reset timeout is going off right after adapter reset. This patch ensures
that timeout is scheduled if it has been 5 seconds since the last reset.
5 seconds is the default watchdog timeout.
Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
send_login() does not check for the result of ibmvnic_send_crq() of the
login request. This results in the driver needlessly retrying the login
10 times even when CRQ is no longer active. Check the return code and
give up in case of errors in sending the CRQ.
The only time we want to retry is if we get a PARITALSUCCESS response
from the partner.
Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If after ibmvnic sends a LOGIN it gets a FAILOVER, it is possible that
the worker thread will start reset process and free the login response
buffer before it gets a (now stale) LOGIN_RSP. The ibmvnic tasklet will
then try to access the login response buffer and crash.
Have ibmvnic track pending logins and discard any stale login responses.
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If auto-priority failover is enabled, the backing device needs time
to settle if hard resetting fails for any reason. Add a delay of 60
seconds before retrying the hard-reset.
Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In a failed reset, driver could end up in VNIC_PROBED or VNIC_CLOSED
state and cannot recover in subsequent resets, leaving it offline.
This patch restores the adapter state to reset_state, the original
state when reset was called.
Fixes: b27507bb59ed5 ("net/ibmvnic: unlock rtnl_lock in reset so linkwatch_event can run")
Fixes: 2770a7984db58 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
scrq->msgs could be NULL during device reset, causing Linux to crash.
So, check before memset scrq->msgs.
Fixes: c8b2ad0a4a901 ("ibmvnic: Sanitize entire SCRQ buffer on reset")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When ibmvnic fails to reset, it breaks out of the reset loop and frees
all of the remaining resets from the workqueue. Doing so prevents the
adapter from recovering if no reset is scheduled after that. Instead,
have the driver continue to process resets on the workqueue.
Remove the no longer need free_all_rwi().
Fixes: ed651a10875f1 ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Inconsistent login with the vnicserver is causing the device to be
removed. This does not give the device a chance to recover from error
state. This patch schedules a FATAL reset instead to bring the adapter
up.
Fixes: 032c5e82847a2 ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported
by syzbot.
2) Remove spurious reports on nf_tables when lockdep gets disabled,
from Florian Westphal.
3) Fix memleak in the error path of error path of
ip_vs_control_net_init(), from Wang Hai.
4) Fix missing control data in flow dissector, otherwise IP address
matching in hardware offload infra does not work.
5) Fix hardware offload match on prefix IP address when userspace
does not send a bitwise expression to represent the prefix.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: nftables_offload: build mask based from the matching bytes
netfilter: nftables_offload: set address type in control dissector
ipvs: fix possible memory leak in ip_vs_control_net_init
netfilter: nf_tables: avoid false-postive lockdep splat
netfilter: ipset: prevent uninit-value in hash_ip6_add
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
a proper kernel configuration for running kselftest can be obtained with:
$ yes | make kselftest-merge
enable compile support for the 'red' qdisc: otherwise, tdc kselftest fail
when trying to run tdc test items contained in red.json.
Signed-off-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/cfa23f2d4f672401e6cebca3a321dd1901a9ff07.1606416464.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When inet_rtm_getroute() was converted to use the RCU variants of
ip_route_input() and ip_route_output_key(), the TOS parameters
stopped being masked with IPTOS_RT_MASK before doing the route lookup.
As a result, "ip route get" can return a different route than what
would be used when sending real packets.
For example:
$ ip route add 192.0.2.11/32 dev eth0
$ ip route add unreachable 192.0.2.11/32 tos 2
$ ip route get 192.0.2.11 tos 2
RTNETLINK answers: No route to host
But, packets with TOS 2 (ECT(0) if interpreted as an ECN bit) would
actually be routed using the first route:
$ ping -c 1 -Q 2 192.0.2.11
PING 192.0.2.11 (192.0.2.11) 56(84) bytes of data.
64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.173 ms
--- 192.0.2.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.173/0.173/0.173/0.000 ms
This patch re-applies IPTOS_RT_MASK in inet_rtm_getroute(), to
return results consistent with real route lookups.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/b2d237d08317ca55926add9654a48409ac1b8f5b.1606412894.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-11-28
1) Do not reference the skb for xsk's generic TX side since when looped
back into RX it might crash in generic XDP, from Björn Töpel.
2) Fix umem cleanup on a partially set up xsk socket when being destroyed,
from Magnus Karlsson.
3) Fix an incorrect netdev reference count when failing xsk_bind() operation,
from Marek Majtyka.
4) Fix bpftool to set an error code on failed calloc() in build_btf_type_table(),
from Zhen Lei.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add MAINTAINERS entry for BPF LSM
bpftool: Fix error return value in build_btf_type_table
net, xsk: Avoid taking multiple skbuff references
xsk: Fix incorrect netdev reference count
xsk: Fix umem cleanup bug at socket destruct
MAINTAINERS: Update XDP and AF_XDP entries
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Fix head/tailroom issues for fragments, by Sven Eckelmann (3 patches)
* tag 'batadv-net-pullrequest-20201127' of git://git.open-mesh.org/linux-merge:
batman-adv: Don't always reallocate the fragmentation skb head
batman-adv: Reserve needed_*room for fragments
batman-adv: Consider fragmentation for needed_headroom
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the
hooks as, while it's an expected value for a bridge, routing expects
PACKET_HOST. The change is undone later on after hook traversal. This
can be seen with pairs of functions updating skb>pkt_type and then
reverting it to its original value:
For hook NF_INET_PRE_ROUTING:
setup_pre_routing / br_nf_pre_routing_finish
For hook NF_INET_FORWARD:
br_nf_forward_ip / br_nf_forward_finish
But the third case where netfilter does this, for hook
NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing
but never reverted. A comment says:
/* We assume any code from br_dev_queue_push_xmit onwards doesn't care
* about the value of skb->pkt_type. */
But when having a tunnel (say vxlan) attached to a bridge we have the
following call trace:
br_nf_pre_routing
br_nf_pre_routing_ipv6
br_nf_pre_routing_finish
br_nf_forward_ip
br_nf_forward_finish
br_nf_post_routing <- pkt_type is updated to PACKET_HOST
br_nf_dev_queue_xmit <- but not reverted to its original value
vxlan_xmit
vxlan_xmit_one
skb_tunnel_check_pmtu <- a check on pkt_type is performed
In this specific case, this creates issues such as when an ICMPv6 PTB
should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB
isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST
and returns early).
If the comment is right and no one cares about the value of
skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting
it to its original value should be safe.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Antoine Tenart <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Remove unused OBJSIZE variable.
- Fix rootless deb-pkg build in a setgid directory.
* tag 'kbuild-fixes-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
builddeb: Fix rootless build in setuid/setgid directory
kbuild: remove unused OBJSIZE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool fixes from Arnaldo Carvalho de Melo:
- Fix die_entrypc() when DW_AT_ranges DWARF attribute not available
- Cope with broken DWARF (missing DW_AT_declaration) generated by some
recent gcc versions
- Do not generate CGROUP metadata events when not asked to in 'perf
record'
- Use proper CPU for shadow stats in 'perf stat'
- Update copy of libbpf's hashmap.c, silencing tools/perf build warning
- Fix return value in 'perf diff'
* tag 'perf-tools-fixes-for-v5.10-2020-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf probe: Change function definition check due to broken DWARF
perf probe: Fix to die_entrypc() returns error correctly
perf stat: Use proper cpu for shadow stats
perf record: Synthesize cgroup events only if needed
perf diff: Fix error return value in __cmd_diff()
perf tools: Update copy of libbpf's hashmap.c
|