Age | Commit message (Collapse) | Author | Files | Lines |
|
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
The wmb()s before sending the IPIs are not synchronizing anything.
If at all then the apic IPI functions have to provide or act as appropriate
barriers.
Remove these cargo cult barriers which have no explanation of what they are
synchronizing.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existing
leaf is read, while AMD CPUs return all zeros for unsupported leafs.
So the result of the test on Intel CPUs is lottery.
While harmless it's incorrect and causes the conditional wbinvd() to be
issued where not required.
Check whether the leaf is supported before reading it.
[ tglx: Adjusted changelog ]
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
CPU0 CPU1
stop_other_cpus()
send_IPIs(REBOOT); stop_this_cpu()
while (num_online_cpus() > 1); set_online(false);
proceed... -> hang
wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows to restrict the NMI fallback IPI to be sent only the
CPUs which have not reported within the timeout window.
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
ipsec-2023-06-20
|
|
Provide helpers to set and clear sb->s_readonly_remount including
appropriate memory barriers. Also use this opportunity to document what
the barriers pair with and why they are needed.
Suggested-by: Dave Chinner <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
This microSD card never clears Flush Cache bit after cache flush has
been started in sd_flush_cache(). This leads e.g. to failure to mount
file system. Add a quirk which disables the SD cache for this specific
card from specific manufacturing date of 11/2019, since on newer dated
cards from 05/2023 the cache flush works correctly.
Fixes: 08ebf903af57 ("mmc: core: Fixup support for writeback-cache for eMMC and SD")
Signed-off-by: Marek Vasut <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
The Kconfig currently defaults the governor to schedutil on x86_64
only when intel-pstate and SMP have been selected.
If the kernel is built only with amd-pstate, the default governor
should also be schedutil.
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Leo Li <[email protected]>
Acked-by: Huang Rui <[email protected]>
Tested-by: Perry Yuan <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
It seems that Kingston EMMC04G-M627 despite advertising TRIM support does
not work when the core is trying to use REQ_OP_WRITE_ZEROES.
We are seeing I/O errors in OpenWrt under 6.1 on Zyxel NBG7815 that we did
not previously have and tracked it down to REQ_OP_WRITE_ZEROES.
Trying to use fstrim seems to also throw errors like:
[93010.835112] I/O error, dev loop0, sector 16902 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
Disabling TRIM makes the error go away, so lets add a quirk for this eMMC
to disable TRIM.
Signed-off-by: Robert Marko <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
On STM32MP25, the delay block is inside the SoC, and configured through
the SYSCFG registers. The algorithm is also different from what was in
STM32MP1 chip.
Signed-off-by: Yann Gautier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
Create an sdmmc_tuning_ops struct to ease support for another
delay block peripheral.
Signed-off-by: Yann Gautier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
In stm32 sdmmc variant revision v3.0, a block gap hardware flow control
should be used with bus speed modes SDR104 and HS200.
It is enabled by writing a non-null value to the new added register
MMCI_STM32_FIFOTHRR.
The threshold will be 2^(N-1) bytes, so we can use the ffs() function to
compute the value N to be written to the register. The threshold used
should be the data block size, but must not be bigger than the FIFO size.
Signed-off-by: Yann Gautier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
This is an update of the SDMMC revision v2.2, with just an increased
FIFO size, from 64B to 1kB.
Signed-off-by: Yann Gautier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
The alignment for the IDMA size depends on the peripheral version, it
should then be configurable. Add stm32_idmabsize_align in the variant
structure.
And remove now unused (and wrong) MMCI_STM32_IDMABNDT_* macros.
Signed-off-by: Yann Gautier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
For STM32MP25, we'll need to distinguish how is managed the delay block.
This is done through a new comptible dedicated for this SoC, as the
delay block registers are located in SYSCFG peripheral.
Signed-off-by: Yann Gautier <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Merge devfreq updates for v6.5 from Chanwoo Choi:
"1. Reorder fieldls in 'struct devfreq_dev_status' in order to shrink
the size of 'struct devfreqw_dev_status' without any behavior
changes.
2. Add exynos-ppmu.c driver as a soft module dependency in order to
prevent the freeze issue between exynos-bus.c devfreq driver and
exynos-ppmu.c devfreq event driver.
3. Fix variable deferencing before NULL check on mtk-cci-devfreq.c"
* tag 'devfreq-next-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: mtk-cci: Fix variable deferencing before NULL check
PM / devfreq: exynos: add Exynos PPMU as a soft module dependency
PM / devfreq: Reorder fields in 'struct devfreq_dev_status'
|
|
Arınç ÜNAL says:
====================
net: dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
This patch series fixes all non-theoretical issues regarding multiple CPU
ports and the handling of LLDP frames and BPDUs.
I am adding me as a maintainer, I've got some code improvements on the way.
I will keep an eye on this driver and the patches submitted for it in the
future.
Arınç
v6:
- Change a small portion of the comment in the diff on "net: dsa: mt7530:
set all CPU ports in MT7531_CPU_PMAP" with Russell's suggestion.
- Change the patch log of "net: dsa: mt7530: fix trapping frames on
non-MT7621 SoC MT7530 switch" with Vladimir's suggestion.
- Group the code for trapping frames into a common function and call that.
- Add Vladimir and Russell's reviewed-by tags to where they're given.
v5:
- Change the comment in the diff on the first patch with Russell's words.
- Change the patch log of the first patch to state that the patch is just
preparatory work for change "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" and not a fix to an
existing problem on the code base.
- Remove the "net: dsa: mt7530: fix trapping frames with multiple CPU ports
on MT7530" patch. It fixes a theoretical issue, therefore it is net-next
material.
- Remove unnecessary information from the patch logs. Remove the enum
renaming change.
- Strengthen the point of the "net: dsa: introduce
preferred_default_local_cpu_port and use on MT7530" patch.
v4: Make the patch logs and my comments in the code easier to understand.
v3: Fix the from header on the patches. Write a cover letter.
v2: Add patches to fix the handling of LLDP frames and BPDUs.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add me as a maintainer of the MediaTek MT7530 DSA subdriver.
List maintainers in alphabetical order by first name.
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Matthias Brugger <[email protected]>
Reviewed-by: Matthias Brugger <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since the introduction of the OF bindings, DSA has always had a policy that
in case multiple CPU ports are present in the device tree, the numerically
smallest one is always chosen.
The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU
ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because
it has higher bandwidth.
The MT7530 driver developers had 3 options:
- to modify DSA when the MT7531 switch support was introduced, such as to
prefer the better port
- to declare both CPU ports in device trees as CPU ports, and live with the
sub-optimal performance resulting from not preferring the better port
- to declare just port 6 in the device tree as a CPU port
Of course they chose the path of least resistance (3rd option), kicking the
can down the road. The hardware description in the device tree is supposed
to be stable - developers are not supposed to adopt the strategy of
piecemeal hardware description, where the device tree is updated in
lockstep with the features that the kernel currently supports.
Now, as a result of the fact that they did that, any attempts to modify the
device tree and describe both CPU ports as CPU ports would make DSA change
its default selection from port 6 to 5, effectively resulting in a
performance degradation visible to users with the MT7531BE switch as can be
seen below.
Without preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender
[ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver
[ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender
[ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver
With preferring port 6:
[ ID][Role] Interval Transfer Bitrate Retr
[ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender
[ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver
[ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender
[ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver
Using one port for WAN and the other ports for LAN is a very popular use
case which is what this test emulates.
As such, this change proposes that we retroactively modify stable kernels
(which don't support the modification of the CPU port assignments, so as to
let user space fix the problem and restore the throughput) to keep the
mt7530 driver preferring port 6 even with device trees where the hardware
is more fully described.
Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
LLDP frames are link-local frames, therefore they must be trapped to the
CPU port. Currently, the MT753X switches treat LLDP frames as regular
multicast frames, therefore flooding them to user ports. To fix this, set
LLDP frames to be trapped to the CPU port(s).
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
BPDUs are link-local frames, therefore they must be trapped to the CPU
port. Currently, the MT7530 switch treats BPDUs as regular multicast
frames, therefore flooding them to user ports. To fix this, set BPDUs to be
trapped to the CPU port. Group this on mt7530_setup() and
mt7531_setup_common() into mt753x_trap_frames() and call that.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
All MT7530 switch IP variants share the MT7530_MFC register, but the
current driver only writes it for the switch variant that is integrated in
the MT7621 SoC. Modify the code to include all MT7530 derivatives.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU
frames (further restricted by PCR_MATRIX).
Currently the driver sets the first CPU port as the single port in this bit
mask, which works fine regardless of whether the device tree defines port
5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's
logic of picking the first CPU port as the CPU port that all user ports are
affine to, by default.
An upcoming change would like to influence DSA's selection of the default
CPU port to no longer be the first one, and in that case, this logic needs
adaptation.
Since there is no observed leakage or duplication of frames if all CPU
ports are defined in this bit mask, simply include them all.
Suggested-by: Russell King (Oracle) <[email protected]>
Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode.
Add the mappings between DPMAC_ETH_IF_* and HY_INTERFACE_MODE_*, as well
as the 25000 mac capability.
Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18.
Signed-off-by: Josua Mayer <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
An update from ieee802154 for your *net* tree:
Two small fixes and MAINTAINERS update this time.
Azeem Shaikh ensured consistent use of strscpy through the tree and fixed
the usage in our trace.h.
Chen Aotian fixed a potential memory leak in the hwsim simulator for
ieee802154.
Miquel Raynal updated the MAINATINERS file with the new team git tree
locations and patchwork URLs.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Rahul Rameshbabu says:
====================
ptp .adjphase cleanups
The goal of this patch series is to improve documentation of .adjphase, add
a new callback .getmaxphase to enable advertising the max phase offset a
device PHC can support, and support invoking .adjphase from the testptp
kselftest.
Changes:
v2->v1:
* Removes arbitrary rule that the PHC servo must restore the frequency
to the value used in the last .adjfine call if any other PHC
operation is used after a .adjphase operation.
* Removes a macro introduced in v1 for adding PTP sysfs device
attribute nodes using a callback for populating the data.
Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a function that advertises a maximum offset of zero supported by
ptp_clock_info .adjphase in the OCP null ptp implementation.
Cc: Richard Cochran <[email protected]>
Cc: Jonathan Lemon <[email protected]>
Cc: Vadim Fedorenko <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Vadim Fedorenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Advertise the maximum offset the .adjphase callback is capable of
supporting in nanoseconds for IDT devices.
Refactor the negation of the offset stored in the register to be after the
boundary check of the offset value rather than before. Boundary check based
on the intended value rather than its device-specific representation.
Depend on ptp_clock_adjtime for handling out-of-range offsets.
ptp_clock_adjtime returns -ERANGE instead of clamping out-of-range offsets.
Cc: Richard Cochran <[email protected]>
Cc: Min Li <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Advertise the maximum offset the .adjphase callback is capable of
supporting in nanoseconds for IDT ClockMatrix devices. Depend on
ptp_clock_adjtime for handling out-of-range offsets. ptp_clock_adjtime
returns -ERANGE instead of clamping out-of-range offsets.
Cc: Richard Cochran <[email protected]>
Cc: Vincent Cheng <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Implement .getmaxphase callback of ptp_clock_info in mlx5 driver. No longer
do a range check in .adjphase callback implementation. Handled by the ptp
stack.
Cc: Saeed Mahameed <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Enables advertisement of the maximum offset supported by the phase control
functionality of PHCs. The callback is used to return an error if an offset
not supported by the PHC is used in ADJ_OFFSET. The ioctls
PTP_CLOCK_GETCAPS and PTP_CLOCK_GETCAPS2 now advertise the maximum offset a
PHC's phase control functionality is capable of supporting. Introduce new
sysfs node, max_phase_adjustment.
Cc: Jakub Kicinski <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Maciek Machnikowski <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Invoke clock_adjtime syscall with tx.modes set with ADJ_OFFSET when testptp
is invoked with a phase adjustment offset value. Support seconds and
nanoseconds for the offset value.
Cc: Jakub Kicinski <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Maciek Machnikowski <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use existing NSEC_PER_SEC declaration in place of hardcoded magic numbers.
Cc: Jakub Kicinski <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Maciek Machnikowski <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The mlx5_core driver has implemented ptp clock driver functionality but
lacked documentation about the PTP devices. This patch adds information
about the Mellanox device family.
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
.adjphase expects a PHC to use an internal servo algorithm to correct the
provided phase offset target in the callback. Implementation of the
internal servo algorithm are defined by the individual devices.
Cc: Jakub Kicinski <[email protected]>
Cc: Richard Cochran <[email protected]>
Signed-off-by: Rahul Rameshbabu <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix races in Hyper-V PCI controller (Dexuan Cui)
- Fix handling of hyperv_pcpu_input_arg (Michael Kelley)
- Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley)
- Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui)
- Add noop for real mode handlers for virtual trust level code (Saurabh
Sengar)
* tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
PCI: hv: Add a per-bus mutex state_lock
Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing
x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails
x86/hyperv/vtl: Add noop for realmode pointers
|
|
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.
The change allows to clean up dependencies of PPC_WATCHDOG
and HAVE_HARDLOCKUP_DETECTOR_PERF definitions for powerpc.
As a result HAVE_HARDLOCKUP_DETECTOR_PERF has the same dependencies
on arm, x86, powerpc architectures.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.
Before, it is far from obvious that the SPARC64 variant is actually used:
$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
After, it is more clear:
$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.
The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.
There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.
As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.
Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
arch_touch_nmi_watchdog() needs a different implementation for various
hardlockup detector implementations. And it does nothing when
any hardlockup detector is not built at all.
arch_touch_nmi_watchdog() is declared via linux/nmi.h. And it must be
defined as an empty function when there is no hardlockup detector.
It is done directly in this header file for the perf and buddy detectors.
And it is done in the included asm/linux.h for arch specific detectors.
The reason probably is that the arch specific variants build the code
using another conditions. For example, powerpc64/sparc64 builds the code
when CONFIG_PPC_WATCHDOG is enabled.
Another reason might be that these architectures define more functions
in asm/nmi.h anyway.
However the generic code actually knows when the function will be
implemented. It happens when some full featured or the sparc64-specific
hardlockup detector is built.
In particular, CONFIG_HARDLOCKUP_DETECTOR can be enabled only when
a generic or arch-specific full featured hardlockup detector is available.
The only exception is sparc64 which can be built even when the global
HARDLOCKUP_DETECTOR switch is disabled.
The information about sparc64 is a bit complicated. The hardlockup
detector is built there when CONFIG_HAVE_NMI_WATCHDOG is set and
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
People might wonder whether this change really makes things easier.
The motivation is:
+ The current logic in linux/nmi.h is far from obvious.
For example, arch_touch_nmi_watchdog() is defined as {} when
neither CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER nor
CONFIG_HAVE_NMI_WATCHDOG is defined.
+ The change synchronizes the checks in lib/Kconfig.debug and
in the generic code.
+ It is a step that will help cleaning HAVE_NMI_WATCHDOG related
checks.
The change should not change the existing behavior.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are four possible variants of hardlockup detectors:
+ buddy: available when SMP is set.
+ perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
+ arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
+ sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
The check for the sparc64 variant is more complicated because
HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific
and sparc64 specific variant. Therefore it is automatically
selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.
This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
It reduces the size of some checks but it makes them harder to follow.
Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
HARDLOCKUP_DETECTOR switch is enabled/disabled.
Make the logic more straightforward by the following changes:
+ Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
HAVE_NMI_WATCHDOG in comments.
+ Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
HAVE_* for all four hardlockup detector variants.
Use it in the other conditions instead of SMP. It makes it
clear that it is about the buddy detector.
+ Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
the conditions between the four hardlockup detector variants.
+ Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
can be enabled. It explains the dependency on the other
hardlockup detector variants.
Also it allows to remove HARDLOCKUP_DETECTOR_NON_ARCH by using "imply".
It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when
the global HARDLOCKUP_DETECTOR switch is changed.
+ Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
disappear when the hardlockup detectors are disabled.
Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY
value is not preserved when the global switch is disabled.
The user has to make the decision again when it gets re-enabled.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
logical way
Patch series "watchdog/hardlockup: Cleanup configuration of hardlockup
detectors", v2.
Clean up watchdog Kconfig after introducing the buddy detector.
This patch (of 6):
There are four possible variants of hardlockup detectors:
+ buddy: available when SMP is set.
+ perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
+ arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
+ sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
Only one hardlockup detector can be compiled in. The selection is done
using quite complex dependencies between several CONFIG variables.
The following patches will try to make it more straightforward.
As a first step, reorder the definitions of the various CONFIG variables.
The logical order is:
1. HAVE_* variables define available variants. They are typically
defined in the arch/ config files.
2. HARDLOCKUP_DETECTOR y/n variable defines whether the hardlockup
detector is enabled at all.
3. HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n variable defines whether
the buddy detector should be preferred over the perf one.
Note that the arch specific variants are always preferred when
available.
4. HARDLOCKUP_DETECTOR_PERF/BUDDY variables define whether the given
detector is enabled in the end.
5. HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and HARDLOCKUP_DETECTOR_NON_ARCH
are temporary variables that are going to be removed in
a followup patch.
This is a preparation step for further cleanup. It will change the logic
without shuffling the definitions.
This change temporary breaks the C-like ordering where the variables are
declared or defined before they are used. It is not really needed for
Kconfig. Also the following patches will rework the logic so that
the ordering will be C-like in the end.
The patch just shuffles the definitions. It should not change the existing
behavior.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code. Let's move them and
add clearer comments about why they're needed.
Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more complicated
than it needed to be. If the "perf" detector is available and we have SMP
then we have a choice, so enable the config based on just those two config
items.
Link: https://lkml.kernel.org/r/20230526184139.8.I49d5b483336b65b8acb1e5066548a05260caf809@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There's no reason to make a copy of the "watchdog_cpus" locally in
watchdog_next_cpu(). Making a copy wouldn't make things any more race
free and we're just reading the value so there's no need for a copy.
Link: https://lkml.kernel.org/r/20230526184139.7.If466f9a2b50884cbf6a1d8ad05525a2c17069407@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchdog.c file into the
buddy. That call could be done more cleanly. Specifically:
1. If we move the call into watchdog_hardlockup_kick() then it keeps
watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
count. In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") the count was changed to "atomic_t"
which is backed by an int, so we should match types.
Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the patch ("watchdog/hardlockup: add comments to touch_nmi_watchdog()")
we adjusted some comments for touch_nmi_watchdog(). The comment about the
softlockup had a typo and were also felt to be too obvious. Remove it.
Link: https://lkml.kernel.org/r/20230526184139.5.Ia593afc9eb12082d55ea6681dc2c5a89677f20a8@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track of
which CPUs to backtrace. When setting up this cpumask, it's better to use
cpumask_copy() than to just copy the structure directly. Fix this.
Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr().
Using this_cpu_ptr() works fine.
Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
watchdog_hardlockup_probe()
Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
architecture, we have some special case code in the watchdog core to
handle the fact that watchdog_hardlockup_probe() isn't implemented.
Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the
special case.
As a side effect of doing this, code inspection tells us that we could fix
a minor bug where the system won't properly realize that NMI watchdogs are
disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off
the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which
selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
nothing will override the "weak" watchdog_hardlockup_probe() and we'll
fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.
Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|