Age | Commit message (Collapse) | Author | Files | Lines |
|
Make the DWC PCIe RC/EP safer and more verbose for invalid or failed
inbound and outbound iATU window setups. Silently ignoring iATU regions
setup errors may cause unpredictable errors. For instance if a cfg or IO
window fails to be activated, then any CFG/IO requested won't reach target
PCIe devices and the corresponding accessors will return platform-specific
random values.
[bhelgaas: trim commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Make __dw_pcie_prog_outbound_atu() check the requested region base and size
against what the hardware can support. Return error if the region is not
correctly aligned or of a supported size.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
The DWC PCIe RC/EP/DM IP core configuration parameters determine the number
of inbound and outbound iATU windows, alignment requirements (which is also
the minimum window size), minimum and maximum sizes. If internal ATU is
enabled, the former settings are determined by CX_ATU_MIN_REGION_SIZE; the
latter are determined by CX_ATU_MAX_REGION_SIZE.
Determine the required alignment and maximum size supported by the
controller and log it to help verify whether the requested inbound or
outbound memory mappings can be fully created.
Note 1. The extended iATU regions have been supported since DWC PCIe
v4.60a. There is no need in testing the upper limit register availability
for the older cores.
Note 2. The regions alignment is determined with using the fls() method
since the lower four bits of the ATU Limit register can be occupied with
the Circular Buffer Increment setting, which can be initialized with zeros.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Previously __dw_pcie_prog_outbound_atu() duplicated a lot of code between
the iatu_unroll_enabled version and the PCIE_ATU_VIEWPORT version:
__dw_pcie_prog_outbound_atu
if (iatu_unroll_enabled)
dw_pcie_prog_outbound_atu_unroll
dw_pcie_writel_ob_unroll(PCIE_ATU_UNR_LOWER_BASE, ...)
dw_pcie_writel_ob_unroll(PCIE_ATU_UNR_UPPER_BASE, ...)
...
return
dw_pcie_writel_dbi(PCIE_ATU_VIEWPORT, ...)
dw_pcie_writel_dbi(PCIE_ATU_LOWER_BASE, ...)
dw_pcie_writel_dbi(PCIE_ATU_UPPER_BASE, ...)
...
Unify those by pushing the unroll address computation and viewport
selection down into dw_pcie_writel_atu() so we can use the same
dw_pcie_writel_atu_ob() accessor for both paths:
__dw_pcie_prog_outbound_atu
dw_pcie_writel_atu_ob(PCIE_ATU_LOWER_BASE, ...)
dw_pcie_writel_atu
dw_pcie_select_atu # new
if (iatu_unroll_enabled)
return pci->atu_base + PCIE_ATU_UNROLL_BASE(...)
dw_pcie_writel_dbi(PCIE_ATU_VIEWPORT, ...)
return pci->atu_base
dw_pcie_write(base + reg)
dw_pcie_writel_atu_ob(PCIE_ATU_UPPER_BASE, ...)
...
In the non-unroll case, this does involve more MMIO writes to
PCIE_ATU_VIEWPORT, but it's mainly in initialization paths and the code
simplification is significant.
[bhelgaas: commit log, simplify dw_pcie_select_atu()]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Previously callers of dw_pcie_disable_atu() supplied enum
dw_pcie_region_type (DW_PCIE_REGION_INBOUND, DW_PCIE_REGION_OUTBOUND),
which dw_pcie_disable_atu() converted to the PCIE_ATU_REGION_DIR_IB or
PCIE_ATU_REGION_DIR_OB values needed to program the ATU registers.
Simplify the code by dropping the dw_pcie_region_type enum and passing
PCIE_ATU_REGION_DIR_IB or PCIE_ATU_REGION_DIR_OB directly.
Reorder dw_pcie_disable_atu() arguments to (dir, index) since "index"
indicates an ATU window in the regions of the corresponding direction.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Previously dw_pcie_ep_set_bar() converted the BAR PCI_BASE_ADDRESS_SPACE
bit to the internal dw_pcie_as_type enum (DW_PCIE_AS_MEM, DW_PCIE_AS_IO)
and passed it down to dw_pcie_prog_inbound_atu(), which converted the enum
to the PCIE_ATU_TYPE_MEM/PCIE_ATU_TYPE_IO values needed to program the ATU
registers.
Simplify the code by dropping the dw_pcie_as_type enum and passing
PCIE_ATU_TYPE_MEM or PCIE_ATU_TYPE_IO directly.
Reorder inbound ATU function arguments to match the outbound functions,
with address-related parameters at the end.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
dw_pcie_host_init() calls the dw_pcie_ops.host_init() callback to do
platform-specific host initialization.
Add a dw_pcie_ops.host_deinit() callback to perform the corresponding
cleanups in dw_pcie_host_deinit() and in dw_pcie_host_init() failure paths.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Since the DW PCIe common code (dw_pcie_version_detect()) now reads the IP
core version directly from the hardware, there is no point manually setting
the version for controllers newer than v4.70a.
Tegra194 only supports v4.90a, so remove the now-superfluous code that sets
struct dw_pcie.version.
Suggested-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Since the DW PCIe common code (dw_pcie_version_detect()) now reads the IP
core version directly from the hardware, there is no point manually setting
the version for controllers newer than v4.70a.
Remove the now-superfluous intel-gw code that sets struct dw_pcie.version.
Suggested-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Add macros to compare DWC IP core versions:
dw_pcie_ver_is()
dw_pcie_ver_is_ge()
dw_pcie_ver_type_is()
dw_pcie_ver_type_is_ge()
These are along the lines of DWC3_VER_IS() and dw_spi_ver_is().
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
|
|
Since DWC PCIe v4.70a, the controller version and version type can be read
from the PORT_LOGIC.PCIE_VERSION_OFF and PORT_LOGIC.PCIE_VERSION_TYPE_OFF
registers respectively.
Read the version from those registers and warn if if's different from the
version we got from the device tree.
We can only read the version after platform-specific drivers have done any
DBI-related initialization, such as reference clock activation.
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Save the DWC IP core version in the same format as the
PORT_LOGIC.PCIE_VERSION_OFF register, similar to what other drivers for DWC
IP do (dw_spi_hw_init(), dwc3_core_is_valid(), stmmac_hwif_init()).
[bhelgaas: trim commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Previously, dw_pcie_ep_init() did:
dw_pcie_iatu_detect(pci);
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
if (!res)
return -EINVAL;
The platform_get_resource_byname() can fail, and dw_pcie_iatu_detect()
doesn't depend on the "addr_space" resource, so delay it until afterwards,
i.e.,
platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
dw_pcie_iatu_detect(pci);
[bhelgaas: commit log]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
Printing just "link up" isn't very informative for PCI Express. Even if the
link is up, bus performance can degrade to slower speeds or to narrower
width than both Root Port and its partner is capable of. In that case it
would be handy to know the link specifications as early as possible.
If the link comes up, log the link speed (PCIe generation) and width.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Serge Semin <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for interrupt core and drivers:
Core:
- Fix a few inconsistencies between UP and SMP vs interrupt
affinities
- Small updates and cleanups all over the place
New drivers:
- LoongArch interrupt controller
- Renesas RZ/G2L interrupt controller
Updates:
- Hotpath optimization for SiFive PLIC
- Workaround for broken PLIC edge triggered interrupts
- Simall cleanups and improvements as usual"
* tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
irqchip/mmp: Declare init functions in common header file
irqchip/mips-gic: Check the return value of ioremap() in gic_of_init()
genirq: Use for_each_action_of_desc in actions_show()
irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch
irqchip: Add LoongArch CPU interrupt controller support
irqchip: Add Loongson Extended I/O interrupt controller support
irqchip/loongson-liointc: Add ACPI init support
irqchip/loongson-pch-msi: Add ACPI init support
irqchip/loongson-pch-pic: Add ACPI init support
irqchip: Add Loongson PCH LPC controller support
LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain
LoongArch: Use ACPI_GENERIC_GSI for gsi handling
genirq/generic_chip: Export irq_unmap_generic_chip
ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback
APCI: irq: Add support for multiple GSI domains
LoongArch: Provisionally add ACPICA data structures
irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains
irqdomain: Report irq number for NOMAP domains
irqchip/gic-v3: Fix comment typo
dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC
...
|
|
Commit 2dec18ad826f forgets to call mutex_unlock() before the function
returns in the error path:
New smatch warnings:
net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
returns '®ion->snapshot_lock'.
Make sure we call mutex_unlock() in this error path.
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Fixes: 2dec18ad826f ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Timers, timekeeping and related drivers update:
Core:
- Make wait_event_hrtimeout() aware of RT/DL tasks
New drivers:
- R-Car Gen4 timer
- Tegra186 timer
- Mediatek MT6795 CPUXGPT timer
Updates:
- Rework suspend/resume handling in timer drivers so it
takes inactive clocks into account.
- The usual device tree compatible add ons
- Small fixed and cleanups all over the place"
* tag 'timers-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
wait: Fix __wait_event_hrtimeout for RT/DL tasks
clocksource/drivers/sun5i: Remove unnecessary (void*) conversions
dt-bindings: timer: allwinner,sun4i-a10-timer: Add D1 compatible
dt-bindings: timer: ingenic,tcu: use absolute path to other schema
clocksource/drivers/sun4i: Remove unnecessary (void*) conversions
dt-bindings: timer: renesas,cmt: Fix R-Car Gen4 fall-out
clocksource/drivers/tegra186: Put Kconfig option 'tristate' to 'bool'
clocksource/drivers/timer-ti-dm: Make driver selection bool for TI K3
clocksource/drivers/timer-ti-dm: Add compatible for am6 SoCs
clocksource/drivers/timer-ti-dm: Make timer selectable for ARCH_K3
clocksource/drivers/timer-ti-dm: Move inline functions to driver for am6
clocksource/drivers/sh_cmt: Add R-Car Gen4 support
dt-bindings: timer: renesas,cmt: R-Car V3U is R-Car Gen4
dt-bindings: timer: renesas,cmt: Add r8a779f0 and generic Gen4 CMT support
clocksource/drivers/timer-microchip-pit64b: Fix compilation warnings
clocksource/drivers/timer-microchip-pit64b: Use mchp_pit64b_{suspend, resume}
clocksource/drivers/timer-microchip-pit64b: Remove suspend/resume ops for ce
thermal/drivers/rcar_gen3_thermal: Add r8a779f0 support
clocksource/drivers/timer-mediatek: Implement CPUXGPT timers
dt-bindings: timer: mediatek: Add CPUX System Timer and MT6795 compatible
...
|
|
A pci_enable_pcie_error_reporting() should be balanced by a corresponding
pci_disable_pcie_error_reporting() call in the error handling path, as
already done in the remove function.
Fixes: 3ce7547e5b71 ("net: txgbe: Add build support for txgbe")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Jiawen Wu <[email protected]>
Link: https://lore.kernel.org/r/082003d00be1f05578c9c6434272ceb314609b8e.1659285240.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Fix Intel Alder Lake PEBS memory access latency & data source
profiling info bugs.
- Use Intel large-PEBS hardware feature in more circumstances, to
reduce PMI overhead & reduce sampling data.
- Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI
variant, which tells tooling the exact number of samples lost.
- Add new IBS register bits definitions.
- AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.
* tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/ibs: Add new IBS register bits into header
perf/x86/intel: Fix PEBS data source encoding for ADL
perf/x86/intel: Fix PEBS memory access info encoding for ADL
perf/core: Add a new read format to get a number of lost samples
perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments
perf/x86/amd/uncore: Add PerfMonV2 DF event format
perf/x86/amd/uncore: Detect available DF counters
perf/x86/amd/uncore: Use attr_update for format attributes
perf/x86/amd/uncore: Use dynamic events array
x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
|
|
fix follow spelling misktakes:
desconstructed ==> deconstructed
enforcment ==> enforcement
Reported-by: Hacash Robot <[email protected]>
Signed-off-by: Xie Shaowen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 08f588fa301bef ("devlink: introduce framework for selftests") adds
documentation for devlink selftests framework, but it is missing from
table of contents.
Add it.
Link: https://lore.kernel.org/linux-doc/[email protected]/
Fixes: 08f588fa301bef ("devlink: introduce framework for selftests")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Bagas Sanjaya <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"This was a fairly quiet cycle for the locking subsystem:
- lockdep: Fix a handful of the more complex lockdep_init_map_*()
primitives that can lose the lock_type & cause false reports. No
such mishap was observed in the wild.
- jump_label improvements: simplify the cross-arch support of initial
NOP patching by making it arch-specific code (used on MIPS only),
and remove the s390 initial NOP patching that was superfluous"
* tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix lockdep_init_map_*() confusion
jump_label: make initial NOP patching the special case
jump_label: mips: move module NOP patching into arch code
jump_label: s390: avoid pointless initial NOP patching
|
|
In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full
will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is
released before sock_alloc_send_skb and then relocked after
sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push
to add skb to an already full sk_write_queue:
thread1--->lock
thread1--->dccp_qpolicy_full: queue is full. drop a skb
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_full: queue is not full. no need to drop.
thread2--->unlock
thread1--->lock
thread1--->dccp_qpolicy_push: add a skb. queue is full.
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_push: add a skb!
thread2--->unlock
Fix this by moving dccp_qpolicy_full.
Fixes: b1308dc015eb ("[DCCP]: Set TX Queue Length Bounds via Sysctl")
Signed-off-by: Hangyu Hua <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Guangbin Huang says:
====================
net: fix using wrong flags to check features
We find that some drivers may use wrong flags to check features,
so fix them.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The prototype of input features of ionic_set_nic_features() is
netdev_features_t, but the vlan_flags is using the private
definition of ionic drivers. It should use the variable
ctx.cmd.lif_setattr.features, rather than features to check
the vlan flags. So fixes it.
Fixes: beead698b173 ("ionic: Add the basic NDO callbacks for netdev support")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
vsi->current_netdev_flags is used store the current net device
flags, not the active netdevice features. So it should use
vsi->netdev->featurs, rather than vsi->current_netdev_flags
to check NETIF_F_HW_VLAN_CTAG_FILTER.
Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Acked-by: Tony Nguyen <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently nfp driver will reject to offload tunnel key action without
tunnel key ID which means tunnel ID is 0. But it is a normal case for tc
flower since user can setup a tunnel with tunnel ID is 0.
So we need to support this case to accept tunnel key action without
tunnel key ID.
Signed-off-by: Baowen Zheng <[email protected]>
Reviewed-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Eric Dumazet says:
====================
net: rose: fix module unload issues
Bernard Pidoux reported that unloading rose module could lead
to infamous "unregistered_netdevice:" issues.
First patch is the fix, stable candidate.
Second patch is adding netdev ref tracker to af_rose.
I chose net-next to not inflict merge conflicts, because
Jakub changed dev_put_track() to netdev_put_track() in net-next.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Tested-by: Bernard Pidoux <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Bernard reported that trying to unload rose module would lead
to infamous messages:
unregistered_netdevice: waiting for rose0 to become free. Usage count = xx
This patch solves the issue, by making sure each socket referring to
a netdevice holds a reference count on it, and properly releases it
in rose_release().
rose_dev_first() is also fixed to take a device reference
before leaving the rcu_read_locked section.
Following patch will add ref_tracker annotations to ease
future bug hunting.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Bernard Pidoux <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Bernard Pidoux <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Load-balancing improvements:
- Improve NUMA balancing on AMD Zen systems for affine workloads.
- Improve the handling of reduced-capacity CPUs in load-balancing.
- Energy Model improvements: fix & refine all the energy fairness
metrics (PELT), and remove the conservative threshold requiring 6%
energy savings to migrate a task. Doing this improves power
efficiency for most workloads, and also increases the reliability
of energy-efficiency scheduling.
- Optimize/tweak select_idle_cpu() to spend (much) less time
searching for an idle CPU on overloaded systems. There's reports of
several milliseconds spent there on large systems with large
workloads ...
[ Since the search logic changed, there might be behavioral side
effects. ]
- Improve NUMA imbalance behavior. On certain systems with spare
capacity, initial placement of tasks is non-deterministic, and such
an artificial placement imbalance can persist for a long time,
hurting (and sometimes helping) performance.
The fix is to make fork-time task placement consistent with runtime
NUMA balancing placement.
Note that some performance regressions were reported against this,
caused by workloads that are not memory bandwith limited, which
benefit from the artificial locality of the placement bug(s). Mel
Gorman's conclusion, with which we concur, was that consistency is
better than random workload benefits from non-deterministic bugs:
"Given there is no crystal ball and it's a tradeoff, I think
it's better to be consistent and use similar logic at both fork
time and runtime even if it doesn't have universal benefit."
- Improve core scheduling by fixing a bug in
sched_core_update_cookie() that caused unnecessary forced idling.
- Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs
for newly woken tasks.
- Fix a newidle balancing bug that introduced unnecessary wakeup
latencies.
ABI improvements/fixes:
- Do not check capabilities and do not issue capability check denial
messages when a scheduler syscall doesn't require privileges. (Such
as increasing niceness.)
- Add forced-idle accounting to cgroups too.
- Fix/improve the RSEQ ABI to not just silently accept unknown flags.
(No existing tooling is known to have learned to rely on the
previous behavior.)
- Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags.
Optimizations:
- Optimize & simplify leaf_cfs_rq_list()
- Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg().
Misc fixes & cleanups:
- Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems.
- Fix a full-NOHZ bug that can in some cases result in the tick not
being re-enabled when the last SCHED_RT task is gone from a
runqueue but there's still SCHED_OTHER tasks around.
- Various PREEMPT_RT related fixes.
- Misc cleanups & smaller fixes"
* tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
rseq: Kill process when unknown flags are encountered in ABI structures
rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags
sched/core: Fix the bug that task won't enqueue into core tree when update cookie
nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
sched/core: Always flush pending blk_plug
sched/fair: fix case with reduced capacity CPU
sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling
sched/core: add forced idle accounting for cgroups
sched/fair: Remove the energy margin in feec()
sched/fair: Remove task_util from effective utilization in feec()
sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu()
sched/fair: Rename select_idle_mask to select_rq_mask
sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()
sched/fair: Decay task PELT values during wakeup migration
sched/fair: Provide u64 read for 32-bits arch helper
sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg
sched: only perform capability check on privileged operation
sched: Remove unused function group_first_cpu()
sched/fair: Remove redundant word " *"
selftests/rseq: check if libc rseq support is registered
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- An addition of 'accounted' flag to slab allocation tracepoints to
indicate memcg_kmem accounting, by Vasily
- An optimization of memcg handling in freeing paths, by Muchun
- Various smaller fixes and cleanups
* tag 'slab-for-5.20_or_6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab_common: move generic bulk alloc/free functions to SLOB
mm/sl[au]b: use own bulk free function when bulk alloc failed
mm: slab: optimize memcg_slab_free_hook()
mm/tracing: add 'accounted' entry into output of allocation tracepoints
tools/vm/slabinfo: Handle files in debugfs
mm/slub: Simplify __kmem_cache_alias()
mm, slab: fix bad alignments
|
|
It's not possible for inode->i_security to be NULL here because every
inode will call inode_init_always and then lsm_inode_alloc to alloc
memory for inode->security, this is what LSM infrastructure management
do, so remove this redundant code.
Signed-off-by: Xiu Jianfeng <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
|
|
Simplify the code by using kstrndup instead of kzalloc and strncpy in
smk_parse_smack(), which meanwhile remove strncpy as [1] suggests.
[1]: https://github.com/KSPP/linux/issues/90
Signed-off-by: GONG, Ruiqi <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2022-07-28
Jacob Keller says:
Convert all of the Intel drivers with PTP support to the newer .adjfine
implementation which uses scaled parts per million.
This improves the precision of the frequency adjustments by taking advantage
of the full scaled parts per million input coming from user space.
In addition, all implementations are converted to using the
mul_u64_u64_div_u64 function which better handles the intermediate value.
This function supports architecture specific instructions where possible to
avoid loss of precision if the normal 64-bit multiplication would overflow.
Of note, the i40e implementation is now able to avoid loss of precision on
slower link speeds by taking advantage of this to multiply by the link speed
factor first. This results in a significantly more precise adjustment by
allowing the calculation to impact the lower bits.
This also gets us a step closer to being able to remove the .adjfreq
entirely by removing its use from many drivers.
I plan to follow this up with a series to update the drivers from other
vendors and drop the .adjfreq implementation entirely.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igb: convert .adjfreq to .adjfine
ixgbe: convert .adjfreq to .adjfine
i40e: convert .adjfreq to .adjfine
i40e: use mul_u64_u64_div_u64 for PTP frequency calculation
e1000e: convert .adjfreq to .adjfine
e1000e: remove unnecessary range check in e1000e_phc_adjfreq
ice: implement adjfine with mul_u64_u64_div_u64
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add fsl,imx8ulp-fec for i.MX8ULP platform.
Signed-off-by: Wei Fang <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The use of kmap() is being deprecated in favor of kmap_local_page()
where it is feasible. For kmap around a memcpy there's a convenience
helper memcpy_to_page that also makes the flush_dcache_page() redundant.
CC: Fabio M. De Francesco <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Highlights include a major rework of our kPTI page-table rewriting
code (which makes it both more maintainable and considerably faster in
the cases where it is required) as well as significant changes to our
early boot code to reduce the need for data cache maintenance and
greatly simplify the KASLR relocation dance.
Summary:
- Remove unused generic cpuidle support (replaced by PSCI version)
- Fix documentation describing the kernel virtual address space
- Handling of some new CPU errata in Arm implementations
- Rework of our exception table code in preparation for handling
machine checks (i.e. RAS errors) more gracefully
- Switch over to the generic implementation of ioremap()
- Fix lockdep tracking in NMI context
- Instrument our memory barrier macros for KCSAN
- Rework of the kPTI G->nG page-table repainting so that the MMU
remains enabled and the boot time is no longer slowed to a crawl
for systems which require the late remapping
- Enable support for direct swapping of 2MiB transparent huge-pages
on systems without MTE
- Fix handling of MTE tags with allocating new pages with HW KASAN
- Expose the SMIDR register to userspace via sysfs
- Continued rework of the stack unwinder, particularly improving the
behaviour under KASAN
- More repainting of our system register definitions to match the
architectural terminology
- Improvements to the layout of the vDSO objects
- Support for allocating additional bits of HWCAP2 and exposing
FEAT_EBF16 to userspace on CPUs that support it
- Considerable rework and optimisation of our early boot code to
reduce the need for cache maintenance and avoid jumping in and out
of the kernel when handling relocation under KASLR
- Support for disabling SVE and SME support on the kernel
command-line
- Support for the Hisilicon HNS3 PMU
- Miscellanous cleanups, trivial updates and minor fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (136 commits)
arm64: Delay initialisation of cpuinfo_arm64::reg_{zcr,smcr}
arm64: fix KASAN_INLINE
arm64/hwcap: Support FEAT_EBF16
arm64/cpufeature: Store elf_hwcaps as a bitmap rather than unsigned long
arm64/hwcap: Document allocation of upper bits of AT_HWCAP
arm64: enable THP_SWAP for arm64
arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52
arm64: errata: Remove AES hwcap for COMPAT tasks
arm64: numa: Don't check node against MAX_NUMNODES
drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX
perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node()
docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING
arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
mm: kasan: Skip unpoisoning of user pages
mm: kasan: Ensure the tags are visible before the tag in page->flags
drivers/perf: hisi: add driver for HNS3 PMU
drivers/perf: hisi: Add description for HNS3 PMU driver
drivers/perf: riscv_pmu_sbi: perf format
perf/arm-cci: Use the bitmap API to allocate bitmaps
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Use RNG seed from bootinfo block on virt platform
- defconfig updates
- Minor fixes and improvements
* tag 'm68k-for-v5.20-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: defconfig: Update defconfigs for v5.19-rc1
m68k: Add common forward declaration for show_registers()
m68k: mac: Remove forward declaration for mac_nmi_handler()
m68k: virt: Fix missing platform_device_unregister() on error in virt_platform_init()
m68k: virt: Use RNG seed from bootinfo block
m68k: bitops: Change __fls to return and accept unsigned long
m68k: Kconfig.machine: Add endif comment
m68k: Kconfig.debug: Replace single quotes
m68k: Kconfig.cpu: Fix indentation and add endif comments
m68k: q40: Align '*' in comments
m68k: sun3: Use __func__ to get function's name in an output message
m68k: mac: Fix typos in comments
m68k: virt: Kconfig minor fixes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kdump updates from Borislav Petkov:
- Add the ability to pass early an RNG seed to the kernel from the boot
loader
- Add the ability to pass the IMA measurement of kernel and bootloader
to the kexec-ed kernel
* tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/setup: Use rng seeds from setup_data
x86/kexec: Carry forward IMA measurement log on kexec
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
- Fix stack protector builds when cross compiling with Clang
- Other Kbuild improvements and fixes
* tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Omit use of bin2c
x86/purgatory: Hard-code obj-y in Makefile
x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o
x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Have invalid MSR accesses warnings appear only once after a
pr_warn_once() change broke that
- Simplify {JMP,CALL}_NOSPEC and let the objtool retpoline patching
infra take care of them instead of having unreadable alternative
macros there
* tag 'x86_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/extable: Fix ex_handler_msr() print condition
x86,nospec: Simplify {JMP,CALL}_NOSPEC
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp
- Free the pmem platform device on the registration error path
* tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hwmon: (k10temp): Add support for new family 17h and 19h models
x86/amd_nb: Add AMD PCI IDs for SMN communication
x86/pmem: Fix platform-device leak in error path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Remove the vendor check when selecting MWAIT as the default idle
state
- Respect idle=nomwait when supplied on the kernel cmdline
- Two small cleanups
* tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Use MSR_IA32_MISC_ENABLE constants
x86: Fix comment for X86_FEATURE_ZEN
x86: Remove vendor checks from prefer_mwait_c1_over_halt
x86: Handle idle=nomwait cmdline properly for x86_idle
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu update from Borislav Petkov:
- Add machinery to initialize AMX register state in order for
AMX-capable CPUs to be able to enter deeper low-power state
* tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Add a new flag to initialize the AMX state
x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Rename a PKRU macro to make more sense when reading the code
- Update pkeys documentation
- Avoid reading contended mm's TLB generation var if not absolutely
necessary along with fixing a case where arch_tlbbatch_flush()
doesn't adhere to the generation scheme and thus violates the
conditions for the above avoidance.
* tag 'x86_mm_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/tlb: Ignore f->new_tlb_gen when zero
x86/pkeys: Clarify PKRU_AD_KEY macro
Documentation/protection-keys: Clean up documentation for User Space pkeys
x86/mm/tlb: Avoid reading mm_tlb_gen when possible
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanup from Borislav Petkov:
- A single CONFIG_ symbol correction in a comment
* tag 'x86_cleanups_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Refer to the intended config STRICT_DEVMEM in a comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vmware cleanup from Borislav Petkov:
- A single statement simplification by using the BIT() macro
* tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vmware: Use BIT() macro for shifting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
"A single RAS change:
- Probe whether hardware error injection (direct MSR writes) is
possible when injecting errors on AMD platforms. In some cases, the
platform could prohibit those"
* tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Check whether writes to MCA_STATUS are getting ignored
|