| Age | Commit message (Collapse) | Author | Files | Lines |
|
Per the reasoning in commit 4bf7fda4dce2 ("iommu/dma: Add config for
PCI SAC address trick") and its subsequent revert, this mechanism no
longer serves its original purpose, but now only works around broken
hardware/drivers in a way that is unfortunately too impactful to remove.
This does not, however, prevent us from solving the performance impact
which that workaround has on large-scale systems that don't need it.
Once the 32-bit IOVA space fills up and a workload starts allocating and
freeing on both sides of the boundary, the opportunistic SAC allocation
can then end up spending significant time hunting down scattered
fragments of free 32-bit space, or just reestablishing max32_alloc_size.
This can easily be exacerbated by a change in allocation pattern, such
as by changing the network MTU, which can increase pressure on the
32-bit space by leaving a large quantity of cached IOVAs which are now
the wrong size to be recycled, but also won't be freed since the
non-opportunistic allocations can still be satisfied from the whole
64-bit space without triggering the reclaim path.
However, in the context of a workaround where smaller DMA addresses
aren't simply a preference but a necessity, if we get to that point at
all then in fact it's already the endgame. The nature of the allocator
is currently such that the first IOVA we give to a device after the
32-bit space runs out will be the highest possible address for that
device, ever. If that works, then great, we know we can optimise for
speed by always allocating from the full range. And if it doesn't, then
the worst has already happened and any brokenness is now showing, so
there's little point in continuing to try to hide it.
To that end, implement a flag to refine the SAC business into a
per-device policy that can automatically get itself out of the way if
and when it stops being useful.
CC: Linus Torvalds <[email protected]>
CC: Jakub Kicinski <[email protected]>
Reviewed-by: John Garry <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Tested-by: Vasant Hegde <[email protected]>
Tested-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/b8502b115b915d2a3fabde367e099e39106686c8.1681392791.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <[email protected]>
|
|
The Documentation/process/deprecated.rst suggests to use flexible array
members to provide a way to declare having a dynamically sized set of
trailing elements in a structure.This makes code robust agains bunch of
the issues described in the documentation, main of which is about the
correctness of the sizeof() calculation for this data structure.
Due to above, prefer struct_size() over open-coded versions.
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
Some of the PM features can be locked or disabled. In that case, write
interface can be locked.
This status is read via a mailbox. There is one TPMI ID which provides
base address for interface and data register for mail box operation.
The mailbox operations is defined in the TPMI specification. Refer to
https://github.com/intel/tpmi_power_management/ for TPMI specifications.
An API is exposed to feature drivers to read feature control status.
Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Immutable branch between pdx86 simatic branch and LED due for the v6.6 merge window
v6.5-rc1 + recent pdx86 simatic-ipc patches for
merging into the LED subsystem for v6.6.
|
|
This is the panel variant of a device we already did have. All the same,
just no LEDs.
Signed-off-by: Henning Schild <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
|
|
Siemens Simatic Industrial PCs can monitor the voltage of the CMOS
battery with two bits that indicate low or empty state. This can be GPIO
or PortIO based.
Here we model that as a hwmon voltage. The core driver does the PortIO
and provides boilerplate for the GPIO versions. Which are split out to
model runtime dependencies while allowing fine-grained kernel
configuration.
Signed-off-by: Henning Schild <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Add driver for TI DS90UB960 FPD-Link III Deserializer.
Signed-off-by: Tomi Valkeinen <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
An ATR is a device that looks similar to an i2c-mux: it has an I2C
slave "upstream" port and N master "downstream" ports, and forwards
transactions from upstream to the appropriate downstream port. But it
is different in that the forwarded transaction has a different slave
address. The address used on the upstream bus is called the "alias"
and is (potentially) different from the physical slave address of the
downstream chip.
Add a helper file (just like i2c-mux.c for a mux or switch) to allow
implementing ATR features in a device driver. The helper takes care of
adapter creation/destruction and translates addresses at each transaction.
Signed-off-by: Luca Ceresoli <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Signed-off-by: Sakari Ailus <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
|
|
This adds support for the Siemens Simatic IPC model BX-21A. Actual
drivers for that model will be sent in separate patches.
Signed-off-by: Henning Schild <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
|
|
Add the following unlocked accessors to complete the set:
__mdiobus_modify()
__mdiodev_read()
__mdiodev_write()
__mdiodev_modify()
__mdiodev_modify_changed()
which we will need for Marvell DSA PCS conversion.
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a function, phylink_pcs_change() which can be used by PCs drivers
to notify phylink about changes to the PCS link state.
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add hooks that are called before and after the mac_config() call,
which will be needed to deal with errata workarounds for the
Marvell 88e639x DSA switches.
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add phylink PCS enable/disable callbacks that will allow us to place
IEEE 802.3 register compliant PCS in power-down mode while not being
used.
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't
need to modify their sk argument. Make that explicit using const.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These functions don't need to modify the socket, so let's allow the
callers to pass a const struct sock *.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Avoid having to look up a dummy item from SMEM to detect if it is
already available or if we need to defer probing.
Reviewed-by: Konrad Dybcio <[email protected]>
Signed-off-by: Stephan Gerhold <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Andersson <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the boolean field of the plat_stmmacenet_data structure in favor of a
simple bitfield flag.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
struct plat_stmmacenet_data contains several boolean fields that could be
easily replaced with a common integer 'flags' bitfield and bit defines.
Start the process with the has_integrated_pcs field.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andrew Halaney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-07-13
We've added 67 non-merge commits during the last 15 day(s) which contain
a total of 106 files changed, 4444 insertions(+), 619 deletions(-).
The main changes are:
1) Fix bpftool build in presence of stale vmlinux.h,
from Alexander Lobakin.
2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress,
from Alexei Starovoitov.
3) Teach verifier actual bounds of bpf_get_smp_processor_id()
and fix perf+libbpf issue related to custom section handling,
from Andrii Nakryiko.
4) Introduce bpf map element count, from Anton Protopopov.
5) Check skb ownership against full socket, from Kui-Feng Lee.
6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong.
7) Export rcu_request_urgent_qs_task, from Paul E. McKenney.
8) Fix BTF walking of unions, from Yafang Shao.
9) Extend link_info for kprobe_multi and perf_event links,
from Yafang Shao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits)
selftests/bpf: Add selftest for PTR_UNTRUSTED
bpf: Fix an error in verifying a field in a union
selftests/bpf: Add selftests for nested_trust
bpf: Fix an error around PTR_UNTRUSTED
selftests/bpf: add testcase for TRACING with 6+ arguments
bpf, x86: allow function arguments up to 12 for TRACING
bpf, x86: save/restore regs with BPF_DW size
bpftool: Use "fallthrough;" keyword instead of comments
bpf: Add object leak check.
bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
selftests/bpf: Improve test coverage of bpf_mem_alloc.
rcu: Export rcu_request_urgent_qs_task()
bpf: Allow reuse from waiting_for_gp_ttrace list.
bpf: Add a hint to allocated objects.
bpf: Change bpf_mem_cache draining process.
bpf: Further refactor alloc_bulk().
bpf: Factor out inc/dec of active flag into helpers.
bpf: Refactor alloc_bulk().
bpf: Let free_all() return the number of freed elements.
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: [email protected]
Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <[email protected]>
Suggested-by: Zenghui Yu <[email protected]>
Tested-by: Xiang Chen <[email protected]>
Co-developed-by: Oliver Upton <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Acked-by: Zenghui Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Oliver Upton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter, wireless and ebpf.
Current release - regressions:
- netfilter: conntrack: gre: don't set assured flag for clash entries
- wifi: iwlwifi: remove 'use_tfh' config to fix crash
Previous releases - regressions:
- ipv6: fix a potential refcount underflow for idev
- icmp6: ifix null-ptr-deref of ip6_null_entry->rt6i_idev in
icmp6_dev()
- bpf: fix max stack depth check for async callbacks
- eth: mlx5e:
- check for NOT_READY flag state after locking
- fix page_pool page fragment tracking for XDP
- eth: igc:
- fix tx hang issue when QBV gate is closed
- fix corner cases for TSN offload
- eth: octeontx2-af: Move validation of ptp pointer before its usage
- eth: ena: fix shift-out-of-bounds in exponential backoff
Previous releases - always broken:
- core: prevent skb corruption on frag list segmentation
- sched:
- cls_fw: fix improper refcount update leads to use-after-free
- sch_qfq: account for stab overhead in qfq_enqueue
- netfilter:
- report use refcount overflow
- prevent OOB access in nft_byteorder_eval
- wifi: mt7921e: fix init command fail with enabled device
- eth: ocelot: fix oversize frame dropping for preemptible TCs
- eth: fec: recycle pages for transmitted XDP frames"
* tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
selftests: tc-testing: add test for qfq with stab overhead
net/sched: sch_qfq: account for stab overhead in qfq_enqueue
selftests: tc-testing: add tests for qfq mtu sanity check
net/sched: sch_qfq: reintroduce lmax bound check for MTU
wifi: cfg80211: fix receiving mesh packets without RFC1042 header
wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
net: txgbe: fix eeprom calculation error
net/sched: make psched_mtu() RTNL-less safe
net: ena: fix shift-out-of-bounds in exponential backoff
netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
net/sched: flower: Ensure both minimum and maximum ports are specified
MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
docs: netdev: update the URL of the status page
wifi: iwlwifi: remove 'use_tfh' config to fix crash
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
wifi: airo: avoid uninitialized warning in airo_get_rate()
octeontx2-pf: Add additional check for MCAM rules
net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
net: fec: use netdev_err_once() instead of netdev_err()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix some missing-prototype warnings
- Fix user events struct args (did not include size of struct)
When creating a user event, the "struct" keyword is to denote that
the size of the field will be passed in. But the parsing failed to
handle this case.
- Add selftest to struct sizes for user events
- Fix sample code for direct trampolines.
The sample code for direct trampolines attached to handle_mm_fault().
But the prototype changed and the direct trampoline sample code was
not updated. Direct trampolines needs to have the arguments correct
otherwise it can fail or crash the system.
- Remove unused ftrace_regs_caller_ret() prototype.
- Quiet false positive of FORTIFY_SOURCE
Due to backward compatibility, the structure used to save stack
traces in the kernel had a fixed size of 8. This structure is
exported to user space via the tracing format file. A change was made
to allow more than 8 functions to be recorded, and user space now
uses the size field to know how many functions are actually in the
stack.
But the structure still has size of 8 (even though it points into the
ring buffer that has the required amount allocated to hold a full
stack.
This was fine until the fortifier noticed that the
memcpy(&entry->caller, stack, size) was greater than the 8 functions
and would complain at runtime about it.
Hide this by using a pointer to the stack location on the ring buffer
instead of using the address of the entry structure caller field.
- Fix a deadloop in reading trace_pipe that was caused by a mismatch
between ring_buffer_empty() returning false which then asked to read
the data, but the read code uses rb_num_of_entries() that returned
zero, and causing a infinite "retry".
- Fix a warning caused by not using all pages allocated to store ftrace
functions, where this can happen if the linker inserts a bunch of
"NULL" entries, causing the accounting of how many pages needed to be
off.
- Fix histogram synthetic event crashing when the start event is
removed and the end event is still using a variable from it
- Fix memory leak in freeing iter->temp in tracing_release_pipe()
* tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak of iter->temp when reading trace_pipe
tracing/histograms: Add histograms to hist_vars if they have referenced variables
tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
ring-buffer: Fix deadloop issue on reading trace_pipe
tracing: arm64: Avoid missing-prototype warnings
selftests/user_events: Test struct size match cases
tracing/user_events: Fix struct arg size match check
x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
arm64: ftrace: Add direct call trampoline samples support
samples: ftrace: Save required argument registers in sample trampolines
|
|
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.5
- Don't require quirk to use duplicate namespace identifiers
(Christoph, Sagi)
- One more BOGUS_NID quirk (Pankaj)
- IO timeout and error hanlding fixes for PCI (Keith)
- Enhanced metadata format mask fix (Ankit)
- Association race condition fix for fibre channel (Michael)
- Correct debugfs error checks (Minjie)
- Use PAGE_SECTORS_SHIFT where needed (Damien)
- Reduce kernel logs for legacy nguid attribute (Keith)
- Use correct dma direction when unmapping metadata (Ming)"
* tag 'nvme-6.5-2023-07-13' of git://git.infradead.org/nvme:
nvme-pci: fix DMA direction of unmapping integrity data
nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
nvme: ensure disabling pairs with unquiesce
nvme-fc: fix race between error recovery and creating association
nvme-fc: return non-zero status code when fails to create association
nvme: fix parameter check in nvme_fault_inject_init()
nvme: warn only once for legacy uuid attribute
nvme: fix the NVME_ID_NS_NVM_STS_MASK definition
nvmet: use PAGE_SECTORS_SHIFT
nvme: add BOGUS_NID quirk for Samsung SM953
|
|
Make 'rom_attr_enabled' a single bit in a bitfield and move it close to an
existing bitfield so that they can be merged.
This field is only used in 'drivers/pci/pci-sysfs.c' to store 0 or 1.
On x86_64, this shrinks the size of 'struct pci_dev' by 8 bytes from 3560
to 3552.
Link: https://lore.kernel.org/r/d7a34ad369336db73145c3efbade895e792a0ad3.1687105455.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
Group some bitfield variables to reduce holes. On x86_64, this shrinks the
size of 'struct pci_dev' by 16 bytes (from 3576 to 3560) when compiled with
'allmodconfig'.
Link: https://lore.kernel.org/r/407b17c3e56764ef2c558898d4ff4c6c04b2d757.1687105455.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
pci_enable_pcie_error_reporting() is used only inside aer.c. Stop exposing
it outside the file.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
|
|
pci_disable_pcie_error_reporting() has no callers. Remove it.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
|
|
As core scheduling introduced, a new state of idle is defined as
force idle, running idle task but nr_running greater than zero.
If a cpu is in force idle state, idle_cpu() will return zero. This
result makes sense in some scenarios, e.g., load balance,
showacpu when dumping, and judge the RCU boost kthread is starving.
But this will cause error in other scenarios, e.g., tick_irq_exit():
When force idle, rq->curr == rq->idle but rq->nr_running > 0, results
that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu()
is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
not become 1, which became 0 in tick_nohz_irq_enter().
ts->idle_sleeptime won't update in function update_ts_time_stats(), if
ts->idle_active is 0, which should be 1. And this bug will result that
ts->idle_sleeptime is less than the actual value, and finally will
result that the idle time in /proc/stat is less than the actual value.
To solve this problem, we introduce sched_core_idle_cpu(), which
returns 1 when force idle. We audit all users of idle_cpu(), and
change idle_cpu() into sched_core_idle_cpu() in function
tick_irq_exit().
v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
function tick_irq_exit(). And modify the corresponding commit log.
Signed-off-by: Cruz Zhao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Joel Fernandes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We currently export the total throttled time for cgroups that are given
a bandwidth limit. This patch extends this accounting to also account
the total time that each children cgroup has been throttled.
This is useful to understand the degree to which children have been
affected by the throttling control. Children which are not runnable
during the entire throttled period, for example, will not show any
self-throttling time during this period.
Expose this in a new interface, 'cpu.stat.local', which is similar to
how non-hierarchical events are accounted in 'memory.events.local'.
Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In put_task_struct(), a spin_lock is indirectly acquired under the kernel
stock. When running the kernel in real-time (RT) configuration, the
operation is dispatched to a preemptible context call to ensure
guaranteed preemption. However, if PROVE_RAW_LOCK_NESTING is enabled
and __put_task_struct() is called while holding a raw_spinlock, lockdep
incorrectly reports an "Invalid lock context" in the stock kernel.
This false splat occurs because lockdep is unaware of the different
route taken under RT. To address this issue, override the inner wait
type to prevent the false lockdep splat.
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Sebastian Andrzej Siewior <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from an non-preemptible context.
One practical example is splat inside inactive_task_timer(), which is
called in a interrupt context:
CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
Call Trace:
dump_stack_lvl+0x57/0x7d
mark_lock_irq.cold+0x33/0xba
mark_lock+0x1e7/0x400
mark_usage+0x11d/0x140
__lock_acquire+0x30d/0x930
lock_acquire.part.0+0x9c/0x210
rt_spin_lock+0x27/0xe0
refill_obj_stock+0x3d/0x3a0
kmem_cache_free+0x357/0x560
inactive_task_timer+0x1ad/0x340
__run_hrtimer+0x8a/0x1a0
__hrtimer_run_queues+0x91/0x130
hrtimer_interrupt+0x10f/0x220
__sysvec_apic_timer_interrupt+0x7b/0xd0
sysvec_apic_timer_interrupt+0x4f/0xd0
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7fff196bf6f5
Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.
The issue is reproducible with stress-ng:
while true; do
stress-ng --sched deadline --sched-period 1000000000 \
--sched-runtime 800000000 --sched-deadline \
1000000000 --mmapfork 23 -t 20
done
Reported-by: Hu Chunyu <[email protected]>
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Valentin Schneider <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Eric Dumazet says[1]:
-------
Speaking of psched_mtu(), I see that net/sched/sch_pie.c is using it
without holding RTNL, so dev->mtu can be changed underneath.
KCSAN could issue a warning.
-------
Annotate dev->mtu with READ_ONCE() so KCSAN don't issue a warning.
[1] https://lore.kernel.org/all/CANn89iJoJO5VtaJ-2=_d2aOQhb0Xw8iBT_Cxqp2HyuS-zj6azw@mail.gmail.com/
v1 -> v2: Fix commit message
Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
objects into free_by_rcu_ttrace list where they are waiting for RCU
task trace grace period to be freed into slab.
The life cycle of objects:
alloc: dequeue free_llist
free: enqeueu free_llist
free_rcu: enqueue free_by_rcu -> waiting_for_gp
free_llist above high watermark -> free_by_rcu_ttrace
after RCU GP waiting_for_gp -> free_by_rcu_ttrace
free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
If a CPU is executing a long series of non-sleeping system calls,
RCU grace periods can be delayed for on the order of a couple hundred
milliseconds. This is normally not a problem, but if each system call
does a call_rcu(), those callbacks can stack up. RCU will eventually
notice this callback storm, but use of rcu_request_urgent_qs_task()
allows the code invoking call_rcu() to give RCU a heads up.
This function is not for general use, not yet, anyway.
Reported-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- Fix fprobe's rethook release issues:
- Release rethook after ftrace_ops is unregistered so that the
rethook is not accessed after free.
- Stop rethook before ftrace_ops is unregistered so that the
rethook is NOT used after exiting unregister_fprobe()
- Fix eprobe cleanup logic. If it attaches to multiple events and
failes to enable one of them, rollback all enabled events correctly.
- Fix fprobe to unlock ftrace recursion lock correctly when it missed
by another running kprobe.
- Cleanup kprobe to remove unnecessary NULL.
- Cleanup kprobe to remove unnecessary 0 initializations.
* tag 'probes-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
kernel: kprobes: Remove unnecessary ‘0’ values
kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addr
fprobe: add unlock to match a succeeded ftrace_test_recursion_trylock
kernel/trace: Fix cleanup logic of enable_trace_eprobe
fprobe: Release rethook after the ftrace_ops is unregistered
|
|
These are all tracing W=1 warnings in arm64 allmodconfig about missing
prototypes:
kernel/trace/trace_kprobe_selftest.c:7:5: error: no previous prototype for 'kprobe_trace_selftest_target' [-Werror=missing-pro
totypes]
kernel/trace/ftrace.c:329:5: error: no previous prototype for '__register_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:372:5: error: no previous prototype for '__unregister_ftrace_function' [-Werror=missing-prototypes]
kernel/trace/ftrace.c:4130:15: error: no previous prototype for 'arch_ftrace_match_adjust' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:243:15: error: no previous prototype for 'ftrace_return_to_handler' [-Werror=missing-prototypes]
kernel/trace/fgraph.c:358:6: error: no previous prototype for 'ftrace_graph_sleep_time_control' [-Werror=missing-prototypes]
arch/arm64/kernel/ftrace.c:460:6: error: no previous prototype for 'prepare_ftrace_return' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2172:5: error: no previous prototype for 'syscall_trace_enter' [-Werror=missing-prototypes]
arch/arm64/kernel/ptrace.c:2195:6: error: no previous prototype for 'syscall_trace_exit' [-Werror=missing-prototypes]
Move the declarations to an appropriate header where they can be seen
by the caller and callee, and make sure the headers are included where
needed.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Florent Revest <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
[ Fixed ftrace_return_to_handler() to handle CONFIG_HAVE_FUNCTION_GRAPH_RETVAL case ]
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Expose various CPU and dGPU tunables that are available on many ASUS
ROG laptops. The tunables shown in sysfs will vary depending on the CPU
and dGPU vendor.
All of these variables are write only and there is no easy way to find
what the defaults are. In general they seem to default to the max value
the vendor sets for the CPU and dGPU package - this is not the same as
the min/max writable value. Values written to these variables that are
beyond the capabilities of the CPU are ignored by the laptop.
Signed-off-by: Luke D. Jones <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Support changing the mini-LED mode on some of the newer ASUS laptops.
Signed-off-by: Luke D. Jones <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Exposes the WMI method which tells if the eGPU is properly connected
on the devices that support it.
Signed-off-by: Luke D. Jones <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|