Age | Commit message (Collapse) | Author | Files | Lines |
|
The direct trampoline and graph coexistence test sets global_ops to
trace only 'trace_selftest_dynamic_test_func', but does not reset it
after the test is completed, resulting in the function filter being set
already after the system starts. Although it can be reset through the
tracefs interface, it is more or less confusing to the user, and we
should reset it to trace all functions after the trampoline/graph test
completes.
Link: https://lkml.kernel.org/r/20220427034119.24668-1-lihuafei1@huawei.com
Link: https://lore.kernel.org/all/20220418073958.104029-1-lihuafei1@huawei.com/
Fixes: 130c08065848 ("tracing: Add trampoline/graph selftest")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The print fmt check against trace events to make sure that the format does
not use pointers that may be freed from the time of the trace to the time
the event is read, gives a false positive on %pISpc when reading data that
was saved in __get_dynamic_array() when it is perfectly fine to do so, as
the data being read is on the ring buffer.
Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/
Cc: stable@vger.kernel.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In __sanitizer_cov_trace_pc(), previously we write pc before updating pos.
However, some early interrupt code could bypass check_kcov_mode() check
and invoke __sanitizer_cov_trace_pc(). If such interrupt is raised
between writing pc and updating pos, the pc could be overitten by the
recursive __sanitizer_cov_trace_pc().
As suggested by Dmitry, we cold update pos before writing pc to avoid such
interleaving.
Apply the same change to write_comp_data().
Link: https://lkml.kernel.org/r/20220523053531.1572793-1-liu3101@purdue.edu
Signed-off-by: Congyu Liu <liu3101@purdue.edu>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding to
very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space to
remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections that
have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting. This
makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core from
Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection"
* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
ptp: ocp: Add firmware header checks
ptp: ocp: fix PPS source selector debugfs reporting
ptp: ocp: add .init function for sma_op vector
ptp: ocp: vectorize the sma accessor functions
ptp: ocp: constify selectors
ptp: ocp: parameterize input/output sma selectors
ptp: ocp: revise firmware display
ptp: ocp: add Celestica timecard PCI ids
ptp: ocp: Remove #ifdefs around PCI IDs
ptp: ocp: 32-bit fixups for pci start address
Revert "net/smc: fix listen processing for SMC-Rv2"
ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
selftests/bpf: Dynptr tests
bpf: Add dynptr data slices
bpf: Add bpf_dynptr_read and bpf_dynptr_write
bpf: Dynptr support for ring buffers
bpf: Add bpf_dynptr_from_mem for local dynptrs
bpf: Add verifier support for dynptrs
bpf: Suppress 'passing zero to PTR_ERR' warning
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
...
|
|
Pull workqueue update from Tejun Heo:
"A lone commit fixing CPU offline handling for per-cpu wq workers so
that they don't bother isolated CPUs"
* 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too interesting. This adds cpu controller selftests and there
are a couple code cleanup patches"
* 'for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: remove the superfluous judgment
cgroup: Make cgroup_debug static
kseltest/cgroup: Make test_stress.sh work if run interactively
kselftest/cgroup: fix test_stress.sh to use OUTPUT dir
cgroup: Add config file to cgroup selftest suite
cgroup: Add test_cpucg_max_nested() testcase
cgroup: Add test_cpucg_max() testcase
cgroup: Add test_cpucg_nested_weight_underprovisioned() testcase
cgroup: Adding test_cpucg_nested_weight_overprovisioned() testcase
cgroup: Add test_cpucg_weight_underprovisioned() testcase
cgroup: Add test_cpucg_weight_overprovisioned() testcase
cgroup: Add test_cpucg_stats() testcase to cgroup cpu selftests
cgroup: Add new test_cpu.c test suite in cgroup selftests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several fixes, cleanups, and enhancements to tests and framework:
- introduce _NULL and _NOT_NULL macros to pointer error checks
- rework kunit_resource allocation policy to fix memory leaks when
caller doesn't specify free() function to be used when allocating
memory using kunit_add_resource() and kunit_alloc_resource() funcs.
- add ability to specify suite-level init and exit functions"
* tag 'linux-kselftest-kunit-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (41 commits)
kunit: tool: Use qemu-system-i386 for i386 runs
kunit: fix executor OOM error handling logic on non-UML
kunit: tool: update riscv QEMU config with new serial dependency
kcsan: test: use new suite_{init,exit} support
kunit: tool: Add list of all valid test configs on UML
kunit: take `kunit_assert` as `const`
kunit: tool: misc cleanups
kunit: tool: minor cosmetic cleanups in kunit_parser.py
kunit: tool: make parser stop overwriting status of suites w/ no_tests
kunit: tool: remove dead parse_crash_in_log() logic
kunit: tool: print clearer error message when there's no TAP output
kunit: tool: stop using a shell to run kernel under QEMU
kunit: tool: update test counts summary line format
kunit: bail out of test filtering logic quicker if OOM
lib/Kconfig.debug: change KUnit tests to default to KUNIT_ALL_TESTS
kunit: Rework kunit_resource allocation policy
kunit: fix debugfs code to use enum kunit_status, not bool
kfence: test: use new suite_{init/exit} support, add .kunitconfig
kunit: add ability to specify suite-level init and exit functions
kunit: rename print_subtest_{start,end} for clarity (s/subtest/suite)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Offload writing printk() messages on consoles to per-console
kthreads.
It prevents soft-lockups when an extensive amount of messages is
printed. It was observed, for example, during boot of large systems
with a lot of peripherals like disks or network interfaces.
It prevents live-lockups that were observed, for example, when
messages about allocation failures were reported and a CPU handled
consoles instead of reclaiming the memory. It was hard to solve even
with rate limiting because it would need to take into account the
amount of messages and the speed of all consoles.
It is a must to have for real time. Otherwise, any printk() might
break latency guarantees.
The per-console kthreads allow to handle each console on its own
speed. Slow consoles do not longer slow down faster ones. And
printk() does not longer unpredictably slows down various code paths.
There are situations when the kthreads are either not available or
not reliable, for example, early boot, suspend, or panic. In these
situations, printk() uses the legacy mode and tries to handle
consoles immediately.
- Add documentation for the printk index.
* tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk, tracing: fix console tracepoint
printk: remove @console_locked
printk: extend console_lock for per-console locking
printk: add kthread console printers
printk: add functions to prefer direct printing
printk: add pr_flush()
printk: move buffer definitions into console_emit_next_record() caller
printk: refactor and rework printing logic
printk: add con_printk() macro for console details
printk: call boot_delay_msec() in printk_delay()
printk: get caller_id/timestamp after migration disable
printk: wake waiters for safe and NMI contexts
printk: wake up all waiters
printk: add missing memory barrier to wake_up_klogd()
printk: cpu sync always disable interrupts
printk: rename cpulock functions
printk/index: Printk index feature documentation
MAINTAINERS: Add printk indexing maintainers on mention of printk_index
|
|
We're unconditionally registering sys-off handler for the legacy
pm_power_off() callback, this causes problem for platforms that don't
use power-off handlers at all and should be halted. Now reboot syscall
assumes that there is a power-off handler installed and tries to power
off system instead of halting it.
To fix the trouble, move the handler's registration to the reboot syscall
and check the pm_power_off() presence.
Fixes: 0e2110d2e910 ("kernel/reboot: Add kernel_can_power_off()")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add support for 'artificial' Energy Models in which power
numbers for different entities may be in different scales, add support
for some new hardware, fix bugs and clean up code in multiple places.
Specifics:
- Update the Energy Model support code to allow the Energy Model to
be artificial, which means that the power values may not be on a
uniform scale with other devices providing power information, and
update the cpufreq_cooling and devfreq_cooling thermal drivers to
support artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add CPU-based scaling support to passive devfreq governor (Saravana
Kannan, Chanwoo Choi).
- Update the rk3399_dmc devfreq driver (Brian Norris).
- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).
- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).
- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).
- Avoid device PM-runtime usage count underflows (Rafael Wysocki).
- Allow dynamic debug to control printing of PM messages (David
Cohen).
- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).
- Preserve ACPI-table override during hibernation (Amadeusz
Sławiński).
- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).
- Add AlderLake processor support to the intel_idle driver (Zhang
Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
- Fix cpufreq governor clean up code to avoid using kfree() directly
to free kobject-based items (Kevin Hao).
- Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
Leroy).
- Make intel_pstate notify frequency invariance code when no_turbo is
turned on and off (Chen Yu).
- Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
Pandruvada).
- Make cpufreq avoid unnecessary frequency updates due to mismatch
between hardware and the frequency table (Viresh Kumar).
- Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
code (Viresh Kumar).
- Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
calling convention for some driver callbacks consistent (Rafael
Wysocki).
- Avoid accessing half-initialized cpufreq policies from the show()
and store() sysfs functions (Schspa Shi).
- Rearrange cpufreq_offline() to make the calling convention for some
driver callbacks consistent (Schspa Shi).
- Update CPPC handling in cpufreq (Pierre Gondois).
- Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
- Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
Hansson).
- Improve the way genpd deals with its governors (Ulf Hansson).
- Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"
* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
PM: domains: Trust domain-idle-states from DT to be correct by genpd
PM: domains: Measure power-on/off latencies in genpd based on a governor
PM: domains: Allocate governor data dynamically based on a genpd governor
PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
PM: domains: Fix initialization of genpd's next_wakeup
PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
PM: domains: Measure suspend/resume latencies in genpd based on governor
PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
PM: domains: Allocate gpd_timing_data dynamically based on governor
PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
PM: domains: Drop redundant code for genpd always-on governor
PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
powercap: intel_rapl: remove redundant store to value after multiply
cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
cpufreq: CPPC: Enable fast_switch
ACPI: CPPC: Assume no transition latency if no PCCT
ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
ACPI: CPPC: Check _OSC for flexible address space
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
- Rework USER_NOTIF notification ordering and kill logic (Sargun
Dhillon)
- Improved PTRACE_O_SUSPEND_SECCOMP selftest (Jann Horn)
- Gracefully handle failed unshare() in selftests (Yang Guang)
- Spelling fix (Colin Ian King)
* tag 'seccomp-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: Fix spelling mistake "Coud" -> "Could"
selftests/seccomp: Add test for wait killable notifier
selftests/seccomp: Refactor get_proc_stat to split out file reading code
seccomp: Add wait_killable semantic to seccomp user notifier
selftests/seccomp: Ensure that notifications come in FIFO order
seccomp: Use FIFO semantics to order notifications
selftests/seccomp: Add SKIP for failed unshare()
selftests/seccomp: Test PTRACE_O_SUSPEND_SECCOMP without CAP_SYS_ADMIN
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening updates from Kees Cook:
- usercopy hardening expanded to check other allocation types (Matthew
Wilcox, Yuanzheng Song)
- arm64 stackleak behavioral improvements (Mark Rutland)
- arm64 CFI code gen improvement (Sami Tolvanen)
- LoadPin LSM block dev API adjustment (Christoph Hellwig)
- Clang randstruct support (Bill Wendling, Kees Cook)
* tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits)
loadpin: stop using bdevname
mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr()
gcc-plugins: randstruct: Remove cast exception handling
af_unix: Silence randstruct GCC plugin warning
niu: Silence randstruct warnings
big_keys: Use struct for internal payload
gcc-plugins: Change all version strings match kernel
randomize_kstack: Improve docs on requirements/rationale
lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n
arm64: entry: use stackleak_erase_on_task_stack()
stackleak: add on/off stack variants
lkdtm/stackleak: check stack boundaries
lkdtm/stackleak: prevent unexpected stack usage
lkdtm/stackleak: rework boundary management
lkdtm/stackleak: avoid spurious failure
stackleak: rework poison scanning
stackleak: rework stack high bound handling
stackleak: clarify variable names
stackleak: rework stack low bound handling
stackleak: remove redundant check
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"These updates continue to refine the work began in 5.17 and 5.18 of
modernizing the RNG's crypto and streamlining and documenting its
code.
New for 5.19, the updates aim to improve entropy collection methods
and make some initial decisions regarding the "premature next" problem
and our threat model. The cloc utility now reports that random.c is
931 lines of code and 466 lines of comments, not that basic metrics
like that mean all that much, but at the very least it tells you that
this is very much a manageable driver now.
Here's a summary of the various updates:
- The random_get_entropy() function now always returns something at
least minimally useful. This is the primary entropy source in most
collectors, which in the best case expands to something like RDTSC,
but prior to this change, in the worst case it would just return 0,
contributing nothing. For 5.19, additional architectures are wired
up, and architectures that are entirely missing a cycle counter now
have a generic fallback path, which uses the highest resolution
clock available from the timekeeping subsystem.
Some of those clocks can actually be quite good, despite the CPU
not having a cycle counter of its own, and going off-core for a
stamp is generally thought to increase jitter, something positive
from the perspective of entropy gathering. Done very early on in
the development cycle, this has been sitting in next getting some
testing for a while now and has relevant acks from the archs, so it
should be pretty well tested and fine, but is nonetheless the thing
I'll be keeping my eye on most closely.
- Of particular note with the random_get_entropy() improvements is
MIPS, which, on CPUs that lack the c0 count register, will now
combine the high-speed but short-cycle c0 random register with the
lower-speed but long-cycle generic fallback path.
- With random_get_entropy() now always returning something useful,
the interrupt handler now collects entropy in a consistent
construction.
- Rather than comparing two samples of random_get_entropy() for the
jitter dance, the algorithm now tests many samples, and uses the
amount of differing ones to determine whether or not jitter entropy
is usable and how laborious it must be. The problem with comparing
only two samples was that if the cycle counter was extremely slow,
but just so happened to be on the cusp of a change, the slowness
wouldn't be detected. Taking many samples fixes that to some
degree.
This, combined with the other improvements to random_get_entropy(),
should make future unification of /dev/random and /dev/urandom
maybe more possible. At the very least, were we to attempt it again
today (we're not), it wouldn't break any of Guenter's test rigs
that broke when we tried it with 5.18. So, not today, but perhaps
down the road, that's something we can revisit.
- We attempt to reseed the RNG immediately upon waking up from system
suspend or hibernation, making use of the various timestamps about
suspend time and such available, as well as the usual inputs such
as RDRAND when available.
- Batched randomness now falls back to ordinary randomness before the
RNG is initialized. This provides more consistent guarantees to the
types of random numbers being returned by the various accessors.
- The "pre-init injection" code is now gone for good. I suspect you
in particular will be happy to read that, as I recall you
expressing your distaste for it a few months ago. Instead, to avoid
a "premature first" issue, while still allowing for maximal amount
of entropy availability during system boot, the first 128 bits of
estimated entropy are used immediately as it arrives, with the next
128 bits being buffered. And, as before, after the RNG has been
fully initialized, it winds up reseeding anyway a few seconds later
in most cases. This resulted in a pretty big simplification of the
initialization code and let us remove various ad-hoc mechanisms
like the ugly crng_pre_init_inject().
- The RNG no longer pretends to handle the "premature next" security
model, something that various academics and other RNG designs have
tried to care about in the past. After an interesting mailing list
thread, these issues are thought to be a) mainly academic and not
practical at all, and b) actively harming the real security of the
RNG by delaying new entropy additions after a potential compromise,
making a potentially bad situation even worse. As well, in the
first place, our RNG never even properly handled the premature next
issue, so removing an incomplete solution to a fake problem was
particularly nice.
This allowed for numerous other simplifications in the code, which
is a lot cleaner as a consequence. If you didn't see it before,
https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
thread worth skimming through.
- While the interrupt handler received a separate code path years ago
that avoids locks by using per-cpu data structures and a faster
mixing algorithm, in order to reduce interrupt latency, input and
disk events that are triggered in hardirq handlers were still
hitting locks and more expensive algorithms. Those are now
redirected to use the faster per-cpu data structures.
- Rather than having the fake-crypto almost-siphash-based random32
implementation be used right and left, and in many places where
cryptographically secure randomness is desirable, the batched
entropy code is now fast enough to replace that.
- As usual, numerous code quality and documentation cleanups. For
example, the initialization state machine now uses enum symbolic
constants instead of just hard coding numbers everywhere.
- Since the RNG initializes once, and then is always initialized
thereafter, a pretty heavy amount of code used during that
initialization is never used again. It is now completely cordoned
off using static branches and it winds up in the .text.unlikely
section so that it doesn't reduce cache compactness after the RNG
is ready.
- A variety of functions meant for waiting on the RNG to be
initialized were only used by vsprintf, and in not a particularly
optimal way. Replacing that usage with a more ordinary setup made
it possible to remove those functions.
- A cleanup of how we warn userspace about the use of uninitialized
/dev/urandom and uninitialized get_random_bytes() usage.
Interestingly, with the change you merged for 5.18 that attempts to
use jitter (but does not block if it can't), the majority of users
should never see those warnings for /dev/urandom at all now, and
the one for in-kernel usage is mainly a debug thing.
- The file_operations struct for /dev/[u]random now implements
.read_iter and .write_iter instead of .read and .write, allowing it
to also implement .splice_read and .splice_write, which makes
splice(2) work again after it was broken here (and in many other
places in the tree) during the set_fs() removal. This was a bit of
a last minute arrival from Jens that hasn't had as much time to
bake, so I'll be keeping my eye on this as well, but it seems
fairly ordinary. Unfortunately, read_iter() is around 3% slower
than read() in my tests, which I'm not thrilled about. But Jens and
Al, spurred by this observation, seem to be making progress in
removing the bottlenecks on the iter paths in the VFS layer in
general, which should remove the performance gap for all drivers.
- Assorted other bug fixes, cleanups, and optimizations.
- A small SipHash cleanup"
* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
random: check for signals after page of pool writes
random: wire up fops->splice_{read,write}_iter()
random: convert to using fops->write_iter()
random: convert to using fops->read_iter()
random: unify batched entropy implementations
random: move randomize_page() into mm where it belongs
random: remove mostly unused async readiness notifier
random: remove get_random_bytes_arch() and add rng_has_arch_random()
random: move initialization functions out of hot pages
random: make consistent use of buf and len
random: use proper return types on get_random_{int,long}_wait()
random: remove extern from functions in header
random: use static branch for crng_ready()
random: credit architectural init the exact amount
random: handle latent entropy and command line from random_init()
random: use proper jiffies comparison macro
random: remove ratelimiting for in-kernel unseeded randomness
random: move initialization out of reseeding hot path
random: avoid initializing twice in credit race
random: use symbolic constants for crng_init states
...
|
|
KGDB and KDB allow read and write access to kernel memory, and thus
should be restricted during lockdown. An attacker with access to a
serial port (for example, via a hypervisor console, which some cloud
vendors provide over the network) could trigger the debugger so it is
important that the debugger respect the lockdown mode when/if it is
triggered.
Fix this by integrating lockdown into kdb's existing permissions
mechanism. Unfortunately kgdb does not have any permissions mechanism
(although it certainly could be added later) so, for now, kgdb is simply
and brutally disabled by immediately exiting the gdb stub without taking
any action.
For lockdowns established early in the boot (e.g. the normal case) then
this should be fine but on systems where kgdb has set breakpoints before
the lockdown is enacted than "bad things" will happen.
CVE: CVE-2022-21499
Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Updates to scheduler metrics:
- PELT fixes & enhancements
- PSI fixes & enhancements
- Refactor cpu_util_without()
- Updates to instrumentation/debugging:
- Remove sched_trace_*() helper functions - can be done via debug
info
- Fix double update_rq_clock() warnings
- Introduce & use "preemption model accessors" to simplify some of the
Kconfig complexity.
- Make softirq handling RT-safe.
- Misc smaller fixes & cleanups.
* tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
topology: Remove unused cpu_cluster_mask()
sched: Reverse sched_class layout
sched/deadline: Remove superfluous rq clock update in push_dl_task()
sched/core: Avoid obvious double update_rq_clock warning
smp: Make softirq handling RT safe in flush_smp_call_function_queue()
smp: Rename flush_smp_call_function_from_idle()
sched: Fix missing prototype warnings
sched/fair: Remove cfs_rq_tg_path()
sched/fair: Remove sched_trace_*() helper functions
sched/fair: Refactor cpu_util_without()
sched/fair: Revise comment about lb decision matrix
sched/psi: report zeroes for CPU full at the system level
sched/fair: Delete useless condition in tg_unthrottle_up()
sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
sched/fair: Move calculate of avg_load to a better location
mailmap: Update my email address to @redhat.com
MAINTAINERS: Add myself as scheduler topology reviewer
psi: Fix trigger being fired unexpectedly at initial
ftrace: Use preemption model accessors for trace header printout
kcsan: Use preemption model accessors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Platform PMU changes:
- x86/intel:
- Add new Intel Alder Lake and Raptor Lake support
- x86/amd:
- AMD Zen4 IBS extensions support
- Add AMD PerfMonV2 support
- Add AMD Fam19h Branch Sampling support
Generic changes:
- signal: Deliver SIGTRAP on perf event asynchronously if blocked
Perf instrumentation can be driven via SIGTRAP, but this causes a
problem when SIGTRAP is blocked by a task & terminate the task.
Allow user-space to request these signals asynchronously (after
they get unblocked) & also give the information to the signal
handler when this happens:
"To give user space the ability to clearly distinguish
synchronous from asynchronous signals, introduce
siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for
flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the
signal (avoiding the terminations), but (b) tell user space via
si_perf_flags if the signal was synchronous or not, so that such
signals can be handled differently (e.g. let user space decide
to ignore or consider the data imprecise). "
- Unify/standardize the /sys/devices/cpu/events/* output format.
- Misc fixes & cleanups"
* tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/x86/amd/core: Fix reloading events for SVM
perf/x86/amd: Run AMD BRS code only on supported hw
perf/x86/amd: Fix AMD BRS period adjustment
perf/x86/amd: Remove unused variable 'hwc'
perf/ibs: Fix comment
perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
perf/amd/ibs: Add support for L3 miss filtering
perf/amd/ibs: Use ->is_visible callback for dynamic attributes
perf/amd/ibs: Cascade pmu init functions' return value
perf/x86/uncore: Add new Alder Lake and Raptor Lake support
perf/x86/uncore: Clean up uncore_pci_ids[]
perf/x86/cstate: Add new Alder Lake and Raptor Lake support
perf/x86/msr: Add new Alder Lake and Raptor Lake support
perf/x86: Add new Alder Lake and Raptor Lake support
perf/amd/ibs: Use interrupt regs ip for stack unwinding
perf/x86/amd/core: Add PerfMonV2 overflow handling
perf/x86/amd/core: Add PerfMonV2 counter control
perf/x86/amd/core: Detect available counters
perf/x86/amd/core: Detect PerfMonV2 support
x86/msr: Add PerfCntrGlobal* registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Comprehensive interface overhaul:
=================================
Objtool's interface has some issues:
- Several features are done unconditionally, without any way to
turn them off. Some of them might be surprising. This makes
objtool tricky to use, and prevents porting individual features
to other arches.
- The config dependencies are too coarse-grained. Objtool
enablement is tied to CONFIG_STACK_VALIDATION, but it has several
other features independent of that.
- The objtool subcmds ("check" and "orc") are clumsy: "check" is
really a subset of "orc", so it has all the same options.
The subcmd model has never really worked for objtool, as it only
has a single purpose: "do some combination of things on an object
file".
- The '--lto' and '--vmlinux' options are nonsensical and have
surprising behavior.
Overhaul the interface:
- get rid of subcmds
- make all features individually selectable
- remove and/or clarify confusing/obsolete options
- update the documentation
- fix some bugs found along the way
- Fix x32 regression
- Fix Kbuild cleanup bugs
- Add scripts/objdump-func helper script to disassemble a single
function from an object file.
- Rewrite scripts/faddr2line to be section-aware, by basing it on
'readelf', moving it away from 'nm', which doesn't handle multiple
sections well, which can result in decoding failure.
- Rewrite & fix symbol handling - which had a number of bugs wrt.
object files that don't have global symbols - which is rare but
possible. Also fix a bunch of symbol handling bugs found along the
way.
* tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
objtool: Fix objtool regression on x32 systems
objtool: Fix symbol creation
scripts/faddr2line: Fix overlapping text section failures
scripts: Create objdump-func helper script
objtool: Remove libsubcmd.a when make clean
objtool: Remove inat-tables.c when make clean
objtool: Update documentation
objtool: Remove --lto and --vmlinux in favor of --link
objtool: Add HAVE_NOINSTR_VALIDATION
objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
objtool: Make noinstr hacks optional
objtool: Make jump label hack optional
objtool: Make static call annotation optional
objtool: Make stack validation frame-pointer-specific
objtool: Add CONFIG_OBJTOOL
objtool: Extricate sls from stack validation
objtool: Rework ibt and extricate from stack validation
objtool: Make stack validation optional
objtool: Add option to print section addresses
objtool: Don't print parentheses in function addresses
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem cleanups & optimizations/fixes:
- Conditionally wake waiters in reader/writer slowpaths
- Always try to wake waiters in out_nolock path
- Add try_cmpxchg64() implementation, with arch optimizations - and use
it to micro-optimize sched_clock_{local,remote}()
- Various force-inlining fixes to address objdump instrumentation-check
warnings
- Add lock contention tracepoints:
lock:contention_begin
lock:contention_end
- Misc smaller fixes & cleanups
* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
locking/atomic/x86: Introduce arch_try_cmpxchg64
locking/atomic: Add generic try_cmpxchg64 support
futex: Remove a PREEMPT_RT_FULL reference.
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
lockdep: Delete local_irq_enable_in_hardirq()
locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
locking: Apply contention tracepoints in the slow path
locking: Add lock contention tracepoints
locking/rwsem: Always try to wake waiters in out_nolock path
locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
locking/rwsem: No need to check for handoff bit if wait queue empty
lockdep: Fix -Wunused-parameter for _THIS_IP_
x86/mm: Force-inline __phys_addr_nodebug()
x86/kvm/svm: Force-inline GHCB accessors
task_stack, x86/cea: Force-inline stack helpers
|
|
include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
as a placeholder.
Genksyms writes the version CRCs into the linker script, which will be
used for filling the __crc_* symbols. The linker script format depends
on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
to the reference of CRC.
It is time to get rid of this complexity.
Now that modpost parses text files (.*.cmd) to collect all the CRCs,
it can generate C code that will be linked to the vmlinux or modules.
Generate a new C file, .vmlinux.export.c, which contains the CRCs of
symbols exported by vmlinux. It is compiled and linked to vmlinux in
scripts/link-vmlinux.sh.
Put the CRCs of symbols exported by modules into the existing *.mod.c
files. No additional build step is needed for modules. As before,
*.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.
No linker magic is used here. The new C implementation works in the
same way, whether CONFIG_RELOCATABLE is enabled or not.
CONFIG_MODULE_REL_CRCS is no longer needed.
Previously, Kbuild invoked additional $(LD) to update the CRCs in
objects, but this step is unneeded too.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
|
|
All three versions of klp_arch_set_pc() do exactly the same: they
call ftrace_instruction_pointer_set().
Call ftrace_instruction_pointer_set() directly and remove
klp_arch_set_pc().
As klp_arch_set_pc() was the only thing remaining in asm/livepatch.h
on x86 and s390, remove asm/livepatch.h
livepatch.h remains on powerpc but its content is exclusively used
by powerpc specific code.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Initial support for the ARMv9 Scalable Matrix Extension (SME).
SME takes the approach used for vectors in SVE and extends this to
provide architectural support for matrix operations. No KVM support
yet, SME is disabled in guests.
- Support for crashkernel reservations above ZONE_DMA via the
'crashkernel=X,high' command line option.
- btrfs search_ioctl() fix for live-lock with sub-page faults.
- arm64 perf updates: support for the Hisilicon "CPA" PMU for
monitoring coherent I/O traffic, support for Arm's CMN-650 and
CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup.
- Kselftest updates for SME, BTI, MTE.
- Automatic generation of the system register macros from a 'sysreg'
file describing the register bitfields.
- Update the type of the function argument holding the ESR_ELx register
value to unsigned long to match the architecture register size
(originally 32-bit but extended since ARMv8.0).
- stacktrace cleanups.
- ftrace cleanups.
- Miscellaneous updates, most notably: arm64-specific huge_ptep_get(),
avoid executable mappings in kexec/hibernate code, drop TLB flushing
from get_clear_flush() (and rename it to get_clear_contig()),
ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits)
arm64/sysreg: Generate definitions for FAR_ELx
arm64/sysreg: Generate definitions for DACR32_EL2
arm64/sysreg: Generate definitions for CSSELR_EL1
arm64/sysreg: Generate definitions for CPACR_ELx
arm64/sysreg: Generate definitions for CONTEXTIDR_ELx
arm64/sysreg: Generate definitions for CLIDR_EL1
arm64/sve: Move sve_free() into SVE code section
arm64: Kconfig.platforms: Add comments
arm64: Kconfig: Fix indentation and add comments
arm64: mm: avoid writable executable mappings in kexec/hibernate code
arm64: lds: move special code sections out of kernel exec segment
arm64/hugetlb: Implement arm64 specific huge_ptep_get()
arm64/hugetlb: Use ptep_get() to get the pte value of a huge page
arm64: kdump: Do not allocate crash low memory if not needed
arm64/sve: Generate ZCR definitions
arm64/sme: Generate defintions for SVCR
arm64/sme: Generate SMPRI_EL1 definitions
arm64/sme: Automatically generate SMPRIMAP_EL2 definitions
arm64/sme: Automatically generate SMIDR_EL1 defines
arm64/sme: Automatically generate defines for SMCR
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Make use of the IBM z16 processor activity instrumentation facility
to count cryptography operations: add a new PMU device driver so that
perf can make use of this.
- Add new IBM z16 extended counter set to cpumf support.
- Add vdso randomization support.
- Add missing KCSAN instrumentation to barriers and spinlocks, which
should make s390's KCSAN support complete.
- Add support for IPL-complete-control facility: notify the hypervisor
that kexec finished work and the kernel starts.
- Improve error logging for PCI.
- Various small changes to workaround llvm's integrated assembler
limitations, and one bug, to make it finally possible to compile the
kernel with llvm's integrated assembler. This also requires to raise
the minimum clang version to 14.0.0.
- Various other small enhancements, bug fixes, and cleanups all over
the place.
* tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/head: get rid of 31 bit leftovers
scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390
s390/boot: do not emit debug info for assembly with llvm's IAS
s390/boot: workaround llvm IAS bug
s390/purgatory: workaround llvm's IAS limitations
s390/entry: workaround llvm's IAS limitations
s390/alternatives: remove padding generation code
s390/alternatives: provide identical sized orginal/alternative sequences
s390/cpumf: add new extended counter set for IBM z16
s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES
s390/stp: clock_delta should be signed
s390/stp: fix todoff size
s390/pai: add support for cryptography counters
entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
s390/compat: cleanup compat_linux.h header file
s390/entry: remove broken and not needed code
s390/boot: convert parmarea to C
s390/boot: convert initial lowcore to C
s390/ptrace: move short psw definitions to ptrace header file
s390/head: initialize all new psws
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"This includes some small changes to kernel/stop_machine.c and arch/x86
which are deps of the new Intel IFS support.
Highlights:
- New drivers:
- Intel "In Field Scan" (IFS) support
- Winmate FM07/FM07P buttons
- Mellanox SN2201 support
- AMD PMC driver enhancements
- Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
platform/x86: intel_cht_int33fe: Set driver data
platform/x86: intel-hid: fix _DSM function index handling
platform/x86: toshiba_acpi: use kobj_to_dev()
platform/x86: samsung-laptop: use kobj_to_dev()
platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
tools/power/x86/intel-speed-select: Display error on turbo mode disabled
Documentation: In-Field Scan
platform/x86/intel/ifs: add ABI documentation for IFS
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: Add scan test support
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Read IFS firmware image
platform/x86/intel/ifs: Add stub driver for In-Field Scan
stop_machine: Add stop_core_cpuslocked() for per-core operations
x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
x86/microcode/intel: Expose collect_cpu_info_early() for IFS
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Borislav Petkov:
- Add Raptor Lake to the set of CPU models which support splitlock
- Make life miserable for apps using split locks by slowing them down
considerably while the rest of the system remains responsive. The
hope is it will hurt more and people will really fix their misaligned
locks apps. As a result, free a TIF bit.
* tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Enable the split lock feature on Raptor Lake
x86/split-lock: Remove unused TIF_SLD bit
x86/split_lock: Make life miserable for split lockers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Borislav Petkov:
- Remove all the code around GS switching on 32-bit now that it is not
needed anymore
- Other misc improvements
* tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bug: Use normal relative pointers in 'struct bug_entry'
x86/nmi: Make register_nmi_handler() more robust
x86/asm: Merge load_gs_index()
x86/32: Remove lazy GS macros
ELF: Remove elf_core_copy_kernel_regs()
x86/32: Simplify ELF_CORE_COPY_REGS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
- Add a "make x86_debug.config" target which enables a bunch of useful
config debug options when trying to debug an issue
- A gcc-12 build warnings fix
* tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Wrap literal addresses in absolute_pointer()
x86/configs: Add x86 debugging Kconfig fragment plus docs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov:
"Intel Trust Domain Extensions (TDX) support.
This is the Intel version of a confidential computing solution called
Trust Domain Extensions (TDX). This series adds support to run the
kernel as part of a TDX guest. It provides similar guest protections
to AMD's SEV-SNP like guest memory and register state encryption,
memory integrity protection and a lot more.
Design-wise, it differs from AMD's solution considerably: it uses a
software module which runs in a special CPU mode called (Secure
Arbitration Mode) SEAM. As the name suggests, this module serves as
sort of an arbiter which the confidential guest calls for services it
needs during its lifetime.
Just like AMD's SNP set, this series reworks and streamlines certain
parts of x86 arch code so that this feature can be properly
accomodated"
* tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/tdx: Fix RETs in TDX asm
x86/tdx: Annotate a noreturn function
x86/mm: Fix spacing within memory encryption features message
x86/kaslr: Fix build warning in KASLR code in boot stub
Documentation/x86: Document TDX kernel architecture
ACPICA: Avoid cache flush inside virtual machines
x86/tdx/ioapic: Add shared bit for IOAPIC base address
x86/mm: Make DMA memory shared for TD guest
x86/mm/cpa: Add support for TDX shared memory
x86/tdx: Make pages shared in ioremap()
x86/topology: Disable CPU online/offline control for TDX guests
x86/boot: Avoid #VE during boot for TDX platforms
x86/boot: Set CR0.NE early and keep it set during the boot
x86/acpi/x86/boot: Add multiprocessor wake-up support
x86/boot: Add a trampoline for booting APs via firmware handoff
x86/tdx: Wire up KVM hypercalls
x86/tdx: Port I/O: Add early boot support
x86/tdx: Port I/O: Add runtime hypercalls
x86/boot: Port I/O: Add decompression-time support for TDX
x86/boot: Port I/O: Allow to hook up alternative helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner:
- Expose CLOCK_TAI to instrumentation to aid with TSN debugging.
- Ensure that the clockevent is stopped when there is no timer armed to
avoid pointless wakeups.
- Make the sched clock frequency handling and rounding consistent.
- Provide a better debugobject hint for delayed works. The timer
callback is always the same, which makes it difficult to identify the
underlying work. Use the work function as a hint instead.
- Move the timer specific sysctl code into the timer subsystem.
- The usual set of improvements and cleanups
* tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Provide a better debugobjects hint for delayed works
time/sched_clock: Fix formatting of frequency reporting code
time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz
time/sched_clock: Round the frequency reported to nearest rather than down
timekeeping: Consolidate fast timekeeper
timekeeping: Annotate ktime_get_boot_fast_ns() with data_race()
timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
timekeeping: Introduce fast accessor to clock tai
tracing/timer: Add missing argument documentation of trace points
clocksource: Replace cpumask_weight() with cpumask_empty()
timers: Move timer sysctl into the timer code
clockevents: Use dedicated list iterator variable
timers: Simplify calc_index()
timers: Initialize base::next_expiry_recalc in timers_prepare_cpu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt handling updates from Thomas Gleixner:
"Core code:
- Make the managed interrupts more robust by shutting them down in
the core code when the assigned affinity mask does not contain
online CPUs.
- Make the irq simulator chip work on RT
- A small set of cpumask and power manageent cleanups
Drivers:
- A set of changes which mark GPIO interrupt chips immutable to
prevent the GPIO subsystem from modifying it under the hood. This
provides the necessary infrastructure and converts a set of GPIO
and pinctrl drivers over.
- A set of changes to make the pseudo-NMI handling for GICv3 more
robust: a missing barrier and consistent handling of the priority
mask.
- Another set of GICv3 improvements and fixes, but nothing
outstanding
- The usual set of improvements and cleanups all over the place
- No new irqchip drivers and not even a new device tree binding!
100+ interrupt chips are truly enough"
* tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
irqchip: Add Kconfig symbols for sunxi drivers
irqchip/gic-v3: Fix priority mask handling
irqchip/gic-v3: Refactor ISB + EOIR at ack time
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
genirq/irq_sim: Make the irq_work always run in hard irq context
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip/gic: Improved warning about incorrect type
irqchip/csky: Return true/false (not 1/0) from bool functions
irqchip/imx-irqsteer: Add runtime PM support
irqchip/imx-irqsteer: Constify irq_chip struct
irqchip/armada-370-xp: Enable MSI affinity configuration
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/sun6i-r: Use NULL for chip_data
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
irqchip/gic-v3: Claim iomem resources
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
irqchip/gic-v3: Detect LPI invalidation MMIO registers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
- Initialize the per-CPU structures during early boot so that the state
is consistent from the very beginning.
- Make the virtualization hotplug state handling more robust and let
the core bringup CPUs which timed out in an earlier attempt again.
- Make the x86/xen CPU state tracking consistent on a failed online
attempt, so a consecutive bringup does not fall over the inconsistent
state.
* tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Initialise all cpuhp_cpu_state structs earlier
cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again.
x86/xen: Allow to retry if cpu_initialize_context() failed.
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-05-23
We've added 113 non-merge commits during the last 26 day(s) which contain
a total of 121 files changed, 7425 insertions(+), 1586 deletions(-).
The main changes are:
1) Speed up symbol resolution for kprobes multi-link attachments, from Jiri Olsa.
2) Add BPF dynamic pointer infrastructure e.g. to allow for dynamically sized ringbuf
reservations without extra memory copies, from Joanne Koong.
3) Big batch of libbpf improvements towards libbpf 1.0 release, from Andrii Nakryiko.
4) Add BPF link iterator to traverse links via seq_file ops, from Dmitrii Dolgov.
5) Add source IP address to BPF tunnel key infrastructure, from Kaixi Fan.
6) Refine unprivileged BPF to disable only object-creating commands, from Alan Maguire.
7) Fix JIT blinding of ld_imm64 when they point to subprogs, from Alexei Starovoitov.
8) Add BPF access to mptcp_sock structures and their meta data, from Geliang Tang.
9) Add new BPF helper for access to remote CPU's BPF map elements, from Feng Zhou.
10) Allow attaching 64-bit cookie to BPF link of fentry/fexit/fmod_ret, from Kui-Feng Lee.
11) Follow-ups to typed pointer support in BPF maps, from Kumar Kartikeya Dwivedi.
12) Add busy-poll test cases to the XSK selftest suite, from Magnus Karlsson.
13) Improvements in BPF selftest test_progs subtest output, from Mykola Lysenko.
14) Fill bpf_prog_pack allocator areas with illegal instructions, from Song Liu.
15) Add generic batch operations for BPF map-in-map cases, from Takshak Chahande.
16) Make bpf_jit_enable more user friendly when permanently on 1, from Tiezhu Yang.
17) Fix an array overflow in bpf_trampoline_get_progs(), from Yuntao Wang.
====================
Link: https://lore.kernel.org/r/20220523223805.27931-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new helper function
void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);
which returns a pointer to the underlying data of a dynptr. *len*
must be a statically known value. The bpf program may access the returned
data slice as a normal buffer (eg can do direct reads and writes), since
the verifier associates the length with the returned pointer, and
enforces that no out of bounds accesses occur.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com
|
|
This patch adds two helper functions, bpf_dynptr_read and
bpf_dynptr_write:
long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
The dynptr passed into these functions must be valid dynptrs that have
been initialized.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-5-joannelkoong@gmail.com
|
|
Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.
The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.
There are 3 new APIs:
long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
|
|
This patch adds a new api bpf_dynptr_from_mem:
long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
which initializes a dynptr to point to a bpf program's local memory. For now
only local memory that is of reg type PTR_TO_MAP_VALUE is supported.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-3-joannelkoong@gmail.com
|
|
This patch adds the bulk of the verifier work for supporting dynamic
pointers (dynptrs) in bpf.
A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
defined internally as:
struct bpf_dynptr_kern {
void *data;
u32 size;
u32 offset;
} __aligned(8);
The upper 8 bits of *size* is reserved (it contains extra metadata about
read-only status and dynptr type). Consequently, a dynptr only supports
memory less than 16 MB.
There are different types of dynptrs (eg malloc, ringbuf, ...). In this
patchset, the most basic one, dynptrs to a bpf program's local memory,
is added. For now only local memory that is of reg type PTR_TO_MAP_VALUE
is supported.
In the verifier, dynptr state information will be tracked in stack
slots. When the program passes in an uninitialized dynptr
(ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
to the frame pointer where the dynptr resides at are marked
STACK_DYNPTR. For helper functions that take in initialized dynptrs (eg
bpf_dynptr_read + bpf_dynptr_write which are added later in this
patchset), the verifier enforces that the dynptr has been initialized
properly by checking that their corresponding stack slots have been
marked as STACK_DYNPTR.
The 6th patch in this patchset adds test cases that the verifier should
successfully reject, such as for example attempting to use a dynptr
after doing a direct write into it inside the bpf program.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20220523210712.3641569-2-joannelkoong@gmail.com
|
|
Kernel Test Robot complains about passing zero to PTR_ERR for the said
line, suppress it by using PTR_ERR_OR_ZERO.
Fixes: c0a5a21c25f3 ("bpf: Allow storing referenced kptr in map")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220521132620.1976921-1-memxor@gmail.com
|
|
Introduce bpf_arch_text_invalidate and use it to fill unused part of the
bpf_prog_pack with illegal instructions when a BPF program is freed.
Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220520235758.1858153-4-song@kernel.org
|
|
bpf_prog_pack enables sharing huge pages among multiple BPF programs.
These pages are marked as executable before the JIT engine fill it with
BPF programs. To make these pages safe, fill the hole bpf_prog_pack with
illegal instructions before making it executable.
Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220520235758.1858153-2-song@kernel.org
|
|
Pull block updates from Jens Axboe:
"Here are the core block changes for 5.19. This contains:
- blk-throttle accounting fix (Laibin)
- Series removing redundant assignments (Michal)
- Expose bio cache via the bio_set, so that DM can use it (Mike)
- Finish off the bio allocation interface cleanups by dealing with
the weirdest member of the family. bio_kmalloc combines a kmalloc
for the bio and bio_vecs with a hidden bio_init call and magic
cleanup semantics (Christoph)
- Clean up the block layer API so that APIs consumed by file systems
are (almost) only struct block_device based, so that file systems
don't have to poke into block layer internals like the
request_queue (Christoph)
- Clean up the blk_execute_rq* API (Christoph)
- Clean up various lose end in the blk-cgroup code to make it easier
to follow in preparation of reworking the blkcg assignment for bios
(Christoph)
- Fix use-after-free issues in BFQ when processes with merged queues
get moved to different cgroups (Jan)
- BFQ fixes (Jan)
- Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
Wolfgang, me)"
* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
blk-mq: fix typo in comment
bfq: Remove bfq_requeue_request_body()
bfq: Remove superfluous conversion from RQ_BIC()
bfq: Allow current waker to defend against a tentative one
bfq: Relax waker detection for shared queues
blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
blk-throttle: Set BIO_THROTTLED when bio has been throttled
blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
blk-cgroup: always terminate io.stat lines
block, bfq: make bfq_has_work() more accurate
block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
block: cleanup the VM accounting in submit_bio
block: Fix the bio.bi_opf comment
block: reorder the REQ_ flags
blk-iocost: combine local_stat and desc_stat to stat
block: improve the error message from bio_check_eod
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
block: remove superfluous calls to blkcg_bio_issue_init
kthread: unexport kthread_blkcg
blk-cgroup: cleanup blkcg_maybe_throttle_current
...
|
|
Pull io_uring updates from Jens Axboe:
"Here are the main io_uring changes for 5.19. This contains:
- Fixes for sparse type warnings (Christoph, Vasily)
- Support for multi-shot accept (Hao)
- Support for io_uring managed fixed files, rather than always
needing the applicationt o manage the indices (me)
- Fix for a spurious poll wakeup (Dylan)
- CQE overflow fixes (Dylan)
- Support more types of cancelations (me)
- Support for co-operative task_work signaling, rather than always
forcing an IPI (me)
- Support for doing poll first when appropriate, rather than always
attempting a transfer first (me)
- Provided buffer cleanups and support for mapped buffers (me)
- Improve how io_uring handles inflight SCM files (Pavel)
- Speedups for registered files (Pavel, me)
- Organize the completion data in a struct in io_kiocb rather than
keep it in separate spots (Pavel)
- task_work improvements (Pavel)
- Cleanup and optimize the submission path, in general and for
handling links (Pavel)
- Speedups for registered resource handling (Pavel)
- Support sparse buffers and file maps (Pavel, me)
- Various fixes and cleanups (Almog, Pavel, me)"
* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
io_uring: fix incorrect __kernel_rwf_t cast
io_uring: disallow mixed provided buffer group registrations
io_uring: initialize io_buffer_list head when shared ring is unregistered
io_uring: add fully sparse buffer registration
io_uring: use rcu_dereference in io_close
io_uring: consistently use the EPOLL* defines
io_uring: make apoll_events a __poll_t
io_uring: drop a spurious inline on a forward declaration
io_uring: don't use ERR_PTR for user pointers
io_uring: use a rwf_t for io_rw.flags
io_uring: add support for ring mapped supplied buffers
io_uring: add io_pin_pages() helper
io_uring: add buffer selection support to IORING_OP_NOP
io_uring: fix locking state for empty buffer group
io_uring: implement multishot mode for accept
io_uring: let fast poll support multishot
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: only wake when the correct events are set
io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU update from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offloading updates, mainly simplifications
- RCU-tasks updates, including some -rt fixups, handling of systems
with sparse CPU numbering, and a fix for a boot-time race-condition
failure
- Put SRCU on a memory diet in order to reduce the size of the
srcu_struct structure
- Torture-test updates fixing some bugs in tests and closing some
testing holes
- Torture-test updates for the RCU tasks flavors, most notably ensuring
that building rcutorture and friends does not change the
RCU-tasks-related Kconfig options
- Torture-test scripting updates
- Expedited grace-period updates, most notably providing
milliseconds-scale (not all that) soft real-time response from
synchronize_rcu_expedited().
This is also the first time in almost 30 years of RCU that someone
other than me has pushed for a reduction in the RCU CPU stall-warning
timeout, in this case by more than three orders of magnitude from 21
seconds to 20 milliseconds. This tighter timeout applies only to
expedited grace periods
* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu: Move expedited grace period (GP) work to RT kthread_worker
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
srcu: Drop needless initialization of sdp in srcu_gp_start()
srcu: Prevent expedited GPs and blocking readers from consuming CPU
srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
srcu: Automatically determine size-transition strategy at boot
rcutorture: Make torture.sh allow for --kasan
rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
rcutorture: Make kvm.sh allow more memory for --kasan runs
torture: Save "make allmodconfig" .config file
scftorture: Remove extraneous "scf" from per_version_boot_params
rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
torture: Enable CSD-lock stall reports for scftorture
torture: Skip vmlinux check for kvm-again.sh runs
scftorture: Adjust for TASKS_RCU Kconfig option being selected
rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
rcuscale: Allow rcuscale without RCU Tasks
refscale: Allow refscale without RCU Tasks Rude/Trace
refscale: Allow refscale without RCU Tasks
rcutorture: Allow specifying per-scenario stat_interval
...
|
|
Marge Energy Model support updates and cpuidle updates for 5.19-rc1:
- Update the Energy Model support code to allow the Energy Model to be
artificial, which means that the power values may not be on a uniform
scale with other devices providing power information, and update the
cpufreq_cooling and devfreq_cooling thermal drivers to support
artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add AlderLake processor support to the intel_idle driver (Zhang Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
* pm-em:
PM: EM: Decrement policy counter
powercap: DTPM: Check for Energy Model type
thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling
Documentation: EM: Add artificial EM registration description
PM: EM: Remove old debugfs files and print all 'flags'
PM: EM: Change the order of arguments in the .active_power() callback
PM: EM: Use the new .get_cost() callback while registering EM
PM: EM: Add artificial EM flag
PM: EM: Add .get_cost() callback
* pm-cpuidle:
cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used
cpuidle: psci: Fix regression leading to no genpd governor
intel_idle: Add AlderLake support
|
|
Merge PM core changes, updates related to system sleep and power capping
updates for 5.19-rc1:
- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).
- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).
- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).
- Avoid device PM-runtime usage count underflows (Rafael Wysocki).
- Allow dynamic debug to control printing of PM messages (David
Cohen).
- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).
- Preserve ACPI-table override during hibernation (Amadeusz Sławiński).
- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).
* pm-core:
PM: runtime: Avoid device usage count underflows
iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace
PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv
iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume()
* pm-sleep:
cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode
PM: runtime: Allow to call __pm_runtime_set_status() from atomic context
PM: hibernate: Don't mark comment as kernel-doc
x86/ACPI: Preserve ACPI-table override during hibernation
PM: hibernate: Fix some kernel-doc comments
PM: sleep: enable dynamic debug support within pm_pr_dbg()
PM: sleep: Narrow down -DDEBUG on kernel/power/ files
* powercap:
powercap: intel_rapl: remove redundant store to value after multiply
powercap: intel_rapl: add support for ALDERLAKE_N
powercap: RAPL: Add Power Limit4 support for RaptorLake
powercap: intel_rapl: add support for RaptorLake
|
|
The original x86 sev_alloc() only called set_memory_decrypted() on
memory returned by alloc_pages_node(), so the page order calculation
fell out of that logic. However, the common dma-direct code has several
potential allocators, not all of which are guaranteed to round up the
underlying allocation to a power-of-two size, so carrying over that
calculation for the encryption/decryption size was a mistake. Fix it by
rounding to a *number* of pages, rather than an order.
Until recently there was an even worse interaction with DMA_DIRECT_REMAP
where we could have ended up decrypting part of the next adjacent
vmalloc area, only averted by no architecture actually supporting both
configs at once. Don't ask how I found that one out...
Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Rientjes <rientjes@google.com>
|
|
With unprivileged BPF disabled, all cmds associated with the BPF syscall
are blocked to users without CAP_BPF/CAP_SYS_ADMIN. However there are
use cases where we may wish to allow interactions with BPF programs
without being able to load and attach them. So for example, a process
with required capabilities loads/attaches a BPF program, and a process
with less capabilities interacts with it; retrieving perf/ring buffer
events, modifying map-specified config etc. With all BPF syscall
commands blocked as a result of unprivileged BPF being disabled,
this mode of interaction becomes impossible for processes without
CAP_BPF.
As Alexei notes
"The bpf ACL model is the same as traditional file's ACL.
The creds and ACLs are checked at open(). Then during file's write/read
additional checks might be performed. BPF has such functionality already.
Different map_creates have capability checks while map_lookup has:
map_get_sys_perms(map, f) & FMODE_CAN_READ.
In other words it's enough to gate FD-receiving parts of bpf
with unprivileged_bpf_disabled sysctl.
The rest is handled by availability of FD and access to files in bpffs."
So key fd creation syscall commands BPF_PROG_LOAD and BPF_MAP_CREATE
are blocked with unprivileged BPF disabled and no CAP_BPF.
And as Alexei notes, map creation with unprivileged BPF disabled off
blocks creation of maps aside from array, hash and ringbuf maps.
Programs responsible for loading and attaching the BPF program
can still control access to its pinned representation by restricting
permissions on the pin path, as with normal files.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/1652970334-30510-2-git-send-email-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Tracing and syscall BPF program types are very convenient to add BPF
capabilities to subsystem otherwise not BPF capable.
When we add kfuncs capabilities to those program types, we can add
BPF features to subsystems without having to touch BPF core.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20220518205924.399291-2-benjamin.tissoires@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch implements a new struct bpf_func_proto, named
bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP,
and a new helper bpf_skc_to_mptcp_sock(), which invokes another new
helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct
mptcp_sock from a given subflow socket.
v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c,
remove build check for CONFIG_BPF_JIT
v5: Drop EXPORT_SYMBOL (Martin)
Co-developed-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com
|