aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-08-31Merge tag 'pm-5.15-rc1' of ↵Linus Torvalds7-41/+56
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These address some PCI device power management issues, add new hardware support to the RAPL power capping driver, add HWP guaranteed performance change notification support to the intel_pstate driver, replace deprecated CPU-hotplug functions in a few places, update CPU PM notifiers to use raw spinlocks, update the PM domains framework (new DT property support, Kconfig fix), do a couple of cleanups in code related to system sleep, and improve the energy model and the schedutil cpufreq governor. Specifics: - Address 3 PCI device power management issues (Rafael Wysocki). - Add Power Limit4 support for Alder Lake to the Intel RAPL power capping driver (Sumeet Pawnikar). - Add HWP guaranteed performance change notification support to the intel_pstate driver (Srinivas Pandruvada). - Replace deprecated CPU-hotplug functions in code related to power management (Sebastian Andrzej Siewior). - Update CPU PM notifiers to use raw spinlocks (Valentin Schneider). - Add support for 'required-opps' DT property to the generic power domains (genpd) framework and use this property for I2C on ARM64 sc7180 (Rajendra Nayak). - Fix Kconfig issue related to genpd (Geert Uytterhoeven). - Increase energy calculation precision in the Energy Model (Lukasz Luba). - Fix kobject deletion in the exit code of the schedutil cpufreq governor (Kevin Hao). - Unmark some functions as kernel-doc in the PM core to avoid false-positive documentation build warnings (Randy Dunlap). - Check RTC features instead of ops in suspend_test Alexandre Belloni)" * tag 'pm-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: domains: Fix domain attach for CONFIG_PM_OPP=n powercap: Add Power Limit4 support for Alder Lake SoC cpufreq: intel_pstate: Process HWP Guaranteed change notification thermal: intel: Allow processing of HWP interrupt notifier: Remove atomic_notifier_call_chain_robust() PM: cpu: Make notifier chain use a raw_spinlock_t PM: sleep: unmark 'state' functions as kernel-doc arm64: dts: sc7180: Add required-opps for i2c PM: domains: Add support for 'required-opps' to set default perf state opp: Don't print an error if required-opps is missing cpufreq: schedutil: Use kobject release() method to free sugov_tunables PM: EM: Increase energy calculation precision PM: sleep: check RTC features instead of ops in suspend_test PM: sleep: s2idle: Replace deprecated CPU-hotplug functions cpufreq: Replace deprecated CPU-hotplug functions powercap: intel_rapl: Replace deprecated CPU-hotplug functions PCI: PM: Enable PME if it can be signaled from D3cold PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently PCI: Use pci_update_current_state() in pci_enable_device_flags()
2021-08-31Merge tag 'audit-pr-20210830' of ↵Linus Torvalds2-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Two patches in the audit pull request for v5.15; one is trivial ("header protection") but the second is a real patch that fixes a refcounting problem. The refcount fix normally would have been sent up during the -rcX cycle, but since we merged it less than a week before v5.14 proper I felt it was better to wait for the merge window to open; the patch is marked with the usual -stable markings" * tag 'audit-pr-20210830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move put_tree() to avoid trim_trees refcount underflow and UAF audit: add header protection to kernel/audit.h
2021-08-31Merge tag 'kernel.sys.v5.15' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull set_user() update from Christian Brauner: "This contains a single fix to set_user() which aligns permission checks with the corresponding fork() codepath. No one involved in this could come up with a reason for the difference. A capable caller can already circumvent the check when they fork where the permission checks are already for the relevant capabilities in addition to also allowing to exceed nproc when it is the init user. So apply the same logic to set_user()" * tag 'kernel.sys.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds
2021-08-31Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds1-0/+42
Pull nfsd updates from Chuck Lever: "New features: - Support for server-side disconnect injection via debugfs - Protocol definitions for new RPC_AUTH_TLS authentication flavor Performance improvements: - Reduce page allocator traffic in the NFSD splice read actor - Reduce CPU utilization in svcrdma's Send completion handler Notable bug fixes: - Stabilize lockd operation when re-exporting NFS mounts - Fix the use of %.*s in NFSD tracepoints - Fix /proc/sys/fs/nfs/nsm_use_hostnames" * tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) nfsd: fix crash on LOCKT on reexported NFSv3 nfs: don't allow reexport reclaims lockd: don't attempt blocking locks on nfs reexports nfs: don't atempt blocking locks on nfs reexports Keep read and write fds with each nlm_file lockd: update nlm_lookup_file reexport comment nlm: minor refactoring nlm: minor nlm_lookup_file argument change lockd: lockd server-side shouldn't set fl_ops SUNRPC: Add documentation for the fail_sunrpc/ directory SUNRPC: Server-side disconnect injection SUNRPC: Move client-side disconnect injection SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() nfsd4: Fix forced-expiry locking rpc: fix gss_svc_init cleanup on failure SUNRPC: Add RPC_AUTH_TLS protocol numbers lockd: change the proc_handler for nsm_use_hostnames sysctl: introduce new proc handler proc_dobool SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() ...
2021-08-30Merge tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull io_uring updates from Jens Axboe: - cancellation cleanups (Hao, Pavel) - io-wq accounting cleanup (Hao) - io_uring submit locking fix (Hao) - io_uring link handling fixes (Hao) - fixed file improvements (wangyangbo, Pavel) - allow updates of linked timeouts like regular timeouts (Pavel) - IOPOLL fix (Pavel) - remove batched file get optimization (Pavel) - improve reference handling (Pavel) - IRQ task_work batching (Pavel) - allow pure fixed file, and add support for open/accept (Pavel) - GFP_ATOMIC RT kernel fix - multiple CQ ring waiter improvement - funnel IRQ completions through task_work - add support for limiting async workers explicitly - add different clocksource support for timeouts - io-wq wakeup race fix - lots of cleanups and improvement (Pavel et al) * tag 'for-5.15/io_uring-2021-08-30' of git://git.kernel.dk/linux-block: (87 commits) io-wq: fix wakeup race when adding new work io-wq: wqe and worker locks no longer need to be IRQ safe io-wq: check max_worker limits if a worker transitions bound state io_uring: allow updating linked timeouts io_uring: keep ltimeouts in a list io_uring: support CLOCK_BOOTTIME/REALTIME for timeouts io-wq: provide a way to limit max number of workers io_uring: add build check for buf_index overflows io_uring: clarify io_req_task_cancel() locking io_uring: add task-refs-get helper io_uring: fix failed linkchain code logic io_uring: remove redundant req_set_fail() io_uring: don't free request to slab io_uring: accept directly into fixed file table io_uring: hand code io_accept() fd installing io_uring: openat directly into fixed fd table net: add accept helper not installing fd io_uring: fix io_try_cancel_userdata race for iowq io_uring: IRQ rw completion batching io_uring: batch task work locking ...
2021-08-30Merge tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+18
Pull block driver updates from Jens Axboe: "Sitting on top of the core block changes, here are the driver changes for the 5.15 merge window: - NVMe updates via Christoph: - suspend improvements for devices with an HMB (Keith Busch) - handle double completions more gacefull (Sagi Grimberg) - cleanup the selects for the nvme core code a bit (Sagi Grimberg) - don't update queue count when failing to set io queues (Ruozhu Li) - various nvmet connect fixes (Amit Engel) - cleanup lightnvm leftovers (Keith Busch, me) - small cleanups (Colin Ian King, Hou Pu) - add tracing for the Set Features command (Hou Pu) - CMB sysfs cleanups (Keith Busch) - add a mutex_destroy call (Keith Busch) - remove lightnvm subsystem. It's served its purpose and ultimately led to zoned nvme support, we no longer need it (Christoph) - revert floppy O_NDELAY fix (Denis) - nbd fixes (Hou, Pavel, Baokun) - nbd locking fixes (Tetsuo) - nbd device removal fixes (Christoph) - raid10 rcu warning fix (Xiao) - raid1 write behind fix (Guoqing) - rnbd fixes (Gioh, Md Haris) - misc fixes (Colin)" * tag 'for-5.15/drivers-2021-08-30' of git://git.kernel.dk/linux-block: (42 commits) Revert "floppy: reintroduce O_NDELAY fix" raid1: ensure write behind bio has less than BIO_MAX_VECS sectors md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard nbd: remove nbd->destroy_complete nbd: only return usable devices from nbd_find_unused nbd: set nbd->index before releasing nbd_index_mutex nbd: prevent IDR lookups from finding partially initialized devices nbd: reset NBD to NULL when restarting in nbd_genl_connect nbd: add missing locking to the nbd_dev_add error path nvme: remove the unused NVME_NS_* enum nvme: remove nvm_ndev from ns nvme: Have NVME_FABRICS select NVME_CORE instead of transport drivers block: nbd: add sanity check for first_minor nvmet: check that host sqsize does not exceed ctrl MQES nvmet: avoid duplicate qid in connect cmd nvmet: pass back cntlid on successful completion nvme-rdma: don't update queue count when failing to set io queues nvme-tcp: don't update queue count when failing to set io queues nvme-tcp: pair send_mutex init with destroy nvme: allow user toggling hmb usage ...
2021-08-30Merge tag 'timers-core-2021-08-30' of ↵Linus Torvalds10-156/+398
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timekeeping, timers and related drivers: Core code: - Cure a couple of correctness issues in the posix CPU timer code to prevent that the tick dependency for NOHZ full is kept alive for no reason. - Avoid expensive double reprogramming of the clockevent device in hrtimer_start_range_ns(). - Avoid pointless SMP function calls when the clock was set to avoid disturbing CPUs which do not have any affected timers queued. - Make the clocksource watchdog test work correctly when CONFIG_HZ is less than 100. Drivers: - Prefer the ARM architected timer over the Exynos timer which is way more expensive to access. - Add device tree bindings for new Ingenic SoCs - The usual improvements and cleanups all over the place" * tag 'timers-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) clocksource: Make clocksource watchdog test safe for slow-HZ systems dt-bindings: timer: Add ABIs for new Ingenic SoCs clocksource/drivers/fttmr010: Pass around less pointers clocksource/drivers/mediatek: Optimize systimer irq clear flow on shutdown clocksource/drivers/ingenic: Use bitfield macro helpers clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel dt-bindings: timer: convert rockchip,rk-timer.txt to YAML clocksource/drivers/exynos_mct: Mark MCT device as CLOCK_EVT_FEAT_PERCPU clocksource/drivers/exynos_mct: Prioritise Arm arch timer on arm64 hrtimer: Unbreak hrtimer_force_reprogram() hrtimer: Use raw_cpu_ptr() in clock_was_set() hrtimer: Avoid more SMP function calls in clock_was_set() hrtimer: Avoid unnecessary SMP function calls in clock_was_set() hrtimer: Add bases argument to clock_was_set() time/timekeeping: Avoid invoking clock_was_set() twice timekeeping: Distangle resume and clock-was-set events timerfd: Provide timerfd_resume() hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case hrtimer: Ensure timerfd notification for HIGHRES=n hrtimer: Consolidate reprogramming code ...
2021-08-30Merge tag 'irq-core-2021-08-30' of ↵Linus Torvalds13-50/+195
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...
2021-08-30Merge tag 'locking-core-2021-08-30' of ↵Linus Torvalds20-1373/+3123
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and atomics updates from Thomas Gleixner: "The regular pile: - A few improvements to the mutex code - Documentation updates for atomics to clarify the difference between cmpxchg() and try_cmpxchg() and to explain the forward progress expectations. - Simplification of the atomics fallback generator - The addition of arch_atomic_long*() variants and generic arch_*() bitops based on them. - Add the missing might_sleep() invocations to the down*() operations of semaphores. The PREEMPT_RT locking core: - Scheduler updates to support the state preserving mechanism for 'sleeping' spin- and rwlocks on RT. This mechanism is carefully preserving the state of the task when blocking on a 'sleeping' spin- or rwlock and takes regular wake-ups targeted at the same task into account. The preserved or updated (via a regular wakeup) state is restored when the lock has been acquired. - Restructuring of the rtmutex code so it can be utilized and extended for the RT specific lock variants. - Restructuring of the ww_mutex code to allow sharing of the ww_mutex specific functionality for rtmutex based ww_mutexes. - Header file disentangling to allow substitution of the regular lock implementations with the PREEMPT_RT variants without creating an unmaintainable #ifdef mess. - Shared base code for the PREEMPT_RT specific rw_semaphore and rwlock implementations. Contrary to the regular rw_semaphores and rwlocks the PREEMPT_RT implementation is writer unfair because it is infeasible to do priority inheritance on multiple readers. Experience over the years has shown that real-time workloads are not the typical workloads which are sensitive to writer starvation. The alternative solution would be to allow only a single reader which has been tried and discarded as it is a major bottleneck especially for mmap_sem. Aside of that many of the writer starvation critical usage sites have been converted to a writer side mutex/spinlock and RCU read side protections in the past decade so that the issue is less prominent than it used to be. - The actual rtmutex based lock substitutions for PREEMPT_RT enabled kernels which affect mutex, ww_mutex, rw_semaphore, spinlock_t and rwlock_t. The spin/rw_lock*() functions disable migration across the critical section to preserve the existing semantics vs per-CPU variables. - Rework of the futex REQUEUE_PI mechanism to handle the case of early wake-ups which interleave with a re-queue operation to prevent the situation that a task would be blocked on both the rtmutex associated to the outer futex and the rtmutex based hash bucket spinlock. While this situation cannot happen on !RT enabled kernels the changes make the underlying concurrency problems easier to understand in general. As a result the difference between !RT and RT kernels is reduced to the handling of waiting for the critical section. !RT kernels simply spin-wait as before and RT kernels utilize rcu_wait(). - The substitution of local_lock for PREEMPT_RT with a spinlock which protects the critical section while staying preemptible. The CPU locality is established by disabling migration. The underlying concepts of this code have been in use in PREEMPT_RT for way more than a decade. The code has been refactored several times over the years and this final incarnation has been optimized once again to be as non-intrusive as possible, i.e. the RT specific parts are mostly isolated. It has been extensively tested in the 5.14-rt patch series and it has been verified that !RT kernels are not affected by these changes" * tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (92 commits) locking/rtmutex: Return success on deadlock for ww_mutex waiters locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes locking/rtmutex: Dequeue waiter on ww_mutex deadlock locking/rtmutex: Dont dereference waiter lockless locking/semaphore: Add might_sleep() to down_*() family locking/ww_mutex: Initialize waiter.ww_ctx properly static_call: Update API documentation locking/local_lock: Add PREEMPT_RT support locking/spinlock/rt: Prepare for RT local_lock locking/rtmutex: Add adaptive spinwait mechanism locking/rtmutex: Implement equal priority lock stealing preempt: Adjust PREEMPT_LOCK_OFFSET for RT locking/rtmutex: Prevent lockdep false positive with PI futexes futex: Prevent requeue_pi() lock nesting issue on RT futex: Simplify handle_early_requeue_pi_wakeup() futex: Reorder sanity checks in futex_requeue() futex: Clarify comment in futex_requeue() futex: Restructure futex_requeue() futex: Correct the number of requeued waiters for PI futex: Remove bogus condition for requeue PI ...
2021-08-30Merge tag 'smp-core-2021-08-30' of ↵Linus Torvalds3-39/+67
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP core updates from Thomas Gleixner: - Replace get/put_online_cpus() in various places. The final removal will happen shortly before v5.15-rc1 when the rest of the patches have been merged. - Add debug code to help the analysis of CPU hotplug failures - A set of kernel doc updates * tag 'smp-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mm: Replace deprecated CPU-hotplug functions. md/raid5: Replace deprecated CPU-hotplug functions. Documentation: Replace deprecated CPU-hotplug functions. smp: Fix all kernel-doc warnings cpu/hotplug: Add debug printks for hotplug callback failures cpu/hotplug: Use DEVICE_ATTR_*() macro cpu/hotplug: Eliminate all kernel-doc warnings cpu/hotplug: Fix kernel doc warnings for __cpuhp_setup_state_cpuslocked() cpu/hotplug: Fix comment typo smpboot: Replace deprecated CPU-hotplug functions.
2021-08-30Merge tag 'perf-core-2021-08-30' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Add support for Intel Sapphire Rapids server CPU uncore events - Allow the AMD uncore driver to be built as a module - Misc cleanups and fixes * tag 'perf-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) perf/x86/amd/ibs: Add bitfield definitions in new <asm/amd-ibs.h> header perf/amd/uncore: Allow the driver to be built as a module x86/cpu: Add get_llc_id() helper function perf/amd/uncore: Clean up header use, use <linux/ include paths instead of <asm/ perf/amd/uncore: Simplify code, use free_percpu()'s built-in check for NULL perf/hw_breakpoint: Replace deprecated CPU-hotplug functions perf/x86/intel: Replace deprecated CPU-hotplug functions perf/x86: Remove unused assignment to pointer 'e' perf/x86/intel/uncore: Fix IIO cleanup mapping procedure for SNR/ICX perf/x86/intel/uncore: Support IMC free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Support IIO free-running counters on Sapphire Rapids server perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() perf/x86/intel/uncore: Add alias PMU name perf/x86/intel/uncore: Add Sapphire Rapids server MDF support perf/x86/intel/uncore: Add Sapphire Rapids server M3UPI support perf/x86/intel/uncore: Add Sapphire Rapids server UPI support perf/x86/intel/uncore: Add Sapphire Rapids server M2M support perf/x86/intel/uncore: Add Sapphire Rapids server IMC support perf/x86/intel/uncore: Add Sapphire Rapids server PCU support perf/x86/intel/uncore: Add Sapphire Rapids server M2PCIe support ...
2021-08-30Merge tag 'sched-core-2021-08-30' of ↵Linus Torvalds8-149/+672
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - The biggest change in this cycle is scheduler support for asymmetric scheduling affinity, to support the execution of legacy 32-bit tasks on AArch32 systems that also have 64-bit-only CPUs. Architectures can fill in this functionality by defining their own task_cpu_possible_mask(p). When this is done, the scheduler will make sure the task will only be scheduled on CPUs that support it. (The actual arm64 specific changes are not part of this tree.) For other architectures there will be no change in functionality. - Add cgroup SCHED_IDLE support - Increase node-distance flexibility & delay determining it until a CPU is brought online. (This enables platforms where node distance isn't final until the CPU is only.) - Deadline scheduler enhancements & fixes - Misc fixes & cleanups. * tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) eventfd: Make signal recursion protection a task bit sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case sched: Introduce dl_task_check_affinity() to check proposed affinity sched: Allow task CPU affinity to be restricted on asymmetric systems sched: Split the guts of sched_setaffinity() into a helper function sched: Introduce task_struct::user_cpus_ptr to track requested affinity sched: Reject CPU affinity changes based on task_cpu_possible_mask() cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq() cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus() cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1 sched: Introduce task_cpu_possible_mask() to limit fallback rq selection sched: Cgroup SCHED_IDLE support sched/topology: Skip updating masks for non-online nodes sched: Replace deprecated CPU-hotplug functions. sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS sched: Fix UCLAMP_FLAG_IDLE setting sched/deadline: Fix missing clock update in migrate_task_rq_dl() sched/fair: Avoid a second scan of target in select_idle_cpu sched/fair: Use prev instead of new target as recent_used_cpu sched: Don't report SCHED_FLAG_SUGOV in sched_getattr() ...
2021-08-30Merge tag 's390-5.15-1' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Improve ftrace code patching so that stop_machine is not required anymore. This requires a small common code patch acked by Steven Rostedt: https://lore.kernel.org/linux-s390/[email protected]/ - Enable KCSAN for s390. This comes with a small common code change to fix a compile warning. Acked by Marco Elver: https://lore.kernel.org/r/[email protected] - Add KFENCE support for s390. This also comes with a minimal x86 patch from Marco Elver who said also this can be carried via the s390 tree: https://lore.kernel.org/linux-s390/[email protected]/ - More changes to prepare the decompressor for relocation. - Enable DAT also for CPU restart path. - Final set of register asm removal patches; leaving only three locations where needed and sane. - Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to hwcaps flags. - Cleanup hwcaps implementation. - Add new instructions to in-kernel disassembler. - Various QDIO cleanups. - Add SCLP debug feature. - Various other cleanups and improvements all over the place. * tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits) s390: remove SCHED_CORE from defconfigs s390/smp: do not use nodat_stack for secondary CPU start s390/smp: enable DAT before CPU restart callback is called s390: update defconfigs s390/ap: fix state machine hang after failure to enable irq KVM: s390: generate kvm hypercall functions s390/sclp: add tracing of SCLP interactions s390/debug: add early tracing support s390/debug: fix debug area life cycle s390/debug: keep debug data on resize s390/diag: make restart_part2 a local label s390/mm,pageattr: fix walk_pte_level() early exit s390: fix typo in linker script s390: remove do_signal() prototype and do_notify_resume() function s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c s390/pci: improve DMA translation init and exit s390/pci: simplify CLP List PCI handling s390/pci: handle FH state mismatch only on disable s390/pci: fix misleading rc in clp_set_pci_fn() s390/boot: factor out offset_vmlinux_info() function ...
2021-08-30Merge branch 'linus' of ↵Linus Torvalds1-24/+11
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Algorithms: - Add AES-NI/AVX/x86_64 implementation of SM4. Drivers: - Add Arm SMCCC TRNG based driver" [ And obviously a lot of random fixes and updates - Linus] * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (84 commits) crypto: sha512 - remove imaginary and mystifying clearing of variables crypto: aesni - xts_crypt() return if walk.nbytes is 0 padata: Remove repeated verbose license text crypto: ccp - Add support for new CCP/PSP device ID crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation crypto: x86/sm4 - export reusable AESNI/AVX functions crypto: rmd320 - remove rmd320 in Makefile crypto: skcipher - in_irq() cleanup crypto: hisilicon - check _PS0 and _PR0 method crypto: hisilicon - change parameter passing of debugfs function crypto: hisilicon - support runtime PM for accelerator device crypto: hisilicon - add runtime PM ops crypto: hisilicon - using 'debugfs_create_file' instead of 'debugfs_create_regset32' crypto: tcrypt - add GCM/CCM mode test for SM4 algorithm crypto: testmgr - Add GCM/CCM mode test of SM4 algorithm crypto: tcrypt - Fix missing return value check crypto: hisilicon/sec - modify the hardware endian configuration crypto: hisilicon/sec - fix the abnormal exiting process crypto: qat - store vf.compatible flag crypto: qat - do not export adf_iov_putmsg() ...
2021-08-30Merge branch 'core-rcu.2021.08.28a' of ↵Linus Torvalds13-1658/+1767
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: "RCU changes for this cycle were: - Documentation updates - Miscellaneous fixes - Offloaded-callbacks updates - Updates to the nolibc library - Tasks-RCU updates - In-kernel torture-test updates - Torture-test scripting, perhaps most notably the pinning of torture-test guest OSes so as to force differences in memory latency. For example, in a two-socket system, a four-CPU guest OS will have one pair of its CPUs pinned to threads in a single core on one socket and the other pair pinned to threads in a single core on the other socket. This approach proved able to force race conditions that earlier testing missed. Some of these race conditions are still being tracked down" * 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits) torture: Replace deprecated CPU-hotplug functions. rcu: Replace deprecated CPU-hotplug functions rcu: Print human-readable message for schedule() in RCU reader rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU rcu: Use per_cpu_ptr to get the pointer of per_cpu variable rcu: Remove useless "ret" update in rcu_gp_fqs_loop() rcu: Mark accesses in tree_stall.h rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack rcu: Mark lockless ->qsmask read in rcu_check_boost_fail() srcutiny: Mark read-side data races rcu: Start timing stall repetitions after warning complete rcu: Do not disable GP stall detection in rcu_cpu_stall_reset() rcu/tree: Handle VM stoppage in stall detection rculist: Unify documentation about missing list_empty_rcu() rcu: Mark accesses to ->rcu_read_lock_nesting rcu: Weaken ->dynticks accesses and updates rcu: Remove special bit at the bottom of the ->dynticks counter rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock rcu: Fix to include first blocked task in stall warning torture: Make kvm-test-1-run-qemu.sh check for reboot loops ...
2021-08-30Merge branches 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap'Rafael J. Wysocki3-4/+4
* pm-pci: PCI: PM: Enable PME if it can be signaled from D3cold PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently PCI: Use pci_update_current_state() in pci_enable_device_flags() * pm-sleep: PM: sleep: unmark 'state' functions as kernel-doc PM: sleep: check RTC features instead of ops in suspend_test PM: sleep: s2idle: Replace deprecated CPU-hotplug functions * pm-domains: PM: domains: Fix domain attach for CONFIG_PM_OPP=n arm64: dts: sc7180: Add required-opps for i2c PM: domains: Add support for 'required-opps' to set default perf state opp: Don't print an error if required-opps is missing * powercap: powercap: Add Power Limit4 support for Alder Lake SoC powercap: intel_rapl: Replace deprecated CPU-hotplug functions
2021-08-30Merge branches 'pm-cpufreq', 'pm-cpu' and 'pm-em'Rafael J. Wysocki4-37/+52
* pm-cpufreq: cpufreq: intel_pstate: Process HWP Guaranteed change notification thermal: intel: Allow processing of HWP interrupt cpufreq: schedutil: Use kobject release() method to free sugov_tunables cpufreq: Replace deprecated CPU-hotplug functions * pm-cpu: notifier: Remove atomic_notifier_call_chain_robust() PM: cpu: Make notifier chain use a raw_spinlock_t * pm-em: PM: EM: Increase energy calculation precision
2021-08-30Merge tag 'fsnotify_for_v5.15-rc1' of ↵Linus Torvalds1-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify speedups when notification actually isn't used and support for identifying processes which caused fanotify events through pidfd instead of normal pid" * tag 'fsnotify_for_v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize the case of no marks of any type fsnotify: count all objects with attached connectors fsnotify: count s_fsnotify_inode_refs for attached connectors fsnotify: replace igrab() with ihold() on attach connector fanotify: add pidfd support to the fanotify API fanotify: introduce a generic info record copying helper fanotify: minor cosmetic adjustments to fid labels kernel/pid.c: implement additional checks upon pidfd_create() parameters kernel/pid.c: remove static qualifier from pidfd_create()
2021-08-29Merge tag 'irqchip-5.15' of ↵Thomas Gleixner1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - API updates: - Treewide conversion to generic_handle_domain_irq() for anything that looks like a chained interrupt controller - Update the irqdomain documentation - Use of bitmap_zalloc() throughout the tree - New functionalities: - Support for GICv3 EPPI partitions - Fixes: - Qualcomm PDC hierarchy fixes - Yet another priority decoding fix for the GICv3 pseudo-NMIs - Fix the apple-aic driver irq_eoi() callback to always unmask the interrupt - Properly handle edge interrupts on loongson-pch-pic - Let the mtk-sysirq driver advertise IRQCHIP_SKIP_SET_WAKE Link: https://lore.kernel.org/r/[email protected]
2021-08-29Merge tag 'sched_urgent_for_v5.14' of ↵Linus Torvalds2-27/+121
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Have get_push_task() check whether current has migration disabled and thus avoid useless invocations of the migration thread - Rework initialization flow so that all rq->core's are initialized, even of CPUs which have not been onlined yet, so that iterating over them all works as expected * tag 'sched_urgent_for_v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix get_push_task() vs migrate_disable() sched: Fix Core-wide rq->lock for uninitialized CPUs
2021-08-28clocksource: Make clocksource watchdog test safe for slow-HZ systemsPaul E. McKenney3-23/+23
The clocksource watchdog test sets a local JIFFIES_SHIFT macro and assumes that HZ is >= 100. For smaller HZ values this shift value is too large and causes undefined behaviour. Move the HZ-based definitions of JIFFIES_SHIFT from kernel/time/jiffies.c to kernel/time/tick-internal.h so the clocksource watchdog test can utilize them, which makes it work correctly with all HZ values. [ tglx: Resolved conflicts and massaged changelog ] Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/lkml/20210812000133.GA402890@paulmck-ThinkPad-P17-Gen-1/
2021-08-27locking/rtmutex: Return success on deadlock for ww_mutex waitersPeter Zijlstra1-1/+14
ww_mutexes can legitimately cause a deadlock situation in the lock graph which is resolved afterwards by the wait/wound mechanics. The rtmutex chain walk can detect such a deadlock and returns EDEADLK which in turn skips the wait/wound mechanism and returns EDEADLK to the caller. That's wrong because both lock chains might get EDEADLK or the wrong waiter would back out. Detect that situation and return 'success' in case that the waiter which initiated the chain walk is a ww_mutex with context. This allows the wait/wound mechanics to resolve the situation according to the rules. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <[email protected]> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-27locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexesPeter Zijlstra1-0/+25
rtmutex based ww_mutexes can legitimately create a cycle in the lock graph which can be observed by a blocker which didn't cause the problem: P1: A, ww_A, ww_B P2: ww_B, ww_A P3: A P3 might therefore be trapped in the ww_mutex induced cycle and run into the lock depth limitation of rt_mutex_adjust_prio_chain() which returns -EDEADLK to the caller. Disable the deadlock detection walk when the chain walk observes a ww_mutex to prevent this looping. [ tglx: Split it apart and added changelog ] Reported-by: Sebastian Siewior <[email protected]> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-27padata: Remove repeated verbose license textCai Huoqing1-13/+0
remove it because SPDX-License-Identifier is already used Signed-off-by: Cai Huoqing <[email protected]> Acked-by: Daniel Jordan <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2021-08-26Merge tag 'net-5.14-rc8' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from can and bpf. Closing three hw-dependent regressions. Any fixes of note are in the 'old code' category. Nothing blocking release from our perspective. Current release - regressions: - stmmac: revert "stmmac: align RX buffers" - usb: asix: ax88772: move embedded PHY detection as early as possible - usb: asix: do not call phy_disconnect() for ax88178 - Revert "net: really fix the build...", from Kalle to fix QCA6390 Current release - new code bugs: - phy: mediatek: add the missing suspend/resume callbacks Previous releases - regressions: - qrtr: fix another OOB Read in qrtr_endpoint_post - stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings Previous releases - always broken: - inet: use siphash in exception handling - ip_gre: add validation for csum_start - bpf: fix ringbuf helper function compatibility - rtnetlink: return correct error on changing device netns - e1000e: do not try to recover the NVM checksum on Tiger Lake" * tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) Revert "net: really fix the build..." net: hns3: fix get wrong pfc_en when query PFC configuration net: hns3: fix GRO configuration error after reset net: hns3: change the method of getting cmd index in debugfs net: hns3: fix duplicate node in VLAN list net: hns3: fix speed unknown issue in bond 4 net: hns3: add waiting time before cmdq memory is released net: hns3: clear hardware resource when loading driver net: fix NULL pointer reference in cipso_v4_doi_free rtnetlink: Return correct error on changing device netns net: dsa: hellcreek: Adjust schedule look ahead window net: dsa: hellcreek: Fix incorrect setting of GCL cxgb4: dont touch blocked freelist bitmap after free ipv4: use siphash instead of Jenkins in fnhe_hashfun() ipv6: use siphash in rt6_exception_hash() can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters net: usb: asix: ax88772: fix boolconv.cocci warnings net/sched: ets: fix crash when flipping from 'strict' to 'quantum' qede: Fix memset corruption net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp ...
2021-08-26sched: Fix get_push_task() vs migrate_disable()Sebastian Andrzej Siewior1-0/+3
push_rt_task() attempts to move the currently running task away if the next runnable task has migration disabled and therefore is pinned on the current CPU. The current task is retrieved via get_push_task() which only checks for nr_cpus_allowed == 1, but does not check whether the task has migration disabled and therefore cannot be moved either. The consequence is a pointless invocation of the migration thread which correctly observes that the task cannot be moved. Return NULL if the task has migration disabled and cannot be moved to another CPU. Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-26sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED caseIngo Molnar1-1/+1
It's not actually used in the !CONFIG_FAIR_GROUP_SCHED case: kernel/sched/fair.c:488:12: warning: ‘tg_is_idle’ defined but not used [-Wunused-function] Keep around a placeholder nevertheless, for API completeness. Mark it inline, so the compiler doesn't think it must be used. Fixes: 304000390f88: ("sched: Cgroup SCHED_IDLE support") Signed-off-by: Ingo Molnar <[email protected]> Cc: Josh Don <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Vincent Guittot <[email protected]>
2021-08-26perf/hw_breakpoint: Replace deprecated CPU-hotplug functionsSebastian Andrzej Siewior1-2/+2
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-25Merge branch 'for-v5.14' of ↵Linus Torvalds2-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount fixes from Eric Biederman: "This branch fixes a regression that made it impossible to increase rlimits that had been converted to the ucount infrastructure, and also fixes a reference counting bug where the reference was not incremented soon enough. The fixes are trivial and the bugs have been encountered in the wild, and the fixes have been tested" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Increase ucounts reference counter before the security hook ucounts: Fix regression preventing increasing of rlimits in init_user_ns
2021-08-25locking/rtmutex: Dequeue waiter on ww_mutex deadlockThomas Gleixner1-1/+6
The rt_mutex based ww_mutex variant queues the new waiter first in the lock's rbtree before evaluating the ww_mutex specific conditions which might decide that the waiter should back out. This check and conditional exit happens before the waiter is enqueued into the PI chain. The failure handling at the call site assumes that the waiter, if it is the top most waiter on the lock, is queued in the PI chain and then proceeds to adjust the unmodified PI chain, which results in RB tree corruption. Dequeue the waiter from the lock waiter list in the ww_mutex error exit path to prevent this. Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Reported-by: Sebastian Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-25locking/rtmutex: Dont dereference waiter locklessThomas Gleixner2-2/+16
The new rt_mutex_spin_on_onwer() loop checks whether the spinning waiter is still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which is broken because that function contains a sanity check which dereferences the top waiter pointer to check whether the waiter belongs to the lock. That's wrong in the lockless spinwait case: CPU 0 CPU 1 rt_mutex_lock(lock) rt_mutex_lock(lock); queue(waiter0) waiter0 == rt_mutex_top_waiter(lock) rt_mutex_spin_on_onwer(lock, waiter0) { queue(waiter1) waiter1 == rt_mutex_top_waiter(lock) ... top_waiter = rt_mutex_top_waiter(lock) leftmost = rb_first_cached(&lock->waiters); -> signal dequeue(waiter1) destroy(waiter1) w = rb_entry(leftmost, ....) BUG_ON(w->lock != lock) <- UAF The BUG_ON() is correct for the case where the caller holds lock->wait_lock which guarantees that the leftmost waiter entry cannot vanish. For the lockless spinwait case it's broken. Create a new helper function which avoids the pointer dereference and just compares the leftmost entry pointer with current's waiter pointer to validate that currrent is still elegible for spinning. Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism") Reported-by: Sebastian Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-24audit: move put_tree() to avoid trim_trees refcount underflow and UAFRichard Guy Briggs1-1/+1
AUDIT_TRIM is expected to be idempotent, but multiple executions resulted in a refcount underflow and use-after-free. git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate implementations of refcount_t") but this patch with its more thorough checking that wasn't in the x86 assembly code merely exposed a previously existing tree refcount imbalance in the case of tree trimming code that was refactored with prune_one() to remove a tree introduced in commit 8432c7006297 ("audit: Simplify locking around untag_chunk()") Move the put_tree() to cover only the prune_one() case. Passes audit-testsuite and 3 passes of "auditctl -t" with at least one directory watch. Cc: Jan Kara <[email protected]> Cc: Will Deacon <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Seiji Nishikawa <[email protected]> Cc: [email protected] Fixes: 8432c7006297 ("audit: Simplify locking around untag_chunk()") Signed-off-by: Richard Guy Briggs <[email protected]> Reviewed-by: Jan Kara <[email protected]> [PM: reformatted/cleaned-up the commit description] Signed-off-by: Paul Moore <[email protected]>
2021-08-24genirq/msi: Move MSI sysfs handling from PCI to MSI coreBarry Song1-0/+134
Move PCI's MSI sysfs code to the irq core so that other busses such as platform can reuse it. Signed-off-by: Barry Song <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-24genirq/cpuhotplug: Demote debug printk to KERN_DEBUGLee Jones1-1/+1
This sort of information is only generally useful when debugging. No need to have these sprinkled through the kernel log otherwise. Real world problem: During pre-release testing these have an affect on performance on real products. To the point where so much logging builds up, that it sets off the watchdog(s) on some high profile consumer devices. Signed-off-by: Lee Jones <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-23ucounts: Increase ucounts reference counter before the security hookAlexey Gladkov1-6/+6
We need to increment the ucounts reference counter befor security_prepare_creds() because this function may fail and abort_creds() will try to decrement this reference. [ 96.465056][ T8641] FAULT_INJECTION: forcing a failure. [ 96.465056][ T8641] name fail_page_alloc, interval 1, probability 0, space 0, times 0 [ 96.478453][ T8641] CPU: 1 PID: 8641 Comm: syz-executor668 Not tainted 5.14.0-rc6-syzkaller #0 [ 96.487215][ T8641] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 96.497254][ T8641] Call Trace: [ 96.500517][ T8641] dump_stack_lvl+0x1d3/0x29f [ 96.505758][ T8641] ? show_regs_print_info+0x12/0x12 [ 96.510944][ T8641] ? log_buf_vmcoreinfo_setup+0x498/0x498 [ 96.516652][ T8641] should_fail+0x384/0x4b0 [ 96.521141][ T8641] prepare_alloc_pages+0x1d1/0x5a0 [ 96.526236][ T8641] __alloc_pages+0x14d/0x5f0 [ 96.530808][ T8641] ? __rmqueue_pcplist+0x2030/0x2030 [ 96.536073][ T8641] ? lockdep_hardirqs_on_prepare+0x3e2/0x750 [ 96.542056][ T8641] ? alloc_pages+0x3f3/0x500 [ 96.546635][ T8641] allocate_slab+0xf1/0x540 [ 96.551120][ T8641] ___slab_alloc+0x1cf/0x350 [ 96.555689][ T8641] ? kzalloc+0x1d/0x30 [ 96.559740][ T8641] __kmalloc+0x2e7/0x390 [ 96.563980][ T8641] ? kzalloc+0x1d/0x30 [ 96.568029][ T8641] kzalloc+0x1d/0x30 [ 96.571903][ T8641] security_prepare_creds+0x46/0x220 [ 96.577174][ T8641] prepare_creds+0x411/0x640 [ 96.581747][ T8641] __sys_setfsuid+0xe2/0x3a0 [ 96.586333][ T8641] do_syscall_64+0x3d/0xb0 [ 96.590739][ T8641] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 96.596611][ T8641] RIP: 0033:0x445a69 [ 96.600483][ T8641] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 [ 96.620152][ T8641] RSP: 002b:00007f1054173318 EFLAGS: 00000246 ORIG_RAX: 000000000000007a [ 96.628543][ T8641] RAX: ffffffffffffffda RBX: 00000000004ca4c8 RCX: 0000000000445a69 [ 96.636600][ T8641] RDX: 0000000000000010 RSI: 00007f10541732f0 RDI: 0000000000000000 [ 96.644550][ T8641] RBP: 00000000004ca4c0 R08: 0000000000000001 R09: 0000000000000000 [ 96.652500][ T8641] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ca4cc [ 96.660631][ T8641] R13: 00007fffffe0b62f R14: 00007f1054173400 R15: 0000000000022000 Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Reported-by: [email protected] Signed-off-by: Alexey Gladkov <[email protected]> Link: https://lkml.kernel.org/r/97433b1742c3331f02ad92de5a4f07d673c90613.1629735352.git.legion@kernel.org Signed-off-by: Eric W. Biederman <[email protected]>
2021-08-23ucounts: Fix regression preventing increasing of rlimits in init_user_nsEric W. Biederman1-4/+4
"Ma, XinjianX" <[email protected]> reported: > When lkp team run kernel selftests, we found after these series of patches, testcase mqueue: mq_perf_tests > in kselftest failed with following message. > > # selftests: mqueue: mq_perf_tests > # > # Initial system state: > # Using queue path: /mq_perf_tests > # RLIMIT_MSGQUEUE(soft): 819200 > # RLIMIT_MSGQUEUE(hard): 819200 > # Maximum Message Size: 8192 > # Maximum Queue Size: 10 > # Nice value: 0 > # > # Adjusted system state for testing: > # RLIMIT_MSGQUEUE(soft): (unlimited) > # RLIMIT_MSGQUEUE(hard): (unlimited) > # Maximum Message Size: 16777216 > # Maximum Queue Size: 65530 > # Nice value: -20 > # Continuous mode: (disabled) > # CPUs to pin: 3 > # ./mq_perf_tests: mq_open() at 296: Too many open files > not ok 2 selftests: mqueue: mq_perf_tests # exit=1 > ``` > > Test env: > rootfs: debian-10 > gcc version: 9 After investigation the problem turned out to be that ucount_max for the rlimits in init_user_ns was being set to the initial rlimit value. The practical problem is that ucount_max provides a limit that applications inside the user namespace can not exceed. Which means in practice that rlimits that have been converted to use the ucount infrastructure were not able to exceend their initial rlimits. Solve this by setting the relevant values of ucount_max to RLIM_INIFINITY. A limit in init_user_ns is pointless so the code should allow the values to grow as large as possible without riscking an underflow or an overflow. As the ltp test case was a bit of a pain I have reproduced the rlimit failure and tested the fix with the following little C program: > #include <stdio.h> > #include <fcntl.h> > #include <sys/stat.h> > #include <mqueue.h> > #include <sys/time.h> > #include <sys/resource.h> > #include <errno.h> > #include <string.h> > #include <stdlib.h> > #include <limits.h> > #include <unistd.h> > > int main(int argc, char **argv) > { > struct mq_attr mq_attr; > struct rlimit rlim; > mqd_t mqd; > int ret; > > ret = getrlimit(RLIMIT_MSGQUEUE, &rlim); > if (ret != 0) { > fprintf(stderr, "getrlimit(RLIMIT_MSGQUEUE) failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > printf("RLIMIT_MSGQUEUE %lu %lu\n", > rlim.rlim_cur, rlim.rlim_max); > rlim.rlim_cur = RLIM_INFINITY; > rlim.rlim_max = RLIM_INFINITY; > ret = setrlimit(RLIMIT_MSGQUEUE, &rlim); > if (ret != 0) { > fprintf(stderr, "setrlimit(RLIMIT_MSGQUEUE, RLIM_INFINITY) failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > > memset(&mq_attr, 0, sizeof(struct mq_attr)); > mq_attr.mq_maxmsg = 65536 - 1; > mq_attr.mq_msgsize = 16*1024*1024 - 1; > > mqd = mq_open("/mq_rlimit_test", O_RDONLY|O_CREAT, 0600, &mq_attr); > if (mqd == (mqd_t)-1) { > fprintf(stderr, "mq_open failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > ret = mq_close(mqd); > if (ret) { > fprintf(stderr, "mq_close failed; %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > > return EXIT_SUCCESS; > } Fixes: 6e52a9f0532f ("Reimplement RLIMIT_MSGQUEUE on top of ucounts") Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts") Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Reported-by: kernel test robot [email protected] Acked-by: Alexey Gladkov <[email protected]> Link: https://lkml.kernel.org/r/87eeajswfc.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <[email protected]>
2021-08-23bpf: Fix ringbuf helper function compatibilityDaniel Borkmann1-2/+6
Commit 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") extended check_map_func_compatibility() by enforcing map -> helper function match, but not helper -> map type match. Due to this all of the bpf_ringbuf_*() helper functions could be used with a wrong map type such as array or hash map, leading to invalid access due to type confusion. Also, both BPF_FUNC_ringbuf_{submit,discard} have ARG_PTR_TO_ALLOC_MEM as argument and not a BPF map. Therefore, their check_map_func_compatibility() presence is incorrect since it's only for map type checking. Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Ryota Shiga (Flatt Security) Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-08-23io_uring: remove files pointer in cancellation functionsHao Xu1-1/+1
When doing cancellation, we use a parameter to indicate where it's from do_exit or exec. So a boolean value is good enough for this, remove the struct files* as it is not necessary. Signed-off-by: Hao Xu <[email protected]> [axboe: fixup io_uring_files_cancel for !CONFIG_IO_URING] Signed-off-by: Jens Axboe <[email protected]>
2021-08-23Merge back cpufreq changes for v5.15.Rafael J. Wysocki1-5/+11
2021-08-23irqdomain: Export irq_domain_disconnect_hierarchy()Maulik Shah1-0/+1
Export irq_domain_disconnect_hierarchy() so irqchip module drivers can use it. Signed-off-by: Maulik Shah <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20locking/semaphore: Add might_sleep() to down_*() familyXiaoming Ni1-0/+4
Semaphore is sleeping lock. Add might_sleep() to down*() family (with exception of down_trylock()) to detect atomic context sleep. Signed-off-by: Xiaoming Ni <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20sched: Introduce dl_task_check_affinity() to check proposed affinityWill Deacon1-17/+29
In preparation for restricting the affinity of a task during execve() on arm64, introduce a new dl_task_check_affinity() helper function to give an indication as to whether the restricted mask is admissible for a deadline task. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Bristot de Oliveira <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20sched: Allow task CPU affinity to be restricted on asymmetric systemsWill Deacon2-18/+181
Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. Although userspace can carefully manage the affinity masks for such tasks, one place where it is particularly problematic is execve() because the CPU on which the execve() is occurring may be incompatible with the new application image. In such a situation, it is desirable to restrict the affinity mask of the task and ensure that the new image is entered on a compatible CPU. From userspace's point of view, this looks the same as if the incompatible CPUs have been hotplugged off in the task's affinity mask. Similarly, if a subsequent execve() reverts to a compatible image, then the old affinity is restored if it is still valid. In preparation for restricting the affinity mask for compat tasks on arm64 systems without uniform support for 32-bit applications, introduce {force,relax}_compatible_cpus_allowed_ptr(), which respectively restrict and restore the affinity mask for a task based on the compatible CPUs. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Quentin Perret <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20sched: Split the guts of sched_setaffinity() into a helper functionWill Deacon1-48/+57
In preparation for replaying user affinity requests using a saved mask, split sched_setaffinity() up so that the initial task lookup and security checks are only performed when the request is coming directly from userspace. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20sched: Introduce task_struct::user_cpus_ptr to track requested affinityWill Deacon2-0/+22
In preparation for saving and restoring the user-requested CPU affinity mask of a task, add a new cpumask_t pointer to 'struct task_struct'. If the pointer is non-NULL, then the mask is copied across fork() and freed on task exit. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20sched: Reject CPU affinity changes based on task_cpu_possible_mask()Will Deacon1-1/+8
Reject explicit requests to change the affinity mask of a task via set_cpus_allowed_ptr() if the requested mask is not a subset of the mask returned by task_cpu_possible_mask(). This ensures that the 'cpus_mask' for a given task cannot contain CPUs which are incapable of executing it, except in cases where the affinity is forced. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Quentin Perret <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()Will Deacon2-4/+9
select_fallback_rq() only needs to recheck for an allowed CPU if the affinity mask of the task has changed since the last check. Return a 'bool' from cpuset_cpus_allowed_fallback() to indicate whether the affinity mask was updated, and use this to elide the allowed check when the mask has been left alone. No functional change. Suggested-by: Valentin Schneider <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-08-20cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()Will Deacon1-17/+26
Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. Modify guarantee_online_cpus() to take task_cpu_possible_mask() into account when trying to find a suitable set of online CPUs for a given task. This will avoid passing an invalid mask to set_cpus_allowed_ptr() during ->attach() and will subsequently allow the cpuset hierarchy to be taken into account when forcefully overriding the affinity mask for a task which requires migration to a compatible CPU. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-20cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1Will Deacon1-2/+6
If the scheduler cannot find an allowed CPU for a task, cpuset_cpus_allowed_fallback() will widen the affinity to cpu_possible_mask if cgroup v1 is in use. In preparation for allowing architectures to provide their own fallback mask, just return early if we're either using cgroup v1 or we're using cgroup v2 with a mask that contains invalid CPUs. This will allow select_fallback_rq() to figure out the mask by itself. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Quentin Perret <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-08-20sched: Introduce task_cpu_possible_mask() to limit fallback rq selectionWill Deacon1-6/+3
Asymmetric systems may not offer the same level of userspace ISA support across all CPUs, meaning that some applications cannot be executed by some CPUs. As a concrete example, upcoming arm64 big.LITTLE designs do not feature support for 32-bit applications on both clusters. On such a system, we must take care not to migrate a task to an unsupported CPU when forcefully moving tasks in select_fallback_rq() in response to a CPU hot-unplug operation. Introduce a task_cpu_possible_mask() hook which, given a task argument, allows an architecture to return a cpumask of CPUs that are capable of executing that task. The default implementation returns the cpu_possible_mask, since sane machines do not suffer from per-cpu ISA limitations that affect scheduling. The new mask is used when selecting the fallback runqueue as a last resort before forcing a migration to the first active CPU. Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Quentin Perret <[email protected]> Link: https://lore.kernel.org/r/[email protected]