aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-01-28RDMA/cma: Fix unbalanced cm_id reference count during address resolveParav Pandit1-0/+2
Below commit missed the AF_IB and loopback code flow in rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in cma_work_handler() which puts the refcount which was not incremented prior to queuing the work. A call trace is observed with such code flow: BUG: unable to handle kernel NULL pointer dereference at (null) [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440 [<ffffffff964baf56>] worker_thread+0x126/0x3c0 Hence, hold the cm_id reference when scheduling the resolve work item. Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-28RDMA/umem: Fix ib_umem_find_best_pgsz()Artemy Kovalyov1-3/+6
Except for the last entry, the ending iova alignment sets the maximum possible page size as the low bits of the iova must be zero when starting the next chunk. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Artemy Kovalyov <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Tested-by: Gal Pressman <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-28Merge branches 'x86/hyperv', 'x86/kdump' and 'x86/misc' into x86/urgent, to ↵Ingo Molnar8-34/+46
pick up single-commit branches Signed-off-by: Ingo Molnar <[email protected]>
2020-01-28Merge branch 'sched-core-for-linus' of ↵Linus Torvalds72-329/+446
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-28Merge branch 'perf-core-for-linus' of ↵Linus Torvalds128-1591/+2515
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-28Merge branch 'locking-core-for-linus' of ↵Linus Torvalds4-21/+28
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Just a handful of changes in this cycle: an ARM64 performance optimization, a comment fix and a debug output fix" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/osq: Use optimized spinning loop for arm64 locking/qspinlock: Fix inaccessible URL of MCS lock paper locking/lockdep: Fix lockdep_stats indentation problem
2020-01-28Merge branch 'efi-core-for-linus' of ↵Linus Torvalds109-2892/+2650
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-29builddeb: split libc headers deployment out into a functionMasahiro Yamada1-14/+18
Deploy user-space headers (linux-libc-dev package) in a separate function for readability. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: split kernel headers deployment out into a functionMasahiro Yamada1-34/+42
Deploy kernel headers (linux-headers package) in a separate function for readability. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: remove redundant make for ARCH=umMasahiro Yamada1-2/+1
The kernel build has already been done before builddeb is invoked. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: avoid invoking sub-shells where possibleMasahiro Yamada1-13/+22
The commands surrounded by ( ... ) is run in a sub-shell, but you do not have to spawn a sub-shell for every single line. Use just one ( ... ) for creating debian/hdrsrcfiles. For tar, use -C option instead. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: remove redundant $objtree/Masahiro Yamada1-16/+16
This script works only when it is invoked in the $objtree, that is, it is already relying on $objtree is '.' Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: match temporary directory name to the package nameMasahiro Yamada1-4/+4
The temporary directory names, debian/hdrtmp (linux-headers package) vs debian/headertmp (linux-libc-dev package), are confusing. Matching the directory name to the package name is clearer, IMHO. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-29builddeb: remove unneeded files in hdrobjfiles for headers packageMasahiro Yamada1-2/+2
- We do not need tools/objtool/fixdep or tools/objtool/sync-check.sh for building external modules. Including tools/objtool/objtool is enough. - gcc-common.h is a check-in file. I do not see any point to search for it in objtree. Signed-off-by: Masahiro Yamada <[email protected]>
2020-01-28Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds46-851/+1476
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle were: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) rcu: Remove unused stop-machine #include powerpc: Remove comment about read_barrier_depends() .mailmap: Add entries for old [email protected] addresses srcu: Apply *_ONCE() to ->srcu_last_gp_end rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask() rcu: Move rcu_{expedited,normal} definitions into rcupdate.h rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h rcu: Remove the declaration of call_rcu() in tree.h rcu: Fix tracepoint tracking RCU CPU kthread utilization rcu: Fix harmless omission of "CONFIG_" from #if condition rcu: Avoid tick_dep_set_cpu() misordering rcu: Provide wrappers for uses of ->rcu_read_lock_nesting rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special() rcu: Clear ->rcu_read_unlock_special only once rcu: Clear .exp_hint only when deferred quiescent state has been reported rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU rcu: Remove kfree_call_rcu_nobatch() rcu: Remove kfree_rcu() special casing and lazy-callback handling rcu: Add support for debug_objects debugging for kfree_rcu() rcu: Add multiple in-flight batches of kfree_rcu() work ...
2020-01-28Merge branch 'core-objtool-for-linus' of ↵Linus Torvalds18-396/+559
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "The main changes are to move the ORC unwind table sorting from early init to build-time - this speeds up booting. No change in functionality intended" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Fix !CONFIG_MODULES build warning x86/unwind/orc: Remove boot-time ORC unwind tables sorting scripts/sorttable: Implement build-time ORC unwind table sorting scripts/sorttable: Rename 'sortextable' to 'sorttable' scripts/sortextable: Refactor the do_func() function scripts/sortextable: Remove dead code scripts/sortextable: Clean up the code to meet the kernel coding style better scripts/sortextable: Rewrite error/success handling
2020-01-28scripts/dtc: Revert "yamltree: Ensure consistent bracketing of properties ↵Rob Herring1-21/+0
with phandles" This reverts upstream commit 18d7b2f4ee45fec422b7d82bab0b3c762ee907e4. A revert in upstream dtc is pending. This commit didn't work for properties such as 'interrupt-map' that have phandle in the middle of an entry. It would also not work for a 0 or -1 phandle value that acts as a NULL. Signed-off-by: Rob Herring <[email protected]>
2020-01-28Merge branch 'core-headers-for-linus' of ↵Linus Torvalds18-122/+81
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull header cleanup from Ingo Molnar: "This is a treewide cleanup, mostly (but not exclusively) with x86 impact, which breaks implicit dependencies on the asm/realtime.h header and finally removes it from asm/acpi.h" * 'core-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ACPI/sleep: Move acpi_get_wakeup_address() into sleep.c, remove <asm/realmode.h> from <asm/acpi.h> ACPI/sleep: Convert acpi_wakeup_address into a function x86/ACPI/sleep: Remove an unnecessary include of asm/realmode.h ASoC: Intel: Skylake: Explicitly include linux/io.h for virt_to_phys() vmw_balloon: Explicitly include linux/io.h for virt_to_phys() virt: vbox: Explicitly include linux/io.h to pick up various defs efi/capsule-loader: Explicitly include linux/io.h for page_to_phys() perf/x86/intel: Explicitly include asm/io.h to use virt_to_phys() x86/kprobes: Explicitly include vmalloc.h for set_vm_flush_reset_perms() x86/ftrace: Explicitly include vmalloc.h for set_vm_flush_reset_perms() x86/boot: Explicitly include realmode.h to handle RM reservations x86/efi: Explicitly include realmode.h to handle RM trampoline quirk x86/platform/intel/quark: Explicitly include linux/io.h for virt_to_phys() x86/setup: Enhance the comments x86/setup: Clean up the header portion of setup.c
2020-01-28of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpcMichael Ellerman3-1/+10
There's an OF helper called of_dma_is_coherent(), which checks if a device has a "dma-coherent" property to see if the device is coherent for DMA. But on some platforms devices are coherent by default, and on some platforms it's not possible to update existing device trees to add the "dma-coherent" property. So add a Kconfig symbol to allow arch code to tell of_dma_is_coherent() that devices are coherent by default, regardless of the presence of the property. Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. when the system has a coherent cache. Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper") Cc: [email protected] # v3.16+ Reported-by: Christian Zigotzky <[email protected]> Tested-by: Christian Zigotzky <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Ulf Hansson <[email protected]> Signed-off-by: Rob Herring <[email protected]>
2020-01-28Revert "gfs2: eliminate tr_num_revoke_rm"Bob Peterson3-4/+6
This reverts commit e955537e3262de8e56f070b13817f525f472fa00. Before patch e955537e32, tr_num_revoke tracked the number of revokes added to the transaction, and tr_num_revoke_rm tracked how many revokes were removed. But since revokes are queued off the sdp (superblock) pointer, some transactions could remove more revokes than they added. (e.g. revokes added by a different process). Commit e955537e32 eliminated transaction variable tr_num_revoke_rm, but in order to do so, it changed the accounting to always use tr_num_revoke for its math. Since you can remove more revokes than you add, tr_num_revoke could now become a negative value. This negative value broke the assert in function gfs2_trans_end: if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) && (tr->tr_num_revoke <=3D tr->tr_revokes))) One way to fix this is to simply remove the tr_num_revoke clause from the assert and allow the value to become negative. Andreas didn't like that idea, so instead, we decided to revert e955537e32. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2020-01-28PCI: brcmstb: Add MSI supportJim Quinlan2-1/+262
This adds MSI support to the Broadcom STB PCIe host controller. The MSI controller is physically located within the PCIe block, however, there is no reason why the MSI controller could not be moved elsewhere in the future. MSIX is not supported by the HW. Since the internal Brcmstb MSI controller is intertwined with the PCIe controller, it is not its own platform device but rather part of the PCIe platform device. Signed-off-by: Jim Quinlan <[email protected]> Co-developed-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Reviewed-by: Andrew Murray <[email protected]>
2020-01-28PCI: brcmstb: Add Broadcom STB PCIe host controller driverJim Quinlan3-0/+764
This adds a basic driver for Broadcom's STB PCIe controller, for now aimed at Raspberry Pi 4's SoC, bcm2711. Signed-off-by: Jim Quinlan <[email protected]> Co-developed-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Nicolas Saenz Julienne <[email protected]> [[email protected]: updated brcm_pcie_get_rc_bar2_size_and_offset()according to https://lore.kernel.org/linux-pci/[email protected]] Signed-off-by: Lorenzo Pieralisi <[email protected]> Reviewed-by: Andrew Murray <[email protected]> Reviewed-by: Jeremy Linton <[email protected]>
2020-01-28KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integerAlexandru Elisei1-1/+2
According to the ARM ARM, registers CNT{P,V}_TVAL_EL0 have bits [63:32] RES0 [1]. When reading the register, the value is truncated to the least significant 32 bits [2], and on writes, TimerValue is treated as a signed 32-bit integer [1, 2]. When the guest behaves correctly and writes 32-bit values, treating TVAL as an unsigned 64 bit register works as expected. However, things start to break down when the guest writes larger values, because (u64)0x1_ffff_ffff = 8589934591. but (s32)0x1_ffff_ffff = -1, and the former will cause the timer interrupt to be asserted in the future, but the latter will cause it to be asserted now. Let's treat TVAL as a signed 32-bit register on writes, to match the behaviour described in the architecture, and the behaviour experimentally exhibited by the virtual timer on a non-vhe host. [1] Arm DDI 0487E.a, section D13.8.18 [2] Arm DDI 0487E.a, section D11.2.4 Signed-off-by: Alexandru Elisei <[email protected]> [maz: replaced the read-side mask with lower_32_bits] Signed-off-by: Marc Zyngier <[email protected]> Fixes: 8fa761624871 ("KVM: arm/arm64: arch_timer: Fix CNTP_TVAL calculation") Link: https://lore.kernel.org/r/[email protected]
2020-01-28KVM: arm64: pmu: Only handle supported event countersEric Auger1-5/+5
Let the code never use unsupported event counters. Change kvm_pmu_handle_pmcr() to only reset supported counters and kvm_pmu_vcpu_reset() to only stop supported counters. Other actions are filtered on the supported counters in kvm/sysregs.c Signed-off-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-28KVM: arm64: pmu: Fix chained SW_INCR countersEric Auger1-13/+30
At the moment a SW_INCR counter always overflows on 32-bit boundary, independently on whether the n+1th counter is programmed as CHAIN. Check whether the SW_INCR counter is a 64b counter and if so, implement the 64b logic. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Signed-off-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-28KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabledEric Auger1-29/+33
At the moment we update the chain bitmap on type setting. This does not take into account the enable state of the odd register. Let's make sure a counter is never considered as chained if the high counter is disabled. We recompute the chain state on enable/disable and type changes. Also let create_perf_event() use the chain bitmap and not use kvm_pmu_idx_has_chain_evtype(). Suggested-by: Marc Zyngier <[email protected]> Signed-off-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-28KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unsetEric Auger1-0/+3
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1 if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR. For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to be set for the corresponding event counter but we also need the PMCR.E bit to be set. Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register") Signed-off-by: Eric Auger <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Andrew Murray <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-28net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROCScott Branden1-0/+1
Add default MDIO_BCM_IPROC Kconfig setting such that it is default on for IPROC family of devices. Signed-off-by: Scott Branden <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-28udp: segment looped gso packets correctlyWillem de Bruijn1-0/+3
Multicast and broadcast packets can be looped from egress to ingress pre segmentation with dev_loopback_xmit. That function unconditionally sets ip_summed to CHECKSUM_UNNECESSARY. udp_rcv_segment segments gso packets in the udp rx path. Segmentation usually executes on egress, and does not expect packets of this type. __udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But the offsets are not correct for gso_make_checksum. UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set to the correct pseudo header checksum. Reset ip_summed to this type. (CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h) Reported-by: syzbot <[email protected]> Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection") Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-28netem: change mailing listStephen Hemminger1-1/+1
The old netem mailing list was inactive and recently was targeted by spammers. Switch to just using netdev mailing list which is where all the real change happens. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-28prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaimMike Christie3-0/+30
There are several storage drivers like dm-multipath, iscsi, tcmu-runner, amd nbd that have userspace components that can run in the IO path. For example, iscsi and nbd's userspace deamons may need to recreate a socket and/or send IO on it, and dm-multipath's daemon multipathd may need to send SG IO or read/write IO to figure out the state of paths and re-set them up. In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the memalloc_*_save/restore functions to control the allocation behavior, but for userspace we would end up hitting an allocation that ended up writing data back to the same device we are trying to allocate for. The device is then in a state of deadlock, because to execute IO the device needs to allocate memory, but to allocate memory the memory layers want execute IO to the device. Here is an example with nbd using a local userspace daemon that performs network IO to a remote server. We are using XFS on top of the nbd device, but it can happen with any FS or other modules layered on top of the nbd device that can write out data to free memory. Here a nbd daemon helper thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute a request. This kicks off a reclaim operation which results in a WRITE to the nbd device and the nbd thread calling back into the mm layer. [ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000 [ 1626.609193] Call Trace: [ 1626.609195] ? __schedule+0x29b/0x630 [ 1626.609197] ? wait_for_completion+0xe0/0x170 [ 1626.609198] schedule+0x30/0xb0 [ 1626.609200] schedule_timeout+0x1f6/0x2f0 [ 1626.609202] ? blk_finish_plug+0x21/0x2e [ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410 [ 1626.609206] ? wait_for_completion+0xe0/0x170 [ 1626.609208] wait_for_completion+0x108/0x170 [ 1626.609210] ? wake_up_q+0x70/0x70 [ 1626.609212] ? __xfs_buf_submit+0x12e/0x250 [ 1626.609214] ? xfs_bwrite+0x25/0x60 [ 1626.609215] xfs_buf_iowait+0x22/0xf0 [ 1626.609218] __xfs_buf_submit+0x12e/0x250 [ 1626.609220] xfs_bwrite+0x25/0x60 [ 1626.609222] xfs_reclaim_inode+0x2e8/0x310 [ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300 [ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40 [ 1626.609228] super_cache_scan+0x152/0x1a0 [ 1626.609231] do_shrink_slab+0x12c/0x2d0 [ 1626.609233] shrink_slab+0x9c/0x2a0 [ 1626.609235] shrink_node+0xd7/0x470 [ 1626.609237] do_try_to_free_pages+0xbf/0x380 [ 1626.609240] try_to_free_pages+0xd9/0x1f0 [ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30 [ 1626.609251] ? ___slab_alloc+0x238/0x560 [ 1626.609254] __alloc_pages_nodemask+0x30c/0x350 [ 1626.609259] skb_page_frag_refill+0x97/0xd0 [ 1626.609274] sk_page_frag_refill+0x1d/0x80 [ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0 [ 1626.609304] tcp_sendmsg+0x27/0x40 [ 1626.609307] sock_sendmsg+0x54/0x60 [ 1626.609308] ___sys_sendmsg+0x29f/0x320 [ 1626.609313] ? sock_poll+0x66/0xb0 [ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0 [ 1626.609320] ? ep_send_events_proc+0xe6/0x230 [ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0 [ 1626.609324] ? ep_read_events_proc+0xc0/0xc0 [ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20 [ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230 [ 1626.609329] ? __hrtimer_init+0xb0/0xb0 [ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20 [ 1626.609334] ? ep_poll+0x26c/0x4a0 [ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0 [ 1626.609339] ? release_sock+0x43/0x90 [ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20 [ 1626.609342] __sys_sendmsg+0x47/0x80 [ 1626.609347] do_syscall_64+0x5f/0x1c0 [ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0 [ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds a new prctl command that daemons can use after they have done their initial setup, and before they start to do allocations that are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags so both userspace block and FS threads can use it to avoid the allocation recursion and try to prevent from being throttled while writing out data to free up memory. Signed-off-by: Mike Christie <[email protected]> Acked-by: Michal Hocko <[email protected]> Tested-by: Masato Suzuki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-01-28Merge branch 'core/kprobes' into perf/core, to pick up fixesIngo Molnar2-25/+45
Signed-off-by: Ingo Molnar <[email protected]>
2020-01-28Merge branch 'core/documentation' into core/urgent, to pick up single commitIngo Molnar1-4/+1
Signed-off-by: Ingo Molnar <[email protected]>
2020-01-27Merge tag 'x86-pti-2020-01-28' of ↵Linus Torvalds2-7/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "The performance deterioration departement provides a few non-scary fixes and improvements: - Update the cached HLE state when the TSX state is changed via the new control register. This ensures feature bit consistency. - Exclude the new Zhaoxin CPUs from Spectre V2 and SWAPGS vulnerabilities" * tag 'x86-pti-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerability x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2 x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
2020-01-27Merge tag 'irq-core-2020-01-28' of ↵Linus Torvalds33-111/+2024
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The interrupt departement provides: - A mechanism to shield isolated tasks from managed interrupts: The affinity of managed interrupts is completely controlled by the kernel and user space has no influence on them. The reason is that the automatically assigned affinity correlates to the multi-queue CPU handling of block devices. If the generated affinity mask spaws both housekeeping and isolated CPUs the interrupt could be routed to an isolated CPU which would then be disturbed by I/O submitted by a housekeeping CPU. The new mechamism ensures that as long as one housekeeping CPU is online in the assigned affinity mask the interrupt is routed to a housekeeping CPU. If there is no online housekeeping CPU in the affinity mask, then the interrupt is routed to an isolated CPU to keep the device queue intact, but unless the isolated CPU submits I/O by itself these interrupts are not raised. - A small addon to the device tree irqdomain core code to avoid duplication in irq chip drivers - Conversion of the SiFive PLIC to hierarchical domains - The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI, NXP INTMUX, Meson A1 GPIO - The first cut of support for the new ARM GICv4.1 - The usual pile of fixes and improvements in core and driver code" * tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) genirq, sched/isolation: Isolate from handling managed interrupts irqchip/gic-v4.1: Allow direct invalidation of VLPIs irqchip/gic-v4.1: Suppress per-VLPI doorbell irqchip/gic-v4.1: Add VPE INVALL callback irqchip/gic-v4.1: Add VPE eviction callback irqchip/gic-v4.1: Add VPE residency callback irqchip/gic-v4.1: Add mask/unmask doorbell callbacks irqchip/gic-v4.1: Plumb skeletal VPE irqchip irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation irqchip/gic-v3: Add GICv4.1 VPEID size discovery irqchip/gic-v3: Detect GICv4.1 supporting RVPEID irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells irqdomain: Fix a memory leak in irq_domain_push_irq() irqchip: Add NXP INTMUX interrupt multiplexer support dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer irqchip: Define EXYNOS_IRQ_COMBINER irqchip/meson-gpio: Add support for meson a1 SoCs ...
2020-01-27Merge tag 'smp-core-2020-01-28' of ↵Linus Torvalds6-72/+56
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function
2020-01-27Merge tag 'timers-core-2020-01-27' of ↵Linus Torvalds72-270/+3027
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ...
2020-01-27Merge tag 'core-debugobjects-2020-01-28' of ↵Linus Torvalds1-21/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debugobjects update from Thomas Gleixner: "A single commit for debug objects which fixes a pile of potential data races detected by KCSAN" * tag 'core-debugobjects-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Fix various data races
2020-01-27Merge tag 'core-core-2020-01-28' of ↵Linus Torvalds1-24/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchdog updates from Thomas Gleixner: "A set of watchdog/softlockup related improvements: - Enforce that the watchdog timestamp is always valid on boot. The original implementation caused a watchdog disabled gap of one second in the boot process due to truncation of the underlying sched clock. The sched clock is divided by 1e9 to convert nanoseconds to seconds. So for the first second of the boot process the result is 0 which is at the same time the indicator to disable the watchdog. The trivial fix is to change the disabled indicator to ULONG_MAX. - Two cleanup patches removing unused and redundant code which got forgotten to be cleaned up in previous changes" * tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog/softlockup: Enforce that timestamp is valid on boot watchdog/softlockup: Remove obsolete check of last reported task watchdog: Remove soft_lockup_hrtimer_cnt and related code
2020-01-27Merge tag 'timers-urgent-2020-01-27' of ↵Linus Torvalds3-24/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for the generic VDSO code which missed 5.5: - Make the update to the coarse timekeeper unconditional. This is required because the coarse timekeeper interfaces in the VDSO do not depend on a VDSO capable clocksource. If the system does not have a VDSO capable clocksource and the update is depending on the VDSO capable clocksource, the coarse VDSO interfaces would operate on stale data forever. - Invert the logic of __arch_update_vdso_data() to avoid further head scratching. Tripped over this several times while analyzing the update problem above" * tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/vdso: Update coarse timekeeper unconditionally lib/vdso: Make __arch_update_vdso_data() logic understandable
2020-01-27Merge tag 'selinux-pr-20200127' of ↵Linus Torvalds30-559/+1045
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux update from Paul Moore: "This is one of the bigger SELinux pull requests in recent years with 28 patches. Everything is passing our test suite and the highlights are below: - Mark CONFIG_SECURITY_SELINUX_DISABLE as deprecated. We're some time away from actually attempting to remove this in the kernel, but the only distro we know that still uses it (Fedora) is working on moving away from this so we want to at least let people know we are planning to remove it. - Reorder the SELinux hooks to help prevent bad things when SELinux is disabled at runtime. The proper fix is to remove the CONFIG_SECURITY_SELINUX_DISABLE functionality (see above) and just take care of it at boot time (e.g. "selinux=0"). - Add SELinux controls for the kernel lockdown functionality, introducing a new SELinux class/permissions: "lockdown { integrity confidentiality }". - Add a SELinux control for move_mount(2) that reuses the "file { mounton }" permission. - Improvements to the SELinux security label data store lookup functions to speed up translations between our internal label representations and the visible string labels (both directions). - Revisit a previous fix related to SELinux inode auditing and permission caching and do it correctly this time. - Fix the SELinux access decision cache to cleanup properly on error. In some extreme cases this could limit the cache size and result in a decrease in performance. - Enable SELinux per-file labeling for binderfs. - The SELinux initialized and disabled flags were wrapped with accessors to ensure they are accessed correctly. - Mark several key SELinux structures with __randomize_layout. - Changes to the LSM build configuration to only build security/lsm_audit.c when needed. - Changes to the SELinux build configuration to only build the IB object cache when CONFIG_SECURITY_INFINIBAND is enabled. - Move a number of single-caller functions into their callers. - Documentation fixes (/selinux -> /sys/fs/selinux). - A handful of cleanup patches that aren't worth mentioning on their own, the individual descriptions have plenty of detail" * tag 'selinux-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (28 commits) selinux: fix regression introduced by move_mount(2) syscall selinux: do not allocate ancillary buffer on first load selinux: remove redundant allocation and helper functions selinux: remove redundant selinux_nlmsg_perm selinux: fix wrong buffer types in policydb.c selinux: reorder hooks to make runtime disable less broken selinux: treat atomic flags more carefully selinux: make default_noexec read-only after init selinux: move ibpkeys code under CONFIG_SECURITY_INFINIBAND. selinux: remove redundant msg_msg_alloc_security Documentation,selinux: fix references to old selinuxfs mount point selinux: deprecate disabling SELinux and runtime selinux: allow per-file labelling for binderfs selinuxfs: use scnprintf to get real length for inode selinux: remove set but not used variable 'sidtab' selinux: ensure the policy has been loaded before reading the sidtab stats selinux: ensure we cleanup the internal AVC counters on error in avc_update() selinux: randomize layout of key structures selinux: clean up selinux_enabled/disabled/enforcing_boot selinux: remove unnecessary selinux cred request ...
2020-01-27Merge tag 'audit-pr-20200127' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit update from Paul Moore: "One small audit patch for the Linux v5.6 merge window, and unsurprisingly it passes our test suite with flying colors" * tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Add __rcu annotation to RCU pointer
2020-01-27Merge branch 'for-5.6' of ↵Linus Torvalds6-20/+227
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup2 interface for hugetlb controller. I think this was the last remaining bit which was missing from cgroup2 - fixes for race and a spurious warning in threaded cgroup handling - other minor changes * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: iocost: Fix iocost_monitor.py due to helper type mismatch cgroup: Prevent double killing of css when enabling threaded cgroup cgroup: fix function name in comment mm: hugetlb controller for cgroups v2
2020-01-27Merge branch 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2-23/+29
Pull workqueue updates from Tejun Heo: "Just a couple tracepoint patches" * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove workqueue_work event class workqueue: add worker function to workqueue_execute_end tracepoint
2020-01-27io-wq: make the io_wq ref countedJens Axboe1-1/+10
In preparation for sharing an io-wq across different users, add a reference count that manages destruction of it. Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: fix refcounting with batched allocations at OOMPavel Begunkov1-2/+5
In case of out of memory the second argument of percpu_ref_put_many() in io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly wrong. Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: add comment for drain_nextPavel Begunkov1-0/+7
Draining the middle of a link is tricky, so leave a comment there Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-27io_uring: don't attempt to copy iovec for READ/WRITEJens Axboe1-2/+1
For the non-vectored variant of READV/WRITEV, we don't need to setup an async io context, and we flag that appropriately in the io_op_defs array. However, in fixing this for the 5.5 kernel in commit 74566df3a71c we didn't have these opcodes, so the check there was added just for the READ_FIXED and WRITE_FIXED opcodes. Replace that check with just a single check for needing async context, that covers all four of these read/write variants that don't use an iovec. Signed-off-by: Jens Axboe <[email protected]>
2020-01-27scripts/find-unused-docs: Fix massive false positivesGeert Uytterhoeven1-1/+1
scripts/find-unused-docs.sh invokes scripts/kernel-doc to find out if a source file contains kerneldoc or not. However, as it passes the no longer supported "-text" option to scripts/kernel-doc, the latter prints out its help text, causing all files to be considered containing kerneldoc. Get rid of these false positives by removing the no longer supported "-text" option from the scripts/kernel-doc invocation. Cc: [email protected] # 4.16+ Fixes: b05142675310d2ac ("scripts: kernel-doc: get rid of unused output formats") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2020-01-27Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremapLinus Torvalds366-600/+506
Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap