aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-10-19swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrsChristoph Hellwig1-22/+12
No need to duplicate the code - map_sg is equivalent to map_page for each page in the scatterlist. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-10-19swiotlb: merge swiotlb_unmap_page and unmap_singleChristoph Hellwig1-11/+4
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-10-19swiotlb: remove the overflow bufferChristoph Hellwig2-58/+3
Like all other dma mapping drivers just return an error code instead of an actual memory buffer. The reason for the overflow buffer was that at the time swiotlb was invented there was no way to check for dma mapping errors, but this has long been fixed. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-10-19swiotlb: do not panic on mapping failuresChristoph Hellwig1-32/+1
All properly written drivers now have error handling in the dma_map_single / dma_map_page callers. As swiotlb_tbl_map_single already prints a useful warning when running out of swiotlb pool space we can also remove swiotlb_full entirely as it serves no purpose now. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-10-19swiotlb: mark is_swiotlb_buffer staticChristoph Hellwig1-1/+1
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-10-19swiotlb: remove a pointless commentChristoph Hellwig1-6/+0
This comments describes an aspect of the map_sg interface that isn't even exploited by swiotlb. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-10-09dma-direct: respect DMA_ATTR_NO_WARNChristoph Hellwig1-0/+3
Respect the DMA_ATTR_NO_WARN flags for allocations in dma-direct. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Robin Murphy <[email protected]>
2018-10-09dma-direct: document the zone selection logicChristoph Hellwig1-1/+8
What we are doing here isn't quite obvious, so add a comment explaining it. Signed-off-by: Christoph Hellwig <[email protected]>
2018-10-08dma-debug: Check for drivers mapping invalid addresses in dma_map_single()Stephen Boyd1-0/+16
I recently debugged a DMA mapping oops where a driver was trying to map a buffer returned from request_firmware() with dma_map_single(). Memory returned from request_firmware() is mapped into the vmalloc region and this isn't a valid region to map with dma_map_single() per the DMA documentation's "What memory is DMA'able?" section. Unfortunately, we don't really check that in the DMA debugging code, so enabling DMA debugging doesn't help catch this problem. Let's add a new DMA debug function to check for a vmalloc address or an invalid virtual address and print a warning if this happens. This makes it a little easier to debug these sorts of problems, instead of seeing odd behavior or crashes when drivers attempt to map the vmalloc space for DMA. Cc: Marek Szyprowski <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-10-05dma-direct: fix return value of dma_direct_supportedAlexander Duyck1-3/+1
It appears that in commit 9d7a224b463e ("dma-direct: always allow dma mask <= physiscal memory size") the logic of the test was changed from a "<" to a ">=" however I don't see any reason for that change. I am assuming that there was some additional change planned, specifically I suspect the logic was intended to be reversed and possibly used for a return. Since that is the case I have gone ahead and done that. This addresses issues I had on my system that prevented me from booting with the above mentioned commit applied on an x86_64 system w/ Intel IOMMU. Fixes: 9d7a224b463e ("dma-direct: always allow dma mask <= physiscal memory size") Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Robin Murphy <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-10-01dma-direct: always allow dma mask <= physiscal memory sizeChristoph Hellwig1-12/+16
This way an architecture with less than 4G of RAM can support dma_mask smaller than 32-bit without a ZONE_DMA. Apparently that is a common case on powerpc. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-10-01dma-direct: implement complete bus_dma_mask handlingChristoph Hellwig1-10/+11
Instead of rejecting devices with a too small bus_dma_mask we can handle by taking the bus dma_mask into account for allocations and bounce buffering decisions. Signed-off-by: Christoph Hellwig <[email protected]>
2018-10-01dma-direct: refine dma_direct_alloc zone selectionChristoph Hellwig1-10/+21
We need to take the DMA offset and encryption bit into account when selecting a zone. User the opportunity to factor out the zone selection into a helper for reuse. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-10-01dma-direct: add an explicit dma_direct_get_required_maskChristoph Hellwig1-3/+19
This is somewhat modelled after the powerpc version, and differs from the legacy fallback in use fls64 instead of pointlessly splitting up the address into low and high dwords and in that it takes (__)phys_to_dma into account. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-09-20dma-mapping: support non-coherent devices in dma_common_get_sgtableChristoph Hellwig1-7/+16
We can use the arch_dma_coherent_to_pfn hook to provide a ->get_sgtable implementation. Note that this isn't an endorsement of this interface (which is a horrible bad idea), but it is required to move arm64 over to the generic code without a loss of functionality. Signed-off-by: Christoph Hellwig <[email protected]>
2018-09-20dma-mapping: consolidate the dma mmap implementationsChristoph Hellwig3-26/+27
The only functional differences (modulo a few missing fixes in the arch code) is that architectures without coherent caches need a hook to convert a virtual or dma address into a pfn, given that we don't have the kernel linear mapping available for the otherwise easy virt_to_page call. As a side effect we can support mmap of the per-device coherent area even on architectures not providing the callback, and we make previous dangerous default methods dma_common_mmap actually save for non-coherent architectures by rejecting it without the right helper. In addition to that we need a hook so that some architectures can override the protection bits when mmaping a dma coherent allocations. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Paul Burton <[email protected]> # MIPS parts
2018-09-20dma-mapping: merge direct and noncoherent opsChristoph Hellwig4-120/+117
All the cache maintainance is already stubbed out when not enabled, but merging the two allows us to nicely handle the case where cache maintainance is required for some devices, but not others. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Paul Burton <[email protected]> # MIPS parts
2018-09-20dma-mapping: move the dma_coherent flag to struct deviceChristoph Hellwig1-0/+3
Various architectures support both coherent and non-coherent dma on a per-device basis. Move the dma_noncoherent flag from the mips archdata field to struct device proper to prepare the infrastructure for reuse on other architectures. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Paul Burton <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]>
2018-09-20dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declarationChristoph Hellwig1-0/+3
The patch adding the infrastructure failed to actually add the symbol declaration, oops.. Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook") Signed-off-by: Christoph Hellwig <[email protected]>
2018-09-20dma-mapping: fix panic caused by passing empty cma command line argumentHe Zhe1-1/+5
early_cma does not check input argument before passing it to simple_strtoull. The argument would be a NULL pointer if "cma", without its value, is set in command line and thus causes the following panic. PANIC: early exception 0xe3 IP 10:ffffffffa3e9db8d error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-yocto-standard+ #7 [ 0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70 ... [ 0.000000] Call Trace: [ 0.000000] simple_strtoull+0x29/0x70 [ 0.000000] memparse+0x26/0x90 [ 0.000000] early_cma+0x17/0x6a [ 0.000000] do_early_param+0x57/0x8e [ 0.000000] parse_args+0x208/0x320 [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_options+0x29/0x2d [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_param+0x36/0x4d [ 0.000000] setup_arch+0x336/0x99e [ 0.000000] start_kernel+0x6f/0x4e6 [ 0.000000] x86_64_start_reservations+0x24/0x26 [ 0.000000] x86_64_start_kernel+0x6f/0x72 [ 0.000000] secondary_startup_64+0xa4/0xb0 This patch adds a check to prevent the panic. Signed-off-by: He Zhe <[email protected]> Reviewed-by: Marek Szyprowski <[email protected]> Cc: [email protected] Signed-off-by: Christoph Hellwig <[email protected]>
2018-09-08dma-mapping: remove dma_deconfigureChristoph Hellwig1-6/+0
This goes through a lot of hooks just to call arch_teardown_dma_ops. Replace it with a direct call instead. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-09-08dma-mapping: remove dma_configureChristoph Hellwig1-10/+0
There is no good reason for this indirection given that the method always exists. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-09-06Merge tag 'trace-v4.19-rc2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This fixes two annoying bugs: - The first one is a side effect caused by using SRCU for rcuidle tracepoints. It seems that the perf was depending on the rcuidle tracepoints to make RCU watch when it wasn't. The real fix will be to have perf use SRCU instead of depending on RCU watching, but that can't be done until SRCU is safe to use in NMI context (Paul's working on that). - The second bug fix is for a bug that's been periodically making my tests fail randomly for some time. I haven't had time to track it down, but finally have. It has to do with stressing NMIs (via perf) while enabling or disabling ftrace function handling with lockdep enabled. If an interrupt happens and just as it returns, it sets lockdep back to "interrupts enabled" but before it returns an NMI is triggered, and if this happens while printk_nmi_enter has a breakpoint attached to it (because ftrace is converting it to or from nop to call fentry), the breakpoint trap also calls into lockdep, and since returning from the NMI to a interrupt handler, interrupts were disabled when the NMI went off, lockdep keeps its state as interrupts disabled when it returns back from the interrupt handler where interrupts are enabled. This causes lockdep_assert_irqs_enabled() to trigger a false positive" * tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: printk/tracing: Do not trace printk_nmi_enter() tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints
2018-09-06printk/tracing: Do not trace printk_nmi_enter()Steven Rostedt (VMware)1-2/+2
I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] <interrupt> [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] <nmi> nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] <breakpoint exception> TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: [email protected] Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky <[email protected]> Acked-by: Petr Mladek <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-09-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+1
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <[email protected]>: nilfs2: convert to SPDX license tags drivers/dax/device.c: convert variable to vm_fault_t type lib/Kconfig.debug: fix three typos in help text checkpatch: add __ro_after_init to known $Attribute mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name memory_hotplug: fix kernel_panic on offline page processing checkpatch: add optional static const to blank line declarations test ipc/shm: properly return EIDRM in shm_lock() mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. mm/util.c: improve kvfree() kerneldoc tools/vm/page-types.c: fix "defined but not used" warning tools/vm/slabinfo.c: fix sign-compare warning kmemleak: always register debugfs file mm: respect arch_dup_mmap() return value mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm(). mm: memcontrol: print proper OOM header when no eligible victim left
2018-09-04mm: respect arch_dup_mmap() return valueNadav Amit1-2/+1
Commit d70f2a14b72a ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") ignored the return value of arch_dup_mmap(). As a result, on x86, a failure to duplicate the LDT (e.g. due to memory allocation error) would leave the duplicated memory mapping in an inconsistent state. Fix by using the return value, as it was before the change. Link: http://lkml.kernel.org/r/[email protected] Fixes: d70f2a14b72a4 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc") Signed-off-by: Nadav Amit <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-28/+36
Pull networking fixes from David Miller: 1) Must perform TXQ teardown before unregistering interfaces in mac80211, from Toke Høiland-Jørgensen. 2) Don't allow creating mac80211_hwsim with less than one channel, from Johannes Berg. 3) Division by zero in cfg80211, fix from Johannes Berg. 4) Fix endian issue in tipc, from Haiqing Bai. 5) BPF sockmap use-after-free fixes from Daniel Borkmann. 6) Spectre-v1 in mac80211_hwsim, from Jinbum Park. 7) Missing rhashtable_walk_exit() in tipc, from Cong Wang. 8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when kvzalloc() tries to use kmalloc() pages. From Eric Dumazet. 9) Fix deadlock in hv_netvsc, from Dexuan Cui. 10) Do not restart timewait timer on RST, from Florian Westphal. 11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev. 12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu. 13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai. 14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu. 15) Use-after-free in act_ife, fix from Cong Wang. 16) Missing hold to meta module in act_ife, from Vlad Buslov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits) net: phy: sfp: Handle unimplemented hwmon limits and alarms net: sched: action_ife: take reference to meta module act_ife: fix a potential use-after-free net/mlx5: Fix SQ offset in QPs with small RQ tipc: correct spelling errors for tipc_topsrv_queue_evt() comments tipc: correct spelling errors for struct tipc_bc_base's comment bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA. bnxt_en: Clean up unused functions. bnxt_en: Fix firmware signaled resource change logic in open. sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel sctp: fix invalid reference to the index variable of the iterator net/ibm/emac: wrong emac_calc_base call was used by typo net: sched: null actions array pointer before releasing action vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition r8169: add support for NCube 8168 network card ip6_tunnel: respect ttl inherit for ip6tnl mac80211: shorten the IBSS debug messages mac80211: don't Tx a deauth frame if the AP forbade Tx mac80211: Fix station bandwidth setting after channel switch mac80211: fix a race between restart and CSA flows ...
2018-09-02Merge tag 'dma-mapping-4.19-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-2/+2
Pull dma-mapping fixes from Christoph Hellwig: "A few fixes for the fallout of being a little more pedantic about dma masks" * tag 'dma-mapping-4.19-2' of git://git.infradead.org/users/hch/dma-mapping: of/platform: initialise AMBA default DMA masks sparc: set a default 32-bit dma mask for OF devices kernel/dma/direct: take DMA offset into account in dma_direct_supported
2018-09-02bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULPJohn Fastabend1-1/+11
Currently we check sk_user_data is non NULL to determine if the sk exists in a map. However, this is not sufficient to ensure the psock or the ULP ops are not in use by another user, such as kcm or TLS. To avoid this when adding a sock to a map also verify it is of the correct ULP type. Additionally, when releasing a psock verify that it is the TCP_ULP_BPF type before releasing the ULP. The error case where we abort an update due to ULP collision can cause this error path. For example, __sock_map_ctx_update_elem() [...] err = tcp_set_ulp_id(sock, TCP_ULP_BPF) <- collides with TLS if (err) <- so err out here goto out_free [...] out_free: smap_release_sock() <- calling tcp_cleanup_ulp releases the TLS ULP incorrectly. Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-09-02Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds1-22/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Thomas Gleixner: "Remove the stale skip_onerr member from the hotplug states" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Remove skip_onerr field from cpuhp_step structure
2018-09-02Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds4-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of updates for core code: - Prevent tracing in functions which are called from trace patching via stop_machine() to prevent executing half patched function trace entries. - Remove old GCC workarounds - Remove pointless includes of notifier.h" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Remove workaround for unreachable warnings from old GCC notifier: Remove notifier header file wherever not used watchdog: Mark watchdog touch functions as notrace
2018-09-01kernel/dma/direct: take DMA offset into account in dma_direct_supportedChristoph Hellwig1-2/+2
When a device has a DMA offset the dma capable result will change due to the difference between the physical and DMA address. Take that into account. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Benjamin Herrenschmidt <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-08-31cpu/hotplug: Remove skip_onerr field from cpuhp_step structureMukesh Ojha1-22/+4
When notifiers were there, `skip_onerr` was used to avoid calling particular step startup/teardown callbacks in the CPU up/down rollback path, which made the hotplug asymmetric. As notifiers are gone now after the full state machine conversion, the `skip_onerr` field is no longer required. Remove it from the structure and its usage. Signed-off-by: Mukesh Ojha <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-08-30notifier: Remove notifier header file wherever not usedMukesh Ojha1-1/+0
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-08-30watchdog: Mark watchdog touch functions as notraceVincent Whitchurch3-4/+4
Some architectures need to use stop_machine() to patch functions for ftrace, and the assumption is that the stopped CPUs do not make function calls to traceable functions when they are in the stopped state. Commit ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") added calls to the watchdog touch functions from the stopped CPUs and those functions lack notrace annotations. This leads to crashes when enabling/disabling ftrace on ARM kernels built with the Thumb-2 instruction set. Fix it by adding the necessary notrace annotations. Fixes: ce4f06dcbb5d ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE") Signed-off-by: Vincent Whitchurch <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-08-28bpf: sockmap, decrement copied count correctly in redirect error caseJohn Fastabend1-23/+22
Currently, when a redirect occurs in sockmap and an error occurs in the redirect call we unwind the scatterlist once in the error path of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then in the error path of sendmsg we decrement the copied count by the send size. However, its possible we partially sent data before the error was generated. This can happen if do_tcp_sendpages() partially sends the scatterlist before encountering a memory pressure error. If this happens we need to decrement the copied value (the value tracking how many bytes were actually sent to TCP stack) by the number of remaining bytes _not_ the entire send size. Otherwise we risk confusing userspace. Also we don't need two calls to free the scatterlist one is good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and then properly reduce copied by the number of remaining bytes which may in fact be the entire send size if no bytes were sent. To do this use bool to indicate if free_start_sg() should do mem accounting or not. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-27bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsgDaniel Borkmann1-3/+2
In bpf_tcp_recvmsg() we first took a reference on the psock, however once we find that there are skbs in the normal socket's receive queue we return with processing them through tcp_recvmsg(). Problem is that we leak the taken reference on the psock in that path. Given we don't really do anything with the psock at this point, move the skb_queue_empty() test before we fetch the psock to fix this case. Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-27bpf, sockmap: fix potential use after free in bpf_tcp_closeDaniel Borkmann1-1/+1
bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop(). A parallel update on the sock hash map can happen between psock_map_pop() and lookup_elem_raw() where we override the element under link->hash / link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only test whether an element is present, but we do not test whether the element is infact the element we were looking for. We lock the sock in bpf_tcp_close() during that time, so do we hold the lock in sock_hash_update_elem(). However, the latter locks the sock which is newly updated, not the one we're purging from the hash table. This means that while one CPU is doing the lookup from bpf_tcp_close(), another CPU is doing the map update in parallel, dropped our sock from the hlist and released the psock. Subsequently the first CPU will find the new sock and attempts to drop and release the old sock yet another time. Fix is that we need to check the elements for a match after lookup, similar as we do in the sock map. Note that the hash tab elems are freed via RCU, so access to their link->hash / link->key is fine since we're under RCU read side there. Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-12/+22
Pull networking fixes from David Miller: 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks. 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong Wang. 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel Borkmann. 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent attackers using it as a side-channel. From Eric Dumazet. 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva. 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin Liu. 7) hns and hns3 bug fixes from Huazhong Tan. 8) Fix RIF leak in mlxsw driver, from Ido Schimmel. 9) iova range check fix in vhost, from Jason Wang. 10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend. 11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng. 12) Memory exposure in TCA_U32_SEL handling, from Kees Cook. 13) TCP BBR congestion control fixes from Kevin Yang. 14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger. 15) qed driver fixes from Tomer Tayar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) net: sched: Fix memory exposure from short TCA_U32_SEL qed: fix spelling mistake "comparsion" -> "comparison" vhost: correctly check the iova range when waking virtqueue qlge: Fix netdev features configuration. net: macb: do not disable MDIO bus at open/close time Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency" net: macb: Fix regression breaking non-MDIO fixed-link PHYs mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge i40e: fix condition of WARN_ONCE for stat strings i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled ixgbe: fix driver behaviour after issuing VFLR ixgbe: Prevent unsupported configurations with XDP ixgbe: Replace GFP_ATOMIC with GFP_KERNEL igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback() igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init() igb: Use an advanced ctx descriptor for launchtime e1000: ensure to free old tx/rx rings in set_ringparam() e1000: check on netif_running() before calling e1000_up() ixgb: use dma_zalloc_coherent instead of allocator/memset ice: Trivial formatting fixes ...
2018-08-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-14/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Kernel: - Improve kallsyms coverage - Add x86 entry trampolines to kcore - Fix ARM SPE handling - Correct PPC event post processing Tools: - Make the build system more robust - Small fixes and enhancements all over the place - Update kernel ABI header copies - Preparatory work for converting libtraceevnt to a shared library - License cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy' tools arch x86: Update tools's copy of cpufeatures.h perf python: Fix pyrf_evlist__read_on_cpu() interface perf mmap: Store real cpu number in 'struct perf_mmap' perf tools: Remove ext from struct kmod_path perf tools: Add gzip_is_compressed function perf tools: Add lzma_is_compressed function perf tools: Add is_compressed callback to compressions array perf tools: Move the temp file processing into decompress_kmodule perf tools: Use compression id in decompress_kmodule() perf tools: Store compression id into struct dso perf tools: Add compression id into 'struct kmod_path' perf tools: Make is_supported_compression() static perf tools: Make decompress_to_file() function static perf tools: Get rid of dso__needs_decompress() call in __open_dso() perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble() perf tools: Get rid of dso__needs_decompress() call in read_object_code() tools lib traceevent: Change to SPDX License format perf llvm: Allow passing options to llc in addition to clang perf parser: Improve error message for PMU address filters ...
2018-08-26Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull licking update from Thomas Gleixner: "Mark the switch cases which fall through to the next case with the proper comment so the fallthrough compiler checks can be enabled" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Mark expected switch fall-throughs
2018-08-25Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of ↵Linus Torvalds1-1/+0
gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm memory-failure update from Dave Jiang: "As it stands, memory_failure() gets thoroughly confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states and needs new enabling to handle poison in dax mappings. In order to support reliable reverse mapping of user space addresses: 1/ Add new locking in the memory_failure() rmap path to prevent races that would typically be handled by the page lock. 2/ Since dev_pagemap pages are hidden from the page allocator and the "compound page" accounting machinery, add a mechanism to determine the size of the mapping that encompasses a given poisoned pfn. 3/ Given pmem errors can be repaired, change the speculatively accessed poison protection, mce_unmap_kpfn(), to be reversible and otherwise allow ongoing access from the kernel. A side effect of this enabling is that MADV_HWPOISON becomes usable for dax mappings, however the primary motivation is to allow the system to survive userspace consumption of hardware-poison via dax. Specifically the current behavior is: mce: Uncorrected hardware memory error in user-access at af34214200 {1}[Hardware Error]: It has been corrected by h/w and requires no further action mce: [Hardware Error]: Machine check events logged {1}[Hardware Error]: event severity: corrected Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users [..] Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed mce: Memory error not recovered <reboot> ...and with these changes: Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000 Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption Memory failure: 0x20cb00: recovery action for dax page: Recovered Given all the cross dependencies I propose taking this through nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax folks" * tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pmem: Restore page attributes when clearing errors x86/memory_failure: Introduce {set, clear}_mce_nospec() x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses mm, memory_failure: Teach memory_failure() about dev_pagemap pages filesystem-dax: Introduce dax_lock_mapping_entry() mm, memory_failure: Collect mapping size in collect_procs() mm, madvise_inject_error: Let memory_failure() optionally take a page reference mm, dev_pagemap: Do not clear ->mapping on final put mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages filesystem-dax: Set page->index device-dax: Set page->index device-dax: Enable page_mapping() device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-24Merge branch 'for-4.19' of ↵Linus Torvalds3-7/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Just one commit from Steven to take out spin lock from trace event handlers" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/tracing: Move taking of spin lock out of trace event handlers
2018-08-24Merge branch 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-15/+30
Pull workqueue updates from Tejun Heo: "Over the lockdep cross-release churn, workqueue lost some of the existing annotations. Johannes Berg restored it and also improved them" * 'for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: re-add lockdep dependencies for flushing workqueue: skip lockdep wq dependency in cancel_work_sync()
2018-08-24Merge branch 'userns-linus' of ↵Linus Torvalds3-80/+80
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This is a set of four fairly obvious bug fixes: - a switch from d_find_alias to d_find_any_alias because the xattr code perversely takes a dentry - two mutex vs copy_to_user fixes from Jann Horn - a fix to use a sanitized size not the size userspace passed in from Christian Brauner" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: getxattr: use correct xattr length sys: don't hold uts_sem while accessing userspace memory userns: move user access out of the mutex cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
2018-08-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-2/+20
Merge yet more updates from Andrew Morton: - the rest of MM - various misc fixes and tweaks * emailed patches from Andrew Morton <[email protected]>: (22 commits) mm: Change return type int to vm_fault_t for fault handlers lib/fonts: convert comments to utf-8 s390: ebcdic: convert comments to UTF-8 treewide: convert ISO_8859-1 text comments to utf-8 drivers/gpu/drm/gma500/: change return type to vm_fault_t docs/core-api: mm-api: add section about GFP flags docs/mm: make GFP flags descriptions usable as kernel-doc docs/core-api: split memory management API to a separate file docs/core-api: move *{str,mem}dup* to "String Manipulation" docs/core-api: kill trailing whitespace in kernel-api.rst mm/util: add kernel-doc for kvfree mm/util: make strndup_user description a kernel-doc comment fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds treewide: correct "differenciate" and "instanciate" typos fs/afs: use new return type vm_fault_t drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t mm: soft-offline: close the race against page allocation mm: fix race on soft-offlining free huge pages namei: allow restricted O_CREAT of FIFOs and regular files hfs: prevent crash on exit from failed search ...
2018-08-23mm: Change return type int to vm_fault_t for fault handlersSouptick Joarder1-1/+1
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t") The aim is to change the return type of finish_fault() and handle_mm_fault() to vm_fault_t type. As part of that clean up return type of all other recursively called functions have been changed to vm_fault_t type. The places from where handle_mm_fault() is getting invoked will be change to vm_fault_t type but in a separate patch. vmf_error() is the newly introduce inline function in 4.17-rc6. [[email protected]: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()] Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <[email protected]> Reviewed-by: Matthew Wilcox <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-23treewide: convert ISO_8859-1 text comments to utf-8Arnd Bergmann1-1/+1
Almost all files in the kernel are either plain text or UTF-8 encoded. A couple however are ISO_8859-1, usually just a few characters in a C comments, for historic reasons. This converts them all to UTF-8 for consistency. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Simon Horman <[email protected]> [IPVS portion] Acked-by: Jonathan Cameron <[email protected]> [IIO] Acked-by: Michael Ellerman <[email protected]> [powerpc] Acked-by: Rob Herring <[email protected]> Cc: Joe Perches <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Samuel Ortiz <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Rob Herring <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-23namei: allow restricted O_CREAT of FIFOs and regular filesSalvatore Mesoraca1-0/+18
Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before hardlinks/symlinks restrictions, fifos/regular files weren't the favorite vehicle to exploit them. [[email protected]: fix bug reported by Dan Carpenter] Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda Link: http://lkml.kernel.org/r/[email protected] [[email protected]: drop pr_warn_ratelimited() in favor of audit changes in the future] [[email protected]: adjust commit subjet] Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast Signed-off-by: Salvatore Mesoraca <[email protected]> Signed-off-by: Kees Cook <[email protected]> Suggested-by: Solar Designer <[email protected]> Suggested-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-23Merge tag 'fbdev-v4.19' of https://github.com/bzolnier/linuxLinus Torvalds1-0/+3
Pull fbdev updates from Bartlomiej Zolnierkiewicz: "Mostly small fixes and cleanups for fb drivers (the biggest updates are for udlfb and pxafb drivers). This also adds deferred console takeover support to the console code and efifb driver. Summary: - add support for deferred console takeover, when enabled defers fbcon taking over the console from the dummy console until the first text is displayed on the console - together with the "quiet" kernel commandline option this allows fbcon to still be used together with a smooth graphical bootup (Hans de Goede) - improve console locking debugging code (Thomas Zimmermann) - copy the ACPI BGRT boot graphics to the framebuffer when deferred console takeover support is used in efifb driver (Hans de Goede) - update udlfb driver - fix lost console when the user unplugs a USB adapter, fix the screen corruption issue, fix locking and add some performance optimizations (Mikulas Patocka) - update pxafb driver - fix using uninitialized memory, switch to devm_* API, handle initialization errors and add support for lcd-supply regulator (Daniel Mack) - add support for boards booted with a DeviceTree in pxa3xx_gcu driver (Daniel Mack) - rename omap2 module to omap2fb.ko to avoid conflicts with omap1 driver (Arnd Bergmann) - enable ACPI-based enumeration for goldfishfb driver (Yu Ning) - fix goldfishfb driver to make user space Android code use 60 fps (Christoffer Dall) - print big fat warning when nomodeset kernel parameter is used in vgacon driver (Lyude Paul) - remove VLA usage from fsl-diu-fb driver (Kees Cook) - misc fixes (Julia Lawall, Geert Uytterhoeven, Fredrik Noring, Yisheng Xie, Dan Carpenter, Daniel Vetter, Anton Vasilyev, Randy Dunlap, Gustavo A. R. Silva, Colin Ian King, Fengguang Wu) - misc cleanups (Roman Kiryanov, Yisheng Xie, Colin Ian King)" * tag 'fbdev-v4.19' of https://github.com/bzolnier/linux: (54 commits) Documentation/fb: corrections for fbcon.txt fbcon: Do not takeover the console from atomic context dummycon: Stop exporting dummycon_[un]register_output_notifier fbcon: Only defer console takeover if the current console driver is the dummycon fbcon: Only allow FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER if fbdev is builtin fbdev: omap2: omapfb: fix ifnullfree.cocci warnings fbdev: omap2: omapfb: fix bugon.cocci warnings fbdev: omap2: omapfb: fix boolreturn.cocci warnings fb: amifb: fix build warnings when not builtin fbdev/core: Disable console-lock warnings when fb.lockless_register_fb is set console: Replace #if 0 with atomic var 'ignore_console_lock_warning' udlfb: use spin_lock_irq instead of spin_lock_irqsave udlfb: avoid prefetch udlfb: optimization - test the backing buffer udlfb: allow reallocating the framebuffer udlfb: set line_length in dlfb_ops_set_par udlfb: handle allocation failure udlfb: set optimal write delay udlfb: make a local copy of fb_ops udlfb: don't switch if we are switching to the same videomode ...