aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-06mm/mmu_notifiers: use the right return code for WARN_ONJason Gunthorpe1-1/+1
The return code from the op callback is actually in _ret, while the WARN_ON was checking ret which causes it to misfire. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8402ce61bec2 ("mm/mmu_notifiers: check if mmu notifier callbacks are allowed to fail") Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()Shuning Zhang1-44/+90
When the extent tree is modified, it should be protected by inode cluster lock and ip_alloc_sem. The extent tree is accessed and modified in the ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem. The following is a case. The function ocfs2_fiemap is accessing the extent tree, which is modified at the same time. kernel BUG at fs/ocfs2/extent_map.c:475! invalid opcode: 0000 [#1] SMP Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...] CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018 task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000 RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] Call Trace: ocfs2_fiemap+0x1e3/0x430 [ocfs2] do_vfs_ioctl+0x155/0x510 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x18/0xd8 Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] ---[ end trace c8aa0c8180e869dc ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This issue can be reproduced every week in a production environment. This issue is related to the usage mode. If others use ocfs2 in this mode, the kernel will panic frequently. [[email protected]: coding style fixes] [Fix new warning due to unused function by removing said function - Linus ] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shuning Zhang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Gang He <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06mm: thp: handle page cache THP correctly in PageTransCompoundMapYang Shi3-7/+23
We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit 127393fbe597 ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Yang Shi <[email protected]> Reported-by: Gang Deng <[email protected]> Tested-by: Gang Deng <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06mm, meminit: recalculate pcpu batch and high limits after init completesMel Gorman1-2/+8
Deferred memory initialisation updates zone->managed_pages during the initialisation phase but before that finishes, the per-cpu page allocator (pcpu) calculates the number of pages allocated/freed in batches as well as the maximum number of pages allowed on a per-cpu list. As zone->managed_pages is not up to date yet, the pcpu initialisation calculates inappropriately low batch and high values. This increases zone lock contention quite severely in some cases with the degree of severity depending on how many CPUs share a local zone and the size of the zone. A private report indicated that kernel build times were excessive with extremely high system CPU usage. A perf profile indicated that a large chunk of time was lost on zone->lock contention. This patch recalculates the pcpu batch and high values after deferred initialisation completes for every populated zone in the system. It was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation workload -- allmodconfig and all available CPUs. mmtests configuration: config-workload-kernbench-max Configuration was modified to build on a fresh XFS partition. kernbench 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%* Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%* Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%* Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%) Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%) Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%) 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Duration User 39766.24 49221.79 Duration System 44298.10 13361.67 Duration Elapsed 519.11 388.87 The patch reduces system CPU usage by 69.86% and total build time by 26.65%. The variance of system CPU usage is also much reduced. Before, this was the breakdown of batch and high values over all zones was: 256 batch: 1 256 batch: 63 512 batch: 7 256 high: 0 256 high: 378 512 high: 42 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After the patch: 256 batch: 1 768 batch: 63 256 high: 0 768 high: 378 [[email protected]: fix merge/linkage snafu] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Qian Cai <[email protected]> Cc: <[email protected]> [4.1+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06mm/gup_benchmark: fix MAP_HUGETLB caseJohn Hubbard1-1/+1
The MAP_HUGETLB ("-H" option) of gup_benchmark fails: $ sudo ./gup_benchmark -H mmap: Invalid argument This is because gup_benchmark.c is passing in a file descriptor to mmap(), but the fd came from opening up the /dev/zero file. This confuses the mmap syscall implementation, which thinks that, if the caller did not specify MAP_ANONYMOUS, then the file must be a huge page file. So it attempts to verify that the file really is a huge page file, as you can see here: ksys_mmap_pgoff() { if (!(flags & MAP_ANONYMOUS)) { retval = -EINVAL; if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) goto out_fput; /* THIS IS WHERE WE END UP */ else if (flags & MAP_HUGETLB) { ...proceed normally, /dev/zero is ok here... ...and of course is_file_hugepages() returns "false" for the /dev/zero file. The problem is that the user space program, gup_benchmark.c, really just wants anonymous memory here. The simplest way to get that is to pass MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this patch does. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06mm: memcontrol: fix NULL-ptr deref in percpu stats flushShakeel Butt1-6/+6
__mem_cgroup_free() can be called on the failure path in mem_cgroup_alloc(). However memcg_flush_percpu_vmstats() and memcg_flush_percpu_vmevents() which are called from __mem_cgroup_free() access the fields of memcg which can potentially be null if called from failure path from mem_cgroup_alloc(). Indeed syzbot has reported the following crash: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 30393 Comm: syz-executor.1 Not tainted 5.4.0-rc2+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:memcg_flush_percpu_vmstats+0x4ae/0x930 mm/memcontrol.c:3436 Code: 05 41 89 c0 41 0f b6 04 24 41 38 c7 7c 08 84 c0 0f 85 5d 03 00 00 44 3b 05 33 d5 12 08 0f 83 e2 00 00 00 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 0f 85 91 03 00 00 48 8b 85 10 fe ff ff 48 8b b0 90 RSP: 0018:ffff888095c27980 EFLAGS: 00010206 RAX: 0000000000000012 RBX: ffff888095c27b28 RCX: ffffc90008192000 RDX: 0000000000040000 RSI: ffffffff8340fae7 RDI: 0000000000000007 RBP: ffff888095c27be0 R08: 0000000000000000 R09: ffffed1013f0da33 R10: ffffed1013f0da32 R11: ffff88809f86d197 R12: fffffbfff138b760 R13: dffffc0000000000 R14: 0000000000000090 R15: 0000000000000007 FS: 00007f5027170700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000710158 CR3: 00000000a7b18000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __mem_cgroup_free+0x1a/0x190 mm/memcontrol.c:5021 mem_cgroup_free mm/memcontrol.c:5033 [inline] mem_cgroup_css_alloc+0x3a1/0x1ae0 mm/memcontrol.c:5160 css_create kernel/cgroup/cgroup.c:5156 [inline] cgroup_apply_control_enable+0x44d/0xc40 kernel/cgroup/cgroup.c:3119 cgroup_mkdir+0x899/0x11b0 kernel/cgroup/cgroup.c:5401 kernfs_iop_mkdir+0x14d/0x1d0 fs/kernfs/dir.c:1124 vfs_mkdir+0x42e/0x670 fs/namei.c:3807 do_mkdirat+0x234/0x2a0 fs/namei.c:3830 __do_sys_mkdir fs/namei.c:3846 [inline] __se_sys_mkdir fs/namei.c:3844 [inline] __x64_sys_mkdir+0x5c/0x80 fs/namei.c:3844 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixing this by moving the flush to mem_cgroup_free as there is no need to flush anything if we see failure in mem_cgroup_alloc(). Link: http://lkml.kernel.org/r/[email protected] Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg") Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg") Signed-off-by: Shakeel Butt <[email protected]> Reported-by: [email protected] Reviewed-by: Roman Gushchin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06MAINTAINERS: update Cavium ThunderX2 maintainersJayachandran C2-1/+4
jnair is no longer at caviumnetworks.com (or at marvell.com). This also means that Cavium ThunderX2 will now be maintained by Robert. This is probably a good time to map various email addresses used for my patches to my personal email ID, update .mailmap to do this. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jayachandran C <[email protected]> Acked-by: Robert Richter <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-11-06Merge tag 'stm32-dt-for-v5.4-fixes-2' of ↵Olof Johansson2-13/+4
git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into arm/fixes STM32 DT fixes for v5.4, round 2 Highlights: ----------- Fixes for STM32MP157: -Fix CAN RAM mapping -Change stmfx pinctrl definition for joystick and camera. Due to stmfx pinctrl fix done in v5.4-rc cycle, camera and joystick were no longer functional. * tag 'stm32-dt-for-v5.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32: ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Olof Johansson <[email protected]>
2019-11-06ASoC: SOF: topology: Fix bytes control size checksDragos Tarcatu1-5/+6
When using the example SOF amp widget topology, KASAN dumps this when the AMP bytes kcontrol gets loaded: [ 9.579548] BUG: KASAN: slab-out-of-bounds in sof_control_load+0x8cc/0xac0 [snd_sof] [ 9.588194] Write of size 40 at addr ffff8882314559dc by task systemd-udevd/2411 Fix that by rejecting the topology if the bytes data size > max_size Fixes: 311ce4fe7637d ("ASoC: SOF: Add support for loading topologies") Reviewed-by: Jaska Uimonen <[email protected]> Reviewed-by: Ranjani Sridharan <[email protected]> Signed-off-by: Dragos Tarcatu <[email protected]> Signed-off-by: Pierre-Louis Bossart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-06ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1Amelie Delaunay1-1/+0
Pins used for joystick are all configured as input. "push-pull" is not a valid setting for an input pin. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1Amelie Delaunay1-10/+2
"push-pull" configuration is now fully handled by the gpiolib and the STMFX pinctrl driver. There is no longer need to declare a pinctrl group to only configure "push-pull" setting for the line. It is done directly by the gpiolib. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157cChristophe Roullier1-2/+2
Split the 10Kbytes CAN message RAM to be able to use simultaneously FDCAN1 and FDCAN2 instances. First 5Kbytes are allocated to FDCAN1 and last 5Kbytes are used for FDCAN2. To do so, set the offset to 0x1400 in mram-cfg for FDCAN2. Fixes: d44d6e021301 ("ARM: dts: stm32: change CAN RAM mapping on stm32mp157c") Signed-off-by: Christophe Roullier <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157Patrice Chotard1-4/+4
Relax qspi pins slew-rate to minimize peak currents. Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board") Signed-off-by: Patrice Chotard <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-05Documentation: TLS: Add missing counter descriptionTariq Toukan1-0/+4
Add TLS TX counter description for the handshake retransmitted packets that triggers the resync procedure then skip it, going into the regular transmit flow. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05NFC: fdp: fix incorrect free objectPan Bian1-1/+1
The address of fw_vsc_cfg is on stack. Releasing it with devm_kfree() is incorrect, which may result in a system crash or other security impacts. The expected object to free is *fw_vsc_cfg. Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net: prevent load/store tearing on sk->sk_stampEric Dumazet1-2/+2
Add a couple of READ_ONCE() and WRITE_ONCE() to prevent load-tearing and store-tearing in sock_read_timestamp() and sock_write_timestamp() This might prevent another KCSAN report. Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe") Signed-off-by: Eric Dumazet <[email protected]> Cc: Deepa Dinamani <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net: qualcomm: rmnet: Fix potential UAF when unregisteringSean Tranchetti1-2/+2
During the exit/unregistration process of the RmNet driver, the function rmnet_unregister_real_device() is called to handle freeing the driver's internal state and removing the RX handler on the underlying physical device. However, the order of operations this function performs is wrong and can lead to a use after free of the rmnet_port structure. Before calling netdev_rx_handler_unregister(), this port structure is freed with kfree(). If packets are received on any RmNet devices before synchronize_net() completes, they will attempt to use this already-freed port structure when processing the packet. As such, before cleaning up any other internal state, the RX handler must be unregistered in order to guarantee that no further packets will arrive on the device. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net/tls: fix sk_msg trim on fallback to copy modeJakub Kicinski2-8/+21
sk_msg_trim() tries to only update curr pointer if it falls into the trimmed region. The logic, however, does not take into the account pointer wrapping that sk_msg_iter_var_prev() does nor (as John points out) the fact that msg->sg is a ring buffer. This means that when the message was trimmed completely, the new curr pointer would have the value of MAX_MSG_FRAGS - 1, which is neither smaller than any other value, nor would it actually be correct. Special case the trimming to 0 length a little bit and rework the comparison between curr and end to take into account wrapping. This bug caused the TLS code to not copy all of the message, if zero copy filled in fewer sg entries than memcopy would need. Big thanks to Alexander Potapenko for the non-KMSAN reproducer. v2: - take into account that msg->sg is a ring buffer (John). Link: https://lore.kernel.org/netdev/[email protected]/ (v1) Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Reported-by: [email protected] Reported-by: [email protected] Co-developed-by: John Fastabend <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05mlx4_core: fix wrong comment about the reason of subtract one from the max_cqesDotan Barak1-2/+1
The reason for the pre-allocation of one CQE is to enable resizing of the CQ. Fix comment accordingly. Signed-off-by: Dotan Barak <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Vladimir Sokolovsky <[email protected]> Signed-off-by: Yuval Shaia <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net: dsa: bcm_sf2: Fix driver removalFlorian Fainelli1-2/+2
With the DSA core doing the call to dsa_port_disable() we do not need to do that within the driver itself. This could cause an use after free since past dsa_unregister_switch() we should not be accessing any dsa_switch internal structures. Fixes: 0394a63acfe2 ("net: dsa: enable and disable all ports") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net: sched: prevent duplicate flower rules from tcf_proto destroy raceJohn Hurley2-4/+83
When a new filter is added to cls_api, the function tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to determine if the tcf_proto is duplicated in the chain's hashtable. It then creates a new entry or continues with an existing one. In cls_flower, this allows the function fl_ht_insert_unque to determine if a filter is a duplicate and reject appropriately, meaning that the duplicate will not be passed to drivers via the offload hooks. However, when a tcf_proto is destroyed it is removed from its chain before a hardware remove hook is hit. This can lead to a race whereby the driver has not received the remove message but duplicate flows can be accepted. This, in turn, can lead to the offload driver receiving incorrect duplicate flows and out of order add/delete messages. Prevent duplicates by utilising an approach suggested by Vlad Buslov. A hash table per block stores each unique chain/protocol/prio being destroyed. This entry is only removed when the full destroy (and hardware offload) has completed. If a new flow is being added with the same identiers as a tc_proto being detroyed, then the add request is replayed until the destroy is complete. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: John Hurley <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reported-by: Louis Peens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05net: hns3: Use the correct style for SPDX License IdentifierNishad Kamdar7-7/+7
This patch corrects the SPDX License Identifier style in header files related to Hisilicon network devices. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <[email protected]> Signed-off-by: Nishad Kamdar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05bonding: fix state transition issue in link monitoringJay Vosburgh2-24/+23
Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition. The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state. Resolve this by combining the two variables into one. Reported-by: Aleksei Zakharov <[email protected]> Reported-by: Sha Zhang <[email protected]> Cc: Mahesh Bandewar <[email protected]> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Jay Vosburgh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller8-9/+35
Daniel Borkmann says: ==================== pull-request: bpf 2019-11-02 The following pull-request contains BPF updates for your *net* tree. We've added 6 non-merge commits during the last 6 day(s) which contain a total of 8 files changed, 35 insertions(+), 9 deletions(-). The main changes are: 1) Fix ppc BPF JIT's tail call implementation by performing a second pass to gather a stable JIT context before opcode emission, from Eric Dumazet. 2) Fix build of BPF samples sys_perf_event_open() usage to compiled out unavailable test_attr__{enabled,open} checks. Also fix potential overflows in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel. 3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian archs like s390x and also improve the test coverage, from Ilya Leoshkevich. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-05Merge branch 'nvme-5.4-rc7' of git://git.infradead.org/nvme into for-linusJens Axboe3-0/+11
Pull NVMe fixes from Keith: "We have a few late nvme fixes for a couple device removal kernel crashes, and a compat fix for a new ioctl introduced during this merge window." * 'nvme-5.4-rc7' of git://git.infradead.org/nvme: nvme: change nvme_passthru_cmd64 to explicitly mark rsvd nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths nvme-rdma: fix a segmentation fault during module unload
2019-11-05taprio: fix panic while hw offload sched list swapIvan Khoronzhuk1-2/+3
Don't swap oper and admin schedules too early, it's not correct and causes crash. Steps to reproduce: 1) tc qdisc replace dev eth0 parent root handle 100 taprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 1@2 \ base-time $SOME_BASE_TIME \ sched-entry S 01 80000 \ sched-entry S 02 15000 \ sched-entry S 04 40000 \ flags 2 2) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 90000 \ sched-entry S 02 20000 \ sched-entry S 04 40000 \ flags 2 3) tc qdisc replace dev eth0 parent root handle 100 taprio \ base-time $SOME_BASE_TIME \ sched-entry S 01 150000 \ sched-entry S 02 200000 \ sched-entry S 04 40000 \ flags 2 Do 2 3 2 .. steps more times if not happens and observe: [ 305.832319] Unable to handle kernel write to read-only memory at virtual address ffff0000087ce7f0 [ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted [ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT) [...] [ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80 [ 306.022422] Call trace: [ 306.024866] taprio_free_sched_cb+0x4c/0x98 [ 306.029040] rcu_process_callbacks+0x25c/0x410 [ 306.033476] __do_softirq+0x10c/0x208 [ 306.037132] irq_exit+0xb8/0xc8 [ 306.040267] __handle_domain_irq+0x64/0xb8 [ 306.044352] gic_handle_irq+0x7c/0x178 [ 306.048092] el1_irq+0xb0/0x128 [ 306.051227] arch_cpu_idle+0x10/0x18 [ 306.054795] do_idle+0x120/0x138 [ 306.058015] cpu_startup_entry+0x20/0x28 [ 306.061931] rest_init+0xcc/0xd8 [ 306.065154] start_kernel+0x3bc/0x3e4 [ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662) [ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]--- [ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt [ 306.085847] SMP: stopping secondary CPUs [ 306.089765] Kernel Offset: disabled Try to explain one of the possible crash cases: The "real" admin list is assigned when admin_sched is set to new_admin, it happens after "swap", that assigns to oper_sched NULL. Thus if call qdisc show it can crash. Farther, next second time, when sched list is updated, the admin_sched is not NULL and becomes the oper_sched, previous oper_sched was NULL so just skipped. But then admin_sched is assigned new_admin, but schedules to free previous assigned admin_sched (that already became oper_sched). Farther, next third time, when sched list is updated, while one more swap, oper_sched is not null, but it was happy to be freed already (while prev. admin update), so while try to free oper_sched the kernel panic happens at taprio_free_sched_cb(). So, move the "swap emulation" where it should be according to function comment from code. Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading") Signed-off-by: Ivan Khoronzhuk <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Tested-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05Merge tag 'linux-can-fixes-for-5.4-20191105' of ↵David S. Miller23-138/+369
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-11-05 this is a pull request of 33 patches for net/master. In the first patch Wen Yang's patch adds a missing of_node_put() to CAN device infrastructure. Navid Emamdoost's patch for the gs_usb driver fixes a memory leak in the gs_can_open() error path. Johan Hovold provides two patches, one for the mcba_usb, the other for the usb_8dev driver. Both fix a use-after-free after USB-disconnect. Joakim Zhang's patch improves the flexcan driver, the ECC mechanism is now completely disabled instead of masking the interrupts. The next three patches all target the peak_usb driver. Stephane Grosjean's patch fixes a potential out-of-sync while decoding packets, Johan Hovold's patch fixes a slab info leak, Jeroen Hofstee's patch adds missing reporting of bus off recovery events. Followed by three patches for the c_can driver. Kurt Van Dijck's patch fixes detection of potential missing status IRQs, Jeroen Hofstee's patches add a chip reset on open and add missing reporting of bus off recovery events. Appana Durga Kedareswara rao's patch for the xilinx driver fixes the flags field initialization for axi CAN. The next seven patches target the rx-offload helper, they are by me and Jeroen Hofstee. The error handling in case of a queue overflow is fixed removing a memory leak. Further the error handling in case of queue overflow and skb OOM is cleaned up. The next two patches are by me and target the flexcan and ti_hecc driver. In case of a error during can_rx_offload_queue_sorted() the error counters in the drivers are incremented. Jeroen Hofstee provides 6 patches for the ti_hecc driver, which properly stop the device in ifdown, improve the rx-offload support (which hit mainline in v5.4-rc1), and add missing FIFO overflow and state change reporting. The following four patches target the j1939 protocol. Colin Ian King's patch fixes a memory leak in the j1939_sk_errqueue() handling. Three patches by Oleksij Rempel fix a memory leak on socket release and fix the EOMA packet in the transport protocol. Timo Schlüßler's patch fixes a potential race condition in the mcp251x driver on after suspend. The last patch is by Yegor Yefremov and updates the SPDX-License-Identifier to v3.0. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-06nvme: change nvme_passthru_cmd64 to explicitly mark rsvdCharles Machalow1-0/+1
Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit marker for the padding space added on certain platforms as a result of the enlargement of the result field from 32 bit to 64 bits in size, and fixes differences in struct size when using compat ioctl for 32-bit binaries on 64-bit architecture. Fixes: 65e68edce0db ("nvme: allow 64-bit results in passthru commands") Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Charles Machalow <[email protected]> [changelog] Signed-off-by: Keith Busch <[email protected]>
2019-11-05ALSA: hda: hdmi - add Tigerlake supportKai Vehmanen1-0/+13
Add Tigerlake HDMI codec support. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205379 BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=112171 Cc: Pan Xiuli <[email protected]> Signed-off-by: Kai Vehmanen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2019-11-05ASoC: max98373: replace gpio_request with devm_gpio_requestYong Zhi1-2/+2
Use devm_gpio_request() to automatic unroll when fails and avoid resource leaks at error paths. Signed-off-by: Yong Zhi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-05ASoC: stm32: sai: add restriction on mmap supportOlivier Moysan1-1/+11
Do not support mmap in S/PDIF mode. In S/PDIF mode the buffer has to be copied, to allow the channel status bits insertion. Signed-off-by: Olivier Moysan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-05Merge tag 'for-linus-2019-11-05' of ↵Linus Torvalds2-1/+36
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull clone3 stack argument update from Christian Brauner: "This changes clone3() to do basic stack validation and to set up the stack depending on whether or not it is growing up or down. With clone3() the expectation is now very simply that the .stack argument points to the lowest address of the stack and that .stack_size specifies the initial stack size. This is diferent from legacy clone() where the "stack" argument had to point to the lowest or highest address of the stack depending on the architecture. clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that changing clone3() to determine stack direction and doing basic validation is the right course of action. Note, this is a potentially user visible change. In the very unlikely case, that it breaks someone's use-case we will revert. (And then e.g. place the new behavior under an appropriate flag.) Note that passing an empty stack will continue working just as before. Breaking someone's use-case is very unlikely. Neither glibc nor musl currently expose a wrapper for clone3(). There is currently also no real motivation for anyone to use clone3() directly. First, because using clone{3}() with stacks requires some assembly (see glibc and musl). Second, because it does not provide features that legacy clone() doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. I did a codesearch on https://codesearch.debian.net, github, and gitlab and could not find any software currently relying directly on clone3(). I expect this to change once we land CLONE_CLEAR_SIGHAND which was a request coming from glibc at which point they'll likely start using it" * tag 'for-linus-2019-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: clone3: validate stack arguments
2019-11-05Merge tag 'gpio-v5.4-4' of ↵Linus Torvalds2-21/+18
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "More GPIO fixes! We found a late regression in the Intel Merrifield driver. Oh well. We fixed it up. - Fix a build error in the tools used for kselftest - A series of reverts to bring the Intel Merrifield back to working. We will likely unrevert the reverts for v5.5 but we can't have v5.4 broken" * tag 'gpio-v5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: Revert "gpio: merrifield: Pass irqchip when adding gpiochip" Revert "gpio: merrifield: Restore use of irq_base" Revert "gpio: merrifield: Move hardware initialization to callback" tools: gpio: Use !building_out_of_srctree to determine srctree
2019-11-05watchdog: bd70528: Add MODULE_ALIAS to allow module auto loadingMatti Vaittinen1-0/+1
The bd70528 watchdog driver is probed by MFD driver. Add MODULE_ALIAS in order to allow udev to load the module when MFD sub-device cell for watchdog is added. Fixes: bbc88a0ec9f37 ("watchdog: bd70528: Initial support for ROHM BD70528 watchdog block") Signed-off-by: Matti Vaittinen <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2019-11-05watchdog: imx_sc_wdt: Pretimeout should follow SCU firmware formatAnson Huang1-1/+7
SCU firmware calculates pretimeout based on current time stamp instead of watchdog timeout stamp, need to convert the pretimeout to SCU firmware's timeout value. Fixes: 15f7d7fc5542 ("watchdog: imx_sc: Add pretimeout support") Signed-off-by: Anson Huang <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2019-11-05watchdog: meson: Fix the wrong value of left timeXingyu Chen1-2/+2
The left time value is wrong when we get it by sysfs. The left time value should be equal to preset timeout value minus elapsed time value. According to the Meson-GXB/GXL datasheets which can be found at [0], the timeout value is saved to BIT[0-15] of the WATCHDOG_TCNT, and elapsed time value is saved to BIT[16-31] of the WATCHDOG_TCNT. [0]: http://linux-meson.com Fixes: 683fa50f0e18 ("watchdog: Add Meson GXBB Watchdog Driver") Signed-off-by: Xingyu Chen <[email protected]> Acked-by: Neil Armstrong <[email protected]> Reviewed-by: Kevin Hilman <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2019-11-05watchdog: pm8916_wdt: fix pretimeout registration flowJorge Ramirez-Ortiz1-4/+11
When an IRQ is present in the dts, the probe function shall fail if the interrupt can not be registered. The probe function shall also be retried if getting the irq is being deferred. Signed-off-by: Jorge Ramirez-Ortiz <[email protected]> Reviewed-by: Loic Poulain <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2019-11-05watchdog: cpwd: fix build regressionArnd Bergmann1-1/+7
The compat_ptr_ioctl() infrastructure did not make it into linux-5.4, so cpwd now fails to build. Fix it by using an open-coded version. Fixes: 68f28b01fb9e ("watchdog: cpwd: use generic compat_ptr_ioctl") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Wim Van Sebroeck <[email protected]>
2019-11-06nvme-multipath: fix crash in nvme_mpath_clear_ctrl_pathsAnton Eidelman1-0/+2
nvme_mpath_clear_ctrl_paths() iterates through the ctrl->namespaces list while holding ctrl->scan_lock. This does not seem to be the correct way of protecting from concurrent list modification. Specifically, nvme_scan_work() sorts ctrl->namespaces AFTER unlocking scan_lock. This may result in the following (rare) crash in ctrl disconnect during scan_work: BUG: kernel NULL pointer dereference, address: 0000000000000050 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core] ... Call Trace: nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core] nvme_remove_namespaces+0x35/0xe0 [nvme_core] nvme_do_delete_ctrl+0x47/0x90 [nvme_core] nvme_sysfs_delete+0x49/0x60 [nvme_core] dev_attr_store+0x17/0x30 sysfs_kf_write+0x3e/0x50 kernfs_fop_write+0x11e/0x1a0 __vfs_write+0x1b/0x40 vfs_write+0xb9/0x1a0 ksys_write+0x67/0xe0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x5a/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f8d02bfb154 Fix: After taking scan_lock in nvme_mpath_clear_ctrl_paths() down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe. This will not cause deadlocks because taking scan_lock never happens while holding the namespaces_rwsem. Moreover, scan work downs namespaces_rwsem in the same order. Alternative: sort ctrl->namespaces in nvme_scan_work() while still holding the scan_lock. This would leave nvme_mpath_clear_ctrl_paths() without correct protection against ctrl->namespaces modification by anyone other than scan_work. Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Anton Eidelman <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2019-11-06nvme-rdma: fix a segmentation fault during module unloadMax Gurtovoy1-0/+8
In case there are controllers that are not associated with any RDMA device (e.g. during unsuccessful reconnection) and the user will unload the module, these controllers will not be freed and will access already freed memory. The same logic appears in other fabric drivers as well. Fixes: 87fd125344d6 ("nvme-rdma: remove redundant reference between ib_device and tagset") Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Max Gurtovoy <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2019-11-05clone3: validate stack argumentsChristian Brauner2-1/+36
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up. Legacy clone() required userspace to know in which direction the stack is growing and pass down the stack pointer appropriately. To make things more confusing microblaze uses a variant of the clone() syscall selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. IA64 has a separate clone2() syscall which also takes an additional stack_size argument. Finally, parisc has a stack that is growing upwards. Userspace therefore has a lot nasty code like the following: #define __STACK_SIZE (8 * 1024 * 1024) pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { pid_t ret; void *stack; stack = malloc(__STACK_SIZE); if (!stack) return -ENOMEM; #ifdef __ia64__ ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #elif defined(__parisc__) /* stack grows up */ ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd); #else ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #endif return ret; } or even crazier variants such as [3]. With clone3() we have the ability to validate the stack. We can check that when stack_size is passed, the stack pointer is valid and the other way around. We can also check that the memory area userspace gave us is fine to use via access_ok(). Furthermore, we probably should not require userspace to know in which direction the stack is growing. It is easy for us to do this in the kernel and I couldn't find the original reasoning behind exposing this detail to userspace. /* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action. Note, that this is an explicit change in user visible behavior we introduce with this patch. If it breaks someone's use-case we will revert! (And then e.g. place the new behavior under an appropriate flag.) Breaking someone's use-case is very unlikely though. First, neither glibc nor musl currently expose a wrapper for clone3(). Second, there is no real motivation for anyone to use clone3() directly since it does not provide features that legacy clone doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. Searches on [4] did not reveal any packages calling clone3(). [1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@mail.gmail.com [2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/raw-clone.h#L31 [4]: https://codesearch.debian.net Fixes: 7f192e3cd316 ("fork: add clone3") Cc: Kees Cook <[email protected]> Cc: Jann Horn <[email protected]> Cc: David Howells <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: <[email protected]> # 5.3 Cc: GNU C Library <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Aleksa Sarai <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-05ceph: don't allow copy_file_range when stripe_count != 1Luis Henriques1-2/+10
copy_file_range tries to use the OSD 'copy-from' operation, which simply performs a full object copy. Unfortunately, the implementation of this system call assumes that stripe_count is always set to 1 and doesn't take into account that the data may be striped across an object set. If the file layout has stripe_count different from 1, then the destination file data will be corrupted. For example: Consider a 8 MiB file with 4 MiB object size, stripe_count of 2 and stripe_size of 2 MiB; the first half of the file will be filled with 'A's and the second half will be filled with 'B's: 0 4M 8M Obj1 Obj2 +------+------+ +----+ +----+ file: | AAAA | BBBB | | AA | | AA | +------+------+ |----| |----| | BB | | BB | +----+ +----+ If we copy_file_range this file into a new file (which needs to have the same file layout!), then it will start by copying the object starting at file offset 0 (Obj1). And then it will copy the object starting at file offset 4M -- which is Obj1 again. Unfortunately, the solution for this is to not allow remote object copies to be performed when the file layout stripe_count is not 1 and simply fallback to the default (VFS) copy_file_range implementation. Cc: [email protected] Signed-off-by: Luis Henriques <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2019-11-05ceph: don't try to handle hashed dentries in non-O_CREAT atomic_openJeff Layton1-0/+3
If ceph_atomic_open is handed a !d_in_lookup dentry, then that means that it already passed d_revalidate so we *know* that it's negative (or at least was very recently). Just return -ENOENT in that case. This also addresses a subtle bug in dentry handling. Non-O_CREAT opens call atomic_open with the parent's i_rwsem shared, but calling d_splice_alias on a hashed dentry requires the exclusive lock. If ceph_atomic_open receives a hashed, negative dentry on a non-O_CREAT open, and another client were to race in and create the file before we issue our OPEN, ceph_fill_trace could end up calling d_splice_alias on the dentry with the new inode with insufficient locks. Cc: [email protected] Reported-by: Al Viro <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2019-11-05ALSA: hda/ca0132 - Fix possible workqueue stallTakashi Iwai1-1/+1
The unsolicited event handler for the headphone jack on CA0132 codec driver tries to reschedule the another delayed work with cancel_delayed_work_sync(). It's no good idea, unfortunately, especially after we changed the work queue to the standard global one; this may lead to a stall because both works are using the same global queue. Fix it by dropping the _sync but does call cancel_delayed_work() instead. Fixes: 993884f6a26c ("ALSA: hda/ca0132 - Delay HP amp turnon.") BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1155836 Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2019-11-05scripts/nsdeps: make sure to pass all module source files to spatchJessica Yu1-3/+3
The nsdeps script passes a list of the module source files to generate_deps_for_ns() as a space delimited string named $mod_source_files, which then passes it to spatch. But since $mod_source_files is not encased in quotes, each source file in that string is treated as a separate shell function argument (as $2, $3, $4, etc.). However, the spatch invocation only refers to $2, so only the first file out of $mod_source_files is processed by spatch. This causes problems (namely, the MODULE_IMPORT_NS() statement doesn't get inserted) when a module is composed of many source files and the "main" module file containing the MODULE_LICENSE() statement is not the first file listed in $mod_source_files. Fix this by encasing $mod_source_files in quotes so that the entirety of the string is treated as a single argument and can be referred to as $2. In addition, put quotes in the variable assignment of mod_source_files to prevent any shell interpretation and field splitting. Reviewed-by: Masahiro Yamada <[email protected]> Acked-by: Matthias Maennich <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-11-05perf tools: Fix time sortingJiri Olsa1-1/+1
The final sort might get confused when the comparison is done over bigger numbers than int like for -s time. Check the following report for longer workloads: $ perf report -s time -F time,overhead --stdio Fix hist_entry__sort() to properly return int64_t and not possible cut int. Fixes: 043ca389a318 ("perf tools: Use hpp formats to sort final output") Signed-off-by: Jiri Olsa <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] # v3.16+ Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-05can: don't use deprecated license identifiersYegor Yefremov8-8/+8
The "GPL-2.0" license identifier changed to "GPL-2.0-only" in SPDX v3.0. Signed-off-by: Yegor Yefremov <[email protected]> Acked-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2019-11-05can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race ↵Timo Schlüßler1-1/+1
condition In mcp251x_restart_work_handler() the variable to stop the interrupt handler (priv->force_quit) is reset after the chip is restarted and thus a interrupt might occur. This patch fixes the potential race condition by resetting force_quit before enabling interrupts. Signed-off-by: Timo Schlüßler <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2019-11-05perf tools: Remove unused trace_find_next_event()Steven Rostedt (VMware)2-33/+0
trace_find_next_event() was buggy and pretty much a useless helper. As there are no more users, just remove it. Signed-off-by: Steven Rostedt (VMware) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-05perf scripting engines: Iterate on tep event arrays directlySteven Rostedt (VMware)2-4/+13
Instead of calling a useless (and broken) helper function to get the next event of a tep event array, just get the array directly and iterate over it. Note, the broken part was from trace_find_next_event() which after this will no longer be used, and can be removed. Committer notes: This fixes a segfault when generating python scripts from perf.data files with multiple tracepoint events, i.e. the following use case is fixed by this patch: # perf record -e sched:* sleep 1 [ perf record: Woken up 31 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ] # perf script -g python Segmentation fault (core dumped) # Reported-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>