aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-29net/mlx5e: Fix MLX5_TC_CT dependenciesVlad Buslov1-1/+1
Change MLX5_TC_CT config dependencies to include MLX5_ESWITCH instead of MLX5_CORE_EN && NET_SWITCHDEV, which are already required by MLX5_ESWITCH. Without this change mlx5 fails to compile if user disables MLX5_ESWITCH without also manually disabling MLX5_TC_CT. Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29net/mlx5e: Properly set default values when disabling adaptive moderationTal Gilboa3-14/+37
Add a call to mlx5e_reset_rx/tx_moderation() when enabling/disabling adaptive moderation, in order to select the proper default values. In order to do so, we separate the logic of selecting the moderation values and setting moderion mode (CQE/EQE based). Fixes: 0088cbbc4b66 ("net/mlx5e: Enable CQE based moderation on TX CQ") Fixes: 9908aa292971 ("net/mlx5e: CQE based moderation") Signed-off-by: Tal Gilboa <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29net/mlx5e: Fix arch depending casting issue in FECAya Levin2-20/+24
Change type of active_fec to u32 to match the type expected by mlx5e_get_fec_mode. Copy active_fec and configured_fec values to unsigned long before preforming bitwise manipulations. Take the same approach when configuring FEC over 50G link modes: copy the policy into an unsigned long and only than preform bitwise operations. Fixes: 2132b71f78d2 ("net/mlx5e: Advertise globaly supported FEC modes") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29net/mlx5e: Remove warning "devices are not on same switch HW"Maor Dickman1-4/+0
On tunnel decap rule insertion, the indirect mechanism will attempt to offload the rule on all uplink representors which will trigger the "devices are not on same switch HW, can't offload forwarding" message for the uplink which isn't on the same switch HW as the VF representor. The above flow is valid and shouldn't cause warning message, fix by removing the warning and only report this flow using extack. Fixes: 321348475d54 ("net/mlx5e: Fix allowed tc redirect merged eswitch offload cases") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29net/mlx5e: Fix stats update for matchall classifierRoi Dayan1-1/+1
It's bytes, packets, lastused. Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29net/mlx5: Fix crash upon suspend/resumeMark Bloch1-0/+18
Currently a Linux system with the mlx5 NIC always crashes upon hibernation - suspend/resume. Add basic callbacks so the NIC could be suspended and resumed. Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Tested-by: Dexuan Cui <[email protected]> Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-05-29Merge branch 'master' of ↵David S. Miller11-44/+104
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-05-29 1) Several fixes for ESP gro/gso in transport and beet mode when IPv6 extension headers are present. From Xin Long. 2) Fix a wrong comment on XFRMA_OFFLOAD_DEV. From Antony Antony. 3) Fix sk_destruct callback handling on ESP in TCP encapsulation. From Sabrina Dubroca. 4) Fix a use after free in xfrm_output_gso when used with vxlan. From Xin Long. 5) Fix secpath handling of VTI when used wiuth IPCOMP. From Xin Long. 6) Fix an oops when deleting a x-netns xfrm interface. From Nicolas Dichtel. 7) Fix a possible warning on policy updates. We had a case where it was possible to add two policies with the same lookup keys. From Xin Long. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-05-29Merge tag 'drm-fixes-2020-05-29-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds3-10/+5
Pull drm fixes from Dave Airlie: "A couple of amdgpu fixes and minor ingenic fixes: amdgpu: - display atomic test fix - Fix soft hang in display vupdate code ingenic: - fix pointer cast - fix crtc atomic check callback" * tag 'drm-fixes-2020-05-29-1' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Fix potential integer wraparound resulting in a hang drm/amd/display: drop cursor position check in atomic test gpu/drm: Ingenic: Fix opaque pointer casted to wrong type gpu/drm: ingenic: Fix bogus crtc_atomic_check callback
2020-05-29gfs2: Even more gfs2_find_jhead fixesAndreas Gruenbacher1-10/+5
Fix several issues in the previous gfs2_find_jhead fix: * When updating @blocks_submitted, @block refers to the first block block not submitted yet, not the last block submitted, so fix an off-by-one error. * We want to ensure that @blocks_submitted is far enough ahead of @blocks_read to guarantee that there is in-flight I/O. Otherwise, we'll eventually end up waiting for pages that haven't been submitted, yet. * It's much easier to compare the number of blocks added with the number of blocks submitted to limit the maximum bio size. * Even with bio chaining, we can keep adding blocks until we reach the maximum bio size, as long as we stop at a page boundary. This simplifies the logic. Signed-off-by: Andreas Gruenbacher <[email protected]> Reviewed-by: Bob Peterson <[email protected]>
2020-05-29parisc: Fix kernel panic in mem_init()Helge Deller1-1/+1
The Debian kernel v5.6 triggers this kernel panic: Kernel panic - not syncing: Bad Address (null pointer deref?) Bad Address (null pointer deref?): Code=26 (Data memory access rights trap) at addr 0000000000000000 CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-2-parisc64 #1 Debian 5.6.14-1 IAOQ[0]: mem_init+0xb0/0x150 IAOQ[1]: mem_init+0xb4/0x150 RP(r2): start_kernel+0x6c8/0x1190 Backtrace: [<0000000040101ab4>] start_kernel+0x6c8/0x1190 [<0000000040108574>] start_parisc+0x158/0x1b8 on a HP-PARISC rp3440 machine with this memory layout: Memory Ranges: 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB 1) Start 0x0000004040000000 End 0x00000040ffdfffff Size 3070 MB Fix the crash by avoiding virt_to_page() and similar functions in mem_init() until the memory zones have been fully set up. Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # v5.0+
2020-05-29iommu: Fix reference count leak in iommu_group_alloc.Qiushi Wu1-1/+1
kobject_init_and_add() takes reference even when it fails. Thus, when kobject_init_and_add() returns an error, kobject_put() must be called to properly clean up the kobject. Fixes: d72e31c93746 ("iommu: IOMMU Groups") Signed-off-by: Qiushi Wu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2020-05-29gpio: fix locking open drain IRQ linesLinus Walleij1-2/+9
We provided the right semantics on open drain lines being by definition output but incidentally the irq set up function would only allow IRQs on lines that were "not output". Fix the semantics to allow output open drain lines to be used for IRQs. Reported-by: Hans Verkuil <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Tested-by: Hans Verkuil <[email protected]> Cc: Russell King <[email protected]> Cc: [email protected] # v5.3+ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2020-05-29powerpc/64s: Disable sanitisers for C syscall/interrupt entry/exit codeDaniel Axtens1-0/+3
syzkaller is picking up a bunch of crashes that look like this: Unrecoverable exception 380 at c00000000037ed60 (msr=8000000000001031) Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0 NIP: c00000000037ed60 LR: c00000000004bac8 CTR: c000000000030990 REGS: c0000000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e) MSR: 8000000000001031 <SF,ME,IR,DR,LE> CR: 48222882 XER: 20000000 CFAR: c00000000004bac4 IRQMASK: 0 GPR00: c00000000004bb68 c0000000555a74c0 c0000000024b3500 0000000000000005 GPR04: 0000000000000000 0000000000000000 c00000000004bb88 c008000000910000 GPR08: 00000000000b0000 c00000000004bac8 0000000000016000 c000000002503500 GPR12: c000000000030990 c000000003190000 00000000106a5898 00000000106a0000 GPR16: 00000000106a5890 c000000007a92000 c000000008180e00 c000000007a8f700 GPR20: c000000007a904b0 0000000010110000 c00000000259d318 5deadbeef0000100 GPR24: 5deadbeef0000122 c000000078422700 c000000009ee88b8 c000000078422778 GPR28: 0000000000000001 800000000280b033 0000000000000000 c0000000555a75a0 NIP [c00000000037ed60] __sanitizer_cov_trace_pc+0x40/0x50 LR [c00000000004bac8] interrupt_exit_kernel_prepare+0x118/0x310 Call Trace: [c0000000555a74c0] [c00000000004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable) [c0000000555a7530] [c00000000000f9a8] interrupt_return+0x118/0x1c0 --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50 ...<random previous call chain>... This is caused by __sanitizer_cov_trace_pc() causing an SLB fault after MSR[RI] has been cleared by __hard_EE_RI_disable(), which we can not recover from. Do not instrument the new syscall/interrupt entry/exit code with KCOV, GCOV or UBSAN. Reported-by: syzbot-ppc64 <[email protected]> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Daniel Axtens <[email protected]> Acked-by: Andrew Donnellan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2020-05-29xfrm: fix a NULL-ptr deref in xfrm_local_errorXin Long1-1/+2
This patch is to fix a crash: [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access [ ] general protection fault: 0000 [#1] SMP KASAN PTI [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0 [ ] Call Trace: [ ] xfrm6_local_error+0x1eb/0x300 [ ] xfrm_local_error+0x95/0x130 [ ] __xfrm6_output+0x65f/0xb50 [ ] xfrm6_output+0x106/0x46f [ ] udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel] [ ] vxlan_xmit_one+0xbc6/0x2c60 [vxlan] [ ] vxlan_xmit+0x6a0/0x4276 [vxlan] [ ] dev_hard_start_xmit+0x165/0x820 [ ] __dev_queue_xmit+0x1ff0/0x2b90 [ ] ip_finish_output2+0xd3e/0x1480 [ ] ip_do_fragment+0x182d/0x2210 [ ] ip_output+0x1d0/0x510 [ ] ip_send_skb+0x37/0xa0 [ ] raw_sendmsg+0x1b4c/0x2b80 [ ] sock_sendmsg+0xc0/0x110 This occurred when sending a v4 skb over vxlan6 over ipsec, in which case skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in xfrm_local_error(). Then it will go to xfrm6_local_error() where it tries to get ipv6 info from a ipv4 sk. This issue was actually fixed by Commit 628e341f319f ("xfrm: make local error reporting more robust"), but brought back by Commit 844d48746e4b ("xfrm: choose protocol family by skb protocol"). So to fix it, we should call xfrm6_local_error() only when skb->protocol is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6. Fixes: 844d48746e4b ("xfrm: choose protocol family by skb protocol") Reported-by: Xiumei Mu <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2020-05-29Merge branch 'fixes' of ↵Ingo Molnar1-38/+48
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/urgent Pick up FPU register dump fixes from Al Viro. Signed-off-by: Ingo Molnar <[email protected]>
2020-05-29Merge tag 'drm-misc-fixes-2020-05-28' of ↵Dave Airlie1-3/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Two ingenic fixes, one for a wrong cast, the other for a typo in a comparison Signed-off-by: Dave Airlie <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-28net: be more gentle about silly gso requests coming from userEric Dumazet1-8/+9
Recent change in virtio_net_hdr_to_skb() broke some packetdrill tests. When --mss=XXX option is set, packetdrill always provide gso_type & gso_size for its inbound packets, regardless of packet size. if (packet->tcp && packet->mss) { if (packet->ipv4) gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4; else gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6; gso.gso_size = packet->mss; } Since many other programs could do the same, relax virtio_net_hdr_to_skb() to no longer return an error, but instead ignore gso settings. This keeps Willem intent to make sure no malicious packet could reach gso stack. Note that TCP stack has a special logic in tcp_set_skb_tso_segs() to clear gso_size for small packets. Fixes: 6dd912f82680 ("net: check untrusted gso_size at kernel entry") Signed-off-by: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-4/+19
Merge misc fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <[email protected]>: include/asm-generic/topology.h: guard cpumask_of_node() macro argument fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info() mm: remove VM_BUG_ON(PageSlab()) from page_mapcount() mm,thp: stop leaking unreleased file pages mm/z3fold: silence kmemleak false positives of slots
2020-05-28sctp: check assoc before SCTP_ADDR_{MADE_PRIM, ADDED} eventJonas Falkevik1-0/+3
Make sure SCTP_ADDR_{MADE_PRIM,ADDED} are sent only for associations that have been established. These events are described in rfc6458#section-6.1 SCTP_PEER_ADDR_CHANGE: This tag indicates that an address that is part of an existing association has experienced a change of state (e.g., a failure or return to service of the reachability of an endpoint via a specific transport address). Signed-off-by: Jonas Falkevik <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-28Merge branch 'for-linus' of ↵Linus Torvalds13-72/+88
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a few random driver fixups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics - add a second working PNP_ID for Lenovo T470s Input: applespi - replace zero-length array with flexible-array Input: axp20x-pek - always register interrupt handlers Input: lm8333 - update contact email Input: synaptics-rmi4 - fix error return code in rmi_driver_probe() Input: synaptics-rmi4 - really fix attn_data use-after-free Input: i8042 - add ThinkPad S230u to i8042 reset list Revert "Input: i8042 - add ThinkPad S230u to i8042 nomux list" Input: dlink-dir685-touchkeys - fix a typo in driver name Input: xpad - add custom init packet for Xbox One S controllers Input: evdev - call input_flush_device() on release(), not flush() Input: i8042 - add ThinkPad S230u to i8042 nomux list Input: usbtouchscreen - add support for BonXeon TP Input: cros_ec_keyb - use cros_ec_cmd_xfer_status helper Input: mms114 - fix handling of mms345l Input: elants_i2c - support palm detection
2020-05-28x86/ioperm: Prevent a memory leak when fork failsJay Lang3-15/+15
In the copy_process() routine called by _do_fork(), failure to allocate a PID (or further along in the function) will trigger an invocation to exit_thread(). This is done to clean up from an earlier call to copy_thread_tls(). Naturally, the child task is passed into exit_thread(), however during the process, io_bitmap_exit() nullifies the parent's io_bitmap rather than the child's. As copy_thread_tls() has been called ahead of the failure, the reference count on the calling thread's io_bitmap is incremented as we would expect. However, io_bitmap_exit() doesn't accept any arguments, and thus assumes it should trash the current thread's io_bitmap reference rather than the child's. This is pretty sneaky in practice, because in all instances but this one, exit_thread() is called with respect to the current task and everything works out. A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to get a bitmap allocated, and force a clone3() syscall to fail by passing in a zeroed clone_args structure. The kernel handles the erroneous struct and the buggy code path is followed, and even though the parent's reference to the io_bitmap is trashed, the child still holds a reference and thus the structure will never be freed. Fix this by tweaking io_bitmap_exit() and its subroutines to accept a task_struct argument which to operate on. Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped") Signed-off-by: Jay Lang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: stable#@vger.kernel.org Link: https://lkml.kernel.org/r/[email protected]
2020-05-28Merge tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linuxLinus Torvalds4-71/+66
Pull csky fixes from Guo Ren: "Another four fixes for csky: - fix req_syscall debug - fix abiv2 syscall_trace - fix preempt enable - clean up regs usage in entry.S" * tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linux: csky: Fixup CONFIG_DEBUG_RSEQ csky: Coding convention in entry.S csky: Fixup abiv2 syscall_trace break a4 & a5 csky: Fixup CONFIG_PREEMPT panic
2020-05-28Revert "block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT"Jens Axboe1-7/+4
This reverts commit c58c1f83436b501d45d4050fd1296d71a9760bcb. io_uring does do the right thing for this case, and we're still returning -EAGAIN to userspace for the cases we don't support. Revert this change to avoid doing endless spins of resubmits. Cc: [email protected] # v5.6 Reported-by: Bijan Mottahedeh <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-05-28include/asm-generic/topology.h: guard cpumask_of_node() macro argumentArnd Bergmann1-1/+1
drivers/hwmon/amd_energy.c:195:15: error: invalid operands to binary expression ('void' and 'int') (channel - data->nr_cpus)); ~~~~~~~~~^~~~~~~~~~~~~~~~~ include/asm-generic/topology.h:51:42: note: expanded from macro 'cpumask_of_node' #define cpumask_of_node(node) ((void)node, cpu_online_mask) ^~~~ include/linux/cpumask.h:618:72: note: expanded from macro 'cpumask_first_and' #define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p)) ^~~~~ Fixes: f0b848ce6fe9 ("cpumask: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask") Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Guenter Roeck <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-28fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()Alexander Potapenko1-1/+1
KMSAN reported uninitialized data being written to disk when dumping core. As a result, several kilobytes of kmalloc memory may be written to the core file and then read by a non-privileged user. Reported-by: sam <[email protected]> Signed-off-by: Alexander Potapenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: https://github.com/google/kmsan/issues/76 Signed-off-by: Linus Torvalds <[email protected]>
2020-05-28mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()Konstantin Khlebnikov1-2/+13
Replace superfluous VM_BUG_ON() with comment about correct usage. Technically reverts commit 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()"), but context lines have changed. Function isolate_migratepages_block() runs some checks out of lru_lock when choose pages for migration. After checking PageLRU() it checks extra page references by comparing page_count() and page_mapcount(). Between these two checks page could be removed from lru, freed and taken by slab. As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). Race window is tiny. For certain workload this happens around once a year. page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0 flags: 0x500000000008100(slab|head) raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:628! invalid opcode: 0000 [#1] SMP NOPTI CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 RIP: 0010:isolate_migratepages_block+0x986/0x9b0 The code in isolate_migratepages_block() was added in commit 119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages") before adding VM_BUG_ON into page_mapcount(). This race has been predicted in 2015 by Vlastimil Babka (see link below). [[email protected]: comment tweaks, per Hugh] Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1) Signed-off-by: Linus Torvalds <[email protected]>
2020-05-28mm,thp: stop leaking unreleased file pagesHugh Dickins1-0/+1
When collapse_file() calls try_to_release_page(), it has already isolated the page: so if releasing buffers happens to fail (as it sometimes does), remember to putback_lru_page(): otherwise that page is left unreclaimable and unfreeable, and the file extent uncollapsible. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: <[email protected]> [5.4+] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-28mm/z3fold: silence kmemleak false positives of slotsQian Cai1-0/+3
Kmemleak reported many leaks while under memory pressue in, slots = alloc_slots(pool, gfp); which is referenced by "zhdr" in init_z3fold_page(), zhdr->slots = slots; However, "zhdr" could be gone without freeing slots as the later will be freed separately when the last "handle" off of "handles" array is freed. It will be within "slots" which is always aligned. unreferenced object 0xc000000fdadc1040 (size 104): comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: z3fold_zpool_malloc+0x7b0/0xe10 alloc_slots at mm/z3fold.c:214 (inlined by) init_z3fold_page at mm/z3fold.c:412 (inlined by) z3fold_alloc at mm/z3fold.c:1161 (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735 zpool_malloc+0x34/0x50 zswap_frontswap_store+0x60c/0xda0 zswap_frontswap_store at mm/zswap.c:1093 __frontswap_store+0x128/0x330 swap_writepage+0x58/0x110 pageout+0x16c/0xa40 shrink_page_list+0x1ac8/0x25c0 shrink_inactive_list+0x270/0x730 shrink_lruvec+0x444/0xf30 shrink_node+0x2a4/0x9c0 do_try_to_free_pages+0x158/0x640 try_to_free_pages+0x1bc/0x5f0 __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0 __alloc_pages_nodemask+0x520/0x650 alloc_pages_vma+0xc0/0x420 handle_mm_fault+0x1174/0x1bf0 Signed-off-by: Qian Cai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vitaly Wool <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-28x86/dma: Fix max PFN arithmetic overflow on 32 bit systemsAlexander Dahl1-1/+1
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4cbbf4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") Reported-by: Alan Jenkins <[email protected]> Suggested-by: Robin Murphy <[email protected]> Signed-off-by: Alexander Dahl <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/[email protected]
2020-05-28bonding: Fix reference count leak in bond_sysfs_slave_add.Qiushi Wu1-1/+3
kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Previous commit "b8eb718348b8" fixed a similar problem. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Signed-off-by: Qiushi Wu <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-6/+6
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Uninitialized when used in __nf_conntrack_update(), from Nathan Chancellor. 2) Comparison of unsigned expression in nf_confirm_cthelper(). 3) Remove 'const' type qualifier with no effect. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-05-28powerpc/bpf: Enable bpf_probe_read{, str}() on powerpc againPetr Mladek1-0/+1
The commit 0ebeea8ca8a4d1d453a ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") caused that bpf_probe_read{, str}() functions were not longer available on architectures where the same logical address might have different content in kernel and user memory mapping. These architectures should use probe_read_{user,kernel}_str helpers. For backward compatibility, the problematic functions are still available on architectures where the user and kernel address spaces are not overlapping. This is defined CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. At the moment, these backward compatible functions are enabled only on x86_64, arm, and arm64. Let's do it also on powerpc that has the non overlapping address space as well. Fixes: 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") Signed-off-by: Petr Mladek <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]
2020-05-28Merge branch 'nvme-5.7' of git://git.infradead.org/nvme into block-5.7Jens Axboe1-4/+7
Pull NVMe poll fix from Christoph. * 'nvme-5.7' of git://git.infradead.org/nvme: nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()
2020-05-28arm64/kernel: Fix return value when cpu_online() fails in __cpu_up()Nobuhiro Iwamatsu1-1/+1
If boot_secondary() was successful, and cpu_online() was an error in __cpu_up(), -EIO was returned, but 0 is returned by commit d22b115cbfbb7 ("arm64/kernel: Simplify __cpu_up() by bailing out early"). Therefore, bringup_wait_for_ap() causes the primary core to wait for a long time, which may cause boot failure. This commit sets -EIO to return code under the same conditions. Fixes: d22b115cbfbb ("arm64/kernel: Simplify __cpu_up() by bailing out early") Signed-off-by: Nobuhiro Iwamatsu <[email protected]> Tested-by: Yuji Ishikawa <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Gavin Shan <[email protected]> Cc: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: return -EIO at the end of the function] Signed-off-by: Catalin Marinas <[email protected]>
2020-05-28Merge tag 'amd-drm-fixes-5.7-2020-05-27' of ↵Dave Airlie2-7/+2
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-05-27: amdgpu: - Display atomic test fix - Fix soft hang in display vupdate code Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-28csky: Fixup CONFIG_DEBUG_RSEQGuo Ren1-5/+10
Put the rseq_syscall check point at the prologue of the syscall will break the a0 ... a7. This will casue system call bug when DEBUG_RSEQ is enabled. So move it to the epilogue of syscall, but before syscall_trace. Signed-off-by: Guo Ren <[email protected]>
2020-05-28csky: Coding convention in entry.SGuo Ren4-40/+42
There is no fixup or feature in the patch, we only cleanup with: - Remove unnecessary reg used (r11, r12), just use r9 & r10 & syscallid regs as temp useage. - Add _TIF_SYSCALL_WORK and _TIF_WORK_MASK to gather macros. Signed-off-by: Guo Ren <[email protected]>
2020-05-28csky: Fixup abiv2 syscall_trace break a4 & a5Guo Ren2-2/+6
Current implementation could destory a4 & a5 when strace, so we need to get them from pt_regs by SAVE_ALL. Signed-off-by: Guo Ren <[email protected]>
2020-05-28csky: Fixup CONFIG_PREEMPT panicGuo Ren3-27/+11
log: [    0.13373200] Calibrating delay loop... [    0.14077600] ------------[ cut here ]------------ [    0.14116700] WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:3790 preempt_count_add+0xc8/0x11c [    0.14348000] DEBUG_LOCKS_WARN_ON((preempt_count() < 0))Modules linked in: [    0.14395100] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0 #7 [    0.14410800] [    0.14427400] Call Trace: [    0.14450700] [<807cd226>] dump_stack+0x8a/0xe4 [    0.14473500] [<80072792>] __warn+0x10e/0x15c [    0.14495900] [<80072852>] warn_slowpath_fmt+0x72/0xc0 [    0.14518600] [<800a5240>] preempt_count_add+0xc8/0x11c [    0.14544900] [<807ef918>] _raw_spin_lock+0x28/0x68 [    0.14572600] [<800e0eb8>] vprintk_emit+0x84/0x2d8 [    0.14599000] [<800e113a>] vprintk_default+0x2e/0x44 [    0.14625100] [<800e2042>] vprintk_func+0x12a/0x1d0 [    0.14651300] [<800e1804>] printk+0x30/0x48 [    0.14677600] [<80008052>] lockdep_init+0x12/0xb0 [    0.14703800] [<80002080>] start_kernel+0x558/0x7f8 [    0.14730000] [<800052bc>] csky_start+0x58/0x94 [    0.14756600] irq event stamp: 34 [    0.14775100] hardirqs last  enabled at (33): [<80067370>] ret_from_exception+0x2c/0x72 [    0.14793700] hardirqs last disabled at (34): [<800e0eae>] vprintk_emit+0x7a/0x2d8 [    0.14812300] softirqs last  enabled at (32): [<800655b0>] __do_softirq+0x578/0x6d8 [    0.14830800] softirqs last disabled at (25): [<8007b3b8>] irq_exit+0xec/0x128 The preempt_count of reg could be destroyed after csky_do_IRQ without reload from memory. After reference to other architectures (arm64, riscv), we move preempt entry into ret_from_exception and disable irq at the beginning of ret_from_exception instead of RESTORE_ALL. Signed-off-by: Guo Ren <[email protected]> Reported-by: Lu Baoquan <[email protected]>
2020-05-27IB/ipoib: Fix double free of skb in case of multicast traffic in CM modeValentine Fatiev4-12/+26
When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of datagram skb. The current mechanism assumes that the order in the completion queue is the same as the order of sent packets for all QPs. Order is kept only for specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and few CM's) in parallel. The problem: ---------------------------------------------------------- Transmit queue: ----------------- UD skb pointer kept in queue itself, CM skb kept in spearate queue and uses transmit queue as a placeholder to count the number of total transmitted packets. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ tail head Completion queue (problematic scenario) - the order not the same as in the transmit queue: 1 2 3 4 5 6 7 8 9 ------------------------------------ ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 ------------------------------------ 1. CM1 'wc' processing - skb freed in cm separate ring. - tx_tail of transmit queue increased although UD2 is not freed. Now driver assumes UD2 index is already freed and it could be used for new transmitted skb. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ ^ (Bad)tail head (Bad - Could be used for new SKB) In this case (due to heavy load) UD2 skb pointer could be replaced by new transmitted packet UD_NEW, as the driver assumes its free. At this point we will have to process two 'wc' with same index but we have only one pointer to free. During second attempt to free the same skb we will have NULL pointer exception. 2. UD2 'wc' processing - skb freed according the index we got from 'wc', but it was already overwritten by mistake. So actually the skb that was released is the skb of the new transmitted packet and not the original one. 3. UD_NEW 'wc' processing - attempt to free already freed skb. NUll pointer exception. The fix: ----------------------------------------------------------------------- The fix is to stop using the UD ring as a placeholder for CM packets, the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a new cyclic variables global_tx_head and global_tx_tail are introduced for managing and counting the overall outstanding sent packets, then the send queue will be stopped and waken based on these variables only. Note that no locking is needed since global_tx_head is updated in the xmit flow and global_tx_tail is updated in the NAPI flow only. A previous attempt tried to use one variable to count the outstanding sent packets, but it did not work since xmit and NAPI flows can run at the same time and the counter will be updated wrongly. Thus, we use the same simple cyclic head and tail scheme that we have today for the UD tx_ring. Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Valentine Fatiev <[email protected]> Signed-off-by: Alaa Hleihel <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Acked-by: Doug Ledford <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2020-05-27drm/amd/display: Fix potential integer wraparound resulting in a hangAric Cyr1-0/+2
[Why] If VUPDATE_END is before VUPDATE_START the delay calculated can become very large, causing a soft hang. [How] Take the absolute value of the difference between START and END. Signed-off-by: Aric Cyr <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Qingqing Zhuo <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-05-27drm/amd/display: drop cursor position check in atomic testSimon Ser1-7/+0
get_cursor_position already handles the case where the cursor has negative off-screen coordinates by not setting dc_cursor_position.enabled. Signed-off-by: Simon Ser <[email protected]> Fixes: 626bf90fe03f ("drm/amd/display: add basic atomic check for cursor plane") Cc: Alex Deucher <[email protected]> Cc: Nicholas Kazlauskas <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-05-27net: dsa: declare lockless TX feature for slave portsVladimir Oltean1-0/+1
Be there a platform with the following layout: Regular NIC | +----> DSA master for switch port | +----> DSA master for another switch port After changing DSA back to static lockdep class keys in commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), this kernel splat can be seen: [ 13.361198] ============================================ [ 13.366524] WARNING: possible recursive locking detected [ 13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted [ 13.377874] -------------------------------------------- [ 13.383201] swapper/0/0 is trying to acquire lock: [ 13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.397879] [ 13.397879] but task is already holding lock: [ 13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.413593] [ 13.413593] other info that might help us debug this: [ 13.420140] Possible unsafe locking scenario: [ 13.420140] [ 13.426075] CPU0 [ 13.428523] ---- [ 13.430969] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.435946] lock(&dsa_slave_netdev_xmit_lock_key); [ 13.440924] [ 13.440924] *** DEADLOCK *** [ 13.440924] [ 13.446860] May be due to missing lock nesting notation [ 13.446860] [ 13.453668] 6 locks held by swapper/0/0: [ 13.457598] #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400 [ 13.466593] #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560 [ 13.474803] #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10 [ 13.483886] #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.492793] #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0 [ 13.503094] #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0 [ 13.512000] [ 13.512000] stack backtrace: [ 13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 [ 13.530421] Call trace: [ 13.532871] dump_backtrace+0x0/0x1d8 [ 13.536539] show_stack+0x24/0x30 [ 13.539862] dump_stack+0xe8/0x150 [ 13.543271] __lock_acquire+0x1030/0x1678 [ 13.547290] lock_acquire+0xf8/0x458 [ 13.550873] _raw_spin_lock+0x44/0x58 [ 13.554543] __dev_queue_xmit+0x84c/0xbe0 [ 13.558562] dev_queue_xmit+0x24/0x30 [ 13.562232] dsa_slave_xmit+0xe0/0x128 [ 13.565988] dev_hard_start_xmit+0xf4/0x448 [ 13.570182] __dev_queue_xmit+0x808/0xbe0 [ 13.574200] dev_queue_xmit+0x24/0x30 [ 13.577869] neigh_resolve_output+0x15c/0x220 [ 13.582237] ip6_finish_output2+0x244/0xb10 [ 13.586430] __ip6_finish_output+0x1dc/0x298 [ 13.590709] ip6_output+0x84/0x358 [ 13.594116] mld_sendpack+0x2bc/0x560 [ 13.597786] mld_ifc_timer_expire+0x210/0x390 [ 13.602153] call_timer_fn+0xcc/0x400 [ 13.605822] run_timer_softirq+0x588/0x6e0 [ 13.609927] __do_softirq+0x118/0x590 [ 13.613597] irq_exit+0x13c/0x148 [ 13.616918] __handle_domain_irq+0x6c/0xc0 [ 13.621023] gic_handle_irq+0x6c/0x160 [ 13.624779] el1_irq+0xbc/0x180 [ 13.627927] cpuidle_enter_state+0xb4/0x4d0 [ 13.632120] cpuidle_enter+0x3c/0x50 [ 13.635703] call_cpuidle+0x44/0x78 [ 13.639199] do_idle+0x228/0x2c8 [ 13.642433] cpu_startup_entry+0x2c/0x48 [ 13.646363] rest_init+0x1ac/0x280 [ 13.649773] arch_call_rest_init+0x14/0x1c [ 13.653878] start_kernel+0x490/0x4bc Lockdep keys themselves were added in commit ab92d68fc22f ("net: core: add generic lockdep keys"), and it's very likely that this splat existed since then, but I have no real way to check, since this stacked platform wasn't supported by mainline back then. >From Taehee's own words: This patch was considered that all stackable devices have LLTX flag. But the dsa doesn't have LLTX, so this splat happened. After this patch, dsa shares the same lockdep class key. On the nested dsa interface architecture, which you illustrated, the same lockdep class key will be used in __dev_queue_xmit() because dsa doesn't have LLTX. So that lockdep detects deadlock because the same lockdep class key is used recursively although actually the different locks are used. There are some ways to fix this problem. 1. using NETIF_F_LLTX flag. If possible, using the LLTX flag is a very clear way for it. But I'm so sorry I don't know whether the dsa could have LLTX or not. 2. using dynamic lockdep again. It means that each interface uses a separate lockdep class key. So, lockdep will not detect recursive locking. But this way has a problem that it could consume lockdep class key too many. Currently, lockdep can have 8192 lockdep class keys. - you can see this number with the following command. cat /proc/lockdep_stats lock-classes: 1251 [max: 8192] ... The [max: 8192] means that the maximum number of lockdep class keys. If too many lockdep class keys are registered, lockdep stops to work. So, using a dynamic(separated) lockdep class key should be considered carefully. In addition, updating lockdep class key routine might have to be existing. (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key()) 3. Using lockdep subclass. A lockdep class key could have 8 subclasses. The different subclass is considered different locks by lockdep infrastructure. But "lock-classes" is not counted by subclasses. So, it could avoid stopping lockdep infrastructure by an overflow of lockdep class keys. This approach should also have an updating lockdep class key routine. (lockdep_set_subclass()) 4. Using nonvalidate lockdep class key. The lockdep infrastructure supports nonvalidate lockdep class key type. It means this lockdep is not validated by lockdep infrastructure. So, the splat will not happen but lockdep couldn't detect real deadlock case because lockdep really doesn't validate it. I think this should be used for really special cases. (lockdep_set_novalidate_class()) Further discussion here: https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ There appears to be no negative side-effect to declaring lockless TX for the DSA virtual interfaces, which means they handle their own locking. So that's what we do to make the splat go away. Patch tested in a wide variety of cases: unicast, multicast, PTP, etc. Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Suggested-by: Taehee Yoo <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-27copy_xstate_to_kernel(): don't leave parts of destination uninitializedAl Viro1-38/+48
copy the corresponding pieces of init_fpstate into the gaps instead. Cc: [email protected] Tested-by: Alexander Potapenko <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-05-27net: dsa: felix: send VLANs on CPU port as egress-taggedVladimir Oltean1-2/+6
As explained in other commits before (b9cd75e66895 and 87b0f983f66f), ocelot switches have a single egress-untagged VLAN per port, and the driver would deny adding a second one while an egress-untagged VLAN already exists. But on the CPU port (where the VLAN configuration is implicit, because there is no net device for the bridge to control), the DSA core attempts to add a VLAN using the same flags as were used for the front-panel port. This would make adding any untagged VLAN fail due to the CPU port rejecting the configuration: bridge vlan add dev swp0 vid 100 pvid untagged [ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1 [ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16 (note that port 5 is the CPU port and not the front-panel swp0). So this hardware will send all VLANs as tagged towards the CPU. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-27bridge: multicast: work around clang bugArnd Bergmann1-1/+2
Clang-10 and clang-11 run into a corner case of the register allocator on 32-bit ARM, leading to excessive stack usage from register spilling: net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=] Work around this by marking one of the internal functions as noinline_for_stack. Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9 Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-27nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll()Dongli Zhang1-4/+7
There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g., when doing live reset while polling the nvme device. CPU X CPU Y nvme_poll() nvme_dev_disable() -> nvme_stop_queues() -> nvme_suspend_io_queues() -> nvme_suspend_queue() -> spin_lock(&nvmeq->cq_poll_lock); -> nvme_reap_pending_cqes() -> nvme_process_cq() -> nvme_process_cq() In the above scenario, the nvme_process_cq() for the same queue may be running on both CPU X and CPU Y concurrently. It is much more easier to reproduce the issue when CONFIG_PREEMPT is enabled in kernel. When CONFIG_PREEMPT is disabled, it would take longer time for nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for grace period. This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in nvme_reap_pending_cqes(). Fixes: fa46c6fb5d61 ("nvme/pci: move cqe check after device shutdown") Signed-off-by: Dongli Zhang <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-05-27vsock: fix timeout in vsock_accept()Stefano Garzarella1-1/+1
The accept(2) is an "input" socket interface, so we should use SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout. So this patch replace sock_sndtimeo() with sock_rcvtimeo() to use the right timeout in the vsock_accept(). Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-27nfp: flower: fix used time of merge flow statisticsHeinrich Kuhn1-1/+2
Prior to this change the correct value for the used counter is calculated but not stored nor, therefore, propagated to user-space. In use-cases such as OVS use-case at least this results in active flows being removed from the hardware datapath. Which results in both unnecessary flow tear-down and setup, and packet processing on the host. This patch addresses the problem by saving the calculated used value which allows the value to propagate to user-space. Found by inspection. Fixes: aa6ce2ea0c93 ("nfp: flower: support stats update for merge flows") Signed-off-by: Heinrich Kuhn <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-27net/sched: fix infinite loop in sch_fq_pieDavide Caratti2-2/+23
this command hangs forever: # tc qdisc add dev eth0 root fq_pie flows 65536 watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028] [...] CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167 RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie] Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48 RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590 RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699 R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0 R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8 FS: 00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0 Call Trace: qdisc_create+0x3fd/0xeb0 tc_modify_qdisc+0x3be/0x14a0 rtnetlink_rcv_msg+0x5f3/0x920 netlink_rcv_skb+0x121/0x350 netlink_unicast+0x439/0x630 netlink_sendmsg+0x714/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5b4/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x9a/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 we can't accept 65536 as a valid number for 'nflows', because the loop on 'idx' in fq_pie_init() will never end. The extack message is correct, but it doesn't say that 0 is not a valid number for 'flows': while at it, fix this also. Add a tdc selftest to check correct validation of 'flows'. CC: Ivan Vecera <[email protected]> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Ivan Vecera <[email protected]> Signed-off-by: David S. Miller <[email protected]>