aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-08-11net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlanIvan Khoronzhuk1-4/+7
It's exclusive with normal behaviour but if try to set vlan to one of the reserved values is made, the cpsw runtime pm is broken. Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device") Signed-off-by: Ivan Khoronzhuk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-11net: ethernet: ti: cpsw: clear all entries when delete vidIvan Khoronzhuk2-11/+5
In cases if some of the entries were not found in forwarding table while killing vlan, the rest not needed entries still left in the table. No need to stop, as entry was deleted anyway. So fix this by returning error only after all was cleaned. To implement this, return -ENOENT in cpsw_ale_del_mcast() as it's supposed to be. Signed-off-by: Ivan Khoronzhuk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-10zram: remove BD_CAP_SYNCHRONOUS_IO with writeback featureMinchan Kim1-1/+14
If zram supports writeback feature, it's no longer a BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations for incompressible pages. Do not pretend to be synchronous IO device. It makes the system very sluggish due to waiting for IO completion from upper layers. Furthermore, it causes a user-after-free problem because swap thinks the opearion is done when the IO functions returns so it can free the page (e.g., lock_page_or_retry and goto out_release in do_swap_page) but in fact, IO is asynchronous so the driver could access a just freed page afterward. This patch fixes the problem. BUG: Bad page state in process qemu-system-x86 pfn:3dfab21 page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1 flags: 0x17fffc000000008(uptodate) raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x8(uptodate) CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 Call Trace: dump_stack+0x5c/0x7b bad_page+0xba/0x120 get_page_from_freelist+0x1016/0x1250 __alloc_pages_nodemask+0xfa/0x250 alloc_pages_vma+0x7c/0x1c0 do_swap_page+0x347/0x920 __handle_mm_fault+0x7b4/0x1110 handle_mm_fault+0xfc/0x1f0 __get_user_pages+0x12f/0x690 get_user_pages_unlocked+0x148/0x1f0 __gfn_to_pfn_memslot+0xff/0x3c0 [kvm] try_async_pf+0x87/0x230 [kvm] tdp_page_fault+0x132/0x290 [kvm] kvm_mmu_page_fault+0x74/0x570 [kvm] kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm] kvm_vcpu_ioctl+0x388/0x5d0 [kvm] do_vfs_ioctl+0xa2/0x630 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x55/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix changelog, add comment] Link: https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: coding-style fixes] Signed-off-by: Minchan Kim <[email protected]> Reported-by: Tino Lehnig <[email protected]> Tested-by: Tino Lehnig <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Jens Axboe <[email protected]> Cc: <[email protected]> [4.15+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-10mm/memory.c: check return value of ioremap_protjie@[email protected]1-0/+3
ioremap_prot() can return NULL which could lead to an oops. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: chen jie <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Li Zefan <[email protected]> Cc: chenjie <[email protected]> Cc: Yang Shi <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-10lib/ubsan: remove null-pointer checksAndrey Ryabinin4-17/+0
With gcc-8 fsanitize=null become very noisy. GCC started to complain about things like &a->b, where 'a' is NULL pointer. There is no NULL dereference, we just calculate address to struct member. It's technically undefined behavior so UBSAN is correct to report it. But as long as there is no real NULL-dereference, I think, we should be fine. -fno-delete-null-pointer-checks compiler flag should protect us from any consequences. So let's just no use -fsanitize=null as it's not useful for us. If there is a real NULL-deref we will see crash. Even if userspace mapped something at NULL (root can do this), with things like SMAP should catch the issue. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-10MAINTAINERS: GDB: update e-mail addressKieran Bingham1-1/+1
This entry was created with my personal e-mail address. Update this entry to my open-source kernel.org account. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kieran Bingham <[email protected]> Cc: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-08-10x86/mm/pti: Move user W+X check into pti_finalize()Joerg Roedel3-5/+10
The user page-table gets the updated kernel mappings in pti_finalize(), which runs after the RO+X permissions got applied to the kernel page-table in mark_readonly(). But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in mark_readonly() for insecure mappings. This causes false-positive warnings, because the user page-table did not get the updated mappings yet. Move the W+X check for the user page-table into pti_finalize() after it updated all required mappings. [ tglx: Folded !NX supported fix ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: "H . Peter Anvin" <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Laight <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eduardo Valentin <[email protected]> Cc: Greg KH <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andrea Arcangeli <[email protected]> Cc: Waiman Long <[email protected]> Cc: Pavel Machek <[email protected]> Cc: "David H . Gutteridge" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-08-10Merge branch 'i2c/for-current' of ↵Linus Torvalds1-13/+28
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A single driver bugfix for I2C. The bug was found by systematically stress testing the driver, so I am confident to merge it that late in the cycle although it is probably unusually large" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xlp9xx: Fix case where SSIF read transaction completes early
2018-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller8-20/+30
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-10 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix cpumap and devmap on teardown as they're under RCU context and won't have same assumption as running under NAPI protection, from Jesper. 2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had a bug where socket error was not propagated correctly, from Daniel. 3) Fix incompatible libbpf header license for BTF code and match it before it gets officially released with the rest of libbpf which is LGPL-2.1, from Martin. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-09make sure that __dentry_kill() always invalidates d_seq, unhashed or notAl Viro1-5/+2
RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" <[email protected]> Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2018-08-09fix __legitimize_mnt()/mntput() raceAl Viro1-0/+14
__legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntpu() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with reference that'll need to be dropped by caller; however, if that races with final mntput() we can end up with caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by racing final mntput() that has missed our increment and if it has - undo the increment and treat that as "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by: Oleg Nesterov <[email protected]> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2018-08-09MIPS: Remove remnants of UASM_ISAPaul Burton2-2/+0
Commit 33679a50370d ("MIPS: uasm: Remove needless ISA abstraction") removed use of the MIPS_ISA preprocessor macro, but left a couple of unused definitions of it behind. Remove the dead code. Signed-off-by: Paul Burton <[email protected]>
2018-08-09fix mntput/mntput raceAl Viro1-2/+12
mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) are done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation: * mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42 mntput(mnt) (called from __fput()) drops one reference, decrementing component * After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again - on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock. * On CPU 42 our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As the result, the total it gets is not 1 as it should've been - it's 0. At which point we decide that vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with reference to (now freed) vfsmount, etc. and we are screwed. It's not a wide race, but it can be reproduced with artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups. Fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want (and can) keep the fast case (i.e. mount that still has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped. Reported-by: Jann Horn <[email protected]> Tested-by: Jann Horn <[email protected]> Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2018-08-09Merge branch 'bpf-fix-cpu-and-devmap-teardown'Daniel Borkmann4-14/+21
Jesper Dangaard Brouer says: ==================== Removing entries from cpumap and devmap, goes through a number of syncronization steps to make sure no new xdp_frames can be enqueued. But there is a small chance, that xdp_frames remains which have not been flushed/processed yet. Flushing these during teardown, happens from RCU context and not as usual under RX NAPI context. The optimization introduced in commt 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi"), missed that the flush operation can also be called from RCU context. Thus, we cannot always use the xdp_return_frame_rx_napi call, which take advantage of the protection provided by XDP RX running under NAPI protection. The samples/bpf xdp_redirect_cpu have a --stress-mode, that is adjusted to easier reproduce (verified by Red Hat QA). ==================== Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-09xdp: fix bug in devmap teardown code pathJesper Dangaard Brouer1-5/+9
Like cpumap teardown, the devmap teardown code also flush remaining xdp_frames, via bq_xmit_all() in case map entry is removed. The code can call xdp_return_frame_rx_napi, from the the wrong context, in-case ndo_xdp_xmit() fails. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-09samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easierJesper Dangaard Brouer2-3/+3
The teardown race in cpumap is really hard to reproduce. These changes makes it easier to reproduce, for QA. The --stress-mode now have a case of a very small queue size of 8, that helps to trigger teardown flush to encounter a full queue, which results in calling xdp_return_frame API, in a non-NAPI protect context. Also increase MAX_CPUS, as my QA department have larger machines than me. Tested-by: Jean-Tsung Hsiao <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-09xdp: fix bug in cpumap teardown code pathJesper Dangaard Brouer1-6/+9
When removing a cpumap entry, a number of syncronization steps happen. Eventually the teardown code __cpu_map_entry_free is invoked from/via call_rcu. The teardown code __cpu_map_entry_free() flushes remaining xdp_frames, by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi(). The issues is that the teardown code is not running in the RX NAPI code path. Thus, it is not allowed to invoke the NAPI variant of xdp_return_frame. This bug was found and triggered by using the --stress-mode option to the samples/bpf program xdp_redirect_cpu. It is hard to trigger, because the ptr_ring have to be full and cpumap bulk queue max contains 8 packets, and a remote CPU is racing to empty the ptr_ring queue. Fixes: 389ab7f01af9 ("xdp: introduce xdp_return_frame_rx_napi") Tested-by: Jean-Tsung Hsiao <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-08-09x86/relocs: Add __end_rodata_aligned to S_RELJoerg Roedel1-0/+1
This new symbol needs to be in the workaround-list for buggy binutils, otherwise the build with gcc-4.6 fails. Fixes: 39d668e04eda ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit') Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Sedat Dilek <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linux-Next Mailing List <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-08-09Merge branch 'linus' of ↵Linus Torvalds8-191/+101
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a performance regression in arm64 NEON crypto as well as a crash in x86 aegis/morus on unsupported CPUs" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aegis,morus - Fix and simplify CPUID checks crypto: arm64 - revert NEON yield for fast AEAD implementations
2018-08-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds26-161/+178
Pull networking fixes from David Miller: 1) The real fix for the ipv6 route metric leak Sabrina was seeing, from Cong Wang. 2) Fix syzbot triggers AF_PACKET v3 ring buffer insufficient room conditions, from Willem de Bruijn. 3) vsock can reinitialize active work struct, fix from Cong Wang. 4) RXRPC keepalive generator can wedge a cpu, fix from David Howells. 5) Fix locking in AF_SMC ioctl, from Ursula Braun. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: dsa: slave: eee: Allow ports to use phylink net/smc: move sock lock in smc_ioctl() net/smc: allow sysctl rmem and wmem defaults for servers net/smc: no shutdown in state SMC_LISTEN net: aquantia: Fix IFF_ALLMULTI flag functionality rxrpc: Fix the keepalive generator [ver #2] net/mlx5e: Cleanup of dcbnl related fields net/mlx5e: Properly check if hairpin is possible between two functions vhost: reset metadata cache when initializing new IOTLB llc: use refcount_inc_not_zero() for llc_sap_find() dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() tipc: fix an interrupt unsafe locking scenario vsock: split dwork to avoid reinitializations net: thunderx: check for failed allocation lmac->dmacs cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts packet: refine ring v3 block size test to hold one frame ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit ipv6: fix double refcount of fib6_metrics
2018-08-09i2c: xlp9xx: Fix case where SSIF read transaction completes earlyGeorge Cherian1-13/+28
During ipmi stress tests we see occasional failure of transactions at the boot time. This happens in the case of a I2C_M_RECV_LEN transactions, when the read transfer completes (with the initial read length of 34) before the driver gets a chance to handle interrupts. The current driver code expects at least 2 interrupts for I2C_M_RECV_LEN transactions. The length is updated during the first interrupt, and the buffer contents are only copied during subsequent interrupts. In case of just one interrupt, we will complete the transaction without copying out the bytes from RX fifo. Update the code to drain the RX fifo after the length update, so that the transaction completes correctly in all cases. Signed-off-by: George Cherian <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Cc: [email protected]
2018-08-09s390/dasd: fix hanging offline processing due to canceled workerStefan Haberland1-2/+5
During offline processing two worker threads are canceled without freeing the device reference which leads to a hanging offline process. Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Stefan Haberland <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-08-09s390/dasd: fix panic for failed online processingStefan Haberland1-0/+3
Fix a panic that occurs for a device that got an error in dasd_eckd_check_characteristics() during online processing. For example the read configuration data command may have failed. If this error occurs the device is not being set online and the earlier invoked steps during online processing are rolled back. Therefore dasd_eckd_uncheck_device() is called which needs a valid private structure. But this pointer is not valid if dasd_eckd_check_characteristics() has failed. Check for a valid device->private pointer to prevent a panic. Reviewed-by: Jan Hoeppner <[email protected]> Signed-off-by: Stefan Haberland <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-08-09s390/mm: fix addressing exception after suspend/resumeGerald Schaefer1-1/+1
Commit c9b5ad546e7d "s390/mm: tag normal pages vs pages used in page tables" accidentally changed the logic in arch_set_page_states(), which is used by the suspend/resume code. set_page_stable(page, order) was changed to set_page_stable_dat(page, 0). After this, only the first page of higher order pages will be set to stable, and a write to one of the unstable pages will result in an addressing exception. Fix this by using "order" again, instead of "0". Fixes: c9b5ad546e7d ("s390/mm: tag normal pages vs pages used in page tables") Cc: [email protected] # 4.14+ Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Gerald Schaefer <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-08-09rseq/selftests: add s390 supportVasily Gorbik3-0/+539
Implement support for s390 in the rseq selftests, in order to sanity check the recently enabled rseq syscall. The Implementation covers both 64-bit and 31-bit mode. Acked-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-08-08dsa: slave: eee: Allow ports to use phylinkAndrew Lunn1-2/+2
For a port to be able to use EEE, both the MAC and the PHY must support EEE. A phy can be provided by both a phydev or phylink. Verify at least one of these exist, not just phydev. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08Merge branch 'smc-fixes'David S. Miller1-5/+10
Ursula Braun says: ==================== net/smc: fixes 2018-08-08 here are small fixes for SMC: The first patch makes sure, shutdown code is not executed for sockets in state SMC_LISTEN. The second patch resets send and receive buffer values for accepted sockets, since TCP buffer size optimizations for the internal CLC socket should not be forwarded to the outer SMC socket. The third patch solves a race between connect and ioctl reported by syzbot. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-08net/smc: move sock lock in smc_ioctl()Ursula Braun1-3/+7
When an SMC socket is connecting it is decided whether fallback to TCP is needed. To avoid races between connect and ioctl move the sock lock before the use_fallback check. Reported-by: [email protected] Reported-by: [email protected] Fixes: 1992d99882af ("net/smc: take sock lock in smc_ioctl()") Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08net/smc: allow sysctl rmem and wmem defaults for serversUrsula Braun1-0/+2
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size optimizations for servers should be ignored. Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08net/smc: no shutdown in state SMC_LISTENUrsula Braun1-2/+1
Invoking shutdown for a socket in state SMC_LISTEN does not make sense. Nevertheless programs like syzbot fuzzing the kernel may try to do this. For SMC this means a socket refcounting problem. This patch makes sure a shutdown call for an SMC socket in state SMC_LISTEN simply returns with -ENOTCONN. Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08net: aquantia: Fix IFF_ALLMULTI flag functionalityDmitry Bogdanov1-1/+1
It was noticed that NIC always pass all multicast traffic to the host regardless of IFF_ALLMULTI flag on the interface. The rule in MC Filter Table in NIC, that is configured to accept any multicast packets, is turning on if IFF_MULTICAST flag is set on the interface. It leads to passing all multicast traffic to the host. This fix changes the condition to turn on that rule by checking IFF_ALLMULTI flag as it should. Fixes: b21f502f84be ("net:ethernet:aquantia: Fix for multicast filter handling.") Signed-off-by: Dmitry Bogdanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08rxrpc: Fix the keepalive generator [ver #2]David Howells7-89/+109
AF_RXRPC has a keepalive message generator that generates a message for a peer ~20s after the last transmission to that peer to keep firewall ports open. The implementation is incorrect in the following ways: (1) It mixes up ktime_t and time64_t types. (2) It uses ktime_get_real(), the output of which may jump forward or backward due to adjustments to the time of day. (3) If the current time jumps forward too much or jumps backwards, the generator function will crank the base of the time ring round one slot at a time (ie. a 1s period) until it catches up, spewing out VERSION packets as it goes. Fix the problem by: (1) Only using time64_t. There's no need for sub-second resolution. (2) Use ktime_get_seconds() rather than ktime_get_real() so that time isn't perceived to go backwards. (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two parts: (a) The "worker" function that manages the buckets and the timer. (b) The "dispatch" function that takes the pending peers and potentially transmits a keepalive packet before putting them back in the ring into the slot appropriate to the revised last-Tx time. (4) Taking everything that's pending out of the ring and splicing it into a temporary collector list for processing. In the case that there's been a significant jump forward, the ring gets entirely emptied and then the time base can be warped forward before the peers are processed. The warping can't happen if the ring isn't empty because the slot a peer is in is keepalive-time dependent, relative to the base time. (5) Limit the number of iterations of the bucket array when scanning it. (6) Set the timer to skip any empty slots as there's no point waking up if there's nothing to do yet. This can be triggered by an incoming call from a server after a reboot with AF_RXRPC and AFS built into the kernel causing a peer record to be set up before userspace is started. The system clock is then adjusted by userspace, thereby potentially causing the keepalive generator to have a meltdown - which leads to a message like: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23] ... Workqueue: krxrpcd rxrpc_peer_keepalive_worker EIP: lock_acquire+0x69/0x80 ... Call Trace: ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? _raw_spin_lock_bh+0x29/0x60 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? __lock_acquire+0x3d3/0x870 ? process_one_work+0x110/0x340 ? process_one_work+0x166/0x340 ? process_one_work+0x110/0x340 ? worker_thread+0x39/0x3c0 ? kthread+0xdb/0x110 ? cancel_delayed_work+0x90/0x90 ? kthread_stop+0x70/0x70 ? ret_from_fork+0x19/0x24 Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08Merge branch 'mlx5-fixes'David S. Miller3-25/+15
Saeed Mahameed says: ==================== Mellanox, mlx5e fixes 2018-08-07 I know it is late into 4.18 release, and this is why I am submitting only two mlx5e ethernet fixes. The first one from Or, is needed for -stable and it fixes hairpin for "same device" check. The second fix is a non risk fix from Huy which cleans up and improves error return value reporting for dcbnl_ieee_setapp. For -stable v4.16 - net/mlx5e: Properly check if hairpin is possible between two functions ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-08net/mlx5e: Cleanup of dcbnl related fieldsHuy Nguyen2-21/+11
Remove unused netdev_registered_init/remove in en.h Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails. Remove extra white space Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support") Signed-off-by: Huy Nguyen <[email protected]> Cc: Yuval Shaia <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08net/mlx5e: Properly check if hairpin is possible between two functionsOr Gerlitz1-4/+4
The current check relies on function BDF addresses and can get us wrong e.g when two VFs are assigned into a VM and the PCI v-address is set by the hypervisor. Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows') Signed-off-by: Or Gerlitz <[email protected]> Reported-by: Alaa Hleihel <[email protected]> Tested-by: Alaa Hleihel <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-08parisc: Define mb() and add memory barriers to assembler unlock sequencesJohn David Anglin4-0/+39
For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] # 4.0+ Signed-off-by: Helge Deller <[email protected]>
2018-08-08parisc: Enable CONFIG_MLONGCALLS by defaultHelge Deller1-1/+1
Enable the -mlong-calls compiler option by default, because otherwise in most cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F relocations. This fixes building the 64-bit defconfig. Cc: [email protected] # 4.0+ Signed-off-by: Helge Deller <[email protected]>
2018-08-08Merge branch 'sockmap-fixes'Alexei Starovoitov2-4/+7
Daniel Borkmann says: ==================== Two sockmap fixes in bpf_tcp_sendmsg(), and one fix for the sockmap kernel selftest. Thanks! ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-08bpf, sockmap: fix cork timeout for select due to epipeDaniel Borkmann1-1/+1
I ran into the same issue as a009f1f396d0 ("selftests/bpf: test_sockmap, timing improvements") where I had a broken pipe error on the socket due to remote end timing out on select and then shutting down it's sockets while the other side was still sending. We may need to do a bigger rework in general on the test_sockmap.c, but for now increase it to a more suitable timeout. Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-08bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem pathDaniel Borkmann1-2/+5
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of ENOMEM, it may also mean that we've partially filled the scatterlist entries with pages. Later jumping to sk_stream_wait_memory() we could further fail with an error for several reasons, however we miss to call free_start_sg() if the local sk_msg_buff was used. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-08bpf, sockmap: fix bpf_tcp_sendmsg sock error handlingDaniel Borkmann1-1/+1
While working on bpf_tcp_sendmsg() code, I noticed that when a sk->sk_err is set we error out with err = sk->sk_err. However this is problematic since sk->sk_err is a positive error value and therefore we will neither go into sk_stream_error() nor will we report an error back to user space. I had this case with EPIPE and user space was thinking sendmsg() succeeded since EPIPE is a positive value, thinking we submitted 32 bytes. Fix it by negating the sk->sk_err value. Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-08-08MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()Paul Burton1-2/+0
In nlm_fmn_send() we have a loop which attempts to send a message multiple times in order to handle the transient failure condition of a lack of available credit. When examining the status register to detect the failure we check for a condition that can never be true, which falls foul of gcc 8's -Wtautological-compare: In file included from arch/mips/netlogic/common/irq.c:65: ./arch/mips/include/asm/netlogic/xlr/fmn.h: In function 'nlm_fmn_send': ./arch/mips/include/asm/netlogic/xlr/fmn.h:304:22: error: bitwise comparison always evaluates to false [-Werror=tautological-compare] if ((status & 0x2) == 1) ^~ If the path taken if this condition were true all we do is print a message to the kernel console. Since failures seem somewhat expected here (making the console message questionable anyway) and the condition has clearly never evaluated true we simply remove it, rather than attempting to fix it to check status correctly. Signed-off-by: Paul Burton <[email protected]> Patchwork: https://patchwork.linux-mips.org/patch/20174/ Cc: Ganesan Ramalingam <[email protected]> Cc: James Hogan <[email protected]> Cc: Jayachandran C <[email protected]> Cc: John Crispin <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: [email protected]
2018-08-08vhost: reset metadata cache when initializing new IOTLBJason Wang1-3/+6
We need to reset metadata cache during new IOTLB initialization, otherwise the stale pointers to previous IOTLB may be still accessed which will lead a use after free. Reported-by: [email protected] Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-07MIPS: VDSO: Force link endiannessPaul Burton1-0/+1
When building the VDSO with clang it appears to invoke ld without specifying endianness, even though clang itself was provided with a -EB or -EL flag. This results in the build failing due to a mismatch between the objects that are the input to ld, and the output it is attempting to create: VDSO arch/mips/vdso/vdso.so.dbg.raw mips-linux-ld: arch/mips/vdso/elf.o: compiled for a big endian system and target is little endian mips-linux-ld: arch/mips/vdso/elf.o: endianness incompatible with that of the selected emulation mips-linux-ld: failed to merge target specific data of file arch/mips/vdso/elf.o ... Work around this problem by explicitly specifying the link endianness using -Wl,-EB or -Wl,-EL when -EB or -EL are part of KBUILD_CFLAGS. This resolves the build failure when using clang, and doesn't have any negative effect on gcc. Signed-off-by: Paul Burton <[email protected]>
2018-08-07MIPS: Always specify -EB or -EL when using clangPaul Burton1-0/+10
When building using clang, always specify -EB or -EL in order to ensure we target the desired endianness. Since clang cross compiles using a single compiler build with multiple targets, our -dumpmachine tests which don't specify clang's --target argument check output based upon the build machine rather than the machine our build will target. This means our detection of whether to specify -EB fails miserably & we never do. Providing the endianness flag unconditionally for clang resolves this issue & simplifies the clang path somewhat. Signed-off-by: Paul Burton <[email protected]>
2018-08-07llc: use refcount_inc_not_zero() for llc_sap_find()Cong Wang2-2/+7
llc_sap_put() decreases the refcnt before deleting sap from the global list. Therefore, there is a chance llc_sap_find() could find a sap with zero refcnt in this global list. Close this race condition by checking if refcnt is zero or not in llc_sap_find(), if it is zero then it is being removed so we can just treat it as gone. Reported-by: <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-07dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()Alexey Kodanev1-2/+4
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value can lead to undefined behavior [1]. In order to fix this use a gradual shift of the window with a 'while' loop, similar to what tcp_cwnd_restart() is doing. When comparing delta and RTO there is a minor difference between TCP and DCCP, the last one also invokes dccp_cwnd_restart() and reduces 'cwnd' if delta equals RTO. That case is preserved in this change. [1]: [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7 [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int' [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1 ... [40851.377176] Call Trace: [40851.408503] dump_stack+0xf1/0x17b [40851.451331] ? show_regs_print_info+0x5/0x5 [40851.503555] ubsan_epilogue+0x9/0x7c [40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4 [40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f [40851.686796] ? xfrm4_output_finish+0x80/0x80 [40851.739827] ? lock_downgrade+0x6d0/0x6d0 [40851.789744] ? xfrm4_prepare_output+0x160/0x160 [40851.845912] ? ip_queue_xmit+0x810/0x1db0 [40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp] [40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp] [40852.142412] dccp_sendmsg+0x428/0xb20 [dccp] [40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp] [40852.254833] ? sched_clock+0x5/0x10 [40852.298508] ? sched_clock+0x5/0x10 [40852.342194] ? inet_create+0xdf0/0xdf0 [40852.388988] sock_sendmsg+0xd9/0x160 ... Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation") Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-07x86/mm/pti: Clone kernel-image on PTE level for 32 bitJoerg Roedel1-41/+99
On 32 bit the kernel sections are not huge-page aligned. When we clone them on PMD-level we unevitably map some areas that are normal kernel memory and may contain secrets to user-space. To prevent that we need to clone the kernel-image on PTE-level for 32 bit. Also make the page-table cloning code more general so that it can handle PMD and PTE level cloning. This can be generalized further in the future to also handle clones on the P4D-level. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: "H . Peter Anvin" <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Laight <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eduardo Valentin <[email protected]> Cc: Greg KH <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andrea Arcangeli <[email protected]> Cc: Waiman Long <[email protected]> Cc: Pavel Machek <[email protected]> Cc: "David H . Gutteridge" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-08-07x86/mm/pti: Don't clear permissions in pti_clone_pmd()Joerg Roedel1-6/+5
The function sets the global-bit on cloned PMD entries, which only makes sense when the permissions are identical between the user and the kernel page-table. Further, only write-permissions are cleared for entry-text and kernel-text sections, which are not writeable at the end of the boot process. The reason why this RW clearing exists is that in the early PTI implementations the cloned kernel areas were set up during early boot before the kernel text is set to read only and not touched afterwards. This is not longer true. The cloned areas are still set up early to get the entry code working for interrupts and other things, but after the kernel text has been set RO the clone is repeated which copies the RO PMD/PTEs over to the user visible clone. That means the initial clearing of the writable bit can be avoided. [ tglx: Amended changelog ] Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: "H . Peter Anvin" <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Laight <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Eduardo Valentin <[email protected]> Cc: Greg KH <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andrea Arcangeli <[email protected]> Cc: Waiman Long <[email protected]> Cc: Pavel Machek <[email protected]> Cc: "David H . Gutteridge" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-08-07tipc: fix an interrupt unsafe locking scenarioYing Xue1-3/+1
Commit 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread safe") tries to make it thread safe to set node address, so it uses node_list_lock lock to serialize the whole process of setting node address in tipc_net_finalize(). But it causes the following interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- rht_deferred_worker() rhashtable_rehash_table() lock(&(&ht->lock)->rlock) tipc_nl_compat_doit() tipc_net_finalize() local_irq_disable(); lock(&(&tn->node_list_lock)->rlock); tipc_sk_reinit() rhashtable_walk_enter() lock(&(&ht->lock)->rlock); <Interrupt> tipc_disc_rcv() tipc_node_check_dest() tipc_node_create() lock(&(&tn->node_list_lock)->rlock); *** DEADLOCK *** When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't disable BH. So if an interrupt happens after the lock, it can create an inverse lock ordering between ht->lock and tn->node_list_lock. As a consequence, deadlock might happen. The reason causing the inverse lock ordering scenario above is because the initial purpose of node_list_lock is not designed to do the serialization of node address setting. As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic, we use it to replace node_list_lock to ensure setting node address can be atomically finished. It turns out the potential deadlock can be avoided as well. Fixes: 9faa89d4ed9d ("tipc: make function tipc_net_finalize() thread safe") Signed-off-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>