aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-06-08bpf, tests: fix endianness selectionDaniel Borkmann1-11/+30
I noticed that test_l4lb was failing in selftests: # ./test_progs test_pkt_access:PASS:ipv4 77 nsec test_pkt_access:PASS:ipv6 44 nsec test_xdp:PASS:ipv4 2933 nsec test_xdp:PASS:ipv6 1500 nsec test_l4lb:PASS:ipv4 377 nsec test_l4lb:PASS:ipv6 544 nsec test_l4lb:FAIL:stats 6297600000 200000 test_tcp_estats:PASS: 0 nsec Summary: 7 PASSED, 1 FAILED Tracking down the issue actually revealed that endianness selection in bpf_endian.h is broken when compiled with clang with bpf target. test_pkt_access.c, test_l4lb.c is compiled with __BYTE_ORDER as __BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len) and bpf_ntohs(iph->tot_len), and compares them against a defined value and given a wrong endianness, the test outcome is different, of course. Turns out that there are actually two bugs: i) when we do __BYTE_ORDER comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the include order we see different outcomes. Reason is that __BYTE_ORDER is undefined due to missing endian.h include. Before we include the asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals __LITTLE_ENDIAN since both are undefined, after the include which correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN is defined, but given __BYTE_ORDER is still undefined, we match on __BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also undefined at that point, sigh. ii) But even that would be wrong, since when compiling the test cases with clang, one can select between bpfeb and bpfel targets for cross compilation. Hence, we can also not rely on what the system's endian.h provides, but we need to look at the compiler's defined endianness. The compiler defines __BYTE_ORDER__, and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__, which also reflects targets bpf (native), bpfel, bpfeb correctly, thus really only rely on that. After patch: # ./test_progs test_pkt_access:PASS:ipv4 74 nsec test_pkt_access:PASS:ipv6 42 nsec test_xdp:PASS:ipv4 2340 nsec test_xdp:PASS:ipv6 1461 nsec test_l4lb:PASS:ipv4 400 nsec test_l4lb:PASS:ipv6 530 nsec test_tcp_estats:PASS: 0 nsec Summary: 7 PASSED, 0 FAILED Fixes: 43bcf707ccdc ("bpf: fix _htons occurences in test_progs") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08ethtool.h: remind to update 802.3ad when adding new speedsNicolas Dichtel1-2/+4
Each time a new speed is added, the bonding 802.3ad isn't updated. Add a comment to remind the developer to update this driver. Signed-off-by: Nicolas Dichtel <[email protected]> Acked-by: Andy Gospodarek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08bonding: fix 802.3ad support for 14G speedNicolas Dichtel1-0/+9
This patch adds 14 Gbps enum definition, and fixes aggregated bandwidth calculation based on above slave links. Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool") Signed-off-by: Nicolas Dichtel <[email protected]> Acked-by: Andy Gospodarek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08bonding: fix 802.3ad support for 5G and 50G speedsThibaut Collet1-0/+18
This patch adds [5|50] Gbps enum definition, and fixes aggregated bandwidth calculation based on above slave links. Fixes: c9a70d43461d ("net-next: ethtool: Added port speed macros.") Signed-off-by: Thibaut Collet <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Acked-by: Andy Gospodarek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08openvswitch: warn about missing first netlink attributeNicolas Dichtel1-0/+1
The first netlink attribute (value 0) must always be defined as none/unspec. Because we cannot change an existing UAPI, I add a comment to point the mistake and avoid to propagate it in a new ovs API in the future. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08[media] media/cec.h: use IS_REACHABLE instead of IS_ENABLEDHans Verkuil1-1/+1
Fix messages like this: adv7842.c:(.text+0x2edadd): undefined reference to `cec_unregister_adapter' when CEC_CORE=m but the driver including media/cec.h is built-in. In that case the static inlines provided in media/cec.h should be used by that driver. Reported-by: Randy Dunlap <[email protected]> Reported-by: kbuild test robot <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2017-06-08ila_xlat: add missing hash secret initializationArnd Bergmann1-0/+1
While discussing the possible merits of clang warning about unused initialized functions, I found one function that was clearly meant to be called but never actually is. __ila_hash_secret_init() initializes the hash value for the ila locator, apparently this is intended to prevent hash collision attacks, but this ends up being a read-only zero constant since there is no caller. I could find no indication of why it was never called, the earliest patch submission for the module already was like this. If my interpretation is right, we certainly want to backport the patch to stable kernels as well. I considered adding it to the ila_xlat_init callback, but for best effect the random data is read as late as possible, just before it is first used. The underlying net_get_random_once() is already highly optimized to avoid overhead when called frequently. Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Cc: [email protected] Link: https://www.spinics.net/lists/kernel/msg2527243.html Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08perf symbols: Kill dso__build_id_is_kmod()Namhyung Kim3-50/+0
The commit e7ee40475760 ("perf symbols: Fix symbols searching for module in buildid-cache") added the function to check kernel modules reside in the build-id cache. This was because there's no way to identify a DSO which is actually a kernel module. So it searched linkname of the file and find ".ko" suffix. But this does not work for compressed kernel modules and now such DSOs hCcave correct symtab_type now. So no need to check it anymore. This patch essentially reverts the commit. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf symbols: Keep DSO->symtab_type after decompressNamhyung Kim1-0/+2
The symsrc__init() overwrites dso->symtab_type as symsrc->type in dso__load_sym(). But for compressed kernel modules in the build-id cache, it should have original symtab type to be decompressed as needed. This fixes perf annotate to show disassembly of the function properly. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf tests: Decompress kernel module before objdumpNamhyung Kim1-1/+19
If a kernel modules is compressed, it should be decompressed before running objdump to parse binary data correctly. This fixes a failure of object code reading test for me. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf tools: Consolidate error path in __open_dso()Namhyung Kim1-11/+8
On failure, it should free the 'name', so clean up the error path using goto. Signed-off-by: Namhyung Kim <[email protected]> Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf tools: Decompress kernel module when reading DSO dataNamhyung Kim1-0/+16
Currently perf decompresses kernel modules when loading the symbol table but it missed to do it when reading raw data. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf annotate: Use dso__decompress_kmodule_path()Namhyung Kim1-24/+3
Convert open-coded decompress routine to use the function. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf tools: Introduce dso__decompress_kmodule_{fd,path}Namhyung Kim3-35/+65
Move decompress_kmodule() to util/dso.c and split it into two functions returning fd and (decompressed) file path. The existing user only wants the fd version but the path version will be used soon. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf tools: Fix a memory leak in __open_dso()Namhyung Kim1-1/+3
The 'name' variable should be freed on the error path. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08perf annotate: Fix symbolic link of build-id cacheNamhyung Kim1-1/+9
The commit 6ebd2547dd24 ("perf annotate: Fix a bug following symbolic link of a build-id file") changed to use dirname to follow the symlink. But it only considers new-style build-id cache names so old names fail on readlink() and force to use system path which might not available. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Taeung Song <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] Fixes: 6ebd2547dd24 ("perf annotate: Fix a bug following symbolic link of a build-id file") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2017-06-08Merge branch 'for-linus' of ↵Linus Torvalds1-36/+10
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk fix from Petr Mladek: "This reverts a fix added into 4.12-rc1. It caused the kernel log to be printed on another console when two consoles of the same type were defined, e.g. console=ttyS0 console=ttyS1. This configuration was never supported by kernel itself, but it started to make sense with systemd. In other words, the commit broke userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: Revert "printk: fix double printing with earlycon"
2017-06-08Merge branch 'linus' of ↵Linus Torvalds3-8/+5
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a couple of places in the crypto code that were doing interruptible sleeps dangerously. They have been converted to use non-interruptible sleeps. This also fixes a bug in asymmetric_keys where it would trigger a use-after-free if a request returned EBUSY due to a full device queue" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: gcm - wait for crypto op not signal safe crypto: drbg - wait for crypto op not signal safe crypto: asymmetric_keys - handle EBUSY due to backlog correctly
2017-06-08net: Fix build regression in rtl8723bs staging driver.David S. Miller2-3/+2
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c: In function ‘rtw_cfg80211_add_monitor_if’: drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c:2670:10: error: ‘struct net_device’ has no member named ‘destructor’ mon_ndev->destructor = rtw_ndev_destructor; ^ Signed-off-by: David S. Miller <[email protected]>
2017-06-08block, bfq: access and cache blkg data only when safePaolo Valente3-36/+105
In blk-cgroup, operations on blkg objects are protected with the request_queue lock. This is no more the lock that protects I/O-scheduler operations in blk-mq. In fact, the latter are now protected with a finer-grained per-scheduler-instance lock. As a consequence, although blkg lookups are also rcu-protected, blk-mq I/O schedulers may see inconsistent data when they access blkg and blkg-related objects. BFQ does access these objects, and does incur this problem, in the following case. The blkg_lookup performed in bfq_get_queue, being protected (only) through rcu, may happen to return the address of a copy of the original blkg. If this is the case, then the blkg_get performed in bfq_get_queue, to pin down the blkg, is useless: it does not prevent blk-cgroup code from destroying both the original blkg and all objects directly or indirectly referred by the copy of the blkg. BFQ accesses these objects, which typically causes a crash for NULL-pointer dereference of memory-protection violation. Some additional protection mechanism should be added to blk-cgroup to address this issue. In the meantime, this commit provides a quick temporary fix for BFQ: cache (when safe) blkg data that might disappear right after a blkg_lookup. In particular, this commit exploits the following facts to achieve its goal without introducing further locks. Destroy operations on a blkg invoke, as a first step, hooks of the scheduler associated with the blkg. And these hooks are executed with bfqd->lock held for BFQ. As a consequence, for any blkg associated with the request queue an instance of BFQ is attached to, we are guaranteed that such a blkg is not destroyed, and that all the pointers it contains are consistent, while that instance is holding its bfqd->lock. A blkg_lookup performed with bfqd->lock held then returns a fully consistent blkg, which remains consistent until this lock is held. In more detail, this holds even if the returned blkg is a copy of the original one. Finally, also the object describing a group inside BFQ needs to be protected from destruction on the blkg_free of the original blkg (which invokes bfq_pd_free). This commit adds private refcounting for this object, to let it disappear only after no bfq_queue refers to it any longer. This commit also removes or updates some stale comments on locking issues related to blk-cgroup operations. Reported-by: Tomas Konir <[email protected]> Reported-by: Lee Tibbert <[email protected]> Reported-by: Marco Piazza <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Tested-by: Tomas Konir <[email protected]> Tested-by: Lee Tibbert <[email protected]> Tested-by: Marco Piazza <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-06-08Merge branch 'netvsc-bug-fixes'David S. Miller3-39/+50
Stephen Hemminger says: ==================== netvsc: bug fixes These are bugfixes for netvsc driver in 4.12. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-06-08netvsc: move filter setting to rndis_devicestephen hemminger3-34/+34
The work queue and handling of network filter parameters should be in rndis_device. This gets rid of warning from RCU checks, eliminates a race and cleans up code. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08netvsc: fix net poll modestephen hemminger1-4/+15
The ndo_poll_controller function needs to schedule NAPI to pick up arriving packets and send completions. Otherwise no data will ever be received. For simple case of netconsole, it also will allow send completions to happen. Without this netpoll will eventually get stuck. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08netvsc: fix rcu dereference warning from ethtoolstephen hemminger1-1/+1
The ethtool info command calls the netvsc get_sset_count with RTNL but not with RCU. Which causes warning: drivers/net/hyperv/netvsc_drv.c:1010 suspicious rcu_dereference_check() usage! Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini2-5/+2
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Cc: [email protected] Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <[email protected]> Suggested-by: Linu Cherian <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2017-06-08srcu: Allow use of Tiny/Tree SRCU from both process and interrupt contextPaolo Bonzini2-6/+6
Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on a single variable. Therefore, as Peter Zijlstra pointed out, Tiny SRCU's implementation already supports mixed-context use of srcu_read_lock() and srcu_read_unlock(), at least as long as uses of srcu_read_lock() and srcu_read_unlock() in each handler are nested and paired properly. In other words, it is still illegal to (say) invoke srcu_read_lock() in an interrupt handler and to invoke the matching srcu_read_unlock() in a softirq handler. Therefore, the only change required for Tiny SRCU is to its comments. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <[email protected]> Suggested-by: Linu Cherian <[email protected]> Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]> Cc: Linus Torvalds <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Paolo Bonzini <[email protected]>
2017-06-08xfs: fix spurious spin_is_locked() assert failures on non-smp kernelsBrian Foster2-4/+3
The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration. To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-06-08net: ipv6: Release route when device is unregisteringDavid Ahern2-0/+6
Roopa reported attempts to delete a bond device that is referenced in a multipath route is hanging: $ ifdown bond2 # ifupdown2 command that deletes virtual devices unregister_netdevice: waiting for bond2 to become free. Usage count = 2 Steps to reproduce: echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown ip link add dev bond12 type bond ip link add dev bond13 type bond ip addr add 2001:db8:2::0/64 dev bond12 ip addr add 2001:db8:3::0/64 dev bond13 ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2 ip link del dev bond12 ip link del dev bond13 The root cause is the recent change to keep routes on a linkdown. Update the check to detect when the device is unregistering and release the route for that case. Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: Roopa Prabhu <[email protected]> Signed-off-by: David Ahern <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08net: Zero ifla_vf_info in rtnl_fill_vfinfo()Mintz, Yuval1-1/+2
Some of the structure's fields are not initialized by the rtnetlink. If driver doesn't set those in ndo_get_vf_config(), they'd leak memory to user. Signed-off-by: Yuval Mintz <[email protected]> CC: Michal Schmidt <[email protected]> Reviewed-by: Greg Rose <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk1-1/+3
Verify that the length of the socket buffer is sufficient to cover the nlmsghdr structure before accessing the nlh->nlmsg_len field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08Revert "decnet: dn_rtmsg: Improve input length sanitization in ↵David S. Miller1-3/+1
dnrmg_receive_user_skb" This reverts commit 85eac2ba35a2dbfbdd5767c7447a4af07444a5b4. There is an updated version of this fix which we should use instead. Signed-off-by: David S. Miller <[email protected]>
2017-06-08net: emac: fix and unify emac_mdio functionsChristian Lamparter1-23/+18
emac_mdio_read_link() was not copying the requested phy settings back into the emac driver's own phy api. This has caused a link speed mismatch issue for the AR8035 as the emac driver kept trying to connect with 10/100MBps on a 1GBit/s link. This patch also unifies shared code between emac_setup_aneg() and emac_mdio_setup_forced(). And furthermore it removes a chunk of emac_mdio_init_phy(), that was copying the same data into itself. Signed-off-by: Christian Lamparter <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08net: emac: fix reset timeout with AR8035 phyChristian Lamparter1-4/+22
This patch fixes a problem where the AR8035 PHY can't be detected on an Cisco Meraki MR24, if the ethernet cable is not connected on boot. Russell Senior provided steps to reproduce the issue: |Disconnect ethernet cable, apply power, wait until device has booted, |plug in ethernet, check for interfaces, no eth0 is listed. | |This appears to be a problem during probing of the AR8035 Phy chip. |When ethernet has no link, the phy detection fails, and eth0 is not |created. Plugging ethernet later has no effect, because there is no |interface as far as the kernel is concerned. The relevant part of |the boot log looks like this: |this is the failing case: | |[ 0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode |[ 0.882532] /plb/opb/ethernet@ef600c00: reset timeout |[ 0.888546] /plb/opb/ethernet@ef600c00: can't find PHY! |and the succeeding case: | |[ 0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode |[ 0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:.. |[ 0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01) Based on the comment and the commit message of commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT"). This is because the AR8035 PHY doesn't provide the TX Clock, if the ethernet cable is not attached. This causes the reset to timeout and the PHY detection code in emac_init_phy() is unable to detect the AR8035 PHY. As a result, the emac driver bails out early and the user left with no ethernet. In order to stay compatible with existing configurations, the driver tries the current reset approach at first. Only if the first attempt timed out, it does perform one more retry with the clock temporarily switched to the internal source for just the duration of the reset. LEDE-Bug: #687 <https://bugs.lede-project.org/index.php?do=details&task_id=687> Cc: Chris Blake <[email protected]> Reported-by: Russell Senior <[email protected]> Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT") Signed-off-by: Christian Lamparter <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skbMateusz Jurczyk1-1/+3
Verify that the length of the socket buffer is sufficient to cover the entire nlh->nlmsg_len field before accessing that field for further input sanitization. If the client only supplies 1-3 bytes of data in sk_buff, then nlh->nlmsg_len remains partially uninitialized and contains leftover memory from the corresponding kernel allocation. Operating on such data may result in indeterminate evaluation of the nlmsg_len < sizeof(*nlh) expression. The bug was discovered by a runtime instrumentation designed to detect use of uninitialized memory in the kernel. The patch prevents this and other similar tools (e.g. KMSAN) from flagging this behavior in the future. Signed-off-by: Mateusz Jurczyk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08Merge tag 'kvm-s390-master-4.12-1' of ↵Paolo Bonzini3-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix for master (4.12) - The newly created AIS capability enables the feature unconditionally and ignores the cpu model
2017-06-08Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linusJens Axboe4-31/+67
Christoph writes: "A few NVMe fixes for 4.12-rc, PCIe reset fixes and APST fixes, a RDMA reconnect fix, two FC fixes and a general controller removal fix."
2017-06-08hsi: Fix build regression due to netdev destructor fix.David S. Miller1-1/+1
> ../drivers/hsi/clients/ssi_protocol.c:1069:5: error: 'struct net_device' has no member named 'destructor' Reported-by: Mark Brown <[email protected]> Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08net: s390: fix up for "Fix inconsistent teardown and release of private ↵Stephen Rothwell1-2/+2
netdev state" Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-06-08drm/i915: fix warning for unused variableJani Nikula1-2/+0
drivers/gpu/drm/i915/intel_engine_cs.c: In function ‘intel_engine_is_idle’: drivers/gpu/drm/i915/intel_engine_cs.c:1103:27: error: unused variable ‘dev_priv’ [-Werror=unused-variable] struct drm_i915_private *dev_priv = engine->i915; ^~~~~~~~ Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
2017-06-08Fix loop device flush before configure v3James Wang1-0/+3
While installing SLES-12 (based on v4.4), I found that the installer will stall for 60+ seconds during LVM disk scan. The root cause was determined to be the removal of a bound device check in loop_flush() by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq"). Restoring this check, examining ->lo_state as set by loop_set_fd() eliminates the bad behavior. Test method: modprobe loop max_loop=64 dd if=/dev/zero of=disk bs=512 count=200K for((i=0;i<4;i++))do losetup -f disk; done mkfs.ext4 -F /dev/loop0 for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done for f in `ls /dev/loop[0-9]*|sort`; do \ echo $f; dd if=$f of=/dev/null bs=512 count=1; \ done Test output: stock patched /dev/loop0 18.1217e-05 8.3842e-05 /dev/loop1 6.1114e-05 0.000147979 /dev/loop10 0.414701 0.000116564 /dev/loop11 0.7474 6.7942e-05 /dev/loop12 0.747986 8.9082e-05 /dev/loop13 0.746532 7.4799e-05 /dev/loop14 0.480041 9.3926e-05 /dev/loop15 1.26453 7.2522e-05 Note that from loop10 onward, the device is not mounted, yet the stock kernel consumes several orders of magnitude more wall time than it does for a mounted device. (Thanks for Mike Galbraith <[email protected]>, give a changelog review.) Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: James Wang <[email protected]> Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq") Signed-off-by: Jens Axboe <[email protected]>
2017-06-08s390: update defconfigMartin Schwidefsky5-21/+87
Signed-off-by: Martin Schwidefsky <[email protected]>
2017-06-08MIPS: kprobes: flush_insn_slot should flush only if probe initialisedMarcin Nowakowski1-1/+2
When ftrace is used with kprobes, it is possible for a kprobe to contain an invalid location (ie. only initialised to 0 and not to a specific location in the code). Trying to perform a cache flush on such location leads to a crash r4k_flush_icache_range(). Fixes: c1bf207d6ee1 ("MIPS: kprobe: Add support.") Signed-off-by: Marcin Nowakowski <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/16296/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulationWanpeng Li1-9/+11
If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it potentially can be exploited the vulnerability. this will out-of-bounds read and write. Luckily, the effect is small: /* when no next entry is found, the current entry[i] is reselected */ for (j = i + 1; ; j = (j + 1) % nent) { struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j]; if (ej->function == e->function) { It reads ej->maxphyaddr, which is user controlled. However... ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT; After cpuid_entries there is int maxphyaddr; struct x86_emulate_ctxt emulate_ctxt; /* 16-byte aligned */ So we have: - cpuid_entries at offset 1B50 (6992) - maxphyaddr at offset 27D0 (6992 + 3200 = 10192) - padding at 27D4...27DF - emulate_ctxt at 27E0 And it writes in the padding. Pfew, writing the ops field of emulate_ctxt would have been much worse. This patch fixes it by modding the index to avoid the out-of-bounds access. Worst case, i == j and ej->function == e->function, the loop can bail out. Reported-by: Moguofang <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Guofang Mo <[email protected]> Cc: [email protected] Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2017-06-08Merge tag 'kvm-arm-for-v4.12-rc5-take2' of ↵Paolo Bonzini12-39/+131
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.12-rc5 - Take 2 Changes include: - Fix an issue with migrating GICv2 VMs on GICv3 systems. - Squashed a bug for gicv3 when figuring out preemption levels. - Fix a potential null pointer derefence in KVM happening under memory pressure. - Maintain RES1 bits in the SCTLR_EL2 to make sure KVM works on new architecture revisions. - Allow unaligned accesses at EL2/HYP
2017-06-08MIPS: ftrace: fix init functions tracingMarcin Nowakowski1-19/+5
Since introduction of tracing for init functions the in_kernel_space() check is no longer correct, as it ignores the init sections. As a result, when probes are inserted (and disabled) in the init functions, a branch instruction is inserted instead of a nop, which is likely to result in random crashes during boot. Remove the MIPS-specific in_kernel_space() method and replace it with a generic core_kernel_text() that also checks for init sections during system boot stage. Fixes: 42c269c88dc1 ("ftrace: Allow for function tracing to record init functions on boot up") Signed-off-by: Marcin Nowakowski <[email protected]> Tested-by: Matt Redfearn <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/16092/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08MIPS: mm: adjust PKMAP locationMarcin Nowakowski1-1/+6
Space reserved for PKMap should span from PKMAP_BASE to FIXADDR_START. For large page sizes this is not the case as eg. for 64k pages the range currently defined is from 0xfe000000 to 0x102000000(!!) which obviously isn't right. Remove the hardcoded location and set the BASE address as an offset from FIXADDR_START. Since all PKMAP ptes have to be placed in a contiguous memory, ensure that this is the case by placing them all in a single page. This is achieved by aligning the end address to pkmap pages count pages. Signed-off-by: Marcin Nowakowski <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/15950/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08MIPS: highmem: ensure that we don't use more than one page for PTEsMarcin Nowakowski1-0/+5
All PTEs used by PKMAP should be allocated in a contiguous memory area, but we do not currently have a mechanism to enforce that, so ensure that we don't try to allocate more entries than would fit in a single page. Current fixed value of 1024 would not work with XPA enabled when sizeof(pte_t)==8 and we need two pages to store pte tables. Signed-off-by: Marcin Nowakowski <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/15949/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08MIPS: mm: fixed mappings: correct initialisationMarcin Nowakowski1-3/+3
fixrange_init operates at PMD-granularity and expects the addresses to be PMD-size aligned, but currently that might not be the case for PKMAP_BASE unless it is defined properly, so ensure a correct alignment is used before passing the address to fixrange_init. fixed mappings: only align the start address that is passed to fixrange_init rather than the value before adding the size, as we may end up with uninitialised upper part of the range. Signed-off-by: Marcin Nowakowski <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/15948/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08MIPS: perf: Remove incorrect odd/even counter handling for I6400Marcin Nowakowski1-1/+5
All performance counters on I6400 (odd and even) are capable of counting any of the available events, so drop current logic of using the extra bit to determine which counter to use. Signed-off-by: Marcin Nowakowski <[email protected]> Fixes: 4e88a8621301 ("MIPS: Add cases for CPU_I6400") Fixes: fd716fca10fc ("MIPS: perf: Fix I6400 event numbers") Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/15991/ Signed-off-by: Ralf Baechle <[email protected]>
2017-06-08powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by defaultMichael Ellerman2-11/+11
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with other general kernel options. It's really more at home in the "Platform Support" section, so move it there. Also enable it by default, for Book3s 64. It does mostly nothing unless the device tree properties are found, and we will want it enabled eventually in distro kernels, so turn it on to start getting more testing. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Michael Ellerman <[email protected]>