aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-12-06virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()Andy Lutomirski1-5/+14
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a pointer to the stack and it will OOPS. Copy the address to the heap to prevent the crash. Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Laura Abbott <[email protected]> Reported-by: [email protected] Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Jason Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-06tcp: warn on bogus MSS and try to amend itMarcelo Ricardo Leitner1-1/+21
There have been some reports lately about TCP connection stalls caused by NIC drivers that aren't setting gso_size on aggregated packets on rx path. This causes TCP to assume that the MSS is actually the size of the aggregated packet, which is invalid. Although the proper fix is to be done at each driver, it's often hard and cumbersome for one to debug, come to such root cause and report/fix it. This patch amends this situation in two ways. First, it adds a warning on when this situation occurs, so it gives a hint to those trying to debug this. It also limit the maximum probed MSS to the adverised MSS, as it should never be any higher than that. The result is that the connection may not have the best performance ever but it shouldn't stall, and the admin will have a hint on what to look for. Tested with virtio by forcing gso_size to 0. v2: updated msg per David's suggestion v3: use skb_iif to find the interface and also log its name, per Eric Dumazet's suggestion. As the skb may be backlogged and the interface gone by then, we need to check if the number still has a meaning. v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per David's suggestion Cc: Jonathan Maxwell <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-06uapi glibc compat: fix outer guard of net device flags enumJonas Gorski1-2/+2
Fix a wrong condition preventing the higher net device flags IFF_LOWER_UP etc to be defined if net/if.h is included before linux/if.h. The comment makes it clear the intention was to allow partial definition with either parts. This fixes compilation of userspace programs trying to use IFF_LOWER_UP, IFF_DORMANT or IFF_ECHO. Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Jonas Gorski <[email protected]> Reviewed-by: Mikko Rapeli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-06net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before ↵Niklas Cassel3-2/+8
writing WR_OSR_LMT and RD_OSR_LMT have a reset value of 1. Since the reset value wasn't cleared before writing, the value in the register would be incorrect if specifying an uneven value for snps,wr_osr_lmt/snps,rd_osr_lmt. Zero is a valid value for the properties, since the databook specifies: maximum outstanding requests = WR_OSR_LMT + 1. We do not want to change the behavior for existing users when the property is missing. Therefore, default to 1 if the property is missing, since that is the same as the reset value. Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-06fuse: fix clearing suid, sgid for chown()Miklos Szeredi1-5/+2
Basically, the pjdfstests set the ownership of a file to 06555, and then chowns it (as root) to a new uid/gid. Prior to commit a09f99eddef4 ("fuse: fix killing s[ug]id in setattr"), fuse would send down a setattr with both the uid/gid change and a new mode. Now, it just sends down the uid/gid change. Technically this is NOTABUG, since POSIX doesn't _require_ that we clear these bits for a privileged process, but Linux (wisely) has done that and I think we don't want to change that behavior here. This is caused by the use of should_remove_suid(), which will always return 0 when the process has CAP_FSETID. In fact we really don't need to be calling should_remove_suid() at all, since we've already been indicated that we should remove the suid, we just don't want to use a (very) stale mode for that. This patch should fix the above as well as simplify the logic. Reported-by: Jeff Layton <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Fixes: a09f99eddef4 ("fuse: fix killing s[ug]id in setattr") Cc: <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2016-12-06lockdep: Fix report formattingDmitry Vyukov1-54/+57
Since commit: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines") printk() requires KERN_CONT to continue log messages. Lots of printk() in lockdep.c and print_ip_sym() don't have it. As the result lockdep reports are completely messed up. Add missing KERN_CONT and inline print_ip_sym() where necessary. Example of a messed up report: 0-rc5+ #41 Not tainted ------------------------------------------------------- syz-executor0/5036 is trying to acquire lock: ( rtnl_mutex ){+.+.+.} , at: [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20 but task is already holding lock: ( &net->packet.sklist_lock ){+.+...} , at: [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ( &net->packet.sklist_lock +.+...} ... Without this patch all scripts that parse kernel bug reports are broken. Signed-off-by: Dmitry Vyukov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-06perf/core: Remove invalid warning from list_update_cgroup_even()tDavid Carrillo-Cisneros1-11/+8
The warning introduced in commit: 864c2357ca89 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups") assumed that a cgroup switch always precedes list_del_event. This is not the case. Remove warning. Make sure that cpuctx->cgrp is NULL until a cgroup event is sched in or ctx->nr_cgroups == 0. Signed-off-by: David Carrillo-Cisneros <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Nilay Vaish <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi V Shankar <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vikas Shivappa <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-06perf/x86: Fix full width counter, counter overflowPeter Zijlstra (Intel)2-2/+2
Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM. Both these parts have full_width_write set, and that does indeed have a problem. In order to deal with counter wrap, we must sample the counter at at least half the counter period (see also the sampling theorem) such that we can unambiguously reconstruct the count. However commit: 069e0c3c4058 ("perf/x86/intel: Support full width counting") sets the sampling interval to the full period, not half. Fixing that exposes another issue, in that we must not sign extend the delta value when we shift it right; the counter cannot have decremented after all. With both these issues fixed, counter overflow functions correctly again. Reported-by: Lukasz Odzioba <[email protected]> Tested-by: Liang, Kan <[email protected]> Tested-by: Odzioba, Lukasz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Fixes: 069e0c3c4058 ("perf/x86/intel: Support full width counting") Signed-off-by: Ingo Molnar <[email protected]>
2016-12-06perf/x86/intel: Enable C-state residency events for Knights MillPiotr Luc1-0/+1
The Knights Mill is enough close to Knights Landing so the path reuses C-state residency support of the latter. Signed-off-by: Piotr Luc <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-06objtool: Fix bytes check of lea's rex_prefixJiri Slaby1-1/+1
For the "lea %(rsp), %rbp" case, we check if there is a rex_prefix. But we check 'bytes' which is insn_byte_t[4] in rex_prefix (insn_field structure). Therefore, the check is always true. Instead, check 'nbytes' which is the right one. Signed-off-by: Jiri Slaby <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-12-05netlink: Do not schedule work from sk_destructHerbert Xu1-17/+15
It is wrong to schedule a work from sk_destruct using the socket as the memory reserve because the socket will be freed immediately after the return from sk_destruct. Instead we should do the deferral prior to sk_free. This patch does just that. Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread") Signed-off-by: Herbert Xu <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05uapi: export nf_log.hstephen hemminger1-0/+1
File is in uapi directory but not being copied on make install_headers Fixes commit 4ec9c8fbbc22 ("netfilter: nft_log: complete NFTA_LOG_FLAGS attr support"). Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05uapi: export tc_skbmod.hstephen hemminger1-0/+1
Fixes commit 735cffe5d800 ("net_sched: Introduce skbmod action") Not used by iproute2 but maybe in future. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: ep93xx_eth: Do not crash unloading moduleFlorian Fainelli1-0/+4
When we unload the ep93xx_eth, whether we have opened the network interface or not, we will either hit a kernel paging request error, or a simple NULL pointer de-reference because: - if ep93xx_open has been called, we have created a valid DMA mapping for ep->descs, when we call ep93xx_stop, we also call ep93xx_free_buffers, ep->descs now has a stale value - if ep93xx_open has not been called, we have a NULL pointer for ep->descs, so performing any operation against that address just won't work Fix this by adding a NULL pointer check for ep->descs which means that ep93xx_free_buffers() was able to successfully tear down the descriptors and free the DMA cookie as well. Fixes: 1d22e05df818 ("[PATCH] Cirrus Logic ep93xx ethernet driver") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05Merge branch 'bnx2x-fixes'David S. Miller2-2/+10
Yuval Mintz says: ==================== bnx2x: fixes series Two unrelated fixes for bnx2x - the first one is nice-to-have, while the other fixes fatal behaviour in older adapters. Please consider applying them to `net'. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-05bnx2x: Prevent tunnel config for 577xxMintz, Yuval1-2/+2
Only the 578xx adapters are capable of configuring UDP ports for the purpose of tunnelling - doing the same on 577xx might lead to a firmware assertion. We're already not claiming support for any related feature for such devices, but we also need to prevent the configuration of the UDP ports to the device in this case. Fixes: f34fa14cc033 ("bnx2x: Add vxlan RSS support") Reported-by: Anikina Anna <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05bnx2x: Correct ringparam estimate when DOWNMintz, Yuval1-0/+8
Until interface is up [and assuming ringparams weren't explicitly configured] when queried for the size of its rings bnx2x would claim they're the maximal size by default. That is incorrect as by default the maximal number of buffers would be equally divided between the various rx rings. This prevents the user from actually setting the number of elements on each rx ring to be of maximal size prior to transitioning the interface into up state. To fix this, make a rough estimation about the number of buffers. It wouldn't always be accurate, but it would be much better than current estimation and would allow users to increase number of buffers during early initialization of the interface. Reported-by: Seymour, Shane <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05isdn: hisax: set error code on failurePan Bian1-0/+1
In function hfc4s8s_probe(), the value of return variable err should be negative on failures. However, when the call to request_region() returns NULL, the value of err is 0. This patch fixes the bug, assigning "-EBUSY" to err on the path that request_region() fails. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: bnx2x: fix improper return valuePan Bian1-0/+1
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate memory, and jumps to label "lbl" if the allocation fails. Label "lbl" first cleans memory and then returns variable rc. Before calling the macro, the value of variable rc is 0. Because 0 means no error, the callers of bnx2x_init_firmware() may be misled. This patch fixes the bug, assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: ethernet: qlogic: set error code on failurePan Bian1-0/+1
When calling dma_mapping_error(), the value of return variable rc is 0. And when the call returns an unexpected value, rc is not set to a negative errno. Thus, it will return 0 on the error path, and its callers cannot detect the bug. This patch fixes the bug, assigning "-ENOMEM" to err. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041 Signed-off-by: Pan Bian <[email protected]> Acked-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05atm: fix improper return valuePan Bian1-1/+1
It returns variable "error" when ioremap_nocache() returns a NULL pointer. The value of "error" is 0 then, which will mislead the callers to believe that there is no error. This patch fixes the bug, returning "-ENOMEM". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189021 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: irda: set error code on failuresPan Bian1-0/+1
When the calls to kzalloc() fail, the value of return variable ret may be 0. 0 means success in this context. This patch fixes the bug, assigning "-ENOMEM" to ret before calling kzalloc(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188971 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: caif: remove ineffective checkPan Bian1-4/+1
The check of the return value of sock_register() is ineffective. "if(!err)" seems to be a typo. It is better to propagate the error code to the callers of caif_sktinit_module(). This patch removes the check statment and directly returns the result of sock_register(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188751 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: ping: check minimum size on ICMP header lengthKees Cook1-0/+4
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there was no check that the iovec contained enough bytes for an ICMP header, and the read loop would walk across neighboring stack contents. Since the iov_iter conversion, bad arguments are noticed, but the returned error is EFAULT. Returning EINVAL is a clearer error and also solves the problem prior to v3.19. This was found using trinity with KASAN on v3.18: BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0 Read of size 8 by task trinity-c2/9623 page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15 Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90 [<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171 [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50 [< inline >] print_address_description mm/kasan/report.c:147 [< inline >] kasan_report_error mm/kasan/report.c:236 [<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259 [< inline >] check_memory_region mm/kasan/kasan.c:264 [<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507 [<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15 [< inline >] memcpy_from_msg include/linux/skbuff.h:2667 [<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674 [<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714 [<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749 [< inline >] __sock_sendmsg_nosec net/socket.c:624 [< inline >] __sock_sendmsg net/socket.c:632 [<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643 [< inline >] SYSC_sendto net/socket.c:1797 [<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761 CVE-2016-8399 Reported-by: Qidan He <[email protected]> Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05Merge tag 'powerpc-4.9-7' of ↵Linus Torvalds6-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Four fixes, the first for code we merged this cycle and three that are also going to stable: - On 64-bit Book3E we were not placing the .text section where we said we would in the asm. - We broke building the boot wrapper on some 32-bit toolchains. - Lazy icache flushing was broken on pre-POWER5 machines. - One of the error paths in our EEH code would lead to a deadlock. Thanks to: Andrew Donnellan, Ben Hutchings, Benjamin Herrenschmidt, Nicholas Piggin" * tag 'powerpc-4.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix placement of .text to be immediately following .head.text powerpc/eeh: Fix deadlock when PE frozen state can't be cleared powerpc/mm: Fix lazy icache flush on pre-POWER5 powerpc/boot: Fix build failure in 32-bit boot wrapper
2016-12-05atm: lanai: set error code when ioremap failsPan Bian1-0/+1
In function lanai_dev_open(), when the call to ioremap() fails, the value of return variable result is 0. 0 means no error in this context. This patch fixes the bug, assigning "-ENOMEM" to result when ioremap() returns a NULL pointer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188791 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: usb: set error code when usb_alloc_urb failsPan Bian1-0/+1
In function lan78xx_probe(), variable ret takes the errno code on failures. However, when the call to usb_alloc_urb() fails, its value will keeps 0. 0 indicates success in the context, which is inconsistent with the execution result. This patch fixes the bug, assigning "-ENOMEM" to ret when usb_alloc_urb() returns a NULL pointer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188771 Signed-off-by: Pan Bian <[email protected]> Acked-by: Woojung Huh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: bridge: set error code on failurePan Bian1-0/+1
Function br_sysfs_addbr() does not set error code when the call kobject_create_and_add() returns a NULL pointer. It may be better to return "-ENOMEM" when kobject_create_and_add() fails. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188781 Signed-off-by: Pan Bian <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05net: af_mpls.c add space before open parenthesisSuraj Deshmukh1-1/+1
Adding space after switch keyword before open parenthesis for readability purpose. This patch fixes the checkpatch.pl warning: space required before the open parenthesis '(' Signed-off-by: Suraj Deshmukh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05netdev: broadcom: propagate error codePan Bian1-1/+1
Function bnxt_hwrm_stat_ctx_alloc() always returns 0, even if the call to _hwrm_send_message() fails. It may be better to propagate the errors to the caller of bnxt_hwrm_stat_ctx_alloc(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188661 Signed-off-by: Pan Bian <[email protected]> Acked-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05Merge branch 'fib-suffix-length-fixes'David S. Miller1-33/+35
Alexander Duyck says: ==================== IPv4 FIB suffix length fixes In reviewing the patch from Robert Shearman and looking over the code I realized there were a few different bugs we were still carrying in the IPv4 FIB lookup code. These two patches are based off of Robert's original patch, but take things one step further by splitting them up to address two additional issues I found. So first have Robert's original patch which was addressing the fact that us calling update_suffix in resize is expensive when it is called per add. To address that I incorporated the core bit of the patch which was us dropping the update_suffix call from resize. The first patch in the series does a rename and fix on the push_suffix and pull_suffix code. Specifically we drop the need to pass a leaf and secondly we fix things so we pull the suffix as long as the value of the suffix in the node is dropping. The second patch addresses the original issue reported as well as optimizing the code for the fact that update_suffix is only really meant to go through and clean things up when we are decreasing a suffix. I had originally added code for it to somehow cause an increase, but if we push the suffix when a new leaf is added we only ever have to handle pulling down the suffix with update_suffix so I updated the code to reflect that. As far as side effects the only ones I think that will be obvious should be the fact that some routes may be able to be found earlier since before we relied on resize to update the suffix lengths, and now we are updating them before we add or remove the leaf. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-05ipv4: Drop suffix update from resize codeAlexander Duyck1-21/+21
It has been reported that update_suffix can be expensive when it is called on a large node in which most of the suffix lengths are the same. The time required to add 200K entries had increased from around 3 seconds to almost 49 seconds. In order to address this we need to move the code for updating the suffix out of resize and instead just have it handled in the cases where we are pushing a node that increases the suffix length, or will decrease the suffix length. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Reported-by: Robert Shearman <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Reviewed-by: Robert Shearman <[email protected]> Tested-by: Robert Shearman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05ipv4: Drop leaf from suffix pull/push functionsAlexander Duyck1-12/+14
It wasn't necessary to pass a leaf in when doing the suffix updates so just drop it. Instead just pass the suffix and work with that. Since we dropped the leaf there is no need to include that in the name so the names are updated to node_push_suffix and node_pull_suffix. Finally I noticed that the logic for pulling the suffix length back actually had some issues. Specifically it would stop prematurely if there was a longer suffix, but it was not as long as the original suffix. I updated the code to address that in node_pull_suffix. Fixes: 5405afd1a306 ("fib_trie: Add tracking value for suffix length") Suggested-by: Robert Shearman <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Reviewed-by: Robert Shearman <[email protected]> Tested-by: Robert Shearman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-05Merge branch 'linus' of ↵Linus Torvalds4-6/+29
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Intermittent build failure in RSA - Memory corruption in chelsio crypto driver - Regression in DRBG due to vmalloced stack" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: rsa - Add Makefile dependencies to fix parallel builds crypto: chcr - Fix memory corruption crypto: drbg - prevent invalid SG mappings
2016-12-04Linux 4.9-rc8Linus Torvalds1-1/+1
2016-12-03net: dcb: set error code on failuresPan Bian1-0/+1
In function dcbnl_cee_fill(), returns the value of variable err on errors. However, on some error paths (e.g. nla put fails), its value may be 0. It may be better to explicitly set a negative errno to variable err before returning. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188881 Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-03Merge tag 'drm-fixes-for-v4.9-rc8' of ↵Linus Torvalds7-13/+34
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A pretty small pull request: a couple of AMD powerxpress regression fixes and a power management fix, a couple of i915 fixes and one hdlcd fix, along with one core don't oops because of incorrect API usage fix" * tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux: drm/i915: drop the struct_mutex when wedged or trying to reset drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error drm: Don't call drm_for_each_crtc with a non-KMS driver drm/radeon: fix check for port PM availability drm/amdgpu: fix check for port PM availability drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr drm: hdlcd: Fix cleanup order
2016-12-03Merge tag 'batadv-net-for-davem-20161202' of git://git.open-mesh.org/linux-mergeDavid S. Miller1-2/+2
Simon Wunderlich says: ==================== Here is another batman-adv bugfix: - fix checking for failed allocation of TVLV blocks in TT local data, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-04Merge tag 'drm-intel-fixes-2016-12-01' of ↵Dave Airlie2-3/+5
git://anongit.freedesktop.org/git/drm-intel into drm-fixes 2 intel fixes. * tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: drop the struct_mutex when wedged or trying to reset drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
2016-12-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-1/+3
Merge more fixes from Andrew Morton: "2 fixes" * emailed patches from Andrew Morton <[email protected]>: mm, vmscan: add cond_resched() into shrink_node_memcg() mm: workingset: fix NULL ptr in count_shadow_nodes
2016-12-02mm, vmscan: add cond_resched() into shrink_node_memcg()Michal Hocko1-0/+2
Boris Zhmurov has reported RCU stalls during the kswapd reclaim: INFO: rcu_sched detected stalls on CPUs/tasks: 23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23 (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115) Task dump for CPU 23: kswapd1 R running task 0 148 2 0x00000008 Call Trace: shrink_node+0xd2/0x2f0 kswapd+0x2cb/0x6a0 mem_cgroup_shrink_node+0x160/0x160 kthread+0xbd/0xe0 __switch_to+0x1fa/0x5c0 ret_from_fork+0x1f/0x40 kthread_create_on_node+0x180/0x180 a closer code inspection has shown that we might indeed miss all the scheduling points in the reclaim path if no pages can be isolated from the LRU list. This is a pathological case but other reports from Donald Buczek have shown that we might indeed hit such a path: clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193 kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1 [...] kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1 this is minute long snapshot which didn't take a single page from the LRU. It is not entirely clear why only 1303 pages have been scanned during that time (maybe there was a heavy IRQ activity interfering). In any case it looks like we can really hit long periods without scheduling on non preemptive kernels so an explicit cond_resched() in shrink_node_memcg which is independent on the reclaim operation is due. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Boris Zhmurov <[email protected]> Tested-by: Boris Zhmurov <[email protected]> Reported-by: Donald Buczek <[email protected]> Reported-by: "Christopher S. Aker" <[email protected]> Reported-by: Paul Menzel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-02mm: workingset: fix NULL ptr in count_shadow_nodesMichal Hocko1-1/+1
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") has made the workingset shadow nodes shrinker memcg aware. The implementation is not correct though because memcg_kmem_enabled() might become true while we are doing a global reclaim when the sc->memcg might be NULL which is exactly what Marek has seen: BUG: unable to handle kernel NULL pointer dereference at 0000000000000400 IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40 PGD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 60 Comm: kswapd0 Tainted: G O 4.8.10-12.pvops.qubes.x86_64 #1 task: ffff880011863b00 task.stack: ffff880011868000 RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40 RSP: e02b:ffff88001186bc70 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002 RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000 R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6 R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660 Call Trace: count_shadow_nodes+0x9a/0xa0 shrink_slab.part.42+0x119/0x3e0 shrink_node+0x22c/0x320 kswapd+0x32c/0x700 kthread+0xd8/0xf0 ret_from_fork+0x1f/0x40 Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1 RIP mem_cgroup_node_nr_lru_pages+0x20/0x40 RSP <ffff88001186bc70> CR2: 0000000000000400 ---[ end trace 100494b9edbdfc4d ]--- This patch fixes the issue by checking sc->memcg rather than memcg_kmem_enabled() which is sufficient because shrink_slab makes sure that only memcg aware shrinkers will get non-NULL memcgs and only if memcg_kmem_enabled is true. Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Marek Marczykowski-Górecki <[email protected]> Tested-by: Marek Marczykowski-Górecki <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Balbir Singh <[email protected]> Cc: <[email protected]> [4.6+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-02kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabledNicolas Pitre1-1/+8
When building a specific target such as bzImage, modules aren't normally built. However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules means none of the exported symbols are used and therefore they will all be trimmed away from the final kernel. A subsequent "make modules" will fail because modpost cannot find the needed symbols for those modules in the kernel binary. Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS is enabled and that the kernel binary is properly rebuilt accordingly. Signed-off-by: Nicolas Pitre <[email protected]> Tested-by: Jarod Wilson <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-02Merge tag 'fixes-for-linus' of ↵Linus Torvalds8-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "This should be the last set of bugfixes for arm-soc in v4.9. None of these are critical regressions, but it would be nice to still get them merged. - On the Juno platform, the idle latency was described wrong, leading to suboptimal cpuidle tuning. - Also on the same platform, PCI I/O space was set up incorrectly and could not work. - On the sti platform, a syntactically incorrect DT entry caused warnings. - The newly added 'gr8' platform has somewhat confusing file names, which we rename for consistency" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions arm64: dts: juno: Correct PCI IO window ARM: dts: STiH407-family: fix i2c nodes ARM: gr8: Rename the DTSI and relevant DTS
2016-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds121-450/+1064
Pull networking fixes from David Miller: 1) Lots more phydev and probe error path leaks in various drivers by Johan Hovold. 2) Fix race in packet_set_ring(), from Philip Pettersson. 3) Use after free in dccp_invalid_packet(), from Eric Dumazet. 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric Dumazet. 5) When tunneling between ipv4 and ipv6 we can be left with the wrong skb->protocol value as we enter the IPSEC engine and this causes all kinds of problems. Set it before the output path does any dst_output() calls, from Eli Cooper. 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from Florian Fainelli. 7) Various netfilter nat bug fixes from FLorian Westphal. 8) Fix memory leak in ipvlan_link_new(), from Gao Feng. 9) Locking fixes, particularly wrt. socket lookups, in l2tp from Guillaume Nault. 10) Avoid invoking rhash teardowns in atomic context by moving netlink cb->done() dump completion from a worker thread. Fix from Herbert Xu. 11) Buffer refcount problems in tun and macvtap on errors, from Jason Wang. 12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user selects BBR. Fix from Julian Wollrath. 13) Fix deadlock in transmit path on altera TSE driver, from Lino Sanfilippo. 14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita Yushchenko. 15) tc_tunnel_key needs to be properly exported to userspace via uapi, fix from Roi Dayan. 16) rds_tcp_init_net() doesn't unregister notifier in error path, fix from Sowmini Varadhan. 17) Stale packet header pointer access after pskb_expand_head() in genenve driver, fix from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) net: avoid signed overflows for SO_{SND|RCV}BUFFORCE geneve: avoid use-after-free of skb->data tipc: check minimum bearer MTU net: renesas: ravb: unintialized return value sh_eth: remove unchecked interrupts for RZ/A1 net: bcmgenet: Utilize correct struct device for all DMA operations NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040 cdc_ether: Fix handling connection notification ip6_offload: check segs for NULL in ipv6_gso_segment. RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" ipv6: Set skb->protocol properly for local output ipv4: Set skb->protocol properly for local output packet: fix race condition in packet_set_ring net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks net: ethernet: stmmac: platform: fix outdated function header net: ethernet: stmmac: dwmac-meson8b: fix probe error path net: ethernet: stmmac: dwmac-generic: fix probe error path ...
2016-12-02net: avoid signed overflows for SO_{SND|RCV}BUFFORCEEric Dumazet1-2/+2
CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM... Note that before commit 82981930125a ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable. This needs to be backported to all known linux kernels. Again, many thanks to syzkaller team for discovering this gem. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-02geneve: avoid use-after-free of skb->dataSabrina Dubroca1-10/+4
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which makes the ip_hdr(skb) reference we stashed earlier stale. Since it's only needed as an argument to ip_tunnel_ecn_encap(), move this directly in the function call. Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths") Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: John W. Linville <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-02tipc: check minimum bearer MTUMichal Kubeček3-2/+27
Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek <[email protected]> Reported-by: Qian Zhang (张谦) <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-02Merge tag 'linux-can-fixes-for-4.9-20161201' of ↵David S. Miller4-24/+121
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-12-02 this is a pull request for net/master. There are two patches by Stephane Grosjean, who adds support for the new PCAN-USB X6 USB interface to the pcan_usb driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-02net: renesas: ravb: unintialized return valueDan Carpenter1-2/+0
We want to set the other "err" variable here so that we can return it later. My version of GCC misses this issue but I caught it with a static checker. Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Sergei Shtylyov <[email protected]> Reviewed-by: Johan Hovold <[email protected]> Signed-off-by: David S. Miller <[email protected]>