aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-07-19tracing/ring_buffer: Try harder to allocateJoel Fernandes1-5/+5
ftrace can fail to allocate per-CPU ring buffer on systems with a large number of CPUs coupled while large amounts of cache happening in the page cache. Currently the ring buffer allocation doesn't retry in the VM implementation even if direct-reclaim made some progress but still wasn't able to find a free page. On retrying I see that the allocations almost always succeed. The retry doesn't happen because __GFP_NORETRY is used in the tracer to prevent the case where we might OOM, however if we drop __GFP_NORETRY, we risk destabilizing the system if OOM killer is triggered. To prevent this situation, use the __GFP_RETRY_MAYFAIL flag introduced recently [1]. Tested the following still succeeds without destabilizing a system with 1GB memory. echo 300000 > /sys/kernel/debug/tracing/buffer_size_kb [1] https://marc.info/?l=linux-mm&m=149820805124906&w=2 Link: http://lkml.kernel.org/r/[email protected] Cc: Tim Murray <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Joel Fernandes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-07-19KVM: x86: masking out upper bitsDan Carpenter1-2/+2
kvm_read_cr3() returns an unsigned long and gfn is a u64. We intended to mask out the bottom 5 bits but because of the type issue we mask the top 32 bits as well. I don't know if this is a real problem, but it causes static checker warnings. Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Radim Krčmář <[email protected]>
2017-07-18device-dax: fix sysfs duplicate warningsDan Williams3-14/+24
Fix warnings of the form... WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/class/dax/dax12.0' Call Trace: dump_stack+0x63/0x86 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5a/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_do_create_link_sd.isra.2+0x97/0xb0 sysfs_create_link+0x25/0x40 device_add+0x266/0x630 devm_create_dax_dev+0x2cf/0x340 [dax] dax_pmem_probe+0x1f5/0x26e [dax_pmem] nvdimm_bus_probe+0x71/0x120 ...by reusing the namespace id for the device-dax instance name. Now that we have decided that there will never by more than one device-dax instance per libnvdimm-namespace parent device [1], we can directly reuse the namepace ids. There are some possible follow-on cleanups, but those are saved for a later patch to simplify the -stable backport. [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html Fixes: 98a29c39dc68 ("libnvdimm, namespace: allow creation of multiple pmem...") Cc: Jeff Moyer <[email protected]> Cc: <[email protected]> Reported-by: Dariusz Dokupil <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2017-07-18netfilter: fix netfilter_net_init() returnDan Carpenter1-2/+2
We accidentally return an uninitialized variable. Fixes: cf56c2f892a8 ("netfilter: remove old pre-netns era hook api") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-18irqchip/digicolor: Drop unnecessary staticJulia Lawall1-1/+1
Drop static on a local variable, when the variable is initialized before any possible use. Thus, the static has no benefit. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; // </smpl> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Baruch Siach <[email protected]> Cc: [email protected] Cc: Marc Zyngier <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Jason Cooper <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-07-18irqchip/mips-cpu: Drop unnecessary staticJulia Lawall1-1/+1
Drop static on a local variable, when the variable is initialized before any possible use. Thus, the static has no benefit. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; // </smpl> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Jason Cooper <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-07-18irqchip/gic/realview: Drop unnecessary staticJulia Lawall1-1/+1
Drop static on a local variable, when the variable is initialized before any possible use. Thus, the static has no benefit. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @bad exists@ position p; identifier x; type T; @@ static T x@p; ... x = <+...x...+> @@ identifier x; expression e; type T; position p != bad.p; @@ -static T x@p; ... when != x when strict ?x = e; // </smpl> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Jason Cooper <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller6-166/+14
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Missing netlink message sanity check in nfnetlink, patch from Mateusz Jurczyk. 2) We now have netfilter per-netns hooks, so let's kill global hook infrastructure, this infrastructure is known to be racy with netns. We don't care about out of tree modules. Patch from Florian Westphal. 3) find_appropriate_src() is buggy when colissions happens after the conversion of the nat bysource to rhashtable. Also from Florian. 4) Remove forward chain in nf_tables arp family, it's useless and it is causing quite a bit of confusion, from Florian Westphal. 5) nf_ct_remove_expect() is called with the wrong parameter, causing kernel oops, patch from Florian Westphal. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-18udp: preserve skb->dst if required for IP options processingPaolo Abeni1-2/+11
Eric noticed that in udp_recvmsg() we still need to access skb->dst while processing the IP options. Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue") skb->dst is no more available at recvmsg() time and bad things will happen if we enter the relevant code path. This commit address the issue, avoid clearing skb->dst if any IP options are present into the relevant skb. Since the IP CB is contained in the first skb cacheline, we can test it to decide to leverage the consume_stateless_skb() optimization, without measurable additional cost in the faster path. v1 -> v2: updated commit message tags Fixes: 0a463c78d25b ("udp: avoid a cache miss on dequeue") Reported-by: Andrey Konovalov <[email protected]> Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-18Merge tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds4-6/+7
Pull MD fixes from Shaohua Li: - raid5-ppl fix by Artur. This one is introduced in this release cycle. - raid5 reshape fix by Xiao. This is an old bug and will be added to stable. - bitmap fix by Guoqing. * tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid5-ppl: use BIOSET_NEED_BVECS when creating bioset Raid5 should update rdev->sectors after reshape md/bitmap: don't read page from device with Bitmap_sync
2017-07-18atm: zatm: Fix an error handling path in 'zatm_init_one()'Christophe Jaillet1-1/+1
If 'dma_set_mask_and_coherent()' fails, we must undo the previous 'pci_request_regions()' call. Adjust corresponding 'goto' to jump at the right place of the error handling path. Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-18ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()Alexander Potapenko2-0/+2
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-18ppp: Fix false xmit recursion detect with two ppp devicesGao Feng1-9/+21
The global percpu variable ppp_xmit_recursion is used to detect the ppp xmit recursion to avoid the deadlock, which is caused by one CPU tries to lock the xmit lock twice. But it would report false recursion when one CPU wants to send the skb from two different PPP devices, like one L2TP on the PPPoE. It is a normal case actually. Now use one percpu member of struct ppp instead of the gloable variable to detect the xmit recursion of one ppp device. Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Gao Feng <[email protected]> Signed-off-by: Liu Jianying <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-18Merge tag 'for-linus' of ↵Linus Torvalds43-290/+366
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "First set of -rc fixes for 4.13 cycle: - misc iSER fixes - namespace fixups - fix the fact that IPoIB didn't use the proper API for noio mem allocs - rxe driver fixes - hns_roce fixes - misc core fixes - misc IPoIB fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits) IB/core: Allow QP state transition from reset to error IB/hns: Fix for checkpatch.pl comment style warnings IB/hns: Fix the bug with modifying the MAC address without removing the driver IB/hns: Fix the bug with rdma operation IB/hns: Fix the bug with wild pointer when destroy rc qp IB/hns: Fix the bug of polling cq failed for loopback Qps IB/rxe: Set dma_mask and coherent_dma_mask IB/rxe: Fix kernel panic from skb destructor IB/ipoib: Let lower driver handle get_stats64 call IB/core: Add ordered workqueue for RoCE GID management IB/mlx5: Clean mr_cache debugfs in case of failure IB/core: Remove NOIO QP create flag {net, IB}/mlx4: Remove gfp flags argument IB/{rdmavt, qib, hfi1}: Remove gfp flags argument IB/IPoIB: Convert IPoIB to memalloc_noio_* calls IB/IPoIB: Forward MTU change to driver below IB: Convert msleep below 20ms to usleep_range IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC IB/core: Introduce modify QP operation with udata IB/core: Don't resolve IP address to the loopback device ...
2017-07-18Merge tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-3/+3
Pull nfsd fix from Bruce Fields: "One fix for a problem introduced in the most recent merge window and found by Dave Jones and KASAN" * tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux: nfsd: Fix a memory scribble in the callback channel
2017-07-18hfsplus: Don't clear SGID when inheriting ACLsJan Kara1-12/+18
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by creating __hfsplus_set_posix_acl() function that does not call posix_acl_update_mode() and use it when inheriting ACLs. That prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef CC: [email protected] Signed-off-by: Jan Kara <[email protected]>
2017-07-18perf/x86/intel: Fix debug_store reset field for freq eventsJiri Olsa1-0/+2
There's a bug in PEBs event enabling code, that prevents PEBS freq events to work properly after non freq PEBS event was run. freq events - perf_event_attr::freq set -F <freq> option of perf record PEBS events - perf_event_attr::precise_ip > 0 default for perf record Like in following example with CPU 0 busy, we expect ~10000 samples for following perf tool run: # perf record -F 10000 -C 0 sleep 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.640 MB perf.data (10031 samples) ] Everything's fine, but once we run non freq PEBS event like: # perf record -c 10000 -C 0 sleep 1 [ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 1.053 MB perf.data (20061 samples) ] the freq events start to fail like this: # perf record -F 10000 -C 0 sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.185 MB perf.data (40 samples) ] The issue is in non freq PEBs event initialization of debug_store reset field, which value is used to auto-reload the counter value after PEBS event drain. This value is not being used for PEBS freq events, but once we run non freq event it stays in debug_store data and screws the sample_freq counting for PEBS freq events. Setting the reset field to 0 for freq events. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-18perf/x86/intel: Add Goldmont Plus CPU PMU supportKan Liang3-0/+166
Add perf core PMU support for Intel Goldmont Plus CPU cores: - The init code is based on Goldmont. - There is a new cache event list, based on the Goldmont cache event list. - All four general-purpose performance counters support PEBS. - The first general-purpose performance counter is for reduced skid PEBS mechanism. Using :ppp to indicate the event which want to do reduced skid PEBS. - Goldmont Plus has 4-wide pipeline for Topdown Signed-off-by: Kan Liang <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-18perf/x86/intel: Enable C-state residency events for Apollo LakeHarry Pan1-6/+20
Goldmont microarchitecture supports C1/C3/C6, PC2/PC3/PC6/PC10 state residency counters, the patch enables them for Apollo Lake platform. The MSR information is based on Intel Software Developers' Manual, Vol. 4, Order No. 335592, Table 2-6 and 2-12. Signed-off-by: Harry Pan <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-07-18isofs: Fix off-by-one in 'session' mount option parsingJan Kara1-2/+6
According to ECMA-130 standard maximum valid track number is 99. Since 'session' mount option starts indexing at 0 (and we add 1 to the passed number), we should refuse value 99. Also the condition in isofs_get_last_session() unnecessarily repeats the check - remove it. Reported-by: David Howells <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2017-07-18powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=yMichael Ellerman8-0/+39
Currently even with STRICT_KERNEL_RWX we leave the __init text marked executable after init, which is bad. Add a hook to mark it NX (no-execute) before we free it, and implement it for radix and hash. Note that we use __init_end as the end address, not _einittext, because overlaps_kernel_text() uses __init_end, because there are additional executable sections other than .init.text between __init_begin and __init_end. Tested on radix and hash with: 0:mon> p $__init_begin *** 400 exception occurred Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Michael Ellerman <[email protected]>
2017-07-18reiserfs: preserve i_mode if __reiserfs_set_acl() failsErnesto A. Fernández1-3/+6
When changing a file's acl mask, reiserfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2017-07-18ext2: preserve i_mode if ext2_set_acl() failsErnesto A. Fernández1-2/+9
When changing a file's acl mask, ext2_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. [JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"] Signed-off-by: Ernesto A. Fernández <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2017-07-18powerpc/mm/hash: Refactor hash__mark_rodata_ro()Michael Ellerman1-13/+19
Move the core logic into a helper, so we can use it for changing other permissions. We also change the logic to align start down, and end up. This means calling the function with a range will expand that range to be at least 1 mmu_linear_psize page in size. We need that so we can use it on __init_begin ... __init_end which is not a full page in size. This should always work for _stext/__init_begin, because we align __init_begin to _stext + 16M in the linker script. Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-18powerpc/mm/radix: Refactor radix__mark_rodata_ro()Michael Ellerman1-5/+15
Move the core logic into a helper, so we can use it for changing permissions other than _PAGE_WRITE. Signed-off-by: Michael Ellerman <[email protected]> Reviewed-by: Balbir Singh <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-18x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNTRoman Kagan1-1/+1
A recent commit: d6e41f1151fe ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant") introduced a VM_WARN_ON(!in_atomic()) which generates false positives on every VM entry on !CONFIG_PREEMPT_COUNT kernels. Replace it with a test for preemptible(), which appears to match the original intent and works across different CONFIG_PREEMPT* variations. Signed-off-by: Roman Kagan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: d6e41f1151fe ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant") Signed-off-by: Ingo Molnar <[email protected]>
2017-07-18irqchip/mips-gic: Remove population of irq domain namesMatt Redfearn1-2/+0
Since commit d59f6617eef0f ("genirq: Allow fwnode to carry name information only") the irqdomain core sets the names of irq domains. When the name is allocated the new IRQ_DOMAIN_NAME_ALLOCATED flag is set. Replacing the allocated name with a constant one is not a good idea, since calling the new irq_domain_update_bus_token() API, added to the MIPS GIC driver by commit 96f0d93a487e1 ("irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access") will attempt to kfree the pointer, and result in a kernel OOPS. Fix this by removing the names, now that they are set by the irqdomain core. This effectively reverts commit 21c57fd13589 ("irqchip/mips-gic: Populate irq_domain names"). Fixes: d59f6617eef0f ("genirq: Allow fwnode to carry name information only") Signed-off-by: Matt Redfearn <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: [email protected] Cc: Jason Cooper <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2017-07-18powerpc/64s: Fix hypercall entry clobbering r12 inputNicholas Piggin1-14/+14
A previous optimisation incorrectly assumed the PAPR hcall does not use r12, and clobbers it upon entry. In fact it is used as an input. This can result in KVM guests crashing (observed with PR KVM). Instead of using r12 to save r13, tihs patch saves r13 in ctr. This is more costly, but not as slow as using the SPRG. Fixes: acd7d8cef0153 ("powerpc/64s: Optimize hypercall/syscall entry") Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-17f2fs: avoid cpu lockupJaegeuk Kim1-0/+10
Before retrying to flush data or dentry pages, we need to release cpu in order to prevent watchdog. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-17f2fs: include seq_file.h for sysfs.cJaegeuk Kim1-0/+1
This patch includes seq_file.h to avoid compile error. Signed-off-by: Eric Biggers <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2017-07-18powerpc/perf: Avoid spurious PMU interrupts after idleNicholas Piggin1-1/+14
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in some conditions. A solution is to save and reload MMCR0 over state-loss idle. Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Madhavan Srinivasan <[email protected]> Tested-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-07-17IB/core: Allow QP state transition from reset to errorTadeusz Struk1-0/+1
Playing with IP-O-IB interface can trigger a warning message: "ib0: Failed to modify QP to ERROR state" to be logged. This happens when the QP is in IB_QPS_RESET state and the stack is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop(). According to the IB spec, Table 91 - "QP State Transition Properties" it looks like the transition from reset to error is valid: Transition: Any State to Error Required Attributes: None Optional Attributes: None allowed Actions: Queue processing is stopped. Work Requests pending or in process are completed in error, when possible. This patch allows the transition and quiets the message. Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Tadeusz Struk <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/hns: Fix for checkpatch.pl comment style warningsoulijun1-4/+6
This patch correct the comment style warnings caught by checkpatch.pl script. Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/hns: Fix the bug with modifying the MAC address without removing the driveroulijun1-3/+0
When modified the MAC address used hns_roce_mac function, we release and create reserved qp again, It is not necessary to use spin_lock_bh and spin_unlock_bh in handle_en_event, Otherwise, it will occur a error. This patch mainly fixes it. Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/hns: Fix the bug with rdma operationoulijun1-4/+4
When opcode of work request is RDMA read and write, it should use rdma_wr to get remote_addr and rkey. This patch fixes it. Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/hns: Fix the bug with wild pointer when destroy rc qpoulijun1-5/+7
When destroyed rc qp, the hr_qp will be used after freed. This patch will fix it. Signed-off-by: Lijun Ou <[email protected]> Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/hns: Fix the bug of polling cq failed for loopback Qpsoulijun1-19/+34
In hip06 SoC, RoCE driver creates 8 reserved loopback QPs to ensure zero wqe when free mr. However, if the enabled phy port number is less than 6, it will fail in polling cqe with 8 reserved loopback QPs. In order to solve this problem, the number of loopback Qps will be adjusted based on the number of enabled phy port. Signed-off-by: Shaobo Xu <[email protected]> Signed-off-by: Lijun Ou <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/rxe: Set dma_mask and coherent_dma_maskyonatanc1-0/+2
The RXE coupled with dummy device causes to the kernel panic attached below. The panic happens when ib_register_device tries to set dma_mask by accessing a NULLed parent device. The RXE does not actually use DMA, so we can set the dma_mask to architecture value. [16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core] [16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246 [16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000 [16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 [16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f [16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000 [16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8 [16240.263314] FS: 00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000 [16240.267292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0 [16240.277066] Call Trace: [16240.281836] ? __kmalloc+0x26f/0x280 [16240.286596] rxe_register_device+0x297/0x300 [rdma_rxe] [16240.291377] rxe_add+0x535/0x5b0 [rdma_rxe] [16240.297586] rxe_net_add+0x3e/0xc0 [rdma_rxe] [16240.302375] rxe_param_set_add+0x65/0x144 [rdma_rxe] [16240.307769] param_attr_store+0x68/0xd0 [16240.311640] module_attr_store+0x1d/0x30 [16240.316421] sysfs_kf_write+0x3a/0x50 [16240.317802] kernfs_fop_write+0xff/0x180 [16240.322989] __vfs_write+0x37/0x140 [16240.328164] ? handle_mm_fault+0xce/0x240 [16240.333340] vfs_write+0xb2/0x1b0 [16240.335013] SyS_write+0x55/0xc0 [16240.340632] entry_SYSCALL_64_fastpath+0x1a/0xa9 Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <[email protected]> Reviewed-by: Moni Shoua <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/rxe: Fix kernel panic from skb destructorYonatan Cohen1-0/+3
In the time between rxe_send has finished and skb destructor called, the QP's ref count might be 0, leading to a possible QP destruction. This will lead to a kernel panic when the destructor dereferences the QP. The operation of incrementing QP ref count at rxe_send and decrementing from skb destructor will prevent this crash. BUG: unable to handle kernel NULL pointer dereference at 000000000000072c IP: [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe] PGD 0 [16240.211178] Oops: 0002 [#1] SMP CPU: 3 PID: 0 Comm: swapper/3 Tainted: G OE 4.9.0-mlnx #1 Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 task: ffff88042d6b1480 task.stack: ffffc90001904000 RIP: 0010:[<ffffffffa05df765>] [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe] RSP: 0018:ffff88043fcc3df0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880429684700 RCX: ffff88042d248200 RDX: 00000000ffffffff RSI: 00000000fffffe01 RDI: ffff880429684700 RBP: ffff88043fcc3e00 R08: ffff88043fcda240 R09: 00000000ff2d1de6 R10: 0000000000000000 R11: 00000000f49cf6fe R12: ffff880429684700 R13: ffffffff81893f96 R14: ffffffff817d66f0 R15: ffff880427f74200 FS: 0000000000000000(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000072c CR3: 000000041d3df000 CR4: 00000000000006e0 Stack: ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2 ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700 ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96 Call Trace: <IRQ> [16240.336345] [<ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0 [<ffffffff817b42c2>] skb_release_all+0x12/0x30 [<ffffffff817b4332>] kfree_skb+0x32/0x90 [<ffffffff81893f96>] ndisc_error_report+0x36/0x40 [<ffffffff817d4de1>] neigh_invalidate+0x81/0xf0 [<ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0 [<ffffffff81109295>] call_timer_fn+0x35/0x120 [<ffffffff81109db7>] run_timer_softirq+0x1d7/0x460 [<ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30 [<ffffffff810366b9>] ? sched_clock+0x9/0x10 [<ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0 [<ffffffff818dd537>] __do_softirq+0xd7/0x289 [<ffffffff810a6c95>] irq_exit+0xb5/0xc0 [<ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50 [<ffffffff818dc682>] apic_timer_interrupt+0x82/0x90 <EOI> [16240.395776] [<ffffffff818da156>] ? native_safe_halt+0x6/0x10 [<ffffffff818d9e6e>] default_idle+0x1e/0xd0 [<ffffffff8103797f>] arch_cpu_idle+0xf/0x20 [<ffffffff818da2c5>] default_idle_call+0x35/0x40 [<ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210 [<ffffffff81050433>] start_secondary+0x103/0x130 RIP [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe] Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <[email protected]> Reviewed-by: Moni Shoua <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/ipoib: Let lower driver handle get_stats64 callErez Shitrit1-0/+12
The driver checks if the lower level driver supports get_stats, and if so calls it to get the updated statistics, otherwise takes from the current netdevice stats object. Signed-off-by: Erez Shitrit <[email protected]> Reviewed-by: Alex Vesker <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Yuval Shaia <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/core: Add ordered workqueue for RoCE GID managementMajd Dibbiny1-2/+9
Currently the RoCE GID management uses the ib_wq to do add and delete new GIDs according to the netdev events. The ib_wq isn't an ordered workqueue and thus two work elements can be executed concurrently which will result in unexpected behavior and inconsistency of the GIDs cache content. Example: ifconfig eth1 11.11.11.11/16 up This command will invoke the following netdev events in the following order: 1. NETDEV_UP 2. NETDEV_DOWN 3. NETDEV_UP If (2) and (3) will be executed concurrently or in reverse order, instead of having a new GID with 11.11.11.11 IP, we will end up without any new GIDs. Signed-off-by: Majd Dibbiny <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Yuval Shaia <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/mlx5: Clean mr_cache debugfs in case of failureLeon Romanovsky1-12/+22
The failure in creation of debugfs entries for mr_cache left entries, which were already created. It caused to mismatch and misguiding for the end users. The solution is to clean mr_cache debugfs root, so no leftovers will be in the system. In addition, let's document why the error is not needed to be forwarded to user in case of failure. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Matan Barak <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/core: Remove NOIO QP create flagLeon Romanovsky1-1/+1
There are no users for IB_QP_CREATE_USE_GFP_NOIO flag, so let's remove it. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17{net, IB}/mlx4: Remove gfp flags argumentLeon Romanovsky16-90/+76
The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/{rdmavt, qib, hfi1}: Remove gfp flags argumentLeon Romanovsky6-54/+28
The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/IPoIB: Convert IPoIB to memalloc_noio_* callsLeon Romanovsky1-9/+7
Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation") added the memalloc_noio_(save|restore) functions to enable people to modify the MM behavior by disabling I/O during memory allocation. This was further extended in Fixes: 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent allocation paths recursing back into the filesystem without explicitly changing the flags for every allocation site. However the IPoIB hasn't been keeping up with the changes and missed completely these memalloc_noio_* calls. This led to update of allocation site with special QP creation flag, see commit 09b93088d750 ("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this flag is supported by small number of drivers in IB stack. Let's change it by updating to memalloc_noio_* calls and allow for every driver underneath enjoy NOIO allocations. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Dennis Dalessandro <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/IPoIB: Forward MTU change to driver belowErez Shitrit1-2/+17
This patch checks if there is a driver below that needs to be updated on the new MTU and calls it accordingly. Signed-off-by: Erez Shitrit <[email protected]> Reviewed by: Alex Vesker <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Yuval Shaia <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB: Convert msleep below 20ms to usleep_rangeLeon Romanovsky6-8/+9
The msleep(1) may do not sleep 1 ms as expected and will sleep longer. The simple conversion from msleep to usleep_range between 1ms and 2ms can solve an issue. The full and comprehensive explanation can be found at [1] and [2]. [1] https://lkml.org/lkml/2007/8/3/250 [2] Documentation/timers/timers-howto.txt Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Erez Shitrit <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMACParav Pandit1-19/+4
This patch makes use of IB core's ib_modify_qp_with_udata function that also resolves the DMAC and handles udata. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Eli Cohen <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2017-07-17IB/core: Introduce modify QP operation with udataParav Pandit2-8/+40
This patch adds new function ib_modify_qp_with_udata so that uverbs layer can avoid handling L2 mac address at verbs layer and depend on the core layer to resolve the mac address consistently for all required QPs. Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Eli Cohen <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Doug Ledford <[email protected]>