aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-05-01samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnelJesper Dangaard Brouer1-2/+2
The xdp_tx_iptunnel program can be terminated in two ways, after N-seconds or via Ctrl-C SIGINT. The SIGINT code path does not handle detatching the correct XDP program, in-case the program was attached with XDP_FLAGS_SKB_MODE. Fix this by storing the XDP flags as a global variable, which is available for the SIGINT handler function. Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned intJesper Dangaard Brouer4-10/+11
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under samples/bpf/ should use the correct type. Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01Merge branch 'xdp-netlink-ext-ack'David S. Miller8-24/+52
Jakub Kicinski says: ==================== xdp: use netlink extended ACK reporting This series is an attempt to make XDP more user friendly by enabling exploiting the recently added netlink extended ACK reporting to carry messages to user space. David Ahern's iproute2 ext ack patches for ip link are sufficient to show the errors like this: Error: nfp: MTU too large w/ XDP enabled Where the message is coming directly from the driver. There could still be a bit of a leap for a complete novice from the message above to the right settings, but it's a big improvement over the standard "Invalid argument" message. v1/non-rfc: - add a separate macro in patch 1; - add KBUILD_MODNAME as part of the message (Daniel); - don't print the error to logs in patch 1. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-05-01virtio_net: make use of extended ack message reportingJakub Kicinski1-4/+7
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01nfp: make use of extended ack message reportingJakub Kicinski3-12/+17
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01xdp: propagate extended ack to XDP setupJakub Kicinski3-8/+20
Drivers usually have a number of restrictions for running XDP - most common being buffer sizes, LRO and number of rings. Even though some drivers try to be helpful and print error messages experience shows that users don't often consult kernel logs on netlink errors. Try to use the new extended ack mechanism to carry the message back to user space. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01netlink: add NULL-friendly helper for setting extended ACK messageJakub Kicinski1-0/+8
As we propagate extended ack reporting throughout various paths in the kernel it may be that the same function is called with the extended ack parameter passed as NULL. One place where that happens is in drivers which have a centralized reconfiguration function called both from ndos and from ethtool_ops. Add a new helper for setting the error message in such conditions. Existing helper is left as is to encourage propagating the ext act fully wherever possible. It also makes it clear in the code which messages may be lost due to ext ack being NULL. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-05-01ring-buffer: Return reader page back into existing ring bufferSteven Rostedt (VMware)4-10/+51
When reading the ring buffer for consuming, it is optimized for splice, where a page is taken out of the ring buffer (zero copy) and sent to the reading consumer. When the read is finished with the page, it calls ring_buffer_free_read_page(), which simply frees the page. The next time the reader needs to get a page from the ring buffer, it must call ring_buffer_alloc_read_page() which allocates and initializes a reader page for the ring buffer to be swapped into the ring buffer for a new filled page for the reader. The problem is that there's no reason to actually free the page when it is passed back to the ring buffer. It can hold it off and reuse it for the next iteration. This completely removes the interaction with the page_alloc mechanism. Using the trace-cmd utility to record all events (causing trace-cmd to require reading lots of pages from the ring buffer, and calling ring_buffer_alloc/free_read_page() several times), and also assigning a stack trace trigger to the mm_page_alloc event, we can see how many times the ring_buffer_alloc_read_page() needed to allocate a page for the ring buffer. Before this change: # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1 # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l 9968 After this change: # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1 # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l 4 Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2017-05-01netfilter: nf_ct_ext: invoke destroy even when ext is not attachedLiping Zhang2-12/+3
For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table, then remove it from the nat_bysource_table via nat_extend->destroy. But now, the nat extension is attached on demand, so if the nat extension is not attached, we will not be notified when the ct is destroyed, i.e. we may fail to remove ct from the nat_bysource_table. So just keep it simple, even if the extension is not attached, we will still invoke the related ext->destroy. And this will also preserve the flexibility for the future extension. Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default") Signed-off-by: Liping Zhang <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01Merge tag 'ipvs3-for-v4.12' of ↵Pablo Neira Ayuso3-25/+1
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Third Round of IPVS Updates for v4.12 please consider these enhancements to IPVS for v4.12. If it is too late for v4.12 then please consider them for v4.13. * Remove unused function * Correct comparison of unsigned value ==================== Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01netfilter: snmp: avoid stack size warningFlorian Westphal1-6/+6
net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size of 1160 bytes is larger than 1024 bytes Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01netfilter: nf_queue: only call synchronize_net twice if nf_queue is activeFlorian Westphal5-24/+27
nf_unregister_net_hook(s) can avoid a second call to synchronize_net, provided there is no nfqueue active in that net namespace (which is the common case). This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally this gets called during netns cleanup so no packets should be queued. For the rare case of base chain being unregistered or module removal while nfqueue is in use the extra hiccup due to the packet drops isn't a big deal. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01netfilter: nf_log: don't call synchronize_rcu in nf_log_unsetFlorian Westphal2-2/+1
nf_log_unregister() (which is what gets called in the logger backends module exit paths) does a (required, module is removed) synchronize_rcu(). But nf_log_unset() is only called from pernet exit handlers. It doesn't free any memory so there appears to be no need to call synchronize_rcu. v2: Liping Zhang points out that nf_log_unregister() needs to be called after pernet unregister, else rmmod would become unsafe. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01netfilter: batch synchronize_net calls during hook unregisterFlorian Westphal1-6/+40
synchronize_net is expensive and slows down netns cleanup a lot. We have two APIs to unregister a hook: nf_unregister_net_hook (which calls synchronize_net()) and nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop) Make nf_unregister_net_hook a wapper around new helper __nf_unregister_net_hook, which unlinks the hook but does not free it. Then, we can call that helper in nf_unregister_net_hooks and then call synchronize_net() only once. Andrey Konovalov reports this change improves syzkaller fuzzing speed at least twice. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2017-05-01powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TESTNicholas Piggin1-2/+2
This was a hack we added to work around the allmodconfig build breaking, see commit fb43e8477ed9 ("powerpc: Disable RELOCATABLE for COMPILE_TEST with PPC64"). Since we merged the thin archives support in commit 43c9127d94d6 ("powerpc: Add option to use thin archives") this hasn't been necessary, so remove it. Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-05-01powerpc/xmon: Teach xmon oops about radix vectorsMichael Neuling1-2/+12
Currently if we take an oops caused by an 0x380 or 0x480 exception, we get a print which assumes SLB problems. With radix, these vectors have different meanings. This patch updates the oops message to reflect these different meanings. Signed-off-by: Michael Neuling <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-05-01mm: remove AVR32 arch special handling in mm/KconfigHans-Christian Noren Egtvedt1-1/+0
AVR32 architecture has been removed from the Linux kernel sources, hence clean up the special handling setting two quicklists by default in mm/Kconfig. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]>
2017-05-01lib: remove check for AVR32 arch in test_user_copyHans-Christian Noren Egtvedt1-1/+0
The AVR32 architecture support has been removed from the Linux kernel, hence remove all the check for this architecture in test_user_copy.c. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]>
2017-05-01lib: remove AVR32 entry in Kconfig.debug compile with frame pointersHans-Christian Noren Egtvedt1-1/+1
AVR32 architecture has been removed from the Linux kernel sources, hence clean up the architecture related symbols in lib/Kconfig.debug. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]>
2017-05-01scripts: remove AVR32 support from checkstack.plHans-Christian Noren Egtvedt1-5/+0
The AVR32 architecture support has been removed from the kernel, hence remove the related bits from checkstack.pl script. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]> Signed-off-by: Håvard Skinnemoen <[email protected]> Signed-off-by: Nicolas Ferre <[email protected]> Acked-by: Andy Shevchenko <[email protected]> Acked-by: Boris Brezillon <[email protected]>
2017-05-01docs: remove all references to AVR32 architectureHans-Christian Noren Egtvedt46-47/+3
The AVR32 architecture support has been removed from the Linux kernel, hence remove all references to it from Documentation. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]> Signed-off-by: Håvard Skinnemoen <[email protected]> Signed-off-by: Nicolas Ferre <[email protected]> Acked-by: Andy Shevchenko <[email protected]> Acked-by: Boris Brezillon <[email protected]>
2017-05-01avr32: remove support for AVR32 architectureHans-Christian Noren Egtvedt226-27045/+8
This patch drops support for AVR32 architecture from the Linux kernel. The AVR32 architecture is not keeping up with the development of the kernel, and since it shares so much of the drivers with Atmel ARM SoC, it is starting to hinder these drivers to develop swiftly. Also, all AVR32 AP7 SoC processors are end of lifed from Atmel (now Microchip). Finally, the GCC toolchain is stuck at version 4.2.x, and has not received any patches since the last release from Atmel; 4.2.4-atmel.1.1.3.avr32linux.1. When building kernel v4.10, this toolchain is no longer able to properly link the network stack. Haavard and I have came to the conclusion that we feel keeping AVR32 on life support offers more obstacles for Atmel ARMs, than it gives joy to AVR32 users. I also suspect there are very few AVR32 users left today, if anybody at all. Signed-off-by: Hans-Christian Noren Egtvedt <[email protected]> Signed-off-by: Håvard Skinnemoen <[email protected]> Signed-off-by: Nicolas Ferre <[email protected]> Acked-by: Andy Shevchenko <[email protected]> Acked-by: Boris Brezillon <[email protected]>
2017-05-01mm, zone_device: Replace {get, put}_zone_device_page() with a single ↵Dan Williams5-30/+31
reference to fix pmem crash The x86 conversion to the generic GUP code included a small change which causes crashes and data corruption in the pmem code - not good. The root cause is that the /dev/pmem driver code implicitly relies on the x86 get_user_pages() implementation doing a get_page() on the page refcount, because get_page() does a get_zone_device_page() which properly refcounts pmem's separate page struct arrays that are not present in the regular page struct structures. (The pmem driver does this because it can cover huge memory areas.) But the x86 conversion to the generic GUP code changed the get_page() to page_cache_get_speculative() which is faster but doesn't do the get_zone_device_page() call the pmem code relies on. One way to solve the regression would be to change the generic GUP code to use get_page(), but that would slow things down a bit and punish other generic-GUP using architectures for an x86-ism they did not care about. (Arguably the pmem driver was probably not working reliably for them: but nvdimm is an Intel feature, so non-x86 exposure is probably still limited.) So restructure the pmem code's interface with the MM instead: get rid of the get/put_zone_device_page() distinction, integrate put_zone_device_page() into __put_page() and and restructure the pmem completion-wait and teardown machinery: Kirill points out that the calls to {get,put}_dev_pagemap() can be removed from the mm fast path if we take a single get_dev_pagemap() reference to signify that the page is alive and use the final put of the page to drop that reference. This does require some care to make sure that any waits for the percpu_ref to drop to zero occur *after* devm_memremap_page_release(), since it now maintains its own elevated reference. This speeds up things while also making the pmem refcounting more robust going forward. Suggested-by: Kirill Shutemov <[email protected]> Tested-by: Kirill Shutemov <[email protected]> Signed-off-by: Dan Williams <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2017-04-30net: phy: Allow BCM5481x PHYs to setup internal TX/RX clock delayAbhishek Shah1-36/+33
This patch allows users to enable/disable internal TX and/or RX clock delay for BCM5481x series PHYs so as to satisfy RGMII timing specifications. On a particular platform, whether TX and/or RX clock delay is required depends on how PHY connected to the MAC IP. This requirement can be specified through "phy-mode" property in the platform device tree. Signed-off-by: Abhishek Shah <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30net: sunhme: fix spelling mistakes: "ParityErro" -> "ParityError"Colin Ian King1-1/+1
trivial fix to spelling mistakes in printk message. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30bnx2x: Align RX buffersScott Wood1-0/+1
The bnx2x driver is not providing proper alignment on the receive buffers it passes to build_skb(), causing skb_shared_info to be misaligned. skb_shared_info contains an atomic, and while PPC normally supports unaligned accesses, it does not support unaligned atomics. Aligning the size of rx buffers will ensure that page_frag_alloc() returns aligned addresses. This can be reproduced on PPC by setting the network MTU to 1450 (or other non-multiple-of-4) and then generating sufficient inbound network traffic (one or two large "wget"s usually does it), producing the following oops: Unable to handle kernel paging request for unaligned access at address 0xc00000ffc43af656 Faulting instruction address: 0xc00000000080ef8c Oops: Kernel access of bad area, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: vmx_crypto powernv_rng rng_core powernv_op_panel leds_powernv led_class nfsd ip_tables x_tables autofs4 xfs lpfc bnx2x mdio libcrc32c crc_t10dif crct10dif_generic crct10dif_common CPU: 104 PID: 0 Comm: swapper/104 Not tainted 4.11.0-rc8-00088-g4c761da #2 task: c00000ffd4892400 task.stack: c00000ffd4920000 NIP: c00000000080ef8c LR: c00000000080eee8 CTR: c0000000001f8320 REGS: c00000ffffc33710 TRAP: 0600 Not tainted (4.11.0-rc8-00088-g4c761da) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24082042 XER: 00000000 CFAR: c00000000080eea0 DAR: c00000ffc43af656 DSISR: 00000000 SOFTE: 1 GPR00: c000000000907f64 c00000ffffc33990 c000000000dd3b00 c00000ffcaf22100 GPR04: c00000ffcaf22e00 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000b80008 c00000ffc43af636 c00000ffc43af656 0000000000000000 GPR12: c0000000001f6f00 c00000000fe1a000 000000000000049f 000000000000c51f GPR16: 00000000ffffef33 0000000000000000 0000000000008a43 0000000000000001 GPR20: c00000ffc58a90c0 0000000000000000 000000000000dd86 0000000000000000 GPR24: c000007fd0ed10c0 00000000ffffffff 0000000000000158 000000000000014a GPR28: c00000ffc43af010 c00000ffc9144000 c00000ffcaf22e00 c00000ffcaf22100 NIP [c00000000080ef8c] __skb_clone+0xdc/0x140 LR [c00000000080eee8] __skb_clone+0x38/0x140 Call Trace: [c00000ffffc33990] [c00000000080fb74] skb_clone+0x74/0x110 (unreliable) [c00000ffffc339c0] [c000000000907f64] packet_rcv+0x144/0x510 [c00000ffffc33a40] [c000000000827b64] __netif_receive_skb_core+0x5b4/0xd80 [c00000ffffc33b00] [c00000000082b2bc] netif_receive_skb_internal+0x2c/0xc0 [c00000ffffc33b40] [c00000000082c49c] napi_gro_receive+0x11c/0x260 [c00000ffffc33b80] [d000000066483d68] bnx2x_poll+0xcf8/0x17b0 [bnx2x] [c00000ffffc33d00] [c00000000082babc] net_rx_action+0x31c/0x480 [c00000ffffc33e10] [c0000000000d5a44] __do_softirq+0x164/0x3d0 [c00000ffffc33f00] [c0000000000d60a8] irq_exit+0x108/0x120 [c00000ffffc33f20] [c000000000015b98] __do_irq+0x98/0x200 [c00000ffffc33f90] [c000000000027f14] call_do_irq+0x14/0x24 [c00000ffd4923a90] [c000000000015d94] do_IRQ+0x94/0x110 [c00000ffd4923ae0] [c000000000008d90] hardware_interrupt_common+0x150/0x160 Signed-off-by: David S. Miller <[email protected]>
2017-04-30Linux 4.11Linus Torvalds1-1/+1
2017-04-30net: bridge: Fix improper taking over HW learned FDBArkadi Sharshevsky1-5/+3
Commit 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries") added the ability to "take over an entry which was previously learned via HW when it shows up from a SW port". However, if an entry was learned via HW and then a control packet (e.g., ARP request) was trapped to the CPU, the bridge driver will update the entry and remove the externally learned flag, although the entry is still present in HW. Instead, only clear the externally learned flag in case of roaming. Fixes: 7e26bf45e4cb ("net: bridge: allow SW learn to take over HW fdb entries") Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Arkadi Sharashevsky <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30ipv4: get rid of ip_ra_lockWANG Cong1-10/+2
After commit 1215e51edad1 ("ipv4: fix a deadlock in ip_ra_control") we always take RTNL lock for ip_ra_control() which is the only place we update the list ip_ra_chain, so the ip_ra_lock is no longer needed. As Eric points out, BH does not need to disable either, RCU readers don't care. Signed-off-by: Cong Wang <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrongJesper Dangaard Brouer1-9/+31
The struct bpf_map_def was extended in commit fb30d4b71214 ("bpf: Add tests for map-in-map") with member unsigned int inner_map_idx. This changed the size of the maps section in the generated ELF _kern.o files. Unfortunately the loader in bpf_load.c does not detect or handle this. Thus, older _kern.o files became incompatible, and caused hard-to-debug errors where the syscall validation rejected BPF_MAP_CREATE request. This patch only detect the situation and aborts load_bpf_file(). It also add code comments warning people that read this loader for inspiration for these pitfalls. Fixes: fb30d4b71214 ("bpf: Add tests for map-in-map") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30lwtunnel: fix error path in lwtunnel_fill_encap()Dan Carpenter1-3/+4
We recently added a check to see if nla_nest_start() fails. There are two issues with that. First, if it fails then I don't think we should call nla_nest_cancel(). Second, it's slightly convoluted but the current code returns success but we should return -EMSGSIZE instead. Fixes: a50fe0ffd76f ("lwtunnel: check return value of nla_nest_start") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30liquidio: silence a locking static checker warningDan Carpenter1-0/+1
Presumably we never hit this return, but static checkers complain that we need to unlock so we may as well fix that. Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Felix Manlunas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30qed: Unlock on error in qed_vf_pf_acquire()Dan Carpenter1-1/+1
My static checker complains that we're holding a mutex on this error path. Let's goto exit instead of returning directly. Fixes: b0bccb69eba3 ("qed: Change locking scheme for VF channel") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30Merge branch 'hns-deferred-probe'David S. Miller4-9/+44
lipeng says: ==================== net: hns: bug fix for HNS driver This patchset add support defered dsaf probe when mdio and mbigen module is not insmod. For more details, please refer to individual patch. change log: V4 - > V5: 1. Float on net-next; 2. Delete patch "net: hns: fixed bug that skb used after kfree" from this patchset; V3 -> V4: 1. Delete redundant commit message; 2. Add Reviewed-by: Matthias Brugger <[email protected]>; V2 -> V3: 1. Check return value when platform_get_irq in hns_rcb_get_cfg; V1 -> V2: 1. Return appropriate errno in hns_mac_register_phy; ==================== Signed-off-by: David S. Miller <[email protected]>
2017-04-30net: hns: support deferred probe when no mdiolipeng1-6/+33
In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. We check for probe deferral in the mac init, so we not init DSAF when there is no mdio, and free all resource, to later learn that we need to defer the probe. Signed-off-by: lipeng <[email protected]> Reviewed-by: Yisen Zhuang <[email protected]> Reviewed-by: Matthias Brugger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30net: hns: support deferred probe when can not obtain irqlipeng3-3/+11
In the hip06 and hip07 SoCs, the interrupt lines from the DSAF controllers are connected to mbigen hw module. The mbigen module is probed with module_init, and, as such, is not guaranteed to probe before the HNS driver. So we need to support deferred probe. Signed-off-by: lipeng <[email protected]> Reviewed-by: Yisen Zhuang <[email protected]> Reviewed-by: Matthias Brugger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30Merge branch 'nfp-XDP_TX-optimizations'David S. Miller6-101/+118
Jakub Kicinski says: ==================== nfp: optimize XDP TX and small fixes This series optimizes the nfp XDP TX performance a little bit. I run quick tests on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. Single core/queue performance for both touch and drop and touch and forward is above 20Mpps @64B packets, drop being 2Mpps faster. I think this is max for a single queue on the low power NFPs. There are also a few minor fixes included for code in net-next. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: provide 256 bytes of XDP headroom in all configurationsJakub Kicinski2-14/+3
For legacy reasons NFP FW may be compiled to DMA packets to a constant offset into the buffer and use the space before it for metadata. This ensures that packets data always start at a certain offset regardless of the amount of preceding metadata. If rx offset is set to 0 there may still be up to 64 bytes of metadata but metadata will start at the beginning of the buffer, instead of: data_start_offset = rx_offset - meta_len Even though we make the buffers larger to accommodate up to 64 bytes of metadata, if there is only N bytes of metadata, we will end up with N bytes of headroom and 64 - N bytes of tailroom. Therefore we can't rely on that space for XDP headroom. Make sure we always allocate full 256 bytes. This, unfortunately, means we can't fit the headroom on an u8 any more. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: don't completely refuse to work with old flashesJakub Kicinski1-3/+1
Right now the required Service Process ABI version is still tied to max ID of known commands. For new NSP commands we are adding we are checking if NSP version is recent enough on command-by-command basis. The driver doesn't have to force the device to have the very latest flash, anything newer than 0.8 should do. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: avoid reading TX queue indexes from the deviceJakub Kicinski1-0/+6
Reading TX queue indexes from the device memory on each interrupt is expensive. It's doubly expensive with XDP running since we have two TX rings to check there. If the software indexes indicate that the TX queue is completely empty, however, we don't need to look at the device completion index at all. The queuing CPU is doing a wmb() before kicking the device TX so we should be safe to assume on the CPU handling the completions will never see old value of the software copy of the index. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: do simple XDP TX buffer recyclingJakub Kicinski2-57/+85
On the RX path we follow the "drop if allocation of replacement buffer fails" rule. With XDP we extended that to the TX action, so if XDP prog returned TX but allocation of replacement RX buffer failed, we will drop the packet. To improve our XDP TX performance extend the idea of rings being always full to XDP TX rings. Pre-fill the XDP TX rings with RX buffers, and when XDP prog returns TX action swap the RX buffer with the next buffer from the TX ring. XDP TX complete will no longer free the buffers but let them sit on the TX ring and wait for swap with RX buffer, instead. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: drop rx_ring param from buffer allocationJakub Kicinski1-6/+2
We will soon allocate RX buffers for caching on XDP TX rings. The rx_ring parameter passed to nfp_net_rx_alloc_one() is not actually used, remove it. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30nfp: replace -ENOTSUPP with -EOPNOTSUPPJakub Kicinski4-21/+21
As Or points out in commit 423b3aecf290 ("net/mlx4: Change ENOTSUPP to EOPNOTSUPP"), ENOTSUPP is NFS specific error. Replace it with EOPNOTSUPP. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30virtio-net: use netif_tx_napi_add for tx napiWillem de Bruijn1-2/+2
Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30net: Initialise init_net.count to 1David Howells1-1/+2
Initialise init_net.count to 1 for its pointer from init_nsproxy lest someone tries to do a get_net() and a put_net() in a process in which current->ns_proxy->net_ns points to the initial network namespace. Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30geneve: fix incorrect setting of UDP checksum flagGirish Moodalbail1-1/+1
Creating a geneve link with 'udpcsum' set results in a creation of link for which UDP checksum will NOT be computed on outbound packets, as can be seen below. 11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0 geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum Similarly, creating a link with 'noudpcsum' set results in a creation of link for which UDP checksum will be computed on outbound packets. Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Girish Moodalbail <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Acked-by: Lance Richardson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30Merge branch 'vxlan-disabled-ipv6'David S. Miller1-5/+7
Jiri Benc says: ==================== vxlan: do not error out on disabled IPv6 This patchset fixes a bug with metadata based tunnels when booted with ipv6.disable=1. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-04-30vxlan: do not output confusing error messageJiri Benc1-2/+0
The message "Cannot bind port X, err=Y" creates only confusion. In metadata based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and no error message should be printed. But when IPv6 tunnel was requested, such failure is fatal. The vxlan_socket_create does not know when the error is harmless and when it's not. Instead of passing such information down to vxlan_socket_create, remove the message completely. It's not useful. We propagate the error code up to the user space and the port number comes from the user space. There's nothing in the message that the process creating vxlan interface does not know. Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30vxlan: correctly handle ipv6.disable module parameterJiri Benc1-3/+7
When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole operation of bringing up the tunnel. Ignore failure of IPv6 socket creation for metadata based tunnels caused by IPv6 not being available. Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-04-30bnx2x: Get rid of useless temporary variableAndy Shevchenko1-9/+5
Replace pattern int status; ... status = func(...); return status; by return func(...); No functional change intented. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>