Age | Commit message | Author | Files | Lines
2017-07-03Documentation: fix wrong example commandMatteo Croce1-2/+2
In the IPVLAN documentation there is an example command line where the master and slave interface names are inverted. Fix the command line and also add the optional `name' keyword to better describe what the command is doing. v2: added commit message Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03vxlan: correctly set vxlan->net when creating the device in a netnsSabrina Dubroca1-3/+6
Commit a985343ba906 ("vxlan: refactor verification and application of configuration") modified vxlan device creation, and replaced the assignment of vxlan->net to src_net with dev_net(netdev) in ->setup(). But dev_net(netdev) is not the same as src_net. At the time ->setup() is called, dev_net hasn't been set yet, so we end up creating the socket for the vxlan device in init_net. Fix this by bringing back the assignment of vxlan->net during device creation. Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Matthias Schiffer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03Merge branch 'hns-phy-loopback'David S. Miller5-71/+92
Lin Yun Sheng says: ==================== Add loopback support in phy_driver and hns ethtool fix This patch set adds set_loopback to phy_driver and uses it to set up loopback when doing ethtool phy self_test. Patch V8: Respin the Patch based on net-next Patch V7: 1. Add comment why resume the phy in hns_nic_config_phy_loopback. 2. Fix a typo error in patch description. Patch V6: Fix Or'ing error code in __lb_setup. Patch V5: Removing non loopback related code change. Patch V4: 1. Remove c45 checking 2. Add -ENOTSUPP when function pointer is null, take mutex in phy_loopback. Patch V3: Calling phy_loopback enable and disable in pair in hns mac driver. Patch V2: 1. Add phy_loopback in phy_device.c. 2. Do error checking and do the read and write once in genphy_loopback. 3. Remove gen10g_loopback in phy_device.c. Patch V1: Initial Submit ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-03net: hns: Use phy_driver to setup Phy loopbackLin Yun Sheng2-71/+35
Use function set_loopback in phy_driver to setup phy loopback when doing ethtool self test. Signed-off-by: Lin Yun Sheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03net: phy: Add phy loopback support in net phy frameworkLin Yun Sheng3-0/+57
This patch adds set_loopback to phy_driver, which is used by the MAC driver to enable or disable phy loopback. It also adds a generic genphy_loopback function, which uses the BMCR loopback bit to enable or disable loopback. Signed-off-by: Lin Yun Sheng <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
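The read-modify-write on the BMCR loopback bit that this message describes can be sketched in user-space C. This is not the kernel's genphy_loopback: `fake_phy`, `fake_phy_read`, `fake_phy_write` and `sketch_loopback` are hypothetical names, and the PHY is modelled as a plain register array. The only fact taken from outside the message is that the loopback enable is bit 14 (0x4000) of register 0, as in the IEEE 802.3 clause 22 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MII_BMCR      0x00    /* basic mode control register index */
#define SKETCH_BMCR_LOOPBACK 0x4000  /* bit 14: loopback enable */

/* Hypothetical stand-in for a PHY: just 32 16-bit registers. */
struct fake_phy {
    uint16_t regs[32];
};

static uint16_t fake_phy_read(const struct fake_phy *phy, unsigned int reg)
{
    assert(reg < 32);
    return phy->regs[reg];
}

static void fake_phy_write(struct fake_phy *phy, unsigned int reg, uint16_t val)
{
    assert(reg < 32);
    phy->regs[reg] = val;
}

/* Set or clear only the loopback bit, leaving the other BMCR bits alone. */
static void sketch_loopback(struct fake_phy *phy, bool enable)
{
    uint16_t bmcr = fake_phy_read(phy, SKETCH_MII_BMCR);

    if (enable)
        bmcr |= SKETCH_BMCR_LOOPBACK;
    else
        bmcr &= (uint16_t)~SKETCH_BMCR_LOOPBACK;

    fake_phy_write(phy, SKETCH_MII_BMCR, bmcr);
}
```

A MAC driver's self-test would presumably call the enable and disable halves in a pair, as the V3 note in the cover letter suggests.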
2017-07-03net/mlx5: fix memcpy limit?Stephen Rothwell1-1/+1
Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03ipv6: dad: don't remove dynamic addresses if link is downSabrina Dubroca1-9/+9
Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03net: cdc_ncm: Reduce memory use when kernel memory lowJim Baxter2-12/+45
The CDC-NCM driver can require large amounts of memory to create skb's and this can be a problem when the memory becomes fragmented. This especially affects embedded systems that have constrained resources but wish to maximise the throughput of CDC-NCM with 16KiB NTB's. The issue is after running for a while the kernel memory can become fragmented and it needs compacting. If the NTB allocation is needed before the memory has been compacted the atomic allocation can fail which can cause increased latency, large re-transmissions or disconnections depending upon the data being transmitted at the time. This situation occurs for less than a second until the kernel has compacted the memory but the failed devices can take a lot longer to recover from the failed TX packets. To ease this temporary situation I modified the CDC-NCM TX path to temporarily switch into a reduced memory mode which allocates an NTB that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes) sized memory block and only transmit NTB's with a single network frame until the memory situation is resolved. Each time this issue occurs we wait for an increasing number of reduced size allocations before requesting a full size one to not put additional pressure on a low memory system. Once the memory is compacted the CDC-NCM data can resume transmitting at the normal tx_max rate once again. Signed-off-by: Jim Baxter <[email protected]> Reviewed-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>
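The back-off described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the driver's code: `tx_state`, `next_ntb_size` and the cap `SKETCH_BACKOFF_CAP` are hypothetical, allocation success is passed in as a flag instead of calling an allocator, and the sketch never resets the back-off, which the message does not specify either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NTB_MIN_SIZE 2048u  /* default of USB_CDC_NCM_NTB_MIN_OUT_SIZE per the message */
#define SKETCH_BACKOFF_CAP  10u    /* assumption: upper bound on the back-off */

struct tx_state {
    size_t full_size;      /* normal NTB size, e.g. 16 KiB */
    unsigned int wait;     /* reduced-size allocations left before retrying full size */
    unsigned int backoff;  /* how long the most recent failure made us wait */
};

/*
 * Decide the size of the next NTB. full_alloc_ok models whether an atomic
 * allocation of full_size would succeed right now.
 */
size_t next_ntb_size(struct tx_state *st, bool full_alloc_ok)
{
    assert(st != NULL);

    if (st->wait == 0) {
        if (full_alloc_ok)
            return st->full_size;

        /* Full-size allocation failed: wait a little longer each time. */
        if (st->backoff < SKETCH_BACKOFF_CAP)
            st->backoff++;
        st->wait = st->backoff;
    }

    /* Reduced memory mode: one small NTB, one step closer to a retry. */
    st->wait--;
    return SKETCH_NTB_MIN_SIZE;
}
```

The point of the growing wait is that a system still short of contiguous memory is asked for a full-size buffer less and less often, while small NTBs keep traffic moving.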
2017-07-03Merge branch 'qed-Add-iWARP-support-for-QL4xxxx'David S. Miller18-100/+3008
Michal Kalderon says: ==================== qed: Add iWARP support for QL4xxxx This patch series adds iWARP support to our QL4xxxx networking adapters. The code changes span across qed and qedr drivers, but this series contains changes to qed only. Once the series is accepted, the qedr series will be submitted to the rdma tree. There is one additional qed patch which enables the iWARP, this patch is delayed until the qedr series will be accepted. The patches were previously sent as an RFC, and these are the first 12 patches in the RFC series: https://www.spinics.net/lists/linux-rdma/msg51416.html This series was tested and built against net-next. MAINTAINERS file is not updated in this PATCH as there is a pending patch for qedr driver update https://patchwork.kernel.org/patch/9752761. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Add iWARP support for physical queue allocationKalderon, Michal1-0/+4
iWARP has different physical queue requirements than RoCE Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Add iWARP protocol support in context allocationKalderon, Michal1-2/+11
When computing how much memory is required for the different hw clients iWARP protocol should be taken into account Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP CM add error handlingKalderon, Michal2-2/+190
This patch introduces error handling for errors that occurred during connection establishment. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP implement disconnect flowsKalderon, Michal2-1/+91
This patch takes care of active/passive disconnect flows. Disconnect flows can be initiated remotely, in which case an async event will arrive from the peer and be indicated to the qedr driver. These are referred to as exceptions. When a QP is destroyed, it needs to check that its associated ep has been closed. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP CM add active side connectKalderon, Michal4-12/+265
This patch implements the active side connect. Offload a connection, process MPA reply and send RTR. In some of the common passive/active functions, the active side will work in blocking mode. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP CM add passive side connectKalderon, Michal9-20/+1039
This patch implements the passive side connect. It addresses pre-allocating resources, creating a connection element upon valid SYN packet received. Calling upper layer and implementation of the accept/reject calls. Error handling is not part of this patch. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP CM add listener functions and initial SYN processingKalderon, Michal4-3/+343
This patch adds the ability to add and remove listeners and identify whether the SYN packet received is intended for iWARP or not. If a listener is not found the SYN packet is posted back to the chip. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: iWARP CM - setup a ll2 connection for handling SYN packetsKalderon, Michal2-3/+220
iWARP handles incoming SYN packets using the ll2 interface. This patch implements ll2 setup and teardown. Additional ll2 connections will be used in the future which are not part of this patch series. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Add iWARP support in ll2 connectionsKalderon, Michal2-2/+12
Add a new connection type for iWARP ll2 connections for setting correct ll2 filters and connection type to FW. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Rename some ll2 related definesKalderon, Michal3-17/+16
Make some names more generic as they will be used by iWARP too. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Implement iWARP initialization, teardown and qp operationsKalderon, Michal11-40/+803
This patch adds iWARP support for flows that have common code between RoCE and iWARP, such as initialization, teardown and qp setup verbs: create, destroy, modify, query. It introduces the iWARP specific files qed_iwarp.[ch] and iwarp_common.h Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-03qed: Introduce iWARP personalityKalderon, Michal7-27/+43
iWARP personality introduced the need for differentiating in several places in the code whether we are RoCE, iWARP or either. This leads to introducing new macros for querying the personality. Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-02Linux 4.12Linus Torvalds1-1/+1
2017-07-02moduleparam: fix doc: hwparam_irq configures an IRQSylvain 'ythier' Hitier1-1/+1
Signed-off-by: Sylvain 'ythier' Hitier <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-02bpf: fix to bpf_setsockopsLawrence Brakmo1-2/+1
Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1 statement up). Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-02parisc: Report SIGSEGV instead of SIGBUS when running out of stackHelge Deller1-1/+1
When a process runs out of stack the parisc kernel wrongly faults with SIGBUS instead of the expected SIGSEGV signal. This example shows how the kernel faults: do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000] trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000 The vma->vm_end value is the first address which does not belong to the vma, so adjust the check to include vma->vm_end to the range for which to send the SIGSEGV signal. This patch unbreaks building the debian libsigsegv package. Cc: [email protected] Signed-off-by: Helge Deller <[email protected]>
2017-07-02parisc: use compat_sys_keyctl()Eric Biggers1-1/+1
Architectures with a compat syscall table must put compat_sys_keyctl() in it, not sys_keyctl(). The parisc architecture was not doing this; fix it. Cc: [email protected] Signed-off-by: Eric Biggers <[email protected]> Acked-by: Helge Deller <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2017-07-02Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds7-16/+33
Pull MIPS fixes from Ralf Baechle: "Here's a final round of fixes for 4.12: - Fix misordered instructions in assembly code making kernel startup via UHB unreliable. - Fix special case of MADDF and MSUBF emulation. - Fix alignment issue in address calculation in pm-cps on 64 bit. - Fix IRQ tracing & lockdep when rescheduling - Systems with MAARs require post-DMA cache flushes. The reordering fix and the MADDF/MSUBF fix have sat in linux-next for a number of days. The others haven't propagated from my pull tree to linux-next yet but all have survived manual testing and Imagination's automated test system and there are no pending bug reports" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Avoid accidental raw backtrace MIPS: Perform post-DMA cache flushes on systems with MAARs MIPS: Fix IRQ tracing & lockdep when rescheduling MIPS: pm-cps: Drop manual cache-line alignment of ready_count MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately MIPS: head: Reorder instructions missing a delay slot
2017-07-02Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-4/+4
Pull ARM fix from Russell King: "One final fix for 4.12 - Doug found a boot failure case triggered by requesting a non-even MB vmalloc size" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8685/1: ensure memblock-limit is pmd-aligned
2017-07-02locking/refcount: Remove the half-implemented refcount_sub() APIKees Cook1-6/+0
CONFIG_REFCOUNT_FULL=y (correctly) does not provide a refcount_sub(), which should not be part of proper refcount design patterns. Remove the erroneous extern and the later !CONFIG_REFCOUNT_FULL accidental implementation. Signed-off-by: Kees Cook <[email protected]> Cc: Elena Reshetova <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 29dee3c03abc ("locking/refcounts: Out-of-line everything") Link: http://lkml.kernel.org/r/20170701180129.GA17405@beast Signed-off-by: Ingo Molnar <[email protected]>
2017-07-01Merge branch 'bpf-Add-support-for-sock_ops'David S. Miller25-26/+1218
Lawrence Brakmo says: ==================== bpf: Add support for sock_ops Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.) and to set connection parameters such as buffer sizes, initial window, SYN/SYN-ACK RTOs, etc. Unlike current BPF program types that expect to be called at a particular place in the network stack code, SOCK_OPS programs can be called at different places and use an "op" field to indicate the context. There are currently two types of operations, those whose effect is through their return value and those whose effect is through the new bpf_setsockopt BPF helper function. Example operands of the first type are: BPF_SOCK_OPS_TIMEOUT_INIT BPF_SOCK_OPS_RWND_INIT BPF_SOCK_OPS_NEEDS_ECN Example operands of the second type are: BPF_SOCK_OPS_TCP_CONNECT_CB BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB Current operands are only called during connection establishment so there should not be any BPF overheads after connection establishment. The main idea is to use connection information from both hosts, such as IP addresses and ports, to allow setting of per connection parameters to optimize the connection's performance. Although there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some distinct advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows).
Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it does not require application changes and it can be updated easily at any time. It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called only once during the connection's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has an "op" field to specify the type of operation requested. This patch set also includes sample BPF programs to demonstrate the different features. v2: Formatting changes, rebased to latest net-next v3: Fixed build issues, changed socket_ops to sock_ops throughout, fixed formatting issues, removed the syscall to load sock_ops program and added functionality to use existing bpf attach and bpf detach system calls, removed reader/writer locks in sock_bpfops.c (used when saving sock_ops global program) and fixed missing module refcount increment. v4: Removed global sock_ops program and instead used existing cgroup bpf infrastructure to support a new BPF_CGROUP_ATTCH type.
v5: fixed kbuild warning happening in bpf-cgroup.h; removed automatic conversion to host byte order from some sock_ops fields (ipv4 and ipv6 addresses, remote port); added conversion to host byte order in some of the sample programs; added to sample BPF program comments about using load_sock_ops to load; removed is_req_sock field from bpf_sock_ops_kern and related places, using sk_fullsock() instead. v6: fixes to BPF helper function setsockopt (possible NULL dereferencing, etc.) ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: update tools/include/uapi/linux/bpf.hLawrence Brakmo1-1/+65
Update tools/include/uapi/linux/bpf.h to include changes related to new bpf sock_ops program type. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample bpf program to set sndcwnd clampLawrence Brakmo2-0/+103
Sample BPF program, tcp_clamp_kern.c, to demonstrate the use of setting the sndcwnd clamp. This program assumes that if the first 5.5 bytes of the hosts' IPv6 addresses are the same, then the hosts are in the same datacenter and sets sndcwnd clamp to 100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer sizes to 150KB. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Adds support for setting sndcwnd clampLawrence Brakmo2-0/+8
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which sets the sndcwnd clamp (the upper bound on the send congestion window). It is useful to limit the sndcwnd when the hosts are close to each other (small RTT). Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample BPF program to set initial cwndLawrence Brakmo2-0/+89
Sample BPF program that assumes hosts are far away (i.e. large RTTs) and sets initial cwnd and initial receive window to 40 packets, send and receive buffers to 1.5MB. In practice there would be a test to ensure the hosts are actually far enough away. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Adds support for setting initial cwndLawrence Brakmo2-1/+19
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the initial congestion window. This can be used when the hosts are far apart (large RTTs) and it is safe to start with a large initial cwnd. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample BPF program to set congestion controlLawrence Brakmo2-0/+84
Sample BPF program that sets congestion control to dctcp when both hosts are within the same datacenter. In this example that is assumed to be the case when the first 5.5 bytes of their IPv6 addresses are the same. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Add support for changing congestion controlLawrence Brakmo7-17/+58
Added support for changing congestion control for SOCK_OPS bpf programs through the setsockopt bpf helper function. It also adds a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for congestion controls, like dctcp, that need to enable ECN in the SYN packets. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample BPF program to set buffer sizesLawrence Brakmo2-0/+87
This patch contains a BPF program to set initial receive window to 40 packets and send and receive buffers to 1.5MB. This would usually be done after doing appropriate checks that indicate the hosts are far enough away (i.e. large RTT). Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Add TCP connection BPF callbacksLawrence Brakmo4-1/+15
Added callbacks to BPF SOCK_OPS type program before an active connection is initialized and after a passive or active connection is established. The following patch demonstrates how they can be used to set send and receive buffer sizes. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Add setsockopt helper function to bpfLawrence Brakmo3-2/+94
Added support for calling a subset of socket setsockopts from BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather than making the changes to call the socket setsockopt function because the changes required would have been larger. The ops supported are: SO_RCVBUF SO_SNDBUF SO_MAX_PACING_RATE SO_PRIORITY SO_RCVLOWAT SO_MARK Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample bpf program to set initial windowLawrence Brakmo2-0/+70
The sample bpf program, tcp_rwnd_kern.c, sets the initial advertized window to 40 packets in an environment where distinct IPv6 prefixes indicate that both hosts are not in the same data center. Signed-off-by: Lawrence Brakmo <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Support for setting initial receive windowLawrence Brakmo4-2/+28
This patch adds support for setting the initial advertized window from within a BPF_SOCK_OPS program. This can be used to support larger initial cwnd values in environments where it is known to be safe. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Sample bpf program to set SYN/SYN-ACK RTOsLawrence Brakmo2-0/+70
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK RTOs to 10ms when both hosts are within the same datacenter (i.e. small RTTs) in an environment where common IPv6 prefixes indicate both hosts are in the same data center. Signed-off-by: Lawrence Brakmo <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: Support for per connection SYN/SYN-ACK RTOsLawrence Brakmo4-2/+17
This patch adds support for setting per-connection SYN and SYN-ACK RTOs from within a BPF_SOCK_OPS program. For example, to set small RTOs when it is known both hosts are within a datacenter. Signed-off-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: program to load and attach sock_ops BPF progsLawrence Brakmo2-0/+100
The program load_sock_ops can be used to load sock_ops bpf programs and to attach them to an existing (v2) cgroup. It can also be used to detach sock_ops programs. Examples: load_sock_ops [-l] <cg-path> <prog filename> Loads and attaches a sock_ops program at the specified cgroup. If "-l" is used, the program will continue to run to output the BPF log buffer. If the specified filename does not end in ".o", it appends "_kern.o" to the name. load_sock_ops -r <cg-path> Detaches the currently attached sock_ops program from the specified cgroup. Signed-off-by: Lawrence Brakmo <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01bpf: BPF support for sock_opsLawrence Brakmo9-3/+314
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.). It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. The program will be called at appropriate times to set relevant connection parameters such as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection information such as IP addresses, port numbers, etc. Although there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some distinct advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it does not require application changes and it can be updated easily at any time. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) because the existing type expects to be called only once during the connection's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has an "op" field to specify the type of operation requested.
The purpose of this new program type is to simplify setting connection parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is easy to use facebook's internal IPv6 addresses to determine if both hosts of a connection are in the same datacenter. Therefore, it is easy to write a BPF program to choose a small SYN RTO value when both hosts are in the same datacenter. This patch only contains the framework to support the new BPF program type; following patches add the functionality to set various connection parameters. This patch defines a new BPF program type: BPF_PROG_TYPE_SOCK_OPS and a new bpf syscall command to load a new program of this type: BPF_PROG_LOAD_SOCKET_OPS. Two new corresponding structs (one for the kernel, one for the user/BPF program):

    /* kernel version */
    struct bpf_sock_ops_kern {
            struct sock *sk;
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
    };

    /* user version
     * Some fields are in network byte order reflecting the sock struct
     * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
     * convert them to host byte order.
     */
    struct bpf_sock_ops {
            __u32 op;
            union {
                    __u32 reply;
                    __u32 replylong[4];
            };
            __u32 family;
            __u32 remote_ip4;     /* In network byte order */
            __u32 local_ip4;      /* In network byte order */
            __u32 remote_ip6[4];  /* In network byte order */
            __u32 local_ip6[4];   /* In network byte order */
            __u32 remote_port;    /* In network byte order */
            __u32 local_port;     /* In host byte order */
    };

Currently there are two types of ops. The first type expects the BPF program to return a value which is then used by the caller (or a negative value to indicate the operation is not supported). The second type expects state changes to be done by the BPF program, for example through a setsockopt BPF helper function, and they ignore the return value. The reply fields of the bpf_sock_ops struct are there in case a bpf program needs to return a value larger than an integer.
Signed-off-by: Lawrence Brakmo <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller10-39/+60
Johan Hedberg says: ==================== pull request: bluetooth-next 2017-07-01 Here are some more Bluetooth patches for the 4.13 kernel: - Added support for Broadcom BCM43430 controllers - Added sockaddr length checks before accessing sa_family - Fixed possible "might sleep" errors in bnep, cmtp and hidp modules - A few other minor fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-01sctp: Add peeloff-flags socket optionNeil Horman2-15/+78
Based on a request raised on the sctp devel list, there is a need to augment the sctp_peeloff operation while specifying the O_CLOEXEC and O_NONBLOCK flags (similar to the socket syscall). Since modifying the SCTP_SOCKOPT_PEELOFF socket option would break user space ABI for existing programs, this patch creates a new socket option SCTP_SOCKOPT_PEELOFF_FLAGS, which accepts a third flags parameter to allow atomic assignment of the socket descriptor flags. Tested successfully by myself and the requestor. Signed-off-by: Neil Horman <[email protected]> CC: Vlad Yasevich <[email protected]> CC: "David S. Miller" <[email protected]> CC: Andreas Steinmetz <[email protected]> CC: Marcelo Ricardo Leitner <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-07-01Merge branch 'sfc-MCDI-cleanups'David S. Miller1-3/+4
Edward Cree says: ==================== sfc: small MCDI cleanups Giving the full MCDI event rather than just the code can aid in debugging. While fixing this I noticed an outdated comment. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-07-01sfc: correct comment on efx_mcdi_process_eventEdward Cree1-1/+1
Fix out-of-date comment. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>