aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-11-14bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZEROYonghong Song1-2/+6
The helper bpf_probe_read arg2 type is changed from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit size-0 buffer. Together with newer ARG_CONST_SIZE_OR_ZERO semantics which allows non-NULL buffer with size 0, this allows simpler bpf programs with verifier acceptance. The previous commit which changes ARG_CONST_SIZE_OR_ZERO semantics has details on examples. Signed-off-by: Yonghong Song <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semanticsYonghong Song1-16/+24
For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the access size to be 0 when accessing the previous argument (arg). Right now, it requires the arg needs to be NULL when size passed is 0 or could be 0. It also requires a non-NULL arg when the size is proved to be non-0. This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior such that for size-0 or possible size-0, it is not required the arg equal to NULL. There are a couple of reasons for this semantics change, and all of them intends to simplify user bpf programs which may improve user experience and/or increase chances of verifier acceptance. Together with the next patch which changes bpf_probe_read arg2 type from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO, the following two examples, which fail the verifier currently, are able to get verifier acceptance. Example 1: unsigned long len = pend - pstart; len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len; len &= MAX_PAYLOAD_LEN; bpf_probe_read(data->payload, len, pstart); It does not have test for "len > 0" and it failed the verifier. Users may not be aware that they have to add this test. Converting the bpf_probe_read helper to have ARG_CONST_SIZE_OR_ZERO helps the above code get verifier acceptance. Example 2: Here is one example where llvm "messed up" the code and the verifier fails. ...... unsigned long len = pend - pstart; if (len > 0 && len <= MAX_PAYLOAD_LEN) bpf_probe_read(data->payload, len, pstart); ...... The compiler generates the following code and verifier fails: ...... 39: (79) r2 = *(u64 *)(r10 -16) 40: (1f) r2 -= r8 41: (bf) r1 = r2 42: (07) r1 += -1 43: (25) if r1 > 0xffe goto pc+3 R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff)) R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0) R8=inv(id=0) R9=inv0 R10=fp0 44: (bf) r1 = r6 45: (bf) r3 = r8 46: (85) call bpf_probe_read#45 R2 min value is negative, either use unsigned or 'var &= const' ...... The compiler optimization is correct. If r1 = 0, r1 - 1 = 0xffffffffffffffff > 0xffe. If r1 != 0, r1 - 1 will not wrap. r1 > 0xffe at insn #43 can actually capture both "r1 > 0" and "len <= MAX_PAYLOAD_LEN". This however causes an issue in verifier as the value range of arg2 "r2" does not properly get refined and lead to verification failure. Relaxing bpf_prog_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO allows the following simplied code: unsigned long len = pend - pstart; if (len <= MAX_PAYLOAD_LEN) bpf_probe_read(data->payload, len, pstart); The llvm compiler will generate less complex code and the verifier is able to verify that the program is okay. Signed-off-by: Yonghong Song <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14tcp: allow drivers to tweak TSQ logicEric Dumazet3-2/+5
I had many reports that TSQ logic breaks wifi aggregation. Current logic is to allow up to 1 ms of bytes to be queued into qdisc and drivers queues. But Wifi aggregation needs a bigger budget to allow bigger rates to be discovered by various TCP Congestion Controls algorithms. This patch adds an extra socket field, allowing wifi drivers to select another log scale to derive TCP Small Queue credit from current pacing rate. Initial value is 10, meaning that this patch does not change current behavior. We expect wifi drivers to set this field to smaller values (tests have been done with values from 6 to 9) They would have to use following template : if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT) skb->sk->sk_pacing_shift = MY_PACING_SHIFT; Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041 Signed-off-by: Eric Dumazet <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Toke Høiland-Jørgensen <[email protected]> Cc: Kir Kolyshkin <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14Merge tag 'rxrpc-next-20171111' of ↵David S. Miller7-7/+36
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes Here are some patches that fix some things in AF_RXRPC: (1) Prevent notifications from being passed to a kernel service for a call that it has ended. (2) Fix a null pointer deference that occurs under some circumstances when an ACK is generated. (3) Fix a number of things to do with call expiration. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-14bnx2x: fix slowpath null crashZhu Yanjun1-3/+10
When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs, BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task, bnx2x_nic_unload and bnx2x_nic_load are executed to shutdown and open NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem allocates dma failure. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't allocate memory" pops out. The variable slowpath is set to NULL. When shutdown the NIC, the function bnx2x_nic_unload is called. In the function bnx2x_nic_unload, the following functions are executed. bnx2x_chip_cleanup bnx2x_set_storm_rx_mode bnx2x_set_q_rx_mode bnx2x_set_q_rx_mode bnx2x_config_rx_mode bnx2x_set_rx_mode_e2 In the function bnx2x_set_rx_mode_e2, the variable slowpath is operated. Then the crash occurs. To fix this crash, the variable slowpath is checked. And in the function bnx2x_sp_rtnl_task, after dma memory allocation fails, another shutdown and open NIC is executed. CC: Joe Jin <[email protected]> CC: Junxiao Bi <[email protected]> Signed-off-by: Zhu Yanjun <[email protected]> Acked-by: Ariel Elior <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14Merge branch 'cxgb4-collect-LE-TCAM-and-SGE-queue-contexts'David S. Miller9-0/+456
Rahul Lakkireddy says: ==================== cxgb4: collect LE-TCAM and SGE queue contexts Collect hardware dumps via ethtool --get-dump facility. Patch 1 collects LE-TCAM dump. Patch 2 collects SGE queue context dumps. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-14cxgb4: collect SGE queue context dumpRahul Lakkireddy9-0/+195
Collect SGE freelist queue and congestion manager contexts. Signed-off-by: Rahul Lakkireddy <[email protected]> Signed-off-by: Ganesh Goudar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14cxgb4: collect LE-TCAM dumpRahul Lakkireddy6-0/+261
Signed-off-by: Rahul Lakkireddy <[email protected]> Signed-off-by: Ganesh Goudar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14vxlan: fix the issue that neigh proxy blocks all icmpv6 packetsXin Long1-18/+13
Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset") removed icmp6_code and icmp6_type check before calling neigh_reduce when doing neigh proxy. It means all icmpv6 packets would be blocked by this, not only ns packet. In Jianlin's env, even ping6 couldn't work through it. This patch is to bring the icmp6_code and icmp6_type check back and also removed the same check from neigh_reduce(). Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset") Reported-by: Jianlin Shi <[email protected]> Signed-off-by: Xin Long <[email protected]> Reviewed-by: Vincent Bernat <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14xfrm6_tunnel: exit_net cleanup check addedVasily Averin1-0/+8
Be sure that spi_byaddr and spi_byspi arrays initialized in net_init hook were return to initial state Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14ppp: exit_net cleanup checks addedVasily Averin1-0/+2
Be sure that lists initialized in net_init hook were return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14phonet: exit_net cleanup check addedVasily Averin1-0/+3
Be sure that pndevs.list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14l2tp: exit_net cleanup check addedVasily Averin1-0/+4
Be sure that l2tp_session_hlist array initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14fib_rules: exit_net cleanup check addedVasily Averin1-0/+6
Be sure that rules_ops list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14fib_notifier: exit_net cleanup check addedVasily Averin1-0/+6
Be sure that fib_notifier_ops list initilized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14netdev: exit_net cleanup check addedVasily Averin1-0/+2
Be sure that dev_base_head list initialized in net_init hook was return to initial state Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14vxlan: exit_net cleanup checks addedVasily Averin1-0/+4
Be sure that sock_list array initialized in net_init hook was return to initial state Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14packet: exit_net cleanup check addedVasily Averin1-0/+1
Be sure that packet.sklist initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14geneve: exit_net cleanup check addedVasily Averin1-0/+1
Be sure that sock_list initialized in net_init hook was return to initial state. Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14af_key: replace BUG_ON on WARN_ON in net_exit hookVasily Averin1-1/+1
Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-14net: dsa: Fix dependencies on bridgeAndrew Lunn1-0/+1
DSA now uses one of the symbols exported by the bridge, br_vlan_enabled(). This has a stub, if the bridge is not enabled. However, if the bridge is enabled, we cannot have DSA built in and the bridge as a module, otherwise we get undefined symbols at link time: net/dsa/port.o: In function `dsa_port_vlan_add': net/dsa/port.c:255: undefined reference to `br_vlan_enabled' net/dsa/port.o: In function `dsa_port_vlan_del': net/dsa/port.c:270: undefined reference to `br_vlan_enabled' Reported-by: kbuild test robot <[email protected]> Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13Merge branch 'net-improve-the-process-of-redirect-and-toobig-for-ipv6-tunnels'David S. Miller2-50/+34
Xin Long says: ==================== net: improve the process of redirect and toobig for ipv6 tunnels Now let's say there are 3 kinds of icmp packets to process for tunnels, toobig(needfrag), redirect, others, their process should be: - toobig(needfrag) update the lower dst's pmtu by route cache, also update sk dst's pmtu if possible, or it will be fine if sk dst pmtu will get updated on tx path. - redirect update the lower dst's gw by route cache and return, no need to send this redirect packet to user sk. - others send the packet to user's sk, or it will also be fine to use err_count to count it and report fail link on tx path. All ipv4 tunnels basically follow this while some of ipv6 tunnels are doing in different ways, like ip6gre and ip6_tunnels update tnl dev's mtu instead of updating lower dst pmtu, no redirect process on their err_handlers, which doesn't make any sense and even causes performance problems. This patchset is to improve the process of redirect and toobig for ip6gre ip4ip6, ip6ip6 tunnels, as in ipv4 tunnels. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-13ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlersXin Long1-28/+14
This patch is to remove some useless codes of redirect and fix some indents on ip4ip6 and ip6ip6's err_handlers. Note that redirect icmp packet is already processed in ip6_tnl_err, the old redirect codes in ip4ip6_err actually never worked even before this patch. Besides, there's no need to send redirect to user's sk, it's for lower dst, so just remove it in this patch. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13ip6_tunnel: process toobig in a better wayXin Long1-4/+3
The same improvement in "ip6_gre: process toobig in a better way" is needed by ip4ip6 and ip6ip6 as well. Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their err_handlers. Like I said before, gre6 could not do this as it's inner proto is not certain. But for all of them, sk dst pmtu will be updated in tx path if in need. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13ip6_tunnel: add the process for redirect in ip6_tnl_errXin Long1-5/+10
The same process for redirect in "ip6_gre: add the process for redirect in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13ip6_gre: process toobig in a better wayXin Long1-13/+2
Now ip6gre processes toobig icmp packet by setting gre dev's mtu in ip6gre_err, which would cause few things not good: - It couldn't set mtu with dev_set_mtu due to it's not in user context, which causes route cache and idev->cnf.mtu6 not to be updated. - It has to update sk dst pmtu in tx path according to gredev->mtu for ip6gre, while it updates pmtu again according to lower dst pmtu in ip6_tnl_xmit. - To change dev->mtu by toobig icmp packet is not a good idea, it should only work on pmtu. This patch is to process toobig by updating the lower dst's pmtu, as later sk dst pmtu will be updated in ip6_tnl_xmit, the same way as in ip4gre. Note that gre dev's mtu will not be updated any more, it doesn't make any sense to change dev's mtu after receiving a toobig packet. Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13ip6_gre: add the process for redirect in ip6gre_errXin Long1-0/+5
This patch is to add redirect icmp packet process for ip6gre by calling ip6_redirect() in ip6gre_err(), as in vti6_err. Prior to this patch, there's even no route cache generated after receiving redirect. Reported-by: Jianlin Shi <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13forcedeth: remove redudant assignments in xmitZhu Yanjun1-8/+20
In xmit process, the variables are set many times. In fact, it is enough for these variables to be set once. After a long time test, the throughput performance is better than before. CC: Srinivas Eeda <[email protected]> CC: Joe Jin <[email protected]> CC: Junxiao Bi <[email protected]> Signed-off-by: Zhu Yanjun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13Merge tag 'nfc-next-4.15-1' of ↵David S. Miller17-42/+64
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.15 pull request This is the NFC pull request for 4.15. We have: - A new netlink command for explicitly deactivating NFC targets - i2c constification for all NFC drivers - One NFC device allocation error path fix ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-13Merge branch 'Openvswitch-meter-action'David S. Miller8-31/+772
Andy Zhou says: ==================== Openvswitch meter action This patch series is the first attempt to add openvswitch meter support. We have previously experimented with adding metering support in nftables. However 1) It was not clear how to expose a named nftables object cleanly, and 2) the logic that implements metering is quite small, < 100 lines of code. With those two observations, it seems cleaner to add meter support in the openvswitch module directly. --- v1(RFC)->v2: remove unused code improve locking and other review comments v2 -> v3: rebase v3 -> v4: fix undefined "__udivdi3" references on 32 bit builds. use div_u64() instead. v4 -> v5: rebase ==================== Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13openvswitch: Add meter action supportAndy Zhou4-0/+16
Implements OVS kernel meter action support. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13openvswitch: Add meter infrastructureAndy Zhou5-2/+674
OVS kernel datapath so far does not support Openflow meter action. This is the first stab at adding kernel datapath meter support. This implementation supports only drop band type. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13openvswitch: export get_dp() API.Andy Zhou2-29/+31
Later patches will invoke get_dp() outside of datapath.c. Export it. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13openvswitch: Add meter netlink definitionsAndy Zhou1-0/+51
Meter has its own netlink family. Define netlink messages and attributes for communicating with the user space programs. Signed-off-by: Andy Zhou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13Merge branch 'dsa-b53-Support-prepended-Broadcom-tags'David S. Miller18-44/+109
Florian Fainelli says: ==================== net: dsa: b53: Support prepended Broadcom tags This patch series adds support for prepended 4-bytes Broadcom tags that we already support. This type of tag will typically be used when interfaced to a SoC like BCM58xx (NorthStar Plus) which supports a Flow Accelerator (WIP). In that case, we need to support a slightly different tagging format. The first patch does a bit of re-factoring and passes a port index to the get_tag_protocol() function since at least two different drivers need that type of information (mt7530, b53) to support tagging or not. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: b53: Support prepended Broadcom tagsFlorian Fainelli2-4/+11
On BCM58xx devices (Northstar Plus), there is an accelerator attached to port 8 which would only work if we use prepended Broadcom tags. Resolve that difference in our get_tag_protocol() function by setting the appropriate tagging protocol in that case. We need to change b53_brcm_hdr_setup() a little bit now since we can deal with two types of Broadcom tags. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: Support prepended Broadcom tagFlorian Fainelli6-7/+41
Add a new type: DSA_TAG_PROTO_PREPEND which allows us to support for the 4-bytes Broadcom tag that we already support, but in a format where it is pre-pended to the packet instead of located between the MAC SA and the Ethertyper (DSA_TAG_PROTO_BRCM). Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: tag_brcm: Prepare for supporting prepended tagFlorian Fainelli1-11/+34
In preparation for supporting the same Broadcom tag format, but instead of inserted between the MAC SA and EtherType, prepended to the Ethernet frame, restructure the code a little bit to make that possible and take an offset parameter. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: Pass a port to get_tag_protocol()Florian Fainelli12-31/+32
A number of drivers want to check whether the configured CPU port is a possible configuration for enabling tagging, pass down the CPU port number so they verify that. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issueAndrew Morton1-5/+9
gcc-4.4.4 (at lest) has issues with initializers and anonymous unions: net/sched/sch_red.c: In function 'red_dump_offload': net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast net/sched/sch_red.c: In function 'red_dump_stats': net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast Work around this. Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") Cc: Nogah Frankel <[email protected]> Cc: Jiri Pirko <[email protected]> Cc: Simon Horman <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devicesSlava Shwartsman2-0/+10
Since Mellanox focus is on newer adapters, we would like to have the ability to disable the support for old gen2 adapters. This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag. We keep it turned on by default. Signed-off-by: Slava Shwartsman <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13af_netlink: ensure that NLMSG_DONE never fails in dumpsJason A. Donenfeld2-6/+12
The way people generally use netlink_dump is that they fill in the skb as much as possible, breaking when nla_put returns an error. Then, they get called again and start filling out the next skb, and again, and so forth. The mechanism at work here is the ability for the iterative dumping function to detect when the skb is filled up and not fill it past the brim, waiting for a fresh skb for the rest of the data. However, if the attributes are small and nicely packed, it is possible that a dump callback function successfully fills in attributes until the skb is of size 4080 (libmnl's default page-sized receive buffer size). The dump function completes, satisfied, and then, if it happens to be that this is actually the last skb, and no further ones are to be sent, then netlink_dump will add on the NLMSG_DONE part: nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); It is very important that netlink_dump does this, of course. However, in this example, that call to nlmsg_put_answer will fail, because the previous filling by the dump function did not leave it enough room. And how could it possibly have done so? All of the nla_put variety of functions simply check to see if the skb has enough tailroom, independent of the context it is in. In order to keep the important assumptions of all netlink dump users, it is therefore important to give them an skb that has this end part of the tail already reserved, so that the call to nlmsg_put_answer does not fail. Otherwise, library authors are forced to find some bizarre sized receive buffer that has a large modulo relative to the common sizes of messages received, which is ugly and buggy. This patch thus saves the NLMSG_DONE for an additional message, for the case that things are dangerously close to the brim. This requires keeping track of the errno from ->dump() across calls. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13Merge branch 'netem-add-nsec-scheduling-and-slot-feature'David S. Miller2-29/+121
Dave Taht says: ==================== netem: add nsec scheduling and slot feature This patch series converts netem away from the old "ticks" interface and userspace API, and adds support for a new "slot" feature intended to emulate bursty macs such as WiFi and LTE better. Changes since v2: Use u64 for packet_len_sched_time() Use simpler max(time_to_send,q->slot.slot_next) Changes since v1: Always pass new nanosecond APIs to userspace ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-13netem: support delivering packets in delayed time slotsDave Taht2-3/+79
Slotting is a crude approximation of the behaviors of shared media such as cable, wifi, and LTE, which gather up a bunch of packets within a varying delay window and deliver them, relative to that, nearly all at once. It works within the existing loss, duplication, jitter and delay parameters of netem. Some amount of inherent latency must be specified, regardless. The new "slot" parameter specifies a minimum and maximum delay between transmission attempts. The "bytes" and "packets" parameters can be used to limit the amount of information transferred per slot. Examples of use: tc qdisc add dev eth0 root netem delay 200us \ slot 800us 10ms bytes 64k packets 42 A more correct example, using stacked netem instances and a packet limit to emulate a tail drop wifi queue with slots and variable packet delivery, with a 200Mbit isochronous underlying rate, and 20ms path delay: tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \ limit 10000 tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \ slot 800us 10ms bytes 64k packets 42 limit 512 Signed-off-by: Dave Taht <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13netem: add uapi to express delay and jitter in nanosecondsDave Taht2-0/+16
netem userspace has long relied on a horrible /proc/net/psched hack to translate the current notion of "ticks" to nanoseconds. Expressing latency and jitter instead, in well defined nanoseconds, increases the dynamic range of emulated delays and jitter in netem. It will also ease a transition where reducing a tick to nsec equivalence would constrain the max delay in prior versions of netem to only 4.3 seconds. Signed-off-by: Dave Taht <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13netem: convert to qdisc_watchdog_schedule_nsDave Taht1-28/+28
Upgrade the internal netem scheduler to use nanoseconds rather than ticks throughout. Convert to and from the std "ticks" userspace api automatically, while allowing for finer grained scheduling to take place. Signed-off-by: Dave Taht <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13ipv6: try not to take rtnl_lock in ip6mr_sk_doneFrancesco Ruggeri1-0/+4
Avoid traversing the list of mr6_tables (which requires the rtnl_lock) in ip6mr_sk_done(), when we know in advance that a match will not be found. This can happen when rawv6_close()/ip6mr_sk_done() is invoked on non-mroute6 sockets. This patch helps reduce rtnl_lock contention when destroying a large number of net namespaces, each having a non-mroute6 raw socket. v2: same patch, only fixed subject line and expanded comment. Signed-off-by: Francesco Ruggeri <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: realtek: r8169: remove redundant assignment to giga_ctrlColin Ian King1-2/+0
The variable giga_ctrl is being assigned to zero however this is never read and hence the assignment is redundant, so remove it. Cleans up clang warning: drivers/net/ethernet/realtek/r8169.c:1978:3: warning: Value stored to 'giga_ctrl' is never read Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: lan9303: Documentation: Add missing word "Mbps"Egil Hjelmeland1-3/+3
Signed-off-by: Egil Hjelmeland <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-13net: dsa: lan9303: Fix lan9303_alr_del_port()Egil Hjelmeland1-1/+1
Fix embarrassing bug in lan9303_alr_del_port(): Instead of zeroing entr->mac_addr, I destroyed the next cache entry. Affected .port_fdb_del and .port_mdb_del. Fixes: 0620427ea0d6 ("net: dsa: lan9303: Add fdb/mdb manipulation") Signed-off-by: Egil Hjelmeland <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>