path: root/net
Age | Commit message | Author | Files | Lines
2016-12-09 | udp: copy skb->truesize in the first cache line | Eric Dumazet | 1 | -3/+10
In UDP RX handler, we currently clear skb->dev before skb is added to receive queue, because device pointer is no longer available once we exit from RCU section. Since this first cache line is always hot, let's reuse this space to store skb->truesize and thus avoid a cache line miss at udp_recvmsg()/udp_skb_destructor time while receive queue spinlock is held. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-09 | udp: add busylocks in RX path | Eric Dumazet | 1 | -1/+42
Idea of busylocks is to let producers grab an extra spinlock to relieve pressure on the receive_queue spinlock shared by consumer. This behavior is requested only once socket receive queue is above half occupancy. Under flood, this means that only one producer can be in line trying to acquire the receive_queue spinlock. These busylocks can be allocated on a per-CPU basis, instead of a per-socket one (that would consume a cache line per socket). This patch considerably improves UDP behavior under stress, depending on number of NIC RX queues and/or RPS spread. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
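The busylock idea can be modelled in user space. The following is only a rough sketch under stated assumptions, not the kernel code: `ModelSocket`, `BUSYLOCKS`, and `enqueue` are hypothetical names, the pool size is arbitrary, and picking the lock by hashing the socket identity is an assumption made so that producers feeding the same socket line up on the same extra lock.

```python
import threading
from collections import deque

# Hypothetical pool of extra locks; the size is arbitrary in this model.
BUSYLOCKS = [threading.Lock() for _ in range(8)]


class ModelSocket:
    """Toy stand-in for a UDP socket with a bounded receive queue."""

    def __init__(self, rcvbuf):
        self.rcvbuf = rcvbuf                 # capacity in bytes
        self.used = 0                        # bytes currently queued
        self.queue = deque()
        self.queue_lock = threading.Lock()   # models the receive_queue spinlock


def enqueue(sk, payload):
    """Queue payload on sk; return (queued, took_busylock)."""
    busy = None
    # Only pay for the extra lock once the queue is above half occupancy.
    # The occupancy read is a lockless heuristic in this model.
    if sk.used > sk.rcvbuf // 2:
        # Assumption: choose the lock by socket identity so that producers
        # targeting the same socket contend on the same busylock.
        busy = BUSYLOCKS[id(sk) % len(BUSYLOCKS)]
        busy.acquire()
    try:
        with sk.queue_lock:
            if sk.used + len(payload) > sk.rcvbuf:
                return False, busy is not None   # queue full: drop
            sk.queue.append(payload)
            sk.used += len(payload)
            return True, busy is not None
    finally:
        if busy is not None:
            busy.release()
```

While the queue is at most half full, producers go straight to the queue lock; above that threshold they serialize on the busylock first, so at most one of them waits on the queue lock at a time.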
2016-12-09 | cfg80211/mac80211: fix BSS leaks when abandoning assoc attempts | Johannes Berg | 4 | -9/+39
When mac80211 abandons an association attempt, it may free all the data structures, but inform cfg80211 and userspace about it only by sending the deauth frame it received, in which case cfg80211 has no link to the BSS struct that was used and will not cfg80211_unhold_bss() it. Fix this by providing a way to inform cfg80211 of this with the BSS entry passed, so that it can clean up properly, and use this ability in the appropriate places in mac80211. This isn't ideal: some code is more or less duplicated and tracing is missing. However, it's a fairly small change and it's thus easier to backport - cleanups can come later. Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2016-12-09 | nl80211: Use different attrs for BSSID and random MAC addr in scan req | Vamsi Krishna | 1 | -1/+15
NL80211_ATTR_MAC was used to set both the specific BSSID to be scanned and the random MAC address to be used when privacy is enabled. When both the features are enabled, both the BSSID and the local MAC address were getting same value causing Probe Request frames to go with unintended DA. Hence, this has been fixed by using a different NL80211_ATTR_BSSID attribute to set the specific BSSID (which was the more recent addition in cfg80211) for a scan. Backwards compatibility with old userspace software is maintained to some extent by allowing NL80211_ATTR_MAC to be used to set the specific BSSID when scanning without enabling random MAC address use. Scanning with random source MAC address was introduced by commit ad2b26abc157 ("cfg80211: allow drivers to support random MAC addresses for scan") and the issue was introduced with the addition of the second user for the same attribute in commit 818965d39177 ("cfg80211: Allow a scan request for a specific BSSID"). Fixes: 818965d39177 ("cfg80211: Allow a scan request for a specific BSSID") Signed-off-by: Vamsi Krishna <[email protected]> Signed-off-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
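The attribute precedence described above can be sketched as follows. This is an illustrative model only: `scan_bssid` is a hypothetical helper, netlink attributes are represented as a plain dict, and falling back to the broadcast BSSID when nothing is specified is an assumption not stated in the commit message.

```python
BROADCAST_BSSID = "ff:ff:ff:ff:ff:ff"   # assumed wildcard default


def scan_bssid(attrs, random_mac_enabled):
    """Pick the BSSID to scan for from a dict of netlink attributes."""
    # The dedicated attribute always wins.
    if "NL80211_ATTR_BSSID" in attrs:
        return attrs["NL80211_ATTR_BSSID"]
    # Backwards compatibility: old userspace may pass the BSSID in
    # NL80211_ATTR_MAC, but only when that attribute is not being used
    # to carry the random source MAC address.
    if not random_mac_enabled and "NL80211_ATTR_MAC" in attrs:
        return attrs["NL80211_ATTR_MAC"]
    return BROADCAST_BSSID
```

With random MAC addressing enabled, `NL80211_ATTR_MAC` is never interpreted as a BSSID, which is what keeps the Probe Request destination address from picking up the local random address.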
2016-12-09 | nl80211: fix logic inversion in start_nan() | Johannes Berg | 1 | -1/+1
Arend inadvertently inverted the logic while converting to wdev_running(), fix that. Fixes: 73c7da3dae1e ("cfg80211: add generic helper to check interface is running") Signed-off-by: Johannes Berg <[email protected]>
2016-12-08 | net: socket: preferred __aligned(size) for control buffer | Amit Kushwaha | 1 | -1/+2
This patch cleans up a checkpatch.pl warning: WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) Signed-off-by: Amit Kushwaha <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller | 2 | -17/+69
Johan Hedberg says: ==================== pull request: bluetooth-next 2016-12-08 I didn't miss your "net-next is closed" email, but it did come as a bit of a surprise, and due to time-zone differences I didn't have a chance to react to it until now. We would have had a couple of patches in bluetooth-next that we'd still have wanted to get to 4.10. Out of these the most critical one is the H7/CT2 patch for Bluetooth Security Manager Protocol, something that couldn't be published before the Bluetooth 5.0 specification went public (yesterday). If these really can't go to net-next we'll likely be sending at least this patch through bluetooth.git to net.git for rc1 inclusion. ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | bpf: xdp: Allow head adjustment in XDP prog | Martin KaFai Lau | 1 | -2/+26
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | udp: under rx pressure, try to condense skbs | Eric Dumazet | 2 | -1/+39
Under UDP flood, many softirq producers try to add packets to UDP receive queue, and one user thread is burning one cpu trying to dequeue packets as fast as possible.

Two parts of the per packet cost are:
- copying payload from kernel space to user space,
- freeing memory pieces associated with skb.

If socket is under pressure, softirq handler(s) can try to pull in skb->head the payload of the packet if it fits. Meaning the softirq handler(s) can free/reuse the page fragment immediately, instead of letting udp_recvmsg() do this hundreds of usec later, possibly from another node.

Additional gains:
- We reduce skb->truesize and thus can store more packets per SO_RCVBUF
- We avoid cache line misses at copyout() time and consume_skb() time, and avoid one put_page() with potential alien freeing on NUMA hosts.

This comes at the cost of a copy, bounded to available tail room, which is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger than necessary)

This patch gave me about 5% increase in throughput in my tests.

skb_condense() helper could probably be used in other contexts.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
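The condensing step can be illustrated with a toy model. This sketch is not skb_condense() itself: `ModelSkb`, `condense`, and the fixed `FRAG_CHARGE` per fragment are hypothetical, chosen only to show why pulling a small payload into the linear area shrinks the accounted size.

```python
from dataclasses import dataclass, field

FRAG_CHARGE = 4096  # hypothetical memory charge per attached page fragment


@dataclass
class ModelSkb:
    head: bytearray      # data already in the linear area
    tailroom: int        # free bytes after the data in the linear area
    frags: list = field(default_factory=list)   # payload held in page fragments

    @property
    def truesize(self):
        # Linear buffer size plus a fixed charge for every attached fragment.
        return len(self.head) + self.tailroom + FRAG_CHARGE * len(self.frags)


def condense(skb):
    """Pull fragment payload into the linear area if it fits; return True if done."""
    frag_len = sum(len(f) for f in skb.frags)
    if frag_len == 0 or frag_len > skb.tailroom:
        return False                 # nothing to pull, or not enough tail room
    for f in skb.frags:
        skb.head.extend(f)           # the copy, bounded by the available tail room
    skb.tailroom -= frag_len
    skb.frags.clear()                # fragments can now be freed or reused
    return True
```

In this model a 100-byte payload that fits in 128 bytes of tail room drops the accounted size from 4288 to 192 bytes, while a payload larger than the tail room is left untouched.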
2016-12-08 | net: rfs: add a jump label | Eric Dumazet | 2 | -1/+6
RFS is not commonly used, so add a jump label to avoid some conditionals in fast path. Signed-off-by: Eric Dumazet <[email protected]> Cc: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | net/sched: cls_flower: Support matching on ICMP type and code | Simon Horman | 1 | -0/+53
Support matching on ICMP type and code.

Example usage:

    tc qdisc add dev eth0 ingress

    tc filter add dev eth0 protocol ip parent ffff: flower \
        indev eth0 ip_proto icmp type 8 code 0 action drop

    tc filter add dev eth0 protocol ipv6 parent ffff: flower \
        indev eth0 ip_proto icmpv6 type 128 code 0 action drop

Signed-off-by: Simon Horman <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | flow dissector: ICMP support | Simon Horman | 1 | -0/+31
Allow dissection of ICMP(V6) type and code. This should only occur if a packet is ICMP(V6) and the dissector has FLOW_DISSECTOR_KEY_ICMP set. There are currently no users of FLOW_DISSECTOR_KEY_ICMP. A follow-up patch will allow FLOW_DISSECTOR_KEY_ICMP to be used by the flower classifier. Signed-off-by: Simon Horman <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | net/sched: cls_flower: Add support for matching on flags | Or Gerlitz | 1 | -0/+76
Add UAPI to provide set of flags for matching, where the flags provided from user-space are mapped to flow-dissector flags. The 1st flag allows to match on whether the packet is an IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag. Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | icmp: correct return value of icmp_rcv() | Zhang Shengju | 1 | -2/+2
Currently, icmp_rcv() always returns zero on a packet delivery upcall. To make its behavior more compliant with the way this API should be used, this patch changes this to let it return NET_RX_SUCCESS when the packet is properly handled, and NET_RX_DROP otherwise. Signed-off-by: Zhang Shengju <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-08 | Bluetooth: SMP: Add support for H7 crypto function and CT2 auth flag | Johan Hedberg | 2 | -17/+69
Bluetooth 5.0 introduces a new H7 key generation function that's used when both sides of the pairing set the CT2 authentication flag to 1. Signed-off-by: Johan Hedberg <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-12-07 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next | David S. Miller | 69 | -620/+2162
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains a large Netfilter update for net-next, to summarise:

1) Add support for stateful objects. This series provides a nf_tables native alternative to the extended accounting infrastructure for nf_tables. Two initial stateful objects are supported: counters and quotas. Objects are identified by a user-defined name, you can fetch and reset them anytime. You can also use maps to allow fast lookups using any arbitrary key combination. More info at: http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

2) On-demand registration of nf_conntrack and defrag hooks per netns. Register nf_conntrack hooks if we have a stateful ruleset, i.e. state-based filtering or NAT. The new nf_conntrack_default_on sysctl enables this from newly created network namespaces. Default behaviour is not modified. Patches from Florian Westphal.

3) Allocate 4k chunks and then use these for x_tables counter allocation requests, this improves ruleset load time and also datapath ruleset evaluation, patches from Florian Westphal.

4) Add support for ebpf to the existing x_tables bpf extension. From Willem de Bruijn.

5) Update layer 4 checksum if any of the pseudoheader fields is updated. This provides a limited form of 1:1 stateless NAT that makes sense in specific scenarios, e.g. load balancing.

6) Add support to flush sets in nf_tables. This series comes with a new set->ops->deactivate_one() indirection given that we have to walk over the list of set elements, then deactivate them one by one. The existing set->ops->deactivate() performs an element lookup that we don't need.

7) Two patches to avoid cloning packets, thus speed up packet forwarding via nft_fwd from ingress. From Florian Westphal.

8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to prevent infinite loops, patch from Dwip Banerjee. And one minor refactoring from Gao feng.

9) Revisit recent log support for nf_tables netdev families: One patch to ensure that we correctly handle non-ethernet packets. Another patch to add missing logger definition for netdev. Patches from Liping Zhang.

10) Three patches for nft_fib, one to address insufficient register initialization and another to solve incorrect (although harmless) byteswap operation. Moreover update xt_rpfilter and nft_fib to match lbcast packets with zeronet as source, e.g. DHCP Discover packets (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has been broken in many-cast mode for some little time, let's give them a chance by placing them at the same level as other existing protocols. Thus, users don't explicitly have to modprobe support for this and NAT rules work for them. Some people point to the lack of support in SOHO Linux-based routers that make deployment of new protocols harder. I guess other middleboxes out there on the Internet are also to blame. Anyway, let's see if this has any impact in the midrun.

12) Skip software SCTP checksum calculation if the NIC comes with SCTP checksum offload support. From Davide Caratti.

13) Initial core factoring to prepare conversion to hook array. Three patches from Aaron Conole.

14) Gao Feng made a wrong conversion to switch in the xt_multiport extension in a patch coming in the previous batch. Fix it in this batch.

15) Get vmalloc call in sync with kmalloc flags to avoid a warning and likely OOM killer intervention from x_tables. From Marcelo Ricardo Leitner.

16) Update Arturo Borrero's email address in all source code headers.
====================

Signed-off-by: David S. Miller <[email protected]>
2016-12-07 | netfilter: nft_quota: allow to restore consumed quota | Pablo Neira Ayuso | 1 | -2/+9
Allow restoring the consumed quota; this is useful to restore the quota state across reboots. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: xt_bpf: support ebpf | Willem de Bruijn | 1 | -16/+80
Add support for attaching an eBPF object by file descriptor. The iptables binary can be called with a path to an elf object or a pinned bpf object. Also pass the mode and path to the kernel to be able to return it later for iptables dump and save. Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: x_tables: avoid warn and OOM killer on vmalloc call | Marcelo Ricardo Leitner | 1 | -1/+3
Andrey Konovalov reported that this vmalloc call is based on a userspace request and that it's spewing traces, which may flood the logs and cause DoS if abused. Florian Westphal also mentioned that this call should not trigger OOM killer. This patch brings the vmalloc call in sync with kmalloc, disables the warn trace on allocation failure, and also disables OOM killer invocation. Note, however, that under such a stress situation, other places may trigger OOM killer invocation. Reported-by: Andrey Konovalov <[email protected]> Cc: Florian Westphal <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nf_tables: support for set flushing | Pablo Neira Ayuso | 3 | -6/+51
This patch adds support for set flushing, that consists of walking over the set elements if the NFTA_SET_ELEM_LIST_ELEMENTS attribute is set. This patch requires the following changes:

1) Add set->ops->deactivate_one() operation: This allows us to deactivate an element from the set element walk path, given we can skip the lookup that happens in ->deactivate().

2) Add a new nft_trans_alloc_gfp() function since we need to allocate transactions using GFP_ATOMIC given the set walk path happens with held rcu_read_lock.

Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nft_set: introduce nft_{hash, rbtree}_deactivate_one() | Pablo Neira Ayuso | 2 | -8/+27
This new function allows us to deactivate one single element; this is required by the set flush command that comes in a follow up patch. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nf_tables: constify struct nft_ctx * parameter in nft_trans_alloc() | Pablo Neira Ayuso | 1 | -2/+2
Context is not modified by nft_trans_alloc(), so constify it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nat: skip checksum on offload SCTP packets | Davide Caratti | 1 | -1/+4
SCTP GSO and hardware can do CRC32c computation after netfilter processing, so we can avoid calling sctp_compute_checksum() on skb if skb->ip_summed is equal to CHECKSUM_PARTIAL. Moreover, set skb->ip_summed to CHECKSUM_NONE when the NAT code computes the CRC, to prevent offloaders from computing it again (on ixgbe this resulted in a transmission with wrong L4 checksum). Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: rpfilter: bypass ipv4 lbcast packets with zeronet source | Liping Zhang | 2 | -9/+12
Otherwise, DHCP Discover packets(0.0.0.0->255.255.255.255) may be dropped incorrectly. Signed-off-by: Liping Zhang <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
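The bypass condition can be sketched as a simple address test. This is a hedged illustration, not the xt_rpfilter code: `rpfilter_bypass` is a hypothetical helper, and treating "zeronet" as the whole 0.0.0.0/8 block is an assumption based on the usual meaning of that term.

```python
import ipaddress

ZERONET = ipaddress.IPv4Network("0.0.0.0/8")          # assumed zeronet range
LBCAST = ipaddress.IPv4Address("255.255.255.255")     # limited broadcast


def rpfilter_bypass(saddr, daddr):
    """Return True when the reverse path check should be skipped."""
    src = ipaddress.IPv4Address(saddr)
    dst = ipaddress.IPv4Address(daddr)
    # A zeronet source has no route back, so a reverse path lookup would
    # always fail; let limited-broadcast packets such as DHCP Discover pass.
    return src in ZERONET and dst == LBCAST
```

A DHCP Discover packet (0.0.0.0 -> 255.255.255.255) satisfies both conditions, so it is no longer subjected to the lookup that would otherwise drop it.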
2016-12-07 | netfilter: nf_tables: allow to filter stateful object dumps by type | Pablo Neira Ayuso | 1 | -0/+50
This patch adds the netlink code to filter out dump of stateful objects, through the NFTA_OBJ_TYPE netlink attribute. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nft_objref: support for stateful object maps | Pablo Neira Ayuso | 2 | -1/+119
This patch allows us to refer to stateful object dictionaries, the source register indicates the key data to be used to look up for the corresponding state object. We can refer to these maps through names or, alternatively, the map transaction id. This allows us to refer to both anonymous and named maps. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nf_tables: add stateful object reference to set elements | Pablo Neira Ayuso | 1 | -10/+62
This patch allows you to refer to stateful objects from set elements. This provides the infrastructure to create maps where the right hand side of the mapping is a stateful object. This allows us to build dictionaries of stateful objects, that you can use to perform fast lookups using any arbitrary key combination. Signed-off-by: Pablo Neira Ayuso <[email protected]>
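Conceptually, such a map behaves like a dictionary whose values are named stateful objects. The sketch below is only an analogy in Python, assuming a hypothetical `CounterObj` class and `account` helper; it does not reflect the nf_tables data structures.

```python
class CounterObj:
    """Toy named stateful object holding packet and byte counts."""

    def __init__(self, name):
        self.name = name
        self.packets = 0
        self.bytes = 0

    def update(self, pkt_len):
        self.packets += 1
        self.bytes += pkt_len


def account(objmap, key, pkt_len):
    """Look up the object mapped to key and update it; return True on a hit."""
    obj = objmap.get(key)        # single lookup, whatever the key is made of
    if obj is None:
        return False
    obj.update(pkt_len)
    return True
```

Several keys may point at the same object, for example `{("tcp", 80): web, ("tcp", 443): web, ("tcp", 22): ssh}`, so one lookup on a combined key selects the counter to update.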
2016-12-07 | netfilter: nft_quota: add depleted flag for objects | Pablo Neira Ayuso | 2 | -8/+29
Notify on depleted quota objects. The NFT_QUOTA_F_DEPLETED flag indicates we have reached overquota. Add pointer to table from nft_object, so we can use it when sending the depletion notification to userspace. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nf_tables: notify internal updates of stateful objects | Pablo Neira Ayuso | 1 | -12/+19
Introduce nf_tables_obj_notify() to notify internal state changes in stateful objects. This is used by the quota object to report depletion in a follow up patch. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nf_tables: atomic dump and reset for stateful objects | Pablo Neira Ayuso | 3 | -22/+81
This patch adds a new NFT_MSG_GETOBJ_RESET command to perform an atomic dump-and-reset of the stateful object. It also adds support for atomic dump and reset for counter and quota objects. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-07 | netfilter: nft_quota: dump consumed quota | Pablo Neira Ayuso | 1 | -5/+16
Add a new attribute NFTA_QUOTA_CONSUMED that displays the amount of quota that has been already consumed. This allows us to restore the internal state of the quota object between reboots as well as to monitor how much of it has been used. This patch changes the logic to account for the consumed bytes, instead of the bytes that remain to be consumed. Signed-off-by: Pablo Neira Ayuso <[email protected]>
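Accounting for consumed bytes rather than remaining bytes makes the state easy to dump and restore. A minimal sketch, assuming a hypothetical `Quota` class; the dictionary keys borrow the attribute names only as labels, and the strict "greater than" over-quota comparison is an assumption.

```python
class Quota:
    """Toy quota object that tracks consumed bytes against a fixed limit."""

    def __init__(self, limit, consumed=0):
        self.limit = limit
        self.consumed = consumed

    def account(self, nbytes):
        """Add nbytes to the consumed total; return True when over quota."""
        self.consumed += nbytes
        return self.consumed > self.limit   # assumed comparison

    def dump(self):
        # Both values are reported, so the limit never has to be recomputed.
        return {"NFTA_QUOTA_BYTES": self.limit,
                "NFTA_QUOTA_CONSUMED": self.consumed}

    @classmethod
    def restore(cls, dumped):
        """Rebuild a quota object from an earlier dump()."""
        return cls(dumped["NFTA_QUOTA_BYTES"], dumped["NFTA_QUOTA_CONSUMED"])
```

Because the limit stays constant and only the consumed total moves, a dump taken before a reboot can be fed back through `restore()` to continue from the same point.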
2016-12-07 | can: raw: raw_setsockopt: limit number of can_filter that can be set | Marc Kleine-Budde | 1 | -0/+3
This patch adds a check to limit the number of can_filters that can be set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER are not prevented resulting in a warning. Reference: https://lkml.org/lkml/2016/12/2/230 Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Cc: linux-stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2016-12-06 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 9 | -58/+81
2016-12-06 | netfilter: nf_tables: add stateful object reference expression | Pablo Neira Ayuso | 3 | -0/+119
This new expression allows us to refer to existing stateful objects from rules. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: nft_quota: add stateful object type | Pablo Neira Ayuso | 1 | -13/+83
Register a new quota stateful object type into the new stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: nft_counter: add stateful object type | Pablo Neira Ayuso | 1 | -27/+113
Register a new percpu counter stateful object type into the stateful object infrastructure. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: nf_tables: add stateful objects | Pablo Neira Ayuso | 1 | -0/+516
This patch augments nf_tables to support stateful objects. This new infrastructure allows you to create, dump and delete stateful objects, that are identified by a user-defined name. This patch adds the generic infrastructure, follow up patches add support for two stateful objects: counters and quotas. This patch provides a native infrastructure for nf_tables to replace nfacct, the extended accounting infrastructure for iptables. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: add and use nf_fwd_netdev_egress | Florian Westphal | 2 | -10/+27
... so we can use current skb instead of working with a clone. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: xt_multiport: Fix wrong unmatch result with multiple ports | Gao Feng | 1 | -7/+19
I missed one test case in the last commit for xt_multiport. For example, the rule is "-m multiport --dports 22,80,443". When the first port is unmatched and the second is matched, the current code could not return the right result. It would return false directly when the first port is unmatched. Fixes: dd2602d00f80 ("netfilter: xt_multiport: Use switch case instead of multiple condition checks") Signed-off-by: Gao Feng <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
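The failure mode described above can be reproduced in a few lines. This is an illustrative model of the logic error only, with hypothetical function names; it is not the xt_multiport source.

```python
def match_buggy(ports, dport):
    """Models the regression: the loop gives up after the first entry."""
    for port in ports:
        # Returning the comparison result directly means a mismatch on the
        # first port ends the search before the other ports are tried.
        return port == dport
    return False


def match_fixed(ports, dport):
    """Keep scanning; report a mismatch only after every port was tried."""
    for port in ports:
        if port == dport:
            return True
    return False
```

For the rule ports 22, 80, 443, a packet to port 80 is rejected by the first function because 22 does not match, while the second function continues to the next entry and matches.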
2016-12-06 | netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields | Pablo Neira Ayuso | 1 | -5/+102
This patch adds a new flag that signals the kernel to update layer 4 checksum if the packet field belongs to the layer 4 pseudoheader. This implicitly provides stateless NAT 1:1 that is useful under very specific usecases. Since rules mangling layer 3 fields that are part of the pseudoheader may potentially convey any layer 4 packet, we have to deal with the layer 4 checksum adjustment using protocol specific code. This patch adds support for TCP, UDP and ICMPv6, since they include the pseudoheader in the layer 4 checksum calculation. ICMP doesn't, so we can skip it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
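Why a layer 3 rewrite forces a layer 4 checksum update can be shown with the standard incremental update from RFC 1624. The sketch below is a generic illustration under stated assumptions, not the nft_payload code: all function names are hypothetical, and the example uses an IPv4 pseudoheader with a UDP segment.

```python
import socket
import struct


def fold(total):
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum16(data):
    """Internet checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole 16-bit word
    words = struct.unpack("!%dH" % (len(data) // 2), data)
    return ~fold(sum(words)) & 0xFFFF


def csum_replace16(check, old, new):
    """Incremental update per RFC 1624: HC' = ~(~HC + ~m + m')."""
    return ~fold((~check & 0xFFFF) + (~old & 0xFFFF) + new) & 0xFFFF


def pseudo_header(src, dst, proto, length):
    """IPv4 pseudoheader: source, destination, zero, protocol, L4 length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, proto, length))


def rewrite_saddr(check, old_src, new_src):
    """Adjust a layer 4 checksum after the IPv4 source address changes."""
    old_words = struct.unpack("!2H", socket.inet_aton(old_src))
    new_words = struct.unpack("!2H", socket.inet_aton(new_src))
    for old, new in zip(old_words, new_words):
        check = csum_replace16(check, old, new)
    return check
```

The incremental result equals a full recomputation over the new pseudoheader and segment, which is why only the changed 16-bit words need to be visited. ICMP over IPv4 does not cover a pseudoheader, so, as the commit notes, no adjustment is needed there.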
2016-12-06 | netfilter: nft_fib_ipv4: initialize *dest to zero | Liping Zhang | 1 | -0/+2
Otherwise, if fib lookup fails, *dest will be filled with a garbage value, so reverse path filtering will not work properly:

    # nft add rule x prerouting fib saddr oif eq 0 drop

Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression")
Signed-off-by: Liping Zhang <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: nft_fib: convert htonl to ntohl properly | Liping Zhang | 3 | -3/+3
Actually ntohl and htonl are identical, so this doesn't affect anything, but it is conceptually wrong. Signed-off-by: Liping Zhang <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
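The claim that the two conversions are interchangeable in effect can be checked from user space with Python's `socket` module, whose `htonl()` and `ntohl()` wrap the same conversions. The helper name `same_conversion` is hypothetical.

```python
import socket


def same_conversion(value):
    """Return True if htonl() and ntohl() give the same result for value."""
    # On a little-endian host both calls swap the four bytes; on a
    # big-endian host both are the identity. Either way they agree.
    return socket.htonl(value) == socket.ntohl(value)
```

The distinction is therefore about intent: one name documents a host-to-network conversion, the other a network-to-host conversion, even though the operation performed is the same.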
2016-12-06 | netfilter: x_tables: pack percpu counter allocations | Florian Westphal | 4 | -18/+42
Instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, and also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
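The chunked allocation scheme can be sketched as a bump allocator. This is a simplified model, not the x_tables implementation: `CounterAllocator` is a hypothetical class, a `bytearray` stands in for a percpu chunk, and the 16-byte counter size assumes two 64-bit fields (packets and bytes).

```python
CHUNK_SIZE = 4096     # the "4k chunks" from the commit message
COUNTER_SIZE = 16     # assumption: two 64-bit fields (packets, bytes)


class CounterAllocator:
    """Bump allocator that carves fixed-size counters out of 4k chunks."""

    def __init__(self):
        self.chunks = []              # each chunk is one backing allocation
        self.offset = CHUNK_SIZE      # forces a chunk allocation on first use

    def alloc(self):
        """Return (chunk_index, byte_offset) for a new counter."""
        if self.offset + COUNTER_SIZE > CHUNK_SIZE:
            # Current chunk exhausted: one call to the backing allocator.
            self.chunks.append(bytearray(CHUNK_SIZE))
            self.offset = 0
        slot = (len(self.chunks) - 1, self.offset)
        self.offset += COUNTER_SIZE
        return slot
```

With these sizes a chunk holds 256 counters, so loading 257 rules costs two backing allocations instead of 257, and consecutive rules find their counters next to each other in memory.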
2016-12-06 | netfilter: x_tables: pass xt_counters struct to counter allocator | Florian Westphal | 4 | -12/+33
Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: x_tables: pass xt_counters struct instead of packet counter | Florian Westphal | 4 | -7/+15
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: convert while loops to for loops | Aaron Conole | 2 | -8/+6
This is to facilitate converting from a singly-linked list to an array of elements. Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: introduce accessor functions for hook entries | Aaron Conole | 3 | -10/+7
This allows easier future refactoring. Signed-off-by: Aaron Conole <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-12-06 | netfilter: defrag: only register defrag functionality if needed | Florian Westphal | 6 | -19/+126
nf_defrag modules for ipv4 and ipv6 export an empty stub function. Any module that needs the defragmentation hooks registered simply 'calls' this empty function to create a phony module dependency -- modprobe will then load the defrag module too. This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook registration until the functionality is requested within a network namespace instead of module load time for all namespaces. Hooks are only un-registered on module unload or when a namespace that used such defrag functionality exits. We have to use struct net for this as the register hooks can be called before netns initialization here from the ipv4/ipv6 conntrack module init path. There is no unregister functionality support, defrag will always be active once it was requested inside a net namespace. The reason is that defrag has impact on nft and iptables rulesets (without defrag we might see fragments). Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
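The on-demand, per-namespace registration can be modelled as an idempotent enable call. This is a hedged sketch with hypothetical names (`NetNamespace`, `defrag_enable`, the `defrag_enabled` flag); it only mirrors the behaviour described above, where the hook is registered on first request and stays active afterwards.

```python
class NetNamespace:
    """Toy network namespace holding its own list of registered hooks."""

    def __init__(self, name):
        self.name = name
        self.defrag_enabled = False   # hypothetical per-namespace flag
        self.hooks = []


def defrag_enable(net):
    """Register the defrag hook in net on first request; later calls are no-ops."""
    if net.defrag_enabled:
        return False                  # already active in this namespace
    net.hooks.append("defrag")
    net.defrag_enabled = True         # no matching disable: stays on until exit
    return True
```

A namespace that never requests defragmentation keeps an empty hook list, which is the saving compared with registering the hooks for every namespace at module load time.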
2016-12-06 | sunrpc: use DEFINE_SPINLOCK() | Fabian Frederick | 1 | -2/+1
Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-12-06 | Revert "dctcp: update cwnd on congestion event" | Florian Westphal | 1 | -8/+1
Neal Cardwell says:

  If I am reading the code correctly, then I would have two concerns:

  1) Has that been tested? That seems like an extremely dramatic decrease in cwnd. For example, if the cwnd is 80, and there are 40 ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope calculations seem to suggest that after just 11 ACKs the cwnd would be down to a minimal value of 2 [..]

  2) That seems to contradict another passage in the draft [..] where it says: Just as specified in [RFC3168], DCTCP does not react to congestion indications more than once for every window of data.

Neal is right. Fortunately we don't have to complicate this by testing vs. current rtt estimate, we can just revert the patch. Normal stack already handles this for us: receiving ACKs with ECE set causes a call to tcp_enter_cwr(), from there on the ssthresh gets adjusted and prr will take care of cwnd adjustment.

Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
Cc: Neal Cardwell <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>