aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2013-09-19Bluetooth: Add synchronization train parameters reading supportJohan Hedberg1-0/+2
This patch adds support for reading the synchronization train parameters for controllers that support the feature. Since the feature is detectable through the local features page 2, which is retreived only in stage 3 of the HCI init sequence, there is no other option than to add a fourth stage to the init sequence. For now the patch doesn't yet add storing of the parameters, but it is nevertheless convenient to have around to see what kind of parameters various controllers use by default (analyzable e.g. with the btmon user space tool). Signed-off-by: Johan Hedberg <[email protected]> Acked-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-18Bluetooth: Fix waiting for clearing of BT_SK_SUSPEND flagJohan Hedberg1-0/+1
In the case of blocking sockets we should not proceed with sendmsg() if the socket has the BT_SK_SUSPEND flag set. So far the code was only ensuring that POLLOUT doesn't get set for non-blocking sockets using poll() but there was no code in place to ensure that blocking sockets do the right thing when writing to them. This patch adds a new bt_sock_wait_ready helper function to sleep in the sendmsg call if the BT_SK_SUSPEND flag is set, and wake up as soon as it is unset. It also updates the L2CAP and RFCOMM sendmsg callbacks to take advantage of this new helper function. Signed-off-by: Johan Hedberg <[email protected]> Acked-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-18ipvs: make the service replacement more robustJulian Anastasov1-5/+2
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added IP_VS_DEST_STATE_REMOVING flag and RCU callback named ip_vs_dest_wait_readers() to keep dests and services after removal for at least a RCU grace period. But we have the following corner cases: - we can not reuse the same dest if its service is removed while IP_VS_DEST_STATE_REMOVING is still set because another dest removal in the first grace period can not extend this period. It can happen when ipvsadm -C && ipvsadm -R is used. - dest->svc can be replaced but ip_vs_in_stats() and ip_vs_out_stats() have no explicit read memory barriers when accessing dest->svc. It can happen that dest->svc was just freed (replaced) while we use it to update the stats. We solve the problems as follows: - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start will remember when for first time after deletion we noticed dest->refcnt=0. Later, the connections can grab a reference while in RCU grace period but if refcnt becomes 0 we can safely free the dest and its svc. - dest->svc becomes RCU pointer. As result, we add explicit RCU locking in ip_vs_in_stats() and ip_vs_out_stats(). - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it now can free the service immediately or after a RCU grace period. dest->svc is not set to NULL anymore. As result, unlinked dests and their services are freed always after IP_VS_DEST_TRASH_PERIOD period, unused services are freed after a RCU grace period. Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2013-09-18ipvs: fix overflow on dest weight multiplySimon Kirby1-1/+1
Schedulers such as lblc and lblcr require the weight to be as high as the maximum number of active connections. In commit b552f7e3a9524abcbcdf ("ipvs: unify the formula to estimate the overhead of processing connections"), the consideration of inactconns and activeconns was cleaned up to always count activeconns as 256 times more important than inactconns. In cases where 3000 or more connections are expected, a weight of 3000 * 256 * 3000 connections overflows the 32-bit signed result used to determine if rescheduling is required. On amd64, this merely changes the multiply and comparison instructions to 64-bit. On x86, a 64-bit result is already present from imull, so only a few more comparison instructions are emitted. Signed-off-by: Simon Kirby <[email protected]> Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2013-09-18Bluetooth: Remove unused event mask structJohan Hedberg1-3/+0
The struct for HCI_Set_Event_Mask is never used. Instead a local 8-byte array is used for sending this command. Therefore, remove the unnecessary struct definition. Signed-off-by: Johan Hedberg <[email protected]> Acked-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-18Bluetooth: Introduce a new HCI_RFKILLED flagJohan Hedberg1-0/+1
This makes it more convenient to check for rfkill (no need to check for dev->rfkill before calling rfkill_blocked()) and also avoids potential races if the RFKILL state needs to be checked from within the rfkill callback. Signed-off-by: Johan Hedberg <[email protected]> Cc: [email protected] Acked-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-17Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+1
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for you net tree, mostly targeted to ipset, they are: * Fix ICMPv6 NAT due to wrong comparison, code instead of type, from Phil Oester. * Fix RCU race in conntrack extensions release path, from Michal Kubecek. * Fix missing inversion in the userspace ipset test command match if the nomatch option is specified, from Jozsef Kadlecsik. * Skip layer 4 protocol matching in ipset in case of IPv6 fragments, also from Jozsef Kadlecsik. * Fix sequence adjustment in nfnetlink_queue due to using the netlink skb instead of the network skb, from Gao feng. * Make sure we cannot swap of sets with different layer 3 family in ipset, from Jozsef Kadlecsik. * Fix possible bogus matching in ipset if hash sets with net elements are used, from Oliver Smith. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-09-16Bluetooth: Introduce new HCI socket channel for user operationMarcel Holtmann1-0/+1
This patch introcuces a new HCI socket channel that allows user applications to take control over a specific HCI device. The application gains exclusive access to this device and forces the kernel to stay away and not manage it. In case of the management interface it will actually hide the device. Such operation is useful for security testing tools that need to operate underneath the Bluetooth stack and need full control over a device. The advantage here is that the kernel still provides the service of hardware abstraction and HCI level access. The use of Bluetooth drivers for hardware access also means that sniffing tools like btmon or hcidump are still working and the whole set of transaction can be traced with existing tools. With the new channel it is possible to send HCI commands, ACL and SCO data packets and receive HCI events, ACL and SCO packets from the device. The format follows the well established H:4 protocol. The new HCI user channel can only be established when a device has been through its setup routine and is currently powered down. This is enforced to not cause any problems with current operations. In addition only one user channel per HCI device is allowed. It is exclusive access for one user application. Access to this channel is limited to process with CAP_NET_RAW capability. Using this new facility does not require any external library or special ioctl or socket filters. Just create the socket and bind it. After that the file descriptor is ready to speak H:4 protocol. struct sockaddr_hci addr; int fd; fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI); memset(&addr, 0, sizeof(addr)); addr.hci_family = AF_BLUETOOTH; addr.hci_dev = 0; addr.hci_channel = HCI_CHANNEL_USER; bind(fd, (struct sockaddr *) &addr, sizeof(addr)); The example shows on how to create a user channel for hci0 device. Error handling has been left out of the example. However with the limitations mentioned above it is advised to handle errors. Binding of the user cahnnel socket can fail for various reasons. Specifically if the device is currently activated by BlueZ or if the access permissions are not present. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-16Bluetooth: Introduce user channel flag for HCI devicesMarcel Holtmann1-0/+1
This patch introduces a new user channel flag that allows to give full control of a HCI device to a user application. The kernel will stay away from the device and does not allow any further modifications of the device states. The existing raw flag is not used since it has a bit of unclear meaning due to its legacy. Using a new flag makes the code clearer. A device with the user channel flag set can still be enumerate using the legacy API, but it does not longer enumerate using the new management interface used by BlueZ 5 and beyond. This is intentional to not confuse users of modern systems. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Gustavo Padovan <[email protected]>
2013-09-13netfilter: nf_conntrack: use RCU safe kfree for conntrack extensionsMichal Kubeček1-1/+1
Commit 68b80f11 (netfilter: nf_nat: fix RCU races) introduced RCU protection for freeing extension data when reallocation moves them to a new location. We need the same protection when freeing them in nf_ct_ext_free() in order to prevent a use-after-free by other threads referencing a NAT extension data via bysource list. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+2
Pull networking fixes from David Miller: 1) Brown paper bag fix in HTB scheduler, class options set incorrectly due to a typoe. Fix from Vimalkumar. 2) It's possible for the ipv6 FIB garbage collector to run before all the necessary datastructure are setup during init, defer the notifier registry to avoid this problem. Fix from Michal Kubecek. 3) New i40e ethernet driver from the Intel folks. 4) Add new qmi wwan device IDs, from Bjørn Mork. 5) Doorbell lock in bnx2x driver is not initialized properly in some configurations, fix from Ariel Elior. 6) Revert an ipv6 packet option padding change that broke standardized ipv6 implementation test suites. From Jiri Pirko. 7) Fix synchronization of ARP information in bonding layer, from Nikolay Aleksandrov. 8) Fix missing error return resulting in illegal memory accesses in openvswitch, from Daniel Borkmann. 9) SCTP doesn't signal poll events properly due to mistaken operator precedence, fix also from Daniel Borkmann. 10) __netdev_pick_tx() passes wrong index to sk_tx_queue_set() which essentially disables caching of TX queue in sockets :-/ Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net_sched: htb: fix a typo in htb_change_class() net: qmi_wwan: add new Qualcomm devices ipv6: don't call fib6_run_gc() until routing is ready net: tilegx driver: avoid compiler warning fib6_rules: fix indentation irda: vlsi_ir: Remove casting the return value which is a void pointer irda: donauboe: Remove casting the return value which is a void pointer net: fix multiqueue selection net: sctp: fix smatch warning in sctp_send_asconf_del_ip net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE net: fib: fib6_add: fix potential NULL pointer dereference net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs bcm63xx_enet: remove deprecated IRQF_DISABLED net: korina: remove deprecated IRQF_DISABLED macvlan: Move skb_clone check closer to call qlcnic: Fix warning reported by kbuild test robot. bonding: fix bond_arp_rcv setting and arp validate desync state bonding: fix store_arp_validate race with mode change ipv6/exthdrs: accept tlv which includes only padding bnx2x: avoid atomic allocations during initialization ...
2013-09-11ipv6: don't call fib6_run_gc() until routing is readyMichal Kubeček1-0/+2
When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-11Merge tag 'for-linus-3.12-merge' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs Pull 9p updates from Eric Van Hensbergen: "Minor 9p fixes and tweaks for 3.12 merge window The first fixes namespace issues which causes a kernel NULL pointer dereference, the second fixes uevent handling to work better with udev, and the third switches some code to use srlcpy instead of strncpy in order to be safer. All changes have been baking in for-next for at least 2 weeks" * tag 'for-linus-3.12-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: fs/9p: avoid accessing utsname after namespace has been torn down 9p: send uevent after adding/removing mount_tag attribute fs: 9p: use strlcpy instead of strncpy
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds63-881/+1423
Pull networking changes from David Miller: "Noteworthy changes this time around: 1) Multicast rejoin support for team driver, from Jiri Pirko. 2) Centralize and simplify TCP RTT measurement handling in order to reduce the impact of bad RTO seeding from SYN/ACKs. Also, when both timestamps and local RTT measurements are available prefer the later because there are broken middleware devices which scramble the timestamp. From Yuchung Cheng. 3) Add TCP_NOTSENT_LOWAT socket option to limit the amount of kernel memory consumed to queue up unsend user data. From Eric Dumazet. 4) Add a "physical port ID" abstraction for network devices, from Jiri Pirko. 5) Add a "suppress" operation to influence fib_rules lookups, from Stefan Tomanek. 6) Add a networking development FAQ, from Paul Gortmaker. 7) Extend the information provided by tcp_probe and add ipv6 support, from Daniel Borkmann. 8) Use RCU locking more extensively in openvswitch data paths, from Pravin B Shelar. 9) Add SCTP support to openvswitch, from Joe Stringer. 10) Add EF10 chip support to SFC driver, from Ben Hutchings. 11) Add new SYNPROXY netfilter target, from Patrick McHardy. 12) Compute a rate approximation for sending in TCP sockets, and use this to more intelligently coalesce TSO frames. Furthermore, add a new packet scheduler which takes advantage of this estimate when available. From Eric Dumazet. 13) Allow AF_PACKET fanouts with random selection, from Daniel Borkmann. 14) Add ipv6 support to vxlan driver, from Cong Wang" Resolved conflicts as per discussion. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1218 commits) openvswitch: Fix alignment of struct sw_flow_key. netfilter: Fix build errors with xt_socket.c tcp: Add missing braces to do_tcp_setsockopt caif: Add missing braces to multiline if in cfctrl_linkup_request bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize vxlan: Fix kernel panic on device delete. net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls net: mvneta: properly disable HW PHY polling and ensure adjust_link() works icplus: Use netif_running to determine device state ethernet/arc/arc_emac: Fix huge delays in large file copies tuntap: orphan frags before trying to set tx timestamp tuntap: purge socket error queue on detach qlcnic: use standard NAPI weights ipv6:introduce function to find route for redirect bnx2x: VF RSS support - VF side bnx2x: VF RSS support - PF side vxlan: Notify drivers for listening UDP port changes net: usbnet: update addr_assign_type if appropriate driver/net: enic: update enic maintainers and driver driver/net: enic: Exposing symbols for Cisco's low latency driver ...
2013-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-2/+33
Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <[email protected]>
2013-09-05vxlan: Notify drivers for listening UDP port changesJoseph Gasparakis1-0/+1
This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and ndo_del_rx_vxlan_port(). Drivers can get notifications through the above functions about changes of the UDP listening port of VXLAN. Also, when physical ports come up, now they can call vxlan_get_rx_port() in order to obtain the port number(s) of the existing VXLAN interface in case they already up before them. This information about the listening UDP port would be used for VXLAN related offloads. A big thank you to John Fastabend ([email protected]) for his input and his suggestions on this patch set. CC: John Fastabend <[email protected]> CC: Stephen Hemminger <[email protected]> Signed-off-by: Joseph Gasparakis <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04net: ipv6: mld: get rid of MLDV2_MRC and simplify calculationDaniel Borkmann1-9/+19
Get rid of MLDV2_MRC and use our new macros for mantisse and exponent to calculate Maximum Response Delay out of the Maximum Response Code. Signed-off-by: Daniel Borkmann <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.Daniel Borkmann2-3/+33
i) RFC3810, 9.2. Query Interval [QI] says: The Query Interval variable denotes the interval between General Queries sent by the Querier. Default value: 125 seconds. [...] ii) RFC3810, 9.3. Query Response Interval [QRI] says: The Maximum Response Delay used to calculate the Maximum Response Code inserted into the periodic General Queries. Default value: 10000 (10 seconds) [...] The number of seconds represented by the [Query Response Interval] must be less than the [Query Interval]. iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says: The Older Version Querier Present Timeout is the time-out for transitioning a host back to MLDv2 Host Compatibility Mode. When an MLDv1 query is received, MLDv2 hosts set their Older Version Querier Present Timer to [Older Version Querier Present Timeout]. This value MUST be ([Robustness Variable] times (the [Query Interval] in the last Query received)) plus ([Query Response Interval]). Hence, on *default* the timeout results in: [RV] = 2, [QI] = 125sec, [QRI] = 10sec [OVQPT] = [RV] * [QI] + [QRI] = 260sec Having that said, we currently calculate [OVQPT] (here given as 'switchback' variable) as ... switchback = (idev->mc_qrv + 1) * max_delay RFC3810, 9.12. says "the [Query Interval] in the last Query received". In section "9.14. Configuring timers", it is said: This section is meant to provide advice to network administrators on how to tune these settings to their network. Ambitious router implementations might tune these settings dynamically based upon changing characteristics of the network. [...] iv) RFC38010, 9.14.2. Query Interval: The overall level of periodic MLD traffic is inversely proportional to the Query Interval. A longer Query Interval results in a lower overall level of MLD traffic. The value of the Query Interval MUST be equal to or greater than the Maximum Response Delay used to calculate the Maximum Response Code inserted in General Query messages. I assume that was why switchback is calculated as is (3 * max_delay), although this setting seems to be meant for routers only to configure their [QI] interval for non-default intervals. So usage here like this is clearly wrong. Concluding, the current behaviour in IPv6's multicast code is not conform to the RFC as switch back is calculated wrongly. That is, it has a too small value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs instead of ~260secs on default. Hence, introduce necessary helper functions and fix this up properly as it should be. Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu who did initial testing. Signed-off-by: Daniel Borkmann <[email protected]> Cc: David Stevens <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04tcp: Change return value of tcp_rcv_established()Vijay Subramanian1-2/+2
tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04tunnels: harmonize cleanup done on skb on rx pathNicolas Dichtel1-5/+7
The goal of this patch is to harmonize cleanup done on a skbuff on rx path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04tunnels: harmonize cleanup done on skb on xmit pathNicolas Dichtel2-2/+1
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04vxlan: remove net arg from vxlan[6]_xmit_skb()Nicolas Dichtel1-1/+1
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-04iptunnels: remove net arg from iptunnel_xmit()Nicolas Dichtel1-2/+1
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-03llc: Use normal etherdevice.h testsJoe Perches1-30/+0
Convert the llc_<foo> static inlines to the equivalents from etherdevice.h and remove the llc_<foo> static inline functions. llc_mac_null -> is_zero_ether_addr llc_mac_multicast -> is_multicast_ether_addr llc_mac_match -> ether_addr_equal Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-09-03Merge branch 'for-davem' of ↵David S. Miller7-10/+137
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please accept this batch of updates intended for the 3.12 stream. For the mac80211 bits, Johannes says this: "This time I have various improvements all over the place: IBSS, mesh, testmode, AP client powersave handling, one of the rare rfkill patches and some code cleanup." Also for mac80211: "And I also have some more changes for -next, just a few small fixes and improvements, nothing really stands out." And for iwlwifi: "This time I have some powersave work (notably uAPSD support), CQM offloads, support for a new firmware API and various code cleanups." Regarding the Bluetooth bits, Gustavo says: "Patches to 3.12, here we have: * implementation of a proper tty_port for RFCOMM devices, this fixes some issues people were seeing lately in the kernel. * Add voice_setting option for SCO, it is used for SCO Codec selection * bugfixes, small improvements and clean ups" For the NFC bits, Samuel says: "With this one we have: - A few pn533 improvements and minor fixes. Testing our pn533 driver against Google's NCI stack triggered a few issues that we fixed now. We also added Tx fragmentation support to this driver. - More NFC secure element handling. We added a GET_SE netlink command for getting all the discovered secure elements, and we defined 2 additional secure element netlink event (transaction and connectivity). We also fixed a couple of typos and copy-paste bugs from the secure element handling code. - Firmware download support for the pn544 driver. This chipset can enter a special mode where it's waiting for firmware blobs to replace the already flashed one. We now support that mode." With repect to the ath tree, Kalle says: "New features in ath10k are rx/tx checsumming in hw and survey scan implemented by Michal. Also he made fixes to different areas of the driver, most notable being fixing the case when using two streams and reducing the number of interface combinations to avoid firmware crashes. Bartosz did a clean related to how we handle SoC power save in PCI layer. For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent crashes." I also pulled the wireless tree into wireless-next to support a request from Johannes. On top of all that, there are the usual sort of driver updates. The mwifiex, brcmfmac, brcmsmac, ath9k, and rt2x00 drivers all get some attention, as does the bcma bus and a few other random bits here and there. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-09-03Merge branch 'for-3.12' of ↵Linus Torvalds2-8/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on the cgroup front. Most changes aren't visible to userland at all at this point and are laying foundation for the planned unified hierarchy. - The biggest change is decoupling the lifetime management of css (cgroup_subsys_state) from that of cgroup's. Because controllers (cpu, memory, block and so on) will need to be dynamically enabled and disabled, css which is the association point between a cgroup and a controller may come and go dynamically across the lifetime of a cgroup. Till now, css's were created when the associated cgroup was created and stayed till the cgroup got destroyed. Assumptions around this tight coupling permeated through cgroup core and controllers. These assumptions are gradually removed, which consists bulk of patches, and css destruction path is completely decoupled from cgroup destruction path. Note that decoupling of creation path is relatively easy on top of these changes and the patchset is pending for the next window. - cgroup has its own event mechanism cgroup.event_control, which is only used by memcg. It is overly complex trying to achieve high flexibility whose benefits seem dubious at best. Going forward, new events will simply generate file modified event and the existing mechanism is being made specific to memcg. This pull request contains prepatory patches for such change. - Various fixes and cleanups" Fixed up conflict in kernel/cgroup.c as per Tejun. * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits) cgroup: fix cgroup_css() invocation in css_from_id() cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp() cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup cgroup: implement CFTYPE_NO_PREFIX cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax cgroup: fix cgroup_write_event_control() cgroup: fix subsystem file accesses on the root cgroup cgroup: change cgroup_from_id() to css_from_id() cgroup: use css_get() in cgroup_create() to check CSS_ROOT cpuset: remove an unncessary forward declaration cgroup: RCU protect each cgroup_subsys_state release cgroup: move subsys file removal to kill_css() cgroup: factor out kill_css() cgroup: decouple cgroup_subsys_state destruction from cgroup destruction cgroup: replace cgroup->css_kill_cnt with ->nr_css cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item cgroup: move cgroup->subsys[] assignment to online_css() cgroup: reorganize css init / exit paths cgroup: add __rcu modifier to cgroup->subsys[] ...
2013-09-02net: make snmp_mib_free static inlineCong Wang1-1/+11
Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' this is due to snmp_mib_free() is defined when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31vxlan: add ipv6 proxy supportCong Wang2-0/+9
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <[email protected]> Cc: David Stevens <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31vxlan: add ipv6 route short circuit supportCong Wang1-0/+1
route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <[email protected]> Cc: David Stevens <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31vxlan: add ipv6 supportCong Wang1-1/+1
This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31ipv6: export a stub for IPv6 symbols used by vxlanCong Wang1-0/+15
In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used by vxlan module. Suggested by Ben. This is an ugly but easy solution for now. Cc: Ben Hutchings <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31ipv6: move ip6_dst_hoplimit() into core kernelCong Wang2-2/+2
It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31qdisc: make args to qdisc_create_default conststephen hemminger1-2/+2
Fixes warnings introduced by the qdisc default patch. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-31qdisc: allow setting default queuing disciplinestephen hemminger2-0/+4
By default, the pfifo_fast queue discipline has been used by default for all devices. But we have better choices now. This patch allow setting the default queueing discipline with sysctl. This allows easy use of better queueing disciplines on all devices without having to use tc qdisc scripts. It is intended to allow an easy path for distributions to make fq_codel or sfq the default qdisc. This patch also makes pfifo_fast more of a first class qdisc, since it is now possible to manually override the default and explicitly use pfifo_fast. The behavior for systems who do not use the sysctl is unchanged, they still get pfifo_fast Also removes leftover random # in sysctl net core. Signed-off-by: Stephen Hemminger <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29Merge branch 'master' of ↵David S. Miller2-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== This pull request fixes some issues that arise when 6in4 or 4in6 tunnels are used in combination with IPsec, all from Hannes Frederic Sowa and a null pointer dereference when queueing packets to the policy hold queue. 1) We might access the local error handler of the wrong address family if 6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer to the correct local_error to xfrm_state_afinet. 2) Add a helper function to always refer to the correct interpretation of skb->sk. 3) Call skb_reset_inner_headers to record the position of the inner headers when adding a new one in various ipv6 tunnels. This is needed to identify the addresses where to send back errors in the xfrm layer. 4) Dereference inner ipv6 header if encapsulated to always call the right error handler. 5) Choose protocol family by skb protocol to not call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode. 6) Partly revert "xfrm: introduce helper for safe determination of mtu" because this introduced pmtu discovery problems. 7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs. We need this to get the correct mtu informations in xfrm. 8) Fix null pointer dereference in xdst_queue_output. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-08-29tcp: TSO packets automatic sizingEric Dumazet2-0/+3
After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <[email protected]> Cc: Neal Cardwell <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Van Jacobson <[email protected]> Cc: Tom Herbert <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29net: sctp: reorder sctp_globals to reduce cacheline usageDaniel Borkmann1-11/+9
Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By reordering elements, we can close gaps and simply achieve the following: Current situation: /* size: 80, cachelines: 2, members: 10 */ /* sum members: 57, holes: 4, sum holes: 16 */ /* padding: 7 */ /* last cacheline: 16 bytes */ Afterwards: /* size: 64, cachelines: 1, members: 10 */ /* padding: 7 */ Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-29Merge branch 'master' of ↵John W. Linville7-10/+137
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c
2013-08-28net: add cpu_relax to busy poll loopEliezer Tamir1-0/+1
Add a cpu_relaxt to sk_busy_loop. Julie Cummings reported performance issues when hyperthreading is on. Arjan van de Ven observed that we should have a cpu_relax() in the busy poll loop. Reported-by: Julie Cummings <[email protected]> Signed-off-by: Eliezer Tamir <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-28genl: Hold reference on correct module while netlink-dump.Pravin B Shelar1-2/+18
netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. CC: Jesse Gross <[email protected]> CC: Johannes Berg <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-08-28Merge branch 'for-john' of ↵John W. Linville1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2013-08-28Merge branch 'master' of ↵John W. Linville1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c net/mac80211/ibss.c
2013-08-28{ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode callbackFan Du1-2/+8
Some thoughts on IPv4 VTI implementation: The connection between VTI receiving part and xfrm tunnel mode input process is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit and xfrm4_tunnel, acts like a true "tunnel" device. In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all VTI needs is just a notifier to be called whenever xfrm_input ingress a packet to update statistics. A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP, to check packet authentication or encryption rightness. PMTU update is taken care of in this stage by protocol error handler. Then the packet is rearranged properly depending on whether it's transport mode or tunnel mode packed by mode "input" handler. The VTI handler code takes effects in this stage in tunnel mode only. So it neither need propagate PMTU, as it has already been done if necessary, nor the VTI handler is qualified as a xfrm_tunnel. So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err code. Signed-off-by: Fan Du <[email protected]> Cc: Steffen Klassert <[email protected]> Cc: David S. Miller <[email protected]> Reviewed-by: Saurabh Mohan <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2013-08-27Merge branch 'master' of ↵David S. Miller1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <[email protected]>
2013-08-28net: syncookies: export cookie_v6_init_sequence/cookie_v6_checkPatrick McHardy1-0/+4
Extract the local TCP stack independant parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <[email protected]> Acked-by: David S. Miller <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: add SYNPROXY core/targetPatrick McHardy3-1/+84
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28net: syncookies: export cookie_v4_init_sequence/cookie_v4_checkPatrick McHardy1-0/+4
Extract the local TCP stack independant parts of tcp_v4_init_sequence() and cookie_v4_check() and export them for use by the upcoming SYNPROXY target. Signed-off-by: Patrick McHardy <[email protected]> Acked-by: David S. Miller <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy4-29/+51
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <[email protected]> Tested-by: Martin Topholm <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <[email protected]>
2013-08-26fs/9p: avoid accessing utsname after namespace has been torn downWill Deacon1-0/+5
During trinity fuzzing in a kvmtool guest, I stumbled across the following: Unable to handle kernel NULL pointer dereference at virtual address 00000004 PC is at v9fs_file_do_lock+0xc8/0x1a0 LR is at v9fs_file_do_lock+0x48/0x1a0 [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124) [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4) [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8) [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8) [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0) [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18) I believe this is due to an attempt to access utsname()->nodename, after exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns as NULL and causing the above dereference. A similar issue was fixed for lockd in 9a1b6bf818e7 ("LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts something similar for 9pfs. Cc: Eric Van Hensbergen <[email protected]> Cc: Ron Minnich <[email protected]> Cc: Latchesar Ionkov <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Eric Van Hensbergen <[email protected]>