aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-03-01ovs: propagate per dp max headroom to all vportsPaolo Abeni3-1/+53
This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01bridge: notify enslaved devices of headroom changesPaolo Abeni1-2/+35
On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01mac80211: minstrel_ht: fix a logic error in RTS/CTS handlingFelix Fietkau1-1/+1
RTS/CTS needs to be enabled if the rate is a fallback rate *or* if it's a dual-stream rate and the sta is in dynamic SMPS mode. Cc: [email protected] Fixes: a3ebb4e1b763 ("mac80211: minstrel_ht: handle peers in dynamic SMPS") Reported-by: Matías Richart <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-03-01mac80211: Fix Public Action frame RX in AP modeJouni Malinen1-0/+1
Public Action frames use special rules for how the BSSID field (Address 3) is set. A wildcard BSSID is used in cases where the transmitter and recipient are not members of the same BSS. As such, we need to accept Public Action frames with wildcard BSSID. Commit db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") added a rule that drops Action frames to TDLS-peers based on an Action frame having different DA (Address 1) and BSSID (Address 3) values. This is not correct since it misses the possibility of BSSID being a wildcard BSSID in which case the Address 1 would not necessarily match. Fix this by allowing mac80211 to accept wildcard BSSID in an Action frame when in AP mode. Fixes: db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") Cc: [email protected] Signed-off-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-03-01mac80211: check PN correctly for GCMP-encrypted fragmented MPDUsJohannes Berg2-10/+28
Just like for CCMP we need to check that for GCMP the fragments have PNs that increment by one; the spec was updated to fix this security issue and now has the following text: The receiver shall discard MSDUs and MMPDUs whose constituent MPDU PN values are not incrementing in steps of 1. Adapt the code for CCMP to work for GCMP as well, luckily the relevant fields already alias each other so no code duplication is needed (just check the aliasing with BUILD_BUG_ON.) Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2016-02-29sch_dsmark: update backlog as wellWANG Cong1-0/+3
Similarly, we need to update backlog too when we update qlen. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29sch_htb: update backlog as wellWANG Cong1-1/+4
We saw qlen!=0 but backlog==0 on our production machine: qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0) backlog 0b 72p requeues 0 The problem is we only count qlen for HTB qdisc but not backlog. We need to update backlog too when we update qlen, so that we can at least know the average packet length. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29net_sched: update hierarchical backlog tooWANG Cong19-47/+84
When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29net_sched: introduce qdisc_replace() helperWANG Cong12-78/+12
Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29netfilter: xt_osf: remove unused variableSudip Mukherjee1-2/+0
While building with W=1 we got the warning: net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used The local variable loop_cont was only initialized and then assigned a value but was never used or checked after that. While removing the variable, the case of OSFOPT_TS was not removed so that it will serve as a reminder to us that we can do something in that particular case. Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29netfilter: meta: add PRANDOM supportFlorian Westphal1-0/+11
Can be used to randomly match packets e.g. for statistic traffic sampling. See commit 3ad0040573b0c00f8848 ("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs") for more info why this doesn't use prandom_u32 directly. Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL for prandom_seed_full_state too. Cc: Daniel Borkmann <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29netfilter: nfnetlink_acct: validate NFACCT_FILTER parametersPhil Turnbull1-0/+3
nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer dereference. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29batman-adv: Start new development cycleSimon Wunderlich1-1/+1
Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement bat_neigh_print APILinus Luessing1-0/+55
Lists all neighbours detected by the Echo Locating Protocol (ELP) and their throughput metric. Initially Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement bat_orig_print APIAntonio Quartulli1-0/+105
Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API callsAntonio Quartulli1-0/+38
Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - send unicast ELP packets for throughput samplingAntonio Quartulli2-0/+72
In case of an unused wireless link, the mac80211 throughput estimation won't get updated further. Consequently, the reported throughput metric will become obsolete. With this patch unicast sampling is introduced by periodically sending unicast ELP packets to each neighbor on idle WiFi links. These sampling packets will fill an entire frame, so that the measurement is as reliable as possible Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - compute the metric based on the estimated throughputAntonio Quartulli7-2/+164
In case of wireless interface retrieve the throughput by querying cfg80211. To perform this call a separate work must be scheduled because the function may sleep and this is not allowed within an RCU protected context (RCU in this case is used to iterate over all the neighbours). Use ethtool to retrieve information about an Ethernet link like HALF/FULL_DUPLEX and advertised bandwidth (e.g. 100/10Mbps). The metric is updated each time a new ELP packet is sent, this way it is possible to timely react to a metric variation which can imply (for example) a neighbour disconnection. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: keep track of when unicast packets are sentAntonio Quartulli10-34/+75
To enable ELP to send probing packets over wireless links only if needed, batman-adv must keep track of the last time it sent a unicast packet towards every neighbour. For this purpose a 2 main changes are introduced: 1) a new member of the elp_neigh_node structure stores the last time a unicast packet was sent towards this neighbour; 2) a wrapper function for sending unicast packets is implemented. This function will simply update the member describe din point 1) and then forward the packet to the real sending routine. Point 2) implies that any code-path leading to a unicast sending now has to use the new wrapper. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: add throughput override attribute to hard_ifacesAntonio Quartulli5-2/+86
This attribute is exported to user space to disable the link throughput auto-detection by setting a fixed value. The throughput override value is used when batman-adv is computing the link throughput towards a neighbour. If the value is set to 0 then batman-adv will try to detect the throughput by itself. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - implement originators logicAntonio Quartulli5-41/+566
Add the support for recognising new originators in the network and rebroadcast their OGMs. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - add basic infrastructureAntonio Quartulli9-3/+427
This is the initial implementation of the new OGM protocol (version 2). It has been designed to work on top of the newly added ELP. In the previous version the OGM protocol was used to both measure link qualities and flood the network with the metric information. In this version the protocol is in charge of the latter task only, leaving the former to ELP. This means being able to decouple the interval used by the neighbor discovery from the OGM broadcasting, which revealed to be costly in dense networks and needed to be relaxed so leading to a less responsive routing protocol. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - adding sysfs parameter for elp intervalLinus Luessing1-0/+7
This parameter can be set individually on each interface and allows the configuration of the elp interval for the link quality measurements during runtime. Usually it is desirable to set it to a higher (= slower) value on interfaces which have a more static characteristic (e.g. wired interfaces) or very dense neighbourhoods to reduce overhead. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: respin on top of the latest master] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - creating neighbor structuresLinus Luessing5-1/+214
Initially developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - adding basic infrastructureLinus Luessing9-1/+363
The B.A.T.M.A.N. protocol originally only used a single message type (called OGM) to determine the link qualities to the direct neighbors and spreading these link quality information through the whole mesh. This procedure is summarized on the BATMAN concept page and explained in details in the RFC draft published in 2008. This approach was chosen for its simplicity during the protocol design phase and the implementation. However, it also bears some drawbacks: * Wireless interfaces usually come with some packet loss, therefore a higher broadcast rate is desirable to allow a fast reaction on flaky connections. Other interfaces of the same host might be connected to Ethernet LANs / VPNs / etc which rarely exhibit packet loss would benefit from a lower broadcast rate to reduce overhead. * It generally is more desirable to detect local link quality changes at a faster rate than propagating all these changes through the entire mesh (the far end of the mesh does not need to care about local link quality changes that much). Other optimizations strategies, like reducing overhead, might be possible if OGMs weren't used for all tasks in the mesh at the same time. As a result detecting local link qualities shall be handled by an independent message type, ELP, whereas the OGM message type remains responsible for flooding the mesh with these link quality information and determining the overall path transmit qualities. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: Add hard_iface specific sysfs wrapper macros for UINTLinus Luessing1-0/+49
This allows us to easily add a sysfs parameter for an unsigned int later, which is not for a batman mesh interface (e.g. bat0), but for a common interface instead. It allows reading and writing an atomic_t in hard_iface (instead of bat_priv compared to the mesh variant). Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: rename functions and move macros] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-26net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.MINOURA Makoto / 箕浦 真3-6/+20
When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26GSO: Provide software checksum of tunneled UDP fragmentation offloadAlexander Duyck3-7/+37
On reviewing the code I realized that GRE and UDP tunnels could cause a kernel panic if we used GSO to segment a large UDP frame that was sent through the tunnel with an outer checksum and hardware offloads were not available. In order to correct this we need to update the feature flags that are passed to the skb_segment function so that in the event of UDP fragmentation being requested for the inner header the segmentation function will correctly generate the checksum for the payload if we cannot segment the outer header. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: prefer VRF master for source address selectionDavid Lamparter1-0/+17
When selecting an address in context of a VRF, the vrf master should be preferred for address selection. If it isn't, the user has a hard time getting the system to select to their preference - the code will pick the address off the first in-VRF interface it can find, which on a router could well be a non-routable address. Signed-off-by: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> [dsa: Fixed comment style and removed extra blank link ] Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: address selection should only consider devices in L3 domainDavid Ahern2-2/+14
David Lamparter noted a use case where the source address selection fails to pick an address from a VRF interface - unnumbered interfaces. Relevant commands from his script: ip addr add 9.9.9.9/32 dev lo ip link set lo up ip link add name vrf0 type vrf table 101 ip rule add oif vrf0 table 101 ip rule add iif vrf0 table 101 ip link set vrf0 up ip addr add 10.0.0.3/32 dev vrf0 ip link add name dummy2 type dummy ip link set dummy2 master vrf0 up --> note dummy2 has no address - unnumbered device ip route add 10.2.2.2/32 dev dummy2 table 101 ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02 tcpdump -ni dummy2 & And using ping instead of his socat example: $ ping -I vrf0 -c1 10.2.2.2 ping: Warning: source address might be selected on device other than vrf0. PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data. >From tcpdump: 12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64 Note the source address is from lo and is not a VRF local address. With this patch: $ ping -I vrf0 -c1 10.2.2.2 PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data. >From tcpdump: 12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64 Now the source address comes from vrf0. The ipv4 function for selecting source address takes a const argument. Removing the const requires touching a lot of places, so instead l3mdev_master_ifindex_rcu is changed to take a const argument and then do the typecast to non-const as required by netdev_master_upper_dev_get_rcu. This is similar to what l3mdev_fib_table_rcu does. IPv6 for unnumbered interfaces appears to be selecting the addresses properly. Cc: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26Merge branch 'for-linus' of ↵Linus Torvalds2-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are two small messenger bug fixes and a log spam regression fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: don't spam dmesg with stray reply warnings libceph: use the right footer size when skipping a message libceph: don't bail early from try_read() when skipping a message
2016-02-266lowpan: iphc: fix invalid case handlingAlexander Aring1-2/+2
This patch fixes the return value in a case which should never occur. Instead returning "-EINVAL" we return LOWPAN_IPHC_DAM_00 which is invalid on context based addresses. Also change the WARN_ON_ONCE to WARN_ONCE which was suggested by Dan Carpenter. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-02-25Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd bugfix from Bruce Fields: "One fix for a bug that could cause a NULL write past the end of a buffer in case of unusually long writes to some system interfaces used by mountd and other nfs support utilities" * tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: fix off-by-one in qword_get()
2016-02-25net: ethtool: remove unused __ethtool_get_settingsDavid Decotigny1-31/+14
replaced by __ethtool_get_link_ksettings. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: core: use __ethtool_get_ksettingsDavid Decotigny2-12/+14
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: bridge: use __ethtool_get_ksettingsDavid Decotigny1-3/+3
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: 8021q: use __ethtool_get_ksettingsDavid Decotigny1-4/+4
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: ethtool: add new ETHTOOL_xLINKSETTINGS APIDavid Decotigny1-6/+447
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API, handled by the new get_link_ksettings/set_link_ksettings callbacks. This API provides support for most legacy ethtool_cmd fields, adds support for larger link mode masks (up to 4064 bits, variable length), and removes ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt). This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides the following backward compatibility properties: - legacy ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks. - legacy ethtool with new get/set_link_ksettings drivers: the new driver callbacks are used, data internally converted to legacy ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link mode mask. ETHTOOL_SSET will fail if user tries to set the ethtool_cmd deprecated fields to non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if driver sets higher bits. - future ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks, internally converted to new data structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be ignored and seen as 0 from user space. Note that that "future" ethtool tool will not allow changes to these deprecated fields. - future ethtool with new drivers: direct call to the new callbacks. By "future" ethtool, what is meant is: - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if fails - set: query first and remember which of ETHTOOL_GLINKSETTINGS or ETHTOOL_GSET was successful + if ETHTOOL_GLINKSETTINGS was successful, then change config with ETHTOOL_SLINKSETTINGS. A failure there is final (do not try ETHTOOL_SSET). + otherwise ETHTOOL_GSET was successful, change config with ETHTOOL_SSET. A failure there is final (do not try ETHTOOL_SLINKSETTINGS). The interaction user/kernel via the new API requires a small ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link mode bitmaps. If kernel doesn't agree with user, it returns the bitmap length it is expecting from user as a negative length (and cmd field is 0). When kernel and user agree, kernel returns valid info in all fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS). Data structure crossing user/kernel boundary is 32/64-bit agnostic. Converted internally to a legal kernel bitmap. The internal __ethtool_get_settings kernel helper will gradually be replaced by __ethtool_get_link_ksettings by the time the first "link_settings" drivers start to appear. So this patch doesn't change it, it will be removed before it needs to be changed. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: Facility to report route quality of connected socketsTom Herbert1-0/+4
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The purpose is to allow an application to give feedback to the kernel about the quality of the network path for a connected socket. The value argument indicates the type of quality report. For this initial patch the only supported advice is a value of 1 which indicates "bad path, please reroute"-- the action taken by the kernel is to call dst_negative_advice which will attempt to choose a different ECMP route, reset the TX hash for flow label and UDP source port in encapsulation, etc. This facility should be useful for connected UDP sockets where only the application can provide any feedback about path quality. It could also be useful for TCP applications that have additional knowledge about the path outside of the normal TCP control loop. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: ipv6: Make address flushing on ifdown optionalDavid Ahern1-15/+121
Currently, all ipv6 addresses are flushed when the interface is configured down, including global, static addresses: $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 << nothing; all addresses have been flushed>> Add a new sysctl to make this behavior optional. The new setting defaults to flush all addresses to maintain backwards compatibility. When the set global addresses with no expire times are not flushed on an admin down. The sysctl is per-interface or system-wide for all interfaces $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1 or $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 Will keep addresses on eth1 on an admin down. $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000 inet6 2100:1::2/120 scope global tentative valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative valid_lft forever preferred_lft forever Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: fix null deref crash in compat config pathFlorian Westphal1-0/+1
msg.dst_sk needs to be set up with a valid socket because some callbacks later derive the netns from it. Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation") Reported-by: Jon Maloy <[email protected]> Bisected-by: Jon Maloy <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Acked-by Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: fix crash during node removalJon Paul Maloy1-13/+11
When the TIPC module is unloaded, we have identified a race condition that allows a node reference counter to go to zero and the node instance being freed before the node timer is finished with accessing it. This leads to occasional crashes, especially in multi-namespace environments. The scenario goes as follows: CPU0:(node_stop) CPU1:(node_timeout) // ref == 2 1: if(!mod_timer()) 2: if (del_timer()) 3: tipc_node_put() // ref -> 1 4: tipc_node_put() // ref -> 0 5: kfree_rcu(node); 6: tipc_node_get(node) 7: // BOOM! We now clean up this functionality as follows: 1) We remove the node pointer from the node lookup table before we attempt deactivating the timer. This way, we reduce the risk that tipc_node_find() may obtain a valid pointer to an instance marked for deletion; a harmless but undesirable situation. 2) We use del_timer_sync() instead of del_timer() to safely deactivate the node timer without any risk that it might be reactivated by the timeout handler. There is no risk of deadlock here, since the two functions never touch the same spinlocks. 3: We remove a pointless tipc_node_get() + tipc_node_put() from the timeout handler. Reported-by: Zhijiang Hu <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25tipc: eliminate risk of finding to-be-deleted node instanceJon Paul Maloy1-9/+9
Although we have never seen it happen, we have identified the following problematic scenario when nodes are stopped and deleted: CPU0: CPU1: tipc_node_xxx() //ref == 1 tipc_node_put() //ref -> 0 tipc_node_find() // node still in table tipc_node_delete() list_del_rcu(n. list) tipc_node_get() //ref -> 1, bad kfree_rcu() tipc_node_put() //ref to 0 again. kfree_rcu() // BOOM! We fix this by introducing use of the conditional kref_get_if_not_zero() instead of kref_get() in the function tipc_node_find(). This eliminates any risk of post-mortem access. Reported-by: Zhijiang Hu <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: fix bridge multicast packet checksum validationLinus Lüssing1-2/+20
We need to update the skb->csum after pulling the skb, otherwise an unnecessary checksum (re)computation can ocure for IGMP/MLD packets in the bridge code. Additionally this fixes the following splats for network devices / bridge ports with support for and enabled RX checksum offloading: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: dsa: drop vlan_getnextVivien Didelot1-34/+1
The VLAN GetNext operation is specific to some switches, and thus can be complicated to implement for some drivers. Remove the support for the vlan_getnext/port_pvid_get approach in favor of the generic and simpler port_vlan_dump function. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: dsa: add port_vlan_dump routineVivien Didelot1-0/+3
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers which gets passed the switchdev VLAN object and callback. This function, if implemented, takes precedence over the soon legacy vlan_getnext/port_pvid_get approach. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net_sched: add network namespace support for tc actionsWANG Cong13-138/+699
Currently tc actions are stored in a per-module hashtable, therefore are visible to all network namespaces. This is probably the last part of the tc subsystem which is not aware of netns now. This patch makes them per-netns, several tc action API's need to be adjusted for this. The tc action API code is ugly due to historical reasons, we need to refactor that code in the future. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net_sched: prepare tcf_hashinfo_destroy() for netns supportWANG Cong1-3/+29
We only release the memory of the hashtable itself, not its entries inside. This is not a problem yet since we only call it in module release path, and module is refcount'ed by actions. This would be a problem after we move the per module hinfo into per netns in the latter patch. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25NFC: Close a race condition in llcp_sock_getname()Cong Wang1-0/+6
llcp_sock_getname() checks llcp_sock->dev to make sure llcp_sock is already connected or bound, however, we could be in the middle of llcp_sock_bind() where llcp_sock->dev is bound and llcp_sock->service_name_len is set, but llcp_sock->service_name is not, in this case we would lead to copy some bytes from a NULL pointer. Just lock the sock since this is not a hot path anyway. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>
2016-02-25NFC: Use GFP_USER for user-controlled kmallocCong Wang1-2/+2
These two functions are called in sendmsg path, and the 'len' is passed from user-space, so we should not allow malicious users to OOM kernel on purpose. Reported-by: Dmitry Vyukov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Reviewed-by: Julian Calaby <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Samuel Ortiz <[email protected]>