aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2016-03-01svcrdma: Remove close_out exit pathChuck Lever1-11/+1
Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-by: Devesh Sharma <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Hook up the logic to return ERR_CHUNKChuck Lever2-13/+46
RFC 5666 Section 4.2 states: > When the peer detects an RPC-over-RDMA header version that it does > not support (currently this document defines only version 1), it > replies with an error code of ERR_VERS, and provides the low and > high inclusive version numbers it does, in fact, support. And: > When other decoding errors are detected in the header or chunks, > either an RPC decode error MAY be returned or the RPC/RDMA error > code ERR_CHUNK MUST be returned. The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-by: Devesh Sharma <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Use correct XID in error repliesChuck Lever2-7/+3
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-by: Devesh Sharma <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Make RDMA_ERROR messages workChuck Lever4-65/+72
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-by: Devesh Sharma <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: svc_rdma_post_recv() should close connection on errorChuck Lever4-24/+19
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Devesh Sharma <[email protected]> Tested-by: Devesh Sharma <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Close connection when a send error occursChuck Lever1-2/+6
Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01nfsd: Lower NFSv4.1 callback message size limitChuck Lever2-6/+4
The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Do not send Write chunk XDR pad with inline contentChuck Lever2-6/+18
The NFS server's XDR encoders adds an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Do not write xdr_buf::tail in a Write chunkChuck Lever1-3/+8
When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (ie, in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (ie, NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01svcrdma: Find client-provided write and reply chunks once per replyChuck Lever1-44/+36
The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2016-03-01net: sched: cls_u32 add bit to specify software only rulesJohn Fastabend1-10/+27
In the initial implementation the only way to stop a rule from being inserted into the hardware table was via the device feature flag. However this doesn't work well when working on an end host system where packets are expect to hit both the hardware and software datapaths. For example we can imagine a rule that will match an IP address and increment a field. If we install this rule in both hardware and software we may increment the field twice. To date we have only added support for the drop action so we have been able to ignore these cases. But as we extend the action support we will hit this example plus more such cases. Arguably these are not even corner cases in many working systems these cases will be common. To avoid forcing the driver to always abort (i.e. the above example) this patch adds a flag to add a rule in software only. A careful user can use this flag to build software and hardware datapaths that work together. One example we have found particularly useful is to use hardware resources to set the skb->mark on the skb when the match may be expensive to run in software but a mark lookup in a hash table is cheap. The idea here is hardware can do in one lookup what the u32 classifier may need to traverse multiple lists and hash tables to compute. The flag is only passed down on inserts. On deletion to avoid stale references in hardware we always try to remove a rule if it exists. The flags field is part of the classifier specific options. Although it is tempting to lift this into the generic structure doing this proves difficult do to how the tc netlink attributes are implemented along with how the dump/change routines are called. There is also precedence for putting seemingly generic pieces in the specific classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So although not ideal I've left FLAGS in the u32 options as well as it simplifies the code greatly and user space has already learned how to manage these bits ala 'tc' tool. Another thing if trying to update a rule we require the flags to be unchanged. This is to force user space, software u32 and the hardware u32 to keep in sync. Thanks to Simon Horman for catching this case. Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01net: sched: consolidate offload decision in cls_u32John Fastabend1-4/+4
The offload decision was originally very basic and tied to if the dev implemented the appropriate ndo op hook. The next step is to allow the user to more flexibly define if any paticular rule should be offloaded or not. In order to have this logic in one function lift the current check into a helper routine tc_should_offload(). Signed-off-by: John Fastabend <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01ovs: propagate per dp max headroom to all vportsPaolo Abeni3-1/+53
This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01bridge: notify enslaved devices of headroom changesPaolo Abeni1-2/+35
On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-03-01mac80211: minstrel_ht: fix a logic error in RTS/CTS handlingFelix Fietkau1-1/+1
RTS/CTS needs to be enabled if the rate is a fallback rate *or* if it's a dual-stream rate and the sta is in dynamic SMPS mode. Cc: [email protected] Fixes: a3ebb4e1b763 ("mac80211: minstrel_ht: handle peers in dynamic SMPS") Reported-by: Matías Richart <[email protected]> Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-03-01mac80211: Fix Public Action frame RX in AP modeJouni Malinen1-0/+1
Public Action frames use special rules for how the BSSID field (Address 3) is set. A wildcard BSSID is used in cases where the transmitter and recipient are not members of the same BSS. As such, we need to accept Public Action frames with wildcard BSSID. Commit db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") added a rule that drops Action frames to TDLS-peers based on an Action frame having different DA (Address 1) and BSSID (Address 3) values. This is not correct since it misses the possibility of BSSID being a wildcard BSSID in which case the Address 1 would not necessarily match. Fix this by allowing mac80211 to accept wildcard BSSID in an Action frame when in AP mode. Fixes: db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") Cc: [email protected] Signed-off-by: Jouni Malinen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2016-03-01mac80211: check PN correctly for GCMP-encrypted fragmented MPDUsJohannes Berg2-10/+28
Just like for CCMP we need to check that for GCMP the fragments have PNs that increment by one; the spec was updated to fix this security issue and now has the following text: The receiver shall discard MSDUs and MMPDUs whose constituent MPDU PN values are not incrementing in steps of 1. Adapt the code for CCMP to work for GCMP as well, luckily the relevant fields already alias each other so no code duplication is needed (just check the aliasing with BUILD_BUG_ON.) Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2016-02-29sch_dsmark: update backlog as wellWANG Cong1-0/+3
Similarly, we need to update backlog too when we update qlen. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29sch_htb: update backlog as wellWANG Cong1-1/+4
We saw qlen!=0 but backlog==0 on our production machine: qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0) backlog 0b 72p requeues 0 The problem is we only count qlen for HTB qdisc but not backlog. We need to update backlog too when we update qlen, so that we can at least know the average packet length. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29net_sched: update hierarchical backlog tooWANG Cong19-47/+84
When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29net_sched: introduce qdisc_replace() helperWANG Cong12-78/+12
Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-29netfilter: xt_osf: remove unused variableSudip Mukherjee1-2/+0
While building with W=1 we got the warning: net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used The local variable loop_cont was only initialized and then assigned a value but was never used or checked after that. While removing the variable, the case of OSFOPT_TS was not removed so that it will serve as a reminder to us that we can do something in that particular case. Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29netfilter: meta: add PRANDOM supportFlorian Westphal1-0/+11
Can be used to randomly match packets e.g. for statistic traffic sampling. See commit 3ad0040573b0c00f8848 ("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs") for more info why this doesn't use prandom_u32 directly. Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL for prandom_seed_full_state too. Cc: Daniel Borkmann <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29netfilter: nfnetlink_acct: validate NFACCT_FILTER parametersPhil Turnbull1-0/+3
nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer dereference. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2016-02-29batman-adv: Start new development cycleSimon Wunderlich1-1/+1
Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement bat_neigh_print APILinus Luessing1-0/+55
Lists all neighbours detected by the Echo Locating Protocol (ELP) and their throughput metric. Initially Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement bat_orig_print APIAntonio Quartulli1-0/+105
Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement neighbor comparison API callsAntonio Quartulli1-0/+38
Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - send unicast ELP packets for throughput samplingAntonio Quartulli2-0/+72
In case of an unused wireless link, the mac80211 throughput estimation won't get updated further. Consequently, the reported throughput metric will become obsolete. With this patch unicast sampling is introduced by periodically sending unicast ELP packets to each neighbor on idle WiFi links. These sampling packets will fill an entire frame, so that the measurement is as reliable as possible Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - compute the metric based on the estimated throughputAntonio Quartulli7-2/+164
In case of wireless interface retrieve the throughput by querying cfg80211. To perform this call a separate work must be scheduled because the function may sleep and this is not allowed within an RCU protected context (RCU in this case is used to iterate over all the neighbours). Use ethtool to retrieve information about an Ethernet link like HALF/FULL_DUPLEX and advertised bandwidth (e.g. 100/10Mbps). The metric is updated each time a new ELP packet is sent, this way it is possible to timely react to a metric variation which can imply (for example) a neighbour disconnection. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: keep track of when unicast packets are sentAntonio Quartulli10-34/+75
To enable ELP to send probing packets over wireless links only if needed, batman-adv must keep track of the last time it sent a unicast packet towards every neighbour. For this purpose a 2 main changes are introduced: 1) a new member of the elp_neigh_node structure stores the last time a unicast packet was sent towards this neighbour; 2) a wrapper function for sending unicast packets is implemented. This function will simply update the member describe din point 1) and then forward the packet to the real sending routine. Point 2) implies that any code-path leading to a unicast sending now has to use the new wrapper. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: add throughput override attribute to hard_ifacesAntonio Quartulli5-2/+86
This attribute is exported to user space to disable the link throughput auto-detection by setting a fixed value. The throughput override value is used when batman-adv is computing the link throughput towards a neighbour. If the value is set to 0 then batman-adv will try to detect the throughput by itself. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - implement originators logicAntonio Quartulli5-41/+566
Add the support for recognising new originators in the network and rebroadcast their OGMs. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: OGMv2 - add basic infrastructureAntonio Quartulli9-3/+427
This is the initial implementation of the new OGM protocol (version 2). It has been designed to work on top of the newly added ELP. In the previous version the OGM protocol was used to both measure link qualities and flood the network with the metric information. In this version the protocol is in charge of the latter task only, leaving the former to ELP. This means being able to decouple the interval used by the neighbor discovery from the OGM broadcasting, which revealed to be costly in dense networks and needed to be relaxed so leading to a less responsive routing protocol. Signed-off-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
2016-02-29batman-adv: ELP - adding sysfs parameter for elp intervalLinus Luessing1-0/+7
This parameter can be set individually on each interface and allows the configuration of the elp interval for the link quality measurements during runtime. Usually it is desirable to set it to a higher (= slower) value on interfaces which have a more static characteristic (e.g. wired interfaces) or very dense neighbourhoods to reduce overhead. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: respin on top of the latest master] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - creating neighbor structuresLinus Luessing5-1/+214
Initially developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: ELP - adding basic infrastructureLinus Luessing9-1/+363
The B.A.T.M.A.N. protocol originally only used a single message type (called OGM) to determine the link qualities to the direct neighbors and spreading these link quality information through the whole mesh. This procedure is summarized on the BATMAN concept page and explained in details in the RFC draft published in 2008. This approach was chosen for its simplicity during the protocol design phase and the implementation. However, it also bears some drawbacks: * Wireless interfaces usually come with some packet loss, therefore a higher broadcast rate is desirable to allow a fast reaction on flaky connections. Other interfaces of the same host might be connected to Ethernet LANs / VPNs / etc which rarely exhibit packet loss would benefit from a lower broadcast rate to reduce overhead. * It generally is more desirable to detect local link quality changes at a faster rate than propagating all these changes through the entire mesh (the far end of the mesh does not need to care about local link quality changes that much). Other optimizations strategies, like reducing overhead, might be possible if OGMs weren't used for all tasks in the mesh at the same time. As a result detecting local link qualities shall be handled by an independent message type, ELP, whereas the OGM message type remains responsible for flooding the mesh with these link quality information and determining the overall path transmit qualities. Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-29batman-adv: Add hard_iface specific sysfs wrapper macros for UINTLinus Luessing1-0/+49
This allows us to easily add a sysfs parameter for an unsigned int later, which is not for a batman mesh interface (e.g. bat0), but for a common interface instead. It allows reading and writing an atomic_t in hard_iface (instead of bat_priv compared to the mesh variant). Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <[email protected]> Signed-off-by: Marek Lindner <[email protected]> [[email protected]: rename functions and move macros] Signed-off-by: Antonio Quartulli <[email protected]>
2016-02-26net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.MINOURA Makoto / 箕浦 真3-6/+20
When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26GSO: Provide software checksum of tunneled UDP fragmentation offloadAlexander Duyck3-7/+37
On reviewing the code I realized that GRE and UDP tunnels could cause a kernel panic if we used GSO to segment a large UDP frame that was sent through the tunnel with an outer checksum and hardware offloads were not available. In order to correct this we need to update the feature flags that are passed to the skb_segment function so that in the event of UDP fragmentation being requested for the inner header the segmentation function will correctly generate the checksum for the payload if we cannot segment the outer header. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: prefer VRF master for source address selectionDavid Lamparter1-0/+17
When selecting an address in context of a VRF, the vrf master should be preferred for address selection. If it isn't, the user has a hard time getting the system to select to their preference - the code will pick the address off the first in-VRF interface it can find, which on a router could well be a non-routable address. Signed-off-by: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> [dsa: Fixed comment style and removed extra blank link ] Signed-off-by: David S. Miller <[email protected]>
2016-02-26net: l3mdev: address selection should only consider devices in L3 domainDavid Ahern2-2/+14
David Lamparter noted a use case where the source address selection fails to pick an address from a VRF interface - unnumbered interfaces. Relevant commands from his script: ip addr add 9.9.9.9/32 dev lo ip link set lo up ip link add name vrf0 type vrf table 101 ip rule add oif vrf0 table 101 ip rule add iif vrf0 table 101 ip link set vrf0 up ip addr add 10.0.0.3/32 dev vrf0 ip link add name dummy2 type dummy ip link set dummy2 master vrf0 up --> note dummy2 has no address - unnumbered device ip route add 10.2.2.2/32 dev dummy2 table 101 ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02 tcpdump -ni dummy2 & And using ping instead of his socat example: $ ping -I vrf0 -c1 10.2.2.2 ping: Warning: source address might be selected on device other than vrf0. PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data. >From tcpdump: 12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64 Note the source address is from lo and is not a VRF local address. With this patch: $ ping -I vrf0 -c1 10.2.2.2 PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data. >From tcpdump: 12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64 Now the source address comes from vrf0. The ipv4 function for selecting source address takes a const argument. Removing the const requires touching a lot of places, so instead l3mdev_master_ifindex_rcu is changed to take a const argument and then do the typecast to non-const as required by netdev_master_upper_dev_get_rcu. This is similar to what l3mdev_fib_table_rcu does. IPv6 for unnumbered interfaces appears to be selecting the addresses properly. Cc: David Lamparter <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-26Merge branch 'for-linus' of ↵Linus Torvalds2-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are two small messenger bug fixes and a log spam regression fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: don't spam dmesg with stray reply warnings libceph: use the right footer size when skipping a message libceph: don't bail early from try_read() when skipping a message
2016-02-266lowpan: iphc: fix invalid case handlingAlexander Aring1-2/+2
This patch fixes the return value in a case which should never occur. Instead returning "-EINVAL" we return LOWPAN_IPHC_DAM_00 which is invalid on context based addresses. Also change the WARN_ON_ONCE to WARN_ONCE which was suggested by Dan Carpenter. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2016-02-25Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd bugfix from Bruce Fields: "One fix for a bug that could cause a NULL write past the end of a buffer in case of unusually long writes to some system interfaces used by mountd and other nfs support utilities" * tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: fix off-by-one in qword_get()
2016-02-25net: ethtool: remove unused __ethtool_get_settingsDavid Decotigny1-31/+14
replaced by __ethtool_get_link_ksettings. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: core: use __ethtool_get_ksettingsDavid Decotigny2-12/+14
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: bridge: use __ethtool_get_ksettingsDavid Decotigny1-3/+3
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: 8021q: use __ethtool_get_ksettingsDavid Decotigny1-4/+4
Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-25net: ethtool: add new ETHTOOL_xLINKSETTINGS APIDavid Decotigny1-6/+447
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API, handled by the new get_link_ksettings/set_link_ksettings callbacks. This API provides support for most legacy ethtool_cmd fields, adds support for larger link mode masks (up to 4064 bits, variable length), and removes ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt). This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides the following backward compatibility properties: - legacy ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks. - legacy ethtool with new get/set_link_ksettings drivers: the new driver callbacks are used, data internally converted to legacy ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link mode mask. ETHTOOL_SSET will fail if user tries to set the ethtool_cmd deprecated fields to non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if driver sets higher bits. - future ethtool with legacy drivers: no change, still using the get_settings/set_settings callbacks, internally converted to new data structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be ignored and seen as 0 from user space. Note that that "future" ethtool tool will not allow changes to these deprecated fields. - future ethtool with new drivers: direct call to the new callbacks. By "future" ethtool, what is meant is: - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if fails - set: query first and remember which of ETHTOOL_GLINKSETTINGS or ETHTOOL_GSET was successful + if ETHTOOL_GLINKSETTINGS was successful, then change config with ETHTOOL_SLINKSETTINGS. A failure there is final (do not try ETHTOOL_SSET). + otherwise ETHTOOL_GSET was successful, change config with ETHTOOL_SSET. A failure there is final (do not try ETHTOOL_SLINKSETTINGS). The interaction user/kernel via the new API requires a small ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link mode bitmaps. If kernel doesn't agree with user, it returns the bitmap length it is expecting from user as a negative length (and cmd field is 0). When kernel and user agree, kernel returns valid info in all fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS). Data structure crossing user/kernel boundary is 32/64-bit agnostic. Converted internally to a legal kernel bitmap. The internal __ethtool_get_settings kernel helper will gradually be replaced by __ethtool_get_link_ksettings by the time the first "link_settings" drivers start to appear. So this patch doesn't change it, it will be removed before it needs to be changed. Signed-off-by: David Decotigny <[email protected]> Signed-off-by: David S. Miller <[email protected]>