Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.
Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload. Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Properly set GSO types and skb->encapsulation in the UDP tunnel GRO
complete so that packets are properly represented for GSO. This sets
SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether
non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if
the remote checksum option was processed.
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.
To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.
In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’:
net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function
If ipv4_tun_from_nlattr() returns a different positive value than
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and
validate_and_copy_set_tun() may return an undefined value instead of a
zero success indicator. Initialize err to zero to fix this.
Fixes: 1dd144cf5b4b47e1 ("openvswitch: Support VXLAN Group Policy extension")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Userspace packet execute command pass down flow key for given
packet. But userspace can skip some parameter with zero value.
Therefore kernel needs to initialize key metadata to zero.
Fixes: 0714812134 ("openvswitch: Eliminate memset() from flow_extract.")
Signed-off-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
from rds_cong_queue_update because
(a) we are already holding the sock_lock in the recv path, and
will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
lock
(b) cong_queue_update does an irqsave on the rds_cong_lock, and this
will trigger warnings (for a good reason) from functions called
out of sock_lock.
This patch reverts the change introduced by
2fa57129d ("RDS: Bypass workqueue when queueing cong updates").
The patch has been verified for both RDS/TCP as well as RDS/RDMA
to ensure that there are not regressions for either transport:
- for verification of RDS/TCP a client-server unit-test was used,
with the server blocked in gdb and thus unable to drain its rcvbuf,
eventually triggering a RDS congestion update.
- for RDS/RDMA, the standard IB regression tests were used
Signed-off-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ip6_append_data is used by other protocols and some of them can't
be partially checksummed. Only partially checksum UDP protocol.
Fixes: 32dce968dd987a (ipv6: Allow for partial checksums on non-ufo packets)
Reported-by: Sabrina Dubroca <[email protected]>
Tested-by: Sabrina Dubroca <[email protected]>
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains two small Netfilter updates for your
net-next tree, they are:
1) Add ebtables support to nft_compat, from Arturo Borrero.
2) Fix missing validation of the SET_ID attribute in the lookup
expressions, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack. While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.
This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB(). Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.
Cc: <[email protected]> # 3.18
Reported-by: Casey Schaufler <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
Tested-by: Casey Schaufler <[email protected]>
|
|
xs_tcp_close() is now just a call to xs_tcp_shutdown(), so remove it,
and replace the entry in xs_tcp_ops.
Suggested-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Packetization Layer Path MTU Discovery works separately beside
Path MTU Discovery at IP level, different net namespace has
various requirements on which one to chose, e.g., a virutalized
container instance would require TCP PMTU to probe an usable
effective mtu for underlying tunnel, while the host would
employ classical ICMP based PMTU to function.
Hence making TCP PMTU mechanism per net namespace to decouple
two functionality. Furthermore the probe base MSS should also
be configured separately for each namespace.
Signed-off-by: Fan Du <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: David S. Miller <[email protected]>
|
|
Yes, kernel_setsockopt() hates you for using a char argument.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If a server has enabled Fast Open and it receives a pure SYN-data
packet (without a Fast Open option), it won't accept the data but it
incorrectly returns a SYN-ACK with a Fast Open cookie and also
increments the SNMP stat LINUX_MIB_TCPFASTOPENPASSIVEFAIL.
This patch makes the server include a Fast Open cookie in SYN-ACK
only if the SYN has some Fast Open option (i.e., when client
requests or presents a cookie).
Signed-off-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This avoids setting TUNNEL_VXLAN_OPT for VXLAN frames which don't
have any GBP metadata set. It is not invalid to set it but unnecessary.
Signed-off-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make __ipv6_select_ident() static as it isn't used outside
the file.
Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Recent commit:
0508c07f5e0c94f38afd5434e8b2a55b84553077
Author: Vlad Yasevich <[email protected]>
Date: Tue Feb 3 16:36:15 2015 -0500
ipv6: Select fragment id during UFO segmentation if not set.
Introduced a bug on LE in how ipv6 fragment id is assigned.
This was cought by nightly sparce check:
Resolve the following sparce error:
net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment
(different base types)
net/ipv6/output_core.c:57:38: expected restricted __be32
[usertype] ip6_frag_id
net/ipv6/output_core.c:57:38: got unsigned int [unsigned]
[assigned] [usertype] id
Fixes: 0508c07f5e0c9 (ipv6: Select fragment id during UFO segmentation if not set.)
Signed-off-by: Vladislav Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Bridge's default_pvid adds a vid by default, by which we cannot add a
non-vlan fdb entry by default, because br_fdb_add() adds fdb entries for
all vlans instead of a non-vlan one when any vlan is configured.
# ip link add br0 type bridge
# ip link set eth0 master br0
# bridge fdb add 12:34:56:78:90:ab dev eth0 master temp
# bridge fdb show brport eth0 | grep 12:34:56:78:90:ab
12:34:56:78:90:ab dev eth0 vlan 1 static
We expect a non-vlan fdb entry as well as vlan 1:
12:34:56:78:90:ab dev eth0 static
To fix this, we need to insert a non-vlan fdb entry if vlan is not
specified, even when any vlan is configured.
Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
dsa_slave_phy_setup() finds the phy for the port via device tree and
using of_phy_connect(), or it uses the fall back of taking a phy from
the switch internal mdio bus and calling phy_connect_direct(). Either
way, if a phy is found, phy_attach_direct() is called to attach the
phy to the slave device.
In dsa_slave_create(), a second call to phy_attach() is made. This
results in the warning "PHY already attached". Remove this second,
redundant attaching of the phy.
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tipc_snprintf() was heavily utilized by the old netlink API which no
longer exists (now netlink compat).
In this patch we swap tipc_snprintf() to the identical scnprintf() in
the only remaining occurrence.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add TIPC_CMD_NOOP to compat layer and remove the old framework.
All legacy nl commands are now converted to the compat layer in
netlink_compat.c.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not
have any counterpart in the new API, meaning it now solely exists as a
function in the compat layer.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_GET_NETID to compat dumpit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_SET_NETID to compat doit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_SET_NODE_ADDR to compat doit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node
counter solely used by the legacy API.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert socket (port) listing to compat dumpit call. If a socket
(port) has publications a second dumpit call is issued to collect them
and format then into the legacy buffer before continuing to process
the sockets (ports).
Command converted in this patch:
TIPC_CMD_SHOW_PORTS
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add functionality for printing a dump header and convert
TIPC_CMD_SHOW_NAME_TABLE to compat dumpit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_RESET_LINK_STATS to compat doit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert setting of link proprieties to compat doit calls.
Commands converted in this patch:
TIPC_CMD_SET_LINK_TOL
TIPC_CMD_SET_LINK_PRI
TIPC_CMD_SET_LINK_WINDOW
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link
counter solely used by the legacy API.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add functionality for safely appending string data to a TLV without
keeping write count in the caller.
Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce a framework for transcoding legacy nl action into actions
(.doit) calls from the new nl API. This is done by converting the
incoming TLV data into netlink data with nested netlink attributes.
Unfortunately due to the randomness of the legacy API we can't do this
generically so each legacy netlink command requires a specific
transcoding recipe. In this case for bearer enable and bearer disable.
Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit
compat calls.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce a framework for dumping netlink data from the new netlink
API and formatting it to the old legacy API format. This is done by
looping the dump data and calling a format handler for each entity, in
this case a bearer.
We dump until either all data is dumped or we reach the limited buffer
size of the legacy API. Remember, the legacy API doesn't scale.
In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat
layer.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.
Signed-off-by: Richard Alpe <[email protected]>
Reviewed-by: Erik Hugne <[email protected]>
Reviewed-by: Ying Xue <[email protected]>
Reviewed-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that the linger code is gone, the xs_tcp_fin_timeout variable has
no real function. Keep it for now, since it is part of the /proc
interface, but only define it if that /proc interface is enabled.
Suggested-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If the connection reset is due to an active call on our side, then
the state change is sometimes not reported. Catch those instances
using xs_error_report() instead.
Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as
the change in behaviour makes it redundant.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Use of socket shutdown() means that we monitor the shutdown process
through the xs_tcp_state_change() callback, so it is preferable to
a full close in all cases unless we're destroying the transport.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The previous behaviour left the connection half-open in order to try
to scrape the last replies from the socket. Now that we have more reliable
reconnection, change the behaviour to close down the socket faster.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Now that we no longer use the partial shutdown code when closing the
socket, we no longer need to worry about the TCP linger2 state.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
We set the outer mode protocol too early. As a result, the
local error handler might dispatch to the wrong address family
and report the error to a wrong socket type. We fix this by
setting the outer protocol to the skb after we accessed the
inner mode for the last time, right before we do the atcual
encapsulation where we switch finally to the outer mode.
Reported-by: Chris Ruehl <[email protected]>
Tested-by: Chris Ruehl <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
Make sure root user does not try something stupid.
Also make sure mask field in struct rps_sock_flow_table
does not share a cache line with the potentially often dirtied
flow table.
Signed-off-by: Eric Dumazet <[email protected]>
Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: David S. Miller <[email protected]>
|
|
The current code prevents any operation with a mixed-family dest
unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible
for the client to follow this rule, because ip_vs_genl_parse_dest does
not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST
(need_full_dest = 0).
Also, not every client can pass this flag when removing a dest. ipvsadm,
for example, does not support the "-i" command line option together with
the "-d" option.
This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command.
Signed-off-by: Alexey Andriyanov <[email protected]>
Fixes: bc18d37f676f ("ipvs: Allow heterogeneous pools now that we support them")
Acked-by: Julian Anastasov <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
Instead we rely on SO_REUSEPORT to provide the reconnection semantics
that we need for NFSv2/v3.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket()
or xs_tcp_setup_socket(), since they do not own the correct locks. Instead,
do it in xs_connect().
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.
This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Otherwise, we may end up looping.
Signed-off-by: Trond Myklebust <[email protected]>
|