Age | Commit message (Collapse) | Author | Files | Lines |
|
macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Before this patch we had two flavors of most forwarding functions -
_forward and _deliver, the difference being that the latter are used
when the packets are locally originated. Instead of all this function
pointer passing and code duplication, we can just pass a boolean noting
that the packet was locally originated and use that to perform the
necessary checks in __br_forward. This gives a minor performance
improvement but more importantly consolidates the forwarding paths.
Also add a kernel doc comment to explain the exported br_forward()'s
arguments.
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently if the packet is going to be received locally we set skb0 or
sometimes called skb2 variables to the original skb. This can get
confusing and also we can avoid one conditional on the fast path by
simply using a boolean and passing it around. Thanks to Roopa for the
name suggestion.
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch removes one conditional from the unicast path by using the fact
that skb is NULL only when the packet is multicast or is local.
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Trivial style changes in br_handle_frame_finish.
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If there were less than 2 entries in the multipath list, then
xprt_iter_next_entry_multiple() would never advance beyond the
first entry, which is correct for round robin behaviour, but not
for the list iteration.
The end result would be infinite looping in rpc_clnt_iterate_for_each_xprt()
as we would never see the xprt == NULL condition fulfilled.
Reported-by: Oleg Drokin <[email protected]>
Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple transports")
Signed-off-by: Trond Myklebust <[email protected]>
|
|
This adds kernel-doc style descriptions for 6 functions and
fixes 1 typo.
Signed-off-by: Richard Sailer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This work addresses a couple of issues bpf_skb_event_output()
helper currently has: i) We need two copies instead of just a
single one for the skb data when it should be part of a sample.
The data can be non-linear and thus needs to be extracted via
bpf_skb_load_bytes() helper first, and then copied once again
into the ring buffer slot. ii) Since bpf_skb_load_bytes()
currently needs to be used first, the helper needs to see a
constant size on the passed stack buffer to make sure BPF
verifier can do sanity checks on it during verification time.
Thus, just passing skb->len (or any other non-constant value)
wouldn't work, but changing bpf_skb_load_bytes() is also not
the proper solution, since the two copies are generally still
needed. iii) bpf_skb_load_bytes() is just for rather small
buffers like headers, since they need to sit on the limited
BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
this work improves the bpf_skb_event_output() helper to address
all 3 at once.
We can make use of the passed in skb context that we have in
the helper anyway, and use some of the reserved flag bits as
a length argument. The helper will use the new __output_custom()
facility from perf side with bpf_skb_copy() as callback helper
to walk and extract the data. It will pass the data for setup
to bpf_event_output(), which generates and pushes the raw record
with an additional frag part. The linear data used in the first
frag of the record serves as programmatically defined meta data
passed along with the appended sample.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:
commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock")
And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.
Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.
It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable
Cc: Eric Dumazet <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Yue Cao <[email protected]>
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The rxrpc_lookup_peer() function returns NULL on error, it never returns
error pointers.
Fixes: 8496af50eb38 ('rxrpc: Use RCU to access a peer's service connection tree')
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use RDS probe-ping to compute how many paths may be used with
the peer, and to synchronously start the multiple paths. If mprds is
supported, hash outgoing traffic to one of multiple paths in rds_sendmsg()
when multipath RDS is supported by the transport.
CC: Santosh Shilimkar <[email protected]>
Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some code duplication in rds_tcp_reset_callbacks() can be avoided
by having the function call rds_tcp_restore_callbacks() and
rds_tcp_set_callbacks().
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As the existing comments in rds_tcp_listen_data_ready() indicate,
it is possible under some race-windows to get to this function with the
accept() socket. If that happens, we could run into a sequence whereby
thread 1 thread 2
rds_tcp_accept_one() thread
sets up new_sock via ->accept().
The sk_user_data is now
sock_def_readable
data comes in for new_sock,
->sk_data_ready is called, and
we land in rds_tcp_listen_data_ready
rds_tcp_set_callbacks()
takes the sk_callback_lock and
sets up sk_user_data to be the cp
read_lock sk_callback_lock
ready = cp
unlock sk_callback_lock
page fault on ready
In the above sequence, we end up with a panic on a bad page reference
when trying to execute (*ready)(). Instead we need to call
sock_def_readable() safely, which is what this patch achieves.
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This helper serves to know if two switchdev port netdevices belong to the
same HW ASIC, e.g to figure out if forwarding offload is possible between them.
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently only read-only checks are performed up to the point on where
we check if peer is ECN capable, checks which we can avoid otherwise.
The flag ecn_ce_done is only used to perform this check once per
incoming packet, and nothing more.
Thus this patch moves the peer check up.
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We should not clear that flag when switching to a new skb from a GSO skb
because it would cause ECN processing to happen multiple times per GSO
skb, which is not wanted. Instead, let it be processed once per chunk.
That is, in other words, once per IP header available.
Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Identifying address family operations during rx path is not something
expensive but it's ugly to the eye to have it done multiple times,
specially when we already validated it during initial rx processing.
This patch takes advantage of the now shared sctp_input_cb and make the
pointer to the operations readily available.
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
SCTP will try to access original IP headers on sctp_recvmsg in order to
copy the addresses used. There are also other places that do similar access
to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO
support") they aren't always there because they are only present in the
header skb.
SCTP handles the queueing of incoming data by cloning the incoming skb
and limiting to only the relevant payload. This clone has its cb updated
to something different and it's then queued on socket rx queue. Thus we
need to fix this in two moments.
For rx path, not related to socket queue yet, this patch uses a
partially copied sctp_input_cb to such GSO frags. This restores the
ability to access the headers for this part of the code.
Regarding the socket rx queue, it removes iif member from sctp_event and
also add a chunk pointer on it.
With these changes we're always able to reach the headers again.
The biggest change here is that now the sctp_chunk struct and the
original skb are only freed after the application consumed the buffer.
Note however that the original payload was already like this due to the
skb cloning.
For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
IPv6 now can fetch it directly from original's IPv6 CB as the original
skb is still accessible.
In the future we probably can simplify sctp_v*_skb_iif() stuff, as
sctp_v4_skb_iif() was called but it's return value not used, and now
it's not even called, but such cleanup is out of scope for this change.
Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The next patch needs 8 bytes in there. sctp_ulpevent has a hole due to
bad alignment; msg_flags is using 4 bytes while it actually uses only 2, so
we shrink it, and iif member (4 bytes) which can be easily fetched from
another place once the next patch is there, so we remove it and thus
creating space for 8 bytes.
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We process input path in other files too and having access to it is
nice, so move it to a header where it's shared.
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-07-13
Here's our main bluetooth-next pull request for the 4.8 kernel:
- Fixes and cleanups in 802.15.4 and 6LoWPAN code
- Fix out of bounds issue in btmrvl driver
- Fixes to Bluetooth socket recvmsg return values
- Use crypto_cipher_encrypt_one() instead of crypto_skcipher
- Cleanup of Bluetooth connection sysfs interface
- New Authentication failure reson code for Disconnected mgmt event
- New USB IDs for Atheros, Qualcomm and Intel Bluetooth controllers
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.
A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.
Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.
Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Increment the mgmt revision due to the recently added new
reason code for the Disconnected event.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
If link is disconnected due to Authentication Failure (PIN or Key
Missing status) userspace will be notified about this with proper error
code. Many LE profiles define "PIN or Key Missing" status as indication
of remote lost bond so this allows userspace to take action on this.
@ Device Connected: 88:63:DF:88:0E:83 (1) flags 0x0000
02 01 1a 05 03 0a 18 0d 18 0b 09 48 65 61 72 74 ...........Heart
20 52 61 74 65 Rate
> HCI Event: Command Status (0x0f) plen 4
LE Read Remote Used Features (0x08|0x0016) ncmd 1
Status: Success (0x00)
> ACL Data RX: Handle 3585 flags 0x02 dlen 11
ATT: Read By Group Type Request (0x10) len 6
Handle range: 0x0001-0xffff
Attribute group type: Primary Service (0x2800)
> HCI Event: LE Meta Event (0x3e) plen 12
LE Read Remote Used Features (0x04)
Status: Success (0x00)
Handle: 3585
Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
LE Encryption
< HCI Command: LE Start Encryption (0x08|0x0019) plen 28
Handle: 3585
Random number: 0x0000000000000000
Encrypted diversifier: 0x0000
Long term key: 26201cd479a0921b6f949f0b1fa8dc82
> HCI Event: Command Status (0x0f) plen 4
LE Start Encryption (0x08|0x0019) ncmd 1
Status: Success (0x00)
> HCI Event: Encryption Change (0x08) plen 4
Status: PIN or Key Missing (0x06)
Handle: 3585
Encryption: Disabled (0x00)
< HCI Command: Disconnect (0x01|0x0006) plen 3
Handle: 3585
Reason: Authentication Failure (0x05)
> HCI Event: Command Status (0x0f) plen 4
Disconnect (0x01|0x0006) ncmd 1
Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4
Status: Success (0x00)
Handle: 3585
Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: 88:63:DF:88:0E:83 (1) reason 4
@ Device Connected: C4:43:8F:A3:4D:83 (0) flags 0x0000
08 09 4e 65 78 75 73 20 35 ..Nexus 5
> HCI Event: Command Status (0x0f) plen 4
Authentication Requested (0x01|0x0011) ncmd 1
Status: Success (0x00)
> HCI Event: Link Key Request (0x17) plen 6
Address: C4:43:8F:A3:4D:83 (LG Electronics)
< HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
Address: C4:43:8F:A3:4D:83 (LG Electronics)
Link key: 080812e4aa97a863d11826f71f65a933
> HCI Event: Command Complete (0x0e) plen 10
Link Key Request Reply (0x01|0x000b) ncmd 1
Status: Success (0x00)
Address: C4:43:8F:A3:4D:83 (LG Electronics)
> HCI Event: Auth Complete (0x06) plen 3
Status: PIN or Key Missing (0x06)
Handle: 75
@ Authentication Failed: C4:43:8F:A3:4D:83 (0) status 0x05
< HCI Command: Disconnect (0x01|0x0006) plen 3
Handle: 75
Reason: Remote User Terminated Connection (0x13)
> HCI Event: Command Status (0x0f) plen 4
Disconnect (0x01|0x0006) ncmd 1
Status: Success (0x00)
> HCI Event: Disconnect Complete (0x05) plen 4
Status: Success (0x00)
Handle: 75
Reason: Connection Terminated By Local Host (0x16)
@ Device Disconnected: C4:43:8F:A3:4D:83 (0) reason 4
Signed-off-by: Szymon Janc <[email protected]>
Signed-off-by: Johan Hedberg <[email protected]>
|
|
Define a tracepoint and allow user to trace messages going to and from
hardware associated with devlink instance.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fixes the following sparse warning:
net/dsa/dsa2.c:680:6: warning:
symbol '_dsa_unregister_switch' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
security initialized after alloc workqueue, so we should exit security
before destroy workqueue in the error handing.
Fixes: 648af7fca159 ("rxrpc: Absorb the rxkad security module")
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree.
they are:
1) Fix leak in the error path of nft_expr_init(), from Liping Zhang.
2) Tracing from nf_tables cannot be disabled, also from Zhang.
3) Fix an integer overflow on 32bit archs when setting the number of
hashtable buckets, from Florian Westphal.
4) Fix configuration of ipvs sync in backup mode with IPv6 address,
from Quentin Armitage via Simon Horman.
5) Fix incorrect timeout calculation in nft_ct NFT_CT_EXPIRATION,
from Florian Westphal.
6) Skip clash resolution in conntrack insertion races if NAT is in
place.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The clash resolution is not easy to apply if the NAT table is
registered. Even if no NAT rules are installed, the nul-binding ensures
that a unique tuple is used, thus, the packet that loses race gets a
different source port number, as described by:
http://marc.info/?l=netfilter-devel&m=146818011604484&w=2
Clash resolution with NAT is also problematic if addresses/port range
ports are used since the conntrack that wins race may describe a
different mangling that we may have earlier applied to the packet via
nf_nat_setup_info().
Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Tested-by: Marc Dionne <[email protected]>
|
|
User can add ct entry via nfnetlink(IPCTNL_MSG_CT_NEW), and if the total
number reach the nf_conntrack_max, we will try to drop some ct entries.
But in this case(the main function call path is ctnetlink_create_conntrack
-> nf_conntrack_alloc -> early_drop), rcu_read_lock is not held, so race
with hash resize will happen.
Fixes: 242922a02717 ("netfilter: conntrack: simplify early_drop")
Cc: Florian Westphal <[email protected]>
Signed-off-by: Liping Zhang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The Makefile controlling compilation of this file is obj-y,
meaning that it currently is never being built as a module.
Since MODULE_ALIAS is a no-op for non-modular code, we can simply
remove the MODULE_ALIAS_NETPROTO variant used here.
We replace module.h with kmod.h since the file does make use of
request_module() in order to load other modules from here.
We don't have to worry about init.h coming in via the removed
module.h since the file explicitly includes init.h already.
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: [email protected]
Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In test situations with many nodes and a heavily stressed system we have
observed that the transmission broadcast link may fail due to an
excessive number of retransmissions of the same packet. In such
situations we need to reset all unicast links to all peers, in order to
reset and re-synchronize the broadcast link.
In this commit, we add a new function tipc_bearer_reset_all() to be used
in such situations. The function scans across all bearers and resets all
their pertaining links.
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After a new receiver peer has been added to the broadcast transmission
link, we allow immediate transmission of new broadcast packets, trusting
that the new peer will not accept the packets until it has received the
previously sent unicast broadcast initialiation message. In the same
way, the sender must not accept any acknowledges until it has itself
received the broadcast initialization from the peer, as well as
confirmation of the reception of its own initialization message.
Furthermore, when a receiver peer goes down, the sender has to produce
the missing acknowledges from the lost peer locally, in order ensure
correct release of the buffers that were expected to be acknowledged by
the said peer.
In a highly stressed system we have observed that contact with a peer
may come up and be lost before the above mentioned broadcast initial-
ization and confirmation have been received. This leads to the locally
produced acknowledges being rejected, and the non-acknowledged buffers
to linger in the broadcast link transmission queue until it fills up
and the link goes into permanent congestion.
In this commit, we remedy this by temporarily setting the corresponding
broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
state to true before we issue the local acknowledges. This ensures that
those acknowledges will always be accepted. The mentioned state values
are restored immediately afterwards when the link is reset.
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
At first contact between two nodes, an endpoint might sometimes have
time to send out a LINK_PROTOCOL/STATE packet before it has received
the broadcast initialization packet from the peer, i.e., before it has
received a valid broadcast packet number to add to the 'bc_ack' field
of the protocol message.
This means that the peer endpoint will receive a protocol packet with an
invalid broadcast acknowledge value of 0. Under unlucky circumstances
this may lead to the original, already received acknowledge value being
overwritten, so that the whole broadcast link goes stale after a while.
We fix this by delaying the setting of the link field 'bc_peer_is_up'
until we know that the peer really has received our own broadcast
initialization message. The latter is always sent out as the first
unicast message on a link, and always with seqeunce number 1. Because
of this, we only need to look for a non-zero unicast acknowledge value
in the arriving STATE messages, and once that is confirmed we know we
are safe and can set the mentioned field. Before this moment, we must
ignore all broadcast acknowledges from the peer.
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.
SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Reported-by: Sergei Trofimovich <[email protected]>
Tested-by: Sergei Trofimovich <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Vegard Nossum is reporting for a crash in fib_dump_info
when nh_dev = NULL and fib_nhs == 1:
Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
RIP: 0033:[<00000000602b3d18>]
RSP: 0000000062623890 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
Stack:
626238f0 960226a02 00000400 000000fe
62623910 600afca7 62623970 62623a48
62468038 00000018 00000000 00000000
Call Trace:
[<602b3e93>] rtmsg_fib+0xd3/0x190
[<602b6680>] fib_table_insert+0x260/0x500
[<602b0e5d>] inet_rtm_newroute+0x4d/0x60
[<60250def>] rtnetlink_rcv_msg+0x8f/0x270
[<60267079>] netlink_rcv_skb+0xc9/0xe0
[<60250d4b>] rtnetlink_rcv+0x3b/0x50
[<60265400>] netlink_unicast+0x1a0/0x2c0
[<60265e47>] netlink_sendmsg+0x3f7/0x470
[<6021dc9a>] sock_sendmsg+0x3a/0x90
[<6021e0d0>] ___sys_sendmsg+0x300/0x360
[<6021fa64>] __sys_sendmsg+0x54/0xa0
[<6021fac0>] SyS_sendmsg+0x10/0x20
[<6001ea68>] handle_syscall+0x88/0x90
[<600295fd>] userspace+0x3fd/0x500
[<6001ac55>] fork_handler+0x85/0x90
$ addr2line -e vmlinux -i 0x602b3d18
include/linux/inetdevice.h:222
net/ipv4/fib_semantics.c:1264
Problem happens when RTNH_F_LINKDOWN is provided from user space
when creating routes that do not use the flag, catched with
netlink fuzzer.
Currently, the kernel allows user space to set both flags
to nh_flags and fib_flags but this is not intentional, the
assumption was that they are not set. Fix this by rejecting
both flags with EINVAL.
Reported-by: Vegard Nossum <[email protected]>
Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Julian Anastasov <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: Dinesh Dutt <[email protected]>
Cc: Scott Feldman <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Using a combination if #if conditionals and goto labels to unwind
tunnel4_init seems unwieldy. This patch takes a simpler approach of
directly unregistering previously registered protocols when an error
occurs.
This fixes a number of problems with the current implementation
including the potential presence of labels when they are unused
and the potential absence of unregister code when it is needed.
Fixes: 8afe97e5d416 ("tunnels: support MPLS over IPv4 tunnels")
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
prsctp PRIO policy is a policy to abandon lower priority chunks when
asoc doesn't have enough snd buffer, so that the current chunk with
higher priority can be queued successfully.
Similar to TTL/RTX policy, we will set the priority of the chunk to
prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
So if PRIO policy is enabled, msg->expire_at won't work.
asoc->sent_cnt_removable will record how many chunks can be checked to
remove. If priority policy is enabled, when the chunk is queued into
the out_queue, we will increase sent_cnt_removable. When the chunk is
moved to abandon_queue or dequeue and free, we will decrease
sent_cnt_removable.
In sctp_sendmsg, we will check if there is enough snd buffer for current
msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
free chunks from out_queue in right order until the abandon+free size >
msg_len - sctp_wfree. For the abandon size, we have to wait until it
sends FORWARD TSN, receives the sack and the chunks are really freed.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
prsctp RTX policy is a policy to abandon chunks when they are
retransmitted beyond the max count.
This patch uses sent_count to count how many times one chunk has
been sent, and prsctp_param is the max rtx count, which is from
sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So similar
to TTL policy, if RTX policy is enabled, msg->expire_at won't
work.
Then in sctp_chunk_abandoned, this patch checks if chunk->sent_count
is bigger than chunk->prsctp_param to abandon this chunk.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
prsctp TTL policy is a policy to abandon chunks when they expire
at the specific time in local stack. It's similar with expires_at
in struct sctp_datamsg.
This patch uses sinfo->sinfo_timetolive to set the specific time for
TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
So if prsctp_enable or TTL policy is not enabled, msg->expires_at
still works as before.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
to dump the prsctp statistics info from the asoc. The prsctp statistics
includes abandoned_sent/unsent from the asoc. abandoned_sent is the
count of the packets we drop packets from retransmit/transmited queue,
and abandoned_unsent is the count of the packets we drop from out_queue
according to the policy.
Note: another option for prsctp statistics dump described in rfc is
SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
info from each stream. But by now, linux doesn't yet have per stream
statistics info, it needs rfc6525 to be implemented. As the prsctp
statistics for each stream has to be based on per stream statistics,
we will delay it until rfc6525 is done in linux.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
to set/get sctp Partially Reliable Policies' default params,
which includes 3 policies (ttl, rtx, prio) and their values.
Still, if we set policy params in sndinfo, we will use the params
of sndinfo against chunks, instead of the default params.
In this patch, we will use 5-8bit of sp/asoc->default_flags
to store prsctp policies, and reuse asoc->default_timetolive
to store their values. It means if we enable and set prsctp
policy, prior ttl timeout in sctp will not work any more.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
We will add prsctp_enable to both asoc and ep, and replace the places
where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
also modify it's value through sockopt SCTP_PR_SUPPORTED.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Steve Wise <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway containing a single
word containing the value RPC_DROP_REPLY (in network byte-order, of
course). This is a nonsense RPC message.
Fixes: 9e701c610923 ("svcrpc: simpler request dropping")
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Steve Wise <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Direct data placement is not allowed when using flavors that
guarantee integrity or privacy. When such security flavors are in
effect, don't allow the use of Read and Write chunks for moving
individual data items. All messages larger than the inline threshold
are sent via Long Call or Long Reply.
On my systems (CX-3 Pro on FDR), for small I/O operations, the use
of Long messages adds only around 5 usecs of latency in each
direction.
Note that when integrity or encryption is used, the host CPU touches
every byte in these messages. Even if it could be used, data
movement offload doesn't buy much in this case.
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Steve Wise <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
fixup_copy_count should count only the number of bytes copied to the
page list. The head and tail are now always handled without a data
copy.
And the debugging at the end of rpcrdma_inline_fixup() is also no
longer necessary, since copy_len will be non-zero when there is reply
data in the tail (a normal and valid case).
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Steve Wise <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Now that rpcrdma_inline_fixup() updates only two fields in
rq_rcv_buf, a full memcpy of that structure to rq_private_buf is
unwarranted. Updating rq_private_buf fields only where needed also
better documents what is going on.
Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Steve Wise <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|