Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds support for TSO using IPv4 headers with a fixed IP ID
field. This is meant to allow us to do a lossless GRO in the case of TCP
flows that use a fixed IP ID such as those that convert IPv6 header to IPv4
headers.
In addition I am adding a feature that for now I am referring to TSO with
IP ID mangling. Basically when this flag is enabled the device has the
option to either output the flow with incrementing IP IDs or with a fixed
IP ID regardless of what the original IP ID ordering was. This is useful
in cases where the DF bit is set and we do not care if the original IP ID
value is maintained.
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The strings were missing for several of the GSO offloads that are
available. This patch provides the missing strings so that we can toggle
or query any of them via the ethtool command.
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
User needs to monitor shared buffer occupancy. For that, he issues a
snapshot command in order to instruct hardware to catch current and
maximal occupancy values, and clear command in order to clear the
historical maximal values.
Also port-pool and tc-pool-bind command response messages are extended to
carry occupancy values.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Define userspace API and drivers API for configuration of shared
buffers. Four basic objects are defined:
shared buffer - attributes are size, number of pools and TCs
pool - chunk of sharedbuffer definition, it has some size and either
static or dynamic threshold
port pool threshold - to set per-port threshold for each pool
port tc threshold bind - to bind port and TC to specified pool
with threshold.
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This has just the single fix from Dmitry Ivanov, adding the missing
netlink notifier family check to avoid the socket close DoS problem.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
A failure in validate_xmit_skb_list() triggered an unconditional call
to dev_requeue_skb with skb=NULL. This slowly grows the queue
discipline's qlen count until all traffic through the queue stops.
We take the optimistic approach and continue running the queue after a
failure since it is unknown if later packets also will fail in the
validate path.
Fixes: 55a93b3ea780 ("qdisc: validate skb without holding lock")
Signed-off-by: Lars Persson <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.
Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
|
|
After introduction of ndo_features_check(), we believe that very
specific checks for rare features should not be done in core
networking stack.
No driver uses gso_min_segs yet, so we revert this feature and save
few instructions per tx packet in fast path.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The meta_type_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For local routes that require a particular output interface we do not want
to cache the result. Caching the result causes incorrect behaviour when
there are multiple source addresses on the interface. The end result
being that if the intended recipient is waiting on that interface for the
packet he won't receive it because it will be delivered on the loopback
interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
interface as well.
This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board. The receiving process
should see an IP_PKTINFO ipi_ifndex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo). The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.
Sample dhcp_release command line:
dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
Signed-off-by: Allain Legacy <[email protected]>
Signed off-by: Chris Friesen <[email protected]>
Reviewed-by: Julian Anastasov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently processing of multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake up signals which
are costy and doesn't make it wake up any faster.
With this patch it will note that the wake up is pending and will do it
before leaving the state machine interpreter, latest place possible to
do it realiably and cleanly.
Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.
v2: series re-checked
v3: use local vars to cleanup the code, suggested by Jakub Sitnicki
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
f1705ec197e7 added the option to retain user configured addresses on an
admin down. A comment to one of the later revisions suggested using the
IFA_F_PERMANENT flag rather than adding a user_managed boolean to the
ifaddr struct. A side effect of this change is that link local and
loopback addresses are also retained which is not part of the objective
of f1705ec197e7. Add check to drop those addresses.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
changed by ioctl
Now when we change the attributes of bridge or br_port by netlink,
a relevant netlink notification will be sent, but if we change them
by ioctl or sysfs, no notification will be sent.
We should ensure that whenever those attributes change internally or from
sysfs/ioctl, that a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch is used for ioctl.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
changed by br_sysfs_if
Now when we change the attributes of bridge or br_port by netlink,
a relevant netlink notification will be sent, but if we change them
by ioctl or sysfs, no notification will be sent.
We should ensure that whenever those attributes change internally or from
sysfs/ioctl, that a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch is used for br_sysfs_if, and we also move br_ifinfo_notify out
of store_flag.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
changed by br_sysfs_br
Now when we change the attributes of bridge or br_port by netlink,
a relevant netlink notification will be sent, but if we change them
by ioctl or sysfs, no notification will be sent.
We should ensure that whenever those attributes change internally or from
sysfs/ioctl, that a netlink notification is sent out to listeners.
Also, NetworkManager will use this in the future to listen for out-of-band
bridge master attribute updates and incorporate them into the runtime
configuration.
This patch is used for br_sysfs_br. and we also need to remove some
rtnl_trylock in old functions so that we can call it in a common one.
For group_addr_store, we cannot make it use store_bridge_parm, because
it's not a string-to-long convert, we will add notification on it
individually.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are some repetitive codes in stp_state_store, we can remove
them by calling store_bridge_parm.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are some repetitive codes in forward_delay_store, we can remove
them by calling store_bridge_parm.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There are some repetitive codes in flush_store, we can remove
them by calling store_bridge_parm, also, it would send rtnl notification
after we add it in store_bridge_parm in the following patches.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The original tokenized iid support implemented via f53adae4eae5 ("net: ipv6:
add tokenized interface identifier support") didn't allow for clearing a
device token as it was intended that this addressing mode was the only one
active for globally scoped IPv6 addresses. Later we relaxed that restriction
via 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses"),
and we should also allow for clearing tokens as there's no good reason why
it shouldn't be allowed.
Fixes: 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses")
Reported-by: Robin H. Johnson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sock_owned_by_user should not be used without socket lock held. It seems
to be a common practice to check .owned before lock reclassification, so
provide a little help to abstract this check away.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On udp sockets, recv cmsg IP_CMSG_CHECKSUM returns a checksum over
the packet payload. Since commit e6afc8ace6dd pulled the headers,
taking skb->data as the start of transport header is incorrect. Use
the transport header pointer.
Also, when peeking at an offset from the start of the packet, only
return a checksum from the start of the peeked data. Note that the
cmsg does not subtract a tail checkum when reading truncated data.
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On udp sockets, ioctl SIOCINQ returns the payload size of the first
packet. Since commit e6afc8ace6dd pulled the headers, the result is
incorrect when subtracting header length. Remove that operation.
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree. More
specifically, they are:
1) Fix missing filter table per-netns registration in arptables, from
Florian Westphal.
2) Resolve out of bound access when parsing TCP options in
nf_conntrack_tcp, patch from Jozsef Kadlecsik.
3) Prefer NFPROTO_BRIDGE extensions over NFPROTO_UNSPEC in ebtables,
this resolves conflict between xt_limit and ebt_limit, from Phil Sutter.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Many of these functions are called from control plane path. Move
ctnetlink_nlmsg_size() under CONFIG_NF_CONNTRACK_EVENTS to avoid a
compilation warning when CONFIG_NF_CONNTRACK_EVENTS=n.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Since 'netfilter: x_tables: validate targets of jumps' change we
validate that the target aligns exactly with beginning of a rule,
so offset test is now redundant.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
commit 9e67d5a739327c44885adebb4f3a538050be73e4
("[NETFILTER]: x_tables: remove obsolete overflow check") left the
compat parts alone, but we can kill it there as well.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This looks like refactoring, but its also a bug fix.
Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.
Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5-first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Always returned 0.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Validate that all matches (if any) add up to the beginning of
the target and that each match covers at least the base structure size.
The compat path should be able to safely re-use the function
as the structures only differ in alignment; added a
BUILD_BUG_ON just in case we have an arch that adds padding as well.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
We have targets and standard targets -- the latter carries a verdict.
The ip/ip6tables validation functions will access t->verdict for the
standard targets to fetch the jump offset or verdict for chainloop
detection, but this happens before the targets get checked/validated.
Thus we also need to check for verdict presence here, else t->verdict
can point right after a blob.
Spotted with UBSAN while testing malformed blobs.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The target size includes the size of the xt_entry_target struct.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Once we add more sanity testing to xt_check_entry_offsets it
becomes relvant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short just fold its contents into the callers.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
When we see a jump also check that the offset gets us to beginning of
a rule (an ipt_entry).
The extra overhead is negible, even with absurd cases.
300k custom rules, 300k jumps to 'next' user chain:
[ plus one jump from INPUT to first userchain ]:
Before:
real 0m24.874s
user 0m7.532s
sys 0m16.076s
After:
real 0m27.464s
user 0m7.436s
sys 0m18.840s
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Ben Hawkes says:
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Base chains enforce absolute verdict.
User defined chains are supposed to end with an unconditional return,
xtables userspace adds them automatically.
But if such return is missing we will move to non-existent next rule.
Reported-by: Ben Hawkes <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The phys in phys_port_mask suggests this mask is about PHYs. In fact,
it means physical ports. Rename to enabled_port_mask, indicating
external enabled ports of the switch, which is hopefully less
confusing.
Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Vivien Didelot <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The drivers now allocate their own memory for private usage. Remove
the allocation from the core code.
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now the switch devices have a dev pointer, make use of it for allocating
the drivers private data structures using a devm_kzalloc().
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
By passing a device structure to the switch devices, it allows them
to use devm_* methods for resource management.
Signed-off-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Tested-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
To synchronize with Kalle, here's just a big change that affects
all drivers - removing the duplicated enum ieee80211_band and
replacing it by enum nl80211_band. On top of that, just a small
documentation update.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
We remove a couple of leftover fields in struct tipc_bearer. Those
were used by the old broadcast implementation, and are not needed
any longer. There is no functional changes in this commit.
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case of link-layer specific handling for 802.15.4 we need to cast to
802.15.4 sepcific structures. Simple add this header when include the
6lowpan header.
Signed-off-by: Alexander Aring <[email protected]>
Reviewed-by: Stefan Schmidt<[email protected]>
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This patch adds the lowpan_is_ll function, which can be used to make a
special 6lowpan linklayer handling for a specific 6lowpan linklayer
type.
Signed-off-by: Alexander Aring <[email protected]>
Reviewed-by: Stefan Schmidt<[email protected]>
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|