Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.
This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.
Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
<< nothing; all addresses have been flushed>>
Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces
$ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
$ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
Will keep addresses on eth1 on an admin down.
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
inet6 2100:1::2/120 scope global tentative
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
valid_lft forever preferred_lft forever
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
msg.dst_sk needs to be set up with a valid socket because some callbacks
later derive the netns from it.
Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
Reported-by: Jon Maloy <[email protected]>
Bisected-by: Jon Maloy <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Acked-by Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This
leads to occasional crashes, especially in multi-namespace environments.
The scenario goes as follows:
CPU0:(node_stop) CPU1:(node_timeout) // ref == 2
1: if(!mod_timer())
2: if (del_timer())
3: tipc_node_put() // ref -> 1
4: tipc_node_put() // ref -> 0
5: kfree_rcu(node);
6: tipc_node_get(node)
7: // BOOM!
We now clean up this functionality as follows:
1) We remove the node pointer from the node lookup table before we
attempt deactivating the timer. This way, we reduce the risk that
tipc_node_find() may obtain a valid pointer to an instance marked
for deletion; a harmless but undesirable situation.
2) We use del_timer_sync() instead of del_timer() to safely deactivate
the node timer without any risk that it might be reactivated by the
timeout handler. There is no risk of deadlock here, since the two
functions never touch the same spinlocks.
3: We remove a pointless tipc_node_get() + tipc_node_put() from the
timeout handler.
Reported-by: Zhijiang Hu <[email protected]>
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Although we have never seen it happen, we have identified the
following problematic scenario when nodes are stopped and deleted:
CPU0: CPU1:
tipc_node_xxx() //ref == 1
tipc_node_put() //ref -> 0
tipc_node_find() // node still in table
tipc_node_delete()
list_del_rcu(n. list)
tipc_node_get() //ref -> 1, bad
kfree_rcu()
tipc_node_put() //ref to 0 again.
kfree_rcu() // BOOM!
We fix this by introducing use of the conditional kref_get_if_not_zero()
instead of kref_get() in the function tipc_node_find(). This eliminates
any risk of post-mortem access.
Reported-by: Zhijiang Hu <[email protected]>
Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:
[...]
[ 43.986968] eth0: hw csum failure
[ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[ 43.996193] Hardware name: BCM2709
[ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14)
[ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90)
[ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac)
[ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178)
[ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188)
[ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338)
[ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00)
[ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c)
[...]
Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <[email protected]>
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The VLAN GetNext operation is specific to some switches, and thus can be
complicated to implement for some drivers.
Remove the support for the vlan_getnext/port_pvid_get approach in favor
of the generic and simpler port_vlan_dump function.
Signed-off-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
which gets passed the switchdev VLAN object and callback.
This function, if implemented, takes precedence over the soon legacy
vlan_getnext/port_pvid_get approach.
Signed-off-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently tc actions are stored in a per-module hashtable,
therefore are visible to all network namespaces. This is
probably the last part of the tc subsystem which is not
aware of netns now. This patch makes them per-netns,
several tc action API's need to be adjusted for this.
The tc action API code is ugly due to historical reasons,
we need to refactor that code in the future.
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We only release the memory of the hashtable itself, not its
entries inside. This is not a problem yet since we only call
it in module release path, and module is refcount'ed by
actions. This would be a problem after we move the per module
hinfo into per netns in the latter patch.
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
llcp_sock_getname() checks llcp_sock->dev to make sure
llcp_sock is already connected or bound, however, we could
be in the middle of llcp_sock_bind() where llcp_sock->dev
is bound and llcp_sock->service_name_len is set,
but llcp_sock->service_name is not, in this case we would
lead to copy some bytes from a NULL pointer.
Just lock the sock since this is not a hot path anyway.
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
These two functions are called in sendmsg path, and the
'len' is passed from user-space, so we should not allow
malicious users to OOM kernel on purpose.
Reported-by: Dmitry Vyukov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Reviewed-by: Julian Calaby <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Another small set of fixes:
* stop critical protocol session on disconnect to avoid
it getting stuck
* wext: fix two RTNL message ordering issues
* fix an uninitialized value (found by KASAN)
* fix an out-of-bounds access (also found by KASAN)
* clear connection keys when freeing them in all cases
(IBSS, all other places already did so)
* fix expected throughput unit to get consistent values
* set default TX aggregation timeout to 0 in minstrel
to avoid (really just hide) issues and perform better
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The fix in 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly
controlled.") changed behavior for bpf_set_tunnel_key() when in use with
IPv6 and thus uncovered a bug that TUNNEL_CSUM needed to be set but wasn't.
As a result, the stack dropped ingress vxlan IPv6 packets, that have been
sent via eBPF through collect meta data mode due to checksum now being zero.
Since after LCO, we enable IPv4 checksum by default, so make that analogous
and only provide a flag BPF_F_ZERO_CSUM_TX for the user to turn it off in
IPv4 case.
Fixes: 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.")
Fixes: c6c33454072f ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Flushing/listing entries was not RCU safe, so parallel flush/dump
could lead to kernel crash. Bug reported by Deniz Eren.
Fixes netfilter bugzilla id #1050.
Signed-off-by: Jozsef Kadlecsik <[email protected]>
|
|
Commit d15f9d694b77 ("libceph: check data_len in ->alloc_msg()")
mistakenly bumped the log level on the "tid %llu unknown, skipping"
message. Turn it back into a dout() - stray replies are perfectly
normal when OSDs flap, crash, get killed for testing purposes, etc.
Cc: [email protected] # 4.3+
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.
Cc: [email protected] # 3.19+
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
The contract between try_read() and try_write() is that when called
each processes as much data as possible. When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more. try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.
Cc: [email protected] # 3.10+
Reported-by: Varada Kari <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Tested-by: Varada Kari <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
skbs down. This can easily happen with UDP+IPv4 sockets with the first
MSG_MORE write smaller than the MTU, second write is a sendfile.
Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path,
were we calculate the checksum manually during copying.
Commit d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
sockets") started to exposes this bug.
Fixes: d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
Reported-by: Jiri Benc <[email protected]>
Cc: Jiri Benc <[email protected]>
Reported-by: Wakko Warner <[email protected]>
Cc: Wakko Warner <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We want to try and pull the L4 header in if it is available in the first
fragment. As such add the flag to indicate we want to pull the headers on
the first fragment in.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The IPv6 parsing was using a local pointer when it could use the same
pointer as the IPv4 portion of the code since the key_addrs can support
both IPv4 and IPv6 as it is just a pointer.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The flow dissector bits handling FCoE didn't bother to actually validate
that the space there was enough for the FCoE header. So we need to update
things so that if there is room we add the header and report a good result,
otherwise we do not add the header, and report the bad result.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It turns out that for IPv4 we were reporting the ip_proto of the fragment,
and for IPv6 we were not. This patch updates that behavior so that we
always report the IP protocol of the fragment. In addition it takes the
steps of updating the payload offset code so that we will determine the
start of the payload not including the L4 header for any fragment after the
first.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch corrects the logic for the IPv4 parsing so that it is consistent
with how we handle IPv6. Specifically if we do not have the flow key
indicating we want the addresses we still may need to take a look at the IP
fragmentation bits and to see if we should stop after we have recognized
the L3 header.
Fixes: 807e165dc44f ("flow_dissector: Add control/reporting of fragmentation")
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
One of the validation checks for the new array-based TCP SO_REUSEPORT
validation was unintentionally dropped in ea8add2b1903. This adds it back.
Lack of this check allows the user to allocate multiple sock_reuseport
structures (leaking all but the first).
Fixes: ea8add2b1903 ("tcp/dccp: better use of ephemeral ports in bind()")
Signed-off-by: Craig Gallek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Bool variable 'fail' is always non-negative, it indicates an error if it
is true.
The problem has been detected using coccinelle script
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: Andrzej Hajda <[email protected]>
Acked-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
In case of multicast address we need to set always the LOWPAN_IPHC_M bit
and if a destination context identifier was found for a multicast
address then we need to set the LOWPAN_IPHC_DAC as well.
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Factor all assignments to rfkill_global_states[].cur into a single
function rfkill_update_global_state().
Signed-off-by: João Paulo Rechi Vita <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Signed-off-by: João Paulo Rechi Vita <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Signed-off-by: João Paulo Rechi Vita <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
No more users for it.
Signed-off-by: Heikki Krogerus <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
This prepares the driver for removal of platform data.
Signed-off-by: Heikki Krogerus <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Helper for finding the type based on name. Useful if the
type needs to be determined based on device property.
Signed-off-by: Heikki Krogerus <[email protected]>
[modify rfkill_types array and BUILD_BUG_ON to not cause errors]
Signed-off-by: Johannes Berg <[email protected]>
|
|
Add IEEE80211_RADIOTAP_VHT entry to rtap_namespace_sizes array in order to
define alignment and size of VHT info in tx radiotap
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Today, the supplicant will add the RRM capabilities
Information Element in the association request only if
Quiet period is supported (NL80211_FEATURE_QUIET).
Quiet is one of many RRM features, and there are other RRM
features that are not related to Quiet (e.g. neighbor
report). Therefore, requiring Quiet to enable RRM is too
restrictive.
Some of the features, like neighbor report, can be
supported by user space without any help from the kernel.
Hence adding the RRM capabilities IE to association request
should be the sole user space's decision.
Removing the RRM dependency on Quiet in the driver solves
this problem, but using an old driver with a user space
tool that would not require Quiet feature would be
problematic: the user space would add NL80211_ATTR_USE_RRM
in the association request even if the kernel doesn't
advertize NL80211_FEATURE_QUIET and the association would
be denied by the kernel.
This solution adds a global RRM capability, that tells user
space that it can request RRM capabilities IE publishment
without any specific feature support in the kernel.
Signed-off-by: Beni Lev <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Drivers may need to track which vif is using VHT MU-MIMO.
Move the flag indicationg the ownership of MU_MIMO to
ieee80211_vif.
Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Provide an interface to the lower level driver to set the VHT
MU-MIMO data. This is needed for example when there is an update
of the group data during low power state, where the management
frame will not be passed to the host at all.
Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Since the PNs of all the tx keys are now tracked in the public
part of the key struct (with atomic counter), we no longer
need these functions.
dvm and vt665{5,6} are currently the only users of these functions,
so update them accordingly.
Signed-off-by: Eliad Peller <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Some drivers/devices might want to set the IVs by
themselves (and still let mac80211 generate MMIC).
Specifically, this is needed when the device does
offloading at certain times, and the driver has
to make sure that the IVs of new tx frames (from
the host) are synchronized with IVs that were
potentially used during the offloading.
Similarly to CCMP, move the TX IVs of TKIP keys to the
public part of the key struct, and export a function
to add the IV right into the crypto header.
The public tx_pn field is defined as atomic64, so define
TKIP_PN_TO_IV16/32 helper macros to convert it to iv16/32
when needed.
Since the iv32 used for the p1k cache is taken
directly from the frame, we can safely remove
iv16/32 from being protected by tkip.txlock.
Signed-off-by: Eliad Peller <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Fix wiphy supported_band access in tx radiotap parsing introduced
in commit 5ec3aed9ba4c ("mac80211: Parse legacy and HT rate in
injected frames"). In particular, info->band is always set to 0
(IEEE80211_BAND_2GHZ) since it has not assigned yet.
This cause a kernel crash on 5GHz only devices.
Move ieee80211_parse_tx_radiotap() after info->band assignment
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
This massively reduces data copying and thus improves rx performance
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
reuse_skb is set to true if the code decides to use the last segment.
Fixes a memory leak
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
The source and destintation addresses in the memcpy arguments
are flipped. Fix that.
Fixes: 23a1f8d44c0b("mac80211: process and save VHT MU-MIMO group frame")
Signed-off-by: Sara Sharon <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
mpp_path_del() and mesh_path_del() are mostly the same function.
Move common code into a new static function.
Acked-by: Bob Copeland <[email protected]>
Signed-off-by: Henning Rogge <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Remember the last time when a mpp table entry is used for
rx or tx and remove them after MESH_PATH_EXPIRE time.
Acked-by: Bob Copeland <[email protected]>
Signed-off-by: Henning Rogge <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Make the mesh_path_del() function remove all mpp table entries
that are proxied by the removed mesh path.
Acked-by: Bob Copeland <[email protected]>
Signed-off-by: Henning Rogge <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
PBSS (Personal Basic Service Set) is a new BSS type for DMG
networks. It is similar to infrastructure BSS, having an AP-like
entity called PCP (PBSS Control Point), but it has few differences.
PBSS support is mandatory for 11ad devices.
Add support for PBSS by introducing a new PBSS flag attribute.
The PBSS flag is used in the START_AP command to request starting
a PCP instead of an AP, and in the CONNECT command to request
connecting to a PCP instead of an AP.
Signed-off-by: Lior David <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Use skb_copy_bits in preparation for allowing fragmented skbs
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
Prepararation for zero-copy A-MSDU support with page fragment SKBs
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|