|
If the actions (re)allocation fails, or the actions list is larger than the
maximum size, and the conntrack action is the last action when these
problems are hit, then references to helper modules may be leaked. Fix
the issue.
Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
SCTP is lacking proper np->opt cloning at accept() time.
TCP and DCCP use ipv6_dup_options() helper, do the same
in SCTP.
We might later factorize this code in a common helper to avoid
future mistakes.
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This gets rid of the following compile warning:
net/mpls/mpls_iptunnel.c:40:5: warning: no previous prototype for
mpls_output [-Wmissing-prototypes]
Signed-off-by: Roopa Prabhu <[email protected]>
Acked-by: Robert Shearman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
XFRM can deal with SYNACK messages, sent while listener socket
is not locked. We add proper rcu protection to __xfrm_sk_clone_policy()
and xfrm_sk_policy_lookup()
This might serve as the first step to remove xfrm.xfrm_policy_lock
use in fast path.
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We will soon switch sk->sk_policy[] to RCU protection,
as SYNACK packets are sent while listener socket is not locked.
This patch simply adds RCU grace period before struct xfrm_policy
freeing, and the corresponding rcu_head in struct xfrm_policy.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
A Linux PC is connected with another device over Bluetooth PAN using a
BNEP interface.
Whenever a packet is sent over the BNEP interface, the function
"bnep_net_xmit()" in "net/bluetooth/bnep/netdev.c" is called.
This function calls "bnep_net_mc_filter()", which checks, if the
destination address is multicast, whether the address is set in a certain
multicast filter (&s->mc_filter). If it is not, the packet is not sent out.
This filter is only changed in two other functions, found in
"net/bluetooth/bnep/core.c": in "bnep_ctrl_set_mc_filter()", which is
only called if a message of type "BNEP_FILTER_MULTI_ADDR_SET" is
received, and in "bnep_add_connection()", where it is
set to a default value which only adds the broadcast address to the
filter:
set_bit(bnep_mc_hash(dev->broadcast), (ulong *) &s->mc_filter);
To sum up, if the BNEP interface does not receive any message of type
"BNEP_FILTER_MULTI_ADDR_SET", it will not send out any messages with
multicast destination addresses except for broadcast.
However, the BNEP specification (page 27 in
http://grouper.ieee.org/groups/802/15/Bluetooth/BNEP.pdf) states that
by default no multicast addresses should be filtered, i.e.
the BNEP interface should be able to send packets with any multicast
destination address.
It seems that the default case is wrong: the multicast filter should not
filter out any multicast addresses, instead of blocking almost all of them.
This leads to the problem that e.g. Neighbor Solicitation messages sent
with Bluetooth PAN over the BNEP interface to a multicast destination
address other than broadcast are blocked and not sent out.
Therefore, in the default case, we set the mc_filter to ~0LL to not
filter out any multicast addresses.
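The effect of the two defaults can be illustrated with a small userspace sketch. It is not the kernel code: mc_hash() is a hypothetical stand-in for bnep_mc_hash() that merely maps an address to one of 64 bits, and the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for bnep_mc_hash(): maps a 6-byte address to a
 * bit index in the range 0..63. The real kernel hash differs. */
static unsigned int mc_hash(const uint8_t addr[6])
{
    unsigned int h = 0;

    for (int i = 0; i < 6; i++)
        h = h * 31u + addr[i];
    return h & 63u;
}

/* Old default: only the bit of the broadcast address is set. */
static uint64_t filter_broadcast_only(const uint8_t bcast[6])
{
    return UINT64_C(1) << mc_hash(bcast);
}

/* New default: every bit is set, so no multicast address is filtered. */
static uint64_t filter_allow_all(void)
{
    return ~UINT64_C(0);
}

/* Returns 1 if a packet to addr may be sent, 0 if it would be dropped. */
static int mc_allowed(uint64_t filter, const uint8_t addr[6])
{
    return (int)((filter >> mc_hash(addr)) & 1u);
}
```

With the broadcast-only filter, any address whose hash bit differs from the broadcast bit is dropped; with the all-ones filter every address passes, whatever the hash function is.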
Signed-off-by: Danny Schweizer <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This patch reverts 6001d52 ("mac802154: tx: don't allow if down while
sync tx"). The reverted commit has side effects with the stop callback,
which flushes the transmit workqueue. The stop callback waits until the
workqueue is flushed while holding the rtnl lock. That means the stop
callback can wait forever, because the work being flushed tries to lock
the rtnl mutex which is already held by the stop callback.
Cc: Michael Hennerich <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
CONFIG_NF_CONNTRACK=m
CONFIG_NF_DUP_IPV4=y
results in:
net/built-in.o: In function `nf_dup_ipv4':
>> (.text+0xd434f): undefined reference to `nf_conntrack_untracked'
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
skbuff clones
If we attach the sk to the skb from nfnetlink_rcv_batch(), then
netlink_skb_destructor() will underflow the socket receive memory
counter and we get warning splat when releasing the socket.
$ cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
ffff8800ca903000 12 0 00000000 -54144 0 0 2 0 17942
^^^^^^
Rmem above shows an underflow.
And here below the warning splat:
[ 1363.815976] WARNING: CPU: 2 PID: 1356 at net/netlink/af_netlink.c:958 netlink_sock_destruct+0x80/0xb9()
[...]
[ 1363.816152] CPU: 2 PID: 1356 Comm: kworker/u16:1 Tainted: G W 4.4.0-rc1+ #153
[ 1363.816155] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
[ 1363.816160] Workqueue: netns cleanup_net
[ 1363.816163] 0000000000000000 ffff880119203dd0 ffffffff81240204 0000000000000000
[ 1363.816169] ffff880119203e08 ffffffff8104db4b ffffffff813d49a1 ffff8800ca771000
[ 1363.816174] ffffffff81a42b00 0000000000000000 ffff8800c0afe1e0 ffff880119203e18
[ 1363.816179] Call Trace:
[ 1363.816181] <IRQ> [<ffffffff81240204>] dump_stack+0x4e/0x79
[ 1363.816193] [<ffffffff8104db4b>] warn_slowpath_common+0x9a/0xb3
[ 1363.816197] [<ffffffff813d49a1>] ? netlink_sock_destruct+0x80/0xb9
skb->sk was only needed to look up the netns. However, we don't need
this anymore since 633c9a840d0b ("netfilter: nfnetlink: avoid recurrent
netns lookups in call_batch"), so this patch removes the manual socket
assignment to resolve this problem.
Reported-by: Arturo Borrero Gonzalez <[email protected]>
Reported-by: Ben Hutchings <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Tested-by: Arturo Borrero Gonzalez <[email protected]>
|
|
Pass the net pointer to the call_batch callback functions so we can skip
recurrent lookups.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Tested-by: Arturo Borrero Gonzalez <[email protected]>
|
|
Some users of rfkill, like NFC and cfg80211, use a dynamic name when
allocating rfkill, in those cases dev_name(). Therefore, the pointer
passed to rfkill_alloc() might not be valid forever. I specifically
found a case where the rfkill name was quite obviously an invalid
pointer (or at least garbage) after the wiphy had been renamed.
Fix this by making a copy of the rfkill name in rfkill_alloc().
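A minimal userspace sketch of the idea, assuming a simplified object layout: struct demo_rfkill and demo_rfkill_alloc() are hypothetical names, and malloc() stands in for the kernel allocator. The point is only that the object owns a private copy of the name rather than borrowing the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rfkill. */
struct demo_rfkill {
    int type;
    char name[];    /* private copy of the name, stored inline */
};

/* Allocates the object and copies name into it, so the caller's buffer
 * may be changed or freed afterwards. Returns NULL on failure. */
static struct demo_rfkill *demo_rfkill_alloc(const char *name, int type)
{
    struct demo_rfkill *rk;
    size_t len;

    if (!name)
        return NULL;
    len = strlen(name) + 1;         /* include the terminating NUL */
    rk = malloc(sizeof(*rk) + len);
    if (!rk)
        return NULL;
    rk->type = type;
    memcpy(rk->name, name, len);
    return rk;
}
```

After allocation, renaming or releasing the original buffer no longer affects the stored name; the caller releases the object with free().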
Cc: [email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
This patch introduces a 6lowpan entry in debugfs, if enabled.
Inside this 6lowpan directory we create a subdirectory for each 6lowpan
interface to offer per-interface debugfs support.
Reviewed-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This patch introduces register and unregister functionality for lowpan
interfaces. While registering a lowpan interface, there are several things
which need to be initialized by the 6lowpan subsystem. Upcoming
functionality needs to register/unregister per-interface components, e.g.
a debugfs entry.
Reviewed-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Acked-by: Jukka Rissanen <[email protected]>
Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Fix a crash that may happen when bt_accept_dequeue is run after a
Bluetooth connection has been disconnected. bt_accept_unlink was called
after release_sock, permitting bt_accept_unlink to run twice on the same
socket and cause a NULL pointer dereference.
[50510.241632] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a8
[50510.241694] IP: [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.241759] PGD 0
[50510.241776] Oops: 0002 [#1] SMP
[50510.241802] Modules linked in: rtl8192cu rtl_usb rtlwifi rtl8192c_common 8021q garp stp mrp llc rfcomm bnep nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp arc4 ath9k ath9k_common ath9k_hw ath kvm eeepc_wmi asus_wmi mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek sparse_keymap crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_hda_controller cfg80211 snd_hda_codec i915 snd_hwdep snd_pcm ghash_clmulni_intel snd_timer snd soundcore serio_raw cryptd drm_kms_helper drm i2c_algo_bit shpchp ath3k mei_me lpc_ich btusb bluetooth 6lowpan_iphc mei lp parport wmi video mac_hid psmouse ahci libahci r8169 mii
[50510.242279] CPU: 0 PID: 934 Comm: krfcommd Not tainted 3.16.0-49-generic #65~14.04.1-Ubuntu
[50510.242327] Hardware name: ASUSTeK Computer INC. VM40B/VM40B, BIOS 1501 12/09/2014
[50510.242370] task: ffff8800d9068a30 ti: ffff8800d7a54000 task.ti: ffff8800d7a54000
[50510.242413] RIP: 0010:[<ffffffffc01243f7>] [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.242480] RSP: 0018:ffff8800d7a57d58 EFLAGS: 00010246
[50510.242511] RAX: 0000000000000000 RBX: ffff880119bb8c00 RCX: ffff880119bb8eb0
[50510.242552] RDX: ffff880119bb8eb0 RSI: 00000000fffffe01 RDI: ffff880119bb8c00
[50510.242592] RBP: ffff8800d7a57d60 R08: 0000000000000283 R09: 0000000000000001
[50510.242633] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800d8da9eb0
[50510.242673] R13: ffff8800d74fdb80 R14: ffff880119bb8c00 R15: ffff8800d8da9c00
[50510.242715] FS: 0000000000000000(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000
[50510.242761] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50510.242794] CR2: 00000000000001a8 CR3: 0000000001c13000 CR4: 00000000001407f0
[50510.242835] Stack:
[50510.242849] ffff880119bb8eb0 ffff8800d7a57da0 ffffffffc0124506 ffff8800d8da9eb0
[50510.242899] ffff8800d8da9c00 ffff8800d9068a30 0000000000000000 ffff8800d74fdb80
[50510.242949] ffff8800d6f85208 ffff8800d7a57e08 ffffffffc0159985 000000000000001f
[50510.242999] Call Trace:
[50510.243027] [<ffffffffc0124506>] bt_accept_dequeue+0xb6/0x180 [bluetooth]
[50510.243085] [<ffffffffc0159985>] l2cap_sock_accept+0x125/0x220 [bluetooth]
[50510.243128] [<ffffffff810a1b30>] ? wake_up_state+0x20/0x20
[50510.243163] [<ffffffff8164946e>] kernel_accept+0x4e/0xa0
[50510.243200] [<ffffffffc05b97cd>] rfcomm_run+0x1ad/0x890 [rfcomm]
[50510.243238] [<ffffffffc05b9620>] ? rfcomm_process_rx+0x8a0/0x8a0 [rfcomm]
[50510.243281] [<ffffffff81091572>] kthread+0xd2/0xf0
[50510.243312] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
[50510.243353] [<ffffffff8176e9d8>] ret_from_fork+0x58/0x90
[50510.243387] [<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
[50510.243424] Code: 00 48 8b 93 b8 02 00 00 48 8d 83 b0 02 00 00 48 89 51 08 48 89 0a 48 89 83 b0 02 00 00 48 89 83 b8 02 00 00 48 8b 83 c0 02 00 00 <66> 83 a8 a8 01 00 00 01 48 c7 83 c0 02 00 00 00 00 00 00 f0 ff
[50510.243685] RIP [<ffffffffc01243f7>] bt_accept_unlink+0x47/0xa0 [bluetooth]
[50510.243737] RSP <ffff8800d7a57d58>
[50510.243758] CR2: 00000000000001a8
[50510.249457] ---[ end trace bb984f932c4e3ab3 ]---
Signed-off-by: Yichen Zhao <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
When we're doing background scanning and connection attempts it's
possible that we time out trying to connect and go back to scanning again.
The timeout triggers a HCI_LE_Create_Connection_Cancel which will
trigger a Connection Complete with "Unknown Connection Identifier"
error status. Since we go back to scanning this isn't really a failure
and shouldn't be presented as such to user space through mgmt.
The exception to this is if the connection attempt was due to an
explicit request on an L2CAP socket (indicated by
params->explicit_connect being true). Since the socket will get an
error it's consistent to also notify the failure on mgmt in this case.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
All LE connections are now triggered through a preceding passive scan
and waiting for a connectable advertising report. This means we've got
the best possible guarantee that the device is within range and should
be able to request the controller to perform continuous scanning. This
way we minimize the risk that we miss out on any advertising packets.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Cc: [email protected] # 4.3+
|
|
We can simplify a lot of code by making sure hdev->cur_adv_instance is
always up-to-date. This allows e.g. the removal of the
get_current_adv_instance() helper function and the special
HCI_ADV_CURRENT value. This patch also makes selecting instance 0x00
explicit in the various calls where advertising instances aren't
enabled, e.g. when HCI_ADVERTISING is set or we've just finished
enabling LE.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The logic in powered_update_hci() to initialize the advertising data &
state is a bit more complicated than it needs to be. It was previously
not doing anything if HCI_LE_ENABLED wasn't set, but this was not
obvious by quickly looking at the code. Now the conditions for the
various actions are more explicit. Another simplification is due to
the fact that __hci_req_schedule_adv_instance() takes care of setting
hdev->cur_adv_instance so there's no need to set it before calling the
function.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The hci_req_run() function already checks for empty cmd_q and bails
out if necessary. Also, req.cmd_q should really be treated as private
data of the request and not accessed directly.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The __hci_req_update_scan_rsp_data gets the instance to be updated
which should get passed to update_inst_scan_rsp_data() instead of
always enabling the current instance.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This flag just tells us whether hdev->adv_instances is empty or not.
We can equally well use the list_empty() function to get this
information.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The code in the Read Advertising Features mgmt command handler is
unnecessarily complicated. Clean it up and remove unnecessary
variables & branches.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The request to update HCI during power on is always coming either from
hdev->req_workqueue or through an ioctl, so it's safe to use
hci_req_sync for it. This way we also eliminate potential races with
incoming mgmt commands or other actions while powering on.
Part of this refactoring is the splitting of mgmt_powered() into
mgmt_power_on() and __mgmt_power_off() functions. The main reason is
the different requirements as far as hdev locking is concerned, as
highlighted with the __ prefix of the power off API.
Since the power on in the case of clearing the AUTO_OFF flag cannot be
done synchronously in the set_powered mgmt handler, the hci_power_on
work callback is extended to cover this (which also simplifies the
set_powered helper a lot).
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
We'll soon need this both in hci_request.c and mgmt.c so move it to
hci_request.c as a generic helper.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
We'll soon need to update the EIR both from hci_request.c and mgmt.c
so move update_eir() as a more generic request helper to
hci_request.c.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
We'll soon need this both from hci_request.c and mgmt.c so move it as
a request helper function to hci_request.c.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Since the other discoverable changes are behind req_workqueue now it
only makes sense to move the discoverable timeout there as well.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The discoverable mode is intrinsically linked with the connectable
mode e.g. through sharing the same HCI command (Write Scan Enable) for
BR/EDR. It makes therefore sense to move it to hci_request.c and run
the changes through the same hdev->req_workqueue.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The Class of Device needs to be changed e.g. for limited discoverable
mode. In preparation of moving the discoverable mode to hci_request.c
and hdev->req_workqueue, move the Class of Device helpers there first.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This way the connectable changes are synchronized against each other,
which helps avoid potential races. The connectable mode is also linked
together with LE advertising, which makes it more convenient to have it
behind the same workqueue.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This paves the way for eventually performing advertising changes
through the hdev->req_workqueue. Some new APIs need to be exposed from
mgmt.c to hci_request.c and vice-versa, but many of them will go away
once hdev->req_workqueue gets used.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
This way we avoid the need to do a forward declaration in later
patches.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Since Add/Remove Device perform the page scan updates independently
from the HCI command completion we've introduced a potential race when
multiple mgmt commands are queued. Doing the page scan updates through
the req_workqueue ensures that the state changes are performed in a
race-free manner.
At the same time, to make the request helper more widely usable,
extend it to also cover Inquiry Scan changes since those are behind
the same HCI command. This is also reflected in the new name of the
API as well as the work struct name.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
nf_log_trace() outputs bogus 'TRACE:' strings because I forgot to update
the comments array.
Fixes: 33d5a7b14bfd0 ("netfilter: nf_tables: extend tracing infrastructure")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Change return type of nfulnl_set_timeout() and nfulnl_set_qthresh() to
be void.
This patch changes the return type of the static methods
nfulnl_set_timeout() and nfulnl_set_qthresh() to be void, as there is no
justification and no need for these methods to return int.
Signed-off-by: Rami Rosen <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Commit 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}:
Register pernet in first place") reorganised the initialisation
order of the pernet_subsys to avoid "use-before-initialised"
condition. However, in doing so the cleanup logic in nfnetlink_queue
got botched in that the pernet_subsys wasn't cleaned in case
nfnetlink_subsys_register failed. This patch adds the necessary
cleanup routine call.
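The unwinding pattern described above can be sketched as follows. All names are hypothetical and the registration functions are stubs that only record state; the sketch shows the ordering (register pernet first, undo it if the later step fails), not the actual nfnetlink_queue code.

```c
#include <assert.h>

/* Hypothetical registration state; 1 means currently registered. */
static int pernet_registered;
static int subsys_registered;

static int demo_register_pernet(void)
{
    pernet_registered = 1;
    return 0;
}

static void demo_unregister_pernet(void)
{
    pernet_registered = 0;
}

/* fail != 0 simulates the subsystem registration returning an error. */
static int demo_register_subsys(int fail)
{
    if (fail)
        return -1;
    subsys_registered = 1;
    return 0;
}

/* Initialise in registration order; unwind in reverse order on failure. */
static int demo_init(int fail_subsys)
{
    int status;

    status = demo_register_pernet();
    if (status < 0)
        return status;

    status = demo_register_subsys(fail_subsys);
    if (status < 0)
        goto cleanup_pernet;

    return 0;

cleanup_pernet:
    demo_unregister_pernet();   /* the kind of step the fix adds */
    return status;
}
```

Without the cleanup label, a failed second step would leave the first registration in place even though initialisation as a whole reported an error.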
Fixes: 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}: Register pernet in first place")
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Valdis reports NULL deref in nf_ct_frag6_gather.
Problem is bogus use of skb_queue_walk() -- we miss first skb in the list
since we start with head->next instead of head.
In case the element we're looking for was head->next we won't find
a result and then trip over NULL iter.
(defrag uses plain NULL-terminated list rather than one terminated by
head-of-list-pointer, which is what skb_queue_walk expects).
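A simplified illustration of the walking error, assuming a plain NULL-terminated singly linked list. struct frag and both lookup functions are hypothetical and far smaller than the defrag code; they only contrast a walk that starts at head->next with one that starts at head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal fragment node; the list ends with next == NULL. */
struct frag {
    int offset;
    struct frag *next;
};

/* Buggy walk: starts at head->next, so the first element is never seen. */
static struct frag *find_skipping_head(struct frag *head, int offset)
{
    struct frag *it;

    if (!head)
        return NULL;
    for (it = head->next; it != NULL; it = it->next)
        if (it->offset == offset)
            return it;
    return NULL;
}

/* Correct walk for a NULL-terminated list: starts at head itself. */
static struct frag *find_from_head(struct frag *head, int offset)
{
    struct frag *it;

    for (it = head; it != NULL; it = it->next)
        if (it->offset == offset)
            return it;
    return NULL;
}
```

A caller that assumes the lookup always succeeds will dereference NULL whenever the answer happens to be the first element and the skipping variant is used.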
Fixes: 029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations")
Reported-by: Valdis Kletnieks <[email protected]>
Tested-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Only needed when meta nftrace rule(s) were added.
The assumption is that no such rules are active, so the call to
nft_trace_init is "never" needed.
When nftrace rules are active, we always call the nft_trace_* functions,
but will only send netlink messages when all of the following are true:
- traceinfo structure was initialised
- skb->nf_trace == 1
- at least one subscriber to trace group.
Adding an extra conditional
(static_branch ... && skb->nf_trace)
nft_trace_init( ..)
Is possible but results in a larger nft_do_chain footprint.
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Patrick McHardy <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
nft monitor mode can then decode and display this trace data.
Parts of LL/Network/Transport headers are provided as separate
attributes.
Otherwise, printing IP address data becomes virtually impossible
for userspace since in the case of the netdev family we really don't
want userspace to have to know all the possible link layer types
and/or sizes just to display/print an ip address.
We also don't want userspace to have to follow ipv6 header chains
to get the s/dport info, the kernel already did this work for us.
To avoid bloating nft_do_chain all data required for tracing is
encapsulated in nft_traceinfo.
The structure is initialized unconditionally(!) for each nft_do_chain
invocation.
This unconditional call will be moved under a static key in a
follow-up patch.
With lots of help from Patrick McHardy and Pablo Neira.
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Patrick McHardy <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbounded. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pulling in configuration knobs from other subsystems so
that cgroup membership tests can be avoided.
net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.
Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in any way and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration is
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.
While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.
In preparation, this patch updates sock->sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used. Once either of the two is used,
sock->sk_cgrp_data reverts to its previous role of carrying prioidx
and classid. This is to avoid adding yet another cgroup related field
to struct sock.
As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead. It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs. Non-critical inaccuracies from small race windows won't
make any noticeable difference.
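The overloading idea can be sketched in userspace as a tagged union. Everything here is hypothetical: struct demo_cgrp_data is not the kernel's struct sock_cgroup_data, the fallback constants are illustrative values only, and the sketch tracks the mode per object, ignoring the boot-wide nature of the real switch and its reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative fallback values only; the patch documents its own choices. */
#define DEMO_FALLBACK_PRIOIDX 1u
#define DEMO_FALLBACK_CLASSID 0u

/* Hypothetical stand-in for struct sock_cgroup_data. */
struct demo_cgrp_data {
    bool is_data;               /* false: pointer mode, true: data mode */
    union {
        const void *cgroup;     /* v2 cgroup the sock was created in */
        struct {
            uint16_t prioidx;
            uint32_t classid;
        } ids;
    } u;
};

static void demo_init_ptr(struct demo_cgrp_data *d, const void *cgroup)
{
    d->is_data = false;
    d->u.cgroup = cgroup;
}

/* One-way switch: drop the pointer and start carrying prioidx/classid. */
static void demo_switch_to_data(struct demo_cgrp_data *d)
{
    if (d->is_data)
        return;
    d->is_data = true;
    d->u.ids.prioidx = DEMO_FALLBACK_PRIOIDX;
    d->u.ids.classid = DEMO_FALLBACK_CLASSID;
}

static uint16_t demo_prioidx(const struct demo_cgrp_data *d)
{
    return d->is_data ? d->u.ids.prioidx : DEMO_FALLBACK_PRIOIDX;
}

static void demo_set_classid(struct demo_cgrp_data *d, uint32_t classid)
{
    demo_switch_to_data(d);
    d->u.ids.classid = classid;
}

static uint32_t demo_classid(const struct demo_cgrp_data *d)
{
    return d->is_data ? d->u.ids.classid : DEMO_FALLBACK_CLASSID;
}
```

Sharing storage this way is what avoids a separate cgroup field: while in pointer mode the accessors report fallback values, and the first use of classid (or prioidx) gives up the pointer for good.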
This patch doesn't make use of the pointer yet. The following patch
will implement netfilter match for cgroup2 membership.
v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
cgroup specific field.
v3: Add comments explaining why sock_data_prioidx() and
sock_data_classid() use different fallback values.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Daniel Wagner <[email protected]>
CC: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
->sk_cgroup_prioidx and ->sk_classid are moved into it. The struct
and its accessors are defined in cgroup-defs.h. This is to prepare
for overloading the fields with a cgroup pointer.
This patch mostly performs equivalent conversions but the following
are noteworthy.
* Equality test before updating classid is removed from
sock_update_classid(). This shouldn't make any noticeable
difference and a similar test will be implemented on the helper side
later.
* sock_update_netprioidx() now takes struct sock_cgroup_data and can
be moved to netprio_cgroup.h without causing include dependency
loop. Moved.
* The dummy version of sock_update_netprioidx() converted to a static
inline function while at it.
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
netprio builds per-netdev contiguous priomap array which is indexed by
css->id. The array is allocated using kzalloc() effectively limiting
the maximum ID supported to some thousand range. This patch caps the
maximum supported css->id to USHRT_MAX which should be way above what
is actually useable.
This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed
up part will be used to overload the cgroup related fields.
sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
two cgroup related fields are adjacent.
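A minimal sketch of the capping idea, assuming a hypothetical helper demo_set_prioidx() and a plain -1 error return; the kernel patch enforces the limit in its own way. The sketch only shows that an ID above USHRT_MAX is rejected before it could be truncated into a 16-bit field.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stores css_id in a 16-bit slot. Returns 0 on success, or -1 if the ID
 * exceeds USHRT_MAX and would not fit; the slot is left untouched then. */
static int demo_set_prioidx(uint16_t *slot, unsigned int css_id)
{
    if (css_id > USHRT_MAX)
        return -1;
    *slot = (uint16_t)css_id;
    return 0;
}
```

Rejecting the ID up front means a narrowed field can never silently alias two different cgroups.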
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Daniel Wagner <[email protected]>
Cc: Daniel Borkmann <[email protected]>
CC: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 0d76d6e8b2507983a2cae4c09880798079007421 and merge
commit c402293bd76fbc93e52ef8c0947ab81eea3ae019, reversing changes made
to c89359a42e2a49656451569c382eed63e781153c.
The virtio-vsock device specification is not finalized yet. Michael
Tsirkin voiced concern about merging this code when the hardware
interface (and possibly the userspace interface) could still change.
Signed-off-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|