Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the new helper.
Signed-off-by: Matthew Wilcox <[email protected]>
|
|
Simply changing idr_remove's 'id' argument to 'unsigned long' works
for all callers.
Signed-off-by: Matthew Wilcox <[email protected]>
|
|
Simply changing idr_remove's 'id' argument to 'unsigned long' suffices
for all callers.
Signed-off-by: Matthew Wilcox <[email protected]>
|
|
Propagate extack to cls->destroy callbacks when called from
non-error paths. On error paths pass NULL to avoid overwriting
the failure message.
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds extack support for classifier delete callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The tcf_exts_validate function calls the act api change callback. For
preparing extack support for act api, this patch adds the extack as
parameter for this function which is common used in cls implementations.
Furthermore the tcf_exts_validate will call action init callback which
prepares the TC action subsystem for extack support.
Cc: David Ahern <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds extack support for classifier change callback api. This
prepares to handle extack support inside each specific classifier
implementation.
Cc: David Ahern <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds extack support for generic cls handling. The extack
will be set deeper to each called function which is not part of netdev
core api.
Cc: David Ahern <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK, parent is still passed
down but the value is never used. Compiler does not recognize it and
issues a warning. Silence it down initializing parent to 0.
Fixes: 7960d1daf278 ("net: sched: use block index as a handle instead of qdisc when block is shared")
Reported-by: David Miller <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As the tcm_ifindex with value TCM_IFINDEX_MAGIC_BLOCK is invalid ifindex,
use it to indicate that we work with block, instead of qdisc.
So if tcm_ifindex is set to TCM_IFINDEX_MAGIC_BLOCK, tcm_parent is used
to carry block_index.
If the block is set to be shared between at least 2 qdiscs, it is
forbidden to use the qdisc handle to add/delete filters. In that case,
userspace has to pass block_index.
Also, for dump of the filters, in case the block is shared in between at
least 2 qdiscs, the each filter is dumped with tcm_ifindex value
TCM_IFINDEX_MAGIC_BLOCK and tcm_parent set to block_index. That gives
the user clear indication, that the filter belongs to a shared block
and not only to one qdisc under which it is dumped.
Suggested-by: David Ahern <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
During block bind, we need to check tc offload feature. If it is
disabled yet still the block contains offloaded filters, forbid the
bind. Also forbid to register callback for a block that already
contains offloaded filters, as the play back is not supported now.
For keeping track of offloaded filters there is a new counter
introduced, alongside with couple of helpers called from cls_* code.
These helpers set and clear TCA_CLS_FLAGS_IN_HW flag.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Both are no longer used, so remove them.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Couple of classifiers call netif_keep_dst directly on q->dev. That is
not possible to do directly for shared blocke where multiple qdiscs are
owning the block. So introduce a infrastructure to keep track of the
block owners in list and use this list to implement block variant of
netif_keep_dst.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use block index in the messages instead.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Allow qdiscs to share filter blocks among them. Each qdisc type has to
use block get/put extended modifications that enable sharing.
Shared blocks are tracked within each net namespace and identified
by u32 index. This index is passed from user during the qdisc creation.
If user passes index that is not used by any other qdisc, new block
is created. If user passes index that is already used, the existing
block will be re-used.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
So far, there was possible only to register a single filter chain
pointer to block->chain[0]. However, when the blocks will get shareable,
we need to allow multiple filter chain pointers registration.
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
net/ipv6/ip6_gre.c is a case of parallel adds.
include/trace/events/tcp.h is a little bit more tricky. The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to check block for being null in both tcf_block_put and
tcf_block_put_ext.
Fixes: 343723dd51ef ("net: sched: fix clsact init error path")
Reported-by: Prashant Bhole <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds extack support for the function tcf_block_get which is
a common used function in the tc subsystem. Callers which are interested
in the receiving error can assign extack to get a more detailed
information why tcf_block_get failed.
Cc: David Ahern <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds extack support for block callback to prepare per-qdisc
specific changes for extack.
Cc: David Ahern <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.
Signed-off-by: David S. Miller <[email protected]>
|
|
Since in qdisc_create, the destroy op is called when init fails, we
don't do cleanup in init and leave it up to destroy.
This fixes use-after-free when trying to put already freed block.
Fixes: 6e40cf2d4dee ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since the block is freed with last chain being put, once we reach the
end of iteration of list_for_each_entry_safe, the block may be
already freed. I'm hitting this only by creating and deleting clsact:
[ 202.171952] ==================================================================
[ 202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
[ 202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
[ 202.194508]
[ 202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
[ 202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[ 202.213613] Call Trace:
[ 202.216369] dump_stack+0xda/0x169
[ 202.220192] ? dma_virt_map_sg+0x147/0x147
[ 202.224790] ? show_regs_print_info+0x54/0x54
[ 202.229691] ? tcf_chain_destroy+0x1dc/0x250
[ 202.234494] print_address_description+0x83/0x3d0
[ 202.239781] ? tcf_block_put_ext+0x240/0x390
[ 202.244575] kasan_report+0x1ba/0x460
[ 202.248707] ? tcf_block_put_ext+0x240/0x390
[ 202.253518] tcf_block_put_ext+0x240/0x390
[ 202.258117] ? tcf_chain_flush+0x290/0x290
[ 202.262708] ? qdisc_hash_del+0x82/0x1a0
[ 202.267111] ? qdisc_hash_add+0x50/0x50
[ 202.271411] ? __lock_is_held+0x5f/0x1a0
[ 202.275843] clsact_destroy+0x3d/0x80 [sch_ingress]
[ 202.281323] qdisc_destroy+0xcb/0x240
[ 202.285445] qdisc_graft+0x216/0x7b0
[ 202.289497] tc_get_qdisc+0x260/0x560
Fix this by holding the block also by chain 0 and put chain 0
explicitly, out of the list_for_each_entry_safe loop at the very
end of tcf_block_put_ext.
Fixes: efbf78973978 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Both Eric and Paolo noticed the rcu_barrier() we use in
tcf_block_put_ext() could be a performance bottleneck when
we have a lot of tc classes.
Paolo provided the following to demonstrate the issue:
tc qdisc add dev lo root htb
for I in `seq 1 1000`; do
tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
for J in `seq 1 10`; do
tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
done
done
time tc qdisc del dev root
real 0m54.764s
user 0m0.023s
sys 0m0.000s
The rcu_barrier() there is to ensure we free the block after all chains
are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
We can achieve this ordering requirement by refcnt'ing tcf block instead,
that is, the tcf block is freed only when the last chain in this block is
gone. This also simplifies the code.
Paolo reported after this patch we get:
real 0m0.017s
user 0m0.000s
sys 0m0.017s
Tested-by: Paolo Abeni <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tcf_block_put_ext has assumed that all filters (and thus their goto
actions) are destroyed in RCU callback and thus can not race with our
list iteration. However, that is not true during netns cleanup (see
tcf_exts_get_net comment).
Prevent the user after free by holding all chains (except 0, that one is
already held). foreach_safe is not enough in this case.
To reproduce, run the following in a netns and then delete the ns:
ip link add dtest type dummy
tc qdisc add dev dtest ingress
tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()")
Signed-off-by: Roman Kapl <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If you flush (delete) a filter chain other than chain 0 (such as when
deleting the device), the kernel may run into a use-after-free. The
chain refcount must not be decremented unless we are sure we are done
with the chain.
To reproduce the bug, run:
ip link add dtest type dummy
tc qdisc add dev dtest ingress
tc filter add dev dtest chain 1 parent ffff: flower
ip link del dtest
Introduced in: commit f93e1cdcf42c ("net/sched: fix filter flushing"),
but unless you have KAsan or luck, you won't notice it until
commit 0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()")
Fixes: f93e1cdcf42c ("net/sched: fix filter flushing")
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Roman Kapl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Simple cases of overlapping changes in the packet scheduler.
Must easier to resolve this time.
Which probably means that I screwed it up somehow.
Signed-off-by: David S. Miller <[email protected]>
|
|
Instead of holding netns refcnt in tc actions, we can minimize
the holding time by saving it in struct tcf_exts instead. This
means we can just hold netns refcnt right before call_rcu() and
release it after tcf_exts_destroy() is done.
However, because on netns cleanup path we call tcf_proto_destroy()
too, obviously we can not hold netns for a zero refcnt, in this
case we have to do cleanup synchronously. It is fine for RCU too,
the caller cleanup_net() already waits for a grace period.
For other cases, refcnt is non-zero and we can safely grab it as
normal and release it after we are done.
This patch provides two new API for each filter to use:
tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
use the following pattern:
void __destroy_filter() {
tcf_exts_destroy();
tcf_exts_put_net(); // <== release netns refcnt
kfree();
}
void some_work() {
rtnl_lock();
__destroy_filter();
rtnl_unlock();
}
void some_rcu_callback() {
tcf_queue_work(some_work);
}
if (tcf_exts_get_net()) // <== hold netns refcnt
call_rcu(some_rcu_callback);
else
__destroy_filter();
Cc: Lucas Bates <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a callback that is to be called whenever head of the chain changes.
Also provide a callback for the default case when the caller gets a
block using non-extended getter.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Callers of tcf_block_put() could pass NULL so
we can't use block->q before checking if block is
NULL or not.
tcf_block_put_ext() callers are fine, it is always
non-NULL.
Fixes: 8c4083b30e56 ("net: sched: add block bind/unbind notif. and extended block_get/put")
Reported-by: Dave Taht <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, the offload unbind is done before the chains are flushed.
That causes driver to unregister block callback before it can get all
the callback calls done during flush, leaving the offloaded tps inside
the HW. So fix the order to prevent this situation and restore the
original behaviour.
Reported-by: Alexander Duyck <[email protected]>
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since the only user, mlx5 driver does the check in
mlx5e_setup_tc_block_cb, no need to check here.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
phase
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.
Reported-by: Alexander Duyck <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Smooth Cong Wang's bug fix into 'net-next'. Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.
Signed-off-by: David S. Miller <[email protected]>
|
|
In commit 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
I defer tcf_chain_flush() to a workqueue, this causes a use-after-free
because qdisc is already destroyed after we queue this work.
The tcf_block_put_deferred() is no longer necessary after we get RTNL
for each tc filter destroy work, no others could jump in at this point.
Same for tcf_chain_hold(), we are fully serialized now.
This also reduces one indirection therefore makes the code more
readable. Note this brings back a rcu_barrier(), however comparing
to the code prior to commit 7aa0045dadb6 we still reduced one
rcu_barrier(). For net-next, we can consider to refcnt tcf block to
avoid it.
Fixes: 7aa0045dadb6 ("net_sched: introduce a workqueue for RCU callbacks of tc filter")
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Several conflicts here.
NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.
Parallel additions to net/pkt_cls.h and net/sch_generic.h
A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.
The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.
Signed-off-by: David S. Miller <[email protected]>
|
|
After previous patches, it is now safe to claim that
tcf_exts_destroy() is always called with RTNL lock.
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch introduces a dedicated workqueue for tc filters
so that each tc filter's RCU callback could defer their
action destroy work to this workqueue. The helper
tcf_queue_work() is introduced for them to use.
Because we hold RTNL lock when calling tcf_block_put(), we
can not simply flush works inside it, therefore we have to
defer it again to this workqueue and make sure all flying RCU
callbacks have already queued their work before this one, in
other words, to ensure this is the last one to execute to
prevent any use-after-free.
On the other hand, this makes tcf_block_put() ugly and
harder to understand. Since David and Eric strongly dislike
adding synchronize_rcu(), this is probably the only
solution that could make everyone happy.
Please also see the code comments below.
Reported-by: Chris Mi <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Prior to commit b3f55bdda8df, the networking core doesn't wire an in-place
actions list the when the low level driver is called to offload the flow,
but all low level drivers do that (call tcf_exts_to_list()) in their
offloading "add" logic.
Now, the in-place list is set in the core which goes over the list in a loop,
but also by the hw driver when their offloading code is invoked indirectly:
cls_xxx add flow -> tc_setup_cb_call -> tc_exts_setup_cb_egdev_call -> hw driver
which messes up the core list instance upon driver return. Fix that by avoiding
in-place list on the net core code that deals with adding flows.
Fixes: b3f55bdda8df ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Extend the tc_setup_cb_call entrypoint function originally used only for
action egress devices callbacks to call per-block callbacks as well.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule for a specific block.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce new type of ndo_setup_tc message to propage binding/unbinding
of a block to driver. Call this ndo whenever qdisc gets/puts a block.
Alongside with this, there's need to propagate binder type from qdisc
code down to the notifier. So introduce extended variants of
block_get/put in order to pass this info.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The callers have this info, they will pass it down to tcf_fill_node.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Store net pointer in the block structure. Along the way, introduce
qdisc_net helper which allows to easily obtain net pointer for
qdisc instance.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Prepare for removal of tp->q and store Qdisc pointer in the block
structure.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The helper and the struct field ares no longer used by any code,
so remove them.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The only user of cls_flower->egress_dev is mlx5. So do the conversion
there alongside with the code originating the call in cls_flower
function fl_hw_replace_filter to the newly introduced egress device
callback infrastucture.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule and specified device
acts as tc action egress device.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Return dev directly, NULL if not possible. That is enough.
Makes no sense to pass struct net * to get_dev op, as there is only one
net possible, the one the action was created in. So just store it in
mirred priv and use directly.
Rename the mirred op callback function.
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|