diff options
author | David S. Miller <davem@davemloft.net> | 2018-01-17 14:53:58 -0500 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2018-01-17 14:53:58 -0500 |
commit | ca46abd6f89fc21117975ef867a002fa68fe8be9 (patch) | |
tree | 4396d9fa413aa9b8478decef3552542bbd3bee7d /net/sched/cls_api.c | |
parent | c9a824210f530bcddf4ef9cbb04445da70e4b3e8 (diff) | |
parent | 4b23258d6a1b0040c1e7d2d997800cfd09294b7f (diff) |
Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'
Jiri Pirko says:
====================
net: sched: allow qdiscs to share filter block instances
Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create ingress qdisc on both and you want to add
identical filter rules both, you need to add them twice. This patchset
makes this easier and mainly saves resources allowing to share all filters
within a qdisc - I call it a "filter block". Also this helps to save
resources when we do offload to hw for example to expensive TCAM.
So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
$ tc qdisc add dev ens8 ingress_block 22 ingress
^^^^^^^^^^^^^^^^
If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress
Now if we list the qdiscs, we will see the block index in the output:
$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev ens9 parent ffff:fff1
To make is more visual, the situation looks like this:
ens7 ingress qdisc ens7 ingress qdisc
| |
| |
+----------> block 22 <----------+
Unlimited number of qdiscs may share the same block.
Note that this patchset introduces block sharing support also for clsact
qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24
We can add filter using the block index:
$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
Note we cannot use the qdisc for filter manipulations of shared blocks:
$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
Error: This filter block is shared. Please use the block index to manipulate the filters.
We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:
$ tc filter show block 22
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...
---
v10->v11:
- patch 2:
- fixed error path when register_pernet_subsys fails pointed out by Cong
- patch 9:
- rebased on top of the current net-next
v9->v10:
- patch 7:
- fixed ifindex magic in the patch description
- userspace patches:
- added manpages and patch descriptions
v8->v9:
- patch "net: sched: add rt netlink message type for block get" was
removed, userspace check filter existence using qdisc dump
v7->v8:
- patch 7:
- added comment to ifindex block magic
- patch 9:
- new patch
- patch 10:
- base this on the patch that introduces qdisc-generic block index
attributes parsing/dumping
- patch 13:
- rebased on top of current net-next
v6->v7:
- patch 1:
- unsquashed shared block patch that was previously squashed by mistake
- fixed error path in block create - freeing chain 0
- patch 2:
- new patch - splitted from the previous one as it got accidentaly
squashed in the rebasing process in the past
- converted to idr extended
- removed auto-generating of block indexes. Callers have to explicily
tell that the block is shared by passing non-zero block index
- fixed error path in block get ext - freeing chain 0
- patch 7:
- changed extack message for block index handle as suggested by DaveA
- added extack message when block index does not exist
- the block ifindex magic is in define and change to 0xffffffff
as suggested by Jamal
- patch 8:
- new patch implementing RTM_GETBLOCK in order to query if the block
with some index exists
- patch 9:
- adjust to the core changes and check block index attributes for being 0
v5->v6:
- added patch 6 that introduces block handle
v4->v5:
- patch 5:
- add tracking of binding of devs that are unable to offload and check
that before block cbs call.
v3->v4:
- patch 1:
- rebased on top of the current net-next
- added some extack strings
- patch 3:
- rebased on top of the current net-next
- patch 5:
- propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
caller
- patch 7:
- rebased on top of the current net-next
v2->v3:
- removed original patch 1, removing tp->q cls_bpf dependency. Fixed by
Jakub in the meantime.
- patch 1:
- rebased on top of the current net-next
- patch 5:
- new patch
- patch 8:
- removed "p_" prefix from block index function args
- patch 10:
- add tc offload feature handling
====================
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/sched/cls_api.c')
-rw-r--r-- | net/sched/cls_api.c | 595 |
1 files changed, 464 insertions, 131 deletions
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 6708b6953bfa..e500d11da9cd 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -24,6 +24,7 @@ #include <linux/init.h> #include <linux/kmod.h> #include <linux/slab.h> +#include <linux/idr.h> #include <net/net_namespace.h> #include <net/sock.h> #include <net/netlink.h> @@ -121,8 +122,7 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp) } static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, - u32 prio, u32 parent, struct Qdisc *q, - struct tcf_chain *chain) + u32 prio, struct tcf_chain *chain) { struct tcf_proto *tp; int err; @@ -156,8 +156,6 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol, tp->classify = tp->ops->classify; tp->protocol = protocol; tp->prio = prio; - tp->classid = parent; - tp->q = q; tp->chain = chain; err = tp->ops->init(tp); @@ -179,6 +177,12 @@ static void tcf_proto_destroy(struct tcf_proto *tp) kfree_rcu(tp, rcu); } +struct tcf_filter_chain_list_item { + struct list_head list; + tcf_chain_head_change_t *chain_head_change; + void *chain_head_change_priv; +}; + static struct tcf_chain *tcf_chain_create(struct tcf_block *block, u32 chain_index) { @@ -187,6 +191,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block, chain = kzalloc(sizeof(*chain), GFP_KERNEL); if (!chain) return NULL; + INIT_LIST_HEAD(&chain->filter_chain_list); list_add_tail(&chain->list, &block->chain_list); chain->block = block; chain->index = chain_index; @@ -194,12 +199,19 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block, return chain; } +static void tcf_chain_head_change_item(struct tcf_filter_chain_list_item *item, + struct tcf_proto *tp_head) +{ + if (item->chain_head_change) + item->chain_head_change(tp_head, item->chain_head_change_priv); +} static void tcf_chain_head_change(struct tcf_chain *chain, struct tcf_proto *tp_head) { - if (chain->chain_head_change) - chain->chain_head_change(tp_head, - chain->chain_head_change_priv); + struct tcf_filter_chain_list_item *item; + + list_for_each_entry(item, &chain->filter_chain_list, list) + tcf_chain_head_change_item(item, tp_head); } static void tcf_chain_flush(struct tcf_chain *chain) @@ -253,47 +265,149 @@ void tcf_chain_put(struct tcf_chain *chain) } EXPORT_SYMBOL(tcf_chain_put); -static void tcf_block_offload_cmd(struct tcf_block *block, struct Qdisc *q, - struct tcf_block_ext_info *ei, - enum tc_block_command command) +static bool tcf_block_offload_in_use(struct tcf_block *block) +{ + return block->offloadcnt; +} + +static int tcf_block_offload_cmd(struct tcf_block *block, + struct net_device *dev, + struct tcf_block_ext_info *ei, + enum tc_block_command command) { - struct net_device *dev = q->dev_queue->dev; struct tc_block_offload bo = {}; - if (!dev->netdev_ops->ndo_setup_tc) - return; bo.command = command; bo.binder_type = ei->binder_type; bo.block = block; - dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo); + return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo); } -static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q, - struct tcf_block_ext_info *ei) +static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q, + struct tcf_block_ext_info *ei) { - tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND); + struct net_device *dev = q->dev_queue->dev; + int err; + + if (!dev->netdev_ops->ndo_setup_tc) + goto no_offload_dev_inc; + + /* If tc offload feature is disabled and the block we try to bind + * to already has some offloaded filters, forbid to bind. + */ + if (!tc_can_offload(dev) && tcf_block_offload_in_use(block)) + return -EOPNOTSUPP; + + err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_BIND); + if (err == -EOPNOTSUPP) + goto no_offload_dev_inc; + return err; + +no_offload_dev_inc: + if (tcf_block_offload_in_use(block)) + return -EOPNOTSUPP; + block->nooffloaddevcnt++; + return 0; } static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q, struct tcf_block_ext_info *ei) { - tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND); + struct net_device *dev = q->dev_queue->dev; + int err; + + if (!dev->netdev_ops->ndo_setup_tc) + goto no_offload_dev_dec; + err = tcf_block_offload_cmd(block, dev, ei, TC_BLOCK_UNBIND); + if (err == -EOPNOTSUPP) + goto no_offload_dev_dec; + return; + +no_offload_dev_dec: + WARN_ON(block->nooffloaddevcnt-- == 0); +} + +static int +tcf_chain_head_change_cb_add(struct tcf_chain *chain, + struct tcf_block_ext_info *ei, + struct netlink_ext_ack *extack) +{ + struct tcf_filter_chain_list_item *item; + + item = kmalloc(sizeof(*item), GFP_KERNEL); + if (!item) { + NL_SET_ERR_MSG(extack, "Memory allocation for head change callback item failed"); + return -ENOMEM; + } + item->chain_head_change = ei->chain_head_change; + item->chain_head_change_priv = ei->chain_head_change_priv; + if (chain->filter_chain) + tcf_chain_head_change_item(item, chain->filter_chain); + list_add(&item->list, &chain->filter_chain_list); + return 0; } -int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, - struct tcf_block_ext_info *ei, - struct netlink_ext_ack *extack) +static void +tcf_chain_head_change_cb_del(struct tcf_chain *chain, + struct tcf_block_ext_info *ei) +{ + struct tcf_filter_chain_list_item *item; + + list_for_each_entry(item, &chain->filter_chain_list, list) { + if ((!ei->chain_head_change && !ei->chain_head_change_priv) || + (item->chain_head_change == ei->chain_head_change && + item->chain_head_change_priv == ei->chain_head_change_priv)) { + tcf_chain_head_change_item(item, NULL); + list_del(&item->list); + kfree(item); + return; + } + } + WARN_ON(1); +} + +struct tcf_net { + struct idr idr; +}; + +static unsigned int tcf_net_id; + +static int tcf_block_insert(struct tcf_block *block, struct net *net, + u32 block_index, struct netlink_ext_ack *extack) +{ + struct tcf_net *tn = net_generic(net, tcf_net_id); + int err; + + err = idr_alloc_ext(&tn->idr, block, NULL, block_index, + block_index + 1, GFP_KERNEL); + if (err) + return err; + block->index = block_index; + return 0; +} + +static void tcf_block_remove(struct tcf_block *block, struct net *net) { - struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL); + struct tcf_net *tn = net_generic(net, tcf_net_id); + + idr_remove_ext(&tn->idr, block->index); +} + +static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q, + struct netlink_ext_ack *extack) +{ + struct tcf_block *block; struct tcf_chain *chain; int err; + block = kzalloc(sizeof(*block), GFP_KERNEL); if (!block) { NL_SET_ERR_MSG(extack, "Memory allocation for block failed"); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } INIT_LIST_HEAD(&block->chain_list); INIT_LIST_HEAD(&block->cb_list); + INIT_LIST_HEAD(&block->owner_list); /* Create chain 0 by default, it has to be always present. */ chain = tcf_chain_create(block, 0); @@ -302,17 +416,149 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, err = -ENOMEM; goto err_chain_create; } - WARN_ON(!ei->chain_head_change); - chain->chain_head_change = ei->chain_head_change; - chain->chain_head_change_priv = ei->chain_head_change_priv; block->net = qdisc_net(q); + block->refcnt = 1; + block->net = net; block->q = q; - tcf_block_offload_bind(block, q, ei); - *p_block = block; - return 0; + return block; err_chain_create: kfree(block); + return ERR_PTR(err); +} + +static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index) +{ + struct tcf_net *tn = net_generic(net, tcf_net_id); + + return idr_find_ext(&tn->idr, block_index); +} + +static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block) +{ + return list_first_entry(&block->chain_list, struct tcf_chain, list); +} + +struct tcf_block_owner_item { + struct list_head list; + struct Qdisc *q; + enum tcf_block_binder_type binder_type; +}; + +static void +tcf_block_owner_netif_keep_dst(struct tcf_block *block, + struct Qdisc *q, + enum tcf_block_binder_type binder_type) +{ + if (block->keep_dst && + binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS && + binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS) + netif_keep_dst(qdisc_dev(q)); +} + +void tcf_block_netif_keep_dst(struct tcf_block *block) +{ + struct tcf_block_owner_item *item; + + block->keep_dst = true; + list_for_each_entry(item, &block->owner_list, list) + tcf_block_owner_netif_keep_dst(block, item->q, + item->binder_type); +} +EXPORT_SYMBOL(tcf_block_netif_keep_dst); + +static int tcf_block_owner_add(struct tcf_block *block, + struct Qdisc *q, + enum tcf_block_binder_type binder_type) +{ + struct tcf_block_owner_item *item; + + item = kmalloc(sizeof(*item), GFP_KERNEL); + if (!item) + return -ENOMEM; + item->q = q; + item->binder_type = binder_type; + list_add(&item->list, &block->owner_list); + return 0; +} + +static void tcf_block_owner_del(struct tcf_block *block, + struct Qdisc *q, + enum tcf_block_binder_type binder_type) +{ + struct tcf_block_owner_item *item; + + list_for_each_entry(item, &block->owner_list, list) { + if (item->q == q && item->binder_type == binder_type) { + list_del(&item->list); + kfree(item); + return; + } + } + WARN_ON(1); +} + +int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q, + struct tcf_block_ext_info *ei, + struct netlink_ext_ack *extack) +{ + struct net *net = qdisc_net(q); + struct tcf_block *block = NULL; + bool created = false; + int err; + + if (ei->block_index) { + /* block_index not 0 means the shared block is requested */ + block = tcf_block_lookup(net, ei->block_index); + if (block) + block->refcnt++; + } + + if (!block) { + block = tcf_block_create(net, q, extack); + if (IS_ERR(block)) + return PTR_ERR(block); + created = true; + if (ei->block_index) { + err = tcf_block_insert(block, net, + ei->block_index, extack); + if (err) + goto err_block_insert; + } + } + + err = tcf_block_owner_add(block, q, ei->binder_type); + if (err) + goto err_block_owner_add; + + tcf_block_owner_netif_keep_dst(block, q, ei->binder_type); + + err = tcf_chain_head_change_cb_add(tcf_block_chain_zero(block), + ei, extack); + if (err) + goto err_chain_head_change_cb_add; + + err = tcf_block_offload_bind(block, q, ei); + if (err) + goto err_block_offload_bind; + + *p_block = block; + return 0; + +err_block_offload_bind: + tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei); +err_chain_head_change_cb_add: + tcf_block_owner_del(block, q, ei->binder_type); +err_block_owner_add: + if (created) { + if (tcf_block_shared(block)) + tcf_block_remove(block, net); +err_block_insert: + kfree(tcf_block_chain_zero(block)); + kfree(block); + } else { + block->refcnt--; + } return err; } EXPORT_SYMBOL(tcf_block_get_ext); @@ -346,26 +592,35 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, { struct tcf_chain *chain, *tmp; - /* Hold a refcnt for all chains, so that they don't disappear - * while we are iterating. - */ if (!block) return; - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_hold(chain); + tcf_chain_head_change_cb_del(tcf_block_chain_zero(block), ei); + tcf_block_owner_del(block, q, ei->binder_type); - list_for_each_entry(chain, &block->chain_list, list) - tcf_chain_flush(chain); + if (--block->refcnt == 0) { + if (tcf_block_shared(block)) + tcf_block_remove(block, block->net); + + /* Hold a refcnt for all chains, so that they don't disappear + * while we are iterating. + */ + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_hold(chain); + + list_for_each_entry(chain, &block->chain_list, list) + tcf_chain_flush(chain); + } tcf_block_offload_unbind(block, q, ei); - /* At this point, all the chains should have refcnt >= 1. */ - list_for_each_entry_safe(chain, tmp, &block->chain_list, list) - tcf_chain_put(chain); + if (block->refcnt == 0) { + /* At this point, all the chains should have refcnt >= 1. */ + list_for_each_entry_safe(chain, tmp, &block->chain_list, list) + tcf_chain_put(chain); - /* Finally, put chain 0 and allow block to be freed. */ - chain = list_first_entry(&block->chain_list, struct tcf_chain, list); - tcf_chain_put(chain); + /* Finally, put chain 0 and allow block to be freed. */ + tcf_chain_put(tcf_block_chain_zero(block)); + } } EXPORT_SYMBOL(tcf_block_put_ext); @@ -423,9 +678,16 @@ struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block, { struct tcf_block_cb *block_cb; + /* At this point, playback of previous block cb calls is not supported, + * so forbid to register to block which already has some offloaded + * filters present. + */ + if (tcf_block_offload_in_use(block)) + return ERR_PTR(-EOPNOTSUPP); + block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL); if (!block_cb) - return NULL; + return ERR_PTR(-ENOMEM); block_cb->cb = cb; block_cb->cb_ident = cb_ident; block_cb->cb_priv = cb_priv; @@ -441,7 +703,7 @@ int tcf_block_cb_register(struct tcf_block *block, struct tcf_block_cb *block_cb; block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv); - return block_cb ? 0 : -ENOMEM; + return IS_ERR(block_cb) ? PTR_ERR(block_cb) : 0; } EXPORT_SYMBOL(tcf_block_cb_register); @@ -471,6 +733,10 @@ static int tcf_block_cb_call(struct tcf_block *block, enum tc_setup_type type, int ok_count = 0; int err; + /* Make sure all netdevs sharing this block are offload-capable. */ + if (block->nooffloaddevcnt && err_stop) + return -EOPNOTSUPP; + list_for_each_entry(block_cb, &block->cb_list, list) { err = block_cb->cb(type, type_data, block_cb->cb_priv); if (err) { @@ -524,8 +790,9 @@ reclassify: #ifdef CONFIG_NET_CLS_ACT reset: if (unlikely(limit++ >= max_reclassify_loop)) { - net_notice_ratelimited("%s: reclassify loop, rule prio %u, protocol %02x\n", - tp->q->ops->id, tp->prio & 0xffff, + net_notice_ratelimited("%u: reclassify loop, rule prio %u, protocol %02x\n", + tp->chain->block->index, + tp->prio & 0xffff, ntohs(tp->protocol)); return TC_ACT_SHOT; } @@ -598,8 +865,9 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain, } static int tcf_fill_node(struct net *net, struct sk_buff *skb, - struct tcf_proto *tp, struct Qdisc *q, u32 parent, - void *fh, u32 portid, u32 seq, u16 flags, int event) + struct tcf_proto *tp, struct tcf_block *block, + struct Qdisc *q, u32 parent, void *fh, + u32 portid, u32 seq, u16 flags, int event) { struct tcmsg *tcm; struct nlmsghdr *nlh; @@ -612,8 +880,13 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, tcm->tcm_family = AF_UNSPEC; tcm->tcm__pad1 = 0; tcm->tcm__pad2 = 0; - tcm->tcm_ifindex = qdisc_dev(q)->ifindex; - tcm->tcm_parent = parent; + if (q) { + tcm->tcm_ifindex = qdisc_dev(q)->ifindex; + tcm->tcm_parent = parent; + } else { + tcm->tcm_ifindex = TCM_IFINDEX_MAGIC_BLOCK; + tcm->tcm_block_index = block->index; + } tcm->tcm_info = TC_H_MAKE(tp->prio, tp->protocol); if (nla_put_string(skb, TCA_KIND, tp->ops->kind)) goto nla_put_failure; @@ -636,8 +909,8 @@ nla_put_failure: static int tfilter_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct tcf_proto *tp, - struct Qdisc *q, u32 parent, - void *fh, int event, bool unicast) + struct tcf_block *block, struct Qdisc *q, + u32 parent, void *fh, int event, bool unicast) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -646,8 +919,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, if (!skb) return -ENOBUFS; - if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq, - n->nlmsg_flags, event) <= 0) { + if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid, + n->nlmsg_seq, n->nlmsg_flags, event) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -661,8 +934,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb, static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, struct nlmsghdr *n, struct tcf_proto *tp, - struct Qdisc *q, u32 parent, - void *fh, bool unicast, bool *last) + struct tcf_block *block, struct Qdisc *q, + u32 parent, void *fh, bool unicast, bool *last) { struct sk_buff *skb; u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; @@ -672,8 +945,8 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, if (!skb) return -ENOBUFS; - if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq, - n->nlmsg_flags, RTM_DELTFILTER) <= 0) { + if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid, + n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER) <= 0) { kfree_skb(skb); return -EINVAL; } @@ -692,15 +965,16 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb, } static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb, - struct Qdisc *q, u32 parent, - struct nlmsghdr *n, + struct tcf_block *block, struct Qdisc *q, + u32 parent, struct nlmsghdr *n, struct tcf_chain *chain, int event) { struct tcf_proto *tp; for (tp = rtnl_dereference(chain->filter_chain); tp; tp = rtnl_dereference(tp->next)) - tfilter_notify(net, oskb, n, tp, q, parent, 0, event, false); + tfilter_notify(net, oskb, n, tp, block, + q, parent, 0, event, false); } /* Add/change/delete/get a filter node */ @@ -716,13 +990,11 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, bool prio_allocate; u32 parent; u32 chain_index; - struct net_device *dev; - struct Qdisc *q; + struct Qdisc *q = NULL; struct tcf_chain_info chain_info; struct tcf_chain *chain = NULL; struct tcf_block *block; struct tcf_proto *tp; - const struct Qdisc_class_ops *cops; unsigned long cl; void *fh; int err; @@ -769,41 +1041,58 @@ replay: /* Find head of filter chain. */ - /* Find link */ - dev = __dev_get_by_index(net, t->tcm_ifindex); - if (dev == NULL) - return -ENODEV; - - /* Find qdisc */ - if (!parent) { - q = dev->qdisc; - parent = q->handle; + if (t->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) { + block = tcf_block_lookup(net, t->tcm_block_index); + if (!block) { + NL_SET_ERR_MSG(extack, "Block of given index was not found"); + err = -EINVAL; + goto errout; + } } else { - q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent)); - if (q == NULL) - return -EINVAL; - } + const struct Qdisc_class_ops *cops; + struct net_device *dev; - /* Is it classful? */ - cops = q->ops->cl_ops; - if (!cops) - return -EINVAL; + /* Find link */ + dev = __dev_get_by_index(net, t->tcm_ifindex); + if (!dev) + return -ENODEV; - if (!cops->tcf_block) - return -EOPNOTSUPP; + /* Find qdisc */ + if (!parent) { + q = dev->qdisc; + parent = q->handle; + } else { + q = qdisc_lookup(dev, TC_H_MAJ(t->tcm_parent)); + if (!q) + return -EINVAL; + } - /* Do we search for filter, attached to class? */ - if (TC_H_MIN(parent)) { - cl = cops->find(q, parent); - if (cl == 0) - return -ENOENT; - } + /* Is it classful? */ + cops = q->ops->cl_ops; + if (!cops) + return -EINVAL; - /* And the last stroke */ - block = cops->tcf_block(q, cl, extack); - if (!block) { - err = -EINVAL; - goto errout; + if (!cops->tcf_block) + return -EOPNOTSUPP; + + /* Do we search for filter, attached to class? */ + if (TC_H_MIN(parent)) { + cl = cops->find(q, parent); + if (cl == 0) + return -ENOENT; + } + + /* And the last stroke */ + block = cops->tcf_block(q, cl, extack); + if (!block) { + err = -EINVAL; + goto errout; + } + if (tcf_block_shared(block)) { + NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters"); + err = -EOPNOTSUPP; + goto errout; + } } chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0; @@ -819,7 +1108,7 @@ replay: } if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) { - tfilter_notify_chain(net, skb, q, parent, n, + tfilter_notify_chain(net, skb, block, q, parent, n, chain, RTM_DELTFILTER); tcf_chain_flush(chain); err = 0; @@ -851,7 +1140,7 @@ replay: prio = tcf_auto_prio(tcf_chain_tp_prev(&chain_info)); tp = tcf_proto_create(nla_data(tca[TCA_KIND]), - protocol, prio, parent, q, chain); + protocol, prio, chain); if (IS_ERR(tp)) { err = PTR_ERR(tp); goto errout; @@ -867,7 +1156,7 @@ replay: if (!fh) { if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) { tcf_chain_tp_remove(chain, &chain_info, tp); - tfilter_notify(net, skb, n, tp, q, parent, fh, + tfilter_notify(net, skb, n, tp, block, q, parent, fh, RTM_DELTFILTER, false); tcf_proto_destroy(tp); err = 0; @@ -892,8 +1181,8 @@ replay: } break; case RTM_DELTFILTER: - err = tfilter_del_notify(net, skb, n, tp, q, parent, - fh, false, &last); + err = tfilter_del_notify(net, skb, n, tp, block, + q, parent, fh, false, &last); if (err) goto errout; if (last) { @@ -902,8 +1191,8 @@ replay: } goto errout; case RTM_GETTFILTER: - err = tfilter_notify(net, skb, n, tp, q, parent, fh, - RTM_NEWTFILTER, true); + err = tfilter_notify(net, skb, n, tp, block, q, parent, + fh, RTM_NEWTFILTER, true); goto errout; default: err = -EINVAL; @@ -916,7 +1205,7 @@ replay: if (err == 0) { if (tp_created) tcf_chain_tp_insert(chain, &chain_info, tp); - tfilter_notify(net, skb, n, tp, q, parent, fh, + tfilter_notify(net, skb, n, tp, block, q, parent, fh, RTM_NEWTFILTER, false); } else { if (tp_created) @@ -936,6 +1225,7 @@ struct tcf_dump_args { struct tcf_walker w; struct sk_buff *skb; struct netlink_callback *cb; + struct tcf_block *block; struct Qdisc *q; u32 parent; }; @@ -945,7 +1235,7 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg) struct tcf_dump_args *a = (void *)arg; struct net *net = sock_net(a->skb->sk); - return tcf_fill_node(net, a->skb, tp, a->q, a->parent, + return tcf_fill_node(net, a->skb, tp, a->block, a->q, a->parent, n, NETLINK_CB(a->cb->skb).portid, a->cb->nlh->nlmsg_seq, NLM_F_MULTI, RTM_NEWTFILTER); @@ -956,6 +1246,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent, long index_start, long *p_index) { struct net *net = sock_net(skb->sk); + struct tcf_block *block = chain->block; struct tcmsg *tcm = nlmsg_data(cb->nlh); struct tcf_dump_args arg; struct tcf_proto *tp; @@ -974,7 +1265,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent, memset(&cb->args[1], 0, sizeof(cb->args) - sizeof(cb->args[0])); if (cb->args[1] == 0) { - if (tcf_fill_node(net, skb, tp, q, parent, 0, + if (tcf_fill_node(net, skb, tp, block, q, parent, 0, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, NLM_F_MULTI, RTM_NEWTFILTER) <= 0) @@ -987,6 +1278,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent, arg.w.fn = tcf_node_dump; arg.skb = skb; arg.cb = cb; + arg.block = block; arg.q = q; arg.parent = parent; arg.w.stop = 0; @@ -1005,13 +1297,10 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb) { struct net *net = sock_net(skb->sk); struct nlattr *tca[TCA_MAX + 1]; - struct net_device *dev; - struct Qdisc *q; + struct Qdisc *q = NULL; struct tcf_block *block; struct tcf_chain *chain; struct tcmsg *tcm = nlmsg_data(cb->nlh); - unsigned long cl = 0; - const struct Qdisc_class_ops *cops; long index_start; long index; u32 parent; @@ -1024,32 +1313,44 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb) if (err) return err; - dev = __dev_get_by_index(net, tcm->tcm_ifindex); - if (!dev) - return skb->len; - - parent = tcm->tcm_parent; - if (!parent) { - q = dev->qdisc; - parent = q->handle; + if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) { + block = tcf_block_lookup(net, tcm->tcm_block_index); + if (!block) + goto out; } else { - q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent)); - } - if (!q) - goto out; - cops = q->ops->cl_ops; - if (!cops) - goto out; - if (!cops->tcf_block) - goto out; - if (TC_H_MIN(tcm->tcm_parent)) { - cl = cops->find(q, tcm->tcm_parent); - if (cl == 0) + const struct Qdisc_class_ops *cops; + struct net_device *dev; + unsigned long cl = 0; + + dev = __dev_get_by_index(net, tcm->tcm_ifindex); + if (!dev) + return skb->len; + + parent = tcm->tcm_parent; + if (!parent) { + q = dev->qdisc; + parent = q->handle; + } else { + q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent)); + } + if (!q) + goto out; + cops = q->ops->cl_ops; + if (!cops) + goto out; + if (!cops->tcf_block) + goto out; + if (TC_H_MIN(tcm->tcm_parent)) { + cl = cops->find(q, tcm->tcm_parent); + if (cl == 0) + goto out; + } + block = cops->tcf_block(q, cl, NULL); + if (!block) goto out; + if (tcf_block_shared(block)) + q = NULL; } - block = cops->tcf_block(q, cl, NULL); - if (!block) - goto out; index_start = cb->args[0]; index = 0; @@ -1252,18 +1553,50 @@ int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts, } EXPORT_SYMBOL(tc_setup_cb_call); +static __net_init int tcf_net_init(struct net *net) +{ + struct tcf_net *tn = net_generic(net, tcf_net_id); + + idr_init(&tn->idr); + return 0; +} + +static void __net_exit tcf_net_exit(struct net *net) +{ + struct tcf_net *tn = net_generic(net, tcf_net_id); + + idr_destroy(&tn->idr); +} + +static struct pernet_operations tcf_net_ops = { + .init = tcf_net_init, + .exit = tcf_net_exit, + .id = &tcf_net_id, + .size = sizeof(struct tcf_net), +}; + static int __init tc_filter_init(void) { + int err; + tc_filter_wq = alloc_ordered_workqueue("tc_filter_workqueue", 0); if (!tc_filter_wq) return -ENOMEM; + err = register_pernet_subsys(&tcf_net_ops); + if (err) + goto err_register_pernet_subsys; + rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, 0); rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, 0); rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter, tc_dump_tfilter, 0); return 0; + +err_register_pernet_subsys: + destroy_workqueue(tc_filter_wq); + return err; } subsys_initcall(tc_filter_init); |