aboutsummaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2020-03-12Revert "net: sched: make newly activated qdiscs visible"Julian Wiedmann1-21/+0
This reverts commit 4cda75275f9f89f9485b0ca4d6950c95258a9bce from net-next. Brown bag time. Michal noticed that this change doesn't work at all when netif_set_real_num_tx_queues() gets called prior to an initial dev_activate(), as for instance igb does. Doing so dies with: [ 40.579142] BUG: kernel NULL pointer dereference, address: 0000000000000400 [ 40.586922] #PF: supervisor read access in kernel mode [ 40.592668] #PF: error_code(0x0000) - not-present page [ 40.598405] PGD 0 P4D 0 [ 40.601234] Oops: 0000 [#1] PREEMPT SMP PTI [ 40.605909] CPU: 18 PID: 1681 Comm: wickedd Tainted: G E 5.6.0-rc3-ethnl.50-default #1 [ 40.616205] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013 [ 40.627377] RIP: 0010:qdisc_hash_add.part.22+0x2e/0x90 [ 40.633115] Code: 00 55 53 89 f5 48 89 fb e8 2f 9b fb ff 85 c0 74 44 48 8b 43 40 48 8b 08 69 43 38 47 86 c8 61 c1 e8 1c 48 83 e8 80 48 8d 14 c1 <48> 8b 04 c1 48 8d 4b 28 48 89 53 30 48 89 43 28 48 85 c0 48 89 0a [ 40.654080] RSP: 0018:ffffb879864934d8 EFLAGS: 00010203 [ 40.659914] RAX: 0000000000000080 RBX: ffffffffb8328d80 RCX: 0000000000000000 [ 40.667882] RDX: 0000000000000400 RSI: 0000000000000000 RDI: ffffffffb831faa0 [ 40.675849] RBP: 0000000000000000 R08: ffffa0752c8b9088 R09: ffffa0752c8b9208 [ 40.683816] R10: 0000000000000006 R11: 0000000000000000 R12: ffffa0752d734000 [ 40.691783] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa07113c18000 [ 40.699750] FS: 00007f94548e5880(0000) GS:ffffa0752e980000(0000) knlGS:0000000000000000 [ 40.708782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.715189] CR2: 0000000000000400 CR3: 000000082b6ae006 CR4: 00000000001606e0 [ 40.723156] Call Trace: [ 40.725888] dev_qdisc_set_real_num_tx_queues+0x61/0x90 [ 40.731725] netif_set_real_num_tx_queues+0x94/0x1d0 [ 40.737286] __igb_open+0x19a/0x5d0 [igb] [ 40.741767] __dev_open+0xbb/0x150 [ 40.745567] __dev_change_flags+0x157/0x1a0 [ 40.750240] dev_change_flags+0x23/0x60 [...] Fixes: 4cda75275f9f ("net: sched: make newly activated qdiscs visible") Reported-by: Michal Kubecek <[email protected]> CC: Michal Kubecek <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jamal Hadi Salim <[email protected]> CC: Cong Wang <[email protected]> CC: Jiri Pirko <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-11net: sched: make newly activated qdiscs visibleJulian Wiedmann1-0/+21
In their .attach callback, mq[prio] only add the qdiscs of the currently active TX queues to the device's qdisc hash list. If a user later increases the number of active TX queues, their qdiscs are not visible via eg. 'tc qdisc show'. Add a hook to netif_set_real_num_tx_queues() that walks all active TX queues and adds those which are missing to the hash list. CC: Eric Dumazet <[email protected]> CC: Jamal Hadi Salim <[email protected]> CC: Cong Wang <[email protected]> CC: Jiri Pirko <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-09net: sched: pie: change tc_pie_xstats->probLeslie Monis1-1/+1
Commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows") changes the scale of probability values in PIE from (2^64 - 1) to (2^56 - 1). This affects the precision of tc_pie_xstats->prob in user space. This patch ensures user space is unaffected. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_getEric Dumazet1-13/+11
Convert zones_lock spinlock to zones_mutex mutex, and struct (tcf_ct_flow_table)->ref to a refcount, so that control path can use regular GFP_KERNEL allocations from standard process context. This is more robust in case of memory pressure. The refcount is needed because tcf_ct_flow_table_put() can be called from RCU callback, thus in BH context. The issue was spotted by syzbot, as rhashtable_init() was called with a spinlock held, which is bad since GFP_KERNEL allocations can sleep. Note to developers : Please make sure your patches are tested with CONFIG_DEBUG_ATOMIC_SLEEP=y BUG: sleeping function called from invalid context at mm/slab.h:565 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9582, name: syz-executor610 2 locks held by syz-executor610/9582: #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:72 [inline] #0: ffffffff8a34eb80 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x3f9/0xad0 net/core/rtnetlink.c:5437 #1: ffffffff8a3961b8 (zones_lock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline] #1: ffffffff8a3961b8 (zones_lock){+...}, at: tcf_ct_flow_table_get+0xa3/0x1700 net/sched/act_ct.c:67 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 0 PID: 9582 Comm: syz-executor610 Not tainted 5.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 ___might_sleep.cold+0x1f4/0x23d kernel/sched/core.c:6798 slab_pre_alloc_hook mm/slab.h:565 [inline] slab_alloc_node mm/slab.c:3227 [inline] kmem_cache_alloc_node_trace+0x272/0x790 mm/slab.c:3593 __do_kmalloc_node mm/slab.c:3615 [inline] __kmalloc_node+0x38/0x60 mm/slab.c:3623 kmalloc_node include/linux/slab.h:578 [inline] kvmalloc_node+0x61/0xf0 mm/util.c:574 kvmalloc include/linux/mm.h:645 [inline] kvzalloc include/linux/mm.h:653 [inline] bucket_table_alloc+0x8b/0x480 lib/rhashtable.c:175 rhashtable_init+0x3d2/0x750 lib/rhashtable.c:1054 nf_flow_table_init+0x16d/0x310 net/netfilter/nf_flow_table_core.c:498 tcf_ct_flow_table_get+0xe33/0x1700 net/sched/act_ct.c:82 tcf_ct_init+0xba4/0x18a6 net/sched/act_ct.c:1050 tcf_action_init_1+0x697/0xa20 net/sched/act_api.c:945 tcf_action_init+0x1e9/0x2f0 net/sched/act_api.c:1001 tcf_action_add+0xdb/0x370 net/sched/act_api.c:1411 tc_ctl_action+0x366/0x456 net/sched/act_api.c:1466 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5440 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2478 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0xec/0x1b0 net/socket.c:2430 do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4403d9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd719af218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9 RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000005 R09: 00000000004002c8 R10: 0000000000000008 R11: 00000000000 Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Eric Dumazet <[email protected]> Cc: Paul Blakey <[email protected]> Cc: Jiri Pirko <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08sched: act: allow user to specify type of HW stats for a filterJiri Pirko2-0/+43
Currently, user who is adding an action expects HW to report stats, however it does not have exact expectations about the stats types. That is aligned with TCA_ACT_HW_STATS_TYPE_ANY. Allow user to specify the type of HW stats for an action and require it. Pass the information down to flow_offload layer. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-05net: sched: Make FIFO Qdisc offloadablePetr Machata1-6/+91
Invoke ndo_setup_tc() as appropriate to signal init / replacement, destroying and dumping of pFIFO / bFIFO Qdisc. A lot of the FIFO logic is used for pFIFO_head_drop as well, but that's a semantically very different Qdisc that isn't really in the same boat as pFIFO / bFIFO. Split some of the functions to keep the Qdisc intact. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04pie: remove pie_vars->accu_prob_overflowsLeslie Monis2-16/+6
The variable pie_vars->accu_prob is used as an accumulator for probability values. Since probabilty values are scaled using the MAX_PROB macro denoting (2^64 - 1), pie_vars->accu_prob is likely to overflow as it is of type u64. The variable pie_vars->accu_prob_overflows counts the number of times the variable pie_vars->accu_prob overflows. The MAX_PROB macro needs to be equal to at least (2^39 - 1) in order to do precise calculations without any underflow. Thus MAX_PROB can be reduced to (2^56 - 1) without affecting the precision in calculations drastically. Doing so will eliminate the need for the variable pie_vars->accu_prob_overflows as the variable pie_vars->accu_prob will never overflow. Removing the variable pie_vars->accu_prob_overflows also reduces the size of the structure pie_vars to exactly 64 bytes. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04pie: remove unnecessary type castingLeslie Monis1-2/+2
In function pie_calculate_probability(), the variables alpha and beta are of type u64. The variables qdelay, qdelay_old and params->target are of type psched_time_t (which is also u64). The explicit type casting done when calculating the value for the variable delta is redundant and not required. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04pie: use term backlog instead of qlenLeslie Monis1-11/+11
Remove ambiguity by using the term backlog instead of qlen when representing the queue length in bytes. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04net/sched: act_ct: Use pskb_network_may_pull()Paul Blakey1-8/+8
To make the filler functions more generic, use network relative skb pulling. Signed-off-by: Paul Blakey <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04net/sched: act_ct: Fix ipv6 lookup of offloaded connectionsPaul Blakey1-34/+26
When checking the protocol number tcf_ct_flow_table_lookup() handles the flow as if it's always ipv4, while it can be ipv6. Instead, refactor the code to fetch the tcp header, if available, in the relevant family (ipv4/ipv6) filler function, and do the check on the returned tcp header. Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Signed-off-by: Paul Blakey <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-03net/sched: act_ct: Software offload of established flowsPaul Blakey1-2/+158
Offload nf conntrack processing by looking up the 5-tuple in the zone's flow table. The nf conntrack module will process the packets until a connection is in established state. Once in established state, the ct state pointer (nf_conn) will be restored on the skb from a successful ft lookup. Signed-off-by: Paul Blakey <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-03net/sched: act_ct: Offload established connections to flow tablePaul Blakey1-0/+63
Add a ft entry when connections enter an established state and delete the connections when they leave the established state. The flow table assumes ownership of the connection. In the following patch act_ct will lookup the ct state from the FT. In future patches, drivers will register for callbacks for ft add/del events and will be able to use the information to offload the connections. Note that connection aging is managed by the FT. Signed-off-by: Paul Blakey <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-03net/sched: act_ct: Create nf flow table per zonePaul Blakey2-2/+134
Use the NF flow tables infrastructure for CT offload. Create a nf flow table per zone. Next patches will add FT entries to this table, and do the software offload. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-03net: taprio: add missing attribute validation for txtime delayJakub Kicinski1-0/+1
Add missing attribute validation for TCA_TAPRIO_ATTR_TXTIME_DELAY to the netlink policy. Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-03net: fq: add missing attribute validation for orphan maskJakub Kicinski1-0/+1
Add missing attribute validation for TCA_FQ_ORPHAN_MASK to the netlink policy. Fixes: 06eb395fa985 ("pkt_sched: fq: better control of DDOS traffic") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-29net: sched: Replace zero-length array with flexible-array memberGustavo A. R. Silva4-4/+4
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+1
The mptcp conflict was overlapping additions. The SMC conflict was an additional and removal happening at the same time. Signed-off-by: David S. Miller <[email protected]>
2020-02-26sched: act: count in the size of action flags bitfieldJiri Pirko1-0/+1
The put of the flags was added by the commit referenced in fixes tag, however the size of the message was not extended accordingly. Fix this by adding size of the flags bitfield to the message size. Fixes: e38226786022 ("net: sched: update action implementations to support flags") Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-25flow_offload: pass action cookie through offload structuresJiri Pirko1-1/+30
Extend struct flow_action_entry in order to hold TC action cookie specified by user inserting the action. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+1
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <[email protected]>
2020-02-19net: sched: Support specifying a starting chain via tc skb extPaul Blakey1-4/+35
Set the starting chain from the tc skb ext chain value. Once we read the tc skb ext, delete it, so cloned/redirect packets won't inherit it. In order to lookup a chain by the chain index on the ingress block at ingress classification, provide a lookup function. Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-02-19net: sched: Change the block's chain list to an rcu listPaul Blakey1-2/+3
To allow lookup of a block's chain under atomic context. Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-02-19net: sched: Pass ingress block to tcf_classify_ingressPaul Blakey3-2/+21
On ingress and cls_act qdiscs init, save the block on ingress mini_Qdisc and and pass it on to ingress classification, so it can be used for the looking up a specified chain index. Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-02-19net: sched: Introduce ingress classification functionPaul Blakey1-14/+44
TC multi chain configuration can cause offloaded tc chains to miss in hardware after jumping to some chain. In such cases the software should continue from the chain that missed in hardware, as the hardware may have manipulated the packet and updated some counters. Currently a single tcf classification function serves both ingress and egress. However, multi chain miss processing (get tc skb extension on hw miss, set tc skb extension on tc miss) should happen only on ingress. Refactor the code to use ingress classification function, and move setting the tc skb extension from general classification to it, as a prestep for supporting the hw miss scenario. Co-developed-by: Vlad Buslov <[email protected]> Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2020-02-17net: sched: correct flower port blockingJason Baron1-0/+1
tc flower rules that are based on src or dst port blocking are sometimes ineffective due to uninitialized stack data. __skb_flow_dissect() extracts ports from the skb for tc flower to match against. However, the port dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in key_control->flags. All callers of __skb_flow_dissect(), zero-out the key_control field except for fl_classify() as used by the flower classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to __skb_flow_dissect(), since key_control is allocated on the stack and may not be initialized. Since key_basic and key_control are present for all flow keys, let's make sure they are initialized. Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments") Co-developed-by: Eric Dumazet <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: Jason Baron <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-17net: sched: don't take rtnl lock during flow_action setupVlad Buslov3-13/+5
Refactor tc_setup_flow_action() function not to use rtnl lock and remove 'rtnl_held' argument that is no longer needed. Signed-off-by: Vlad Buslov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-17net: sched: lock action when translating it to flow_action infraVlad Buslov2-8/+11
In order to remove dependency on rtnl lock, take action's tcfa_lock when constructing its representation as flow_action_entry structure. Refactor tcf_sample_get_group() to assume that caller holds tcf_lock and don't take it manually. This callback is only called from flow_action infra representation translator which now calls it with tcf_lock held, so this refactoring is necessary to prevent deadlock. Allocate memory with GFP_ATOMIC flag for ip_tunnel_info copy because tcf_tunnel_info_copy() is only called from flow_action representation infra code with tcf_lock spinlock taken. Signed-off-by: Vlad Buslov <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-13net/sched: flower: add missing validation of TCA_FLOWER_FLAGSDavide Caratti1-0/+1
unlike other classifiers that can be offloaded (i.e. users can set flags like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of netlink attribute 'TCA_FLOWER_FLAGS' provided by user: add a proper entry to fl_policy. Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support") Signed-off-by: Davide Caratti <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-13net/sched: matchall: add missing validation of TCA_MATCHALL_FLAGSDavide Caratti1-0/+1
unlike other classifiers that can be offloaded (i.e. users can set flags like 'skip_hw' and 'skip_sw'), 'cls_matchall' doesn't validate the size of netlink attribute 'TCA_MATCHALL_FLAGS' provided by user: add a proper entry to mall_policy. Fixes: b87f7936a932 ("net/sched: Add match-all classifier hw offloading.") Signed-off-by: Davide Caratti <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-07taprio: Fix dropping packets when using taprio + ETF offloadingVinicius Costa Gomes1-2/+2
When using taprio offloading together with ETF offloading, configured like this, for example: $ tc qdisc replace dev $IFACE parent root handle 100 taprio \ num_tc 4 \ map 2 2 1 0 3 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 1@2 1@3 \ base-time $BASE_TIME \ sched-entry S 01 1000000 \ sched-entry S 0e 1000000 \ flags 0x2 $ tc qdisc replace dev $IFACE parent 100:1 etf \ offload delta 300000 clockid CLOCK_TAI During enqueue, it works out that the verification added for the "txtime" assisted mode is run when using taprio + ETF offloading, the only thing missing is initializing the 'next_txtime' of all the cycle entries. (if we don't set 'next_txtime' all packets from SO_TXTIME sockets are dropped) Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-07taprio: Use taprio_reset_tc() to reset Traffic Classes configurationVinicius Costa Gomes1-1/+1
When destroying the current taprio instance, which can happen when the creation of one fails, we should reset the traffic class configuration back to the default state. netdev_reset_tc() is a better way because in addition to setting the number of traffic classes to zero, it also resets the priority to traffic classes mapping to the default value. Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-07taprio: Add missing policy validation for flagsVinicius Costa Gomes1-0/+1
netlink policy validation for the 'flags' argument was missing. Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-07taprio: Fix still allowing changing the flags during runtimeVinicius Costa Gomes1-20/+41
Because 'q->flags' starts as zero, and zero is a valid value, we aren't able to detect the transition from zero to something else during "runtime". The solution is to initialize 'q->flags' with an invalid value, so we can detect if 'q->flags' was set by the user or not. To better solidify the behavior, 'flags' handling is moved to a separate function. The behavior is: - 'flags' if unspecified by the user, is assumed to be zero; - 'flags' cannot change during "runtime" (i.e. a change() request cannot modify it); With this new function we can remove taprio_flags, which should reduce the risk of future accidents. Allowing flags to be changed was causing the following RCU stall: [ 1730.558249] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 1730.558258] rcu: 6-...0: (190 ticks this GP) idle=922/0/0x1 softirq=25580/25582 fqs=16250 [ 1730.558264] (detected by 2, t=65002 jiffies, g=33017, q=81) [ 1730.558269] Sending NMI from CPU 2 to CPUs 6: [ 1730.559277] NMI backtrace for cpu 6 [ 1730.559277] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G E 5.5.0-rc6+ #35 [ 1730.559278] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019 [ 1730.559278] RIP: 0010:__hrtimer_run_queues+0xe2/0x440 [ 1730.559278] Code: 48 8b 43 28 4c 89 ff 48 8b 75 c0 48 89 45 c8 e8 f4 bb 7c 00 0f 1f 44 00 00 65 8b 05 40 31 f0 68 89 c0 48 0f a3 05 3e 5c 25 01 <0f> 82 fc 01 00 00 48 8b 45 c8 48 89 df ff d0 89 45 c8 0f 1f 44 00 [ 1730.559279] RSP: 0018:ffff9970802d8f10 EFLAGS: 00000083 [ 1730.559279] RAX: 0000000000000006 RBX: ffff8b31645bff38 RCX: 0000000000000000 [ 1730.559280] RDX: 0000000000000000 RSI: ffffffff9710f2ec RDI: ffffffff978daf0e [ 1730.559280] RBP: ffff9970802d8f68 R08: 0000000000000000 R09: 0000000000000000 [ 1730.559280] R10: 0000018336d7944e R11: 0000000000000001 R12: ffff8b316e39f9c0 [ 1730.559281] R13: ffff8b316e39f940 R14: ffff8b316e39f998 R15: ffff8b316e39f7c0 [ 1730.559281] FS: 0000000000000000(0000) GS:ffff8b316e380000(0000) knlGS:0000000000000000 [ 1730.559281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1730.559281] CR2: 00007f1105303760 CR3: 0000000227210005 CR4: 00000000003606e0 [ 1730.559282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1730.559282] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1730.559282] Call Trace: [ 1730.559282] <IRQ> [ 1730.559283] ? taprio_dequeue_soft+0x2d0/0x2d0 [sch_taprio] [ 1730.559283] hrtimer_interrupt+0x104/0x220 [ 1730.559283] ? irqtime_account_irq+0x34/0xa0 [ 1730.559283] smp_apic_timer_interrupt+0x6d/0x230 [ 1730.559284] apic_timer_interrupt+0xf/0x20 [ 1730.559284] </IRQ> [ 1730.559284] RIP: 0010:cpu_idle_poll+0x35/0x1a0 [ 1730.559285] Code: 88 82 ff 65 44 8b 25 12 7d 73 68 0f 1f 44 00 00 e8 90 c3 89 ff fb 65 48 8b 1c 25 c0 7e 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 be a8 a8 00 85 c0 75 ed e8 75 48 84 ff [ 1730.559285] RSP: 0018:ffff997080137ea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 [ 1730.559285] RAX: 0000000000000001 RBX: ffff8b316bc3c580 RCX: 0000000000000000 [ 1730.559286] RDX: 0000000000000001 RSI: 000000002819aad9 RDI: ffffffff978da730 [ 1730.559286] RBP: ffff997080137ec0 R08: 0000018324a6d387 R09: 0000000000000000 [ 1730.559286] R10: 0000000000000400 R11: 0000000000000001 R12: 0000000000000006 [ 1730.559286] R13: ffff8b316bc3c580 R14: 0000000000000000 R15: 0000000000000000 [ 1730.559287] ? cpu_idle_poll+0x20/0x1a0 [ 1730.559287] ? cpu_idle_poll+0x20/0x1a0 [ 1730.559287] do_idle+0x4d/0x1f0 [ 1730.559287] ? complete+0x44/0x50 [ 1730.559288] cpu_startup_entry+0x1b/0x20 [ 1730.559288] start_secondary+0x142/0x180 [ 1730.559288] secondary_startup_64+0xb6/0xc0 [ 1776.686313] nvme nvme0: I/O 96 QID 1 timeout, completion polled Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-07taprio: Fix enabling offload with wrong number of traffic classesVinicius Costa Gomes1-13/+13
If the driver implementing taprio offloading depends on the value of the network device number of traffic classes (dev->num_tc) for whatever reason, it was going to receive the value zero. The value was only set after the offloading function is called. So, moving setting the number of traffic classes to before the offloading function is called fixes this issue. This is safe because this only happens when taprio is instantiated (we don't allow this configuration to be changed without first removing taprio). Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading") Reported-by: Po Liu <[email protected]> Signed-off-by: Vinicius Costa Gomes <[email protected]> Acked-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-06net: sched: prevent a use after freeDan Carpenter1-1/+1
The bug is that we call kfree_skb(skb) and then pass "skb" to qdisc_pkt_len(skb) on the next line, which is a use after free. Also Cong Wang points out that it's better to delay the actual frees until we drop the rtnl lock so we should use rtnl_kfree_skbs() instead of kfree_skb(). Cc: Cong Wang <[email protected]> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-05net_sched: fix a resource leak in tcindex_set_parms()Cong Wang1-2/+1
Jakub noticed there is a potential resource leak in tcindex_set_parms(): when tcindex_filter_result_init() fails and it jumps to 'errout1' which doesn't release the memory and resources allocated by tcindex_alloc_perfect_hash(). We should just jump to 'errout_alloc' which calls tcindex_free_perfect_hash(). Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Reported-by: Jakub Kicinski <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-04net_sched: fix an OOB access in cls_tcindexCong Wang1-20/+20
As Eric noticed, tcindex_alloc_perfect_hash() uses cp->hash to compute the size of memory allocation, but cp->hash is set again after the allocation, this caused an out-of-bound access. So we have to move all cp->hash initialization and computation before the memory allocation. Move cp->mask and cp->shift together as cp->hash may need them for computation too. Reported-and-tested-by: [email protected] Fixes: 331b72922c5f ("net: sched: RCU cls_tcindex") Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Cc: Jakub Kicinski <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-01cls_rsvp: fix rsvp_policyEric Dumazet1-4/+2
NLA_BINARY can be confusing, since .len value represents the max size of the blob. cls_rsvp really wants user space to provide long enough data for TCA_RSVP_DST and TCA_RSVP_SRC attributes. BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline] BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline] BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 rsvp_get net/sched/cls_rsvp.h:258 [inline] gen_handle net/sched/cls_rsvp.h:402 [inline] rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104 rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b349 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f269d43dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f269d43e6d4 RCX: 000000000045b349 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000009c2 R14: 00000000004cb338 R15: 000000000075bfd4 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 6fa8c0144b77 ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-01-29sch_choke: Use kvcallocJoe Perches1-1/+1
Convert the use of kvmalloc_array with __GFP_ZERO to the equivalent kvcalloc. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net_sched: ematch: reject invalid TCF_EM_SIMPLEEric Dumazet1-0/+3
It is possible for malicious userspace to set TCF_EM_SIMPLE bit even for matches that should not have this bit set. This can fool two places using tcf_em_is_simple() 1) tcf_em_tree_destroy() -> memory leak of em->data if ops->destroy() is NULL 2) tcf_em_tree_dump() wrongly report/leak 4 low-order bytes of a kernel pointer. BUG: memory leak unreferenced object 0xffff888121850a40 (size 32): comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s) hex dump (first 32 bytes): 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline] [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline] [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline] [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671 [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127 [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline] [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32 [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline] [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline] [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300 [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline] [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219 [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104 [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415 [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline] [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline] [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Cc: Cong Wang <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net_sched: walk through all child classes in tc_bind_tclass()Cong Wang1-11/+30
In a complex TC class hierarchy like this: tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ avpkt 1000 cell 8 tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ avpkt 1000 bounded tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 80 0xffff flowid 1:3 tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 25 0xffff flowid 1:4 tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 where filters are installed on qdisc 1:0, so we can't merely search from class 1:1 when creating class 1:3 and class 1:4. We have to walk through all the child classes of the direct parent qdisc. Otherwise we would miss filters those need reverse binding. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net_sched: fix ops->bind_class() implementationsCong Wang10-29/+76
The current implementations of ops->bind_class() are merely searching for classid and updating class in the struct tcf_result, without invoking either of cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks the design of them as qdisc's like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class(). In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly. Note, we merely pass the Qdisc pointer as an opaque pointer to each filter, they only need to pass it down to the helper functions without understanding it at all. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: [email protected] Reported-and-tested-by: [email protected] Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-4/+3
Minor conflict in mlx5 because changes happened to code that has moved meanwhile. Signed-off-by: David S. Miller <[email protected]>
2020-01-25net: sched: Make TBF Qdisc offloadablePetr Machata1-0/+55
Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying and dumping of TBF Qdisc. Signed-off-by: Petr Machata <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-25net: sched: sch_tbf: Don't overwrite backlog before dumpingPetr Machata1-1/+0
In 2011, in commit b0460e4484f9 ("sch_tbf: report backlog information"), TBF started copying backlog depth from the child Qdisc before dumping, with the motivation that the backlog was otherwise not visible in "tc -s qdisc show". Later, in 2016, in commit 8d5958f424b6 ("sch_tbf: update backlog as well"), TBF got a full-blown backlog tracking. However it kept copying the child's backlog over before dumping. That line is now unnecessary, so remove it. As shown in the following example, backlog is still reported correctly: # tc -s qdisc show dev veth0 invisible qdisc tbf 1: root refcnt 2 rate 1Mbit burst 128Kb lat 82.8s Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 812544 requeues 0) backlog 81972b 66p requeues 0 qdisc bfifo 0: parent 1:1 limit 10Mb Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 0 requeues 0) backlog 81972b 66p requeues 0 Signed-off-by: Petr Machata <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net_sched: fix datalen for ematchCong Wang1-1/+1
syzbot reported an out-of-bound access in em_nbyte. As initially analyzed by Eric, this is because em_nbyte sets its own em->datalen in em_nbyte_change() other than the one specified by user, but this value gets overwritten later by its caller tcf_em_validate(). We should leave em->datalen untouched to respect their choices. I audit all the in-tree ematch users, all of those implement ->change() set em->datalen, so we can just avoid setting it twice in this case. Reported-and-tested-by: [email protected] Reported-by: [email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: sched: add Flow Queue PIE packet schedulerMohit P. Tahiliani3-0/+576
Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Sachin D. Patil <[email protected]> Signed-off-by: V. Saicharan <[email protected]> Signed-off-by: Mohit Bhasi <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani1-85/+88
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani1-9/+9
Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>