aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-05-19net/mlx5e: E-Switch: move debug print of adding mac to correct placeRoi Dayan1-3/+4
Move the debug print inside the if clause that actually does the change. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: E-Switch, Check device is PF when stopping esw offloadsRoi Dayan1-1/+1
Checking sriov is done on the pci device so it can return true on other devices like SF but nothing should be done in this case. Add a check that the device is PF. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5: Remove redundant vport_group_manager cap checkRoi Dayan1-4/+1
It's enough to check for esw_manager cap for get the esw flow table caps. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rulesRoi Dayan1-19/+43
Like other rules use metadata matching if supported instead of source_port. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: E-Switch, Allow get vport api if esw existsRoi Dayan1-1/+1
We could have an esw manager device which is not a vport group manager. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: E-Switch, Update when to set other vport contextRoi Dayan3-3/+6
Other vport context should be set if vport number is not 0. In case of ECPF, vport 0 represents the host PF representor so also need to set other vport context. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: Remove redundant __func__ arg from fs_err() callsRoi Dayan1-7/+5
fs_err() already logs the function name. remote the arg so the function name will not be logged twice. Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5e: E-Switch, Remove flow_source check for metadata matchingRoi Dayan1-3/+0
There is no reason to check for flow_source cap to allow metadata matching. When flow_source match is being used the flow_source cap is being checked. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5: E-Switch, Remove redundant checkRoi Dayan1-4/+0
The call to mlx5_eswitch_enable() also does the same check and if E-Switch not supported it returns 0 without any change. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19net/mlx5: Remove redundant esw multiport validate functionRoi Dayan1-22/+1
The function didn't validate the value and doesn't require value validation as it will always be valid true or false values. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-19Merge branch 'bpf: Show target_{obj,btf}_id for tracing link'Alexei Starovoitov2-2/+15
Yafang Shao says: ==================== The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. For some other link types like perf_event and kprobe_multi, it is not easy to find which functions are attached either. We may support ->fill_link_info for them in the future. v1->v2: - Skip showing them in the plain output for the old kernels. (Quentin) - Coding improvement. (Andrii) ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2023-05-19bpftool: Show target_{obj,btf}_id in tracing link infoYafang Shao1-0/+6
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ tools/bpf/bpftool/bpftool link show 2: tracing prog 13 prog_type tracing attach_type trace_fentry target_obj_id 1 target_btf_id 13964 pids trace(10673) $ tools/bpf/bpftool/bpftool link show -j [{"id":2,"type":"tracing","prog_id":13,"prog_type":"tracing","attach_type":"trace_fentry","target_obj_id":1,"target_btf_id":13964,"pids":[{"pid":10673,"comm":"trace"}]}] Signed-off-by: Yafang Shao <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-05-19bpf: Show target_{obj,btf}_id in tracing link fdinfoYafang Shao1-2/+9
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ cat /proc/10673/fdinfo/10 pos: 0 flags: 02000000 mnt_id: 15 ino: 2094 link_type: tracing link_id: 2 prog_tag: a04f5eef06a7f555 prog_id: 13 attach_type: 24 target_obj_id: 1 target_btf_id: 13964 Signed-off-by: Yafang Shao <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-05-19bpf: Fix mask generation for 32-bit narrow loads of 64-bit fieldsWill Deacon1-1/+1
A narrow load from a 64-bit context field results in a 64-bit load followed potentially by a 64-bit right-shift and then a bitwise AND operation to extract the relevant data. In the case of a 32-bit access, an immediate mask of 0xffffffff is used to construct a 64-bit BPP_AND operation which then sign-extends the mask value and effectively acts as a glorified no-op. For example: 0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0) results in the following code generation for a 64-bit field: ldr x7, [x7] // 64-bit load mov x10, #0xffffffffffffffff and x7, x7, x10 Fix the mask generation so that narrow loads always perform a 32-bit AND operation: ldr x7, [x7] // 64-bit load mov w10, #0xffffffff and w7, w7, w10 Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: John Fastabend <[email protected]> Cc: Krzesimir Nowak <[email protected]> Cc: Andrey Ignatov <[email protected]> Acked-by: Yonghong Song <[email protected]> Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-05-19ice: use src VSI instead of src MAC in slow-pathMichal Swiatkowski9-102/+40
The use of a source MAC to direct packets from the VF to the corresponding port representor is only ok if there is only one MAC on a VF. To support this functionality when the number of MACs on a VF is greater, it is necessary to match a source VSI instead of a source MAC. Let's use the new switch API that allows matching on metadata. If MAC isn't used in match criteria there is no need to handle adding rule after virtchnl command. Instead add new rule while port representor is being configured. Remove rule_added field, checking for sp_rule can be used instead. Remove also checking for switchdev running in deleting rule as it can be called from unroll context when running flag isn't set. Checking for sp_rule covers both context (with and without running flag). Rules are added in eswitch configuration flow, so there is no need to have replay function. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Piotr Raczynski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-05-19ice: allow matching on meta dataMichal Swiatkowski5-105/+95
Add meta data matching criteria in the same place as protocol matching criteria. There is no need to add meta data as special words after parsing all lookups. Trade meta data in the same why as other lookups. The one difference between meta data lookups and protocol lookups is that meta data doesn't impact how the packets looks like. Because of that ignore it when filling testing packet. Match on tunnel type meta data always if tunnel type is different than TNL_LAST. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Piotr Raczynski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-05-19ice: specify field names in ice_prot_ext initMichal Swiatkowski1-23/+28
Anonymous initializers are now discouraged. Define ICE_PROTCOL_ENTRY macro to rewrite anonymous initializers to named one. No functional changes here. Suggested-by: Alexander Lobakin <[email protected]> Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-05-19ice: remove redundant Rx field from rule infoMichal Swiatkowski4-22/+14
Information about the direction is currently stored in sw_act.flag. There is no need to duplicate it in another field. Setting direction flag doesn't mean that there is a match criteria for direction in rule. It is only a information for HW from where switch id should be collected (VSI or port). In current implementation of advance rule handling, without matching for direction meta data, we can always set one the same flag and everything will work the same. Ability to match on direction meta data will be added in follow up patches. Recipe 0, 3 and 9 loaded from package has direction match criteria, but they are handled in other function. Move ice_adv_rule_info fields to avoid holes. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Piotr Raczynski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-05-19ice: define meta data to match in switchMichal Swiatkowski3-16/+183
Add description for each meta data. Redefine tunnel mask to match only tunneled MAC and tunneled VLAN. It shouldn't try to match other flags (previously it was 0xff, it is redundant). VLAN mask was 0xd000, change it to 0xf000. 4 last bits are flags depending on the same field in packets (VLAN tag). Because of that, It isn't harmful to match also on ITAG. Group all MDID and MDID offsets into enums to keep things organized. Signed-off-by: Michal Swiatkowski <[email protected]> Reviewed-by: Piotr Raczynski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Tested-by: Sujai Buvaneswaran <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-05-19perf bench syscall: Fix __NR_execve undeclared build errorTiezhu Yang1-0/+3
The __NR_execve definition for i386 was deleted by mistake in the commit ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark"), add it to fix the build error on i386. Fixes: ece7f7c0507cc147 ("perf bench syscall: Add fork syscall benchmark") Reported-by: Linux Kernel Functional Testing <[email protected]> Signed-off-by: Tiezhu Yang <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: [email protected] Closes: https://lore.kernel.org/all/CA+G9fYvgBR1iB0CorM8OC4AM_w_tFzyQKHc+rF6qPzJL=TbfDQ@mail.gmail.com/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-05-19Merge branch 'pm-tools'Rafael J. Wysocki2-24/+30
Merge cpupower utility fixes for 6.4-rc3: - Read TSC on each CPU right before reading MPERF so as to reduce the potential time difference between the TSC and MPERF accesses and improve the C0 percentage calculation (Wyes Karny). - Fix a possible file handle leak and clean up the code in sysfs_get_enabled() (Hao Zeng). * pm-tools: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19Merge tag 'linux-cpupower-6.4-rc3' of ↵Rafael J. Wysocki2-24/+30
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility fixes for 6.4-rc3 from Shuah Khan: "This cpupower fixes update for Linux 67.4-rc3 consists of: - a resource leak fix - fix drift in C0 percentage calculation due to System-wide TSC read. To lower this drift read TSC per CPU and also just after mperf read. This technique improves C0 percentage calculation in Mperf monitor" * tag 'linux-cpupower-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19fbdev: omapfb: panel-tpo-td043mtea1: fix error code in probe()Dan Carpenter1-1/+2
This was using the wrong variable, "r", instead of "ddata->vcc_reg", so it returned success instead of a negative error code. Fixes: 0d3dbeb8142a ("video: fbdev: omapfb: panel-tpo-td043mtea1: Make use of the helper function dev_err_probe()") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2023-05-19perf test attr: Fix python SafeConfigParser() deprecation warningIan Rogers1-3/+3
Address the warning: ``` tests/attr.py:155: DeprecationWarning: The SafeConfigParser class has been renamed to ConfigParser in Python 3.2. This alias will be removed in Python 3.12. Use ConfigParser directly instead. parser = configparser.SafeConfigParser() ``` by removing the word 'Safe'. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-05-19perf test attr: Update no event/metric expectationsIan Rogers5-174/+249
Previously hard coded events/metrics were used, update for the use of the TopdownL1 json metric group. Reported-by: Arnaldo Carvalho de Melo <[email protected]> Fixes: 94b1a603fca78388 ("perf stat: Add TopdownL1 metric as a default if present") Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Richter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2023-05-19net: arc: Make arc_emac_remove() return voidUwe Kleine-König4-10/+7
The function returns zero unconditionally. Change it to return void instead which simplifies its callers as error handing becomes unnecessary. Signed-off-by: Uwe Kleine-König <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19driver core: class: properly reference count class_dev_iter()Greg Kroah-Hartman2-0/+3
When class_dev_iter is initialized, the reference count for the subsys private structure is incremented, but never decremented, causing a memory leak over time. To resolve this, save off a pointer to the internal structure into the class_dev_iter structure and then when the iterator is finished, drop the reference count. Reported-and-tested-by: [email protected] Fixes: 7b884b7f24b4 ("driver core: class.c: convert to only use class_to_subsys") Reported-by: Mirsad Goran Todorovac <[email protected]> Cc: Alan Stern <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Tested-by: Mirsad Goran Todorovac <[email protected]> Link: https://lore.kernel.org/r/2023051610-stove-condense-9a77@gregkh Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-05-19net: fec: add dma_wmb to ensure correct descriptor valuesShenwei Wang1-6/+11
Two dma_wmb() are added in the XDP TX path to ensure proper ordering of descriptor and buffer updates: 1. A dma_wmb() is added after updating the last BD to make sure the updates to rest of the descriptor are visible before transferring ownership to FEC. 2. A dma_wmb() is also added after updating the bdp to ensure these updates are visible before updating txq->bd.cur. 3. Start the xmit of the frame immediately right after configuring the tx descriptor. Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support") Signed-off-by: Shenwei Wang <[email protected]> Reviewed-by: Wei Fang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19MAINTAINERS: add myself as maintainer for enetcVladimir Oltean1-0/+1
I would like to be copied on new patches submitted on this driver. I am relatively familiar with the code, having practically maintained it for a while. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Claudiu Manoil <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19octeontx2-pf: Fix TSOv6 offloadSunil Goutham1-3/+1
HW adds segment size to the payload length in the IPv6 header. Fix payload length to just TCP header length instead of 'TCP header size + IPv6 header size'. Fixes: 86d7476078b8 ("octeontx2-pf: TCP segmentation offload support") Signed-off-by: Sunil Goutham <[email protected]> Signed-off-by: Ratheesh Kannoth <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19sfc: fix devlink info error handlingAlejandro Lucero1-50/+45
Avoid early devlink info return if errors arise with MCDI commands executed for getting the required info from the device. The rationale is some commands can fail but later ones could still give useful data. Moreover, some nvram partitions could not be present which needs to be handled as a non error. The specific errors are reported through system messages and if any error appears, it will be reported generically through extack. Fixes 14743ddd2495 ("sfc: add devlink info support for ef100") Signed-off-by: Alejandro Lucero <[email protected]> Acked-by: Martin Habets <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19net/smc: Reset connection when trying to use SMCRv2 fails.Wen Gu2-2/+8
We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It can be reproduced by: - smc_run nginx - smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port> BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G W E 6.4.0-rc1+ #42 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc] RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06 RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000 RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00 RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180 R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib] ? smcr_buf_map_link+0x24b/0x290 [smc] ? __smc_buf_create+0x4ee/0x9b0 [smc] smc_clc_send_accept+0x4c/0xb0 [smc] smc_listen_work+0x346/0x650 [smc] ? __schedule+0x279/0x820 process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> During the CLC handshake, server sequentially tries available SMCRv2 and SMCRv1 devices in smc_listen_work(). If an SMCRv2 device is found. SMCv2 based link group and link will be assigned to the connection. Then assumed that some buffer assignment errors happen later in the CLC handshake, such as RMB registration failure, server will give up SMCRv2 and try SMCRv1 device instead. But the resources assigned to the connection won't be reset. When server tries SMCRv1 device, the connection creation process will be executed again. Since conn->lnk has been assigned when trying SMCRv2, it will not be set to the correct SMCRv1 link in smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to correct SMCRv1 link group but conn->lnk points to the SMCRv2 link mistakenly. Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx] will be accessed. Since the link->link_idx is not correct, the related MR may not have been initialized, so crash happens. | Try SMCRv2 device first | |-> conn->lgr: assign existed SMCRv2 link group; | |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC); | |-> sndbuf & RMB creation fails, quit; | | Try SMCRv1 device then | |-> conn->lgr: create SMCRv1 link group and assign; | |-> conn->link: keep SMCRv2 link mistakenly; | |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0] | initialized. | | Then smc_clc_send_confirm_accept() accesses | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash. v This patch tries to fix this by cleaning conn->lnk before assigning link. In addition, it is better to reset the connection and clean the resources assigned if trying SMCRv2 failed in buffer creation or registration. Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2") Link: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Tony Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19selftests: fib_tests: mute cleanup error messagePo-Hsu Lin1-1/+1
In the end of the test, there will be an error message induced by the `ip netns del ns1` command in cleanup() Tests passed: 201 Tests failed: 0 Cannot remove namespace file "/run/netns/ns1": No such file or directory This can even be reproduced with just `./fib_tests.sh -h` as we're calling cleanup() on exit. Redirect the error message to /dev/null to mute it. V2: Update commit message and fixes tag. V3: resubmit due to missing netdev ML in V2 Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit") Signed-off-by: Po-Hsu Lin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19net/mlx5e: do as little as possible in napi poll when budget is 0Jakub Kicinski1-7/+9
NAPI gets called with budget of 0 from netpoll, which has interrupts disabled. We should try to free some space on Tx rings and nothing else. Specifically do not try to handle XDP TX or try to refill Rx buffers - we can't use the page pool from IRQ context. Don't check if IRQs moved, either, that makes no sense in netpoll. Netpoll calls _all_ the rings from whatever CPU it happens to be invoked on. In general do as little as possible, the work quickly adds up when there's tens of rings to poll. The immediate stack trace I was seeing is: __do_softirq+0xd1/0x2c0 __local_bh_enable_ip+0xc7/0x120 </IRQ> <TASK> page_pool_put_defragged_page+0x267/0x320 mlx5e_free_xdpsq_desc+0x99/0xd0 mlx5e_poll_xdpsq_cq+0x138/0x3b0 mlx5e_napi_poll+0xc3/0x8b0 netpoll_poll_dev+0xce/0x150 AFAIU page pool takes a BH lock, releases it and since BH is now enabled tries to run softirqs. Reviewed-by: Tariq Toukan <[email protected]> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19Merge branch 'tls-fixes'David S. Miller6-50/+177
Jakub Kicinski says: ==================== tls: rx: strp: fix inline crypto offload The local strparser version I added to TLS does not preserve decryption status, which breaks inline crypto (NIC offload). ==================== Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: don't use GFP_KERNEL in softirq contextJakub Kicinski1-0/+4
When receive buffer is small, or the TCP rx queue looks too complicated to bother using it directly - we allocate a new skb and copy data into it. We already use sk->sk_allocation... but nothing actually sets it to GFP_ATOMIC on the ->sk_data_ready() path. Users of HW offload are far more likely to experience problems due to scheduling while atomic. "Copy mode" is very rarely triggered with SW crypto. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: preserve decryption status of skbs when neededJakub Kicinski4-31/+114
When receive buffer is small we try to copy out the data from TCP into a skb maintained by TLS to prevent connection from stalling. Unfortunately if a single record is made up of a mix of decrypted and non-decrypted skbs combining them into a single skb leads to loss of decryption status, resulting in decryption errors or data corruption. Similarly when trying to use TCP receive queue directly we need to make sure that all the skbs within the record have the same status. If we don't the mixed status will be detected correctly but we'll CoW the anchor, again collapsing it into a single paged skb without decrypted status preserved. So the "fixup" code will not know which parts of skb to re-encrypt. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: factor out copying skb dataJakub Kicinski1-10/+23
We'll need to copy input skbs individually in the next patch. Factor that code out (without assuming we're copying a full record). Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: fix determining record length in copy modeJakub Kicinski1-6/+15
We call tls_rx_msg_size(skb) before doing skb->len += chunk. So the tls_rx_msg_size() code will see old skb->len, most likely leading to an over-read. Worst case we will over read an entire record, next iteration will try to trim the skb but may end up turning frag len negative or discarding the subsequent record (since we already told TCP we've read it during previous read but now we'll trim it out of the skb). Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: force mixed decrypted records into copy modeJakub Kicinski2-5/+21
If a record is partially decrypted we'll have to CoW it, anyway, so go into copy mode and allocate a writable skb right away. This will make subsequent fix simpler because we won't have to teach tls_strp_msg_make_copy() how to copy skbs while preserving decrypt status. Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: strp: set the skb->len of detached / CoW'ed skbsJakub Kicinski1-0/+2
alloc_skb_with_frags() fills in page frag sizes but does not set skb->len and skb->data_len. Set those correctly otherwise device offload will most likely generate an empty skb and hit the BUG() at the end of __skb_nsg(). Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-19tls: rx: device: fix checking decryption statusJakub Kicinski1-1/+1
skb->len covers the entire skb, including the frag_list. In fact we're guaranteed that rxm->full_len <= skb->len, so since the change under Fixes we were not checking decrypt status of any skb but the first. Note that the skb_pagelen() added here may feel a bit costly, but it's removed by subsequent fixes, anyway. Reported-by: Tariq Toukan <[email protected]> Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk") Tested-by: Shai Amiram <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-18Merge branch 'mptcp-refactor-inet_accept-and-mib-updates'Jakub Kicinski8-80/+122
Mat Martineau says: ==================== mptcp: Refactor inet_accept() and MIB updates Patches 1 and 2 refactor inet_accept() to provide a new __inet_accept() that's usable with locked sockets, and then make use of that helper to simplify mptcp_stream_accept(). Patches 3 and 4 add some new MIBS related to MPTCP address advertisement and update related selftest scripts. Patch 5 modifies the selftests to ensure MIBS are only printed once when a test case fails. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18selftests: mptcp: centralize stats dumpingPaolo Abeni1-61/+5
If a test case fails, the mptcp_join.sh script can dump the netns MIBs multiple times, leading to confusing output. Let's dump such info only once per test-case, when needed. This additionally allow removing some code duplication. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18selftests: mptcp: add explicit check for new mibsPaolo Abeni1-0/+62
Instead of duplicating the all existing TX check with the TX side, add the new ones on selected test cases. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18mptcp: introduces more address related mibsPaolo Abeni5-5/+34
Currently we don't track explicitly a few events related to address management suboption handling; this patch adds new mibs for ADD_ADDR and RM_ADDR options tx and for missed tx events due to internal storage exhaustion. The self-tests must be updated to properly handle different mibs with the same/shared prefix. Additionally removes a couple of warning tracking the loss event. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/378 Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18mptcp: refactor mptcp_stream_accept()Paolo Abeni1-9/+12
Rewrite the mptcp socket accept op, leveraging the new __inet_accept() helper. This way we can avoid acquiring the new socket lock twice and we can avoid a couple of indirect calls. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18inet: factor out locked section of inet_accept() in a new helperPaolo Abeni2-15/+19
No functional changes intended. The new helper will be used by the MPTCP protocol in the next patch to avoid duplicating a few LoC. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18Merge branch '100GbE' of ↵Jakub Kicinski10-387/+144
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-05-17 (ice, MAINTAINERS) This series contains updates to ice driver and MAINTAINERS file. Paul refactors PHY to link mode reporting and updates some PHY types to report more accurate link modes for ice. Dave removes mutual exclusion policy between LAG and SR-IOV in ice driver. Jesse updates link for Intel Wired LAN in the MAINTAINERS file. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: MAINTAINERS: update Intel Ethernet links ice: Remove LAG+SRIOV mutual exclusion ice: update PHY type to ethtool link mode mapping ice: refactor PHY type to ethtool link mode ice: update ICE_PHY_TYPE_HIGH_MAX_INDEX ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-18net: cdc_ncm: Deal with too low values of dwNtbOutMaxSizeTudor Ambarus1-9/+15
Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than the calculated "min" value, but greater than zero, the logic sets tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in cdc_ncm_fill_tx_frame() where all the data is handled. For small values of dwNtbOutMaxSize the memory allocated during alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to how size is aligned at alloc time: size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); Thus we hit the same bug that we tried to squash with commit 2be6d4d16a084 ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero") Low values of dwNtbOutMaxSize do not cause an issue presently because at alloc_skb() time more memory (512b) is allocated than required for the SKB headers alone (320b), leaving some space (512b - 320b = 192b) for CDC data (172b). However, if more elements (for example 3 x u64 = [24b]) were added to one of the SKB header structs, say 'struct skb_shared_info', increasing its original size (320b [320b aligned]) to something larger (344b [384b aligned]), then suddenly the CDC data (172b) no longer fits in the spare SKB data area (512b - 384b = 128b). Consequently the SKB bounds checking semantics fails and panics: skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:113! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023 Workqueue: mld mld_ifc_work RIP: 0010:skb_panic net/core/skbuff.c:113 [inline] RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118 [snip] Call Trace: <TASK> skb_put+0x151/0x210 net/core/skbuff.c:2047 skb_put_zero include/linux/skbuff.h:2422 [inline] cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline] cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308 cdc_ncm_tx_fixup+0xa3/0x100 Deal with too low values of dwNtbOutMaxSize, clamp it in the range [USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure enough data space is allocated to handle CDC data by making sure dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE. Fixes: 289507d3364f ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning") Cc: [email protected] Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tudor Ambarus <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>