aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-25Merge tag 'x86_cpu_for_v6.4_rc1' of ↵Linus Torvalds5-46/+43
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu model updates from Borislav Petkov: - Add Emerald Rapids to the list of Intel models supporting PPIN - Finally use a CPUID bit for split lock detection instead of enumerating every model - Make sure automatic IBRS is set on AMD, even though the AP bringup code does that now by replicating the MSR which contains the switch * tag 'x86_cpu_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Xeon Emerald Rapids to list of CPUs that support PPIN x86/split_lock: Enumerate architectural split lock disable bit x86/CPU/AMD: Make sure EFER[AIBRSE] is set
2023-04-25Merge tag 'x86_acpi_for_v6.4_rc1' of ↵Linus Torvalds1-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ACPI update from Borislav Petkov: - Improve code generation in ACPI's global lock's acquisition function * tag 'x86_acpi_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ACPI/boot: Improve __acpi_acquire_global_lock
2023-04-25Merge tag 'ras_core_for_v6.4_rc1' of ↵Linus Torvalds2-13/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Just cleanups and fixes this time around: make threshold_ktype const, an objtool fix and use proper size for a bitmap * tag 'ras_core_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Use an u64 for bank_map x86/mce: Always inline old MCA stubs x86/MCE/AMD: Make kobj_type structure constant
2023-04-25Merge tag 'edac_updates_for_v6.4' of ↵Linus Torvalds15-645/+490
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - skx_edac: Fix overflow when decoding 32G DIMM ranks - i10nm_edac: Add Sierra Forest support - amd64_edac: Split driver code between legacy and SMCA systems. The final goal is adding support for more hw, like GPUs - The usual minor cleanups and fixes * tag 'edac_updates_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (25 commits) EDAC/i10nm: Add Intel Sierra Forest server support EDAC/amd64: Fix indentation in umc_determine_edac_cap() EDAC/altera: Remove MODULE_LICENSE in non-module EDAC: Sanitize MODULE_AUTHOR strings EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR EDAC/amd64: Add get_err_info() to pvt->ops EDAC/amd64: Split dump_misc_regs() into dct/umc functions EDAC/amd64: Split init_csrows() into dct/umc functions EDAC/amd64: Split determine_edac_cap() into dct/umc functions EDAC/amd64: Rename f17h_determine_edac_ctl_cap() EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions EDAC/amd64: Split ecc_enabled() into dct/umc functions EDAC/amd64: Split read_mc_regs() into dct/umc functions EDAC/amd64: Split determine_memory_type() into dct/umc functions EDAC/amd64: Split read_base_mask() into dct/umc functions EDAC/amd64: Split prep_chip_selects() into dct/umc functions EDAC/amd64: Rework hw_info_{get,put} EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt EDAC/amd64: Do not discover ECC symbol size for Family 17h and later EDAC/amd64: Drop dbam_to_cs() for Family 17h and later ...
2023-04-25Merge tag 'm68k-for-v6.4-tag1' of ↵Linus Torvalds15-24/+25
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - defconfig updates - miscellaneous fixes and improvements * tag 'm68k-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: kexec: Include <linux/reboot.h> m68k: defconfig: Update defconfigs for v6.3-rc1 m68k: Remove obsolete config NO_KERNEL_MSG nubus: Drop noop match function
2023-04-25spi: bcm63xx: use macro DEFINE_SIMPLE_DEV_PM_OPSDhruva Gole1-3/+1
Using this macro makes the code more readable. It also inits the members of dev_pm_ops in the following manner without us explicitly needing to: .suspend = bcm63xx_spi_suspend, \ .resume = bcm63xx_spi_resume, \ .freeze = bcm63xx_spi_suspend, \ .thaw = bcm63xx_spi_resume, \ .poweroff = bcm63xx_spi_suspend, \ .restore = bcm63xx_spi_resume Signed-off-by: Dhruva Gole <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-04-25xen/blkback: move blkif_get_x86_*_req() into blkback.cJuergen Gross2-96/+104
There is no need to have the functions blkif_get_x86_32_req() and blkif_get_x86_64_req() in a header file, as they are used in one place only. So move them into the using source file and drop the inline qualifier. While at it fix some style issues, and simplify the code by variable reusing and using min() instead of open coding it. Instead of using barrier() use READ_ONCE() for avoiding multiple reads of nr_segments. Signed-off-by: Juergen Gross <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2023-04-25xen/blkback: simplify free_persistent_gnts() interfaceJuergen Gross1-10/+10
The interface of free_persistent_gnts() can be simplified, as there is only a single caller of free_persistent_gnts() and the 2nd and 3rd parameters are easily obtainable via the ring pointer, which is passed as the first parameter anyway. Signed-off-by: Juergen Gross <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2023-04-25xen/blkback: remove stale prototypeJuergen Gross1-1/+0
There is no function xen_blkif_purge_persistent(), so remove its prototype from common.h. Signed-off-by: Juergen Gross <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2023-04-25xen/blkback: fix white space code style issuesJuergen Gross2-4/+4
Signed-off-by: Juergen Gross <[email protected]> Acked-by: Roger Pau Monné <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2023-04-25gfs2: gfs2_ail_empty_gl no log flush on errorBob Peterson1-2/+3
Before this patch, function gfs2_ail_empty_gl called gfs2_log_flush even in cases where it encountered an error. It should probably skip the log flush and leave the file system in an inconsistent state, letting a subsequent withdraw force the journal to be replayed to reestablish metadata consistency. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-04-25gfs2: Issue message when revokes cannot be writtenBob Peterson1-1/+3
Before this patch, function gfs2_ail_empty_gl would silently return an error to the caller. This would get silently set into sd_log_error which would cause a withdraw, but there was no indication why the file system was withdrawn. This patch adds a fs_err to log the appropriate error message. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-04-25gfs2: Perform second log flush in gfs2_make_fs_roBob Peterson1-0/+9
Before this patch, function gfs2_make_fs_ro called gfs2_log_flush once to finalize the log. However, if there's dirty metadata, log flushes tend to sync the metadata and formulate revokes. Before this patch, those revokes may not be written out to the journal immediately, which meant unresolved glocks could still have revokes in their ail lists. When the glock worker runs, it tries to transition the glock, but the unresolved revokes in the ail still need to be written, so it tries to start a transaction. It's impossible to start a transaction because at that point, the SDF_JOURNAL_LIVE flag has been cleared by gfs2_make_fs_ro. That causes the glock worker to fail, unable to write the revokes. The calling sequence looked something like this: gfs2_make_fs_ro gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN) clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags); ...meanwhile... glock_work_func do_xmote rgrp_go_sync (or possibly inode_go_sync) ... gfs2_ail_empty_gl __gfs2_trans_begin if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) { ... return -EROFS; The previous patch in the series ("gfs2: return errors from gfs2_ail_empty_gl") now causes the transaction error to no longer be ignored, so it causes a warning from MOST of the xfstests: WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2] which corresponds to: WARN_ON(gfs2_withdrawing(sdp)); The withdraw was triggered silently from do_xmote by: if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp))) gfs2_withdraw_delayed(sdp); This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the data and one to sync any outstanding revokes and finalize the journal. Note that both of these log flushes need to be "special," in other words, not GFS2_LOG_HEAD_FLUSH_NORMAL. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-04-25gfs2: return errors from gfs2_ail_empty_glBob Peterson1-3/+5
Before this patch, function gfs2_ail_empty_gl did not return errors it encountered from __gfs2_trans_begin. Those errors usually came from the fact that the file system was made read-only, often due to unmount (but theoretically could be due to -o remount,ro), which prevented the transaction from starting. The inability to start a transaction prevented its revokes from being properly written to the journal for glocks during unmount (and transition to ro). That meant glocks could be unlocked without the metadata properly revoked in the journal. So other nodes could grab the glock thinking that their lvb values were correct, but in fact corresponded to the glock without its revokes properly synced. That presented as lvb mismatch errors. This patch allows gfs2_ail_empty_gl to return the error properly to the caller. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2023-04-25HID: amd_sfh: Fix max supported HID devicesBasavaraj Natikar1-1/+1
commit 4bd763568dbd ("HID: amd_sfh: Support for additional light sensor") adds additional sensor devices, but forgets to add the number of HID devices to match. Thus, the number of HID devices does not match the actual number of sensors. In order to prevent corruption and system hangs when more than the allowed number of HID devices are accessed, the number of HID devices is increased accordingly. Fixes: 4bd763568dbd ("HID: amd_sfh: Support for additional light sensor") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217354 Signed-off-by: Basavaraj Natikar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-04-25net: phy: marvell-88x2222: remove unnecessary (void*) conversionswuych1-2/+2
Pointer variables of void * type do not require type cast. Signed-off-by: wuych <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-25tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.Kuniyuki Iwashima1-0/+3
syzkaller reported [0] memory leaks of an UDP socket and ZEROCOPY skbs. We can reproduce the problem with these sequences: sk = socket(AF_INET, SOCK_DGRAM, 0) sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE) sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1) sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53)) sk.close() sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets skb->cb->ubuf.refcnt to 1, and calls sock_hold(). Here, struct ubuf_info_msgzc indirectly holds a refcnt of the socket. When the skb is sent, __skb_tstamp_tx() clones it and puts the clone into the socket's error queue with the TX timestamp. When the original skb is received locally, skb_copy_ubufs() calls skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt. This additional count is decremented while freeing the skb, but struct ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is not called. The last refcnt is not released unless we retrieve the TX timestamped skb by recvmsg(). Since we clear the error queue in inet_sock_destruct() after the socket's refcnt reaches 0, there is a circular dependency. If we close() the socket holding such skbs, we never call sock_put() and leak the count, sk, and skb. TCP has the same problem, and commit e0c8bccd40fc ("net: stream: purge sk_error_queue in sk_stream_kill_queues()") tried to fix it by calling skb_queue_purge() during close(). However, there is a small chance that skb queued in a qdisc or device could be put into the error queue after the skb_queue_purge() call. In __skb_tstamp_tx(), the cloned skb should not have a reference to the ubuf to remove the circular dependency, but skb_clone() does not call skb_copy_ubufs() for zerocopy skb. So, we need to call skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs(). [0]: BUG: memory leak unreferenced object 0xffff88800c6d2d00 (size 1152): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00 ................ 02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024 [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083 [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline] [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245 [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515 [<00000000b9b11231>] sock_create net/socket.c:1566 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline] [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline] [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636 [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline] [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline] [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd BUG: memory leak unreferenced object 0xffff888017633a00 (size 240): comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff .........-m..... backtrace: [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497 [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline] [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596 [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline] [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370 [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037 [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652 [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253 [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819 [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline] [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734 [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117 [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline] [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline] [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125 [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Fixes: b5947e5d1e71 ("udp: msg_zerocopy") Reported-by: syzbot <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-25net: amd: Fix link leak when verifying config failedGencen Gan1-1/+1
After failing to verify configuration, it returns directly without releasing link, which may cause memory leak. Paolo Abeni thinks that the whole code of this driver is quite "suboptimal" and looks unmainatained since at least ~15y, so he suggests that we could simply remove the whole driver, please take it into consideration. Simon Horman suggests that the fix label should be set to "Linux-2.6.12-rc2" considering that the problem has existed since the driver was introduced and the commit above doesn't seem to exist in net/net-next. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gan Gecen <[email protected]> Reviewed-by: Dongliang Mu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-25media: ov5670: Fix probe on ACPISakari Ailus1-1/+1
devm_clk_get() will return either an error or NULL, which the driver handles, continuing to use the clock of reading the value of the clock-frequency property. However, the value of ov5670->xvclk is left as-is and the other clock framework functions aren't capable of handling error values. Use devm_clk_get_optional() to obtain NULL instead of -ENOENT. Fixes: 8004c91e2095 ("media: i2c: ov5670: Use common clock framework") Signed-off-by: Sakari Ailus <[email protected]> Reviewed-by: Jacopo Mondi <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2023-04-25sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h>Geert Uytterhoeven1-1/+1
As arch/sh/include/uapi/asm/types.h doesn't exist, sh doesn't provide any sh-specific uapi definitions, and it can just include <asm-generic/int-ll64.h>, like most other architectures. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: John Paul Adrian Glaubitz <[email protected]> Link: https://lore.kernel.org/r/26932016c83c2ad350db59f5daf96117a38bbbd8.1679566927.git.geert@linux-m68k.org Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
2023-04-25sh: Use generic GCC library routinesGeert Uytterhoeven6-97/+6
The C implementations of __ashldi3(), __ashrdi3__(), and __lshrdi3() in arch/sh/lib/ are identical to the generic C implementations in lib/. Reduce duplication by switching SH to the generic versions. Update the include path in arch/sh/boot/compressed accordingly. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Palmer Dabbelt <[email protected]> Reviewed-by: John Paul Adrian Glaubitz <[email protected]> Link: https://lore.kernel.org/r/74dbe68dc8e2ffb6180092f73723fe21ab692c7a.1679566500.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
2023-04-24Merge tag 'pull-nios2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-3/+0
Pull trivial nios2 cleanup from Al Viro. * tag 'pull-nios2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nios2: _TIF_ALLWORK_MASK is unused
2023-04-24Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-24/+17
Pull misc vfs pile from Al Viro. Random minor cleanups. * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Fix description of vfs_tmpfile() sysv: switch to put_and_unmap_page() fs/sysv: Don't round down address for kunmap_flush_on_unmap()
2023-04-24Merge tag 'pull-old-dio' of ↵Linus Torvalds3-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull legacy dio cleanup from Al Viro. * tag 'pull-old-dio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: __blockdev_direct_IO(): get rid of submit_io callback
2023-04-24Merge tag 'pull-write-one-page' of ↵Linus Torvalds5-65/+58
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs write_one_page removal from Al Viro: "write_one_page series" * tag 'pull-write-one-page' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mm,jfs: move write_one_page/folio_write_one to jfs ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_page ufs: don't flush page immediately for DIRSYNC directories
2023-04-24Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds10-113/+84
Pull vfs fget updates from Al Viro: "fget() to fdget() conversions" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse_dev_ioctl(): switch to fdget() cgroup_get_from_fd(): switch to fdget_raw() bpf: switch to fdget_raw() build_mount_idmapped(): switch to fdget() kill the last remaining user of proc_ns_fget() SVM-SEV: convert the rest of fget() uses to fdget() in there convert sgx_set_attribute() to fdget()/fdput() convert setns(2) to fdget()/fdput()
2023-04-24net: phy: marvell: Fix inconsistent indenting in led_blink_setChristian Marangi1-4/+4
Fix inconsistent indeinting in m88e1318_led_blink_set reported by kernel test robot, probably done by the presence of an if condition dropped in later revision of the same code. Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: ea9e86485dec ("net: phy: marvell: Implement led_blink_set()") Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24lan966x: Don't use xdp_frame when action is XDP_TXHoratiu Vultur3-23/+28
When the action of an xdp program was XDP_TX, lan966x was creating a xdp_frame and use this one to send the frame back. But it is also possible to send back the frame without needing a xdp_frame, because it is possible to send it back using the page. And then once the frame is transmitted is possible to use directly page_pool_recycle_direct as lan966x is using page pools. This would save some CPU usage on this path, which results in higher number of transmitted frames. Bellow are the statistics: Frame size: Improvement: 64 ~8% 256 ~11% 512 ~8% 1000 ~0% 1500 ~0% Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24Merge tag 'for-netdev' of ↵Jakub Kicinski7-44/+87
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-04-24 We've added 5 non-merge commits during the last 3 day(s) which contain a total of 7 files changed, 87 insertions(+), 44 deletions(-). The main changes are: 1) Workaround for bpf iter selftest due to lack of subprog support in precision tracking, from Andrii. 2) Disable bpf_refcount_acquire kfunc until races are fixed, from Dave. 3) One more test_verifier test converted from asm macro to asm in C, from Eduard. 4) Fix build with NETFILTER=y INET=n config, from Florian. 5) Add __rcu_read_{lock,unlock} into deny list, from Yafang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests bpf: Add __rcu_read_{lock,unlock} into btf id deny list bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed selftests/bpf: verifier/prevent_map_lookup converted to inline assembly bpf: fix link failure with NETFILTER=y INET=n ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24Merge branch 'tsnep-xdp-socket-zero-copy-support'Jakub Kicinski3-123/+822
Gerhard Engleder says: ==================== tsnep: XDP socket zero-copy support Implement XDP socket zero-copy support for tsnep driver. I tried to follow existing drivers like igc as far as possible. But one main difference is that tsnep does not need any reconfiguration for XDP BPF program setup. So I decided to keep this behavior no matter if a XSK pool is used or not. As a result, tsnep starts using the XSK pool even if no XDP BPF program is available. Another difference is that I tried to prevent potentially failing allocations during XSK pool setup. E.g. both memory models for page pool and XSK pool are registered all the time. Thus, XSK pool setup cannot end up with not working queues. Some prework is done to reduce the last two XSK commits to actual XSK changes. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add XDP socket zero-copy TX supportGerhard Engleder2-11/+121
Send and complete XSK pool frames within TX NAPI context. NAPI context is triggered by ndo_xsk_wakeup. Test results with A53 1.2GHz: xdpsock txonly copy mode, 64 byte frames: pps pkts 1.00 tx 284,409 11,398,144 Two CPUs with 100% and 10% utilization. xdpsock txonly zero-copy mode, 64 byte frames: pps pkts 1.00 tx 511,929 5,890,368 Two CPUs with 100% and 1% utilization. xdpsock l2fwd copy mode, 64 byte frames: pps pkts 1.00 rx 248,985 7,315,885 tx 248,921 7,315,885 Two CPUs with 100% and 10% utilization. xdpsock l2fwd zero-copy mode, 64 byte frames: pps pkts 1.00 rx 254,735 3,039,456 tx 254,735 3,039,456 Two CPUs with 100% and 4% utilization. Packet rate increases and CPU utilization is reduced in both cases. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add XDP socket zero-copy RX supportGerhard Engleder3-14/+556
Add support for XSK zero-copy to RX path. The setup of the XSK pool can be done at runtime. If the netdev is running, then the queue must be disabled and enabled during reconfiguration. This can be done easily with functions introduced in previous commits. A more important property is that, if the netdev is running, then the setup of the XSK pool shall not stop the netdev in case of errors. A broken netdev after a failed XSK pool setup is bad behavior. Therefore, the allocation and setup of resources during XSK pool setup is done only before any queue is disabled. Additionally, freeing and later allocation of resources is eliminated in some cases. Page pool entries are kept for later use. Two memory models are registered in parallel. As a result, the XSK pool setup cannot fail during queue reconfiguration. In contrast to other drivers, XSK pool setup and XDP BPF program setup are separate actions. XSK pool setup can be done without any XDP BPF program. The XDP BPF program can be added, removed or changed without any reconfiguration of the XSK pool. Test results with A53 1.2GHz: xdpsock rxdrop copy mode, 64 byte frames: pps pkts 1.00 rx 856,054 10,625,775 Two CPUs with both 100% utilization. xdpsock rxdrop zero-copy mode, 64 byte frames: pps pkts 1.00 rx 889,388 4,615,284 Two CPUs with 100% and 20% utilization. Packet rate increases and CPU utilization is reduced. 100% CPU load seems to the base load. This load is consumed by ksoftirqd just for dropping the generated packets without xdpsock running. Using batch API reduced CPU utilization slightly, but measurements are not stable enough to provide meaningful numbers. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Move skb receive action to separate functionGerhard Engleder1-16/+23
The function tsnep_rx_poll() is already pretty long and the skb receive action can be reused for XSK zero-copy support. Move page based skb receive to separate function. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Add functions for queue enable/disableGerhard Engleder1-33/+64
Move queue enable and disable code to separate functions. This way the activation and deactivation of the queues are defined actions, which can be used in future execution paths. This functions will be used for the queue reconfiguration at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Rework TX/RX queue initializationGerhard Engleder1-43/+51
Make initialization of TX and RX queues less dynamic by moving some initialization from netdev open/close to device probing. Additionally, move some initialization code to separate functions to enable future use in other execution paths. This is done as preparation for queue reconfigure at runtime, which is necessary for XSK zero-copy support. Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24tsnep: Replace modulo operation with maskGerhard Engleder2-14/+15
TX/RX ring size is static and power of 2 to enable compiler to optimize modulo operation to mask operation. Make this optimization already in the code and don't rely on the compiler. CPU utilisation during high packet rate has not changed. So no performance improvement has been measured. But it is best practice to prevent modulo operations. Suggested-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Gerhard Engleder <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: phy: dp83867: Add led_brightness_set supportAlexander Stein1-0/+31
Up to 4 LEDs can be attached to the PHY, add support for setting brightness manually. Signed-off-by: Alexander Stein <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: phy: Fix reading LED reg propertyAlexander Stein1-1/+5
'reg' is always encoded in 32 bits, thus it has to be read using the function with the corresponding bit width. Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs") Signed-off-by: Alexander Stein <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24drivers: nfc: nfcsim: remove return value check of `dev_dir`Jianuo Kuang1-5/+0
Smatch complains that: nfcsim_debugfs_init_dev() warn: 'dev_dir' is an error pointer or valid According to the documentation of the debugfs_create_dir() function, there is no need to check the return value of this function. Just delete the dead code. Signed-off-by: Jianuo Kuang <[email protected]> Reviewed-by: Dongliang Mu <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: phy: dp83867: Remove unnecessary (void*) conversionswuych1-2/+1
Pointer variables of void * type do not require type cast. Signed-off-by: wuych <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: ethtool: coalesce: try to make user settings stick twiceJakub Kicinski2-11/+47
SET_COALESCE may change operation mode and parameters in one call. Changing operation mode may cause the driver to reset the parameter values to what is a reasonable default for new operation mode. Since driver does not know which parameters come from user and which are echoed back from ->get, driver may ignore the parameters when switching operation modes. This used to be inevitable for ioctl() but in netlink we know which parameters are actually specified by the user. We could inform which parameters were set by the user but this would lead to a lot of code duplication in the drivers. Instead try to call the drivers twice if both mode and params are changed. The set method already checks if any params need updating so in case the driver did the right thing the first time around - there will be no second call to it's ->set method (only an extra call to ->get()). For mlx5 for example before this patch we'd see: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 3 rx-frames: 32 tx-usecs: 16 tx-frames: 32 [...] After the change: # ethtool -C eth0 adaptive-rx on adaptive-tx on # ethtool -C eth0 adaptive-rx off adaptive-tx off \ tx-usecs 123 rx-usecs 123 Adaptive RX: off TX: off rx-usecs: 123 rx-frames: 32 tx-usecs: 123 tx-frames: 32 [...] This only works for netlink, so it's a small discrepancy between netlink and ioctl(). Since we anticipate most users to move to netlink I believe it's worth making their lives easier. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24Merge branch 'update-coding-style-and-check-alloc_frag'Jakub Kicinski1-6/+18
Haiyang Zhang says: ==================== Update coding style and check alloc_frag Follow up patches for the jumbo frame support. As suggested by Jakub Kicinski, update coding style, and check napi_alloc_frag for possible fallback to single pages. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: mana: Check if netdev/napi_alloc_frag returns single pageHaiyang Zhang1-0/+15
netdev/napi_alloc_frag() may fall back to single page which is smaller than the requested size. Add error checking to avoid memory overwritten. Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: mana: Rename mana_refill_rxoob and remove some empty linesHaiyang Zhang1-6/+3
Rename mana_refill_rxoob for naming consistency. And remove some empty lines between function call and error checking. Signed-off-by: Haiyang Zhang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24Merge branch 'add-page_pool-support-for-page-recycling-in-veth-driver'Jakub Kicinski2-7/+63
Lorenzo Bianconi says: ==================== add page_pool support for page recycling in veth driver Introduce page_pool support in veth driver in order to recycle pages in veth_convert_skb_to_xdp_buff routine and avoid reallocating the skb through the page allocator when we run a xdp program on the device and we receive skbs from the stack. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: veth: add page_pool statsLorenzo Bianconi2-3/+18
Introduce page_pool stats support to report info about local page_pool through ethtool Tested-by: Maryam Tahhan <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24net: veth: add page_pool for page recyclingLorenzo Bianconi2-4/+45
Introduce page_pool support in veth driver in order to recycle pages in veth_convert_skb_to_xdp_buff routine and avoid reallocating the skb through the page allocator. The patch has been tested sending tcp traffic to a veth pair where the remote peer is running a simple xdp program just returning xdp_pass: veth upstream codebase: MTU 1500B: ~ 8Gbps MTU 8000B: ~ 13.9Gbps veth upstream codebase + pp support: MTU 1500B: ~ 9.2Gbps MTU 8000B: ~ 16.2Gbps Tested-by: Maryam Tahhan <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24netlink: Use copy_to_user() for optval in netlink_getsockopt().Kuniyuki Iwashima1-52/+23
Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length. The options return a flag value that fits into 1 byte, but such behaviour confuses users who do not initialise the variable before calling getsockopt() and do not strictly check the returned value as char. Currently, netlink_getsockopt() uses put_user() to copy data to optlen and optval, but put_user() casts the data based on the pointer, char *optval. As a result, only 1 byte is set to optval. To avoid this behaviour, we need to use copy_to_user() or cast optval for put_user(). Note that this changes the behaviour on big-endian systems, but we document that the size of optval is int in the man page. $ man 7 netlink ... Socket options To set or get a netlink socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level argument set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to an int. Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups") Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option") Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag") Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option") Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting") Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps") Reported-by: Brad Spencer <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-24selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter testsAndrii Nakryiko1-11/+15
iter_pass_iter_ptr_to_subprog subtest is relying on actual array size being passed as subprog parameter. This combined with recent fixes to precision tracking in conditional jumps ([0]) is now causing verifier to backtrack all the way to the point where sum() and fill() subprogs are called, at which point precision backtrack bails out and forces all the states to have precise SCALAR registers. This in turn causes each possible value of i within fill() and sum() subprogs to cause a different non-equivalent state, preventing iterator code to converge. For now, change the test to assume fixed size of passed in array. Once BPF verifier supports precision tracking across subprogram calls, these changes will be reverted as unnecessary. [0] 71b547f56124 ("bpf: Fix incorrect verifier pruning due to missing register precision taints") Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-24Merge tag 'nf-next-23-04-22' of ↵Jakub Kicinski12-382/+463
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for tracing. Remove last rule and chain from jumpstack. From Florian Westphal. 2) nf_tables validates all tables before committing the new rules. Unfortunately, this has two drawbacks: - Since addition of the transaction mutex pernet state gets written to outside of the locked section from the cleanup callback, this is wrong so do this cleanup directly after table has passed all checks. - Revalidate tables that saw no changes. This can be avoided by keeping the validation state per table, not per netns. From Florian Westphal. 3) Get rid of a few redundant pointers in the traceinfo structure. The three removed pointers are used in the expression evaluation loop, so gcc keeps them in registers. Passing them to the (inlined) helpers thus doesn't increase nft_do_chain text size, while stack is reduced by another 24 bytes on 64bit arches. From Florian Westphal. 4) IPVS cleanups in several ways without implementing any functional changes, aside from removing some debugging output: - Update width of source for ip_vs_sync_conn_options The operation is safe, use an annotation to describe it properly. - Consistently use array_size() in ip_vs_conn_init() It seems better to use helpers consistently. - Remove {Enter,Leave}Function. These seem to be well past their use-by date. - Correct spelling in comments. From Simon Horman. 5) Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device. * tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow to create netdev chain without device netfilter: nf_tables: support for deleting devices in an existing netdev chain netfilter: nf_tables: support for adding new devices to an existing netdev chain netfilter: nf_tables: rename function to destroy hook list netfilter: nf_tables: do not send complete notification of deletions netfilter: nf_tables: extended netlink error reporting for netdevice ipvs: Correct spelling in comments ipvs: Remove {Enter,Leave}Function ipvs: Consistently use array_size() in ip_vs_conn_init() ipvs: Update width of source for ip_vs_sync_conn_options netfilter: nf_tables: do not store rule in traceinfo structure netfilter: nf_tables: do not store verdict in traceinfo structure netfilter: nf_tables: do not store pktinfo in traceinfo structure netfilter: nf_tables: remove unneeded conditional netfilter: nf_tables: make validation state per table netfilter: nf_tables: don't write table validation state without mutex netfilter: nf_tables: don't store chain address on jump netfilter: nf_tables: don't store address of last rule on jump netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>