aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-10-03net: phylink: use phy_interface_t bitmaps for optical modulesRussell King1-30/+134
Where a MAC provides a phy_interface_t bitmap, use these bitmaps to select the operating interface mode for optical SFP modules, rather than using the linkmode bitmaps. Signed-off-by: Russell King <[email protected]> Signed-off-by: Marek Behún <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03net: sfp: augment SFP parsing with phy_interface_t bitmapRussell King9-28/+78
We currently parse the SFP EEPROM to a bitmap of ethtool link modes, and then attempt to convert the link modes to a PHY interface mode. While this works at present, there are cases where this is sub-optimal. For example, where a module can operate with several different PHY interface modes. To start addressing this, arrange for the SFP EEPROM parsing to also provide a bitmap of the possible PHY interface modes. Signed-off-by: Russell King <[email protected]> Signed-off-by: Marek Behún <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03net: phylink: add ability to validate a set of interface modesRussell King (Oracle)1-7/+10
Rather than having the ability to validate all supported interface modes or a single interface mode, introduce the ability to validate a subset of supported modes. Signed-off-by: Russell King (Oracle) <[email protected]> [ rebased on current net-next ] Signed-off-by: Marek Behún <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03platform/x86: use PLATFORM_DEVID_NONE instead of -1Barnabás Pőcze25-26/+27
Use the `PLATFORM_DEVID_NONE` constant instead of hard-coding -1 when creating a platform device. No functional changes are intended. Signed-off-by: Barnabás Pőcze <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-10-03platform/x86/amd: pmc: Dump idle mask during "check" stage insteadMario Limonciello1-5/+3
The idle mask is dumped during the "prepare" and "restore" stage right now, which helps to demonstrate issues only related to the first s2idle entry. If the system has entered s2idle once, but was woken up never breaking the s2idle loop but also never went back to sleep we might still have another issue to deal with however. Move the dynamic debugging message here so that we'll catch it on each iteration. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216516 Signed-off-by: Mario Limonciello <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2022-10-03net: sparx5: Fix return type of sparx5_port_xmit_implNathan Huckleberry2-3/+3
The ndo_start_xmit field in net_device_ops is expected to be of type netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev). The mismatched return type breaks forward edge kCFI since the underlying function definition does not match the function hook definition. The return type of sparx5_port_xmit_impl should be changed from int to netdev_tx_t. Reported-by: Dan Carpenter <[email protected]> Link: https://github.com/ClangBuiltLinux/linux/issues/1703 Cc: [email protected] Signed-off-by: Nathan Huckleberry <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03af_unix: Fix memory leaks of the whole sk due to OOB skb.Kuniyuki Iwashima1-6/+7
syzbot reported a sequence of memory leaks, and one of them indicated we failed to free a whole sk: unreferenced object 0xffff8880126e0000 (size 1088): comm "syz-executor419", pid 326, jiffies 4294773607 (age 12.609s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 7d 00 00 00 00 00 00 00 ........}....... 01 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000006fefe750>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:1970 [<0000000074006db5>] sk_alloc+0x3b/0x800 net/core/sock.c:2029 [<00000000728cd434>] unix_create1+0xaf/0x920 net/unix/af_unix.c:928 [<00000000a279a139>] unix_create+0x113/0x1d0 net/unix/af_unix.c:997 [<0000000068259812>] __sock_create+0x2ab/0x550 net/socket.c:1516 [<00000000da1521e1>] sock_create net/socket.c:1566 [inline] [<00000000da1521e1>] __sys_socketpair+0x1a8/0x550 net/socket.c:1698 [<000000007ab259e1>] __do_sys_socketpair net/socket.c:1751 [inline] [<000000007ab259e1>] __se_sys_socketpair net/socket.c:1748 [inline] [<000000007ab259e1>] __x64_sys_socketpair+0x97/0x100 net/socket.c:1748 [<000000007dedddc1>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<000000007dedddc1>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 [<000000009456679f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd We can reproduce this issue by creating two AF_UNIX SOCK_STREAM sockets, send()ing an OOB skb to each other, and close()ing them without consuming the OOB skbs. int skpair[2]; socketpair(AF_UNIX, SOCK_STREAM, 0, skpair); send(skpair[0], "x", 1, MSG_OOB); send(skpair[1], "x", 1, MSG_OOB); close(skpair[0]); close(skpair[1]); Currently, we free an OOB skb in unix_sock_destructor() which is called via __sk_free(), but it's too late because the receiver's unix_sk(sk)->oob_skb is accounted against the sender's sk->sk_wmem_alloc and __sk_free() is called only when sk->sk_wmem_alloc is 0. In the repro sequences, we do not consume the OOB skb, so both two sk's sock_put() never reach __sk_free() due to the positive sk->sk_wmem_alloc. Then, no one can consume the OOB skb nor call __sk_free(), and we finally leak the two whole sk. Thus, we must free the unconsumed OOB skb earlier when close()ing the socket. Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: syzbot <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03Merge branch 'ip_tunnel-netlink-parms'David S. Miller5-156/+81
Liu Jian says: ==================== Add helper functions to parse netlink msg of ip_tunnel v1->v2: Move the implementation of the helper function to ip_tunnel_core.c v2->v3: Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL ==================== Signed-off-by: David S. Miller <[email protected]>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_parmLiu Jian4-49/+37
Add ip_tunnel_netlink_parms to parse netlink msg of ip_tunnel_parm. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03net: Add helper function to parse netlink msg of ip_tunnel_encapLiu Jian5-107/+44
Add ip_tunnel_netlink_encap_parms to parse netlink msg of ip_tunnel_encap. Reduces duplicate code, no actual functional changes. Signed-off-by: Liu Jian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-03net: rds: don't hold sock lock when cancelling work from ↵Tetsuo Handa1-1/+1
rds_tcp_reset_callbacks() syzbot is reporting lockdep warning at rds_tcp_reset_callbacks() [1], for commit ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()") added cancel_delayed_work_sync() into a section protected by lock_sock() without realizing that rds_send_xmit() might call lock_sock(). We don't need to protect cancel_delayed_work_sync() using lock_sock(), for even if rds_{send,recv}_worker() re-queued this work while __flush_work() from cancel_delayed_work_sync() was waiting for this work to complete, retried rds_{send,recv}_worker() is no-op due to the absence of RDS_CONN_UP bit. Link: https://syzkaller.appspot.com/bug?extid=78c55c7bc6f66e53dce2 [1] Reported-by: syzbot <[email protected]> Co-developed-by: Hillf Danton <[email protected]> Signed-off-by: Hillf Danton <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Tested-by: syzbot <[email protected]> Fixes: ac3615e7f3cffe2a ("RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()") Signed-off-by: David S. Miller <[email protected]>
2022-10-03Merge branch 'master' of ↵David S. Miller24-323/+738
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Refactor selftests to use an array of structs in xfrm_fill_key(). From Gautam Menghani. 2) Drop an unused argument from xfrm_policy_match. From Hongbin Wang. 3) Support collect metadata mode for xfrm interfaces. From Eyal Birger. 4) Add netlink extack support to xfrm. From Sabrina Dubroca. Please note, there is a merge conflict in: include/net/dst_metadata.h between commit: 0a28bfd4971f ("net/macsec: Add MACsec skb_metadata_dst Tx Data path support") from the net-next tree and commit: 5182a5d48c3d ("net: allow storing xfrm interface metadata in metadata_dst") from the ipsec-next tree. Can be solved as done in linux-next. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-10-03Merge branch 'for-next' into for-linusTakashi Iwai95-1089/+1106
2022-10-02hwmon: (corsair-psu) add USB id of new revision of the HX1000i psuWilken Gottwalt2-2/+3
Also updates the documentation accordingly. Signed-off-by: Wilken Gottwalt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2022-10-02Linux 6.0Linus Torvalds1-1/+1
2022-10-02iomap: add a tracepoint for mappings returned by map_blocksDarrick J. Wong2-0/+2
Add a new tracepoint so we can see what mapping the filesystem returns to writeback a dirty page. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2022-10-02iomap: iomap: fix memory corruption when recording errors during writebackDarrick J. Wong1-1/+1
Every now and then I see this crash on arm64: Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8 Buffer I/O error on dev dm-0, logical block 8733687, async page read Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 64k pages, 42-bit VAs, pgdp=0000000139750000 [00000000000000f8] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Buffer I/O error on dev dm-0, logical block 8733688, async page read Dumping ftrace buffer: Buffer I/O error on dev dm-0, logical block 8733689, async page read (ftrace buffer empty) XFS (dm-0): log I/O error -5 Modules linked in: dm_thin_pool dm_persistent_data XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ec/0x590 [xfs] (fs/xfs/xfs_trans_buf.c:296). dm_bio_prison XFS (dm-0): Please unmount the filesystem and rectify the problem(s) XFS (dm-0): xfs_imap_lookup: xfs_ialloc_read_agi() returned error -5, agno 0 dm_bufio dm_log_writes xfs nft_chain_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_REJECT potentially unexpected fatal signal 6. nf_reject_ipv6 potentially unexpected fatal signal 6. ipt_REJECT nf_reject_ipv4 CPU: 1 PID: 122166 Comm: fsstress Tainted: G W 6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7 rpcsec_gss_krb5 auth_rpcgss xt_tcpudp ip_set_hash_ip ip_set_hash_net xt_set nft_compat ip_set_hash_mac ip_set nf_tables Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 pstate: 60001000 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) ip_tables pc : 000003fd6d7df200 x_tables lr : 000003fd6d7df1ec overlay nfsv4 CPU: 0 PID: 54031 Comm: u4:3 Tainted: G W 6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7405 Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 Workqueue: writeback wb_workfn sp : 000003ffd9522fd0 (flush-253:0) pstate: 60401005 (nZCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--) pc : errseq_set+0x1c/0x100 x29: 000003ffd9522fd0 x28: 0000000000000023 x27: 000002acefeb6780 x26: 0000000000000005 x25: 0000000000000001 x24: 0000000000000000 x23: 00000000ffffffff x22: 0000000000000005 lr : __filemap_set_wb_err+0x24/0xe0 x21: 0000000000000006 sp : fffffe000f80f760 x29: fffffe000f80f760 x28: 0000000000000003 x27: fffffe000f80f9f8 x26: 0000000002523000 x25: 00000000fffffffb x24: fffffe000f80f868 x23: fffffe000f80fbb0 x22: fffffc0180c26a78 x21: 0000000002530000 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000001 x13: 0000000000470af3 x12: fffffc0058f70000 x11: 0000000000000040 x10: 0000000000001b20 x9 : fffffe000836b288 x8 : fffffc00eb9fd480 x7 : 0000000000f83659 x6 : 0000000000000000 x5 : 0000000000000869 x4 : 0000000000000005 x3 : 00000000000000f8 x20: 000003fd6d740020 x19: 000000000001dd36 x18: 0000000000000001 x17: 000003fd6d78704c x16: 0000000000000001 x15: 000002acfac87668 x2 : 0000000000000ffa x1 : 00000000fffffffb x0 : 00000000000000f8 Call trace: errseq_set+0x1c/0x100 __filemap_set_wb_err+0x24/0xe0 iomap_do_writepage+0x5e4/0xd5c write_cache_pages+0x208/0x674 iomap_writepages+0x34/0x60 xfs_vm_writepages+0x8c/0xcc [xfs 7a861f39c43631f15d3a5884246ba5035d4ca78b] x14: 0000000000000000 x13: 2064656e72757465 x12: 0000000000002180 x11: 000003fd6d8a82d0 x10: 0000000000000000 x9 : 000003fd6d8ae288 x8 : 0000000000000083 x7 : 00000000ffffffff x6 : 00000000ffffffee x5 : 00000000fbad2887 x4 : 000003fd6d9abb58 x3 : 000003fd6d740020 x2 : 0000000000000006 x1 : 000000000001dd36 x0 : 0000000000000000 CPU: 1 PID: 122167 Comm: fsstress Tainted: G W 6.0.0-rc5-djwa #rc5 3004c9f1de887ebae86015f2677638ce51ee7 do_writepages+0x90/0x1c4 __writeback_single_inode+0x4c/0x4ac Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 writeback_sb_inodes+0x214/0x4ac wb_writeback+0xf4/0x3b0 pstate: 60001000 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) wb_workfn+0xfc/0x580 process_one_work+0x1e8/0x480 pc : 000003fd6d7df200 worker_thread+0x78/0x430 This crash is a result of iomap_writepage_map encountering some sort of error during writeback and wanting to set that error code in the file mapping so that fsync will report it. Unfortunately, the code dereferences folio->mapping after unlocking the folio, which means that another thread could have removed the page from the page cache (writeback doesn't hold the invalidation lock) and give it to somebody else. At best we crash the system like above; at worst, we corrupt memory or set an error on some other unsuspecting file while failing to record the problems with *this* file. Regardless, fix the problem by reporting the error to the inode mapping. NOTE: Commit 598ecfbaa742 lifted the XFS writeback code to iomap, so this fix should be backported to XFS in the 4.6-5.4 kernels in addition to iomap in the 5.5-5.19 kernels. Fixes: e735c0079465 ("iomap: Convert iomap_add_to_ioend() to take a folio") # 5.17 onward Fixes: 598ecfbaa742 ("iomap: lift the xfs writeback code to iomap") # 5.5-5.16, needs backporting Fixes: 150d5be09ce4 ("xfs: remove xfs_cancel_ioend") # 4.6-5.4, needs backporting Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
2022-10-02Merge tag 'i2c-for-6.0-rc8' of ↵Linus Torvalds2-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Add missing DT bindings for STM32 and a resource leak fix for DaVinci" * tag 'i2c-for-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probe dt-bindings: i2c: st,stm32-i2c: Document wakeup-source property dt-bindings: i2c: st,stm32-i2c: Document interrupt-names property
2022-10-02Merge tag 'perf-urgent-2022-10-02' of ↵Linus Torvalds4-3/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf fixes from Ingo Molnar: - Fix a PMU enumeration/initialization bug on Intel Alder Lake CPUs - Fix KVM guest PEBS register handling - Fix race/reentry bug in perf_output_read_group() reading of PMU counters * tag 'perf-urgent-2022-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix reentry problem in perf_output_read_group() perf/x86/core: Completely disable guest PEBS via guest's global_ctrl perf/x86/intel: Fix unchecked MSR access error for Alder Lake N
2022-10-02Merge tag 'x86_urgent_for_v6.0' of ↵Linus Torvalds2-32/+38
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Add the respective UP last level cache mask accessors in order not to cause segfaults when lscpu accesses their representation in sysfs - Fix for a race in the alternatives batch patching machinery when kprobes are set * tag 'x86_urgent_for_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant x86/alternative: Fix race in try_get_desc()
2022-10-02Merge branch 'tc-bind_class-hook'David S. Miller11-55/+22
Zhengchao Shao says: ==================== refactor duplicate codes in bind_class hook function All the bind_class callback duplicate the same logic, so we can refactor them. First, ensure n arg not empty before call bind_class hook function. Then, add tc_cls_bind_class() helper. Last, use tc_cls_bind_class() in filter. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-10-02net: sched: use tc_cls_bind_class() in filterZhengchao Shao9-54/+9
Use tc_cls_bind_class() in filter. Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-02net: sched: cls_api: introduce tc_cls_bind_class() helperZhengchao Shao1-0/+12
All the bind_class callback duplicate the same logic, this patch introduces tc_cls_bind_class() helper for common usage. Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-02net: sched: ensure n arg not empty before call bind_classZhengchao Shao1-1/+1
All bind_class callbacks are directly returned when n arg is empty. Therefore, bind_class is invoked only when n arg is not empty. Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-10-02hwmon: (pmbus/mp2888) Fix sensors readouts for MPS Multi-phase mp2888 controllerOleksandr Shamray1-7/+6
Fix scale factors for reading MPS Multi-phase mp2888 controller. Fixed sensors: - PIN/POUT: based on vendor documentation, set bscale factor 0.5W/LSB - IOUT: based on vendor documentation, set scale factor 0.25 A/LSB Fixes: e4db7719d037 ("hwmon: (pmbus) Add support for MPS Multi-phase mp2888 controller") Signed-off-by: Oleksandr Shamray <[email protected]> Reviewed-by: Vadim Pasternak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2022-10-02dt-bindings: hwmon: sensirion,shtc1: Clean up spelling mistakes and grammarColin Ian King1-4/+4
The yaml text contains some minor spelling mistakes and grammar issues, clean these up. Signed-off-by: Colin Ian King <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2022-10-02hwmon: (nct6683) remove unused variable in nct6683_create_attr_groupZeng Heng1-2/+2
When enable 'unused-but-set-variable' compile warning option, it would raise warning as below: drivers/hwmon/nct6683.c:415:9: warning: variable 'j' set but not used [-Wunused-but-set-variable] Variable 'j' in nct6683_create_attr_group is unused, so remove it and simplify the 'for' loop. Signed-off-by: Zeng Heng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2022-10-02i2c: pci1xxxx: prevent signed integer overflowWolfram Sang1-3/+3
Some constants need 'UL' markings, otherwise they are shifted into the sign bit. Fixes: 361693697249 ("i2c: microchip: pci1xxxx: Add driver for I2C host controller in multifunction endpoint of pci1xxxx switch") Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02i2c: acpi: Replace zero-length array with DECLARE_FLEX_ARRAY() helperGustavo A. R. Silva1-1/+1
Zero-length arrays are deprecated and we are moving towards adopting C99 flexible-array members, instead. So, replace zero-length arrays declarations in anonymous union with the new DECLARE_FLEX_ARRAY() helper macro. This helper allows for flexible-array members in unions. Link: https://github.com/KSPP/linux/issues/193 Link: https://github.com/KSPP/linux/issues/218 Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02i2c: i801: Prefer async probeMani Milani1-0/+1
This i801 driver probe can take more than ~190ms in some devices, since the "i2c_register_spd()" call was added inside "i801_probe_optional_slaves()". Prefer async probe so that other drivers can be probed and boot can continue in parallel while this driver loads, to reduce boot time. There is no reason to block other drivers from probing while this driver is loading. Signed-off-by: Mani Milani <[email protected]> Tested-by: Jarkko Nikula <[email protected]> Reviewed-by: Jean Delvare <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02i2c: davinci: fix PM disable depth imbalance in davinci_i2c_probeZhang Qilong1-1/+2
The pm_runtime_enable will increase power disable depth. Thus a pairing decrement is needed on the error handling path to keep it balanced according to context. Fixes: 17f88151ff190 ("i2c: davinci: Add PM Runtime Support") Signed-off-by: Zhang Qilong <[email protected]> Reviewed-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02i2c: designware-pci: Use standard pattern for memory allocationAndy Shevchenko1-1/+1
The pattern foo = kmalloc(sizeof(*foo), GFP_KERNEL); has an advantage when foo type is changed. Since we are planning a such, better to be prepared by using standard pattern for memory allocation. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02i2c: designware-pci: Group AMD NAVI quirk parts togetherAndy Shevchenko1-15/+15
The code is ogranized in a way that all related parts to the certain platform quirk go together. This is not the case for AMD NAVI. Shuffle code to make it happen. While at it, drop the frequency definition and use hard coded value as it's done for other platforms and add a comment to the PCI ID list. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02dt-bindings: i2c: st,stm32-i2c: Document wakeup-source propertyMarek Vasut1-0/+2
Document wakeup-source property. This fixes dtbs_check warnings when building current Linux DTs: " arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@40015000: Unevaluated properties are not allowed ('wakeup-source' was unexpected) " Signed-off-by: Marek Vasut <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-02dt-bindings: i2c: st,stm32-i2c: Document interrupt-names propertyMarek Vasut1-0/+5
Document interrupt-names property with "event" and "error" interrupt names. This fixes dtbs_check warnings when building current Linux DTs: " arch/arm/boot/dts/stm32mp153c-dhcom-drc02.dtb: i2c@40015000: Unevaluated properties are not allowed ('interrupt-names' was unexpected) " Signed-off-by: Marek Vasut <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2022-10-01Merge branch 'mlx5-xsk-updates-part3-2022-09-30'Jakub Kicinski21-385/+385
Saeed Mahameed says: ==================== mlx5 xsk updates part3 2022-09-30 The gist of this 4 part series is in this patchset's last patch This series contains performance optimizations. XSK starts using the batching allocator, and XSK data path gets separated from the regular RX, allowing to drop some branches not relevant for non-XSK use cases. Some minor optimizations for indirect calls and need_wakeup are also included. Other than that, this series adds a few features to the mlx5e implementation of XSK: 1. XDP metadata support on XSK RQs. 2. RSS contexts support for XSK RQs. 3. Some other optimizations 4. Last but not least, change the queuing scheme, so that XSK RQs no longer use higher indices, but replace the regular RQs. Maxim Says: ========== In the initial implementation of XSK in mlx5e, XSK RQs coexisted with regular RQs in the same channel. The main idea was to allow RSS work the same for regular traffic, without need to reconfigure RSS to exclude XSK queues. However, this scheme didn't prove to be beneficial, mainly because of incompatibility with other vendors. Some tools don't properly support using higher indices for XSK queues, some tools get confused with the double amount of RQs exposed in sysfs. Some use cases are purely XSK, and allocating the same amount of unused regular RQs is a waste of resources. This commit changes the queuing scheme to the standard one, where XSK RQs replace regular RQs on the channels where XSK sockets are open. Two RQs still exist in the channel to allow failsafe disable of XSK, but only one is exposed at a time. The next commit will achieve the desired memory save by flushing the buffers when the regular RQ is unused. As the result of this transition: 1. It's possible to use RSS contexts over XSK RQs. 2. It's possible to dedicate all queues to XSK. 3. When XSK RQs coexist with regular RQs, the admin should make sure no unwanted traffic goes into XSK RQs by either excluding them from RSS or settings up the XDP program to return XDP_PASS for non-XSK traffic. 4. When using a mixed fleet of mlx5e devices and other netdevs, the same configuration can be applied. If the application supports the fallback to copy mode on unsupported drivers, it will work too. ========== Part 4 will include some final xsk optimizations and minor improvements part 1: https://lore.kernel.org/netdev/[email protected]/ part 2: https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Use queue indices starting from 0 for XSK queuesMaxim Mikityanskiy14-205/+52
In the initial implementation of XSK in mlx5e, XSK RQs coexisted with regular RQs in the same channel. The main idea was to allow RSS work the same for regular traffic, without need to reconfigure RSS to exclude XSK queues. However, this scheme didn't prove to be beneficial, mainly because of incompatibility with other vendors. Some tools don't properly support using higher indices for XSK queues, some tools get confused with the double amount of RQs exposed in sysfs. Some use cases are purely XSK, and allocating the same amount of unused regular RQs is a waste of resources. This commit changes the queuing scheme to the standard one, where XSK RQs replace regular RQs on the channels where XSK sockets are open. Two RQs still exist in the channel to allow failsafe disable of XSK, but only one is exposed at a time. The next commit will achieve the desired memory save by flushing the buffers when the regular RQ is unused. As the result of this transition: 1. It's possible to use RSS contexts over XSK RQs. 2. It's possible to dedicate all queues to XSK. 3. When XSK RQs coexist with regular RQs, the admin should make sure no unwanted traffic goes into XSK RQs by either excluding them from RSS or settings up the XDP program to return XDP_PASS for non-XSK traffic. 4. When using a mixed fleet of mlx5e devices and other netdevs, the same configuration can be applied. If the application supports the fallback to copy mode on unsupported drivers, it will work too. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Introduce the mlx5e_flush_rq functionMaxim Mikityanskiy3-24/+29
Add a function to flush an RQ: clean up descriptors, release pages and reset the RQ. This procedure is used by the recovery flow, and it will also be used in a following commit to free some memory when switching a channel to the XSK mode. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Support XDP metadata on XSK RQsMaxim Mikityanskiy1-8/+12
Add support for XDP metadata on XSK RQs for cross-program communication. The driver no longer calls xdp_set_data_meta_invalid and copies the metadata to a newly allocated SKB on XDP_PASS. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Optimize RQ page deallocationMaxim Mikityanskiy2-19/+24
mlx5e_free_rx_mpwqe loops over all pages of a MPWQE, calling mlx5e_page_release for ones that are not scheduled for XDP_TX or XDP_REDIRECT; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each page/XSK frame. This check can be moved outside the loop to reduce the number of branches. mlx5e_free_rx_wqe loops over all fragments, calling mlx5e_page_release for the ones that are last in a page; and mlx5e_page_release checks whether it's an XSK RQ or a regular one for each fragment. Using the fact that XSK doesn't support multiple fragments, it can be optimized for both XSK and regular usages: 1. Make an early check for XSK and call its deallocator directly, saving 3 branches (loop condition, frag->last_in_page and selection of deallocator). 2. Call the regular deallocator directly in the non-XSK case, saving a branch per fragment, except the first one. After the changes, mlx5e_page_release is removed, as there are no callers left. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Call mlx5e_page_release_dynamic directly where possibleMaxim Mikityanskiy1-16/+4
mlx5e_page_release calls the appropriate deallocator depending on whether it's an XSK RQ or a regular one. Some flows that call this function are not compatible with XSK, so they can call the non-XSK deallocator directly to save a branch. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Use non-XSK page allocator in SHAMPOMaxim Mikityanskiy1-11/+1
The SHAMPO flow is not compatible with XSK, it can call the page pool allocator directly to save a branch. mlx5e_page_alloc is removed, as it's no longer used in any flow. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQMaxim Mikityanskiy4-48/+106
XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on striding RQ and creates an optimized flow for XSK. A side effect is an opportunity to optimize the regular RX flow by dropping branching for XSK cases. Performance improvement is up to 6.4% in the aligned mode and up to 7.5% in the unaligned mode. Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Use xsk_buff_alloc_batch on legacy RQMaxim Mikityanskiy4-0/+55
XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on legacy RQ, adding a special case for XSK. The new branch introduced basically replaces the branch that was removed from the same place a few commits before. A check is made that DMA sync is not needed, because the batching allocator falls back to returning one frame when DMA sync is needed, and this is best handled by the loop in the standard case. Performance improvement is up to 8% in the aligned mode and up to 9% in the unaligned mode. Aligned mode, 2048-byte frames: 12.8 Mpps -> 13.5 Mpps Aligned mode, 4096-byte frames: 11.5 Mpps -> 12.4 Mpps Unaligned mode, 2048-byte frames: 12.2 Mpps -> 13.4 Mpps Unaligned mode, 3072-byte frames: 11.6 Mpps -> 12.5 Mpps Unaligned mode, 4096-byte frames: 11.2 Mpps -> 12.2 Mpps CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Split out WQE allocation for legacy XSK RQMaxim Mikityanskiy3-4/+34
Allocation of XSK frames on legacy RQ may be made more efficient with a specialized routine that relies on certain assumptions, such as there is only one fragment, allocation units (XSK frames) are not shared among multiple packets. It reduces the number of branches both in the XSK code and in the regular RQ, because with this approach there is only a single check whether it's an XSK or regular RQ. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Remove the outer loop when allocating legacy RQ WQEsMaxim Mikityanskiy2-22/+17
Legacy RQ WQEs are allocated in a loop in small batches (8 WQEs). As partial batches are allowed, there is no point to have a loop in a loop, so the outer loop is removed, and the batch size is increased up to the total number of WQEs to allocate, still not smaller than 8. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: xsk: Use partial batches in legacy RQ with XSKMaxim Mikityanskiy1-13/+1
The previous commit allowed allocating WQE batches in legacy RQ partially, however, XSK still checks whether there are enough frames in the fill ring. Remove this check to allow to allocate batches partially also with XSK. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Use partial batches in legacy RQMaxim Mikityanskiy1-18/+21
Legacy RQ allocates WQEs in batches. If the batch allocation fails, the pages of the allocated part are released. This commit changes this behavior to allow to use the pages that have been already allocated. After this change, we need to be careful about indexing rq->wqe.frags[]. The WQ size is a power of two that divides by wqe_bulk (8), and the old code used whole bulks, which allowed to use indices [8*K; 8*K+7] without overflowing. Now that the bulks may be partial, the range can start at any location (not only at 8*K), so we need to wrap them around to avoid out-of-bounds array access. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Make the wqe_index_mask calculation more exactMaxim Mikityanskiy1-1/+20
The old calculation of wqe_index_mask may give false positives, i.e. request bulking of pairs of WQEs when not strictly needed, for example, when the first fragment size is equal to the PAGE_SIZE, bulking is not needed, even if the number of fragments is odd. Make the calculation more exact to cut false positives. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-01net/mlx5e: Introduce wqe_index_mask for legacy RQMaxim Mikityanskiy2-4/+22
When fragments of different WQEs share the same page, mlx5e_post_rx_wqes must wait until the old WQE stops using the page, only then the new WQE can allocate the new page. Essentially, it means that if WQE index i is still in use, the allocation must stop before `i % bulk`, where bulk is the number of WQEs that may share the same page. As bulk is always a power of two, `i % bulk = i & (bulk - 1)`, and the new wqe_index_mask field will be equal to `bulk - 1`. At the same time, wqe_bulk remains for optimization purposes and stores `max(bulk, 8)`, which allows to skip the allocation until we have at least 8 WQEs free. Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>