aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-21netfilter: conntrack: include zone id in tuple hash againFlorian Westphal1-15/+52
commit deedb59039f111 ("netfilter: nf_conntrack: add direction support for zones") removed the zone id from the hash value. This has implications on hash chain lengths with overlapping tuples, which can hit 64k entries on released kernels, before upper droplimit was added in d7e7747ac5c ("netfilter: refuse insertion if chain has grown too large"). With that change reverted, test script coming with this series shows linear insertion time growth: 10000 entries in 3737 ms (now 10000 total, loop 1) 10000 entries in 16994 ms (now 20000 total, loop 2) 10000 entries in 47787 ms (now 30000 total, loop 3) 10000 entries in 72731 ms (now 40000 total, loop 4) 10000 entries in 95761 ms (now 50000 total, loop 5) 10000 entries in 96809 ms (now 60000 total, loop 6) inserted 60000 entries from packet path in 333825 ms With d7e7747ac5c in place, the test fails. There are three supported zone use cases: 1. Connection is in the default zone (zone 0). This means to special config (the default). 2. Connection is in a different zone (1 to 2**16). This means rules are in place to put packets in the desired zone, e.g. derived from vlan id or interface. 3. Original direction is in zone X and Reply is in zone 0. 3) allows to use of the existing NAT port collision avoidance to provide connectivity to internet/wan even when the various zones have overlapping source networks separated via policy routing. In case the original zone is 0 all three cases are identical. There is no way to place original direction in zone x and reply in zone y (with y != 0). Zones need to be assigned manually via the iptables/nftables ruleset, before conntrack lookup occurs (raw table in iptables) using the "CT" target conntrack template support (-j CT --{zone,zone-orig,zone-reply} X). Normally zone assignment happens based on incoming interface, but could also be derived from packet mark, vlan id and so on. This means that when case 3 is used, the ruleset will typically not even assign a connection tracking template to the "reply" packets, so lookup happens in zone 0. However, it is possible that reply packets also match a ct zone assignment rule which sets up a template for zone X (X > 0) in original direction only. Therefore, after making the zone id part of the hash, we need to do a second lookup using the reply zone id if we did not find an entry on the first lookup. In practice, most deployments will either not use zones at all or the origin and reply zones are the same, no second lookup is required in either case. After this change, packet path insertion test passes with constant insertion times: 10000 entries in 1064 ms (now 10000 total, loop 1) 10000 entries in 1074 ms (now 20000 total, loop 2) 10000 entries in 1066 ms (now 30000 total, loop 3) 10000 entries in 1079 ms (now 40000 total, loop 4) 10000 entries in 1081 ms (now 50000 total, loop 5) 10000 entries in 1082 ms (now 60000 total, loop 6) inserted 60000 entries from packet path in 6452 ms Cc: Daniel Borkmann <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-09-21netfilter: conntrack: make max chain length randomFlorian Westphal1-6/+11
Similar to commit 67d6d681e15b ("ipv4: make exception cache less predictible"): Use a random drop length to make it harder to detect when entries were hashed to same bucket list. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-09-20Merge tag 'afs-fixes-20210913' of ↵Linus Torvalds17-120/+294
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fixes for AFS problems that can cause data corruption due to interaction with another client modifying data cached locally: - When d_revalidating a dentry, don't look at the inode to which it points. Only check the directory to which the dentry belongs. This was confusing things and causing the silly-rename cleanup code to remove the file now at the dentry of a file that got deleted. - Fix mmap data coherency. When a callback break is received that relates to a file that we have cached, the data content may have been changed (there are other reasons, such as the user's rights having been changed). However, we're checking it lazily, only on entry to the kernel, which doesn't happen if we have a writeable shared mapped page on that file. We make the kernel keep track of mmapped files and clear all PTEs mapping to that file as soon as the callback comes in by calling unmap_mapping_pages() (we don't necessarily want to zap the pagecache). This causes the kernel to be reentered when userspace tries to access the mmapped address range again - and at that point we can query the server and, if we need to, zap the page cache. Ideally, I would check each file at the point of notification, but that involves poking the server[*] - which is holding an exclusive lock on the vnode it is changing, waiting for all the clients it notified to reply. This could then deadlock against the server. Further, invalidating the pagecache might call ->launder_page(), which would try to write to the file, which would definitely deadlock. (AFS doesn't lease file access). [*] Checking to see if the file content has changed is a matter of comparing the current data version number, but we have to ask the server for that. We also need to get a new callback promise and we need to poke the server for that too. - Add some more points at which the inode is validated, since we're doing it lazily, notably in ->read_iter() and ->page_mkwrite(), but also when performing some directory operations. Ideally, checking in ->read_iter() would be done in some derivation of filemap_read(). If we're going to call the server to read the file, then we get the file status fetch as part of that. - The above is now causing us to make a lot more calls to afs_validate() to check the inode - and afs_validate() takes the RCU read lock each time to make a quick check (ie. afs_check_validity()). This is entirely for the purpose of checking cb_s_break to see if the server we're using reinitialised its list of callbacks - however this isn't a very common event, so most of the time we're taking this needlessly. Add a new cell-wide counter to count the number of reinitialisations done by any server and check that - and only if that changes, take the RCU read lock and check the server list (the server list may change, but the cell a file is part of won't). - Don't update vnode->cb_s_break and ->cb_v_break inside the validity checking loop. The cb_lock is done with read_seqretry, so we might go round the loop a second time after resetting those values - and that could cause someone else checking validity to miss something (I think). Also included are patches for fixes for some bugs encountered whilst debugging this: - Fix a leak of afs_read objects and fix a leak of keys hidden by that. - Fix a leak of pages that couldn't be added to extend a writeback. - Fix the maintenance of i_blocks when i_size is changed by a local write or a local dir edit" Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217 [1] Link: https://lore.kernel.org/r/163111665183.283156.17200205573146438918.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/ # i_blocks patch * tag 'afs-fixes-20210913' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix updating of i_blocks on file/dir extension afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server afs: Try to avoid taking RCU read lock when checking vnode validity afs: Fix mmap coherency vs 3rd-party changes afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation afs: Add missing vnode validation checks afs: Fix page leak afs: Fix missing put on afs_read objects and missing get on the key therein
2021-09-20Merge tag '5.15-rc1-ksmbd' of git://git.samba.org/ksmbdLinus Torvalds4-17/+81
Pull ksmbd server fixes from Steve French: "Three ksmbd fixes, including an important security fix for path processing, and a buffer overflow check, and a trivial fix for incorrect header inclusion" * tag '5.15-rc1-ksmbd' of git://git.samba.org/ksmbd: ksmbd: add validation for FILE_FULL_EA_INFORMATION of smb2_get_info ksmbd: prevent out of share access ksmbd: transport_rdma: Don't include rwlock.h directly
2021-09-20Merge tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds48-57/+67
Pull cifs client fixes from Steve French: - two deferred close fixes (for bugs found with xfstests 478 and 461) - a deferred close improvement in rename - two trivial fixes for incorrect Linux comment formatting of multiple cifs files (pointed out by automated kernel test robot and checkpatch) * tag '5.15-rc1-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: Not to defer close on file when lock is set cifs: Fix soft lockup during fsstress cifs: Deferred close performance improvements cifs: fix incorrect kernel doc comments cifs: remove pathname for file from SPDX header
2021-09-20Merge tag 'spi-fix-v5.15-rc2' of ↵Linus Torvalds2-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark BrownL "This contains a couple of fixes, one fix for handling of zero length transfers on Rockchip devices and a warning fix which will conflict with a version you did but cleans up some extra unneeded forward declarations as well which seems a bit neater" * tag 'spi-fix-v5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: tegra20-slink: Declare runtime suspend and resume functions conditionally spi: rockchip: handle zero length transfers without timing out
2021-09-20Merge tag 'regulator-fix-v5.15-rc2' of ↵Linus Torvalds2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A couple of small device specific fixes that have been sent since the merge window, neither of which stands out particularly" * tag 'regulator-fix-v5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: max14577: Revert "regulator: max14577: Add proper module aliases strings" regulator: qcom-rpmh-regulator: fix pm8009-1 ldo7 resource name
2021-09-20drm/nouveau/nvkm: Replace -ENOSYS with -ENODEVGuenter Roeck1-1/+1
nvkm test builds fail with the following error. drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c: In function 'nvkm_control_mthd_pstate_info': drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c:60:35: error: overflow in conversion from 'int' to '__s8' {aka 'signed char'} changes value from '-251' to '5' The code builds on most architectures, but fails on parisc where ENOSYS is defined as 251. Replace the error code with -ENODEV (-19). The actual error code does not really matter and is not passed to userspace - it just has to be negative. Fixes: 7238eca4cf18 ("drm/nouveau: expose pstate selection per-power source in sysfs") Signed-off-by: Guenter Roeck <[email protected]> Cc: Ben Skeggs <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-20sparc64: fix pci_iounmap() when CONFIG_PCI is not setLinus Torvalds1-0/+2
Guenter reported [1] that the pci_iounmap() changes remain problematic, with sparc64 allnoconfig and tinyconfig still not building due to the header file changes and confusion with the arch-specific pci_iounmap() implementation. I'm pretty convinced that sparc should just use GENERIC_IOMAP instead of doing its own thing, since it turns out that the sparc64 version of pci_iounmap() is somewhat buggy (see [2]). But in the meantime, this just fixes the build by avoiding the trivial re-definition of the empty case. Link: https://lore.kernel.org/lkml/[email protected]/ [1] Link: https://lore.kernel.org/lkml/CAHk-=wgheheFx9myQyy5osh79BAazvmvYURAtub2gQtMvLrhqQ@mail.gmail.com/ [2] Reported-by: Guenter Roeck <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-20Merge branch 'hns3-fixes'David S. Miller5-49/+103
Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: fix a return value error in hclge_get_reset_status()Yufeng Mo1-4/+11
hclge_get_reset_status() should return the tqp reset status. However, if the CMDQ fails, the caller will take it as tqp reset success status by mistake. Therefore, uses a parameters to get the tqp reset status instead. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: check vlan id before using itliaoguojia1-0/+3
The input parameters may not be reliable, so check the vlan id before using it, otherwise may set wrong vlan id into hardware. Fixes: dc8131d846d4 ("net: hns3: Fix for packet loss due wrong filter config in VLAN tbls") Signed-off-by: liaoguojia <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: check queue id range before usingYufeng Mo1-0/+8
The input parameters may not be reliable. Before using the queue id, we should check this parameter. Otherwise, memory overwriting may occur. Fixes: d34100184685 ("net: hns3: refactor the mailbox message between PF and VF") Signed-off-by: Yufeng Mo <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: fix misuse vf id and vport id in some logsJiaran Zhang4-10/+12
vport_id include PF and VFs, vport_id = 0 means PF, other values mean VFs. So the actual vf id is equal to vport_id minus 1. Some VF print logs are actually vport, and logs of vf id actually use vport id, so this patch fixes them. Fixes: ac887be5b0fe ("net: hns3: change print level of RAS error log from warning to error") Fixes: adcf738b804b ("net: hns3: cleanup some print format warning") Signed-off-by: Jiaran Zhang <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: fix inconsistent vf id printJian Shen1-2/+5
The vf id from ethtool is added 1 before configured to driver. So it's necessary to minus 1 when printing it, in order to keep consistent with user's configuration. Fixes: dd74f815dd41 ("net: hns3: Add support for rule add/delete for flow director") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20net: hns3: fix change RSS 'hfunc' ineffective issueJian Shen2-33/+64
When user change rss 'hfunc' without set rss 'hkey' by ethtool -X command, the driver will ignore the 'hfunc' for the hkey is NULL. It's unreasonable. So fix it. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Fixes: 374ad291762a ("net: hns3: Add RSS general configuration support for VF") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20ptp: ocp: add COMMON_CLK dependencyArnd Bergmann1-0/+1
Without CONFIG_COMMON_CLK, this fails to link: arm-linux-gnueabi-ld: drivers/ptp/ptp_ocp.o: in function `ptp_ocp_register_i2c': ptp_ocp.c:(.text+0xcc0): undefined reference to `__clk_hw_register_fixed_rate' arm-linux-gnueabi-ld: ptp_ocp.c:(.text+0xcf4): undefined reference to `devm_clk_hw_register_clkdev' arm-linux-gnueabi-ld: drivers/ptp/ptp_ocp.o: in function `ptp_ocp_detach': ptp_ocp.c:(.text+0x1c24): undefined reference to `clk_hw_unregister_fixed_rate' Fixes: a7e1abad13f3 ("ptp: Add clock driver for the OpenCompute TimeCard.") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20bnxt_en: Fix TX timeout when TX ring size is set to the smallestMichael Chan3-5/+10
The smallest TX ring size we support must fit a TX SKB with MAX_SKB_FRAGS + 1. Because the first TX BD for a packet is always a long TX BD, we need an extra TX BD to fit this packet. Define BNXT_MIN_TX_DESC_CNT with this value to make this more clear. The current code uses a minimum that is off by 1. Fix it using this constant. The tx_wake_thresh to determine when to wake up the TX queue is half the ring size but we must have at least BNXT_MIN_TX_DESC_CNT for the next packet which may have maximum fragments. So the comparison of the available TX BDs with tx_wake_thresh should be >= instead of > in the current code. Otherwise, at the smallest ring size, we will never wake up the TX queue and will cause TX timeout. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20nexthop: Fix division by zero while replacing a resilient groupIdo Schimmel1-0/+2
The resilient nexthop group torture tests in fib_nexthop.sh exposed a possible division by zero while replacing a resilient group [1]. The division by zero occurs when the data path sees a resilient nexthop group with zero buckets. The tests replace a resilient nexthop group in a loop while traffic is forwarded through it. The tests do not specify the number of buckets while performing the replacement, resulting in the kernel allocating a stub resilient table (i.e, 'struct nh_res_table') with zero buckets. This table should never be visible to the data path, but the old nexthop group (i.e., 'oldg') might still be used by the data path when the stub table is assigned to it. Fix this by only assigning the stub table to the old nexthop group after making sure the group is no longer used by the data path. Tested with fib_nexthops.sh: Tests passed: 222 Tests failed: 0 [1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 1850 Comm: ping Not tainted 5.14.0-custom-10271-ga86eb53057fe #1107 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:nexthop_select_path+0x2d2/0x1a80 [...] Call Trace: fib_select_multipath+0x79b/0x1530 fib_select_path+0x8fb/0x1c10 ip_route_output_key_hash_rcu+0x1198/0x2da0 ip_route_output_key_hash+0x190/0x340 ip_route_output_flow+0x21/0x120 raw_sendmsg+0x91d/0x2e10 inet_sendmsg+0x9e/0xe0 __sys_sendto+0x23d/0x360 __x64_sys_sendto+0xe1/0x1b0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: [email protected] Fixes: 283a72a5599e ("nexthop: Add implementation of resilient next-hop groups") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-20napi: fix race inside napi_enableXuan Zhuo1-6/+10
The process will cause napi.state to contain NAPI_STATE_SCHED and not in the poll_list, which will cause napi_disable() to get stuck. The prefix "NAPI_STATE_" is removed in the figure below, and NAPI_STATE_HASHED is ignored in napi.state. CPU0 | CPU1 | napi.state =============================================================================== napi_disable() | | SCHED | NPSVC napi_enable() | | { | | smp_mb__before_atomic(); | | clear_bit(SCHED, &n->state); | | NPSVC | napi_schedule_prep() | SCHED | NPSVC | napi_poll() | | napi_complete_done() | | { | | if (n->state & (NPSVC | | (1) | _BUSY_POLL))) | | return false; | | ................ | | } | SCHED | NPSVC | | clear_bit(NPSVC, &n->state); | | SCHED } | | | | napi_schedule_prep() | | SCHED | MISSED (2) (1) Here return direct. Because of NAPI_STATE_NPSVC exists. (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list Since NAPI_STATE_SCHED already exists and napi is not in the sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always exist. 1. This will cause this queue to no longer receive packets. 2. If you encounter napi_disable under the protection of rtnl_lock, it will cause the entire rtnl_lock to be locked, affecting the overall system. This patch uses cmpxchg to implement napi_enable(), which ensures that there will be no race due to the separation of clear two bits. Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable") Signed-off-by: Xuan Zhuo <[email protected]> Reviewed-by: Dust Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19Linux 5.15-rc2Linus Torvalds1-1/+1
2021-09-19pci_iounmap'2: Electric Boogaloo: try to make sense of it allLinus Torvalds2-23/+46
Nathan Chancellor reports that the recent change to pci_iounmap in commit 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") causes build errors on arm64. It took me about two hours to convince myself that I think I know what the logic of that mess of #ifdef's in the <asm-generic/io.h> header file really aim to do, and rewrite it to be easier to follow. Famous last words. Anyway, the code has now been lifted from that grotty header file into lib/pci_iomap.c, and has fairly extensive comments about what the logic is. It also avoids indirecting through another confusing (and badly named) helper function that has other preprocessor config conditionals. Let's see what odd architecture did something else strange in this area to break things. But my arm64 cross build is clean. Fixes: 9caea0007601 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") Reported-by: Nathan Chancellor <[email protected]> Cc: Helge Deller <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ulrich Teichert <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19Merge tag 'x86_urgent_for_v5.15_rc2' of ↵Linus Torvalds5-15/+47
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Prevent a infinite loop in the MCE recovery on return to user space, which was caused by a second MCE queueing work for the same page and thereby creating a circular work list. - Make kern_addr_valid() handle existing PMD entries, which are marked not present in the higher level page table, correctly instead of blindly dereferencing them. - Pass a valid address to sanitize_phys(). This was caused by the mixture of inclusive and exclusive ranges. memtype_reserve() expect 'end' being exclusive, but sanitize_phys() wants it inclusive. This worked so far, but with end being the end of the physical address space the fail is exposed. - Increase the maximum supported GPIO numbers for 64bit. Newer SoCs exceed the previous maximum. * tag 'x86_urgent_for_v5.15_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid infinite loop for copy from user recovery x86/mm: Fix kern_addr_valid() to cope with existing but not present entries x86/platform: Increase maximum GPIO number for X86_64 x86/pat: Pass valid address to sanitize_phys()
2021-09-19Merge tag 'perf-urgent-2021-09-19' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Thomas Gleixner: "A single fix for the perf core where a value read with READ_ONCE() was checked and then reread which makes all the checks invalid. Reuse the already read value instead" * tag 'perf-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: events: Reuse value read using READ_ONCE instead of re-reading it
2021-09-19Merge tag 'locking-urgent-2021-09-19' of ↵Linus Torvalds1-20/+45
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "A set of updates for the RT specific reader/writer locking base code: - Make the fast path reader ordering guarantees correct. - Code reshuffling to make the fix simpler" [ This plays ugly games with atomic_add_return_release() because we don't have a plain atomic_add_release(), and should really be cleaned up, I think - Linus ] * tag 'locking-urgent-2021-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwbase: Take care of ordering guarantee for fastpath reader locking/rwbase: Extract __rwbase_write_trylock() locking/rwbase: Properly match set_and_save_state() to restore_state()
2021-09-19Merge tag 'powerpc-5.15-2' of ↵Linus Torvalds7-55/+159
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when scv (System Call Vectored) is used to make a syscall when a transaction is active, on Power9 or later. - Fix bad interactions between rfscv (Return-from scv) and Power9 fake-suspend mode. - Fix crashes when handling machine checks in LPARs using the Hash MMU. - Partly revert a recent change to our XICS interrupt controller code, which broke the recently added Microwatt support. Thanks to Cédric Le Goater, Eirik Fuller, Ganesh Goudar, Gustavo Romero, Joel Stanley, Nicholas Piggin. * tag 'powerpc-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xics: Set the IRQ chip data for the ICS native backend powerpc/mce: Fix access error in mce handler KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers powerpc/64s: system call rfscv workaround for TM bugs selftests/powerpc: Add scv versions of the basic TM syscall tests powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
2021-09-19Merge tag 'kbuild-fixes-v5.15' of ↵Linus Torvalds6-20/+27
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix bugs in checkkconfigsymbols.py - Fix missing sys import in gen_compile_commands.py - Fix missing FORCE warning for ARCH=sh builds - Fix -Wignored-optimization-argument warnings for Clang builds - Turn -Wignored-optimization-argument into an error in order to stop building instead of sprinkling warnings * tag 'kbuild-fixes-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Add -Werror=ignored-optimization-argument to CLANG_FLAGS x86/build: Do not add -falign flags unconditionally for clang kbuild: Fix comment typo in scripts/Makefile.modpost sh: Add missing FORCE prerequisites in Makefile gen_compile_commands: fix missing 'sys' package checkkconfigsymbols.py: Remove skipping of help lines in parse_kconfig_file checkkconfigsymbols.py: Forbid passing 'HEAD' to --commit
2021-09-19Merge tag 'perf-tools-fixes-for-v5.15-2021-09-18' of ↵Linus Torvalds7-51/+100
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix ip display in 'perf script' when output type != attr->type. - Ignore deprecation warning when using libbpf'sg btf__get_from_id(), fixing the build with libbpf v0.6+. - Make use of FD() robust in libperf, fixing a segfault with 'perf stat --iostat list'. - Initialize addr_location:srcline pointer to NULL when resolving callchain addresses. - Fix fused instruction logic for assembly functions in 'perf annotate'. * tag 'perf-tools-fixes-for-v5.15-2021-09-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf bpf: Ignore deprecation warning when using libbpf's btf__get_from_id() libperf evsel: Make use of FD robust. perf machine: Initialize srcline string member in add_location struct perf script: Fix ip display when type != attr->type perf annotate: Fix fused instr logic for assembly functions
2021-09-19dmascc: use proper 'virt_to_bus()' rather than casting to 'int'Linus Torvalds1-3/+3
The old dmascc driver depends on the legacy ISA_DMA_API, and blindly just casts the kernel virtual address to 'int' for set_dma_addr(). That works only incidentally, and because the high bits of the address will be ignored anyway. And on 64-bit architectures it causes warnings. Admittedly, 64-bit architectures with ISA are basically dead - I think the only example of this is alpha, and nobody would ever use the dmascc driver there. But hey, the fix is easy enough, the end result is cleaner, and it's yet another configuration that now builds without warnings. If somebody actually uses this driver on an alpha and this fixes it for you, please email me. Because that is just incredibly bizarre. Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19alpha: enable GENERIC_PCI_IOMAP unconditionallyLinus Torvalds1-1/+1
With the previous commit (9caea0007601: "parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled") we can now enable GENERIC_PCI_IOMAP unconditionally on alpha, and if PCI is not enabled we will just get the nice empty helper functions that allow mixed-bus drivers to build. Example driver: the old 3com/3c59x.c driver works with either the PCI or the EISA version of the 3x59x card, but wouldn't build in an EISA-only configuration because of missing pci_iomap() and pci_iounmap() dummy wrappers. Most of the other PCI infrastructure just becomes empty wrappers even without GENERIC_PCI_IOMAP, and it's not obvious that the pci_iomap functionality shouldn't do the same, but this works. Cc: Ulrich Teichert <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabledHelge Deller3-11/+6
Linus noticed odd declaration rules for pci_iounmap() in iomap.h and pci_iomap.h, where it dependend on either NO_GENERIC_PCI_IOPORT_MAP or GENERIC_IOMAP when CONFIG_PCI was disabled. Testing on parisc seems to indicate that we need pci_iounmap() only when CONFIG_PCI is enabled, so the declaration of pci_iounmap() can be moved cleanly into pci_iomap.h in sync with the declarations of pci_iomap(). Link: https://lore.kernel.org/all/CAHk-=wjRrh98pZoQ+AzfWmsTZacWxTJKXZ9eKU2X_0+jM=O8nw@mail.gmail.com/ Signed-off-by: Helge Deller <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Fixes: 97a29d59fc22 ("[PARISC] fix compile break caused by iomap: make IOPORT/PCI mapping functions conditional") Cc: Arnd Bergmann <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Ulrich Teichert <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19Revert "drm/vc4: hdmi: Remove drm_encoder->crtc usage"Linus Torvalds1-27/+13
This reverts commit 27da370e0fb343a0baf308f503bb3e5dcdfe3362. Sudip Mukherjee reports that this broke pulseaudio with a NULL pointer dereference in vc4_hdmi_audio_prepare(), bisected it to this commit, and confirmed that a revert fixed the problem. Revert the problematic commit until fixed. Link: https://lore.kernel.org/all/CADVatmPB9-oKd=ypvj25UYysVo6EZhQ6bCM7EvztQBMyiZfAyw@mail.gmail.com/ Link: https://lore.kernel.org/all/CADVatmN5EpRshGEPS_JozbFQRXg5w_8LFB3OMP1Ai-ghxd3w4g@mail.gmail.com/ Reported-and-tested-by: Sudip Mukherjee <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Emma Anholt <[email protected]> Cc: Dave Airlie <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19Revert drm/vc4 hdmi runtime PM changesLinus Torvalds1-34/+10
This reverts commits 9984d6664ce9 ("drm/vc4: hdmi: Make sure the controller is powered in detect") 411efa18e4b0 ("drm/vc4: hdmi: Move the HSM clock enable to runtime_pm") as Michael Stapelberg reports that the new runtime PM changes cause his Raspberry Pi 3 to hang on boot, probably due to interactions with other changes in the DRM tree (because a bisect points to the merge in commit e058a84bfddc: "Merge tag 'drm-next-2021-07-01' of git://.../drm"). Revert these two commits until it's been resolved. Link: https://lore.kernel.org/all/871r5mp7h2.fsf@midna.i-did-not-set--mail-host-address--so-tickle-me/ Reported-and-tested-by: Michael Stapelberg <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Dave Stevenson <[email protected]> Cc: Dave Airlie <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-19selftests: net: af_unix: Fix makefile to use TEST_GEN_PROGSShuah Khan1-4/+1
Makefile uses TEST_PROGS instead of TEST_GEN_PROGS to define executables. TEST_PROGS is for shell scripts that need to be installed and run by the common lib.mk framework. The common framework doesn't touch TEST_PROGS when it does build and clean. As a result "make kselftest-clean" and "make clean" fail to remove executables. Run and install work because the common framework runs and installs TEST_PROGS. Build works because the Makefile defines "all" rule which is unnecessary if TEST_GEN_PROGS is used. Use TEST_GEN_PROGS so the common framework can handle build/run/ install/clean properly. Signed-off-by: Shuah Khan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net/mlx4_en: Resolve bad operstate valueLama Kayal2-19/+29
Any link state change that's done prior to net device registration isn't reflected on the state, thus the operational state is left obsolete, with 'UNKNOWN' status. To resolve the issue, query link state from FW upon open operations to ensure operational state is updated. Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC") Signed-off-by: Lama Kayal <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19selftests: net: af_unix: Fix incorrect args in test result msgShuah Khan1-2/+3
Fix the args to fprintf(). Splitting the message ends up passing incorrect arg for "sigurg %d" and an extra arg overall. The test result message ends up incorrect. test_unix_oob.c: In function ‘main’: test_unix_oob.c:274:43: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘char *’ [-Wformat=] 274 | fprintf(stderr, "Test 3 failed, sigurg %d len %d OOB %c ", | ~^ | | | int | %s 275 | "atmark %d\n", signal_recvd, len, oob, atmark); | ~~~~~~~~~~~~~ | | | char * test_unix_oob.c:274:19: warning: too many arguments for format [-Wformat-extra-args] 274 | fprintf(stderr, "Test 3 failed, sigurg %d len %d OOB %c ", Signed-off-by: Shuah Khan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: bgmac-bcma: handle deferred probe error due to mac-addressChristian Lamparter1-0/+2
Due to the inclusion of nvmem handling into the mac-address getter function of_get_mac_address() by commit d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") it is now possible to get a -EPROBE_DEFER return code. Which did cause bgmac to assign a random ethernet address. This exact issue happened on my Meraki MR32. The nvmem provider is an EEPROM (at24c64) which gets instantiated once the module driver is loaded... This happens once the filesystem becomes available. With this patch, bgmac_probe() will propagate the -EPROBE_DEFER error. Then the driver subsystem will reschedule the probe at a later time. Cc: Petr Štetiar <[email protected]> Cc: Michael Walle <[email protected]> Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Christian Lamparter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: dsa: tear down devlink port regions when tearing down the devlink port ↵Vladimir Oltean5-73/+81
on error Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal") decided it was fine to ignore errors on certain ports that fail to probe, and go on with the ports that do probe fine. Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port") noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get called, and devlink notices after a timeout of 3600 seconds and prints a WARN_ON. So it went ahead to unregister the devlink port. And because there exists an UNUSED port flavour, we actually re-register the devlink port as UNUSED. Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA") added devlink port regions, which are set up by the driver and not by DSA. When we trigger the devlink port deregistration and reregistration as unused, devlink now prints another WARN_ON, from here: devlink_port_unregister: WARN_ON(!list_empty(&devlink_port->region_list)); So the port still has regions, which makes sense, because they were set up by the driver, and the driver doesn't know we're unregistering the devlink port. Somebody needs to tear them down, and optionally (actually it would be nice, to be consistent) set them up again for the new devlink port. But DSA's layering stays in our way quite badly here. The options I've considered are: 1. Introduce a function in devlink to just change a port's type and flavour. No dice, devlink keeps a lot of state, it really wants the port to not be registered when you set its parameters, so changing anything can only be done by destroying what we currently have and recreating it. 2. Make DSA cache the parameters passed to dsa_devlink_port_region_create, and the region returned, keep those in a list, then when the devlink port unregister needs to take place, the existing devlink regions are destroyed by DSA, and we replay the creation of new regions using the cached parameters. Problem: mv88e6xxx keeps the region pointers in chip->ports[port].region, and these will remain stale after DSA frees them. There are many things DSA can do, but updating mv88e6xxx's private pointers is not one of them. 3. Just let the driver do it (i.e. introduce a very specific method called ds->ops->port_reinit_as_unused, which unregisters its devlink port devlink regions, then the old devlink port, then registers the new one, then the devlink port regions for it). While it does work, as opposed to the others, it's pretty horrible from an API perspective and we can do better. 4. Introduce a new pair of methods, ->port_setup and ->port_teardown, which in the case of mv88e6xxx must register and unregister the devlink port regions. Call these 2 methods when the port must be reinitialized as unused. Naturally, I went for the 4th approach. Fixes: 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: freescale: drop unneeded MODULE_ALIASKrzysztof Kozlowski1-1/+0
The MODULE_DEVICE_TABLE already creates proper alias for platform driver. Having another MODULE_ALIAS causes the alias to be duplicated. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19Merge branch 'ocelot-phylink-fixes'David S. Miller1-10/+0
Colin Foster says: ==================== ocelot phylink fixes When the ocelot driver was migrated to phylink, e6e12df625f2 ("net: mscc: ocelot: convert to phylink") there were two additional writes to registers that became stale. One write was to DEV_CLOCK_CFG and one was to ANA_PFC_PCF_CFG. Both of these writes referenced the variable "speed" which originally was set to OCELOT_SPEED_{10,100,1000,2500}. These macros expand to values of 3, 2, 1, or 0, respectively. After the update, the variable speed is set to SPEED_{10,100,1000,2500} which expand to 10, 100, 1000, and 2500. So invalid values were getting written to the two registers, which would lead to either a lack of functionality or undefined funcationality. Fixing these values was the intent of v1 of this patch set - submitted as "[PATCH v1 net] net: ethernet: mscc: ocelot: bug fix when writing MAC speed" During that review it was determined that both writes were actually unnecessary. DEV_CLOCK_CFG is a duplicate write, so can be removed entirely. This was accidentally submitted as as a new, lone patch titled "[PATCH v1 net] net: mscc: ocelot: remove buggy duplicate write to DEV_CLOCK_CFG". This is part of what is considered v2 of this patch set. Additionally, the write to ANA_PFC_PFC_CFG is also unnecessary. Priority flow contol is disabled, so configuring it is useless and should be removed. This was also submitted as a new, lone patch titled "[PATCH v1 net] net: mscc: ocelot: remove buggy and useless write to ANA_PFC_PFC_CFG". This is the rest of what is considered v2 of this patch set. v3 Identical to v2, but fixes the patch numbering to v3 and submitting the two changes as a patch set. v2 Note: I misunderstood and submitted two new "v1" patches instead of a single "v2" patch set. - Remove the buggy writes altogher ==================== Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: mscc: ocelot: remove buggy duplicate write to DEV_CLOCK_CFGColin Foster1-6/+0
When updating ocelot to use phylink, a second write to DEV_CLOCK_CFG was mistakenly left in. It used the variable "speed" which, previously, would would have been assigned a value of OCELOT_SPEED_1000. In phylink the variable is be SPEED_1000, which is invalid for the DEV_CLOCK_LINK_SPEED macro. Removing it as unnecessary and buggy. Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: mscc: ocelot: remove buggy and useless write to ANA_PFC_PFC_CFGColin Foster1-4/+0
A useless write to ANA_PFC_PFC_CFG was left in while refactoring ocelot to phylink. Since priority flow control is disabled, writing the speed has no effect. Further, it was using ethtool.h SPEED_ instead of OCELOT_SPEED_ macros, which are incorrectly offset for GENMASK. Lastly, for priority flow control to properly function, some scenarios would rely on the rate adaptation from the PCS while the MAC speed would be fixed. So it isn't used, and even if it was, neither "speed" nor "mac_speed" are necessarily the correct values to be used. Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19net: core: Correct the sock::sk_lock.owned lockdep annotationsThomas Gleixner2-14/+24
lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex. It is just lockdep wise equivalent. In fact it's an open coded trivial mutex implementation with some interesting features. sock::sk_lock.slock is a regular spinlock protecting the 'mutex' representation sock::sk_lock.owned which is a plain boolean. If 'owned' is true, then some other task holds the 'mutex', otherwise it is uncontended. As this locking construct is obviously endangered by lock ordering issues as any other locking primitive it got lockdep annotated via a dedicated dependency map sock::sk_lock.dep_map which has to be updated at the lock and unlock sites. lock_sock_nested() is a straight forward 'mutex' lock operation: might_sleep(); spin_lock_bh(sock::sk_lock.slock) while (!try_lock(sock::sk_lock.owned)) { spin_unlock_bh(sock::sk_lock.slock); wait_for_release(); spin_lock_bh(sock::sk_lock.slock); } The lockdep annotation for sock::sk_lock.owned is for unknown reasons _after_ the lock has been acquired, i.e. after the code block above and after releasing sock::sk_lock.slock, but inside the bottom halves disabled region: spin_unlock(sock::sk_lock.slock); mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_); local_bh_enable(); The placement after the unlock is obvious because otherwise the mutex_acquire() would nest into the spin lock held region. But that's from the lockdep perspective still the wrong place: 1) The mutex_acquire() is issued _after_ the successful acquisition which is pointless because in a dead lock scenario this point is never reached which means that if the deadlock is the first instance of exposing the wrong lock order lockdep does not have a chance to detect it. 2) It only works because lockdep is rather lax on the context from which the mutex_acquire() is issued. Acquiring a mutex inside a bottom halves and therefore non-preemptible region is obviously invalid, except for a trylock which is clearly not the case here. This 'works' stops working on RT enabled kernels where the bottom halves serialization is done via a local lock, which exposes this misplacement because the 'mutex' and the local lock nest the wrong way around and lockdep complains rightfully about a lock inversion. The placement is wrong since the initial commit a5b5bb9a053a ("[PATCH] lockdep: annotate sk_locks") which introduced this. Fix it by moving the mutex_acquire() in front of the actual lock acquisition, which is what the regular mutex_lock() operation does as well. lock_sock_fast() is not that straight forward. It looks at the first glance like a convoluted trylock operation: spin_lock_bh(sock::sk_lock.slock) if (!sock::sk_lock.owned) return false; while (!try_lock(sock::sk_lock.owned)) { spin_unlock_bh(sock::sk_lock.slock); wait_for_release(); spin_lock_bh(sock::sk_lock.slock); } spin_unlock(sock::sk_lock.slock); mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_); local_bh_enable(); return true; But that's not the case: lock_sock_fast() is an interesting optimization for short critical sections which can run with bottom halves disabled and sock::sk_lock.slock held. This allows to shortcut the 'mutex' operation in the non contended case by preventing other lockers to acquire sock::sk_lock.owned because they are blocked on sock::sk_lock.slock, which in turn avoids the overhead of doing the heavy processing in release_sock() including waking up wait queue waiters. In the contended case, i.e. when sock::sk_lock.owned == true the behavior is the same as lock_sock_nested(). Semantically this shortcut means, that the task acquired the 'mutex' even if it does not touch the sock::sk_lock.owned field in the non-contended case. Not telling lockdep about this shortcut acquisition is hiding potential lock ordering violations in the fast path. As a consequence the same reasoning as for the above lock_sock_nested() case vs. the placement of the lockdep annotation applies. The current placement of the lockdep annotation was just copied from the original lock_sock(), now renamed to lock_sock_nested(), implementation. Fix this by moving the mutex_acquire() in front of the actual lock acquisition and adding the corresponding mutex_release() into unlock_sock_fast(). Also document the fast path return case with a comment. Reported-by: Sebastian Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19docs: net: dsa: sja1105: fix reference to sja1105.txtAlejandro Concepcion-Rodriguez1-1/+1
The file sja1105.txt was converted to nxp,sja1105.yaml. Signed-off-by: Alejandro Concepcion-Rodriguez <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19igc: fix build errors for PTPRandy Dunlap1-0/+1
When IGC=y and PTP_1588_CLOCK=m, the ptp_*() interface family is not available to the igc driver. Make this driver depend on PTP_1588_CLOCK_OPTIONAL so that it will build without errors. Various igc commits have used ptp_*() functions without checking that PTP_1588_CLOCK is enabled. Fix all of these here. Fixes these build errors: ld: drivers/net/ethernet/intel/igc/igc_main.o: in function `igc_msix_other': igc_main.c:(.text+0x6494): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x64ef): undefined reference to `ptp_clock_event' ld: igc_main.c:(.text+0x6559): undefined reference to `ptp_clock_event' ld: drivers/net/ethernet/intel/igc/igc_ethtool.o: in function `igc_ethtool_get_ts_info': igc_ethtool.c:(.text+0xc7a): undefined reference to `ptp_clock_index' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_feature_enable_i225': igc_ptp.c:(.text+0x330): undefined reference to `ptp_find_pin' ld: igc_ptp.c:(.text+0x36f): undefined reference to `ptp_find_pin' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_init': igc_ptp.c:(.text+0x11cd): undefined reference to `ptp_clock_register' ld: drivers/net/ethernet/intel/igc/igc_ptp.o: in function `igc_ptp_stop': igc_ptp.c:(.text+0x12dd): undefined reference to `ptp_clock_unregister' ld: drivers/platform/x86/dell/dell-wmi-privacy.o: in function `dell_privacy_wmi_probe': Fixes: 64433e5bf40ab ("igc: Enable internal i225 PPS") Fixes: 60dbede0c4f3d ("igc: Add support for ethtool GET_TS_INFO command") Fixes: 87938851b6efb ("igc: enable auxiliary PHC functions for the i225") Fixes: 5f2958052c582 ("igc: Add basic skeleton for PTP") Signed-off-by: Randy Dunlap <[email protected]> Cc: Ederson de Souza <[email protected]> Cc: Tony Nguyen <[email protected]> Cc: Vinicius Costa Gomes <[email protected]> Cc: Jeff Kirsher <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jesse Brandeburg <[email protected]> Cc: [email protected] Acked-by: Vinicius Costa Gomes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19enetc: Fix uninitialized struct dim_sample field usageClaudiu Manoil1-1/+1
The only struct dim_sample member that does not get initialized by dim_update_sample() is comp_ctr. (There is special API to initialize comp_ctr: dim_update_sample_with_comps(), and it is currently used only for RDMA.) comp_ctr is used to compute curr_stats->cmps and curr_stats->cpe_ratio (see dim_calc_stats()) which in turn are consumed by the rdma_dim_*() API. Therefore, functionally, the net_dim*() API consumers are not affected. Nevertheless, fix the computation of statistics based on an uninitialized variable, even if the mentioned statistics are not used at the moment. Fixes: ae0e6a5d1627 ("enetc: Add adaptive interrupt coalescing") Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19enetc: Fix illegal access when reading affinity_hintClaudiu Manoil1-4/+1
irq_set_affinity_hit() stores a reference to the cpumask_t parameter in the irq descriptor, and that reference can be accessed later from irq_affinity_hint_proc_show(). Since the cpu_mask parameter passed to irq_set_affinity_hit() has only temporary storage (it's on the stack memory), later accesses to it are illegal. Thus reads from the corresponding procfs affinity_hint file can result in paging request oops. The issue is fixed by the get_cpu_mask() helper, which provides a permanent storage for the cpumask_t parameter. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19virtio-net: fix pages leaking when building skb in big modeJason Wang1-0/+4
We try to use build_skb() if we had sufficient tailroom. But we forget to release the unused pages chained via private in big mode which will leak pages. Fixing this by release the pages after building the skb in big mode. Cc: Xuan Zhuo <[email protected]> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Jason Wang <[email protected]> Reviewed-by: Xuan Zhuo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19xen-netback: correct success/error reporting for the SKB-with-fraglist caseJan Beulich1-1/+1
When re-entering the main loop of xenvif_tx_check_gop() a 2nd time, the special considerations for the head of the SKB no longer apply. Don't mistakenly report ERROR to the frontend for the first entry in the list, even if - from all I can tell - this shouldn't matter much as the overall transmit will need to be considered failed anyway. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Paul Durrant <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-19Merge branch 'dsa-shutdown'David S. Miller38-24/+543
Vladimir Oltean says: ==================== Make DSA switch drivers compatible with masters which unregister on shutdown Changes in v2: - fix build for b53_mmap - use unregister_netdevice_many It was reported by Lino here: https://lore.kernel.org/netdev/[email protected]/ that when the DSA master attempts to unregister its net_device on shutdown, DSA should prevent that operation from succeeding because it holds a reference to it. This hangs the shutdown process. This issue was essentially introduced in commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"). The present series patches all DSA drivers to handle that case, depending on whether those drivers were introduced before or after the offending commit, a different Fixes: tag is specified for them. The approach taken by this series solves the issue in essentially the same way as Lino's patches, except for three key differences: - this series takes a more minimal approach in what is done on shutdown, we do not attempt a full tree teardown as that is not strictly necessary. I might revisit this if there are compelling reasons to do otherwise - this series fixes the issues for all DSA drivers, not just KSZ9897 - this series works even if the ->remove driver method gets called for the same device too, not just ->shutdown. This is really possible to happen for SPI device drivers, and potentially possible for other bus device drivers too. ==================== Signed-off-by: David S. Miller <[email protected]>