aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-13powerpc/64s: system call scv tabort fix for corrupt irq soft-mask stateNicholas Piggin2-41/+30
If a system call is made with a transaction active, the kernel immediately aborts it and returns. scv system calls disable irqs even earlier in their interrupt handler, and tabort_syscall does not fix this up. This can result in irq soft-mask state being messed up on the next kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in the kernel exit handlers, or possibly worse. This can't easily be fixed in asm because at this point an async irq may have hit, which is soft-masked and marked pending. The pending interrupt has to be replayed before returning to userspace. The fix is to move the tabort_syscall code to C in the main syscall handler, and just skip the system call but otherwise return as usual, which will take care of the pending irqs. This also does a bunch of other things including possible signal delivery to the process, but the doomed transaction should still be aborted when it is eventually returned to. The sc system call path is changed to use the new C function as well to reduce code and path differences. This slows down how quickly system calls are aborted when called while a transaction is active, which could potentially impact TM performance. But making any system call is already bad for performance, and TM is on the way out, so go with simpler over faster. Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions") Reported-by: Eirik Fuller <[email protected]> Signed-off-by: Nicholas Piggin <[email protected]> [mpe: Use #ifdef rather than IS_ENABLED() to fix build error on 32-bit] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-09-13net: dsa: lantiq_gswip: Add 200ms assert delayAleksander Jan Bajkowski1-0/+6
The delay is especially needed by the xRX300 and xRX330 SoCs. Without this patch, some phys are sometimes not properly detected. The patch was tested on BT Home Hub 5A and D-Link DWR-966. Fixes: a09d042b0862 ("net: dsa: lantiq: allow to use all GPHYs on xRX300 and xRX330") Signed-off-by: Aleksander Jan Bajkowski <[email protected]> Signed-off-by: Martin Blumenstingl <[email protected]> Acked-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13ipv6: delay fib6_sernum increase in fib6_addzhang kai1-2/+1
only increase fib6_sernum in net namespace after add fib6_info successfully. Signed-off-by: zhang kai <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13tipc: increase timeout in tipc_sk_enqueue()Hoang Le1-1/+1
In tipc_sk_enqueue() we use hardcoded 2 jiffies to extract socket buffer from generic queue to particular socket. The 2 jiffies is too short in case there are other high priority tasks get CPU cycles for multiple jiffies update. As result, no buffer could be enqueued to particular socket. To solve this, we switch to use constant timeout 20msecs. Then, the function will be expired between 2 jiffies (CONFIG_100HZ) and 20 jiffies (CONFIG_1000HZ). Fixes: c637c1035534 ("tipc: resolve race problem at unicast message reception") Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Le <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13udp_tunnel: Fix udp_tunnel_nic work-queue typeAya Levin1-1/+1
Turn udp_tunnel_nic work-queue to an ordered work-queue. This queue holds the UDP-tunnel configuration commands of the different netdevs. When the netdevs are functions of the same NIC the order of execution may be crucial. Problem example: NIC with 2 PFs, both PFs declare offload quota of up to 3 UDP-ports. $ifconfig eth2 1.1.1.1/16 up $ip link add eth2_19503 type vxlan id 5049 remote 1.1.1.2 dev eth2 dstport 19053 $ip link set dev eth2_19503 up $ip link add eth2_19504 type vxlan id 5049 remote 1.1.1.3 dev eth2 dstport 19054 $ip link set dev eth2_19504 up $ip link add eth2_19505 type vxlan id 5049 remote 1.1.1.4 dev eth2 dstport 19055 $ip link set dev eth2_19505 up $ip link add eth2_19506 type vxlan id 5049 remote 1.1.1.5 dev eth2 dstport 19056 $ip link set dev eth2_19506 up NIC RX port offload infrastructure offloads the first 3 UDP-ports (on all devices which sets NETIF_F_RX_UDP_TUNNEL_PORT feature) and not UDP-port 19056. So both PFs gets this offload configuration. $ip link set dev eth2_19504 down This triggers udp-tunnel-core to remove the UDP-port 19504 from offload-ports-list and offload UDP-port 19056 instead. In this scenario it is important that the UDP-port of 19504 will be removed from both PFs before trying to add UDP-port 19056. The NIC can stop offloading a UDP-port only when all references are removed. Otherwise the NIC may report exceeding of the offload quota. Fixes: cc4e3835eff4 ("udp_tunnel: add central NIC RX port offload infrastructure") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"Yajun Deng4-10/+7
This reverts commit 919483096bfe75dda338e98d56da91a263746a0a. There is only when ip_options_get() return zero need to free. It already called kfree() when return error. Fixes: 919483096bfe ("ipv4: fix memory leaks in ip_cmsg_send() callers") Signed-off-by: Yajun Deng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13Merge branch 'bnxt_en-fixes'David S. Miller1-4/+29
Michael Chan says: ==================== bnxt_en: Bug fixes. The first patch fixes an error recovery regression just introduced about a week ago. The other two patches fix issues related to freeing rings in the bnxt_close() path under error conditions. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-09-13bnxt_en: Clean up completion ring page arrays completelyMichael Chan1-0/+8
We recently changed the completion ring page arrays to be dynamically allocated to better support the expanded range of ring depths. The cleanup path for this was not quite complete. It might cause the shutdown path to crash if we need to abort before the completion ring arrays have been allocated and initialized. Fix it by initializing the ring_mem->pg_arr to NULL after freeing the completion ring page array. Add a check in bnxt_free_ring() to skip referencing the rmem->pg_arr if it is NULL. Fixes: 03c7448790b8 ("bnxt_en: Don't use static arrays for completion ring pages") Reviewed-by: Andy Gospodarek <[email protected]> Reviewed-by: Edwin Peer <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13bnxt_en: make bnxt_free_skbs() safe to call after bnxt_free_mem()Edwin Peer1-0/+13
The call to bnxt_free_mem(..., false) in the bnxt_half_open_nic() error path will deallocate ring descriptor memory via bnxt_free_?x_rings(), but because irq_re_init is false, the ring info itself is not freed. To simplify error paths, deallocation functions have generally been written to be safe when called on unallocated memory. It should always be safe to call dev_close(), which calls bnxt_free_skbs() a second time, even in this semi- allocated ring state. Calling bnxt_free_skbs() a second time with the rings already freed will cause NULL pointer dereference. Fix it by checking the rings are valid before proceeding in bnxt_free_tx_skbs() and bnxt_free_one_rx_ring_skbs(). Fixes: 975bc99a4a39 ("bnxt_en: Refactor bnxt_free_rx_skbs().") Signed-off-by: Edwin Peer <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13bnxt_en: Fix error recovery regressionMichael Chan1-4/+8
The recent patch has introduced a regression by not reading the reset count in the ERROR_RECOVERY async event handler. We may have just gone through a reset and the reset count has just incremented. If we don't update the reset count in the ERROR_RECOVERY event handler, the health check timer will see that the reset count has changed and will initiate an unintended reset. Restore the unconditional update of the reset count in bnxt_async_event_process() if error recovery watchdog is enabled. Also, update the reset count at the end of the reset sequence to make it even more robust. Fixes: 1b2b91831983 ("bnxt_en: Fix possible unintended driver initiated error recovery") Reviewed-by: Edwin Peer <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-13m68k: mvme: Remove overdue #warnings in RTC handlingGeert Uytterhoeven2-2/+6
The warnings were introduced when converting the MVME147 and MVME16x RTC handling from gettod to hwclk. Replace the #warning by a comment, and return an error to inform the upper layer that writing to the RTC is not yet supported. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-09-13m68k: Double cast io functions to unsigned longGuenter Roeck1-10/+10
m68k builds fail widely with errors such as arch/m68k/include/asm/raw_io.h:20:19: error: cast to pointer from integer of different size arch/m68k/include/asm/raw_io.h:30:32: error: cast to pointer from integer of different size [-Werror=int-to-p On m68k, io functions are defined as macros. The problem is seen if the macro parameter variable size differs from the size of a pointer. Cast the parameter of all io macros to unsigned long before casting it to a pointer to fix the problem. Signed-off-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]>
2021-09-13platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) ↵Hans de Goede1-4/+13
tablet Add info for getting the firmware directly from the UEFI for the Chuwi Hi10 Plus (CWI527), so that the user does not need to manually install the firmware in /lib/firmware/silead. This change will make the touchscreen on these devices work OOTB, without requiring any manual setup. Also tweak the min and width/height values a bit for more accurate position reporting. Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-09-13platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tabletHans de Goede1-0/+37
Add touchscreen info for the Chuwi HiBook (CWI514) tablet. This includes info for getting the firmware directly from the UEFI, so that the user does not need to manually install the firmware in /lib/firmware/silead. This change will make the touchscreen on these devices work OOTB, without requiring any manual setup. Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-09-13lg-laptop: Correctly handle dmi_get_system_info() returning NULLMatan Ziv-Av1-1/+1
The laptop model is identified by parsing the product name. If no product name is available, do not try to parse it. Default model is 2017. Signed-off-by: Matan Ziv-Av <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2021-09-13platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()Andy Shevchenko1-2/+1
ACPI_PTR() is more harmful than helpful. For example, in this case if CONFIG_ACPI=n, the ID table left unused which is not what we want. Instead of adding ifdeffery here and there, drop ACPI_PTR() and unused acpi.h. Fixes: fdca4f16f57d ("platform:x86: add Intel P-Unit mailbox IPC driver") Reported-by: kernel test robot <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2021-09-13afs: Fix updating of i_blocks on file/dir extensionDavid Howells4-13/+13
When an afs file or directory is modified locally such that the total file size is extended, i_blocks needs to be recalculated too. Fix this by making afs_write_end() and afs_edit_dir_add() call afs_set_i_size() rather than setting inode->i_size directly as that also recalculates inode->i_blocks. This can be tested by creating and writing into directories and files and then examining them with du. Without this change, directories show a 4 blocks (they start out at 2048 bytes) and files show 0 blocks; with this change, they should show a number of blocks proportional to the file size rounded up to 1024. Fixes: 31143d5d515e ("AFS: implement basic file write support") Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163113612442.352844.11162345591911691150.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS serverDavid Howells5-12/+49
AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and Linux's afs client switches between them when talking to a non-YFS server if the read size, the file position or the sum of the two have the upper 32 bits set of the 64-bit value. This is a problem, however, since the file position and length fields of FS.FetchData are *signed* 32-bit values. Fix this by capturing the capability bits obtained from the fileserver when it's sent an FS.GetCapabilities RPC, rather than just discarding them, and then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be used to decide whether to use FS.FetchData or FS.FetchData64 - and also FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to switch on the parameter values. This capabilities flag could also be used to limit the maximum size of the file, but all servers must be checked for that. Note that the issue does not exist with FS.StoreData - that uses *unsigned* 32-bit values. It's also not a problem with Auristor servers as its YFS.FetchData64 op uses unsigned 64-bit values. This can be tested by cloning a git repo through an OpenAFS client to an OpenAFS server and then doing "git status" on it from a Linux afs client[1]. Provided the clone has a pack file that's in the 2G-4G range, the git status will show errors like: error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index This can be observed in the server's FileLog with something like the following appearing: Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001 Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154 Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866 ... Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5 Note the file position of 18446744071815340032. This is the requested file position sign-extended. Fixes: b9b1f8d5930a ("AFS: write support fixes") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Marc Dionne <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1] Link: https://lore.kernel.org/r/[email protected]/
2021-09-13afs: Try to avoid taking RCU read lock when checking vnode validityDavid Howells5-47/+54
Try to avoid taking the RCU read lock when checking the validity of a vnode's callback state. The only thing it's needed for is to pin the parent volume's server list whilst we search it to find the record of the server we're currently using to see if it has been reinitialised (ie. it sent us a CB.InitCallBackState* RPC). Do this by the following means: (1) Keep an additional per-cell counter (fs_s_break) that's incremented each time any of the fileservers in the cell reinitialises. Since the new counter can be accessed without RCU from the vnode, we can check that first - and only if it differs, get the RCU read lock and check the volume's server list. (2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now indicates whether the callback promise is still expected to be present on the server. This does the checks as described in (1). (3) Restructure afs_check_validity() to take account of the change in (2). We can also get rid of the valid variable and just use the need_clear variable with the addition of the afs_cb_break_no_promise reason. (4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break and vnode->cb_s_break when it doesn't have cb_lock exclusively locked. Move the change to vnode->cb_v_break to __afs_break_callback(). Delegate the change to vnode->cb_s_break to afs_select_fileserver() and set vnode->cb_fs_s_break there also. (5) afs_validate() no longer needs to get the RCU read lock around its call to afs_check_validity() - and can skip the call entirely if we don't have a promise. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111669583.283156.1397603105683094563.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix mmap coherency vs 3rd-party changesDavid Howells7-3/+120
Fix the coherency management of mmap'd data such that 3rd-party changes become visible as soon as possible after the callback notification is delivered by the fileserver. This is done by the following means: (1) When we break a callback on a vnode specified by the CB.CallBack call from the server, we queue a work item (vnode->cb_work) to go and clobber all the PTEs mapping to that inode. This causes the CPU to trip through the ->map_pages() and ->page_mkwrite() handlers if userspace attempts to access the page(s) again. (Ideally, this would be done in the service handler for CB.CallBack, but the server is waiting for our reply before considering, and we have a list of vnodes, all of which need breaking - and the process of getting the mmap_lock and stripping the PTEs on all CPUs could be quite slow.) (2) Call afs_validate() from the ->map_pages() handler to check to see if the file has changed and to get a new callback promise from the server. Also handle the fileserver telling us that it's dropping all callbacks, possibly after it's been restarted by sending us a CB.InitCallBackState* call by the following means: (3) Maintain a per-cell list of afs files that are currently mmap'd (cell->fs_open_mmaps). (4) Add a work item to each server that is invoked if there are any open mmaps when CB.InitCallBackState happens. This work item goes through the aforementioned list and invokes the vnode->cb_work work item for each one that is currently using this server. This causes the PTEs to be cleared, causing ->map_pages() or ->page_mkwrite() to be called again, thereby calling afs_validate() again. I've chosen to simply strip the PTEs at the point of notification reception rather than invalidate all the pages as well because (a) it's faster, (b) we may get a notification for other reasons than the data being altered (in which case we don't want to clobber the pagecache) and (c) we need to ask the server to find out - and I don't want to wait for the reply before holding up userspace. This was tested using the attached test program: #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/mman.h> int main(int argc, char *argv[]) { size_t size = getpagesize(); unsigned char *p; bool mod = (argc == 3); int fd; if (argc != 2 && argc != 3) { fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]); exit(2); } fd = open(argv[1], mod ? O_RDWR : O_RDONLY); if (fd < 0) { perror(argv[1]); exit(1); } p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { perror("mmap"); exit(1); } for (;;) { if (mod) { p[0]++; msync(p, size, MS_ASYNC); fsync(fd); } printf("%02x", p[0]); fflush(stdout); sleep(1); } } It runs in two modes: in one mode, it mmaps a file, then sits in a loop reading the first byte, printing it and sleeping for a second; in the second mode it mmaps a file, then sits in a loop incrementing the first byte and flushing, then printing and sleeping. Two instances of this program can be run on different machines, one doing the reading and one doing the writing. The reader should see the changes made by the writer, but without this patch, they aren't because validity checking is being done lazily - only on entry to the filesystem. Testing the InitCallBackState change is more complicated. The server has to be taken offline, the saved callback state file removed and then the server restarted whilst the reading-mode program continues to run. The client machine then has to poke the server to trigger the InitCallBackState call. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
2021-09-13afs: Fix incorrect triggering of sillyrename on 3rd-party invalidationDavid Howells1-39/+7
The AFS filesystem is currently triggering the silly-rename cleanup from afs_d_revalidate() when it sees that a dentry has been changed by a third party[1]. It should not be doing this as the cleanup includes deleting the silly-rename target file on iput. Fix this by removing the places in the d_revalidate handling that validate anything other than the directory and the dirent. It probably should not be looking to validate the target inode of the dentry also. This includes removing the point in afs_d_revalidate() where the inode that a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED). We don't know it got deleted. It could have been renamed or it could have hard links remaining. This was reproduced by cloning a git repo onto an afs volume on one machine, switching to another machine and doing "git status", then switching back to the first and doing "git status". The second status would show weird output due to ".git/index" getting deleted by the above mentioned mechanism. A simpler way to do it is to do: machine 1: touch a machine 2: touch b; mv -f b a machine 1: stat a on an afs volume. The bug shows up as the stat failing with ENOENT and the file server log showing that machine 1 deleted "a". Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename") Reported-by: Markus Suvanto <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c4 [1] Link: https://lore.kernel.org/r/163111668100.283156.3851669884664475428.stgit@warthog.procyon.org.uk/
2021-09-13afs: Add missing vnode validation checksDavid Howells3-3/+41
afs_d_revalidate() should only be validating the directory entry it is given and the directory to which that belongs; it shouldn't be validating the inode/vnode to which that dentry points. Besides, validation need to be done even if we don't call afs_d_revalidate() - which might be the case if we're starting from a file descriptor. In order for afs_d_revalidate() to be fixed, validation points must be added in some other places. Certain directory operations, such as afs_unlink(), already check this, but not all and not all file operations either. Note that the validation of a vnode not only checks to see if the attributes we have are correct, but also gets a promise from the server to notify us if that file gets changed by a third party. Add the following checks: - Check the vnode we're going to make a hard link to. - Check the vnode we're going to move/rename. - Check the vnode we're going to read from. - Check the vnode we're going to write to. - Check the vnode we're going to sync. - Check the vnode we're going to make a mapped page writable for. Some of these aren't strictly necessary as we're going to perform a server operation that might get the attributes anyway from which we can determine if something changed - though it might not get us a callback promise. Signed-off-by: David Howells <[email protected]> Tested-by: Markus Suvanto <[email protected]> cc: [email protected] Link: https://lore.kernel.org/r/163111667354.283156.12720698333342917516.stgit@warthog.procyon.org.uk/
2021-09-12blk-mq: avoid to iterate over stale requestMing Lei1-1/+1
blk-mq can't run allocating driver tag and updating ->rqs[tag] atomically, meantime blk-mq doesn't clear ->rqs[tag] after the driver tag is released. So there is chance to iterating over one stale request just after the tag is allocated and before updating ->rqs[tag]. scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi in-flight requests after scsi host is blocked, so no new scsi command can be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run by blk-mq core. One request is marked as SCMD_STATE_INFLIGHT, but this request may have been kept in another slot of ->rqs[], meantime the slot can be allocated out but ->rqs[] isn't updated yet. Then this in-flight request is counted twice as SCMD_STATE_INFLIGHT. This way causes trouble in handling scsi error. Fixes the issue by not iterating over stale request. Cc: [email protected] Cc: "Martin K. Petersen" <[email protected]> Reported-by: luojiaxing <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-09-12io-wq: fix potential race of acct->nr_workersHao Xu1-2/+1
Given max_worker is 1, and we currently have 1 running and it is exiting. There may be race like: io_wqe_enqueue worker1 no work there and timeout unlock(wqe->lock) ->insert work -->io_worker_exit lock(wqe->lock) ->if(!nr_workers) //it's still 1 unlock(wqe->lock) goto run_cancel lock(wqe->lock) nr_workers-- ->dec_running ->worker creation fails unlock(wqe->lock) We enqueued one work but there is no workers, causes hung. Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12io-wq: code clean of io_wqe_create_worker()Hao Xu1-12/+7
Remove do_create to save a local variable. Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12io_uring: ensure symmetry in handling iter types in loop_rw_iter()Jens Axboe1-3/+6
When setting up the next segment, we check what type the iter is and handle it accordingly. However, when incrementing and processed amount we do not, and both iter advance and addr/len are adjusted, regardless of type. Split the increment side just like we do on the setup side. Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers") Cc: [email protected] Reported-by: Valentina Palmiotti <[email protected]> Reviewed-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-09-12Linux 5.15-rc1Linus Torvalds1-2/+2
2021-09-12Merge tag 'perf-tools-for-v5.15-2021-09-11' of ↵Linus Torvalds36-175/+1147
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Add missing fields and remove some duplicate fields when printing a perf_event_attr. - Fix hybrid config terms list corruption. - Update kernel header copies, some resulted in new kernel features being automagically added to 'perf trace' syscall/tracepoint argument id->string translators. - Add a file generated during the documentation build to .gitignore. - Add an option to build without libbfd, as some distros, like Debian consider its ABI unstable. - Add support to print a textual representation of IBS raw sample data in 'perf report'. - Fix bpf 'perf test' sample mismatch reporting - Fix passing arguments to stackcollapse report in a 'perf script' python script. - Allow build-id with trailing zeros. - Look for ImageBase in PE file to compute .text offset. * tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits) tools headers UAPI: Update tools's copy of drm.h headers tools headers UAPI: Sync drm/i915_drm.h with the kernel sources tools headers UAPI: Sync linux/fs.h with the kernel sources tools headers UAPI: Sync linux/in.h copy with the kernel sources perf tools: Add an option to build without libbfd perf tools: Allow build-id with trailing zeros perf tools: Fix hybrid config terms list corruption perf tools: Factor out copy_config_terms() and free_config_terms() perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields perf tools: Ignore Documentation dependency file perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions tools include UAPI: Update linux/mount.h copy perf beauty: Cover more flags in the move_mount syscall argument beautifier tools headers UAPI: Sync linux/prctl.h with the kernel sources tools include UAPI: Sync sound/asound.h copy with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources perf report: Add support to print a textual representation of IBS raw sample data perf report: Add tools/arch/x86/include/asm/amd-ibs.h perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings ...
2021-09-12Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of ↵Linus Torvalds4-9/+26
git://github.com/ojeda/linux Pull compiler attributes updates from Miguel Ojeda: - Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver) - Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers) - Move __compiletime_{error|warning} (Nick Desaulniers) * tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux: compiler_attributes.h: move __compiletime_{error|warning} MAINTAINERS: add Nick as Reviewer for compiler_attributes.h Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4
2021-09-12Merge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linuxLinus Torvalds5-21/+36
Pull auxdisplay updates from Miguel Ojeda: "An assortment of improvements for auxdisplay: - Replace symbolic permissions with octal permissions (Jinchao Wang) - ks0108: Switch to use module_parport_driver() (Andy Shevchenko) - charlcd: Drop unneeded initializers and switch to C99 style (Andy Shevchenko) - hd44780: Fix oops on module unloading (Lars Poeschel) - Add I2C gpio expander example (Ralf Schlatterbeck)" * tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux: auxdisplay: Replace symbolic permissions with octal permissions auxdisplay: ks0108: Switch to use module_parport_driver() auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style auxdisplay: hd44780: Fix oops on module unloading auxdisplay: Add I2C gpio expander example
2021-09-12Merge tag 'smp-urgent-2021-09-12' of ↵Linus Torvalds8-165/+598
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "Updates for the SMP and CPU hotplug: - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the original hotplug code and now causing trouble with the ARM64 cache topology setup due to the pointless SMP function call. It's not longer required as the hotplug callbacks are guaranteed to be invoked on the upcoming CPU. - Remove the deprecated and now unused CPU hotplug functions - Rewrite the CPU hotplug API documentation" * tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation: core-api/cpuhotplug: Rewrite the API section cpu/hotplug: Remove deprecated CPU-hotplug functions. thermal: Replace deprecated CPU-hotplug functions. drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
2021-09-12Merge tag 'char-misc-5.15-rc1-lkdtm' of ↵Linus Torvalds2-11/+27
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc driver fix from Greg KH: "Here is a single patch for 5.15-rc1, for the lkdtm misc driver. It resolves a build issue that many people were hitting with your current tree, and Kees and others felt would be good to get merged before -rc1 comes out, to prevent them from having to constantly hit it as many development trees restart on -rc1, not older -rc releases. It has NOT been in linux-next, but has passed 0-day testing and looks 'obviously correct' when reviewing it locally :)" * tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: lkdtm: Use init_uts_ns.name instead of macros
2021-09-12Merge tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmiLinus Torvalds1-12/+11
Pull IPMI updates from Corey Minyard: "A couple of very minor fixes for style and rate limiting. Nothing big, but probably needs to go in" * tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi: char: ipmi: use DEVICE_ATTR helper macro ipmi: rate limit ipmi smi_event failure message
2021-09-12Merge tag 'sched_urgent_for_v5.15_rc1' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Make sure the idle timer expires in hardirq context, on PREEMPT_RT - Make sure the run-queue balance callback is invoked only on the outgoing CPU * tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Prevent balance_push() on remote runqueues sched/idle: Make the idle timer expire in hard interrupt context
2021-09-12Merge tag 'locking_urgent_for_v5.15_rc1' of ↵Linus Torvalds4-94/+120
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Borislav Petkov: - Fix the futex PI requeue machinery to not return to userspace in inconsistent state - Avoid a potential null pointer dereference in the ww_mutex deadlock check - Other smaller cleanups and optimizations * tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix ww_mutex deadlock check futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() futex: Avoid redundant task lookup futex: Clarify comment for requeue_pi_wake_futex() futex: Prevent inconsistent state and exit race futex: Return error code instead of assigning it without effect locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT
2021-09-12Merge tag 'timers_urgent_for_v5.15_rc1' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Handle negative second values properly when converting a timespec64 to nanoseconds. * tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Handle negative seconds correctly in timespec64_to_ns()
2021-09-12Merge branch 'misc.namei' of ↵Linus Torvalds2-59/+58
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull namei updates from Al Viro: "Clearing fallout from mkdirat in io_uring series. The fix in the kern_path_locked() patch plus associated cleanups" * 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: putname(): IS_ERR_OR_NULL() is wrong here namei: Standardize callers of filename_create() namei: Standardize callers of filename_lookup() rename __filename_parentat() to filename_parentat() namei: Fix use after free in kern_path_locked
2021-09-12Merge tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds12-21/+37
Pull smbfs updates from Steve French: "cifs/smb3 updates: - DFS reconnect fix - begin creating common headers for server and client - rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb (smb3 or smbfs is more accurate, as the very old cifs dialect has long been superseded by smb3 dialects). In the future we can rename the fs/cifs directory to fs/smbfs. This does not include the set of multichannel fixes nor the two deferred close fixes (they are still being reviewed and tested)" * tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: properly invalidate cached root handle when closing it cifs: move SMB FSCTL definitions to common code cifs: rename cifs_common to smbfs_common cifs: update FSCTL definitions
2021-09-12net: mana: Prefer struct_size over open coded arithmeticLen Baker1-3/+1
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the struct_size() helper to do the arithmetic instead of the argument "size + count * size" in the kzalloc() function. [1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <[email protected]> Reviewed-by: Haiyang Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-12selftest: net: fix typo in altname testAndrea Claudi1-1/+1
If altname deletion of the short alternative name fails, the error message printed is: "Failed to add short alternative name". This is obviously a typo, as we are testing altname deletion. Fix this using a proper error message. Fixes: f95e6c9c4617 ("selftest: net: add alternative names test") Signed-off-by: Andrea Claudi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-12net: dsa: qca8k: fix kernel panic with legacy mdio mappingAnsuel Smith1-8/+22
When the mdio legacy mapping is used the mii_bus priv registered by DSA refer to the dsa switch struct instead of the qca8k_priv struct and causes a kernel panic. Create dedicated function when the internal dedicated mdio driver is used to properly handle the 2 different implementation. Fixes: 759bafb8a322 ("net: dsa: qca8k: add support for internal phy and internal mdio") Signed-off-by: Ansuel Smith <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds41-325/+4127
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-11Merge tag 'riscv-for-linus-5.15-mw1' of ↵Linus Torvalds10-15/+22
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - A pair of defconfig additions, for NVMe and the EFI filesystem localization options. - A larger address space for stack randomization. - A cleanup to our install rules. - A DTS update for the Microchip Icicle board, to fix the serial console. - Support for build-time table sorting, which allows us to have __ex_table read-only. * tag 'riscv-for-linus-5.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Move EXCEPTION_TABLE to RO_DATA segment riscv: Enable BUILDTIME_TABLE_SORT riscv: dts: microchip: mpfs-icicle: Fix serial console riscv: move the (z)install rules to arch/riscv/Makefile riscv: Improve stack randomisation on RV64 riscv: defconfig: enable NLS_CODEPAGE_437, NLS_ISO8859_1 riscv: defconfig: enable BLK_DEV_NVME
2021-09-11Merge branch 'for-5.15' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux Pull coccinelle updates from Julia Lawall: "These changes update some existing semantic patches with respect to some recent changes in the kernel. Specifically, the change to kvmalloc.cocci searches for kfree_sensitive rather than kzfree, and the change to use_after_iter.cocci adds list_entry_is_head as a valid use of a list iterator index variable after the end of the loop" * 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux: scripts: coccinelle: allow list_entry_is_head() to use pos coccinelle: api: rename kzfree to kfree_sensitive
2021-09-11tools headers UAPI: Update tools's copy of drm.h headersArnaldo Carvalho de Melo1-2/+12
Picking the changes from: 17ce9c61c71cbc0d ("drm: document DRM_IOCTL_MODE_RMFB") Doesn't result in any tooling changes: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Cc: Simon Ser <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-11tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo1-81/+417
To pick the changes in: b65a9489730a2494 ("drm/i915/userptr: Probe existence of backing struct pages upon creation") ee242ca704d38699 ("drm/i915/guc: Implement GuC priority management") 81340cf3bddded4f ("drm/i915/uapi: reject set_domain for discrete") 7961c5b60f23dff5 ("drm/i915: Add TTM offset argument to mmap.") aef7b67a79564f6c ("drm/i915/uapi: convert drm_i915_gem_userptr to kernel doc") e7737b67ab46ee0e ("drm/i915/uapi: reject caching ioctls for discrete") 3aa8c57fe25a9247 ("drm/i915/uapi: convert drm_i915_gem_set_domain to kernel doc") 289f5a72009b8f67 ("drm/i915/uapi: convert drm_i915_gem_caching to kernel doc") 4a766ae40ec83301 ("drm/i915: Drop the CONTEXT_CLONE API (v2)") 6ff6d61dd2a943bd ("drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP") fe4751c3d513ff4f ("drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE") 577729533cdc4e37 ("drm/i915: Document the Virtual Engine uAPI") c649432e86ca677d ("drm/i915: Fix busy ioctl commentary") That doesn't result in any changes to tooling as no new ioctl were added (at least not perceived by tools/perf/trace/beauty/drm_ioctl.sh). Addressing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jason Ekstrand <[email protected]> Cc: John Harrison <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Matthew Brost <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-11tools headers UAPI: Sync linux/fs.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+1
To pick the change in: 7957d93bf32bc211 ("block: add ioctl to read the disk sequence number") It adds a new ioctl, but we are still not using that to generate tables for 'perf trace', so no changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h' diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h Cc: Jens Axboe <[email protected]> Cc: Matteo Croce <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-11tools headers UAPI: Sync linux/in.h copy with the kernel sourcesArnaldo Carvalho de Melo1-10/+32
To pick the changes in: db243b796439c0ca ("net/ipv4/ipv6: Replace one-element arraya with flexible-array members") 2d3e5caf96b9449a ("net/ipv4: Replace one-element array with flexible-array member") That don't result in any change in tooling, the structs changed remains with the same layout. This addresses this build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h' diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h Cc: David S. Miller <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-11perf tools: Add an option to build without libbfdIan Rogers1-22/+25
Some distributions, like debian, don't link perf with libbfd. Add a build flag to make this configuration buildable and testable. This was inspired by: https://lore.kernel.org/linux-perf-users/[email protected]/T/#u Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: tony garnock-jones <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-09-11perf tools: Allow build-id with trailing zerosNamhyung Kim1-0/+10
Currently perf saves a build-id with size but old versions assumes the size of 20. In case the build-id is less than 20 (like for MD5), it'd fill the rest with 0s. I saw a problem when old version of perf record saved a binary in the build-id cache and new version of perf reads the data. The symbols should be read from the build-id cache (as the path no longer has the same binary) but it failed due to mismatch in the build-id. symsrc__init: build id mismatch for /home/namhyung/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf. The build-id event in the data has 20 byte build-ids, but it saw a different size (16) when it reads the build-id of the elf file in the build-id cache. $ readelf -n ~/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 53e4c2f42a4c61a2d632d92a72afa08f Let's fix this by allowing trailing zeros if the size is different. Fixes: 39be8d0115b321ed ("perf tools: Pass build_id object to dso__build_id_equal()") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>