aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-18x86: don't use REP_GOOD or ERMS for user memory copiesLinus Torvalds2-54/+12
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: don't use REP_GOOD or ERMS for small memory clearingLinus Torvalds1-36/+11
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18x86: don't use REP_GOOD or ERMS for small memory copiesLinus Torvalds1-24/+10
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page copying and clearing). Signed-off-by: Linus Torvalds <[email protected]>
2023-04-18nilfs2: initialize unused bytes in segment summary blocksRyusuke Konishi1-0/+20
Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field"). This is because the unused bytes at the end of each block in segment summaries are not initialized. So this fixes the issue by padding the unused bytes with null bytes. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Tested-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pagesMel Gorman1-0/+3
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is taking an excessive amount of time for large amounts of memory. Further testing allocating huge pages that the cost is linear i.e. if allocating 1G pages in batches of 10 then the time to allocate nr_hugepages from 10->20->30->etc increases linearly even though 10 pages are allocated at each step. Profiles indicated that much of the time is spent checking the validity within already existing huge pages and then attempting a migration that fails after isolating the range, draining pages and a whole lot of other useless work. Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") removed two checks, one which ignored huge pages for contiguous allocations as huge pages can sometimes migrate. While there may be value on migrating a 2M page to satisfy a 1G allocation, it's potentially expensive if the 1G allocation fails and it's pointless to try moving a 1G page for a new 1G allocation or scan the tail pages for valid PFNs. Reintroduce the PageHuge check and assume any contiguous region with hugetlbfs pages is unsuitable for a new 1G allocation. The hpagealloc test allocates huge pages in batches and reports the average latency per page over time. This test happens just after boot when fragmentation is not an issue. Units are in milliseconds. hpagealloc 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla hugeallocrevert-v1r1 hugeallocsimple-v1r2 Min Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) 1st-qrtle Latency 356.61 ( 0.00%) 5.34 ( 98.50%) 19.85 ( 94.43%) 2nd-qrtle Latency 697.26 ( 0.00%) 5.47 ( 99.22%) 20.44 ( 97.07%) 3rd-qrtle Latency 972.94 ( 0.00%) 5.50 ( 99.43%) 20.81 ( 97.86%) Max-1 Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) Max-5 Latency 82.14 ( 0.00%) 5.11 ( 93.78%) 19.31 ( 76.49%) Max-10 Latency 150.54 ( 0.00%) 5.20 ( 96.55%) 19.43 ( 87.09%) Max-90 Latency 1164.45 ( 0.00%) 5.53 ( 99.52%) 20.97 ( 98.20%) Max-95 Latency 1223.06 ( 0.00%) 5.55 ( 99.55%) 21.06 ( 98.28%) Max-99 Latency 1278.67 ( 0.00%) 5.57 ( 99.56%) 22.56 ( 98.24%) Max Latency 1310.90 ( 0.00%) 8.06 ( 99.39%) 26.62 ( 97.97%) Amean Latency 678.36 ( 0.00%) 5.44 * 99.20%* 20.44 * 96.99%* 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla revert-v1 hugeallocfix-v2 Duration User 0.28 0.27 0.30 Duration System 808.66 17.77 35.99 Duration Elapsed 830.87 18.08 36.33 The vanilla kernel is poor, taking up to 1.3 second to allocate a huge page and almost 10 minutes in total to run the test. Reverting the problematic commit reduces it to 8ms at worst and the patch takes 26ms. This patch fixes the main issue with skipping huge pages but leaves the page_count() out because a page with an elevated count potentially can migrate. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022 Link: https://lkml.kernel.org/r/[email protected] Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") Signed-off-by: Mel Gorman <[email protected]> Reported-by: Yuanxi Liu <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm/mmap: regression fix for unmapped_area{_topdown}Liam R. Howlett1-5/+43
The maple tree limits the gap returned to a window that specifically fits what was asked. This may not be optimal in the case of switching search directions or a gap that does not satisfy the requested space for other reasons. Fix the search by retrying the operation and limiting the search window in the rare occasion that a conflict occurs. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Rick Edgecombe <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18maple_tree: fix mas_empty_area() searchLiam R. Howlett1-9/+11
The internal function of mas_awalk() was incorrectly skipping the last entry in a node, which could potentially be NULL. This is only a problem for the left-most node in the tree - otherwise that NULL would not exist. Fix mas_awalk() by using the metadata to obtain the end of the node for the loop and the logical pivot as apposed to the raw pivot value. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Rick Edgecombe <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18maple_tree: make maple state reusable after mas_empty_area_rev()Liam R. Howlett1-14/+13
Stop using maple state min/max for the range by passing through pointers for those values. This will allow the maple state to be reused without resetting. Also add some logic to fail out early on searching with invalid arguments. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Rick Edgecombe <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()Alexander Potapenko3-19/+59
Similarly to kmsan_vmap_pages_range_noflush(), kmsan_ioremap_page_range() must also properly handle allocation/mapping failures. In the case of such, it must clean up the already created metadata mappings and return an error code, so that the error can be propagated to ioremap_page_range(). Without doing so, KMSAN may silently fail to bring the metadata for the page range into a consistent state, which will result in user-visible crashes when trying to access them. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Dipanjan Das <[email protected]> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()Alexander Potapenko3-19/+34
As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Dipanjan Das <[email protected]> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18tools/Makefile: do missed s/vm/mm/SeongJae Park1-7/+7
Commit 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") missed renaming 'vm' in 'tools/Makefile' to 'mm'. As a result, 'make clean' under 'tools/' directory fails as below: $ make -C tools clean DESCEND vm make[1]: Entering directory '/linux/tools/vm' make[1]: *** No rule to make target 'clean'. Stop. make[1]: Leaving directory '/linux/tools/vm' make: *** [Makefile:173: vm_clean] Error 2 make: Leaving directory '/linux/tools' Do the missed rename. Link: https://lkml.kernel.org/r/[email protected] Fixes: 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") Signed-off-by: SeongJae Park <[email protected]> Reported-by: Ricardo Pardini <[email protected]> Link: https://lore.kernel.org/linux-mm/[email protected]/ Tested-by: Ricardo Pardini <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm: fix memory leak on mm_init error handlingMathieu Desnoyers1-0/+1
commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") introduces a memory leak by missing a call to destroy_context() when a percpu_counter fails to allocate. Before introducing the per-cpu counter allocations, init_new_context() was the last call that could fail in mm_init(), and thus there was no need to ever invoke destroy_context() in the error paths. Adding the following percpu counter allocations adds error paths after init_new_context(), which means its associated destroy_context() needs to be called when percpu counters fail to allocate. Link: https://lkml.kernel.org/r/[email protected] Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") Signed-off-by: Mathieu Desnoyers <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlockTetsuo Handa1-0/+16
syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried. One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler. CPU0 ---- __build_all_zonelists() { write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd // e.g. timer interrupt handler runs at this moment some_timer_func() { kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) { // spins forever because zonelist_update_seq.seqcount is odd } } } } // e.g. timer interrupt handler finishes write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even } This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach. Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held. CPU0 CPU1 ---- ---- pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC | __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held } } } } } } } } spin_unlock_irqrestore(&port->lock, flags); // message is printed to console spin_unlock_irqrestore(&port->lock, flags); } } } } } } } } } write_sequnlock(&zonelist_update_seq); } } } This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd. Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all() . As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section... Link: https://lkml.kernel.org/r/[email protected] Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] Reported-by: syzbot <[email protected]> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Petr Mladek <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Ilpo Järvinen <[email protected]> Cc: John Ogness <[email protected]> Cc: Patrick Daly <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()Ondrej Mosnacek1-29/+40
Linux Security Modules (LSMs) that implement the "capable" hook will usually emit an access denial message to the audit log whenever they "block" the current task from using the given capability based on their security policy. The occurrence of a denial is used as an indication that the given task has attempted an operation that requires the given access permission, so the callers of functions that perform LSM permission checks must take care to avoid calling them too early (before it is decided if the permission is actually needed to perform the requested operation). The __sys_setres[ug]id() functions violate this convention by first calling ns_capable_setid() and only then checking if the operation requires the capability or not. It means that any caller that has the capability granted by DAC (task's capability set) but not by MAC (LSMs) will generate a "denied" audit record, even if is doing an operation for which the capability is not required. Fix this by reordering the checks such that ns_capable_setid() is checked last and -EPERM is returned immediately if it returns false. While there, also do two small optimizations: * move the capability check before prepare_creds() and * bail out early in case of a no-op. Link: https://lkml.kernel.org/r/[email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ondrej Mosnacek <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18drm/amd/display: fix a divided-by-zero errorAlex Hung1-0/+4
[Why & How] timing.dsc_cfg.num_slices_v can be zero and it is necessary to check before using it. This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI". Reviewed-by: Aurabindo Pillai <[email protected]> Acked-by: Qingqing Zhuo <[email protected]> Signed-off-by: Alex Hung <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-04-18drm/amd/display: limit timing for single dimm memoryDaniel Miess1-0/+20
[Why] 1. It could hit bandwidth limitdation under single dimm memory when connecting 8K external monitor. 2. IsSupportedVidPn got validation failed with 2K240Hz eDP + 8K24Hz external monitor. 3. It's better to filter out such combination in EnumVidPnCofuncModality 4. For short term, filter out in dc bandwidth validation. [How] Force 2K@240Hz+8K@24Hz timing validation false in dc. Reviewed-by: Nicholas Kazlauskas <[email protected]> Acked-by: Qingqing Zhuo <[email protected]> Signed-off-by: Daniel Miess <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-04-18drm/amd/display: set dcn315 lb bpp to 48Dmytro Laktyushkin1-1/+1
[Why & How] Fix a typo for dcn315 line buffer bpp. Reviewed-by: Jun Lei <[email protected]> Acked-by: Qingqing Zhuo <[email protected]> Signed-off-by: Dmytro Laktyushkin <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-04-18drm/amdgpu: Fix desktop freezed after gpu-resetAlan Liu2-3/+17
[Why] After gpu-reset, sometimes the driver fails to enable vblank irq, causing flip_done timed out and the desktop freezed. During gpu-reset, we disable and enable vblank irq in dm_suspend() and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check irqs' refcount and decide to enable or disable the irqs again. However, we have 2 sets of API for controling vblank irq, one is dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has its own refcount and flag to store the state of vblank irq, and they are not synchronized. In drm we use the first API to control vblank irq but in amdgpu_irq_gpu_reset_resume_helper() we use the second set of API. The failure happens when vblank irq was enabled by dm_vblank_get() before gpu-reset, we have vblank->enabled true. However, during gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state checked from amdgpu_irq_update() is DISABLED. So finally it disables vblank irq again. After gpu-reset, if there is a cursor plane commit, the driver will try to enable vblank irq by calling drm_vblank_enable(), but the vblank->enabled is still true, so it fails to turn on vblank irq and causes flip_done can't be completed in vblank irq handler and desktop become freezed. [How] Combining the 2 vblank control APIs by letting drm's API finally calls amdgpu_irq's API, so the irq's refcount and state of both APIs can be synchronized. Also add a check to prevent refcount from being less then 0 in amdgpu_irq_put(). v2: - Add warning in amdgpu_irq_enable() if the irq is already disabled. - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change if it is in gpu-reset. v3: - Improve commit message and code comments. Signed-off-by: Alan Liu <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2023-04-18veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features ↵Lorenzo Bianconi1-6/+11
flag For veth pairs, NETDEV_XDP_ACT_NDO_XMIT is supported by the current device if the peer one is running a XDP program or if it has GRO enabled. Fix the xdp_features flags reporting considering peer device and not current one for NETDEV_XDP_ACT_NDO_XMIT. Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/4f1ca6f6f6b42ae125bfdb5c7782217c83968b2e.1681767806.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-18Merge tag 'mmc-v6.3-rc3' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci_am654: Fix support for UHS-I SDR12 and SDR25 speed modes MEMSTICK: - Fix memory leak if card device never gets registered" * tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: memstick: fix memory leak if card device is never registered mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
2023-04-18KVM: arm64: Make vcpu flag updates non-preemptibleMarc Zyngier1-1/+18
Per-vcpu flags are updated using a non-atomic RMW operation. Which means it is possible to get preempted between the read and write operations. Another interesting thing to note is that preemption also updates flags, as we have some flag manipulation in both the load and put operations. It is thus possible to lose information communicated by either load or put, as the preempted flag update will overwrite the flags when the thread is resumed. This is specially critical if either load or put has stored information which depends on the physical CPU the vcpu runs on. This results in really elusive bugs, and kudos must be given to Mostafa for the long hours of debugging, and finally spotting the problem. Fix it by disabling preemption during the RMW operation, which ensures that the state stays consistent. Also upgrade vcpu_get_flag path to use READ_ONCE() to make sure the field is always atomically accessed. Fixes: e87abb73e594 ("KVM: arm64: Add helpers to manipulate vcpu flags among a set") Reported-by: Mostafa Saleh <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
2023-04-18irqchip/gic-v3: Add Rockchip 3588001 erratum workaroundSebastian Reichel3-0/+48
Rockchip RK3588/RK3588s GIC600 integration does not support the sharability feature. Rockchip assigned Erratum ID #3588001 for this issue. Note, that the 0x0201743b ID is not Rockchip specific and thus there is an extra of_machine_is_compatible() check. The flags are named FORCE_NON_SHAREABLE to be vendor agnostic, since apparently similar integration design errors exist in other platforms and they can reuse the same flag. Co-developed-by: XiaoDong Huang <[email protected]> Signed-off-by: XiaoDong Huang <[email protected]> Co-developed-by: Kever Yang <[email protected]> Signed-off-by: Kever Yang <[email protected]> Co-developed-by: Lucas Tanure <[email protected]> Signed-off-by: Lucas Tanure <[email protected]> Reviewed-by: AngeloGioacchino Del Regno <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18Merge tag 'arm-fixes-6.3-3' of ↵Linus Torvalds37-98/+88
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are a number of updates for devicetree files for Qualcomm, Rockchips, and NXP i.MX platforms, addressing mistakes in the DT contents: - Wrong GPIO polarity on some boards - Lower SD card interface speed for better stability - Incorrect power supply, clock, pmic, cache properties - Disable broken hbr3 on sc7280-herobrine - Devicetree warning fixes The only other changes are: - A regression fix for the Amlogic performance monitoring unit driver, along with two related DT changes. - imx_v6_v7_defconfig enables PCI support again. - Trivial fixes for tee, optee and psci firmware drivers, addressing compiler warning and error output" * tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits) firmware/psci: demote suspend-mode warning to info level arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI arm64: dts: rockchip: correct panel supplies on some rk3326 boards arm64: dts: rockchip: use just "port" in panel on RockPro64 arm64: dts: rockchip: use just "port" in panel on Pinebook Pro ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells arm64: dts: imx8mp-verdin: correct off-on-delay arm64: dts: imx8mm-verdin: correct off-on-delay arm64: dts: imx8mm-evk: correct pmic clock source arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers arm64: dts: rockchip: Remove non-existing pwm-delay-us property arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices tee: Pass a pointer to virt_to_page() perf/amlogic: adjust register offsets arm64: dts: meson-g12-common: resolve conflict between canvas & pmu arm64: dts: meson-g12-common: specify full DMC range arm64: dts: imx8mp: fix address length for LCDIF2 riscv: dts: canaan: drop invalid spi-max-frequency ...
2023-04-18Merge tag 'mvebu-arm64-6.4-1' of ↵Arnd Bergmann1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/arm mvebu arm64 for 6.4 (part 1) turris-mox-rwtm firmware: - prevent modification at runtime of the kobj_type struct * tag 'mvebu-arm64-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: firmware: turris-mox-rwtm: make kobj_type structure constant Link: https://lore.kernel.org/r/878repzfbp.fsf@BL-laptop Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18ARM: mv78xx0: fix entries for gpios, buttons and usb portsJeremy J. Peper1-15/+51
Original code was largely copy-pasted from the reference board code, correct values to reflect the hardware actually present in the TS-WXL. Signed-off-by: Jeremy J. Peper <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18ARM: mv78xx0: add code to enable XOR and CRYPTO engines on mv78xx0Jeremy J. Peper4-0/+37
Adding missing code/values required to enable the XOR and CESA engines for this SoC Signed-off-by: Jeremy J. Peper <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18ARM: mv78xx0: set the correct driver for the i2c RTCJeremy J. Peper1-1/+1
Original code was largely copy-pasted from the reference board code, adjust to use the actual RTC chip present on the TS-WXL. Signed-off-by: Jeremy J. Peper <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18ARM: mv78xx0: adjust init logic for ts-wxl to reflect single core devJeremy J. Peper2-11/+3
Original code was largely copy-pasted from the reference board code, adjust pcie initialiazation to reflect the TS-WXL using the single-core variant of this SoC. Correct pcie_port_size to be a power of 2 as required. Signed-off-by: Jeremy J. Peper <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime ↵Frederic Weisbecker3-8/+47
monotonicity The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which can also be fetched from clock_gettime() API. Improve the test coverage while verifying the monotonicity of CLOCK_BOOTTIME accross both interfaces. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18selftests/proc: Remove idle time monotonicity assertionsFrederic Weisbecker3-27/+14
Due to broken iowait task counting design (cf: comments above get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide the guarantee that /proc/stat or /proc/uptime display monotonic idle time values. Remove the assertions that verify the related wrong assumption so that testers and maintainers don't spend more time on that. Reported-by: Yu Liao <[email protected]> Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18MAINTAINERS: Remove stale email addressFrederic Weisbecker1-1/+1
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()Frederic Weisbecker1-12/+8
There is no need for the __tick_nohz_idle_stop_tick() function between tick_nohz_idle_stop_tick() and its implementation. Remove that unnecessary step. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18timers/nohz: Add a comment about broken iowait counter update raceFrederic Weisbecker1-2/+8
The per-cpu iowait task counter is incremented locally upon sleeping. But since the task can be woken to (and by) another CPU, the counter may then be decremented remotely. This is the source of a race involving readers VS writer of idle/iowait sleeptime. The following scenario shows an example where a /proc/stat reader observes a pending sleep time as IO whereas that pending sleep time later eventually gets accounted as non-IO. CPU 0 CPU 1 CPU 2 ----- ----- ------ //io_schedule() TASK A current->in_iowait = 1 rq(0)->nr_iowait++ //switch to idle // READ /proc/stat // See nr_iowait_cpu(0) == 1 return ts->iowait_sleeptime + ktime_sub(ktime_get(), ts->idle_entrytime) //try_to_wake_up(TASK A) rq(0)->nr_iowait-- //idle exit // See nr_iowait_cpu(0) == 0 ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime) As a result subsequent reads on /proc/stat may expose backward progress. This is unfortunately hardly fixable. Just add a comment about that condition. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18timers/nohz: Protect idle/iowait sleep time under seqcountFrederic Weisbecker2-6/+17
Reading idle/IO sleep time (eg: from /proc/stat) can race with idle exit updates because the state machine handling the stats is not atomic and requires a coherent read batch. As a result reading the sleep time may report irrelevant or backward values. Fix this with protecting the simple state machine within a seqcount. This is expected to be cheap enough not to add measurable performance impact on the idle path. Note this only fixes reader VS writer condition partitially. A race remains that involves remote updates of the CPU iowait task counter. It can hardly be fixed. Reported-by: Yu Liao <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18timers/nohz: Only ever update sleeptime from idle exitFrederic Weisbecker1-58/+37
The idle and IO sleeptime statistics appearing in /proc/stat can be currently updated from two sites: locally on idle exit and remotely by cpufreq. However there is no synchronization mechanism protecting concurrent updates. It is therefore possible to account the sleeptime twice, among all the other possible broken scenarios. To prevent from breaking the sleeptime accounting source, restrict the sleeptime updates to the local idle exit site. If there is a delta to add since the last update, IO/Idle sleep time readers will now only compute the delta without actually writing it back to the internal idle statistic fields. This fixes a writer VS writer race. Note there are still two known reader VS writer races to handle. A subsequent patch will fix one. Reported-by: Yu Liao <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18timers/nohz: Restructure and reshuffle struct tick_schedFrederic Weisbecker1-25/+41
Restructure and group fields by access in order to optimize cache layout. While at it, also add missing kernel doc for two fields: @last_jiffies and @idle_expires. Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-18Merge tag 'mvebu-dt64-6.4-1' of ↵Arnd Bergmann6-7/+249
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/dt mvebu dt64 for 6.4 (part 1) Enlarge PCI memory window on Machiatobin (Armada 7040 based) Add supoport for the GL.iNet GL-MV1000 (Armada 3700 based) Add missing phy-mode on the cn9310 Align thermal node names with bindings * tag 'mvebu-dt64-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: ARM64: dts: marvell: cn9310: Add missing phy-mode arm64: dts: marvell: add DTS for GL.iNet GL-MV1000 arm64: dts: marvell: align thermal node names with bindings arm64: dts: marvell: mochabin: enlarge PCI memory window Link: https://lore.kernel.org/r/87bkjlzfcw.fsf@BL-laptop Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18Merge tag 'mvebu-dt-6.4-1' of ↵Arnd Bergmann13-14/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into soc/dt mvebu dt for 6.4 (part 1) Add missing phy-mode and fixed links for kirkwood, orion5 and Armada (370, XP, 38x) SoCs * tag 'mvebu-dt-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: ARM: dts: armada: Add missing phy-mode and fixed links ARM: dts: orion5: Add missing phy-mode and fixed links ARM: dts: kirkwood: Add missing phy-mode and fixed links Link: https://lore.kernel.org/r/87edohzfeg.fsf@BL-laptop Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18Merge tag 'v6.4-rockchip-dts64-2' of ↵Arnd Bergmann5-3/+147
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt On the Rock5b a fix the newly added rtc node and cpu-regulators for the big cluster. Volume-keys (via adc) for the Pinephone Pro, display support for the Anbernic RG353. As well as gpio-ranges for rk356x and fixes for the audio-codec node-names on two boards. * tag 'v6.4-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Add support for volume keys to rk3399-pinephone-pro arm64: dts: rockchip: Add vdd_cpu_big regulators to rk3588-rock-5b arm64: dts: rockchip: Use generic name for es8316 on Pinebook Pro and Rock 5B arm64: dts: rockchip: Drop RTC clock-frequency on rk3588-rock-5b arm64: dts: rockchip: Add pinctrl gpio-ranges for rk356x arm64: dts: rockchip: add panel to Anbernic RG353 series Link: https://lore.kernel.org/r/5144826.MHq7AAxBmi@phil Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18ARM: config: Update Vexpress defconfigLinus Walleij1-1/+3
The Versatile Express should conform to standard contemporary kernel features: add NO_HZ_FULL and HIGH_RES_TIMERS. Also add the AFS flash partitions as these are used on the platform. The removed SCHED_DEBUG is due to Kconfig changes. Signed-off-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2023-04-18tick/common: Align tick period with the HZ tick.Sebastian Andrzej Siewior1-1/+11
With HIGHRES enabled tick_sched_timer() is programmed every jiffy to expire the timer_list timers. This timer is programmed accurate in respect to CLOCK_MONOTONIC so that 0 seconds and nanoseconds is the first tick and the next one is 1000/CONFIG_HZ ms later. For HZ=250 it is every 4 ms and so based on the current time the next tick can be computed. This accuracy broke since the commit mentioned below because the jiffy based clocksource is initialized with higher accuracy in read_persistent_wall_and_boot_offset(). This higher accuracy is inherited during the setup in tick_setup_device(). The timer still fires every 4ms with HZ=250 but timer is no longer aligned with CLOCK_MONOTONIC with 0 as it origin but has an offset in the us/ns part of the timestamp. The offset differs with every boot and makes it impossible for user land to align with the tick. Align the tick period with CLOCK_MONOTONIC ensuring that it is always a multiple of 1000/CONFIG_HZ ms. Fixes: 857baa87b6422 ("sched/clock: Enable sched clock early") Reported-by: Gusenleitner Klaus <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/[email protected] Link: https://lore.kernel.org/r/[email protected]
2023-04-18LoongArch: module: set section addresses to 0x0Huacai Chen1-4/+4
These got*, plt* and .text.ftrace_trampoline sections specified for LoongArch have non-zero addressses. Non-zero section addresses in a relocatable ELF would confuse GDB when it tries to compute the section offsets and it ends up printing wrong symbol addresses. Therefore, set them to zero, which mirrors the change in commit 5d8591bc0fbaeb6ded ("arm64 module: set plt* section addresses to 0x0"). Cc: [email protected] Reviewed-by: Guo Ren <[email protected]> Signed-off-by: Chong Qiao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Mark 3 symbol exports as non-GPLHuacai Chen2-3/+3
vm_map_base, empty_zero_page and invalid_pmd_table could be accessed widely by some out-of-tree non-GPL but important file systems or drivers (e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL() to export them, so as to avoid build errors. 1, Details about vm_map_base: This is a LoongArch-specific symbol and may be referenced through macros PCI_IOBASE, VMALLOC_START and VMALLOC_END. 2, Details about empty_zero_page: As it stands today, only 3 architectures export empty_zero_page as a GPL symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by inheriting from MIPS, and the MIPS export was first introduced in commit 497d2adcbf50b ("[MIPS] Export empty_zero_page for sake of the ext4 module."). The IA64 export was similar: commit a7d57ecf4216e ("[IA64] Export three symbols for module use") did so for kvm. In both IA64 and MIPS, the export of empty_zero_page was done for satisfying some in-kernel component built as module (kvm and ext4 respectively), and given its reasonably low-level nature, GPL is a reasonable choice. But looking at the bigger picture it is evident most other architectures do not regard it as GPL, so in effect the symbol probably should not be treated as such, in favor of consistency. 3, Details about invalid_pmd_table: Keep consistency with invalid_pte_table and make it be possible by some modules. Cc: [email protected] Reviewed-by: WANG Xuerui <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Enable PG when wakeup from suspendHuacai Chen1-0/+4
Some firmwares don't enable PG when wakeup from suspend, so do it in kernel. This can improve code compatibility for boot kernel. Signed-off-by: Baoqi Zhang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix _CONST64_(x) as unsignedQing Zhang1-2/+2
Addresses should all be of unsigned type to avoid unnecessary conversions. Signed-off-by: Qing Zhang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix build error if CONFIG_SUSPEND is not setHuacai Chen1-0/+3
We can see the following build error on LoongArch if CONFIG_SUSPEND is not set: ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare': sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start' Here is the call trace: acpi_pm_prepare() __acpi_pm_prepare() acpi_sleep_prepare() acpi_get_wakeup_address() loongarch_wakeup_start() Root cause: loongarch_wakeup_start() is defined in arch/loongarch/power/ suspend_asm.S which is only built under CONFIG_SUSPEND. In order to fix the build error, just let acpi_get_wakeup_address() return 0 if CONFIG_ SUSPEND is not set. Fixes: 366bb35a8e48 ("LoongArch: Add suspend (ACPI S3) support") Reviewed-by: WANG Xuerui <[email protected]> Reported-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix probing of the CRC32 featureHuacai Chen5-21/+30
Not all LoongArch processors support CRC32 instructions. This feature is indicated by CPUCFG1.CRC32 (Bit25) but it is wrongly defined in the previous versions of the ISA manual (and so does in loongarch.h). The CRC32 feature is set unconditionally now, so fix it. BTW, expose the CRC32 feature in /proc/cpuinfo. Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Make WriteCombine configurable for ioremap()Huacai Chen5-1/+47
LoongArch maintains cache coherency in hardware, but when paired with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar to WriteCombine) is out of the scope of cache coherency machanism for PCIe devices (this is a PCIe protocol violation, which may be fixed in newer chipsets). This means WUC can only used for write-only memory regions now, so this option is disabled by default, making WUC silently fallback to SUC for ioremap(). You can enable this option if the kernel is ensured to run on hardware without this bug. Kernel parameter writecombine=on/off can be used to override the Kconfig option. Cc: [email protected] Suggested-by: WANG Xuerui <[email protected]> Reviewed-by: WANG Xuerui <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()Nikita Zhandarovich1-0/+2
Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This behaviour may lead to NULL pointer dereference in 'multi->total_len'. Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value against NULL. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Co-developed-by: Natalia Petrova <[email protected]> Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18Merge branch 'bnxt_en-bug-fixes'Paolo Abeni2-10/+11
Michael Chan says: ==================== bnxt_en: Bug fixes This small series contains 2 fixes. The first one fixes the PTP initialization logic on older chips to avoid logging a warning. The second one fixes a potenial NULL pointer dereference in the driver's aux bus unload path. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>