aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-12-29mm/mglru: skip special VMAs in lru_gen_look_around()Yu Zhao1-4/+9
Special VMAs like VM_PFNMAP can contain anon pages from COW. There isn't much profit in doing lookaround on them. Besides, they can trigger the pte_special() warning in get_pte_pfn(). Skip them in lru_gen_look_around(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 018ee47f1489 ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Yu Zhao <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/[email protected]/ Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29MAINTAINERS: hand over hwpoison maintainership to Miaohe LinNaoya Horiguchi1-2/+2
Miaohe Lin has contributed to hwpoison subsystem as a reviewer for more than 1.5 year, and has made many patch contributions in hwpoison subsystem and the memory management subsystem. So I'd like to pass on the hwpoison maintainership to Miaohe. [[email protected]: update to keep myself as a reviewer] Link: https://lkml.kernel.org/r/20231223031115.GA2883156@u2004 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29MAINTAINERS: remove hugetlb maintainer Mike KravetzMike Kravetz2-1/+4
I am stepping away from my role as hugetlb maintainer. There should be no gap in coverage as Muchun Song is also a hugetlb maintainer. [[email protected]: update CREDITS] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29mm: fix unmap_mapping_range high bits shift bugJiajun Xie1-2/+2
The bug happens when highest bit of holebegin is 1, suppose holebegin is 0x8000000111111000, after shift, hba would be 0xfff8000000111111, then vma_interval_tree_foreach would look it up fail or leads to the wrong result. error call seq e.g.: - mmap(..., offset=0x8000000111111000) |- syscall(mmap, ... unsigned long, off): |- ksys_mmap_pgoff( ... , off >> PAGE_SHIFT); here pgoff is correctly shifted to 0x8000000111111, but pass 0x8000000111111000 as holebegin to unmap would then cause terrible result, as shown below: - unmap_mapping_range(..., loff_t const holebegin) |- pgoff_t hba = holebegin >> PAGE_SHIFT; /* hba = 0xfff8000000111111 unexpectedly */ The issue happens in Heterogeneous computing, where the device(e.g. gpu) and host share the same virtual address space. A simple workflow pattern which hit the issue is: /* host */ 1. userspace first mmap a file backed VA range with specified offset. e.g. (offset=0x800..., mmap return: va_a) 2. write some data to the corresponding sys page e.g. (va_a = 0xAABB) /* device */ 3. gpu workload touches VA, triggers gpu fault and notify the host. /* host */ 4. reviced gpu fault notification, then it will: 4.1 unmap host pages and also takes care of cpu tlb (use unmap_mapping_range with offset=0x800...) 4.2 migrate sys page to device 4.3 setup device page table and resolve device fault. /* device */ 5. gpu workload continued, it accessed va_a and got 0xAABB. 6. gpu workload continued, it wrote 0xBBCC to va_a. /* host */ 7. userspace access va_a, as expected, it will: 7.1 trigger cpu vm fault. 7.2 driver handling fault to migrate gpu local page to host. 8. userspace then could correctly get 0xBBCC from va_a 9. done But in step 4.1, if we hit the bug this patch mentioned, then userspace would never trigger cpu fault, and still get the old value: 0xAABB. Making holebegin unsigned first fixes the bug. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiajun Xie <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29mm: memcg: fix split queue list crash when large folio migrationBaolin Wang2-1/+12
When running autonuma with enabling multi-size THP, I encountered the following kernel crash issue: [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=fffff9ad42399890) [ 134.290877] kernel BUG at lib/list_debug.c:62! [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20 [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0 ...... [ 134.294252] Call Trace: [ 134.294362] <TASK> [ 134.294440] ? die+0x33/0x90 [ 134.294561] ? do_trap+0xe0/0x110 ...... [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0 [ 134.295842] folio_undo_large_rmappable+0x99/0x100 [ 134.296003] destroy_large_folio+0x68/0x70 [ 134.296172] migrate_folio_move+0x12e/0x260 [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10 [ 134.296389] migrate_pages_batch+0x495/0x6b0 [ 134.296523] migrate_pages+0x1d0/0x500 [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10 [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0 [ 134.296953] do_numa_page+0x1f4/0x570 [ 134.297121] __handle_mm_fault+0x2b0/0x6c0 [ 134.297254] handle_mm_fault+0x107/0x270 [ 134.300897] do_user_addr_fault+0x167/0x680 [ 134.304561] exc_page_fault+0x65/0x140 [ 134.307919] asm_exc_page_fault+0x22/0x30 The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") removed the charging and uncharging operations of the migration folios and cleared the memcg data of the old folio. During the subsequent release process of the old large folio in destroy_large_folio(), if the large folio needs to be removed from the split queue, an incorrect split queue can be obtained (which is pgdat->deferred_split_queue) because the old folio's memcg is NULL now. This can lead to list operations being performed under the wrong split queue lock protection, resulting in a list crash as above. After the migration, the old folio is going to be freed, so we can remove it from the split queue in mem_cgroup_migrate() a bit earlier before clearing the memcg data to avoid getting incorrect split queue. [[email protected]: fix comment, per Zi Yan] Link: https://lkml.kernel.org/r/61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration") Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29mm: fix arithmetic for max_prop_frac when setting max_ratioJingbo Xu1-1/+2
Since now bdi->max_ratio is part per million, fix the wrong arithmetic for max_prop_frac when setting max_ratio. Otherwise the miscalculated max_prop_frac will affect the incrementing of writeout completion count when max_ratio is not 100%. Link: https://lkml.kernel.org/r/[email protected] Fixes: efc3e6ad53ea ("mm: split off __bdi_set_max_ratio() function") Signed-off-by: Jingbo Xu <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Stefan Roesch <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29mm: fix arithmetic for bdi min_ratioJingbo Xu1-1/+0
Since now bdi->min_ratio is part per million, fix the wrong arithmetic. Otherwise it will fail with -EINVAL when setting a reasonable min_ratio, as it tries to set min_ratio to (min_ratio * BDI_RATIO_SCALE) in percentage unit, which exceeds 100% anyway. # cat /sys/class/bdi/253\:0/min_ratio 0 # cat /sys/class/bdi/253\:0/max_ratio 100 # echo 1 > /sys/class/bdi/253\:0/min_ratio -bash: echo: write error: Invalid argument Link: https://lkml.kernel.org/r/[email protected] Fixes: 8021fb3232f2 ("mm: split off __bdi_set_min_ratio() function") Signed-off-by: Jingbo Xu <[email protected]> Reported-by: Joseph Qi <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Stefan Roesch <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29mm: align larger anonymous mappings on THP boundariesRik van Riel1-0/+3
Align larger anonymous memory mappings on THP boundaries by going through thp_get_unmapped_area if THPs are enabled for the current process. With this patch, larger anonymous mappings are now THP aligned. When a malloc library allocates a 2MB or larger arena, that arena can now be mapped with THPs right from the start, which can result in better TLB hit rates and execution time. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rik van Riel <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Christopher Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-12-29ACPI: button: trigger wakeup key eventsKen Xue1-0/+10
Andorid can wakeup from various wakeup sources, but only several wakeup sources can wake up screen with right events(POWER, WAKEUP) from input device. Regarding pressing acpi power button, it can resume system and ACPI_BITMASK_WAKE_STATUS and ACPI_BITMASK_POWER_BUTTON_STATUS are set in pm1a_sts, but kernel does not report any key event to user space during resuming by default. So, send wakeup key event to user space during resume from power button. Signed-off-by: Ken Xue <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29ACPI: resource: Add another DMI match for the TongFang GMxXGxxHans de Goede1-0/+7
The TongFang GMxXGxx, which needs IRQ overriding for the keyboard to work, is also sold as the Eluktronics RP-15 which does not use the standard TongFang GMxXGxx DMI board_name. Add an entry for this laptop to the irq1_edge_low_force_override[] DMI table to make the internal keyboard functional. Reported-by: Luis Acuna <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Cc: All applicable <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29cpuidle: haltpoll: Do not enable interrupts when entering idleBorislav Petkov (AMD)1-5/+4
The cpuidle drivers' ->enter() methods are supposed to be IRQ invariant: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") bb7b11258561 ("cpuidle: Move IRQ state validation") Do that in the haltpoll driver too. Fixes: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218245 Reported-by: <[email protected]> Tested-by: <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offsetEugen Hristev1-1/+1
AUD_PAD_TOP widget's correct register is AFE_AUD_PAD_TOP , and not zero. Having a zero as register, it would mean that the `snd_soc_dapm_new_widgets` would try to read the register at offset zero when trying to get the power status of this widget, which is incorrect. Fixes: b65c466220b3 ("ASoC: mediatek: mt8186: support adda in platform driver") Signed-off-by: Eugen Hristev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-12-29thermal: gov_power_allocator: Support new update callback of weightsLukasz Luba1-6/+9
When the thermal instance's weight is updated from the sysfs the governor update_tz() callback is triggered. Implement proper reaction to this event in the IPA, which would save CPU cycles spent in throttle(). This will speed-up the main throttle() IPA function and clean it up a bit. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal/sysfs: Update governors when the 'weight' has changedLukasz Luba2-0/+6
Support governors update when the thermal instance's weight has changed. This allows to adjust internal state for the governor. Signed-off-by: Lukasz Luba <[email protected]> [ rjw: Add two empty code lines aroung the locking ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal/sysfs: Update instance->weight under tz lockLukasz Luba1-0/+4
User space can change the weight of a thermal instance via sysfs while the .throttle() callback is running for a governor, because weight_store() does not use the zone lock. The IPA governor uses instance weight values for power calculations and caches the sum of them as total_weight, so it gets confused when one of them changes while its .throttle() callback is running. To prevent that from happening, use thermal zone locking in weight_store(). Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: gov_power_allocator: Simplify checks for valid power actorLukasz Luba1-23/+17
There is a need to check if the cooling device in the thermal zone supports IPA callback and is set for control trip point. Refactor the code which validates the power actor capabilities and make it more consistent in all places. No intentional functional impact. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: gov_power_allocator: Move memory allocation out of throttle()Lukasz Luba1-71/+136
The new thermal callback allows to react to the change of cooling instances in the thermal zone. Move the memory allocation to that new callback and save CPU cycles in the throttle() code path. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: gov_power_allocator: Change trace functionsLukasz Luba2-23/+32
Change trace event trace_thermal_power_allocator() to not use dynamic array for requested power and granted power for all power actors. Instead, simplify the trace event and print other simple values. Add new trace event to print power actor information of requested power and granted power. That trace event would be called in a loop for each power actor. The trace data would be easier to parse comparing to the dynamic array implementation. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: gov_power_allocator: Refactor checks in divvy_up_power()Lukasz Luba1-10/+10
Simplify the code and remove one extra 'if' block. No intentional functional impact. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: gov_power_allocator: Refactor check_power_actors()Lukasz Luba1-4/+6
In preparation for a subsequent change, rearrange check_power_actors(). No intentional functional impact. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29thermal: core: Add governor callback for thermal zone changeLukasz Luba3-0/+22
Add a new callback to the struct thermal_governor. It can be used for updating governors when there is a change in the thermal zone internals, e.g. thermal cooling device is bind to the thermal zone. That makes possible to move some heavy operations like memory allocations related to the number of cooling instances out of the throttle() callback. Both callback code paths (throttle() and update_tz()) are protected with the same thermal zone lock, which guaranties the consistency. Signed-off-by: Lukasz Luba <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-29cifs: do not depend on release_iface for maintaining iface_listShyam Prasad N2-11/+17
parse_server_interfaces should be in complete charge of maintaining the iface_list linked list. Today, iface entries are removed from the list only when the last refcount is dropped. i.e. in release_iface. However, this can result in undercounting of refcount if the server stops advertising interfaces (which Azure SMB server does). This change puts parse_server_interfaces in full charge of maintaining the iface_list. So if an empty list is returned by the server, the entries in the list will immediately be removed. This way, a following call to the same function will not find entries in the list. Fixes: aa45dadd34e4 ("cifs: change iface_list from array to sorted linked list") Cc: [email protected] Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-12-29cifs: cifs_chan_is_iface_active should be called with chan_lock heldShyam Prasad N2-3/+11
cifs_chan_is_iface_active checks the channels of a session to see if the associated iface is active. This should always happen with chan_lock held. However, these two callers of this function were missing this locking. This change makes sure the function calls are protected with proper locking. Fixes: b54034a73baf ("cifs: during reconnect, update interface if necessary") Fixes: fa1d0508bdd4 ("cifs: account for primary channel in the interface list") Cc: [email protected] Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-12-29cifs: after disabling multichannel, mark tcon for reconnectShyam Prasad N1-8/+12
Once the server disables multichannel for an active multichannel session, on the following reconnect, the client would reduce the number of channels to 1. However, it could be the case that the tree connect was active on one of these disabled channels. This results in an unrecoverable state. This change fixes that by making sure that whenever a channel is being terminated, the session and tcon are marked for reconnect too. This could mean a few redundant tree connect calls to the server, but considering that this is not a frequent event, we should be okay. Fixes: ee1d21794e55 ("cifs: handle when server stops supporting multichannel") Signed-off-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-12-29ALSA: scarlett2: Convert meter levels from little-endianGeoffrey D. Bennett1-2/+2
Add missing conversion from little-endian data to CPU-endian in scarlett2_usb_get_meter_levels(). Fixes: 3473185f31df ("ALSA: scarlett2: Remap Level Meter values") Signed-off-by: Geoffrey D. Bennett <[email protected]> Link: https://lore.kernel.org/r/ZYsBIE3DSKdi4YC/@m.b4.vu Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29tracing: Fix blocked reader of snapshot bufferSteven Rostedt (Google)2-4/+19
If an application blocks on the snapshot or snapshot_raw files, expecting to be woken up when a snapshot occurs, it will not happen. Or it may happen with an unexpected result. That result is that the application will be reading the main buffer instead of the snapshot buffer. That is because when the snapshot occurs, the main and snapshot buffers are swapped. But the reader has a descriptor still pointing to the buffer that it originally connected to. This is fine for the main buffer readers, as they may be blocked waiting for a watermark to be hit, and when a snapshot occurs, the data that the main readers want is now on the snapshot buffer. But for waiters of the snapshot buffer, they are waiting for an event to occur that will trigger the snapshot and they can then consume it quickly to save the snapshot before the next snapshot occurs. But to do this, they need to read the new snapshot buffer, not the old one that is now receiving new data. Also, it does not make sense to have a watermark "buffer_percent" on the snapshot buffer, as the snapshot buffer is static and does not receive new data except all at once. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Mathieu Desnoyers <[email protected]> Cc: Mark Rutland <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Fixes: debdd57f5145f ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-29ring-buffer: Fix wake ups when buffer_percent is set to 100Steven Rostedt (Google)1-2/+7
The tracefs file "buffer_percent" is to allow user space to set a water-mark on how much of the tracing ring buffer needs to be filled in order to wake up a blocked reader. 0 - is to wait until any data is in the buffer 1 - is to wait for 1% of the sub buffers to be filled 50 - would be half of the sub buffers are filled with data 100 - is not to wake the waiter until the ring buffer is completely full Unfortunately the test for being full was: dirty = ring_buffer_nr_dirty_pages(buffer, cpu); return (dirty * 100) > (full * nr_pages); Where "full" is the value for "buffer_percent". There is two issues with the above when full == 100. 1. dirty * 100 > 100 * nr_pages will never be true That is, the above is basically saying that if the user sets buffer_percent to 100, more pages need to be dirty than exist in the ring buffer! 2. The page that the writer is on is never considered dirty, as dirty pages are only those that are full. When the writer goes to a new sub-buffer, it clears the contents of that sub-buffer. That is, even if the check was ">=" it would still not be equal as the most pages that can be considered "dirty" is nr_pages - 1. To fix this, add one to dirty and use ">=" in the compare. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-29platform/x86/intel/pmc: Move GBE LTR ignore to suspend callbackDavid E. Box5-23/+33
Commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") caused a network performance regression due to the GBE LTR ignore that it added at probe. This was needed in order to allow the SoC to enter the deepest Package C state. To fix the regression and at least support PC10 during suspend, move the LTR ignore from probe to the suspend callback, and enable it again on resume. This solution will allow PC10 during suspend but restrict Package C entry at runtime to no deeper than PC8/9 while a network cable it attach to the PCH LAN. Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-29platform/x86/intel/pmc: Allow reenabling LTRsDavid E. Box6-8/+11
Commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") caused a network performance regression due to the GBE LTR ignore that it added during probe. The fix will move the ignore to occur at suspend-time (so as to not affect suspend power). This will require the ability to enable the LTR again on resume. Modify pmc_core_send_ltr_ignore() to allow enabling an LTR. Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-29platform/x86/intel/pmc: Add suspend callbackDavid E. Box2-0/+5
Add a suspend callback to struct pmc for performing platform specific tasks before device suspend. This is needed in order to perform GBE LTR ignore on certain platforms at suspend-time instead of at probe-time and replace the GBE LTR ignore removal that was done in order to fix a bug introduced by commit 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()"). Fixes: 804951203aa5 ("platform/x86:intel/pmc: Combine core_init() and core_configure()") Signed-off-by: "David E. Box" <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-29platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probeShin'ichiro Kawasaki1-41/+131
p2sb_bar() unhides P2SB device to get resources from the device. It guards the operation by locking pci_rescan_remove_lock so that parallel rescans do not find the P2SB device. However, this lock causes deadlock when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan locks pci_rescan_remove_lock and probes PCI devices. When PCI devices call p2sb_bar() during probe, it locks pci_rescan_remove_lock again. Hence the deadlock. To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar(). Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources() for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(), refer the cache and return to the caller. Suggested-by: Andy Shevchenko <[email protected]> Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support") Cc: [email protected] Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsuawuqqqydx@7cl3uik5ef6j/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-29ALSA: hda/tas2781: remove sound controls in unbindGergo Koteles1-93/+130
Remove sound controls in hda_unbind to make module loadable after module unload. Add a driver specific struct (tas2781_hda) to store the controls. This patch depends on patch: ALSA: hda/tas2781: do not use regcache Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: [email protected] Signed-off-by: Gergo Koteles <[email protected]> Link: https://lore.kernel.org/r/362aa3e2f81b9259a3e5222f576bec5debfc5e88.1703204848.git.soyer@irl.hu Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29ALSA: hda/tas2781: move set_drv_data outside tasdevice_initGergo Koteles3-2/+4
allow driver specific driver data in tas2781-hda-i2c and tas2781-i2c Fixes: ef3bcde75d06 ("ASoC: tas2781: Add tas2781 driver") CC: [email protected] Signed-off-by: Gergo Koteles <[email protected]> Link: https://lore.kernel.org/r/1398bd8bf3e935b1595a99128320e4a1913e210a.1703204848.git.soyer@irl.hu Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29ALSA: hda/tas2781: fix typos in commentGergo Koteles1-2/+2
Correct typos. Signed-off-by: Gergo Koteles <[email protected]> Link: https://lore.kernel.org/r/ead5609d63e71e8e87c13e1767decca5b272d696.1703203812.git.soyer@irl.hu Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29ALSA: hda/tas2781: do not use regcacheGergo Koteles2-17/+2
There are two problems with using regcache in this module. The amplifier has 3 addressing levels (BOOK, PAGE, REG). The firmware contains blocks that must be written to BOOK 0x8C. The regcache doesn't know anything about BOOK, so regcache_sync writes invalid values to the actual BOOK. The module handles 2 or more separate amplifiers. The amplifiers have different register values, and the module uses only one regmap/regcache for all the amplifiers. The regcache_sync only writes the last amplifier used. The module successfully restores all the written register values (RC profile, program, configuration, calibration) without regcache. Remove regcache functions and set regmap cache_type to REGCACHE_NONE. Link: https://lore.kernel.org/r/21a183b5a08cb23b193af78d4b1114cc59419272.1701906455.git.soyer@irl.hu/ Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Acked-by: Mark Brown <[email protected]> CC: [email protected] Signed-off-by: Gergo Koteles <[email protected]> Link: https://lore.kernel.org/r/491aeed0e2eecc3b704ec856f815db21bad3ba0e.1703202126.git.soyer@irl.hu Signed-off-by: Takashi Iwai <[email protected]>
2023-12-29sched/fair: Fix tg->load when offlining a CPUVincent Guittot1-0/+52
When a CPU is taken offline, the contribution of its cfs_rqs to task_groups' load may remain and will negatively impact the calculation of the share of the online CPUs. To fix this bug, clear the contribution of an offlining CPU to task groups' load and skip its contribution while it is inactive. Here's the reproducer of the anomaly, by Imran Khan: "So far I have encountered only one rather lengthy way of reproducing this issue, which is as follows: 1. Take a KVM guest (booted with 4 CPUs and can be scaled up to 124 CPUs) and create 2 custom cgroups: /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/ cpu/test_group_2 2. Assign a CPU intensive workload to each of these cgroups and start the workload. For my tests I am using following app: int main(int argc, char *argv[]) { unsigned long count, i, val; if (argc != 2) { printf("usage: ./a.out <number of random nums to generate> \n"); return 0; } count = strtoul(argv[1], NULL, 10); printf("Generating %lu random numbers \n", count); for (i = 0; i < count; i++) { val = rand(); val = val % 2; //usleep(1); } printf("Generated %lu random numbers \n", count); return 0; } Also since the system is booted with 4 CPUs, in order to completely load the system I am also launching 4 instances of same test app under: /sys/fs/cgroup/cpu/ 3. We can see that both of the cgroups get similar CPU time: # systemd-cgtop --depth 1 Path Tasks %CPU Memory Input/s Output/s / 659 - 5.5G - - /system.slice - - 5.7G - - /test_group_1 4 - - - - /test_group_2 3 - - - - /user.slice 31 - 56.5M - - Path Tasks %CPU Memory Input/s Output/s / 659 394.6 5.5G - - /test_group_2 3 65.7 - - - /user.slice 29 55.1 48.0M - - /test_group_1 4 47.3 - - - /system.slice - 2.2 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.8 5.5G - - /test_group_1 4 62.9 - - - /user.slice 28 44.9 54.2M - - /test_group_2 3 44.7 - - - /system.slice - 0.9 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.4 5.5G - - /test_group_2 3 58.8 - - - /test_group_1 4 51.9 - - - /user.slice 30 39.3 59.6M - - /system.slice - 1.9 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 394.7 5.5G - - /test_group_1 4 60.9 - - - /test_group_2 3 57.9 - - - /user.slice 28 43.5 36.9M - - /system.slice - 3.0 5.7G - - Path Tasks %CPU Memory Input/s Output/s / 659 395.0 5.5G - - /test_group_1 4 66.8 - - - /test_group_2 3 56.3 - - - /user.slice 29 43.1 51.8M - - /system.slice - 0.7 5.7G - - 4. Now move systemd-udevd to one of these test groups, say test_group_1, and perform scale up to 124 CPUs followed by scale down back to 4 CPUs from the host side. 5. Run the same workload i.e 4 instances of CPU hogger under /sys/fs/cgroup/cpu and one instance of CPU hogger each in /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/test_group_2. It can be seen that test_group_1 (the one where systemd-udevd was moved) is getting much less CPU time than the test_group_2, even though at this point of time both of these groups have only CPU hogger running: # systemd-cgtop --depth 1 Path Tasks %CPU Memory Input/s Output/s / 1219 - 5.4G - - /system.slice - - 5.6G - - /test_group_1 4 - - - - /test_group_2 3 - - - - /user.slice 26 - 91.3M - - Path Tasks %CPU Memory Input/s Output/s / 1221 394.3 5.4G - - /test_group_2 3 82.7 - - - /test_group_1 4 14.3 - - - /system.slice - 0.8 5.6G - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 394.6 5.4G - - /test_group_2 3 67.4 - - - /system.slice - 24.6 5.6G - - /test_group_1 4 12.5 - - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.2 5.4G - - /test_group_2 3 60.9 - - - /system.slice - 27.9 5.6G - - /test_group_1 4 12.2 - - - /user.slice 26 0.4 91.2M - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.2 5.4G - - /test_group_2 3 69.4 - - - /test_group_1 4 13.9 - - - /user.slice 28 1.6 92.0M - - /system.slice - 1.0 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.6 5.4G - - /test_group_2 3 59.3 - - - /test_group_1 4 14.1 - - - /user.slice 28 1.3 92.2M - - /system.slice - 0.7 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.5 5.4G - - /test_group_2 3 67.2 - - - /test_group_1 4 11.5 - - - /user.slice 28 1.3 92.5M - - /system.slice - 0.6 5.6G - - Path Tasks %CPU Memory Input/s Output/s / 1221 395.1 5.4G - - /test_group_2 3 76.8 - - - /test_group_1 4 12.9 - - - /user.slice 28 1.3 92.8M - - /system.slice - 1.2 5.6G - - From sched_debug data it can be seen that in bad case the load.weight of per-CPU sched entities corresponding to test_group_1 has reduced significantly and also load_avg of test_group_1 remains much higher than that of test_group_2, even though systemd-udevd stopped running long time back and at this point of time both cgroups just have the CPU hogger app as running entity." [ mingo: Added details from the original discussion, plus minor edits to the patch. ] Reported-by: Imran Khan <[email protected]> Tested-by: Imran Khan <[email protected]> Tested-by: Aaron Lu <[email protected]> Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Imran Khan <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Borislav Petkov <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-29ptp: ocp: fix bug in unregistering the DPLL subsystemSagi Maimon1-1/+1
When unregistering the DPLL subsystem the priv pointer is different then the one used for registration which cause failure in unregistering. Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops") Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Vadim Fedorenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-29Merge tag 'nf-23-12-20' of ↵David S. Miller3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablu Neira Syuso says: ==================== netfilter pull request 23-12-20 The following patchset contains Netfilter fixes for net: 1) Skip set commit for deleted/destroyed sets, this might trigger double deactivation of expired elements. 2) Fix packet mangling from egress, set transport offset from mac header for netdev/egress. Both fixes address bugs already present in several releases. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-12-29Merge branch 'topic/ppc-kvm' into nextMichael Ellerman8-39/+107
Merge our topic branch containing KVM related patches.
2023-12-29powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2Geoff Levand1-0/+1
Commit 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian builds"), merged in Linux-6.5-rc1 changes the calling ABI in a way that is incompatible with the current code for the PS3's LV1 hypervisor calls. This change just adds the line '# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not set' to the ps3_defconfig file so that the PPC64_ELF_ABI_V1 is used. Fixes run time errors like these: BUG: Kernel NULL pointer dereference at 0x00000000 Faulting instruction address: 0xc000000000047cf0 Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: [c0000000023039e0] [c00000000100ebfc] ps3_create_spu+0xc4/0x2b0 (unreliable) [c000000002303ab0] [c00000000100d4c4] create_spu+0xcc/0x3c4 [c000000002303b40] [c00000000100eae4] ps3_enumerate_spus+0xa4/0xf8 Fixes: 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian builds") Cc: [email protected] # v6.5+ Signed-off-by: Geoff Levand <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-12-29powerpc/86xx: Drop unused CONFIG_MPC8610Michael Ellerman1-7/+0
The MPC8610 symbol used to be default y if MPC8610_HPCD, but since MPC8610_HPCD was removed MPC8610 is now never used. Remove it. Fixes: 248667f8bbde ("powerpc: drop HPCD/MPC8610 evaluation platform support") Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-12-29ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enableChancel Liu1-2/+8
There is error message when defer probe happens: fsl_rpmsg rpmsg_audio: Unbalanced pm_runtime_enable! Fix the error handler with pm_runtime_enable. Fixes: b73d9e6225e8 ("ASoC: fsl_rpmsg: Add CPU DAI driver for audio base on rpmsg") Signed-off-by: Chancel Liu <[email protected]> Acked-by: Shengjiu Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-12-28Merge tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbdLinus Torvalds1-3/+12
Pull ksmbd server fix from Steve French: - address possible slab out of bounds in parsing of open requests * tag '6.7rc7-smb3-srv-fix' of git://git.samba.org/ksmbd: ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
2023-12-28Merge tag 'kbuild-fixes-v6.7-2' of ↵Linus Torvalds4-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Revive proper alignment for the ksymtab and kcrctab sections - Fix gen_compile_commands.py tool to resolve symbolic links - Fix symbolic links to installed debug VDSO files - Update MAINTAINERS * tag 'kbuild-fixes-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: linux/export: Ensure natural alignment of kcrctab array kbuild: fix build ID symlinks to installed debug VDSO files gen_compile_commands.py: fix path resolve with symlinks in it MAINTAINERS: Add scripts/clang-tools to Kbuild section linux/export: Fix alignment for 64-bit ksymtab entries
2023-12-28Merge tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefsLinus Torvalds5-12/+16
Pull bcachefs fixes from Kent Overstreet: "Just a few fixes: besides a few one liners, we have a fix for snapshots + compression where the extent update path didn't account for the fact that with snapshots, we might split an existing extent into three, not just two; and a small fixup for promotes which were broken by the recent changes in the data update path to correctly take into account device durability" * tag 'bcachefs-2023-12-27' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix promotes bcachefs: Fix leakage of internal error code bcachefs: Fix insufficient disk reservation with compression + snapshots bcachefs: fix BCH_FSCK_ERR enum
2023-12-28thermal: netlink: Add thermal_group_has_listeners() helperStanislaw Gruszka1-0/+11
Add a helper function to check if there are listeners for thermal_gnl_family multicast groups. For now use it to avoid unnecessary allocations and sending thermal genl messages when there are no recipients. In the future, in conjunction with (not yet implemented) notification of change in the netlink socket group membership, this helper can be used to open/close hardware interfaces based on the presence of user space subscribers. Signed-off-by: Stanislaw Gruszka <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-28thermal: netlink: Add enum for mutlicast groups indexesStanislaw Gruszka1-4/+9
Use enum instead of hard-coded numbers for indexing multicast groups. Signed-off-by: Stanislaw Gruszka <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-12-28mm/slub: free KFENCE objects in slab_free_hook()Vlastimil Babka1-12/+10
When freeing an object that was allocated from KFENCE, we do that in the slowpath __slab_free(), relying on the fact that KFENCE "slab" cannot be the cpu slab, so the fastpath has to fallback to the slowpath. This optimization doesn't help much though, because is_kfence_address() is checked earlier anyway during the free hook processing or detached freelist building. Thus we can simplify the code by making the slab_free_hook() free the KFENCE object immediately, similarly to KASAN quarantine. In slab_free_hook() we can place kfence_free() above init processing, as callers have been making sure to set init to false for KFENCE objects. This simplifies slab_free(). This places it also above kasan_slab_free() which is ok as that skips KFENCE objects anyway. While at it also determine the init value in slab_free_freelist_hook() outside of the loop. This change will also make introducing per cpu array caches easier. Tested-by: Marco Elver <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2023-12-29linux/export: Ensure natural alignment of kcrctab arrayHelge Deller1-0/+1
The ___kcrctab section holds an array of 32-bit CRC values. Add a .balign 4 to tell the linker the correct memory alignment. Fixes: f3304ecd7f06 ("linux/export: use inline assembler to populate symbol CRCs") Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2023-12-28thermal: core: Resume thermal zones asynchronouslyRafael J. Wysocki1-4/+26
The resume of thermal zones in thermal_pm_notify() is carried out sequentially, which may be a problem if __thermal_zone_device_update() takes a significant time to run for some thermal zones, because some other thermal zones may need to wait for them to resume then and if any other PM notifiers are going to be invoked after the thermal one, they will need to wait for it either. To address this, make thermal_pm_notify() switch the poll_queue delayed work over to a one-shot thermal_zone_device_resume() work function that will restore the original one during the thermal zone resume and queue up poll_queue without a delay for each thermal zone. Link: https://lore.kernel.org/linux-pm/[email protected]/ Reported-by: Radu Solea <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>