path: root/include/linux
Age  Commit message  (Author, files changed, lines removed/added)
2023-04-19  net: skbuff: hide csum_not_inet when CONFIG_IP_SCTP not set  (Jakub Kicinski, 1 file, -0/+14)
SCTP is not universally deployed, so allow hiding its bit from the skb. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
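A minimal sketch of the Kconfig-guard pattern this commit (and the wifi_acked one below) describes; the field placement and helper shape are assumptions based on the subject line, not the verbatim upstream diff:

    /* Sketch: compile the bit in only when SCTP is enabled; the stub
     * accessor keeps callers free of #ifdefs. */
    struct sk_buff {
        /* ... */
    #if IS_ENABLED(CONFIG_IP_SCTP)
        __u8    csum_not_inet:1;
    #endif
        /* ... */
    };

    static inline void skb_reset_csum_not_inet(struct sk_buff *skb)
    {
    #if IS_ENABLED(CONFIG_IP_SCTP)
        skb->csum_not_inet = 0;
    #endif
    }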
2023-04-19  net: skbuff: hide wifi_acked when CONFIG_WIRELESS not set  (Jakub Kicinski, 1 file, -0/+11)
Datacenter kernel builds will very likely not include WIRELESS, so let them shave 2 bits off the skb by hiding the wifi fields. Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19  net: phy: phy_device: Call into the PHY driver to set LED blinking  (Andrew Lunn, 1 file, -0/+12)
Linux LEDs can be requested to perform hardware accelerated blinking. Pass this to the PHY driver, if it implements the op. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19  net: phy: phy_device: Call into the PHY driver to set LED brightness  (Andrew Lunn, 1 file, -0/+13)
Linux LEDs can be software controlled via the brightness file in /sys. LED drivers need to implement a brightness_set function which the core will call. Implement an intermediary in phy_device, which will call into the phy driver if it implements the necessary function. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
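A hedged sketch of the intermediary described above; the container helper and the name of the driver op are assumptions (the commit does not spell them out), and the locking is illustrative:

    /* Sketch: the LED core calls this brightness_set implementation; it
     * forwards to the PHY driver only if the driver provides the op.
     * to_phy_led() and led_brightness_set are hypothetical names. */
    static int phy_led_set_brightness(struct led_classdev *led_cdev,
                                      enum led_brightness value)
    {
        struct phy_led *phyled = to_phy_led(led_cdev);
        struct phy_device *phydev = phyled->phydev;
        int err = -EOPNOTSUPP;

        mutex_lock(&phydev->lock);
        if (phydev->drv->led_brightness_set)
            err = phydev->drv->led_brightness_set(phydev, phyled->index, value);
        mutex_unlock(&phydev->lock);

        return err;
    }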
2023-04-19  net: phy: Add a binding for PHY LEDs  (Andrew Lunn, 1 file, -0/+16)
Define common binding parsing for all PHY drivers with LEDs using phylib. Parse the DT as part of the phy_probe and add LEDs to the linux LED class infrastructure. For the moment, provide a dummy brightness function, which will later be replaced with a call into the PHY driver. This allows testing since the LED core might otherwise reject an LED whose brightness cannot be set. Add a dependency on LED_CLASS. It either needs to be built in, or not enabled, since a modular build can result in linker errors. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-19  leds: Provide stubs for when CLASS_LED & NEW_LEDS are disabled  (Andrew Lunn, 1 file, -0/+18)
Provide stubs for devm_led_classdev_register_ext() and led_init_default_state_get() so that LED drivers embedded within other drivers such as PHYs and Ethernet switches still build when LEDS_CLASS or NEW_LEDS are disabled. This also helps with Kconfig dependencies, which are somewhat hairy for phylib and mdio and only get worse when adding a dependency on LED_CLASS. Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: Christian Marangi <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
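A sketch of the stub pattern the commit describes; the exact guard condition and argument list in the real header may differ:

    #if IS_ENABLED(CONFIG_LEDS_CLASS)
    int devm_led_classdev_register_ext(struct device *parent,
                                       struct led_classdev *led_cdev,
                                       struct led_init_data *init_data);
    #else
    static inline int devm_led_classdev_register_ext(struct device *parent,
                                                     struct led_classdev *led_cdev,
                                                     struct led_init_data *init_data)
    {
        /* No LED core: report success so embedding drivers still build. */
        return 0;
    }
    #endif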
2023-04-18  bonding: add software tx timestamping support  (Hangbin Liu, 1 file, -0/+1)
Currently, bonding only obtains the timestamp (ts) information of the active slave, which is available only for modes 1, 5, and 6. For other modes, bonding only has software rx timestamping support. However, some users who use modes such as LACP also want tx timestamp support. To address this issue, check the ts information of each slave: if all slaves support tx timestamping, enable tx timestamping support for the bond. Also add a note in ethtool.h that get_ts_info may be called with RCU, or rtnl, or a reference held on the device. Suggested-by: Miroslav Lichvar <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
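A rough sketch of the per-slave capability check described above; the helper name is hypothetical (bond_for_each_slave and __ethtool_get_ts_info are existing kernel interfaces):

    /* Illustrative: the bond advertises software tx timestamping only if
     * every slave does. */
    static bool bond_all_slaves_support_tx_tstamp(struct bonding *bond)
    {
        struct ethtool_ts_info info;
        struct list_head *iter;
        struct slave *slave;

        bond_for_each_slave(bond, slave, iter) {
            if (__ethtool_get_ts_info(slave->dev, &info) ||
                !(info.so_timestamping & SOF_TIMESTAMPING_TX_SOFTWARE))
                return false;
        }
        return true;
    }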
2023-04-18  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf  (Jakub Kicinski, 1 file, -2/+3)
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Unbreak br_netfilter physdev match support, from Florian Westphal.
2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian.
3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal.
4) Fix validation of catch-all set element.
5) Tighten requirements for catch-all set elements.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
  netfilter: nf_tables: validate catch-all set elements
  netfilter: nf_tables: fix ifdef to also consider nf_tables=m
  netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT
  netfilter: br_netfilter: fix recent physdev match breakage
====================

Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-18  mm: ksm: support hwpoison for ksm page  (Longlong Xia, 2 files, -0/+18)
hwpoison_user_mappings() is updated to support ksm pages, and collect_procs_ksm() is added to collect processes when the error hits a ksm page. The difference from collect_procs_anon() is that it also needs to traverse the rmap-item list on the stable node of the ksm page. At the same time, add_to_kill_ksm() is added to handle ksm pages, and task_in_to_kill_list() is added to avoid duplicate addition of tsk to the to_kill list; this is because when scanning the list, if the pages that make up the ksm page all come from the same process, it could be added repeatedly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Longlong Xia <[email protected]> Tested-by: Naoya Horiguchi <[email protected]> Reviewed-by: Kefeng Wang <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Nanyong Sun <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  delayacct: track delays from IRQ/SOFTIRQ  (Yang Yang, 1 file, -0/+15)
Delay accounting does not track the delay of IRQ/SOFTIRQ, yet IRQ/SOFTIRQ can have an obvious impact on some workloads' productivity, such as when workloads are running on a system that is busy handling network IRQ/SOFTIRQ. Tracking the delay of IRQ/SOFTIRQ can help users reduce it, for example by setting interrupt affinity or task affinity, or by using a kernel thread for NAPI. This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure" [1]. Also fix some code indent problems in older code. And update tools/accounting/getdelays.c:

    / # ./getdelays -p 156 -di
    print delayacct stats ON
    printing IO accounting
    PID  156

    CPU        count    real total    virtual total    delay total    delay average
                  15     15836008         16218149      275700790          18.380ms
    IO         count    delay total    delay average
                   0              0          0.000ms
    SWAP       count    delay total    delay average
                   0              0          0.000ms
    RECLAIM    count    delay total    delay average
                   0              0          0.000ms
    THRASHING  count    delay total    delay average
                   0              0          0.000ms
    COMPACT    count    delay total    delay average
                   0              0          0.000ms
    WPCOPY     count    delay total    delay average
                  36        7586118          0.211ms
    IRQ        count    delay total    delay average
                  42         929161          0.022ms

[1] commit 52b1364ba0b1 ("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Yang <[email protected]> Cc: Jiang Xuexin <[email protected]> Cc: wangyong <[email protected]> Cc: junhua huang <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  lib/rbtree: use '+' instead of '|' for setting color.  (Noah Goldstein, 1 file, -2/+2)
This has a slight benefit for x86 and has no effect on other targets. The benefit to x86 is that it changes the codegen for setting a node to black from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1`, which saves an instruction. In all other cases it just replaces an ALU op with an ALU op (or -> add), which performs the same on all machines I am aware of. Total instructions in rbtree.o: Before - 802, After - 782, so it saves about 20 `mov` instructions. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Noah Goldstein <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
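The substitution, in essence, sketched from the description above: since the low color bits of the aligned parent pointer are known to be zero, addition and bitwise-or produce the same value, and '+' lets x86 fold the operation into a single lea:

    /* Sketch of the change to the parent/color packing helper. */
    static inline void rb_set_parent_color(struct rb_node *rb,
                                           struct rb_node *p, int color)
    {
        rb->__rb_parent_color = (unsigned long)p + color;   /* was: | color */
    }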
2023-04-18  memfd: pass argument of memfd_fcntl as int  (Luca Vizzarro, 1 file, -2/+2)
The interface for fcntl expects the argument passed for the command F_ADD_SEALS to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. This commit changes the signature of all the related and helper functions so that they treat the argument as int instead of long. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luca Vizzarro <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: David Laight <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: Multi-gen LRU: remove wait_event_killable()  (Kalesh Singh, 1 file, -6/+2)
Android 14 and later default to MGLRU [1] and field telemetry showed occasional long tail latency (>100ms) in the reclaim path. Tracing revealed priority inversion in the reclaim path. In try_to_inc_max_seq(), when high priority tasks were blocked on wait_event_killable(), the preemption of the low priority task to call wake_up_all() caused those high priority tasks to wait longer than necessary. In general, this problem is not different from others of its kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU because it introduced the new wait queue lruvec->mm_state.wait. The purpose of this new wait queue is to avoid the thundering herd problem. If many direct reclaimers rush into try_to_inc_max_seq(), only one can succeed, i.e., the one to wake up the rest, and the rest who failed might cause premature OOM kills if they do not wait. So far there is no evidence supporting this scenario, based on how often the wait has been hit. And this begs the question how useful the wait queue is in practice. Based on Minchan's recommendation, which is in line with his commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the rest of the MGLRU code which also uses trylock when possible, remove the wait queue. [1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf Link: https://lkml.kernel.org/r/[email protected] Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Kalesh Singh <[email protected]> Suggested-by: Minchan Kim <[email protected]> Reported-by: Wei Wang <[email protected]> Acked-by: Yu Zhao <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Jan Alexander Steffens (heftig) <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: vmscan: refactor updating current->reclaim_state  (Yosry Ahmed, 1 file, -1/+16)
During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: NeilBrown <[email protected]> Cc: Peter Xu <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tim Chen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
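A minimal sketch of the wrapper the commit describes, assuming the rename to 'reclaimed'; the helper name is inferred from the description, not guaranteed verbatim:

    /* Sketch: account pages freed by means other than LRU-based reclaim
     * (slab shrinking, pruned inodes, xfs buffers) against the caller's
     * reclaim_state, if one is active. */
    static inline void mm_account_reclaimed_pages(unsigned long pages)
    {
        if (current->reclaim_state)
            current->reclaim_state->reclaimed += pages;
    }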
2023-04-18  mm: kmsan: apply __must_check to non-void functions  (Alexander Potapenko, 1 file, -21/+22)
Non-void KMSAN hooks may return error codes that indicate that KMSAN failed to reflect the changed memory state in the metadata (e.g. it could not create the necessary memory mappings). In such cases the callers should handle the errors to prevent the tool from using the inconsistent metadata in the future. We mark non-void hooks with __must_check so that error handling is not skipped. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dipanjan Das <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: hwpoison: support recovery from HugePage copy-on-write faults  (Liu Shixin, 1 file, -3/+3)
Copy-on-write of hugetlb user pages with uncorrectable errors will result in a kernel crash. This is because the copy is performed in kernel mode and in general we cannot handle accessing memory with such errors while in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") introduced the routine copy_user_highpage_mc() to gracefully handle copying of user pages with uncorrectable errors. However, the separate hugetlb copy-on-write code paths were not modified as part of commit a873dfe1032a. Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so that they can also gracefully handle uncorrectable errors in user pages. This involves changing the hugetlb-specific routine copy_user_large_folio() from type void to int so that it can return an error. Modify the hugetlb userfaultfd code in the same way so that it can return -EHWPOISON if it encounters an uncorrectable error. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liu Shixin <[email protected]> Acked-by: Mike Kravetz <[email protected]> Reviewed-by: Naoya Horiguchi <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP  (Aneesh Kumar K.V, 1 file, -1/+1)
Now we use the ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename it to the generic ARCH_WANT_OPTIMIZE_VMEMMAP. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Joao Martins <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Tarun Sahu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm/vmemmap/devdax: fix kernel crash when probing devdax devices  (Aneesh Kumar K.V, 1 file, -0/+16)
commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmemmap for devdax devices. But how vmemmap mappings are created is architecture-specific. For example, powerpc with hash translation doesn't have vmemmap mappings in the init_mm page table; instead they are bolted table entries in the hardware page table. vmemmap_populate_compound_pages(), used by the vmemmap optimization code, is not aware of these architecture-specific mappings. Hence allow architectures to opt in to this feature. I selected architectures supporting the HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64.

    BUG: Unable to handle kernel data access on write at 0xc00c000100400038
    Faulting instruction address: 0xc000000001269d90
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in:
    CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
    Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
    NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
    REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
    MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000
    CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
    ....
    NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
    LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
    Call Trace:
    [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
    [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
    [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
    [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
    [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
    [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
    [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
    [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
    [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
    [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
    [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
    [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
    [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
    [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
    [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
    [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
    [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
    [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
    [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
    [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
    [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
    [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
    [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
    [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
    [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
    [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
    [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
    [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
    [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
    [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
    [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
    [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64

Link: https://lkml.kernel.org/r/[email protected] Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <[email protected]> Reported-by: Tarun Sahu <[email protected]> Reviewed-by: Joao Martins <[email protected]> Cc: Muchun Song <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  userfaultfd: convert mfill_atomic() to use a folio  (ZhangPeng, 1 file, -2/+2)
Convert mfill_atomic_pte_copy(), shmem_mfill_atomic_pte() and mfill_atomic_pte() to take in a folio pointer. Convert mfill_atomic() to use a folio. Convert page_kaddr to kaddr in mfill_atomic(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: convert copy_user_huge_page() to copy_user_large_folio()  (ZhangPeng, 1 file, -4/+3)
Replace copy_user_huge_page() with copy_user_large_folio(). copy_user_large_folio() does the same as copy_user_huge_page(), but takes in folios instead of pages. Remove pages_per_huge_page from copy_user_large_folio(), because we can get that from folio_nr_pages(dst). Convert copy_user_gigantic_page() to take in folios. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Sidhartha Kumar <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  userfaultfd: convert mfill_atomic_hugetlb() to use a folio  (ZhangPeng, 1 file, -2/+2)
Convert hugetlb_mfill_atomic_pte() to take in a folio pointer instead of a page pointer. Convert mfill_atomic_hugetlb() to use a folio. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  userfaultfd: convert copy_huge_page_from_user() to copy_folio_from_user()  (ZhangPeng, 1 file, -4/+3)
Replace copy_huge_page_from_user() with copy_folio_from_user(). copy_folio_from_user() does the same as copy_huge_page_from_user(), but takes in a folio instead of a page. Convert page_kaddr to kaddr in copy_folio_from_user() to do indenting cleanup. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ZhangPeng <[email protected]> Reviewed-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  smaps: fix defined but not used smaps_shmem_walk_ops  (Steven Price, 1 file, -0/+7)
When !CONFIG_SHMEM, smaps_shmem_walk_ops is defined but not used, triggering a compiler warning. To avoid the warning, remove the #ifdef around the usage. This has no effect because shmem_mapping() is a stub returning false when !CONFIG_SHMEM, so the code will be compiled out; however, we now need to also provide a stub for shmem_swap_usage(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 7b86ac3371b7 ("pagewalk: separate function pointers from iterator data") Signed-off-by: Steven Price <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Thomas Hellström <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
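A sketch of the stub arrangement the fix requires; the exact declaration is assumed from the description:

    /* With !CONFIG_SHMEM, shmem_mapping() already stubs to false, so the
     * call site is dead code; this matching stub keeps it compiling. */
    #ifdef CONFIG_SHMEM
    unsigned long shmem_swap_usage(struct vm_area_struct *vma);
    #else
    static inline unsigned long shmem_swap_usage(struct vm_area_struct *vma)
    {
        return 0;
    }
    #endif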
2023-04-18  mm/hwpoison: introduce copy_mc_highpage  (Jiaqi Yan, 1 file, -13/+41)
Similar to how copy_mc_user_highpage is implemented for copy_user_highpage on #MC-supported architectures, introduce the #MC-handled version of copy_highpage. This helper has immediate usage when khugepaged wants to copy file-backed memory pages and tolerate #MC. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiaqi Yan <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: David Stevens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Tong Tiangen <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
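A sketch of the helper, modeled on how copy_mc_user_highpage wraps copy_mc_to_kernel; details may differ from the upstream version:

    /* Sketch: machine-check-safe kernel page copy; returns 0 or -EHWPOISON. */
    static inline int copy_mc_highpage(struct page *to, struct page *from)
    {
        char *vfrom = kmap_local_page(from);
        char *vto = kmap_local_page(to);
        unsigned long ret;

        /* copy_mc_to_kernel() returns the number of bytes not copied. */
        ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);
        kunmap_local(vto);
        kunmap_local(vfrom);

        return ret ? -EHWPOISON : 0;
    }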
2023-04-18  workingset: memcg: sleep when flushing stats in workingset_refault()  (Yosry Ahmed, 1 file, -2/+2)
In workingset_refault(), we call mem_cgroup_flush_stats_atomic_ratelimited() to read accurate stats within an RCU read section and with sleeping disallowed. Move the call above the RCU read section to make it non-atomic. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible. Since workingset_refault() is the only caller of mem_cgroup_flush_stats_atomic_ratelimited(), just make it non-atomic, and rename it to mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  memcg: sleep during flushing stats in safe contexts  (Yosry Ahmed, 1 file, -2/+7)
Currently, all contexts that flush memcg stats do so with sleeping not allowed. Some of these contexts are perfectly safe to sleep in, such as reading cgroup files from userspace or the background periodic flusher. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible. Refactor the code to make mem_cgroup_flush_stats() non-atomic (aka sleepable), and provide a separate atomic version. The atomic version is used in reclaim, refault, writeback, and in mem_cgroup_usage(). All other code paths are left to use the non-atomic version. This includes callbacks for userspace reads and the periodic flusher. Since refault is the only caller of mem_cgroup_flush_stats_ratelimited(), change it to mem_cgroup_flush_stats_atomic_ratelimited(). Reclaim and refault code paths are modified to do non-atomic flushing in separate later patches -- so it will eventually be changed back to mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  memcg: rename mem_cgroup_flush_stats_"delayed" to "ratelimited"  (Yosry Ahmed, 1 file, -2/+2)
mem_cgroup_flush_stats_delayed() suggests this is using a delayed_work, but it actually sometimes flushes directly from the callsite. What it does is rate-limit calls, so a better name would be mem_cgroup_flush_stats_ratelimited(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  cgroup: rename cgroup_rstat_flush_"irqsafe" to "atomic"  (Yosry Ahmed, 1 file, -1/+1)
Patch series "memcg: avoid flushing stats atomically where possible", v3. rstat flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system. The purpose of this series is to minimize the contexts where we flush stats atomically. Patches 1 and 2 are cleanups requested during reviews of prior versions of this series. Patch 3 makes sure we never try to flush from within an irq context. Patches 4 to 7 introduce separate variants of mem_cgroup_flush_stats() for atomic and non-atomic flushing, and make sure we only flush the stats atomically when necessary. Patch 8 is a slightly tangential optimization that limits the work done by rstat flushing in some scenarios. This patch (of 8): cgroup_rstat_flush_irqsafe() can be a confusing name. It may read as "irqs are disabled throughout", which is what the current implementation does (currently under discussion [1]), but is not the intention. The intention is that this function is safe to call from atomic contexts. Name it as such. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Cc: Zefan Li <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: move free_area_empty() to mm/internal.h  (Mike Rapoport (IBM), 1 file, -5/+0)
The free_area_empty() helper is only used inside mm/, so move it there to reduce noise in include/linux/mmzone.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]> Suggested-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  hugetlb: remove PageHeadHuge()  (Matthew Wilcox (Oracle), 1 file, -6/+1)
Sidhartha Kumar removed the last caller of PageHeadHuge(), so we can now remove it and make folio_test_hugetlb() the real implementation. Add kernel-doc for folio_test_hugetlb(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Sidhartha Kumar <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  sched/isolation: add cpu_is_isolated() API  (Frederic Weisbecker, 1 file, -0/+12)
Patch series "memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads". Leonardo has reported [1] that pcp memcg charge draining can interfere with cpu-isolated workloads. The said draining is done from a WQ context with a pcp worker scheduled on each CPU which holds any cached charges for a specific memcg hierarchy. It is not really a common operation [2], but it can be triggered from userspace, so some care is definitely due. Leonardo has tried to address the issue by allowing remote charge draining [3]. This approach requires additional locking to synchronize pcp cache syncs from a remote cpu with local pcp consumers. Even though the proposed lock was per-cpu, there is still potential for contention and less predictable behavior. This patchset addresses the issue from a different angle: rather than dealing with a potential synchronization, cpus which are isolated are simply never scheduled to be drained. This means that a small amount of charges could be lying around waiting for a later use, or they are flushed when a different memcg is charged from the same cpu. More details are in patch 2. The first patch from Frederic implements an abstraction to tell whether a specific cpu has been isolated and therefore requires special treatment. This patch (of 2): Provide this new API to check if a CPU has been isolated either through the isolcpus= or nohz_full= kernel parameters. It aims at avoiding kernel load deemed safe to spare on CPUs running sensitive workloads that can't bear any disturbance, such as pcp cache draining. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Suggested-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: Muchun Song <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Leonardo Bras <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
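A plausible shape for the new helper, inferred from the description; the exact housekeeping types used upstream are an assumption:

    /* Sketch: a CPU counts as isolated if it is excluded from domain
     * housekeeping (isolcpus=) or from tick housekeeping (nohz_full=). */
    static inline bool cpu_is_isolated(int cpu)
    {
        return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
               !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
    }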
2023-04-18  mm: make arch_has_descending_max_zone_pfns() static  (Arnd Bergmann, 1 file, -1/+0)
clang produces a build failure on x86 for some randconfig builds after a change that moves around code to mm/mm_init.c:

    Cannot find symbol for section 2: .text.
    mm/mm_init.o: failed

I have not been able to figure out why this happens, but the __weak annotation on arch_has_descending_max_zone_pfns() is the trigger here. Removing the weak function in favor of an open-coded Kconfig option check avoids the problem and becomes clearer as well as better to optimize by the compiler. [[email protected]: fix logic bug] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 9420f89db2dd ("mm: move most of core MM initialization to mm/mm_init.c") Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: SeongJae Park <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Cc: kernel test robot <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changes  (Andrew Morton, 1 file, -18/+21)
2023-04-18  mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()  (Alexander Potapenko, 1 file, -9/+10)
Similarly to kmsan_vmap_pages_range_noflush(), kmsan_ioremap_page_range() must also properly handle allocation/mapping failures. In case of such a failure, it must clean up the already-created metadata mappings and return an error code, so that the error can be propagated to ioremap_page_range(). Without doing so, KMSAN may silently fail to bring the metadata for the page range into a consistent state, which will result in user-visible crashes when trying to access them. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Dipanjan Das <[email protected]> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()  (Alexander Potapenko, 1 file, -9/+11)
As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Dipanjan Das <[email protected]> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18  Change DEFINE_SEMAPHORE() to take a number argument  (Peter Zijlstra, 1 file, -2/+8)
Fundamentally semaphores are a counted primitive, but DEFINE_SEMAPHORE() does not expose this and explicitly creates a binary semaphore. Change DEFINE_SEMAPHORE() to take a number argument and use that in the few places that open-coded it using __SEMAPHORE_INITIALIZER(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [mcgrof: add some tribal knowledge about why some folks prefer binary semaphores over mutexes] Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
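A sketch of what the macro change amounts to, reconstructed from the description (the initializer line is shown only for context):

    /* The count is now explicit at the definition site. */
    #define DEFINE_SEMAPHORE(_name, _n) \
        struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)

    /* What used to be DEFINE_SEMAPHORE(foo_sem) is now spelled: */
    static DEFINE_SEMAPHORE(foo_sem, 1);    /* binary semaphore */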
2023-04-18  PCI/DOE: Make mailbox creation API private  (Lukas Wunner, 1 file, -14/+0)
The PCI core has just been amended to create a pci_doe_mb struct for every DOE instance on device enumeration. CXL (the only in-tree DOE user so far) has been migrated to use those mailboxes instead of creating its own. That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any callers, so drop them. pci_doe_supports_prot() is now only used internally, so declare it static. pci_doe_destroy_mb() is no longer used as callback for devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer instead of a generic void pointer. Because pci_doe_create_mb() is only called on device enumeration, i.e. before driver binding, the workqueue name never contains a driver name. So replace dev_driver_string() with dev_bus_name() when generating the workqueue name. Tested-by: Ira Weiny <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Ming Li <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/64f614b6584982986c55d2c6229b4ee2b276dd59.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <[email protected]>
2023-04-18  PCI/DOE: Create mailboxes on device enumeration  (Lukas Wunner, 2 files, -0/+5)
Currently a DOE instance cannot be shared by multiple drivers because each driver creates its own pci_doe_mb struct for a given DOE instance. For the same reason a DOE instance cannot be shared between the PCI core and a driver. Moreover, finding out which protocols a DOE instance supports requires creating a pci_doe_mb for it. If a device has multiple DOE instances, a driver looking for a specific protocol may need to create a pci_doe_mb for each of the device's DOE instances and then destroy those which do not support the desired protocol. That's obviously an inefficient way to do things. Overcome these issues by creating mailboxes in the PCI core on device enumeration. Provide a pci_find_doe_mailbox() API call to allow drivers to get a pci_doe_mb for a given (pci_dev, vendor, protocol) triple. This API is modeled after pci_find_capability() and can later be amended with a pci_find_next_doe_mailbox() call to iterate over all mailboxes of a given pci_dev which support a specific protocol. On removal, destroy the mailboxes in pci_destroy_dev(), after the driver is unbound. This allows drivers to use DOE in their ->remove() hook. On surprise removal, cancel ongoing DOE exchanges and prevent new ones from being scheduled. Thereby ensure that a hot-removed device doesn't needlessly wait for a running exchange to time out. Tested-by: Ira Weiny <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Ming Li <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/40a6f973f72ef283d79dd55e7e6fddc7481199af.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <[email protected]>
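A hypothetical caller, to show the lookup shape described above; the probe function and the protocol type value are illustrative, not taken from any real driver:

    /* Hypothetical driver probe using the new lookup API. */
    static int foo_probe(struct pci_dev *pdev)
    {
        struct pci_doe_mb *mb;

        /* 0x01 stands in for a real DOE protocol type. */
        mb = pci_find_doe_mailbox(pdev, PCI_VENDOR_ID_PCI_SIG, 0x01);
        if (!mb)
            return -ENODEV;

        /* ... submit DOE exchanges through mb ... */
        return 0;
    }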
2023-04-18  PCI/DOE: Make asynchronous API private  (Lukas Wunner, 1 file, -48/+0)
A synchronous API for DOE has just been introduced. CXL (the only in-tree DOE user so far) was converted to use it instead of the asynchronous API. Consequently, pci_doe_submit_task() as well as the pci_doe_task struct are only used internally, so make them private. Tested-by: Ira Weiny <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Ming Li <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/cc19544068483681e91dfe27545c2180cd09f931.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <[email protected]>
2023-04-18  PCI/DOE: Provide synchronous API and use it internally  (Lukas Wunner, 1 file, -1/+5)
The DOE API only allows asynchronous exchanges and forces callers to provide a completion callback. Yet all existing callers only perform synchronous exchanges. Upcoming commits for CMA (Component Measurement and Authentication, PCIe r6.0 sec 6.31) likewise require only synchronous DOE exchanges. Provide a synchronous pci_doe() API call which builds on the internal asynchronous machinery. Convert the internal pci_doe_discovery() to the new call. The new API allows submission of const-declared requests, necessitating the addition of a const qualifier in struct pci_doe_task. Tested-by: Ira Weiny <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Ming Li <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/0f444206da9615c56301fbaff459c0f45d27f122.1678543498.git.lukas@wunner.de Signed-off-by: Dan Williams <[email protected]>
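A sketch of a synchronous exchange with the new call; the protocol type value and the return-value convention are assumptions inferred from the description:

    /* Sketch: one blocking request/response round trip. */
    u32 request[2] = { 0 };     /* protocol-specific payload */
    u32 response[32];
    int rc;

    rc = pci_doe(mb, PCI_VENDOR_ID_PCI_SIG, 0x01 /* illustrative type */,
                 request, sizeof(request), response, sizeof(response));
    if (rc < 0)
        return rc;      /* negative errno on failure; success value assumed */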
2023-04-18  mailbox: Allow direct registration to a channel  (Elliot Berman, 1 file, -0/+1)
Support virtual mailbox controllers and clients which are not platform devices or do not come from the devicetree, by allowing them to match client to channel via some other mechanism. Tested-by: Sudeep Holla <[email protected]> (pcc) Signed-off-by: Elliot Berman <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2023-04-18  RDMA/mlx5: Fix flow counter query via DEVX  (Mark Bloch, 1 file, -1/+2)
The commit cited in the Fixes tag added bulk support for flow counters, but it didn't account for the fact that it is also possible to query a counter using a non-base id if the counter was allocated as bulk. When a user performs a query, validate that the flow counter id given in the mailbox is inside the valid range, taking the bulk value into account. Fixes: 208d70f562e5 ("IB/mlx5: Support flow counters offset for bulk counters") Signed-off-by: Mark Bloch <[email protected]> Reviewed-by: Maor Gottlieb <[email protected]> Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org Signed-off-by: Leon Romanovsky <[email protected]>
2023-04-17  net/mlx5e: Add IPsec packet offload tunnel bits  (Leon Romanovsky, 1 file, -2/+6)
Extend packet reformat types and flow table capabilities with IPsec packet offload tunnel bits. Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17  bpf: Set skb redirect and from_ingress info in __bpf_tx_skb  (Daniel Borkmann, 1 file, -0/+9)
There are some use-cases where it is desirable to use bpf_redirect() in combination with ifb device, which currently is not supported, for example, around filtering inbound traffic with BPF to then push it to ifb which holds the qdisc for shaping in contrast to doing that on the egress device. Toke mentions the following case related to OpenWrt: Because there's not always a single egress on the other side. These are mainly home routers, which tend to have one or more WiFi devices bridged to one or more ethernet ports on the LAN side, and a single upstream WAN port. And the objective is to control the total amount of traffic going over the WAN link (in both directions), to deal with bufferbloat in the ISP network (which is sadly still all too prevalent). In this setup, the traffic can be split arbitrarily between the links on the LAN side, and the only "single bottleneck" is the WAN link. So we install both egress and ingress shapers on this, configured to something like 95-98% of the true link bandwidth, thus moving the queues into the qdisc layer in the router. It's usually necessary to set the ingress bandwidth shaper a bit lower than the egress due to being "downstream" of the bottleneck link, but it does work surprisingly well. We usually use something like a matchall filter to put all ingress traffic on the ifb, so doing the redirect from BPF has not been an immediate requirement thus far. However, it does seem a bit odd that this is not possible, and we do have a BPF-based filter that layers on top of this kind of setup, which currently uses u32 as the ingress filter and so it could presumably be improved to use BPF instead if that was available. Reported-by: Toke Høiland-Jørgensen <[email protected]> Reported-by: Yafang Shao <[email protected]> Reported-by: Tonghao Zhang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yafang Shao <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README Link: https://lore.kernel.org/bpf/[email protected] Link: https://lore.kernel.org/r/8cebc8b2b6e967e10cbafe2ffd6795050e74accd.1681739137.git.daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-17  swiotlb: Remove bounce buffer remapping for Hyper-V  (Michael Kelley, 1 file, -2/+0)
With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove swiotlb_unencrypted_base and the associated code. Signed-off-by: Michael Kelley <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2023-04-17  ACPI: bus: Add stub acpi_sleep_state_supported() in non-ACPI cases  (Saurabh Sengar, 1 file, -0/+5)
acpi_sleep_state_supported() is defined only when CONFIG_ACPI=y. The function is in acpi_bus.h, and acpi_bus.h can only be used in CONFIG_ACPI=y cases. Add the stub function to linux/acpi.h to make compilation successful for !CONFIG_ACPI cases. Signed-off-by: Saurabh Sengar <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
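A sketch of the stub pattern described; the signature mirrors the existing CONFIG_ACPI=y declaration:

    /* Stub so !CONFIG_ACPI code can call this unconditionally. */
    #ifndef CONFIG_ACPI
    static inline bool acpi_sleep_state_supported(u8 sleep_state)
    {
        return false;
    }
    #endif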
2023-04-17  libcrc32c: remove crc32c_impl  (Christoph Hellwig, 1 file, -1/+0)
This was only ever used by btrfs, and the usage just went away. This effectively reverts df91f56adce1 ("libcrc32c: Add crc32c_impl function"). Acked-by: Herbert Xu <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-04-17  btrfs, block: move REQ_CGROUP_PUNT to btrfs  (Christoph Hellwig, 2 files, -13/+10)
REQ_CGROUP_PUNT is a bit annoying as it is hard to follow and adds a branch to the bio submission hot path. To fix this, export blkcg_punt_bio_submit and let btrfs call it directly. Add a new REQ_FS_PRIVATE flag for btrfs to indicate to its own low-level bio submission code that a punt to the cgroup submission helper is required. Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-04-17  btrfs, mm: remove the punt_to_cgroup field in struct writeback_control  (Christoph Hellwig, 1 file, -5/+0)
punt_to_cgroup is only used by extent_write_locked_range, but that function also directly controls the bio flags for the actual submission. Remove the punt_to_cgroup field, and just set REQ_CGROUP_PUNT directly in extent_write_locked_range. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-04-17  Merge tag 'thermal-v6.4-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux  (Rafael J. Wysocki, 1 file, -55/+0)
Pull more thermal control changes for 6.4-rc1 from Daniel Lezcano:

 - Do preparatory cleaning and DT bindings for RK3588 support (Sebastian Reichel)
 - Add driver support for RK3588 (Finley Xiao)
 - Use devm_reset_control_array_get_exclusive() for the Rockchip driver (Ye Xingchen)
 - Detect power-gated thermal zones and return -EAGAIN when reading the temperature (Mikko Perttunen)
 - Remove the thermal_bind_params structure as it is unused (Zhang Rui)
 - Drop unneeded quotes in DT bindings, allowing yamllint to run (Rob Herring)
 - Update the power allocator documentation according to the thermal trace relocation (Lukas Bulwahn)
 - Fix the sensor 1 interrupt status bitmask for the Mediatek LVTS sensor (Chen-Yu Tsai)
 - Use the dev_err_probe() helper in the Amlogic driver (Ye Xingchen)
 - Add AP domain support to LVTS thermal controllers for mt8195 (Balsam CHIHI)
 - Remove a buggy call to thermal_of_zone_unregister() (Daniel Lezcano)
 - Make thermal_of_zone_[un]register() private to the thermal OF code (Daniel Lezcano)
 - Create a private copy of the thermal zone device parameters structure when registering a thermal zone (Daniel Lezcano)

* tag 'thermal-v6.4-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/core: Alloc-copy-free the thermal zone parameters structure
  thermal/of: Unexport unused OF functions
  thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister
  thermal/drivers/mediatek/lvts_thermal: Add AP domain for mt8195
  dt-bindings: thermal: mediatek: Add AP domain to LVTS thermal controllers for mt8195
  thermal: amlogic: Use dev_err_probe()
  thermal/drivers/mediatek/lvts_thermal: Fix sensor 1 interrupt status bitmask
  MAINTAINERS: adjust entry in THERMAL/POWER_ALLOCATOR after header movement
  dt-bindings: thermal: Drop unneeded quotes
  thermal/core: Remove thermal_bind_params structure
  thermal/drivers/tegra-bpmp: Handle offline zones
  thermal/drivers/rockchip: use devm_reset_control_array_get_exclusive()
  dt-bindings: rockchip-thermal: Support the RK3588 SoC compatible
  thermal/drivers/rockchip: Support RK3588 SoC in the thermal driver
  thermal/drivers/rockchip: Support dynamic sized sensor array
  thermal/drivers/rockchip: Simplify channel id logic
  thermal/drivers/rockchip: Use dev_err_probe
  thermal/drivers/rockchip: Simplify clock logic
  thermal/drivers/rockchip: Simplify getting match data