path: root/include
Age  Commit message  (Author) [Files, Lines]
2024-05-02  media: cec: core: avoid recursive cec_claim_log_addrs  (Hans Verkuil) [1 file, -0/+1]
Keep track of whether cec_claim_log_addrs() is running, and have CEC_ADAP_S_LOG_ADDRS return -EBUSY if it is. This prevents cec_claim_log_addrs() from being called again while it is still in progress. Signed-off-by: Hans Verkuil <[email protected]> Reported-by: Yang, Chenyuan <[email protected]> Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: ca684386e6e2 ("[media] cec: add HDMI CEC framework (api)") Signed-off-by: Mauro Carvalho Chehab <[email protected]>
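A rough sketch of the guard described above (the flag name and surrounding code are illustrative, not the exact hunk; the actual change only adds one field to a header under include/):

    /* Illustrative sketch: refuse to start a second claim while one runs. */
    if (adap->is_claiming_log_addrs)
            return -EBUSY;
    adap->is_claiming_log_addrs = true;
    /* ... perform the (possibly blocking) logical-address claim ... */
    adap->is_claiming_log_addrs = false;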
2024-05-02  Merge tag 'net-6.9-rc7' of ↵  (Linus Torvalds) [3 files, -0/+12]
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02  string: Add additional __realloc_size() annotations for "dup" helpers  (Kees Cook) [1 file, -5/+8]
Several other "dup"-style interfaces could use the __realloc_size() attribute. (As a reminder to myself and others: "realloc" is used here instead of "alloc" because the "alloc_size" attribute implies that the memory contents are uninitialized. Since we're copying contents into the resulting allocation, it must use "realloc_size" to avoid confusing the compiler's optimization passes.) Add KUnit test coverage where possible. (KUnit still does not have the ability to manipulate userspace memory.) Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
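For illustration, an annotation in this style might look like the sketch below; the prototype name is made up rather than one of the helpers touched here, only the attribute usage is the point:

    /* Sketch: mark parameter 2 as the size of the returned allocation,
     * using __realloc_size() rather than __alloc_size() because the
     * returned memory is initialized by the copy. */
    extern void *kmemdup_example(const void *src, size_t len, gfp_t gfp)
            __realloc_size(2);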
2024-05-02  KVM: Remove kvm_make_all_cpus_request_except()  (Venkatesh Srinivas) [1 file, -2/+0]
Remove kvm_make_all_cpus_request_except() as it effectively has no users, and arguably should never have been added in the first place. Commit 54163a346d4a ("KVM: Introduce kvm_make_all_cpus_request_except()") added the "except" variation for use in SVM's AVIC update path, which used it to skip sending a request to the current vCPU (commit 7d611233b016 ("KVM: SVM: Disable AVIC before setting V_IRQ")). But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a hack-a-fix that simply squashed the most likely scenario of a racy WARN without addressing the underlying problem(s). Commit f1577ab21442 ("KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated") eventually fixed the WARN itself, and the "except" usage was subsequently dropped by df63202fe52b ("KVM: x86: APICv: drop immediate APICv disablement on current vCPU"). That kvm_make_all_cpus_request_except() hasn't gained any users in the last ~3 years isn't a coincidence. If a VM-wide broadcast *needs* to skip the current vCPU, then odds are very good that there is an underlying bug that could be better fixed elsewhere. Signed-off-by: Venkatesh Srinivas <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: rewrite changelog with --verbose] Signed-off-by: Sean Christopherson <[email protected]>
2024-05-02  seq_file: Optimize seq_puts()  (Christophe JAILLET) [1 file, -1/+12]
Most of seq_puts() usages are done with a string literal. In such cases, the length of the string can be computed at compile time in order to save a strlen() call at run-time. seq_putc() or seq_write() can then be used instead. This saves a few cycles. To have an estimation of how often this optimization triggers: $ git grep seq_puts.*\" | wc -l 3436 $ git grep seq_puts.*\".\" | wc -l 84 Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/a8589bffe4830dafcb9111e22acf06603fea7132.1713781332.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christian Brauner <[email protected]> The output for seq_putc() generation has also been checked and works.
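The idea can be sketched roughly as follows (a simplified illustration of the approach, not the exact macro added by the patch):

    /* Sketch: for string literals, strlen() folds to a constant, so the
     * call can be lowered to seq_putc() or seq_write() at compile time. */
    #define seq_puts_sketch(m, s)                           \
    do {                                                    \
            if (!__builtin_constant_p(s))                   \
                    seq_puts(m, s);                         \
            else if (strlen(s) > 1)                         \
                    seq_write(m, s, strlen(s));             \
            else                                            \
                    seq_putc(m, *(s));                      \
    } while (0)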
2024-05-02  kallsyms: Avoid weak references for kallsyms symbols  (Ard Biesheuvel) [1 file, -0/+19]
kallsyms is a directory of all the symbols in the vmlinux binary, and so creating it is somewhat of a chicken-and-egg problem, as its non-zero size affects the layout of the binary, and therefore the values of the symbols. For this reason, the kernel is linked more than once, and the first pass does not include any kallsyms data at all. For the linker to accept this, the symbol declarations describing the kallsyms metadata are emitted as having weak linkage, so they can remain unsatisfied. During the subsequent passes, the weak references are satisfied by the kallsyms metadata that was constructed based on information gathered from the preceding passes. Weak references lead to somewhat worse codegen, because taking their address may need to produce NULL (if the reference was unsatisfied), and this is not usually supported by RIP or PC relative symbol references. Given that these references are ultimately always satisfied in the final link, let's drop the weak annotation, and instead, provide fallback definitions in the linker script that are only emitted if an unsatisfied reference exists. While at it, drop the FRV specific annotation that these symbols reside in .rodata - FRV is long gone. Tested-by: Nick Desaulniers <[email protected]> # Boot Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/20230504174320.3930345-1-ardb%40kernel.org Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-02  net: gro: fix udp bad offset in socket lookup by adding ↵  (Richard Gobert) [1 file, -0/+9]
{inner_}network_offset to napi_gro_cb Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
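Roughly, the saved offsets can be pictured as the following fragment of struct napi_gro_cb (shown as a sketch of the layout, not a verbatim copy of the hunk):

    /* Sketch: keep both outer and inner L3 offsets so the complete phase
     * can address the outer headers of encapsulated packets. */
    union {
            struct {
                    u16 network_offset;
                    u16 inner_network_offset;
            };
            u16 network_offsets[2];
    };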
2024-05-02  ASoC: Use inline function for type safety in snd_soc_substream_to_rtd()  (Krzysztof Kozlowski) [1 file, -2/+6]
A common pattern in sound drivers is getting 'struct snd_soc_pcm_runtime' from the 'struct snd_pcm_substream' opaque pointer private_data field with snd_soc_substream_to_rtd(). However 'private_data' appears in several other structures as well, including 'struct snd_compr_stream'. The field might not hold the same type for every structure, although it seems to be the case at least for 'struct snd_compr_stream', so code can easily make a mistake by using the macro with the wrong structure passed as argument. Switch from a macro to an inline function, so such a mistake will be detectable at build time. Signed-off-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
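A minimal sketch of such a macro-to-inline conversion (names suffixed here to mark them as illustrative):

    /* Before: the macro accepts any pointer that has a private_data field. */
    #define snd_soc_substream_to_rtd_macro(substream) \
            ((struct snd_soc_pcm_runtime *)(substream)->private_data)

    /* After: an inline function, so passing anything other than a
     * struct snd_pcm_substream pointer fails at build time. */
    static inline struct snd_soc_pcm_runtime *
    snd_soc_substream_to_rtd_inline(const struct snd_pcm_substream *substream)
    {
            return substream->private_data;
    }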
2024-05-01  net: dsa: Remove adjust_link paths  (Florian Fainelli) [1 file, -7/+0]
Now that we no longer have any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01  net: dsa: Remove fixed_link_update member  (Florian Fainelli) [1 file, -4/+0]
We have not had a switch driver use a fixed_link_update callback since 58d56fcc3964f9be0a9ca42fd126bcd9dc7afc90 ("net: dsa: bcm_sf2: Get rid of PHYLIB functions"); remove this callback. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-01  net: Protect dev->name by seqlock.  (Kuniyuki Iwashima) [1 file, -0/+1]
We will convert ioctl(SIOCGARP) to RCU, and then we need to copy dev->name which is currently protected by rtnl_lock(). This patch does the following: 1) Add seqlock netdev_rename_lock to protect dev->name 2) Add netdev_copy_name() that copies dev->name to buffer under netdev_rename_lock 3) Use netdev_copy_name() in netdev_get_name() and drop devnet_rename_sem Suggested-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
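A hedged sketch of the reader side described in step 2 (simplified; the helper name follows the description above):

    /* Sketch: copy dev->name consistently without holding RTNL. */
    static void netdev_copy_name_sketch(struct net_device *dev, char *name)
    {
            unsigned int seq;

            do {
                    seq = read_seqbegin(&netdev_rename_lock);
                    strscpy(name, dev->name, IFNAMSIZ);
            } while (read_seqretry(&netdev_rename_lock, seq));
    }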
2024-05-01  kunit/fortify: Fix replaced failure path to unbreak __alloc_size  (Kees Cook) [1 file, -1/+2]
The __alloc_size annotation for kmemdup() was getting disabled under KUnit testing because the replaced fortify_panic macro implementation was using "return NULL" as a way to survive the sanity checking. But having the chance to return NULL invalidated __alloc_size, so kmemdup was not passing the __builtin_dynamic_object_size() tests any more: [23:26:18] [PASSED] fortify_test_alloc_size_kmalloc_const [23:26:19] # fortify_test_alloc_size_kmalloc_dynamic: EXPECTATION FAILED at lib/fortify_kunit.c:265 [23:26:19] Expected __builtin_dynamic_object_size(p, 1) == expected, but [23:26:19] __builtin_dynamic_object_size(p, 1) == -1 (0xffffffffffffffff) [23:26:19] expected == 11 (0xb) [23:26:19] __alloc_size() not working with __bdos on kmemdup("hello there", len, gfp) [23:26:19] [FAILED] fortify_test_alloc_size_kmalloc_dynamic Normal builds were not affected: __alloc_size continued to work there. Use a zero-sized allocation instead, which allows __alloc_size to behave. Fixes: 4ce615e798a7 ("fortify: Provide KUnit counters for failure testing") Fixes: fa4a3f86d498 ("fortify: Add KUnit tests for runtime overflows") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-05-01  cifs: Implement netfslib hooks  (David Howells) [2 files, -0/+2]
Provide implementation of the netfslib hooks that will be used by netfslib to ask cifs to set up and perform operations. Of particular note are (*) cifs_clamp_length() - This is used to negotiate the size of the next subrequest in a read request, taking into account the credit available and the rsize. The credits are attached to the subrequest. (*) cifs_req_issue_read() - This is used to issue a subrequest that has been set up and clamped. (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a channel, reopening the file and requesting credits so that we can set the maximum size of the subrequest and also sets the maximum number of segments if we're doing RDMA. (*) cifs_issue_write() - This releases any unneeded credits and issues an asynchronous data write for the contiguous slice of file covered by the subrequest. This should possibly be folded in to all ->async_writev() ops and that called directly. (*) cifs_begin_writeback() - This gets the cached writable handle through which we do writeback (this does not affect writethrough, unbuffered or direct writes). At this point, cifs is not wired up to actually *use* netfslib; that will be done in a subsequent patch. Signed-off-by: David Howells <[email protected]> cc: Steve French <[email protected]> cc: Shyam Prasad N <[email protected]> cc: Rohith Surabattula <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs, afs: Use writeback retry to deal with alternate keys  (David Howells) [1 file, -0/+2]
Use a hook in the new writeback code's retry algorithm to rotate the keys once all the outstanding subreqs have failed rather than doing it separately on each subreq. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs: Miscellaneous tidy ups  (David Howells) [1 file, -5/+1]
Do a couple of miscellaneous tidy ups: (1) Add a qualifier into a file banner comment. (2) Put the writeback folio traces back into alphabetical order. (3) Remove some unused folio traces. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected]
2024-05-01  netfs: Cut over to using new writeback code  (David Howells) [1 file, -9/+0]
Cut over to using the new writeback code. The old code is #ifdef'd out or otherwise removed from compilation to avoid conflicts and will be removed in a future patch. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs, 9p: Implement helpers for new write code  (David Howells) [1 file, -0/+2]
Implement the helpers for the new write code in 9p. There's now an optional ->prepare_write() that allows the filesystem to set the parameters for the next write, such as maximum size and maximum segment count, and an ->issue_write() that is called to initiate an (asynchronous) write operation. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected]
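As a sketch, the two hooks named above slot into the netfs request ops roughly like this (struct name suffixed and signatures approximated, not copied from the patch):

    /* Sketch: optional per-filesystem hooks used by the new write path. */
    struct netfs_request_ops_sketch {
            /* Set limits (max size, max segments) for the next subrequest. */
            void (*prepare_write)(struct netfs_io_subrequest *subreq);
            /* Kick off an (asynchronous) write for the prepared subrequest. */
            void (*issue_write)(struct netfs_io_subrequest *subreq);
    };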
2024-05-01  netfs: New writeback implementation  (David Howells) [2 files, -4/+296]
The current netfslib writeback implementation creates writeback requests of contiguous folio data and then separately tiles subrequests over the space twice, once for the server and once for the cache. This creates a few issues:

 (1) Every time there's a discontiguity or a change between writing to only one destination or writing to both, it must create a new request. This makes it harder to do vectored writes.

 (2) The folios don't have the writeback mark removed until the end of the request - and a request could be hundreds of megabytes.

 (3) In future, I want to support a larger cache granularity, which will require aggregation of some folios that contain unmodified data (which only need to go to the cache) and some which contain modifications (which need to be uploaded and stored to the cache) - but, currently, these are treated as discontiguous.

There's also a move to get everyone to use writeback_iter() to extract writable folios from the pagecache. That said, currently writeback_iter() has some issues that make it less than ideal:

 (1) there's no way to cancel the iteration, even if you find a "temporary" error that means the current folio and all subsequent folios are going to fail;

 (2) there's no way to filter the folios being written back - something that will impact Ceph with its ordered snap system;

 (3) and if you get a folio you can't immediately deal with (say you need to flush the preceding writes), you are left with a folio hanging in the locked state for the duration, when really we should unlock it and relock it later.

In this new implementation, I use writeback_iter() to pump folios, progressively creating two parallel, but separate streams and cleaning up the finished folios as the subrequests complete. Either or both streams can contain gaps, and the subrequests in each stream can be of variable size, don't need to align with each other and don't need to align with the folios. Indeed, subrequests can cross folio boundaries, may cover several folios or a folio may be spanned by multiple subrequests, e.g.:

            +---+---+-----+-----+---+----------+
    Folios: |   |   |     |     |   |          |
            +---+---+-----+-----+---+----------+

              +------+------+     +----+----+
    Upload:   |      |      |.....|    |    |
              +------+------+     +----+----+

            +------+------+------+------+------+
    Cache:  |      |      |      |      |      |
            +------+------+------+------+------+

The progressive subrequest construction permits the algorithm to be preparing both the next upload to the server and the next write to the cache whilst the previous ones are already in progress. Throttling can be applied to control the rate of production of subrequests - and, in any case, we probably want to write them to the server in ascending order, particularly if the file will be extended. Content crypto can also be prepared at the same time as the subrequests and run asynchronously, with the prepped requests being stalled until the crypto catches up with them. This might also be useful for transport crypto, but that happens at a lower layer, so probably would be harder to pull off. The algorithm is split into three parts:

 (1) The issuer. This walks through the data, packaging it up, encrypting it and creating subrequests. The part of this that generates subrequests only deals with file positions and spans and so is usable for DIO/unbuffered writes as well as buffered writes.

 (2) The collector. This asynchronously collects completed subrequests, unlocks folios, frees crypto buffers and performs any retries.
This runs in a work queue so that the issuer can return to the caller for writeback (so that the VM can have its kswapd thread back) or async writes. (3) The retryer. This pauses the issuer, waits for all outstanding subrequests to complete and then goes through the failed subrequests to reissue them. This may involve reprepping them (with cifs, the credits must be renegotiated, and a subrequest may need splitting), and doing RMW for content crypto if there's a conflicting change on the server. [!] Note that some of the functions are prefixed with "new_" to avoid clashes with existing functions. These will be renamed in a later patch that cuts over to the new algorithm. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs: Switch to using unsigned long long rather than loff_t  (David Howells) [2 files, -10/+12]
Switch to using unsigned long long rather than loff_t in netfslib to avoid problems with the sign flipping in the maths when we're dealing with the byte at position 0x7fffffffffffffff. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Ilya Dryomov <[email protected]> cc: Xiubo Li <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs: Use mempools for allocating requests and subrequests  (David Howells) [1 file, -2/+3]
Use mempools for allocating requests and subrequests in an effort to make sure that allocation always succeeds so that when performing writeback we can always make progress. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  netfs: Remove ->launder_folio() support  (David Howells) [2 files, -5/+0]
Remove support for ->launder_folio() from netfslib and expect filesystems to use filemap_invalidate_inode() instead. netfs_launder_folio() can then be got rid of. Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeff Layton <[email protected]> cc: Eric Van Hensbergen <[email protected]> cc: Latchesar Ionkov <[email protected]> cc: Dominique Martinet <[email protected]> cc: Christian Schoenebeck <[email protected]> cc: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Steve French <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  mm: Provide a means of invalidation without using launder_folio  (David Howells) [1 file, -0/+2]
Implement a replacement for launder_folio. The key feature of invalidate_inode_pages2() is that it locks each folio individually, unmaps it to prevent mmap'd accesses interfering and calls the ->launder_folio() address_space op to flush it. This has problems: firstly, each folio is written individually as one or more small writes; secondly, adjacent folios cannot be added so easily into the laundry; thirdly, it's yet another op to implement. Instead, use the invalidate lock to cause anyone wanting to add a folio to the inode to wait, then unmap all the folios if we have mmaps, then, conditionally, use ->writepages() to flush any dirty data back and then discard all pages. The invalidate lock prevents ->read_iter(), ->write_iter() and faulting through mmap all from adding pages for the duration. This is then used from netfslib to handle the flushing in unbuffered and direct writes. Signed-off-by: David Howells <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Trond Myklebust <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Andrew Morton <[email protected]> cc: Alexander Viro <[email protected]> cc: Christian Brauner <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected]
2024-05-01  Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-next  (Dave Jiang) [1 file, -0/+26]
Add support to send CPER records to CXL for more detailed parsing.
2024-05-01  acpi/ghes: Process CXL Component Events  (Ira Weiny) [1 file, -0/+27]
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of these events via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference lies in the use of a GUID as the CPER Section Type which matches the UUID defined in CXL 3.1 Table 8-43. Currently a configuration such as this will trace a non standard event in the log omitting useful details of the event. In addition the CXL sub-system contains additional region and HPA information useful to the user.[0] The CXL code is required to be called from process context as it needs to take a device lock. The GHES code may be in interrupt context. This complicated the use of a callback. Dan Williams suggested the use of work items as an atomic way of switching between the callback execution and a default handler.[1] The use of a kfifo simplifies queue processing by providing lock free fifo operations. cxl_cper_kfifo_get() allows easier management of the kfifo between the ghes and cxl modules. CXL 3.1 Table 8-127 requires a device to have a queue depth of 1 for each of the four event logs. A combined queue depth of 32 is chosen to provide room for 8 entries of each log type. Add GHES support to detect CXL CPER records. Add the ability for the CXL sub-system to register a work queue to process the events. This patch adds back the functionality which was removed to fix the report by Dan Carpenter[2]. Cc: Ard Biesheuvel <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Tony Luck <[email protected]> Cc: Borislav Petkov <[email protected]> Suggested-by: Dan Carpenter <[email protected]> Suggested-by: Dan Williams <[email protected]> Link: http://lore.kernel.org/r/[email protected] [0] Link: http://lore.kernel.org/r/[email protected] [1] Link: http://lore.kernel.org/r/[email protected] [2] Reviewed-by: Dan Williams <[email protected]> Signed-off-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Tony Luck <[email protected]> Tested-by: Smita Koralahalli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dave Jiang <[email protected]>
2024-05-01  Merge tag 'asoc-fix-v6.9-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus  (Takashi Iwai) [37 files, -66/+212]
ASoC: Fixes for v6.9
This is much larger than is ideal, partly due to your holiday but also due to several vendors having come in with relatively large fixes at similar times. It's all driver specific stuff. The meson fixes from Jerome fix some rare timing issues with blocking operations happening in triggers, plus the continuous clock support which fixes clocking for some platforms. The SOF series from Peter builds to the fix to avoid spurious resets of ChainDMA which triggered errors in cleanup paths with both PulseAudio and PipeWire, and there's also some simple new debugfs files from Pierre which make support a lot easier.
2024-05-01  Merge tag 'regulator-fix-v6.9-rc6' of ↵  (Linus Torvalds) [1 file, -2/+2]
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "There's a few simple driver specific fixes here, plus some core cleanups from Matti which fix issues found with client drivers due to the API being confusing. The two fixes for the stubs provide more constructive behaviour with !REGULATOR configurations, issues were noticed with some hwmon drivers which would otherwise have needed confusing bodges in the users. The irq_helpers fix to duplicate the provided name for the interrupt controller was found because a driver got this wrong and it's again a case where the core is the sensible place to put the fix" * tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: change devm_regulator_get_enable_optional() stub to return Ok regulator: change stubbed devm_regulator_get_enable to return Ok regulator: vqmmc-ipq4019: fix module autoloading regulator: qcom-refgen: fix module autoloading regulator: mt6360: De-capitalize devicetree regulator subnodes regulator: irq_helpers: duplicate IRQ name
2024-05-01  KVM: arm64: Simplify vgic-v3 hypercalls  (Marc Zyngier) [1 file, -1/+0]
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <[email protected]> Acked-by: Oliver Upton <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Marc Zyngier <[email protected]>
2024-05-01  mm/slab: make __free(kfree) accept error pointers  (Dan Carpenter) [1 file, -2/+2]
Currently, if an automatically freed allocation is an error pointer, that will lead to a crash. An example of this is in wm831x_gpio_dbg_show():

    171  char *label __free(kfree) = gpiochip_dup_line_label(chip, i);
    172  if (IS_ERR(label)) {
    173          dev_err(wm831x->dev, "Failed to duplicate label\n");
    174          continue;
    175  }

The auto clean up function should check for error pointers as well, otherwise we're going to keep hitting issues like this. Fixes: 54da6a092431 ("locking: Introduce __cleanup() based infrastructure") Cc: <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
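The shape of the fix can be sketched with the existing DEFINE_FREE()/IS_ERR_OR_NULL() helpers (the name is suffixed only to mark this as a sketch; the real definition keeps the name kfree):

    #include <linux/cleanup.h>
    #include <linux/err.h>
    #include <linux/slab.h>

    /* Sketch: let the auto-cleanup tolerate NULL and ERR_PTR() values. */
    DEFINE_FREE(kfree_sketch, void *, if (!IS_ERR_OR_NULL(_T)) kfree(_T))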
2024-05-01  objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids  (Andrii Nakryiko) [1 file, -3/+3]
Profiling shows that calling nr_possible_cpus() in objpool_pop() takes a noticeable amount of CPU (when profiled on 80-core machine), as we need to recalculate number of set bits in a CPU bit mask. This number can't change, so there is no point in paying the price for recalculating it. As such, cache this value in struct objpool_head and use it in objpool_pop(). On the other hand, cached pool->nr_cpus isn't necessary, as it's not used in hot path and is also a pretty trivial value to retrieve. So drop pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size of struct objpool_head remains the same, which is a nice bonus. Same BPF selftests benchmarks were used to evaluate the effect. Using changes in previous patch (inlining of objpool_pop/objpool_push) as baseline, here are the differences:

BASELINE
========
kretprobe      :    9.937 ± 0.174M/s
kretprobe-multi:   10.440 ± 0.108M/s

AFTER
=====
kretprobe      :   10.106 ± 0.120M/s (+1.7%)
kretprobe-multi:   10.515 ± 0.180M/s (+0.7%)

Link: https://lore.kernel.org/all/[email protected]/ Cc: Matt (Qiang) Wu <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-05-01  objpool: enable inlining objpool_push() and objpool_pop() operations  (Andrii Nakryiko) [1 file, -2/+99]
objpool_push() and objpool_pop() are very performance-critical functions and can be called very frequently in kretprobe triggering path. As such, it makes sense to allow compiler to inline them completely to eliminate function calls overhead. Luckily, their logic is quite well isolated and doesn't have any sprawling dependencies. This patch moves both objpool_push() and objpool_pop() into include/linux/objpool.h and marks them as static inline functions, enabling inlining. To avoid anyone using internal helpers (objpool_try_get_slot, objpool_try_add_slot), rename them to use leading underscores. We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi programs in a tight loop to evaluate the effect. BPF own overhead in this case is minimal and it mostly stresses the rest of in-kernel kretprobe infrastructure overhead. Results are in millions of calls per second. This is not super scientific, but shows the trend nevertheless.

BEFORE
======
kretprobe      :    9.794 ± 0.086M/s
kretprobe-multi:   10.219 ± 0.032M/s

AFTER
=====
kretprobe      :    9.937 ± 0.174M/s (+1.5%)
kretprobe-multi:   10.440 ± 0.108M/s (+2.2%)

Link: https://lore.kernel.org/all/[email protected]/ Cc: Matt (Qiang) Wu <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2024-05-01  ftrace: make extra rcu_is_watching() validation check optional  (Andrii Nakryiko) [1 file, -1/+1]
Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to control whether ftrace low-level code performs additional rcu_is_watching()-based validation logic in an attempt to catch noinstr violations. This check is expected to never be true and is mostly useful for low-level validation of ftrace subsystem invariants. For most users it should probably be kept disabled to eliminate unnecessary runtime overhead. This improves BPF multi-kretprobe (relying on ftrace and rethook infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]). [0] https://lore.kernel.org/bpf/CAEf4BzauQ2WKMjZdc9s0rBWa01BYbgwHN6aNDXQSHYia47pQ-w@mail.gmail.com/ Link: https://lore.kernel.org/all/[email protected]/ Cc: Steven Rostedt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Paul E. McKenney <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
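Conceptually, the optional check boils down to something like the following sketch (the helper name is made up; the real check sits in the ftrace/rethook low-level paths):

    /* Sketch: only pay for the sanity check when the new option is set. */
    static __always_inline void ftrace_validate_rcu_sketch(void)
    {
            if (IS_ENABLED(CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING))
                    WARN_ON_ONCE(!rcu_is_watching());
    }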
2024-05-01  fprobe: Add entry/exit callbacks types  (Jiri Olsa) [1 file, -6/+12]
We are going to store callbacks in following change, so this will ease up the code. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
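The callback types being introduced look roughly like this (parameter lists are an approximation of the existing fprobe handler signatures, not a verbatim copy):

    /* Sketch: named types for the entry/exit handlers stored by fprobe. */
    typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
                                   unsigned long ret_ip, struct pt_regs *regs,
                                   void *entry_data);
    typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
                                   unsigned long ret_ip, struct pt_regs *regs,
                                   void *entry_data);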
2024-05-01  Merge branches 'fixes.2024.04.15a', 'misc.2024.04.12a', 'rcu-sync-normal-improve.2024.04.15a', 'rcu-tasks.2024.04.15a' and 'rcutorture.2024.04.15a' into rcu-merge.2024.04.15a  (Uladzislau Rezki (Sony)) [2 files, -9/+36]
fixes.2024.04.15a: RCU fixes
misc.2024.04.12a: Miscellaneous fixes
rcu-sync-normal-improve.2024.04.15a: Improving synchronize_rcu() call
rcu-tasks.2024.04.15a: Tasks RCU updates
rcutorture.2024.04.15a: Torture-test updates
2024-05-01  xfrm: Add dir validation to "in" data path lookup  (Antony Antony) [1 file, -0/+1]
Introduces validation for the x->dir attribute within the XFRM input data lookup path. If the configured direction does not match the expected direction (input), increment the XfrmInStateDirError counter and drop the packet to ensure data integrity and correct flow handling.

grep -vw 0 /proc/net/xfrm_stat
XfrmInStateDirError	1

Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Reviewed-by: Nicolas Dichtel <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
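A rough sketch of the check described above (the direction constant and SNMP counter identifiers are inferred from the description and may not match the exact names in the patch):

    /* Sketch: drop packets whose SA is configured for the other direction. */
    if (unlikely(x->dir && x->dir != XFRM_SA_DIR_IN)) {
            XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEDIRERROR);
            goto drop;
    }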
2024-05-01  xfrm: Add dir validation to "out" data path lookup  (Antony Antony) [1 file, -0/+1]
Introduces validation for the x->dir attribute within the XFRM output data lookup path. If the configured direction does not match the expected direction (output), increment the XfrmOutStateDirError counter and drop the packet to ensure data integrity and correct flow handling.

grep -vw 0 /proc/net/xfrm_stat
XfrmOutPolError	1
XfrmOutStateDirError	1

Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Reviewed-by: Nicolas Dichtel <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2024-05-01  xfrm: Add Direction to the SA in or out  (Antony Antony) [2 files, -0/+7]
This patch introduces the 'dir' attribute, 'in' or 'out', to the xfrm_state, SA, enhancing usability by delineating the scope of values based on direction. An input SA will restrict values pertinent to input, effectively segregating them from output-related values. And an output SA will restrict attributes for output. This change aims to streamline the configuration process and improve the overall consistency of SA attributes during configuration. This feature sets the groundwork for future patches, including the upcoming IP-TFS patch. Signed-off-by: Antony Antony <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2024-05-01  Merge tag 'mhi-for-6.10' of ↵  (Greg Kroah-Hartman) [1 file, -1/+28]
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== - Added a new API mhi_power_down_keep_dev() to not destroy the struct dev associated with the MHI channels during MHI power down. This is useful in scenarios such as system suspend/hibernation where the probability of channels coming back is very high. So the PM maintainer suggested not to destroy the struct dev in those cases. This API is introduced for fixing the failure reported in the ath11k driver during resume from system suspend. NOTE: Due to the API dependency, the patch adding the API is pushed to an immutable branch (mhi-immutable) and merged into both mhi and ath trees. But the merge commit is not visible in mhi tree due to git being smart with 'fast-forward'. - Added an optional sysfs entry to force the MHI devices to enter the Emergency Download (EDL) mode to download the firmware from host. - Added EDL mode support for Qcom SDX75/65/55 modems as per the MHI spec v1.2, Chapter 13.2. This involves writing a cookie to the EDL doorbell registers and then triggering the device reset from host. * tag 'mhi-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: Add generic edl_trigger to allow devices to enter EDL mode bus: mhi: host: Add a new API for getting channel doorbell offset bus: mhi: host: Add sysfs entry to force device to enter EDL bus: mhi: host: Add mhi_power_down_keep_dev() API to support system suspend/hibernation
2024-04-30  net: move sysctl_mem_pcpu_rsv to net_hotdata  (Eric Dumazet) [2 files, -3/+4]
sysctl_mem_pcpu_rsv is used in the TCP fast path; move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
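The pattern of such a move looks roughly like this (struct fragment shown as a sketch; other members omitted):

    /* Sketch: keep the knob next to the other TCP fast-path data ... */
    struct net_hotdata {
            /* ... existing hot members ... */
            int sysctl_mem_pcpu_rsv;
    };

    /* ... so fast-path users read net_hotdata.sysctl_mem_pcpu_rsv
     * instead of a standalone global. */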
2024-04-30  net: add <net/proto_memory.h>  (Eric Dumazet) [2 files, -78/+83]
Move some proto memory definitions out of <net/sock.h>. Very few files need them, and the following patch will include <net/hotdata.h> from <net/proto_memory.h>. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30  tcp: move tcp_out_of_memory() to net/ipv4/tcp.c  (Eric Dumazet) [1 file, -9/+1]
tcp_out_of_memory() has a single caller: tcp_check_oom(). The following patch will also make sk_memory_allocated() no longer visible from <net/sock.h> and <net/tcp.h>. Add a const qualifier to the sock argument of tcp_out_of_memory() and tcp_check_oom(). Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30  net: move sysctl_skb_defer_max to net_hotdata  (Eric Dumazet) [1 file, -0/+1]
sysctl_skb_defer_max is used in the TCP fast path; move it to net_hotdata. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30  net: move sysctl_max_skb_frags to net_hotdata  (Eric Dumazet) [2 files, -2/+1]
sysctl_max_skb_frags is used in the TCP and MPTCP fast paths; move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-30  inet: introduce dst_rtable() helper  (Eric Dumazet) [3 files, -11/+13]
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Sabrina Dubroca <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
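A usage sketch of the helper quoted above (the function name is illustrative):

    /* Sketch: the helper replaces an open-coded (struct rtable *) cast and,
     * being based on container_of_const(), preserves constness. */
    static void dst_rtable_usage_sketch(const struct sk_buff *skb)
    {
            const struct rtable *rt = dst_rtable(skb_dst(skb));

            (void)rt;       /* access rt-> fields here */
    }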
2024-04-30  Merge remote-tracking branch 'cxl/for-6.10/dpa-to-hpa' into cxl-for-next  (Dave Jiang) [1 file, -0/+10]
Support for HPA to DPA translation for CXL events cxl_dram and cxl_general_media.
2024-04-30  ACPI: Move acpi_blacklisted() declaration to asm/acpi.h  (Kuppuswamy Sathyanarayanan) [1 file, -1/+1]
The function acpi_blacklisted() is defined only when CONFIG_X86 is enabled and is only used by X86 arch code. To align with its usage and definition conditions, move its declaration to asm/acpi.h Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> [ rjw: Added empty code line in a header file ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-04-30  cxl/core: Add region info to cxl_general_media and cxl_dram events  (Alison Schofield) [1 file, -0/+10]
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_dram event. Since the mapping can change, the kernel provides this information at the time the event occurs. This informs user space that at event <timestamp> this <region> mapped this <DPA> to this <HPA>. Add the same region info that is included in the cxl_poison trace event: the DPA->HPA translation, region name, and region uuid. The new fields are inserted in the trace event and no existing fields are modified. If the DPA is not mapped, user will see: hpa=ULLONG_MAX, region="", and uuid=0 This work must be protected by dpa_rwsem & region_rwsem since it is looking up region mappings. Signed-off-by: Alison Schofield <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <[email protected]>
2024-04-30  ACPICA: AEST: Add support for the AEST V2 table  (Ruidong Tian) [1 file, -6/+82]
ACPICA commit ebb49799c78891cbe370f1264844664a3d8b6f35 AEST V2 was published[1], add V2 support based on AEST V1. [1]: https://developer.arm.com/documentation/den0085/latest/ Link: https://github.com/acpica/acpica/commit/ebb4979 Signed-off-by: Ruidong Tian <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-04-30  powercap: intel_rapl: Introduce APIs for PMU support  (Zhang Rui) [1 file, -0/+32]
Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu(). RAPL driver can invoke these APIs to expose its supported energy counters via perf PMU. The new RAPL PMU is fully compatible with current MSR RAPL PMU, including using the same PMU name and events name/id/unit/scale, etc. For example, use below command perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO to get the energy consumption if power/energy-pkg/ and power/energy-ram/ events are available in the "perf list" output. This does not introduce any conflict because TPMI RAPL is the only user of these APIs currently, and it never co-exists with MSR RAPL. Note that RAPL Packages can be probed/removed dynamically, and the events supported by each TPMI RAPL device can be different. Thus the RAPL PMU support is done on demand, which means 1. PMU is registered only if it is needed by a RAPL Package. PMU events for unsupported counters are not exposed. 2. PMU is unregistered and registered when a new RAPL Package is probed and supports new counters that are not supported by current PMU. For example, on a dual-package system using TPMI RAPL, it is possible that Package 1 behaves as TPMI domain root and supports Psys domain. In this case, register PMU without Psys event when probing Package 0, and re-register the PMU with Psys event when probing Package 1. 3. PMU is unregistered when all registered RAPL Packages don't need PMU. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-04-30  Merge remote-tracking branch 'cxl/for-6.10/add-log-mbox-cmds' into cxl-for-next  (Dave Jiang) [1 file, -0/+3]
Add CXL log related mailbox commands - Add Get Log Capabilities command - Add Get Supported Log Sub-List Commands command - Add Clear Log command
2024-04-30  cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>  (Sangyun Kim) [1 file, -0/+3]
The linux/cxl-event.h header file uses the u8, u16, and uuid_t types, but it doesn't include the necessary header files, <linux/types.h> and <linux/uuid.h>. Currently, cxl-event.h is only used by drivers/cxl/cxlmem.h, and it doesn't cause any errors because cxlmem.h indirectly includes the required types. However, cxl-event.h may be used by other CXL-related code in the future, so it's important to fix this issue by including the missing header files directly in cxl-event.h. Signed-off-by: Sangyun Kim <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dave Jiang <[email protected]>
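The fix amounts to adding the two includes up front, roughly:

    /* linux/cxl-event.h (sketch of the added lines) */
    #include <linux/types.h>        /* u8, u16 */
    #include <linux/uuid.h>         /* uuid_t */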