aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-04-05mm, treewide: redefine MAX_ORDER sanelyKirill A. Shutemov5-12/+12
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [[email protected]: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [[email protected]: fix another min_t warning] [[email protected]: fixups per Zi Yan] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Michael Ellerman <[email protected]> [powerpc] Cc: "Kirill A. Shutemov" <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm/uffd: UFFD_FEATURE_WP_UNPOPULATEDPeter Xu3-1/+38
Patch series "mm/uffd: Add feature bit UFFD_FEATURE_WP_UNPOPULATED", v4. The new feature bit makes anonymous memory acts the same as file memory on userfaultfd-wp in that it'll also wr-protect none ptes. It can be useful in two cases: (1) Uffd-wp app that needs to wr-protect none ptes like QEMU snapshot, so pre-fault can be replaced by enabling this flag and speed up protections (2) It helps to implement async uffd-wp mode that Muhammad is working on [1] It's debatable whether this is the most ideal solution because with the new feature bit set, wr-protect none pte needs to pre-populate the pgtables to the last level (PAGE_SIZE). But it seems fine so far to service either purpose above, so we can leave optimizations for later. The series brings pte markers to anonymous memory too. There's some change in the common mm code path in the 1st patch, great to have some eye looking at it, but hopefully they're still relatively straightforward. This patch (of 2): This is a new feature that controls how uffd-wp handles none ptes. When it's set, the kernel will handle anonymous memory the same way as file memory, by allowing the user to wr-protect unpopulated ptes. File memories handles none ptes consistently by allowing wr-protecting of none ptes because of the unawareness of page cache being exist or not. For anonymous it was not as persistent because we used to assume that we don't need protections on none ptes or known zero pages. One use case of such a feature bit was VM live snapshot, where if without wr-protecting empty ptes the snapshot can contain random rubbish in the holes of the anonymous memory, which can cause misbehave of the guest when the guest OS assumes the pages should be all zeros. QEMU worked it around by pre-populate the section with reads to fill in zero page entries before starting the whole snapshot process [1]. Recently there's another need raised on using userfaultfd wr-protect for detecting dirty pages (to replace soft-dirty in some cases) [2]. In that case if without being able to wr-protect none ptes by default, the dirty info can get lost, since we cannot treat every none pte to be dirty (the current design is identify a page dirty based on uffd-wp bit being cleared). In general, we want to be able to wr-protect empty ptes too even for anonymous. This patch implements UFFD_FEATURE_WP_UNPOPULATED so that it'll make uffd-wp handling on none ptes being consistent no matter what the memory type is underneath. It doesn't have any impact on file memories so far because we already have pte markers taking care of that. So it only affects anonymous. The feature bit is by default off, so the old behavior will be maintained. Sometimes it may be wanted because the wr-protect of none ptes will contain overheads not only during UFFDIO_WRITEPROTECT (by applying pte markers to anonymous), but also on creating the pgtables to store the pte markers. So there's potentially less chance of using thp on the first fault for a none pmd or larger than a pmd. The major implementation part is teaching the whole kernel to understand pte markers even for anonymously mapped ranges, meanwhile allowing the UFFDIO_WRITEPROTECT ioctl to apply pte markers for anonymous too when the new feature bit is set. Note that even if the patch subject starts with mm/uffd, there're a few small refactors to major mm path of handling anonymous page faults. But they should be straightforward. With WP_UNPOPUATED, application like QEMU can avoid pre-read faults all the memory before wr-protect during taking a live snapshot. Quotting from Muhammad's test result here [3] based on a simple program [4]: (1) With huge page disabled echo madvise > /sys/kernel/mm/transparent_hugepage/enabled ./uffd_wp_perf Test DEFAULT: 4 Test PRE-READ: 1111453 (pre-fault 1101011) Test MADVISE: 278276 (pre-fault 266378) Test WP-UNPOPULATE: 11712 (2) With Huge page enabled echo always > /sys/kernel/mm/transparent_hugepage/enabled ./uffd_wp_perf Test DEFAULT: 4 Test PRE-READ: 22521 (pre-fault 22348) Test MADVISE: 4909 (pre-fault 4743) Test WP-UNPOPULATE: 14448 There'll be a great perf boost for no-thp case, while for thp enabled with extreme case of all-thp-zero WP_UNPOPULATED can be slower than MADVISE, but that's low possibility in reality, also the overhead was not reduced but postponed until a follow up write on any huge zero thp, so potentially it is faster by making the follow up writes slower. [1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/ [3] https://lore.kernel.org/all/[email protected]/ [4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c [[email protected]: comment changes, oneliner fix to khugepaged] Link: https://lkml.kernel.org/r/ZB2/8jPhD3fpx5U8@x1n Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Paul Gofman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm: return an ERR_PTR from __filemap_get_folioChristoph Hellwig1-5/+6
Instead of returning NULL for all errors, distinguish between: - no entry found and not asked to allocated (-ENOENT) - failed to allocate memory (-ENOMEM) - would block (-EAGAIN) so that callers don't have to guess the error based on the passed in flags. Also pass through the error through the direct callers: filemap_get_folio, filemap_lock_folio filemap_grab_folio and filemap_get_incore_folio. [[email protected]: fix null-pointer deref] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> [nilfs2] Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm: remove FGP_ENTRYChristoph Hellwig1-2/+1
FGP_ENTRY is unused now, so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05mm: make mapping_get_entry available outside of filemap.cChristoph Hellwig1-0/+1
mapping_get_entry is useful for page cache API users that need to know about xa_value internals. Rename it and make it available in pagemap.h. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-05net: stmmac: add support for platform specific resetShenwei Wang1-0/+1
This patch adds support for platform-specific reset logic in the stmmac driver. Some SoCs require a different reset mechanism than the standard dwmac IP reset. To support these platforms, a new function pointer 'fix_soc_reset' is added to the plat_stmmacenet_data structure. The stmmac_reset in hwif.h is modified to call the 'fix_soc_reset' function if it exists. This enables the driver to use the platform-specific reset logic when necessary. Signed-off-by: Shenwei Wang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-05mm: enable maple tree RCU mode by defaultLiam R. Howlett1-1/+2
Use the maple tree in RCU mode for VMA tracking. The maple tree tracks the stack and is able to update the pivot (lower/upper boundary) in-place to allow the page fault handler to write to the tree while holding just the mmap read lock. This is safe as the writes to the stack have a guard VMA which ensures there will always be a NULL in the direction of the growth and thus will only update a pivot. It is possible, but not recommended, to have VMAs that grow up/down without guard VMAs. syzbot has constructed a testcase which sets up a VMA to grow and consume the empty space. Overwriting the entire NULL entry causes the tree to be altered in a way that is not safe for concurrent readers; the readers may see a node being rewritten or one that does not match the maple state they are using. Enabling RCU mode allows the concurrent readers to see a stable node and will return the expected result. [[email protected]: we don't need to free the nodes with RCU[ Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree") Signed-off-by: Liam R. Howlett <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-06drm/scdc-helper: Pimp SCDC debugsVille Syrjälä1-3/+4
Include the device and connector information in the SCDC debugs. Makes it easier to figure out who did what. v2: Rely on connector->ddc (Maxime) Cc: Andrzej Hajda <[email protected]> Cc: Neil Armstrong <[email protected]> Cc: Robert Foss <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Jonas Karlman <[email protected]> Cc: Jernej Skrabec <[email protected]> Cc: Thierry Reding <[email protected]> Cc: Emma Anholt <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Laurent Pinchart <[email protected]> Acked-by: Maxime Ripard <[email protected]> Reviewed-by: Andrzej Hajda <[email protected]> Acked-by: Thierry Reding <[email protected]>
2023-04-05Migrate the PCIe-IDIO-24 and WS16C48 GPIO driversMark Brown1-2/+4
Merge series from William Breathitt Gray <[email protected]>: The regmap API supports IO port accessors so we can take advantage of regmap abstractions rather than handling access to the device registers directly in the driver. A patch to pass irq_drv_data as a parameter for struct regmap_irq_chip set_type_config() is included. This is needed by the idio_24_set_type_config() and ws16c48_set_type_config() callbacks in order to update the type configuration on their respective devices.
2023-04-05PCI: Make pci_bus_for_each_resource() index optionalAndy Shevchenko1-5/+19
Refactor pci_bus_for_each_resource() in the same way as pci_dev_for_each_resource(). This allows the index to be hidden inside the implementation so the caller can omit it when it's not used otherwise. No functional changes intended. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Krzysztof Wilczyński <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
2023-04-05clk: Remove mmask and nmask fields in struct clk_fractional_dividerChristophe JAILLET1-2/+0
All users of these fields have been removed. They are now computed when needed with [mn]shift and [mn]width. This shrinks the size of struct clk_fractional_divider from 72 to 56 bytes. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/680357e5acb338433bfc94114b65b4a4ce2c99e2.1680423909.git.christophe.jaillet@wanadoo.fr Reviewed-by: Heiko Stuebner <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2023-04-05ACPI: video: Add auto_detect arg to __acpi_video_get_backlight_type()Hans de Goede1-2/+13
Allow callers of __acpi_video_get_backlight_type() to pass a pointer to a bool which will get set to false if the backlight-type comes from the cmdline or a DMI quirk and set to true if auto-detection was used. And make __acpi_video_get_backlight_type() non static so that it can be called directly outside of video_detect.c . While at it turn the acpi_video_get_backlight_type() and acpi_video_backlight_use_native() wrappers into static inline functions in include/acpi/video.h, so that we need to export one less symbol. Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default") Cc: All applicable <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-04-05nvmem: Add macro to register nvmem layout driversMiquel Raynal1-0/+6
Provide a module_nvmem_layout_driver() macro at the end of the nvmem-provider.h header to reduce the boilerplate when registering nvmem layout drivers. Suggested-by: Srinivas Kandagatla <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Rafał Miłecki <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: support specifying both: cell raw data & post read lengthsRafał Miłecki1-0/+2
Callback .read_post_process() is designed to modify raw cell content before providing it to the consumer. So far we were dealing with modifications that didn't affect cell size (length). In some cases however cell content needs to be reformatted and resized. It's required e.g. to provide properly formatted MAC address in case it's stored in a non-binary format (e.g. using ASCII). There were few discussions how to optimally handle that. Following possible solutions were considered: 1. Allow .read_post_process() to realloc (resize) content buffer 2. Allow .read_post_process() to adjust (decrease) just buffer length 3. Register NVMEM cells using post-read sizes The preferred solution was the last one. The problem is that simply adjusting "bytes" in NVMEM providers would result in core code NOT passing whole raw data to .read_post_process() callbacks. It means callback functions couldn't do their job without somehow manually reading original cell content on their own. This patch deals with that by registering NVMEM cells with both lengths: raw content one and post read one. It allows: 1. Core code to read whole raw cell content 2. Callbacks to return content they want Signed-off-by: Rafał Miłecki <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: provide own priv pointer in post process callbackMichael Walle1-1/+4
It doesn't make any more sense to have a opaque pointer set up by the nvmem device. Usually, the layout isn't associated with a particular nvmem device. Instead, let the caller who set the post process callback provide the priv pointer. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: cell: drop global cell_post_processMichael Walle1-2/+0
There are no users anymore for the global cell_post_process callback anymore. New users should use proper nvmem layouts. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: allow to modify a cell before adding itMichael Walle1-0/+5
Provide a way to modify a cell before it will get added. This is useful to attach a custom post processing hook via a layout. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: add per-cell post processingMichael Walle1-0/+3
Instead of relying on the name the consumer is using for the cell, like it is done for the nvmem .cell_post_process configuration parameter, provide a per-cell post processing hook. This can then be populated by the NVMEM provider (or the NVMEM layout) when adding the cell. Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05nvmem: core: introduce NVMEM layoutsMichael Walle2-0/+58
NVMEM layouts are used to generate NVMEM cells during runtime. Think of an EEPROM with a well-defined conent. For now, the content can be described by a device tree or a board file. But this only works if the offsets and lengths are static and don't change. One could also argue that putting the layout of the EEPROM in the device tree is the wrong place. Instead, the device tree should just have a specific compatible string. Right now there are two use cases: (1) The NVMEM cell needs special processing. E.g. if it only specifies a base MAC address offset and you need to add an offset, or it needs to parse a MAC from ASCII format or some proprietary format. (Post processing of cells is added in a later commit). (2) u-boot environment parsing. The cells don't have a particular offset but it needs parsing the content to determine the offsets and length. Co-developed-by: Miquel Raynal <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Signed-off-by: Michael Walle <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: device: Kill of_device_request_module()Miquel Raynal1-6/+0
A new helper has been introduced, of_request_module(). Users have been converted, this helper can now be deleted. Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Move the request module helper logic to module.cMiquel Raynal1-0/+6
Depending on device.c for pure OF handling is considered backwards. Let's extract the content of of_device_request_module() to have the real logic under module.c. The next step will be to convert users of of_device_request_module() to use the new helper. Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Move of_modalias() to module.cMiquel Raynal1-0/+9
Create a specific .c file for OF related module handling. Move of_modalias() inside as a first step. The helper is exposed through of.h even though it is only used by core files because the users from device.c will soon be split into an OF-only helper in module.c as well as a device-oriented inline helper in of_device.h. Putting this helper in of_private.h would require to include of_private.h from of_device.h, which is not acceptable. Suggested-by: Rob Herring <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05of: Rename of_modalias_node()Miquel Raynal1-1/+2
This helper does not produce a real modalias, but tries to get the "product" compatible part of the "vendor,product" compatibles only. It is far from creating a purely useful modalias string and does not seem to be used like that directly anyway, so let's try to give this helper a more meaningful name before moving there a real modalias helper (already existing under of/device.c). Also update the various documentations to refer to the strings as "aliases" rather than "modaliases" which has a real meaning in the Linux kernel. There is no functional change. Cc: Rafael J. Wysocki <[email protected]> Cc: Len Brown <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Maxime Ripard <[email protected]> Cc: Thomas Zimmermann <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Wolfram Sang <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Reviewed-by: Rob Herring <[email protected]> Acked-by: Mark Brown <[email protected]> Signed-off-by: Srinivas Kandagatla <[email protected]> Acked-by: Sebastian Reichel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-04-05regmap: Pass irq_drv_data as a parameter for set_type_config()William Breathitt Gray1-2/+4
Allow the struct regmap_irq_chip set_type_config() callback to access irq_drv_data by passing it as a parameter. Signed-off-by: William Breathitt Gray <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/20e15cd3afae80922b7e0577c7741df86b3390c5.1680708357.git.william.gray@linaro.org Signed-off-by: Mark Brown <[email protected]>
2023-04-05Merge tag 'trace-v6.3-rc5' of ↵Linus Torvalds2-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix timerlat notification, as it was not triggering the notify to users when a new max latency was hit. - Do not trigger max latency if the tracing is off. When tracing is off, the ring buffer is not updated, it does not make sense to notify when there's a new max latency detected by the tracer, as why that latency happened is not available. The tracing logic still runs when the ring buffer is disabled, but it should not be triggering notifications. - Fix race on freeing the synthetic event "last_cmd" variable by adding a mutex around it. - Fix race between reader and writer of the ring buffer by adding memory barriers. When the writer is still on the reader page it must have its content visible on the buffer before it moves the commit index that the reader uses to know how much content is on the page. - Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and _RET_IP_, which gets broken if it is not inlined. - Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build. The field formats of trace events are calculated by using sizeof(type) and other means by what is passed into the structure macros like __field(). The __field() macro is only meant for atom types like int, long, short, pointer, etc. It is not meant for arrays. The code will currently compile with arrays, but then the format produced will be inaccurate, and user space parsing tools will break. Two bugs have already been fixed, now add code that will make the kernel fail to build if another trace event includes this buggy field format. - Fix boot up snapshot code: Boot snapshots were triggering when not even asked for on the kernel command line. This was caused by two bugs: 1) It would trigger a snapshot on any instance if one was created from the kernel command line. 2) The error handling would only affect the top level instance. So the fact that a snapshot was done on a instance that didn't allocate a buffer triggered a warning written into the top level buffer, and worse yet, disabled the top level buffer. - Fix memory leak that was caused when an error was logged in a trace buffer instance, and then the buffer instance was removed. The allocated error log messages still needed to be freed. * tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Free error logs of tracing instances tracing: Fix ftrace_boot_snapshot command line logic tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance tracing: Error if a trace event has an array for a __field() tracing/osnoise: Fix notify new tracing_max_latency tracing/timerlat: Notify new max thread latency ftrace: Mark get_lock_parent_ip() __always_inline ring-buffer: Fix race while reader and writer are on the same page tracing/synthetic: Fix races on freeing last_cmd
2023-04-05dt-bindings: clock: Add StarFive JH7110 always-on clock and reset generatorEmil Renner Berthing2-0/+30
Add bindings for the always-on clock and reset generator (AONCRG) on the JH7110 RISC-V SoC by StarFive Ltd. Reviewed-by: Conor Dooley <[email protected]> Reviewed-by: Rob Herring <[email protected]> Reviewed-by: Emil Renner Berthing <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Signed-off-by: Hal Feng <[email protected]> Signed-off-by: Conor Dooley <[email protected]>
2023-04-05dt-bindings: clock: Add StarFive JH7110 system clock and reset generatorEmil Renner Berthing2-0/+345
Add bindings for the system clock and reset generator (SYSCRG) on the JH7110 RISC-V SoC by StarFive Ltd. Reviewed-by: Conor Dooley <[email protected]> Reviewed-by: Rob Herring <[email protected]> Reviewed-by: Emil Renner Berthing <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Signed-off-by: Hal Feng <[email protected]> Signed-off-by: Conor Dooley <[email protected]>
2023-04-05Merge branches 'rcu/staging-core', 'rcu/staging-docs' and ↵Joel Fernandes (Google)5-36/+111
'rcu/staging-kfree', remote-tracking branches 'paul/srcu-cf.2023.04.04a', 'fbq/rcu/lockdep.2023.03.27a' and 'fbq/rcu/rcutorture.2023.03.20a' into rcu/staging
2023-04-05rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency checkZqiang1-1/+2
This commit adds checks for the TICK_DEP_MASK_RCU_EXP bit, thus enabling RCU expedited grace periods to actually force-enable scheduling-clock interrupts on holdout CPUs. Fixes: df1e849ae455 ("rcu: Enable tick for nohz_full CPUs slow to provide expedited QS") Signed-off-by: Zqiang <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Anna-Maria Behnsen <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05rcu/trace: use strscpy() to instead of strncpy()Xu Panda1-3/+1
This commit saves a line of code by switching from strncpy() to strscpy() by permitting the later NUL assignment to be removed. While in the area, save another line by taking advantage of 100 characters. Acked-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Xu Panda <[email protected]> Signed-off-by: Yang Yang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystemJoel Fernandes (Google)1-0/+2
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined. However, cpu_is_hotpluggable() still returns true for those CPUs. This causes torture tests that do offlining to end up trying to offline this CPU causing test failures. Such failure happens on all architectures. Fix the repeated error messages thrown by this (even if the hotplug errors are harmless) by asking the opinion of the nohz subsystem on whether the CPU can be hotplugged. [ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ] For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Zhouyi Zhou <[email protected]> Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: rcu <[email protected]> Cc: [email protected] Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05srcu: Add comments for srcu_size_statePingfan Liu1-10/+23
The SRCU_SIZE_* names are not self-explanatory, so this commit therefore adds comments to the definitions. Signed-off-by: Pingfan Liu <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: "Zhang, Qiang1" <[email protected]> To: [email protected] Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
2023-04-05sed-opal: Add command to read locking range parameters.Ondrej Kozina2-0/+12
It returns following attributes: locking range start locking range length read lock enabled write lock enabled lock state (RW, RO or LK) It can be retrieved by user authority provided the authority was added to locking range via prior IOC_OPAL_ADD_USR_TO_LR ioctl command. The command was extended to add user in ACE that allows to read attributes listed above. Signed-off-by: Ondrej Kozina <[email protected]> Tested-by: Luca Boccassi <[email protected]> Tested-by: Milan Broz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-05ASoC: soc.h: remove unused params/num_paramsKuninori Morimoto1-3/+0
No drivers are using params/num_params any more. Let's remove these. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-04-05ASoC: soc.h: clarify Codec2Codec paramsKuninori Morimoto1-3/+9
snd_soc_dai_link has params/num_params, but it is unclear that params for what. This patch clarify it is params for Codec2Codec. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-04-05KVM: arm64: Introduce support for userspace SMCCC filteringOliver Upton1-0/+3
As the SMCCC (and related specifications) march towards an 'everything and the kitchen sink' interface for interacting with a system it becomes less likely that KVM will support every related feature. We could do better by letting userspace have a crack at it instead. Allow userspace to define an 'SMCCC filter' that applies to both HVCs and SMCs initiated by the guest. Supporting both conduits with this interface is important for a couple of reasons. Guest SMC usage is table stakes for a nested guest, as HVCs are always taken to the virtual EL2. Additionally, guests may want to interact with a service on the secure side which can now be proxied by userspace. Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05KVM: arm64: Use a maple tree to represent the SMCCC filterOliver Upton1-0/+1
Maple tree is an efficient B-tree implementation that is intended for storing non-overlapping intervals. Such a data structure is a good fit for the SMCCC filter as it is desirable to sparsely allocate the 32 bit function ID space. To that end, add a maple tree to kvm_arch and correctly init/teardown along with the VM. Wire in a test against the hypercall filter for HVCs which does nothing until the controls are exposed to userspace. Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05KVM: arm64: Rename SMC/HVC call handler to reflect realityOliver Upton1-1/+1
KVM handles SMCCC calls from virtual EL2 that use the SMC instruction since commit bd36b1a9eb5a ("KVM: arm64: nv: Handle SMCs taken from virtual EL2"). Thus, the function name of the handler no longer reflects reality. Normalize the name on SMCCC, since that's the only hypercall interface KVM supports in the first place. No fuctional change intended. Reviewed-by: Suzuki K Poulose <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALLOliver Upton1-2/+7
The 'longmode' field is a bit annoying as it blows an entire __u32 to represent a boolean value. Since other architectures are looking to add support for KVM_EXIT_HYPERCALL, now is probably a good time to clean it up. Redefine the field (and the remaining padding) as a set of flags. Preserve the existing ABI by using bit 0 to indicate if the guest was in long mode and requiring that the remaining 31 bits must be zero. Cc: Paolo Bonzini <[email protected]> Acked-by: Sean Christopherson <[email protected]> Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05interconnect: drop unused icc_link_destroy() interfaceJohan Hovold1-6/+0
Now that the link array is deallocated when destroying nodes and the explicit link removal has been dropped from the exynos driver there are no further users of and no need for the icc_link_destroy() interface. Signed-off-by: Johan Hovold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Georgi Djakov <[email protected]>
2023-04-05sched/psi: Allow unprivileged polling of N*2s periodDomenico Cerasuolo2-1/+8
PSI offers 2 mechanisms to get information about a specific resource pressure. One is reading from /proc/pressure/<resource>, which gives average pressures aggregated every 2s. The other is creating a pollable fd for a specific resource and cgroup. The trigger creation requires CAP_SYS_RESOURCE, and gives the possibility to pick specific time window and threshold, spawing an RT thread to aggregate the data. Systemd would like to provide containers the option to monitor pressure on their own cgroup and sub-cgroups. For example, if systemd launches a container that itself then launches services, the container should have the ability to poll() for pressure in individual services. But neither the container nor the services are privileged. This patch implements a mechanism to allow unprivileged users to create pressure triggers. The difference with privileged triggers creation is that unprivileged ones must have a time window that's a multiple of 2s. This is so that we can avoid unrestricted spawning of rt threads, and use instead the same aggregation mechanism done for the averages, which runs independently of any triggers. Suggested-by: Johannes Weiner <[email protected]> Signed-off-by: Domenico Cerasuolo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-05sched/psi: Rename existing poll members in preparationDomenico Cerasuolo1-18/+18
Renaming in PSI implementation to make a clear distinction between privileged and unprivileged triggers code to be implemented in the next patch. Suggested-by: Johannes Weiner <[email protected]> Signed-off-by: Domenico Cerasuolo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-04Merge branch '[email protected]' into HEADBjorn Andersson3-0/+103
Introduce SM6115 GPUCC devicetree bindings, to make it possible to use clock defines in the devicetree source.
2023-04-04Merge branch '[email protected]' ↵Bjorn Andersson1-0/+4
into clk-for-6.4 Merge dt-binding include file additions through topic branch, to allow them to be made available in DT source tree as well.
2023-04-04dt-bindings: clock: dispcc-qcm2290: Add MDSS_CORE resetKonrad Dybcio1-0/+4
Add the MDSS_CORE reset which can be asserted to reset the state of the entire MDSS. Signed-off-by: Konrad Dybcio <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-04net: qrtr: correct types of trace event parametersSimon Horman1-15/+18
The arguments passed to the trace events are of type unsigned int, however the signature of the events used __le32 parameters. I may be missing the point here, but sparse flagged this and it does seem incorrect to me. net/qrtr/ns.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/qrtr.h): ./include/trace/events/qrtr.h:11:1: warning: cast to restricted __le32 ./include/trace/events/qrtr.h:11:1: warning: restricted __le32 degrades to integer ./include/trace/events/qrtr.h:11:1: warning: restricted __le32 degrades to integer ... (a lot more similar warnings) net/qrtr/ns.c:115:47: expected restricted __le32 [usertype] service net/qrtr/ns.c:115:47: got unsigned int service net/qrtr/ns.c:115:61: warning: incorrect type in argument 2 (different base types) ... (a lot more similar warnings) Fixes: dfddb54043f0 ("net: qrtr: Add tracepoint support") Reviewed-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04raw: Fix NULL deref in raw_get_next().Kuniyuki Iwashima1-2/+2
Dae R. Jeong reported a NULL deref in raw_get_next() [0]. It seems that the repro was running these sequences in parallel so that one thread was iterating on a socket that was being freed in another netns. unshare(0x40060200) r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00') socket$inet_icmp_raw(0x2, 0x3, 0x1) pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f) After commit 0daf07e52709 ("raw: convert raw sockets to RCU"), we use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW sockets. However, we should use spinlock for slow paths to avoid the NULL deref. Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object is not reused during iteration in the grace period. In fact, the lockless readers do not check the nulls marker with get_nulls_value(). So, SOCK_RAW should use hlist instead of hlist_nulls. Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(), let's convert hlist_nulls to hlist and use sk_for_each_rcu() for fast paths and sk_for_each() and spinlock for /proc/net/raw. [0]: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] RIP: 0010:sock_net include/net/sock.h:649 [inline] RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225 seq_read+0x224/0x320 fs/seq_file.c:162 pde_read fs/proc/inode.c:316 [inline] proc_reg_read+0x23f/0x330 fs/proc/inode.c:328 vfs_read+0x31e/0xd30 fs/read_write.c:468 ksys_pread64 fs/read_write.c:665 [inline] __do_sys_pread64 fs/read_write.c:675 [inline] __se_sys_pread64 fs/read_write.c:672 [inline] __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x478d29 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011 RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29 RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000 R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740 R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline] RIP: 0010:sock_net include/net/sock.h:649 [inline] RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline] RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline] RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995 Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206 RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000 RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338 RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9 R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78 R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030 FS: 00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 0daf07e52709 ("raw: convert raw sockets to RCU") Reported-by: syzbot <[email protected]> Reported-by: Dae R. Jeong <[email protected]> Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-04bpf: Refactor btf_nested_type_is_trusted().Alexei Starovoitov1-3/+4
btf_nested_type_is_trusted() tries to find a struct member at corresponding offset. It works for flat structures and falls apart in more complex structs with nested structs. The offset->member search is already performed by btf_struct_walk() including nested structs. Reuse this work and pass {field name, field btf id} into btf_nested_type_is_trusted() instead of offset to make BTF_TYPE_SAFE*() logic more robust. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-04bpf: Remove unused arguments from btf_struct_access().Alexei Starovoitov2-4/+2
Remove unused arguments from btf_struct_access() callback. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-04-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+12
Pull kvm fixes from Paolo Bonzini: "PPC: - Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled s390: - Fix handling of external interrupts in protected guests x86: - Resample the pending state of IOAPIC interrupts when unmasking them - Fix usage of Hyper-V "enlightened TLB" on AMD - Small fixes to real mode exceptions - Suppress pending MMIO write exits if emulator detects exception Documentation: - Fix rST syntax" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: docs: kvm: x86: Fix broken field list KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent KVM: s390: pv: fix external interruption loop not always detected KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection KVM: x86: Suppress pending MMIO write exits if emulator detects exception KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking KVM: irqfd: Make resampler_list an RCU list KVM: SVM: Flush Hyper-V TLB when required