40546 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
8fa37a6835 |
sysctl changes for v6.2-rc1
Only step forward on the sysctl cleanups for this cycle. This has been on linux-next since September and this time it goes with a "Yeah, think so, it just moves stuff around a bit" from Peter Zijlstra. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmOYC3sSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinVEYQAL6/3nRt854jULd3zRwrWDyJZd5yxbnc R8jJBTt3q4CKwtMqd59uQqVYLpSqOCx/GsArfsXkmY4x7KYhlaSKcC4LHmFS8Z/u dofyVKumIFqtXMI+hYuTyqNqfGoK9UKXUftrqYb8pK+K3h73uqYbrDgSex4G9GJo Au0/WeDjTzLlgqt7RPN7n0PL2jMtfWVQkr3001OCQOWW9sdrOjtprn/3bDTUnW5q KukKB5saU0CvUzrTn2DaweQiRCJxQfCQfy3DZfhDRHVuWFYMV9b1okaGEoVmQlQT I9/urfdf3aLCdBBxCQG5W6uRxZwZ2Yb93M+rijZNWNFMC6WHrMCmSiADwz9LJzIK iQV7LoolGe1TFTEVJbsde5xKSF6BeId0IF5mmPQuokAx3TPE9279HNgluaB/38c8 p3P4+mP6qE12mMPyhpwDwNOzEWgUnLsGSIE5n/WPwxCiGNa7UsN2lzMDP1cJejp5 NlRg1hRKmgt30d9+t9sHeKMcWhrjxyPGsyUMwBJTuMCHbjqizGyBsB8DzyK95OoF aN66pyRqwsK0+IUivd8VfLgfriE2gDrQD5VqkJ8lfWBx9pq8RMEq7zQ1eE9IbCff hzbfG+7k9R3o4SPfJYmCBXtp6fcq+ovjbLYSvGGCJk0zfFe6SQE21rZ3hCQPq3v5 xKFh05xUfbRF =M48U -----END PGP SIGNATURE----- Merge tag 'sysctl-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Only a small step forward on the sysctl cleanups for this cycle" * tag 'sysctl-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sched: Move numa_balancing sysctls to its own file |
||
Linus Torvalds
|
3ba2c3ff98 |
modules changes for v6.2-rc1
Tux gets for xmass improving the average lookup performance of kallsyms_lookup_name() by 715x thanks to the work by Zhen Lei, which upgraded our old implementation from being O(n) to O(log(n)), while also retaining the old implementation support on /proc/kallsyms. The only penalty was increasing the memory footprint by 3 * kallsyms_num_syms. Folks who want to improve this further now also have a dedicated selftest facility through KALLSYMS_SELFTEST. Since I had to start reviewing other future kallsyms / modules enhancements by Nick Alcock (his stuff is not merged, it requires more work) I carefully reviewed and merged Zhen Lei's kallsyms changes through modules-next tree a bit less than a month ago. So this has been exposed on linux-next for about a month now with no reported regressions. Stephen Boyd added zstd in-kernel decompression support, but the only users of this would be folks using the load-pin LSM because otherwise we do module docompression in userspace. This is the newest code and was merged last week on modules-next. We spent a lot of time analyzing and coming to grips with a proper fix to an old modules regression which only recently came to light (since v5.3-rc1, May 2019) but even though I merged that fix onto modules-next last week I'm having second thoughts about it now as I was writing about that fix in this git tag message for you, as I found a few things we cannot quite justify there yet. So I'm going to push back to the drawing board again there until all i's are properly dotted. Yes, it's a regression but the issue has been there for 2 years now and it came up because of high end CPU count, it can wait a *tiny* bit more for a proper fix. The only other thing with mentioning is a minor boot time optimization by Rasmus Villemoes which deferes param_sysfs_init() to late init. The rest is cleanups and minor fixes. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmOX54cSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinFSgP/AxBdTOljYoqHnIL2/5f7PO2epSUq4Yu h42kE7aQ9qbiR9Rq2piCmD0SmeroVBIfxoMJwxnTuUy5IeujLXe6mtt7nG7Z96H1 mFxBCF63LqE/VUp/bYTusJkOPLksmhK2tCo5zMEnrRuhDAMrbFRc/S3jGm15P3/v /hdzh384Ou4cEQPQY6nsMwSvYsuNeYAhE/BqhkrQ1LDOAgSGx8DUp4db7i1/gmxG PAu7CPJEdEGljSHhG7v7PmqYYAyhyMRsW2WkCndQUvpcfZ3Q5abMFMa+9DQuT0j9 VIJtIFMZzgC+wy+1HB2hdhVcfYj1cGul0DIKugbJ9obTrIEiB8BFSaNluZroKXkU MIB/rlQCU8kkiw5CocTwM7APHPyv5sZf97oYm9MHcRE2QYYq+o4XKul3Pl2vcEB7 Cetbdxv+rUGl4Qm0U9AbXzZijwOnRu0XZShy4FTjwhipBSER93hGzYph9beSS+sb QLnDK1c6fATT9Ye95w9Oq7/4wdK5yDifRLwA57qo4oKNz4F4MhToPcIQbhLtU2IC NHWa5udylfbpNf2MnpCw0040b1hUV8atUxwZ2kBMPH/5bodXzQBKZkQEpzdPyOD8 sihPRlVmiVVXwRMxtQtFZXpb4l9Zg9YTEFMAA1ixgT1Gefh1VjAMAfOmy/hqiqBp x5CXDwftRckB =acEB -----END PGP SIGNATURE----- Merge tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Tux gets for xmas an improvement to the average lookup performance of kallsyms_lookup_name() by 715x thanks to the work by Zhen Lei, which upgraded our old implementation from being O(n) to O(log(n)), while also retaining the old implementation support on /proc/kallsyms. The only penalty was increasing the memory footprint by 3 * kallsyms_num_syms. Folks who want to improve this further now also have a dedicated selftest facility through KALLSYMS_SELFTEST. Stephen Boyd added zstd in-kernel decompression support, but the only users of this would be folks using the load-pin LSM because otherwise we do module decompression in userspace. The only other thing with mentioning is a minor boot time optimization by Rasmus Villemoes which deferes param_sysfs_init() to late init. The rest is cleanups and minor fixes" * tag 'modules-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: livepatch: Call klp_match_callback() in klp_find_callback() to avoid code duplication module/decompress: Support zstd in-kernel decompression kallsyms: Remove unneeded semicolon kallsyms: Add self-test facility livepatch: Use kallsyms_on_each_match_symbol() to improve performance kallsyms: Add helper kallsyms_on_each_match_symbol() kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y kallsyms: Improve the performance of kallsyms_lookup_name() scripts/kallsyms: rename build_initial_tok_table() module: Fix NULL vs IS_ERR checking for module_get_next_page kernel/params.c: defer most of param_sysfs_init() to late_initcall time module: Remove unused macros module_addr_min/max module: remove redundant module_sysfs_initialized variable |
||
Linus Torvalds
|
ce8a79d560 |
for-6.2/block-2022-12-08
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmOScsgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpi5ID/9pLXFYOq1+uDjU0KO/MdjMjK8Ukr34lCnk WkajRLheE8JBKOFDE54XJk56sQSZHX9bTWqziar0h1fioh7FlQR/tVvzsERCm2M9 2y9THJNJygC68wgybStyiKlshFjl7TD7Kv5N9Y3xP3mkQygT+D6o8fXZk5xQbYyH YdFSoq4rJVHxRL03yzQiReGGIYdOUEQQh8l1FiLwLlKa3lXAey1KuxWIzksVN0KK aZB4QhiBpOiPgDHUVisq2XtyQjpZ2byoCImPzgrcqk9Jo4esvm/e6esrg4xlsvII LKFFkTmbVqjUZtFjqakFHmfuzVor4nU5f+xb90ZHExuuODYckkxWp5rWhf9QwqqI 0ik6WYgI1/5vnHnX8f2DYzOFQf9qa/rLgg0CshyUODlD6RfHa9vntqYvlIFkmOBd Q7KblIoK8YTzUS1M+v7X8JQ7gDR2KwygH37Da2KJS+vgvfIb8kJGr1ZORuhJuJJ7 Bl69gaNkHTHrqufp7UI64YXfueeuNu2J9z3zwzGoxeaFaofF/phDn0/2gCQE1fQI XBhsMw+ETqI6B2SPHMnzYDu2DM1S8ZTOYQlaD4G3uqgWnAM1tG707395uAy5yu4n D5azU1fVG4UocoNIyPujpaoSRs2zWZycEFEeUQkhyDDww/j4hlHi6H33eOnk0zsr wxzFGfvHfw== =k/vv -----END PGP SIGNATURE----- Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ... |
||
Linus Torvalds
|
bbdf4d5461 |
audit/stable-6.2 PR 20221212
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmt4UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOuuA//VwjYXQFyA6odWD9urv33jkOxBpoI MydNGO+xKQy9EPyoSBzsdGDZIxrLn73nscFCLdDqKd59wmv0XYxJ1q7d/mKv5ISX hfAdWO9TFY7//WGjuz8gGdn8UpBE8lHJKSdXzsztgHOnPtNrXPbRytO1OTAA4XD0 gPXeyNnQSHr5yMogh+rYYwsTc6tytmWeqeaQddrlPZZYrlemgKcWKbiq35Un+fMC uzDCilq8Usbi/SmSC2nmP8GHI3MQ9HDpIlRp7nfBCOStUnt31X1LN8u++EL2NEV2 GIoVvjyVdY8H0VFYR/Xf0Wldv1TsLlwGu9WkZCgg9E/IpsEbs9K+9q1HeeDvAt2T vQWtMUvRIVMa1iLc+OTKXoukIgHmwsXvbX32jfpGpHc2dUCoFMDHcAsLEep/SJme RE+3DlJD8BHsign/4Guc1j/OGvbrJ2zRmuFt6pwHacpmFaGS+yW8CdjG+oX9ncZ2 IB9qAc/E6fIGzf05/PmP4elnjIsDMLpPUtQ7yjL/c6c78wVSpWDvw8R/0+oyCjE5 D725Q9SK2lxZPLGR+NA34ik0LREhMgJufFpiP3vtwRJ4tvooVLWKrRtQTCLFWW1t VgTLOwYv1HqTq0b8TBi9cslDowKAw7svTK/Iqqv9yG3H44uUX65fbqKue0AK7yvv ytLaiT/RbkAo3wY= =yJOJ -----END PGP SIGNATURE----- Merge tag 'audit-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Two performance oriented patches for the audit subsystem: one consolidates similar code to gain some caching advantages, while the other stores a value in a stack variable to avoid repeated lookups in a loop. The commit descriptions have more information, including some before/after performance measurements" * tag 'audit-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: unify audit_filter_{uring(), inode_name(), syscall()} audit: cache ctx->major in audit_filter_syscall() |
||
Linus Torvalds
|
e529d3507a |
dma-mapping updates for Linux 2.6
- reduce the swiotlb buffer size on allocation failure (Alexey Kardashevskiy) - clean up passing of bogus GFP flags to the dma-coherent allocator (Christoph Hellwig) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmOYqxsLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYPh8g/+JB8mwv/qVkTecf+rJ4eewgNdkdRPFZfl2oPixZ3n tfTuy5+ly7zwzpNK3Kjy4UTj9rcyc9Pn5vKp8O9l8/d4w4HvCH9z3XaPARVR8cz0 TBDrfX5PDWsR8bN246GxJJRZPH5ogmZ9Pl1EG7/ZLM5CF8PFVct9usr/zQTNkcQH K4Hf6suTXDmlxMbC22EyCtwKTA6ThppJOJTId+iLSu77g5xi51LrNGbYV8Ylfynb 9p3lU67nTpXwrn019moPrEYs+QHvjVnfrIK2b2cafpu/DA1Vrkk9dF8DDvK0kXu5 OBqU2NlPqHsdAp3jgNXxemPmw8eMbUW+gV3IQknTQEPsGStPKWM3b0qCpCGXAT8r 79sEFc3NoUyJsz26BqqI2lWIBu7KOs+j1ZJlilG7pBZu1zwdYPrzglj8qUnTbT1F n9TP/A6Yu10ea2zLWrrbMVmA45lMMczHteqX6Hxr7gfHRQqsB3401YYGkQbUa15w xiQVx29dicXsD+QDS1FziGVbBqexZ8PsbBAlX3NJJcxMvAkCuG0CUclGhlbsEIx1 zRmUNDbL4b1ImzmFUh9avpfDJntBfyWpjh4lWu3G+Y4JRfwfG/HNncDf+DFJHns+ oT0Ox4fPdbfvWKXJx48LPoe0XzgjCguMS0Ql6CDgQj+uy7PdqfTHFJOlsCUDsBBH c5c= =t9QU -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.2-2022-12-13' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - reduce the swiotlb buffer size on allocation failure (Alexey Kardashevskiy) - clean up passing of bogus GFP flags to the dma-coherent allocator (Christoph Hellwig) * tag 'dma-mapping-6.2-2022-12-13' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reject __GFP_COMP in dma_alloc_attrs ALSA: memalloc: don't pass bogus GFP_ flags to dma_alloc_* s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent cnic: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/qib: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherent media: videobuf-dma-contig: use dma_mmap_coherent swiotlb: reduce the swiotlb buffer size on allocation failure |
||
Linus Torvalds
|
e1212e9b6f |
fs.vfsuid.conversion.v6.2
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bspgAKCRCRxhvAZXjc opEWAQDpF5rnZn1vv4/uOTij9ztcA4yLxu/Q19CdqBaoHlWZ9AD/d3eecee3bh5h iPHtlUK5/VspfD9LPpdc5ZbPCdZ2pA4= =t6NN -----END PGP SIGNATURE----- Merge tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfsuid updates from Christian Brauner: "Last cycle we introduced the vfs{g,u}id_t types and associated helpers to gain type safety when dealing with idmapped mounts. That initial work already converted a lot of places over but there were still some left, This converts all remaining places that still make use of non-type safe idmapping helpers to rely on the new type safe vfs{g,u}id based helpers. Afterwards it removes all the old non-type safe helpers" * tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: remove unused idmapping helpers ovl: port to vfs{g,u}id_t and associated helpers fuse: port to vfs{g,u}id_t and associated helpers ima: use type safe idmapping helpers apparmor: use type safe idmapping helpers caps: use type safe idmapping helpers fs: use type safe idmapping helpers mnt_idmapping: add missing helpers |
||
Zhen Lei
|
4f1354d5c6 |
livepatch: Call klp_match_callback() in klp_find_callback() to avoid code duplication
The implementation of function klp_match_callback() is identical to the partial implementation of function klp_find_callback(). So call function klp_match_callback() in function klp_find_callback() instead of the duplicated code. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
Linus Torvalds
|
75f4d9af8b |
iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing more of the same for the future. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N MzwiRTeqnGDxTTF7mgd//IB6hoatAA== =bcvF -----END PGP SIGNATURE----- Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls |
||
Linus Torvalds
|
405b2fc663 |
Unification of regset and non-regset sides of ELF coredump
handling. Collecting per-thread register values is the only thing that needs to be ifdefed there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZyNgAKCRBZ7Krx/gZQ 63MZAQDZDE9Pk9EQ/3qOPNb2cuz8KSB3THUyotvustUUGPTUVAD/Ut1xD03jpWCY oQ6tM8dNyh3+Vsx6/XKNd1+pj6IgNQE= =XYkZ -----END PGP SIGNATURE----- Merge tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull elf coredumping updates from Al Viro: "Unification of regset and non-regset sides of ELF coredump handling. Collecting per-thread register values is the only thing that needs to be ifdefed there..." * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [elf] get rid of get_note_info_size() [elf] unify regset and non-regset cases [elf][non-regset] use elf_core_copy_task_regs() for dumper as well [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) elf_core_copy_task_regs(): task_pt_regs is defined everywhere [elf][regset] simplify thread list handling in fill_note_info() [elf][regset] clean fill_note_info() a bit kill extern of vsyscall32_sysctl kill coredump_params->regs kill signal_pt_regs() |
||
Linus Torvalds
|
8702f2c611 |
Non-MM patches for 6.2-rc1.
- A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line. - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files. - A series from Yang Yingliang to address rapido memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t(). - And a whole shower of singleton patches all over the place. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5efRgAKCRDdBJ7gKXxA jgvdAP0al6oFDtaSsshIdNhrzcMwfjt6PfVxxHdLmNhF1hX2dwD/SVluS1bPSP7y 0sZp7Ustu3YTb8aFkMl96Y9m9mY1Nwg= =ga5B -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files - A series from Yang Yingliang to address rapidio memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t() - And a whole shower of singleton patches all over the place * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits) ipc: fix memory leak in init_mqueue_fs() hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount rapidio: devices: fix missing put_device in mport_cdev_open kcov: fix spelling typos in comments hfs: Fix OOB Write in hfs_asc2mac hfs: fix OOB Read in __hfs_brec_find relay: fix type mismatch when allocating memory in relay_create_buf() ocfs2: always read both high and low parts of dinode link count io-mapping: move some code within the include guarded section kernel: kcsan: kcsan_test: build without structleak plugin mailmap: update email for Iskren Chernev eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD rapidio: fix possible UAF when kfifo_alloc() fails relay: use strscpy() is more robust and safer cpumask: limit visibility of FORCE_NR_CPUS acct: fix potential integer overflow in encode_comp_t() acct: fix accuracy loss for input value of encode_comp_t() linux/init.h: include <linux/build_bug.h> and <linux/stringify.h> rapidio: rio: fix possible name leak in rio_register_mport() rapidio: fix possible name leaks when rio_add_device() fails ... |
||
Linus Torvalds
|
268325bda5 |
Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6 /ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4 CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE= =QRhK -----END PGP SIGNATURE----- Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ... |
||
Linus Torvalds
|
e1a1ccef7a |
Livepatching changes for 6.2
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmOXOe0ACgkQUqAMR0iA lPL4Qg//aqlwtM/i2Mkz4Vd0NBlT+CqYO2aUZrODNefOv5ok+cOZmcwcRzr5sPQw USJZ7LKvz6OgTw+NkHFYfPhu7aRj0kBknqtorhw8NgCVI6vCXzcZF0JzsZ+1E5IS xg5XNV+iDWe2fgXgqtJaPAiLlxlo/fuvizlkJ9GSUsMmBUS6QVkjLpCmk/5LNYdc +ZXroqXkB+2q0nnKYu4+di8vlnFrMxuGO0fPw7VnOsvEy9drnyrpiWzl+t2NV+Q6 GV9T5bxyQgP7wlMVn5IZa8QZL2Vh7jGEVaEkW7JuEspokytqXPSw4e03BaDjfKOl MMulLI+yizrKkHRMLcrVzNcECbTZ9qbRXOmaa7FuxabPkZ+JcnkKYdVQa8/Sq9to TssL8yTNFWGnjMVLUbgXuxMaOz1oLugc8nUxOhB9JIgEO+LvwTRRlu0LTnrvx19k 7peRzu4GkPSToKqrdMmJVw7nIX2sUnrOGLXJ/MK/WpZ61MZoi0Ppd0oo8lDrv9oq mP/D9RG2nOCdbHfYDLAxqvV5DWeHDDHmEoqIacenPyWOBRZGThhV7++/q6OIlyvD wuM6RMqzTe4JPlDrSlGX4tgFpf260BezoRJebLaL3k1YQl5xU40pII/16RdayR3f w8RelBdpaY4rudg9Qm+/5nAXrlCS4Y5vijAOuNpXC9AZze40SX8= =hkCN -----END PGP SIGNATURE----- Merge tag 'livepatching-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching update from Petr Mladek: - code cleanup * tag 'livepatching-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Move the result-invariant calculation out of the loop |
||
Linus Torvalds
|
a312a8cc3c |
cgroup changes for v6.2-rc1
Nothing too interesting. * Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable. * A couple cpuset optimizations. * Other misc changes including doc and test updates. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY5bHvg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGcYrAQCfrlzrbWw6gTQ7fmr0Avxjy5FxLjsdzEGPcmGY ByEMhgD/VdUf3zI/Khr91Gsi5JXQxQf7a5caD369xupRWUWjqA8= =Nf+E -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting: - Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable - A couple cpuset optimizations - Other misc changes including doc and test updates" * tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq() cgroup/cpuset: Improve cpuset_css_alloc() description kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh cgroup/cpuset: Optimize cpuset_attach() on v2 cgroup/cpuset: Skip spread flags update on v2 kselftest/cgroup: Fix gathering number of CPUs cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF cgroup: Implement DEBUG_CGROUP_REF |
||
Linus Torvalds
|
bf57ae2165 |
Scheduler changes for v6.2:
- Implement persistent user-requested affinity: introduce affinity_context::user_mask and unconditionally preserve the user-requested CPU affinity masks, for long-lived tasks to better interact with cpusets & CPU hotplug events over longer timespans, without destroying the original affinity intent if the underlying topology changes. - Uclamp updates: fix relationship between uclamp and fits_capacity() - PSI fixes - Misc fixes & updates. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmOXkmgRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1j/dQ//WYW/JaBpydqnVxDu6C21z0w3+fHDdlsN nQ6jyLPlouFjI2Ink1E7i7Iq8C73sdewCgD7Jq3xGa1GhsPEJIrPAaBgacxYjOqc x9HHZoygSkAihTfrVvzq37YttD2t/gQQxc81tBziMBVP2A+gb9z44u+ezMlxjiGz irgE07qNfiLyTeD/dJhEU2EOsPJm/gestW3+Cd8uwYAe6pj0X4FE3n8ipmr0BzNZ 6nxFJaSspwAkREjpAIZVENEArq7XrkGqUFKgKpYqWn0HAnuTWgFcW8E2NrDw7Qbf 4aAdBuzimbWdbkqRoX9r7++r5wqc3KW+is8Y97aEUsc0zhrXHAW1Hn2w7en5XxiQ btaPi77Boi69sHvOrfMy3i6UZ895yh2sROIkYBDT485w57BR75HsMLkk2LNIm7qE mATrrZ65bbGAgAxZouqlnQk40BUlniIfDlfZyReyFtXkW8UH5tTNX6qtpWzzdwfy posrm+XvgDcP96/7DIczZwT6VEJE5GBZbPvk2Vw4GNq6/QeW7g9GPhYTaV6CXzzW lCk/MV1n+IWCUqjkGXXCTS53TIyC6WZh2ehegcsh1KYyWcVijEs42S6eqXZI9cO7 F4oU7sehg4vlhMm1uE5JgaABfYqqzzKlvZySdwXbne2Vjt4nsWlWoe6u6JAdA4EB PRwmUDRMyEE= =aao/ -----END PGP SIGNATURE----- Merge tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement persistent user-requested affinity: introduce affinity_context::user_mask and unconditionally preserve the user-requested CPU affinity masks, for long-lived tasks to better interact with cpusets & CPU hotplug events over longer timespans, without destroying the original affinity intent if the underlying topology changes. - Uclamp updates: fix relationship between uclamp and fits_capacity() - PSI fixes - Misc fixes & updates * tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Clear ttwu_pending after enqueue_task() sched/psi: Use task->psi_flags to clear in CPU migration sched/psi: Stop relying on timer_pending() for poll_work rescheduling sched/psi: Fix avgs_work re-arm in psi_avgs_work() sched/psi: Fix possible missing or delayed pending event sched: Always clear user_cpus_ptr in do_set_cpus_allowed() sched: Enforce user requested affinity sched: Always preserve the user requested cpumask sched: Introduce affinity_context sched: Add __releases annotations to affine_move_task() sched/fair: Check if prev_cpu has highest spare cap in feec() sched/fair: Consider capacity inversion in util_fits_cpu() sched/fair: Detect capacity inversion sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition sched/uclamp: Make cpu_overutilized() use util_fits_cpu() sched/uclamp: Make asym_fits_capacity() use util_fits_cpu() sched/uclamp: Make select_idle_capacity() use util_fits_cpu() sched/uclamp: Fix fits_capacity() check in feec() sched/uclamp: Make task_fits_capacity() use util_fits_cpu() sched/uclamp: Fix relationship between uclamp and migration margin |
||
Linus Torvalds
|
add7695957 |
Perf events updates for v6.2:
- Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit |
||
Linus Torvalds
|
617fe4fa82 |
Two changes in this cycle:
- a micro-optimization in static_key_slow_inc_cpuslocked() - fix futex death-notification wakeup bug Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmOXi5sRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gf5A/8CbrmjosK6ls/NSLP7zVIOF3inAU8Ih0H pxkHWfDCzMG7WSTcjEH3AZl+hW7nXj41rC29i/7WyIxt4wslZiBQzpwEbmg92aF1 fZnmHtywvRPg1+l9XcYY6dHCL+3qr5K4v1plTXTgLLEQgKNAzO5XyPHLTYKrixGc 5hv1ntZnL9jw7+cTNmiS8tesenkrm1ZZC3IsqQcI7SxxTnrHjSgQQRRMMY2Wmom4 L9SKWTJ5xks2AgG/zzXCTqfSTBH9+tiqTLKq4vUqViIHGrWZxdEB3gPzm3qdXlyv 8IXiJ4ySGOEvK6Z6aaQOvVOJt/c7C7mx93NCLzGXS+dTOmAsead5IEtWBxUL90Un KwnV5RUcHgyeewpzTrY+GpYSgPk666HvBu0TEZrzkPH1aZR5cCa1AA52Y87pq6Xp YsAnz0X3s2daXToPBHZnXK+54PzkYyrqwtbAuolxW5h5PAKOKWQ5HLWtM5HD+qt6 oCtuVBxoTS14DsfM3zaNhMtQYykt2QgaxS5ic5Lt65HnxX1b85xxjNPukpCWuL4h TlTcaj3gW3cJybO+R9RTTAUITCx4ukLXUq/R/E4IKkX/6wWhm5a/KtRKUAxL1x4U BWDcQRhKn39Y4xKeLVQ4JHy74DItqLF8k7PFPUSYpmssLW8zLP54gTq9ontBU6CO bVGAaH+U2ZI= =7B6i -----END PGP SIGNATURE----- Merge tag 'locking-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Two changes in this cycle: - a micro-optimization in static_key_slow_inc_cpuslocked() - fix futex death-notification wakeup bug" * tag 'locking-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Resend potentially swallowed owner death notification jump_label: Use atomic_try_cmpxchg() in static_key_slow_inc_cpuslocked() |
||
Linus Torvalds
|
c1f0fcd85d |
cxl for 6.2
- Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY5UpyAAKCRDfioYZHlFs Z0ttAP4uxCjIibKsFVyexpSgI4vaZqQ9yt9NesmPwonc0XookwD+PlwP6Xc0d0Ox t0gJ6+pwdh11NRzhcNE1pAaPcJZU4gs= =HAQk -----END PGP SIGNATURE----- Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for 6.2. While it may seem backwards, the CXL update this time around includes some focus on CXL 1.x enabling where the work to date had been with CXL 2.0 (VH topologies) in mind. First generation CXL can mostly be supported via BIOS, similar to DDR, however it became clear there are use cases for OS native CXL error handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x hosts (Restricted CXL Host (RCH) topologies). So, this update brings RCH topologies into the Linux CXL device model. In support of the ongoing CXL 2.0+ enabling two new core kernel facilities are added. One is the ability for the kernel to flag collisions between userspace access to PCI configuration registers and kernel accesses. This is brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware mailbox over config-cycles. The other is a cpu_cache_invalidate_memregion() API that maps to wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest VMs and architectures that do not support it yet. The CXL paths that need it, dynamic memory region creation and security commands (erase / unlock), are disabled when it is not present. As for the CXL 2.0+ this cycle the subsystem gains support Persistent Memory Security commands, error handling in response to PCIe AER notifications, and support for the "XOR" host bridge interleave algorithm. Summary: - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits) cxl/region: Fix memdev reuse check cxl/pci: Remove endian confusion cxl/pci: Add some type-safety to the AER trace points cxl/security: Drop security command ioctl uapi cxl/mbox: Add variable output size validation for internal commands cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size cxl/security: Fix Get Security State output payload endian handling cxl: update names for interleave ways conversion macros cxl: update names for interleave granularity conversion macros cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry tools/testing/cxl: Require cache invalidation bypass cxl/acpi: Fail decoder add if CXIMS for HBIG is missing cxl/region: Fix spelling mistake "memergion" -> "memregion" cxl/regs: Fix sparse warning cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support tools/testing/cxl: Add an RCH topology cxl/port: Add RCD endpoint port enumeration cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem tools/testing/cxl: Add XOR Math support to cxl_test cxl/acpi: Support CXL XOR Interleave Math (CXIMS) ... |
||
Linus Torvalds
|
045e222d0a |
Power management updates for 6.2-rc1
- Fix nasty and hard to debug race condition introduced by mistake in the runtime PM core code and clean up that code somewhat on top of the fix (Rafael Wysocki). - Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector Martin). - Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin). - Update Qualcomm cpufreq driver, including: * CPU clock provider support, * Generic cleanups or reorganization. * Potential memleak fix. * Fix of the return value of cpufreq_driver->get(). (Manivannan Sadhasivam, Chen Hui). - Update Qualcomm cpufreq driver's DT bindings, including: * Support for CPU clock provider. * Missing cache-related properties fixes. * Support for QDU1000/QRU1000. (Manivannan Sadhasivam, Rob Herring, Melody Olvera). - Add support for ti,am625 SoC and enable build of ti-cpufreq for ARCH_K3 (Dave Gerlach, and Vibhore Vardhan). - Use flexible array to simplify memory allocation in the tegra186 cpufreq driver (Christophe JAILLET). - Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen). - Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni Gherdovich). - Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver (Xiongfeng Wang). - Initialize the kobj_unregister completion before calling kobject_init_and_add() in the cpufreq core code (Yongqiang Liu). - Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes, Nathan Chancellor). - Make intel_pstate accept initial EPP value of 0x80 (Srinivas Pandruvada). - Make read-only array sys_clk_src in the SPEAr cpufreq driver static (Colin Ian King). - Make array speeds in the longhaul cpufreq driver static (Colin Ian King). - Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy Shevchenko). - Drop a reference to CVS from cpufreq documentation (Conghui Wang). - Improve kernel messages printed by the PSCI cpuidle driver (Ulf Hansson). - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson). - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki). - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo). - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin). - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa). - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa). - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo). - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo). - Drop generic power domain status manipulation during hibernate restore (Shawn Guo). - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada). - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET). - Add SCMI Powercap based power capping driver (Cristian Marussi). - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy). - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn). - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar). - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros). - Fix clock-latency-ns property in DT example (Serge Semin). - Add a private governor_data for devfreq governors (Kant Fan). - Reorganize devfreq code to use device_match_of_node() and devm_platform_get_and_ioremap_resource() instead of open coding them (ye xingchen, Minghao Chi). - Make cpupower choose base_cpu to display default cpupower details instead of picking CPU 0 (Saket Kumar Bhaskar). - Add Georgian translation to cpupower documentation (Zurab Kargareteli). - Introduce powercap intel-rapl library, powercap-info command, and RAPL monitor into cpupower (Thomas Renninger). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXWKsSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxzKsP/jEKIwMSG4KHyXjJSDopFvppA13468ma ao2G5EnbtPgZOiN66BOcAfPB+pzBM8WBCnpy8sfNzcpQSaGaJr+flQDqQV/1QG/H GNQ0MUYN6TF/zfz/hKawDtQJihw9OrJgqQfUJIyc7Djo8ntSBu299XAt3X8VB5D1 azU1WwfOnEhr8evkqd8DS81fwm6b5cWvLkfG3Qvk2VxlwC/BCFdygqNjwOXmMNMb DPYWv1xoVhSKzJsPHbAtzFq6veLsw2Glf2xPDyjf9ZPB0ujrftFoRoeCrC/neBDb 5bB4P5Injg3IB7SAHf97XgGAH2biUKwVnQhVUOTWXdQ7u/xDbH5fOLFJkBOBP6n6 gZiEOqzg5wVXk+ZfKx4fjsf4LvB1r+nM2tmx/bzhxyt9UDLUfB9kY0PMXLRuYqyn ITvk00CJ/hkwD98pql4pCnc1PYZLUv/CHiaqTjwwOKuue3Jb3OTSPrSWtYIyTyNx s2eBz/CxGSg4Q25u3loIiNVAaCOul6SZq+Iz6BlVP8sy3q62LWi8mp5b+kb8HFWH lk8GpavqOLF6brxpPL/n0vav2bCmdwblMjTcowtGbLgiGSZaD97AkPFTN2H7tGPv iUZDTdK3H24aqY62yKzo2HK3PhwNCg06gF0VTsuvJ7iIQmfeUpLjB/3qGeJNjlEQ 20fQ6YU/NytB =B9Uu -----END PGP SIGNATURE----- Merge tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These include two new drivers (cpufreq driver for Apple SoC CPU P-states and the SCMI Powercap based power capping driver), other new hardware support and driver extensions (Qualcomm cpufreq driver and its DT bindings, TI cpufreq driver, intel_pstate, intel-uncore-freq), a bunch of fixes and cleanups all over and a cpupower utility update including new features related to RAPL support. Specifics: - Fix nasty and hard to debug race condition introduced by mistake in the runtime PM core code and clean up that code somewhat on top of the fix (Rafael Wysocki) - Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector Martin) - Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin) - Update Qualcomm cpufreq driver (Manivannan Sadhasivam, Chen Hui): - CPU clock provider support - Generic cleanups or reorganization - Potential memleak fix - Fix of the return value of cpufreq_driver->get() - Update Qualcomm cpufreq driver's DT bindings (Manivannan Sadhasivam, Rob Herring, Melody Olvera): - Support for CPU clock provider - Missing cache-related properties fixes - Support for QDU1000/QRU1000 - Add support for ti,am625 SoC and enable build of ti-cpufreq for ARCH_K3 (Dave Gerlach, and Vibhore Vardhan) - Use flexible array to simplify memory allocation in the tegra186 cpufreq driver (Christophe JAILLET) - Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen) - Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni Gherdovich) - Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver (Xiongfeng Wang) - Initialize the kobj_unregister completion before calling kobject_init_and_add() in the cpufreq core code (Yongqiang Liu) - Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes, Nathan Chancellor) - Make intel_pstate accept initial EPP value of 0x80 (Srinivas Pandruvada) - Make read-only array sys_clk_src in the SPEAr cpufreq driver static (Colin Ian King) - Make array speeds in the longhaul cpufreq driver static (Colin Ian King) - Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy Shevchenko) - Drop a reference to CVS from cpufreq documentation (Conghui Wang) - Improve kernel messages printed by the PSCI cpuidle driver (Ulf Hansson) - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson) - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki) - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo) - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin) - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa) - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa) - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo) - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo) - Drop generic power domain status manipulation during hibernate restore (Shawn Guo) - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada) - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET) - Add SCMI Powercap based power capping driver (Cristian Marussi) - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy) - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn) - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar) - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros) - Fix clock-latency-ns property in DT example (Serge Semin) - Add a private governor_data for devfreq governors (Kant Fan) - Reorganize devfreq code to use device_match_of_node() and devm_platform_get_and_ioremap_resource() instead of open coding them (ye xingchen, Minghao Chi) - Make cpupower choose base_cpu to display default cpupower details instead of picking CPU 0 (Saket Kumar Bhaskar) - Add Georgian translation to cpupower documentation (Zurab Kargareteli) - Introduce powercap intel-rapl library, powercap-info command, and RAPL monitor into cpupower (Thomas Renninger)" * tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) PM: runtime: Adjust white space in the core code cpufreq: Remove CVS version control contents from documentation cpufreq: stats: Convert to use sysfs_emit_at() API cpufreq: ACPI: Only set boost MSRs on supported CPUs PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: runtime: Relocate rpm_callback() right after __rpm_callback() PM: runtime: Do not call __rpm_callback() from rpm_idle() PM / devfreq: event: use devm_platform_get_and_ioremap_resource() PM / devfreq: event: Use device_match_of_node() PM / devfreq: Use device_match_of_node() powercap: idle_inject: Fix warnings with make W=1 PM: hibernate: Complain about memory map mismatches during resume dt-bindings: cpufreq: cpufreq-qcom-hw: Add QDU1000/QRU1000 cpufreq cpufreq: tegra186: Use flexible array to simplify memory allocation cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain cpupower: Introduce powercap intel-rapl library and powercap-info command cpupower: Add Georgian translation cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode cpufreq: amd_freq_sensitivity: Add missing pci_dev_put() ... |
||
Linus Torvalds
|
0a1d4434db |
Updates for timers, timekeeping and drivers:
- Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue - Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuC0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYodpZD/9kCDi009n65QFF1J4kE5aZuABbRMtO 7sy66fJpDyB/MtcbPPH29uzQUEs1VMTQVB+ZM+7e1YGoxSWuSTzeoFH+yK1w4tEZ VPbOcvUEjG0esKUehwYFeOjSnIjy6M1Y41aOUaDnq00/azhfTrzLxQA1BbbFbkpw S7u2hllbyRJ8KdqQyV9cVpXmze6fcpdtNhdQeoA7qQCsSPnJ24MSpZ/PG9bAovq8 75IRROT7CQRd6AMKAVpA9Ov8ak9nbY3EgQmoKcp5ZXfXz8kD3nHky9Lste7djgYB U085Vwcelt39V5iXevDFfzrBYRUqrMKOXIf2xnnoDNeF5Jlj5gChSNVZwTLO38wu RFEVCjCjuC41GQJWSck9LRSYdriW/htVbEE8JLc6uzUJGSyjshgJRn/PK4HjpiLY AvH2rd4rAap/rjDKvfWvBqClcfL7pyBvavgJeyJ8oXyQjHrHQwapPcsMFBm0Cky5 soF0Lr3hIlQ9u+hwUuFdNZkY9mOg09g9ImEjW1AZTKY0DfJMc5JAGjjSCfuopVUN Uf/qqcUeQPSEaC+C9xiFs0T3svYFxBqpgPv4B6t8zAnozon9fyZs+lv5KdRg4X77 qX395qc6PaOSQlA7gcxVw3vjCPd0+hljXX84BORP7z+uzcsomvIH1MxJepIHmgaJ JrYbSZ5qzY5TTA== =JlDe -----END PGP SIGNATURE----- Merge tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ... |
||
Linus Torvalds
|
08d72bd299 |
A small set of updates for CPU hotplug:
- Prevent stale CPU hotplug state in the cpu_down() path which was detected by stress testing the sysfs interface - Ensure that the target CPU hotplug state for the boot CPU is CPUHP_ONLINE instead of the compile time init value CPUHP_OFFLINE. - Switch back to the original behaviour of warning when a CPU hotplug callback in the DYING/STARTING section returns an error code. Otherwise a buggy callback can leave the CPUs in an non recoverable state. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUtQQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXGLD/4h+BkT+76oQ6MEmOQC/uaDzM2eOTlE vuq9iMTMCiHjiBmkzrGT2yxzpOYiEvDJVEHfCBrwskI2rAHrjtx2VCk5yoYPajoI P604Qgml+PTCk0uOxy2pPenF4PpTQ+oe3UdfZxc3zz+Qirzsd2QpXaiY6KrNktIN vnEY0mnQjsCKf5V1/Ckws0lMUMrgHFnxQhEyT8Bvo5BhTHMpsx3zS4l0QwBanqSo hqiiYIja9alRcOKSf2l1XZB6XoNOVcubRV1YZ8saN0RJ8P7gBkeR/6Sk0Imwsn2P k3xdW1AOCoCbG8NnXH7g+ohP8k5AlpP59SsMVujXwTqI55NURFibv0rj5ZhQhSWa MKstICvUc6xyqHHwtriHyzqy3naKkiptEuxto9WL6WZYF+Xj/3JD0tT4pRvey18/ yxOjDqZAYHmS3cOh7m6VzOG4CwAu9p5qloZVzvqgR0G3LzMu66e7mucciG26TFPb C5mmUsb8SvW0yYr8ZOtKXwoQKMTZtAmU9L7DJEKRZ8AesKswaAASTgb+lDFXRId6 lxI2Vs8DPQXYJMtioYEKnwzmgHaPJ6VQKv7++74xZMHMpR/Mc3+Sm4iSIXkJK87o 07DwJMTUPR3E5/K6d4ERSvlLaKVNk3frsylhgsSmPbJn672jBe8HlDSU2uLsso2O 9kTe9KPikuHWEw== =7mN4 -----END PGP SIGNATURE----- Merge tag 'smp-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "A small set of updates for CPU hotplug: - Prevent stale CPU hotplug state in the cpu_down() path which was detected by stress testing the sysfs interface - Ensure that the target CPU hotplug state for the boot CPU is CPUHP_ONLINE instead of the compile time init value CPUHP_OFFLINE. - Switch back to the original behaviour of warning when a CPU hotplug callback in the DYING/STARTING section returns an error code. Otherwise a buggy callback can leave the CPUs in an non recoverable state" * tag 'smp-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Do not bail-out in DYING/STARTING sections cpu/hotplug: Set cpuhp target for boot cpu cpu/hotplug: Make target_store() a nop when target == state |
||
Linus Torvalds
|
9d33edb20f |
Updates for the interrupt core and driver subsystem:
- Core: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages contrary to the uniform and specification defined storage mechanisms for PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stranglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. - Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2 wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7 fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+ CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5 /FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM 03JfvdxnueM3gw== =9erA -----END PGP SIGNATURE----- Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ... |
||
Linus Torvalds
|
06cff4a58e |
arm64 updates for 6.2
ACPI: * Enable FPDT support for boot-time profiling * Fix CPU PMU probing to work better with PREEMPT_RT * Update SMMUv3 MSI DeviceID parsing to latest IORT spec * APMT support for probing Arm CoreSight PMU devices CPU features: * Advertise new SVE instructions (v2.1) * Advertise range prefetch instruction * Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount * Enable DIT (Data Independent Timing) when running in the kernel * More conversion of system register fields over to the generated header CPU misfeatures: * Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: * Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: * Remove static ftrace in favour of, err, dynamic ftrace! * Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code * Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS * Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: * Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: * Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: * Support for TLP filters in Hisilicon's PCIe PMU device * Support for the DDR PMU present in Amlogic Meson G12 SoCs * Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: * Tighten up our boot protocol for systems with memory above 52 bits physical * Const-ify static keys to satisty jump label asm constraints * Trivial FFA driver cleanups in preparation for v1.1 support * Export the kernel_neon_* APIs as GPL symbols * Harden our instruction generation routines against instrumentation * A bunch of robustness improvements to our arch-specific selftests * Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...) -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1 =hV+2 -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ... |
||
Linus Torvalds
|
893660b0e1 |
slab updates for 6.2-rc1
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmOTQvYACgkQ4CHKc/GJ qRCaqQf/UjCDmj1vYKcsTzp5L4MDXdQPA7dKtytbnZtROtClVNUzB0jODsfeMI7C SwbDJRoUU1y99GRFYIx9oGji1q7TYOWS/PsZxOGkv8ILommmQ1kJdZdxt9rOqYNg 3mjCZoQmZMIRipLDrN55C096Mi+mI89kkE4Lkyrigpmxvc0KyX6QBerr+VmaBMHw DjmFC6Gj+ZH2AX6z7AzOF1gZ42gPBQUjWdHFRcY41dShOQZNl2FPT5ITAvotlJlH 9mj6woCqW936UOcpUl+Qqk7mekDJb1hqmYXV2VAlhprBi6Vcd9PU6GmPPb6w51bS HkSNNYjkbuNxBXY13PUPcR0hEHv9zw== =AlWx -----END PGP SIGNATURE----- Merge tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLOB deprecation and SLUB_TINY The SLOB allocator adds maintenance burden and stands in the way of API improvements [1]. Deprecate it by renaming the config option (to make users notice) to CONFIG_SLOB_DEPRECATED with updated help text. SLUB should be used instead as SLAB will be the next on the removal list. Based on reports from a riscv k210 board with 8MB RAM, add a CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the expense of scalability. This has resolved the k210 regression [2] so in case there are no others (that wouldn't be resolvable by further tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles. Existing defconfigs with CONFIG_SLOB are converted to CONFIG_SLUB_TINY. - kmalloc() slub_debug redzone improvements A series from Feng Tang that builds on the tracking or requested size for kmalloc() allocations (for caches with debugging enabled) added in 6.1, to make redzone checks consider the requested size and not the rounded up one, in order to catch more subtle buffer overruns. Includes new slub_kunit test. - struct slab fields reordering to accomodate larger rcu_head RCU folks would like to grow rcu_head with debugging options, which breaks current struct slab layout's assumptions, so reorganize it to make this possible. - Miscellaneous improvements/fixes: - __alloc_size checking compiler workaround (Kees Cook) - Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes) - Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina) - Correct SLUB's percpu allocation estimates (Baoquan He) - Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov) - Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao) - Dead code removal in SLUB (Hyeonggon Yoo) * tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits) mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED mm, slub: don't aggressively inline with CONFIG_SLUB_TINY mm, slub: remove percpu slabs with CONFIG_SLUB_TINY mm, slub: split out allocations from pre/post hooks mm/slub, kunit: Add a test case for kmalloc redzone check mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation mm, slub: refactor free debug processing mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY mm, slub: disable SYSFS support with CONFIG_SLUB_TINY mm, slub: add CONFIG_SLUB_TINY mm, slab: ignore hardened usercopy parameters when disabled slab: Remove special-casing of const 0 size allocations slab: Clean up SLOB vs kmalloc() definition mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head mm/migrate: make isolate_movable_page() skip slab pages mm/slab: move and adjust kernel-doc for kmem_cache_alloc mm/slub, percpu: correct the calculation of early percpu allocation size ... |
||
Linus Torvalds
|
98d0052d0d |
printk changes for 6.2
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmORzikACgkQUqAMR0iA lPKF/g/7Bmcao3rJkZjEagsYY+s7rGhaFaSbML8FDdyE3UzeXLJOnNxBLrD0JIe9 XFW7+DMqr2uRxsab5C7APy0mrIWp/zCGyJ8CmBILnrPDNcAQ27OhFzxv6WlMUmEc xEjGHrk5dFV96s63gyHGLkKGOZMd/cfcpy/QDOyg0vfF8EZCiPywWMbQQ2Ij8E50 N6UL70ExkoLjT9tzb8NXQiaDqHxqNRvd15aIomDjRrce7eeaL4TaZIT7fKnEcULz 0Lmdo8RUknonCI7Y00RWdVXMqqPD2JsKz3+fh0vBnXEN+aItwyxis/YajtN+m6l7 jhPGt7hNhCKG17auK0/6XVJ3717QwjI3+xLXCvayA8jyewMK14PgzX70hCws0eXM +5M+IeXI4ze5qsq+ln9Dt8zfC+5HGmwXODUtaYTBWhB4nVWdL/CZ+nTv349zt+Uc VIi/QcPQ4vq6EfsxUZR2r6Y12+sSH40iLIROUfqSchtujbLo7qxSNF5x7x9+rtff nWuXo5OsjGE7TZDwn3kr0zSuJ+w/pkWMYQ7jch+A2WqUMYyGC86sL3At7ocL+Esq 34uvzwEgWnNySV8cLiMh34kBmgBwhAP34RhV0RS9iCv8kev2DV7pLQTs9V3QAjw9 EZnFDHATUdikgugaFKCeDV86R3wFgnRWWOdlRrRi6aAzFDqNcYk= =1PTZ -----END PGP SIGNATURE----- Merge tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of this_cpu_inc() on strong load-store architectures. - Introduce new console_list_lock to synchronize a manipulation of the list of registered consoles and their flags. This is a first step in removing the big-kernel-lock-like behavior of console_lock(). This semaphore still serializes console->write() calbacks against: - each other. It primary prevents potential races between early and proper console drivers using the same device. - suspend()/resume() callbacks and init() operations in some drivers. - various other operations in the tty/vt and framebufer susbsystems. It is likely that console_lock() serializes even operations that are not directly conflicting with the console->write() callbacks here. This is the most complicated big-kernel-lock aspect of the console_lock() that will be hard to untangle. - Introduce new console_srcu lock that is used to safely iterate and access the registered console drivers under SRCU read lock. This is a prerequisite for introducing atomic console drivers and console kthreads. It will reduce the complexity of serialization against normal consoles and console_lock(). Also it should remove the risk of deadlock during critical situations, like Oops or panic, when only atomic consoles are registered. - Check whether the console is registered instead of enabled on many locations. It was a historical leftover. - Cleanly force a preferred console in xenfb code instead of a dirty hack. - A lot of code and comment clean ups and improvements. * tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits) printk: htmldocs: add missing description tty: serial: sh-sci: use setup() callback for early console printk: relieve console_lock of list synchronization duties tty: serial: kgdboc: use console_list_lock to trap exit tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console() tty: serial: kgdboc: use console_list_lock for list traversal tty: serial: kgdboc: use srcu console list iterator proc: consoles: use console_list_lock for list iteration tty: tty_io: use console_list_lock for list synchronization printk, xen: fbfront: create/use safe function for forcing preferred netconsole: avoid CON_ENABLED misuse to track registration usb: early: xhci-dbc: use console_is_registered() tty: serial: xilinx_uartps: use console_is_registered() tty: serial: samsung_tty: use console_is_registered() tty: serial: pic32_uart: use console_is_registered() tty: serial: earlycon: use console_is_registered() tty: hvc: use console_is_registered() efi: earlycon: use console_is_registered() tty: nfcon: use console_is_registered() serial_core: replace uart_console_enabled() with uart_console_registered() ... |
||
Linus Torvalds
|
7fc035058e |
execve updates for v6.2-rc1
- Add timens support (when switching mm). This version has survived in -next for the entire cycle (Andrei Vagin). - Various small bug fixes, refactoring, and readability improvements (Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin). - Remove FOLL_FORCE for stack setup (Kees Cook). - Whilespace cleanups (Rolf Eike Beer, Kees Cook). -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOOjsgWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqwED/9mjtKL2GwHOYKsfhtc0m4HVGBw gxTEKuyo5mRwaRLg2bfuWe1OQfeGWQd9+IZ83Kr2ijzm4R16Gslv9i69Iwdf2tce iFf2R+iR7On+zNokHxaNflRH9fMsZLobVFqzLvB73BUF82ybJlTR3WMnQhS6HZQB Gse8jRfueOnVgKldRLlgdxIucPVsXYSoBS4B0nvIUuQn3aNzDNuuctMe/5NFK0ud +TWMXtKzS3B9pcLTXy3e0bPk/Ptio18CBUEI+iLMAHswtNCoxx1ZCcuvnEcrd5Qr h2WGaRvYJ7oSUXeEsqPKuDdhqEJQH2AQoX8FzvD+hyIutQJCJzVYlHvwGCqn/Km6 0Dalng9Pjb6z2LEie/N42LDXEQmLZO2WtJ4otpORJlsJ7ZkrLjB4u+hDU1JA/Q14 YPWvth3fMA5vAFKvGCtpEc7YdHmghmXCW+YGXOBm625fPYnwFSXOarHfow1RKNE5 MOM4l60WwzLIHgmr8AFUaLf8TbutXN+BKvbMRh2ToWzDYXEoywxAedHDyo4LVwEy mZEca/3izT1ynBcyZg1t8shf4htgLjcPHqM0B+Hq0iNMIrwtecqAcYL/Oj6XssPx OuQYv341KF9fV/hMy84GM2HMr0ygUmrP7b9x+PEvCwzWf/2Glaw6Z4rtCdYC+TjW 8ZWqPqEY+LRsZsL18Q== =ZDYk -----END PGP SIGNATURE----- Merge tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: "Most are small refactorings and bug fixes, but three things stand out: switching timens (which got reverted before) looks solid now, FOLL_FORCE has been removed (no failures seen yet across several weeks in -next), and some whitespace cleanups (which are long overdue). - Add timens support (when switching mm). This version has survived in -next for the entire cycle (Andrei Vagin) - Various small bug fixes, refactoring, and readability improvements (Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin) - Remove FOLL_FORCE for stack setup (Kees Cook) - Whitespace cleanups (Rolf Eike Beer, Kees Cook)" * tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_misc: fix shift-out-of-bounds in check_special_flags binfmt: Fix error return code in load_elf_fdpic_binary() exec: Remove FOLL_FORCE for stack setup binfmt_elf: replace IS_ERR() with IS_ERR_VALUE() binfmt_elf: simplify error handling in load_elf_phdrs() binfmt_elf: fix documented return value for load_elf_phdrs() exec: simplify initial stack size expansion binfmt: Fix whitespace issues exec: Add comments on check_unsafe_exec() fs counting ELF uapi: add spaces before '{' selftests/timens: add a test for vfork+exit fs/exec: switch timens when a task gets a new mm |
||
Linus Torvalds
|
667161ba0a |
seccomp updates for v6.2-rc1
- Add missing kerndoc parameter (Randy Dunlap). - Improve seccomp selftest to check CAP_SYS_ADMIN (Gautam Menghani). - Fix allocation leak when cloned thread immediately dies (Kuniyuki Iwashima). -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOOjOAWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuXHD/45wafnnxUkunfY8Sv6zHSV/93+ L7GHrEGgKe4fAdb64jy0yMWhffhYW95WvlWBPYE+RCyeRliMj+RNfiXZsUGJXLjB 4h7rPe8wzWllW7tcEAl+gHf++1/h9U4iiyMCFsT2MZv+rnQrK33H4cmDmNUHhd7K DcvoxzXkYLrs0pQTIb5xhfdKU0ZbcTEViPra5CbHASwuamVI6Qc5GupcUoPfr7um 2YhmyK4KZQt0zRKrdwyngeQgjuMfMQ1QsuEOhkHLSswWYrEC8xabGWEizS5Ow7Y7 qrz4KH9hTQgKZIKZ52B+6OslOYWVeYba1Zj3SkDiOAbY5ATzKwOOW+5hHGd+0VS5 r32KfC1Y51ZwoS/4hoW4JCITK31GvHT1zvHHnTL2S/ydpPQ72rAUcLNuxYi5Zs1I jDpOpEt8JNPoRqG2qngEHDsdmUqRwdDGkC2hJc8Kzv8aTBTBch1lwAxYIDLf8lqH t27WjZrmN7F+TR1mpTsrPrfi7btoP4ARMkOrDqsf03gfRWHVzpGpRqm9IWJR9/xI PRbWNMAzePSmcfWpo+oh8389Zybp97iCurwhlu8ZCWEUwPK7FMtf8cW78AprAUc2 QLIaOnhM2WlxmqZrGNy636LY25zZMqtS+95nKDFmii3PQU57tmByM36DP2IGuIie 6h+Pwyf1LJht7yiczg== =++ua -----END PGP SIGNATURE----- Merge tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Add missing kerndoc parameter (Randy Dunlap) - Improve seccomp selftest to check CAP_SYS_ADMIN (Gautam Menghani) - Fix allocation leak when cloned thread immediately dies (Kuniyuki Iwashima) * tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: document the "filter_count" field seccomp: Move copy_seccomp() to no failure path. selftests/seccomp: Check CAP_SYS_ADMIN capability in the test mode_filter_without_nnp |
||
Linus Torvalds
|
f433cf2102 |
KCSAN updates for v6.2
This series adds instrumentation for memcpy(), memset(), and memmove() for Clang v16+'s new function names that are used when the -fsanitize=thread argument is given. It also fixes objtool warnings from KCSAN's volatile instrumentation, and fixes a pair of typos in a pair of Kconfig options' help clauses. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKn5gTHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jHoyD/9gkMct9vIgKRZbMYW9haoLIJlYiwrT sps8x2PssOwy99I89BTEovnIyPdQ9y3uLuHWMHAUcSN0JZqe797OIEnFImPiXPQF q2dEg4zeHXGlD0+EDpr+FUcu1Sc4ppk2AqQiS4YiIfQzunS2RMETB+FkehLDMmgm sJCd40E+xsCbq8yeCYOP2UkDeeJbVvdekli3GsjCu8vjE2UYaBjjZugXgIge9lOQ FnMfJfrcQutLgLrm4oz2s2Jt7Km3Bl40IJVYeFGdrKBaIXhCXKsbESfOdudRGRTb jPNf7s7Ofce8b3DQcT/sr8II49CZ0ekhEsExTfGdQKTz+2tghxGolY7VOZ8Nvd78 fM4SHicN/JREMcLTES0VNR+qPQLoFX1qtIXWQUt6OvxP1EoMandRahaGv+OYJ2Cm lWcmiZWJIDNhQukgnFn2wSd2pkn+Bqj5S6oUhBdcjvVBvt2vCCJtHfZenCLJvJLq k7nPvofvxA7oec9kDRcwJz+Np29DT7MR8gcn0kElF/Biq1F/wlKNuXyX9Sexm821 XQWEWUGFOtirK9BtxDI8R1uIpKWvLm66mnoXDWSb9kTZrkZHc2sa5j7ipPfOtWkr GAPfGrn2o6ckg7M5SKlRo87RdjDzyFxLXn5vqkmwMM8ntRTq8nc7JpJSxX96wVJm v5+HXwRwB5iT0w== =/aBz -----END PGP SIGNATURE----- Merge tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull KCSAN updates from Paul McKenney: - Add instrumentation for memcpy(), memset(), and memmove() for Clang v16+'s new function names that are used when the -fsanitize=thread argument is given - Fix objtool warnings from KCSAN's volatile instrumentation, and typos in a pair of Kconfig options' help clauses * tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: Fix trivial typo in Kconfig help comments objtool, kcsan: Add volatile read/write instrumentation to whitelist kcsan: Instrument memcpy/memset/memmove with newer Clang |
||
Linus Torvalds
|
1fab45ab6e |
RCU pull request for v6.2
This pull request contains the following branches: doc.2022.10.20a: Documentation updates. This is the second in a series from an ongoing review of the RCU documentation. fixes.2022.10.21a: Miscellaneous fixes. lazy.2022.11.30a: Introduces a default-off Kconfig option that depends on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce delays. These delays result in significant power savings on nearly idle Android and ChromeOS systems. These savings range from a few percent to more than ten percent. This series also includes several commits that change call_rcu() to a new call_rcu_hurry() function that avoids these delays in a few cases, for example, where timely wakeups are required. Several of these are outside of RCU and thus have acks and reviews from the relevant maintainers. srcunmisafe.2022.11.09a: Creates an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe() for architectures that support NMIs, but which do not provide NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required by the upcoming lockless printk() work by John Ogness et al. That printk() series depends on these commits, so if you pull the printk() series before this one, you will have already pulled in this branch, plus two more SRCU commits: |
||
Rafael J. Wysocki
|
e0e44513c7 |
Merge branches 'powercap', 'pm-x86', 'pm-opp' and 'pm-misc'
Merge power capping code updates, x86-specific power management pdate, operating performance points library updates and miscellaneous power management updates for 6.2-rc1: - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada). - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET). - Add SCMI Powercap based power capping driver (Cristian Marussi). - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy). - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn). - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar). - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros). - Fix clock-latency-ns property in DT example (Serge Semin). * powercap: powercap: idle_inject: Fix warnings with make W=1 powercap: Use kstrtobool() instead of strtobool() powercap: arm_scmi: Add SCMI Powercap based driver * pm-x86: platform/x86: intel-uncore-freq: add Emerald Rapids support * pm-opp: dt-bindings: opp-v2: Fix clock-latency-ns prop in example OPP: decouple dt properties in opp_parse_supplies() OPP: Simplify opp_parse_supplies() by restructuring it OPP: Parse named opp-microwatt property too dt-bindings: opp: Fix named microwatt property dt-bindings: opp: Fix usage of current in microwatt property * pm-misc: notifier: repair slips in kernel-doc comments |
||
Rafael J. Wysocki
|
7680d45a91 |
Merge branches 'pm-cpuidle', 'pm-sleep' and 'pm-domains'
Merge cpuidle changes, updates related to system sleep amd generic power domains code fixes for 6.2-rc1: - Improve kernel messages printed by the cpuidle PCI driver (Ulf Hansson). - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson). - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki). - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo). - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin). - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa). - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa). - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo). - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo). - Drop generic power domain status manipulation during hibernate restore (Shawn Guo). * pm-cpuidle: cpuidle: dt: Clarify a comment and simplify code in dt_init_idle_driver() cpuidle: dt: Return the correct numbers of parsed idle states cpuidle: psci: Extend information in log about OSI/PC mode * pm-sleep: PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: hibernate: Complain about memory map mismatches during resume PM: hibernate: Fix mistake in kerneldoc comment * pm-domains: PM: domains: Reverse the order of performance and enabling ops PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq hook PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq() PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend() PM: domains: Drop genpd status manipulation for hibernate restore |
||
Gavrilov Ilia
|
4d8586e046 |
relay: fix type mismatch when allocating memory in relay_create_buf()
The 'padding' field of the 'rchan_buf' structure is an array of 'size_t'
elements, but the memory is allocated for an array of 'size_t *' elements.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lkml.kernel.org/r/20221129092002.3538384-1-Ilia.Gavrilov@infotecs.ru
Fixes:
|
||
Anders Roxell
|
6fcd4267a8 |
kernel: kcsan: kcsan_test: build without structleak plugin
Building kcsan_test with structleak plugin enabled makes the stack frame size to grow. kernel/kcsan/kcsan_test.c:704:1: error: the frame size of 3296 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Turn off the structleak plugin checks for kcsan_test. Link: https://lkml.kernel.org/r/20221128104358.2660634-1-anders.roxell@linaro.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marco Elver <elver@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Gow <davidgow@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Xu Panda
|
3f0dad0105 |
relay: use strscpy() is more robust and safer
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Link: https://lkml.kernel.org/r/202211220853259244666@zte.com.cn Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Tejun Heo
|
fbf8321238 |
memcg: Fix possible use-after-free in memcg_write_event_control()
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to |
||
Yang Jihong
|
f596da3efa |
blktrace: Fix output non-blktrace event when blk_classic option enabled
When the blk_classic option is enabled, non-blktrace events must be
filtered out. Otherwise, events of other types are output in the blktrace
classic format, which is unexpected.
The problem can be triggered in the following ways:
# echo 1 > /sys/kernel/debug/tracing/options/blk_classic
# echo 1 > /sys/kernel/debug/tracing/events/enable
# echo blk > /sys/kernel/debug/tracing/current_tracer
# cat /sys/kernel/debug/tracing/trace_pipe
Fixes:
|
||
Petr Mladek
|
6b2b0d839a | Merge branch 'rework/console-list-lock' into for-linus | ||
Stephen Boyd
|
169a58ad82 |
module/decompress: Support zstd in-kernel decompression
Add support for zstd compressed modules to the in-kernel decompression code. This allows zstd compressed modules to be decompressed by the kernel, similar to the existing support for gzip and xz compressed modules. Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Piotr Gorski <lucjan.lucjanov@gmail.com> Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
||
Will Deacon
|
a4aebff7ef |
Merge branch 'for-next/ftrace' into for-next/core
* for-next/ftrace: ftrace: arm64: remove static ftrace ftrace: arm64: move from REGS to ARGS ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer() ftrace: pass fregs to arch_ftrace_set_direct_caller() |
||
Rafael J. Wysocki
|
96d4b8e1ad |
PM: sleep: Refine error message in try_to_freeze_tasks()
A previous change amended try_to_freeze_tasks() with the "what" variable pointing to a string describing the group of tasks subject to the freezing which may be used in the error message in there too, so make that happen. Accordingly, update sleepgraph.py to catch the modified error message as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> |
||
Rafael J. Wysocki
|
a449dfbfc0 |
PM: sleep: Avoid using pr_cont() in the tasks freezing code
Using pr_cont() in the tasks freezing code related to system-wide suspend and hibernation is problematic, because the continuation messages printed there are susceptible to interspersing with other unrelated messages which results in output that is hard to understand. Address this issue by modifying try_to_freeze_tasks() to print messages that don't require continuations and adjusting its callers accordingly. Reported-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> |
||
Thomas Gleixner
|
3d393b2174 |
genirq/msi: Provide msi_domain_alloc_irq_at()
For supporting post MSI-X enable allocations and for the upcoming PCI/IMS support a separate interface is required which allows not only the allocation of a specific index, but also the allocation of any, i.e. the next free index. The latter is especially required for IMS because IMS completely does away with index to functionality mappings which are often found in MSI/MSI-X implementation. But even with MSI-X there are devices where only the first few indices have a fixed functionality and the rest is freely assignable by software, e.g. to queues. msi_domain_alloc_irq_at() is also different from the range based interfaces as it always enforces that the MSI descriptor is allocated by the core code and not preallocated by the caller like the PCI/MSI[-X] enable code path does. msi_domain_alloc_irq_at() can be invoked with the index argument set to MSI_ANY_INDEX which makes the core code pick the next free index. The irq domain can provide a prepare_desc() operation callback in it's msi_domain_ops to do domain specific post allocation initialization before the actual Linux interrupt and the associated interrupt descriptor and hierarchy alloccations are conducted. The function also takes an optional @icookie argument which is of type union msi_instance_cookie. This cookie is not used by the core code and is stored in the allocated msi_desc::data::icookie. The meaning of the cookie is completely implementation defined. In case of IMS this might be a PASID or a pointer to a device queue, but for the MSI core it's opaque and not used in any way. The function returns a struct msi_map which on success contains the allocated index number and the Linux interrupt number so the caller can spare the index to Linux interrupt number lookup. On failure map::index contains the error code and map::virq is 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.501359457@linutronix.de |
||
Thomas Gleixner
|
8f986fd775 |
genirq/msi: Provide msi_domain_ops:: Prepare_desc()
The existing MSI domain ops msi_prepare() and set_desc() turned out to be unsuitable for implementing IMS support. msi_prepare() does not operate on the MSI descriptors. set_desc() lacks an irq_domain pointer and has a completely different purpose. Introduce a prepare_desc() op which allows IMS implementations to amend an MSI descriptor which was allocated by the core code, e.g. by adjusting the iomem base or adding some data based on the allocated index. This is way better than requiring that all IMS domain implementations preallocate the MSI descriptor and then allocate the interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.444560717@linutronix.de |
||
Thomas Gleixner
|
bd141a3db4 |
genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]
Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX interrupt domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.917219885@linutronix.de |
||
Thomas Gleixner
|
36db3d9003 |
genirq/msi: Add range checking to msi_insert_desc()
Per device domains provide the real domain size to the core code. This allows range checking on insertion of MSI descriptors and also paves the way for dynamic index allocations which are required e.g. for IMS. This avoids external mechanisms like bitmaps on the device side and just utilizes the core internal MSI descriptor storxe for it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de |
||
Linus Torvalds
|
bce9332220 |
proc: proc_skip_spaces() shouldn't think it is working on C strings
proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
e6cfaf34be |
proc: avoid integer type confusion in get_proc_long
proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Thomas Gleixner
|
26e91b75bf |
genirq/msi: Provide msi_match_device_domain()
Provide an interface to match a per device domain bus token. This allows to query which type of domain is installed for a particular domain id. Will be used for PCI to avoid frequent create/remove cycles for the MSI resp. MSI-X domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.738047902@linutronix.de |
||
Thomas Gleixner
|
27a6dea3eb |
genirq/msi: Provide msi_create/free_device_irq_domain()
Now that all prerequsites are in place, provide the actual interfaces for creating and removing per device interrupt domains. MSI device interrupt domains are created from the provided msi_domain_template which is duplicated so that it can be modified for the particular device. The name of the domain and the name of the interrupt chip are composed by "$(PREFIX)$(CHIPNAME)-$(DEVNAME)" $PREFIX: The optional prefix provided by the underlying MSI parent domain via msi_parent_ops::prefix. $CHIPNAME: The name of the irq_chip in the template $DEVNAME: The name of the device The domain is further initialized through a MSI parent domain callback which fills in the required functionality for the parent domain or domains further down the hierarchy. This initialization can fail, e.g. when the requested feature or MSI domain type cannot be supported. The domain pointer is stored in the pointer array inside of msi_device_data which is attached to the domain. The domain can be removed via the API or left for disposal via devres when the device is torn down. The API removal is useful e.g. for PCI to have seperate domains for MSI and MSI-X, which are mutually exclusive and always occupy the default domain id slot. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.678838546@linutronix.de |
||
Thomas Gleixner
|
a80c0aceea |
genirq/msi: Split msi_create_irq_domain()
Split the functionality of msi_create_irq_domain() so it can be reused for creating per device irq domains. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.559086358@linutronix.de |
||
Thomas Gleixner
|
61bf992fc6 |
genirq/msi: Add size info to struct msi_domain_info
To allow proper range checking especially for dynamic allocations add a size field to struct msi_domain_info. If the field is 0 then the size is unknown or unlimited (up to MSI_MAX_INDEX) to provide backwards compability. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.501144862@linutronix.de |