blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-02-01	nvme-auth: use workqueue dedicated to authentication	Shin'ichiro Kawasaki	1	-2/+12
	NVMe In-Band authentication uses two kinds of works: chap->auth_work and ctrl->dhchap_auth_work. The latter work flushes or cancels the former work. However, the both works are queued to the same workqueue nvme-wq. It results in the lockdep WARNING as follows: WARNING: possible recursive locking detected 6.2.0-rc4+ #1 Not tainted -------------------------------------------- kworker/u16:7/69 is trying to acquire lock: ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: start_flush_work+0x2c5/0x380 but task is already holding lock: ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x210/0x410 To avoid the WARNING, introduce a new workqueue nvme-auth-wq dedicated to chap->auth_work. Reported-by: Daniel Wagner <[email protected]> Link: https://lore.kernel.org/linux-nvme/[email protected]/ Fixes: f50fff73d620 ("nvme: implement In-Band authentication") Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Tested-by: Daniel Wagner <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-02-01	ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360	Guillaume Pinot	1	-0/+1
	Samsung Galaxy Book2 Pro 360 (13" 2022 NP930QED-KA1FR) with codec SSID 144d:ca03 requires the same workaround for enabling the speaker amp like other Samsung models with ALC298 codec. Cc: <[email protected]> Signed-off-by: Guillaume Pinot <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-02-01	nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set	Maurizio Lombardi	1	-0/+1
	In nvme_alloc_io_tag_set(), the connect_q pointer should be set to NULL in case of error to avoid potential invalid pointer dereferences. Signed-off-by: Maurizio Lombardi <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-02-01	nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set	Maurizio Lombardi	1	-1/+3
	If nvme_alloc_admin_tag_set() fails, the admin_q and fabrics_q pointers are left with an invalid, non-NULL value. Other functions may then check the pointers and dereference them, e.g. in nvme_probe() -> out_disable: -> nvme_dev_remove_admin(). Fix the bug by setting admin_q and fabrics_q to NULL in case of error. Also use the set variable to free the tag_set as ctrl->admin_tagset isn't initialized yet. Signed-off-by: Maurizio Lombardi <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-02-01	nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association	Amit Engel	1	-1/+3
	As part of nvmet_fc_ls_create_association there is a case where nvmet_fc_alloc_target_queue fails right after a new association with an admin queue is created. In this case, no one releases the get taken in nvmet_fc_alloc_target_assoc. This fix is adding the missing put. Signed-off-by: Amit Engel <[email protected]> Reviewed-by: James Smart <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-02-01	ALSA: pci: lx6464es: fix a debug loop	Dan Carpenter	1	-6/+5
	This loop accidentally reuses the "i" iterator for both the inside and the outside loop. The value of MAX_STREAM_BUFFER is 5. I believe that chip->rmh.stat_len is in the 2-12 range. If the value of .stat_len is 4 or more then it will loop exactly one time, but if it's less then it is a forever loop. It looks like it was supposed to combined into one loop where conditions are checked. Fixes: 8e6320064c33 ("ALSA: lx_core: Remove useless #if 0 .. #endif") Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/Y9jnJTis/mRFJAQp@kili Signed-off-by: Takashi Iwai <[email protected]>
2023-02-01	drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disable	Stephen Boyd	1	-4/+12
	The unprepare sequence has started to fail after moving to panel bridge code in the msm drm driver (commit 007ac0262b0d ("drm/msm/dsi: switch to DRM_PANEL_BRIDGE")). You'll see messages like this in the kernel logs: panel-boe-tv101wum-nl6 ae94000.dsi.0: failed to set panel off: -22 This is because boe_panel_enter_sleep_mode() needs an operating DSI link to set the panel into sleep mode. Performing those writes in the unprepare phase of bridge ops is too late, because the link has already been torn down by the DSI controller in post_disable, i.e. the PHY has been disabled, etc. See dsi_mgr_bridge_post_disable() for more details on the DSI . Split the unprepare function into a disable part and an unprepare part. For now, just the DSI writes to enter sleep mode are put in the disable function. This fixes the panel off routine and keeps the panel happy. My Wormdingler has an integrated touchscreen that stops responding to touch if the panel is only half disabled too. This patch fixes it. And finally, this saves power when the screen is off because without this fix the regulators for the panel are left enabled when nothing is being displayed on the screen. Fixes: 007ac0262b0d ("drm/msm/dsi: switch to DRM_PANEL_BRIDGE") Fixes: a869b9db7adf ("drm/panel: support for boe tv101wum-nl6 wuxga dsi video mode panel") Cc: yangcong <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Jitao Shi <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Rob Clark <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit c913cd5489930abbb557ef144a333846286754c3) Signed-off-by: Thomas Zimmermann <[email protected]>
2023-01-31	Merge patch "riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y"	Palmer Dabbelt	2	-19/+12
	This is a single fix, but it conflicts with some recent features. I'm merging it on top of the commit it fixes to ease backporting. * b4-shazam-merge: riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2023-01-31	riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y	Samuel Holland	2	-18/+10
	commit 8eb060e10185 ("arch/riscv: add Zihintpause support") broke building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0): CC arch/riscv/kernel/vdso/vgettimeofday.o In file included from <command-line>: ./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax': ././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints 285 \| #define asm_volatile_goto(x...) asm goto(x) \| ^~~ ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto' 41 \| asm_volatile_goto( \| ^~~~~~~~~~~~~~~~~ ././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm' 285 \| #define asm_volatile_goto(x...) asm goto(x) \| ^~~ ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto' 41 \| asm_volatile_goto( \| ^~~~~~~~~~~~~~~~~ make[1]: * [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1 make: * [arch/riscv/Makefile:128: vdso_prepare] Error 2 Having a static branch in cpu_relax() is problematic because that function is widely inlined, including in some quite complex functions like in the VDSO. A quick measurement shows this static branch is responsible by itself for around 40% of the jump table. Drop the static branch, which ends up being the same number of instructions anyway. If Zihintpause is supported, we trade the nop from the static branch for a div. If Zihintpause is unsupported, we trade the jump from the static branch for (what gets interpreted as) a nop. Fixes: 8eb060e10185 ("arch/riscv: add Zihintpause support") Signed-off-by: Samuel Holland <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2023-01-31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski	2	-2/+4
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Release bridge info once packet escapes the br_netfilter path, from Florian Westphal. 2) Revert incorrect fix for the SCTP connection tracking chunk iterator, also from Florian. First path fixes a long standing issue, the second path addresses a mistake in the previous pull request for net. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk" netfilter: br_netfilter: disable sabotage_in hook after first suppression ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31	net: phy: meson-gxl: Add generic dummy stubs for MMD register access	Chris Healy	1	-0/+2
	The Meson G12A Internal PHY does not support standard IEEE MMD extended register access, therefore add generic dummy stubs to fail the read and write MMD calls. This is necessary to prevent the core PHY code from erroneously believing that EEE is supported by this PHY even though this PHY does not support EEE, as MMD register access returns all FFFFs. Fixes: 5c3407abb338 ("net: phy: meson-gxl: add g12a support") Reviewed-by: Heiner Kallweit <[email protected]> Signed-off-by: Chris Healy <[email protected]> Reviewed-by: Jerome Brunet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31	net: fix NULL pointer in skb_segment_list	Yan Zhai	1	-3/+2
	Commit 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") introduced UDP listifyed GRO. The segmentation relies on frag_list being untouched when passing through the network stack. This assumption can be broken sometimes, where frag_list itself gets pulled into linear area, leaving frag_list being NULL. When this happens it can trigger following NULL pointer dereference, and panic the kernel. Reverse the test condition should fix it. [19185.577801][ C1] BUG: kernel NULL pointer dereference, address: ... [19185.663775][ C1] RIP: 0010:skb_segment_list+0x1cc/0x390 ... [19185.834644][ C1] Call Trace: [19185.841730][ C1] <TASK> [19185.848563][ C1] __udp_gso_segment+0x33e/0x510 [19185.857370][ C1] inet_gso_segment+0x15b/0x3e0 [19185.866059][ C1] skb_mac_gso_segment+0x97/0x110 [19185.874939][ C1] __skb_gso_segment+0xb2/0x160 [19185.883646][ C1] udp_queue_rcv_skb+0xc3/0x1d0 [19185.892319][ C1] udp_unicast_rcv_skb+0x75/0x90 [19185.900979][ C1] ip_protocol_deliver_rcu+0xd2/0x200 [19185.910003][ C1] ip_local_deliver_finish+0x44/0x60 [19185.918757][ C1] __netif_receive_skb_one_core+0x8b/0xa0 [19185.927834][ C1] process_backlog+0x88/0x130 [19185.935840][ C1] __napi_poll+0x27/0x150 [19185.943447][ C1] net_rx_action+0x27e/0x5f0 [19185.951331][ C1] ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core] [19185.960848][ C1] __do_softirq+0xbc/0x25d [19185.968607][ C1] irq_exit_rcu+0x83/0xb0 [19185.976247][ C1] common_interrupt+0x43/0xa0 [19185.984235][ C1] asm_common_interrupt+0x22/0x40 ... [19186.094106][ C1] </TASK> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Suggested-by: Daniel Borkmann <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Yan Zhai <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31	net: fman: memac: free mdio device if lynx_pcs_create() fails	Vladimir Oltean	1	-0/+3
	When memory allocation fails in lynx_pcs_create() and it returns NULL, there remains a dangling reference to the mdiodev returned by of_mdio_find_device() which is leaked as soon as memac_pcs_create() returns empty-handed. Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Sean Anderson <[email protected]> Acked-by: Madalin Bucur <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-01-31	sctp: do not check hb_timer.expires when resetting hb_timer	Xin Long	1	-3/+1
	It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often"), and it only allows mod_timer when the new expires is after hb_timer.expires. It means even a much shorter interval for hb timer gets applied, it will have to wait until the current hb timer to time out. In sctp_do_8_2_transport_strike(), when a transport enters PF state, it expects to update the hb timer to resend a heartbeat every rto after calling sctp_transport_reset_hb_timer(), which will not work as the change mentioned above. The frequently hb_timer refresh was caused by sctp_transport_reset_timers() called in sctp_outq_flush() and it was already removed in the commit above. So we don't have to check hb_timer.expires when resetting hb_timer as it is now not called very often. Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often") Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-01	powerpc/kexec_file: Count hot-pluggable memory in FDT estimate	Sourabh Jain	1	-1/+1
	On Systems where online memory is lesser compared to max memory, the kexec_file_load system call may fail to load the kdump kernel with the below errors: "Failed to update fdt with linux,drconf-usable-memory property" "Error setting up usable-memory property for kdump kernel" This happens because the size estimation for usable memory properties for the kdump kernel's FDT is based on the online memory whereas the usable memory properties include max memory. In short, the hot-pluggable memory is not accounted for while estimating the size of the usable memory properties. The issue is addressed by calculating usable memory property size using max hotplug address instead of the last online memory address. Fixes: 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump kernel") Signed-off-by: Sourabh Jain <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-31	mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()	Kefeng Wang	1	-1/+4
	As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"), hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could occurs a NULL pointer dereference, let's do not record the foreign writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to fix it. Link: https://lkml.kernel.org/r/[email protected] Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") Signed-off-by: Kefeng Wang <[email protected]> Reported-by: Ma Wupeng <[email protected]> Tested-by: Miko Larsson <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Ma Wupeng <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	Kconfig.debug: fix the help description in SCHED_DEBUG	ye xingchen	1	-1/+1
	The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: ye xingchen <[email protected]> Cc: Dan Williams <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zhaoyang Huang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm/swapfile: add cond_resched() in get_swap_pages()	Longlong Xia	1	-0/+1
	The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Longlong Xia <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Cc: Chen Wandun <[email protected]> Cc: Huang Ying <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Nanyong Sun <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm: use stack_depot_early_init for kmemleak	Zhaoyang Huang	2	-2/+4
	Mirsad report the below error which is caused by stack_depot_init() failure in kvcalloc. Solve this by having stackdepot use stack_depot_early_init(). On 1/4/23 17:08, Mirsad Goran Todorovac wrote: I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak: [root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff951c118568b0 (size 16): comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s) hex dump (first 16 bytes): 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0....... backtrace: [root@pc-mtodorov ~]# Apparently, backtrace of called functions on the stack is no longer printed with the list of memory leaks. This appeared on Lenovo desktop 10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and 6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from Mr. Torvalds' tree. I don't know if this is deliberate feature for some reason or a bug. Please find attached the config, lshw and kmemleak output. [[email protected]: remove stack_depot_init() call] Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace") Reported-by: Mirsad Todorovac <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Signed-off-by: Zhaoyang Huang <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Borislav Petkov (AMD) <[email protected]> Cc: ke.wang <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	Squashfs: fix handling and sanity checking of xattr_ids count	Phillip Lougher	4	-5/+5
	A Sysbot [1] corrupted filesystem exposes two flaws in the handling and sanity checking of the xattr_ids count in the filesystem. Both of these flaws cause computation overflow due to incorrect typing. In the corrupted filesystem the xattr_ids value is 4294967071, which stored in a signed variable becomes the negative number -225. Flaw 1 (64-bit systems only): The signed integer xattr_ids variable causes sign extension. This causes variable overflow in the SQUASHFS_XATTR_(A) macros. The variable is first multiplied by sizeof(struct squashfs_xattr_id) where the type of the sizeof operator is "unsigned long". On a 64-bit system this is 64-bits in size, and causes the negative number to be sign extended and widened to 64-bits and then become unsigned. This produces the very large number 18446744073709548016 or 2^64 - 3600. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0 (stored in len). Flaw 2 (32-bit systems only): On a 32-bit system the integer variable is not widened by the unsigned long type of the sizeof operator (32-bits), and the signedness of the variable has no effect due it always being treated as unsigned. The above corrupted xattr_ids value of 4294967071, when multiplied overflows and produces the number 4294963696 or 2^32 - 3400. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows again and produces a length of 0. The effect of the 0 length computation: In conjunction with the corrupted xattr_ids field, the filesystem also has a corrupted xattr_table_start value, where it matches the end of filesystem value of 850. This causes the following sanity check code to fail because the incorrectly computed len of 0 matches the incorrect size of the table reported by the superblock (0 bytes). len = SQUASHFS_XATTR_BLOCK_BYTES(xattr_ids); indexes = SQUASHFS_XATTR_BLOCKS(xattr_ids); / * The computed size of the index table (len bytes) should exactly * match the table start and end points / start = table_start + sizeof(id_table); end = msblk->bytes_used; if (len != (end - start)) return ERR_PTR(-EINVAL); Changing the xattr_ids variable to be "usigned int" fixes the flaw on a 64-bit system. This relies on the fact the computation is widened by the unsigned long type of the sizeof operator. Casting the variable to u64 in the above macro fixes this flaw on a 32-bit system. It also means 64-bit systems do not implicitly rely on the type of the sizeof operator to widen the computation. [1] https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Signed-off-by: Phillip Lougher <[email protected]> Reported-by: <[email protected]> Cc: Alexey Khoroshilov <[email protected]> Cc: Fedor Pchelkin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	sh: define RUNTIME_DISCARD_EXIT	Tom Saeger	1	-0/+1
	sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv"). This is similar to fixes for powerpc and s390: commit 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT"). commit a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36"). $ sh4-linux-gnu-ld --version \| head -n1 GNU ld (GNU Binutils for Debian) 2.35.2 $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- `.exit.text' referenced in section `__bug_table' of crypto/algboss.o: defined in discarded section `.exit.text' of crypto/algboss.o `.exit.text' referenced in section `__bug_table' of drivers/char/hw_random/core.o: defined in discarded section `.exit.text' of drivers/char/hw_random/core.o make[2]: * [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make[1]: * [Makefile:1252: vmlinux] Error 2 arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT: /* * .exit.text is discarded at runtime, not link time, to deal with * references from __bug_table */ .exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT } However, EXIT_TEXT is thrown away by DISCARD(include/asm-generic/vmlinux.lds.h) because sh does not define RUNTIME_DISCARD_EXIT. GNU ld 2.40 does not have this issue and builds fine. This corresponds with Masahiro's comments in a494398bde27: "Nathan [Chancellor] also found that binutils commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this issue, so we cannot reproduce it with binutils 2.36+, but it is better to not rely on it." Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tom Saeger <[email protected]> Tested-by: John Paul Adrian Glaubitz <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dennis Gilmore <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Naresh Kamboju <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Rich Felker <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	highmem: round down the address passed to kunmap_flush_on_unmap()	Matthew Wilcox (Oracle)	1	-2/+2
	We already round down the address in kunmap_local_indexed() which is the other implementation of __kunmap_local(). The only implementation of kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned address. This may be causing PA-RISC to be flushing the wrong addresses currently. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Fixes: 298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*") Reviewed-by: Ira Weiny <[email protected]> Cc: "Fabio M. De Francesco" <[email protected]> Cc: Al Viro <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Helge Deller <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: David Sterba <[email protected]> Cc: Kees Cook <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Tony Luck <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	migrate: hugetlb: check for hugetlb shared PMD in node migration	Mike Kravetz	1	-1/+2
	migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to move pages shared with another process to a different node. page_mapcount > 1 is being used to determine if a hugetlb page is shared. However, a hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. As a result, hugetlb pages shared by multiple processes and mapped with a shared PMD can be moved by a process without CAP_SYS_NICE. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found consider the page shared. Link: https://lkml.kernel.org/r/[email protected] Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()") Signed-off-by: Mike Kravetz <[email protected]> Acked-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: James Houghton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps	Mike Kravetz	2	-3/+14
	Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [[email protected]: simplification, per David] [[email protected]: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/[email protected] Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <[email protected]> Acked-by: Peter Xu <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: James Houghton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups	Zach O'Keefe	1	-0/+8
	In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none(): - if (!pmd_present(pmde)) - return SCAN_PMD_NULL; + if (pmd_none(pmde)) + return SCAN_PMD_NONE; This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include: A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address. In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd. However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd. We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch. The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit. Covering all 4 cases for x86 (all checks done on the same pmd value): 1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger? The only relevant path is: madvise_collapse() -> collapse_pte_mapped_thp() Where we might just incorrectly report back "success", when really the memory isn't pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here. 2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated. 3) !pmd_present() && pmd_trans_huge() Not possible. 4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path). Also, add a comment above find_pmd_or_thp_or_none() to help future travelers reason about the validity of the code; namely, the possible mutations that might happen out from under us, depending on how mmap_lock is held (if at all). Link: https://lkml.kernel.org/r/[email protected] Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Signed-off-by: Zach O'Keefe <[email protected]> Reported-by: Hugh Dickins <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	Revert "mm: kmemleak: alloc gray object for reserved region with direct map"	Isaac J. Manjarres	1	-5/+1
	This reverts commit 972fa3a7c17c9d60212e32ecc0205dc585b1e769. Kmemleak operates by periodically scanning memory regions for pointers to allocated memory blocks to determine if they are leaked or not. However, reserved memory regions can be used for DMA transactions between a device and a CPU, and thus, wouldn't contain pointers to allocated memory blocks, making them inappropriate for kmemleak to scan. Thus, revert this commit. Link: https://lkml.kernel.org/r/[email protected] Fixes: 972fa3a7c17c9 ("mm: kmemleak: alloc gray object for reserved region with direct map") Signed-off-by: Isaac J. Manjarres <[email protected]> Acked-by: Catalin Marinas <[email protected]> Cc: Calvin Zhang <[email protected]> Cc: Frank Rowand <[email protected]> Cc: Rob Herring <[email protected]> Cc: Saravana Kannan <[email protected]> Cc: <[email protected]> [5.17+] Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	freevxfs: Kconfig: fix spelling	Randy Dunlap	1	-1/+1
	Fix a spello in freevxfs Kconfig. (reported by codespell) Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	maple_tree: should get pivots boundary by type	Wei Yang	1	-2/+3
	We should get pivots boundary by type. Fixes a potential overindexing of mt_pivots[]. Link: https://lkml.kernel.org/r/[email protected] Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	.mailmap: update e-mail address for Eugen Hristev	Eugen Hristev	1	-0/+1
	Update e-mail address. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eugen Hristev <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm, mremap: fix mremap() expanding for vma's with vm_ops->close()	Vlastimil Babka	1	-6/+19
	Fabian has reported another regression in 6.1 due to ca3d76b0aa80 ("mm: add merging after mremap resize"). The problem is that vma_merge() can fail when vma has a vm_ops->close() method, causing is_mergeable_vma() test to be negative. This was happening for vma mapping a file from fuse-overlayfs, which does have the method. But when we are simply expanding the vma, we never remove it due to the "merge" with the added area, so the test should not prevent the expansion. As a quick fix, check for such vmas and expand them using vma_adjust() directly as was done before commit ca3d76b0aa80. For a more robust long term solution we should try to limit the check for vma_ops->close only to cases that actually result in vma removal, so that no merge would be prevented unnecessarily. [[email protected]: fix indenting whitespace, reflow comment] Link: https://lkml.kernel.org/r/[email protected] Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Fabian Vogt <[email protected]> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35 Tested-by: Fabian Vogt <[email protected]> Cc: Jakub Matěna <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	squashfs: harden sanity check in squashfs_read_xattr_id_table	Fedor Pchelkin	1	-1/+1
	While mounting a corrupted filesystem, a signed integer '*xattr_ids' can become less than zero. This leads to the incorrect computation of 'len' and 'indexes' values which can cause null-ptr-deref in copy_bio_to_actor() or out-of-bounds accesses in the next sanity checks inside squashfs_read_xattr_id_table(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Link: https://lkml.kernel.org/r/[email protected] Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Reported-by: <[email protected]> Signed-off-by: Fedor Pchelkin <[email protected]> Signed-off-by: Alexey Khoroshilov <[email protected]> Cc: Phillip Lougher <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	ia64: fix build error due to switch case label appearing next to declaration	James Morse	1	-2/+5
	Since commit aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency"), gcc 10.1.0 fails to build ia64 with the gnomic: \| ../arch/ia64/kernel/sys_ia64.c: In function 'ia64_clock_getres': \| ../arch/ia64/kernel/sys_ia64.c:189:3: error: a label can only be part of a statement and a declaration is not a statement \| 189 \| s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq); This line appears immediately after a case label in a switch. Move the declarations out of the case, to the top of the function. Link: https://lkml.kernel.org/r/[email protected] Fixes: aa06a9bd8533 ("ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency") Signed-off-by: James Morse <[email protected]> Reviewed-by: Sergei Trofimovich <[email protected]> Cc: Émeric Maschino <[email protected]> Cc: matoro <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm: multi-gen LRU: fix crash during cgroup migration	Yu Zhao	1	-1/+4
	lru_gen_migrate_mm() assumes lru_gen_add_mm() runs prior to itself. This isn't true for the following scenario: CPU 1 CPU 2 clone() cgroup_can_fork() cgroup_procs_write() cgroup_post_fork() task_lock() lru_gen_migrate_mm() task_unlock() task_lock() lru_gen_add_mm() task_unlock() And when the above happens, kernel crashes because of linked list corruption (mm_struct->lru_gen.list). Link: https://lore.kernel.org/r/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <[email protected]> Reported-by: msizanoen <[email protected]> Tested-by: msizanoen <[email protected]> Cc: <[email protected]> [6.1+] Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	Revert "mm: add nodes= arg to memory.reclaim"	Michal Hocko	4	-68/+21
	This reverts commit 12a5d3955227b0d7e04fb793ccceeb2a1dd275c5. Although it is recognized that a finer grained pro-active reclaim is something we need and want the semantic of this implementation is really ambiguous. In a follow up discussion it became clear that there are two essential usecases here. One is to use memory.reclaim to pro-actively reclaim memory and expectation is that the requested and reported amount of memory is uncharged from the memcg. Another usecase focuses on pro-active demotion when the memory is merely shuffled around to demotion targets while the overall charged memory stays unchanged. The current implementation considers demoted pages as reclaimed and that break both usecases. [1] has tried to address the reporting part but there are more issues with that summarized in [2] and follow up emails. Let's revert the nodemask based extension of the memcg pro-active reclaim for now until we settle with a more robust semantic. [1] http://lkml.kernel.org/r/http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 12a5d3955227b0d ("mm: add nodes= arg to memory.reclaim") Signed-off-by: Michal Hocko <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: Huang Ying <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Wei Xu <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: zefan li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	zsmalloc: fix a race with deferred_handles storing	Nhat Pham	1	-32/+205
	Currently, there is a race between zs_free() and zs_reclaim_page(): zs_reclaim_page() finds a handle to an allocated object, but before the eviction happens, an independent zs_free() call to the same handle could come in and overwrite the object value stored at the handle with the last deferred handle. When zs_reclaim_page() finally gets to call the eviction handler, it will see an invalid object value (i.e the previous deferred handle instead of the original object value). This race happens quite infrequently. We only managed to produce it with out-of-tree developmental code that triggers zsmalloc writeback with a much higher frequency than usual. This patch fixes this race by storing the deferred handle in the object header instead. We differentiate the deferred handle from the other two cases (handle for allocated object, and linkage for free object) with a new tag. If zspage reclamation succeeds, we will free these deferred handles by walking through the zspage objects. On the other hand, if zspage reclamation fails, we reconstruct the zspage freelist (with the deferred handle tag and allocated tag) before trying again with the reclamation. [[email protected]: avoid unused-function warning] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 9997bc017549 ("zsmalloc: implement writeback mechanism for zsmalloc") Signed-off-by: Nhat Pham <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	mm/khugepaged: fix ->anon_vma race	Jann Horn	1	-1/+13
	If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires it to be locked. Page table traversal is allowed under any one of the mmap lock, the anon_vma lock (if the VMA is associated with an anon_vma), and the mapping lock (if the VMA is associated with a mapping); and so to be able to remove page tables, we must hold all three of them. retract_page_tables() bails out if an ->anon_vma is attached, but does this check before holding the mmap lock (as the comment above the check explains). If we racily merged an existing ->anon_vma (shared with a child process) from a neighboring VMA, subsequent rmap traversals on pages belonging to the child will be able to see the page tables that we are concurrently removing while assuming that nothing else can access them. Repeat the ->anon_vma check once we hold the mmap lock to ensure that there really is no concurrent page table access. Hitting this bug causes a lockdep warning in collapse_and_free_pmd(), in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)". It can also lead to use-after-free access. Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: Jann Horn <[email protected]> Reported-by: Zach O'Keefe <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	maple_tree: fix mas_empty_area_rev() lower bound validation	Liam Howlett	2	-9/+97
	mas_empty_area_rev() was not correctly validating the start of a gap against the lower limit. This could lead to the range starting lower than the requested minimum. Fix the issue by better validating a gap once one is found. This commit also adds tests to the maple tree test suite for this issue and tests the mas_empty_area() function for similar bound checking. Link: https://lkml.kernel.org/r/[email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911 Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: <[email protected]> Link: https://lore.kernel.org/linux-mm/[email protected]/ Tested-by: Holger Hoffsttte <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-01-31	Merge tag 'cgroup-for-6.2-fixes' of ↵	Linus Torvalds	2	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "cpuset has a bug which can cause an oops after some configuration operations, introduced during the v6.1 cycle. This single commit fixes the bug" * tag 'cgroup-for-6.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
2023-01-31	cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()	Waiman Long	2	-1/+3
	It was found that the check to see if a partition could use up all the cpus from the parent cpuset in update_parent_subparts_cpumask() was incorrect. As a result, it is possible to leave parent with no effective cpu left even if there are tasks in the parent cpuset. This can lead to system panic as reported in [1]. Fix this probem by updating the check to fail the enabling the partition if parent's effective_cpus is a subset of the child's cpus_allowed. Also record the error code when an error happens in update_prstate() and add a test case where parent partition and child have the same cpu list and parent has task. Enabling partition in the child will fail in this case. [1] https://www.spinics.net/lists/cgroups/msg36254.html Fixes: f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to partition & cpus changes") Cc: [email protected] # v6.1 Reported-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-01-31	Merge tag 'scsi-fixes' of ↵	Linus Torvalds	3	-6/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two core fixes. One simply moves an annotation from put to release to avoid the warning triggering needlessly in alua, but to keep it in case release is ever called from that path (which we don't think will happen). The other reverts a change to the PQ=1 target scanning behaviour that's under intense discussion at the moment" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT" scsi: core: Fix the scsi_device_put() might_sleep annotation
2023-01-31	perf: Fix perf_event_pmu_context serialization	James Clark	3	-22/+57
	Syzkaller triggered a WARN in put_pmu_ctx(). WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278 This is because there is no locking around the access of "if (!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in put_pmu_ctx(). The decrement of the reference count in put_pmu_ctx() also happens outside of the spinlock, leading to the possibility of this order of events, and the context being cleared in put_pmu_ctx(), after its refcount is non zero: CPU0 CPU1 find_get_pmu_context() if (!epc->ctx) == false put_pmu_ctx() atomic_dec_and_test(&epc->refcount) == true epc->refcount == 0 atomic_inc(&epc->refcount); epc->refcount == 1 list_del_init(&epc->pmu_ctx_entry); epc->ctx = NULL; Another issue is that WARN_ON for no active PMU events in put_pmu_ctx() is outside of the lock. If the perf_event_pmu_context is an embedded one, even after clearing it, it won't be deleted and can be re-used. So the warning can trigger. For this reason it also needs to be moved inside the lock. The above warning is very quick to trigger on Arm by running these two commands at the same time: while true; do perf record -- ls; done while true; do perf record -- ls; done [peterz: atomic_dec_and_raw_lock*()] Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: [email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-01-31	Merge tag 'media/v6.2-3' of ↵	Linus Torvalds	2	-4/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "A couple of v4l2 core fixes: - fix a regression on strings control support - fix a regression for some drivers that depend on an odd streaming behavior" * tag 'media/v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: videobuf2: set q->streaming later media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
2023-01-31	block: Fix the blk_mq_destroy_queue() documentation	Bart Van Assche	1	-2/+3
	Commit 2b3f056f72e5 moved a blk_put_queue() call from blk_mq_destroy_queue() into its callers. Reflect this change in the documentation block above blk_mq_destroy_queue(). Cc: Christoph Hellwig <[email protected]> Cc: Sagi Grimberg <[email protected]> Cc: Chaitanya Kulkarni <[email protected]> Cc: Keith Busch <[email protected]> Fixes: 2b3f056f72e5 ("blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queue") Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-01-31	s390/decompressor: specify __decompress() buf len to avoid overflow	Vasily Gorbik	1	-1/+1
	Historically calls to __decompress() didn't specify "out_len" parameter on many architectures including s390, expecting that no writes beyond uncompressed kernel image are performed. This has changed since commit 2aa14b1ab2c4 ("zstd: import usptream v1.5.2") which includes zstd library commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer (#2751)"). Now zstd decompression code might store literal buffer in the unwritten portion of the destination buffer. Since "out_len" is not set, it is considered to be unlimited and hence free to use for optimization needs. On s390 this might corrupt initrd or ipl report which are often placed right after the decompressor buffer. Luckily the size of uncompressed kernel image is already known to the decompressor, so to avoid the problem simply specify it in the "out_len" parameter. Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb Signed-off-by: Vasily Gorbik <[email protected]> Tested-by: Alexander Egorenkov <[email protected]> Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours Signed-off-by: Heiko Carstens <[email protected]>
2023-01-31	kunit: fix kunit_test_init_section_suites(...)	Brendan Higgins	1	-1/+0
	Looks like kunit_test_init_section_suites(...) was messed up in a merge conflict. This fixes it. kunit_test_init_section_suites(...) was not updated to avoid the extra level of indirection when .kunit_test_suites was flattened. Given no-one was actively using it, this went unnoticed for a long period of time. Fixes: e5857d396f35 ("kunit: flatten kunit_suite* to kunit_suite in .kunit_test_suites") Signed-off-by: Brendan Higgins <[email protected]> Signed-off-by: David Gow <[email protected]> Tested-by: Martin Fernandez <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2023-02-01	MAINTAINERS: Update OpenRISC mailing list	Stafford Horne	1	-1/+1
	The mailing list at librecores.org is being shut down due to infrastructure issues. Update the the newly created list on vger.kernel.org. Signed-off-by: Stafford Horne <[email protected]>
2023-01-31	block: ublk: extending queue_size to fix overflow	Liu Xiaodong	1	-1/+1
	When validating drafted SPDK ublk target, in a case that assigning large queue depth to multiqueue ublk device, ublk target would run into a weird incorrect state. During rounds of review and debug, An overflow bug was found in ublk driver. In ublk_cmd.h, UBLK_MAX_QUEUE_DEPTH is 4096 which means each ublk queue depth can be set as large as 4096. But when setting qd for a ublk device, sizeof(struct ublk_queue) + depth * sizeof(struct ublk_io) will be larger than 65535 if qd is larger than 2728. Then queue_size is overflowed, and ublk_get_queue() references a wrong pointer position. The wrong content of ublk_queue elements will lead to out-of-bounds memory access. Extend queue_size in ublk_device as "unsigned int". Signed-off-by: Liu Xiaodong <[email protected]> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-01-31	IB/hfi1: Assign npages earlier	Dean Luick	1	-7/+2
	Improve code clarity and enable earlier use of tidbuf->npages by moving its assignment to structure creation time. Signed-off-by: Dean Luick <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Link: https://lore.kernel.org/r/167329104884.1472990.4639750192433251493.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2023-01-31	vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF	George Kennedy	1	-4/+5
	After a call to console_unlock() in vcs_read() the vc_data struct can be freed by vc_deallocate(). Because of that, the struct vc_data pointer load must be done at the top of while loop in vcs_read() to avoid a UAF when vcs_size() is called. Syzkaller reported a UAF in vcs_size(). BUG: KASAN: use-after-free in vcs_size (drivers/tty/vt/vc_screen.c:215) Read of size 4 at addr ffff8881137479a8 by task 4a005ed81e27e65/1537 CPU: 0 PID: 1537 Comm: 4a005ed81e27e65 Not tainted 6.2.0-rc5 #1 Hardware name: Red Hat KVM, BIOS 1.15.0-2.module Call Trace: <TASK> __asan_report_load4_noabort (mm/kasan/report_generic.c:350) vcs_size (drivers/tty/vt/vc_screen.c:215) vcs_read (drivers/tty/vt/vc_screen.c:415) vfs_read (fs/read_write.c:468 fs/read_write.c:450) ... </TASK> Allocated by task 1191: ... kmalloc_trace (mm/slab_common.c:1069) vc_allocate (./include/linux/slab.h:580 ./include/linux/slab.h:720 drivers/tty/vt/vt.c:1128 drivers/tty/vt/vt.c:1108) con_install (drivers/tty/vt/vt.c:3383) tty_init_dev (drivers/tty/tty_io.c:1301 drivers/tty/tty_io.c:1413 drivers/tty/tty_io.c:1390) tty_open (drivers/tty/tty_io.c:2080 drivers/tty/tty_io.c:2126) chrdev_open (fs/char_dev.c:415) do_dentry_open (fs/open.c:883) vfs_open (fs/open.c:1014) ... Freed by task 1548: ... kfree (mm/slab_common.c:1021) vc_port_destruct (drivers/tty/vt/vt.c:1094) tty_port_destructor (drivers/tty/tty_port.c:296) tty_port_put (drivers/tty/tty_port.c:312) vt_disallocate_all (drivers/tty/vt/vt_ioctl.c:662 (discriminator 2)) vt_ioctl (drivers/tty/vt/vt_ioctl.c:903) tty_ioctl (drivers/tty/tty_io.c:2776) ... The buggy address belongs to the object at ffff888113747800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 424 bytes inside of 1024-byte region [ffff888113747800, ffff888113747c00) The buggy address belongs to the physical page: page:00000000b3fe6c7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113740 head:00000000b3fe6c7c order:3 compound_mapcount:0 subpages_mapcount:0 compound_pincount:0 anon flags: 0x17ffffc0010200(slab\|head\|node=0\|zone=2\|lastcpupid=0x1fffff) raw: 0017ffffc0010200 ffff888100042dc0 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888113747880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff888113747980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888113747a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888113747a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint Fixes: ac751efa6a0d ("console: rename acquire/release_console_sem() to console_lock/unlock()") Reported-by: syzkaller <[email protected]> Suggested-by: Jiri Slaby <[email protected]> Signed-off-by: George Kennedy <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-01-31	RDMA/umem: Use dma-buf locked API to solve deadlock	Maor Gottlieb	1	-4/+4
	The cited commit moves umem to call the unlocked versions of dmabuf unmap/map attachment, but the lock is held while calling to these functions, hence move back to the locked versions of these APIs. Fixes: 21c9c5c0784f ("RDMA/umem: Prepare to dynamic dma-buf locking specification") Link: https://lore.kernel.org/r/311c2cb791f8af75486df446819071357353db1b.1675088709.git.leon@kernel.org Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Dmitry Osipenko <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>