blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-03-30	Merge tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme into block-6.3	Jens Axboe	2	-21/+28
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - mark Lexar NM760 as IGNORE_DEV_SUBNQN (Juraj Pecigos) - fix a possible UAF when failing to allocate an TCP io queue (Sagi Grimberg)" * tag 'nvme-6.3-2023-03-31' of git://git.infradead.org/nvme: nvme-tcp: fix a possible UAF when failing to allocate an io queue nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
2023-03-30	nvme-tcp: fix a possible UAF when failing to allocate an io queue	Sagi Grimberg	1	-20/+26
	When we allocate a nvme-tcp queue, we set the data_ready callback before we actually need to use it. This creates the potential that if a stray controller sends us data on the socket before we connect, we can trigger the io_work and start consuming the socket. In this case reported: we failed to allocate one of the io queues, and as we start releasing the queues that we already allocated, we get a UAF [1] from the io_work which is running before it should really. Fix this by setting the socket ops callbacks only before we start the queue, so that we can't accidentally schedule the io_work in the initialization phase before the queue started. While we are at it, rename nvme_tcp_restore_sock_calls to pair with nvme_tcp_setup_sock_ops. [1]: [16802.107284] nvme nvme4: starting error recovery [16802.109166] nvme nvme4: Reconnecting in 10 seconds... [16812.173535] nvme nvme4: failed to connect socket: -111 [16812.173745] nvme nvme4: Failed reconnect attempt 1 [16812.173747] nvme nvme4: Reconnecting in 10 seconds... [16822.413555] nvme nvme4: failed to connect socket: -111 [16822.413762] nvme nvme4: Failed reconnect attempt 2 [16822.413765] nvme nvme4: Reconnecting in 10 seconds... [16832.661274] nvme nvme4: creating 32 I/O queues. [16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088 [16833.920068] nvme nvme4: Failed reconnect attempt 3 [16833.920094] #PF: supervisor write access in kernel mode [16833.920261] nvme nvme4: Reconnecting in 10 seconds... [16833.920368] #PF: error_code(0x0002) - not-present page [16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30 ... [16833.923138] Call Trace: [16833.923271] <TASK> [16833.923402] lock_sock_nested+0x1e/0x50 [16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp] [16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp] [16833.923824] process_one_work+0x1e8/0x390 [16833.923969] worker_thread+0x53/0x3d0 [16833.924104] ? process_one_work+0x390/0x390 [16833.924240] kthread+0x124/0x150 [16833.924376] ? set_kthread_struct+0x50/0x50 [16833.924518] ret_from_fork+0x1f/0x30 [16833.924655] </TASK> Reported-by: Yanjun Zhang <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Tested-by: Yanjun Zhang <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-28	nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN	Juraj Pecigos	1	-1/+2
	A system with more than one of these SSDs will only have one usable. The kernel fails to detect more than one nvme device due to duplicate cntlids. before: [ 9.395229] nvme 0000:01:00.0: platform quirk: setting simple suspend [ 9.395262] nvme nvme0: pci function 0000:01:00.0 [ 9.395282] nvme 0000:03:00.0: platform quirk: setting simple suspend [ 9.395305] nvme nvme1: pci function 0000:03:00.0 [ 9.409873] nvme nvme0: Duplicate cntlid 1 with nvme1, subsys nqn.2022-07.com.siliconmotion:nvm-subsystem-sn- , rejecting [ 9.409982] nvme nvme0: Removing after probe failure status: -22 [ 9.427487] nvme nvme1: allocated 64 MiB host memory buffer. [ 9.445088] nvme nvme1: 16/0/0 default/read/poll queues [ 9.449898] nvme nvme1: Ignoring bogus Namespace Identifiers after: [ 1.161890] nvme 0000:01:00.0: platform quirk: setting simple suspend [ 1.162660] nvme nvme0: pci function 0000:01:00.0 [ 1.162684] nvme 0000:03:00.0: platform quirk: setting simple suspend [ 1.162707] nvme nvme1: pci function 0000:03:00.0 [ 1.191354] nvme nvme0: allocated 64 MiB host memory buffer. [ 1.193378] nvme nvme1: allocated 64 MiB host memory buffer. [ 1.211044] nvme nvme1: 16/0/0 default/read/poll queues [ 1.211080] nvme nvme0: 16/0/0 default/read/poll queues [ 1.216145] nvme nvme0: Ignoring bogus Namespace Identifiers [ 1.216261] nvme nvme1: Ignoring bogus Namespace Identifiers Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue. Signed-off-by: Juraj Pecigos <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-27	loop: LOOP_CONFIGURE: send uevents for partitions	Alyssa Ross	1	-9/+9
	LOOP_CONFIGURE is, as far as I understand it, supposed to be a way to combine LOOP_SET_FD and LOOP_SET_STATUS64 into a single syscall. When using LOOP_SET_FD+LOOP_SET_STATUS64, a single uevent would be sent for each partition found on the loop device after the second ioctl(), but when using LOOP_CONFIGURE, no such uevent was being sent. In the old setup, uevents are disabled for LOOP_SET_FD, but not for LOOP_SET_STATUS64. This makes sense, as it prevents uevents being sent for a partially configured device during LOOP_SET_FD - they're only sent at the end of LOOP_SET_STATUS64. But for LOOP_CONFIGURE, uevents were disabled for the entire operation, so that final notification was never issued. To fix this, reduce the critical section to exclude the loop_reread_partitions() call, which causes the uevents to be issued, to after uevents are re-enabled, matching the behaviour of the LOOP_SET_FD+LOOP_SET_STATUS64 combination. I noticed this because Busybox's losetup program recently changed from using LOOP_SET_FD+LOOP_SET_STATUS64 to LOOP_CONFIGURE, and this broke my setup, for which I want a notification from the kernel any time a new partition becomes available. Signed-off-by: Alyssa Ross <[email protected]> [hch: reduced the critical section] Signed-off-by: Christoph Hellwig <[email protected]> Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-23	Merge tag 'nvme-6.3-2023-03-23' of git://git.infradead.org/nvme into block-6.3	Jens Axboe	2	-3/+5
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - send Identify with CNS 06h only to I/O controllers (Martin George) - fix nvme_tcp_term_pdu to match spec (Caleb Sander)" * tag 'nvme-6.3-2023-03-23' of git://git.infradead.org/nvme: nvme-tcp: fix nvme_tcp_term_pdu to match spec nvme: send Identify with CNS 06h only to I/O controllers
2023-03-22	nvme-tcp: fix nvme_tcp_term_pdu to match spec	Caleb Sander	1	-2/+3
	The FEI field of C2HTermReq/H2CTermReq is 4 bytes but not 4-byte-aligned in the NVMe/TCP specification (it is located at offset 10 in the PDU). Split it into two 16-bit integers in struct nvme_tcp_term_pdu so no padding is inserted. There should also be 10 reserved bytes after. There are currently no users of this type. Fixes: fc221d05447aa6db ("nvme-tcp: Add protocol header") Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Caleb Sander <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-22	nvme: send Identify with CNS 06h only to I/O controllers	Martin George	1	-1/+2
	Identify CNS 06h (I/O Command Set Specific Identify Controller data structure) is supported only on i/o controllers. But nvme_init_non_mdts_limits() currently invokes this on all controllers. Correct this by ensuring this is sent to I/O controllers only. Signed-off-by: Martin George <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-20	block/io_uring: pass in issue_flags for uring_cmd task_work handling	Jens Axboe	4	-28/+38
	io_uring_cmd_done() currently assumes that the uring_lock is held when invoked, and while it generally is, this is not guaranteed. Pass in the issue_flags associated with it, so that we have IO_URING_F_UNLOCKED available to be able to lock the CQ ring appropriately when completing events. Cc: [email protected] Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Jens Axboe <[email protected]>
2023-03-18	block: ublk_drv: mark device as LIVE before adding disk	Ming Lei	1	-1/+2
	IO can be started before add_disk() returns, such as reading parititon table, then the monitor work should work for making forward progress. So mark device as LIVE before adding disk, meantime change to DEAD if add_disk() fails. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Ziyang Zhang <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-16	block: remove obsolete config BLOCK_COMPAT	Lukas Bulwahn	1	-3/+0
	Before commit bdc1ddad3e5f ("compat_ioctl: block: move blkdev_compat_ioctl() into ioctl.c"), the config BLOCK_COMPAT was used to include compat_ioctl.c into the kernel build. With this commit, the code is moved into ioctl.c and included with the config COMPAT. So, since then, the config BLOCK_COMPAT has no effect and any further purpose. Remove this obsolete config BLOCK_COMPAT. Signed-off-by: Lukas Bulwahn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-16	Merge tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme into block-6.3	Jens Axboe	6	-20/+63
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - avoid potential UAF in nvmet_req_complete (Damien Le Moal) - more quirks (Elmer Miroslav Mosher Golovin, Philipp Geulen) - fix a memory leak in the nvme-pci probe teardown path (Irvin Cote) - repair the MAINTAINERS entry (Lukas Bulwahn) - fix handling single range discard request (Ming Lei) - show more opcode names in trace events (Minwoo Im) - fix nvme-tcp timeout reporting (Sagi Grimberg)" * tag 'nvme-6.3-2022-03-16' of git://git.infradead.org/nvme: nvmet: avoid potential UAF in nvmet_req_complete() nvme-trace: show more opcode names nvme-tcp: add nvme-tcp pdu size build protection nvme-tcp: fix opcode reporting in the timeout handler nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620 nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000 nvme-pci: fixing memory leak in probe teardown path nvme: fix handling single range discard request MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS
2023-03-15	Merge branch 'md-fixes' of ↵	Jens Axboe	2	-9/+12
	https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.3 Pull MD fixes from Song: "This set contains two fixes for old issues (by Neil) and one fix for 6.3 (by Xiao)." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: select BLOCK_LEGACY_AUTOLOAD md: avoid signed overflow in slot_store() md: Free resources in __md_stop
2023-03-15	md: select BLOCK_LEGACY_AUTOLOAD	NeilBrown	1	-0/+4
	When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to activate new arrays unless "CREATE names=yes" appears in mdadm.conf As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD for when MD is selected - at least until mdadm is updated and the updates widely available. Cc: [email protected] # v5.18+ Fixes: fbdee71bb5d8 ("block: deprecate autoloading based on dev_t") Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2023-03-15	block: count 'ios' and 'sectors' when io is done for bio-based device	Yu Kuai	4	-20/+15
	While using iostat for raid, I observed very strange 'await' occasionally, and turns out it's due to that 'ios' and 'sectors' is counted in bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct(). I'm not sure why they are ccounted like that but I think this behaviour is obviously wrong because user will get wrong disk stats. Fix the problem by counting 'ios' and 'sectors' when io is done, like what rq-based device does. Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15	block: sunvdc: add check for mdesc_grab() returning NULL	Liang He	1	-0/+2
	In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: style cleanup] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15	nvmet: avoid potential UAF in nvmet_req_complete()	Damien Le Moal	1	-1/+3
	An nvme target ->queue_response() operation implementation may free the request passed as argument. Such implementation potentially could result in a use after free of the request pointer when percpu_ref_put() is called in nvmet_req_complete(). Avoid such problem by using a local variable to save the sq pointer before calling __nvmet_req_complete(), thus avoiding dereferencing the req pointer after that function call. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-trace: show more opcode names	Minwoo Im	1	-0/+5
	We have more commands to show in the trace. Sync up. Signed-off-by: Minwoo Im <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-tcp: add nvme-tcp pdu size build protection	Sagi Grimberg	1	-0/+9
	Make sure that we don't somehow mess up the wire structures in the spec. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-tcp: fix opcode reporting in the timeout handler	Sagi Grimberg	1	-6/+18
	For non in-capsule writes we reuse the request pdu space for a h2cdata pdu in order to avoid over allocating space (either preallocate or dynamically upon receving an r2t pdu). However if the request times out the core expects to find the opcode in the start of the request, which we override. In order to prevent that, without sacrificing additional 24 bytes per request, we just use the tail of the command pdu space instead (last 24 bytes from the 72 bytes command pdu). That should make the command opcode always available, and we get away from allocating more space. If in the future we would need the last 24 bytes of the nvme command available we would need to allocate a dedicated space for it in the request, but until then we can avoid doing so. Reported-by: Akinobu Mita <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Akinobu Mita <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620	Philipp Geulen	1	-0/+2
	Added a quirk to fix Lexar NM620 1TB SSD reporting duplicate NGUIDs. Signed-off-by: Philipp Geulen <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000	Elmer Miroslav Mosher Golovin	1	-0/+2
	Added a quirk to fix the Netac NV3000 SSD reporting duplicate NGUIDs. Cc: <[email protected]> Signed-off-by: Elmer Miroslav Mosher Golovin <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme-pci: fixing memory leak in probe teardown path	Irvin Cote	1	-0/+1
	In case the nvme_probe teardown path is triggered the ctrl ref count does not reach 0 thus creating a memory leak upon failure of nvme_probe. Signed-off-by: Irvin Cote <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	nvme: fix handling single range discard request	Ming Lei	1	-9/+19
	When investigating one customer report on warning in nvme_setup_discard, we observed the controller(nvme/tcp) actually exposes queue_max_discard_segments(req->q) == 1. Obviously the current code can't handle this situation, since contiguity merge like normal RW request is taken. Fix the issue by building range from request sector/nr_sectors directly. Fixes: b35ba01ea697 ("nvme: support ranged discard requests") Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERS	Lukas Bulwahn	1	-4/+4
	The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit or topgit) and location. Add the SCM tree type to the T: entry, and reorder the file entries in alphabetical order. Fixes: b508fc354f6d ("nvme: update maintainers information") Signed-off-by: Lukas Bulwahn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15	block: null_blk: cleanup null_queue_rq()	Damien Le Moal	1	-15/+14
	Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Pankaj Raghav <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15	block: null_blk: Fix handling of fake timeout request	Damien Le Moal	1	-3/+3
	When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timedout request. The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same meachanism as null_blk internal timeout feature, using the fake_timeout field of null_blk commands. Reported-by: Akinobu Mita <[email protected]> Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-14	blk-mq: fix "bad unlock balance detected" on q->srcu in ↵	Chris Leech	1	-2/+3
	__blk_mq_run_dispatch_ops The 'q' parameter of the macro __blk_mq_run_dispatch_ops may not be one local variable, such as, it is rq->q, then request queue pointed by this variable could be changed to another queue in case of BLK_MQ_F_TAG_QUEUE_SHARED after 'dispatch_ops' returns, then 'bad unlock balance' is triggered. Fixes the issue by adding one local variable for doing srcu lock/unlock. Fixes: 2a904d00855f ("blk-mq: remove hctx_lock and hctx_unlock") Cc: Marco Patalano <[email protected]> Signed-off-by: Chris Leech <[email protected]> Signed-off-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-14	loop: Fix use-after-free issues	Bart Van Assche	1	-8/+17
	do_req_filebacked() calls blk_mq_complete_request() synchronously or asynchronously when using asynchronous I/O unless memory allocation fails. Hence, modify loop_handle_cmd() such that it does not dereference 'cmd' nor 'rq' after do_req_filebacked() finished unless we are sure that the request has not yet been completed. This patch fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000054 Call trace: css_put.42938+0x1c/0x1ac loop_process_work+0xc8c/0xfd4 loop_rootcg_workfn+0x24/0x34 process_one_work+0x244/0x558 worker_thread+0x400/0x8fc kthread+0x16c/0x1e0 ret_from_fork+0x10/0x20 Cc: Christoph Hellwig <[email protected]> Cc: Ming Lei <[email protected]> Cc: Jan Kara <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Dan Schatzberg <[email protected]> Fixes: c74d40e8b5e2 ("loop: charge i/o to mem and blk cg") Fixes: bc07c10a3603 ("block: loop: support DIO & AIO") Signed-off-by: Bart Van Assche <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-14	block: do not reverse request order when flushing plug list	Jan Kara	2	-2/+9
	Commit 26fed4ac4eab ("block: flush plug based on hardware and software queue order") changed flushing of plug list to submit requests one device at a time. However while doing that it also started using list_add_tail() instead of list_add() used previously thus effectively submitting requests in reverse order. Also when forming a rq_list with remaining requests (in case two or more devices are used), we effectively reverse the ordering of the plug list for each device we process. Submitting requests in reverse order has negative impact on performance for rotational disks (when BFQ is not in use). We observe 10-25% regression in random 4k write throughput, as well as ~20% regression in MariaDB OLTP benchmark on rotational storage on btrfs filesystem. Fix the problem by preserving ordering of the plug list when inserting requests into the queuelist as well as by appending to requeue_list instead of prepending to it. Fixes: 26fed4ac4eab ("block: flush plug based on hardware and software queue order") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-13	md: avoid signed overflow in slot_store()	NeilBrown	1	-0/+3
	slot_store() uses kstrtouint() to get a slot number, but stores the result in an "int" variable (by casting a pointer). This can result in a negative slot number if the unsigned int value is very large. A negative number means that the slot is empty, but setting a negative slot number this way will not remove the device from the array. I don't think this is a serious problem, but it could cause confusion and it is best to fix it. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2023-03-13	md: Free resources in __md_stop	Xiao Ni	1	-9/+5
	If md_run() fails after ->active_io is initialized, then percpu_ref_exit is called in error path. However, later md_free_disk will call percpu_ref_exit again which leads to a panic because of null pointer dereference. It can also trigger this bug when resources are initialized but are freed in error path, then will be freed again in md_free_disk. BUG: kernel NULL pointer dereference, address: 0000000000000038 Oops: 0000 [#1] PREEMPT SMP Workqueue: md_misc mddev_delayed_delete RIP: 0010:free_percpu+0x110/0x630 Call Trace: <TASK> __percpu_ref_exit+0x44/0x70 percpu_ref_exit+0x16/0x90 md_free_disk+0x2f/0x80 disk_release+0x101/0x180 device_release+0x84/0x110 kobject_put+0x12a/0x380 kobject_put+0x160/0x380 mddev_delayed_delete+0x19/0x30 process_one_work+0x269/0x680 worker_thread+0x266/0x640 kthread+0x151/0x1b0 ret_from_fork+0x1f/0x30 For creating raid device, md raid calls do_md_run->md_run, dm raid calls md_run. We alloc those memory in md_run. For stopping raid device, md raid calls do_md_stop->__md_stop, dm raid calls md_stop->__md_stop. So we can free those memory resources in __md_stop. Fixes: 72adae23a72c ("md: Change active_io to percpu") Reported-and-tested-by: Yu Kuai <[email protected]> Signed-off-by: Xiao Ni <[email protected]> Signed-off-by: Song Liu <[email protected]>
2023-03-08	block, bfq: fix uaf for 'stable_merge_bfqq'	Yu Kuai	1	-9/+9
	Before commit fd571df0ac5b ("block, bfq: turn bfqq_data into an array in bfq_io_cq"), process reference is read before bfq_put_stable_ref(), and it's safe if bfq_put_stable_ref() put the last reference, because process reference will be 0 and 'stable_merge_bfqq' won't be accessed in this case. However, the commit changed the order and will cause uaf for 'stable_merge_bfqq'. In order to emphasize that bfq_put_stable_ref() can drop the last reference, fix the problem by moving bfq_put_stable_ref() to the end of bfq_setup_stable_merge(). Fixes: fd571df0ac5b ("block, bfq: turn bfqq_data into an array in bfq_io_cq") Reported-and-tested-by: Shinichiro Kawasaki <[email protected]> Link: https://lore.kernel.org/linux-block/20230307071448.rzihxbm4jhbf5krj@shindev/ Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2023-03-07	docs: sysfs-block: document hidden sysfs entry	Sagi Grimberg	1	-0/+9
	/sys/block/<disk>/hidden is undocumented. Document it. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-07	block: fix wrong mode for blkdev_put() from disk_scan_partitions()	Yu Kuai	1	-1/+1
	If disk_scan_partitions() is called with 'FMODE_EXCL', blkdev_get_by_dev() will be called without 'FMODE_EXCL', however, follow blkdev_put() is still called with 'FMODE_EXCL', which will cause 'bd_holders' counter to leak. Fix the problem by using the right mode for blkdev_put(). Reported-by: [email protected] Link: https://lore.kernel.org/lkml/[email protected]/T/ Tested-by: Julian Ruess <[email protected]> Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2023-03-05	Linux 6.3-rc1	Linus Torvalds	1	-2/+2

2023-03-05	cpumask: re-introduce constant-sized cpumask optimizations	Linus Torvalds	4	-72/+72
	Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <[email protected]>
2023-03-05	Merge tag 'v6.3-p2' of ↵	Linus Torvalds	3	-23/+53
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression in the caam driver" * tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: caam - Fix edesc/iv ordering mixup
2023-03-05	Merge tag 'x86-urgent-2023-03-05' of ↵	Linus Torvalds	3	-15/+51
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Thomas Gleixner: "A small set of updates for x86: - Return -EIO instead of success when the certificate buffer for SEV guests is not large enough - Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on return to userspace for performance reasons, but the leaves user space vulnerable to cross-thread attacks which STIBP prevents. Update the documentation accordingly" * tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt/sev-guest: Return -EIO if certificate buffer is not large enough Documentation/hw-vuln: Document the interaction between IBRS and STIBP x86/speculation: Allow enabling STIBP with legacy IBRS
2023-03-05	Merge tag 'irq-urgent-2023-03-05' of ↵	Linus Torvalds	7	-15/+44
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "A set of updates for the interrupt susbsystem: - Prevent possible NULL pointer derefences in irq_data_get_affinity_mask() and irq_domain_create_hierarchy() - Take the per device MSI lock before invoking code which relies on it being hold - Make sure that MSI descriptors are unreferenced before freeing them. This was overlooked when the platform MSI code was converted to use core infrastructure and results in a fals positive warning - Remove dead code in the MSI subsystem - Clarify the documentation for pci_msix_free_irq() - More kobj_type constification" * tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced genirq/msi: Drop dead domain name assignment irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy() genirq/irqdesc: Make kobj_type structures constant PCI/MSI: Clarify usage of pci_msix_free_irq() genirq/msi: Take the per-device MSI lock before validating the control structure genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
2023-03-05	Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	1	-0/+1
	Pull vfs update from Al Viro: "Adding Christian Brauner as VFS co-maintainer" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Adding VFS co-maintainer
2023-03-05	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	11	-11/+48
	Pull VM_FAULT_RETRY fixes from Al Viro: "Some of the page fault handlers do not deal with the following case correctly: - handle_mm_fault() has returned VM_FAULT_RETRY - there is a pending fatal signal - fault had happened in kernel mode Correct action in such case is not "return unconditionally" - fatal signals are handled only upon return to userland and something like copy_to_user() would end up retrying the faulting instruction and triggering the same fault again and again. What we need to do in such case is to make the caller to treat that as failed uaccess attempt - handle exception if there is an exception handler for faulting instruction or oops if there isn't one. Over the years some architectures had been fixed and now are handling that case properly; some still do not. This series should fix the remaining ones. Status: - m68k, riscv, hexagon, parisc: tested/acked by maintainers. - alpha, sparc32, sparc64: tested locally - bug has been reproduced on the unpatched kernel and verified to be fixed by this series. - ia64, microblaze, nios2, openrisc: build, but otherwise completely untested" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: openrisc: fix livelock in uaccess nios2: fix livelock in uaccess microblaze: fix livelock in uaccess ia64: fix livelock in uaccess sparc: fix livelock in uaccess alpha: fix livelock in uaccess parisc: fix livelock in uaccess hexagon: fix livelock in uaccess riscv: fix livelock in uaccess m68k: fix livelock in uaccess
2023-03-05	Remove Intel compiler support	Masahiro Yamada	11	-287/+5
	include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Miguel Ojeda <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-03-05	Adding VFS co-maintainer	Al Viro	1	-0/+1
	Acked-by: Christian Brauner <[email protected]> Signed-off-by: Al Viro <[email protected]>
2023-03-04	Merge tag 'i2c-for-6.3-rc1-part2' of ↵	Linus Torvalds	2	-15/+7
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: "Some improvements/fixes for the newly added GXP driver and a Kconfig dependency fix" * tag 'i2c-for-6.3-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: gxp: fix an error code in probe i2c: gxp: return proper error on address NACK i2c: gxp: remove "empty" switch statement i2c: Disable I2C_APPLE when I2C_PASEMI is a builtin
2023-03-04	mm: avoid gcc complaint about pointer casting	Linus Torvalds	1	-2/+8
	The migration code ends up temporarily stashing information of the wrong type in unused fields of the newly allocated destination folio. That all works fine, but gcc does complain about the pointer type mis-use: mm/migrate.c: In function ‘__migrate_folio_extract’: mm/migrate.c:1050:20: note: randstruct: casting between randomized structure pointer types (ssa): ‘struct anon_vma’ and ‘struct address_space’ 1050 \| anon_vmap = (void )dst->mapping; \| ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~ and gcc is actually right to complain since it really doesn't understand that this is a very temporary special case where this is ok. This could be fixed in different ways by just obfuscating the assignment sufficiently that gcc doesn't see what is going on, but the truly "proper C" way to do this is by explicitly using a union. Using unions for type conversions like this is normally hugely ugly and syntactically nasty, but this really is one of the few cases where we want to make it clear that we're not doing type conversion, we're really re-using the value bit-for-bit just using another type. IOW, this should not become a common pattern, but in this one case using that odd union is probably the best way to document to the compiler what is conceptually going on here. [ Side note: there are valid cases where we convert pointers to other pointer types, notably the whole "folio vs page" situation, where the types actually have fundamental commonalities. The fact that the gcc note is limited to just randomized structures means that we don't see equivalent warnings for those cases, but it migth also mean that we miss other cases where we do play these kinds of dodgy games, and this kind of explicit conversion might be a good idea. ] I verified that at least for an allmodconfig build on x86-64, this generates the exact same code, apart from line numbers and assembler comment changes. Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()") Cc: Huang, Ying <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-03-04	Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of ↵	Linus Torvalds	19	-82/+147
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. Eight are for MM and seven are for other parts of the kernel. Seven are cc:stable and eight address post-6.3 issues or were judged unsuitable for -stable backporting" * tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: map Dikshita Agarwal's old address to his current one mailmap: map Vikash Garodia's old address to his current one fs/cramfs/inode.c: initialize file_ra_state fs: hfsplus: fix UAF issue in hfsplus_put_super panic: fix the panic_print NMI backtrace setting lib: parser: update documentation for match_NUMBER functions kasan, x86: don't rename memintrinsics in uninstrumented files kasan: test: fix test for new meminstrinsic instrumentation kasan: treat meminstrinsic as builtins in uninstrumented files kasan: emit different calls for instrumentable memintrinsics ocfs2: fix non-auto defrag path not working issue ocfs2: fix defrag path triggering jbd2 ASSERT mailmap: map Georgi Djakov's old Linaro address to his current one mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH mm/damon/paddr: fix missing folio_put() mm/mremap: fix dup_anon_vma() in vma_merge() case 4
2023-03-04	Merge tag 'powerpc-6.3-2' of ↵	Linus Torvalds	3	-8/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Drop orphaned VAS MAINTAINERS entry - Fix build errors with clang and KCSAN - Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together with recordmcount Thanks to Nathan Chancellor. * tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Avoid dead code/data elimination when using recordmcount powerpc/vmlinux.lds: Add .text.asan/tsan sections powerpc: Drop orphaned VAS MAINTAINERS entry
2023-03-04	Merge tag 'sound-fix-6.3-rc1' of ↵	Linus Torvalds	23	-152/+263
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of various small fixes that have been gathered since the last PR. The majority of changes are for ASoC, and there is a small change in ASoC PCM core, but the rest are all for driver- specific fixes / quirks / updates" * tag 'sound-fix-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (32 commits) ALSA: ice1712: Delete unreachable code in aureon_add_controls() ALSA: ice1712: Do not left ice->gpio_mutex locked in aureon_add_controls() ALSA: hda/realtek: Add quirk for HP EliteDesk 800 G6 Tower PC ALSA: hda/realtek: Improve support for Dell Precision 3260 ASoC: mediatek: mt8195: add missing initialization ASoC: mediatek: mt8188: add missing initialization ASoC: amd: yc: Add DMI entries to support HP OMEN 16-n0xxx (8A43) ASoC: zl38060 add gpiolib dependency ASoC: sam9g20ek: Disable capture unless building with microphone input ASoC: mt8192: Fix range for sidetone positive gain ASoC: mt8192: Report an error if when an invalid sidetone gain is written ASoC: mt8192: Fix event generation for controls ASoC: mt8192: Remove spammy log messages ASoC: mchp-pdmc: fix poc noise at capture startup ASoC: dt-bindings: sama7g5-pdmc: add microchip,startup-delay-us binding ASoC: soc-pcm: add option to start DMA after DAI ASoC: mt8183: Fix event generation for I2S DAI operations ASoC: mt8183: Remove spammy logging from I2S DAI driver ASoC: mt6358: Remove undefined HPx Mux enumeration values ASoC: mt6358: Validate Wake on Voice 2 writes ...
2023-03-03	Merge tag 'for-v6.3-part2' of ↵	Linus Torvalds	3	-7/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull more power supply updates from Sebastian Reichel: - Fix DT binding for Richtek RT9467 - Fix a NULL pointer check in the power-supply core - Document meaning of absent "present" property * tag 'for-v6.3-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: dt-bindings: power: supply: Revise Richtek RT9467 compatible name ABI: testing: sysfs-class-power: Document absence of "present" property power: supply: fix null pointer check order in __power_supply_register
2023-03-03	Merge tag '6.3-rc-smb3-client-fixes-part2' of ↵	Linus Torvalds	10	-157/+190
	git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - xfstest generic/208 fix (memory leak) - minor netfs fix (to address smatch warning) - a DFS fix for stable - a reconnect race fix - two multichannel fixes - RDMA (smbdirect) fix - two additional writeback fixes from David * tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix memory leak in direct I/O cifs: prevent data race in cifs_reconnect_tcon() cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID iov: Fix netfs_extract_user_to_sg() cifs: Fix cifs_write_back_from_locked_folio() cifs: reuse cifs_match_ipaddr for comparison of dstaddr too cifs: match even the scope id for ipv6 addresses cifs: Fix an uninitialised variable cifs: Add some missing xas_retry() calls