Age | Commit message (Collapse) | Author | Files | Lines |
|
If nvmet_auth_ctrl_hash() fails, return the error code to its callers
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Sandisk SN530 NVMe drives have broken MSIs. On systems without MSI-X
support, all commands time out resulting in the following message:
nvme nvme0: I/O tag 12 (100c) QID 0 timeout, completion polled
These timeouts cause the boot to take an excessively-long time (over 20
minutes) while the initial command queue is flushed.
Address this by adding a quirk for drives with buggy MSIs. The lspci
output for this device (recorded on a system with MSI-X support) is:
02:00.0 Non-Volatile memory controller: Sandisk Corp Device 5008 (rev 01) (prog-if 02 [NVM Express])
Subsystem: Sandisk Corp Device 5008
Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
Memory at f7e00000 (64-bit, non-prefetchable) [size=16K]
Memory at f7e04000 (64-bit, non-prefetchable) [size=256]
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [b0] MSI-X: Enable+ Count=17 Masked-
Capabilities: [c0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [1b8] Latency Tolerance Reporting
Capabilities: [300] Secondary PCI Express
Capabilities: [900] L1 PM Substates
Kernel driver in use: nvme
Kernel modules: nvme
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
For devices with virt_boundary limit, the driver may provide zero max
segment size, we have to set it as UINT_MAX at default. Otherwise, it
may cause warning in driver when handling sglist.
Fix it by setting default max segment size as UINT_MAX.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@kernel.org>
Fixes: b561ea56a264 ("block: allow device to have both virt_boundary_mask and max segment size")
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Closes: https://lore.kernel.org/linux-block/7e38b67c-9372-a42d-41eb-abdce33d3372@linux-m68k.org/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240424134722.2584284-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.infradead.org/nvme into block-6.9
Pull NVMe fixes from Keith.
* git://git.infradead.org/nvme:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
|
|
TLS requires a strict pdu pacing via MSG_EOR to signal the end
of a record and subsequent encryption. If we do not set MSG_EOR
at the end of a sequence the record won't be closed, encryption
doesn't start, and we end up with a send stall as the message
will never be passed on to the TCP layer.
So do not check for the queue status when TLS is enabled but
rather make the MSG_MORE setting dependent on the current
request only.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
If the user disabled a nvmet namespace, it is removed from the subsystem
namespaces list. When nvmet processes a command directed to an nsid that
was disabled, it cannot differentiate between a nsid that is disabled
vs. a non-existent namespace, and resorts to return NVME_SC_INVALID_NS
with the dnr bit set.
This translates to a non-retryable status for the host, which translates
to a user error. We should expect disabled namespaces to not cause an
I/O error in a multipath environment.
Address this by searching a configfs item for the namespace nvmet failed
to find, and if we found one, conclude that the namespace is disabled
(perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR in this case
and keep DNR bit cleared.
Reported-by: Jirong Feng <jirong.feng@easystack.cn>
Tested-by: Jirong Feng <jirong.feng@easystack.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
When we teardown the controller, we wait for pending I/Os to complete
(sq->ref on all queues to drop to zero) and then we go over the commands,
and free their command buffers in case they are still fetching data from
the host (e.g. processing nvme writes) and have yet to take a reference
on the sq.
However, we may miss the case where commands have failed before executing
and are queued for sending a response, but will never occur because the
queue socket is already down. In this case we may miss deallocating command
buffers.
Solve this by freeing all commands buffers as nvmet_tcp_free_cmd_buffers is
idempotent anyways.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
While I/O is running, if the pci bus error occurs then
in-flight I/O can not complete. Worst, if at this time,
user (logically) hot-unplug the nvme disk then the
nvme_remove() code path can't forward progress until
in-flight I/O is cancelled. So these sequence of events
may potentially hang hot-unplug code path indefinitely.
This patch helps cancel the pending/in-flight I/O from the
nvme request timeout handler in case the nvme controller
is in the terminal (DEAD/DELETING/DELETING_NOIO) state and
that helps nvme_remove() code path forward progress and
finish successfully.
Link: https://lore.kernel.org/all/199be893-5dfa-41e5-b6f2-40ac90ebccc4@linux.ibm.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
In nvmet_auth_host_hash(), if a mismatch is detected in the hash length
the kernel should print an error.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
If the nvmet_auth_host_hash() function fails, the error code should
be returned to its callers.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
On system where native nvme multipath is configured and iopolicy
is set to numa but the nvme controller numa node id is undefined
or -1 (NUMA_NO_NODE) then avoid calculating node distance for
finding optimal io path. In such case we may access numa distance
table with invalid index and that may potentially refer to incorrect
memory. So this patch ensures that if the nvme controller numa node
id is -1 then instead of calculating node distance for finding optimal
io path, we set the numa node distance of such controller to default 10
(LOCAL_DISTANCE).
Link: https://lore.kernel.org/all/20240413090614.678353-1-nilay@linux.ibm.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
ublk_drv currently creates block devices with the default max_segments
and max_segment_size limits of BLK_MAX_SEGMENTS (128) and
BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can
artificially constrain the I/O size seen by the ublk server - for
example, suppose that the ublk server has configured itself to accept
I/Os up to 1M and the application is also issuing 1M sized I/Os. If the
I/O buffer used by the application is backed by 4K pages, the buffer
could consist of up to 1M / 4K = 256 physically discontiguous segments
(even if the buffer is virtually contiguous). As such, the I/O could
exceed the default max_segments limit and get split. This can cause
unnecessary performance issues if the ublk server is optimized to handle
1M I/Os. The block layer's segment count/size limits exist to model
hardware constraints which don't exist in ublk_drv's case, so just
remove those limits for the block devices created by ublk_drv.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Riley Thomasson <riley@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240430211623.2802036-1-ushankar@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which
is intended to confirm iocg is active when it has debt. However, warn
can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn()
is run at that time:
WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190
Call trace:
iocg_pay_debt+0x14c/0x190
iocg_kick_waitq+0x438/0x4c0
iocg_waitq_timer_fn+0xd8/0x130
__run_hrtimer+0x144/0x45c
__hrtimer_run_queues+0x16c/0x244
hrtimer_interrupt+0x2cc/0x7b0
The warn in this situation is meaningless. Since this iocg is being
removed, the state of the 'active_list' is irrelevant, and 'waitq_timer'
is canceled after removing 'active_list' in ioc_pd_free(), which ensures
iocg is freed after iocg_waitq_timer_fn() returns.
Therefore, add the check if iocg was already offlined to avoid warn
when removing a blkcg or disk.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 4601b4b130de ("block: reopen the device in blkdev_reread_part")
lost the propagation of I/O errors from the low-level read of the
partition table to the user space caller of the BLKRRPART.
Apparently some user space relies on, so restore the propagation. This
isn't exactly pretty as other block device open calls explicitly do not
are about these errors, so add a new BLK_OPEN_STRICT_SCAN to opt into
the error propagation.
Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part")
Reported-by: Saranya Muruganandam <saranyamohan@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240417144743.2277601-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
While monitoring the throttle time of IO from iocost, it's found that
such time is always zero after the io_schedule() from ioc_rqos_throttle,
for example, with the following debug patch:
+ printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
while (true) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (wait.committed)
break;
io_schedule();
}
+ printk("%s-%d: %s exit %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
It can be observerd that blk_time_get_ns() always return the same time:
[ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.272587] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.472690] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.672656] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.872655] fio-1268: ioc_rqos_throttle exit 1067901962288
And I think the root cause is that 'PF_BLOCK_TS' is always cleared
by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
will never be called:
blk_time_get_ns
plug->cur_ktime = ktime_get_ns();
current->flags |= PF_BLOCK_TS;
io_schedule:
io_schedule_prepare
blk_flush_plug
__blk_flush_plug
/* the flag is cleared, while time is not */
current->flags &= ~PF_BLOCK_TS;
schedule
sched_update_worker
/* the flag is not set, hence plug->cur_ktime is not cleared */
if (tsk->flags & PF_BLOCK_TS)
blk_plug_invalidate_ts()
blk_time_get_ns
/* got the time stashed before schedule */
return plug->cur_ktime;
Fix the problem by clearing cached time in __blk_flush_plug().
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move the stray '.' that is currently at the end of the line after
newline '\n' to before newline character which is the right position.
Fixes: ce8d78616a6b ("nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH")
Signed-off-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.9
Pull MD fix from Song:
"This change, by Yu Kuai, fixes a UAF in a corner case."
* tag 'md-6.9-20240408' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
raid1: fix use-after-free for original bio in raid1_write_request()
|
|
When one stacking device is over one device with virt_boundary_mask and
another one with max segment size, the stacking device have both limits
set. This way is allowed before d690cb8ae14b ("block: add an API to
atomically update queue limits").
Relax the limit so that we won't break such kind of stacking setting.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218687
Reported-by: janpieter.sollie@edpnet.be
Fixes: d690cb8ae14b ("block: add an API to atomically update queue limits")
Link: https://lore.kernel.org/linux-block/ZfGl8HzUpiOxCLm3@fedora/
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@lists.linux.dev
Cc: Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240407131931.4055231-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Multiple gendisk instances can allocated/added for single request queue
in case of disk rebind. blkg may still stay in q->blkg_list when calling
blkcg_init_disk() for rebind, then q->blkg_list becomes corrupted.
Fix the list corruption issue by:
- add blkg_init_queue() to initialize q->blkg_list & q->blkcg_mutex only
- move calling blkg_init_queue() into blk_alloc_queue()
The list corruption should be started since commit f1c006f1c685 ("blk-cgroup:
synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
which delays removing blkg from q->blkg_list into blkg_free_workfn().
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Fixes: 1059699f87eb ("block: move blkcg initialization/destroy into disk allocation/release handler")
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240407125910.4053377-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
UBSAN catches undefined behavior in blk-iocost, where sometimes
iocg->delay is shifted right by a number that is too large,
resulting in undefined behavior on some architectures.
[ 186.556576] ------------[ cut here ]------------
UBSAN: shift-out-of-bounds in block/blk-iocost.c:1366:23
shift exponent 64 is too large for 64-bit type 'u64' (aka 'unsigned long long')
CPU: 16 PID: 0 Comm: swapper/16 Tainted: G S E N 6.9.0-0_fbk700_debug_rc2_kbuilder_0_gc85af715cac0 #1
Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020
Call Trace:
<IRQ>
dump_stack_lvl+0x8f/0xe0
__ubsan_handle_shift_out_of_bounds+0x22c/0x280
iocg_kick_delay+0x30b/0x310
ioc_timer_fn+0x2fb/0x1f80
__run_timer_base+0x1b6/0x250
...
Avoid that undefined behavior by simply taking the
"delay = 0" branch if the shift is too large.
I am not sure what the symptoms of an undefined value
delay will be, but I suspect it could be more than a
little annoying to debug.
Signed-off-by: Rik van Riel <riel@surriel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240404123253.0f58010f@imladris.surriel.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.9
- Atomic queue limits fixes (Christoph)
- Fabrics fixes (Hannes, Daniel)"
* tag 'nvme-6.9-2024-04-04' of git://git.infradead.org/nvme:
nvme-fc: rename free_ctrl callback to match name pattern
nvmet-fc: move RCU read lock to nvmet_fc_assoc_exists
nvmet: implement unique discovery NQN
nvme: don't create a multipath node for zero capacity devices
nvme: split nvme_update_zone_info
nvme-multipath: don't inherit LBA-related fields for the multipath node
|
|
Rename nvme_fc_nvme_ctrl_freed to nvme_fc_free_ctrl to match the name
pattern for the callback.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
The RCU lock is only needed for the lookup loop and not for
list_ad_tail_rcu call. Thus move it down the call chain into
nvmet_fc_assoc_exists.
While at it also fix the name typo of the function.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Unique discovery NQNs allow to differentiate between discovery
services from (typically physically separate) NVMe-oF subsystems.
This is required for establishing secured connections as otherwise
the credentials won't be unique and the integrity of the connection
cannot be guaranteed.
This patch adds a configfs attribute 'discovery_nqn' in the 'nvmet'
configfs directory to specify the unique discovery NQN.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Apparently there are nvme controllers around that report namespaces
in the namespace list which have zero capacity. Return -ENXIO instead
of -ENODEV from nvme_update_ns_info_block so we don't create a hidden
multipath node for these namespaces but entirely ignore them.
Fixes: 46e7422cda84 ("nvme: move common logic into nvme_update_ns_info")
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
nvme_update_zone_info does (admin queue) I/O to the device and can fail.
We fail to abort the queue limits update if that happen, but really
should avoid with the frozen I/O queue as much as possible anyway.
Split the logic into a helper to query the information that can be
called on an unfrozen queue and one to apply it to the queue limits.
Fixes: 9b130d681443 ("nvme: use the atomic queue limits update API")
Reported-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Linux 6.9 made the nvme multipath nodes not properly pick up changes when
the LBA size goes smaller after an nvme format. This is because we now
try to inherit the queue settings for the multipath node entirely from
the individual paths. That is the right thing to do for I/O size
limitations, which make up most of the queue limits, but it is wrong for
changes to the namespace configuration, where we do want to pick up the
new format, which will eventually show up on all paths once they are
re-queried.
Fix this by not inheriting the block size and related fields and always
for updating them.
Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Tested-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
There is no check for overflow of 'start + len' in blk_ioctl_discard().
Hung task occurs if submit an discard ioctl with the following param:
start = 0x80000000000ff000, len = 0x8000000000fff000;
Add the overflow validation now.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240329012319.2034550-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In null_add_dev(), if an error happen after initializing the resources
for a zoned null block device, we must free these resources before
exiting the function. To ensure this, move the out_cleanup_zone label
after out_cleanup_disk as we jump to this latter label if an error
happens after calling null_init_zoned_dev().
Fixes: e440626b1caf ("null_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240330005300.1503252-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Fix typo HEIGTH to HEIGHT
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
kbuild: make -Woverride-init warnings more consistent
modpost: do not make find_tosym() return NULL
export.h: remove include/asm-generic/export.h
kconfig: do not reparent the menu inside a choice block
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix more issues in the AMD FMPM driver
* tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS: Avoid build errors when CONFIG_DEBUG_FS=n
RAS/AMD/FMPM: Safely handle saved records of various sizes
RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an unused function warning on irqchip/irq-armada-370-xp
- Fix the IRQ sharing with pinctrl-amd and ACPI OSL
* tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-370-xp: Suppress unused-function warning
genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Define the correct set of default hw events on AMD Zen4
- Use the correct stalled cycles PMCs on AMD Zen2 and newer
- Fix detection of the LBR freeze feature on AMD
* tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
perf/x86/amd/lbr: Use freeze based on availability
x86/cpufeatures: Add new word for scattered features
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers update from Borislav Petkov:
- Volunteer in Anna-Maria and Frederic as timers co-maintainers so that
tglx can relax more :-P
* tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add co-maintainers for time[rs]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix a format specifier build error in objtool during an x32 build
* tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix compile failure when using the x32 compiler
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure single object builds in arch/x86/virt/ ala
make ... arch/x86/virt/vmx/tdx/seamcall.o
work again
- Do not do ROM range scans and memory validation when the kernel is
running as a SEV-SNP guest as those can get problematic and, before
that, are not really needed in such a guest
- Exclude the build-time generated vdso-image-x32.o object from objtool
validation and in particular the return sites in there due to a
warning which fires when an unpatched return thunk is being used
- Improve the NMI CPUs stall message to show additional information
about the state of each CPU wrt the NMI handler
- Enable gcc named address spaces support only on !KCSAN configs due to
compiler options incompatibility
- Revert a change which was trying to use GB pages for mapping regions
only when the regions would be large enough but that change lead to
kexec failing
- A documentation fixlet
* tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Use obj-y to descend into arch/x86/virt/
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
x86/vdso: Fix rethunk patching for vdso-image-x32.o too
x86/nmi: Upgrade NMI backtrace stall checks & messages
x86/percpu: Disable named address spaces for KCSAN
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Documentation/x86: Fix title underline length
|
|
Fixed a typo in some variables where height was misspelled as heigth.
Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
As of the first s390 pull request during the 6.9 merge window,
commit 691632f0e869 ("Merge tag 's390-6.9-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux"), s390 can be
built with LLVM=1 when using LLVM 18.1.0, which is the first version
that has SystemZ support implemented in ld.lld and llvm-objcopy.
Update the supported architectures table in the Kbuild LLVM
documentation to note this explicitly to make it more discoverable by
users and other developers. Additionally, this brings s390 in line with
the rest of the architectures in the table, which all support LLVM=1.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When KCSAN and CONSTRUCTORS are enabled, one can trigger the
"Unpatched return thunk in use. This should not happen!"
catch-all warning.
Usually, when objtool runs on the .o objects, it does generate a section
.return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.
KCSAN and CONSTRUCTORS add this to the object file's .text.startup
section:
-------------------
Disassembly of section .text.startup:
...
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
...
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
in the .ko file.
Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.
Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.
Debugged together with David Kaplan <David.Kaplan@amd.com> and Nikolay
Borisov <nik.borisov@suse.com>.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/0851a207-7143-417e-be31-8bf2b3afb57d@molgen.mpg.de
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.
Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.
There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.
Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.
Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When compiling the v6.9-rc1 kernel with the x32 compiler, the following
errors are reported. The reason is that we take an "unsigned long"
variable and print it using "PRIx64" format string.
In file included from check.c:16:
check.c: In function ‘add_dead_ends’:
/usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
46 | "%s: warning: objtool: " format "\n", \
| ^~~~~~~~~~~~~~~~~~~~~~~~
check.c:613:33: note: in expansion of macro ‘WARN’
613 | WARN("can't find unreachable insn at %s+0x%" PRIx64,
| ^~~~
...
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-kernel@vger.kernel.org
|
|
Pull xfs fixes from Chandan Babu:
- Allow stripe unit/width value passed via mount option to be written
over existing values in the super block
- Do not set current->journal_info to avoid its value from being miused
by another filesystem context
* tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't use current->journal_info
xfs: allow sunit mount option to repair bad primary sb stripe values
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes and updates from James Bottomley:
"Fully half this pull is updates to lpfc and qla2xxx which got
committed just as the merge window opened. A sizeable fraction of the
driver updates are simple bug fixes (and lock reworks for bug fixes in
the case of lpfc), so rather than splitting the few actual
enhancements out, we're just adding the drivers to the -rc1 pull.
The enhancements for lpfc are log message removals, copyright updates
and three patches redefining types. For qla2xxx it's just removing a
debug message on module removal and the manufacturer detail update.
The two major fixes are the sg teardown race and a core error leg
problem with the procfs directory not being removed if we destroy a
created host that never got to the running state. The rest are minor
fixes and constifications"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits)
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
scsi: core: Fix unremoved procfs host directory regression
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
scsi: sd: Fix TCG OPAL unlock on system resume
scsi: sg: Avoid sg device teardown race
scsi: lpfc: Copyright updates for 14.4.0.1 patches
scsi: lpfc: Update lpfc version to 14.4.0.1
scsi: lpfc: Define types in a union for generic void *context3 ptr
scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
scsi: lpfc: Use a dedicated lock for ras_fwlog state
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
scsi: lpfc: Remove unnecessary log message in queuecommand path
scsi: qla2xxx: Update version to 10.02.09.200-k
scsi: qla2xxx: Delay I/O Abort on PCI error
scsi: qla2xxx: Change debug message during driver unload
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A fix from Andi for I2C host drivers"
* tag 'i2c-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of small USB fixes for reported problems and
regressions for 6.9-rc2. Included in here are:
- deadlock fixes for long-suffering issues
- USB phy driver revert for reported problem
- typec fixes for reported problems
- duplicate id in dwc3 dropped
- dwc2 driver fixes
- udc driver warning fix
- cdc-wdm race bugfix
- other tiny USB bugfixes
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
USB: core: Fix deadlock in port "disable" sysfs attribute
USB: core: Add hub_get() and hub_put() routines
usb: typec: ucsi: Check capabilities before cable and identity discovery
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: Return size of buffer if pd_set operation succeeds
usb: udc: remove warning when queue disabled ep
usb: dwc3: pci: Drop duplicate ID
usb: dwc3: Properly set system wakeup
Revert "usb: phy: generic: Get the vbus supply"
usb: cdc-wdm: close race between read and workqueue
usb: dwc2: gadget: LPM flow fix
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
USB: core: Fix deadlock in usb_deauthorize_interface()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for the vc04_services driver
that resolve reported problems:
- strncpy fix for information leak
- another information leak discovered by the previous strncpy fix
Both of these have been in linux-next all this past week with no
reported issues"
* tag 'staging-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vc04_services: fix information leak in create_component()
staging: vc04_services: changen strncpy() to strscpy_pad()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
One fix in the i801 driver where a bug caused touchpad
malfunctions on some Lenovo P1 models by incorrectly overwriting
a status variable during successful SMBUS transactions.
|
|
Commit c33621b4c5ad ("x86/virt/tdx: Wire up basic SEAMCALL functions")
introduced a new instance of core-y instead of the standardized obj-y
syntax.
X86 Makefiles descend into subdirectories of arch/x86/virt inconsistently;
into arch/x86/virt/ via core-y defined in arch/x86/Makefile, but into
arch/x86/virt/svm/ via obj-y defined in arch/x86/Kbuild.
This is problematic when you build a single object in parallel because
multiple threads attempt to build the same file.
$ make -j$(nproc) arch/x86/virt/vmx/tdx/seamcall.o
[ snip ]
AS arch/x86/virt/vmx/tdx/seamcall.o
AS arch/x86/virt/vmx/tdx/seamcall.o
fixdep: error opening file: arch/x86/virt/vmx/tdx/.seamcall.o.d: No such file or directory
make[4]: *** [scripts/Makefile.build:362: arch/x86/virt/vmx/tdx/seamcall.o] Error 2
Use the obj-y syntax, as it works correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240330060554.18524-1-masahiroy@kernel.org
|
|
Pull drm fixes from Dave Airlie:
"Regular fixes for rc2, quite a few i915/amdgpu as usual, some xe, and
then mostly scattered around. rc3 might be quieter with the holidays
but we shall see.
bridge:
- select DRM_KMS_HELPER
dma-buf:
- fix NULL-pointer deref
dp:
- fix div-by-zero in DP MST unplug code
fbdev:
- select FB_IOMEM_FOPS for SBus
sched:
- fix NULL-pointer deref
xe:
- Fix build on mips
- Fix wrong bound checks
- Fix use of msec rather than jiffies
- Remove dead code
amdgpu:
- SMU 14.0.1 updates
- DCN 3.5.x updates
- VPE fix
- eDP panel flickering fix
- Suspend fix
- PSR fix
- DCN 3.0+ fix
- VCN 4.0.6 updates
- debugfs fix
amdkfd:
- DMA-Buf fix
- GFX 9.4.2 TLB flush fix
- CP interrupt fix
i915:
- Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c
- Update a MTL workaround
- Fix locking inversion in hwmon's sysfs
- Remove a bogus error message around PXP
- Fix UAF on VMA
- Reset queue_priority_hint on parking
- Display Fixes:
- Remove duplicated audio enable/disable on SDVO and DP
- Disable AuxCCS for Xe driver
- Revert init order of MIPI DSI
- DRRS debugfs fix with an extra refactor patch
- VRR related fixes
- Fix a JSL eDP corruption
- Fix the cursor physical dma address
- BIOS VBT related fix
nouveau:
- dmem: handle kcalloc() allocation failures
qxl:
- remove unused variables
rockchip:
- vop2: remove support for AR30 and AB30 formats
vmwgfx:
- debugfs: create ttm_resource_manager entry only if needed"
* tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel: (55 commits)
drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()
drm/i915: Pre-populate the cursor physical dma address
drm/i915/gt: Reset queue_priority_hint on parking
drm/i915/vma: Fix UAF on destroy against retire race
drm/i915: Do not print 'pxp init failed with 0' when it succeed
drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()
drm/i915/hwmon: Fix locking inversion in sysfs getter
drm/i915/dsb: Fix DSB vblank waits when using VRR
drm/i915/vrr: Generate VRR "safe window" for DSB
drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status
drm/i915/drrs: Refactor CPU transcoder DRRS check
drm/i915/mtl: Update workaround 14018575942
drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly
drm/i915/display: Disable AuxCCS framebuffers if built for Xe
drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP
drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c
drm/qxl: remove unused variable from `qxl_process_single_command()`
drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`
drm/i915: add bug.h include to i915_memcpy.c
drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed
...
|