Age | Commit message (Collapse) | Author | Files | Lines |
|
The sparc mdesc code does pointer games with 'struct mdesc_hdr', but
didn't describe to the compiler how that header is then followed by the
data that the header describes.
As a result, gcc is now unhappy since it does stricter pointer range
tracking, and doesn't understand about how these things work. This
results in various errors like:
arch/sparc/kernel/mdesc.c: In function ‘mdesc_node_by_name’:
arch/sparc/kernel/mdesc.c:647:22: error: ‘strcmp’ reading 1 or more bytes from a region of size 0 [-Werror=stringop-overread]
647 | if (!strcmp(names + ep[ret].name_offset, name))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
which are easily avoided by just describing 'struct mdesc_hdr' better,
and making the node_block() helper function look into that unsized
data[] that follows the header.
This makes the sparc64 build happy again at least for my cross-compiler
version (gcc version 11.2.1).
Link: https://lore.kernel.org/lkml/CAHk-=wi4NW3NC0xWykkw=6LnjQD6D_rtRtxY9g8gQAJXtQMi8A@mail.gmail.com/
Cc: Guenter Roeck <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Merge absolute_pointer macro series from Guenter Roeck:
"Kernel test builds currently fail for several architectures with error
messages such as the following.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0
[-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses if gcc's builtin functions are used for
those operations.
This series introduces absolute_pointer() to fix the problem.
absolute_pointer() disassociates a pointer from its originating symbol
type and context, and thus prevents gcc from making assumptions about
pointers passed to memory operations"
* emailed patches from Guenter Roeck <[email protected]>:
alpha: Use absolute_pointer to define COMMAND_LINE
alpha: Move setup.h out of uapi
net: i825xx: Use absolute_pointer for memcpy from fixed memory location
compiler.h: Introduce absolute_pointer macro
|
|
alpha:allmodconfig fails to build with the following error
when using gcc 11.x.
arch/alpha/kernel/setup.c: In function 'setup_arch':
arch/alpha/kernel/setup.c:493:13: error:
'strcmp' reading 1 or more bytes from a region of size 0
Avoid the problem by declaring COMMAND_LINE as absolute_pointer().
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Most of the contents of setup.h have no value for userspace
applications. The file was probably moved to uapi accidentally.
Keep the file in uapi to define the alpha-specific COMMAND_LINE_SIZE.
Move all other defines to arch/alpha/include/asm/setup.h.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
gcc 11.x reports the following compiler warning/error.
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Use absolute_pointer() to work around the problem.
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
absolute_pointer() disassociates a pointer from its originating symbol
type and context. Use it to prevent compiler warnings/errors such as
drivers/net/ethernet/i825xx/82596.c: In function 'i82596_probe':
arch/m68k/include/asm/string.h:72:25: error:
'__builtin_memcpy' reading 6 bytes from a region of size 0 [-Werror=stringop-overread]
Such warnings may be reported by gcc 11.x for string and memory
operations on fixed addresses.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
KASAN reports a use-after-free report when doing fuzz test:
[693354.104835] ==================================================================
[693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
[693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338
[693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147
[693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018
[693354.105612] Call Trace:
[693354.105621] dump_stack+0xf1/0x19b
[693354.105626] ? show_regs_print_info+0x5/0x5
[693354.105634] ? printk+0x9c/0xc3
[693354.105638] ? cpumask_weight+0x1f/0x1f
[693354.105648] print_address_description+0x70/0x360
[693354.105654] kasan_report+0x1b2/0x330
[693354.105659] ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105665] ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105670] bfq_io_set_weight_legacy+0xd3/0x160
[693354.105675] ? bfq_cpd_init+0x20/0x20
[693354.105683] cgroup_file_write+0x3aa/0x510
[693354.105693] ? ___slab_alloc+0x507/0x540
[693354.105698] ? cgroup_file_poll+0x60/0x60
[693354.105702] ? 0xffffffff89600000
[693354.105708] ? usercopy_abort+0x90/0x90
[693354.105716] ? mutex_lock+0xef/0x180
[693354.105726] kernfs_fop_write+0x1ab/0x280
[693354.105732] ? cgroup_file_poll+0x60/0x60
[693354.105738] vfs_write+0xe7/0x230
[693354.105744] ksys_write+0xb0/0x140
[693354.105749] ? __ia32_sys_read+0x50/0x50
[693354.105760] do_syscall_64+0x112/0x370
[693354.105766] ? syscall_return_slowpath+0x260/0x260
[693354.105772] ? do_page_fault+0x9b/0x270
[693354.105779] ? prepare_exit_to_usermode+0xf9/0x1a0
[693354.105784] ? enter_from_user_mode+0x30/0x30
[693354.105793] entry_SYSCALL_64_after_hwframe+0x65/0xca
[693354.105875] Allocated by task 1453337:
[693354.106001] kasan_kmalloc+0xa0/0xd0
[693354.106006] kmem_cache_alloc_node_trace+0x108/0x220
[693354.106010] bfq_pd_alloc+0x96/0x120
[693354.106015] blkcg_activate_policy+0x1b7/0x2b0
[693354.106020] bfq_create_group_hierarchy+0x1e/0x80
[693354.106026] bfq_init_queue+0x678/0x8c0
[693354.106031] blk_mq_init_sched+0x1f8/0x460
[693354.106037] elevator_switch_mq+0xe1/0x240
[693354.106041] elevator_switch+0x25/0x40
[693354.106045] elv_iosched_store+0x1a1/0x230
[693354.106049] queue_attr_store+0x78/0xb0
[693354.106053] kernfs_fop_write+0x1ab/0x280
[693354.106056] vfs_write+0xe7/0x230
[693354.106060] ksys_write+0xb0/0x140
[693354.106064] do_syscall_64+0x112/0x370
[693354.106069] entry_SYSCALL_64_after_hwframe+0x65/0xca
[693354.106114] Freed by task 1453336:
[693354.106225] __kasan_slab_free+0x130/0x180
[693354.106229] kfree+0x90/0x1b0
[693354.106233] blkcg_deactivate_policy+0x12c/0x220
[693354.106238] bfq_exit_queue+0xf5/0x110
[693354.106241] blk_mq_exit_sched+0x104/0x130
[693354.106245] __elevator_exit+0x45/0x60
[693354.106249] elevator_switch_mq+0xd6/0x240
[693354.106253] elevator_switch+0x25/0x40
[693354.106257] elv_iosched_store+0x1a1/0x230
[693354.106261] queue_attr_store+0x78/0xb0
[693354.106264] kernfs_fop_write+0x1ab/0x280
[693354.106268] vfs_write+0xe7/0x230
[693354.106271] ksys_write+0xb0/0x140
[693354.106275] do_syscall_64+0x112/0x370
[693354.106280] entry_SYSCALL_64_after_hwframe+0x65/0xca
[693354.106329] The buggy address belongs to the object at ffff888be0a35580
which belongs to the cache kmalloc-1k of size 1024
[693354.106736] The buggy address is located 228 bytes inside of
1024-byte region [ffff888be0a35580, ffff888be0a35980)
[693354.107114] The buggy address belongs to the page:
[693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0
[693354.107606] flags: 0x17ffffc0008100(slab|head)
[693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080
[693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[693354.108278] page dumped because: kasan: bad access detected
[693354.108511] Memory state around the buggy address:
[693354.108671] ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[693354.116396] ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.132421] ^
[693354.140284] ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.147912] ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.155281] ==================================================================
blkgs are protected by both queue and blkcg locks and holding
either should stabilize them. However, the path of destroying
blkg policy data is only protected by queue lock in
blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks
can get the blkg policy data before the blkg policy data is
destroyed, and use it after destroyed, which will result in a
use-after-free.
CPU0 CPU1
blkcg_deactivate_policy
spin_lock_irq(&q->queue_lock)
bfq_io_set_weight_legacy
spin_lock_irq(&blkcg->lock)
blkg_to_bfqg(blkg)
pd_to_bfqg(blkg->pd[pol->plid])
^^^^^^blkg->pd[pol->plid] != NULL
bfqg != NULL
pol->pd_free_fn(blkg->pd[pol->plid])
pd_to_bfqg(blkg->pd[pol->plid])
bfqg_put(bfqg)
kfree(bfqg)
blkg->pd[pol->plid] = NULL
spin_unlock_irq(q->queue_lock);
bfq_group_set_weight(bfqg, val, 0)
bfqg->entity.new_weight
^^^^^^trigger uaf here
spin_unlock_irq(&blkcg->lock);
Fix by grabbing the matching blkcg lock before trying to
destroy blkg policy data.
Suggested-by: Tejun Heo <[email protected]>
Signed-off-by: Li Jinlin <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
BUG: memory leak
unreferenced object 0xffff888129acdb80 (size 96):
comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s)
hex dump (first 32 bytes):
20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff G...... ..)....
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline]
[<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline]
[<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724
[<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185
[<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566
[<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline]
[<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124
[<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344
[<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline]
[<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516
[<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline]
[<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline]
[<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline]
[<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
[<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Once blk_throtl_init() queue init failed, blkcg_iolatency_exit() will
not be invoked for cleanup. That leads a memory leak. Swap the
blk_throtl_init() and blk_iolatency_init() calls can solve this.
Reported-by: [email protected]
Fixes: 19688d7f9592 (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls)
Signed-off-by: Yanfei Xu <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The lib/bootconfig.c file is shared with the 'bootconfig' tooling, and
as a result, the changes incommit 77e02cf57b6c ("memblock: introduce
saner 'memblock_free_ptr()' interface") need to also be reflected in the
tooling header file.
So define the new memblock_free_ptr() wrapper, and remove unused __pa()
and memblock_free().
Fixes: 77e02cf57b6c ("memblock: introduce saner 'memblock_free_ptr()' interface")
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Readers of rwbase can lock and unlock without taking any inner lock, if
that happens, we need the ordering provided by atomic operations to
satisfy the ordering semantics of lock/unlock. Without that, considering
the follow case:
{ X = 0 initially }
CPU 0 CPU 1
===== =====
rt_write_lock();
X = 1
rt_write_unlock():
atomic_add(READER_BIAS - WRITER_BIAS, ->readers);
// ->readers is READER_BIAS.
rt_read_lock():
if ((r = atomic_read(->readers)) < 0) // True
atomic_try_cmpxchg(->readers, r, r + 1); // succeed.
<acquire the read lock via fast path>
r1 = X; // r1 may be 0, because nothing prevent the reordering
// of "X=1" and atomic_add() on CPU 1.
Therefore audit every usage of atomic operations that may happen in a
fast path, and add necessary barriers.
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The code in rwbase_write_lock() is a little non-obvious vs the
read+set 'trylock', extract the sequence into a helper function to
clarify the code.
This also provides a single site to fix fast-path ordering.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Noticed while looking at the readers race.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In perf_event_addr_filters_apply, the task associated with
the event (event->ctx->task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event->ctx->task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.
Fixes: 375637bc52495 ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
230d50d448acb ("io_uring: move reissue into regular IO path")
made non-IOPOLL I/O to not retry from ki_complete handler. Follow it
steps and do the same for IOPOLL. Same problems, same implementation,
same -EAGAIN assumptions.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f80dfee2d5fa7678f0052a8ab3cfca9496a112ca.1631699928.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa.
We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.
We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.15
- fix ANA state updates when a namespace is not present (Anton Eidelman)
- nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
(Dan Carpenter)
- avoid race in shutdown namespace removal (Daniel Wagner)
- fix io_work priority inversion in nvme-tcp (Keith Busch)
- destroy cm id before destroy qp to avoid use after free (Ruozhu Li)"
* tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme:
nvme-tcp: fix io_work priority inversion
nvme-rdma: destroy cm id before destroy qp to avoid use after free
nvme-multipath: fix ANA state updates when a namespace is not present
nvme: avoid race in shutdown namespace removal
nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()
|
|
This reverts commit cf4b94c8530d14017fbddae26aad064ddc42edd4.
Some PHYs pointed to by "phy-handle" will never bind to a driver until a
consumer attaches to it. And when the consumer attaches to it, they get
forcefully bound to a generic PHY driver. In such cases, parsing the
phy-handle property and creating a device link will prevent the consumer
from ever probing. We don't want that. So revert support for
"phy-handle" property until we come up with a better mechanism for
binding PHYs to generic drivers before a consumer tries to attach to it.
Signed-off-by: Saravana Kannan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
|
|
s390 is the only architecture which allows to set the
-mwarn-dynamicstack compile option. This however will also always
generate a warning with system call stack randomization, which uses
alloca to generate some random sized stack frame.
On the other hand Linus just enabled "-Werror" by default with commit
3fe617ccafd6 ("Enable '-Werror' by default for all kernel builds"),
which means compiles will always fail by default.
So instead of playing once again whack-a-mole for something which is
s390 specific, simply remove this option.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Get rid of warnings like:
drivers/s390/crypto/ap_bus.c:216: warning:
bad line:
drivers/s390/crypto/ap_bus.c:444:
warning: Function parameter or member 'floating' not described in 'ap_interrupt_handler'
Reviewed-by: Harald Freudenberger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Prevent out-of-range access if the returned SCLP SCCB response is smaller
in size than the address of the Secure-IPL flag.
Fixes: c9896acc7851 ("s390/ipl: Provide has_secure sysfs attribute")
Cc: [email protected] # 5.2+
Signed-off-by: Alexander Egorenkov <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap").
find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.
Reviewed-by: Niklas Schnelle <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
The ICS native driver relies on the IRQ chip data to find the struct
'ics_native' describing the ICS controller but it was removed by commit
248af248a8f4 ("powerpc/xics: Rename the map handler in a check handler").
Revert this change to fix the Microwatt SoC platform.
Fixes: 248af248a8f4 ("powerpc/xics: Rename the map handler in a check handler")
Signed-off-by: Cédric Le Goater <[email protected]>
Tested-by: Gustavo Romero <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It was introduced by 4035b43da6da ("xen-swiotlb: remove xen_set_nslabs")
and then not removed by 2d29960af0be ("swiotlb: dynamically allocate
io_tlb_default_mem").
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
I consider it unhelpful that address and size of the buffer aren't put
in the log file; it makes diagnosing issues needlessly harder. The
majority of callers of swiotlb_init() also passes 1 for the "verbose"
parameter.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Commit a98f565462f0 ("xen-swiotlb: split xen_swiotlb_init") should not
only have added __init to the split off function, but also should have
dropped __ref from the one left.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Due to the use of max(1024, ...) there's no point retrying (and issuing
bogus log messages) when the number of slabs is already no larger than
this minimum value.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Only on the 2nd of the paths leading to xen_swiotlb_init()'s "error"
label it is useful to retry the allocation; the first one did already
iterate through all possible order values.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Generic swiotlb code makes sure to keep the slab count a multiple of the
number of slabs per segment. Yet even without checking whether any such
assumption is made elsewhere, it is easy to see that xen_swiotlb_fixup()
might alter unrelated memory when calling xen_create_contiguous_region()
for the last segment, when that's not a full one - the function acts on
full order-N regions, not individual pages.
Align the slab count suitably when halving it for a retry. Add a build
time check and a runtime one. Replace the no longer useful local
variable "slabs" by an "order" one calculated just once, outside of the
loop. Re-use "order" for calculating "dma_bits", and change the type of
the latter as well as the one of "i" while touching this anyway.
Signed-off-by: Jan Beulich <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
The commit referenced below removed the assignment of "bytes" from
xen_swiotlb_init() without - like done for xen_swiotlb_init_early() -
adding an assignment on the retry path, thus leading to excessively
sized allocations upon retries.
Fixes: 2d29960af0be ("swiotlb: dynamically allocate io_tlb_default_mem")
Signed-off-by: Jan Beulich <[email protected]>
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Of the two paths leading to the "error" label in xen_swiotlb_init() one
didn't allocate anything, while the other did already free what was
allocated.
Fixes: b82776005369 ("xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used")
Signed-off-by: Jan Beulich <[email protected]>
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
It's not clear to me why only the frontend has been tristate. Switch the
backend to be, too.
Signed-off-by: Jan Beulich <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page
table entries") introduced a regression when running as Xen PV guest.
Today pmd_populate() for Xen PV assumes that the PFN inserted is
referencing a not yet used page table. In case of move_normal_pmd()
this is not true, resulting in WARN splats like:
[34321.304270] ------------[ cut here ]------------
[34321.304277] WARNING: CPU: 0 PID: 23628 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x176/0x1a0
[34321.304288] Modules linked in:
[34321.304291] CPU: 0 PID: 23628 Comm: apt-get Not tainted 5.14.1-20210906-doflr-mac80211debug+ #1
[34321.304294] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[34321.304296] RIP: e030:xen_mc_flush+0x176/0x1a0
[34321.304300] Code: 89 45 18 48 c1 e9 3f 48 89 ce e9 20 ff ff ff e8 60 03 00 00 66 90 5b 5d 41 5c 41 5d c3 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 10 97 aa 82 31 db 49 c7 c5 38 97 aa 82 65
[34321.304303] RSP: e02b:ffffc90000a97c90 EFLAGS: 00010002
[34321.304305] RAX: ffff88807d416398 RBX: ffff88807d416350 RCX: ffff88807d416398
[34321.304306] RDX: 0000000000000001 RSI: 0000000000000001 RDI: deadbeefdeadf00d
[34321.304308] RBP: ffff88807d416300 R08: aaaaaaaaaaaaaaaa R09: ffff888006160cc0
[34321.304309] R10: deadbeefdeadf00d R11: ffffea000026a600 R12: 0000000000000000
[34321.304310] R13: ffff888012f6b000 R14: 0000000012f6b000 R15: 0000000000000001
[34321.304320] FS: 00007f5071177800(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[34321.304322] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[34321.304323] CR2: 00007f506f542000 CR3: 00000000160cc000 CR4: 0000000000000660
[34321.304326] Call Trace:
[34321.304331] xen_alloc_pte+0x294/0x320
[34321.304334] move_pgt_entry+0x165/0x4b0
[34321.304339] move_page_tables+0x6fa/0x8d0
[34321.304342] move_vma.isra.44+0x138/0x500
[34321.304345] __x64_sys_mremap+0x296/0x410
[34321.304348] do_syscall_64+0x3a/0x80
[34321.304352] entry_SYSCALL_64_after_hwframe+0x44/0xae
[34321.304355] RIP: 0033:0x7f507196301a
[34321.304358] Code: 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 0e 0c 00 f7 d8 64 89 01 48
[34321.304360] RSP: 002b:00007ffda1eecd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[34321.304362] RAX: ffffffffffffffda RBX: 000056205f950f30 RCX: 00007f507196301a
[34321.304363] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f506dc56000
[34321.304364] RBP: 0000000001a00000 R08: 0000000000000010 R09: 0000000000000004
[34321.304365] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f506dc56060
[34321.304367] R13: 00007f506dc56000 R14: 00007f506dc56060 R15: 000056205f950f30
[34321.304368] ---[ end trace a19885b78fe8f33e ]---
[34321.304370] 1 of 2 multicall(s) failed: cpu 0
[34321.304371] call 2: op=12297829382473034410 arg=[aaaaaaaaaaaaaaaa] result=-22
Fix that by modifying xen_alloc_ptpage() to only pin the page table in
case it wasn't pinned already.
Fixes: 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries")
Cc: <[email protected]>
Reported-by: Sander Eikelenboom <[email protected]>
Tested-by: Sander Eikelenboom <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
RTC flag. Otherwise the following WARN splat will occur at boot:
[ 1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
[ 1.333404] Modules linked in:
[ 1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-default+ #282
[ 1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
[ 1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
[ 1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
[ 1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
[ 1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
[ 1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
[ 1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
[ 1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
[ 1.333404] FS: 0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
[ 1.333404] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
[ 1.333404] Call Trace:
[ 1.333404] ? wakeup_sources_sysfs_init+0x30/0x30
[ 1.333404] ? rdinit_setup+0x2b/0x2b
[ 1.333404] early_resume_init+0x23/0xa4
[ 1.333404] ? cn_proc_init+0x36/0x36
[ 1.333404] do_one_initcall+0x3e/0x200
[ 1.333404] kernel_init_freeable+0x232/0x28e
[ 1.333404] ? rest_init+0xd0/0xd0
[ 1.333404] kernel_init+0x16/0x120
[ 1.333404] ret_from_fork+0x1f/0x30
Cc: <[email protected]>
Fixes: 8d152e7a5c7537 ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
Building dp83640.c on arch/parisc/ produces a build warning for
PAGE0 being redefined. Since the macro is not used in the dp83640
driver, just make it a comment for documentation purposes.
In file included from ../drivers/net/phy/dp83640.c:23:
../drivers/net/phy/dp83640_reg.h:8: warning: "PAGE0" redefined
8 | #define PAGE0 0x0000
from ../drivers/net/phy/dp83640.c:11:
../arch/parisc/include/asm/page.h:187: note: this is the location of the previous definition
187 | #define PAGE0 ((struct zeropage *)__PAGE_OFFSET)
Fixes: cb646e2b02b2 ("ptp: Added a clock driver for the National Semiconductor PHYTER.")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: Geert Uytterhoeven <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Heiner Kallweit <[email protected]>
Cc: Russell King <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This function is called to enable SR-IOV when available,
not enabling interfaces without VFs was a regression.
Fixes: 65161c35554f ("bnx2x: Fix missing error code in bnx2x_iov_init_one()")
Signed-off-by: Adrian Bunk <[email protected]>
Reported-by: YunQiang Su <[email protected]>
Tested-by: YunQiang Su <[email protected]>
Cc: [email protected]
Acked-by: Shai Malin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There is no need to explicitly unregister the integrity profile when
deleting the gendisk.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
When the integrity profile is unregistered there can still be integrity
reads queued up which could see a NULL verify_fn as shown by the race
window below:
CPU0 CPU1
process_one_work nvme_validate_ns
bio_integrity_verify_fn nvme_update_ns_info
nvme_update_disk_info
blk_integrity_unregister
---set queue->integrity as 0
bio_integrity_process
--access bi->profile->verify_fn(bi is a pointer of queue->integity)
Before calling blk_integrity_unregister in nvme_update_disk_info, we must
make sure that there is no work item in the kintegrityd_wq. Just call
blk_flush_integrity to flush the work queue so the bug can be resolved.
Signed-off-by: Lihong Kou <[email protected]>
[hch: split up and shortened the changelog]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
While clearing the profile itself is harmless, we really should not clear
the stable writes flag if it wasn't set due to a registered integrity
profile.
Reported-by: Lihong Kou <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
On sparc64, __fls() returns an "int", but the drm TTM code expected it
to be "unsigned long" as on x86. As a result, on sparc (and arc, and
m68k) you get build errors because 'min()' checks that the types match.
As suggested by Linus, it can use min_t instead of min to force the type
to be "unsigned int".
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Huang Rui <[email protected]>
Reviewed-by: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Guenter Roeck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The boot-time allocation interface for memblock is a mess, with
'memblock_alloc()' returning a virtual pointer, but then you are
supposed to free it with 'memblock_free()' that takes a _physical_
address.
Not only is that all kinds of strange and illogical, but it actually
causes bugs, when people then use it like a normal allocation function,
and it fails spectacularly on a NULL pointer:
https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
or just random memory corruption if the debug checks don't catch it:
https://lore.kernel.org/all/[email protected]/
I really don't want to apply patches that treat the symptoms, when the
fundamental cause is this horribly confusing interface.
I started out looking at just automating a sane replacement sequence,
but because of this mix or virtual and physical addresses, and because
people have used the "__pa()" macro that can take either a regular
kernel pointer, or just the raw "unsigned long" address, it's all quite
messy.
So this just introduces a new saner interface for freeing a virtual
address that was allocated using 'memblock_alloc()', and that was kept
as a regular kernel pointer. And then it converts a couple of users
that are obvious and easy to test, including the 'xbc_nodes' case in
lib/bootconfig.c that caused problems.
Reported-by: kernel test robot <[email protected]>
Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
Cc: Steven Rostedt <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
debugfs APIs returns encoded error so use
IS_ERR for checking return value.
v2: return PTR_ERR(ent)
References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Reviewed-By: Shashank Sharma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
In amdgpu_dm_atomic_check, dc_validate_global_state is called. On
failure this logs a warning to the kernel journal. However warnings
shouldn't be used for atomic test-only commit failures: user-space
might be perfoming a lot of atomic test-only commits to find the
best hardware configuration.
Downgrade the log to a regular DRM atomic message. While at it, use
the new device-aware logging infrastructure.
This fixes error messages in the kernel when running gamescope [1].
[1]: https://github.com/Plagman/gamescope/issues/245
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Simon Ser <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The memory backing old_mem is already freed at that point, move the
check a bit more up.
Signed-off-by: Christian König <[email protected]>
Fixes: bfa3357ef9ab ("drm/ttm: allocate resource object instead of embedding it v2")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1699
Acked-by: Nirmoy Das <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
fix the issue of uploading powerplay table due to the dependancy of rlc.
Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Jack Gui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Seems like newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: [email protected]
Signed-off-by: Ernst Sjöstrand <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Linus proposes to revert an accounting for sops objects in
do_semtimedop() because it's really just a temporary buffer
for a single semtimedop() system call.
This object can consume up to 2 pages, syscall is sleeping
one, size and duration can be controlled by user, and this
allocation can be repeated by many thread at the same time.
However Shakeel Butt pointed that there are much more popular
objects with the same life time and similar memory
consumption, the accounting of which was decided to be
rejected for performance reasons.
Considering at least 2 pages for task_struct and 2 pages for
the kernel stack, a back of the envelope calculation gives a
footprint amplification of <1.5 so this temporal buffer can be
safely ignored.
The factor would IMO be interesting if it was >> 2 (from the
PoV of excessive (ab)use, fine-grained accounting seems to be
currently unfeasible due to performance impact).
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: 18319498fdd4 ("memcg: enable accounting of ipc resources")
Signed-off-by: Vasily Averin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A common complaint is that using O_NONBLOCK files with io_uring can be a
bit of a pain. Be a bit nicer and allow normal retry IFF the file does
support async behavior. This makes it possible to use io_uring more
reliably with O_NONBLOCK files, for use cases where it either isn't
possible or feasible to modify the file flags.
Cc: [email protected]
Reported-and-tested-by: Dan Melnic <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Since commit e5c6b312ce3c ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.
kobject_put() isn't the last user of attr_set in gov_attr_set_put(),
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
| dump_backtrace+0x0/0x380
| show_stack+0x1c/0x30
| dump_stack_lvl+0x8c/0xb8
| print_address_description.constprop.0+0x74/0x2b8
| kasan_report+0x1f4/0x210
| kasan_check_range+0xfc/0x1a4
| __kasan_check_read+0x38/0x60
| mutex_is_locked+0x20/0x60
| mutex_destroy+0x80/0x100
| gov_attr_set_put+0xfc/0x150
| sugov_exit+0x78/0x190
| cpufreq_offline.isra.0+0x2c0/0x660
| cpuhp_cpufreq_offline+0x14/0x24
| cpuhp_invoke_callback+0x430/0x6d0
| cpuhp_thread_fun+0x1b0/0x624
| smpboot_thread_fn+0x5e0/0xa6c
| kthread+0x3a0/0x450
| ret_from_fork+0x10/0x20
Swap the order of the calls.
Fixes: e5c6b312ce3c ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables")
Cc: 4.7+ <[email protected]> # 4.7+
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|