Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO driver fixes from Greg KH:
"Here are some small IIO driver fixes for reported problems for
5.14-rc6 (no staging driver fixes at the moment).
All of them resolve reported issues and have been in linux-next all
week with no reported problems. Full details are in the shortlog"
* tag 'staging-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: Fix incorrect exit of for-loop
iio: humidity: hdc100x: Add margin to the conversion time
dt-bindings: iio: st: Remove wrong items length check
iio: accel: fxls8962af: fix i2c dependency
iio: adis: set GPIO reset pin direction
iio: adc: ti-ads7950: Ensure CS is deasserted after reading channels
iio: accel: fxls8962af: fix potential use of uninitialized symbol
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"One driver bugfix, a documentation bugfix, and an "uninitialized data"
leak fix for the core"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Documentation: i2c: add i2c-sysfs into index
i2c: dev: zero out array used for i2c reads from userspace
i2c: iproc: fix race between client unreg and tasklet
|
|
If an SQPOLL based ring is newly created and an application issues an
io_uring_enter(2) system call on it, then we can return a spurious
-EOWNERDEAD error. This happens because there's nothing to submit, and
if the caller doesn't specify any other action, the initial error
assignment of -EOWNERDEAD never gets overwritten. This causes us to
return it directly, even if it isn't valid.
Move the error assignment into the actual failure case instead.
Cc: [email protected]
Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death")
Reported-by: Sherlock Holo [email protected]
Link: https://github.com/axboe/liburing/issues/413
Signed-off-by: Jens Axboe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"A small cleanup patch and a fix of a rare race in the Xen evtchn
driver"
* tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: Fix race in set_evtchn_to_irq
xen/events: remove redundant initialization of variable irq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- avoid passing -mno-relax to compilers that don't support it
- a comment fix
* tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE
riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
|
|
Pull configfs fix from Christoph Hellwig:
- fix to revert to the historic write behavior (Bart Van Assche)
* tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs:
configfs: restore the kernel v5.13 text attribute write behavior
|
|
Merge misc fixes from Andrew Morton:
"7 patches.
Subsystems affected by this patch series: mm (kasan, mm/slub,
mm/madvise, and memcg), and lib"
* emailed patches from Andrew Morton <[email protected]>:
lib: use PFN_PHYS() in devmem_is_allowed()
mm/memcg: fix incorrect flushing of lruvec data in obj_stock
mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
mm: slub: fix slub_debug disabling for list of slabs
slub: fix kmalloc_pagealloc_invalid_free unit test
kasan, slub: reset tag when printing address
kasan, kmemleak: reset tags when scanning block
|
|
Pull cifs fixes from Steve French:
"Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
and one for the 'modefromsid' mount option (when 'idsfromsid' not
specified)"
* tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Call close synchronously during unlink/rename/lease break.
cifs: Handle race conditions during rename
cifs: use the correct max-length for dentry_path_raw()
cifs: create sd context must be a multiple of 8
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"A single patch to sgx test to fix Q1 and Q2 calculation"
* tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
|
|
Internal tests found out that the latest code doesn't bring up 1PPS out
as expected. As a result of incorrect define used to round the time up
the time was round down to the past second boundary.
Fix define used for rounding to properly round up to the next Top of
second in ice_ptp_cfg_clkout to fix it.
Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Signed-off-by: Maciej Machnikowski <[email protected]>
Tested-by: Sunitha Mekala <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The physical address may exceed 32 bits on 32-bit systems with more than
32 bits of physcial address. Use PFN_PHYS() in devmem_is_allowed(), or
the physical address may overflow and be truncated.
We found this bug when mapping a high addresses through devmem tool,
when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem
is used to map a high address that is not in the iomem address range, an
unexpected error indicating no permission is returned.
This bug was initially introduced from v2.6.37, and the function was
moved to lib in v5.11.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Liang Wang <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Russell King <[email protected]>
Cc: Liang Wang <[email protected]>
Cc: Xiaoming Ni <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: <[email protected]> [2.6.37+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When mod_objcg_state() is called with a pgdat that is different from
that in the obj_stock, the old lruvec data cached in obj_stock are
flushed out. Unfortunately, they were flushed to the new pgdat and so
the data go to the wrong node. This will screw up the slab data
reported in /sys/devices/system/node/node*/meminfo.
Fix that by flushing the data to the cached pgdat instead.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 68ac5b3c8db2 ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp")
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Doing some extended tests and polishing the man page update for
MADV_POPULATE_(READ|WRITE), I realized that we end up converting also
SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another
madvise() user error.
We want to report only problematic mappings and permission problems that
the user could have know as -EINVAL.
Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL,
but instead indicate -EFAULT to user space. While we could also convert
it to -ENOMEM, using -EFAULT looks more helpful when user space might
want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is
not part of an final Linux release and we can still adjust the behavior.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Rolf Eike Beer <[email protected]>
Cc: Ram Pai <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Vijayanand Jitta reports:
Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we would
want to disable slub_debug for few slabs. Using boot parameter with
slub_debug=-,slab_name syntax doesn't work as expected i.e; only
disabling debugging for the specified list of slabs. Instead it
disables debugging for all slabs, which is wrong.
This patch fixes it by delaying the moment when the global slub_debug
flags variable is updated. In case a "slub_debug=-,slab_name" has been
passed, the global flags remain as initialized (depending on
CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Vijayanand Jitta <[email protected]>
Reviewed-by: Vijayanand Jitta <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vinayak Menon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The unit test kmalloc_pagealloc_invalid_free makes sure that for the
higher order slub allocation which goes to page allocator, the free is
called with the correct address i.e. the virtual address of the head
page.
Commit f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free")
unified the free code paths for page allocator based slub allocations
but instead of using the address passed by the caller, it extracted the
address from the page. Thus making the unit test
kmalloc_pagealloc_invalid_free moot. So, fix this by using the address
passed by the caller.
Should we fix this? I think yes because dev expect kasan to catch these
type of programming bugs.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f227f0faf63b ("slub: fix unreclaimable slab stat for bulk free")
Signed-off-by: Shakeel Butt <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The address still includes the tags when it is printed. With hardware
tag-based kasan enabled, we will get a false positive KASAN issue when
we access metadata.
Reset the tag before we access the metadata.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: aa1ef4d7b3f6 ("kasan, mm: reset tags when accessing metadata")
Signed-off-by: Kuan-Ying Lee <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Nicholas Tang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "kasan, slub: reset tag when printing address", v3.
With hardware tag-based kasan enabled, we reset the tag when we access
metadata to avoid from false alarm.
This patch (of 2):
Kmemleak needs to scan kernel memory to check memory leak. With hardware
tag-based kasan enabled, when it scans on the invalid slab and
dereference, the issue will occur as below.
Hardware tag-based KASAN doesn't use compiler instrumentation, we can not
use kasan_disable_current() to ignore tag check.
Based on the below report, there are 11 0xf7 granules, which amounts to
176 bytes, and the object is allocated from the kmalloc-256 cache. So
when kmemleak accesses the last 256-176 bytes, it causes faults, as those
are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe.
Thus, we reset tags before accessing metadata to avoid from false positives.
BUG: KASAN: out-of-bounds in scan_block+0x58/0x170
Read at addr f7ff0000c0074eb0 by task kmemleak/138
Pointer tag: [f7], memory tag: [fe]
CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd89f05-dirty #134
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x1b0
show_stack+0x1c/0x30
dump_stack_lvl+0x68/0x84
print_address_description+0x7c/0x2b4
kasan_report+0x138/0x38c
__do_kernel_fault+0x190/0x1c4
do_tag_check_fault+0x78/0x90
do_mem_abort+0x44/0xb4
el1_abort+0x40/0x60
el1h_64_sync_handler+0xb4/0xd0
el1h_64_sync+0x78/0x7c
scan_block+0x58/0x170
scan_gray_list+0xdc/0x1a0
kmemleak_scan+0x2ac/0x560
kmemleak_scan_thread+0xb0/0xe0
kthread+0x154/0x160
ret_from_fork+0x10/0x18
Allocated by task 0:
kasan_save_stack+0x2c/0x60
__kasan_kmalloc+0xec/0x104
__kmalloc+0x224/0x3c4
__register_sysctl_paths+0x200/0x290
register_sysctl_table+0x2c/0x40
sysctl_init+0x20/0x34
proc_sys_init+0x3c/0x48
proc_root_init+0x80/0x9c
start_kernel+0x648/0x6a4
__primary_switched+0xc0/0xc8
Freed by task 0:
kasan_save_stack+0x2c/0x60
kasan_set_track+0x2c/0x40
kasan_set_free_info+0x44/0x54
____kasan_slab_free.constprop.0+0x150/0x1b0
__kasan_slab_free+0x14/0x20
slab_free_freelist_hook+0xa4/0x1fc
kfree+0x1e8/0x30c
put_fs_context+0x124/0x220
vfs_kern_mount.part.0+0x60/0xd4
kern_mount+0x24/0x4c
bdev_cache_init+0x70/0x9c
vfs_caches_init+0xdc/0xf4
start_kernel+0x638/0x6a4
__primary_switched+0xc0/0xc8
The buggy address belongs to the object at ffff0000c0074e00
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 176 bytes inside of
256-byte region [ffff0000c0074e00, ffff0000c0074f00)
The buggy address belongs to the page:
page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074
head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0
flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0)
raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe
ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
>ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe
^
ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Nicholas Tang <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Chinwen Chang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull block fixes from Jens Axboe:
"A few fixes for block that should go into 5.14:
- Revert the mq-deadline cgroup addition. More work is needed on this
front, let's revert it for now and get it right before having it in
a released kernel (Tejun)
- blk-iocost lockdep fix (Ming)
- nbd double completion fix (Xie)
- Fix for non-idling when clearing the shared tag flag (Yu)"
* tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
nbd: Aovid double completion of a request
blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
Revert "block/mq-deadline: Add cgroup support"
blk-iocost: fix lockdep warning on blkcg->lock
|
|
Pull io_uring fixes from Jens Axboe:
"A bit bigger than the previous weeks, but mostly just a few stable
bound fixes. In detail:
- Followup fixes to patches from last week for io-wq, turns out they
weren't complete (Hao)
- Two lockdep reported fixes out of the RT camp (me)
- Sync the io_uring-cp example with liburing, as a few bug fixes
never made it to the kernel carried version (me)
- SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)
- Use WRITE_ONCE() when writing sq flags (Nadav)
- io_rsrc_put_work() deadlock fix (Pavel)"
* tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
tools/io_uring/io_uring-cp: sync with liburing example
io_uring: fix ctx-exit io_rsrc_put_work() deadlock
io_uring: drop ctx->uring_lock before flushing work item
io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
io-wq: fix bug of creating io-wokers unconditionally
io_uring: rsrc ref lock needs to be IRQ safe
io_uring: Use WRITE_ONCE() when writing to sq_flags
io_uring: clear TIF_NOTIFY_SIGNAL when running task work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"An assortment of pin control fixes of varying importance, the most
important ones affecting Intel and AMD laptops turned up the recent
few days so it's time to push this to your tree.
- Fix the Kconfig dependency for Qualcomm SM8350 pin controller
- Fix pin biasing fallback behaviour on the Mediatek pin controller
- Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond
to the products that are now actually out on the market
- Fix a pin control function itemization in the Sunxi driver
out-of-bounds access bug
- Fix disable clocking for the RISC-V K210 pin controller on the
errorpath
- Fix a system shutdown bug affecting AMD Ryzen-based laptops, the
system would not suspend but just bounce back up"
* tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Fix an issue with shutdown when system set to s0ix
pinctrl: k210: Fix k210_fpioa_probe()
pinctrl: sunxi: Don't underestimate number of functions
pinctrl: tigerlake: Fix GPIO mapping for newer version of software
pinctrl: mediatek: Fix fallback behavior for bias_set_combo
pinctrl: qcom: fix GPIOLIB dependencies
|
|
The patch be7ecbd240b2: "soc: fsl: qe: convert QE interrupt
controller to platform_device" from Aug 3, 2021, leads to the
following static checker warning:
drivers/soc/fsl/qe/qe_ic.c:438 qe_ic_init()
warn: unsigned 'qe_ic->virq_low' is never less than zero.
In old variant irq_of_parse_and_map() returns zero if failed so
unsigned int for virq_high/virq_low was ok.
In new variant platform_get_irq() returns negative error codes
if failed so we need to use int for virq_high/virq_low.
Also simplify high_handler checking and remove the curly braces
to make checkpatch happy.
Fixes: be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device")
Signed-off-by: Maxim Kochetkov <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Li Yang <[email protected]>
|
|
Jakub Kicinski says:
====================
bnxt: Tx NAPI disabling resiliency improvements
A lockdep warning was triggered by netpoll because napi poll
was taking the xmit lock. Fix that and a couple more issues
noticed while reading the code.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drivers should count packets they are dropping.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
skbs are freed on error and not put on the ring. We may, however,
be in a situation where we're freeing the last skb of a batch,
and there is a doorbell ring pending because of xmit_more() being
true earlier. Make sure we ring the door bell in such situations.
Since errors are rare don't pay attention to xmit_more() and just
always flush the pending frames.
The busy case should be safe to be left alone because it can
only happen if start_xmit races with completions and they
both enable the queue. In that case the kick can't be pending.
Noticed while reading the code.
Fixes: 4d172f21cefe ("bnxt_en: Implement xmit_more.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
napi schedules DIM, napi has to be disabled first,
then DIM canceled.
Noticed while reading the code.
Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown")
Fixes: 6a8788f25625 ("bnxt_en: add support for software dynamic interrupt moderation")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We can't take the tx lock from the napi poll routine, because
netpoll can poll napi at any moment, including with the tx lock
already held.
The tx lock is protecting against two paths - the disable
path, and (as Michael points out) the NETDEV_TX_BUSY case
which may occur if NAPI completions race with start_xmit
and both decide to re-enable the queue.
For the disable/ifdown path use synchronize_net() to make sure
closing the device does not race we restarting the queues.
Annotate accesses to dev_state against data races.
For the NAPI cleanup vs start_xmit path - appropriate barriers
are already in place in the main spot where Tx queue is stopped
but we need to do the same careful dance in the TX_BUSY case.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reviewed-by: Michael Chan <[email protected]>
Reviewed-by: Edwin Peer <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There is a race between iterating over requests in
nbd_clear_que() and completing requests in recv_work(),
which can lead to double completion of a request.
To fix it, flush the recv worker before iterating over
the requests and don't abort the completed request
while iterating.
Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect")
Reported-by: Jiang Yadong <[email protected]>
Signed-off-by: Xie Yongji <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Prevent regressions related to zero-extension metadata handling during
dead code sanitization.
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
"access skb fields ok" verifier test fails on s390 with the "verifier
bug. zext_dst is set, but no reg is defined" message. The first insns
of the test prog are ...
0: 61 01 00 00 00 00 00 00 ldxw %r0,[%r1+0]
8: 35 00 00 01 00 00 00 00 jge %r0,0,1
10: 61 01 00 08 00 00 00 00 ldxw %r0,[%r1+8]
... and the 3rd one is dead (this does not look intentional to me, but
this is a separate topic).
sanitize_dead_code() converts dead insns into "ja -1", but keeps
zext_dst. When opt_subreg_zext_lo32_rnd_hi32() tries to parse such
an insn, it sees this discrepancy and bails. This problem can be seen
only with JITs whose bpf_jit_needs_zext() returns true.
Fix by clearning dead insns' zext_dst.
The commits that contributed to this problem are:
1. 5aa5bd14c5f8 ("bpf: add initial suite for selftests"), which
introduced the test with the dead code.
2. 5327ed3d44b7 ("bpf: verifier: mark verified-insn with
sub-register zext flag"), which introduced the zext_dst flag.
3. 83a2881903f3 ("bpf: Account for BPF_FETCH in
insn_has_def32()"), which introduced the sanity check.
4. 9183671af6db ("bpf: Fix leakage under speculation on
mispredicted branches"), which bisect points to.
It's best to fix this on stable branches that contain the second one,
since that's the point where the inconsistency was introduced.
Fixes: 5327ed3d44b7 ("bpf: verifier: mark verified-insn with sub-register zext flag")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This example is missing a few fixes that are in the liburing version,
synchronize with the upstream version.
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
We run a test that delete and recover devcies frequently(two devices on
the same host), and we found that 'active_queues' is super big after a
period of time.
If device a and device b share a tag set, and a is deleted, then
blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there
is only one queue that are using the tag set. However, if b is still
active, the active_queues of b might never be cleared even if b is
deleted.
Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Fixes: 77e89afc25f3 ("PCI/MSI: Protect msi_desc::masked for multi-MSI")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
The error handling code in tpci200_register does not free interface_regs
allocated by ioremap and the current version of error handling code is
problematic.
Fix this by refactoring the error handling code and free interface_regs
when necessary.
Fixes: 43986798fd50 ("ipack: add error handling for ioremap_nocache")
Cc: [email protected]
Reported-by: Dongliang Mu <[email protected]>
Signed-off-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The function tpci200_register called by tpci200_install and
tpci200_unregister called by tpci200_uninstall are in pair. However,
tpci200_unregister has some cleanup operations not in the
tpci200_register. So the error handling code of tpci200_pci_probe has
many different double free issues.
Fix this problem by moving those cleanup operations out of
tpci200_unregister, into tpci200_pci_remove and reverting
the previous commit 9272e5d0028d ("ipack/carriers/tpci200:
Fix a double free in tpci200_pci_probe").
Fixes: 9272e5d0028d ("ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe")
Cc: [email protected]
Reported-by: Dongliang Mu <[email protected]>
Signed-off-by: Dongliang Mu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
During suspend/resume NGD remote instance is power cycled along
with remotely controlled bam dma engine.
So Reset the dma configuration during this suspend resume path
so that we are not dealing with any stale dma setup.
Without this transactions timeout after first suspend resume path.
Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
For some reason we ended up using wrong device in some places for pm_runtime calls.
Fix this so that NGG driver can do runtime pm correctly.
Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
In some usecases transaction ids are dynamically allocated inside
the controller driver after sending the messages which have generic
acknowledge responses. So check for this before refcounting pm_runtime.
Without this we would end up imbalancing runtime pm count by
doing pm_runtime_put() in both slim_do_transfer() and slim_msg_response()
for a single pm_runtime_get() in slim_do_transfer()
Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
As tid is unsigned its hard to figure out if the tid is valid or
invalid. So Start the transaction ids from 1 instead of zero
so that we could differentiate between a valid tid and invalid tids
This is useful in cases where controller would add a tid for controller
specific transfers.
Fixes: d3062a210930 ("slimbus: messaging: add slim_alloc/free_txn_tid()")
Cc: <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Merge topic branch with fixes for both 5.14-rc6 and 5.15.
|
|
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.
Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations. But underflow is the best case scenario. The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.
Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok. Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.
For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write. If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.
Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled. Holding mmu_lock for read also
prevents the indirect shadow page from being freed. But as above, keep
it simple and always take the lock.
Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP. Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).
Alternative #2, the unsync logic could be made thread safe. In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice. However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).
Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: [email protected]
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Set the min_level for the TDP iterator at the root level when zapping all
SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf
SPTE will recursively zap all its children, thus there is no need for the
iterator to attempt to step down. This avoids rereading the top-level
SPTEs after they are zapped by causing try_step_down() to short-circuit.
In most cases, optimizing try_step_down() will be in the noise as the cost
of zapping SPTEs completely dominates the overall time. The optimization
is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
is zapping in response to multiple memslot updates when userspace is adding
and removing read-only memslots for option ROMs. In that case, the task
doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
for read and thus can be a noisy neighbor of sorts.
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf
SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is
not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
paging, the "zap all" flows will skip top-level SPTEs whose range extends
beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
Use the current upper bound (based on host.MAXPHYADDR) to detect that the
caller wants to zap all SPTEs, e.g. instead of using the max theoretical
gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator
to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to
help future debuggers should KVM decide to leak SPTEs again.
The bug is most easily reproduced by running (and unloading!) KVM in a
VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
=============================================================================
BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack_lvl+0x45/0x59
slab_err+0x95/0xc9
__kmem_cache_shutdown.cold+0x3c/0x158
kmem_cache_destroy+0x3d/0xf0
kvm_mmu_module_exit+0xa/0x30 [kvm]
kvm_arch_exit+0x5d/0x90 [kvm]
kvm_exit+0x78/0x90 [kvm]
vmx_exit+0x1a/0x50 [kvm_intel]
__x64_sys_delete_module+0x13f/0x220
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Cc: [email protected]
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.14, take #2
- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM
|
|
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
in L2 or if the VM-Exit should be forwarded to L1. The current logic fails
to account for the case where #PF is intercepted to handle
guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
L1. At best, L1 will complain and inject the #PF back into L2. At
worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
page faults.
Note, while the bug was technically introduced by the commit that added
support for the MAXPHYADDR madness, the shame is all on commit
a0c134347baf ("KVM: VMX: introduce vmx_need_pf_intercept").
Fixes: 1dbf5d68af6f ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: [email protected]
Cc: Peter Shier <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When a nested EPT violation/misconfig is injected into the guest,
the shadow EPT PTEs associated with that address need to be synced.
This is done by kvm_inject_emulated_page_fault() before it calls
nested_ept_inject_page_fault(). However, that will only sync the
shadow EPT PTE associated with the current L1 EPTP. Since the ASID
is based on EP4TA rather than the full EPTP, so syncing the current
EPTP is not enough. The SPTEs associated with any other L1 EPTPs
in the prev_roots cache with the same EP4TA also need to be synced.
Signed-off-by: Junaid Shahid <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Merge common topic branch for 5.14-rc6 and 5.15 merge window.
|
|
hv_vcpu is initialized again a dozen lines below, and at this
point vcpu->arch.hyperv is not valid. Remove the initializer.
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove an ancient restriction that disallowed exposing EFER.NX to the
guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU.
The motivation of the check, added by commit 2cc51560aed0 ("KVM: VMX:
Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule
out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run
the guest with the host's EFER.NX and thus avoid context switching EFER
if the only divergence was the NX bit.
Fast forward to today, and KVM has long since stopped running the guest
with the host's EFER.NX. Not only does KVM context switch EFER if
host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 &&
guest.EFER.NX=1 when using shadow paging (to emulate SMEP). Furthermore,
the entire motivation for the restriction was made obsolete over a decade
ago when Intel added dedicated host and guest EFER fields in the VMCS
(Nehalem timeframe), which reduced the overhead of context switching EFER
from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles.
In practice, the removed restriction only affects non-PAE 32-bit kernels,
as EFER.NX is set during boot if NX is supported and the kernel will use
PAE paging (32-bit or 64-bit), regardless of whether or not the kernel
will actually use NX itself (mark PTEs non-executable).
Alternatively and/or complementarily, startup_32_smp() in head_32.S could
be modified to set EFER.NX=1 regardless of paging mode, thus eliminating
the scenario where NX is supported but not enabled. However, that runs
the risk of breaking non-KVM non-PAE kernels (though the risk is very,
very low as there are no known EFER.NX errata), and also eliminates an
easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER
across nested virtualization transitions.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, bpf, can and
ieee802154.
The size of this is pretty normal, but we got more fixes for 5.14
changes this week than last week. Nothing major but the trend is the
opposite of what we like. We'll see how the next week goes..
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid dma
mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across
suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via
netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans on idle systems to
prevent frequent wake ups"
* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
vsock/virtio: avoid potential deadlock when vsock device remove
wwan: core: Avoid returning NULL from wwan_create_dev()
net: dsa: sja1105: unregister the MDIO buses during teardown
Revert "tipc: Return the correct errno code"
net: mscc: Fix non-GPL export of regmap APIs
net: igmp: increase size of mr_ifc_count
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
net: pcs: xpcs: fix error handling on failed to allocate memory
net: linkwatch: fix failure to restore device state across suspend/resume
net: bridge: fix memleak in br_add_if()
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
net: bridge: fix flags interpretation for extern learn fdb entries
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
bpf, core: Fix kernel-doc notation
net: igmp: fix data-race in igmp_ifc_timer_expire()
net: Fix memory leak in ieee802154_raw_deliver
...
|
|
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
and a reference handling fix from Jeff that should address some memory
corruption reports in the snaprealm area.
Both marked for stable"
* tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
ceph: take snap_empty_lock atomically with snaprealm refcount change
ceph: reduce contention in ceph_check_delayed_caps()
|