Age | Commit message (Collapse) | Author | Files | Lines |
|
The xarray.c file contains the only call to radix_tree_node_rcu_free(),
and it comes with its own extern declaration for it. This means the
function definition causes a missing-prototype warning:
lib/radix-tree.c:288:6: error: no previous prototype for 'radix_tree_node_rcu_free' [-Werror=missing-prototypes]
Instead, move the declaration for this function to a new header that can
be included by both, and do the same for the radix_tree_node_cachep
variable that has the same underlying problem but does not cause a warning
with gcc.
[[email protected]: fix building radix tree test suite]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Peng Zhang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
A syzbot fault injection test reported that nilfs_btnode_create_block, a
helper function that allocates a new node block for b-trees, causes a
kernel BUG for disk images where the file system block size is smaller
than the page size.
This was due to unexpected flags on the newly allocated buffer head, and
it turned out to be because the buffer flags were not cleared by
nilfs_btnode_abort_change_key() after an error occurred during a b-tree
update operation and the buffer was later reused in that state.
Fix this issue by using nilfs_btnode_delete() to abandon the unused
preallocated buffer in nilfs_btnode_abort_change_key().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
A recent commit gated the core dumping task exit logic on current->flags
remaining consistent in terms of PF_{IO,USER}_WORKER at task exit time.
This exposed a problem with the io-wq handling of that, which explicitly
clears PF_IO_WORKER before calling do_exit().
The reasons for this manual clear of PF_IO_WORKER is historical, where
io-wq used to potentially trigger a sleep on exit. As the io-wq thread
is exiting, it should not participate any further accounting. But these
days we don't need to rely on current->flags anymore, so we can safely
remove the PF_IO_WORKER clearing.
Reported-by: Zorro Lang <[email protected]>
Reported-by: Dave Chinner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A more fixes and regression fixes:
- in subpage mode, fix crash when repairing metadata at the end of
a stripe
- properly enable async discard when remounting from read-only to
read-write
- scrub regression fixes:
- respect read-only scrub when attempting to do a repair
- fix reporting of found errors, the stats don't get properly
accounted after a stripe repair"
* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: also report errors hit during the initial read
btrfs: scrub: respect the read-only flag during repair
btrfs: properly enable async discard when switching from RO->RW
btrfs: subpage: fix a crash in metadata repair path
|
|
We found a refcount UAF bug as follows:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148
Workqueue: events cpuset_hotplug_workfn
Call trace:
refcount_warn_saturate+0xa0/0x148
__refcount_add.constprop.0+0x5c/0x80
css_task_iter_advance_css_set+0xd8/0x210
css_task_iter_advance+0xa8/0x120
css_task_iter_next+0x94/0x158
update_tasks_root_domain+0x58/0x98
rebuild_root_domains+0xa0/0x1b0
rebuild_sched_domains_locked+0x144/0x188
cpuset_hotplug_workfn+0x138/0x5a0
process_one_work+0x1e8/0x448
worker_thread+0x228/0x3e0
kthread+0xe0/0xf0
ret_from_fork+0x10/0x20
then a kernel panic will be triggered as below:
Unable to handle kernel paging request at virtual address 00000000c0000010
Call trace:
cgroup_apply_control_disable+0xa4/0x16c
rebind_subsystems+0x224/0x590
cgroup_destroy_root+0x64/0x2e0
css_free_rwork_fn+0x198/0x2a0
process_one_work+0x1d4/0x4bc
worker_thread+0x158/0x410
kthread+0x108/0x13c
ret_from_fork+0x10/0x18
The race that cause this bug can be shown as below:
(hotplug cpu) | (umount cpuset)
mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex)
cpuset_hotplug_workfn |
rebuild_root_domains | rebind_subsystems
update_tasks_root_domain | spin_lock_irq(&css_set_lock)
css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id]
while(css_task_iter_next) | &dcgrp->e_csets[ss->id]);
css_task_iter_end | spin_unlock_irq(&css_set_lock)
mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex)
Inside css_task_iter_start/next/end, css_set_lock is hold and then
released, so when iterating task(left side), the css_set may be moved to
another list(right side), then it->cset_head points to the old list head
and it->cset_pos->next points to the head node of new list, which can't
be used as struct css_set.
To fix this issue, switch from all css_sets to only scgrp's css_sets to
patch in-flight iterators to preserve correct iteration, and then
update it->cset_head as well.
Reported-by: Gaosheng Cui <[email protected]>
Link: https://www.spinics.net/lists/cgroups/msg37935.html
Suggested-by: Michal Koutný <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Xiu Jianfeng <[email protected]>
Fixes: 2d8f243a5e6e ("cgroup: implement cgroup->e_csets[]")
Cc: [email protected] # v3.16+
Signed-off-by: Tejun Heo <[email protected]>
|
|
freezer_css_{online,offline}()
syzbot is again reporting circular locking dependency between
cpu_hotplug_lock and freezer_mutex. Do like what we did with
commit 57dcd64c7e036299 ("cgroup,freezer: hold cpu_hotplug_lock
before freezer_mutex").
Reported-by: syzbot <[email protected]>
Closes: https://syzkaller.appspot.com/bug?extid=2ab700fe1829880a2ec6
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Cc: [email protected] # v6.1+
Signed-off-by: Tejun Heo <[email protected]>
|
|
The sysctl net/core/bpf_jit_enable does not work now due to commit
1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The
commit saved the jitted insns into 'rw_image' instead of 'image'
which caused bpf_jit_dump not dumping proper content.
With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run
'./test_progs -t fentry_test'. Without this patch, one of jitted
image for one particular prog is:
flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807
00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000050: cc cc cc cc cc cc cc cc cc cc cc cc
With this patch, the jitte image for the same prog is:
flen=17 proglen=92 pass=4 image=00000000b90254b7 from=test_progs pid=1809
00000000: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3
00000010: 0f 1e fa 31 f6 48 8b 57 00 48 83 fa 07 75 2b 48
00000020: 8b 57 10 83 fa 09 75 22 48 8b 57 08 48 81 e2 ff
00000030: 00 00 00 48 83 fa 08 75 11 48 8b 7f 18 be 01 00
00000040: 00 00 48 83 ff 0a 74 02 31 f6 48 bf 18 d0 14 00
00000050: 00 c9 ff ff 48 89 77 00 31 c0 c9 c3
Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
While SDHCI claims to support 64-bit DMA on MSM8916 it does not seem to
be properly functional. It is not immediately obvious because SDHCI is
usually used with IOMMU bypassed on this SoC, and all physical memory
has 32-bit addresses. But when trying to enable the IOMMU it quickly
fails with an error such as the following:
arm-smmu 1e00000.iommu: Unhandled context fault:
fsr=0x402, iova=0xfffff200, fsynr=0xe0000, cbfrsynra=0x140, cb=3
mmc1: ADMA error: 0x02000000
mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00002e02
mmc1: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000000
mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
mmc1: sdhci: Present: 0x03f80206 | Host ctl: 0x00000019
mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
mmc1: sdhci: Timeout: 0x0000000a | Int stat: 0x00000001
mmc1: sdhci: Int enab: 0x03ff900b | Sig enab: 0x03ff100b
mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
mmc1: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x00008007
mmc1: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x5b590000
mmc1: sdhci: Resp[2]: 0xe6487f80 | Resp[3]: 0x0a404094
mmc1: sdhci: Host ctl2: 0x00000008
mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0x0000000ffffff224
mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP -----------
mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x60006400 | DLL cfg2: 0x00000000
mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x00000000
mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88018a8 Vndr func3: 0x00000000
mmc1: sdhci: ============================================
mmc1: sdhci: fffffffff200: DMA 0x0000ffffffffe100, LEN 0x0008, Attr=0x21
mmc1: sdhci: fffffffff20c: DMA 0x0000000000000000, LEN 0x0000, Attr=0x03
Looking closely it's obvious that only the 32-bit part of the address
(0xfffff200) arrives at the SMMU, the higher 16-bit (0xffff...) get
lost somewhere. This might not be a limitation of the SDHCI itself but
perhaps the bus/interconnect it is connected to, or even the connection
to the SMMU.
Work around this by setting SDHCI_QUIRK2_BROKEN_64_BIT_DMA to avoid
using 64-bit addresses.
Signed-off-by: Stephan Gerhold <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
For BEET the inner address and therefore family is stored in the
xfrm_state selector. Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.
Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path")
Reported-by: Steffen Klassert <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition
values and returning a kernel error code will cause issues in the
caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM.
Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values but if the call to sctp_ulpevent_make_authkey() fails, it returns
-ENOMEM.
This results in calling BUG() inside the sctp_side_effects() function.
Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE()
instead.
This code predates git.
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit 59a0b022aa24 ("ipvlan: Make skb->skb_iif track skb->dev for l3s
mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix
works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which
is skb->skb_iif when there is no route.
But for IPv6, the fix is not enough, because in ipv6_raw_deliver() ->
raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif
instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part
issue. Let's set IP6CB(skb)->iif to correct ifindex.
BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not
reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6.
Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2196710
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When fragmenting the ML per STA profile, the element ID should be
IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT.
Change the helper function to take the to be used element ID and pass
the appropriate value for each of the fragmentation levels.
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Benjamin Berg <[email protected]>
Signed-off-by: Gregory Greenman <[email protected]>
Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
When compiling YNL generated code compiler complains about
array-initializer-out-of-bounds. Turns out the MAX value
for STATS_GRP uses the value for STATS.
This may lead to random corruptions in user space (kernel
itself doesn't use this value as it never parses stats).
Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Avoid passing arguments like Opcode which can be retrieved from
bnxt_qplib_crsqe structure.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
cmdq_bitmap is used to derive the next available index in the CMDQ.
This is not required as the we can get the next index
using the existing bnxt_qplib_crsqe array.
Driver will use bnxt_qplib_crsqe array and flag is_in_used to
derive valid entries. is_in_used is replacement of cmdq_bitmap.
There is no change in the existing mechanism of the circular buffer
used to get index.
Added opcode field in bnxt_qplib_crsqe array so that it is easy to map
opcode associated with pending rcfw command.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Firmware provides max request timeout value as part of hwrm_ver_get
API. Driver gets the timeout from firmware and if that interface is
not available then fall back to hardcoded timeout value.
Also, Add a helper function to check the FW status.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
When an error is detected in FW, wake up all the waiters as the
all of them need to be completed with timeout. Add the device
error state also as a wait condition.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
If destroy_ah is timed out, it is likely to be destroyed by firmware
but it is taking longer time due to temporary slowness
in processing the rcfw command. In worst case, there might be
AH resource leak in firmware.
Sending timeout return value can dump warning message from ib_core
which can be avoided if we map timeout of destroy_ah as success.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
AH create may be called from interrpt context and driver has a special
timeout (8 sec) for this command. This is to avoid soft lockups when
the FW command takes more time. Driver returns -ETIMEOUT and fail
create AH, without waiting for actual completion from firmware.
When FW completion is received, use is_waiter_alive flag to avoid
a regular completion path.
If create_ah opcode is detected in completion path which does not have
waiter alive, driver will fetch ah_id from successful firmware
completion in the interrupt context and sends destroy_ah command
for same ah_id. This special post is done in quick manner using helper
function __send_message_no_waiter.
timeout_send is only used for debugging purposes.
If timeout_send value keeps incrementing, it indicates out of sync
active ah counter between driver and firmware. This is a limitation
but graceful handling is possible in future.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Every completion will update last_seen value in the unit of jiffies.
last_seen field will be used to know if firmware is alive and
is useful to detect firmware stall.
Non blocking interface __wait_for_resp will have logic to detect
firmware stall. After every 10 second interval if __wait_for_resp
has not received completion for a given command it will check for
firmware stall condition.
If current jiffies is greater than last_seen
jiffies + RCFW_FW_STALL_TIMEOUT_SEC * HZ, it is a firmware stall.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
If calling context detect command timeout, associated memory stored on
stack will not be valid. If firmware complete the same command later,
this causes incorrect memory access by driver.
Added is_waiter_alive to handle delayed completion by firmware.
is_waiter_alive is set and reset under command queue lock.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
This interface will be used if the driver has not enabled interrupt
and/or interrupt is disabled for a short period of time.
Completion is not possible from interrupt so this interface does
self-polling.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
- Use __send_message_basic_sanity helper function.
- Do not retry posting same command if there is a queue full detection.
- ENXIO is used to indicate controller recovery.
- In the case of ERR_DEVICE_DETACHED state, the driver should not post
commands to the firmware, but also return fabricated written code.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Whenever there is a fast path IO and create/destroy resources from
the slow path is happening in parallel, we may notice high latency
of slow path command completion.
Introduces a shadow queue depth to prevent the outstanding requests
to the FW. Driver will not allow more than #RCFW_CMD_NON_BLOCKING_SHADOW_QD
non-blocking commands to the Firmware.
Shadow queue depth is a soft limit only for non-blocking
commands. Blocking commands will be posted to the firmware
as long as there is a free slot.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add a check to avoid waiting if driver already detects a
FW timeout. Return success for resource destroy in case
the device is detached. Add helper function to map timeout
error code to success.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Use jiffies based timewait instead of counting iteration for
commands that block for FW response.
Also add a poll routine for control path commands. This is for
polling completion if the waiting commands timeout. This avoids cases
where the driver misses completion interrupts.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
There is no need of setting max command queue entries based on
firmware version check.
Removing deperecated code.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
There is a common FW communication offset for both PF and VF.
Removed code around virt_fn check while creating FW channel.
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
bnxt_qplib_service_creq can be called from interrupt or tasklet or
process context. So the function take irq variant of spin_lock.
But when wake_up is invoked with the lock held, it is putting the
calling context to sleep.
[exception RIP: __wake_up_common+190]
RIP: ffffffffb7539d7e RSP: ffffa73300207ad8 RFLAGS: 00000083
RAX: 0000000000000001 RBX: ffff91fa295f69b8 RCX: dead000000000200
RDX: ffffa733344af940 RSI: ffffa73336527940 RDI: ffffa73336527940
RBP: 000000000000001c R8: 0000000000000002 R9: 00000000000299c0
R10: 0000017230de82c5 R11: 0000000000000002 R12: ffffa73300207b28
R13: 0000000000000000 R14: ffffa733341bf928 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
Call the wakeup after releasing the lock.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Driver is not handling the wraparound of the mbox producer index correctly.
Currently the wraparound happens once u32 max is reached.
Bit 31 of the producer index register is special and should be set
only once for the first command. Because the producer index overflow
setting bit31 after a long time, FW goes to initialization sequence
and this causes FW hang.
Fix is to wraparound the mbox producer index once it reaches u16 max.
Fixes: cee0c7bba486 ("RDMA/bnxt_re: Refactor command queue management code")
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Kashyap Desai <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The current implementation of max_credits on the client does
not work because the CreditRequest logic for several commands
does not take max_credits into account.
Still, we can end up asking the server for more credits, depending
on the number of credits in flight. For this, we need to
limit the credits while parsing the responses too.
Signed-off-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
iface_cmp used to simply do a memcmp of the two
provided struct sockaddrs. The comparison needs to do more
based on the address family. Similar logic was already
present in cifs_match_ipaddr. Doing something similar now.
Signed-off-by: Shyam Prasad N <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The virtio driver for Linux guests will not set a link speed to its
paravirtualized NICs. This will be seen as -1 in the ethernet layer, and
when some servers (e.g. samba) fetches it, it's converted to an unsigned
value (and multiplied by 1000 * 1000), so in client side we end up with:
1) Speed: 4294967295000000 bps
in DebugData.
This patch introduces a helper that returns a speed string (in Mbps or
Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or
"Unknown" otherwise.
The reason to not change the value in iface->speed is because we don't
know the real speed of the HW backing the server NIC, so let's keep
considering these as the fastest NICs available.
Also print "Capabilities: None" when the interface doesn't support any.
Signed-off-by: Enzo Matsumiya <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Output of /proc/fs/cifs/DebugData shows only the per-connection
counter for the number of credits of regular type. i.e. the
credits reserved for echo and oplocks are not displayed.
There have been situations recently where having this info
would have been useful. This change prints the credit counters
of all three types: regular, echo, oplocks.
Signed-off-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The ordering of status checks at the beginning of
cifs_tree_connect is wrong. As a result, a tcon
which is good may stay marked as needing reconnect
infinitely.
Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions")
Cc: [email protected] # 6.3
Signed-off-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Because do_gettimeofday has been removed and replaced by ktime_get_real_ts64,
So just remove the comment as it's not needed now.
Signed-off-by: 鑫华 <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
As noted by Michal, the blkg_iostat_set's in the lockless list hold
reference to blkg's to protect against their removal. Those blkg's
hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which
is called when the blkcg reference count reaches 0. This circular
dependency will prevent blkcg and some blkgs from being freed after
they are made offline.
It is less a problem if the cgroup to be destroyed also has other
controllers like memory that will call cgroup_rstat_flush() which will
clean up the reference count. If block is the only controller that uses
rstat, these offline blkcg and blkgs may never be freed leaking more
and more memory over time.
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
- don't grab bio->bi_blkg reference when adding the stats into blkcg's
per-cpu stat list since all stats are guaranteed to be consumed before
releasing blkg instance, and grabbing blkg reference for stats was the
most fragile part of original patch
Based on Waiman's patch:
https://lore.kernel.org/linux-block/[email protected]/
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Cc: [email protected]
Reported-by: Jay Shin <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: [email protected]
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
|
|
The ib_isert module is releasing the isert connection both in
isert_wait_conn() handler as well as isert_free_conn() handler.
In isert_wait_conn() handler, it is expected to wait for iSCSI
session logout operation to complete. It should free the isert
connection only in isert_free_conn() handler.
When a bunch of iSER target is cleared, this issue can lead to
use-after-free memory issue as isert conn is twice released
Fixes: b02efbfc9a05 ("iser-target: Fix implicit termination of connections")
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Saravanan Vajravel <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
When ib_isert module receives connection error event, it is
releasing the isert session and removes corresponding list
node but it doesn't take appropriate mutex lock to remove
the list node. This can lead to linked list corruption
Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce")
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Saravanan Vajravel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
- When a iSER session is released, ib_isert module is taking a mutex
lock and releasing all pending connections. As part of this, ib_isert
is destroying rdma cm_id. To destroy cm_id, rdma_cm module is sending
CM events to CMA handler of ib_isert. This handler is taking same
mutex lock. Hence it leads to deadlock between ib_isert & rdma_cm
modules.
- For fix, created local list of pending connections and release the
connection outside of mutex lock.
Calltrace:
---------
[ 1229.791410] INFO: task kworker/10:1:642 blocked for more than 120 seconds.
[ 1229.791416] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1
[ 1229.791418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1229.791419] task:kworker/10:1 state:D stack: 0 pid: 642 ppid: 2 flags:0x80004000
[ 1229.791424] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 1229.791436] Call Trace:
[ 1229.791438] __schedule+0x2d1/0x830
[ 1229.791445] ? select_idle_sibling+0x23/0x6f0
[ 1229.791449] schedule+0x35/0xa0
[ 1229.791451] schedule_preempt_disabled+0xa/0x10
[ 1229.791453] __mutex_lock.isra.7+0x310/0x420
[ 1229.791456] ? select_task_rq_fair+0x351/0x990
[ 1229.791459] isert_cma_handler+0x224/0x330 [ib_isert]
[ 1229.791463] ? ttwu_queue_wakelist+0x159/0x170
[ 1229.791466] cma_cm_event_handler+0x25/0xd0 [rdma_cm]
[ 1229.791474] cma_ib_handler+0xa7/0x2e0 [rdma_cm]
[ 1229.791478] cm_process_work+0x22/0xf0 [ib_cm]
[ 1229.791483] cm_work_handler+0xf4/0xf30 [ib_cm]
[ 1229.791487] ? move_linked_works+0x6e/0xa0
[ 1229.791490] process_one_work+0x1a7/0x360
[ 1229.791491] ? create_worker+0x1a0/0x1a0
[ 1229.791493] worker_thread+0x30/0x390
[ 1229.791494] ? create_worker+0x1a0/0x1a0
[ 1229.791495] kthread+0x10a/0x120
[ 1229.791497] ? set_kthread_struct+0x40/0x40
[ 1229.791499] ret_from_fork+0x1f/0x40
[ 1229.791739] INFO: task targetcli:28666 blocked for more than 120 seconds.
[ 1229.791740] Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1
[ 1229.791741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1229.791742] task:targetcli state:D stack: 0 pid:28666 ppid: 5510 flags:0x00004080
[ 1229.791743] Call Trace:
[ 1229.791744] __schedule+0x2d1/0x830
[ 1229.791746] schedule+0x35/0xa0
[ 1229.791748] schedule_preempt_disabled+0xa/0x10
[ 1229.791749] __mutex_lock.isra.7+0x310/0x420
[ 1229.791751] rdma_destroy_id+0x15/0x20 [rdma_cm]
[ 1229.791755] isert_connect_release+0x115/0x130 [ib_isert]
[ 1229.791757] isert_free_np+0x87/0x140 [ib_isert]
[ 1229.791761] iscsit_del_np+0x74/0x120 [iscsi_target_mod]
[ 1229.791776] lio_target_np_driver_store+0xe9/0x140 [iscsi_target_mod]
[ 1229.791784] configfs_write_file+0xb2/0x110
[ 1229.791788] vfs_write+0xa5/0x1a0
[ 1229.791792] ksys_write+0x4f/0xb0
[ 1229.791794] do_syscall_64+0x5b/0x1a0
[ 1229.791798] entry_SYSCALL_64_after_hwframe+0x65/0xca
Fixes: bd3792205aae ("iser-target: Fix pending connections handling in target stack shutdown sequnce")
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Saravanan Vajravel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
- Set up the kernel CS earlier in the boot process in case EFI boots
the kernel after bypassing the decompressor and the CS descriptor
used ends up being the EFI one which is not mapped in the identity
page table, leading to early SEV/SNP guest communication exceptions
resulting in the guest crashing
* tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
|
|
Pull smb server fixes from Steve French:
"Five smb3 server fixes, all also for stable:
- Fix four slab out of bounds warnings: improve checks for protocol
id, and for small packet length, and for create context parsing,
and for negotiate context parsing
- Fix for incorrect dereferencing POSIX ACLs"
* tag '6.4-rc5-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate smb request protocol id
ksmbd: check the validation of pdu_size in ksmbd_conn_handler_loop
ksmbd: fix posix_acls and acls dereferencing possible ERR_PTR()
ksmbd: fix out-of-bound read in parse_lease_state()
ksmbd: fix out-of-bound read in deassemble_neg_contexts()
|
|
The original doorbell allocation mechanism is complex and does not meet
the isolation requirement. So we introduce a new doorbell mechanism and the
original mechanism (only be used with CAP_SYS_RAWIO if hardware does not
support the new mechanism) needs to be kept as simple as possible for
compatibility.
Signed-off-by: Cheng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
For the isolation requirement, each QP/CQ can only issue doorbells from the
allocated mmio space. Configure the relationship between QPs/CQs and
mmio doorbell spaces to hardware in create_qp/create_cq interfaces.
Signed-off-by: Cheng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Each ucontext will try to allocate doorbell resources in the extended bar
space from hardware. For compatibility, we change nothing for the original
bar space, and it will be used only for applications with CAP_SYS_RAWIO
authority in the older HW/FW environments.
Signed-off-by: Cheng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Add a new CMDQ message to configure hardware. Initially the page size (in
the format of shift) will be passed to hardware, so that hardware can
organize the mmio space properly. It's called only if hardware supports it.
Signed-off-by: Cheng Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.
However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.
The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.
To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.
Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Mark Bloch <[email protected]>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|
|
Fix ib_uverbs_event_read() to consider event queue closing also upon
non-blocking mode.
Once the queue is closed (e.g. hot-plug flow) all the existing events
are cleaned-up as part of ib_uverbs_free_event_queue().
An application that uses the non-blocking FD mode should get -EIO in
that case to let it knows that the device was removed already.
Otherwise, it can loose the indication that the device was removed and
won't recover.
As part of that, refactor the code to have a single flow with regards to
'is_closed' for both blocking and non-blocking modes.
Fixes: 14e23bd6d221 ("RDMA/core: Fix locking in ib_uverbs_event_read")
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Link: https://lore.kernel.org/r/97b00116a1e1e13f8dc4ec38a5ea81cf8c030210.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <[email protected]>
|