Age | Commit message (Collapse) | Author | Files | Lines |
|
Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately
identify a client by forcing it to emit only 40 times more connections
than there are entries in the table_perturb[] table. The previous two
improvements consisting in resalting the secret every 10s and adding
randomness to each port selection only slightly improved the situation,
and the current value of 2^8 was too small as it's not very difficult
to make a client emit 10k connections in less than 10 seconds.
Thus we're increasing the perturb table from 2^8 to 2^16 so that the
same precision now requires 2.6M connections, which is more difficult in
this time frame and harder to hide as a background activity. The impact
is that the table now uses 256 kB instead of 1 kB, which could mostly
affect devices making frequent outgoing connections. However such
components usually target a small set of destinations (load balancers,
database clients, perf assessment tools), and in practice only a few
entries will be visited, like before.
A live test at 1 million connections per second showed no performance
difference from the previous value.
Reported-by: Moshe Kol <[email protected]>
Reported-by: Yossi Gilad <[email protected]>
Reported-by: Amit Klein <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We'll need to further increase the size of this table and it's likely
that at some point its size will not be suitable anymore for a static
table. Let's allocate it on boot from inet_hashinfo2_init(), which is
called from tcp_init().
Cc: Moshe Kol <[email protected]>
Cc: Yossi Gilad <[email protected]>
Cc: Amit Klein <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Here we're randomly adding between 0 and 7 random increments to the
selected source port in order to add some noise in the source port
selection that will make the next port less predictable.
With the default port range of 32768-60999 this means a worst case
reuse scenario of 14116/8=1764 connections between two consecutive
uses of the same port, with an average of 14116/4.5=3137. This code
was stressed at more than 800000 connections per second to a fixed
target with all connections closed by the client using RSTs (worst
condition) and only 2 connections failed among 13 billion, despite
the hash being reseeded every 10 seconds, indicating a perfectly
safe situation.
Cc: Moshe Kol <[email protected]>
Cc: Yossi Gilad <[email protected]>
Cc: Amit Klein <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In order to limit the ability for an observer to recognize the source
ports sequence used to contact a set of destinations, we should
periodically shuffle the secret. 10 seconds looks effective enough
without causing particular issues.
Cc: Moshe Kol <[email protected]>
Cc: Yossi Gilad <[email protected]>
Cc: Amit Klein <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Tested-by: Willy Tarreau <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Amit Klein suggests that we use different parts of port_offset for the
table's index and the port offset so that there is no direct relation
between them.
Cc: Jason A. Donenfeld <[email protected]>
Cc: Moshe Kol <[email protected]>
Cc: Yossi Gilad <[email protected]>
Cc: Amit Klein <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit
7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output
remained truncated to 32-bit only. In order to exploit more bits from the
hash, let's make the functions return the full 64-bit of siphash_3u32().
We also make sure the port offset calculation in __inet_hash_connect()
remains done on 32-bit to avoid the need for div_u64_rem() and an extra
cost on 32-bit systems.
Cc: Jason A. Donenfeld <[email protected]>
Cc: Moshe Kol <[email protected]>
Cc: Yossi Gilad <[email protected]>
Cc: Amit Klein <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Jason A. Donenfeld says:
====================
wireguard patches for 5.18-rc6
In working on some other problems, I wound up leaning on the WireGuard
CI more than usual and uncovered a few small issues with reliability.
These are fairly low key changes, since they don't impact kernel code
itself.
One change does stick out in particular, though, which is the "make
routing loop test non-fatal" commit. I'm not thrilled about doing this,
but currently [1] remains unsolved, and I'm still working on a real
solution to that (hopefully for 5.19 or 5.20 if I can come up with a
good idea...), so for now that test just prints a big red warning
instead.
[1] https://lore.kernel.org/netdev/[email protected]/
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use newer, more reliable package dependencies. These should hopefully
reduce flakes. However, we keep the old iputils package, as it
accumulated bugs after resulting in flakes on slow machines.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When moving to non-system toolchains, we inadvertantly killed the
ability to use ccache. So instead, build ccache support into the test
harness directly.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Rather than relying on the system to have cross toolchains available,
simply download musl.cc's ones and use that libc.so, and then we use it
to fill in a few missing platforms, such as riscv64, riscv64, powerpc64,
and s390x.
Since riscv doesn't have a second serial port in its device description,
we have to use virtio's vport. This is actually the same situation on
ARM, but we were previously hacking QEMU up to work around this, which
required a custom QEMU. Instead just do the vport trick on ARM too.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decrease the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
I hate to do this, but I still do not have a good solution to actually
fix this bug across architectures. So just disable it for now, so that
the CI can still deliver actionable results. This commit adds a large
red warning, so that at least the failure isn't lost forever, and
hopefully this can be revisited down the line.
Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The FPU usage related to task FPU management is either protected by
disabling interrupts (switch_to, return to user) or via fpregs_lock() which
is a wrapper around local_bh_disable(). When kernel code wants to use the
FPU then it has to check whether it is possible by calling irq_fpu_usable().
But the condition in irq_fpu_usable() is wrong. It allows FPU to be used
when:
!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()
The latter is checking whether some other context already uses FPU in the
kernel, but if that's not the case then it allows FPU to be used
unconditionally even if the calling context interrupted a fpregs_lock()
critical region. If that happens then the FPU state of the interrupted
context becomes corrupted.
Allow in kernel FPU usage only when no other context has in kernel FPU
usage and either the calling context is not hard interrupt context or the
hard interrupt did not interrupt a local bottomhalf disabled region.
It's hard to find a proper Fixes tag as the condition was broken in one way
or the other for a very long time and the eager/lazy FPU changes caused a
lot of churn. Picked something remotely connected from the history.
This survived undetected for quite some time as FPU usage in interrupt
context is rare, but the recent changes to the random code unearthed it at
least on a kernel which had FPU debugging enabled. There is probably a
higher rate of silent corruption as not all issues can be detected by the
FPU debugging code. This will be addressed in a subsequent change.
Fixes: 5d2bd7009f30 ("x86, fpu: decouple non-lazy/eager fpu restore from xsave")
Reported-by: Filipe Manana <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Filipe Manana <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while
rxe_recv.c uses _bh spinlocks for the same lock.
As there is no case where the mcg_lock can be taken from an IRQ, change
these all to bh locks so we don't have confusing mismatched lock types on
the same spinlock.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
These routines were not intended to be called under a spinlock and will
throw debugging warnings:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50
CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:warn_bogus_irq_restore+0x2f/0x50
Call Trace:
<TASK>
_raw_spin_unlock_irqrestore+0x75/0x80
rxe_attach_mcast+0x304/0x480 [rdma_rxe]
ib_attach_mcast+0x88/0xa0 [ib_core]
ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs]
ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs]
ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs]
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Move them out of the spinlock, it is OK if there is some races setting up
the MC reception at the ethernet layer with rbtree lookups.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The calling of siw_cm_upcall and detaching new_cep with its listen_cep
should be atomistic semantics. Otherwise siw_reject may be called in a
temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep
has not being cleared.
This fixes a WARN:
WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
RIP: 0010:siw_cep_put+0x125/0x130 [siw]
Call Trace:
<TASK>
siw_reject+0xac/0x180 [siw]
iw_cm_reject+0x68/0xc0 [iw_cm]
cm_work_handler+0x59d/0xe20 [iw_cm]
process_one_work+0x1e2/0x3b0
worker_thread+0x50/0x3a0
? rescuer_thread+0x390/0x390
kthread+0xe5/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com
Reported-by: Luis Chamberlain <[email protected]>
Reviewed-by: Bernard Metzler <[email protected]>
Signed-off-by: Cheng Xu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
We no longer use these since 111659c2a570 (and they never worked
anyway); drop them from the example to avoid confusion.
Fixes: 111659c2a570 ("arm64: dts: apple: t8103: Remove PCIe max-link-speed properties")
Signed-off-by: Hector Martin <[email protected]>
Reviewed-by: Alyssa Rosenzweig <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Another round of removing redundant minItems/maxItems when 'items' list is
specified. This time it is in if/then schemas as the meta-schema was
failing to check this case.
If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
same size as the list is redundant and can be dropped. Note that is DT
schema specific behavior and not standard json-schema behavior. The tooling
will fixup the final schema adding any unspecified minItems/maxItems.
Signed-off-by: Rob Herring <[email protected]>
Acked-By: Vinod Koul <[email protected]>
Acked-by: Marc Kleine-Budde <[email protected]>
Acked-by: Mark Brown <[email protected]>
Acked-by: Ulf Hansson <[email protected]> # For MMC
Acked-by: Jonathan Cameron <[email protected]> #for IIO
Link: https://lore.kernel.org/r/[email protected]
|
|
A few platforms, at91 and tegra, use drive-push-pull and
drive-open-drain with a 0 or 1 value. There's not really a need for values
as '1' should be equivalent to no value (it wasn't treated that way) and
drive-push-pull disabled is equivalent to drive-open-drain. So dropping the
value can't be done without breaking existing OSs. As we don't want new
cases, mark the case with values as deprecated.
Cc: Arnd Bergmann <[email protected]>
Cc: Thierry Reding <[email protected]>
Cc: Jonathan Hunter <[email protected]>
Cc: Nicolas Ferre <[email protected]>
Cc: Claudiu Beznea <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Change to my kernel.org email address.
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/1abc3de4b00dc6f915ac975a2ec29ed545d96dc4.1651687652.git.jpoimboe@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"IOMMU core:
- Fix for a regression which could cause NULL-ptr dereferences
Arm SMMU:
- Fix off-by-one in SMMUv3 SVA TLB invalidation
- Disable large mappings to workaround nvidia erratum
Intel VT-d:
- Handle PCI stop marker messages in IOMMU driver to meet the
requirement of I/O page fault handling framework.
- Calculate a feasible mask for non-aligned page-selective IOTLB
invalidation.
Apple DART IOMMU:
- Fix potential NULL-ptr dereference
- Set module owner"
* tag 'iomm-fixes-v5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Make sysfs robust for non-API groups
iommu/dart: Add missing module owner to ops structure
iommu/dart: check return value after calling platform_get_resource()
iommu/vt-d: Drop stop marker messages
iommu/vt-d: Calculate mask for non-aligned flushes
iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu
iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range()
|
|
Pull IPMI fixes from Corey Minyard:
"Fix some issues that were reported.
This has been in for-next for a bit (longer than the times would
indicate, I had to rebase to add some text to the headers) and these
are fixes that need to go in"
* tag 'for-linus-5.17-2' of https://github.com/cminyard/linux-ipmi:
ipmi:ipmi_ipmb: Fix null-ptr-deref in ipmi_unregister_smi()
ipmi: When handling send message responses, don't process the message
|
|
A faulty receiver might report an erroneous channel count. We
should guard against reading beyond AUDIO_CHANNELS_COUNT as
that would overflow the dpcd_pattern_period array.
Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with message like this:
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Groups created by VFIO backends outside the core IOMMU API should never
be passed directly into the API itself, however they still expose their
standard sysfs attributes, so we can still stumble across them that way.
Take care to consider those cases before jumping into our normal
assumptions of a fully-initialised core API group.
Fixes: 3f6634d997db ("iommu: Use right way to retrieve iommu_ops")
Reported-by: Jan Stancek <[email protected]>
Tested-by: Jan Stancek <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Robin Murphy <[email protected]>
Link: https://lore.kernel.org/r/86ada41986988511a8424e84746dfe9ba7f87573.1651667683.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <[email protected]>
|
|
As reported by Alan, the CFI (Call Frame Information) in the VDSO time
routines is incorrect since commit ce7d8056e38b ("powerpc/vdso: Prepare
for switching VDSO to generic C implementation.").
DWARF has a concept called the CFA (Canonical Frame Address), which on
powerpc is calculated as an offset from the stack pointer (r1). That
means when the stack pointer is changed there must be a corresponding
CFI directive to update the calculation of the CFA.
The current code is missing those directives for the changes to r1,
which prevents gdb from being able to generate a backtrace from inside
VDSO functions, eg:
Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb) bt
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
#2 0x00007fffffffd960 in ?? ()
#3 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
Backtrace stopped: frame did not save the PC
Alan helpfully describes some rules for correctly maintaining the CFI information:
1) Every adjustment to the current frame address reg (ie. r1) must be
described, and exactly at the instruction where r1 changes. Why?
Because stack unwinding might want to access previous frames.
2) If a function changes LR or any non-volatile register, the save
location for those regs must be given. The CFI can be at any
instruction after the saves up to the point that the reg is
changed.
(Exception: LR save should be described before a bl. not after)
3) If asychronous unwind info is needed then restores of LR and
non-volatile regs must also be described. The CFI can be at any
instruction after the reg is restored up to the point where the
save location is (potentially) trashed.
Fix the inability to backtrace by adding CFI directives describing the
changes to r1, ie. satisfying rule 1.
Also change the information for LR to point to the copy saved on the
stack, not the value in r0 that will be overwritten by the function
call.
Finally, add CFI directives describing the save/restore of r2.
With the fix gdb can correctly back trace and navigate up and down the stack:
Breakpoint 1, 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb) bt
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
#2 0x0000000100015b60 in gettime ()
#3 0x000000010000c8bc in print_long_format ()
#4 0x000000010000d180 in print_current_files ()
#5 0x00000001000054ac in main ()
(gdb) up
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
(gdb)
#2 0x0000000100015b60 in gettime ()
(gdb)
#3 0x000000010000c8bc in print_long_format ()
(gdb)
#4 0x000000010000d180 in print_current_files ()
(gdb)
#5 0x00000001000054ac in main ()
(gdb)
Initial frame selected; you cannot go up.
(gdb) down
#4 0x000000010000d180 in print_current_files ()
(gdb)
#3 0x000000010000c8bc in print_long_format ()
(gdb)
#2 0x0000000100015b60 in gettime ()
(gdb)
#1 0x00007ffff7d8872c in clock_gettime@@GLIBC_2.17 () from /lib64/libc.so.6
(gdb)
#0 0x00007ffff7f804dc in __kernel_clock_gettime ()
(gdb)
Fixes: ce7d8056e38b ("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")
Cc: [email protected] # v5.11+
Reported-by: Alan Modra <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Segher Boessenkool <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The user can change the QoS credits dynamically with the
management console interface which notifies OS with sysfs. After
returning from the OS interface successfully, the management
console updates the hypervisor. Since the VAS capabilities in
the hypervisor is not updated when the OS gets the update,
the kernel is using the old total credits value from the
hypervisor. Fix this issue by using the new QoS credits
from the userspace instead of depending on VAS capabilities
from the hypervisor.
Signed-off-by: Haren Myneni <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Reset GCC_SDCC_BCR register before every fresh initilazation. This will
reset whole SDHC-msm controller, clears the previous power control
states and avoids, software reset timeout issues as below.
[ 5.458061][ T262] mmc1: Reset 0x1 never completed.
[ 5.462454][ T262] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 5.469065][ T262] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00007202
[ 5.475688][ T262] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 5.482315][ T262] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 5.488927][ T262] mmc1: sdhci: Present: 0x01f800f0 | Host ctl: 0x00000000
[ 5.495539][ T262] mmc1: sdhci: Power: 0x00000000 | Blk gap: 0x00000000
[ 5.502162][ T262] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000003
[ 5.508768][ T262] mmc1: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 5.515381][ T262] mmc1: sdhci: Int enab: 0x00000000 | Sig enab: 0x00000000
[ 5.521996][ T262] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 5.528607][ T262] mmc1: sdhci: Caps: 0x362dc8b2 | Caps_1: 0x0000808f
[ 5.535227][ T262] mmc1: sdhci: Cmd: 0x00000000 | Max curr: 0x00000000
[ 5.541841][ T262] mmc1: sdhci: Resp[0]: 0x00000000 | Resp[1]: 0x00000000
[ 5.548454][ T262] mmc1: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 5.555079][ T262] mmc1: sdhci: Host ctl2: 0x00000000
[ 5.559651][ T262] mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP-----------
[ 5.566621][ T262] mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x6000642c | DLL cfg2: 0x0020a000
[ 5.575465][ T262] mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00010800 | DDR cfg: 0x80040873
[ 5.584658][ T262] mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88218a8 Vndr func3: 0x02626040
Fixes: 0eb0d9f4de34 ("mmc: sdhci-msm: Initial support for Qualcomm chipsets")
Signed-off-by: Shaik Sajida Bhanu <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Tested-by: Konrad Dybcio <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
it/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-05-03
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Newer variants of the MMC controller support a 34-bit physical address
space by using word addresses instead of byte addresses. However, the
code truncates the DMA descriptor address to 32 bits before applying the
shift. This breaks DMA for descriptors allocated above the 32-bit limit.
Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller")
Signed-off-by: Samuel Holland <[email protected]>
Reviewed-by: Andre Przywara <[email protected]>
Reviewed-by: Jernej Skrabec <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
Currently MBSSID parameters in struct ieee80211_bss_conf
are not reset upon connection. This could be problematic
with some drivers in a scenario where the device first
connects to a non-transmit BSS and then connects to a
transmit BSS of a Multi BSS AP. The MBSSID parameters
which are set after connecting to a non-transmit BSS will
not be reset and the same parameters will be passed on to
the driver during the subsequent connection to a transmit
BSS of a Multi BSS AP.
For example, firmware running on the ath11k device uses the
Multi BSS data for tracking the beacon of a non-transmit BSS
and reports the driver when there is a beacon miss. If we do
not reset the MBSSID parameters during the subsequent
connection to a transmit BSS, then the driver would have
wrong MBSSID data and FW would be looking for an incorrect
BSSID in the MBSSID beacon of a Multi BSS AP and reports
beacon loss leading to an unstable connection.
Reset the MBSSID parameters upon every connection to solve this
problem.
Fixes: 78ac51f81532 ("mac80211: support multi-bssid")
Signed-off-by: Manikanta Pubbisetty <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
When retrieving the S1G channel number from IEs, we should retrieve
the operating channel instead of the primary channel. The S1G operation
element specifies the main channel of operation as the oper channel,
unlike for HT and HE which specify their main channel of operation as
the primary channel.
Signed-off-by: Kieran Frewen <[email protected]>
Signed-off-by: Bassem Dawood <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
Validate the S1G channel width input by user to ensure it matches
that of the requested channel
Signed-off-by: Kieran Frewen <[email protected]>
Signed-off-by: Bassem Dawood <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
When the QoS ack policy was set to non explicit / psmp ack, frames are treated
as not being part of a BA session, which causes extra latency on reordering.
Fix this by only bypassing reordering for packets with no-ack policy
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
This is required to make loading this as a module work.
Signed-off-by: Hector Martin <[email protected]>
Fixes: 46d1fb072e76 ("iommu/dart: Add DART iommu driver")
Reviewed-by: Sven Peter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
|
|
The IT6505 is using functions provided by the DRM_DP_HELPER driver.
In order to avoid having the bridge enabled but the helper disabled,
let's add a select in order to be sure that the DP helper functions are
always available.
Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver")
Signed-off-by: Fabien Parent <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The cited commits didn't use proper matching on inner TTC
as a result distribution of encapsulated packets wasn't symmetric
between the physical ports.
Fixes: 4c71ce50d2fe ("net/mlx5: Support partial TTC rules")
Fixes: 8e25a2bc6687 ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
Signed-off-by: Mark Bloch <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Double clear of reset requested state can lead to NULL pointer as it
will try to delete the timer twice. This can happen for example on a
race between abort from FW and pci error or reset. Avoid such case using
test_and_clear_bit() to verify only one time reset requested state clear
flow. Similarly use test_and_set_bit() to verify only one time reset
requested state set flow.
Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Maher Sanalla <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The sync reset flow can lead to the following deadlock when
poll_sync_reset() is called by timer softirq and waiting on
del_timer_sync() for the same timer. Fix that by moving the part of the
flow that waits for the timer to reset_reload_work.
It fixes the following kernel Trace:
RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
<IRQ>
mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
poll_sync_reset.cold+0x36/0x52 [mlx5_core]
call_timer_fn+0x32/0x130
__run_timers.part.0+0x180/0x280
? tick_sched_handle+0x33/0x60
? tick_sched_timer+0x3d/0x80
? ktime_get+0x3e/0xa0
run_timer_softirq+0x2a/0x50
__do_softirq+0xe1/0x2d6
? hrtimer_interrupt+0x136/0x220
irq_exit+0xae/0xb0
smp_apic_timer_interrupt+0x7b/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
Fixes: 3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Maher Sanalla <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Setting dscp2prio during the driver reload can cause dcb ieee app list to
be not empty after the reload finish and as a result to a conflict between
the priority trust state reported by the app and the state in the device
register.
Reset the dcb ieee app list on initialization in case this is
conflicting with the register status.
Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Moshe Tal <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
During TC action parsing, the can_offload callback is called
before calling the action's main parsing callback.
Later on, the can_offload callback is called again before handling
the action's post_parse callback if exists.
Since the main parsing callback might have changed and set parsing
params for the rule, following can_offload checks might fail because
some parsing params were already set.
Specifically, the ct action main parsing sets the ct param in the
parsing status structure and when the second can_offload for ct action
is called, before handling the ct post parsing, it will return an error
since it checks this ct param to indicate multiple ct actions which are
not supported.
Therefore, the can_offload call is removed from the post parsing
handling to prevent such cases.
This is allowed since the first can_offload call will ensure that the
action can be offloaded and the fact the code reached the post parsing
handling already means that the action can be offloaded.
Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Ariel Levkovich <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
__mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
if that is the last reference to that tuple, the actual deletion of
the tuple can happen after the FT is already destroyed and freed.
Flush the used workqueue before destroying the ct FT.
Fixes: a2173131526d ("net/mlx5e: CT: manage the lifetime of the ct entry object")
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When resolving the decap route device for a tunnel decap rule,
the result may be an OVS internal port device.
Prior to adding the support for internal port offload, such case
would result in using the uplink as the default decap route device
which allowed devices that can't support internal port offload
to offload this decap rule.
This behavior got broken by adding the internal port offload which
will fail in case the device can't support internal port offload.
To restore the old behavior, use the uplink device as the decap
route as before when internal port offload is not supported.
Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Ariel Levkovich <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
ct_clear action is translated to clearing reg_c metadata
which holds ct state and zone information using mod header
actions.
These actions are allocated during the actions parsing, as
part of the flow attributes main mod header action list.
If ct action exists in the rule, the flow's main mod header
is used only in the post action table rule, after the ct tables
which set the ct info in the reg_c as part of the ct actions.
Therefore, if the original rule has a ct_clear action followed
by a ct action, the ct action reg_c setting will be done first and
will be followed by the ct_clear resetting reg_c and overwriting
the ct info.
Fix this by moving the ct_clear mod header actions allocation from
the ct action parsing stage to the ct action post parsing stage where
it is already known if ct_clear is followed by a ct action.
In such case, we skip the mod header actions allocation for the ct
clear since the ct action will write to reg_c anyway after clearing it.
Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Ariel Levkovich <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Referenced change added check to skip updating fib when new fib instance
has same or lower priority. However, new fib instance can be an update on
same dst address as existing one even though the structure is another
instance that has different address. Ignoring events on such instances
causes multipath LAG state to not be correctly updated.
Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
structure and don't skip events that have the same value of that fields.
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Referenced change incorrectly sets single path fib_info even when LAG is
not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
that verifies LAG state.
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.
Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.
[0]:
[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>
[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3
[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected
[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================
Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The arguments of update_buffer_lossy() is in a wrong order. Fix it.
Fixes: 88b3d5c90e96 ("net/mlx5e: Fix port buffers cell size value")
Signed-off-by: Mark Zhang <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Currently, match VLAN rule also matches packets that have multiple VLAN
headers. This behavior is similar to buggy flower classifier behavior that
has recently been fixed. Fix the issue by matching on
outer_second_cvlan_tag with value 0 which will cause the HW to verify the
packet doesn't contain second vlan header.
Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|