Age | Commit message (Collapse) | Author | Files | Lines |
|
The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL,
but it cannot distinguish IS_ERR pointer from valid one.
When offset is added to IS_ERR pointer it may become small positive
value which is a user address that is not handled by extable logic
and has to be checked for at the runtime.
Tighten BPF_PROBE_MEM pointer check code to prevent this case.
Fixes: 4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Reported-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The prog - start_of_ldx is the offset before the faulting ldx to the location
after it, so this will be used to adjust pt_regs->ip for jumping over it and
continuing, and with old temp it would have been fixed up to the wrong offset,
causing crash.
Fixes: 4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"A single fix for the clk framework that needed some more bake time in
linux-next.
The problem is that two clks being registered at the same time can
lead to a busted clk tree if the parent isn't fully registered by the
time the child finds the parent. We rejigger the place where we mark
the parent as fully registered so that the child can't find the parent
until things are proper"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Don't parent clks until the parent is fully registered
|
|
Add a test case which tries to taint map value pointer arithmetic into a
unknown scalar with subsequent export through the map.
Before fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 24 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
#1186/p map access: trying to leak tained dst reg FAIL
Unexpected success to load!
verification time 8 usec
stack depth 8
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
After fix:
# ./test_verifier 1186
#1186/u map access: trying to leak tained dst reg OK
#1186/p map access: trying to leak tained dst reg OK
Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make the bounds propagation in __reg_assign_32_into_64() slightly more
robust and readable by aligning it similarly as we did back in the
__reg_combine_64_into_32() counterpart. Meaning, only propagate or
pessimize them as a smin/smax pair.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
For the case where both s32_{min,max}_value bounds are positive, the
__reg_assign_32_into_64() directly propagates them to their 64 bit
counterparts, otherwise it pessimises them into [0,u32_max] universe and
tries to refine them later on by learning through the tnum as per comment
in mentioned function. However, that does not always happen, for example,
in mov32 operation we call zext_32_to_64(dst_reg) which invokes the
__reg_assign_32_into_64() as is without subsequent bounds update as
elsewhere thus no refinement based on tnum takes place.
Thus, not calling into the __update_reg_bounds() / __reg_deduce_bounds() /
__reg_bound_offset() triplet as we do, for example, in case of ALU ops via
adjust_scalar_min_max_vals(), will lead to more pessimistic bounds when
dumping the full register state:
Before fix:
0: (b4) w0 = -1
1: R0_w=invP4294967295
(id=0,imm=ffffffff,
smin_value=4294967295,smax_value=4294967295,
umin_value=4294967295,umax_value=4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
1: (bc) w0 = w0
2: R0_w=invP4294967295
(id=0,imm=ffffffff,
smin_value=0,smax_value=4294967295,
umin_value=4294967295,umax_value=4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
Technically, the smin_value=0 and smax_value=4294967295 bounds are not
incorrect, but given the register is still a constant, they break assumptions
about const scalars that smin_value == smax_value and umin_value == umax_value.
After fix:
0: (b4) w0 = -1
1: R0_w=invP4294967295
(id=0,imm=ffffffff,
smin_value=4294967295,smax_value=4294967295,
umin_value=4294967295,umax_value=4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
1: (bc) w0 = w0
2: R0_w=invP4294967295
(id=0,imm=ffffffff,
smin_value=4294967295,smax_value=4294967295,
umin_value=4294967295,umax_value=4294967295,
var_off=(0xffffffff; 0x0),
s32_min_value=-1,s32_max_value=-1,
u32_min_value=-1,u32_max_value=-1)
Without the smin_value == smax_value and umin_value == umax_value invariant
being intact for const scalars, it is possible to leak out kernel pointers
from unprivileged user space if the latter is enabled. For example, when such
registers are involved in pointer arithmtics, then adjust_ptr_min_max_vals()
will taint the destination register into an unknown scalar, and the latter
can be exported and stored e.g. into a BPF map value.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Kuee K1r0a <liulin063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix missing error code on kexec failure path"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()
|
|
ath.git patches for v5.17. Major changes:
ath11k
* support PCI devices with 1 MSI vector
* WCN6855 hw2.1 support
* 11d scan offload support
* full monitor mode, only supported on QCN9074
* scan MAC address randomization support
* reserved host DDR addresses from DT for PCI devices support
ath9k
* switch to rate table based lookup
ath
* extend South Korea regulatory domain support
wcn36xx
* beacon filter support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix use after free in DM btree remove's rebalance_children()
- Fix DM integrity data corruption, introduced during 5.16 merge, due
to improper use of bvec_kmap_local()
* tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix data corruption due to improper use of bvec_kmap_local
dm btree remove: fix use after free in rebalance_children()
|
|
Since commit ac10be5cdbfa ("arm64: Use common
of_kexec_alloc_and_setup_fdt()"), smatch reports the following warning:
arch/arm64/kernel/machine_kexec_file.c:152 load_other_segments()
warn: missing error code 'ret'
Return code is not set to an error code in load_other_segments() when
of_kexec_alloc_and_setup_fdt() call returns a NULL dtb. This results
in status success (return code set to 0) being returned from
load_other_segments().
Set return code to -EINVAL if of_kexec_alloc_and_setup_fdt() returns
NULL dtb.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20211210010121.101823-1-nramas@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Fix afs_add_open_map() to check that the vnode isn't already on the list
when it adds it. It's possible that afs_drop_open_mmap() decremented
the cb_nr_mmap counter, but hadn't yet got into the locked section to
remove it.
Also vnode->cb_mmap_link should be initialised, so fix that too.
Fixes: 6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes")
Reported-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/686465.1639435380@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ipip6_dev_free is sit dev->priv_destructor, already called
by register_netdevice() if something goes wrong.
Alternative would be to make ipip6_dev_free() robust against
multiple invocations, but other drivers do not implement this
strategy.
syzbot reported:
dst_release underflow
WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
Modules linked in:
CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246
RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000
RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c
R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358
R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000
FS: 00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
ipip6_dev_free net/ipv6/sit.c:1414 [inline]
sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
ops_init+0x313/0x430 net/core/net_namespace.c:140
setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
__do_sys_unshare kernel/fork.c:3146 [inline]
__se_sys_unshare kernel/fork.c:3144 [inline]
__x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f66c882ce99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200
RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000
</TASK>
Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The descriptor list is a shared resource across all of the transmit queues, and
the locking mechanism used today only protects concurrency across a given
transmit queue between the transmit and reclaiming. This creates an opportunity
for the SYSTEMPORT hardware to work on corrupted descriptors if we have
multiple producers at once which is the case when using multiple transmit
queues.
This was particularly noticeable when using multiple flows/transmit queues and
it showed up in interesting ways in that UDP packets would get a correct UDP
header checksum being calculated over an incorrect packet length. Similarly TCP
packets would get an equally correct checksum computed by the hardware over an
incorrect packet length.
The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges
when the driver produces a new descriptor anytime it writes to the
WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to
re-organize its descriptors and it is possible that concurrent TX queues
eventually break this internal allocation scheme to the point where the
length/status part of the descriptor gets used for an incorrect data buffer.
The fix is to impose a global serialization for all TX queues in the short
section where we are writing to the WRITE_PORT_{HI,LO} registers which solves
the corruption even with multiple concurrent TX queues being used.
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211215202450.4086240-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In nginx/wrk benchmark, there's a hung problem with high probability
on case likes that: (client will last several minutes to exit)
server: smc_run nginx
client: smc_run wrk -c 10000 -t 1 http://server
Client hangs with the following backtrace:
0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f
1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6
2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c
3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de
4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3
5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc]
6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d
7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl
8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93
9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5
10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2
11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a
12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994
13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373
14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
This issue dues to flush_work(), which is used to wait for
smc_connect_work() to finish in smc_release(). Once lots of
smc_connect_work() was pending or all executing work dangling,
smc_release() has to block until one worker comes to free, which
is equivalent to wait another smc_connnect_work() to finish.
In order to fix this, There are two changes:
1. For those idle smc_connect_work(), cancel it from the workqueue; for
executing smc_connect_work(), waiting for it to finish. For that
purpose, replace flush_work() with cancel_work_sync().
2. Since smc_connect() hold a reference for passive closing, if
smc_connect_work() has been cancelled, release the reference.
Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect")
Reported-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The prima driver facilitates the direct programming of beacon filter tables via
SMD commands.
The purpose of beacon filters is quote:
/* When beacon filtering is enabled, firmware will
* analyze the selected beacons received during BMPS,
* and monitor any changes in the IEs as listed below.
* The format of the table is:
* - EID
* - Check for IE presence
* - Byte offset
* - Byte value
* - Bit Mask
* - Byte reference
*/
The default filter table looks something like this:
tBeaconFilterIe gaBcnFilterTable[12] =
{
{ WLAN_EID_DS_PARAMS, 0u, { 0u, 0u, 0u, 0u } },
{ WLAN_EID_ERP_INFO, 0u, { 0u, 0u, 248u, 0u } },
{ WLAN_EID_EDCA_PARAM_SET, 0u, { 0u, 0u, 240u, 0u } },
{ WLAN_EID_QOS_CAPA, 0u, { 0u, 0u, 240u, 0u } },
{ WLAN_EID_CHANNEL_SWITCH, 1u, { 0u, 0u, 0u, 0u } },
{ WLAN_EID_QUIET, 1u, { 0u, 0u, 0u, 0u } },
{ WLAN_EID_HT_OPERATION, 0u, { 0u, 0u, 0u, 0u } },
{ WLAN_EID_HT_OPERATION, 0u, { 1u, 0u, 248u, 0u } },
{ WLAN_EID_HT_OPERATION, 0u, { 2u, 0u, 235u, 0u } },
{ WLAN_EID_HT_OPERATION, 0u, { 5u, 0u, 253u, 0u } },
{ WLAN_EID_PWR_CONSTRAINT, 0u, { 0u, 0u, 0u, 0u } },
{ WLAN_EID_OPMODE_NOTIF, 0u, { 0u, 0u, 0u, 0u } }
};
Add in an equivalent filter set as present in the prima Linux driver.
For now omit the beacon filter "rem" command as the driver does not have an
explicit call to that SMD command. The filter mask should only count when
we are inside BMPS anyway.
Replicating the ability to program the filter table gives us scope to add and
remove elements in future. For now though this patch makes the rote-copy of the
downstream Linux beacon filter table, which we can tweak as desired from now
on.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20211214134630.2214840-4-bryan.odonoghue@linaro.org
|
|
The comment in the header with respect to beacon filtering makes a
reference to "the structure above" and "the structure below" which would be
informative if the comment appeared in the right place but, it does not.
Fix the comment location so that it a least makes sense w/r/t the physical
location statements.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20211214134630.2214840-3-bryan.odonoghue@linaro.org
|
|
The beacon filter structures need to be packed. Right now its fine because
we don't yet use these structures so just pack them without marking it for
backporting.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20211214134630.2214840-2-bryan.odonoghue@linaro.org
|
|
Host DDR memory (contiguous 45 MB in mode-0 or 15 MB in mode-2)
is reserved through DT entries for firmware usage. Send the base
address from DT entries.
If DT entry is available, PCI device will work with
fixed_mem_region else host allocates multiple segments.
IPQ8074 on HK10 board supports multiple PCI devices.
IPQ8074 + QCN9074 is tested with this patch.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1-01838-QCAHKSWPL_SILICONZ-1
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/1638789319-2950-2-git-send-email-akolli@codeaurora.org
|
|
Ath11k driver supports PCI devices such as QCN9074/QCA6390.
Ath11k firmware uses host DDR memory, DT entry is used to
reserve host DDR memory regions, send these memory base
addresses using DT entries.
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/1638789319-2950-1-git-send-email-akolli@codeaurora.org
|
|
Florian Westphal says:
====================
fib: merge nl policies
v4: resend with fixed subject line. I preserved review tags
from David Ahern.
v3: drop first two patches, otherwise unchanged.
This series merges the different (largely identical) nla policies.
v2 also squashed the ->suppress() implementation, I've dropped this.
Problem is that it needs ugly ifdef'ry to avoid build breakage
with CONFIG_INET=n || IPV6=n.
Given that even microbenchmark doesn't show any noticeable improvement
when ->suppress is inlined (it uses INDIRECT_CALLABLE) i decided to toss
the patch instead of adding more ifdefs.
====================
Link: https://lore.kernel.org/r/20211216120507.3299-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that there is only one fib nla_policy there is no need to
keep the macro around. Place it where its used.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The attributes are identical in all implementations so move the ipv4 one
into the core and remove the per-family nla policies.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes
soc/tegra: Fixes for v5.16-rc6
This contains a single build fix without which ARM allmodconfig builds
are broken if -Werror is enabled.
* tag 'tegra-for-5.16-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
soc/tegra: fuse: Fix bitwise vs. logical OR warning
Link: https://lore.kernel.org/r/20211215162618.3568474-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
When printing netdev features %pNF already takes care of the 0x prefix,
remove the explicit one.
Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We found the stat of rx drops for small pkts does not increment when
build_skb fail, it's not coherent with other mode's rx drops stat.
Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Debug print uses invalid check to detect if speed is unforced:
(speed != SPEED_UNFORCED) should be used instead of (!speed).
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Eremeev <Axtone4all@yandex.ru>
Fixes: 96a2b40c7bd3 ("net: dsa: mv88e6xxx: add port's MAC speed setter")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The return value of kmalloc() needs to be checked.
To avoid use in efx_nic_update_stats() in case of the failure of alloc.
Fixes: b593b6f1b492 ("sfc_ef100: statistics gathering")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add user template explicit support. At this moment, max
TCAM rule size is utilized for all rules, doesn't matter
which and how much flower matches are provided by user. It
means that some of TCAM space is wasted, which impacts
the number of filters that can be offloaded.
Introducing the template, allows to have more HW offloaded
filters by specifying the template explicitly.
Example:
tc qd add dev PORT clsact
tc chain add dev PORT ingress protocol ip \
flower dst_ip 0.0.0.0/16
tc filter add dev PORT ingress protocol ip \
flower skip_sw dst_ip 1.2.3.4/16 action drop
NOTE: chain 0 is the default chain id for "tc chain" & "tc filter"
command, so it is omitted in the example above.
This patch adds only template support for default chain 0 suppoerted
by prestera driver at this moment. Chains are not supported yet,
and will be added later.
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent net-next fails to initialize ports with:
realtek-smi switch: phy mode gmii is unsupported on port 0
realtek-smi switch lan5 (uninitialized): validation of gmii with
support 0000000,00000000,000062ef and advertisement
0000000,00000000,000062ef failed: -22
realtek-smi switch lan5 (uninitialized): failed to connect to PHY:
-EINVAL
realtek-smi switch lan5 (uninitialized): error -22 setting up PHY
for tree 1, switch 0, port 0
Current net branch(3dd7d40b43663f58d11ee7a3d3798813b26a48f1) is not
affected.
I also noticed the same issue before with older versions but using
a MDIO interface driver, not realtek-smi.
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KASAN reports an out-of-bounds read in rk_gmac_setup on the line:
while (ops->regs[i]) {
This happens for most platforms since the regs flexible array member is
empty, so the memory after the ops structure is being read here. It
seems that mostly this happens to contain zero anyway, so we get lucky
and everything still works.
To avoid adding redundant data to nearly all the ops structures, add a
new flag to indicate whether the regs field is valid and avoid this loop
when it is not.
Fixes: 3bb3d6b1c195 ("net: stmmac: Add RK3566/RK3568 SoC support")
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jeroen de Borst says:
====================
gve improvements
This patchset consists of unrelated changes:
A bug fix for an issue that disabled jumbo-frame support, a few code
improvements and minor funcitonal changes and 3 new features:
Supporting tx|rx-coalesce-usec for DQO
Suspend/resume/shutdown
Optional metadata descriptors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding ethtool support for changing rx-coalesce-usec and tx-coalesce-usec
when using the DQO queue format.
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Being able to see how many descriptors are in-use is helpful
when diagnosing certain issues.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: Jordan Kim <jrkim@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for suspend, resume and shutdown.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow drivers to pass metadata along with packet data to the device.
Introduce a new metadata descriptor type
* GVE_TXD_MTD
This descriptor is optional. If present it immediate follows the
packet descriptor and precedes the segment descriptor.
This descriptor may be repeated. Multiple metadata descriptors may
follow. There are no immediate uses for this, this is for future
proofing. At present devices allow only 1 MTD descriptor.
The lower four bits of the type_flags field encode GVE_TXD_MTD.
The upper four bits of the type_flags field encodes a *sub*type.
Introduce one such metadata descriptor subtype
* GVE_MTD_SUBTYPE_PATH
This shares path information with the device for network failure
discovery and robust response:
Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
and updates this field on error with sk_rethink_txhash. Allow the host
stack to do the same. Pass the tx_hash value if set. Also communicate
whether the path hash is set, or more exactly, what its type is. Define
two common types
GVE_MTD_PATH_HASH_NONE
GVE_MTD_PATH_HASH_L4
Concrete examples of error conditions that are resolved are
mentioned in the commits that add sk_rethink_txhash calls. Such as
commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving
spurious retransmission").
Experimental results mirror what the theory suggests: where IPv6
FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel
rotation on TCP timeout avoids the vast majority of TCP disconnects
that would otherwise have occurred during link failures in long-haul
backbones, when an alternative path is available.
Rotation can be applied to various bad connection signals, such as
timeouts and spurious retransmissions. In aggregate, such flow level
signals can help locate network issues. Define initial common states:
GVE_MTD_PATH_STATE_DEFAULT
GVE_MTD_PATH_STATE_TIMEOUT
GVE_MTD_PATH_STATE_CONGESTION
GVE_MTD_PATH_STATE_RETRANSMIT
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No longer needed after we introduced the barrier in gve_napi_poll.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The id field should be a u32 not a signed int.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Giving the device access to other kernel structs is not ideal.
Move the indexes into their own array and just keep pointers to
them in the ntfy block struct.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The legacy raw addressing device option was processed before the
new RDA queue format option. This caused the supported features mask,
which is provided only on the RDA queue format option, not to be set.
This disabled jumbo-frame support when using raw adressing.
Fixes: 255489f5b33c ("gve: Add a jumbo-frame device option")
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King says:
====================
net: phylink: add PCS validation
This series allows phylink to include the PCS in its validation step.
There are two reasons to make this change:
1. Some of the network drivers that are making use of the split PCS
support are already manually calling into their PCS drivers to
perform validation. E.g. stmmac with xpcs.
2. Logically, some network drivers such as mvneta and mvpp2, the
restriction we impose in the validate() callback is a property of
the "PCS" block that we provide rather than the MAC.
This series:
1. Gives phylink a mechanism to query the MAC driver which PCS is
wishes to use for the PHY interface mode. This is necessary to allow
the PCS to be involved in the validation step without making changes
to the configuration.
2. Provide a pcs_validate() method that PCS can implement. This follows
a similar model to the MAC's validate() callback, but with some minor
differences due to observations from the various implementations.
E.g. returning an error code for not-supported and the way the
advertising bitmap is masked.
3. Convert mvpp2 and mvneta to this as examples of its use. Further
Conversions are in the pipeline, including for stmmac+xpcs, as well
as some DSA drivers. Note that DSA conversion to this is conditional
upon all DSA drivers populating their supported_interfaces bitmap,
since this is required before mac_select_pcs() can be used.
Existing drivers that set a PCS in mac_prepare() or mac_config(), or
shortly after phylink_create() will continue to work. However, it should
be noted that mac_select_pcs() will be called during phylink_create(),
and thus any PCS returned by mac_select_pcs() must be available by this
time - or we drop the check in phylink_create().
v2: fix kerneldoc typo in patch 1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert mvneta to validate the autoneg state for 1000base-X in the
pcs_validate() operation, rather than the MAC validate() operation.
This allows us to switch the MAC validate() to use
phylink_generic_validate().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An initial stab at converting mvneta to PCS operations. There's a few
FIXMEs to be solved.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert mvneta to use the mac_prepare() and mac_finish() methods in
preparation to converting mvneta to split-PCS support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert mvpp2 to validate the autoneg state for 1000base-X in the
pcs_validate() operation, rather than the MAC validate() operation.
This allows us to switch the MAC validate() to use
phylink_generic_validate().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the mac_select_pcs() method to choose between the GMAC and XLG
PCS implementations.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a hook for PCS to validate the link parameters. This avoids MAC
drivers having to have knowledge of their PCS in their validate()
method, thereby allowing several MAC drivers to be simplfied.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mac_select_pcs() allows us to have an explicit point to query which
PCS the MAC wishes to use for a particular PHY interface mode, thereby
allowing us to add support to validate the link settings with the PCS.
Phylink will also use this to select the PCS to be used during a major
configuration event without the MAC driver needing to call
phylink_set_pcs().
Note that if mac_select_pcs() is present, the supported_interfaces
bitmap must be filled in; this avoids mac_select_pcs() being called
with PHY_INTERFACE_MODE_NA when we want to get support for all
interface types. Phylink will return an error in phylink_create()
unless this condition is satisfied.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-12-15
This series contains updates to igb, igbvf, igc and ixgbe drivers.
Karen moves checks for invalid VF MAC filters to occur earlier for
igb.
Letu Ren fixes a double free issue in igbvf probe.
Sasha fixes incorrect min value being used when calculating for max for
igc.
Robert Schlabbach adds documentation on enabling NBASE-T support for
ixgbe.
Cyril Novikov adds missing initialization of MDIO bus speed for ixgbe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
t-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-12-15
This series contains updates to ice driver only.
Jake makes changes to flash update. This includes the following:
* a new shadow-ram region similar to NVM region but for the device shadow
RAM contents. This is distinct from NVM region because shadow RAM is
built up during device init and may be different from the raw NVM flash
data.
* refactoring of the ice_flash_pldm_image to become the main flash update
entry point. This is simpler than having both an
ice_devlink_flash_update and an ice_flash_pldm_image. It will make
additions like dry-run easier in the future.
* reducing time to read Option ROM version information.
* adding support for firmware activation via devlink reload, when
possible.
The major new work is the reload support, which allows activating firmware
immediately without a reboot when possible. Reload support only supports
firmware activation.
Jesse improves transmit code: utilizing newer netif_tx* API, adding some
prefetch calls, correcting expected conditions when calling ice_vsi_down(),
and utilizing __netdev_tx_sent_queue() call.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next branch 2021-12-15
Hi Dave, Jakub, Jason
This pulls mlx5-next branch into net-next and rdma branches.
All patches already reviewed on both rdma and netdev mailing lists.
Please pull and let me know if there's any problem.
1) Add multiple FDB steering priorities [1]
2) Introduce HW bits needed to configure MAC list size of VF/SF.
Required for ("net/mlx5: Memory optimizations") upcoming series [2].
[1] https://lore.kernel.org/netdev/20211201193621.9129-1-saeed@kernel.org/
[2] https://lore.kernel.org/lkml/20211208141722.13646-1-shayd@nvidia.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|