|
I got the following memory leak when doing fault injection testing:
unreferenced object 0xffff88800906c618 (size 8):
comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
hex dump (first 8 bytes):
70 74 70 30 00 00 00 00 ptp0....
backtrace:
[<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
[<0000000079f6e2ff>] kvasprintf+0xb5/0x150
[<0000000026aae54f>] kvasprintf_const+0x60/0x190
[<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
[<000000004e35abdd>] dev_set_name+0xc0/0x100
[<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
[<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]
When posix_clock_register() returns an error, the name allocated
in dev_set_name() is leaked. put_device() should be used
to give up the device reference; the name is then freed in
kobject_cleanup() and other memory is freed in ptp_clock_release().
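The ownership rule described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct dev`, `dev_create()`, `dev_put()` and `dev_register()` are hypothetical stand-ins, not the kernel's device model or the ptp code.

```c
#include <stdlib.h>
#include <string.h>

int released;	/* counts final releases, for illustration only */

/* Hypothetical stand-in for a refcounted device with an allocated name. */
struct dev {
	int refs;
	char *name;
};

struct dev *dev_create(const char *name)
{
	struct dev *d = malloc(sizeof(*d));
	size_t len = strlen(name) + 1;

	if (!d)
		return NULL;
	d->name = malloc(len);
	if (!d->name) {
		free(d);
		return NULL;
	}
	memcpy(d->name, name, len);
	d->refs = 1;
	return d;
}

/* Dropping the last reference frees the name and then the object. */
void dev_put(struct dev *d)
{
	if (--d->refs == 0) {
		free(d->name);
		free(d);
		released++;
	}
}

/* A registration step that fails late must drop the reference. */
struct dev *dev_register(const char *name, int fail_late)
{
	struct dev *d = dev_create(name);

	if (!d)
		return NULL;
	if (fail_late) {
		dev_put(d);	/* a bare free(d) would leak d->name */
		return NULL;
	}
	return d;
}
```

On success the caller owns the single reference and drops it later with `dev_put()`.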
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When utilizing the End-to-End delay mechanism, the following error messages show up:
|root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
|ptp4l[950.573]: selected /dev/ptp3 as PTP clock
|ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
|ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
|ptp4l[956.879]: port 1: assuming the grand master role
|ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
|ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
|ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
|ptp4l[963.090]: port 1: received DELAY_REQ without timestamp
Commit f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP
packets in dwmac v5.10a") already addresses this problem for the dwmac
v5.10. However, the same holds true for all dwmacs above version v4.10. Correct the
check accordingly. Afterwards everything works as expected.
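The change amounts to replacing an equality test on the core version with a range test. A minimal sketch, assuming hypothetical version IDs and function names (these are not taken from the stmmac sources) and assuming v4.10 itself is included in the range:

```c
#include <stdbool.h>

/* Hypothetical version IDs; only their ordering matters for this sketch. */
enum core_version {
	CORE_4_00 = 0x40,
	CORE_4_10 = 0x41,
	CORE_5_10 = 0x51,
	CORE_5_20 = 0x52,
};

/* Check as described for the earlier fix: only v5.10 is covered. */
bool snapshot_fix_applies_old(enum core_version v)
{
	return v == CORE_5_10;
}

/* Corrected check: every core from v4.10 upwards is covered
 * (assumption: v4.10 itself is included). */
bool snapshot_fix_applies_new(enum core_version v)
{
	return v >= CORE_4_10;
}
```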
Tested on Intel Atom(R) x6414RE Processor.
Fixes: 14f347334bf2 ("net: stmmac: Correctly take timestamp for PTPv2")
Fixes: f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If something goes wrong in the remove callback, returning an error code
just results in an error message. The device still disappears.
So don't skip disabling the regulator in st95hf_remove() if resetting
the controller via spi fails. Also don't return an error code which just
results in two error messages.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The HNS3 driver consists of hns3.ko, hnae3.ko and hclge.ko.
hns3.ko includes the network stack and pci_driver, hclge.ko includes
the HW device actions, algo_ops and timer task, and hnae3.ko includes some
register functions.
When SRIOV is enabled and hclge.ko is removed, the HW device is unloaded
but the VFs still exist. The PF will not reply to VF mbx messages, which
causes errors.
This patch fixes it by disabling SRIOV before removing hclge.ko.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The task of VF reset is performed through the workqueue. It checks the
value of hdev->reset_pending to determine whether to exit the loop.
However, the value of hdev->reset_pending may also be assigned by
the interrupt function hclgevf_misc_irq_handle(), which may cause the
loop to fail to exit and keep occupying the workqueue. This loop is not
necessary, so remove it; the workqueue will be rescheduled if the
reset needs to be retried or a new reset occurs.
Fixes: 1cc9bc6e5867 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when there is an rx page allocation failure,
polling may be stopped if there are no more packets
to be received, which may cause a queue stall under memory
pressure.
This patch makes sure polling is scheduled again when there is
any rx page allocation failure, and polling will try to allocate
receive buffers until it succeeds.
Now that the allocation retry is added, it is unnecessary to do the rx
page allocation at the end of rx cleaning, so remove it. Also reset
unused_count to zero after calling hns3_nic_alloc_rx_buffers()
to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
memory pressure.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An unused rx desc is a desc that needs a new buffer attached
before it is refilled to the hw to receive a new packet. The number of
descs needing a new buffer is calculated using next_to_use
and next_to_clean. When next_to_use == next_to_clean, the
hns3 driver currently assumes that all descs have a buffer attached,
but 'next_to_use == next_to_clean' can also mean that all descs need
a new buffer attached, if the hw has consumed all the descs and the
driver has not attached any buffer to them yet.
This patch adds 'refill' in desc_cb to indicate whether a new
buffer has been refilled to a desc.
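The ambiguity and the role of the flag can be sketched as follows. This is a simplified model, not the driver code: `struct rx_ring`, `RING_SIZE` and `ring_desc_unused()` are hypothetical names, and only the `refill` flag idea comes from the description above.

```c
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical ring size */

/* Simplified stand-ins for the ring and its per-descriptor state. */
struct desc_cb {
	bool refill;	/* true once a fresh buffer is attached */
};

struct rx_ring {
	int next_to_use;	/* next descriptor to refill for hardware */
	int next_to_clean;	/* next descriptor the driver will process */
	struct desc_cb cb[RING_SIZE];
};

/* Number of descriptors that still need a new buffer attached. */
int ring_desc_unused(const struct rx_ring *ring)
{
	int ntc = ring->next_to_clean;
	int ntu = ring->next_to_use;

	/* Equal indices are ambiguous; the refill flag disambiguates. */
	if (ntc == ntu)
		return ring->cb[ntc].refill ? 0 : RING_SIZE;

	return (ntc > ntu ? 0 : RING_SIZE) + ntc - ntu;
}
```

Without the flag, the equal-index case could only ever report one of the two answers (zero or the whole ring), which is the problem the commit describes.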
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the max tx size supported by the hw is calculated by
using the max BD num supported by the hw. According to the hw
user manual, the max tx size is a fixed value for both non-TSO and
TSO skbs.
This patch updates the max tx size according to the manual.
Fixes: 8ae10cfb5089 ("net: hns3: support tx-scatter-gather-fraglist feature")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the ETS DWRR bandwidth of a tc is set to 0, the hardware will switch to SP
mode. In this case, the tc may occupy all the tx bandwidth if it carries
heavy traffic, which violates the purpose of the user setting.
To fix this problem, require the ETS DWRR bandwidth to be greater than 0.
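A validation of this kind might look like the sketch below. The names `enum tsa`, `TSA_ETS`, `TSA_STRICT` and `validate_ets_bw()` are hypothetical and chosen for illustration; the actual driver structures are not shown in this message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical scheduling modes for a traffic class. */
enum tsa { TSA_STRICT, TSA_ETS };

/* Returns 0 if every ETS traffic class has a non-zero bandwidth share,
 * -EINVAL otherwise. Strict-priority classes are not checked. */
int validate_ets_bw(const enum tsa *tsa, const unsigned char *bw, size_t num_tc)
{
	for (size_t i = 0; i < num_tc; i++) {
		if (tsa[i] == TSA_ETS && bw[i] == 0)
			return -EINVAL;
	}
	return 0;
}
```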
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the DWRR of a tc is initialized to a fixed value when the tc
is enabled, but it is not reset to 0 when the tc is disabled. As a
result, the DWRR of an unused tc is not 0 after using the tc tool
to add and delete multi-tc parameters.
For example, after enabling 4 TCs and restoring to 1 TC with the following
tc commands:
$ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
8@0 8@8 8@16 8@24 hw 1 mode channel
$ tc qdisc del dev eth0 root
Now only one TC is enabled for eth0, but the tc info queried through
debugfs is shown as follows:
$ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
enabled tc number: 1
weight_offset: 14
TC MODE WEIGHT
0 dwrr 100
1 dwrr 100
2 dwrr 100
3 dwrr 100
4 dwrr 0
5 dwrr 0
6 dwrr 0
7 dwrr 0
This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add configuration of the interrupt type and FIFO interrupt enable for the TM QCN
error event if it is enabled; otherwise this event will not be reported when
there is an error.
Fixes: d914971df022 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1.
When an interface is enslaved in a VRF, prerouting conntrack hook is
called twice: once in the context of the original input interface, and
once in the context of the VRF interface. If no special precautions are
taken, this leads to creation of two conntrack entries instead of one,
and breaks SNAT.
The commit above was intended to avoid creation of extra conntrack entries
when input interface is enslaved in a VRF. It did so by resetting
conntrack related data associated with the skb when it enters VRF context.
However it breaks netfilter operation. Imagine a use case when conntrack
zone must be assigned based on the original input interface, rather than
VRF interface (that would make original interfaces indistinguishable). One
could create netfilter rules similar to these:
chain rawprerouting {
type filter hook prerouting priority raw;
iif realiface1 ct zone set 1 return
iif realiface2 ct zone set 2 return
}
This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent
on the conntrack zone does not work.
Here is a reproducer script that demonstrates the difference in behaviour.
==========
#!/bin/sh
# This script demonstrates unexpected change of nftables behaviour
# caused by commit 09e856d54bda5f28 "vrf: Reset skb conntrack
# connection on VRF rcv"
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
#
# Before the commit, it was possible to assign conntrack zone to a
# packet (or mark it for `notracking`) in the prerouting chain, raw
# priority, based on the `iif` (interface from which the packet
# arrived).
# After the change, if the interface is enslaved in a VRF, such
# assignment is lost. Instead, assignment based on the `iif` matching
# the VRF master interface is honored. Thus it is impossible to
# distinguish packets based on the original interface.
#
# This script demonstrates this change of behaviour: conntrack zone 1
# or 2 is assigned depending on the match with the original interface
# or the vrf master interface. It can be observed that conntrack entry
# appears in different zone in the kernel versions before and after
# the commit.
IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__
table testct {
chain rawpre {
type filter hook prerouting priority raw;
iif { veout, tvrf } meta nftrace set 1
iif veout ct zone set 1 return
iif tvrf ct zone set 2 return
notrack
}
chain rawout {
type filter hook output priority raw;
notrack
}
}
__END__
uname -rv
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
Signed-off-by: Eugene Crosser <crosser@average.org>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
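The effect of a positive return value can be illustrated with a small model. All names here are hypothetical, and `-ENODEV` is an assumed error code picked for the sketch; the message above does not say which code the fix uses.

```c
#include <errno.h>

/* Hypothetical helper that reports failure with a positive status. */
int hw_init(int fail)
{
	return fail ? 1 : 0;
}

/* Buggy probe: forwards the positive status unchanged. */
int probe_buggy(int fail)
{
	return hw_init(fail);
}

/* Fixed probe: maps any failure to a negative errno
 * (-ENODEV is an assumption made for this sketch). */
int probe_fixed(int fail)
{
	if (hw_init(fail))
		return -ENODEV;
	return 0;
}

/* Models a caller that treats only negative values as failure. */
int probe_failed(int ret)
{
	return ret < 0;
}
```

With this model, `probe_buggy(1)` is not seen as a failure by `probe_failed()`, while `probe_fixed(1)` is.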
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new warning in clang points out two places in this driver where
boolean expressions are being used with a bitwise OR instead of a
logical one:
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: note: cast one or both operands to int to silence this warning
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: note: cast one or both operands to int to silence this warning
2 errors generated.
The motivation for the warning is that logical operations short circuit
while bitwise operations do not. In this case, it does not seem like
short circuiting is harmful so implement the suggested fix of changing
to a logical operation to fix the warning.
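The difference in evaluation can be seen in a standalone sketch. The helper names are hypothetical and unrelated to nfp_asm.c; the casts to int in the bitwise variant are there only so that this example itself does not trigger the warning.

```c
#include <stdbool.h>

int calls;	/* counts how often the predicate runs */

bool is_set(bool v)
{
	calls++;
	return v;
}

/* Bitwise OR: both operands are always evaluated. */
bool either_bitwise(bool a, bool b)
{
	return (int)is_set(a) | (int)is_set(b);
}

/* Logical OR: the right operand is skipped when the left one is true. */
bool either_logical(bool a, bool b)
{
	return is_set(a) || is_set(b);
}
```

When the operands have no side effects, as the message suggests is the case here, both forms yield the same result and only the amount of evaluation differs.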
Link: https://github.com/ClangBuiltLinux/linux/issues/1479
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211018193101.2340261-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the following coccicheck warning:
./drivers/net/ethernet/mscc/ocelot_vsc7514.c:946:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto.
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
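The rule can be modelled with a toy iterator that holds a reference on the current child. This is a sketch under stated assumptions: `struct node`, `next_child()` and `scan()` are hypothetical and only imitate the reference behaviour described above, not the OF API.

```c
#include <stddef.h>

/* Hypothetical node with a reference count. */
struct node {
	int refs;
	int bad;	/* non-zero makes the loop bail out */
};

/* Take a reference on the next node, dropping the one on prev (if any). */
struct node *next_child(struct node *nodes, size_t n, struct node *prev)
{
	size_t idx = prev ? (size_t)(prev - nodes) + 1 : 0;

	if (prev)
		prev->refs--;
	if (idx >= n)
		return NULL;
	nodes[idx].refs++;
	return &nodes[idx];
}

int scan(struct node *nodes, size_t n)
{
	struct node *child;
	int err = 0;

	for (child = next_child(nodes, n, NULL); child;
	     child = next_child(nodes, n, child)) {
		if (child->bad) {
			err = -1;
			child->refs--;	/* the put before leaving the loop early */
			goto out;
		}
	}
out:
	return err;
}
```

If the decrement before `goto out` were missing, the node that triggered the early exit would keep a non-zero count.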
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the following coccicheck warning:
./drivers/net/ethernet/microchip/sparx5/sparx5_main.c:723:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Setting ds->num_ports to DSA_MAX_PORTS made DSA core allocate unnecessary
dsa_port's and call mt7530_port_disable for non-existent ports.
Set it to MT7530_NUM_PORTS to fix that, and dsa_is_user_port check in
port_enable/disable is no longer required.
Cc: stable@vger.kernel.org
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I compared the register definitions with the D-Link DWR-966
GPL sources and found that the PUAFD field definition was
incorrect. This definition is unused and causes no issues.
Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-10-17
this is a pull request of 11 patches for net/master.
The first 4 patches are by Ziyang Xuan and Zhang Changzhong and fix 1
use after free and 3 standard conformance problems in the j1939 CAN
stack.
The next 2 patches are by Ziyang Xuan and fix 2 concurrency problems
in the ISOTP CAN stack.
Yoshihiro Shimoda's patch for the rcar_can driver fixes suspend/resume on
CAN interfaces that are not running.
Aswath Govindraju's patch for the m_can driver fixes access for MMIO
devices.
Zheyu Ma contributes a patch for the peak_pci driver to fix a use
after free.
Stephane Grosjean's 2 patches fix CAN error state handling in the
peak_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On i386, the baycom_epp driver wants to inspect X86 CPU features (TSC)
and then act on that data, but that info is not available when running
on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/hamradio/baycom_epp.c: In function ‘epp_bh’:
../drivers/net/hamradio/baycom_epp.c:630:6: error: implicit declaration of function ‘boot_cpu_has’; did you mean ‘get_cpu_mask’? [-Werror=implicit-function-declaration]
if (boot_cpu_has(X86_FEATURE_TSC)) \
^
../drivers/net/hamradio/baycom_epp.c:658:2: note: in expansion of macro ‘GETTICK’
GETTICK(time1);
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: linux-hams@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
nullity of a pointer
Since alloc_can_err_skb() puts NULL in cf in the case when skb cannot
be allocated and can_change_state() handles the case when cf is NULL,
the test on the nullity of skb is now unnecessary.
Link: https://lore.kernel.org/all/20210929142111.55757-2-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
notification
This corrects the lack of notification of a return to ERROR_ACTIVE
state for USB - CANFD devices from PEAK-System.
Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Link: https://lore.kernel.org/all/20210929142111.55757-1-s.grosjean@peak-system.com
Cc: stable@vger.kernel.org
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
When removing the peak_pci module, referencing 'chan' again after
releasing 'dev' will cause a UAF.
Fix this by releasing 'dev' later.
The following log reveals it:
[ 35.961814 ] BUG: KASAN: use-after-free in peak_pci_remove+0x16f/0x270 [peak_pci]
[ 35.963414 ] Read of size 8 at addr ffff888136998ee8 by task modprobe/5537
[ 35.965513 ] Call Trace:
[ 35.965718 ] dump_stack_lvl+0xa8/0xd1
[ 35.966028 ] print_address_description+0x87/0x3b0
[ 35.966420 ] kasan_report+0x172/0x1c0
[ 35.966725 ] ? peak_pci_remove+0x16f/0x270 [peak_pci]
[ 35.967137 ] ? trace_irq_enable_rcuidle+0x10/0x170
[ 35.967529 ] ? peak_pci_remove+0x16f/0x270 [peak_pci]
[ 35.967945 ] __asan_report_load8_noabort+0x14/0x20
[ 35.968346 ] peak_pci_remove+0x16f/0x270 [peak_pci]
[ 35.968752 ] pci_device_remove+0xa9/0x250
Fixes: e6d9c80b7ca1 ("can: peak_pci: add support of some new PEAK-System PCI cards")
Link: https://lore.kernel.org/all/1634192913-15639-1-git-send-email-zheyuma97@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Reads and writes from the FIFO go through a buffer, with various
fields and data at predefined offsets. So, they should not be done to
the same address (or port) when val_count is greater than 1.
Therefore, fix this by using iowrite32()/ioread32() instead of
ioread32_rep()/iowrite32_rep().
Also, the write into FIFO must be performed with an offset from the
message ram base address. Therefore, fix the base address to
mram_base.
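The difference between the two access patterns can be sketched with plain memory standing in for the message RAM. The function names are hypothetical and ordinary pointers replace the MMIO accessors, so this only illustrates the addressing, not the driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Port-style access: every element is read from the same location. */
void read_rep(const uint32_t *addr, uint32_t *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		buf[i] = *addr;
}

/* Buffer-style access: each element comes from the next 32-bit word,
 * starting at an offset from the base. */
void read_incrementing(const uint32_t *base, size_t word_offset,
		       uint32_t *buf, size_t count)
{
	for (size_t i = 0; i < count; i++)
		buf[i] = base[word_offset + i];
}
```

For a count of 1 the two behave the same, which matches the statement that the problem only appears when val_count is greater than 1.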
Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors")
Link: https://lore.kernel.org/all/20210920123344.2320-1-a-govindraju@ti.com
Signed-off-by: Aswath Govindraju <a-govindraju@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
If the driver was not opened, rcar_can_suspend() should not call
clk_disable() because the clock was not enabled.
Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver")
Link: https://lore.kernel.org/all/20210924075556.223685-1-yoshihiro.shimoda.uh@renesas.com
Cc: stable@vger.kernel.org
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Ayumi Nakamichi <ayumi.nakamichi.kf@renesas.com>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-10-14
Brett ensures RDMA nodes are removed during release and rebuild. He also
corrects fw.mgmt.api to include the patch number for proper
identification.
Dave stops ida_free() being called when an IDA has not been allocated.
Michal corrects the order of parameters being provided and the number of
entries skipped for UDP tunnels.
====================
Link: https://lore.kernel.org/r/20211014181953.3538330-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/usb/lan78xx.o: in function `lan78xx_set_multicast':
lan78xx.c:(.text+0x48cf): undefined reference to `crc32_le'
The actual use of crc32_le() comes indirectly through ether_crc().
Fixes: 55d7de9de6c30 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit a86ed2cfa13c5 ("ptp: Don't print an error if ptp_kvm is not supported")
fixes the error message printed on the ARM platform by printing it only when
the error returned from kvm_arch_ptp_init() is not -EOPNOTSUPP.
Although the ARM platform returns -EOPNOTSUPP if ptp_kvm is not supported
while X86_64 platform returns -KVM_EOPNOTSUPP, both error codes share the
same value 95.
Actually kvm_arch_ptp_init() on X86_64 platform can return three kinds of
errors (-KVM_ENOSYS, -KVM_EOPNOTSUPP and -KVM_EFAULT). The problem is that
-KVM_EOPNOTSUPP is masked out and -KVM_EFAULT is ignored among them.
This patch fixes this by returning them to ptp_kvm_init() respectively.
Signed-off-by: Kele Huang <huangkele@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Quite calm.
The noisy DSA driver (embedded switches) changes, and adjustment to
IPv6 IOAM behavior add to diffstat's bottom line but are not scary.
Current release - regressions:
- af_unix: rename UNIX-DGRAM to UNIX to maintain backwards
compatibility
- procfs: revert "add seq_puts() statement for dev_mcast", minor
format change broke user space
Current release - new code bugs:
- dsa: fix bridge_num not getting cleared after ports leaving the
bridge, resource leak
- dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware
bridges using VID 0, prevent packet drops if pvid is removed
- dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW
getting confused about station to VLAN mapping
Previous releases - regressions:
- virtio-net: fix for skb_over_panic inside big mode
- phy: do not shutdown PHYs in READY state
- dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link
LED staying lit after ifdown
- mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL)
- mqprio: Correct stats in mqprio_dump_class_stats()
- ice: fix deadlock for Tx timestamp tracking flush
- stmmac: fix feature detection on old hardware
Previous releases - always broken:
- sctp: account stream padding length for reconf chunk
- icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe()
- isdn: cpai: check ctr->cnr to avoid array index out of bound
- isdn: mISDN: fix sleeping function called from invalid context
- nfc: nci: fix potential UAF of rf_conn_info object
- dsa: microchip: prevent ksz_mib_read_work from kicking back in
after it's canceled in .remove and crashing
- dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged
ports
- dsa: sja1105, ocelot: break circular dependency between switch and
tag drivers
- dsa: felix: improve timestamping in presence of packet loss
- mlxsw: thermal: fix out-of-bounds memory accesses
Misc:
- ipv6: ioam: move the check for undefined bits to improve
interoperability"
* tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
MAINTAINERS: Update the devicetree documentation path of imx fec driver
sctp: account stream padding length for reconf chunk
mlxsw: thermal: Fix out-of-bounds memory accesses
ethernet: s2io: fix setting mac address during resume
NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
nfc: fix error handling of nfc_proto_register()
Revert "net: procfs: add seq_puts() statement for dev_mcast"
net: encx24j600: check error in devm_regmap_init_encx24j600
net: korina: select CRC32
net: arc: select CRC32
net: dsa: felix: break at first CPU port during init and teardown
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
net: mscc: ocelot: deny TX timestamping of non-PTP packets
net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
...
|
|
Currently when a user uses "devlink dev info", the fw.mgmt.api will be
the major.minor numbers as shown below:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7 <--- No patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
There are many features in the driver that depend on the major, minor,
and patch version of the FW. Without the patch number in the output for
fw.mgmt.api, debugging issues related to the FW API version is difficult.
Also, using major.minor.patch aligns with the existing firmware version
which uses a 3 digit value.
Fix this by making the fw.mgmt.api print the major.minor.patch
versions. Shown below is the result:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7.9 <--- patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Correct the parameter order in the call to ice_tunnel_idx_to_entry().
The entry in the sparse port table is correct when the idx is 0. For idx 1 one
correct entry should be skipped, for idx 2 two of them should be skipped,
etc. Change the if condition to be true when idx is 0, which means that
all previous valid entries of this tunnel type were skipped.
Fixes: b20e6c17c468 ("ice: convert to new udp_tunnel infrastructure")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the remove path, there is an attempt to free the aux_idx IDA whether
it was allocated or not. This can potentially cause a crash when
unloading the driver on systems that do not initialize support for RDMA.
However, this free cannot be gated by the status bit for RDMA. The IDA is
allocated if the driver detects support for RDMA at probe time, but the
driver can later enter a state where RDMA is not supported. Gating the
free on the status bit would then lead to a memory leak.
Initialize aux_idx to an invalid value and check for a valid value when
unloading to determine if an IDA free is necessary.
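The sentinel approach can be sketched as follows. `struct pf`, `AUX_IDX_INVALID` and the helper functions are hypothetical stand-ins; a counter replaces the actual IDA free so the behaviour can be observed.

```c
#define AUX_IDX_INVALID (-1)	/* hypothetical sentinel value */

struct pf {
	int aux_idx;
	int frees;	/* counts releases, for illustration only */
};

void pf_init(struct pf *pf)
{
	pf->aux_idx = AUX_IDX_INVALID;
	pf->frees = 0;
}

/* Stand-in for allocating an ID when RDMA support is detected at probe. */
void pf_alloc_idx(struct pf *pf, int id)
{
	pf->aux_idx = id;
}

/* Free only if an ID was actually allocated, whatever the current
 * RDMA status is. */
void pf_remove(struct pf *pf)
{
	if (pf->aux_idx >= 0) {
		pf->frees++;
		pf->aux_idx = AUX_IDX_INVALID;
	}
}
```

Keying the decision on the stored index rather than on a status bit covers both cases from the message: no free when nothing was allocated, and no leak when RDMA support was lost after allocation.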
Fixes: d25a0fc41c1f9 ("ice: Initialize RDMA support")
Reported-by: Jun Miao <jun.miao@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active
the RDMA Tx queue scheduler node configuration will not be cleaned up.
This will cause the rebuild/re-add of the VSI to fail due to the
software structures not being correctly cleaned up for the VSI index.
Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSIs. If there
are no RDMA scheduler nodes created, then there is no harm in calling
ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if
RDMA support is added for other VSI types they will also get this
change.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, mlxsw allows cooling states to be set above the maximum
cooling state supported by the driver:
# cat /sys/class/thermal/thermal_zone2/cdev0/type
mlxsw_fan
# cat /sys/class/thermal/thermal_zone2/cdev0/max_state
10
# echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
# echo $?
0
This results in out-of-bounds memory accesses when thermal state
transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
transition table is accessed with a too large index (state) [1].
According to the thermal maintainer, it is the responsibility of the
driver to reject such operations [2].
Therefore, return an error when the state to be set exceeds the maximum
cooling state supported by the driver.
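A bounds check of this kind might look like the sketch below. `struct cooling_dev`, `MAX_STATE` and `set_cur_state()` are hypothetical names; the value 10 mirrors the max_state shown in the transcript above, and `-EINVAL` is an assumed error code.

```c
#include <errno.h>

#define MAX_STATE 10	/* mirrors max_state from the transcript above */

/* Hypothetical stand-in for the driver's cooling device state. */
struct cooling_dev {
	unsigned long cur_state;
};

/* Reject states above the maximum instead of storing them
 * (-EINVAL is an assumption made for this sketch). */
int set_cur_state(struct cooling_dev *cdev, unsigned long state)
{
	if (state > MAX_STATE)
		return -EINVAL;
	cdev->cur_state = state;
	return 0;
}
```

With such a check, the `echo 18` write from the transcript would fail rather than store an index beyond the transition table.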
To avoid dead code, as suggested by the thermal maintainer [3],
partially revert commit a421ce088ac8 ("mlxsw: core: Extend cooling
device with cooling levels") that tried to interpret these invalid
cooling states (above the maximum) in a special way. The cooling levels
array is not removed in order to prevent the fans going below 20% PWM,
which would cause them to get stuck at 0% PWM.
[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
Workqueue: events_freezable_power_ thermal_zone_device_check
Call Trace:
dump_stack_lvl+0x8b/0xb3
print_address_description.constprop.0+0x1f/0x140
kasan_report.cold+0x7f/0x11b
thermal_cooling_device_stats_update+0x271/0x290
__thermal_cdev_update+0x15e/0x4e0
thermal_cdev_update+0x9f/0xe0
step_wise_throttle+0x770/0xee0
thermal_zone_device_update+0x3f6/0xdf0
process_one_work+0xa42/0x1770
worker_thread+0x62f/0x13e0
kthread+0x3ee/0x4e0
ret_from_fork+0x1f/0x30
Allocated by task 1:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
thermal_cooling_device_setup_sysfs+0x153/0x2c0
__thermal_cooling_device_register.part.0+0x25b/0x9c0
thermal_cooling_device_register+0xb3/0x100
mlxsw_thermal_init+0x5c5/0x7e0
__mlxsw_core_bus_device_register+0xcb3/0x19c0
mlxsw_core_bus_device_register+0x56/0xb0
mlxsw_pci_probe+0x54f/0x710
local_pci_probe+0xc6/0x170
pci_device_probe+0x2b2/0x4d0
really_probe+0x293/0xd10
__driver_probe_device+0x2af/0x440
driver_probe_device+0x51/0x1e0
__driver_attach+0x21b/0x530
bus_for_each_dev+0x14c/0x1d0
bus_add_driver+0x3ac/0x650
driver_register+0x241/0x3d0
mlxsw_sp_module_init+0xa2/0x174
do_one_initcall+0xee/0x5f0
kernel_init_freeable+0x45a/0x4de
kernel_init+0x1f/0x210
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff8881052f7800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 1016 bytes inside of
1024-byte region [ffff8881052f7800, ffff8881052f7c00)
The buggy address belongs to the page:
page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67-1895-1fdc45c0244e@linaro.org/
[3] https://lore.kernel.org/linux-pm/af9857f2-578e-de3a-e62b-6baff7e69fd4@linaro.org/
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Fixes: a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20211012174955.472928-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After recent cleanups, gcc started warning about a suspicious
memcpy() call in the s2io_io_resume() function:
In function '__dev_addr_set',
inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:318:2,
inlined from 's2io_set_mac_addr' at drivers/net/ethernet/neterion/s2io.c:5205:2,
inlined from 's2io_io_resume' at drivers/net/ethernet/neterion/s2io.c:8569:7:
arch/x86/include/asm/string_32.h:182:25: error: '__builtin_memcpy' accessing 6 bytes at offsets 0 and 2 overlaps 4 bytes at offset 2 [-Werror=restrict]
182 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/netdevice.h:4648:9: note: in expansion of macro 'memcpy'
4648 | memcpy(dev->dev_addr, addr, len);
| ^~~~~~
What apparently happened is that an old cleanup changed the calling
conventions for s2io_set_mac_addr() from taking an ethernet address
as a character array to taking a struct sockaddr, but one of the
callers was not changed at the same time.
Change it to instead call the low-level do_s2io_prog_unicast() function
that still takes the old argument type.
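
A rough sketch of the two calling conventions, using hypothetical
demo_* types rather than the s2io code. It assumes the stale caller handed
the device's own address buffer to a function expecting a struct sockaddr,
which the offsets in the warning suggest but which is not shown here. In
that case the source would start at sa_data (offset 2) and the destination
at offset 0 of the same buffer, so the two 6-byte ranges overlap by 4 bytes.

```c
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-ins; only the layout mirrors struct sockaddr. */
struct demo_sockaddr {
	unsigned short sa_family;	/* 2 bytes on common ABIs */
	unsigned char sa_data[14];	/* the MAC address starts here */
};

struct demo_dev {
	unsigned char dev_addr[ETH_ALEN];
};

/* Low-level helper that takes the raw address bytes. */
static void demo_prog_unicast(struct demo_dev *dev, const unsigned char *addr)
{
	memcpy(dev->dev_addr, addr, ETH_ALEN);
}

/* High-level entry point that expects a struct sockaddr. */
static void demo_set_mac_addr(struct demo_dev *dev,
			      const struct demo_sockaddr *sa)
{
	demo_prog_unicast(dev, sa->sa_data);
}
```

A caller that only holds raw address bytes should use the low-level helper
directly, which is the direction the fix takes.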
Fixes: 2fd376884558 ("S2io: Added support set_mac_address driver entry point")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211013143613.2049096-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devm_regmap_init() may return an error, for example when out of memory.
If the error is not checked, it results in a null pointer dereference
later when reading or writing a register:
general protection fault in encx24j600_spi_probe
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 286 Comm: spi-encx24j600- Not tainted 5.15.0-rc2-00142-g9978db750e31-dirty #11 9c53a778c1306b1b02359f3c2bbedc0222cba652
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:regcache_cache_bypass drivers/base/regmap/regcache.c:540
Code: 54 41 89 f4 55 53 48 89 fb 48 83 ec 08 e8 26 94 a8 fe 48 8d bb a0 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 03 00 00 4c 8d ab b0 00 00 00 48 8b ab a0 00
RSP: 0018:ffffc900010476b8 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: 0000000000000000
RDX: 0000000000000012 RSI: ffff888002de0000 RDI: 0000000000000094
RBP: ffff888013c9a000 R08: 0000000000000000 R09: fffffbfff3f9cc6a
R10: ffffc900010476e8 R11: fffffbfff3f9cc69 R12: 0000000000000001
R13: 000000000000000a R14: ffff888013c9af54 R15: ffff888013c9ad08
FS: 00007ffa984ab580(0000) GS:ffff88801fe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055a6384136c8 CR3: 000000003bbe6003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
encx24j600_spi_probe drivers/net/ethernet/microchip/encx24j600.c:459
spi_probe drivers/spi/spi.c:397
really_probe drivers/base/dd.c:517
__driver_probe_device drivers/base/dd.c:751
driver_probe_device drivers/base/dd.c:782
__device_attach_driver drivers/base/dd.c:899
bus_for_each_drv drivers/base/bus.c:427
__device_attach drivers/base/dd.c:971
bus_probe_device drivers/base/bus.c:487
device_add drivers/base/core.c:3364
__spi_add_device drivers/spi/spi.c:599
spi_add_device drivers/spi/spi.c:641
spi_new_device drivers/spi/spi.c:717
new_device_store+0x18c/0x1f1 [spi_stub 4e02719357f1ff33f5a43d00630982840568e85e]
dev_attr_store drivers/base/core.c:2074
sysfs_kf_write fs/sysfs/file.c:139
kernfs_fop_write_iter fs/kernfs/file.c:300
new_sync_write fs/read_write.c:508 (discriminator 4)
vfs_write fs/read_write.c:594
ksys_write fs/read_write.c:648
do_syscall_64 arch/x86/entry/common.c:50
entry_SYSCALL_64_after_hwframe arch/x86/entry/entry_64.S:113
Add an error check in devm_regmap_init_encx24j600() to avoid this situation.
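
The pattern can be sketched in user space as follows. The demo_* helpers
imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and are assumptions made
for illustration; demo_regmap_init() is a hypothetical stand-in for an init
call that can fail. Note that RBX in the register dump above holds
fffffffffffffff4, which is -12 read as a pointer, consistent with an
unchecked -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>

/* User-space imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define DEMO_MAX_ERRNO 4095

static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_regmap {
	int cache_bypass;
};

/* Hypothetical init helper that can fail like devm_regmap_init(). */
static struct demo_regmap *demo_regmap_init(int simulate_oom)
{
	static struct demo_regmap map;

	if (simulate_oom)
		return demo_err_ptr(-ENOMEM);
	return &map;
}

static int demo_probe(int simulate_oom)
{
	struct demo_regmap *map = demo_regmap_init(simulate_oom);

	/* Without this check, the error value is dereferenced below. */
	if (demo_is_err(map))
		return (int)demo_ptr_err(map);

	map->cache_bypass = 1;
	return 0;
}
```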
Fixes: 04fbfce7a222 ("net: Microchip encx24j600 driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Link: https://lore.kernel.org/r/20211012125901.3623144-1-sunnanyong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-10-12
* tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
net/mlx5e: Switchdev representors are not vlan challenged
net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
net/mlx5: Fix cleanup of bridge delayed work
====================
Link: https://lore.kernel.org/r/20211012205323.20123-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/korina.o: in function `korina_multicast_list':
korina.c:(.text+0x1af): undefined reference to `crc32_le'
Fixes: ef11291bcd5f9 ("Add support the Korina (IDT RC32434) Ethernet MAC")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211012152509.21771-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
emac_main.c:(.text+0xb11): undefined reference to `crc32_le'
The crc32_le() call comes through the ether_crc_le() call in
arc_emac_set_rx_mode().
[v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
but the error comes from emac_main.o, which is part of the arc_emac module,
which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
different from emac_arc...]
Fixes: 775dd682e2b0ec ("arc_emac: implement promiscuous mode and multicast filtering")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/r/20211012093446.1575-1-vegard.nossum@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
one of them is capable of acting as an NPI port at a time (inject and
extract packets using DSA tags).
However, using the alternative ocelot-8021q tagging protocol, it should
be possible to use both CPU ports symmetrically, but for that we need to
mark both ports in the device tree as DSA masters.
In the process of doing that, it can be seen that traffic to/from the
network stack gets broken, and this is because the Felix driver iterates
through all DSA CPU ports and configures them as NPI ports. But since
there can only be a single NPI port, we effectively end up in a
situation where DSA thinks the default CPU port is the first one, but
the hardware port configured to be an NPI is the last one.
I would like to treat this as a bug, because if the updated device trees
are going to start circulating, it would be really good for existing
kernels to support them, too.
Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At present, when a PTP packet which requires TX timestamping gets
dropped under congestion by the switch, things go downhill very fast.
The driver keeps a clone of that skb in a queue of packets awaiting TX
timestamp interrupts, but interrupts will never be raised for the
dropped packets.
Moreover, matching timestamped packets to timestamps is done by a 2-bit
timestamp ID, and this can wrap around and we can match on the wrong skb.
Since with the default NPI-based tagging protocol, we get no notification
about packet drops, the best we can do is eventually recover from the
drop of a PTP frame: its skb will be dead memory until another skb which
was assigned the same timestamp ID happens to find it.
However, with the ocelot-8021q tagger which injects packets using the
manual register interface, it appears that we can check for more
information, such as:
- whether the input queue has reached the high watermark or not
- whether the injection group's FIFO can accept additional data or not
so we know that a PTP frame is likely to get dropped before actually
sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
But when we do that, we can also remove the skb from the timestamping
queue, because there surely won't be any timestamp that matches it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.
This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.
The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.
We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...
We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.
The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).
None of that requires deep insight into the driver; it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295b5fe ("net:
mscc: ocelot: use common tag parsing code with DSA").
With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.
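
To illustrate why such a helper can live in a header, here is a sketch
of a stateless static inline function that derives a REW_OP-like value
from per-packet data only. The command codes, the bit layout and the
demo_skb_cb structure are hypothetical and chosen for illustration; they
are not claimed to match include/linux/dsa/ocelot.h.

```c
#include <stdint.h>

/* Hypothetical command codes and bit layout, for illustration only. */
#define DEMO_REW_OP_TWO_STEP	0x3u
#define DEMO_REW_OP_ONE_STEP	0x5u
#define DEMO_REW_OP_TS_ID_SHIFT	3

/* Hypothetical stand-in for the per-skb control block (skb->cb). */
struct demo_skb_cb {
	uint8_t ptp_cmd;	/* 0 means no timestamping requested */
	uint8_t ts_id;		/* TX timestamp identifier */
};

/* Depends only on its argument, so no exported symbol is needed. */
static inline uint32_t demo_ptp_rew_op(const struct demo_skb_cb *cb)
{
	uint32_t rew_op = 0;

	if (cb->ptp_cmd == DEMO_REW_OP_TWO_STEP) {
		rew_op = cb->ptp_cmd;
		rew_op |= (uint32_t)cb->ts_id << DEMO_REW_OP_TS_ID_SHIFT;
	} else if (cb->ptp_cmd == DEMO_REW_OP_ONE_STEP) {
		rew_op = cb->ptp_cmd;
	}

	return rew_op;
}
```

Because the body is compiled into every caller, a module using it carries
no link-time reference to the module that used to export the function.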
Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
the skb PTP header
The sad reality is that when a PTP frame with a TX timestamping request
is transmitted, it isn't guaranteed that it will make it all the way to
the wire (due to congestion inside the switch), and that a timestamp
will be taken by the hardware and placed in the timestamp FIFO where an
IRQ will be raised for it.
The implication is that if enough PTP frames are silently dropped by the
hardware such that the timestamp ID has rolled over, it is possible to
match a timestamp to an old skb.
Furthermore, nobody will match on the real skb corresponding to this
timestamp, since we stupidly matched on a previous one that was stale in
the queue, and stopped there.
So PTP timestamping will be broken and there will be no way to recover.
It looks like the hardware parses the sequenceID from the PTP header,
and also provides that metadata for each timestamp. The driver currently
ignores this, but it shouldn't.
As an extra resiliency measure, do the following:
- check whether the PTP sequenceID also matches between the skb and the
timestamp, treat the skb as stale otherwise and free it
- if we see a stale skb, don't stop there and try to match an skb one
more time, chances are there's one more skb in the queue with the same
timestamp ID, otherwise we wouldn't have ever found the stale one (it
is by timestamp ID that we matched it).
While this does not prevent PTP packet drops, it at least prevents
the catastrophic consequences of incorrect timestamp matching.
Since we already call ptp_classify_raw in the TX path, save the result
in the skb->cb of the clone, and just use that result in the interrupt
code path.
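
The matching rule can be sketched like this. It is a simplified model
with a fixed-size array in place of the skb queue; demo_pending and
demo_match_timestamp() are hypothetical names, and the array is assumed to
be ordered oldest first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_LEN 8

/* Hypothetical record for a cloned skb awaiting its TX timestamp. */
struct demo_pending {
	bool in_use;
	uint8_t ts_id;		/* timestamp ID written into the frame */
	uint16_t seq_id;	/* PTP sequenceId saved at transmit time */
};

/*
 * Return the index of the entry matching both IDs, or -1 if none.
 * Entries that match on ts_id but not on seq_id are stale: they are
 * released and the scan continues instead of stopping there.
 */
static int demo_match_timestamp(struct demo_pending *q, uint8_t ts_id,
				uint16_t seq_id, unsigned int *stale_dropped)
{
	for (size_t i = 0; i < DEMO_QUEUE_LEN; i++) {
		if (!q[i].in_use || q[i].ts_id != ts_id)
			continue;

		q[i].in_use = false;	/* dequeue in both cases */
		if (q[i].seq_id == seq_id)
			return (int)i;

		(*stale_dropped)++;	/* stale clone, would be freed */
	}

	return -1;
}
```

A return value of -1 corresponds to a timestamp for which no queued skb
exists at all.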
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It appears that Ocelot switches cannot timestamp non-PTP frames.
I tested this using the isochron program at:
https://github.com/vladimiroltean/tsn-scripts
with the result that the driver increments the ocelot_port->ts_id
counter as expected, puts it in the REW_OP, but the hardware seems to
not timestamp these packets at all, since no IRQ is emitted.
Therefore check whether we are sending PTP frames, and refuse to
populate REW_OP otherwise.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When skb_match is NULL, it means we received a PTP IRQ for a timestamp
ID that the kernel has no idea about, since there is no skb in the
timestamping queue with that timestamp ID.
This is a grave error and not something to just "continue" over.
So print a big warning in case this happens.
Also, move the check above ocelot_get_hwtimestamp(): there is no point
in reading the full 64-bit current PTP time if we're not going to do
anything with it anyway for this skb.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TX timestamps for PTP packets with 2-step timestamp requests are matched
to packets based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entries deep.
This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).
I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both numbers of PTP frames in
flight. And since we need locking to inc/dec the per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port lock anymore.
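
A sketch of the two-level accounting, with hypothetical demo_* names.
The FIFO depth of 128 comes from the text above; the per-port limit of 63
is an assumption (a 6-bit identifier with one value kept reserved). Locking
is left out; in a driver both counters would be updated under the single
per-switch lock.

```c
#include <errno.h>

#define DEMO_PTP_FIFO_SIZE	128	/* common timestamp FIFO depth */
#define DEMO_MAX_PTP_ID		63	/* assumed per-port limit */

/* Hypothetical counters of timestamp requests still in flight. */
struct demo_switch {
	unsigned int ptp_in_flight;
};

struct demo_port {
	unsigned int ptp_in_flight;
};

/* Account for one more 2-step timestamp request, or refuse it. */
static int demo_ts_reserve(struct demo_switch *sw, struct demo_port *port)
{
	if (port->ptp_in_flight >= DEMO_MAX_PTP_ID ||
	    sw->ptp_in_flight >= DEMO_PTP_FIFO_SIZE)
		return -EBUSY;

	port->ptp_in_flight++;
	sw->ptp_in_flight++;
	return 0;
}

/* Called once the timestamp has been collected for a packet. */
static void demo_ts_release(struct demo_switch *sw, struct demo_port *port)
{
	if (port->ptp_in_flight > 0 && sw->ptp_in_flight > 0) {
		port->ptp_in_flight--;
		sw->ptp_in_flight--;
	}
}
```

On -EBUSY the caller either sends the packet without a timestamp request
or drops it, depending on what its API allows.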
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.
The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.
The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.
Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution; more issues will be addressed in future changes.
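
For illustration, cycling through identifiers 0-62 while skipping the
reserved value 63 could look like the sketch below. demo_next_ts_id() and
DEMO_MAX_PTP_ID are hypothetical names, not the driver's.

```c
#define DEMO_MAX_PTP_ID 63	/* reserved value; usable IDs are 0..62 */

/* Return the current ID and advance the counter, wrapping before 63. */
static unsigned int demo_next_ts_id(unsigned int *ts_id)
{
	unsigned int id = *ts_id;

	*ts_id = (id + 1) % DEMO_MAX_PTP_ID;
	return id;
}
```

Compared to a 2-bit counter, an identifier is reused only after 63
requests on the port instead of after 4.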
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|