Age | Commit message (Collapse) | Author | Files | Lines |
|
Remove the histogram stuff as it's mostly going to be outdated.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/162431195953.2908479.16770977195634296638.stgit@warthog.procyon.org.uk/
|
|
Add /proc/fs/fscache/cookies to display active cookies.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/158861211871.340223.7223853943667440807.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465771021.1376105.6933857529128238020.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588460994.3465195.16963417803501149328.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162431194785.2908479.786917990782538164.stgit@warthog.procyon.org.uk/
|
|
Add a cookie debug ID and use that in traces and in procfiles rather than
displaying the (hashed) pointer to the cookie. This is easier to correlate
and we don't lose anything when interpreting oops output since that shows
unhashed addresses and registers that aren't comparable to the hashed
values.
Changes:
ver #2:
- Fix the fscache_op tracepoint to handle a NULL cookie pointer.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/158861210988.340223.11688464116498247790.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465769844.1376105.14119502774019865432.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588459097.3465195.1273313637721852165.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162431193544.2908479.17556704572948300790.stgit@warthog.procyon.org.uk/
|
|
A common implementation of isatty(3) involves calling a ioctl passing
a dummy struct argument and checking whether the syscall failed --
bionic and glibc use TCGETS (passing a struct termios), and musl uses
TIOCGWINSZ (passing a struct winsize). If the FD is a socket, we will
copy sizeof(struct ifreq) bytes of data from the argument and return
-EFAULT if that fails. The result is that the isatty implementations
may return a non-POSIX-compliant value in errno in the case where part
of the dummy struct argument is inaccessible, as both struct termios
and struct winsize are smaller than struct ifreq (at least on arm64).
Although there is usually enough stack space following the argument
on the stack that this did not present a practical problem up to now,
with MTE stack instrumentation it's more likely for the copy to fail,
as the memory following the struct may have a different tag.
Fix the problem by adding an early check for whether the ioctl is a
valid socket ioctl, and return -ENOTTY if it isn't.
Fixes: 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers")
Link: https://linux-review.googlesource.com/id/I869da6cf6daabc3e4b7b82ac979683ba05e27d4d
Signed-off-by: Peter Collingbourne <[email protected]>
Cc: <[email protected]> # 4.19
Signed-off-by: David S. Miller <[email protected]>
|
|
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
All callers already have a dax_device obtained from fs_dax_get_by_bdev
at hand, so just pass that to dax_supported() insted of doing another
lookup.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
|
|
dax_supported calls into ->dax_supported which checks for fsdax support.
Don't bother building it for !CONFIG_FS_DAX as it will always return
false.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
|
|
Just implement generic_fsdax_supported directly out of line instead of
adding a wrapper. Given that generic_fsdax_supported is only supplied
for CONFIG_FS_DAX builds this also allows to not provide it at all for
!CONFIG_FS_DAX builds.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
|
|
And move the code around a bit to avoid a forward declaration.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from can and bpf.
Closing three hw-dependent regressions. Any fixes of note are in the
'old code' category. Nothing blocking release from our perspective.
Current release - regressions:
- stmmac: revert "stmmac: align RX buffers"
- usb: asix: ax88772: move embedded PHY detection as early as
possible
- usb: asix: do not call phy_disconnect() for ax88178
- Revert "net: really fix the build...", from Kalle to fix QCA6390
Current release - new code bugs:
- phy: mediatek: add the missing suspend/resume callbacks
Previous releases - regressions:
- qrtr: fix another OOB Read in qrtr_endpoint_post
- stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings
Previous releases - always broken:
- inet: use siphash in exception handling
- ip_gre: add validation for csum_start
- bpf: fix ringbuf helper function compatibility
- rtnetlink: return correct error on changing device netns
- e1000e: do not try to recover the NVM checksum on Tiger Lake"
* tag 'net-5.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
Revert "net: really fix the build..."
net: hns3: fix get wrong pfc_en when query PFC configuration
net: hns3: fix GRO configuration error after reset
net: hns3: change the method of getting cmd index in debugfs
net: hns3: fix duplicate node in VLAN list
net: hns3: fix speed unknown issue in bond 4
net: hns3: add waiting time before cmdq memory is released
net: hns3: clear hardware resource when loading driver
net: fix NULL pointer reference in cipso_v4_doi_free
rtnetlink: Return correct error on changing device netns
net: dsa: hellcreek: Adjust schedule look ahead window
net: dsa: hellcreek: Fix incorrect setting of GCL
cxgb4: dont touch blocked freelist bitmap after free
ipv4: use siphash instead of Jenkins in fnhe_hashfun()
ipv6: use siphash in rt6_exception_hash()
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
net: usb: asix: ax88772: fix boolconv.cocci warnings
net/sched: ets: fix crash when flipping from 'strict' to 'quantum'
qede: Fix memset corruption
net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp
...
|
|
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set. The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.
We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.
I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.
Signed-off-by: J. Bruce Fields <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
NFS implements blocking locks by blocking inside its lock method. In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock. It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.
Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks. The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.
Signed-off-by: J. Bruce Fields <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
Some systems, e.g., HiSilicon KunPeng920 and KunPeng930, have devices that
appear as PCI but are actually on the AMBA bus. Some of these fake PCI
devices support a PASID-like feature and they do have a working PASID
capability even though they do not use the PCIe Transport Layer Protocol
and do not support TLP prefixes.
Add a pasid_no_tlp bit for this "PASID works without TLP prefixes" case and
update pci_enable_pasid() so it can enable PASID on these devices.
Set this bit for HiSilicon KunPeng920 and KunPeng930.
[bhelgaas: squashed, commit log]
Suggested-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Zhangfei Gao <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Zhou Wang <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
|
|
A typical code pattern for pm_clk_create() call is to call it in the
_probe function and to call pm_clk_destroy() both from _probe error path
and from _remove function. For some drivers the whole remove function
would consist of the call to pm_remove_disable().
Add helper function to replace this bolierplate piece of code. Calling
devm_pm_clk_create() removes the need for calling pm_clk_destroy() both
in the probe()'s error path and in the remove() function.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
|
|
A typical code pattern for pm_runtime_enable() call is to call it in the
_probe function and to call pm_runtime_disable() both from _probe error
path and from _remove function. For some drivers the whole remove
function would consist of the call to pm_remove_disable().
Add helper function to replace this bolierplate piece of code. Calling
devm_pm_runtime_enable() removes the need for calling
pm_runtime_disable() both in the probe()'s error path and in the
remove() function.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
|
|
This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88.
Wren and Nicolas reported that ath11k was failing to initialise QCA6390
Wi-Fi 6 device with error:
qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22
Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in
v5.14-rc5, caused this regression in qrtr. Most likely all ath11k
devices are broken, but I only tested QCA6390. Let's revert the broken
commit so that ath11k works again.
Reported-by: Wren Turkal <[email protected]>
Reported-by: Nicolas Schichan <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Now that vfio_pci has been split into two source modules, one focusing on
the "struct pci_driver" (vfio_pci.c) and a toolbox library of code
(vfio_pci_core.c), complete the split and move them into two different
kernel modules.
As before vfio_pci.ko continues to present the same interface under sysfs
and this change will have no functional impact.
Splitting into another module and adding exports allows creating new HW
specific VFIO PCI drivers that can implement device specific
functionality, such as VFIO migration interfaces or specialized device
requirements.
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Expose an 'override_only' helper macro (i.e.
PCI_DRIVER_OVERRIDE_DEVICE_VFIO) for VFIO PCI sub system and add the
required code to prefix its matching entries with "vfio_" in
modules.alias file.
It allows VFIO device drivers to include match entries in the
modules.alias file produced by kbuild that are not used for normal
driver autoprobing and module autoloading. Drivers using these match
entries can be connected to the PCI device manually, by userspace, using
the existing driver_override sysfs.
For example the resulting modules.alias may have:
alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core
alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
In this example mlx5_core and mlx5_vfio_pci match to the same PCI
device. The kernel will autoload and autobind to mlx5_core but the
kernel and udev mechanisms will ignore mlx5_vfio_pci.
When userspace wants to change a device to the VFIO subsystem it can
implement a generic algorithm:
1) Identify the sysfs path to the device:
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
2) Get the modalias string from the kernel:
$ cat /sys/bus/pci/devices/0000:01:00.0/modalias
pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
3) Prefix it with vfio_:
vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00
4) Search modules.alias for the above string and select the entry that
has the fewest *'s:
alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci
5) modprobe the matched module name:
$ modprobe mlx5_vfio_pci
6) cat the matched module name to driver_override:
echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
7) unbind device from original module
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci)
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe
The algorithm is independent of bus type. In future the other buses with
VFIO device drivers, like platform and ACPI, can use this algorithm as
well.
This patch is the infrastructure to provide the information in the
modules.alias to userspace. Convert the only VFIO pci_driver which results
in one new line in the modules.alias:
alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci
Later series introduce additional HW specific VFIO PCI drivers, such as
mlx5_vfio_pci.
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]> # for pci.h
Signed-off-by: Yishai Hadas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Add 'override_only' field to struct pci_device_id to be used as part of
pci_match_device().
When set, a driver only matches the entry when dev->driver_override is
set to that driver.
In addition, add a helper macro named 'PCI_DEVICE_DRIVER_OVERRIDE' to
enable setting some data on it.
Next patch from this series will use the above functionality.
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alex Williamson <[email protected]>
|
|
Factor out force_sig_seccomp from the seccomp signal generation and
place it in kernel/signal.c. The function force_sig_seccomp takes a
parameter force_coredump to indicate that the sigaction field should
be reset to SIGDFL so that a coredump will be generated when the
signal is delivered.
force_sig_seccomp is then used to replace both seccomp_send_sigsys
and seccomp_init_siginfo.
force_sig_info_to_task gains an extra parameter to force using
the default signal action.
With this change seccomp is no longer a special case and there
becomes exactly one place do_coredump is called from.
Further it no longer becomes necessary for __seccomp_filter
to call do_group_exit.
Acked-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/87r1gr6qc4.fsf_-_@disp2133
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
|
|
|
|
Oltean <[email protected]>:
The ls-extirq irqchip driver accesses regmap inside its implementation
of the struct irq_chip :: irq_set_type method, and currently regmap
only knows to lock using normal spinlocks. But the method above wants
raw spinlock context, so this isn't going to work and triggers a
"[ BUG: Invalid wait context ]" splat.
The best we can do given the arrangement of the code is to patch regmap
and the syscon driver: regmap to support raw spinlocks, and syscon to
request them on behalf of its ls-extirq consumer.
Link: https://lore.kernel.org/lkml/20210825135438.ubcuxm5vctt6ne2q@skbuf/T/#u
Vladimir Oltean (2):
regmap: teach regmap to use raw spinlocks if requested in the config
mfd: syscon: request a regmap with raw spinlocks for some devices
drivers/base/regmap/internal.h | 4 ++++
drivers/base/regmap/regmap.c | 35 +++++++++++++++++++++++++++++-----
drivers/mfd/syscon.c | 16 ++++++++++++++++
include/linux/regmap.h | 2 ++
4 files changed, 52 insertions(+), 5 deletions(-)
--
2.25.1
base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
|
|
Some drivers might access regmap in a context where a raw spinlock is
held. An example is drivers/irqchip/irq-ls-extirq.c, which calls
regmap_update_bits() from struct irq_chip :: irq_set_type, which is a
method called by __irq_set_trigger() under the desc->lock raw spin lock.
Since desc->lock is a raw spin lock and the regmap internal lock for
mmio is a plain spinlock (which can become sleepable on RT), this is an
invalid locking scheme and we get a splat stating that this is a
"[ BUG: Invalid wait context ]".
It seems reasonable for regmap to have an option use a raw spinlock too,
so add that in the config such that drivers can request it.
Suggested-by: Mark Brown <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
A few more things:
* Use correct DFS domain for self-managed devices
* some preparations for transmit power element handling
and other 6 GHz regulatory handling
* TWT support in AP mode in mac80211
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
IEEE Std 802.11ax™-2021 makes changes to the transmit power envelope
element, adjust the code accordingly.
Signed-off-by: Wen Gong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
IEEE Std 802.11ax™-2021 added regulatory info subfield in HE operation
element, add it to the header file.
Signed-off-by: Wen Gong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
ssh://git.freedesktop.org/git/tegra/linux into drm-next
drm/tegra: Changes for v5.15-rc1
The bulk of these changes is a more modern ABI that can be efficiently
used on newer SoCs as well as older ones. The userspace parts for this
are available here:
- libdrm support: https://gitlab.freedesktop.org/tagr/drm/-/commits/drm-tegra-uabi-v8
- VAAPI driver: https://github.com/cyndis/vaapi-tegra-driver
In addition, existing userspace from the grate reverse-engineering
project has been updated to use this new ABI:
- X11 driver: https://github.com/grate-driver/xf86-video-opentegra
- 3D driver: https://github.com/grate-driver/grate
Other than that, there's also support for display memory bandwidth
management for various generations and a bit of cleanup.
Signed-off-by: Dave Airlie <[email protected]>
From: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
This commit fixes linker errors along the lines of:
s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'`
Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c
since there exists code that unconditionally uses btf_task_struct_ids.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz
|
|
In testing mounts to Macs, noticed that the OIDS for some
GSSAPI/SPNEGO auth mechanisms sent by the server were not
recognized and were missing from the header.
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
No need to have it defined 5 times. Once is enough.
Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
|
|
Same as BTF_ID_LIST_SINGLE macro except defines a global ID.
Signed-off-by: Daniel Xu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/a867a97517df42fd3953eeb5454402b57e74538f.1629772842.git.dxu@dxuuu.xyz
|
|
|
|
Move the cookie debug ID from struct netfs_read_request to struct
netfs_cache_resources and drop the 'cookie_' prefix. This makes it
available for things that want to use netfs_cache_resources without having
a netfs_read_request.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/162431190784.2908479.13386972676539789127.stgit@warthog.procyon.org.uk/
|
|
Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.
Signed-off-by: Pavel Begunkov <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Acked-by: David S. Miller <[email protected]>
Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Introduced in commit 38b5beeae7a4 ("net: dsa: sja1105: prepare tagger
for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
function solved quite a different problem than our needs are now.
Then, we used best-effort VLAN filtering and we were using the xmit_tpid
to tunnel packets coming from an 8021q upper through the TX VLAN allocated
by tag_8021q to that egress port. The need for a different VLAN protocol
depending on switch revision came from the fact that this in itself was
more of a hack to trick the hardware into accepting tunneled VLANs in
the first place.
Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
supported them again, we would not do that using the same method of
{tunneling the VLAN on egress, retagging the VLAN on ingress} that we
had in the best-effort VLAN filtering mode. It seems rather simpler that
we just allocate a VLAN in the VLAN table that is simply not used by the
bridge at all, or by any other port.
Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
1. When sending packets on behalf of a VLAN-aware bridge (with the new
TX forwarding offload framework) plus untagged (with the tag_8021q
VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
through the DSA master have a VLAN protocol of 0x8100 and others of
0x88a8. This is strange and there is no reason for it now. If we have
a bridge and are therefore forced to send using that bridge's TPID,
we can as well blend with that bridge's VLAN protocol for all packets.
2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
because it looks inside dp->priv. It is desirable to keep as much
separation between taggers and switch drivers as possible. Now it
doesn't do that anymore.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The sja1105 driver is a bit special in its use of VLAN headers as DSA
tags. This is because in VLAN-aware mode, the VLAN headers use an actual
TPID of 0x8100, which is understood even by the DSA master as an actual
VLAN header.
Furthermore, control packets such as PTP and STP are transmitted with no
VLAN header as a DSA tag, because, depending on switch generation, there
are ways to steer these control packets towards a precise egress port
other than VLAN tags. Transmitting control packets as untagged means
leaving a door open for traffic in general to be transmitted as untagged
from the DSA master, and for it to traverse the switch and exit a random
switch port according to the FDB lookup.
This behavior is a bit out of line with other DSA drivers which have
native support for DSA tagging. There, it is to be expected that the
switch only accepts DSA-tagged packets on its CPU port, dropping
everything that does not match this pattern.
We perhaps rely a bit too much on the switches' hardware dropping on the
CPU port, and place no other restrictions in the kernel data path to
avoid that. For example, sja1105 is also a bit special in that STP/PTP
packets are transmitted using "management routes"
(sja1105_port_deferred_xmit): when sending a link-local packet from the
CPU, we must first write a SPI message to the switch to tell it to
expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
it towards port 3 when it gets it. This entry expires as soon as it
matches a packet received by the switch, and it needs to be reinstalled
for the next packet etc. All in all quite a ghetto mechanism, but it is
all that the sja1105 switches offer for injecting a control packet.
The driver takes a mutex for serializing control packets and making the
pairs of SPI writes of a management route and its associated skb atomic,
but to be honest, a mutex is only relevant as long as all parties agree
to take it. With the DSA design, it is possible to open an AF_PACKET
socket on the DSA master net device, and blast packets towards
01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
it all goes kaput because management routes installed by the driver will
match skbs sent by the DSA master, and not skbs generated by the driver
itself. So they will end up being routed on the wrong port.
So through the lens of that, maybe it would make sense to avoid that
from happening by doing something in the network stack, like: introduce
a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
dev_hard_start_xmit(), introduce the following check:
if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
kfree_skb(skb);
Ok, maybe that is a bit drastic, but that would at least prevent a bunch
of problems. For example, right now, even though the majority of DSA
switches drop packets without DSA tags sent by the DSA master (and
therefore the majority of garbage that user space daemons like avahi and
udhcpcd and friends create), it is still conceivable that an aggressive
user space program can open an AF_PACKET socket and inject a spoofed DSA
tag directly on the DSA master. We have no protection against that; the
packet will be understood by the switch and be routed wherever user
space says. Furthermore: there are some DSA switches where we even have
register access over Ethernet, using DSA tags. So even user space
drivers are possible in this way. This is a huge hole.
However, the biggest thing that bothers me is that udhcpcd attempts to
ask for an IP address on all interfaces by default, and with sja1105, it
will attempt to get a valid IP address on both the DSA master as well as
on sja1105 switch ports themselves. So with IP addresses in the same
subnet on multiple interfaces, the routing table will be messed up and
the system will be unusable for traffic until it is configured manually
to not ask for an IP address on the DSA master itself.
It turns out that it is possible to avoid that in the sja1105 driver, at
least very superficially, by requesting the switch to drop VLAN-untagged
packets on the CPU port. With the exception of control packets, all
traffic originated from tag_sja1105.c is already VLAN-tagged, so only
STP and PTP packets need to be converted. For that, we need to uphold
the equivalence between an untagged and a pvid-tagged packet, and to
remember that the CPU port of sja1105 uses a pvid of 4095.
Now that we drop untagged traffic on the CPU port, non-aggressive user
space applications like udhcpcd stop bothering us, and sja1105 effectively
becomes just as vulnerable to the aggressive kind of user space programs
as other DSA switches are (ok, users can also create 8021q uppers on top
of the DSA master in the case of sja1105, but in future patches we can
easily deny that, but it still doesn't change the fact that VLAN-tagged
packets can still be injected over raw sockets).
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-08-24
Vinicius Costa Gomes says:
This adds support for PCIe PTM (Precision Time Measurement) to the igc
driver. PCIe PTM allows the NIC and Host clocks to be compared more
precisely, improving the clock synchronization accuracy.
Patch 1/4 reverts a commit that made pci_enable_ptm() private to the
PCI subsystem, reverting makes it possible for it to be called from
the drivers.
Patch 2/4 adds the pcie_ptm_enabled() helper.
Patch 3/4 calls pci_enable_ptm() from the igc driver.
Patch 4/4 implements the PCIe PTM support. Exposing it via the
.getcrosststamp() API implies that the time measurements are made
synchronously with the ioctl(). The hardware was implemented so the
most convenient way to retrieve that information would be
asynchronously. So, to follow the expectations of the ioctl() we have
to use less convenient ways, triggering an PCIe PTM dialog every time
a ioctl() is received.
Some questions are raised (also pointed out in the commit message):
1. Using convert_art_ns_to_tsc() is too x86 specific, there should be
a common way to create a 'system_counterval_t' from a timestamp.
2. convert_art_ns_to_tsc() says that it should only be used when
X86_FEATURE_TSC_KNOWN_FREQ is true, but during tests it works even
when it returns false. Should that check be done?
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.
The source of most of the slow down is the `dev_addr_lists.c` module,
which mainatins a linked list of HW addresses.
When using IPv6, this list grows for each IPv6 address added on a
VLAN, since each IPv6 address has a multicast HW address associated with
it.
When performing any modification to the involved links, this list is
traversed many times, often for nothing, all while holding the RTNL
lock.
Instead, this patch adds an auxilliary rbtree which cuts down
traversal time significantly.
Performance can be seen with the following script:
#!/bin/bash
ip netns del test || true 2>/dev/null
ip netns add test
echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null
set -e
ip -n test link add foo type veth peer name bar
ip -n test link add b1 type bond
ip -n test link add florp type vrf table 10
ip -n test link set bar master b1
ip -n test link set foo up
ip -n test link set bar up
ip -n test link set b1 up
ip -n test link set florp up
VLAN_COUNT=1500
BASE_DEV=b1
echo Creating vlans
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"
echo Bringing them up
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link set foo.\$i up; done"
echo Assiging IPv6 Addresses
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test address add dev foo.\$i 2000::\$i/64; done"
echo Attaching to VRF
ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
do ip -n test link set foo.\$i master florp; done"
On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance
before the patch is (truncated):
Creating vlans
real 108.35
Bringing them up
real 4.96
Assiging IPv6 Addresses
real 19.22
Attaching to VRF
real 458.84
After the patch:
Creating vlans
real 5.59
Bringing them up
real 5.07
Assiging IPv6 Addresses
real 5.64
Attaching to VRF
real 25.37
Cc: David S. Miller <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Lu Wei <[email protected]>
Cc: Xiongfeng Wang <[email protected]>
Cc: Taehee Yoo <[email protected]>
Signed-off-by: Gilad Naaman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
orig_nents should represent the number of entries with pages,
but __sg_alloc_table_from_pages sets orig_nents as the number of
total entries in the table. This is wrong when the API is used for
dynamic allocation where not all the table entries are mapped with
pages. It wasn't observed until now, since RDMA umem who uses this
API in the dynamic form doesn't use orig_nents implicit or explicit
by the scatterlist APIs.
Fix it by changing the append API to track the SG append table
state and have an API to free the append table according to the
total number of entries in the table.
Now all APIs set orig_nents as number of enries with pages.
Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Make the forward declarations of elfcorehdr_addr and elfcorehdr_size,
and the definitions of ELFCORE_ADDR_MAX and ELFCORE_ADDR_ERR always
available, like is done for phys_initrd_start and phys_initrd_size.
Code referring to these symbols can then just check for
IS_ENABLED(CONFIG_CRASH_DUMP), instead of requiring conditional
compilation using an #ifdef, thus preparing to increase compile
coverage.
Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/ba965ca613c0cc82c1ec2fe353ee34fb13b36474.1628670468.git.geert+renesas@glider.be
|
|
Add a predicate that returns if PCIe PTM (Precision Time Measurement)
is enabled.
It will only return true if it's enabled in all the ports in the path
from the device to the root.
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
RDMA is the only in-kernel user that uses __sg_alloc_table_from_pages to
append pages dynamically. In the next patch. That mode will be extended
and that function will get more parameters. So separate it into a unique
function to make such change more clear.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Make pci_enable_ptm() accessible from the drivers.
Exposing this to the driver enables the driver to use the
'ptm_enabled' field of 'pci_dev' to check if PTM is enabled or not.
This reverts commit ac6c26da29c1 ("PCI: Make pci_enable_ptm() private").
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
./include/linux/libata.h:1462:8-9:WARNING: return of 0/1 in function
'ata_is_host_link' with return type bool
Return statements in functions returning bool should use true/false
instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Jing Yangyang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
blkdev_fsync is only used inside of block_dev.c since the
removal of the raw drіver.
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Support generic alternative_gpt_sector() block device operation.
It calculates location of GPT entry for eMMC of NVIDIA Tegra Android
devices. Add new MMC_CAP2_ALT_GPT_TEGRA flag that enables scanning of
alternative GPT sector and add raw_boot_mult field to mmc_ext_csd
which allows to get size of the boot partitions that is needed for
the calculation.
Signed-off-by: Dmitry Osipenko <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Add alternative_gpt_sector() block device operation which specifies
alternative location of a GPT entry. This allows us to support Android
devices that have GPT entry at a non-standard location and can't be
repartitioned easily.
Reviewed-by: Christoph Hellwig <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dmitry Osipenko <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.
Signed-off-by: Yufeng Mo <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently, there are many drivers who support CQE mode configuration,
some configure it as a fixed when initialized, some provide an
interface to change it by ethtool private flags. In order to make it
more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and
'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these
parameters can be accessed by ethtool netlink coalesce uAPI.
Also add an new structure kernel_ethtool_coalesce, then the
new parameter can be added into this struct.
Signed-off-by: Yufeng Mo <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|