aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-12-04net-zerocopy: Defer vm zap unless actually needed.Arjun Roy1-0/+2
Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range(). This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation. It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount. An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in. When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficency improvement of about 30% for QPS/CPU usage. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.Arjun Roy1-0/+2
When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04bpf: Add a bpf_sock_from_file helperFlorent Revest1-0/+9
While eBPF programs can check whether a file is a socket by file->f_op == &socket_file_ops, they cannot convert the void private_data pointer to a struct socket BTF pointer. In order to do this a new helper wrapping sock_from_file is added. This is useful to tracing programs but also other program types inheriting this set of helpers such as iterators or LSM programs. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: KP Singh <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-04net: Remove the err argument from sock_from_fileFlorent Revest1-1/+1
Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: KP Singh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-04seg6: add support for the SRv6 End.DT4 behaviorAndrea Mayer1-0/+1
SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants environments. It decapsulates the received packets and it performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior. The SRv6 End.DT4 behavior can be instantiated using a command similar to the following: $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0 We introduce the "vrftable" extension in iproute2 in a following patch. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 Signed-off-by: Andrea Mayer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04Merge tag 'for-5.10/dm-fixes' of ↵Linus Torvalds1-5/+6
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM's bio splitting changes that were made during v5.9. This restores splitting in terms of varied per-target ti->max_io_len rather than use block core's single stacked 'chunk_sectors' limit. - Like DM crypt, update DM integrity to not use crypto drivers that have CRYPTO_ALG_ALLOCATES_MEMORY set. - Fix DM writecache target's argument parsing and status display. - Remove needless BUG() from dm writecache's persistent_memory_claim() - Remove old gcc workaround in DM cache target's block_div() for ARM link errors now that gcc >= 4.9 is required. - Fix RCU locking in dm_blk_report_zones and dm_dax_zero_page_range. - Remove old, and now frowned upon, BUG_ON(in_interrupt()) in dm_table_event(). - Remove invalid sparse annotations from dm_prepare_ioctl() and dm_unprepare_ioctl(). * tag 'for-5.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: remove invalid sparse __acquires and __releases annotations dm: fix double RCU unlock in dm_dax_zero_page_range() error path dm: fix IO splitting dm writecache: remove BUG() and fail gracefully instead dm table: Remove BUG_ON(in_interrupt()) dm: fix bug with RCU locking in dm_blk_report_zones Revert "dm cache: fix arm link errors with inline" dm writecache: fix the maximum number of arguments dm writecache: advance the number of arguments when reporting max_age dm integrity: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
2020-12-04PCI: Return u16 from pci_find_ext_capability() and similarBjorn Helgaas1-3/+3
PCI Express Extended Capabilities are in config space between offsets 256 and 4K. These offsets all fit in 16 bits. Change the return type of pci_find_ext_capability() and supporting functions from int to u16 to match the specification. Many callers use "int", which is fine, but there's no need to store more than a u16. Signed-off-by: Bjorn Helgaas <[email protected]>
2020-12-04PCI: Return u8 from pci_find_capability() and similarPuranjay Mohan1-6/+6
PCI Capabilities are linked in a list that must appear in the first 256 bytes of config space. Each capabilities list pointer is 8 bits. Change the return type of pci_find_capability() and supporting functions from int to u8 to match the specification. [bhelgaas: change other related interfaces, fix HyperTransport typos] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Puranjay Mohan <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2020-12-04Merge tag 'auxbus-5.11-rc1' of ↵Mark Brown55-1977/+346
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into asoc-5.11 Auxiliary Bus support tag for 5.11-rc1 This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04dm: fix IO splittingMike Snitzer1-5/+6
Commit 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple regressions: 1) Using lcm_not_zero() when stacking chunk_sectors was a bug because chunk_sectors must reflect the most limited of all devices in the IO stack. 2) DM targets that set max_io_len but that do _not_ provide an .iterate_devices method no longer had there IO split properly. And commit 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per target) IO splitting. The implication being the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide performance limitations of a slower device (e.g. one that requires 4K IO splitting). Coming full circle: Fix all these issues by discontinuing stacking chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add optional chunk_sectors override argument to blk_max_size_offset() and update DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing in an optional chunk_sectors override to blk_max_size_offset() allows for code reuse of block's centralized calculation for max IO size based on provided offset and split boundary. Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") Fixes: 5091cdec56fa ("dm: change max_io_len() to use blk_max_size_offset()") Cc: [email protected] Reported-by: John Dorminy <[email protected]> Reported-by: Bruce Johnston <[email protected]> Reported-by: Kirill Tkhai <[email protected]> Reviewed-by: John Dorminy <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Reviewed-by: Jens Axboe <[email protected]>
2020-12-04bpf: Remove trailing semicolon in macro definitionTom Rix1-6/+6
The macro use will already have a semicolon. Clean up escaped newlines. Signed-off-by: Tom Rix <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-04Merge tag 'wireless-drivers-next-2020-12-03' of ↵Jakub Kicinski1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.11 First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other driver also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree. Major changes: rtw88 * major bluetooth co-existance improvements wilc1000 * Wi-Fi Multimedia (WMM) support ath11k * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support * qcom,ath11k-calibration-variant Device Tree setting * cold boot calibration support * new DFS region: JP wnc36xx * enable connection monitoring and keepalive in firmware ath10k * firmware IRAM recovery feature mhi * merge mhi-ath11k-immutable branch to make MHI API change go smoothly * tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (180 commits) wl1251: remove trailing semicolon in macro definition airo: remove trailing semicolon in macro definition wilc1000: added queue support for WMM wilc1000: call complete() for failure in wilc_wlan_txq_add_cfg_pkt() wilc1000: free resource in wilc_wlan_txq_add_mgmt_pkt() for failure path wilc1000: free resource in wilc_wlan_txq_add_net_pkt() for failure path wilc1000: added 'ndo_set_mac_address' callback support brcmfmac: expose firmware config files through modinfo wlcore: Switch to using the new API kobj_to_dev() rtw88: coex: add feature to enhance HID coexistence performance rtw88: coex: upgrade coexistence A2DP mechanism rtw88: coex: add action for coexistence in hardware initial rtw88: coex: add function to avoid cck lock rtw88: coex: change the coexistence mechanism for WLAN connected rtw88: coex: change the coexistence mechanism for HID rtw88: coex: update AFH information while in free-run mode rtw88: coex: update the mechanism for A2DP + PAN rtw88: coex: add debug message rtw88: coex: run coexistence when WLAN entering/leaving LPS Revert "rtl8xxxu: Add Buffalo WI-U3-866D to list of supported devices" ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities()Sean V Kelley1-0/+4
Extend support for Root Complex Event Collectors by decoding and caching the RCEC Endpoint Association Extended Capabilities when enumerating. Use that cached information for later error source reporting. See PCIe r5.0, sec 7.9.10. Co-developed-by: Qiuxu Zhuo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Jonathan Cameron <[email protected]> # non-native/no RCEC Signed-off-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Sean V Kelley <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]>
2020-12-04PCI/ERR: Bind RCEC devices to the Root Port driverQiuxu Zhuo2-0/+8
If a Root Complex Integrated Endpoint (RCiEP) is implemented, it may signal errors through a Root Complex Event Collector (RCEC). Each RCiEP must be associated with no more than one RCEC. For an RCEC (which is technically not a Bridge), error messages "received" from associated RCiEPs must be enabled for "transmission" in order to cause a System Error via the Root Control register or (when the Advanced Error Reporting Capability is present) reporting via the Root Error Command register and logging in the Root Error Status register and Error Source Identification register. Given the commonality with Root Ports and the need to also support AER and PME services for RCECs, extend the Root Port driver to support RCEC devices by adding the RCEC Class ID to the driver structure. Co-developed-by: Sean V Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Jonathan Cameron <[email protected]> # non-native/no RCEC Signed-off-by: Sean V Kelley <[email protected]> Signed-off-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
2020-12-04block: remove the request_queue to argument request based tracepointsChristoph Hellwig2-21/+14
The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04block: remove the request_queue argument to the block_bio_remap tracepointChristoph Hellwig1-5/+3
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04block: remove the request_queue argument to the block_split tracepointChristoph Hellwig1-8/+6
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04block: simplify and extend the block_bio_merge tracepoint classChristoph Hellwig1-125/+33
The block_bio_merge tracepoint class can be reused for most bio-based tracepoints. For that it just needs to lose the superfluous q and rq parameters. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04block: remove the unused block_sleeprq tracepointChristoph Hellwig1-18/+0
The block_sleeprq tracepoint was only used by the legacy request code. Remove it now that the legacy request code is gone. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04tty: Fix ->session lockingJann Horn1-0/+4
Currently, locking of ->session is very inconsistent; most places protect it using the legacy tty mutex, but disassociate_ctty(), __do_SAK(), tiocspgrp() and tiocgsid() don't. Two of the writers hold the ctrl_lock (because they already need it for ->pgrp), but __proc_set_tty() doesn't do that yet. On a PREEMPT=y system, an unprivileged user can theoretically abuse this broken locking to read 4 bytes of freed memory via TIOCGSID if tiocgsid() is preempted long enough at the right point. (Other things might also go wrong, especially if root-only ioctls are involved; I'm not sure about that.) Change the locking on ->session such that: - tty_lock() is held by all writers: By making disassociate_ctty() hold it. This should be fine because the same lock can already be taken through the call to tty_vhangup_session(). The tricky part is that we need to shorten the area covered by siglock to be able to take tty_lock() without ugly retry logic; as far as I can tell, this should be fine, since nothing in the signal_struct is touched in the `if (tty)` branch. - ctrl_lock is held by all writers: By changing __proc_set_tty() to hold the lock a little longer. - All readers that aren't holding tty_lock() hold ctrl_lock: By adding locking to tiocgsid() and __do_SAK(), and expanding the area covered by ctrl_lock in tiocspgrp(). Cc: [email protected] Signed-off-by: Jann Horn <[email protected]> Reviewed-by: Jiri Slaby <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04tty: Remove dead termiox codeJann Horn2-10/+0
set_termiox() and the TCGETX handler bail out with -EINVAL immediately if ->termiox is NULL, but there are no code paths that can set ->termiox to a non-NULL pointer; and no such code paths seem to have existed since the termiox mechanism was introduced back in commit 1d65b4a088de ("tty: Add termiox") in v2.6.28. Similarly, no driver actually implements .set_termiox; and it looks like no driver ever has. Delete this dead code; but leave the definition of struct termiox in the UAPI headers intact. Signed-off-by: Jann Horn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski18-106/+399
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04Merge tag 'mhi-for-v5.11' of ↵Greg Kroah-Hartman1-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI patches for v5.11 Here is the MHI patch set for v5.11. Most of the patches are cleanups and fixes but there are some noticeable changes too: 1. Loic finally removed the auto-start option from the channel parameters of the MHI controller. It is the duty of the client drivers like qrtr to start/stop the channels when required, so we decided to remove this option. As a side effect, we changed the qrtr driver to start the channels during its probe and removed the auto-start option from ath11k controller. **NOTE** Since these changes spawns both MHI and networking trees, the patches are maintained in an immutable branch [1] and pulled into both mhi-next and ath11k-next branches. The networking patches got acks from ath11k and networking maintainers as well. 2. Loic added a generic MHI pci controller driver. This driver will be used by the PCI based Qualcomm modems like SDX55 and exposes channels such as QMI, IP_HW0, IPCR etc... 3. Loic fixed the MHI device hierarchy by maintaining the correct parent child relationships. Earlier all MHI devices lived in the same level under the parent device like PCIe. But now, the MHI devices belonging to channels will become the children of controller MHI device. 4. Finally Loic also improved the MHI device naming by using indexed names such as mhi0, mhi1, etc... This will break the userspace applications depending on the old naming convention but since the only one user so far is Jeff Hugo's AI accelerator apps, we decided to make this change now itself with his agreement. 5. Bhaumik fixed the qrtr driver by stopping the channels during remove. This patch also got ack from networking maintainer and we decided to take it through MHI tree (via immutable branch) since we already had a qrtr change. [1] https://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git/log/?h=mhi-ath11k-immutable * tag 'mhi-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: (30 commits) mhi: pci_generic: Fix implicit conversion warning bus: mhi: core: Fix error handling in mhi_register_controller() bus: mhi: core: Fix device hierarchy bus: mhi: core: Indexed MHI controller name net: qrtr: Unprepare MHI channels during remove bus: mhi: core: Remove MHI event ring IRQ handlers when powering down bus: mhi: core: Mark and maintain device states early on after power down bus: mhi: core: Separate system error and power down handling bus: mhi: core: Check for IRQ availability during registration bus: mhi: core: Move to an error state on mission mode failure bus: mhi: core: Use appropriate label in firmware load handler API bus: mhi: core: Move to an error state on any firmware load failure bus: mhi: core: Prevent sending multiple RDDM entry callbacks bus: mhi: core: Move to SYS_ERROR regardless of RDDM capability bus: mhi: core: Skip device wake in error or shutdown states bus: mhi: core: Move to using high priority workqueue bus: mhi: core: Use appropriate names for firmware load functions bus: mhi: core: Skip RDDM download for unknown execution environment bus: mhi: core: Rename RDDM download function to use proper words bus: mhi: core: Remove unused mhi_fw_load_worker() declaration ...
2020-12-04of: fix linker-section match-table corruptionJohan Hovold1-0/+1
Specify type alignment when declaring linker-section match-table entries to prevent gcc from increasing alignment and corrupting the various tables with padding (e.g. timers, irqchips, clocks, reserved memory). This is specifically needed on x86 where gcc (typically) aligns larger objects like struct of_device_id with static extent on 32-byte boundaries which at best prevents matching on anything but the first entry. Specifying alignment when declaring variables suppresses this optimisation. Here's a 64-bit example where all entries are corrupt as 16 bytes of padding has been inserted before the first entry: ffffffff8266b4b0 D __clk_of_table ffffffff8266b4c0 d __of_table_fixed_factor_clk ffffffff8266b5a0 d __of_table_fixed_clk ffffffff8266b680 d __clk_of_table_sentinel And here's a 32-bit example where the 8-byte-aligned table happens to be placed on a 32-byte boundary so that all but the first entry are corrupt due to the 28 bytes of padding inserted between entries: 812b3ec0 D __irqchip_of_table 812b3ec0 d __of_table_irqchip1 812b3fa0 d __of_table_irqchip2 812b4080 d __of_table_irqchip3 812b4160 d irqchip_of_match_end Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte alignment), and on arm using gcc-7.2. Note that there are no in-tree users of these tables on x86 currently (even if they are included in the image). Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations") Fixes: f6e916b82022 ("irqchip: add basic infrastructure") Cc: stable <[email protected]> # 3.9 Signed-off-by: Johan Hovold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04earlycon: simplify earlycon-table implementationJohan Hovold1-13/+7
Instead of using the array-of-pointers trick to avoid having gcc mess up the earlycon array stride, specify type alignment when declaring entries to prevent gcc from increasing alignment. This is essentially an alternative (one-line) fix to the problem addressed by commit dd709e72cb93 ("earlycon: Use a pointer table to fix __earlycon_table stride"). gcc can increase the alignment of larger objects with static extent as an optimisation, but this can be suppressed by using the aligned attribute when declaring variables. Note that we have been relying on this behaviour for kernel parameters for 16 years and it indeed hasn't changed since the introduction of the aligned attribute in gcc-3.1. Signed-off-by: Johan Hovold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04net/mlx5: Register mlx5 devices to auxiliary virtual busLeon Romanovsky1-3/+23
Create auxiliary devices under new virtual bus. This will replace the custom-made mlx5 ->add()/->remove() interfaces and next patches will fill the missing callback and remove the old interface logic. The attachment of auxiliary drivers to the devices is possible in 1-to-1 manner only and it requires us to create device for every protocol, so that device (module) will be able to connect to it. System with 2 IB and 1 RoCE cards: [leonro@vm ~]$ lspci |grep nox 00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] 00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2 [leonro@vm ~]$ rdma dev 0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456 2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457 System with RoCE SR-IOV card with 4 VFs: [leonro@vm ~]$ lspci |grep nox 01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6] 01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] 01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function] [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/ mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2 mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3 mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2 mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3 mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2 mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3 mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4 [leonro@vm ~]$ rdma dev 0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456 2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457 3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458 4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459 Signed-off-by: Leon Romanovsky <[email protected]>
2020-12-04vdpa/mlx5: Make hardware definitions visible to all mlx5 devicesLeon Romanovsky1-0/+166
Move mlx5_vdpa IFC header file to the general include folder, so mlx5_core will be able to reuse it to check if VDPA is supported prior to creating an auxiliary device. As part of this move, update the header file name to mlx5 general naming scheme. Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2020-12-04net/mlx5_core: Clean driver version and nameLeon Romanovsky1-0/+2
Remove exposed driver version as it was done in other drivers, so module version will work correctly by displaying the kernel version for which it is compiled. And move mlx5_core module name to general include, so auxiliary drivers will be able to use it as a basis for a name in their device ID tables. Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2020-12-04Merge tag 'auxbus-5.11-rc1' of ↵Leon Romanovsky2-0/+85
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into mlx5-next Auxiliary Bus support tag for 5.11-rc1 This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge. Signed-off-by: Greg Kroah-Hartman <[email protected]> * tag 'auxbus-5.11-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: auxiliary bus: minor coding style tweaks driver core: auxiliary bus: make remove function return void driver core: auxiliary bus: move slab.h from include file Add auxiliary bus support
2020-12-04Merge tag 'auxbus-5.11-rc1' of ↵Greg Kroah-Hartman2-0/+85
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into driver-core-next Auxiliary Bus support tag for 5.11-rc1 This is a signed tag for other subsystems to be able to pull in the auxiliary bus support into their trees for the 5.11-rc1 merge. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04driver core: auxiliary bus: minor coding style tweaksGreg Kroah-Hartman1-3/+3
For some reason, the original aux bus patch had some really long lines in a few places, probably due to it being a very long-lived patch in development by many different people. Fix that up so that the two files all have the same length lines and function formatting styles. Cc: Dan Williams <[email protected]> Cc: Dave Ertman <[email protected]> Cc: Fred Oh <[email protected]> Cc: Kiran Patil <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Martin Habets <[email protected]> Cc: Parav Pandit <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Cc: Ranjani Sridharan <[email protected]> Cc: Shiraz Saleem <[email protected]> Link: https://lore.kernel.org/r/X8oiSFTpYHw1xE/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04driver core: auxiliary bus: make remove function return voidGreg Kroah-Hartman1-1/+1
There's an effort to move the remove() callback in the driver core to not return an int, as nothing can be done if this function fails. To make that effort easier, make the aux bus remove function void to start with so that no users have to be changed sometime in the future. Cc: Dan Williams <[email protected]> Cc: Dave Ertman <[email protected]> Cc: Fred Oh <[email protected]> Cc: Kiran Patil <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Martin Habets <[email protected]> Cc: Parav Pandit <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Cc: Ranjani Sridharan <[email protected]> Cc: Shiraz Saleem <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04driver core: auxiliary bus: move slab.h from include fileGreg Kroah-Hartman1-1/+0
No need to include slab.h in include/linux/auxiliary_bus.h, as it is not needed there. Move it to drivers/base/auxiliary.c instead. Cc: Dan Williams <[email protected]> Cc: Dave Ertman <[email protected]> Cc: Fred Oh <[email protected]> Cc: Kiran Patil <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Martin Habets <[email protected]> Cc: Parav Pandit <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Cc: Ranjani Sridharan <[email protected]> Cc: Shiraz Saleem <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04mmc: tmio: set max_busy_timeoutWolfram Sang1-1/+6
Set max_busy_timeouts for variants known to support the TOPxx bits in the SD_OPTION register. The timeout mechanism was running in the background but not yet properly handled in the driver. So, let the MMC core know when to not use R1B to avoid unhandled timeouts. My datasheets for older variants (tmio_mmc.c) suggest that they support it, too. However, actual bit descriptions are lacking, so I chose an opt-in approach. Signed-off-by: Wolfram Sang <[email protected]> Reviewed-by: Yoshihiro Shimoda <[email protected]> Tested-by: Yoshihiro Shimoda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2020-12-04Add auxiliary bus supportDave Ertman2-0/+86
Add support for the Auxiliary Bus, auxiliary_device and auxiliary_driver. It enables drivers to create an auxiliary_device and bind an auxiliary_driver to it. The bus supports probe/remove shutdown and suspend/resume callbacks. Each auxiliary_device has a unique string based id; driver binds to an auxiliary_device based on this id through the bus. Co-developed-by: Kiran Patil <[email protected]> Co-developed-by: Ranjani Sridharan <[email protected]> Co-developed-by: Fred Oh <[email protected]> Co-developed-by: Leon Romanovsky <[email protected]> Signed-off-by: Kiran Patil <[email protected]> Signed-off-by: Ranjani Sridharan <[email protected]> Signed-off-by: Fred Oh <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Reviewed-by: Pierre-Louis Bossart <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Martin Habets <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]> Link: https://lore.kernel.org/r/160695681289.505290.8978295443574440604.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04fs, close_range: add flag CLOSE_RANGE_CLOEXECGiuseppe Scrivano1-0/+3
When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't immediately close the files but it sets the close-on-exec bit. It is useful for e.g. container runtimes that usually install a seccomp profile "as late as possible" before execv'ing the container process itself. The container runtime could either do: 1 2 - install_seccomp_profile(); - close_range(MIN_FD, MAX_INT, 0); - close_range(MIN_FD, MAX_INT, 0); - install_seccomp_profile(); - execve(...); - execve(...); Both alternative have some disadvantages. In the first variant the seccomp_profile cannot block the close_range syscall, as well as opendir/read/close/... for the fallback on older kernels. In the second variant, close_range() can be used only on the fds that are not going to be needed by the runtime anymore, and it must be potentially called multiple times to account for the different ranges that must be closed. Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the existing open fds, the seccomp profile can block close_range() and the syscalls used for its fallback. Signed-off-by: Giuseppe Scrivano <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-12-04Merge remote-tracking branch 'origin/kvm-arm64/misc-5.11' into ↵Marc Zyngier2-0/+5
kvmarm-master/queue Signed-off-by: Marc Zyngier <[email protected]>
2020-12-04psci: Add accessor for psci_0_1_function_idsDavid Brazdil1-0/+9
Make it possible to retrieve a copy of the psci_0_1_function_ids struct. This is useful for KVM if it is configured to intercept host's PSCI SMCs. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-04batman-adv: Allow selection of routing algorithm over rtnetlinkSven Eckelmann1-0/+6
A batadv net_device is associated to a B.A.T.M.A.N. routing algorithm. This algorithm has to be selected before the interface is initialized and cannot be changed after that. The only way to select this algorithm was a module parameter which specifies the default algorithm used during the creation of the net_device. This module parameter is writeable over /sys/module/batman_adv/parameters/routing_algo and thus allows switching of the routing algorithm: 1. change routing_algo parameter 2. create new batadv net_device But this is not race free because another process can be scheduled between 1 + 2 and in that time frame change the routing_algo parameter again. It is much cleaner to directly provide this information inside the rtnetlink's RTM_NEWLINK message. The two processes would be (in regards of the creation parameter of their batadv interfaces) be isolated. This also eases the integration of batadv devices inside tools like network-manager or systemd-networkd which are not expecting to operate on /sys before a new net_device is created. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-12-04batman-adv: Prepare infrastructure for newlink settingsSven Eckelmann1-0/+20
The batadv generic netlink family can be used to retrieve the current state and set various configuration settings. But there are also settings which must be set before the actual interface is created. The rtnetlink already uses IFLA_INFO_DATA to allow net_device families to transfer such configurations. The minimal required functionality for this is now available for the batadv rtnl_link_ops. Also a new IFLA class of attributes will be attached to it because rtnetlink only allows 51 different attributes but batadv_nl_attrs already contains 62 attributes. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-12-04crypto: lib/blake2s - Move selftest prototype into header fileHerbert Xu1-0/+2
This patch fixes a missing prototype warning on blake2s_selftest. Reported-by: kernel test robot <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2020-12-03bpf: Allow to specify kernel module BTFs when attaching BPF programsAndrii Nakryiko2-1/+7
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03bpf: Remove hard-coded btf_vmlinux assumption from BPF verifierAndrii Nakryiko3-12/+34
Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03bpf: Adds support for setting window clampPrankur gupta1-0/+1
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP, which sets the maximum receiver window size. It will be useful for limiting receiver window based on RTT. Signed-off-by: Prankur gupta <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-6/+39
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03vfio-ccw: Wire in the request callbackEric Farman1-0/+1
The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-12-03vfio-mdev: Wire in a request handler for mdev parentEric Farman1-0/+4
While performing some destructive tests with vfio-ccw, where the paths to a device are forcible removed and thus the device itself is unreachable, it is rather easy to end up in an endless loop in vfio_del_group_dev() due to the lack of a request callback for the associated device. In this example, one MDEV (77c) is used by a guest, while another (77b) is not. The symptom is that the iommu is detached from the mdev for 77b, but not 77c, until that guest is shutdown: [ 238.794867] vfio_ccw 0.0.077b: MDEV: Unregistering [ 238.794996] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: Removing from iommu group 2 [ 238.795001] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: MDEV: detaching iommu [ 238.795036] vfio_ccw 0.0.077c: MDEV: Unregistering ...silence... Let's wire in the request call back to the mdev device, so that a device being physically removed from the host can be (gracefully?) handled by the parent device at the time the device is removed. Add a message when registering the device if a driver doesn't provide this callback, so a clue is given that this same loop may be encountered in a similar situation, and a message when this occurs instead of the awkward silence noted above. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-12-03Merge tag 'net-5.10-rc7' of ↵Linus Torvalds5-3/+30
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS" * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled net/mlx5: Fix wrong address reclaim when command interface is down net/sched: act_mpls: ensure LSE is pullable before reading it net: openvswitch: ensure LSE is pullable before reading it net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl net: mvpp2: Fix error return code in mvpp2_open() chelsio/chtls: fix a double free in chtls_setkey() rtw88: debug: Fix uninitialized memory in debugfs code vxlan: fix error return code in __vxlan_dev_create() net: pasemi: fix error return code in pasemi_mac_open() cxgb3: fix error return code in t3_sge_alloc_qset() net/x25: prevent a couple of overflows dpaa_eth: copy timestamp fields to new skb in A-050385 workaround net: ip6_gre: set dev->hard_header_len when using header_ops mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 ...
2020-12-03tcp: merge 'init_req' and 'route_req' functionsFlorian Westphal1-5/+4
The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send a TCP reset if the token in a MP_JOIN request is unknown. At this time we don't do this, the 3whs completes and the 'new subflow' is reset afterwards. There are two ways to allow MPTCP to send the reset. 1. override 'send_synack' callback and emit the rst from there. The drawback is that the request socket gets inserted into the listeners queue just to get removed again right away. 2. Send the reset from the 'route_req' function instead. This avoids the 'add&remove request socket', but route_req lacks the skb that is required to send the TCP reset. Instead of just adding the skb to that function for MPTCP sake alone, Paolo suggested to merge init_req and route_req functions. This saves one indirection from syn processing path and provides the skb to the merged function at the same time. 'send reset on unknown mptcp join token' is added in next patch. Suggested-by: Paolo Abeni <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03security: add const qualifier to struct sock in various placesFlorian Westphal3-4/+4
A followup change to tcp_request_sock_op would have to drop the 'const' qualifier from the 'route_req' function as the 'security_inet_conn_request' call is moved there - and that function expects a 'struct sock *'. However, it turns out its also possible to add a const qualifier to security_inet_conn_request instead. Signed-off-by: Florian Westphal <[email protected]> Acked-by: James Morris <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>