aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-02-15proc: don't allow async path resolution of /proc/thread-self componentsJens Axboe2-1/+8
If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again. Use PF_IO_WORKER for this to speciically target the io_uring workers. Modify the /proc/self/ check to use PF_IO_WORKER as well. Cc: [email protected] Fixes: 8d4c3e76e3be ("proc: don't allow async path resolution of /proc/self components") Reported-by: Eric W. Biederman <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-15Merge branches 'powercap' and 'pm-misc'Rafael J. Wysocki12-12/+1067
* powercap: powercap: intel_rapl: Use topology interface in rapl_init_domains() powercap: intel_rapl: Use topology interface in rapl_add_package() powercap/intel_rapl: add support for AlderLake Mobile powercap/drivers/dtpm: Fix size of object being allocated powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check powercap/drivers/dtpm: Fix some missing unlock bugs powercap/drivers/dtpm: Fix a double shift bug powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols powercap/drivers/dtpm: Add CPU energy model based support powercap/drivers/dtpm: Add API for dynamic thermal power management Documentation/powercap/dtpm: Add documentation for dtpm units: Add Watt units * pm-misc: PM: Kconfig: remove unneeded "default n" options PM: EM: update Kconfig description and drop "default n" option
2021-02-15binfmt_misc: pass binfmt_misc flags to the interpreterLaurent Vivier5-3/+19
It can be useful to the interpreter to know which flags are in use. For instance, knowing if the preserve-argv[0] is in use would allow to skip the pathname argument. This patch uses an unused auxiliary vector, AT_FLAGS, to add a flag to inform interpreter if the preserve-argv[0] is enabled. Note by Helge Deller: The real-world user of this patch is qemu-user, which needs to know if it has to preserve the argv[0]. See Debian bug #970460. Signed-off-by: Laurent Vivier <[email protected]> Reviewed-by: YunQiang Su <[email protected]> URL: http://bugs.debian.org/970460 Signed-off-by: Helge Deller <[email protected]>
2021-02-15netfilter: nftables: introduce table ownershipPablo Neira Ayuso3-46/+128
A userspace daemon like firewalld might need to monitor for netlink updates to detect its ruleset removal by the (global) flush ruleset command to ensure ruleset persistency. This adds extra complexity from userspace and, for some little time, the firewall policy is not in place. This patch adds the NFT_TABLE_F_OWNER flag which allows a userspace program to own the table that creates in exclusivity. Tables that are owned... - can only be updated and removed by the owner, non-owners hit EPERM if they try to update it or remove it. - are destroyed when the owner closes the netlink socket or the process is gone (implicit netlink socket closure). - are skipped by the global flush ruleset command. - are listed in the global ruleset. The userspace process that sets on the NFT_TABLE_F_OWNER flag need to leave open the netlink socket. A new NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID to identify the owner from userspace. This patch also updates error reporting when an unknown table flag is specified to change it from EINVAL to EOPNOTSUPP given that EINVAL is usually reserved to report for malformed netlink messages to userspace. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-02-15netfilter: nftables: add helper function to release hooks of one single tablePablo Neira Ayuso1-5/+10
Add a function to release the hooks of one single table. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-02-15netfilter: nftables: add helper function to release one tablePablo Neira Ayuso1-35/+40
Add a function to release one table. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-02-15Merge branch 'acpi-messages'Rafael J. Wysocki14-332/+242
* acpi-messages: ACPI: OSL: Clean up printing messages ACPI: OSL: Rework acpi_check_resource_conflict() ACPI: thermal: Clean up printing messages ACPI: video: Clean up printing messages ACPI: button: Clean up printing messages ACPI: battery: Clean up printing messages ACPI: AC: Clean up printing messages ACPI: bus: Drop ACPI_BUS_COMPONENT which is not used any more ACPI: utils: Clean up printing messages ACPI: scan: Clean up printing messages ACPI: bus: Clean up printing messages ACPI: PM: Clean up printing messages ACPI: power: Clean up printing messages
2021-02-15Merge branches 'acpi-misc', 'acpi-cppc', 'acpi-docs', 'acpi-config' and ↵Rafael J. Wysocki18-94/+99
'acpi-apei' * acpi-misc: ACPI: Test for ACPI_SUCCESS rather than !ACPI_FAILURE ACPI: Use DEVICE_ATTR_<RW|RO|WO> macros * acpi-cppc: ACPI: CPPC: initialise vaddr pointers to NULL ACPI: CPPC: add __iomem annotation to generic_comm_base pointer ACPI: CPPC: remove __iomem annotation for cpc_reg's address * acpi-docs: Documentation: ACPI: add new rule for gpio-line-names * acpi-config: ACPI: configfs: add missing check after configfs_register_default_group() * acpi-apei: ACPI: APEI: ERST: remove unneeded semicolon ACPI: APEI: Add is_generic_error() to identify GHES sources
2021-02-15Merge branches 'acpi-scan', 'acpi-properties' and 'acpi-platform'Rafael J. Wysocki10-118/+389
* acpi-scan: ACPI: scan: Rearrange code related to acpi_get_device_data() ACPI: scan: Adjust white space in acpi_device_add() ACPI: scan: Rearrange memory allocation in acpi_device_add() * acpi-properties: ACPI: property: Satisfy kernel doc validator (part 2) ACPI: property: Satisfy kernel doc validator (part 1) ACPI: property: Make acpi_node_prop_read() static ACPI: property: Remove dead code ACPI: property: Fix fwnode string properties matching * acpi-platform: ACPI: platform-profile: Fix possible deadlock in platform_profile_remove() ACPI: platform-profile: Introduce object pointers to callbacks ACPI: platform-profile: Drop const qualifier for cur_profile ACPI: platform: Add platform profile support Documentation: Add documentation for new platform_profile sysfs attribute
2021-02-15Merge branches 'pm-devfreq' and 'pm-tools'Rafael J. Wysocki10-63/+66
* pm-devfreq: PM / devfreq: rk3399_dmc: Remove unneeded semicolon PM / devfreq: Replace devfreq->dev.parent as dev in devfreq_add_device PM / devfreq: Correct spelling in a comment * pm-tools: cpupower: Add cpuid cap flag for MSR_AMD_HWCR support cpupower: Remove family arg to decode_pstates() cpupower: Condense pstate enabled bit checks in decode_pstates() cpupower: Update family checks when decoding HW pstates cpupower: Remove unused pscur variable. cpupower: Add CPUPOWER_CAP_AMD_HW_PSTATE cpuid caps flag cpupower: Correct macro name for CPB caps flag cpupower: Update msr_pstate union struct naming cpupower: add Makefile dependencies for install targets
2021-02-15Merge branch 'pm-opp' into pmRafael J. Wysocki10-339/+885
* pm-opp: (37 commits) PM / devfreq: Add required OPPs support to passive governor PM / devfreq: Cache OPP table reference in devfreq OPP: Add function to look up required OPP's for a given OPP opp: Replace ENOTSUPP with EOPNOTSUPP opp: Fix "foo * bar" should be "foo *bar" opp: Don't ignore clk_get() errors other than -ENOENT opp: Update bandwidth requirements based on scaling up/down opp: Allow lazy-linking of required-opps opp: Remove dev_pm_opp_set_bw() devfreq: tegra30: Migrate to dev_pm_opp_set_opp() drm: msm: Migrate to dev_pm_opp_set_opp() cpufreq: qcom: Migrate to dev_pm_opp_set_opp() opp: Implement dev_pm_opp_set_opp() opp: Update parameters of _set_opp_custom() opp: Allow _generic_set_opp_clk_only() to work for non-freq devices opp: Allow _generic_set_opp_regulator() to work for non-freq devices opp: Allow _set_opp() to work for non-freq devices opp: Split _set_opp() out of dev_pm_opp_set_rate() opp: Keep track of currently programmed OPP opp: No need to check clk for errors ...
2021-02-15Merge branches 'pm-sleep', 'pm-core', 'pm-domains' and 'pm-clk'Rafael J. Wysocki12-87/+413
* pm-sleep: PM: sleep: Constify static struct attribute_group PM: sleep: Use dev_printk() when possible PM: sleep: No need to check PF_WQ_WORKER in thaw_kernel_threads() * pm-core: PM: runtime: Fix typos and grammar PM: runtime: Fix resposible -> responsible in runtime.c * pm-domains: PM: domains: Simplify the calculation of variables PM: domains: Add "performance" column to debug summary PM: domains: Make of_genpd_add_subdomain() return -EPROBE_DEFER PM: domains: Make set_performance_state() callback optional PM: domains: use device's next wakeup to determine domain idle state PM: domains: inform PM domain of a device's next wakeup * pm-clk: PM: clk: make PM clock layer compatible with clocks that must sleep
2021-02-15Merge branches 'pm-cpuidle' and 'pm-cpufreq'Rafael J. Wysocki929-4602/+9679
* pm-cpuidle: MAINTAINERS: cpuidle: exynos: include header in file pattern intel_idle: remove definition of DEBUG * pm-cpufreq: cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN cpufreq: Remove CPUFREQ_STICKY flag cpufreq: intel_pstate: Remove repeated word cpufreq: remove tango driver cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove() cpufreq: brcmstb-avs-cpufreq: Free resources in error path cpufreq: qcom-hw: enable boost support cpufreq: tegra20: Use resource-managed API cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available cpufreq: intel_pstate: Rename two functions cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument cpufreq: intel_pstate: Always read hwp_cap_cached with READ_ONCE()
2021-02-15media: v4l: async: Fix kerneldoc documentation for async functionsSakari Ailus1-41/+39
Fix kerneldoc documentation for functions that add async sub-devices to notifiers. The functions themselves were improved recently but that left issues with the kerneldoc documentation. Fix them now. Also remove underscores from macro argument names. [mchehab: fix a build breakage] Reported-by: Stephen Rothwell <[email protected]> Fixes: b01edcbd409c ("media: v4l2-async: Improve v4l2_async_notifier_add_*_subdev() API") Signed-off-by: Sakari Ailus <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2021-02-15Merge tag 'irqchip-5.12' of ↵Thomas Gleixner17-455/+701
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier - New driver for the MIPS-based Realtek RTL838x/RTL839x SoC - Conversion of the sun6i-r support code to a hierarchical setup - Fix wake-up interrupts for the ls-extirq driver - Fix MSI allocation for the loongson-pch-msi driver - Add compatible strings for new Qualcomm SoCs - Tidy up a few Kconfig entries (IMX, CSKY) - Spelling phyksiz - Remove the sirfsoc and tango drivers Link: https://lore.kernel.org/r/[email protected]
2021-02-15MIPS: kernel: Drop kgdb_call_nmi_hookThomas Bogendoerfer1-5/+0
With the removal of set_fs() calls kgdb_call_nmi_hook() is now the same as the default implementation, so we can remove it. Signed-off-by: Thomas Bogendoerfer <[email protected]>
2021-02-15Revert "Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer"Wei Liu11-139/+25
This reverts commit a8c3209998afb5c4941b49e35b513cea9050cb4a. It is reported that the said commit caused regression in netvsc. Reported-by: Andrea Parri (Microsoft) <[email protected]> Signed-off-by: Wei Liu <[email protected]>
2021-02-15Merge branch 'edac-misc' into edac-updates-for-v5.12Borislav Petkov2-2/+2
2021-02-15ALSA: hda/realtek: Quirk for HP Spectre x360 14 amp setupTakashi Iwai1-0/+29
HP Spectre x360 14 model (PCI SSID 103c:87f7) seems requiring a unique setup for its external amp: the GPIO0 needs to be toggled on and off shortly at each device initialization via runtime PM. This patch implements that workaround as well as the model option string, so that users with other devices may try the same workaround more easily. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=210633 Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2021-02-15Merge branch 'for-next' into for-linusTakashi Iwai493-2263/+5417
Unification of 5.12-devel branches. Signed-off-by: Takashi Iwai <[email protected]>
2021-02-15xen-blkback: fix error handling in xen_blkbk_map()Jan Beulich1-10/+16
The function uses a goto-based loop, which may lead to an earlier error getting discarded by a later iteration. Exit this ad-hoc loop when an error was encountered. The out-of-memory error path additionally fails to fill a structure field looked at by xen_blkbk_unmap_prepare() before inspecting the handle which does get properly set (to BLKBACK_INVALID_HANDLE). Since the earlier exiting from the ad-hoc loop requires the same field filling (invalidation) as that on the out-of-memory path, fold both paths. While doing so, drop the pr_alert(), as extra log messages aren't going to help the situation (the kernel will log oom conditions already anyway). This is XSA-365. Signed-off-by: Jan Beulich <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Reviewed-by: Julien Grall <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15xen-scsiback: don't "handle" error by BUG()Jan Beulich1-2/+2
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15xen-netback: don't "handle" error by BUG()Jan Beulich1-3/+1
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15xen-blkback: don't "handle" error by BUG()Jan Beulich1-4/+2
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15xen/arm: don't ignore return errors from set_phys_to_machineStefano Stabellini1-2/+4
set_phys_to_machine can fail due to lack of memory, see the kzalloc call in arch/arm/xen/p2m.c:__set_phys_to_machine_multi. Don't ignore the potential return error in set_foreign_p2m_mapping, returning it to the caller instead. This is part of XSA-361. Signed-off-by: Stefano Stabellini <[email protected]> Cc: [email protected] Reviewed-by: Julien Grall <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15Xen/gntdev: correct error checking in gntdev_map_grant_pages()Jan Beulich2-8/+10
Failure of the kernel part of the mapping operation should also be indicated as an error to the caller, or else it may assume the respective kernel VA is okay to access. Furthermore gnttab_map_refs() failing still requires recording successfully mapped handles, so they can be unmapped subsequently. This in turn requires there to be a way to tell full hypercall failure from partial success - preset map_op status fields such that they won't "happen" to look as if the operation succeeded. Also again use GNTST_okay instead of implying its value (zero). This is part of XSA-361. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()Jan Beulich1-11/+13
We may not skip setting the field in the unmap structure when GNTMAP_device_map is in use - such an unmap would fail to release the respective resources (a page ref in the hypervisor). Otoh the field doesn't need setting at all when GNTMAP_device_map is not in use. To record the value for unmapping, we also better don't use our local p2m: In particular after a subsequent change it may not have got updated for all the batch elements. Instead it can simply be taken from the respective map's results. We can additionally avoid playing this game altogether for the kernel part of the mappings in (x86) PV mode. This is part of XSA-361. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Stefano Stabellini <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()Jan Beulich1-1/+2
We should not set up further state if either mapping failed; paying attention to just the user mapping's status isn't enough. Also use GNTST_okay instead of implying its value (zero). This is part of XSA-361. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-15Xen/x86: don't bail early from clear_foreign_p2m_mapping()Jan Beulich1-7/+5
Its sibling (set_foreign_p2m_mapping()) as well as the sibling of its only caller (gnttab_map_refs()) don't clean up after themselves in case of error. Higher level callers are expected to do so. However, in order for that to really clean up any partially set up state, the operation should not terminate upon encountering an entry in unexpected state. It is particularly relevant to notice here that set_foreign_p2m_mapping() would skip setting up a p2m entry if its grant mapping failed, but it would continue to set up further p2m entries as long as their mappings succeeded. Arguably down the road set_foreign_p2m_mapping() may want its page state related WARN_ON() also converted to an error return. This is part of XSA-361. Signed-off-by: Jan Beulich <[email protected]> Cc: [email protected] Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2021-02-14lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()Andy Shevchenko2-5/+3
There is a specific API to treat raw data as GUID, i.e. export_guid() and import_guid(). Use them instead of guid_copy() with explicit casting. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-14lightnvm: fix unnecessary NULL check warningsTian Tao1-2/+1
Remove NULL checks before vfree() to fix these warnings: ./drivers/lightnvm/pblk-gc.c:27:2-7: WARNING: NULL check before some freeing functions is not needed. Signed-off-by: Tian Tao <[email protected]> Signed-off-by: Matias Bjørling <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-14Merge branch 'mvpp2-next'David S. Miller2-32/+28
Stefan Chulski says: ==================== net: mvpp2: Minor non functional driver code improvements The patch series contains minor code improvements and did not change any functionality. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mvpp2: improve Networking Complex Control register namingStefan Chulski2-7/+7
GENCONF_CTRL0_PORTX naming improved. Non functional change. Signed-off-by: Stefan Chulski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mvpp2: improve mvpp2_get_sram returnStefan Chulski1-3/+1
Use PTR_ERR_OR_ZERO instead of IS_ERR and PTR_ERR. Non functional change. Signed-off-by: Stefan Chulski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mvpp2: improve Packet Processor version checkStefan Chulski1-18/+18
Use >= MVPP22 instead of != MVPP21. Non functional change. Signed-off-by: Stefan Chulski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mvpp2: simplify PPv2 version ID readStefan Chulski1-4/+2
PPv2.1 contain 0 in Version ID register, priv->hw_version check can be removed. Signed-off-by: Stefan Chulski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14Merge branch 'Propagate-extack-for-switchdev-LANs-from-DSA'David S. Miller32-150/+319
Vladimir Oltean says: ==================== Propagate extack for switchdev VLANs from DSA This series moves the restriction messages printed by the DSA core, and by some individual device drivers, into the netlink extended ack structure, to be communicated to user space where possible, or still printed to the kernel log from the bridge layer. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: propagate extack to .port_vlan_filteringVladimir Oltean21-34/+61
Some drivers can't dynamically change the VLAN filtering option, or impose some restrictions, it would be nice to propagate this info through netlink instead of printing it to a kernel log that might never be read. Also netlink extack includes the module that emitted the message, which means that it's easier to figure out which ones are driver-generated errors as opposed to command misuse. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: propagate extack to .port_vlan_addVladimir Oltean20-37/+79
Allow drivers to communicate their restrictions to user space directly, instead of printing to the kernel log. Where the conversion would have been lossy and things like VLAN ID could no longer be conveyed (due to the lack of support for printf format specifier in netlink extack), I chose to keep the messages in full form to the kernel log only, and leave it up to individual driver maintainers to move more messages to extack. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: bridge: propagate extack through switchdev_port_attr_setVladimir Oltean9-25/+35
The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: bridge: propagate extack through store_bridge_parmVladimir Oltean4-46/+142
The bridge sysfs interface stores parameters for the STP, VLAN, multicast etc subsystems using a predefined function prototype. Sometimes the underlying function being called supports a netlink extended ack message, and we ignore it. Let's expand the store_bridge_parm function prototype to include the extack, and just print it to console, but at least propagate it where applicable. Where not applicable, create a shim function in the br_sysfs_br.c file that discards the extra function argument. This patch allows us to propagate the extack argument to br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle, and from there, further up in br_changelink from br_netlink.c. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: bridge: remove __br_vlan_filter_toggleVladimir Oltean3-10/+4
This function is identical with br_vlan_filter_toggle. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14Merge branch 'PTP-for-DSA-tag_ocelot_8021q'David S. Miller13-495/+825
Vladimir Oltean says: ==================== PTP for DSA tag_ocelot_8021q Changes in v2: Add stub definition for ocelot_port_inject_frame when switch driver is not compiled in. This is part two of the errata workaround begun here: https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/ Now that we have basic traffic support when we operate the Ocelot DSA switches without an NPI port, it would be nice to regain some of the features lost due to the lack of the NPI port functionality. An important one is PTP timestamping, which is intimately tied to the DSA frame header added by the NPI port: on TX, we put a "timestamp request ID" in the Injection Frame Header, while on RX, the Extraction Frame Header contains a partial 32-bit PTP timestamp. Get rid of the NPI port and replace it with a VLAN-based tagger, and you lose PTP, right? Well, not quite, this is what this patch series is about. The NPI port is basically a regular Ethernet port configured to service the packets in and out of the switch's CPU port module (which has other non-DSA I/O mechanisms too, such as register-based MMIO and DMA). If we disable the NPI port, we can in theory still access the packets delivered to the CPU port module by doing exactly what the ocelot switchdev driver does: extracting Ethernet packets through registers (yes, it is as icky as it sounds). However, there's a catch. The Felix switch was integrated into NXP LS1028A with the idea in mind that it will operate as DSA, i.e. using the CPU port module connected to the NPI port, not having I/O over register-based MMIO which is painfully slow and CPU intensive. So register-based packet I/O not supposed to work - those registers aren't even documented in the hardware reference manual for Felix. However they kinda do, with the exception of the fact that an RX interrupt was really not wired to the CPU cores - so we don't know when the CPU port module receives a new packet. But we can hack even around that, by replicating every packet that goes to the CPU port module and making it also go to a plain internal Ethernet port. Then drop the Ethernet packet and read the other copy of it from the CPU port module, this time annotated with the much-wanted RX timestamp. This is all fine and it works, but it does raise some questions about what DSA even is anymore, if we start having switches that inject some of their packets over Ethernet and some through registers, where do we draw the line. In principle I believe these concerns are founded, but at the same time, the way that the Felix driver uses register MMIO based packet I/O is fundamentally the same as any other DSA driver capable of PTP makes use of a side-channel for timestamps like a FIFO (just that this one is a lot more complicated, and comes with the entire actual packet, not just the timestamp). Nonetheless, I tried to keep the extra pressure added by this ERR workaround upon the DSA subsystem as small as possible, so some of the patches are just a revisit of some of Andrew's complaints w.r.t. the fact that tag_ocelot already violates any driver <-> tagger boundary, and as a consequence, is not able to be used on testbeds such as dsa_loop (which it now can). So now, the tag_ocelot and tag_ocelot_8021q drivers should be dsa_loop-clean, and have the ERR workarounds as self-contained as possible, using all the designated features for PTP timestamping and nothing more. Comments appreciated. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: tag_ocelot_8021q: add support for PTP timestampingVladimir Oltean5-2/+115
For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021qVladimir Oltean4-3/+143
Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mscc: ocelot: refactor ocelot_xtr_irq_handler into ocelot_xtr_pollVladimir Oltean3-135/+163
Since the felix DSA driver will need to poll the CPU port module for extracted frames as well, let's create some common functions that read an Extraction Frame Header, and then an skb, from a CPU extraction group. We abuse the struct ocelot_ops :: port_to_netdev function a little bit, in order to retrieve the DSA port net_device or the ocelot switchdev net_device based on the source port information from the Extraction Frame Header, but it's all in the benefit of code simplification - netdev_alloc_skb needs it. Originally, the port_to_netdev method was intended for parsing act->dev from tc flower offload code. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: tag_ocelot: create separate tagger for SevilleVladimir Oltean7-83/+70
The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: tag_ocelot: single out PTP-related transmit tag processingVladimir Oltean1-11/+21
There is one place where we cannot avoid accessing driver data, and that is 2-step PTP TX timestamping, since the switch wants us to provide a timestamp request ID through the injection header, which naturally must come from a sequence number kept by the driver (it is generated by the .port_txtstamp method prior to the tagger's xmit). However, since other drivers like dsa_loop do not claim PTP support anyway, the DSA_SKB_CB(skb)->clone will always be NULL anyway, so if we move all PTP-related dereferences of struct ocelot and struct ocelot_port into a separate function, we can effectively ensure that this is dead code when the ocelot tagger is attached to non-ocelot switches, and the stateful portion of the tagger is more self-contained. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: mscc: ocelot: use common tag parsing code with DSAVladimir Oltean9-227/+246
The Injection Frame Header and Extraction Frame Header that the switch prepends to frames over the NPI port is also prepended to frames delivered over the CPU port module's queues. Let's unify the handling of the frame headers by making the ocelot driver call some helpers exported by the DSA tagger. Among other things, this allows us to get rid of the strange cpu_to_be32 when transmitting the Injection Frame Header on ocelot, since the packing API uses network byte order natively (when "quirks" is 0). The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and the cpu extraction queue mask to something, but the code doesn't do it, so we don't do it either. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-14net: dsa: tag_ocelot: avoid accessing ds->priv in ocelot_rcvVladimir Oltean1-4/+3
Taggers should be written to do something valid irrespective of the switch driver that they are attached to. This is even more true now, because since the introduction of the .change_tag_protocol method, a certain tagger is not necessarily strictly associated with a driver any longer, and I would like to be able to test all taggers with dsa_loop in the future. In the case of ocelot, it needs to move the classified VLAN from the DSA tag into the skb if the port is VLAN-aware. We can allow it to do that by looking at the dp->vlan_filtering property, no need to invoke structures which are specific to ocelot. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>