aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-03-13bpftool: Add _bpftool and profiler.skel.h to .gitignoreSong Liu1-0/+2
These files are generated, so ignore them. Signed-off-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-13bpftool: Skeleton should depend on libbpfSong Liu1-2/+3
Add the dependency to libbpf, to fix build errors like: In file included from skeleton/profiler.bpf.c:5: .../bpf_helpers.h:5:10: fatal error: 'bpf_helper_defs.h' file not found #include "bpf_helper_defs.h" ^~~~~~~~~~~~~~~~~~~ 1 error generated. make: *** [skeleton/profiler.bpf.o] Error 1 make: *** Waiting for unfinished jobs.... Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Suggested-by: Quentin Monnet <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-13bpftool: Only build bpftool-prog-profile if supported by clangSong Liu4-5/+24
bpftool-prog-profile requires clang to generate BTF for global variables. When compared with older clang, skip this command. This is achieved by adding a new feature, clang-bpf-global-var, to tools/build/feature. Signed-off-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-12drm/dp_mst: Rewrite and fix bandwidth limit checksLyude Paul1-26/+93
Sigh, this is mostly my fault for not giving commit cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") enough scrutiny during review. The way we're checking bandwidth limitations here is mostly wrong: For starters, drm_dp_mst_atomic_check_bw_limit() determines the pbn_limit of a branch by simply scanning each port on the current branch device, then uses the last non-zero full_pbn value that it finds. It then counts the sum of the PBN used on each branch device for that level, and compares against the full_pbn value it found before. This is wrong because ports can and will have different PBN limitations on many hubs, especially since a number of DisplayPort hubs out there will be clever and only use the smallest link rate required for each downstream sink - potentially giving every port a different full_pbn value depending on what link rate it's trained at. This means with our current code, which max PBN value we end up with is not well defined. Additionally, we also need to remember when checking bandwidth limitations that the top-most device in any MST topology is a branch device, not a port. This means that the first level of a topology doesn't technically have a full_pbn value that needs to be checked. Instead, we should assume that so long as our VCPI allocations fit we're within the bandwidth limitations of the primary MSTB. We do however, want to check full_pbn on every port including those of the primary MSTB. However, it's important to keep in mind that this value represents the minimum link rate /between a port's sink or mstb, and the mstb itself/. A quick diagram to explain: MSTB #1 / \ / \ Port #1 Port #2 full_pbn for Port #1 → | | ← full_pbn for Port #2 Sink #1 MSTB #2 | etc... Note that in the above diagram, the combined PBN from all VCPI allocations on said hub should not exceed the full_pbn value of port #2, and the display configuration on sink #1 should not exceed the full_pbn value of port #1. However, port #1 and port #2 can otherwise consume as much bandwidth as they want so long as their VCPI allocations still fit. And finally - our current bandwidth checking code also makes the mistake of not checking whether something is an end device or not before trying to traverse down it. So, let's fix it by rewriting our bandwidth checking helpers. We split the function into one part for handling branches which simply adds up the total PBN on each branch and returns it, and one for checking each port to ensure we're not going over its PBN limit. Phew. This should fix regressions seen, where we erroneously reject display configurations due to thinking they're going over our bandwidth limits when they're not. Changes since v1: * Took an even closer look at how PBN limitations are supposed to be handled, and did some experimenting with Sean Paul. Ended up rewriting these helpers again, but this time they should actually be correct! Changes since v2: * Small indenting fix * Fix pbn_used check in drm_dp_mst_atomic_check_port_bw_limit() Signed-off-by: Lyude Paul <[email protected]> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Sean Paul <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Mikita Lipski <[email protected]> Tested-by: Hans de Goede <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-12drm/dp_mst: Reprobe path resources in CSN handlerLyude Paul1-23/+25
We used to punt off reprobing path resources to the link address probe work, but now that we handle CSNs asynchronously from the driver's HPD handling we can do whatever the heck we want from the CSN! So, reprobe the path resources from drm_dp_mst_handle_conn_stat(). Also, get rid of the path resource reprobing code in drm_dp_check_and_send_link_address() since it's needlessly complicated when we already reprobe path resources from drm_dp_handle_link_address_port(). And finally, teach drm_dp_send_enum_path_resources() to return 1 on PBN changes so we know if we need to send another hotplug or not. This fixes issues where we've indicated to userspace that a port has just been connected, before we actually probed it's available PBN - something that results in unexpected atomic check failures. Signed-off-by: Lyude Paul <[email protected]> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Mikita Lipski <[email protected]> Cc: Hans de Goede <[email protected]> Cc: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Alex Deucher <[email protected]> Tested-by: Hans de Goede <[email protected]>
2020-03-12drm/dp_mst: Use full_pbn instead of available_pbn for bandwidth checksLyude Paul2-10/+9
DisplayPort specifications are fun. For a while, it's been really unclear to us what available_pbn actually does. There's a somewhat vague explanation in the DisplayPort spec (starting from 1.2) that partially explains it: The minimum payload bandwidth number supported by the path. Each node updates this number with its available payload bandwidth number if its payload bandwidth number is less than that in the Message Transaction reply. So, it sounds like available_pbn represents the smallest link rate in use between the source and the branch device. Cool, so full_pbn is just the highest possible PBN that the branch device supports right? Well, we assumed that for quite a while until Sean Paul noticed that on some MST hubs, available_pbn will actually get set to 0 whenever there's any active payloads on the respective branch device. This caused quite a bit of confusion since clearing the payload ID table would end up fixing the available_pbn value. So, we just went with that until commit cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") started breaking people's setups due to us getting erroneous available_pbn values. So, we did some more digging and got confused until we finally looked at the definition for full_pbn: The bandwidth of the link at the trained link rate and lane count between the DP Source device and the DP Sink device with no time slots allocated to VC Payloads, represented as a Payload Bandwidth Number. As with the Available_Payload_Bandwidth_Number, this number is determined by the link with the lowest lane count and link rate. That's what we get for not reading specs closely enough, hehe. So, since full_pbn is definitely what we want for doing bandwidth restriction checks - let's start using that instead and ignore available_pbn entirely. Signed-off-by: Lyude Paul <[email protected]> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Mikita Lipski <[email protected]> Cc: Hans de Goede <[email protected]> Cc: Sean Paul <[email protected]> Reviewed-by: Mikita Lipski <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Alex Deucher <[email protected]> Tested-by: Hans de Goede <[email protected]>
2020-03-12drm/dp_mst: Rename drm_dp_mst_is_dp_mst_end_device() to be less redundantLyude Paul1-5/+5
It's already prefixed by dp_mst, so we don't really need to repeat ourselves here. One of the changes I should have picked up originally when reviewing MST DSC support. There should be no functional changes here Cc: Mikita Lipski <[email protected]> Cc: Sean Paul <[email protected]> Cc: Hans de Goede <[email protected]> Signed-off-by: Lyude Paul <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Tested-by: Hans de Goede <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-03-12inet: Use fallthrough;Joe Perches26-43/+41
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12octeontx2-pf: unlock on error path in otx2_config_pause_frm()Dan Carpenter1-2/+5
We need to unlock before returning if this allocation fails. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-5/+9
Pull vfs fixes from Al Viro: "A couple of fixes for old crap in ->atomic_open() instances" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cifs_atomic_open(): fix double-put on late allocation failure gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
2020-03-12net: systemport: fix index check to avoid an array out of bounds accessColin Ian King1-1/+1
Currently the bounds check on index is off by one and can lead to an out of bounds access on array priv->filters_loc when index is RXCHK_BRCM_TAG_MAX. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge branch 'ipa-fixes'David S. Miller2-51/+14
Alex Elder says: ==================== net: fix net-next David: These patches resolve two issues caused by the IPA driver being incorporated into net-next. I hope you will merge them as soon as you can. The IPA driver was merged into net-next last week, but two problems arise as a result, affecting net-next and linux-next: - The patch that defines field_max() was not incorporated into net-next, but is required by the IPA code - A patch that updates "sdm845.dtsi" *was* incorporated into net-next, but other changes to that file in the Qualcomm for-next branch lead to errors Bjorn has agreed to incorporate the DTS file change into the Qualcomm tree after it is reverted from net-next. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-12Revert "arm64: dts: sdm845: add IPA information"Alex Elder1-51/+0
This reverts commit 9cc5ae125f0eaee471bc87fb5cbf29385fd9272a. This commit: b303f9f0050b arm64: dts: sdm845: Redefine interconnect provider DT nodes found in the Qualcomm for-next tree removes/redefines the interconnect provider node(s) used for IPA. I'm not sure whether it technically conflicts with the IPA change to "sdm845.dtsi" in for-next, but it renders it broken. Revert this commit in the for-next tree, with the plan to incorporate it into the Qualcomm tree instead. Signed-off-by: Alex Elder <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12bitfield.h: add FIELD_MAX() and field_max()Alex Elder1-0/+14
Define FIELD_MAX(), which supplies the maximum value that can be represented by a field value. Define field_max() as well, to go along with the lower-case forms of the field mask functions. Signed-off-by: Alex Elder <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12tc-testing: add ETS scheduler to tdc build configurationDavide Caratti1-0/+1
add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file to perform a full tdc run will encounter the following warning: ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully Fixes: 82c664b69c8b ("selftests: qdiscs: Add test coverage for ETS Qdisc") Reported-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net: phy: fix MDIO bus PM PHY resumingHeiner Kallweit2-1/+7
So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge branch 'ethtool-netlink-interface-part-3'David S. Miller18-80/+1538
Michal Kubecek says: ==================== ethtool netlink interface, part 3 Implementation of more netlink request types: - netdev features (ethtool -k/-K, patches 3-6) - private flags (--show-priv-flags / --set-priv-flags, patches 7-9) - ring sizes (ethtool -g/-G, patches 10-12) - channel counts (ethtool -l/-L, patches 13-15) Patch 1 is a style cleanup suggested in part 2 review and patch 2 updates the mapping between netdev features and legacy ioctl requests (which are still used by ethtool for backward compatibility). Changes in v2: - fix netdev reference leaks in error path of ethnl_set_rings() and ethnl_set_channels() (found by Jakub Kicinski) - use __set_bit() rather than set_bit() (suggested by David Miller) - in replies to RINGS_GET and CHANNELS_GET requests, omit ring and channel types not supported by driver/device (suggested by Jakub Kicinski) - more descriptive message size calculations in rings_reply_size() and channels_reply_size() (suggested by Jakub Kicinski) - coding style cleanup (suggested by Jakub Kicinski) ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add CHANNELS_NTF notificationMichal Kubecek5-1/+12
Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink message or ETHTOOL_SCHANNELS ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: set device channel counts with CHANNELS_SET requestMichal Kubecek8-32/+177
Implement CHANNELS_SET netlink request to set channel counts of a network device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request. Like the ioctl implementation, the generic ethtool code checks if supplied values do not exceed driver defined limits; if they do, first offending attribute is reported using extack. Checks preventing removing channels used for RX indirection table or zerocopy AF_XDP socket are also implemented. Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be used by both ioctl and netlink code. v2: - fix netdev reference leak in error path (found by Jakub Kicinsky) Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: provide channel counts with CHANNELS_GET requestMichal Kubecek6-2/+169
Implement CHANNELS_GET request to get channel counts of a network device. These are traditionally available via ETHTOOL_GCHANNELS ioctl request. Omit attributes for channel types which are not supported by driver or device (zero reported for maximum). v2: (all suggested by Jakub Kicinski) - minor cleanup in channels_prepare_data() - more descriptive channels_reply_size() - omit attributes with zero max count Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add RINGS_NTF notificationMichal Kubecek5-1/+12
Send ETHTOOL_MSG_RINGS_NTF notification whenever ring sizes of a network device are modified using ETHTOOL_MSG_RINGS_SET netlink message or ETHTOOL_SRINGPARAM ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: set device ring sizes with RINGS_SET requestMichal Kubecek5-1/+118
Implement RINGS_SET netlink request to set ring sizes of a network device. These are traditionally set with ETHTOOL_SRINGPARAM ioctl request. Like the ioctl implementation, the generic ethtool code checks if supplied values do not exceed driver defined limits; if they do, first offending attribute is reported using extack. v2: - fix netdev reference leak in error path (found by Jakub Kicinsky) Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: provide ring sizes with RINGS_GET requestMichal Kubecek6-2/+168
Implement RINGS_GET request to get ring sizes of a network device. These are traditionally available via ETHTOOL_GRINGPARAM ioctl request. Omit attributes for ring types which are not supported by driver or device (zero reported for maximum). v2: (all suggested by Jakub Kicinski) - minor cleanup in rings_prepare_data() - more descriptive rings_reply_size() - omit attributes with zero max size Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add PRIVFLAGS_NTF notificationMichal Kubecek4-0/+8
Send ETHTOOL_MSG_PRIVFLAGS_NTF notification whenever private flags of a network device are modified using ETHTOOL_MSG_PRIVFLAGS_SET netlink message or ETHTOOL_SPFLAGS ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: set device private flags with PRIVFLAGS_SET requestMichal Kubecek5-1/+97
Implement PRIVFLAGS_SET netlink request to set private flags of a network device. These are traditionally set with ETHTOOL_SPFLAGS ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: provide private flags with PRIVFLAGS_GET requestMichal Kubecek6-2/+189
Implement PRIVFLAGS_GET request to get private flags for a network device. These are traditionally available via ETHTOOL_GPFLAGS ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add FEATURES_NTF notificationMichal Kubecek4-1/+39
Send ETHTOOL_MSG_FEATURES_NTF notification whenever network device features are modified using ETHTOOL_MSG_FEATURES_SET netlink message, ethtool ioctl request or any other way resulting in call to netdev_update_features() or netdev_change_features() Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: set netdev features with FEATURES_SET requestMichal Kubecek5-9/+224
Implement FEATURES_SET netlink request to set network device features. These are traditionally set using ETHTOOL_SFEATURES ioctl request. Actual change is subject to netdev_change_features() sanity checks so that it can differ from what was requested. Unlike with most other SET requests, in addition to error code and optional extack, kernel provides an optional reply message (ETHTOOL_MSG_FEATURES_SET_REPLY) in the same format but with different semantics: information about difference between user request and actual result and difference between old and new state of dev->features. This reply message can be suppressed by setting ETHTOOL_FLAG_OMIT_REPLY flag in request header. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add ethnl_parse_bitset() helperMichal Kubecek2-0/+98
Unlike other SET type commands, modifying netdev features is required to provide a reply telling userspace what was actually changed, compared to what was requested. For that purpose, the "modified" flag provided by ethnl_update_bitset() is not sufficient, we need full information which bits were requested to change. Therefore provide ethnl_parse_bitset() returning effective value and mask bitmaps equivalent to the contents of a bitset nested attribute. v2: use non-atomic __set_bit() (suggested by David Miller) Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: provide netdev features with FEATURES_GET requestMichal Kubecek8-12/+202
Implement FEATURES_GET request to get network device features. These are traditionally available via ETHTOOL_GFEATURES ioctl request. v2: - style cleanup suggested by Jakub Kicinski Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: update mapping of features to legacy ioctl requestsMichal Kubecek1-2/+3
Legacy ioctl request like ETHTOOL_GTXCSUM are still used by ethtool utility to get values of legacy flags (which rather work as feature groups). These are calculated from values of actual features and request to set them is implemented as an attempt to set all features mapping to them but there are two inconsistencies: - tx-checksum-fcoe-crc is shown under tx-checksumming but NETIF_F_FCOE_CRC is not included in ETHTOOL_GTXCSUM/ETHTOOL_STXCSUM - tx-scatter-gather-fraglist is shown under scatter-gather but NETIF_F_FRAGLIST is not included in ETHTOOL_GSG/ETHTOOL_SSG As the mapping in ethtool output is more correct from logical point of view, fix ethtool_get_feature_mask() to match it. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: rename ethnl_parse_header() to ethnl_parse_header_dev_get()Michal Kubecek6-17/+25
Andrew Lunn pointed out that even if it's documented that ethnl_parse_header() takes reference to network device if it fills it into the target structure, its name doesn't make it apparent so that corresponding dev_put() looks like mismatched. Rename the function ethnl_parse_header_dev_get() to indicate that it takes a reference. Suggested-by: Andrew Lunn <[email protected]> Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12cifs_atomic_open(): fix double-put on late allocation failureAl Viro3-4/+8
several iterations of ->atomic_open() calling conventions ago, we used to need fput() if ->atomic_open() failed at some point after successful finish_open(). Now (since 2016) it's not needed - struct file carries enough state to make fput() work regardless of the point in struct file lifecycle and discarding it on failure exits in open() got unified. Unfortunately, I'd missed the fact that we had an instance of ->atomic_open() (cifs one) that used to need that fput(), as well as the stale comment in finish_open() demanding such late failure handling. Trivially fixed... Fixes: fe9ec8291fca "do_last(): take fput() on error after opening to out:" Cc: [email protected] # v4.7+ Signed-off-by: Al Viro <[email protected]>
2020-03-12gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcacheAl Viro1-1/+1
with the way fs/namei.c:do_last() had been done, ->atomic_open() instances needed to recognize the case when existing file got found with O_EXCL|O_CREAT, either by falling back to finish_no_open() or failing themselves. gfs2 one didn't. Fixes: 6d4ade986f9c (GFS2: Add atomic_open support) Cc: [email protected] # v3.11 Signed-off-by: Al Viro <[email protected]>
2020-03-12Merge branch 'Introduce-connection-tracking-offload'David S. Miller22-70/+2134
Paul Blakey says: ==================== Introduce connection tracking offload Background ---------- The connection tracking action provides the ability to associate connection state to a packet. The connection state may be used for stateful packet processing such as stateful firewalls and NAT operations. Connection tracking in TC SW ---------------------------- The CT state may be matched only after the CT action is performed. As such, CT use cases are commonly implemented using multiple chains. Consider the following TC filters, as an example: 1. tc filter add dev ens1f0_0 ingress prio 1 chain 0 proto ip flower \ src_mac 24:8a:07:a5:28:01 ct_state -trk \ action ct \ pipe action goto chain 2 2. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \ ct_state +trk+new \ action ct commit \ pipe action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 98 \ dst_port 4789 \ action mirred egress redirect dev vxlan0 3. tc filter add dev ens1f0_0 ingress prio 1 chain 2 proto ip flower \ ct_state +trk+est \ action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 98 \ dst_port 4789 \ action mirred egress redirect dev vxlan0 Filter #1 (chain 0) decides, after initial packet classification, to send the packet to the connection tracking module (ct action). Once the ct_state is initialized by the CT action the packet processing continues on chain 2. Chain 2 classifies the packet based on the ct_state. Filter #2 matches on the +trk+new CT state while filter #3 matches on the +trk+est ct_state. MLX5 Connection tracking HW offload - MLX5 driver patches ------------------------------ The MLX5 hardware model aligns with the software model by realizing a multi-table architecture. In SW the TC CT action sets the CT state on the skb. Similarly, HW sets the CT state on a HW register. Driver gets this CT state while offloading a tuple with a new ct_metadata action that provides it. Matches on ct_state are translated to HW register matches. TC filter with CT action broken to two rules, a pre_ct rule, and a post_ct rule. pre_ct rule: Inserted on the corrosponding tc chain table, matches on original tc match, with actions: any pre ct actions, set fte_id, set zone, and goto the ct table. The fte_id is a register mapping uniquely identifying this filter. post_ct_rule: Inserted in a post_ct table, matches on the fte_id register mapping, with actions: counter + any post ct actions (this is usally 'goto chain X') post_ct table is a table that all the tuples inserted to the ct table goto, so if there is a tuple hit, packet will continue from ct table to post_ct table, after being marked with the CT state (mark/label..) This design ensures that the rule's actions and counters will be executed only after a CT hit. HW misses will continue processing in SW from the last chain ID that was processed in hardware. The following illustrates the HW model: +-------------------+ +--------------------+ +--------------+ + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +-----> + original match + | + tuple + zone match + | + fte_id match + | +-------------------+ | +--------------------+ | +--------------+ | v v v set chain miss mapping set mark original set fte_id set label filter set zone set established actions set tunnel_id do nat (if needed) do decap To fill CT table, driver registers a CB for flow offload events, for each new flow table that is passed to it from offloading ct actions. Once a flow offload event is triggered on this CB, offload this flow to the hardware CT table. Established events offload -------------------------- Currently, act_ct maintains an FT instance per ct zone. Flow table entries are created, per ct connection, when connections enter an established state and deleted otherwise. Once an entry is created, the FT assumes ownership of the entries, and manages their aging. FT is used for software offload of conntrack. FT entries associate 5-tuples with an action list. The act_ct changes in this patchset: Populate the action list with a (new) ct_metadata action, providing the connection's ct state (zone,mark and label), and mangle actions if NAT is configured. Pass the action's flow table instance as ct action entry parameter, so when the action is offloaded, the driver may register a callback on it's block to receive FT flow offload add/del/stats events. Netilter changes -------------------------- The netfilter changes export the relevant bits, and add the relevant CBs to support the above. Applying this patchset -------------------------- On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"), pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git and fix the following non trivial conflict in fs_core.c as follows: Then apply this patchset. Changelog: v2->v3: Added the first two patches needed after rebasing on net-next: "net/mlx5: E-Switch, Enable reg c1 loopback when possible" "net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table" ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5e: CT: Support clear actionPaul Blakey3-12/+95
Clear action, as with software, removes all ct metadata from the packet. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5e: CT: Handle misses after executing CT actionPaul Blakey4-5/+92
Mark packets with a unique tupleid, and on miss use that id to get the act ct restore_cookie. Using that restore cookie, we ask CT to restore the relevant info on the SKB. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5e: CT: Offload established flowsPaul Blakey2-0/+691
Register driver callbacks with the nf flow table platform. FT add/delete events will create/delete FTE in the CT/CT_NAT tables. Restoring the CT state on miss will be added in the following patch. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5e: CT: Introduce connection trackingPaul Blakey8-16/+793
Add support for offloading tc ct action and ct matches. We translate the tc filter with CT action the following HW model: +-------------------+ +--------------------+ +--------------+ + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +-----> + original match + | + tuple + zone match + | + fte_id match + | +-------------------+ | +--------------------+ | +--------------+ | v v v set chain miss mapping set mark original set fte_id set label filter set zone set established actions set tunnel_id do nat (if needed) do decap Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12flow_offload: Add flow_match_ct to get rule ct matchPaul Blakey2-0/+13
Add relevant getter for ct info dissector. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5: E-Switch, Support getting chain mappingPaul Blakey2-0/+20
Currently, we write chain register mapping on miss from the the last prio of a chain. It is used to restore the chain in software. To support re-using the chain register mapping from global tables (such as CT tuple table) misses, export the chain mapping. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5: E-Switch, Add support for offloading rules with no in_portPaul Blakey2-1/+4
FTEs in global tables may match on packets from multiple in_ports. Provide the capability to omit the in_port match condition. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5: E-Switch, Introduce global tablesPaul Blakey4-5/+51
Currently, flow tables are automatically connected according to their <chain,prio,level> tuple. Introduce global tables which are flow tables that are detached from the eswitch chains processing, and will be connected by explicitly referencing them from multiple chains. Add this new table type, and allow connecting them by refenece. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Enable hardware offload of flow table entiresPaul Blakey4-0/+14
Pass the zone's flow table instance on the flow action to the drivers. Thus, allowing drivers to register FT add/del/stats callbacks. Finally, enable hardware offload on the flow table instance. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Support refreshing the flow table entriesPaul Blakey4-13/+19
If driver deleted an FT entry, a FT failed to offload, or registered to the flow table after flows were already added, we still get packets in software. For those packets, while restoring the ct state from the flow table entry, refresh it's hardware offload. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Support restoring conntrack info on skbsPaul Blakey3-0/+24
Provide an API to restore the ct state pointer. This may be used by drivers to restore the ct state if they miss in tc chain after they already did the hardware connection tracking action (ct_metadata action). For example, consider the following rule on chain 0 that is in_hw, however chain 1 is not_in_hw: $ tc filter add dev ... chain 0 ... \ flower ... action ct pipe action goto chain 1 Packets of a flow offloaded (via nf flow table offload) by the driver hit this rule in hardware, will be marked with the ct metadata action (mark, label, zone) that does the equivalent of the software ct action, and when the packet jumps to hardware chain 1, there would be a miss. CT was already processed in hardware. Therefore, the driver's miss handling should restore the ct state on the skb, using the provided API, and continue the packet processing in chain 1. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Instantiate flow table entry actionsPaul Blakey4-23/+235
NF flow table API associate 5-tuple rule with an action list by calling the flow table type action() CB to fill the rule's actions. In action CB of act_ct, populate the ct offload entry actions with a new ct_metadata action. Initialize the ct_metadata with the ct mark, label and zone information. If ct nat was performed, then also append the relevant packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites). Drivers that offload the ft entries may match on the 5-tuple and perform the action list. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12netfilter: flowtable: Add API for registering to flow table eventsPaul Blakey3-0/+57
Let drivers to add their cb allowing them to receive flow offload events of type TC_SETUP_CLSFLOWER (REPLACE/DEL/STATS) for flows managed by the flow table. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads tablePaul Blakey1-0/+1
The eswitch offloads table, which has the reps (vport) rx miss rules, was moved from OFFLOADS namespace [0,0] (prio, level), to [1,0], so the restore table (the new [0,0]) can come before it. The destinations of these miss rules is the rep root ft (ttc for non uplink reps). Uplink rep root ft is created as OFFLOADS namespace [0,1], and is used as a hook to next RX prio (either ethtool or ttc), but this fails to pass fs_core level's check. Move uplink rep root ft to OFFLOADS prio 1, level 1 ([1,1]), so it will keep the same relative position after the restore table change. Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/mlx5: E-Switch, Enable reg c1 loopback when possiblePaul Blakey3-11/+41
Enable reg c1 loopback if firmware reports it's supported, as this is needed for restoring packet metadata (e.g chain). Also define helper to query if it is enabled. Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: David S. Miller <[email protected]>