aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-02-02net: ipv6: Emit notification when fib hardware flags are changedAmit Cohen6-8/+77
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Move fib6_info_hw_flags_set() to C file because it is no longer a short function. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: Do not call fib6_info_hw_flags_set() when IPv6 is disabledAmit Cohen2-0/+24
With the next patch mlxsw and netdevsim will fail in compilation if CONFIG_IPV6 is disabled. Do not call fib6_info_hw_flags_set() when IPv6 is disabled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()Amit Cohen3-11/+15
The next patch will emit notification when hardware flags are changed, in case that fib_notify_on_flag_change sysctl is set to 1. To know sysctl values, net struct is needed. This change is consistent with the IPv4 version, which gets 'net' struct as its first argument. Currently, the only callers of this function are mlxsw and netdevsim. Patch the callers to pass net. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Emit notification when fib hardware flags are changedAmit Cohen5-0/+60
After installing a route to the kernel, user space receives an acknowledgment, which means the route was installed in the kernel, but not necessarily in hardware. The asynchronous nature of route installation in hardware can lead to a routing daemon advertising a route before it was actually installed in hardware. This can result in packet loss or mis-routed packets until the route is installed in hardware. It is also possible for a route already installed in hardware to change its action and therefore its flags. For example, a host route that is trapping packets can be "promoted" to perform decapsulation following the installation of an IPinIP/VXLAN tunnel. Emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags are changed. The aim is to provide an indication to user-space (e.g., routing daemons) about the state of the route in hardware. Introduce a sysctl that controls this behavior. Keep the default value at 0 (i.e., do not emit notifications) for several reasons: - Multiple RTM_NEWROUTE notification per-route might confuse existing routing daemons. - Convergence reasons in routing daemons. - The extra notifications will negatively impact the insertion rate. - Not all users are interested in these notifications. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Publish fib_nlmsg_size()Amit Cohen2-1/+2
Publish fib_nlmsg_size() to allow it to be used later on from fib_alias_hw_flags_set(). Remove the inline keyword since it shouldn't be used inside C files. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipv4: Pass fib_rt_info as const to fib_dump_info()Amit Cohen2-2/+2
fib_dump_info() does not change 'fri', so pass it as 'const'. It will later allow us to invoke fib_dump_info() from fib_alias_hw_flags_set(). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02netdevsim: fib: Perform the route programming in a non-atomic contextAmit Cohen1-140/+327
Currently, netdevsim implements dummy FIB offload and marks notified routes with RTM_F_TRAP flag. netdevsim does not defer route notifications to a work queue because it does not need to program any hardware. Given that netdevsim's purpose is to both give an example implementation and allow developers to test their code, align netdevsim to a "real" hardware device driver like mlxsw and have it also perform the route "programming" in a non-atomic context. It will be used to test route flags notifications which will be added in the next patches. The following changes are needed when route handling is performed in WQ: - Handle the accounting in the main context, to be able to return an error for adding route when all the routes are used. For FIB_EVENT_ENTRY_REPLACE increase the counter before scheduling the delayed work, and in case that this event replaces an existing route, decrease the counter as part of the delayed work. - For IPv6, cannot use fen6_info->rt->fib6_siblings list because it might be changed during handling the delayed work. Save an array with the nexthops as part of fib6_event struct, and take a reference for each nexthop to prevent them from being freed while event is queued. - Change GFP_ATOMIC allocations to GFP_KERNEL. - Use single work item that is handling a list of ordered routes. Handling routes must be processed in the order they were submitted to avoid logical errors that could lead to unexpected failures. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02netdevsim: fib: Convert the current occupancy to an atomic variableAmit Cohen1-28/+27
When route is added/deleted, the appropriate counter is increased/decreased to maintain number of routes. User can limit the number of routes and then according to the appropriate counter, adding more routes than the limitation is forbidden. Currently, there is one lock which protects hashtable, list and accounting. Handling the counters will be performed from both atomic context and non-atomic context, while the hashtable and the list will be used only from non-atomic context and therefore will be protected by a separate lock. Protect accounting by using an atomic variable, so lock is not needed. v2: * Use atomic64_sub() in nsim_nexthop_account()'s error path Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge branch 'net-ipa-don-t-disable-napi-in-suspend'Jakub Kicinski1-51/+83
Alex Elder says: ==================== net: ipa: don't disable NAPI in suspend This is version 2 of a series that reworks the order in which things happen during channel stop and suspend (and start and resume), in order to address a hang that has been observed during suspend. The introductory message on the first version of the series gave some history which is omitted here. The end result of this series is that we only enable NAPI and the I/O completion interrupt on a channel when we start the channel for the first time. And we only disable them when stopping the channel "for good." In other words, NAPI and the completion interrupt remain enabled while a channel is stopped for suspend. One comment on version 1 of the series suggested *not* returning early on success in a function, instead having both success and error paths return from the same point at the end of the function block. This has been addressed in this version. In addition, this version consolidates things a little bit, but the net result of the series is exactly the same as version 1 (with the exception of the return fix mentioned above). First, patch 6 in the first version was a small step to make patch 7 easier to understand. The two have been combined now. Second, previous version moved (and for suspend/resume, eliminated) I/O completion interrupt and NAPI disable/enable control in separate steps (patches). Now both are moved around together in patch 5 and 6, which eliminates the need for the final (NAPI-only) patch. I won't repeat the patch summaries provided in v1: https://lore.kernel.org/netdev/20210129202019.2099259-1-elder@linaro.org/ Many thanks to Willem de Bruijn for his thoughtful input. ==================== Link: https://lore.kernel.org/r/20210201172850.2221624-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: expand last transaction checkAlex Elder1-9/+25
Transactions to send data for a network device can be allocated at any time up until the point the TX queue is stopped. It is possible for ipa_start_xmit() to be called in one context just before a the transmit queue is stopped in another. Update gsi_channel_trans_last() so that for TX channels the allocated and pending transaction lists are checked--in addition to the completed and polled lists--to determine the "last" transaction. This means any transaction that has been allocated before the TX queue is stopped will be allowed to complete before we conclude the channel is quiesced. Rework the function a bit to use a list pointer and gotos. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: don't disable interrupt on suspendAlex Elder1-15/+29
No completion interrupts will occur while an endpoint is suspended, nor when a channel has been stopped for suspend. So there's no need to disable the interrupt during suspend and re-enable it when resuming. Without any interrupts occurring, there is no need to disable/re-enable NAPI for channel suspend/resume either. We'll only enable NAPI and the interrupt when we first start the channel, and disable it again only when it's "really" stopped. To accomplish this, move the enable/disable calls out of __gsi_channel_start() and __gsi_channel_stop(), and into gsi_channel_start() and gsi_channel_stop() instead. Add a call to napi_synchronize() to gsi_channel_suspend(), to ensure NAPI polling is done before moving on. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: disable interrupt and NAPI after channel stopAlex Elder1-9/+9
Disable both the I/O completion interrupt and NAPI polling on a channel *after* we successfully stop it rather than before. This ensures a completion occurring just before the channel is stopped gets processed. Enable NAPI polling and the interrupt *before* starting a channel rather than after, to be symmetric. A stopped channel won't generate any completion interrupts anyway. Enable NAPI before the interrupt and disable it afterward. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: kill gsi_channel_freeze() and gsi_channel_thaw()Alex Elder1-25/+12
Open-code gsi_channel_freeze() and gsi_channel_thaw() in all callers and get rid of these two functions. This is part of reworking the sequence of things done during channel suspend/resume and start/stop. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: introduce __gsi_channel_start()Alex Elder1-20/+23
Create a new function that does most of the work of starting a channel. What's different is that it takes a flag indicating whether the channel should really be started or not. Create another new function __gsi_channel_stop() that behaves similarly. IPA v3.5.1 implements suspend using a special SUSPEND endpoint setting. If the endpoint is suspended when an I/O completes on the underlying GSI channel, a SUSPEND interrupt is generated. Newer versions of IPA do not implement the SUSPEND endpoint mode. Instead, endpoint suspend is implemented by simply stopping the underlying GSI channel. In this case, a completing I/O on a *stopped* channel causes the SUSPEND interrupt condition. These new functions put all activity related to starting or stopping a channel (including "thawing/freezing" the channel) in one place, whether or not the channel is actually started or stopped. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: introduce gsi_channel_stop_retry()Alex Elder1-5/+15
Create a new helper function that encapsulates issuing a set of channel stop commands, retrying if appropriate, with a short delay between attempts. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: don't thaw channel if error startingAlex Elder1-2/+4
If an error occurs starting a channel, don't "thaw" it. We should assume the channel remains in a non-started state. Update the comment in gsi_channel_stop(); calls to this function are no longer retried. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: fix up truesize of cloned skb in skb_prepare_for_shift()Marco Elver1-1/+13
Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when cloning an skb, save and restore truesize after pskb_expand_head(). This can occur if the allocator decides to service an allocation of the same size differently (e.g. use a different size class, or pass the allocation on to KFENCE). Because truesize is used for bookkeeping (such as sk_wmem_queued), a modified truesize of a cloned skb may result in corrupt bookkeeping and relevant warnings (such as in sk_stream_kill_queues()). Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02mptcp: fix length of MP_PRIO suboptionDavide Caratti2-3/+5
With version 0 of the protocol it was legal to encode the 'Subflow Id' in the MP_PRIO suboption, to specify which subflow would change its 'Backup' flag. This has been removed from v1 specification: thus, according to RFC 8684 §3.3.8, the resulting 'Length' for MP_PRIO changed from 4 to 3 byte. Current Linux generates / parses MP_PRIO according to the old spec, using 'Length' equal to 4, and hardcoding 1 as 'Subflow Id'; RFC compliance can improve if we change 'Length' in other to become 3, leaving a 'Nop' after the MP_PRIO suboption. In this way the kernel will emit and accept *only* MP_PRIO suboptions that are compliant to version 1 of the MPTCP protocol. unpatched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr unpatched.pcap | grep prio reading from file unpatched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.48433 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 4032325513 ecr 1876514270,mptcp prio non-backup id 1,mptcp dss ack 14084896651682217737], length 0 patched 5.11-rc kernel: [root@bottarga ~]# tcpdump -tnnr patched.pcap | grep prio reading from file patched.pcap, link-type LINUX_SLL (Linux cooked v1) dropped privs to tcpdump IP 10.0.3.2.49735 > 10.0.1.1.10006: Flags [.], ack 1, win 502, options [nop,nop,TS val 1276737699 ecr 2686399734,mptcp prio non-backup,nop,mptcp dss ack 18433038869082491686], length 0 Changes since v2: - when accounting for option space, don't increment 'TCPOLEN_MPTCP_PRIO' and use 'TCPOLEN_MPTCP_PRIO_ALIGN' instead, thanks to Matthieu Baerts. Changes since v1: - refactor patch to avoid using 'TCPOLEN_MPTCP_PRIO' with its old value, thanks to Geliang Tang. Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Matteo Croce <mcroce@linux.microsoft.com> Link: https://lore.kernel.org/r/846cdd41e6ad6ec88ef23fee1552ab39c2f5a3d1.1612184361.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge branch 'drivers-net-update-tasklet_init-callers'Jakub Kicinski10-41/+34
Emil Renner Berthing says: ==================== drivers: net: update tasklet_init callers This updates the remaining callers of tasklet_init() in drivers/net to the new API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") All changes are done by coccinelle using the following semantic patch. Coccinelle needs a little help parsing drivers/net/arcnet/arcnet.c @ match @ type T; T *container; identifier tasklet; identifier callback; @@ tasklet_init(&container->tasklet, callback, (unsigned long)container); @ patch1 depends on match @ type match.T; identifier match.tasklet; identifier match.callback; identifier data; identifier container; @@ -void callback(unsigned long data) +void callback(struct tasklet_struct *t) { ... - T *container = (T *)data; + T *container = from_tasklet(container, t, tasklet); ... } @ patch2 depends on match @ type match.T; identifier match.tasklet; identifier match.callback; identifier data; identifier container; @@ -void callback(unsigned long data) +void callback(struct tasklet_struct *t) { ... - T *container; + T *container = from_tasklet(container, t, tasklet); ... - container = (T *)data; ... } @ depends on (patch1 || patch2) @ match.T *container; identifier match.tasklet; identifier match.callback; @@ - tasklet_init(&container->tasklet, callback, (unsigned long)container); + tasklet_setup(&container->tasklet, callback); ==================== Link: https://lore.kernel.org/r/20210130234730.26565-1-kernel@esmil.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: usb: rtl8150: use new tasklet APIEmil Renner Berthing1-3/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: usb: r8152: use new tasklet APIEmil Renner Berthing1-5/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: usb: pegasus: use new tasklet APIEmil Renner Berthing1-4/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: usb: lan78xx: use new tasklet APIEmil Renner Berthing1-3/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: usb: hso: use new tasklet APIEmil Renner Berthing1-5/+5
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02ppp: use new tasklet APIEmil Renner Berthing2-8/+8
This converts the async and synctty drivers to use the new tasklet API n commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02ifb: use new tasklet APIEmil Renner Berthing1-4/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02caif_virtio: use new tasklet APIEmil Renner Berthing1-5/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02arcnet: use new tasklet APIEmil Renner Berthing1-4/+3
This converts the driver to use the new tasklet API introduced in commit 12cc923f1ccc ("tasklet: Introduce new initialization API") Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski188-778/+2056
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linuxLinus Torvalds1-4/+7
Pull clang-format update from Miguel Ojeda: "Update with the latest for_each macro list" * tag 'clang-format-for-linux-v5.11-rc7' of git://github.com/ojeda/linux: clang-format: Update with the latest for_each macro list
2021-02-02Merge tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+5
Pull dma-mapping fix from Christoph Hellwig: "Fix a kernel crash in the new dma-mapping benchmark test (Barry Song)" * tag 'dma-mapping-5.11-1' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: benchmark: fix kernel crash when dma_map_single fails
2021-02-02Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-16/+13
Pull vdpa fix from Michael Tsirkin: "A single mlx bugfix" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix memory key MTT population
2021-02-02Merge tag 'net-5.11-rc7' of ↵Linus Torvalds44-107/+328
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc7, including fixes from bpf and mac80211 trees. Current release - regressions: - ip_tunnel: fix mtu calculation - mlx5: fix function calculation for page trees Previous releases - regressions: - vsock: fix the race conditions in multi-transport support - neighbour: prevent a dead entry from updating gc_list - dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Previous releases - always broken: - bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt infra when user space is trying to race against optlen, from Loris Reiff. - bpf: add missing fput() in BPF inode storage map update helper - udp: ipv4: manipulate network header of NATed UDP GRO fraglist - mac80211: fix station rate table updates on assoc - r8169: work around RTL8125 UDP HW bug - igc: report speed and duplex as unknown when device is runtime suspended - rxrpc: fix deadlock around release of dst cached on udp tunnel" * tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits) net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary net: ipa: fix two format specifier errors net: ipa: use the right accessor in ipa_endpoint_status_skip() net: ipa: be explicit about endianness net: ipa: add a missing __iomem attribute net: ipa: pass correct dma_handle to dma_free_coherent() r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS net: mvpp2: TCAM entry enable should be written after SRAM data net: lapb: Copy the skb before sending a packet net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc udp: ipv4: manipulate network header of NATed UDP GRO fraglist net: ip_tunnel: fix mtu calculation vsock: fix the race conditions in multi-transport support net: sched: replaced invalid qdisc tree flush helper in qdisc_replace ibmvnic: device remove has higher precedence over reset ...
2021-02-02net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundaryAndreas Oetken1-1/+4
sup_multicast_addr is passed to ether_addr_equal for address comparison which casts the address inputs to u16 leading to an unaligned access. Aligning the sup_multicast_addr to u16 boundary fixes the issue. Signed-off-by: Andreas Oetken <andreas.oetken@siemens.com> Link: https://lore.kernel.org/r/20210202090304.2740471-1-ennoerlangen@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mlx5-fixes-2021-02-01' of ↵Jakub Kicinski4-9/+20
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-02-01 Please note the first patch in this series ("Fix function calculation for page trees") is fixing a regression due to previous fix in net which you didn't include in your previous rc pr. So I hope this series will make it into your next rc pr, so mlx5 won't be broken in the next rc. * tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Release skb in case of failure in tc update skb net/mlx5e: Update max_opened_tc also when channels are closed net/mlx5: Fix leak upon failure of rule creation net/mlx5: Fix function calculation for page trees ==================== Link: https://lore.kernel.org/r/20210202070703.617251-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge branch 'net-ipa-a-few-bug-fixes'Jakub Kicinski3-6/+6
Alex Elder says: ==================== net: ipa: a few bug fixes This series fixes four minor bugs. The first two are things that sparse points out. All four are very simple and each patch should explain itself pretty well. ==================== Link: https://lore.kernel.org/r/20210201232609.3524451-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: fix two format specifier errorsAlex Elder1-2/+2
Fix two format specifiers that used %lu for a size_t in "ipa_mem.c". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: use the right accessor in ipa_endpoint_status_skip()Alex Elder1-2/+2
When extracting the destination endpoint ID from the status in ipa_endpoint_status_skip(), u32_get_bits() is used. This happens to work, but it's wrong: the structure field is only 8 bits wide instead of 32. Fix this by using u8_get_bits() to get the destination endpoint ID. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: be explicit about endiannessAlex Elder1-1/+1
Sparse warns that the assignment of the metadata mask for a QMAP endpoint in ipa_endpoint_init_hdr_metadata_mask() is a bad assignment. We know we want the mask value to be big endian, even though the value we write is in host byte order. Use a __force tag to indicate we really mean it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: add a missing __iomem attributeAlex Elder1-1/+1
The virt local variable in gsi_channel_state() does not have an __iomem attribute but should. Fix this. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Amy Parker <enbyamy@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: ipa: pass correct dma_handle to dma_free_coherent()Dan Carpenter1-1/+1
The "ring->addr = addr;" assignment is done a few lines later so we can't use "ring->addr" yet. The correct dma_handle is "addr". Fixes: 650d1603825d ("soc: qcom: ipa: the generic software interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/YBjpTU2oejkNIULT@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is setHeiner Kallweit1-2/+2
So far phy_disconnect() is called before free_irq(). If CONFIG_DEBUG_SHIRQ is set and interrupt is shared, then free_irq() creates an "artificial" interrupt by calling the interrupt handler. The "link change" flag is set in the interrupt status register, causing phylib to eventually call phy_suspend(). Because the net_device is detached from the PHY already, the PHY driver can't recognize that WoL is configured and powers down the PHY. Fixes: f1e911d5d0df ("r8169: add basic phylib support") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/fe732c2c-a473-9088-3974-df83cfbd6efd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGSSabyrzhan Tasbolatov1-0/+3
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS control message is passed with user-controlled 0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition. The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400. So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10 is the closest limit, with 0x10 leftover. Same condition is currently done in rds_cmsg_rdma_args(). [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc_array include/linux/slab.h:592 [inline] kcalloc include/linux/slab.h:621 [inline] rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568 rds_rm_size net/rds/send.c:928 [inline] Reported-by: syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: mvpp2: TCAM entry enable should be written after SRAM dataStefan Chulski1-5/+5
Last TCAM data contains TCAM enable bit. It should be written after SRAM data before entry enabled. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Stefan Chulski <stefanc@marvell.com> Link: https://lore.kernel.org/r/1612172139-28343-1-git-send-email-stefanc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02net: lapb: Copy the skb before sending a packetXie He1-1/+2
When sending a packet, we will prepend it with an LAPB header. This modifies the shared parts of a cloned skb, so we should copy the skb rather than just clone it, before we prepend the header. In "Documentation/networking/driver.rst" (the 2nd point), it states that drivers shouldn't modify the shared parts of a cloned skb when transmitting. The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called when an skb is being sent, clones the skb and sents the clone to AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte pseudo-header before handing over the skb to us, if we don't copy the skb before prepending the LAPB header, the first byte of the packets received on AF_PACKET sockets can be corrupted. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02Merge tag 'mac80211-for-net-2021-02-02' of ↵Jakub Kicinski3-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two fixes: - station rate tables were not updated correctly after association, leading to bad configuration - rtl8723bs (staging) was initializing data incorrectly after the previous fix and needed to move the init later * tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip mac80211: fix station rate table updates on assoc ==================== Link: https://lore.kernel.org/r/20210202143505.37610-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-01net/mlx5e: Release skb in case of failure in tc update skbMaor Dickman1-4/+12
In case of failure in tc update skb the packet is dropped without freeing the skb. Fixed by freeing the skb in case failure in tc update skb. Fixes: d6d27782864f ("net/mlx5: E-Switch, Restore chain id on miss") Fixes: c75690972228 ("net/mlx5e: Add tc chains offload support for nic flows") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5e: Update max_opened_tc also when channels are closedMaxim Mikityanskiy1-4/+2
max_opened_tc is used for stats, so that potentially non-zero stats won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio fails to update it in the flow where channels are closed. This commit fixes it. The new value of priv->channels.params.num_tc is always checked on exit. In case of errors it will just be the old value, and in case of success it will be the updated value. Fixes: 05909babce53 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5: Fix leak upon failure of rule creationMaor Gottlieb1-0/+5
When creation of a new rule that requires allocation of an FTE fails, need to call to tree_put_node on the FTE in order to release its' resource. Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-02-01net/mlx5: Fix function calculation for page treesDaniel Jurgens1-1/+1
The function calculation always results in a value of 0. This works generally, but when the release all pages feature is enabled it will result in crashes. Fixes: 0aa128475d33 ("net/mlx5: Maintain separate page trees for ECPF and PF functions") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>