aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-01-15mlxsw: spectrum_qdisc: Include MC TCs in Qdisc countersPetr Machata1-7/+23
mlxsw configures Spectrum in such a way that BUM traffic is passed not through its nominal traffic class TC, but through its MC counterpart TC+8. However, when collecting statistics, Qdiscs only look at the nominal TC and ignore the MC TC. Add two helpers to compute the value for logical TC from the constituents, one for backlog, the other for tail drops. Use them throughout instead of going through the xstats pointer directly. Counters for TX bytes and packets are deduced from packet priority counters, and therefore already include BUM traffic. wred_drop counter is irrelevant on MC TCs, because RED is not enabled on them. Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports") Signed-off-by: Petr Machata <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15mlxsw: spectrum: Wipe xstats.backlog of down portsPetr Machata1-0/+13
Per-port counter cache used by Qdiscs is updated periodically, unless the port is down. The fact that the cache is not updated for down ports is no problem for most counters, which are relative in nature. However, backlog is absolute in nature, and if there is a non-zero value in the cache around the time that the port goes down, that value just stays there. This value then leaks to offloaded Qdiscs that report non-zero backlog even if there (obviously) is no traffic. The HW does not keep backlog of a downed port, so do likewise: as the port goes down, wipe the backlog value from xstats. Fixes: 075ab8adaf4e ("mlxsw: spectrum: Collect tclass related stats periodically") Signed-off-by: Petr Machata <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15selftests: mlxsw: qos_mc_aware: Fix mausezahn invocationPetr Machata1-2/+6
Mausezahn does not recognize "own" as a keyword on source IP address. As a result, the MC stream is not running at all, and therefore no UC degradation can be observed even in principle. Fix the invocation, and tighten the test: due to the minimum shaper configured at the MC TCs, we always expect about 20% degradation. Fail the test if it is lower. Fixes: 573363a68f27 ("selftests: mlxsw: Add qos_lib.sh") Signed-off-by: Petr Machata <[email protected]> Reported-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15mlxsw: switchx2: Do not modify cloned SKBs during xmitIdo Schimmel1-11/+6
The driver needs to prepend a Tx header to each packet it is transmitting. The header includes information such as the egress port and traffic class. The addition of the header requires the driver to modify the SKB's header and therefore it must not be shared. Otherwise, we risk hitting various race conditions. For example, when a packet is flooded (cloned) by the bridge driver to two switch ports swp1 and swp2: t0 - mlxsw_sp_port_xmit() is called for swp1. Tx header is prepended with swp1's port number t1 - mlxsw_sp_port_xmit() is called for swp2. Tx header is prepended with swp2's port number, overwriting swp1's port number t2 - The device processes data buffer from t0. Packet is transmitted via swp2 t3 - The device processes data buffer from t1. Packet is transmitted via swp2 Usually, the device is fast enough and transmits the packet before its Tx header is overwritten, but this is not the case in emulated environments. Fix this by making sure the SKB's header is writable by calling skb_cow_head(). Since the function ensures we have headroom to push the Tx header, the check further in the function can be removed. v2: * Use skb_cow_head() instead of skb_unshare() as suggested by Jakub * Remove unnecessary check regarding headroom Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15mlxsw: spectrum: Do not modify cloned SKBs during xmitIdo Schimmel1-12/+6
The driver needs to prepend a Tx header to each packet it is transmitting. The header includes information such as the egress port and traffic class. The addition of the header requires the driver to modify the SKB's header and therefore it must not be shared. Otherwise, we risk hitting various race conditions. For example, when a packet is flooded (cloned) by the bridge driver to two switch ports swp1 and swp2: t0 - mlxsw_sp_port_xmit() is called for swp1. Tx header is prepended with swp1's port number t1 - mlxsw_sp_port_xmit() is called for swp2. Tx header is prepended with swp2's port number, overwriting swp1's port number t2 - The device processes data buffer from t0. Packet is transmitted via swp2 t3 - The device processes data buffer from t1. Packet is transmitted via swp2 Usually, the device is fast enough and transmits the packet before its Tx header is overwritten, but this is not the case in emulated environments. Fix this by making sure the SKB's header is writable by calling skb_cow_head(). Since the function ensures we have headroom to push the Tx header, the check further in the function can be removed. v2: * Use skb_cow_head() instead of skb_unshare() as suggested by Jakub * Remove unnecessary check regarding headroom Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <[email protected]> Reported-by: Shalom Toledo <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15mlxsw: spectrum: Do not enforce same firmware version for multiple ASICsIdo Schimmel1-1/+22
In commit a72afb6879bb ("mlxsw: Enforce firmware version for Spectrum-2") I added a required firmware version for Spectrum-2, but missed the fact that mlxsw_sp2_init() is used by both Spectrum-2 and Spectrum-3. This means that the same firmware version will be used for both, which is wrong. Fix this by creating a new init() callback for Spectrum-3. Fixes: a72afb6879bb ("mlxsw: Enforce firmware version for Spectrum-2") Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Tested-by: Shalom Toledo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15usb: core: hub: Improved device recognition on remote wakeupKeiya Nobuta1-0/+1
If hub_activate() is called before D+ has stabilized after remote wakeup, the following situation might occur: __ ___________________ / \ / D+ __/ \__/ Hub _______________________________ | ^ ^ ^ | | | | Host _____v__|___|___________|______ | | | | | | | \-- Interrupt Transfer (*3) | | \-- ClearPortFeature (*2) | \-- GetPortStatus (*1) \-- Host detects remote wakeup - D+ goes high, Host starts running by remote wakeup - D+ is not stable, goes low - Host requests GetPortStatus at (*1) and gets the following hub status: - Current Connect Status bit is 0 - Connect Status Change bit is 1 - D+ stabilizes, goes high - Host requests ClearPortFeature and thus Connect Status Change bit is cleared at (*2) - After waiting 100 ms, Host starts the Interrupt Transfer at (*3) - Since the Connect Status Change bit is 0, Hub returns NAK. In this case, port_event() is not called in hub_event() and Host cannot recognize device. To solve this issue, flag change_bits even if only Connect Status Change bit is 1 when got in the first GetPortStatus. This issue occurs rarely because it only if D+ changes during a very short time between GetPortStatus and ClearPortFeature. However, it is fatal if it occurs in embedded system. Signed-off-by: Keiya Nobuta <[email protected]> Cc: stable <[email protected]> Acked-by: Alan Stern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-01-15Merge tag 'mac80211-for-net-2020-01-15' of ↵David S. Miller11-12/+106
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A few fixes: * -O3 enablement fallout, thanks to Arnd who ran this * fixes for a few leaks, thanks to Felix * channel 12 regulatory fix for custom regdomains * check for a crash reported by syzbot (NULL function is called on drivers that don't have it) * fix TKIP replay protection after setup with some APs (from Jouni) * restrict obtaining some mesh data to avoid WARN_ONs * fix deadlocks with auto-disconnect (socket owner) * fix radar detection events with multiple devices ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-15x86/mce/therm_throt: Do not access uninitialized therm_workChuansheng Liu1-4/+5
It is relatively easy to trigger the following boot splat on an Ice Lake client platform. The call stack is like: kernel BUG at kernel/timer/timer.c:1152! Call Trace: __queue_delayed_work queue_delayed_work_on therm_throt_process intel_thermal_interrupt ... The reason is that a CPU's thermal interrupt is enabled prior to executing its hotplug onlining callback which will initialize the throttling workqueues. Such a race can lead to therm_throt_process() accessing an uninitialized therm_work, leading to the above BUG at a very early bootup stage. Therefore, unmask the thermal interrupt vector only after having setup the workqueues completely. [ bp: Heavily massage commit message and correct comment formatting. ] Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Signed-off-by: Chuansheng Liu <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Tony Luck <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-15Revert "gpio: thunderx: Switch to GPIOLIB_IRQCHIP"Kevin Hao2-57/+107
This reverts commit a7fc89f9d5fcc10a5474cfe555f5a9e5df8b0f1f because there are some bugs in this commit, and we don't have a simple way to fix these bugs. So revert this commit to make the thunderx gpio work on the stable kernel at least. We will switch to GPIOLIB_IRQCHIP for thunderx gpio by following patches. Fixes: a7fc89f9d5fc ("gpio: thunderx: Switch to GPIOLIB_IRQCHIP") Signed-off-by: Kevin Hao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2020-01-15tick/sched: Annotate lockless access to last_jiffies_updateEric Dumazet1-5/+9
syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64(): BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64 write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1: tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73 tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138 tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292 __run_hrtimer kernel/time/hrtimer.c:1514 [inline] __hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576 hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline] smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline] kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436 check_access kernel/kcsan/core.c:466 [inline] __tsan_read1 kernel/kcsan/core.c:593 [inline] __tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593 kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79 kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170 insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline] debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256 full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225 __vfs_write+0x67/0xc0 fs/read_write.c:494 vfs_write fs/read_write.c:558 [inline] vfs_write+0x18a/0x390 fs/read_write.c:542 ksys_write+0xd5/0x1b0 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x4c/0x60 fs/read_write.c:620 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0: tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62 tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline] tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline] tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274 irq_enter+0x4f/0x60 kernel/softirq.c:354 entering_irq arch/x86/include/asm/apic.h:517 [inline] entering_ack_irq arch/x86/include/asm/apic.h:523 [inline] smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 rest_init+0xec/0xf6 init/main.c:452 arch_call_rest_init+0x17/0x37 start_kernel+0x838/0x85e init/main.c:786 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Use READ_ONCE() and WRITE_ONCE() to annotate this expected race. Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-15cfg80211: fix page refcount issue in A-MSDU decapFelix Fietkau1-1/+1
The fragments attached to a skb can be part of a compound page. In that case, page_ref_inc will increment the refcount for the wrong page. Fix this by using get_page instead, which calls page_ref_inc on the compound head and also checks for overflow. Fixes: 2b67f944f88c ("cfg80211: reuse existing page fragments in A-MSDU rx") Cc: [email protected] Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15cfg80211: check for set_wiphy_paramsJohannes Berg1-0/+4
Check if set_wiphy_params is assigned and return an error if not, some drivers (e.g. virt_wifi where syzbot reported it) don't have it. Reported-by: [email protected] Reported-by: [email protected] Signed-off-by: Johannes Berg <[email protected]> Link: https://lore.kernel.org/r/20200113125358.ac07f276efff.Ibd85ee1b12e47b9efb00a2adc5cd3fac50da791a@changeid Signed-off-by: Johannes Berg <[email protected]>
2020-01-15cfg80211: fix memory leak in cfg80211_cqm_rssi_updateFelix Fietkau1-0/+1
The per-tid statistics need to be released after the call to rdev_get_station Cc: [email protected] Fixes: 8689c051a201 ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15cfg80211: fix memory leak in nl80211_probe_mesh_linkFelix Fietkau1-0/+2
The per-tid statistics need to be released after the call to rdev_get_station Cc: [email protected] Fixes: 5ab92e7fe49a ("cfg80211: add support to probe unexercised mesh link") Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15cfg80211: fix deadlocks in autodisconnect workMarkus Theil1-3/+3
Use methods which do not try to acquire the wdev lock themselves. Cc: [email protected] Fixes: 37b1c004685a3 ("cfg80211: Support all iftypes in autodisconnect_wk") Signed-off-by: Markus Theil <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15wireless: wext: avoid gcc -O3 warningArnd Bergmann1-1/+2
After the introduction of CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, the wext code produces a bogus warning: In function 'iw_handler_get_iwstats', inlined from 'ioctl_standard_call' at net/wireless/wext-core.c:1015:9, inlined from 'wireless_process_ioctl' at net/wireless/wext-core.c:935:10, inlined from 'wext_ioctl_dispatch.part.8' at net/wireless/wext-core.c:986:8, inlined from 'wext_handle_ioctl': net/wireless/wext-core.c:671:3: error: argument 1 null where non-null expected [-Werror=nonnull] memcpy(extra, stats, sizeof(struct iw_statistics)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/include/asm/string.h:5, net/wireless/wext-core.c: In function 'wext_handle_ioctl': arch/x86/include/asm/string_64.h:14:14: note: in a call to function 'memcpy' declared here The problem is that ioctl_standard_call() sometimes calls the handler with a NULL argument that would cause a problem for iw_handler_get_iwstats. However, iw_handler_get_iwstats never actually gets called that way. Marking that function as noinline avoids the warning and leads to slightly smaller object code as well. Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15mac80211: Fix TKIP replay protection immediately after key setupJouni Malinen1-3/+15
TKIP replay protection was skipped for the very first frame received after a new key is configured. While this is potentially needed to avoid dropping a frame in some cases, this does leave a window for replay attacks with group-addressed frames at the station side. Any earlier frame sent by the AP using the same key would be accepted as a valid frame and the internal RSC would then be updated to the TSC from that frame. This would allow multiple previously transmitted group-addressed frames to be replayed until the next valid new group-addressed frame from the AP is received by the station. Fix this by limiting the no-replay-protection exception to apply only for the case where TSC=0, i.e., when this is for the very first frame protected using the new key, and the local RSC had not been set to a higher value when configuring the key (which may happen with GTK). Signed-off-by: Jouni Malinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15cfg80211: Fix radar event during another phy CACOrr Mazor5-1/+65
In case a radar event of CAC_FINISHED or RADAR_DETECTED happens during another phy is during CAC we might need to cancel that CAC. If we got a radar in a channel that another phy is now doing CAC on then the CAC should be canceled there. If, for example, 2 phys doing CAC on the same channels, or on comptable channels, once on of them will finish his CAC the other might need to cancel his CAC, since it is no longer relevant. To fix that the commit adds an callback and implement it in mac80211 to end CAC. This commit also adds a call to said callback if after a radar event we see the CAC is no longer relevant Signed-off-by: Orr Mazor <[email protected]> Reviewed-by: Sergey Matyukevich <[email protected]> Link: https://lore.kernel.org/r/[email protected] [slightly reformat/reword commit message] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15wireless: fix enabling channel 12 for custom regulatory domainGanapathi Bhat1-3/+10
Commit e33e2241e272 ("Revert "cfg80211: Use 5MHz bandwidth by default when checking usable channels"") fixed a broken regulatory (leaving channel 12 open for AP where not permitted). Apply a similar fix to custom regulatory domain processing. Signed-off-by: Cathy Luo <[email protected]> Signed-off-by: Ganapathi Bhat <[email protected]> Link: https://lore.kernel.org/r/[email protected] [reword commit message, fix coding style, add a comment] Signed-off-by: Johannes Berg <[email protected]>
2020-01-15fix autofs regression caused by follow_managed() changesAl Viro1-0/+1
we need to reload ->d_flags after the call of ->d_manage() - the thing might've been called with dentry still negative and have the damn thing turned positive while we'd waited. Fixes: d41efb522e90 "fs/namei.c: pull positivity check into follow_managed()" Reported-by: Ian Kent <[email protected]> Tested-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-01-15reimplement path_mountpoint() with less magicAl Viro3-80/+12
... and get rid of a bunch of bugs in it. Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the root of what it's trying to umount. The thing we want to avoid actually happen from complete_walk(); solution was to do something parallel to normal path_lookupat() and it both went overboard and got the boilerplate subtly (and not so subtly) wrong. A better solution is to do pretty much what the normal path_lookupat() does, but instead of complete_walk() do unlazy_walk(). All it takes to avoid that ->d_weak_revalidate() call... mountpoint_last() goes away, along with everything it got wrong, and so does the magic around LOOKUP_NO_REVAL. Another source of bugs is that when we traverse mounts at the final location (and we need to do that - umount . expects to get whatever's overmounting ., if any, out of the lookup) we really ought to take care of ->d_manage() - as it is, manual umount of autofs automount in progress can lead to unpleasant surprises for the daemon. Easily solved by using handle_lookup_down() instead of follow_mount(). Tested-by: Ian Kent <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-01-14io_uring: be consistent in assigning next work from handlerJens Axboe1-24/+28
If we pass back dependent work in case of links, we need to always ensure that we call the link setup and work prep handler. If not, we might be missing some setup for the next work item. Signed-off-by: Jens Axboe <[email protected]>
2020-01-14io-wq: cancel work if we fail getting a mm referenceJens Axboe1-4/+8
If we require mm and user context, mark the request for cancellation if we fail to acquire the desired mm. Signed-off-by: Jens Axboe <[email protected]>
2020-01-15MAINTAINERS: Update Ley Foon Tan's email addressLey Foon Tan1-4/+4
@altera.com email is going to removed. Change to @intel.com email. Signed-off-by: Ley Foon Tan <[email protected]>
2020-01-14Merge branch 'net-Add-route-offload-indication'David S. Miller14-129/+2365
Ido Schimmel says: ==================== net: Add route offload indication This patch set adds offload indication to IPv4 and IPv6 routes. So far offload indication was only available for the nexthop via 'RTNH_F_OFFLOAD', which is problematic as a nexthop is usually shared between multiple routes. Based on feedback from Roopa and David on the RFC [1], the indication is split to 'offload' and 'trap'. This is done because not all the routes present in hardware actually offload traffic from the kernel. For example, host routes merely trap packets to the kernel. The two flags are dumped to user space via the 'rtm_flags' field in the ancillary header of the rtnetlink message. In addition, the patch set uses the new flags in order to test the FIB offload API by adding a dummy FIB offload implementation to netdevsim. The new tests are added to a shared library and can be therefore shared between different drivers. Patches #1-#3 add offload indication to IPv4 routes. Patches #4 adds offload indication to IPv6 routes. Patches #5-#6 add support for the offload indication in mlxsw. Patch #7 adds dummy FIB offload implementation in netdevsim. Patches #8-#10 add selftests. v2 (feedback from David Ahern): * Patch #2: Name last argument of fib_dump_info() * Patch #2: Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() * Patch #3: Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() * Patch #6: Convert to new fib_alias_hw_flags_set() interface * Patch #7: Convert to new fib_alias_hw_flags_set() interface [1] https://patchwork.ozlabs.org/cover/1170530/ ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-14selftests: mlxsw: Add test for FIB offload APIIdo Schimmel1-0/+180
The test reuses the common FIB offload tests in order to make sure that mlxsw correctly implements FIB offload. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14selftests: netdevsim: Add test for FIB offload APIIdo Schimmel1-0/+341
Test various aspects of the FIB offload API on top of the netdevsim implementation. Both good and bad flows are tested. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14selftests: forwarding: Add helpers and tests for FIB offloadIdo Schimmel1-0/+873
Implement a set of common helpers and tests for FIB offload that can be used by multiple drivers to check their FIB offload implementations. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14netdevsim: fib: Add dummy implementation for FIB offloadIdo Schimmel2-9/+663
Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing currently "programmed" routes in a hash table. Each route in the hash table is marked with "trap" indication. The indication is cleared when the route is replaced or when the netdevsim instance is deleted. This will later allow us to test the route offload API on top of netdevsim. v2: * Convert to new fib_alias_hw_flags_set() interface Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14mlxsw: spectrum_router: Set hardware flags for routesIdo Schimmel1-77/+77
Previous patches added support for two hardware flags for IPv4 and IPv6 routes: 'RTM_F_OFFLOAD' and 'RTM_F_TRAP'. Both indicate the presence of the route in hardware. The first indicates that traffic is actually offloaded from the kernel, whereas the second indicates that packets hitting such routes are trapped to the kernel for processing (e.g., host routes). Use these two flags in mlxsw. The flags are modified in two places. Firstly, whenever a route is updated in the device's table. This includes the addition, deletion or update of a route. For example, when a host route is promoted to perform NVE decapsulation, its action in the device is updated, the 'RTM_F_OFFLOAD' flag set and the 'RTM_F_TRAP' flag cleared. Secondly, when a route is replaced and overwritten by another route, its flags are cleared. v2: * Convert to new fib_alias_hw_flags_set() interface Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14mlxsw: spectrum_router: Separate nexthop offload indication from routeIdo Schimmel1-17/+75
The driver currently uses the 'RTNH_F_OFFLOAD' flag for both routes and nexthops, which is cumbersome and unnecessary now that we have separate flag for the route itself. Separate the offload indication for nexthops from routes and call it whenever the offload state within the nexthop group changes. Note that IPv6 (unlike IPv4) does not share the same nexthop group between different routes, whereas mlxsw does. Therefore, whenever the offload indication within an IPv6 nexthop group changes, all the linked routes need to be updated. Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14ipv6: Add "offload" and "trap" indications to routesIdo Schimmel2-1/+17
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14ipv4: Add "offload" and "trap" indications to routesIdo Schimmel6-0/+87
When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 */ struct fib_info * fa_info; /* 16 8 */ u8 fa_tos; /* 24 1 */ u8 fa_type; /* 25 1 */ u8 fa_state; /* 26 1 */ u8 fa_slen; /* 27 1 */ u32 tb_id; /* 28 4 */ s16 fa_default; /* 32 2 */ u8 offload:1; /* 34: 0 1 */ u8 trap:1; /* 34: 1 1 */ u8 unused:6; /* 34: 2 1 */ /* XXX 5 bytes hole, try to pack */ struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 50, holes: 1, sum holes: 5 */ /* sum bitfield members: 8 bits (1 bytes) */ /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14ipv4: Encapsulate function arguments in a structIdo Schimmel5-21/+45
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages using the passed arguments. Currently, the function takes 11 arguments, 6 of which are attributes of the route being dumped (e.g., prefix, TOS). The next patch will need the function to also dump to user space an indication if the route is present in hardware or not. Instead of passing yet another argument, change the function to take a struct containing the different route attributes. v2: * Name last argument of fib_dump_info() * Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could later be passed to fib_alias_hw_flags_set() Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14ipv4: Replace route in list before notifyingIdo Schimmel1-4/+7
Subsequent patches will add an offload / trap indication to routes which will signal if the route is present in hardware or not. After programming the route to the hardware, drivers will have to ask the IPv4 code to set the flags by passing the route's key. In the case of route replace, the new route is notified before it is actually inserted into the FIB alias list. This can prevent simple drivers (e.g., netdevsim) that program the route to the hardware in the same context it is notified in from being able to set the flag. Solve this by first inserting the new route to the list and rollback the operation in case the route was vetoed. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: mvneta: fix dma sync size in mvneta_run_xdpLorenzo Bianconi1-9/+10
Page pool API will start syncing (if requested) starting from page->dma_addr + pool->p.offset. Fix dma sync length in mvneta_run_xdp since we do not need to account xdp headroom Fixes: 07e13edbb6a6 ("net: mvneta: get rid of huge dma sync in mvneta_rx_refill") Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: socionext: get rid of huge dma sync in netsec_alloc_rx_dataLorenzo Bianconi1-17/+26
Socionext driver can run on dma coherent and non-coherent devices. Get rid of huge dma_sync_single_for_device in netsec_alloc_rx_data since now the driver can let page_pool API to managed needed DMA sync Reviewed-by: Ilias Apalodimas <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14r8152: add missing endpoint sanity checkJohan Hovold1-0/+3
Add missing endpoint sanity check to probe in order to prevent a NULL-pointer dereference (or slab out-of-bounds access) when retrieving the interrupt-endpoint bInterval on ndo_open() in case a device lacks the expected endpoints. Fixes: 40a82917b1d3 ("net/usb/r8152: enable interrupt transfer") Cc: hayeswang <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14Merge branch 'QRTR-flow-control-improvements'David S. Miller1-72/+247
Bjorn Andersson says: ==================== QRTR flow control improvements In order to prevent overconsumption of resources on the remote side QRTR implements a flow control mechanism. Move the handling of the incoming confirm_rx to the receiving process to ensure incoming flow is controlled. Then implement outgoing flow control, using the recommended algorithm of counting outstanding non-confirmed messages and blocking when hitting a limit. The last three patches refactors the node assignment and port lookup, in order to remove the worker in the receive path. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: qrtr: Remove receive workerBjorn Andersson1-40/+17
Rather than enqueuing messages and scheduling a worker to deliver them to the individual sockets we can now, thanks to the previous work, move this directly into the endpoint callback. This saves us a context switch per incoming message and removes the possibility of an opportunistic suspend to happen between the message is coming from the endpoint until it ends up in the socket's receive buffer. Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: qrtr: Make qrtr_port_lookup() use RCUBjorn Andersson1-2/+6
The important part of qrtr_port_lookup() wrt synchronization is that the function returns a reference counted struct qrtr_sock, or fail. As such we need only to ensure that an decrement of the object's refcount happens inbetween the finding of the object in the idr and qrtr_port_lookup()'s own increment of the object. By using RCU and putting a synchronization point after we remove the mapping from the idr, but before it can be released we achieve this - with the benefit of not having to hold the mutex in qrtr_port_lookup(). Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: qrtr: Migrate node lookup tree to spinlockBjorn Andersson1-6/+14
Move operations on the qrtr_nodes radix tree under a separate spinlock and make the qrtr_nodes tree GFP_ATOMIC, to allow operation from atomic context in a subsequent patch. Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: qrtr: Implement outgoing flow controlBjorn Andersson1-7/+187
In order to prevent overconsumption of resources on the remote side QRTR implements a flow control mechanism. The mechanism works by the sender keeping track of the number of outstanding unconfirmed messages that has been transmitted to a particular node/port pair. Upon count reaching a low watermark (L) the confirm_rx bit is set in the outgoing message and when the count reaching a high watermark (H) transmission will be blocked upon the reception of a resume_tx message from the remote, that resets the counter to 0. This guarantees that there will be at most 2H - L messages in flight. Values chosen for L and H are 5 and 10 respectively. Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14net: qrtr: Move resume-tx transmission to recvmsgBjorn Andersson1-27/+33
The confirm-rx bit is used to implement a per port flow control, in order to make sure that no messages are dropped due to resource exhaustion. Move the resume-tx transmission to recvmsg to only confirm messages as they are consumed by the application. Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14pktgen: Allow configuration of IPv6 source address rangeNiu Xilei1-0/+98
Pktgen can use only one IPv6 source address from output device or src6 command setting. In pressure test we need create lots of sessions more than 65535. So add src6_min and src6_max command to set the range. Signed-off-by: Niu Xilei <[email protected]> Changes since v3: - function set_src_in6_addr use static instead of static inline - precompute min_in6_l,min_in6_h,max_in6_h,max_in6_l in setup time Changes since v2: - reword subject line Changes since v1: - only create IPv6 source address over least significant 64 bit range Signed-off-by: David S. Miller <[email protected]>
2020-01-14Merge tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2-7/+24
Pull NFS client bugfixes from Anna Schumaker: "Three NFS over RDMA fixes for bugs Chuck found that can be hit during device removal: - Fix create_qp crash on device unload - Fix completion wait during device removal - Fix oops in receive handler after device removal" * tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: xprtrdma: Fix oops in Receive handler after device removal xprtrdma: Fix completion wait during device removal xprtrdma: Fix create_qp crash on device unload
2020-01-14block: fix get_max_segment_size() overflow on 32bit archMing Lei1-2/+7
Commit 429120f3df2d starts to take account of segment's start dma address when computing max segment size, and data type of 'unsigned long' is used to do that. However, the segment mask may be 0xffffffff, so the figured out segment size may be overflowed in case of zero physical address on 32bit arch. Fix the issue by returning queue_max_segment_size() directly when that happens. Fixes: 429120f3df2d ("block: fix splitting segments on boundary masks") Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Cc: Christoph Hellwig <[email protected]> Tested-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-14nfc: No need to set .owner platform_driver_registerTian Tao1-1/+0
the i2c_add_driver will set the .owner to THIS_MODULE Signed-off-by: Tian Tao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-14hv_sock: Remove the accept port restrictionSunil Muthuswamy1-59/+6
Currently, hv_sock restricts the port the guest socket can accept connections on. hv_sock divides the socket port namespace into two parts for server side (listening socket), 0-0x7FFFFFFF & 0x80000000-0xFFFFFFFF (there are no restrictions on client port namespace). The first part (0-0x7FFFFFFF) is reserved for sockets where connections can be accepted. The second part (0x80000000-0xFFFFFFFF) is reserved for allocating ports for the peer (host) socket, once a connection is accepted. This reservation of the port namespace is specific to hv_sock and not known by the generic vsock library (ex: af_vsock). This is problematic because auto-binds/ephemeral ports are handled by the generic vsock library and it has no knowledge of this port reservation and could allocate a port that is not compatible with hv_sock (and legitimately so). The issue hasn't surfaced so far because the auto-bind code of vsock (__vsock_bind_stream) prior to the change 'VSOCK: bind to random port for VMADDR_PORT_ANY' would start walking up from LAST_RESERVED_PORT (1023) and start assigning ports. That will take a large number of iterations to hit 0x7FFFFFFF. But, after the above change to randomize port selection, the issue has started coming up more frequently. There has really been no good reason to have this port reservation logic in hv_sock from the get go. Reserving a local port for peer ports is not how things are handled generally. Peer ports should reflect the peer port. This fixes the issue by lifting the port reservation, and also returns the right peer port. Since the code converts the GUID to the peer port (by using the first 4 bytes), there is a possibility of conflicts, but that seems like a reasonable risk to take, given this is limited to vsock and that only applies to all local sockets. Signed-off-by: Sunil Muthuswamy <[email protected]> Signed-off-by: David S. Miller <[email protected]>