aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-08-03sfc_ef100: statistics gatheringEdward Cree3-0/+217
MAC stats work much the same as on EF10, with a periodic DMA to a region specified via an MCDI. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: plumb in fini_dmaqEdward Cree1-0/+1
Bring down the TX and RX queues at ifdown, so that we can then fini the EVQs (otherwise the MC would return EBUSY because they're still in use). Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: RX path for EF100Edward Cree3-9/+167
Includes RSS spreading. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: RX filter table management and related gubbinsEdward Cree2-0/+77
Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: TX path for EF100 NICsEdward Cree5-5/+396
Includes checksum offload and TSO, so declare those in our netdev features. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: read Design Parameters at probe timeEdward Cree2-0/+220
Several parts of the EF100 architecture are parameterised (to allow varying capabilities on FPGAs according to resource constraints), and these parameters are exposed to the driver through a TLV-encoded region of the BAR. For the most part we either don't care about these values at all or just need to sanity-check them against the driver's assumptions, but there are a number of TSO limits which we record so that we will be able to check against them in the TX path when handling GSO skbs. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: fail the probe if NIC uses unsol_ev creditsEdward Cree1-0/+6
In the future, EF100 is planned to have a credit-based scheme for handling unsolicited events, which drivers will need to use in order to function correctly. However, current EF100 hardware does not yet generate unsolicited events and the credit scheme has not yet been implemented in firmware. To prevent compatibility problems later if the current driver is used with future firmware which does implement it, we check for the corresponding capability flag (which that future firmware will set), and if found, we refuse to probe. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03sfc_ef100: check firmware version at start-of-dayEdward Cree1-0/+40
Early in EF100 development there was a different format of event descriptor; if the NIC is somehow running the very old firmware which will use that format, fail the probe. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03enetc: use napi_schedule to be compatible with PREEMPT_RTJiafei Pan1-1/+1
The driver calls napi_schedule_irqoff() from a context where, in RT, hardirqs are not disabled, since the IRQ handler is force-threaded. In the call path of this function, __raise_softirq_irqoff() is modifying its per-CPU mask of pending softirqs that must be processed, using or_softirq_pending(). The or_softirq_pending() function is not atomic, but since interrupts are supposed to be disabled, nobody should be preempting it, and the operation should be safe. Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case, it isn't safe, and the pending softirqs mask can get corrupted, resulting in softirqs being lost and never processed. To have common code that works with PREEMPT_RT and with mainline Linux, we can use plain napi_schedule() instead. The difference is that napi_schedule() (via __napi_schedule) also calls local_irq_save, which disables hardirqs if they aren't already. But, since they already are disabled in non-RT, this means that in practice we don't see any measurable difference in throughput or latency with this patch. Signed-off-by: Jiafei Pan <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03dpaa2-eth: use napi_schedule to be compatible with PREEMPT_RTJiafei Pan1-1/+1
The driver calls napi_schedule_irqoff() from a context where, in RT, hardirqs are not disabled, since the IRQ handler is force-threaded. In the call path of this function, __raise_softirq_irqoff() is modifying its per-CPU mask of pending softirqs that must be processed, using or_softirq_pending(). The or_softirq_pending() function is not atomic, but since interrupts are supposed to be disabled, nobody should be preempting it, and the operation should be safe. Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case, it isn't safe, and the pending softirqs mask can get corrupted, resulting in softirqs being lost and never processed. To have common code that works with PREEMPT_RT and with mainline Linux, we can use plain napi_schedule() instead. The difference is that napi_schedule() (via __napi_schedule) also calls local_irq_save, which disables hardirqs if they aren't already. But, since they already are disabled in non-RT, this means that in practice we don't see any measurable difference in throughput or latency with this patch. Signed-off-by: Jiafei Pan <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03Merge branch 'net-dsa-loop-Preparatory-changes-for-802-1Q-data-path'David S. Miller2-38/+64
net: dsa: loop: Preparatory changes for 802.1Q data path Florian Fainelli says: ==================== These patches are all meant to help pave the way for a 802.1Q data path added to the mockup driver, making it more useful than just testing for configuration. Sending those out now since there is no real need to wait. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: loop: Set correct number of portsFlorian Fainelli1-1/+1
We only support DSA_LOOP_NUM_PORTS in the switch, do not tell the DSA core to allocate up to DSA_MAX_PORTS which is nearly the double (6 vs. 11). Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: loop: Wire-up MTU callbacksFlorian Fainelli2-0/+18
For now we simply store the port MTU into a per-port member. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: loop: Move data structures to headerFlorian Fainelli2-31/+41
In preparation for adding support for a mockup data path, move the driver data structures to include/linux/dsa/loop.h such that we can share them between net/dsa/ and drivers/net/dsa/ later on. Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: loop: Support 4K VLANsFlorian Fainelli1-4/+2
Allocate a 4K array of VLANs instead of limiting ourselves to just 5 which is arbitrary. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: loop: PVID should be per-portFlorian Fainelli1-4/+4
The PVID should be per-port, this is a preliminary change to support a 802.1Q data path in the driver. Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03cxgb4: add TC-MATCHALL IPv6 supportRahul Lakkireddy3-26/+82
Matching IPv6 traffic require allocating their own individual slots in TCAM. So, fetch additional slots to insert IPv6 rules. Also, fetch the cumulative stats of all the slots occupied by the Matchall rule. Signed-off-by: Rahul Lakkireddy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: dsa: sja1105: poll for extts events from a timerVladimir Oltean2-34/+50
The current poll interval is enough to ensure that rising and falling edge events are not lost for a 1 PPS signal with 50% duty cycle. But when we deliver the events to user space, it will try to infer if they were corresponding to a rising or to a falling edge (the kernel driver doesn't know that either). User space will try to make that inference based on the time at which the PPS master had emitted the pulse (i.e. if it's a .0 time, it's rising edge, if it's .5 time, it's falling edge). But there is no in-kernel API for retrieving the precise timestamp corresponding to a PPS master (aka perout) pulse. So user space has to guess even that. It will read the PTP time on the PPS master right after we've delivered the extts event, and declare that the PPS master time was just the closest integer second, based on 2 thresholds (lower than .25, or higher than .75, and ignore anything else). Except that, if we poll for extts events (and our hardware doesn't really help us, by not providing an interrupt), then there is a risk that the poll period (and therefore the time at which the event is delivered) might confuse user space. Because we are always scheduling the next extts poll at SJA1105_EXTTS_INTERVAL "from now" (that's the only thing that the schedule_delayed_work() API gives us), it means that the start time of the next delayed workqueue will always be shifted to the right a little bit (shifted with the SPI access duration of this workqueue run). In turn, because user space sees extts events that are non-periodic compared to the PPS master's time, this means that it might start making wrong guesses about rising/falling edge. To understand the effect, here is the output of ts2phc currently. Notice the 'src' timestamps of the 'SKIP extts' events, and how they have a large wander. They keep increasing until the upper limit for the ignore threshold (.75 seconds), after which the application starts ignoring the _other_ edge. ts2phc[26.624]: /dev/ptp3 SKIP extts index 0 at 21.449898912 src 21.657784518 ts2phc[27.133]: adding tstamp 21.949894240 to clock /dev/ptp3 ts2phc[27.133]: adding tstamp 22.000000000 to clock /dev/ptp1 ts2phc[27.133]: /dev/ptp3 offset 640 s2 freq +5112 ts2phc[27.636]: /dev/ptp3 SKIP extts index 0 at 22.449889360 src 22.669398022 ts2phc[28.140]: adding tstamp 22.949884376 to clock /dev/ptp3 ts2phc[28.140]: adding tstamp 23.000000000 to clock /dev/ptp1 ts2phc[28.140]: /dev/ptp3 offset 96 s2 freq +4760 ts2phc[28.644]: /dev/ptp3 SKIP extts index 0 at 23.449879504 src 23.677420422 ts2phc[29.153]: adding tstamp 23.949874704 to clock /dev/ptp3 ts2phc[29.153]: adding tstamp 24.000000000 to clock /dev/ptp1 ts2phc[29.153]: /dev/ptp3 offset -264 s2 freq +4429 ts2phc[29.656]: /dev/ptp3 SKIP extts index 0 at 24.449870008 src 24.689407238 ts2phc[30.160]: adding tstamp 24.949865376 to clock /dev/ptp3 ts2phc[30.160]: adding tstamp 25.000000000 to clock /dev/ptp1 ts2phc[30.160]: /dev/ptp3 offset -280 s2 freq +4334 ts2phc[30.664]: /dev/ptp3 SKIP extts index 0 at 25.449860760 src 25.697449926 ts2phc[31.168]: adding tstamp 25.949856176 to clock /dev/ptp3 ts2phc[31.168]: adding tstamp 26.000000000 to clock /dev/ptp1 ts2phc[31.168]: /dev/ptp3 offset -176 s2 freq +4354 ts2phc[31.672]: /dev/ptp3 SKIP extts index 0 at 26.449851584 src 26.705433606 ts2phc[32.180]: adding tstamp 26.949846992 to clock /dev/ptp3 ts2phc[32.180]: adding tstamp 27.000000000 to clock /dev/ptp1 ts2phc[32.180]: /dev/ptp3 offset -80 s2 freq +4397 ts2phc[32.684]: /dev/ptp3 SKIP extts index 0 at 27.449842384 src 27.717415110 ts2phc[33.192]: adding tstamp 27.949837768 to clock /dev/ptp3 ts2phc[33.192]: adding tstamp 28.000000000 to clock /dev/ptp1 ts2phc[33.192]: /dev/ptp3 offset 0 s2 freq +4453 ts2phc[33.696]: /dev/ptp3 SKIP extts index 0 at 28.449833128 src 28.729412902 ts2phc[34.200]: adding tstamp 28.949828472 to clock /dev/ptp3 ts2phc[34.200]: adding tstamp 29.000000000 to clock /dev/ptp1 ts2phc[34.200]: /dev/ptp3 offset 8 s2 freq +4461 ts2phc[34.704]: /dev/ptp3 SKIP extts index 0 at 29.449823816 src 29.737416038 ts2phc[35.208]: adding tstamp 29.949819152 to clock /dev/ptp3 ts2phc[35.208]: adding tstamp 30.000000000 to clock /dev/ptp1 ts2phc[35.208]: /dev/ptp3 offset -8 s2 freq +4447 ts2phc[35.712]: /dev/ptp3 SKIP extts index 0 at 30.449814496 src 30.745554982 ts2phc[36.216]: adding tstamp 30.949809840 to clock /dev/ptp3 ts2phc[36.216]: adding tstamp 31.000000000 to clock /dev/ptp1 ts2phc[36.216]: /dev/ptp3 offset -8 s2 freq +4445 ts2phc[36.468]: /dev/ptp3 SKIP extts index 0 at 31.449805184 src 31.501109446 ts2phc[36.972]: adding tstamp 31.949800536 to clock /dev/ptp3 ts2phc[36.972]: adding tstamp 32.000000000 to clock /dev/ptp1 ts2phc[36.972]: /dev/ptp3 offset -8 s2 freq +4442 ts2phc[37.480]: /dev/ptp3 SKIP extts index 0 at 32.449795896 src 32.513320070 ts2phc[37.984]: adding tstamp 32.949791248 to clock /dev/ptp3 ts2phc[37.984]: adding tstamp 33.000000000 to clock /dev/ptp1 ts2phc[37.984]: /dev/ptp3 offset 0 s2 freq +4448 Fix that by taking the following measures: - Schedule the poll from a timer. Because we are really scheduling the timer periodically, the extts events delivered to user space are periodic too, and don't suffer from the "shift-to-the-right" effect. - Increase the poll period to 6 times a second. This imposes a smaller upper bound to the shift that can occur to the delivery time of extts events, and makes user space (ts2phc) to always interpret correctly which events should be skipped and which shouldn't. - Move the SPI readout itself to the main PTP kernel thread, instead of the generic workqueue. This is because the timer runs in atomic context, but is also better than before, because if needed, we can chrt & taskset this kernel thread, to ensure it gets enough priority under load. After this patch, one can notice that the wander is greatly reduced, and that the latencies of one extts poll are not propagated to the next. The 'src' timestamp that is skipped is never larger than .65 seconds (which means .15 seconds larger than the time at which the real event occurred at, and .10 seconds smaller than the .75 upper threshold for ignoring the falling edge): ts2phc[40.076]: adding tstamp 34.949261296 to clock /dev/ptp3 ts2phc[40.076]: adding tstamp 35.000000000 to clock /dev/ptp1 ts2phc[40.076]: /dev/ptp3 offset 48 s2 freq +4631 ts2phc[40.568]: /dev/ptp3 SKIP extts index 0 at 35.449256496 src 35.595791078 ts2phc[41.064]: adding tstamp 35.949251744 to clock /dev/ptp3 ts2phc[41.064]: adding tstamp 36.000000000 to clock /dev/ptp1 ts2phc[41.064]: /dev/ptp3 offset -224 s2 freq +4374 ts2phc[41.552]: /dev/ptp3 SKIP extts index 0 at 36.449247088 src 36.579825574 ts2phc[42.044]: adding tstamp 36.949242456 to clock /dev/ptp3 ts2phc[42.044]: adding tstamp 37.000000000 to clock /dev/ptp1 ts2phc[42.044]: /dev/ptp3 offset -240 s2 freq +4290 ts2phc[42.536]: /dev/ptp3 SKIP extts index 0 at 37.449237848 src 37.563828774 ts2phc[43.028]: adding tstamp 37.949233264 to clock /dev/ptp3 ts2phc[43.028]: adding tstamp 38.000000000 to clock /dev/ptp1 ts2phc[43.028]: /dev/ptp3 offset -144 s2 freq +4314 ts2phc[43.520]: /dev/ptp3 SKIP extts index 0 at 38.449228656 src 38.547823238 ts2phc[44.012]: adding tstamp 38.949224048 to clock /dev/ptp3 ts2phc[44.012]: adding tstamp 39.000000000 to clock /dev/ptp1 ts2phc[44.012]: /dev/ptp3 offset -80 s2 freq +4335 ts2phc[44.508]: /dev/ptp3 SKIP extts index 0 at 39.449219432 src 39.535846118 ts2phc[44.996]: adding tstamp 39.949214816 to clock /dev/ptp3 ts2phc[44.996]: adding tstamp 40.000000000 to clock /dev/ptp1 ts2phc[44.996]: /dev/ptp3 offset -32 s2 freq +4359 ts2phc[45.488]: /dev/ptp3 SKIP extts index 0 at 40.449210192 src 40.515824678 ts2phc[45.980]: adding tstamp 40.949205568 to clock /dev/ptp3 ts2phc[45.980]: adding tstamp 41.000000000 to clock /dev/ptp1 ts2phc[45.980]: /dev/ptp3 offset 8 s2 freq +4390 ts2phc[46.636]: /dev/ptp3 SKIP extts index 0 at 41.449200928 src 41.664176902 ts2phc[47.132]: adding tstamp 41.949196288 to clock /dev/ptp3 ts2phc[47.132]: adding tstamp 42.000000000 to clock /dev/ptp1 ts2phc[47.132]: /dev/ptp3 offset 0 s2 freq +4384 ts2phc[47.620]: /dev/ptp3 SKIP extts index 0 at 42.449191656 src 42.648117190 ts2phc[48.112]: adding tstamp 42.949187016 to clock /dev/ptp3 ts2phc[48.112]: adding tstamp 43.000000000 to clock /dev/ptp1 ts2phc[48.112]: /dev/ptp3 offset 0 s2 freq +4384 ts2phc[48.604]: /dev/ptp3 SKIP extts index 0 at 43.449182384 src 43.632112582 ts2phc[49.100]: adding tstamp 43.949177736 to clock /dev/ptp3 ts2phc[49.100]: adding tstamp 44.000000000 to clock /dev/ptp1 ts2phc[49.100]: /dev/ptp3 offset -8 s2 freq +4376 ts2phc[49.588]: /dev/ptp3 SKIP extts index 0 at 44.449173096 src 44.616136774 ts2phc[50.080]: adding tstamp 44.949168464 to clock /dev/ptp3 ts2phc[50.080]: adding tstamp 45.000000000 to clock /dev/ptp1 ts2phc[50.080]: /dev/ptp3 offset 8 s2 freq +4390 ts2phc[50.572]: /dev/ptp3 SKIP extts index 0 at 45.449163816 src 45.600134662 ts2phc[51.064]: adding tstamp 45.949159160 to clock /dev/ptp3 ts2phc[51.064]: adding tstamp 46.000000000 to clock /dev/ptp1 ts2phc[51.064]: /dev/ptp3 offset -8 s2 freq +4376 ts2phc[51.556]: /dev/ptp3 SKIP extts index 0 at 46.449154528 src 46.584588550 ts2phc[52.048]: adding tstamp 46.949149896 to clock /dev/ptp3 ts2phc[52.048]: adding tstamp 47.000000000 to clock /dev/ptp1 ts2phc[52.048]: /dev/ptp3 offset 0 s2 freq +4382 ts2phc[52.540]: /dev/ptp3 SKIP extts index 0 at 47.449145256 src 47.568132198 ts2phc[53.032]: adding tstamp 47.949140616 to clock /dev/ptp3 ts2phc[53.032]: adding tstamp 48.000000000 to clock /dev/ptp1 ts2phc[53.032]: /dev/ptp3 offset 0 s2 freq +4382 ts2phc[53.524]: /dev/ptp3 SKIP extts index 0 at 48.449135968 src 48.552121446 ts2phc[54.016]: adding tstamp 48.949131320 to clock /dev/ptp3 ts2phc[54.016]: adding tstamp 49.000000000 to clock /dev/ptp1 ts2phc[54.016]: /dev/ptp3 offset 0 s2 freq +4382 ts2phc[54.512]: /dev/ptp3 SKIP extts index 0 at 49.449126680 src 49.540147014 ts2phc[55.000]: adding tstamp 49.949122040 to clock /dev/ptp3 ts2phc[55.000]: adding tstamp 50.000000000 to clock /dev/ptp1 ts2phc[55.000]: /dev/ptp3 offset 0 s2 freq +4382 ts2phc[55.492]: /dev/ptp3 SKIP extts index 0 at 50.449117400 src 50.520119078 ts2phc[55.988]: adding tstamp 50.949112768 to clock /dev/ptp3 ts2phc[55.988]: adding tstamp 51.000000000 to clock /dev/ptp1 ts2phc[55.988]: /dev/ptp3 offset 8 s2 freq +4390 ts2phc[56.476]: /dev/ptp3 SKIP extts index 0 at 51.449108120 src 51.504175910 ts2phc[57.132]: adding tstamp 51.949103480 to clock /dev/ptp3 ts2phc[57.132]: adding tstamp 52.000000000 to clock /dev/ptp1 ts2phc[57.132]: /dev/ptp3 offset 0 s2 freq +4384 ts2phc[57.624]: /dev/ptp3 SKIP extts index 0 at 52.449098840 src 52.651833574 ts2phc[58.116]: adding tstamp 52.949094200 to clock /dev/ptp3 ts2phc[58.116]: adding tstamp 53.000000000 to clock /dev/ptp1 ts2phc[58.116]: /dev/ptp3 offset 8 s2 freq +4392 ts2phc[58.612]: /dev/ptp3 SKIP extts index 0 at 53.449089560 src 53.639826918 ts2phc[59.100]: adding tstamp 53.949084920 to clock /dev/ptp3 ts2phc[59.100]: adding tstamp 54.000000000 to clock /dev/ptp1 ts2phc[59.100]: /dev/ptp3 offset 8 s2 freq +4394 ts2phc[59.592]: /dev/ptp3 SKIP extts index 0 at 54.449080272 src 54.619842278 ts2phc[60.084]: adding tstamp 54.949075624 to clock /dev/ptp3 ts2phc[60.084]: adding tstamp 55.000000000 to clock /dev/ptp1 ts2phc[60.084]: /dev/ptp3 offset 8 s2 freq +4397 ts2phc[60.576]: /dev/ptp3 SKIP extts index 0 at 55.449070968 src 55.603885542 ts2phc[61.068]: adding tstamp 55.949066312 to clock /dev/ptp3 ts2phc[61.068]: adding tstamp 56.000000000 to clock /dev/ptp1 ts2phc[61.068]: /dev/ptp3 offset 0 s2 freq +4391 ts2phc[61.560]: /dev/ptp3 SKIP extts index 0 at 56.449061680 src 56.587885798 ts2phc[62.052]: adding tstamp 56.949057032 to clock /dev/ptp3 ts2phc[62.052]: adding tstamp 57.000000000 to clock /dev/ptp1 ts2phc[62.052]: /dev/ptp3 offset -8 s2 freq +4383 Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mptcp: fix bogus sendmsg() return code under pressurePaolo Abeni1-2/+1
In case of memory pressure, mptcp_sendmsg() may call sk_stream_wait_memory() after succesfully xmitting some bytes. If the latter fails we currently return to the user-space the error code, ignoring the succeful xmit. Address the issue always checking for the xmitted bytes before mptcp_sendmsg() completes. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03Merge branch 'mlxsw-Add-support-for-buffer-drop-traps'David S. Miller15-66/+406
Ido Schimmel says: ==================== mlxsw: Add support for buffer drop traps Petr says: A recent patch set added the ability to mirror buffer related drops (e.g., early drops) through a netdev. This patch set adds the ability to trap such packets to the local CPU for analysis. The trapping towards the CPU is configured by using tc-trap action instead of tc-mirred as was done when the packets were mirrored through a netdev. A future patch set will also add the ability to sample the dropped packets using tc-sample action. The buffer related drop traps are added to devlink, which means that the dropped packets can be reported to user space via the kernel's drop_monitor module. Patch set overview: Patch #1 adds the early_drop trap to devlink Patch #2 adds extack to a few devlink operations to facilitate better error reporting to user space. This is necessary - among other things - because the action of buffer drop traps cannot be changed in mlxsw Patch #3 performs a small refactoring in mlxsw, patch #4 fixes a bug that this patchset would trigger. Patches #5-#6 add the infrastructure required to support different traps / trap groups in mlxsw per-ASIC. This is required because buffer drop traps are not supported by Spectrum-1 Patch #7 extends mlxsw to register the early_drop trap Patch #8 adds the offload logic for the "trap" action at a qevent block. Patch #9 adds a mlxsw-specific selftest. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-08-03selftests: mlxsw: RED: Test offload of trapping on RED qeventsPetr Machata2-6/+40
Add a selftest for RED early_drop and mark qevents when a trap action is attached at the associated block. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_qdisc: Offload action trap for qeventsPetr Machata3-13/+95
When offloading action trap on a qevent, pass to_dev of NULL to the SPAN module to trigger the mirror to the CPU port. Query the buffer drops policer and use it for policing of the trapped traffic. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_trap: Add early_drop trapIdo Schimmel3-3/+53
As previously explained, packets that are dropped due to buffer related reasons (e.g., tail drop, early drop) can be mirrored to the CPU port. These packets are then trapped with one of the "mirror session" traps and their CQE includes the reason for which the packet was mirrored. Register with devlink a new trap, early_drop, and initialize the corresponding Rx listener with the appropriate mirror reason. Return an error in case user tries to change the traps' action, as this is not supported. Since Spectrum-1 does not support these traps, the above is only done for Spectrum-2 onwards. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_trap: Allow for per-ASIC traps initializationIdo Schimmel2-9/+75
Subsequent patches will need to register different traps for Spectrum-1 and Spectrum-2 onwards. Enable that by invoking a per-ASIC operation during traps initialization. Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_trap: Allow for per-ASIC trap groups initializationIdo Schimmel4-9/+93
Subsequent patches will need to register different trap groups for Spectrum-1 and Spectrum-2 onwards. Enable that by invoking a per-ASIC operation during trap groups initialization. Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_span: On policer_id_base_ref_count, use dec_and_testPetr Machata1-1/+2
When unsetting policer base, the SPAN code currently uses refcount_dec(). However that function splats when the counter reaches zero, because reaching zero without actually testing is in general indicative of a missing cleanup. There is no cleanup to be done here, but nonetheless, use refcount_dec_and_test() as required. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mlxsw: spectrum_trap: Use 'size_t' for array sizesIdo Schimmel2-5/+5
Use 'size_t' instead of 'u64' for array sizes, as this this is correct type to use for expressions involving sizeof(). Suggested-by: Petr Machata <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03devlink: Pass extack when setting trap's action and group's parametersIdo Schimmel7-20/+35
A later patch will refuse to set the action of certain traps in mlxsw and also to change the policer binding of certain groups. Pass extack so that failure could be communicated clearly to user space. Reviewed-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03devlink: Add early_drop trapAmit Cohen3-0/+8
Add the packet trap that can report packets that were ECN marked due to RED AQM. Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03Merge tag 'platform-drivers-x86-v5.9-1' of ↵Linus Torvalds25-135/+846
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: - ASUS WMI driver honors BAT1 name of the battery (quite a few new laptops are using it) - Dell WMI driver supports new key codes and backlight events - ThinkPad ACPI driver now may use standard charge threshold interface, it also has been updated to provide Laptop or Desktop mode to the user - Intel Speed Select Technology gained support on Sapphire Rapids platform - Regular update of Speed Select Technology tools - Mellanox has been updated to support complex attributes - PMC core driver has been fixed to show correct names for LPM0 register - HTTP links were replaced by HTTPS ones where it applies - Miscellaneous fixes and cleanups here and there * tag 'platform-drivers-x86-v5.9-1' of git://git.infradead.org/linux-platform-drivers-x86: (42 commits) platform/x86: asus-nb-wmi: Drop duplicate DMI quirk structures platform/x86: thinkpad_acpi: Make some symbols static platform/x86: thinkpad_acpi: add documentation for battery charge control platform/x86: thinkpad_acpi: use standard charge control attribute names platform/x86: thinkpad_acpi: remove unused defines platform/x86: ISST: drop a duplicated word in isst_if.h tools/power/x86/intel-speed-select: Update version for v5.9 tools/power/x86/intel-speed-select: Add retries for mail box commands tools/power/x86/intel-speed-select: Add option to delay mbox commands tools/power/x86/intel-speed-select: Ignore -o option processing on error tools/power/x86/intel-speed-select: Change path for caching topology info platform/x86: acerhdf: Replace HTTP links with HTTPS ones platform/x86: apple-gmux: Replace HTTP links with HTTPS ones platform/x86: pcengines-apuv2: revert wiring up simswitch GPIO as LED platform/x86: mlx-platform: Extend FAN platform data description platform_data/mlxreg: Add presence register field for FAN devices Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces platform/mellanox: mlxreg-io: Add support for complex attributes platform/x86: mlx-platform: Add more definitions for system attributes platform_data/mlxreg: Add support for complex attributes ...
2020-08-03fib: Fix undef compile warningYueHaibing1-1/+1
net/core/fib_rules.c:26:7: warning: "CONFIG_IP_MULTIPLE_TABLES" is not defined, evaluates to 0 [-Wundef] #elif CONFIG_IP_MULTIPLE_TABLES ^~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 8b66a6fd34f5 ("fib: fix another fib_rules_ops indirect call wrapper problem") Signed-off-by: YueHaibing <[email protected]> Acked-By: Brian Vazquez <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03mptcp: use mptcp_for_each_subflow in mptcp_stream_acceptGeliang Tang1-1/+1
Use mptcp_for_each_subflow in mptcp_stream_accept instead of open-coding. Signed-off-by: Geliang Tang <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03Merge tag 'mac80211-next-for-davem-2020-08-03' of ↵David S. Miller6-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A few more changes, notably: * handle new SAE (WPA3 authentication) status codes in the correct way * fix a while that should be an if instead, avoiding infinite loops * handle beacon filtering changing better ==================== Signed-off-by: David S. Miller <[email protected]>
2020-08-03net: stmmac: fix failed to suspend if phy based WOL is enabledJisheng Zhang1-1/+1
With the latest net-next tree, if test suspend/resume after enabling WOL, we get error as below: [ 487.086365] dpm_run_callback(): mdio_bus_suspend+0x0/0x30 returns -16 [ 487.086375] PM: Device stmmac-0:00 failed to suspend: error -16 -16 means -EBUSY, this is because I didn't enable wakeup of the correct device when implementing phy based WOL feature. To be honest, I caught the issue when implementing phy based WOL and then fix it locally, but forgot to amend the phy based wol patch. Today, I found the issue by testing net-next tree. Signed-off-by: Jisheng Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi headerIoana-Ruxandra Stăncioi2-21/+17
Refactor the function seg6_lwt_headroom out of the seg6_iptunnel.h uapi header, because it is only used in seg6_iptunnel.c. Moreover, it is only used in the kernel code, as indicated by the "#ifdef __KERNEL__". Suggested-by: David Miller <[email protected]> Signed-off-by: Ioana-Ruxandra Stăncioi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03tcp: apply a floor of 1 for RTT samples from TCP timestampsJianfeng Wang1-0/+2
For retransmitted packets, TCP needs to resort to using TCP timestamps for computing RTT samples. In the common case where the data and ACK fall in the same 1-millisecond interval, TCP senders with millisecond- granularity TCP timestamps compute a ca_rtt_us of 0. This ca_rtt_us of 0 propagates to rs->rtt_us. This value of 0 can cause performance problems for congestion control modules. For example, in BBR, the zero min_rtt sample can bring the min_rtt and BDP estimate down to 0, reduce snd_cwnd and result in a low throughput. It would be hard to mitigate this with filtering in the congestion control module, because the proper floor to apply would depend on the method of RTT sampling (using timestamp options or internally-saved transmission timestamps). This fix applies a floor of 1 for the RTT sample delta from TCP timestamps, so that seq_rtt_us, ca_rtt_us, and rs->rtt_us will be at least 1 * (USEC_PER_SEC / TCP_TS_HZ). Note that the receiver RTT computation in tcp_rcv_rtt_measure() and min_rtt computation in tcp_update_rtt_min() both already apply a floor of 1 timestamp tick, so this commit makes the code more consistent in avoiding this edge case of a value of 0. Signed-off-by: Jianfeng Wang <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Kevin Yang <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-03Merge tag 'ras-core-2020-08-03' of ↵Linus Torvalds4-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "Boris is on vacation and he asked us to send you the pending RAS bits: - Print the PPIN field on CPUs that fill them out - Fix an MCE injection bug - Simplify a kzalloc in dev_mcelog_init_device()" * tag 'ras-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce, EDAC/mce_amd: Print PPIN in machine check records x86/mce/dev-mcelog: Use struct_size() helper in kzalloc() x86/mce/inject: Fix a wrong assignment of i_mce.status
2020-08-03Merge tag 'x86-timers-2020-08-03' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer update from Ingo Molnar: "Set the X86_FEATURE_TSC_KNOWN_FREQ flag for Xen guests, to avoid recalibration" * tag 'x86-timers-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()
2020-08-03Merge tag 'x86-platform-2020-08-03' of ↵Linus Torvalds14-1450/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The biggest change is the removal of SGI UV1 support, which allowed the removal of the legacy EFI old_mmap code as well. This removes quite a bunch of old code & quirks" * tag 'x86-platform-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Remove unused EFI_UV1_MEMMAP code x86/platform/uv: Remove uv bios and efi code related to EFI_UV1_MEMMAP x86/efi: Remove references to no-longer-used efi_have_uv1_memmap() x86/efi: Delete SGI UV1 detection. x86/platform/uv: Remove efi=old_map command line option x86/platform/uv: Remove vestigial mention of UV1 platform from bios header x86/platform/uv: Remove support for UV1 platform from uv x86/platform/uv: Remove support for uv1 platform from uv_hub x86/platform/uv: Remove support for UV1 platform from uv_bau x86/platform/uv: Remove support for UV1 platform from uv_mmrs x86/platform/uv: Remove support for UV1 platform from x2apic_uv_x x86/platform/uv: Remove support for UV1 platform from uv_tlb x86/platform/uv: Remove support for UV1 platform from uv_time
2020-08-03Merge tag 'x86-mm-2020-08-03' of ↵Linus Torvalds3-10/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mmm update from Ingo Molnar: "The biggest change is to not sync the vmalloc and ioremap ranges for x86-64 anymore" * tag 'x86-mm-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/64: Make sync_global_pgds() static x86/mm/64: Do not sync vmalloc/ioremap mappings x86/mm: Pre-allocate P4D/PUD pages for vmalloc area
2020-08-03Merge tag 'x86-misc-2020-08-03' of ↵Linus Torvalds1-0/+69
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MSR filtering from Ingo Molnar: "Filter MSR writes from user-space by default, and print a syslog entry if they happen outside the allowed set of MSRs, which is a single one for now, MSR_IA32_ENERGY_PERF_BIAS. The plan is to eventually disable MSR writes by default (they can still be enabled via allow_writes=on)" * tag 'x86-misc-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Filter MSR writes
2020-08-03Merge tag 'x86-microcode-2020-08-03' of ↵Linus Torvalds2-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode update from Ingo Molnar: "Remove the microcode loader's FW_LOADER coupling" * tag 'x86-microcode-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Do not select FW_LOADER
2020-08-03Merge tag 'x86-fpu-2020-08-03' of ↵Linus Torvalds8-0/+243
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU selftest from Ingo Molnar: "Add the /sys/kernel/debug/selftest_helpers/test_fpu FPU self-test" * tag 'x86-fpu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/fpu: Add an FPU selftest
2020-08-03Merge tag 'x86-cpu-2020-08-03' of ↵Linus Torvalds15-88/+105
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molar: - prepare for Intel's new SERIALIZE instruction - enable split-lock debugging on more CPUs - add more Intel CPU models - optimize stack canary initialization a bit - simplify the Spectre logic a bit * tag 'x86-cpu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Refactor sync_core() for readability x86/cpu: Relocate sync_core() to sync_core.h x86/cpufeatures: Add enumeration for SERIALIZE instruction x86/split_lock: Enable the split lock feature on Sapphire Rapids and Alder Lake CPUs x86/cpu: Add Lakefield, Alder Lake and Rocket Lake models to the to Intel CPU family x86/stackprotector: Pre-initialize canary for secondary CPUs x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
2020-08-03Merge tag 'x86-core-2020-08-03' of ↵Linus Torvalds4-51/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug fixlets from Ingo Molnar: "Improve x86 debuggability: print registers with the same log level as the backtrace" * tag 'x86-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/dumpstack: Show registers dump with trace's log level x86/dumpstack: Add log_lvl to __show_regs() x86/dumpstack: Add log_lvl to show_iret_regs()
2020-08-03Merge tag 'x86-cleanups-2020-08-03' of ↵Linus Torvalds20-49/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups all around the place" * tag 'x86-cleanups-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioperm: Initialize pointer bitmap with NULL rather than 0 x86: uv: uv_hub.h: Delete duplicated word x86: cmpxchg_32.h: Delete duplicated word x86: bootparam.h: Delete duplicated word x86/mm: Remove the unused mk_kernel_pgd() #define x86/tsc: Remove unused "US_SCALE" and "NS_SCALE" leftover macros x86/ioapic: Remove unused "IOAPIC_AUTO" define x86/mm: Drop unused MAX_PHYSADDR_BITS x86/msr: Move the F15h MSRs where they belong x86/idt: Make idt_descr static initrd: Remove erroneous comment x86/mm/32: Fix -Wmissing prototypes warnings for init.c cpu/speculation: Add prototype for cpu_show_srbds() x86/mm: Fix -Wmissing-prototypes warnings for arch/x86/mm/init.c x86/asm: Unify __ASSEMBLY__ blocks x86/cpufeatures: Mark two free bits in word 3 x86/msr: Lift AMD family 0x15 power-specific MSRs
2020-08-03Merge tag 'x86-build-2020-08-03' of ↵Linus Torvalds3-138/+79
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "Misc changes: refresh defconfigs and simplify the boot image link stage" * tag 'x86-build-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/defconfigs: Refresh defconfig files x86/build: Move max-page-size option to LDFLAGS_vmlinux x86/defconfigs: Remove CONFIG_CRYPTO_AES_586 from i386_defconfig
2020-08-04gpio: wcove: Request IRQ after all initialisation doneAndy Shevchenko1-7/+7
There is logically better to request IRQ when we initialise all structures. Align the driver with the rest on the same matter. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2020-08-04gpio: crystalcove: Free IRQ on error pathAndy Shevchenko1-15/+3
It appears that all, but request_irq(), calls in the driver are device managed. In unlikely case of devm_gpiochip_add_data() failure the IRQ left requested. Free IRQ on error path by switching to devm_request_threaded_irq() API. Byproduct of this change is a drop of ->remove() callback completely. Fixes: 945e72db36bd ("gpio: crystalcove: Use irqchip template") Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2020-08-04gpio: pca953x: Request IRQ after all initialisation doneAndy Shevchenko1-11/+11
There is logically better to request IRQ when we initialise all structures. Align the driver with the rest on the same matter. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>