Age | Commit message (Collapse) | Author | Files | Lines |
|
Use tcf spinlock and rcu to protect params pointer from concurrent
modification during dump and init. Use rcu swap operation to reassign
params pointer under protection of tcf lock. (old params value is not used
by init, so there is no need of standalone rcu dereference step)
Ife action has meta-actions that are compiled as standalone modules. Rtnl
mutex must be released while loading a kernel module. In order to support
execution without rtnl mutex, propagate 'rtnl_held' argument to meta action
loading functions. When requesting meta action module, conditionally
release rtnl lock depending on 'rtnl_held' argument.
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use tcf spinlock to protect gact action private state from concurrent
modification during dump and init. Remove rtnl assertion that is no longer
necessary.
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use tcf lock to protect csum action struct private data from concurrent
modification in init and dump. Use rcu swap operation to reassign params
pointer under protection of tcf lock. (old params value is not used by
init, so there is no need of standalone rcu dereference step)
Remove rtnl assertion that is no longer necessary.
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use tcf spinlock to protect bpf action private data from concurrent
modification during dump and init. Remove rtnl lock assertion that is no
longer necessary.
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Konstantin Khorenko says:
====================
net/sctp: Avoid allocating high order memory with kmalloc()
Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated using kmalloc_array() function. This function
allocates physically contiguous memory regions, so this can lead
to allocation of memory regions of very high order, i.e.:
sizeof(struct sctp_stream_out) == 24,
((65535 * 24) / 4096) == 383 memory pages (4096 byte per page),
which means 9th memory order.
This can lead to a memory allocation failures on the systems
under a memory stress.
We actually do not need these arrays of memory to be physically
contiguous. Possible simple solution would be to use kvmalloc()
instread of kmalloc() as kvmalloc() can allocate physically scattered
pages if contiguous pages are not available. But the problem
is that the allocation can happed in a softirq context with
GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.
So the other possible solution is to use flexible arrays instead of
contiguios arrays of memory so that the memory would be allocated
on a per-page basis.
This patchset replaces kvmalloc() with flex_array usage.
It consists of two parts:
* First patch is preparatory - it mechanically wraps all direct
access to assoc->stream.out[] and assoc->stream.in[] arrays
with SCTP_SO() and SCTP_SI() wrappers so that later a direct
array access could be easily changed to an access to a
flex_array (or any other possible alternative).
* Second patch replaces kmalloc_array() with flex_array usage.
v2 changes:
sctp_stream_in() users are updated to provide stream as an argument,
sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().
v3 changes:
Move type chages struct sctp_stream_out -> flex_array to next patch.
Make sctp_stream_{in,out}() static incline and move them to a header.
Performance results (single stream):
====================================
* Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
* Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
RAM: 32 Gb
* netperf: taken from https://github.com/HewlettPackard/netperf.git,
compiled from sources with sctp support
* netperf server and client are run on the same node
* ip link set lo mtu 1500
The script used to run tests:
# cat run_tests.sh
#!/bin/bash
for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
echo "TEST: $test";
for i in `seq 1 3`; do
echo "Iteration: $i";
set -x
netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
-l 60 -- -m 1452;
set +x
done
done
================================================
Results (a bit reformatted to be more readable):
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
v4.18-rc7 v4.18-rc7 + fixes
TEST: SCTP_STREAM
212992 212992 1452 60.21 1125.52 1247.04
212992 212992 1452 60.20 1376.38 1149.95
212992 212992 1452 60.20 1131.40 1163.85
TEST: SCTP_STREAM_MANY
212992 212992 1452 60.00 1111.00 1310.05
212992 212992 1452 60.00 1188.55 1130.50
212992 212992 1452 60.00 1108.06 1162.50
===========
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
v4.18-rc7 v4.18-rc7 + fixes
TEST: SCTP_RR
212992 212992 1 1 60.00 45486.98 46089.43
212992 212992 1 1 60.00 45584.18 45994.21
212992 212992 1 1 60.00 45703.86 45720.84
TEST: SCTP_RR_MANY
212992 212992 1 1 60.00 40.75 40.77
212992 212992 1 1 60.00 40.58 40.08
212992 212992 1 1 60.00 39.98 39.97
Performance results for many streams:
=====================================
* Kernel: v4.18-rc8 - stock and with 2 patches v3
* Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
RAM: 32 Gb
* sctp_test: https://github.com/sctp/lksctp-tools
* both server and client are run on the same node
* ip link set lo mtu 1500
* sysctl -w vm.max_map_count=65530000 (need it to make memory fragmented)
The script used to run tests:
=============================
# cat run_sctp_test.sh
#!/bin/bash
set -x
uname -r
ip link set lo mtu 1500
swapoff -a
free
cat /proc/buddyinfo
./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
sleep 3
time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
-s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null
killall -9 lt-sctp_test
===============================
Results (a bit reformatted to be more readable):
1) ms stock kernel v4.18-rc8, no memory fragmentation
test 1 test 2 test 3
real 0m14.715s 0m14.593s 0m15.954s
user 0m0.954s 0m0.955s 0m0.854s
sys 0m13.388s 0m12.537s 0m13.749s
2) kernel with fixes, no memory fragmentation
test 1 test 2 test 3
real 0m14.959s 0m14.693s 0m14.762s
user 0m0.948s 0m0.921s 0m0.929s
sys 0m13.538s 0m13.225s 0m13.217s
3) kernel with fixes, memory fragmented
'free':
total used free shared buff/cache available
Mem: 32906008 30555200 302740 764 2048068 266452
Mem: 32906008 30379948 541436 764 1984624 442376
Mem: 32906008 30717312 262380 764 1926316 109908
/proc/buddyinfo:
Node 0, zone Normal 40773 37 34 29 0 0 0 0 0 0 0
Node 0, zone Normal 100332 68 8 4 2 1 1 0 0 0 0
Node 0, zone Normal 31113 7 2 1 0 0 0 0 0 0 0
test 1 test 2 test 3
real 0m14.159s 0m15.252s 0m15.826s
user 0m0.839s 0m1.004s 0m1.048s
sys 0m11.827s 0m14.240s 0m14.778s
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This path replaces physically contiguous memory arrays
allocated using kmalloc_array() with flexible arrays.
This enables to avoid memory allocation failures on the
systems under a memory stress.
Signed-off-by: Oleg Babin <[email protected]>
Signed-off-by: Konstantin Khorenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch introduces wrappers for accessing in/out streams indirectly.
This will enable to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.
Signed-off-by: Oleg Babin <[email protected]>
Signed-off-by: Konstantin Khorenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Updated README.
Added config file that contains the minimum required features enabled to
run the tests currently present in the kernel.
This must be updated when new unittests are created and require their own
modules.
Signed-off-by: Keara Leibovitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Guillaume Nault says:
====================
l2tp: rework pppol2tp ioctl handling
The current ioctl() handling code can be simplified. It tests for
non-relevant conditions and uselessly holds sockets. Once useless
code is removed, it becomes even simpler to let pppol2tp_ioctl() handle
commands directly, rather than dispatch them to pppol2tp_tunnel_ioctl()
or pppol2tp_session_ioctl(). That is the approach taken by this series.
Patch #1 and #2 define helper functions aimed at simplifying the rest
of the patch set.
Patch #3 drops useless tests in pppol2p_ioctl() and avoid holding a
refcount on the socket.
Patches #4, #5 and #6 are the core of the series. They let
pppol2tp_ioctl() handle all ioctls and drop the tunnel and session
specific functions.
Then patch #6 brings a little bit of consolidation.
Finally, patch #7 takes advantage of the simplified code to make
pppol2tp sockets compatible with dev_ioctl(). Certainly not a killer
feature, but it is trivial and it is always nice to see l2tp getting
better integration with the rest of the stack.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Return -ENOIOCTLCMD for unknown ioctl commands. This lets dev_ioctl()
handle generic socket ioctls like SIOCGIFNAME or SIOCGIFINDEX.
PF_PPPOX/PX_PROTO_OL2TP was one of the few socket types not honouring
this mechanism.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Integrate memset(0) in pppol2tp_copy_stats() to avoid calling it
manually every time.
While there, constify 'stats'.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
pppol2tp_ioctl() has everything in place for handling PPPIOCGL2TPSTATS
on session sockets. We just need to copy the stats and set ->session_id.
As a side effect of sharing session and tunnel code, ->using_ipsec is
properly set even when the request was made using a session socket.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Handle PPPIOCGL2TPSTATS in pppol2tp_ioctl() if the socket represents a
tunnel. This one is a bit special because the caller may use the tunnel
socket to retrieve statistics of one of its sessions. If the session_id
is set, the corresponding session's statistics are returned, instead of
those of the tunnel. This is handled by the new
pppol2tp_tunnel_copy_stats() helper function.
Set ->tunnel_id and ->using_ipsec out of the conditional, so
that it can be used by the 'else' branch in the following patch.
We cannot do that for ->session_id, because tunnel sockets have to
report the value that was originally passed in 'stats.session_id',
while session sockets have to report their own session_id.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Let pppol2tp_ioctl() handle ioctl commands directly. It still relies on
pppol2tp_{session,tunnel}_ioctl() for PPPIOCGL2TPSTATS.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
* Drop test on 'sk': sock->sk cannot be NULL, or pppox_ioctl() could
not have called us.
* Drop test on 'SOCK_DEAD' state: if this flag was set, the socket
would be in the process of being released and no ioctl could be
running anymore.
* Drop test on 'PPPOX_*' state: we depend on ->sk_user_data to get
the session structure. If it is non-NULL, then the socket is
connected. Testing for PPPOX_* is redundant.
* Retrieve session using ->sk_user_data directly, instead of going
through pppol2tp_sock_to_session(). This avoids grabbing a useless
reference on the socket.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
l2tp_session_get() is used for two different purposes. If 'tunnel' is
NULL, the session is searched globally in the supplied network
namespace. Otherwise it is searched exclusively in the tunnel context.
Callers always know the context in which they need to search the
session. But some of them do provide both a namespace and a tunnel,
making the semantic of the call unclear.
This patch defines l2tp_tunnel_get_session() for lookups done in a
tunnel and restricts l2tp_session_get() to namespace searches.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use helper function to figure out if a tunnel is using ipsec.
Also, avoid accessing ->sk_policy directly since it's RCU protected.
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Ilias Apalodimas says:
====================
netsec driver improvements
This patchset introduces some improvements on socionext netsec driver.
- patch 1/2, avoids unneeded MMIO reads on the Rx path
- patch 2/2, is adjusting the numbers of descriptors used
Changes since v1:
- Move dma_rmb() to protect descriptor accesses until the device
has updated the NETSEC_RX_PKT_OWN_FIELD bit
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Increasing descriptors to 256 from 128 and adjusting the NAPI weight
to 64 increases performace on Rx by ~20% on 64byte packets
Signed-off-by: Ilias Apalodimas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
MMIO reads for remaining packets in queue occur (at least)twice per
invocation of netsec_process_rx(). We can use the packet descriptor to
identify if it's owned by the hardware and break out, avoiding the more
expensive MMIO read operations. This has a ~2% increase on the pps of the
Rx path when tested with 64byte packets
Signed-off-by: Ilias Apalodimas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/neterion/vxge/vxge-config.c:1097:6: warning:
variable 'ret' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2263:6: warning:
variable 'req_out' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2262:22: warning:
variable 'status' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/neterion/vxge/vxge-config.c:2360:22: warning:
variable 'status' set but not used [-Wunused-but-set-variable]
enum vxge_hw_status status = VXGE_HW_OK;
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Caleb Raitto says:
====================
virtio_net: Expand affinity to arbitrary numbers of cpu and vq
Virtio-net tries to pin each virtual queue rx and tx interrupt to a cpu if
there are as many queues as cpus.
Expand this heuristic to configure a reasonable affinity setting also
when the number of cpus != the number of virtual queues.
Patch 1 allows vqs to take an affinity mask with more than 1 cpu.
Patch 2 generalizes the algorithm in virtnet_set_affinity beyond
the case where #cpus == #vqs.
v2 changes:
Renamed "virtio_net: Make vp_set_vq_affinity() take a mask." to
"virtio: Make vp_set_vq_affinity() take a mask."
Tested:
[InstanceSetup]
set_multiqueue = false
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
8000
$ sudo ethtool -L eth0 combined 15
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0-1
0-1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 14` ; do sudo grep ".*" tx-$i/xps_cpus; done
0003
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
8000
$ sudo ethtool -L eth0 combined 8
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0-1
0-1
2-3
2-3
4-5
4-5
6-7
6-7
8-9
8-9
10-11
10-11
12-13
12-13
14-15
14-15
9
9
10
10
11
11
12
12
13
13
14
14
15
15
15
15
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 7` ; do sudo grep ".*" tx-$i/xps_cpus; done
0003
000c
0030
00c0
0300
0c00
3000
c000
$ sudo ethtool -L eth0 combined 16
$ sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu15/online"
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
0
0
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0100
0200
0400
0800
1000
2000
4000
0001
$ for i in `seq 8 15`; \
do sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu$i/online"; done
$ cd /proc/irq
$ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done
0-15
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
0-15
0-15
0-15
0-15
$ cd /sys/class/net/eth0/queues/
$ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done
0001
0002
0004
0008
0010
0020
0040
0080
0001
0002
0004
0008
0010
0020
0040
0080
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Always set the affinity hint, even if #cpu != #vq.
Handle the case where #cpu > #vq (including when #cpu % #vq != 0) and
when #vq > #cpu (including when #vq % #cpu != 0).
Signed-off-by: Caleb Raitto <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Jon Olson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make vp_set_vq_affinity() take a cpumask instead of taking a single CPU.
If there are fewer queues than cores, queue affinity should be able to
map to multiple cores.
Link: https://patchwork.ozlabs.org/patch/948149/
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Caleb Raitto <[email protected]>
Acked-by: Gonglei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
PTP support includes:
Ingress, and egress timestamping.
One step timestamping available.
PTP clock support.
Periodic output support.
Signed-off-by: Bryan Whitehead <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Yuchung Cheng says:
====================
tcp: new mechanism to ACK immediately
This patch is a follow-up feature improvement to the recent fixes on
the performance issues in ECN (delayed) ACKs. Many of the fixes use
tcp_enter_quickack_mode routine to force immediate ACKs. However the
routine also reset tracking interactive session. This is not ideal
because these immediate ACKs are required by protocol specifics
unrelated to the interactiveness nature of the application.
This patch set introduces a new flag to send a one-time immediate ACK
without changing the status of interactive session tracking. With this
patch set the immediate ACKs are generated upon these protocol states:
1) When a hole is repaired
2) When CE status changes between subsequent data packets received
3) When a data packet carries CWR flag
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Previously commit 9aee40006190 ("tcp: ack immediately when a cwr
packet arrives") calls tcp_enter_quickack_mode to force sending
two immediate ACKs upon receiving a packet w/ CWR flag. The side
effect is it'll also reset the delayed ACK timer and interactive
session tracking. This patch removes that side effect by using the
new ACK_NOW flag to force an immmediate ACK.
Packetdrill to demonstrate:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.1 < [ect0] . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
+0 < [ect0] . 1:1001(1000) ack 1 win 257
+0 > [ect01] . 1:1(0) ack 1001
+0 write(4, ..., 1) = 1
+0 > [ect01] P. 1:2(1) ack 1001
+0 < [ect0] . 1001:2001(1000) ack 2 win 257
+0 write(4, ..., 1) = 1
+0 > [ect01] P. 2:3(1) ack 2001
+0 < [ect0] . 2001:3001(1000) ack 3 win 257
+0 < [ect0] . 3001:4001(1000) ack 3 win 257
// Ack delayed ...
+.01 < [ce] P. 4001:4501(500) ack 3 win 257
+0 > [ect01] . 3:3(0) ack 4001
+0 > [ect01] E. 3:3(0) ack 4501
+.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501 win 100
+.01 < [ect0] W. 4501:5501(1000) ack 4 win 257
// No delayed ACK on CWR flag
+0 > [ect01] . 4:4(0) ack 5501
+.31 < [ect0] . 5501:6501(1000) ack 4 win 257
+0 > [ect01] . 4:4(0) ack 6501
Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives")
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
RFC 5681 sec 4.2:
To provide feedback to senders recovering from losses, the receiver
SHOULD send an immediate ACK when it receives a data segment that
fills in all or part of a gap in the sequence space.
When a gap is partially filled, __tcp_ack_snd_check already checks
the out-of-order queue and correctly send an immediate ACK. However
when a gap is fully filled, the previous implementation only resets
pingpong mode which does not guarantee an immediate ACK because the
quick ACK counter may be zero. This patch addresses this issue by
marking the one-time immediate ACK flag instead.
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The recent fix of acking immediately in DCTCP on CE status change
has an undesirable side-effect: it also resets TCP ack timer and
disables pingpong mode (interactive session). But the CE status
change has nothing to do with them. This patch addresses that by
using the new one-time immediate ACK flag instead of calling
tcp_enter_quickack_mode().
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a new flag to indicate a one-time immediate ACK. This flag is
occasionaly set under specific TCP protocol states in addition to
the more common quickack mechanism for interactive application.
In several cases in the TCP code we want to force an immediate ACK
but do not want to call tcp_enter_quickack_mode() because we do
not want to forget the icsk_ack.pingpong or icsk_ack.ato state.
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I placed the "fall through"
annotation at the bottom of the case, which is what GCC is expecting
to find.
Addresses-Coverity-ID: 115075 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I placed the "fall through"
annotation at the bottom of the case, which is what GCC is expecting
to find.
Addresses-Coverity-ID: 1369529 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I replaced the code comment at the
top of the switch statement with a proper "fall through" annotation for
each case, which is what GCC is expecting to find.
Addresses-Coverity-ID: 1056542 ("Missing break in switch")
Addresses-Coverity-ID: 1339579 ("Missing break in switch")
Addresses-Coverity-ID: 1369526 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The static int 'zero' is defined but is never used hence it is
redundant and can be removed. The use of this variable was removed
with commit a158bdd3247b ("rxrpc: Fix call timeouts").
Cleans up clang warning:
warning: 'zero' defined but not used [-Wunused-const-variable=]
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
rtl8152_system_suspend
rtl8152_system_suspend defines the variable "ret", but it is not modified
after initialization. So just remove it.
Signed-off-by: zhong jiang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking fixes from David Miller:
"Last bit of straggler fixes...
1) Fix btf library licensing to LGPL, from Martin KaFai lau.
2) Fix error handling in bpf sockmap code, from Daniel Borkmann.
3) XDP cpumap teardown handling wrt. execution contexts, from Jesper
Dangaard Brouer.
4) Fix loss of runtime PM on failed vlan add/del, from Ivan
Khoronzhuk.
5) xen-netfront caches skb_shinfo(skb) across a __pskb_pull_tail()
call, which potentially changes the skb's data buffer, and thus
skb_shinfo(). Fix from Juergen Gross"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
xen/netfront: don't cache skb_shinfo()
net: ethernet: ti: cpsw: fix runtime_pm while add/kill vlan
net: ethernet: ti: cpsw: clear all entries when delete vid
xdp: fix bug in devmap teardown code path
samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
xdp: fix bug in cpumap teardown code path
bpf, sockmap: fix cork timeout for select due to epipe
bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
bpf: btf: Change tools/lib/bpf/btf to LGPL
|
|
skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
its return value.
Cc: [email protected]
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Wei Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Grygorii Strashko says:
====================
net: ethernet: ti: cpsw: fix runtime pm while add/del reserved vid
Here 2 not critical fixes for:
- vlan ale table leak while error if deleting vlan (simplifies next fix)
- runtime pm while try to set reserved vlan
====================
Reviewed-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It's exclusive with normal behaviour but if try to set vlan to one of
the reserved values is made, the cpsw runtime pm is broken.
Fixes: a6c5d14f5136 ("drivers: net: cpsw: ndev: fix accessing to suspended device")
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In cases if some of the entries were not found in forwarding table
while killing vlan, the rest not needed entries still left in the
table. No need to stop, as entry was deleted anyway. So fix this by
returning error only after all was cleaned. To implement this, return
-ENOENT in cpsw_ale_del_mcast() as it's supposed to be.
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If zram supports writeback feature, it's no longer a
BD_CAP_SYNCHRONOUS_IO device beause zram does asynchronous IO operations
for incompressible pages.
Do not pretend to be synchronous IO device. It makes the system very
sluggish due to waiting for IO completion from upper layers.
Furthermore, it causes a user-after-free problem because swap thinks the
opearion is done when the IO functions returns so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page) but in
fact, IO is asynchronous so the driver could access a just freed page
afterward.
This patch fixes the problem.
BUG: Bad page state in process qemu-system-x86 pfn:3dfab21
page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
flags: 0x17fffc000000008(uptodate)
raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
bad because of flags: 0x8(uptodate)
CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G B 4.18.0-rc5+ #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
Call Trace:
dump_stack+0x5c/0x7b
bad_page+0xba/0x120
get_page_from_freelist+0x1016/0x1250
__alloc_pages_nodemask+0xfa/0x250
alloc_pages_vma+0x7c/0x1c0
do_swap_page+0x347/0x920
__handle_mm_fault+0x7b4/0x1110
handle_mm_fault+0xfc/0x1f0
__get_user_pages+0x12f/0x690
get_user_pages_unlocked+0x148/0x1f0
__gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
try_async_pf+0x87/0x230 [kvm]
tdp_page_fault+0x132/0x290 [kvm]
kvm_mmu_page_fault+0x74/0x570 [kvm]
kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
do_vfs_ioctl+0xa2/0x630
ksys_ioctl+0x70/0x80
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix changelog, add comment]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: coding-style fixes]
Signed-off-by: Minchan Kim <[email protected]>
Reported-by: Tino Lehnig <[email protected]>
Tested-by: Tino Lehnig <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]> [4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ioremap_prot() can return NULL which could lead to an oops.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: chen jie <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: chenjie <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With gcc-8 fsanitize=null become very noisy. GCC started to complain
about things like &a->b, where 'a' is NULL pointer. There is no NULL
dereference, we just calculate address to struct member. It's
technically undefined behavior so UBSAN is correct to report it. But as
long as there is no real NULL-dereference, I think, we should be fine.
-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences. So let's just no use -fsanitize=null as it's not useful
for us. If there is a real NULL-deref we will see crash. Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This entry was created with my personal e-mail address. Update this entry
to my open-source kernel.org account.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kieran Bingham <[email protected]>
Cc: Jan Kiszka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch fixes following smatch warnings:
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2826 bnxt_fill_coredump_seg_hdr() error: strcpy() '"sEgM"' too large for 'seg_hdr->signature' (5 vs 4)
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2858 bnxt_fill_coredump_record() error: strcpy() '"cOrE"' too large for 'record->signature' (5 vs 4)
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2879 bnxt_fill_coredump_record() error: strcpy() 'utsname()->sysname' too large for 'record->os_name' (65 vs 32)
Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Vasundhara Volam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Martin KaFai Lau says:
====================
This series introduces a new map type "BPF_MAP_TYPE_REUSEPORT_SOCKARRAY"
and a new prog type BPF_PROG_TYPE_SK_REUSEPORT.
Here is a snippet from a commit message:
"To unleash the full potential of a bpf prog, it is essential for the
userspace to be capable of directly setting up a bpf map which can then
be consumed by the bpf prog to make decision. In this case, decide which
SO_REUSEPORT sk to serve the incoming request.
By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
the bpf prog can directly select a sk from the bpf map. That will
raise the programmability of the bpf prog attached to a reuseport
group (a group of sk serving the same IP:PORT).
For example, in UDP, the bpf prog can peek into the payload (e.g.
through the "data" pointer introduced in the later patch) to learn
the application level's connection information and then decide which sk
to pick from a bpf map. The userspace can tightly couple the sk's location
in a bpf map with the application logic in generating the UDP payload's
connection information. This connection info contact/API stays within the
userspace.
Also, when used with map-in-map, the userspace can switch the
old-server-process's inner map to a new-server-process's inner map
in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
The bpf prog will then direct incoming requests to the new process instead
of the old process. The old process can finish draining the pending
requests (e.g. by "accept()") before closing the old-fds. [Note that
deleting a fd from a bpf map does not necessary mean the fd is closed]"
====================
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
This patch add tests for the new BPF_PROG_TYPE_SK_REUSEPORT.
The tests cover:
- IPv4/IPv6 + TCP/UDP
- TCP syncookie
- TCP fastopen
- Cases when the bpf_sk_select_reuseport() returning errors
- Cases when the bpf prog returns SK_DROP
- Values from sk_reuseport_md
- outer_map => reuseport_array
The test depends on
commit 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
This patch adds tests for the new BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
This patch sync include/uapi/linux/bpf.h to
tools/include/uapi/linux/
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
This patch refactors the ARRAY_SIZE macro to bpf_util.h.
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|