Age | Commit message (Collapse) | Author | Files | Lines |
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-09
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Lots of libbpf improvements: i) addition of new APIs to attach BPF
programs to tracing entities such as {k,u}probes or tracepoints,
ii) improve specification of BTF-defined maps by eliminating the
need for data initialization for some of the members, iii) addition
of a high-level API for setting up and polling perf buffers for
BPF event output helpers, all from Andrii.
2) Add "prog run" subcommand to bpftool in order to test-run programs
through the kernel testing infrastructure of BPF, from Quentin.
3) Improve verifier for BPF sockaddr programs to support 8-byte stores
for user_ip6 and msg_src_ip6 members given clang tends to generate
such stores, from Stanislav.
4) Enable the new BPF JIT zero-extension optimization for further
riscv64 ALU ops, from Luke.
5) Fix a bpftool json JIT dump crash on powerpc, from Jiri.
6) Fix an AF_XDP race in generic XDP's receive path, from Ilya.
7) Various smaller fixes from Ilya, Yue and Arnd.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Unlike driver mode, generic xdp receive could be triggered
by different threads on different CPU cores at the same time
leading to the fill and rx queue breakage. For example, this
could happen while sending packets from two processes to the
first interface of veth pair while the second part of it is
open with AF_XDP socket.
Need to take a lock for each generic receive to avoid race.
Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Tested-by: William Tu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing
considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE.
Signed-off-by: Stephen Suryaputra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts
for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner
Layer 3 if present, but it only considers inner IPv4. There is a use
case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on
inner IPv6 addresses.
Fixes: 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel")
Signed-off-by: Stephen Suryaputra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The function cache_seq_next is declared static and marked
EXPORT_SYMBOL_GPL, which is at best an odd combination. Because the
function is not used outside of the net/sunrpc/cache.c file it is
defined in, this commit removes the EXPORT_SYMBOL_GPL() marking.
Fixes: d48cf356a130 ("SUNRPC: Remove non-RCU protected lookup")
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
Use netif_ovs_is_port() function instead of open code.
This patch doesn't change logic.
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.
Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.
Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().
This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).
Suggested-by: Stefan Hajnoczi <[email protected]>
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.
To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.
We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.
We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.
This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.
This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.
This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.
This patch is a modified version of Ivan Khoronzhuk's original patch.
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Ilias Apalodimas <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The frags_q is not properly initialized, it may result in illegal memory
access when conn_info is NULL.
The "goto free_exit" should be replaced by "goto exit".
Signed-off-by: Yang Wei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu.
2) Support for bridge pvid matching, from wenxu.
3) Support for bridge vlan protocol matching, also from wenxu.
4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid
from packet path.
5) Prefer specific family extension in nf_tables.
6) Autoload specific family extension in case it is missing.
7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera.
8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko.
9) ICMP handling for GRE encapsulation, from Julian Anastasov.
10) Remove unused parameter in nf_queue, from Florian Westphal.
11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring.
12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes
public.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:
# 17: (18) r2 = 0x100000000000000
# ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
# 19: (7b) *(u64 *)(r1 +16) = r2
# invalid bpf_context access off=16 size=8
>From the compiler point of view it does look like a correct thing
to do, so let's support it on the kernel side.
Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.
Cc: Andrii Nakryiko <[email protected]>
Cc: Yonghong Song <[email protected]>
Fixes: cd17d7770578 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST
and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map.
Invalid object maps are not trusted, but still updated. Note that we
never iterate, resize or invalidate object maps. If object-map feature
is enabled but object map fails to load, we just fail the requester
(either "rbd map" or I/O, by way of post-acquire action).
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
We already have one exported wrapper around it for extent.osd_data and
rbd_object_map_update_finish() needs another one for cls.request_data.
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Dongsheng Yang <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
|
|
This will be used for loading object map. rbd_obj_read_sync() isn't
suitable because object map must be accessed through class methods.
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Dongsheng Yang <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
|
|
This list item remained from when we had safe and unsafe replies
(commit vs ack). It has since become a private list item for use by
clients.
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
...ditto for the decode function. We only use these functions to fix
up banner addresses now, so let's name them more appropriately.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Going forward, we'll have different address types so let's use
the addr2 TYPE_LEGACY for internal tracking rather than TYPE_NONE.
Also, make ceph_pr_addr print the address type value as well.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Given the new format, we have to decode the addresses twice. Once to
skip past the new_up_client field, and a second time to collect the
addresses.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
While we're in there, let's also fix up the decoder to do proper
bounds checking.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Switch the MonMap decoder to use the new decoding routine for
entity_addr_t's.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Add a function for decoding an entity_addr_t. Once
CEPH_FEATURE_MSG_ADDR2 is enabled, the server daemons will start
encoding entity_addr_t differently.
Add a new helper function that can handle either format.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
It doesn't make sense to leave it undecoded until later.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
This function is entirely unused.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
bpfilter_umh currently printed all messages to /dev/console and this
might interfere the user activity(*).
This commit changes the output device to /dev/kmsg so that the messages
from bpfilter_umh won't show on the console directly.
(*) https://bugzilla.suse.com/show_bug.cgi?id=1140221
Signed-off-by: Gary Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
David reports that RPC applications which use epoll() occasionally
get stuck, and that TLS ULP causes the kernel to not wake applications,
even though read() will return data.
This is indeed true. The ctx->rx_list which holds partially copied
records is not consulted when deciding whether socket is readable.
Note that SO_RCVLOWAT with epoll() is and has always been broken for
kernel TLS. We'd need to parse all records from the TCP layer, instead
of just the first one.
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
Reported-by: David Beckett <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For these places are protected by rcu_read_lock, we change from
rcu_dereference_rtnl to rcu_dereference, as there is no need to
check if rtnl lock is held.
For these places are protected by rtnl_lock, we change from
rcu_dereference_rtnl to rtnl_dereference/rcu_dereference_protected,
as no extra memory barriers are needed under rtnl_lock() which also
protects tn->bearer_list[] and dev->tipc_ptr/b->media_ptr updating.
rcu_dereference_rtnl will be only used in the places where it could
be under rcu_read_lock or rtnl_lock.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, scripts/cc-can-link.sh is run just for BPFILTER_UMH, but
defining CC_CAN_LINK will be useful in other places.
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
BLE based 6LoWPAN networks are highly constrained in bandwidth.
Do not take a short-cut, always check if the destination address is
known to belong to a peer.
As a side-effect this also removes any behavioral differences between
one, and two or more connected peers.
Acked-by: Jukka Rissanen <[email protected]>
Tested-by: Michael Scott <[email protected]>
Signed-off-by: Josua Mayer <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Like any IPv6 capable device, 6LNs can have multiple addresses assigned
using SLAAC and made known through neighbour advertisements.
After checking the destination address against all peers link-local
addresses, consult the neighbour cache for additional known addresses.
RFC7668 defines the scope of Neighbor Advertisements in Section 3.2.3:
1. "A Bluetooth LE 6LN MUST NOT register its link-local address"
2. "A Bluetooth LE 6LN MUST register its non-link-local addresses with
the 6LBR by sending Neighbor Solicitation (NS) messages ..."
Due to these constranits both the link-local addresses tracked in the
list of 6lowpan peers, and the neighbour cache have to be used when
identifying the 6lowpan peer for a destination address.
Acked-by: Jukka Rissanen <[email protected]>
Tested-by: Michael Scott <[email protected]>
Signed-off-by: Josua Mayer <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Handle overlooked case where the target address is assigned to a peer
and neither route nor gateway exist.
For one peer, no checks are performed to see if it is meant to receive
packets for a given address.
As soon as there is a second peer however, checks are performed
to deal with routes and gateways for handling complex setups with
multiple hops to a target address.
This logic assumed that no route and no gateway imply that the
destination address can not be reached, which is false in case of a
direct peer.
Acked-by: Jukka Rissanen <[email protected]>
Tested-by: Michael Scott <[email protected]>
Signed-off-by: Josua Mayer <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
Ensure last_used is updated before calling mod_timer inside
xprt_schedule_autodisconnect. This avoids a possible xprt_autoclose
firing immediately after a successful connect when xprt_unlock_connect
calls xprt_schedule_autodisconnect with an old value of last_used.
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The "CONFIG_" portion is added automatically, so this was being expanded
into "CONFIG_CONFIG_SUNRPC_DISABLE_INSECURE_ENCTYPES"
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
|
|
Remove the following warning:
net/sunrpc/debugfs.c:13: warning: cannot understand function prototype: 'struct dentry *topdir;
Signed-off-by: Trond Myklebust <[email protected]>
|
|
|
|
Now that a client can have multiple xprts, we need to add
them all to debugs.
The first one is still "xprt"
Subsequent xprts are "xprt1", "xprt2", etc.
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
We often see various error conditions with NFS4.x that show up with
a very high operation count all completing with tk_status < 0 in a
short period of time. Add a count to rpc_iostats to record on a
per-op basis the ops that complete in this manner, which will
enable lower overhead diagnostics.
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Now that a client can have multiple xprts, we need to
report the statistics for all of them.
Reported-by: Chuck Lever <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Update the printk specifiers inside _print_rpc_iostats to avoid
a checkpatch warning.
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
For diagnostic purposes, it would be useful to have an rpc_iostats
metric of RPCs completing with tk_status < 0. Unfortunately,
tk_status is reset inside the rpc_call_done functions for each
operation, and the call to tally the per-op metrics comes after
rpc_call_done. Refactor the call to rpc_count_iostat earlier in
rpc_exit_task so we can count these RPCs completing in error.
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
With NFSv4.1, different network connections need to be explicitly
bound to a session. During session startup, this is not possible
so only a single connection must be used for session startup.
So add a task flag to disable the default round-robin choice of
connections (when nconnect > 1) and force the use of a single
connection.
Then use that flag on all requests for session management - for
consistence, include NFSv4.0 management (SETCLIENTID) and session
destruction
Reported-by: Chuck Lever <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Add an argument to struct rpc_create_args that allows the specification
of how many transport connections you want to set up to the server.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
For now, just count the queue length. It is less accurate than counting
number of bytes queued, but easier to implement.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Replace the direct task wakeups from inside a softirq context with
wakeups from a process context.
Signed-off-by: Trond Myklebust <[email protected]>
|