Age | Commit message (Collapse) | Author | Files | Lines |
|
Huazhong Tan says:
====================
net: hns3: bug fix & optimization for HNS3 driver
This patchset presents a bug fix found out when CONFIG_ARM64_64K_PAGES
enable and an optimization for HNS3 driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
'truesize' is supposed to be u32, not int, so fix it.
Signed-off-by: Huazhong tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of
PAGE_SIZE is 65536(64K). But the type of page_offset is u16, it will
overflow. So change it to u32, when "CONFIG_ARM64_64K_PAGES" enabled.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
Fix it by setting anycast rt->dst.input back to ip6_input.
Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Huazhong Tan says:
====================
net: hns: bug fixes & optimization for HNS driver
This patchset presents some bug fixes found out when CONFIG_ARM64_64K_PAGES
enable and an optimization for HNS driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Update hns to drop the hns_nic_get_headlen function in favour of
eth_get_headlen, and hence also removes now redundant hns_nic_get_headlen.
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
skb->truesize is not meant to be tracking amount of used bytes in a skb,
but amount of reserved/consumed bytes in memory.
For instance, if we use a single byte in last page fragment, we have to
account the full size of the fragment.
So skb_add_rx_frag needs to calculate the length of the entire buffer into
turesize.
Fixes: 9cbe9fd5214e ("net: hns: optimize XGE capability by reducing cpu usage")
Signed-off-by: Huazhong tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
'truesize' is supposed to be u32, not int, so fix it.
Signed-off-by: Huazhong tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length and page_offset are u16, they will
overflow. So change them to u32.
Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support")
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Kevin Yang says:
====================
tcp_bbr: PROBE_RTT minor bug fixes
This series includes two minor bug fixes for the TCP BBR PROBE_RTT
mechanism, and one preparatory patch:
(1) A preparatory patch to reorganize the PROBE_RTT logic by refactoring
(into its own function) the code to exit PROBE_RTT, since the next
patch will be using that code in a new context.
(2) Fix: When BBR restarts from idle and if BBR is in PROBE_RTT mode,
BBR should check if it's time to exit PROBE_RTT. If yes, then BBR
should exit PROBE_RTT mode and restore the cwnd to its full value.
(3) Fix: Apply the PROBE_RTT cwnd cap even if the count of fully-ACKed
packets is 0.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This commit fixes a corner case where TCP BBR would enter PROBE_RTT
mode but not reduce its cwnd. If a TCP receiver ACKed less than one
full segment, the number of delivered/acked packets was 0, so that
bbr_set_cwnd() would short-circuit and exit early, without cutting
cwnd to the value we want for PROBE_RTT.
The fix is to instead make sure that even when 0 full packets are
ACKed, we do apply all the appropriate caps, including the cap that
applies in PROBE_RTT mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch fix the case where BBR does not exit PROBE_RTT mode when
it restarts from idle. When BBR restarts from idle and if BBR is in
PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If
yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its
full value.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch add a helper function bbr_check_probe_rtt_done() to
1. check the condition to see if bbr should exit probe_rtt mode;
2. process the logic of exiting probe_rtt mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Reviewed-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.
1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.
Geoff Alexander reported this could be used to build off-path attacks.
These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.
We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Geoff Alexander <[email protected]>
Tested-by: Geoff Alexander <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
All the 3 callers of addrconf_add_mroute() assert RTNL
lock, they don't take any additional lock either, so
it is safe to convert it to GFP_KERNEL.
Same for sit_add_v4_addrs().
Cc: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The new tcf_exts_for_each_action() macro doesn't reference its
arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
a harmless warning in at least one driver:
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]
Adding a cast to void lets us avoid this kind of warning.
To be on the safe side, do it for all three arguments, not
just the one that caused the warning.
Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The TC filter flow mapping override completely skipped the call to
cake_hash(); however that meant that the internal state was not being
updated, which ultimately leads to deadlocks in some configurations. Fix
that by passing the overridden flow ID into cake_hash() instead so it can
react appropriately.
In addition, the major number of the class ID can now be set to override
the host mapping in host isolation mode. If both host and flow are
overridden (or if the respective modes are disabled), flow dissection and
hashing will be skipped entirely; otherwise, the hashing will be kept for
the portions that are not set by the filter.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI
flag, causing additional package information after the first to be lost.
Also fixup a sanity check in ncsi_write_package_info() to reject out of
range package IDs.
Signed-off-by: Samuel Mendoza-Jonas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Wolfram Sang <[email protected]>
Acked-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Cong Wang says:
====================
net_sched: pending clean up and bug fixes
This patchset aims to clean up and fixes some bugs in current
merge window, this is why it is targeting -net.
Patch 1-5 are clean up Vlad's patches merged in current merge
window, patch 6 is just a trivial cleanup.
Patch 7 reverts a lockdep warning fix and patch 8 provides a better
fix for it.
Patch 9 fixes a potential deadlock found by me during code review.
Please see each patch for details.
====================
Signed-off-by: Cong Wang <[email protected]>
|
|
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!
Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.
And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.
Fixes: 817e9f2c5c26 ("act_ife: acquire ife_mod_lock before reading ifeoplist")
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().
This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.
Reported-by: Vlad Buslov <[email protected]>
Tested-by: Vlad Buslov <[email protected]>
Cc: Vlad Buslov <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 42c625a486f3 ("net: sched: act_ife: disable bh
when taking ife_mod_lock"), because what ife_mod_lock protects
is absolutely not touched in rate est timer BH context, they have
no race.
A better fix is following up.
Cc: Vlad Buslov <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After commit 90b73b77d08e, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <[email protected]>
Cc: Vlad Buslov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tcf_idr_check() is replaced by tcf_idr_check_alloc(),
and __tcf_idr_check() now can be folded into tcf_idr_search().
Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Cc: Jiri Pirko <[email protected]>
Cc: Vlad Buslov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fixes: 16af6067392c ("net: sched: implement reference counted action release")
Cc: Jiri Pirko <[email protected]>
Cc: Vlad Buslov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.
More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().
So it can be just removed.
Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <[email protected]>
Cc: Vlad Buslov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
tcf_action_put_many() is mostly called to clean up actions on
failure path, but tcf_action_put_many(&actions[acts_deleted]) is
used in the ugliest way: it passes a slice of the array and
uses an additional NULL at the end to avoid out-of-bound
access.
acts_deleted is completely unnecessary since we can teach
tcf_action_put_many() scan the whole array and checks against
NULL pointer. Which also means tcf_action_delete() should
set deleted action pointers to NULL to avoid double free.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <[email protected]>
Cc: Vlad Buslov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Registering another device with same MAC address (such as TAP, VPN or
DPDK KNI) will confuse the VF autobinding logic. Restrict the search
to only run if the device is known to be a PCI attached VF.
Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove duplicated include.
Signed-off-by: Yue Haibing <[email protected]>
Acked-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Yue Haibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove duplicated include.
Signed-off-by: Yue Haibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Prior to the introduction of fib6_info lwtstate was managed by the dst
code. With fib6_info releasing lwtstate needs to be done when the struct
is freed.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There's a new Dell TB16 dock with a different iSerialNumber.
Apply the same fix from commit 0b1655143df0 ("r8152: disable RX
aggregation on Dell TB16 dock") to this model.
BugLink: https://bugs.launchpad.net/bugs/1785780
Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Tomer Tayar says:
====================
qed: Misc fixes in the interface with the MFW
This patch series fixes several issues in the driver's interface with the
management FW (MFW).
v1->v2:
- Fix loop counter decrement to be pre instead of post.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Keep sending mailbox commands to the MFW when it is not responsive ends up
with a redundant amount of timeout expiries.
This patch prints the MCP status on the first command which is not
responded, and blocks the following commands.
Since the (un)load request commands might be not responded due to other
PFs, the patch also adds the option to skip the blocking upon a failure.
Signed-off-by: Tomer Tayar <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The MFW manages an internal lock to prevent concurrent hardware
(de)initialization of different PFs.
This, together with the busy-waiting for the MFW's responses for commands,
might lead to a deadlock during concurrent load or unload of PFs.
This patch adds the option to sleep within the busy-waiting, and uses it
for the (un)load requests (which are not sent from an interrupt context) to
prevent the possible deadlock.
Signed-off-by: Tomer Tayar <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Successive iterations of halting and resuming the management chip (MCP)
might fail, since currently the driver doesn't wait for these operations to
actually take place.
This patch prevents the driver from moving forward before the operations
are reflected in the state register.
Signed-off-by: Tomer Tayar <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The MFW might be reset and re-update its shared memory.
Upon the detection of such a reset the driver rereads this memory, but it
has to wait till the data is valid.
This patch adds the missing wait for a data ready indication.
Signed-off-by: Tomer Tayar <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If load ip6_vti module and create a network namespace when set
fb_tunnels_only_for_init_net to 1, then exit the namespace will
cause following crash:
[ 6601.677036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 6601.679057] PGD 8000000425eca067 P4D 8000000425eca067 PUD 424292067 PMD 0
[ 6601.680483] Oops: 0000 [#1] SMP PTI
[ 6601.681223] CPU: 7 PID: 93 Comm: kworker/u16:1 Kdump: loaded Tainted: G E 4.18.0+ #3
[ 6601.683153] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 6601.684919] Workqueue: netns cleanup_net
[ 6601.685742] RIP: 0010:vti6_exit_batch_net+0x87/0xd0 [ip6_vti]
[ 6601.686932] Code: 7b 08 48 89 e6 e8 b9 ea d3 dd 48 8b 1b 48 85 db 75 ec 48 83 c5 08 48 81 fd 00 01 00 00 75 d5 49 8b 84 24 08 01 00 00 48 89 e6 <48> 8b 78 08 e8 90 ea d3 dd 49 8b 45 28 49 39 c6 4c 8d 68 d8 75 a1
[ 6601.690735] RSP: 0018:ffffa897c2737de0 EFLAGS: 00010246
[ 6601.691846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[ 6601.693324] RDX: 0000000000000015 RSI: ffffa897c2737de0 RDI: ffffffff9f2ea9e0
[ 6601.694824] RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000000
[ 6601.696314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dc323c07e00
[ 6601.697812] R13: ffff8dc324a63100 R14: ffffa897c2737e30 R15: ffffa897c2737e30
[ 6601.699345] FS: 0000000000000000(0000) GS:ffff8dc33fdc0000(0000) knlGS:0000000000000000
[ 6601.701068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6601.702282] CR2: 0000000000000008 CR3: 0000000424966002 CR4: 00000000001606e0
[ 6601.703791] Call Trace:
[ 6601.704329] cleanup_net+0x1b4/0x2c0
[ 6601.705268] process_one_work+0x16c/0x370
[ 6601.706145] worker_thread+0x49/0x3e0
[ 6601.706942] kthread+0xf8/0x130
[ 6601.707626] ? rescuer_thread+0x340/0x340
[ 6601.708476] ? kthread_bind+0x10/0x10
[ 6601.709266] ret_from_fork+0x35/0x40
Reproduce:
modprobe ip6_vti
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
unshare -n
exit
This because ip6n->tnls_wc[0] point to fallback device in default, but
in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL
check comparatively.
Fixes: e2948e5af8ee ("ip6_vti: fix creating fallback tunnel device for vti6")
Signed-off-by: Haishuang Yan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking fixes from David Miller:
1) Fix races in IPVS, from Tan Hu.
2) Missing unbind in matchall classifier, from Hangbin Liu.
3) Missing act_ife action release, from Vlad Buslov.
4) Cure lockdep splats in ila, from Cong Wang.
5) veth queue leak on link delete, from Toshiaki Makita.
6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From
Kees Cook.
7) RCU usage fixup in XDP, from Tariq Toukan.
8) Two TCP ULP fixes from Daniel Borkmann.
9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
Kallweit.
10) Always take tcf_lock with BH disabled, otherwise we can deadlock
with rate estimator code paths. From Vlad Buslov.
11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly.
From Jian-Hong Pan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ip6_vti: fix creating fallback tunnel device for vti6
ip_vti: fix a null pointer deferrence when create vti fallback tunnel
r8169: don't use MSI-X on RTL8106e
net: lan743x_ptp: convert to ktime_get_clocktai_ts64
net: sched: always disable bh when taking tcf_lock
ip6_vti: simplify stats handling in vti6_xmit
bpf: fix redirect to map under tail calls
r8169: add missing Kconfig dependency
tools/bpf: fix bpf selftest test_cgroup_storage failure
bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
bpf, sockmap: fix map elem deletion race with smap_stop_sock
bpf, sockmap: fix leakage of smap_psock_map_entry
tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
tcp, ulp: add alias for all ulp modules
bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
net/xdp: Fix suspicious RCU usage warning
net/mlx5e: Delete unneeded function argument
Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
isdn: Disable IIOCDBGVAR
...
|
|
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel
device for vti6 when a new namespace is created.
Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Signed-off-by: Haishuang Yan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After set fb_tunnels_only_for_init_net to 1, the itn->fb_tunnel_dev will
be NULL and will cause following crash:
[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS: 00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358] ops_init+0x38/0xf0
[ 2742.876078] setup_net+0xd9/0x1f0
[ 2742.876789] copy_net_ns+0xb7/0x130
[ 2742.877538] create_new_namespaces+0x11a/0x1d0
[ 2742.878525] unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526] ksys_unshare+0x1a7/0x330
[ 2742.880313] __x64_sys_unshare+0xe/0x20
[ 2742.881131] do_syscall_64+0x5b/0x180
[ 2742.881933] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Haishuang Yan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Found the ethernet network on ASUS X441UAR doesn't come back on resume
from suspend when using MSI-X. The chip is RTL8106e - version 39.
[ 21.848357] libphy: r8169: probed
[ 21.848473] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID
44900000, IRQ 127
[ 22.518860] r8169 0000:02:00.0 enp2s0: renamed from eth0
[ 29.458041] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[ 63.227398] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off
[ 124.514648] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
Here is the ethernet controller in detail:
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
Falling back to MSI fixes the issue.
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Jian-Hong Pan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
timekeeping_clocktai64() has been renamed to ktime_get_clocktai_ts64()
for consistency with the other ktime_get_* access functions.
Rename the new caller that has come up as well.
Question: this is the only ptp driver that sets the hardware time
to the current system time in TAI. Why does it do that?
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.
Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:
[ 105.470398] ================================
[ 105.475014] WARNING: inconsistent lock state
[ 105.479628] 4.18.0-rc8+ #664 Not tainted
[ 105.483897] --------------------------------
[ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[ 105.509696] {SOFTIRQ-ON-W} state was registered at:
[ 105.514925] _raw_spin_lock+0x2c/0x40
[ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
[ 105.523990] tcf_action_init_1+0x4e4/0x660
[ 105.528518] tcf_action_init+0x1ce/0x2d0
[ 105.532880] tcf_exts_validate+0x1d8/0x200
[ 105.537416] fl_change+0x55a/0x268b [cls_flower]
[ 105.542469] tc_new_tfilter+0x748/0xa20
[ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
[ 105.551268] netlink_rcv_skb+0x18d/0x200
[ 105.555628] netlink_unicast+0x2d0/0x370
[ 105.559990] netlink_sendmsg+0x3b9/0x6a0
[ 105.564349] sock_sendmsg+0x6b/0x80
[ 105.568271] ___sys_sendmsg+0x4a1/0x520
[ 105.572547] __sys_sendmsg+0xd7/0x150
[ 105.576655] do_syscall_64+0x72/0x2c0
[ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.586243] irq event stamp: 489296
[ 105.590084] hardirqs last enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[ 105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[ 105.609277] softirqs last enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[ 105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[ 105.626813]
other info that might help us debug this:
[ 105.633976] Possible unsafe locking scenario:
[ 105.640526] CPU0
[ 105.643325] ----
[ 105.646125] lock(&(&p->tcfa_lock)->rlock);
[ 105.650747] <Interrupt>
[ 105.653717] lock(&(&p->tcfa_lock)->rlock);
[ 105.658514]
*** DEADLOCK ***
[ 105.665349] 1 lock held by swapper/16/0:
[ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[ 105.678200]
stack backtrace:
[ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 105.698626] Call Trace:
[ 105.701421] <IRQ>
[ 105.703791] dump_stack+0x92/0xeb
[ 105.707461] print_usage_bug+0x336/0x34c
[ 105.711744] mark_lock+0x7c9/0x980
[ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
[ 105.721424] ? check_usage_forwards+0x230/0x230
[ 105.726315] __lock_acquire+0x923/0x26f0
[ 105.730597] ? debug_show_all_locks+0x240/0x240
[ 105.735478] ? mark_lock+0x493/0x980
[ 105.739412] ? check_chain_key+0x140/0x1f0
[ 105.743861] ? __lock_acquire+0x836/0x26f0
[ 105.748323] ? lock_acquire+0x12e/0x290
[ 105.752516] lock_acquire+0x12e/0x290
[ 105.756539] ? est_fetch_counters+0x3c/0xa0
[ 105.761084] _raw_spin_lock+0x2c/0x40
[ 105.765099] ? est_fetch_counters+0x3c/0xa0
[ 105.769633] est_fetch_counters+0x3c/0xa0
[ 105.773995] est_timer+0x87/0x390
[ 105.777670] ? est_fetch_counters+0xa0/0xa0
[ 105.782210] ? lock_acquire+0x12e/0x290
[ 105.786410] call_timer_fn+0x161/0x550
[ 105.790512] ? est_fetch_counters+0xa0/0xa0
[ 105.795055] ? del_timer_sync+0xd0/0xd0
[ 105.799249] ? __lock_is_held+0x93/0x110
[ 105.803531] ? mark_held_locks+0x20/0xe0
[ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
[ 105.812525] ? est_fetch_counters+0xa0/0xa0
[ 105.817069] ? est_fetch_counters+0xa0/0xa0
[ 105.821610] run_timer_softirq+0x3c4/0x9f0
[ 105.826064] ? lock_acquire+0x12e/0x290
[ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
[ 105.835237] ? __lock_is_held+0x25/0x110
[ 105.839517] __do_softirq+0x11d/0x7bf
[ 105.843542] irq_exit+0x140/0x190
[ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
[ 105.852182] apic_timer_interrupt+0xf/0x20
[ 105.856628] </IRQ>
[ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.935232] do_idle+0x28e/0x320
[ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
[ 105.943361] ? mark_lock+0x8c1/0x980
[ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.952619] cpu_startup_entry+0xc2/0xd0
[ 105.956900] ? cpu_in_idle+0x20/0x20
[ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.971391] start_secondary+0x2b5/0x360
[ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
[ 105.980654] secondary_startup_64+0xa5/0xb0
Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:
[ 162.108959] Possible interrupt unsafe locking scenario:
[ 162.116386] CPU0 CPU1
[ 162.121277] ---- ----
[ 162.126162] lock(psample_groups_lock);
[ 162.130447] local_irq_disable();
[ 162.136772] lock(&(&p->tcfa_lock)->rlock);
[ 162.143957] lock(psample_groups_lock);
[ 162.150813] <Interrupt>
[ 162.153808] lock(&(&p->tcfa_lock)->rlock);
[ 162.158608]
*** DEADLOCK ***
In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.
Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull first set of KVM updates from Paolo Bonzini:
"PPC:
- minor code cleanups
x86:
- PCID emulation and CR3 caching for shadow page tables
- nested VMX live migration
- nested VMCS shadowing
- optimized IPI hypercall
- some optimizations
ARM will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
KVM: X86: Implement PV IPIs in linux guest
KVM: X86: Add kvm hypervisor init time platform setup callback
KVM: X86: Implement "send IPI" hypercall
KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
KVM: x86: Skip pae_root shadow allocation if tdp enabled
KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
KVM: vmx: move struct host_state usage to struct loaded_vmcs
KVM: vmx: compute need to reload FS/GS/LDT on demand
KVM: nVMX: remove a misleading comment regarding vmcs02 fields
KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
KVM: vmx: add dedicated utility to access guest's kernel_gs_base
KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
KVM: vmx: refactor segmentation code in vmx_save_host_state()
kvm: nVMX: Fix fault priority for VMX operations
kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains some major improvements to the RISC-V port, including
the necessary interrupt controller and timer support to actually make
it to userspace. Support for three devices has been added:
- the ISA-mandated timers on RISC-V systems.
- the ISA-mandated first-level interrupt controller on RISC-V
systems, which is handled as part of our core arch code because
it's very small and tightly tied to the ISA.
- SiFive's platform-level interrupt controller, which talks to the
actual devices.
In addition to these new devices, there are a handful of cleanups all
over the RISC-V tree:
- build fixes for various configurations:
* A fix to the vDSO build's makefile so it respects CFLAGS.
* The addition of __lshrti3, a libgcc derived function necessary
for some 32-bit configurations.
* !SMP && PERF_EVENTS
- Cleanups to the arch code to remove the remnants of old versions of
the drivers that were just properly submitted.
* Some dead code from the timer driver, most of which wasn't ever
even compiled.
* Cleanups of some interrupt #defines, which are now local to the
interrupt handling code.
- Fixes to ptrace(), which while not being sufficient to fully make
GDB work are at least sufficient to get simple GDB tasks to work.
- Early printk support via RISC-V's architecturally mandated SBI
console device.
- A fix to our early debug trap handler to ensure it's always
aligned.
These patches have all been through a fairly extensive review process,
but as this enables a whole pile of functionality (ie, userspace) I'm
confident we'll need to submit a few more patches. The only concrete
issues I know about are the sys_riscv_flush_icache patches, but as I
managed to screw those up on Friday I figured it'd be best to let them
bake another week.
This tag boots a Fedora root filesystem on QEMU's master branch for
me, and before this morning's rebase (from 4.18-rc8 to 4.18) it booted
on the HiFive Unleashed.
Thanks to Christoph Hellwig and the other guys at WD for getting the
new drivers in shape!"
* tag 'riscv-for-linus-4.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
dt-bindings: interrupt-controller: SiFive Plaform Level Interrupt Controller
dt-bindings: interrupt-controller: RISC-V local interrupt controller
RISC-V: Fix !CONFIG_SMP compilation error
irqchip: add a SiFive PLIC driver
RISC-V: Add the directive for alignment of stvec's value
clocksource: new RISC-V SBI timer driver
RISC-V: implement low-level interrupt handling
RISC-V: add a definition for the SIE SEIE bit
RISC-V: remove INTERRUPT_CAUSE_* defines from asm/irq.h
RISC-V: simplify software interrupt / IPI code
RISC-V: remove timer leftovers
RISC-V: Add early printk support via the SBI console
RISC-V: Don't increment sepc after breakpoint.
RISC-V: implement __lshrti3.
RISC-V: Use KBUILD_CFLAGS instead of KCFLAGS when building the vDSO
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull UIO fix from Greg KH:
"Here is a single UIO fix that I forgot to send before 4.18-final came
out. It reverts a UIO patch that went in the 4.18 development window
that was causing problems.
This patch has been in linux-next for a while with no problems, I just
forgot to send it earlier, or as part of the larger char/misc patch
series from yesterday, my fault"
* tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "uio: use request_threaded_irq instead"
|