aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-09Merge branch 's390-qeth-fixes'David S. Miller2-7/+14
Julian Wiedmann says: ==================== s390/qeth: fixes 2018-02-09 please apply the following two qeth patches for 4.16 and stable. One restricts a command quirk to the intended commandd type, while the other fixes an off-by-one during data transmission that can cause qeth to build malformed buffer descriptors. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-09s390/qeth: fix SETIP command handlingJulian Wiedmann2-6/+13
send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses *all* command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09s390/qeth: fix underestimated count of buffer elementsUrsula Braun1-1/+1
For a memory range/skb where the last byte falls onto a page boundary (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select a IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09ptr_ring: try vmalloc() when kmalloc() failsJason Wang1-5/+8
This patch switch to use kvmalloc_array() for using a vmalloc() fallback to help in case kmalloc() fails. Reported-by: [email protected] Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZEJason Wang1-0/+2
To avoid slab to warn about exceeded size, fail early if queue occupies more than KMALLOC_MAX_SIZE. Reported-by: [email protected] Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09Merge branch 'stmmac-irq-fixes-cleanups'David S. Miller3-9/+8
Niklas Cassel says: ==================== stmmac irq fixes/cleanups A couple of small stmmac irq fixes/cleanups. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-09net: stmmac: remove redundant enable of PMT irqNiklas Cassel2-4/+1
For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt is set, and if so, setting the bit again. For dwmac1000, GMAC_INT_DEFAULT_MASK does not include GMAC_INT_DISABLE_PMT, so it is redundant to check if hw->pmt is set, and if so, clearing an already cleared bit. Improve code readability by removing this redundant code. Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4Niklas Cassel2-3/+3
GMAC_INT_DEFAULT_MASK is written to the interrupt enable register. In previous versions of the IP (e.g. dwmac1000), this register was instead an interrupt mask register. To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK to GMAC_INT_DEFAULT_ENABLE. Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09net: stmmac: discard disabled flags in interrupt status registerNiklas Cassel1-2/+4
The interrupt status register in both dwmac1000 and dwmac4 ignores interrupt enable (for dwmac4) / interrupt mask (for dwmac1000). Therefore, if we want to check only the bits that can actually trigger an irq, we have to filter the interrupt status register manually. Commit 0a764db10337 ("stmmac: Discard masked flags in interrupt status register") fixed this for dwmac1000. Fix the same issue for dwmac4. Just like commit 0a764db10337 ("stmmac: Discard masked flags in interrupt status register"), this makes sure that we do not get spurious link up/link down prints. Signed-off-by: Niklas Cassel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09ibmvnic: Reset long term map ID counterThomas Falcon1-0/+1
When allocating RX or TX buffer pools, the driver needs to provide a unique mapping ID to firmware for each pool. This value is assigned using a counter which is incremented after a new pool is created. The ID can be an integer ranging from 1-255. When migrating to a device that requests a different number of queues, this value was not being reset properly. As a result, after enough migrations, the counter exceeded the upper bound and pool creation failed. This is fixed by resetting the counter to one in this case. Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller18-111/+557
Daniel Borkmann says: ==================== pull-request: bpf 2018-02-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Two fixes for BPF sockmap in order to break up circular map references from programs attached to sockmap, and detaching related sockets in case of socket close() event. For the latter we get rid of the smap_state_change() and plug into ULP infrastructure, which will later also be used for additional features anyway such as TX hooks. For the second issue, dependency chain is broken up via map release callback to free parse/verdict programs, all from John. 2) Fix a libbpf relocation issue that was found while implementing XDP support for Suricata project. Issue was that when clang was invoked with default target instead of bpf target, then various other e.g. debugging relevant sections are added to the ELF file that contained relocation entries pointing to non-BPF related sections which libbpf trips over instead of skipping them. Test cases for libbpf are added as well, from Jesper. 3) Various misc fixes for bpftool and one for libbpf: a small addition to libbpf to make sure it recognizes all standard section prefixes. Then, the Makefile in bpftool/Documentation is improved to explicitly check for rst2man being installed on the system as we otherwise risk installing empty man pages; the man page for bpftool-map is corrected and a set of missing bash completions added in order to avoid shipping bpftool where the completions are only partially working, from Quentin. 4) Fix applying the relocation to immediate load instructions in the nfp JIT which were missing a shift, from Jakub. 5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL in this case; and explicitly delete the veth devices in the two tests test_xdp_{meta,redirect}.sh before dismantling the netnses as when selftests are run in batch mode, then workqueue to handle destruction might not have finished yet and thus veth creation in next test under same dev name would fail, from Yonghong. 6) Fix test_kmod.sh to check the test_bpf.ko module path before performing an insmod, and fallback to modprobe. Especially the latter is useful when having a device under test that has the modules installed instead, from Naresh. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-09Merge branch 'bpf-libbpf-relo-fix-and-tests'Daniel Borkmann5-16/+253
Jesper Dangaard Brouer says: ==================== While playing with using libbpf for the Suricata project, we had issues LLVM >= 4.0.1 generating ELF files that could not be loaded with libbpf (tools/lib/bpf/). During the troubleshooting phase, I wrote a test program and improved the debugging output in libbpf. I turned this into a selftests program, and it also serves as a code example for libbpf in itself. I discovered that there are at least three ELF load issues with libbpf. I left them as TODO comments in (tools/testing/selftests/bpf) test_libbpf.sh. I've only fixed the load issue with eh_frames, and other types of relo-section that does not have exec flags. We can work on the other issues later. ==================== Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-09tools/libbpf: handle issues with bpf ELF objects containing .eh_framesJesper Dangaard Brouer1-0/+26
V3: More generic skipping of relo-section (suggested by Daniel) If clang >= 4.0.1 is missing the option '-target bpf', it will cause llc/llvm to create two ELF sections for "Exception Frames", with section names '.eh_frame' and '.rel.eh_frame'. The BPF ELF loader library libbpf fails when loading files with these sections. The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c, handle this gracefully. And iproute2 loader also seems to work with these "eh" sections. The issue in libbpf is caused by bpf_object__elf_collect() skipping some sections, and later when performing relocation it will be pointing to a skipped section, as these sections cannot be found by bpf_object__find_prog_by_idx() in bpf_object__collect_reloc(). This is a general issue that also occurs for other sections, like debug sections which are also skipped and can have relo section. As suggested by Daniel. To avoid keeping state about all skipped sections, instead perform a direct qlookup in the ELF object. Lookup the section that the relo-section points to and check if it contains executable machine instructions (denoted by the sh_flags SHF_EXECINSTR). Use this check to also skip irrelevant relo-sections. Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used due to incompatibility with asm embedded headers, that some of the samples include. This is explained in more details by Yonghong Song in bpf_devel_QA. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-09selftests/bpf: add selftest that use test_libbpf_openJesper Dangaard Brouer2-2/+61
This script test_libbpf.sh will be part of the 'make run_tests' invocation, but can also be invoked manually in this directory, and a verbose mode can be enabled via setting the environment variable $VERBOSE like: $ VERBOSE=yes ./test_libbpf.sh The script contains some tests that are commented out, as they currently fail. They are reminders about what we need to improve for the libbpf loader library. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-09selftests/bpf: add test program for loading BPF ELF filesJesper Dangaard Brouer2-1/+151
V2: Moved program into selftests/bpf from tools/libbpf This program can be used on its own for testing/debugging if a BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf). If something is wrong with the ELF object, the program have a --debug mode that will display the ELF sections and especially the skipped sections. This allows for quickly identifying the problematic ELF section number, which can be corrolated with the readelf tool. The program signal error via return codes, and also have a --quiet mode, which is practical for use in scripts like selftests/bpf. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-09tools/libbpf: improve the pr_debug statements to contain section numbersJesper Dangaard Brouer1-12/+13
While debugging a bpf ELF loading issue, I needed to correlate the ELF section number with the failed relocation section reference. Thus, add section numbers/index to the pr_debug. In debug mode, also print section that were skipped. This helped me identify that a section (.eh_frame) was skipped, and this was the reason the relocation section (.rel.eh_frame) could not find that section number. The section numbers corresponds to the readelf tools Section Headers [Nr]. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-09bpf: Sync kernel ABI header with tooling header for bpf_common.hJesper Dangaard Brouer1-3/+4
I recently fixed up a lot of commits that forgot to keep the tooling headers in sync. And then I forgot to do the same thing in commit cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let correct that before people notice ;-). Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09 ("bpf: add selftest for tcpbpf"). Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPTHeiner Kallweit1-1/+1
This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates also PHY state changes and we should do what the symbol says. Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start") Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08net: thunder: change q_len's type to handle max ring sizeDean Nelson1-1/+1
The Cavium thunder nicvf driver supports rx/tx rings of up to 65536 entries per. The number of entires are stored in the q_len member of struct q_desc_mem. The problem is that q_len being a u16, results in 65536 becoming 0. In getting pointers to descriptors in the rings, the driver uses q_len minus 1 as a mask after incrementing the pointer, in order to go back to the beginning and not go past the end of the ring. With the q_len set to 0 the mask is no longer correct and the driver does go beyond the end of the ring, causing various ills. Usually the first thing that shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out" warning. This patch remedies the problem by changing q_len to a u32. Signed-off-by: Dean Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08Merge tag 'wireless-drivers-next-for-davem-2018-02-08' of ↵David S. Miller18-32/+181
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.16 The most important here is the ssb fix, it has been reported by the users frequently and the fix just missed the final v4.15. Also numerous other fixes, mt76 had multiple problems with aggregation and a long standing unaligned access bug in rtlwifi is finally fixed. Major changes: ath10k * correct firmware RAM dump length for QCA6174/QCA9377 * add new QCA988X device id * fix a kernel panic during pci probe * revert a recent commit which broke ath10k firmware metadata parsing ath9k * fix a noise floor regression introduced during the merge window * add new device id rtlwifi * fix unaligned access seen on ARM architecture mt76 * various aggregation fixes which fix connection stalls ssb * fix b43 and b44 on non-MIPS which broke in v4.15-rc9 ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-08tipc: fix skb truesize/datasize ratio controlHoang Le1-2/+2
In commit d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents") we introduced a test for ensuring that the condition truesize/datasize <= 4 is true for a received buffer. Unfortunately this test has two problems. - Because of the integer arithmetics the test if (skb->truesize / buf_roundup_len(skb) > 4) will miss all ratios [4 < ratio < 5], which was not the intention. - The buffer returned by skb_copy() inherits skb->truesize of the original buffer, which doesn't help the situation at all. In this commit, we change the ratio condition and replace skb_copy() with a call to skb_copy_expand() to finally get this right. Acked-by: Jon Maloy <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08net/sched: cls_u32: fix cls_u32 on filter replaceIvan Vecera1-1/+2
The following sequence is currently broken: # tc qdisc add dev foo ingress # tc filter replace dev foo protocol all ingress \ u32 match u8 0 0 action mirred egress mirror dev bar1 # tc filter replace dev foo protocol all ingress \ handle 800::800 pref 49152 \ u32 match u8 0 0 action mirred egress mirror dev bar2 Error: cls_u32: Key node flags do not match passed flags. We have an error talking to the kernel, -1 The error comes from u32_change() when comparing new and existing flags. The existing ones always contains one of TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state. These flags cannot be passed from userspace so the condition (n->flags != flags) in u32_change() always fails. Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and TCA_CLS_FLAGS_IN_HW are not taken into account. Fixes: 24d3dc6d27ea ("net/sched: cls_u32: Reflect HW offload status") Signed-off-by: Ivan Vecera <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08mpls, nospec: Sanitize array index in mpls_label_ok()Dan Williams1-10/+14
mpls_label_ok() validates that the 'platform_label' array index from a userspace netlink message payload is valid. Under speculation the mpls_label_ok() result may not resolve in the CPU pipeline until after the index is used to access an array element. Sanitize the index to zero to prevent userspace-controlled arbitrary out-of-bounds speculation, a precursor for a speculative execution side channel vulnerability. Cc: <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Dan Williams <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and ↵Sowmini Varadhan11-30/+76
rds connection/workq management An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08net: Whitelist the skbuff_head_cache "cb" fieldKees Cook1-1/+3
Most callers of put_cmsg() use a "sizeof(foo)" for the length argument. Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a result of the cmsg header calculations. This means that hardened usercopy will examine the copy, even though it was technically a fixed size and should be implicitly whitelisted. All the put_cmsg() calls being built from values in skbuff_head_cache are coming out of the protocol-defined "cb" field, so whitelist this field entirely instead of creating per-use bounce buffers, for which there are concerns about performance. Original report was: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)! WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76 ... __check_heap_object+0x89/0xc0 mm/slab.c:4426 check_heap_object mm/usercopy.c:236 [inline] __check_object_size+0x272/0x530 mm/usercopy.c:259 check_object_size include/linux/thread_info.h:112 [inline] check_copy_size include/linux/thread_info.h:143 [inline] copy_to_user include/linux/uaccess.h:154 [inline] put_cmsg+0x233/0x3f0 net/core/scm.c:242 sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913 packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296 sock_recvmsg_nosec net/socket.c:803 [inline] sock_recvmsg+0xc9/0x110 net/socket.c:810 ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179 __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287 SYSC_recvmmsg net/socket.c:2368 [inline] SyS_recvmmsg+0xc4/0x160 net/socket.c:2352 entry_SYSCALL_64_fastpath+0x29/0xa0 Reported-by: [email protected] Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08net: Extra '_get' in declaration of arch_get_platform_mac_addressMathieu Malaterre1-1/+1
In commit c7f5d105495a ("net: Add eth_platform_get_mac_address() helper."), two declarations were added: int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr); unsigned char *arch_get_platform_get_mac_address(void); An extra '_get' was introduced in arch_get_platform_get_mac_address, remove it. Fix compile warning using W=1: CC net/ethernet/eth.o net/ethernet/eth.c:523:24: warning: no previous prototype for ‘arch_get_platform_mac_address’ [-Wmissing-prototypes] unsigned char * __weak arch_get_platform_mac_address(void) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ AR net/ethernet/built-in.o Signed-off-by: Mathieu Malaterre <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08ibmvnic: queue reset when CRQ gets closed during resetNathan Fontenot1-1/+5
While handling a driver reset we get a H_CLOSED return trying to send a CRQ event. When this occurs we need to queue up another reset attempt. Without doing this we see instances where the driver is left in a closed state because the reset failed and there is no further attempts to reset the driver. Signed-off-by: Nathan Fontenot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08atm: he: use 64-bit arithmetic instead of 32-bitGustavo A. R. Silva1-4/+4
Add suffix ULL to constants 272, 204, 136 and 68 in order to give the compiler complete information about the proper arithmetic to use. Notice that these constants are used in contexts that expect expressions of type unsigned long long (64 bits, unsigned). The following expressions are currently being evaluated using 32-bit arithmetic: 272 * mult 204 * mult 136 * mult 68 * mult Addresses-Coverity-ID: 201058 Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08rtnetlink: require unique netns identifierChristian Brauner1-0/+48
Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK it is possible for userspace to send us requests with three different properties to identify a target network namespace. This affects at least RTM_{NEW,SET}LINK. Each of them could potentially refer to a different network namespace which is confusing. For legacy reasons the kernel will pick the IFLA_NET_NS_PID property first and then look for the IFLA_NET_NS_FD property but there is no reason to extend this type of behavior to network namespace ids. The regression potential is quite minimal since the rtnetlink requests in question either won't allow IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place. Signed-off-by: Christian Brauner <[email protected]> Acked-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08tuntap: add missing xdp flushJason Wang1-0/+15
When using devmap to redirect packets between interfaces, xdp_do_flush() is usually a must to flush any batched packets. Unfortunately this is missed in current tuntap implementation. Unlike most hardware driver which did XDP inside NAPI loop and call xdp_do_flush() at then end of each round of poll. TAP did it in the context of process e.g tun_get_user(). So fix this by count the pending redirected packets and flush when it exceeds NAPI_POLL_WEIGHT or MSG_MORE was cleared by sendmsg() caller. With this fix, xdp_redirect_map works again between two TAPs. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08netlink: ensure to loop over all netns in genlmsg_multicast_allns()Nicolas Dichtel1-2/+10
Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the case when commit 134e63756d5f was pushed. However, there was no reason to stop the loop if a netns does not have listeners. Returns -ESRCH only if there was no listeners in all netns. To avoid having the same problem in the future, I didn't take the assumption that nlmsg_multicast() returns only 0 or -ESRCH. Fixes: 134e63756d5f ("genetlink: make netns aware") CC: Johannes Berg <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08rxrpc: Don't put crypto buffers on the stackDavid Howells2-41/+52
Don't put buffers of data to be handed to crypto on the stack as this may cause an assertion failure in the kernel (see below). Fix this by using an kmalloc'd buffer instead. kernel BUG at ./include/linux/scatterlist.h:147! ... RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc] RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028 RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88 RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08 R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95 R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08 FS: 0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rxkad_respond_to_challenge+0x297/0x330 [rxrpc] rxrpc_process_connection+0xd1/0x690 [rxrpc] ? process_one_work+0x1c3/0x680 ? __lock_is_held+0x59/0xa0 process_one_work+0x249/0x680 worker_thread+0x3a/0x390 ? process_one_work+0x680/0x680 kthread+0x121/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 Reported-by: Jonathan Billings <[email protected]> Reported-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Jonathan Billings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08Merge ath-current from ↵Kalle Valo8-11/+59
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git ath.git fixes for 4.16. Major changes: ath10k * correct firmware RAM dump length for QCA6174/QCA9377 * add new QCA988X device id * fix a kernel panic during pci probe * revert a recent commit which broke ath10k firmware metadata parsing ath9k * fix a noise floor regression introduced during the merge window * add new device id
2018-02-08Merge branch ↵David S. Miller9-23/+53
'nfp-fix-disabling-TC-offloads-in-flower-max-TSO-segs-and-module-version' Jakub Kicinski says: ==================== nfp: fix disabling TC offloads in flower, max TSO segs and module version This set corrects the way nfp deals with the NETIF_F_HW_TC flag. It has slipped the review that flower offload does not currently refuse disabling this flag when filter offload is active. nfp's flower offload does not actually keep track of how many filters for each port are offloaded. The accounting of the number of filters is added to the nfp core structures, and BPF moved to use these structures as well. If users are allowed to disable TC offloads while filters are active, not only is it incorrect behaviour, but actually the NFP will never be told to remove the flows, leading to use-after-free when stats arrive. Fourth patch makes sure we declare the max number of TSO segments. FW should drop longer packets cleanly (otherwise this would be a security problem for untrusted VFs) but dropping longer TSO frames is not nice and driver should prevent them from being generated. Last small addition populates MODULE_VERSION with kernel version. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-08nfp: populate MODULE_VERSIONJakub Kicinski1-0/+1
DKMS and similar out-of-tree module replacement services use module version to make sure the out-of-tree software is not older than the module shipped with the kernel. We use the kernel version in ethtool -i output, put it into MODULE_VERSION as well. Reported-by: Jan Gutter <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08nfp: limit the number of TSO segmentsJakub Kicinski2-1/+6
Most FWs limit the number of TSO segments a frame can produce to 64. This is for fairness and efficiency (of FW datapath) reasons. If a frame with larger number of segments is submitted the FW will drop it. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08nfp: forbid disabling hw-tc-offload on representors while offload activeJakub Kicinski7-21/+33
All netdevs which can accept TC offloads must implement .ndo_set_features(). nfp_reprs currently do not do that, which means hw-tc-offload can be turned on and off even when offloads are active. Whether the offloads are active is really a question to nfp_ports, so remove the per-app tc_busy callback indirection thing, and simply count the number of offloaded items in nfp_port structure. Fixes: 8a2768732a4d ("nfp: provide infrastructure for offloading flower based TC filters") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pieter Jansen van Vuuren <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08nfp: don't advertise hw-tc-offload on non-port netdevsJakub Kicinski1-1/+1
nfp_port is a structure which represents an ASIC port, both PCIe vNIC (on a PF or a VF) or the external MAC port. vNIC netdev (struct nfp_net) and pure representor netdev (struct nfp_repr) both have a pointer to this structure. nfp_reprs always have a port associated. nfp_nets, however, only represent a device port in legacy mode, where they are considered the MAC port. In switchdev mode they are just the CPU's side of the PCIe link. By definition TC offloads only apply to device ports. Don't set the flag on vNICs without a port (i.e. in switchdev mode). Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pieter Jansen van Vuuren <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08nfp: bpf: require ETH tableJakub Kicinski1-0/+12
Upcoming changes will require all netdevs supporting TC offloads to have a full struct nfp_port. Require those for BPF offload. The operation without management FW reporting information about Ethernet ports is something we only support for very old and very basic NIC firmwares anyway. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Pieter Jansen van Vuuren <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-08Revert "ath10k: add sanity check to ie_len before parsing fw/board ie"Ryan Hsu1-7/+7
This reverts commit 9ed4f91628737c820af6a1815b65bc06bd31518f. The commit introduced a regression that over read the ie with the padding. - the expected IE information ath10k_pci 0000:03:00.0: found firmware features ie (1 B) ath10k_pci 0000:03:00.0: Enabling feature bit: 6 ath10k_pci 0000:03:00.0: Enabling feature bit: 7 ath10k_pci 0000:03:00.0: features ath10k_pci 0000:03:00.0: 00000000: c0 00 00 00 00 00 00 00 - the wrong IE with padding is read (0x77) ath10k_pci 0000:03:00.0: found firmware features ie (4 B) ath10k_pci 0000:03:00.0: Enabling feature bit: 6 ath10k_pci 0000:03:00.0: Enabling feature bit: 7 ath10k_pci 0000:03:00.0: Enabling feature bit: 8 ath10k_pci 0000:03:00.0: Enabling feature bit: 9 ath10k_pci 0000:03:00.0: Enabling feature bit: 10 ath10k_pci 0000:03:00.0: Enabling feature bit: 12 ath10k_pci 0000:03:00.0: Enabling feature bit: 13 ath10k_pci 0000:03:00.0: Enabling feature bit: 14 ath10k_pci 0000:03:00.0: Enabling feature bit: 16 ath10k_pci 0000:03:00.0: Enabling feature bit: 17 ath10k_pci 0000:03:00.0: Enabling feature bit: 18 ath10k_pci 0000:03:00.0: features ath10k_pci 0000:03:00.0: 00000000: c0 77 07 00 00 00 00 00 Tested-by: Mike Lothian <[email protected]> Signed-off-by: Ryan Hsu <[email protected]> Signed-off-by: Kalle Valo <[email protected]>
2018-02-08Merge branch 'bpf-misc-nfp-bpftool-doc-fixes'Daniel Borkmann6-10/+81
Jakub Kicinski says: ==================== First patch in this series fixes applying the relocation to immediate load instructions in the NFP JIT. The remaining patches come from Quentin. Small addition to libbpf makes sure it recognizes all standard section names. Makefile in bpftool/Documentation is improved to explicitly check for rst2man being installed on the system, otherwise we risk installing empty files. Man page for bpftool-map is corrected to include program as a potential value for map of programs. Last two patches are slightly longer, those update bash completions to include this release cycle's additions from Roman. Maybe the use of Fixes tags is slightly frivolous there, but having bash completions which don't cover all commands and options could be disruptive to work flow for users. ==================== Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08tools: bpftool: add bash completion for cgroup commandsQuentin Monnet2-6/+62
Add bash completion for "bpftool cgroup" command family. While at it, also fix the formatting of some keywords in the man page for cgroups. Fixes: 5ccda64d38cc ("bpftool: implement cgroup bpf operations") Signed-off-by: Quentin Monnet <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08tools: bpftool: add bash completion for `bpftool prog load`Quentin Monnet1-2/+6
Add bash completion for bpftool command `prog load`. Completion for this command is easy, as it only takes existing file paths as arguments. Fixes: 49a086c201a9 ("bpftool: implement prog load command") Signed-off-by: Quentin Monnet <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08tools: bpftool: make syntax for program map update explicit in man pageQuentin Monnet1-1/+2
Specify in the documentation that when using bpftool to update a map of type BPF_MAP_TYPE_PROG_ARRAY, the syntax for the program used as a value should use the "id|tag|pinned" keywords convention, as used with "bpftool prog" commands. Fixes: ff69c21a85a4 ("tools: bpftool: add documentation") Signed-off-by: Quentin Monnet <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08tools: bpftool: exit doc Makefile early if rst2man is not availableQuentin Monnet1-0/+5
If rst2man is not available on the system, running `make doc` from the bpftool directory fails with an error message. However, it creates empty manual pages (.8 files in this case). A subsequent call to `make doc-install` would then succeed and install those empty man pages on the system. To prevent this, raise a Makefile error and exit immediately if rst2man is not available before generating the pages from the rst documentation. Fixes: ff69c21a85a4 ("tools: bpftool: add documentation") Reported-by: Jason van Aaardt <[email protected]> Signed-off-by: Quentin Monnet <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08libbpf: complete list of strings for guessing program typeQuentin Monnet1-0/+5
It seems that the type guessing feature for libbpf, based on the name of the ELF section the program is located in, was inspired from samples/bpf/prog_load.c, which was not used by any sample for loading programs of certain types such as TC actions and classifiers, or LWT-related types. As a consequence, libbpf is not able to guess the type of such programs and to load them automatically if type is not provided to the `bpf_load_prog()` function. Add ELF section names associated to those eBPF program types so that they can be loaded with e.g. bpftool as well. Signed-off-by: Quentin Monnet <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-08nfp: bpf: fix immed relocation for larger offsetsJakub Kicinski1-1/+1
Immed relocation is missing a shift which means for larger offsets the lower and higher part of the address would be ORed together. Fixes: ce4ebfd859c3 ("nfp: bpf: add helpers for updating immediate instructions") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Jiong Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-07tcp: tracepoint: only call trace_tcp_send_reset with full socketSong Liu2-2/+4
tracepoint tcp_send_reset requires a full socket to work. However, it may be called when in TCP_TIME_WAIT: case TCP_TW_RST: tcp_v6_send_reset(sk, skb); inet_twsk_deschedule_put(inet_twsk(sk)); goto discard_it; To avoid this problem, this patch checks the socket with sk_fullsock() before calling trace_tcp_send_reset(). Fixes: c24b14c46bb8 ("tcp: add tracepoint trace_tcp_send_reset") Signed-off-by: Song Liu <[email protected]> Reviewed-by: Lawrence Brakmo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-07sch_netem: Bug fixing in calculating Netem intervalMd. Islam1-1/+1
In Kernel 4.15.0+, Netem does not work properly. Netem setup: tc qdisc add dev h1-eth0 root handle 1: netem delay 10ms 2ms Result: PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data. 64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=22.8 ms 64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=10.9 ms 64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=10.9 ms 64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=11.4 ms 64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms 64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=4303 ms 64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.2 ms 64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.3 ms 64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=4304 ms 64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=4303 ms Patch: (rnd % (2 * sigma)) - sigma was overflowing s32. After applying the patch, I found following output which is desirable. PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data. 64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=21.1 ms 64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=8.46 ms 64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=9.00 ms 64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=11.8 ms 64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=8.36 ms 64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms 64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=8.11 ms 64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=10.0 ms 64 bytes from 172.16.101.2: icmp_seq=9 ttl=64 time=11.3 ms 64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.5 ms 64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.2 ms Reviewed-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-07net: ethernet: ti: cpsw: fix net watchdog timeoutGrygorii Strashko1-2/+14
It was discovered that simple program which indefinitely sends 200b UDP packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog timeout is triggered due to race between cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI] cpsw_ndo_start_xmit() if (unlikely(!cpdma_check_free_tx_desc(txch))) { txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq); ^^ as per [1] barier has to be used after set_bit() otherwise new value might not be visible to other cpus } cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq); and when it happens ndev TX queue became disabled forever while driver's HW TX queue is empty. Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue() calls and double check for free TX descriptors after stopping ndev TX queue - if there are free TX descriptors wake up ndev TX queue. [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html Signed-off-by: Grygorii Strashko <[email protected]> Reviewed-by: Ivan Khoronzhuk <[email protected]> Signed-off-by: David S. Miller <[email protected]>