Age | Commit message | Author | Files | Lines
2019-03-01 | net/mlx5: Add multipath mode | Roi Dayan | 4 | -2/+28
In order to offload the ecmp-on-host scheme where next-hop routes are used, we will make use of HW LAG. Add an accessor function to let upper layers in the driver determine whether the lag acts in multi-path mode. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5: Use own workqueue for lag netdev events processing | Roi Dayan | 2 | -1/+9
Instead of using the system workqueue, allocate our own workqueue. This workqueue will be used to handle more work in the next patch. This patch doesn't change functionality. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5: Expose lag operations in header file | Roi Dayan | 2 | -48/+68
The change is a refactoring step towards a multipath use case. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5: Use unsigned int bit instead of bool as a struct member | Roi Dayan | 1 | -1/+1
This fixes the checkpatch check: CHECK: Avoid using bool structure members because of possible alignment issues Signed-off-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5e: Don't make internal use of errno to denote missing neigh | Roi Dayan | 2 | -14/+22
EAGAIN is treated as a specific case when we consider the attachment successful but wait for neigh event before offloading the flow. This can result in unwanted behavior when sub calls on the offloading path will return EAGAIN and we pass this error up. Instead of attaching to a specific error code return a boolean value from the attach encap operation saying if the encap is valid or not. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5e: Cleanup attach encap function | Roi Dayan | 1 | -14/+17
Remove the tunnel info argument which we can get from the other args. Also reorder the args to have input args first and output args later. This patch doesn't change functionality. Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as static | Eran Ben Elisha | 1 | -1/+1
Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx reporter, move it to be statically declared in en/reporter_tx.c. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Eran Ben Elisha <[email protected]> Reported-by: Or Gerlitz <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-03-01 | Merge branch 'nfp-control-processor-DMA-support-and-RJ45' | David S. Miller | 3 | -46/+243
Jakub Kicinski says: ==================== nfp: control processor DMA support and RJ45 This series starts with adding support for reporting twisted pair media type in ethtool. Remaining patches add support for using DMA with the control/service processor. Currently we always copy the command data into card's memory. DMA support allows us to have the NSP read the data from host memory by itself. Unfortunately, the FW loading and flashing cannot directly map the buffers for DMA because (a) the firmware ABI returns const buffers, and (b) the buffers may be vmalloc()ed in many mysterious/unmappable ways. So just bite the bullet - allocate a new host buffer for the command and copy. As Dirk explains, the NSP now supports updating all FWs at once which means the max flashing time grew significantly. He bumps the max wait to avoid timeouts. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | nfp: nsp: set higher timeout for flash bundle | Dirk van der Merwe | 1 | -4/+1
The management firmware now supports being passed a bundle with multiple components to be stored in flash at once. This makes it easier to update all components to a known state with a single user command, however, this also has the potential to increase the time required to perform the update significantly. The management firmware only updates the components out of a bundle which are outdated, however, we need to make sure we can handle the absolute worst case where a CPLD update can take a long time to perform. We set a very conservative total timeout of 900s which already adds a contingency. Signed-off-by: Dirk van der Merwe <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | nfp: nsp: allow the use of DMA buffer | Jakub Kicinski | 1 | -5/+191
Newer versions of NSP can access host memory. Simplest access type requires all data to be in one contiguous area. Since we don't have the guarantee on where callers of the NSP ABI will allocate their buffers we allocate a bounce buffer and copy the data in and out. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | nfp: nsp: move default buffer handling into its own function | Jakub Kicinski | 1 | -42/+51
DMA version of NSP communication is coming, move the code which copies data into the NFP buffer into a separate function. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | nfp: nsp: use fractional size of the buffer | Jakub Kicinski | 1 | -6/+7
NSP expresses the buffer size in MB and 4 kB blocks. For small buffers the kB part may make a difference, so count it in. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
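The size arithmetic described in this commit can be sketched as follows. This is only an illustration: the function name and the way the two parts are passed in are hypothetical (the commit message does not give the register layout), and "MB" is assumed to mean 2**20 bytes.

```python
MB = 1024 * 1024      # assumption: "MB" means 2**20 bytes
BLOCK_4K = 4 * 1024   # one 4 kB block


def nsp_buffer_size(size_mb, size_4k_blocks):
    """Total buffer size in bytes: the MB part plus the 4 kB-block part.

    Hypothetical helper, not the driver's actual function.
    """
    return size_mb * MB + size_4k_blocks * BLOCK_4K


# A small buffer: counting only whole MB would report 0 bytes here.
print(nsp_buffer_size(0, 3))  # 12288
```

For small buffers the MB part can be zero, so the 4 kB part is the whole size, which is the case the commit message calls out.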
2019-03-01 | nfp: report RJ45 connector in ethtool | Jakub Kicinski | 2 | -0/+4
Add support for reporting twisted pair port type. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | lan743x: Fix TX Stall Issue | Bryan Whitehead | 1 | -4/+12
It has been observed that the tx queue stalls while downloading from certain web sites (for example www.speedtest.net). The cause has been tracked down to a corner case where dma descriptors were not set up properly, and therefore a tx completion interrupt was not signaled. This fix corrects the problem by properly marking the end of a multi descriptor transmission. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Bryan Whitehead <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | net: phy: phylink: fix uninitialized variable in phylink_get_mac_state | Heiner Kallweit | 1 | -0/+4
When debugging an issue I found implausible values in state->pause. The reason is that state->pause isn't initialized and later only single bits are changed. Also the struct itself isn't initialized in phylink_resolve(). So better initialize state->pause and other not yet initialized fields. v2: - use right function name in subject v3: - initialize additional fields Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | net: aquantia: regression on cpus with high cores: set mode with 8 queues | Dmitry Bogdanov | 4 | -0/+29
Recently the maximum number of queues was increased up to 8, but NIC was not fully configured for 8 queues. In setups with more than 4 CPU cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th. This patch sets a tx hw traffic mode with 8 queues. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651 Fixes: 71a963cfc50b ("net: aquantia: increase max number of hw queues") Reported-by: Nicholas Johnson <[email protected]> Signed-off-by: Dmitry Bogdanov <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | selftests: fixes for UDP GRO | Paolo Abeni | 2 | -17/+33
The current implementation for UDP GRO tests is racy: the receiver may flush the RX queue while the sender is still transmitting and incorrectly report RX errors, with a wrong number of packets received. Add explicit timeouts to the receiver for both connection activation (first packet received for UDP) and reception completion, so that in the above critical scenario the receiver will wait for the transfer completion. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | net: marvell: neta: disable comphy when setting mode | Marek Behún | 1 | -5/+23
The comphy driver for Armada 3700 by Miquèl Raynal (which is currently in linux-next) does not actually set comphy mode when phy_set_mode_ext is called. The mode is set at next call of phy_power_on. Update the driver to semantics similar to mvpp2: helper mvneta_comphy_init sets comphy mode and powers it on. When mode is to be changed in mvneta_mac_config, first power the comphy off, then call mvneta_comphy_init (which sets the mode to new one). Only do this when new mode is different from old mode. This should also work for Armada 38x, since in that comphy driver methods power_on and power_off are unimplemented. Signed-off-by: Marek Behún <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | Merge branch 'enetc-Add-mdio-support-and-device-tree-nodes' | David S. Miller | 7 | -1/+340
Claudiu Manoil says: ==================== enetc: Add mdio support and device tree nodes This is the missing part to enable PCI probing of the ENETC ethernet ports on the LS1028A SoC and external traffic on the LS1028A RDB board. It's one of the first items on the TODO list for the recently merged ENETC ethernet driver. v3: Add DT bindings doc for ENETC connections v4: none ==================== Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | dt-bindings: net: freescale: enetc: Add connection bindings for ENETC ethernet nodes | Claudiu Manoil | 1 | -0/+69
Define connection bindings (external PHY connections and internal links) for the ENETC on-chip ethernet controllers. Signed-off-by: Claudiu Manoil <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | enetc: Add ENETC PF level external MDIO support | Claudiu Manoil | 4 | -1/+219
Each ENETC PF has its own MDIO interface, the corresponding MDIO registers are mapped in the ENETC's Port register block. The current patch adds a driver for these PF level MDIO buses, so that each PF can manage directly its own external link. Signed-off-by: Alex Marginean <[email protected]> Signed-off-by: Claudiu Manoil <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB board | Claudiu Manoil | 1 | -0/+17
The LS1028A RDB board features an Atheros PHY connected over SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no external connection on this board, so it can be disabled for now. Signed-off-by: Alex Marginean <[email protected]> Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpoints | Claudiu Manoil | 1 | -0/+35
The LS1028A SoC features a PCI Integrated Endpoint Root Complex (IERC) defining several integrated PCI devices, including the ENETC ethernet controller integrated endpoints (IEPs). The IERC implements ECAM (Enhanced Configuration Access Mechanism) to provide access to the PCIe config space of the IEPs. This means that the IEPs (including ENETC) do not support the standard PCIe BARs; instead, the Enhanced Allocation (EA) capability structures in the ECAM space are used to fix the base addresses in the system, and the PCI subsystem uses these structures for device enumeration and discovery. The "ranges" entries contain basic information from these EA capability structures required by the kernel for device enumeration. The current patch also enables the first 2 ENETC PFs (Physical Functions) and the associated VFs (Virtual Functions), 2 VFs for each PF. Each of these ENETC PFs has an external ethernet port on the LS1028A SoC. Signed-off-by: Alex Marginean <[email protected]> Signed-off-by: Claudiu Manoil <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-03-01 | Merge tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu | Linus Torvalds | 1 | -1/+1
Pull IOMMU fix from Joerg Roedel: "One important fix for a memory corruption issue in the Intel VT-d driver that triggers on hardware with deep PCI hierarchies" * tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/dmar: Fix buffer overflow during PCI bus notification
2019-03-01 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 5 | -4/+59
Merge misc fixes from Andrew Morton: "2 fixes" * emailed patches from Andrew Morton <[email protected]>: hugetlbfs: fix races and page leaks during migration kasan: turn off asan-stack for clang-8 and earlier
2019-03-01 | hugetlbfs: fix races and page leaks during migration | Mike Kravetz | 3 | -3/+36
hugetlb pages should only be migrated if they are 'active'. The routines set/clear_page_huge_active() modify the active state of hugetlb pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active is called before the page is locked. Therefore, another thread could race and migrate the page while it is being added to page table by the fault code. This race is somewhat hard to trigger, but can be seen by strategically adding udelay to simulate worst case scheduling behavior. Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are associated with a file in an explicitly mounted hugetlbfs filesystem. For example, consider a two node system with 4GB worth of huge pages available. A program mmaps a 2G file in a hugetlbfs filesystem. It then migrates the pages associated with the file from one node to another. When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages
  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem  Size  Used  Avail  Use%  Mounted on
  nodev       4.0G  2.0G  2.0G   50%   /var/opt/hugepool

That is as expected. 2G of huge pages are taken from the free_hugepages counts, and 2G is the size of the file in the explicitly mounted filesystem. If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages
  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem  Size  Used  Avail  Use%  Mounted on
  nodev       4.0G  2.0G  2.0G   50%   /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there actually are no huge pages in use. The only way to 'fix' the filesystem accounting is to unmount the filesystem.

If a hugetlb page is associated with an explicitly mounted filesystem, this information is contained in the page_private field. At migration time, this information is not preserved. To fix, simply transfer page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and migration. When a huge page is removed from the pagecache, the page_mapping() field is cleared, yet page_private remains set until the page is actually freed by free_huge_page(). A page could be migrated while in this state. However, since page_mapping() is not set, the hugetlbfs specific routine to transfer page_private is not called and we leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page. If the condition is detected, return EBUSY for the page.

Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: <[email protected]>
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: update comment and changelog]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
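The page counts in the example above can be cross-checked with a little arithmetic. This is a sketch, assuming "GB"/"G" mean 2**30 bytes; the huge page size is not stated in the commit message and is derived here from the quoted totals.

```python
GIB = 1024 ** 3               # assumption: "GB"/"G" in the example mean 2**30 bytes

nodes = 2
nr_hugepages_per_node = 1024  # from the nr_hugepages lines
pool_bytes = 4 * GIB          # "4GB worth of huge pages"
file_bytes = 2 * GIB          # "mmaps a 2G file"

# Derived, not stated in the commit message: size of one huge page.
huge_page_size = pool_bytes // (nodes * nr_hugepages_per_node)

# Number of huge pages needed to back the file.
pages_for_file = file_bytes // huge_page_size

# After migration all file pages sit on one node, so that node has none free.
free_on_target_node = nr_hugepages_per_node - pages_for_file

print(huge_page_size, pages_for_file, free_on_target_node)  # 2097152 1024 0
```

The file consumes exactly one node's worth of pages, which matches node1 showing 0 free_hugepages while the file exists and 1024 again once it is removed.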
2019-03-01 | kasan: turn off asan-stack for clang-8 and earlier | Arnd Bergmann | 2 | -1/+23
Building an arm64 allmodconfig kernel with clang results in over 140 warnings about overly large stack frames, the worst ones being:

  drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
  drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
  drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
  drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
  drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
  drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'

None of these happen with gcc today, and almost all of these are the result of a single known issue in llvm. Hopefully it will eventually get fixed with the clang-9 release.

In the meantime, the best idea I have is to turn off asan-stack for clang-8 and earlier, so we can produce a kernel that is safe to run.

I have posted three patches that address the frame overflow warnings that are not addressed by turning off asan-stack, so in combination with this change, we get much closer to a clean allmodconfig build, which in turn is necessary to do meaningful build regression testing.

It is still possible to turn on the CONFIG_ASAN_STACK option on all versions of clang, and it's always enabled for gcc, but when CONFIG_COMPILE_TEST is set, the option remains invisible, so allmodconfig and randconfig builds (which are normally done with a forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.

Link: http://lkml.kernel.org/r/[email protected]
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Qian Cai <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2019-03-01 | Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm | Linus Torvalds | 4 | -4/+37
Pull drm fixes from Dave Airlie: "Three final fixes, one for a feature that is new in this kernel, one bochs fix for qemu riscv and one atomic modesetting fix. I've left a few of the other late fixes until next as I didn't want to throw in anything that wasn't really necessary" * tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm: drm/bochs: Fix the ID mismatch error drm: Block fb changes for async plane updates drm/amd/display: Use vrr friendly pageflip throttling in DC.
2019-03-01 | s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y | Martin Schwidefsky | 1 | -14/+5
The dasd_eckd_restore_device() function calls dasd_generic_read_dev_chars with a temporary buffer on the stack. With CONFIG_VMAP_STACK=y this is a vmalloc address but dasd_generic_restore_device() uses the address of the buffer as I/O address. Circumvent this by using the already allocated cqr->data buffer for the RDC data. Signed-off-by: Martin Schwidefsky <[email protected]>
2019-03-01 | s390/suspend: fix prefix register reset in swsusp_arch_resume | Martin Schwidefsky | 1 | -3/+3
The reset of the prefix to zero in swsusp_arch_resume uses a 4 byte stack slot. With CONFIG_VMAP_STACK=y this is now in the vmalloc area, this works only with DAT enabled. Move the DAT disable in swsusp_arch_resume after the prefix reset. Signed-off-by: Martin Schwidefsky <[email protected]>
2019-03-01 | bpf: drop refcount if bpf_map_new_fd() fails in map_create() | Peng Sun | 1 | -2/+2
In bpf/syscall.c, map_create() first sets map->usercnt to 1, since a file descriptor is supposed to be returned to userspace. When bpf_map_new_fd() fails, drop the refcount. Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID") Signed-off-by: Peng Sun <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-03-01 | Merge tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/fixes | Arnd Bergmann | 1 | -1/+1
Qualcomm ARM64 Fixes for 5.0-rc8 * Fix TZ memory area size to avoid crashes during boot * tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: arm64: dts: qcom: msm8998: Extend TZ reserved memory area
2019-03-01 | Merge tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes | Arnd Bergmann | 1 | -1/+3
OP-TEE driver - add missing of_node_put after of_device_is_available * tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: optee: add missing of_node_put after of_device_is_available
2019-03-01 | netfilter: nf_tables: merge ipv4 and ipv6 nat chain types | Florian Westphal | 9 | -194/+111
Merge the ipv4 and ipv6 nat chain type. This is the last missing piece which allows providing inet family support for nat in a follow-up patch. The kconfig knobs for ipv4/ipv6 nat chain are removed, the nat chain type will be built unconditionally if NFT_NAT expression is enabled.

Before:
   text    data     bss     dec     hex filename
   1576     896       0    2472     9a8 nft_chain_nat_ipv4.ko
   1697     896       0    2593     a21 nft_chain_nat_ipv6.ko
After:
   text    data     bss     dec     hex filename
   1832     896       0    2728     aa8 nft_chain_nat.ko

Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: nf_tables: nat: merge nft_masq protocol specific modules | Florian Westphal | 9 | -236/+168
The family specific masq modules are way too small to warrant an extra module, just place all of them in nft_masq.

before:
   text    data     bss     dec     hex filename
   1001     832       0    1833     729 nft_masq.ko
    766     896       0    1662     67e nft_masq_ipv4.ko
    764     896       0    1660     67c nft_masq_ipv6.ko
after:
   2010     960       0    2970     b9a nft_masq.ko

Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: nf_tables: nat: merge nft_redir protocol specific modules | Florian Westphal | 9 | -217/+143
before:
   text    data     bss     dec     hex filename
    990     832       0    1822     71e nft_redir.ko
    697     896       0    1593     639 nft_redir_ipv4.ko
    713     896       0    1609     649 nft_redir_ipv6.ko
after:
   text    data     bss     dec     hex filename
   1910     960       0    2870     b36 nft_redir.ko

Size is reduced; all helpers from nft_redir.ko can be made static. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
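The "size is reduced" claim for nft_redir can be checked from the quoted numbers. A small sketch; the section sizes are copied from the commit message, while the helper name `total` is only illustrative.

```python
# (text, data, bss) in bytes, copied from the size output quoted above
before = {
    "nft_redir.ko": (990, 832, 0),
    "nft_redir_ipv4.ko": (697, 896, 0),
    "nft_redir_ipv6.ko": (713, 896, 0),
}
after = {"nft_redir.ko": (1910, 960, 0)}


def total(modules):
    """Sum text + data + bss over all modules (the 'dec' column)."""
    return sum(sum(sections) for sections in modules.values())


saved = total(before) - total(after)
print(total(before), total(after), saved)  # 5024 2870 2154
print(format(total(after), "x"))           # b36, matching the 'hex' column
```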
2019-03-01 | netfilter: xt_IDLETIMER: fix sysfs callback function type | Sami Tolvanen | 1 | -10/+4
Use struct device_attribute instead of struct idletimer_tg_attr, and the correct callback function type to avoid indirect call mismatches with Control Flow Integrity checking. Signed-off-by: Sami Tolvanen <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2 | Li RongQing | 1 | -0/+1
CONNTRACK_LOCKS is the divisor when computing the array index. If it is a power of 2, the compiler will optimize the modulo operation into a bitwise AND; otherwise the modulo will lower performance. Suggested-by: Florian Westphal <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
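The equivalence the commit relies on can be illustrated outside the kernel. A minimal sketch: `n_locks` stands in for CONNTRACK_LOCKS, and both helper names are hypothetical, not kernel functions.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def lock_index(hash_value, n_locks):
    """Index into a lock array of size n_locks.

    For a power-of-two n_locks, 'hash_value % n_locks' equals
    'hash_value & (n_locks - 1)', which is the cheaper operation.
    """
    if not is_power_of_two(n_locks):
        raise ValueError("n_locks must be a power of two")
    return hash_value & (n_locks - 1)


print(lock_index(5000, 1024), 5000 % 1024)  # 904 904
```

With a divisor that is not a power of two the mask trick no longer gives the same result, so a real division or modulo is needed on every lookup.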
2019-03-01 | netfilter: nf_tables: check the result of dereferencing base_chain->stats | Li RongQing | 1 | -6/+8
Check the result of dereferencing base_chain->stats, instead of the result of this_cpu_ptr, against NULL. base_chain->stats may be changed to NULL when a chain is updated and a new NULL counter can be attached. We do not need to check the return value of this_cpu_ptr: since base_chain->stats comes from the percpu allocator, if it is non-NULL, this_cpu_ptr returns a valid value. Also fix two sparse errors by replacing rcu_access_pointer and rcu_dereference with READ_ONCE under rcu_read_lock. Thanks to Eric for his help in finishing this patch. Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Zhang Yu <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave | David Ahern | 1 | -1/+2
Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in to not drop packets for slave devices too. Currently, neighbor solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting dropped. This patch enables IPv6 communications for bridges with an address that are enslaved to a VRF. Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: David Ahern <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | ipvs: get sctphdr by sctphoff in sctp_csum_check | Xin Long | 1 | -5/+2
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls skb_make_writable() to ensure sctphdr to be linearized. So there's no need to get sctphdr by calling skb_header_pointer() in sctp_csum_check(). Signed-off-by: Xin Long <[email protected]> Reviewed-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: convert the proto argument from u8 to u16 | Li RongQing | 3 | -7/+7
The proto in struct xt_match and struct xt_target is u16, but the proto argument of xt_check_target/match is u8, which causes truncation. This is harmless for IP packets, since the IP proto is u8, but if an ebtables match/target has a proto that is u16, the check will fail. Also convert be16 to short in bridge/netfilter/ebtables.c. Signed-off-by: Zhang Yu <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
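The truncation described here is easy to reproduce in isolation. A sketch only: `as_u8` and `as_u16` are illustrative helpers that mimic C integer narrowing, and the two protocol numbers are the standard IPPROTO_TCP and ETH_P_IP values rather than anything taken from the patch.

```python
def as_u8(value):
    """Mimic storing a value in an unsigned 8-bit variable."""
    return value & 0xFF


def as_u16(value):
    """Mimic storing a value in an unsigned 16-bit variable."""
    return value & 0xFFFF


IPPROTO_TCP = 6    # IP protocol numbers fit in 8 bits
ETH_P_IP = 0x0800  # EtherType values need 16 bits

# Harmless for an IP protocol number: nothing is lost.
print(as_u8(IPPROTO_TCP) == IPPROTO_TCP)  # True

# A 16-bit protocol passed through a u8 argument loses its high byte,
# so comparing it with the u16 field in the match/target fails.
print(hex(as_u8(ETH_P_IP)), hex(as_u16(ETH_P_IP)))  # 0x0 0x800
```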
2019-03-01 | netfilter: nft_tunnel: Add dst_cache support | wenxu | 1 | -0/+7
The metadata_dst does not initialize the dst_cache field. This causes problems for ip_md_tunnel_xmit(), since it cannot use this cache, hence triggering a route lookup for every packet. Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | netfilter: conntrack: tcp: only close if RST matches exact sequence | Florian Westphal | 1 | -10/+40
TCP resets cause instant transition from established to closed state provided the reset is in-window. Endpoints that implement RFC 5961 require resets to match the next expected sequence number. RST segments that are in-window (but that do not match RCV.NXT) are ignored, and a "challenge ACK" is sent back.

Main problem for conntrack is that it's a middlebox, i.e. whereas an end host might have ACK'd SEQ (and would thus accept an RST with this sequence number), conntrack might not have seen this ACK (yet). Therefore we can't simply flag RSTs with non-exact match as invalid.

This updates RST processing as follows:

1. If the connection is in a state other than ESTABLISHED, nothing is changed, RST is subject to normal in-window check.
2. If the RST's sequence number matches RCV.NXT exactly, connection state moves to CLOSE.
3. The same applies if the RST sequence number aligns with a previous packet in the same direction.

In all other cases, the connection remains in ESTABLISHED state. If the normal-in-window check passes, the timeout will be lowered to that of CLOSE.

If the peer sends a challenge ack, connection timeout will be reset. If the challenge ACK triggers another RST (RST was valid after all), this 2nd RST will match expected sequence and conntrack state changes to CLOSE. If no challenge ACK is received, the connection will time out after CLOSE seconds (10 seconds by default), just like without this patch.

Packetdrill test case:

  0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
  0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
  0.000 bind(3, ..., ...) = 0
  0.000 listen(3, 1) = 0

  0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
  0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
  0.200 < . 1:1(0) ack 1 win 257
  0.200 accept(3, ..., ...) = 4

  // Receive a segment.
  0.210 < P. 1:1001(1000) ack 1 win 46
  0.210 > . 1:1(0) ack 1001

  // Application writes 1000 bytes.
  0.250 write(4, ..., 1000) = 1000
  0.250 > P. 1:1001(1000) ack 1001

  // First reset, old sequence. Conntrack (correctly) considers this
  // invalid due to failed window validation (regardless of this patch).
  0.260 < R 2:2(0) ack 1001 win 260

  // 2nd reset, but too far ahead sequence. Same: correctly handled
  // as invalid.
  0.270 < R 99990001:99990001(0) ack 1001 win 260

  // in-window, but not exact sequence.
  // Current Linux kernels might reply with a challenge ack, and do not
  // remove connection.
  // Without this patch, conntrack state moves to CLOSE.
  // With patch, timeout is lowered like CLOSE, but connection stays
  // in ESTABLISHED state.
  0.280 < R 1010:1010(0) ack 1001 win 260

  // Expect challenge ACK
  0.281 > . 1001:1001(0) ack 1001 win 501

  // With or without this patch, RST will cause connection
  // to move to CLOSE (sequence number matches)
  // 0.282 < R 1001:1001(0) ack 1001 win 260

  // ACK
  0.300 < . 1001:1001(0) ack 1001 win 257

  // more data could be exchanged here, connection
  // is still established

  // Client closes the connection.
  0.610 < F. 1001:1001(0) ack 1001 win 260
  0.650 > . 1001:1001(0) ack 1002

  // Close the connection without reading outstanding data
  0.700 close(4) = 0

  // so one more reset. Will be deemed acceptable with patch as well:
  // connection is already closing.
  0.701 > R. 1001:1001(0) ack 1002 win 501
  // End packetdrill test case.

With patch, this generates following conntrack events:

  [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
  [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
  [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
  [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
  [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
  [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]

Without patch, first RST moves connection to close, whereas socket state does not change until FIN is received.

  [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
  [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
  [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
  [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]

Cc: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
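The three RST rules listed in this commit message can be modelled as a small decision function. This is a deliberately simplified sketch, not the kernel code: the names `State`, `handle_rst` and `prev_seq_same_dir` are hypothetical, the in-window check is passed in as a precomputed boolean, and "aligns with a previous packet in the same direction" is modelled as equality with a single remembered sequence number, which is an assumption.

```python
from enum import Enum


class State(Enum):
    SYN_RECV = "SYN_RECV"
    ESTABLISHED = "ESTABLISHED"
    CLOSE = "CLOSE"


def handle_rst(state, seq, rcv_nxt, prev_seq_same_dir, in_window):
    """Return (new_state, lower_timeout_to_close) for a received RST."""
    # Rule 1: outside ESTABLISHED, only the normal in-window check applies.
    if state is not State.ESTABLISHED:
        return (State.CLOSE, True) if in_window else (state, False)
    # Rules 2 and 3: exact match with RCV.NXT, or with a previous packet
    # in the same direction, closes the connection.
    if seq == rcv_nxt or seq == prev_seq_same_dir:
        return (State.CLOSE, True)
    # Otherwise stay ESTABLISHED; an in-window RST only lowers the timeout.
    return (State.ESTABLISHED, in_window)


# Values taken from the packetdrill script: RCV.NXT is 1001.
print(handle_rst(State.ESTABLISHED, 2, 1001, None, False))
print(handle_rst(State.ESTABLISHED, 1010, 1001, None, True))
print(handle_rst(State.ESTABLISHED, 1001, 1001, None, True))
```

The second call corresponds to the in-window but inexact RST at 0.280: the state stays ESTABLISHED and only the timeout drops, so a later challenge ACK can restore it.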
2019-03-01 | ipvs: change some data types from int to bool | Andrea Claudi | 5 | -18/+18
Change the data type of the following variables from int to bool across ipvs code: - found - loop - need_full_dest - need_full_svc - payload_csum Also change the following functions to use bool full_entry param instead of int: - ip_vs_genl_parse_dest() - ip_vs_genl_parse_service() This patch does not change any functionality but makes the source code slightly easier to read. Signed-off-by: Andrea Claudi <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-03-01 | mmc:fix a bug when max_discard is 0 | Jiong Wu | 1 | -2/+2
The original purpose of the code I fix is to replace max_discard with max_trim if max_trim is less than max_discard. When max_discard is 0 we should replace max_discard with max_trim as well, because max_discard equal to 0 happens only when the max_do_calc_max_discard process has overflowed, so if mmc_can_trim(card) is true, max_discard should be replaced by an available max_trim. However, in the original code, two lines interfere with the correct process.

1) if (max_discard && mmc_can_trim(card))
   When max_discard is 0, it skips the process checking if max_discard needs to be replaced with max_trim.
2) if (max_trim < max_discard)
   The condition is false when max_discard is 0. It also skips the process that replaces max_discard with max_trim; in fact, we should replace the 0-valued max_discard with max_trim.

Signed-off-by: Jiong Wu <[email protected]> Fixes: b305882fbc87 (mmc: core: optimize mmc_calc_max_discard) Cc: [email protected] # v4.17+ Signed-off-by: Ulf Hansson <[email protected]>
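The effect of the two conditions can be shown with a small model. A sketch under assumptions: the function names are hypothetical, `can_trim` stands in for mmc_can_trim(card), and the "fixed" variant is one reading of the intent described in the commit message rather than a copy of the patch.

```python
def pick_max_discard_old(max_discard, max_trim, can_trim):
    """Model of the two conditions quoted in the commit message."""
    if max_discard and can_trim:      # 1) skipped entirely when max_discard is 0
        if max_trim < max_discard:    # 2) never true when max_discard is 0
            max_discard = max_trim
    return max_discard


def pick_max_discard_fixed(max_discard, max_trim, can_trim):
    """Assumed fix: also take max_trim when max_discard is 0."""
    if can_trim:
        if max_trim < max_discard or max_discard == 0:
            max_discard = max_trim
    return max_discard


# max_discard overflowed to 0, but trim is available with a usable limit.
print(pick_max_discard_old(0, 4096, True))    # 0
print(pick_max_discard_fixed(0, 4096, True))  # 4096
```

For a non-zero max_discard both variants behave the same; only the overflow case changes.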
2019-03-01 | s390: warn about clearing als implied facilities | Vasily Gorbik | 3 | -2/+20
Add a warning about removing required architecture level set facilities via "facilities=" command line option. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2019-03-01 | s390: allow overriding facilities via command line | Vasily Gorbik | 3 | -2/+50
Add "facilities=" command line option which allows to override facility bits returned by stfle. The main purpose of that is debugging aids which allows to test specific kernel behaviour depending on specific facilities presence. It also affects CPU alternatives. "facilities=" command line option format is comma separated list of integer values to be additionally set or cleared (if value is starting with "!"). Values ranges are also supported. e.g.: facilities=!130-160,159,167-169 Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
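The option format described here can be illustrated with a small parser. This is a sketch, not the kernel's parser: `apply_facilities` is a hypothetical name, facility bits are modelled as a set of integers, and tokens are assumed to be applied left to right.

```python
def apply_facilities(option, bits):
    """Apply a 'facilities=' style list to a set of facility bit numbers.

    Each comma-separated token is a value or a 'first-last' range;
    a leading '!' clears the bits instead of setting them.
    """
    result = set(bits)
    for token in option.split(","):
        token = token.strip()
        if not token:
            continue
        clear = token.startswith("!")
        if clear:
            token = token[1:]
        first_text, dash, last_text = token.partition("-")
        first = int(first_text)
        last = int(last_text) if dash else first
        selected = range(first, last + 1)
        if clear:
            result.difference_update(selected)
        else:
            result.update(selected)
    return result


# The example from the commit message, applied to a made-up starting set.
print(sorted(apply_facilities("!130-160,159,167-169", {129, 130, 140, 160})))
# [129, 159, 167, 168, 169]
```

Under the left-to-right assumption, bit 159 ends up set even though it falls inside the cleared range, because the later token overrides the earlier one.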
2019-03-01 | s390: clean up redundant facilities list setup | Vasily Gorbik | 2 | -4/+0
Facilities list in the lowcore is initially set up by verify_facilities from als.c and later initializations are redundant, so cleaning them up. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2019-03-01 | s390/als: remove duplicated in-place implementation of stfle | Vasily Gorbik | 1 | -14/+1
Reuse __stfle call instead of in-place implementation. __stfle is using memcpy and memset functions but they are safe to use, since mem.S is built with -march=z900. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>