aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-08-16netfilter: ip6t_rpfilter: set F_IFACE for linklocal addressesFlorian Westphal1-1/+11
Roman reports that DHCPv6 client no longer sees replies from server due to ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP rule. We need to set the F_IFACE flag for linklocal addresses, they are scoped per-device. Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups") Reported-by: Roman Mamedov <[email protected]> Tested-by: Roman Mamedov <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-08-16ipvs: don't show negative times in ip_vs_connMatteo Croce1-8/+14
Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"), timers duration can last even 12.5% more than the scheduled interval. IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync, which shows the remaining time before that a connection expires. The default expire time for a connection is 60 seconds, and the expiration timer can fire even 4 seconds later than the scheduled time. The expiration time is calculated subtracting jiffies to the scheduled expiration time, and it's shown as a huge number when the timer fires late, since both values are unsigned. This can confuse script and tools which relies on it, like ipvsadm: root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done TCP 00:05 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:04 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:03 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:02 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:01 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 00:00 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:44 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:43 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:42 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:41 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:40 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 TCP 68719476:39 SYN_RECV [fc00:1::1]:55732 [fc00:1::2]:8000 [fc00:2000::1]:8000 Signed-off-by: Matteo Croce <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-08-16ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()Tan Hu1-4/+11
We came across infinite loop in ipvs when using ipvs in docker env. When ipvs receives new packets and cannot find an ipvs connection, it will create a new connection, then if the dest is unavailable (i.e. IP_VS_DEST_F_AVAILABLE), the packet will be dropped sliently. But if the dropped packet is the first packet of this connection, the connection control timer never has a chance to start and the ipvs connection cannot be released. This will lead to memory leak, or infinite loop in cleanup_net() when net namespace is released like this: ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs] __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs] ops_exit_list at ffffffff81567a49 cleanup_net at ffffffff81568b40 process_one_work at ffffffff810a851b worker_thread at ffffffff810a9356 kthread at ffffffff810b0b6f ret_from_fork at ffffffff81697a18 race condition: CPU1 CPU2 ip_vs_in() ip_vs_conn_new() ip_vs_del_dest() __ip_vs_unlink_dest() ~IP_VS_DEST_F_AVAILABLE cp->dest && !IP_VS_DEST_F_AVAILABLE __ip_vs_conn_put ... cleanup_net ---> infinite looping Fix this by checking whether the timer already started. Signed-off-by: Tan Hu <[email protected]> Reviewed-by: Jiang Biao <[email protected]> Acked-by: Julian Anastasov <[email protected]> Acked-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2018-08-15Merge branch 'linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix dcache flushing crash in skcipher. - Add hash finup self-tests. - Reschedule during speed tests. Algorithms: - Remove insecure vmac and replace it with vmac64. - Add public key verification for DH/ECDH. Drivers: - Decrease priority of sha-mb on x86. - Improve NEON latency/throughput on ARM64. - Add md5/sha384/sha512/des/3des to inside-secure. - Support eip197d in inside-secure. - Only register algorithms supported by the host in virtio. - Add cts and remove incompatible cts1 from ccree. - Add hisilicon SEC security accelerator driver. - Replace msm hwrng driver with qcom pseudo rng driver. Misc: - Centralize CRC polynomials" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (121 commits) crypto: arm64/ghash-ce - implement 4-way aggregation crypto: arm64/ghash-ce - replace NEON yield check with block limit crypto: hisilicon - sec_send_request() can be static lib/mpi: remove redundant variable esign crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable crypto: arm64/aes-ce-gcm - implement 2-way aggregation crypto: arm64/aes-ce-gcm - operate on two input blocks at a time crypto: dh - make crypto_dh_encode_key() make robust crypto: dh - fix calculating encoded key size crypto: ccp - Check for NULL PSP pointer at module unload crypto: arm/chacha20 - always use vrev for 16-bit rotates crypto: ccree - allow bigger than sector XTS op crypto: ccree - zero all of request ctx before use crypto: ccree - remove cipher ivgen left overs crypto: ccree - drop useless type flag during reg crypto: ablkcipher - fix crash flushing dcache in error path crypto: blkcipher - fix crash flushing dcache in error path crypto: skcipher - fix crash flushing dcache in error path crypto: skcipher - remove unnecessary setting of walk->nbytes crypto: scatterwalk - remove scatterwalk_samebuf() ...
2018-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds425-7620/+24541
Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...
2018-08-15Merge tag 'kbuild-v4.19' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - verify depmod is installed before modules_install - support build salt in case build ids must be unique between builds - allow users to specify additional host compiler flags via HOST*FLAGS, and rename internal variables to KBUILD_HOST*FLAGS - update buildtar script to drop vax support, add arm64 support - update builddeb script for better debarch support - document the pit-fall of if_changed usage - fix parallel build of UML with O= option - make 'samples' target depend on headers_install to fix build errors - remove deprecated host-progs variable - add a new coccinelle script for refcount_t vs atomic_t check - improve double-test coccinelle script - misc cleanups and fixes * tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits) coccicheck: return proper error code on fail Coccinelle: doubletest: reduce side effect false positives kbuild: remove deprecated host-progs variable kbuild: make samples really depend on headers_install um: clean up archheaders recipe kbuild: add %asm-generic to no-dot-config-targets um: fix parallel building with O= option scripts: Add Python 3 support to tracing/draw_functrace.py builddeb: Add automatic support for sh{3,4}{,eb} architectures builddeb: Add automatic support for riscv* architectures builddeb: Add automatic support for m68k architecture builddeb: Add automatic support for or1k architecture builddeb: Add automatic support for sparc64 architecture builddeb: Add automatic support for mips{,64}r6{,el} architectures builddeb: Add automatic support for mips64el architecture builddeb: Add automatic support for ppc64 and powerpcspe architectures builddeb: Introduce functions to simplify kconfig tests in set_debarch builddeb: Drop check for 32-bit s390 builddeb: Change architecture detection fallback to use dpkg-architecture builddeb: Skip architecture detection when KBUILD_DEBARCH is set ...
2018-08-15Merge tag 'audit-pr-20180814' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit patches from Paul Moore: "Twelve audit patches for v4.19 and they run the full gamut from fixes to features. Notable changes include the ability to use the "exe" audit filter field in a wider variety of filter types, a fix for our comparison of GID/EGID in audit filter rules, better association of related audit records (connecting related audit records together into one audit event), and a fix for a potential use-after-free in audit_add_watch(). All the patches pass the audit-testsuite and merge cleanly on your current master branch" * tag 'audit-pr-20180814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix use-after-free in audit_add_watch audit: use ktime_get_coarse_real_ts64() for timestamps audit: use ktime_get_coarse_ts64() for time access audit: simplify audit_enabled check in audit_watch_log_rule_change() audit: check audit_enabled in audit_tree_log_remove_rule() cred: conditionally declare groups-related functions audit: eliminate audit_enabled magic number comparison audit: rename FILTER_TYPE to FILTER_EXCLUDE audit: Fix extended comparison of GID/EGID audit: tie ANOM_ABEND records to syscall audit: tie SECCOMP records to syscall audit: allow other filter list types for AUDIT_EXE
2018-08-14Merge tag 'leds-for-4.19-rc1' of ↵Linus Torvalds3-8/+22
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "LED triggers improvements make the biggest part of this pull request. The most striking ones, that allowed for nice cleanups in the triggers are: - centralized handling of creation and removal of trigger sysfs attributes via attribute group - addition of module_led_trigger() helper The other things that need to be mentioned: New features and improvements to existing LED class drivers: - lt3593: add DT support, switch to gpiod interface - lm3692x: support LED sync configuration, change OF calls to fwnode calls - apu: modify PC Engines apu/apu2 driver to support apu3 Change in the drivers/net/can/led.c: - mark led trigger as broken since it's in the way for the further cleanups. It implements a subset of the netdev trigger and an Ack is needed from someone who can actually test and confirm that the netdev trigger works for can devices" * tag 'leds-for-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (32 commits) leds: ns2: Change unsigned to unsigned int usb: simplify usbport trigger leds: gpio trigger: simplifications from core changes leds: backlight trigger: simplifications from core changes leds: activity trigger: simplifications from core changes leds: default-on trigger: make use of module_led_trigger() leds: heartbeat trigger: simplifications from core changes leds: oneshot trigger: simplifications from core changes leds: transient trigger: simplifications from core changes leds: timer trigger: simplifications from core changes leds: netdev trigger: simplifications from core changes leds: triggers: new function led_set_trigger_data() leds: triggers: define module_led_trigger helper leds: triggers: handle .trigger_data and .activated() in the core leds: triggers: add device attribute support leds: triggers: let struct led_trigger::activate() return an error code leds: triggers: make the MODULE_LICENSE string match the actual license leds: lm3692x: Support LED sync configuration dt: bindings: lm3692x: Update binding for LED sync control leds: lm3692x: Change DT calls to fwnode calls ...
2018-08-14net: filter: mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1472592 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-14rds: fix building with IPV6=mArnd Bergmann1-0/+1
When CONFIG_RDS_TCP is built-in and CONFIG_IPV6 is a loadable module, we get a link error agains the modular ipv6_chk_addr() function: net/rds/tcp.o: In function `rds_tcp_laddr_check': tcp.c:(.text+0x3b2): undefined reference to `ipv6_chk_addr' This adds back a dependency that forces RDS_TCP to also be a loadable module when IPV6 is one. Fixes: e65d4d96334e ("rds: Remove IPv6 dependency") Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-14net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()Jeremy Cline2-2/+3
req->sdiag_family is a user-controlled value that's used as an array index. Sanitize it after the bounds check to avoid speculative out-of-bounds array access. This also protects the sock_is_registered() call, so this removes the sanitize call there. Fixes: e978de7a6d38 ("net: socket: Fix potential spectre v1 gadget in sock_is_registered") Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Jeremy Cline <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-14mac80211: Run TXQ teardown code before de-registering interfacesToke Høiland-Jørgensen1-1/+1
The TXQ teardown code can reference the vif data structures that are stored in the netdev private memory area if there are still packets on the queue when it is being freed. Since the TXQ teardown code is run after the netdevs are freed, this can lead to a use-after-free. Fix this by moving the TXQ teardown code to earlier in ieee80211_unregister_hw(). Reported-by: Ben Greear <[email protected]> Tested-by: Ben Greear <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-08-14rfkill-gpio: include linux/mod_devicetable.hArnd Bergmann1-0/+1
One more driver is apparently broken by the recent change to linux/platform_device.h: net/rfkill/rfkill-gpio.c: In function 'rfkill_gpio_acpi_probe': net/rfkill/rfkill-gpio.c:82:29: error: dereferencing pointer to incomplete type 'const struct acpi_device_id' Include linux/mod_devicetable.h to get the definition of the acpi_device_id structure. Fixes: ac3167257b9f ("headers: separate linux/mod_devicetable.h from linux/platform_device.h") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-08-13l2tp: fix unused function warningArnd Bergmann3-12/+4
Removing one of the callers of pppol2tp_session_get_sock caused a harmless warning in some configurations: net/l2tp/l2tp_ppp.c:142:21: 'pppol2tp_session_get_sock' defined but not used [-Wunused-function] Rather than adding another #ifdef here, using a proper IS_ENABLED() check makes the code more readable and avoids those warnings while letting the compiler figure out for itself which code is needed. This adds one pointer for the unused show() callback in struct l2tp_session, but that seems harmless. Fixes: b0e29063dcb3 ("l2tp: remove pppol2tp_session_ioctl()") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13Merge branch 'work.open3' of ↵Linus Torvalds1-24/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs open-related updates from Al Viro: - "do we need fput() or put_filp()" rules are gone - it's always fput() now. We keep track of that state where it belongs - in ->f_mode. - int *opened mess killed - in finish_open(), in ->atomic_open() instances and in fs/namei.c code around do_last()/lookup_open()/atomic_open(). - alloc_file() wrappers with saner calling conventions are introduced (alloc_file_clone() and alloc_file_pseudo()); callers converted, with much simplification. - while we are at it, saner calling conventions for path_init() and link_path_walk(), simplifying things inside fs/namei.c (both on open-related paths and elsewhere). * 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) few more cleanups of link_path_walk() callers allow link_path_walk() to take ERR_PTR() make path_init() unconditionally paired with terminate_walk() document alloc_file() changes make alloc_file() static do_shmat(): grab shp->shm_file earlier, switch to alloc_file_clone() new helper: alloc_file_clone() create_pipe_files(): switch the first allocation to alloc_file_pseudo() anon_inode_getfile(): switch to alloc_file_pseudo() hugetlb_file_setup(): switch to alloc_file_pseudo() ocxlflash_getfile(): switch to alloc_file_pseudo() cxl_getfile(): switch to alloc_file_pseudo() ... and switch shmem_file_setup() to alloc_file_pseudo() __shmem_file_setup(): reorder allocations new wrapper: alloc_file_pseudo() kill FILE_{CREATED,OPENED} switch atomic_open() and lookup_open() to returning 0 in all success cases document ->atomic_open() changes ->atomic_open(): return 0 in all success cases get rid of 'opened' in path_openat() and the helpers downstream ...
2018-08-13net_sched: Fix missing res info when create new tc_index filterHangbin Liu1-0/+1
Li Shuang reported the following warn: [ 733.484610] WARNING: CPU: 6 PID: 21123 at net/sched/sch_cbq.c:1418 cbq_destroy_class+0x5d/0x70 [sch_cbq] [ 733.495190] Modules linked in: sch_cbq cls_tcindex sch_dsmark rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat l [ 733.574155] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ixgbe ahci libahci i2c_algo_bit libata i40e i2c_core dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [ 733.592500] CPU: 6 PID: 21123 Comm: tc Not tainted 4.18.0-rc8.latest+ #131 [ 733.600169] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016 [ 733.608518] RIP: 0010:cbq_destroy_class+0x5d/0x70 [sch_cbq] [ 733.614734] Code: e7 d9 d2 48 8b 7b 48 e8 61 05 da d2 48 8d bb f8 00 00 00 e8 75 ae d5 d2 48 39 eb 74 0a 48 89 df 5b 5d e9 16 6c 94 d2 5b 5d c3 <0f> 0b eb b6 0f 1f 44 00 00 66 2e 0f 1f 84 [ 733.635798] RSP: 0018:ffffbfbb066bb9d8 EFLAGS: 00010202 [ 733.641627] RAX: 0000000000000001 RBX: ffff9cdd17392800 RCX: 000000008010000f [ 733.649588] RDX: ffff9cdd1df547e0 RSI: ffff9cdd17392800 RDI: ffff9cdd0f84c800 [ 733.657547] RBP: ffff9cdd0f84c800 R08: 0000000000000001 R09: 0000000000000000 [ 733.665508] R10: ffff9cdd0f84d000 R11: 0000000000000001 R12: 0000000000000001 [ 733.673469] R13: 0000000000000000 R14: 0000000000000001 R15: ffff9cdd17392200 [ 733.681430] FS: 00007f911890a740(0000) GS:ffff9cdd1f8c0000(0000) knlGS:0000000000000000 [ 733.690456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 733.696864] CR2: 0000000000b5544c CR3: 0000000859374002 CR4: 00000000001606e0 [ 733.704826] Call Trace: [ 733.707554] cbq_destroy+0xa1/0xd0 [sch_cbq] [ 733.712318] qdisc_destroy+0x62/0x130 [ 733.716401] dsmark_destroy+0x2a/0x70 [sch_dsmark] [ 733.721745] qdisc_destroy+0x62/0x130 [ 733.725829] qdisc_graft+0x3ba/0x470 [ 733.729817] tc_get_qdisc+0x2a6/0x2c0 [ 733.733901] ? cred_has_capability+0x7d/0x130 [ 733.738761] rtnetlink_rcv_msg+0x263/0x2d0 [ 733.743330] ? rtnl_calcit.isra.30+0x110/0x110 [ 733.748287] netlink_rcv_skb+0x4d/0x130 [ 733.752576] netlink_unicast+0x1a3/0x250 [ 733.756949] netlink_sendmsg+0x2ae/0x3a0 [ 733.761324] sock_sendmsg+0x36/0x40 [ 733.765213] ___sys_sendmsg+0x26f/0x2d0 [ 733.769493] ? handle_pte_fault+0x586/0xdf0 [ 733.774158] ? __handle_mm_fault+0x389/0x500 [ 733.778919] ? __sys_sendmsg+0x5e/0xa0 [ 733.783099] __sys_sendmsg+0x5e/0xa0 [ 733.787087] do_syscall_64+0x5b/0x180 [ 733.791171] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 733.796805] RIP: 0033:0x7f9117f23f10 [ 733.800791] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 [ 733.821873] RSP: 002b:00007ffe96818398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 733.830319] RAX: ffffffffffffffda RBX: 000000005b71244c RCX: 00007f9117f23f10 [ 733.838280] RDX: 0000000000000000 RSI: 00007ffe968183e0 RDI: 0000000000000003 [ 733.846241] RBP: 00007ffe968183e0 R08: 000000000000ffff R09: 0000000000000003 [ 733.854202] R10: 00007ffe96817e20 R11: 0000000000000246 R12: 0000000000000000 [ 733.862161] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000 [ 733.870121] ---[ end trace 28edd4aad712ddca ]--- This is because we didn't update f->result.res when create new filter. Then in tcindex_delete() -> tcf_unbind_filter(), we will failed to find out the res and unbind filter, which will trigger the WARN_ON() in cbq_destroy_class(). Fix it by updating f->result.res when create new filter. Fixes: 6e0565697a106 ("net_sched: fix another crash in cls_tcindex") Reported-by: Li Shuang <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net_sched: fix NULL pointer dereference when delete tcindex filterHangbin Liu1-5/+2
Li Shuang reported the following crash: [ 71.267724] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 71.276456] PGD 800000085d9bd067 P4D 800000085d9bd067 PUD 859a0b067 PMD 0 [ 71.284127] Oops: 0000 [#1] SMP PTI [ 71.288015] CPU: 12 PID: 2386 Comm: tc Not tainted 4.18.0-rc8.latest+ #131 [ 71.295686] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016 [ 71.304037] RIP: 0010:tcindex_delete+0x72/0x280 [cls_tcindex] [ 71.310446] Code: 00 31 f6 48 87 75 20 48 85 f6 74 11 48 8b 47 18 48 8b 40 08 48 8b 40 50 e8 fb a6 f8 fc 48 85 db 0f 84 dc 00 00 00 48 8b 73 18 <8b> 56 04 48 8d 7e 04 85 d2 0f 84 7b 01 00 [ 71.331517] RSP: 0018:ffffb45207b3f898 EFLAGS: 00010282 [ 71.337345] RAX: ffff8ad3d72d6360 RBX: ffff8acc84393680 RCX: 000000000000002e [ 71.345306] RDX: ffff8ad3d72c8570 RSI: 0000000000000000 RDI: ffff8ad847a45800 [ 71.353277] RBP: ffff8acc84393688 R08: ffff8ad3d72c8400 R09: 0000000000000000 [ 71.361238] R10: ffff8ad3de786e00 R11: 0000000000000000 R12: ffffb45207b3f8c7 [ 71.369199] R13: ffff8ad3d93bd2a0 R14: 000000000000002e R15: ffff8ad3d72c9600 [ 71.377161] FS: 00007f9d3ec3e740(0000) GS:ffff8ad3df980000(0000) knlGS:0000000000000000 [ 71.386188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.392597] CR2: 0000000000000004 CR3: 0000000852f06003 CR4: 00000000001606e0 [ 71.400558] Call Trace: [ 71.403299] tcindex_destroy_element+0x25/0x40 [cls_tcindex] [ 71.409611] tcindex_walk+0xbb/0x110 [cls_tcindex] [ 71.414953] tcindex_destroy+0x44/0x90 [cls_tcindex] [ 71.420492] ? tcindex_delete+0x280/0x280 [cls_tcindex] [ 71.426323] tcf_proto_destroy+0x16/0x40 [ 71.430696] tcf_chain_flush+0x51/0x70 [ 71.434876] tcf_block_put_ext.part.30+0x8f/0x1b0 [ 71.440122] tcf_block_put+0x4d/0x70 [ 71.444108] cbq_destroy+0x4d/0xd0 [sch_cbq] [ 71.448869] qdisc_destroy+0x62/0x130 [ 71.452951] dsmark_destroy+0x2a/0x70 [sch_dsmark] [ 71.458300] qdisc_destroy+0x62/0x130 [ 71.462373] qdisc_graft+0x3ba/0x470 [ 71.466359] tc_get_qdisc+0x2a6/0x2c0 [ 71.470443] ? cred_has_capability+0x7d/0x130 [ 71.475307] rtnetlink_rcv_msg+0x263/0x2d0 [ 71.479875] ? rtnl_calcit.isra.30+0x110/0x110 [ 71.484832] netlink_rcv_skb+0x4d/0x130 [ 71.489109] netlink_unicast+0x1a3/0x250 [ 71.493482] netlink_sendmsg+0x2ae/0x3a0 [ 71.497859] sock_sendmsg+0x36/0x40 [ 71.501748] ___sys_sendmsg+0x26f/0x2d0 [ 71.506029] ? handle_pte_fault+0x586/0xdf0 [ 71.510694] ? __handle_mm_fault+0x389/0x500 [ 71.515457] ? __sys_sendmsg+0x5e/0xa0 [ 71.519636] __sys_sendmsg+0x5e/0xa0 [ 71.523626] do_syscall_64+0x5b/0x180 [ 71.527711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 71.533345] RIP: 0033:0x7f9d3e257f10 [ 71.537331] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 [ 71.558401] RSP: 002b:00007fff6f893398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 71.566848] RAX: ffffffffffffffda RBX: 000000005b71274d RCX: 00007f9d3e257f10 [ 71.574810] RDX: 0000000000000000 RSI: 00007fff6f8933e0 RDI: 0000000000000003 [ 71.582770] RBP: 00007fff6f8933e0 R08: 000000000000ffff R09: 0000000000000003 [ 71.590729] R10: 00007fff6f892e20 R11: 0000000000000246 R12: 0000000000000000 [ 71.598689] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000 [ 71.606651] Modules linked in: sch_cbq cls_tcindex sch_dsmark xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_coni [ 71.685425] libahci i2c_algo_bit i2c_core i40e libata dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [ 71.697075] CR2: 0000000000000004 [ 71.700792] ---[ end trace f604eb1acacd978b ]--- Reproducer: tc qdisc add dev lo handle 1:0 root dsmark indices 64 set_tc_index tc filter add dev lo parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2 tc qdisc add dev lo parent 1:0 handle 2:0 cbq bandwidth 10Mbit cell 8 avpkt 1000 mpu 64 tc class add dev lo parent 2:0 classid 2:1 cbq bandwidth 10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated allot 1514 weight 1 maxburst 10 tc filter add dev lo parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on tc qdisc add dev lo parent 2:1 pfifo limit 5 tc qdisc del dev lo root This is because in tcindex_set_parms, when there is no old_r, we set new exts to cr.exts. And we didn't set it to filter when r == &new_filter_result. Then in tcindex_delete() -> tcf_exts_get_net(), we will get NULL pointer dereference as we didn't init exts. Fix it by moving tcf_exts_change() after "if (old_r && old_r != r)" check. Then we don't need "cr" as there is no errout after that. Fixes: bf63ac73b3e13 ("net_sched: fix an oops in tcindex filter") Reported-by: Li Shuang <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds5-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double*() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ...
2018-08-13net: sched: act_ife: disable bh when taking ife_mod_lockVlad Buslov1-10/+10
Lockdep reports deadlock for following locking scenario in ife action: Task one: 1) Executes ife action update. 2) Takes tcfa_lock. 3) Waits on ife_mod_lock which is already taken by task two. Task two: 1) Executes any path that obtains ife_mod_lock without disabling bh (any path that takes ife_mod_lock while holding tcfa_lock has bh disabled) like loading a meta module, or creating new action. 2) Takes ife_mod_lock. 3) Task is preempted by rate estimator timer. 4) Timer callback waits on tcfa_lock which is taken by task one. In described case tasks deadlock because they take same two locks in different order. To prevent potential deadlock reported by lockdep, always disable bh when obtaining ife_mod_lock. Lockdep warning: [ 508.101192] ===================================================== [ 508.107708] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 508.114728] 4.18.0-rc8+ #646 Not tainted [ 508.119050] ----------------------------------------------------- [ 508.125559] tc/5460 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: [ 508.132025] 000000005a938c68 (ife_mod_lock){++++}, at: find_ife_oplist+0x1e/0xc0 [act_ife] [ 508.140996] and this task is already holding: [ 508.147548] 00000000d46f6c56 (&(&p->tcfa_lock)->rlock){+.-.}, at: tcf_ife_init+0x6ae/0xf40 [act_ife] [ 508.157371] which would create a new lock dependency: [ 508.162828] (&(&p->tcfa_lock)->rlock){+.-.} -> (ife_mod_lock){++++} [ 508.169572] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 508.178197] (&(&p->tcfa_lock)->rlock){+.-.} [ 508.178201] ... which became SOFTIRQ-irq-safe at: [ 508.189771] _raw_spin_lock+0x2c/0x40 [ 508.193906] est_fetch_counters+0x41/0xb0 [ 508.198391] est_timer+0x83/0x3c0 [ 508.202180] call_timer_fn+0x16a/0x5d0 [ 508.206400] run_timer_softirq+0x399/0x920 [ 508.210967] __do_softirq+0x157/0x97d [ 508.215102] irq_exit+0x152/0x1c0 [ 508.218888] smp_apic_timer_interrupt+0xc0/0x4e0 [ 508.223976] apic_timer_interrupt+0xf/0x20 [ 508.228540] cpuidle_enter_state+0xf8/0x5d0 [ 508.233198] do_idle+0x28a/0x350 [ 508.236881] cpu_startup_entry+0xc7/0xe0 [ 508.241296] start_secondary+0x2e8/0x3f0 [ 508.245678] secondary_startup_64+0xa5/0xb0 [ 508.250347] to a SOFTIRQ-irq-unsafe lock: (ife_mod_lock){++++} [ 508.256531] ... which became SOFTIRQ-irq-unsafe at: [ 508.267279] ... [ 508.267283] _raw_write_lock+0x2c/0x40 [ 508.273653] register_ife_op+0x118/0x2c0 [act_ife] [ 508.278926] do_one_initcall+0xf7/0x4d9 [ 508.283214] do_init_module+0x18b/0x44e [ 508.287521] load_module+0x4167/0x5730 [ 508.291739] __do_sys_finit_module+0x16d/0x1a0 [ 508.296654] do_syscall_64+0x7a/0x3f0 [ 508.300788] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 508.306302] other info that might help us debug this: [ 508.315286] Possible interrupt unsafe locking scenario: [ 508.322771] CPU0 CPU1 [ 508.327681] ---- ---- [ 508.332604] lock(ife_mod_lock); [ 508.336300] local_irq_disable(); [ 508.342608] lock(&(&p->tcfa_lock)->rlock); [ 508.349793] lock(ife_mod_lock); [ 508.355990] <Interrupt> [ 508.358974] lock(&(&p->tcfa_lock)->rlock); [ 508.363803] *** DEADLOCK *** [ 508.370715] 2 locks held by tc/5460: [ 508.374680] #0: 00000000e27e4fa4 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x583/0x7b0 [ 508.383366] #1: 00000000d46f6c56 (&(&p->tcfa_lock)->rlock){+.-.}, at: tcf_ife_init+0x6ae/0xf40 [act_ife] [ 508.393648] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 508.403505] -> (&(&p->tcfa_lock)->rlock){+.-.} ops: 1001553 { [ 508.409646] HARDIRQ-ON-W at: [ 508.413136] _raw_spin_lock_bh+0x34/0x40 [ 508.419059] gnet_stats_start_copy_compat+0xa2/0x230 [ 508.426021] gnet_stats_start_copy+0x16/0x20 [ 508.432333] tcf_action_copy_stats+0x95/0x1d0 [ 508.438735] tcf_action_dump_1+0xb0/0x4e0 [ 508.444795] tcf_action_dump+0xca/0x200 [ 508.450673] tcf_exts_dump+0xd9/0x320 [ 508.456392] fl_dump+0x1b7/0x4a0 [cls_flower] [ 508.462798] tcf_fill_node+0x380/0x530 [ 508.468601] tfilter_notify+0xdf/0x1c0 [ 508.474404] tc_new_tfilter+0x84a/0xc90 [ 508.480270] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 508.486419] netlink_rcv_skb+0x184/0x220 [ 508.492394] netlink_unicast+0x31b/0x460 [ 508.507411] netlink_sendmsg+0x3fb/0x840 [ 508.513390] sock_sendmsg+0x7b/0xd0 [ 508.518907] ___sys_sendmsg+0x4c6/0x610 [ 508.524797] __sys_sendmsg+0xd7/0x150 [ 508.530510] do_syscall_64+0x7a/0x3f0 [ 508.536201] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 508.543301] IN-SOFTIRQ-W at: [ 508.546834] _raw_spin_lock+0x2c/0x40 [ 508.552522] est_fetch_counters+0x41/0xb0 [ 508.558571] est_timer+0x83/0x3c0 [ 508.563912] call_timer_fn+0x16a/0x5d0 [ 508.569699] run_timer_softirq+0x399/0x920 [ 508.575840] __do_softirq+0x157/0x97d [ 508.581538] irq_exit+0x152/0x1c0 [ 508.586882] smp_apic_timer_interrupt+0xc0/0x4e0 [ 508.593533] apic_timer_interrupt+0xf/0x20 [ 508.599686] cpuidle_enter_state+0xf8/0x5d0 [ 508.605895] do_idle+0x28a/0x350 [ 508.611147] cpu_startup_entry+0xc7/0xe0 [ 508.617097] start_secondary+0x2e8/0x3f0 [ 508.623029] secondary_startup_64+0xa5/0xb0 [ 508.629245] INITIAL USE at: [ 508.632686] _raw_spin_lock_bh+0x34/0x40 [ 508.638557] gnet_stats_start_copy_compat+0xa2/0x230 [ 508.645491] gnet_stats_start_copy+0x16/0x20 [ 508.651719] tcf_action_copy_stats+0x95/0x1d0 [ 508.657992] tcf_action_dump_1+0xb0/0x4e0 [ 508.663937] tcf_action_dump+0xca/0x200 [ 508.669716] tcf_exts_dump+0xd9/0x320 [ 508.675337] fl_dump+0x1b7/0x4a0 [cls_flower] [ 508.681650] tcf_fill_node+0x380/0x530 [ 508.687366] tfilter_notify+0xdf/0x1c0 [ 508.693031] tc_new_tfilter+0x84a/0xc90 [ 508.698820] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 508.704869] netlink_rcv_skb+0x184/0x220 [ 508.710758] netlink_unicast+0x31b/0x460 [ 508.716627] netlink_sendmsg+0x3fb/0x840 [ 508.722510] sock_sendmsg+0x7b/0xd0 [ 508.727931] ___sys_sendmsg+0x4c6/0x610 [ 508.733729] __sys_sendmsg+0xd7/0x150 [ 508.739346] do_syscall_64 +0x7a/0x3f0 [ 508.744943] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 508.751930] } [ 508.753964] ... key at: [<ffffffff916b3e20>] __key.61145+0x0/0x40 [ 508.760946] ... acquired at: [ 508.764294] _raw_read_lock+0x2f/0x40 [ 508.768513] find_ife_oplist+0x1e/0xc0 [act_ife] [ 508.773692] tcf_ife_init+0x82f/0xf40 [act_ife] [ 508.778785] tcf_action_init_1+0x510/0x750 [ 508.783468] tcf_action_init+0x1e8/0x340 [ 508.787938] tcf_action_add+0xc5/0x240 [ 508.792241] tc_ctl_action+0x203/0x2a0 [ 508.796550] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 508.801200] netlink_rcv_skb+0x184/0x220 [ 508.805674] netlink_unicast+0x31b/0x460 [ 508.810129] netlink_sendmsg+0x3fb/0x840 [ 508.814611] sock_sendmsg+0x7b/0xd0 [ 508.818665] ___sys_sendmsg+0x4c6/0x610 [ 508.823029] __sys_sendmsg+0xd7/0x150 [ 508.827246] do_syscall_64+0x7a/0x3f0 [ 508.831483] entry_SYSCALL_64_after_hwframe+0x49/0xbe the dependencies between the lock to be acquired [ 508.838945] and SOFTIRQ-irq-unsafe lock: [ 508.851177] -> (ife_mod_lock){++++} ops: 95 { [ 508.855920] HARDIRQ-ON-W at: [ 508.859478] _raw_write_lock+0x2c/0x40 [ 508.865264] register_ife_op+0x118/0x2c0 [act_ife] [ 508.872071] do_one_initcall+0xf7/0x4d9 [ 508.877947] do_init_module+0x18b/0x44e [ 508.883819] load_module+0x4167/0x5730 [ 508.889595] __do_sys_finit_module+0x16d/0x1a0 [ 508.896043] do_syscall_64+0x7a/0x3f0 [ 508.901734] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 508.908827] HARDIRQ-ON-R at: [ 508.912359] _raw_read_lock+0x2f/0x40 [ 508.918043] find_ife_oplist+0x1e/0xc0 [act_ife] [ 508.924692] tcf_ife_init+0x82f/0xf40 [act_ife] [ 508.931252] tcf_action_init_1+0x510/0x750 [ 508.937393] tcf_action_init+0x1e8/0x340 [ 508.943366] tcf_action_add+0xc5/0x240 [ 508.949130] tc_ctl_action+0x203/0x2a0 [ 508.954922] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 508.961024] netlink_rcv_skb+0x184/0x220 [ 508.966970] netlink_unicast+0x31b/0x460 [ 508.972915] netlink_sendmsg+0x3fb/0x840 [ 508.978859] sock_sendmsg+0x7b/0xd0 [ 508.984400] ___sys_sendmsg+0x4c6/0x610 [ 508.990264] __sys_sendmsg+0xd7/0x150 [ 508.995952] do_syscall_64+0x7a/0x3f0 [ 509.001643] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 509.008722] SOFTIRQ-ON-W at:\ [ 509.012242] _raw_write_lock+0x2c/0x40 [ 509.018013] register_ife_op+0x118/0x2c0 [act_ife] [ 509.024841] do_one_initcall+0xf7/0x4d9 [ 509.030720] do_init_module+0x18b/0x44e [ 509.036604] load_module+0x4167/0x5730 [ 509.042397] __do_sys_finit_module+0x16d/0x1a0 [ 509.048865] do_syscall_64+0x7a/0x3f0 [ 509.054551] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 509.061636] SOFTIRQ-ON-R at: [ 509.065145] _raw_read_lock+0x2f/0x40 [ 509.070854] find_ife_oplist+0x1e/0xc0 [act_ife] [ 509.077515] tcf_ife_init+0x82f/0xf40 [act_ife] [ 509.084051] tcf_action_init_1+0x510/0x750 [ 509.090172] tcf_action_init+0x1e8/0x340 [ 509.096124] tcf_action_add+0xc5/0x240 [ 509.101891] tc_ctl_action+0x203/0x2a0 [ 509.107671] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 509.113811] netlink_rcv_skb+0x184/0x220 [ 509.119768] netlink_unicast+0x31b/0x460 [ 509.125716] netlink_sendmsg+0x3fb/0x840 [ 509.131668] sock_sendmsg+0x7b/0xd0 [ 509.137167] ___sys_sendmsg+0x4c6/0x610 [ 509.143010] __sys_sendmsg+0xd7/0x150 [ 509.148718] do_syscall_64+0x7a/0x3f0 [ 509.154443] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 509.161533] INITIAL USE at: [ 509.164956] _raw_read_lock+0x2f/0x40 [ 509.170574] find_ife_oplist+0x1e/0xc0 [act_ife] [ 509.177134] tcf_ife_init+0x82f/0xf40 [act_ife] [ 509.183619] tcf_action_init_1+0x510/0x750 [ 509.189674] tcf_action_init+0x1e8/0x340 [ 509.195534] tcf_action_add+0xc5/0x240 [ 509.201229] tc_ctl_action+0x203/0x2a0 [ 509.206920] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 509.212936] netlink_rcv_skb+0x184/0x220 [ 509.218818] netlink_unicast+0x31b/0x460 [ 509.224699] netlink_sendmsg+0x3fb/0x840 [ 509.230581] sock_sendmsg+0x7b/0xd0 [ 509.235984] ___sys_sendmsg+0x4c6/0x610 [ 509.241791] __sys_sendmsg+0xd7/0x150 [ 509.247425] do_syscall_64+0x7a/0x3f0 [ 509.253007] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 509.259975] } [ 509.261998] ... key at: [<ffffffffc1554258>] ife_mod_lock+0x18/0xffffffffffff8dc0 [act_ife] [ 509.271569] ... acquired at: [ 509.274912] _raw_read_lock+0x2f/0x40 [ 509.279134] find_ife_oplist+0x1e/0xc0 [act_ife] [ 509.284324] tcf_ife_init+0x82f/0xf40 [act_ife] [ 509.289425] tcf_action_init_1+0x510/0x750 [ 509.294068] tcf_action_init+0x1e8/0x340 [ 509.298553] tcf_action_add+0xc5/0x240 [ 509.302854] tc_ctl_action+0x203/0x2a0 [ 509.307153] rtnetlink_rcv_msg+0x5bd/0x7b0 [ 509.311805] netlink_rcv_skb+0x184/0x220 [ 509.316282] netlink_unicast+0x31b/0x460 [ 509.320769] netlink_sendmsg+0x3fb/0x840 [ 509.325248] sock_sendmsg+0x7b/0xd0 [ 509.329290] ___sys_sendmsg+0x4c6/0x610 [ 509.333687] __sys_sendmsg+0xd7/0x150 [ 509.337902] do_syscall_64+0x7a/0x3f0 [ 509.342116] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 509.349601] stack backtrace: [ 509.354663] CPU: 6 PID: 5460 Comm: tc Not tainted 4.18.0-rc8+ #646 [ 509.361216] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Vlad Buslov <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller9-88/+482
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-13 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add driver XDP support for veth. This can be used in conjunction with redirect of another XDP program e.g. sitting on NIC so the xdp_frame can be forwarded to the peer veth directly without modification, from Toshiaki. 2) Add a new BPF map type REUSEPORT_SOCKARRAY and prog type SK_REUSEPORT in order to provide more control and visibility on where a SO_REUSEPORT sk should be located, and the latter enables to directly select a sk from the bpf map. This also enables map-in-map for application migration use cases, from Martin. 3) Add a new BPF helper bpf_skb_ancestor_cgroup_id() that returns the id of cgroup v2 that is the ancestor of the cgroup associated with the skb at the ancestor_level, from Andrey. 4) Implement BPF fs map pretty-print support based on BTF data for regular hash table and LRU map, from Yonghong. 5) Decouple the ability to attach BTF for a map from the key and value pretty-printer in BPF fs, and enable further support of BTF for maps for percpu and LPM trie, from Daniel. 6) Implement a better BPF sample of using XDP's CPU redirect feature for load balancing SKB processing to remote CPU. The sample implements the same XDP load balancing as Suricata does which is symmetric hash based on IP and L4 protocol, from Jesper. 7) Revert adding NULL pointer check with WARN_ON_ONCE() in __xdp_return()'s critical path as it is ensured that the allocator is present, from Björn. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-08-13packet: switch kvzalloc to allocate memoryLi RongQing2-32/+13
The patches includes following change: *Use modern kvzalloc()/kvfree() instead of custom allocations. *Remove order argument for alloc_pg_vec, it can get from req. *Remove order argument for free_pg_vec, free_pg_vec now uses kvfree which does not need order argument. *Remove pg_vec_order from struct packet_ring_buffer, no longer need to save/restore 'order' *Remove variable 'order' for packet_set_ring, it is now unused Signed-off-by: Zhang Yu <[email protected]> Signed-off-by: Li RongQing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_mirred method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_vlan method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_skbmod method rename for grep-ability and consistencyJamal Hadi Salim1-2/+2
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_skbedit method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_simple method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_police method rename for grep-ability and consistencyJamal Hadi Salim1-8/+8
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_pedit method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_nat method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_ipt method rename for grep-ability and consistencyJamal Hadi Salim1-4/+4
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_gact method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_sum method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_bpf method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net: sched: act_connmark method rename for grep-ability and consistencyJamal Hadi Salim1-3/+3
Signed-off-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13crush: fix using plain integer as NULL warningYueHaibing1-2/+2
Fixes the following sparse warnings: net/ceph/crush/mapper.c:517:76: warning: Using plain integer as NULL pointer net/ceph/crush/mapper.c:728:68: warning: Using plain integer as NULL pointer Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-08-13libceph: remove unnecessary non NULL check for request_keyYueHaibing1-1/+1
request_key never return NULL,so no need do non-NULL check. Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2018-08-13l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cacheWei Wang1-1/+1
In l2tp code, if it is a L2TP_UDP_ENCAP tunnel, tunnel->sk points to a UDP socket. User could call sendmsg() on both this tunnel and the UDP socket itself concurrently. As l2tp_xmit_skb() holds socket lock and call __sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is lockless and call sk_dst_check() to refresh sk->sk_dst_cache, there could be a race and cause the dst cache to be freed multiple times. So we fix l2tp side code to always call sk_dst_check() to garantee xchg() is called when refreshing sk->sk_dst_cache to avoid race conditions. Syzkaller reported stack trace: BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline] BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline] BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline] BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829 CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] atomic_fetch_add_unless include/linux/atomic.h:575 [inline] atomic_add_unless include/linux/atomic.h:597 [inline] dst_hold_safe include/net/dst.h:308 [inline] ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 rt6_get_pcpu_route net/ipv6/route.c:1249 [inline] ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098 fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126 ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117 udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x51d/0x930 net/socket.c:2115 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210 __do_sys_sendmmsg net/socket.c:2239 [inline] __se_sys_sendmmsg net/socket.c:2236 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446a29 Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4de5532db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000006dcc38 RCX: 0000000000446a29 RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003 RBP: 00000000006dcc30 R08: 00007f4de5533700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dcc3c R13: 00007ffe2b830fdf R14: 00007f4de55339c0 R15: 0000000000000001 Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid") Reported-by: [email protected] Signed-off-by: Wei Wang <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Cc: Guillaume Nault <[email protected]> Cc: David Ahern <[email protected]> Cc: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13ipv6: Add icmp_echo_ignore_all support for ICMPv6Virgile Jarry2-3/+14
Preventing the kernel from responding to ICMP Echo Requests messages can be useful in several ways. The sysctl parameter 'icmp_echo_ignore_all' can be used to prevent the kernel from responding to IPv4 ICMP echo requests. For IPv6 pings, such a sysctl kernel parameter did not exist. Add the ability to prevent the kernel from responding to IPv6 ICMP echo requests through the use of the following sysctl parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all. Update the documentation to reflect this change. Signed-off-by: Virgile Jarry <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net/tls: Combined memory allocation for decryption requestVakul Garg1-96/+142
For preparing decryption request, several memory chunks are required (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to an accelerator, it is required that the buffers which are read by the accelerator must be dma-able and not come from stack. The buffers for aad and iv can be separately kmalloced each, but it is inefficient. This patch does a combined allocation for preparing decryption request and then segments into aead_req || sgin || sgout || iv || aad. Signed-off-by: Vakul Garg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-08-13net/9p/trans_virtio.c: add null terminal for mount tagpiaojun1-10/+7
chan->tag is Non-null terminated which will result in printing messy code when debugging code. So we should add '\0' for tag to make the code more convenient and robust. In addition, I drop char->tag_len to simplify the code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Piao <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p/virtio: fix off-by-one error in sg list bounds checkjiangyiwen1-1/+2
Because the value of limit is VIRTQUEUE_NUM, if index is equal to limit, it will cause sg array out of bounds, so correct the judgement of BUG_ON. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yiwen Jiang <[email protected]> Reported-By: Dan Carpenter <[email protected]> Acked-by: Jun Piao <[email protected]> Cc: [email protected] Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p: fix whitespace issuesStephen Hemminger3-4/+3
Remove trailing whitespace and blank lines at EOF Link: http://lkml.kernel.org/m/[email protected] Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p: fix multiple NULL-pointer-dereferencesTomas Bortoli4-1/+13
Added checks to prevent GPFs from raising. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tomas Bortoli <[email protected]> Reported-by: [email protected] Cc: [email protected] Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p: validate PDU lengthTomas Bortoli4-11/+24
This commit adds length check for the PDU size. The size contained in the header has to match the actual size, except for TCP (trans_fd.c) where actual length is not known ahead and the header's length will be checked only against the validity range. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tomas Bortoli <[email protected]> Reported-by: [email protected] To: Eric Van Hensbergen <[email protected]> To: Ron Minnich <[email protected]> To: Latchesar Ionkov <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-13net/9p/trans_fd.c: fix race by holding the lockTomas Bortoli1-5/+5
It may be possible to run p9_fd_cancel() with a deleted req->req_list and incur in a double del. To fix hold the client->lock while changing the status, so the other threads will be synchronized. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tomas Bortoli <[email protected]> Reported-by: [email protected] To: Eric Van Hensbergen <[email protected]> To: Ron Minnich <[email protected]> To: Latchesar Ionkov <[email protected]> Cc: Yiwen Jiang <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-13net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree()Tomas Bortoli1-0/+2
The patch adds the flush in p9_mux_poll_stop() as it the function used by p9_conn_destroy(), in turn called by p9_fd_close() to stop the async polling associated with the data regarding the connection. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tomas Bortoli <[email protected]> Reported-by: [email protected] To: Eric Van Hensbergen <[email protected]> To: Ron Minnich <[email protected]> To: Latchesar Ionkov <[email protected]> Cc: Yiwen Jiang <[email protected]> Cc: [email protected] Signed-off-by: Dominique Martinet <[email protected]>
2018-08-13net/9p/virtio: Fix hard lockup in req_donejiangyiwen1-10/+11
When client has multiple threads that issue io requests all the time, and the server has a very good performance, it may cause cpu is running in the irq context for a long time because it can check virtqueue has buf in the *while* loop. So we should keep chan->lock in the whole loop. [ Dominique: reworded subject line ] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yiwen Jiang <[email protected]> To: Andrew Morton <[email protected]> To: Eric Van Hensbergen <[email protected]> To: Ron Minnich <[email protected]> To: Latchesar Ionkov <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-13net/9p/trans_virtio.c: fix some spell mistakes in commentspiaojun1-2/+2
Fix spelling mistake in comments of p9_virtio_zc_request(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jun Piao <[email protected]> Cc: Eric Van Hensbergen <[email protected]> Cc: Ron Minnich <[email protected]> Cc: Latchesar Ionkov <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p/net: Fix zero-copy path in the 9p virtio transportChirantan Ekbote1-0/+7
The zero-copy optimization when reading or writing large chunks of data is quite useful. However, the 9p messages created through the zero-copy write path have an incorrect message size: it should be the size of the header + size of the data being written but instead it's just the size of the header. This only works if the server ignores the size field of the message and otherwise breaks the framing of the protocol. Fix this by re-writing the message size field with the correct value. Tested by running `dd if=/dev/zero of=out bs=4k count=1` inside a virtio-9p mount. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chirantan Ekbote <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Tested-by: Greg Kurz <[email protected]> Cc: Dylan Reid <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: [email protected] Signed-off-by: Dominique Martinet <[email protected]>
2018-08-139p: Embed wait_queue_head into p9_req_tMatthew Wilcox2-15/+6
On a 64-bit system, the wait_queue_head_t is 24 bytes while the pointer to it is 8 bytes. Growing the p9_req_t by 16 bytes is better than performing a 24-byte memory allocation. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Greg Kurz <[email protected]> Cc: Eric Van Hensbergen <[email protected]> Cc: Ron Minnich <[email protected]> Cc: Latchesar Ionkov <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>