aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-07-15fq_codel: fix return value of fq_codel_drop()WANG Cong1-1/+10
The ->drop() is supposed to return the number of bytes it dropped, however fq_codel_drop() returns the index of the flow where it drops a packet from. Fix this by introducing a helper to wrap fq_codel_drop(). Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15net_sched: fix a use-after-free in sfqWANG Cong1-1/+1
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines") Cc: John Fastabend <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15ipv6: lock socket in ip6_datagram_connect()Eric Dumazet2-9/+27
ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15fq_codel: fix a use-after-freeWANG Cong1-1/+1
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines") Cc: John Fastabend <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15tcp: don't use F-RTO on non-recurring timeoutsYuchung Cheng1-2/+1
Currently F-RTO may repeatedly send new data packets on non-recurring timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682) should only be used on either new recovery or recurring timeouts. This exacerbates the recovery progress during frequent timeout & repair, because we prioritize sending new data packets instead of repairing the holes when the bandwidth is already scarce. Fix it by correcting the test of a new recovery episode. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15bridge: mdb: fix double add notificationNikolay Aleksandrov1-1/+0
Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <[email protected]> Fixes: cfd567543590 ("bridge: add support of adding and deleting mdb entries") Signed-off-by: David S. Miller <[email protected]>
2015-07-15bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leaveSatish Ashok1-7/+30
A report with INCLUDE/Change_to_include and empty source list should be treated as a leave, specified by RFC 3376, section 3.1: "If the requested filter mode is INCLUDE *and* the requested source list is empty, then the entry corresponding to the requested interface and multicast address is deleted if present. If no such entry is present, the request is ignored." Signed-off-by: Satish Ashok <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15Merge tag 'for-linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Mainly fix-ups for the various 4.2 items" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits) IB/core: Destroy ocrdma_dev_id IDR on module exit IB/core: Destroy multcast_idr on module exit IB/mlx4: Optimize do_slave_init IB/mlx4: Fix memory leak in do_slave_init IB/mlx4: Optimize freeing of items on error unwind IB/mlx4: Fix use of flow-counters for process_mad IB/ipath: Convert use of __constant_<foo> to <foo> IB/ipoib: Set MTU to max allowed by mode when mode changes IB/ipoib: Scatter-Gather support in connected mode IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush IB/ucma: Fix lockdep warning in ucma_lock_files rds: rds_ib_device.refcount overflow RDMA/nes: Fix for incorrect recording of the MAC address RDMA/nes: Fix for resolving the neigh RDMA/core: Fixes for port mapper client registration IB/IPoIB: Fix bad error flow in ipoib_add_port() IB/mlx4: Do not attemp to report HCA clock offset on VFs IB/cm: Do not queue work to a device that's going away IB/srp: Avoid using uninitialized variable ...
2015-07-15net: Fix skb csum races when peekingHerbert Xu1-6/+9
When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15NET: AX.25: Stop heartbeat timer on disconnect.Richard Stearn1-0/+1
This may result in a kernel panic. The bug has always existed but somehow we've run out of luck now and it bites. Signed-off-by: Richard Stearn <[email protected]> Cc: [email protected] # all branches Signed-off-by: Ralf Baechle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15net: Clone skb before setting peeked flagHerbert Xu1-3/+38
Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-15rtnetlink: reject non-IFLA_VF_PORT attributes inside IFLA_VF_PORTSDaniel Borkmann1-4/+7
Similarly as in commit 4f7d2cdfdde7 ("rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver"), we have a double nesting of netlink attributes, i.e. IFLA_VF_PORTS only contains IFLA_VF_PORT that is nested itself. While IFLA_VF_PORTS is a verified attribute from ifla_policy[], we only check if the IFLA_VF_PORTS container has IFLA_VF_PORT attributes and then pass the attribute's content itself via nla_parse_nested(). It would be more correct to reject inner types other than IFLA_VF_PORT instead of continuing parsing and also similarly as in commit 4f7d2cdfdde7, to check for a minimum of NLA_HDRLEN. Signed-off-by: Daniel Borkmann <[email protected]> Cc: Roopa Prabhu <[email protected]> Cc: Scott Feldman <[email protected]> Cc: Jason Gunthorpe <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-14rds: rds_ib_device.refcount overflowWengang Wang1-1/+3
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device") There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr failed(mr pool running out). this lead to the refcount overflow. A complain in line 117(see following) is seen. From vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN). 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev) 116 { 117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0); 118 if (atomic_dec_and_test(&rds_ibdev->refcount)) 119 queue_work(rds_wq, &rds_ibdev->free_work); 120 } fix is to drop refcount when rds_ib_alloc_fmr failed. Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Haggai Eran <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
2015-07-14ipvs: call skb_sender_cpu_clearJulian Anastasov1-0/+6
Reset XPS's sender_cpu on forwarding. Signed-off-by: Julian Anastasov <[email protected]> Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices") Signed-off-by: Simon Horman <[email protected]>
2015-07-14ipvs: fix crash with sync protocol v0 and FTPJulian Anastasov1-1/+1
Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: 749c42b620a9 ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-07-14ipvs: skb_orphan in case of forwardingAlex Gartrell1-0/+27
It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell <[email protected]> Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-07-14ipvs: fix crash if scheduler is changedJulian Anastasov3-37/+69
I overlooked the svc->sched_data usage from schedulers when the services were converted to RCU in 3.10. Now the rare ipvsadm -E command can change the scheduler but due to the reverse order of ip_vs_bind_scheduler and ip_vs_unbind_scheduler we provide new sched_data to the old scheduler resulting in a crash. To fix it without changing the scheduler methods we have to use synchronize_rcu() only for the editing case. It means all svc->scheduler readers should expect a NULL value. To avoid breakage for the service listing and ipvsadm -R we can use the "none" name to indicate that scheduler is not assigned, a state when we drop new connections. Reported-by: Alexander Vasiliev <[email protected]> Fixes: ceec4c381681 ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-07-14ipvs: do not use random local source address for tunnelsJulian Anastasov1-1/+0
Michael Vallaly reports about wrong source address used in rare cases for tunneled traffic. Looks like __ip_vs_get_out_rt in 3.10+ is providing uninitialized dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses kmalloc. While we retry after seeing EINVAL from routing for data that does not look like valid local address, it still succeeded when this memory was previously used from other dests and with different local addresses. As result, we can use valid local address that is not suitable for our real server. Fix it by providing 0.0.0.0 every time our cache is refreshed. By this way we will get preferred source address from routing. Reported-by: Michael Vallaly <[email protected]> Fixes: 026ace060dfe ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-07-14ipvs: fix ipv6 route unreach panicAlex Gartrell1-0/+7
Previously there was a trivial panic unshare -n /bin/bash <<EOF ip addr add dev lo face::1/128 ipvsadm -A -t [face::1]:15213 ipvsadm -a -t [face::1]:15213 -r b00c::1 echo boom | nc face::1 15213 EOF This patch allows us to replicate the net logic above and simply capture the skb_dst(skb)->dev and use that for the purpose of the invocation. Signed-off-by: Alex Gartrell <[email protected]> Acked-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2015-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds25-194/+242
Pull networking fixes from David Miller: 1) Missing list head init in bluetooth hidp session creation, from Tedd Ho-Jeong An. 2) Don't leak SKB in bridge netfilter error paths, from Florian Westphal. 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien Grall. 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf Baechle. 5) Fix race between rhashtable resize events and table walks, from Phil Sutter. 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from Daniel Borkmann. 7) Missing security layer socket state initialization in tipc code, from Stephen Smalley. 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from Denys Vlasenko. 9) Missing minor_idr destroy on module unload on macvtap driver, from Johannes Thumshirn. 10) Various pktgen kernel thread races, from Oleg Nesterov. 11) Fix races that can cause packets to be processed in the backlog even after a device attached to that SKB has been fully unregistered. From Julian Anastasov. 12) bcmgenet driver doesn't account packet drops vs. errors properly, fix from Petri Gynther. 13) Array index validation and off by one fix in DSA layer from Florian Fainelli * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits) can: replace timestamp as unique skb attribute ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux can: c_can: Fix default pinmux glitch at init can: rcar_can: unify error messages can: rcar_can: print request_irq() error code can: rcar_can: fix typo in error message can: rcar_can: print signed IRQ # can: rcar_can: fix IRQ check net: dsa: Fix off-by-one in switch address parsing net: dsa: Test array index before use net: switchdev: don't abort unsupported operations net: bcmgenet: fix accounting of packet drops vs errors cdc_ncm: update specs URL Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets bridge: mdb: allow the user to delete mdb entry if there's a querier net: call rcu_read_lock early in process_backlog net: do not process device backlog during unregistration bridge: fix potential crash in __netdev_pick_tx() net: axienet: Fix devm_ioremap_resource return value check ...
2015-07-139p/trans_virtio: reset virtio device on removePierre Morel1-0/+1
On device shutdown/removal, virtio drivers need to trigger a reset on the device; if this is neglected, the virtio core will complain about non-zero device status. This patch resets the status when the 9p virtio driver is removed from the system by calling vdev->config->reset on the virtio_device to send a reset to the host virtio device. Signed-off-by: Pierre Morel <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2015-07-13netfilter: IDLETIMER: fix lockdep warningDmitry Torokhov1-0/+1
Dynamically allocated sysfs attributes should be initialized with sysfs_attr_init() otherwise lockdep will be angry with us: [ 45.468653] BUG: key ffffffc030fad4e0 not in .data! [ 45.468655] ------------[ cut here ]------------ [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490() [ 45.468672] DEBUG_LOCKS_WARN_ON(1) [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43 [ 45.468674] Hardware name: XXX [ 45.468675] Call trace: [ 45.468680] [<ffffffc0002072b4>] dump_backtrace+0x0/0x10c [ 45.468683] [<ffffffc0002073d0>] show_stack+0x10/0x1c [ 45.468688] [<ffffffc000a86cd4>] dump_stack+0x74/0x94 [ 45.468692] [<ffffffc000217ae0>] warn_slowpath_common+0x84/0xb0 [ 45.468694] [<ffffffc000217b84>] warn_slowpath_fmt+0x4c/0x58 [ 45.468697] [<ffffffc0002530a4>] lockdep_init_map+0x128/0x490 [ 45.468701] [<ffffffc000367ef0>] __kernfs_create_file+0x80/0xe4 [ 45.468704] [<ffffffc00036862c>] sysfs_add_file_mode_ns+0x104/0x170 [ 45.468706] [<ffffffc00036870c>] sysfs_create_file_ns+0x58/0x64 [ 45.468711] [<ffffffc000930430>] idletimer_tg_checkentry+0x14c/0x324 [ 45.468714] [<ffffffc00092a728>] xt_check_target+0x170/0x198 [ 45.468717] [<ffffffc000993efc>] check_target+0x58/0x6c [ 45.468720] [<ffffffc000994c64>] translate_table+0x30c/0x424 [ 45.468723] [<ffffffc00099529c>] do_ipt_set_ctl+0x144/0x1d0 [ 45.468728] [<ffffffc0009079f0>] nf_setsockopt+0x50/0x60 [ 45.468732] [<ffffffc000946870>] ip_setsockopt+0x8c/0xb4 [ 45.468735] [<ffffffc0009661c0>] raw_setsockopt+0x10/0x50 [ 45.468739] [<ffffffc0008c1550>] sock_common_setsockopt+0x14/0x20 [ 45.468742] [<ffffffc0008bd190>] SyS_setsockopt+0x88/0xb8 [ 45.468744] ---[ end trace 41d156354d18c039 ]--- Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-07-12Merge tag 'linux-can-fixes-for-4.2-20150712' of ↵David S. Miller3-8/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-07-12 this is a pull request of 8 patchs for net/master. Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ check and several info and error messages. There are two patches by J.D. Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree, which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better approach to make the CAN skbs unique, the timestamp is replaced by a counter. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-07-12can: replace timestamp as unique skb attributeOliver Hartkopp3-8/+13
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. Without timestamping to be required by user space applications this timestamp was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs by introducing several __net_timestamp() calls. This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to prevent the frame loss fixed in mainline Linux. This patch removes the timestamp dependency and uses an atomic counter to create an unique identifier together with the skbuff pointer. Btw: the new skbcnt element introduced in struct can_skb_priv has to be initialized with zero in out-of-tree drivers which are not using alloc_can{,fd}_skb() too. Signed-off-by: Oliver Hartkopp <[email protected]> Cc: linux-stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2015-07-11net: dsa: Fix off-by-one in switch address parsingFlorian Fainelli1-1/+1
cd->sw_addr is used as a MDIO bus address, which cannot exceed PHY_MAX_ADDR (32), our check was off-by-one. Fixes: 5e95329b701c ("dsa: add device tree bindings to register DSA switches") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-11net: dsa: Test array index before useFlorian Fainelli1-2/+2
port_index is used an index into an array, and this information comes from Device Tree, make sure that port_index is not equal to the array size before using it. Move the check against port_index earlier in the loop. Fixes: 5e95329b701c: ("dsa: add device tree bindings to register DSA switches") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-11net: switchdev: don't abort unsupported operationsVivien Didelot1-4/+8
There is no need to abort attribute setting or object addition, if the prepare phase returned operation not supported. Thus, abort these two transactions only if the error is not -EOPNOTSUPP. Signed-off-by: Vivien Didelot <[email protected]> Acked-by: Jiri Pirko <[email protected]> Acked-by: Scott Feldman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-10net: inet_diag: always export IPV6_V6ONLY sockopt for listening socketsPhil Sutter1-2/+2
Reconsidering my commit 20462155 "net: inet_diag: export IPV6_V6ONLY sockopt", I am not happy with the limitations it causes for socket analysing code in userspace. Exporting the value only if it is set makes it hard for userspace to decide whether the option is not set or the kernel does not support exporting the option at all. >From an auditor's perspective, the interesting question for listening AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is the unexpected case. This patch allows to answer this question reliably. Signed-off-by: Phil Sutter <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-10bridge: mdb: allow the user to delete mdb entry if there's a querierSatish Ashok1-9/+2
Until now when a querier was present static entries couldn't be deleted. Fix this and allow the user to manipulate the mdb with or without a querier. Signed-off-by: Satish Ashok <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-10net: call rcu_read_lock early in process_backlogJulian Anastasov1-16/+16
Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-10net: do not process device backlog during unregistrationJulian Anastasov1-2/+4
commit 381c759d9916 ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by: Vittorio Gambaletta <[email protected]> Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-10netfilter: ctnetlink: put back references to master ct and expect objectsPablo Neira Ayuso1-5/+0
We have to put back the references to the master conntrack and the expectation that we just created, otherwise we'll leak them. Fixes: 0ef71ee1a5b9 ("netfilter: ctnetlink: refactor ctnetlink_create_expect") Reported-by: Tim Wiess <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-07-09bridge: fix potential crash in __netdev_pick_tx()Eric Dumazet1-0/+1
Commit c29390c6dfee ("xps: must clear sender_cpu before forwarding") fixed an issue in normal forward path, caused by sender_cpu & napi_id skb fields being an union. Bridge is another point where skb can be forwarded, so we need the same cure. Bug triggers if packet was received on a NIC using skb_mark_napi_id() Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Bob Liu <[email protected]> Tested-by: Bob Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-09net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()Oleg Nesterov1-9/+2
pktgen_thread_worker() doesn't need to wait for kthread_stop(), it can simply exit. Just pktgen_create_thread() and pg_net_exit() should do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is fine. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-09net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()Oleg Nesterov1-1/+3
pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by: Oleg Nesterov <[email protected]> Reported-by: Jan Stancek <[email protected]> Reported-by: Marcelo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-09libceph: treat sockaddr_storage with uninitialized family as blankIlya Dryomov1-7/+7
addr_is_blank() should return true if family is neither AF_INET nor AF_INET6. This is what its counterpart entity_addr_t::is_blank_ip() is doing and it is the right thing to do: in process_banner() we check if our address is blank and if it is "learn" it from our peer. As it is, we never learn our address and always send out a blank one. This goes way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and some ipv6 support groundwork") from 2009. While at at, do not open-code ipv6_addr_any() and use INADDR_ANY constant instead of 0. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2015-07-09libceph: enable ceph in a non-default network namespaceIlya Dryomov2-7/+19
Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <[email protected]>. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2015-07-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-29/+54
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. This batch mostly comes with patches to address fallout from the previous merge window cycle, they are: 1) Use entry->state.hook_list from nf_queue() instead of the global nf_hooks which is not valid when used from NFPROTO_NETDEV, this should cause no problems though since we have no userspace queueing for that family, but let's fix this now for the sake of correctness. Patch from Eric W. Biederman. 2) Fix compilation breakage in bridge netfilter if CONFIG_NF_DEFRAG_IPV4 is not set, from Bernhard Thaler. 3) Use percpu jumpstack in arptables too, now that there's a single copy of the rule blob we can't store the return address there anymore. Patch from Florian Westphal. 4) Fix a skb leak in the xmit path of bridge netfilter, problem there since 2.6.37 although it should be not possible to hit invalid traffic there, also from Florian. 5) Eric Leblond reports that when loading a large ruleset with many missing modules after a fresh boot, nf_tables can take long time commit it. Fix this by processing the full batch until the end, even on missing modules, then abort only once and restart processing. 6) Add bridge netfilter files to the MAINTAINER files. 7) Fix a net_device refcount leak in the new IPV6 bridge netfilter code, from Julien Grall. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-07-08ipv4: add support for linkdown sysctl to netconfAndy Gospodarek1-0/+13
This kernel patch exports the value of the new ignore_routes_with_linkdown via netconf. v2: changes to notify userspace via netlink when sysctl values change and proposed for 'net' since this could be considered a bugfix Signed-off-by: Andy Gospodarek <[email protected]> Suggested-by: Nicolas Dichtel <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08bridge: mdb: zero out the local br_ip variable before useNikolay Aleksandrov1-0/+2
Since commit b0e9a30dd669 ("bridge: Add vlan id to multicast groups") there's a check in br_ip_equal() for a matching vlan id, but the mdb functions were not modified to use (or at least zero it) so when an entry was added it would have a garbage vlan id (from the local br_ip variable in __br_mdb_add/del) and this would prevent it from being matched and also deleted. So zero out the whole local ip var to protect ourselves from future changes and also to fix the current bug, since there's no vlan id support in the mdb uapi - use always vlan id 0. Example before patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent RTNETLINK answers: Invalid argument After patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb Signed-off-by: Nikolay Aleksandrov <[email protected]> Fixes: b0e9a30dd669 ("bridge: Add vlan id to multicast groups") Signed-off-by: David S. Miller <[email protected]>
2015-07-08net/tipc: initialize security state for new connection socketStephen Smalley1-0/+1
Calling connect() with an AF_TIPC socket would trigger a series of error messages from SELinux along the lines of: SELinux: Invalid class 0 type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> } for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable> permissive=0 This was due to a failure to initialize the security state of the new connection sock by the tipc code, leaving it with junk in the security class field and an unlabeled secid. Add a call to security_sk_clone() to inherit the security state from the parent socket. Reported-by: Tim Shearer <[email protected]> Signed-off-by: Stephen Smalley <[email protected]> Acked-by: Paul Moore <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08ip_tunnel: fix ipv4 pmtu check to honor inner ip header dfTimo Teräs1-3/+5
Frag needed should be sent only if the inner header asked to not fragment. Currently fragmentation is broken if the tunnel has df set, but df was not asked in the original packet. The tunnel's df needs to be still checked to update internally the pmtu cache. Commit 23a3647bc4f93bac broke it, and this commit fixes the ipv4 df check back to the way it was. Fixes: 23a3647bc4f93bac ("ip_tunnels: Use skb-len to PMTU check.") Cc: Pravin B Shelar <[email protected]> Signed-off-by: Timo Teräs <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08rtnetlink: verify IFLA_VF_INFO attributes before passing them to driverDaniel Borkmann1-91/+96
Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes anymore with respect to their policy, that is, ifla_vfinfo_policy[]. Before, they were part of ifla_policy[], but they have been nested since placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO, which is another nested attribute for the actual VF attributes such as IFLA_VF_MAC, IFLA_VF_VLAN, etc. Despite the policy being split out from ifla_policy[] in this commit, it's never applied anywhere. nla_for_each_nested() only does basic nla_ok() testing for struct nlattr, but it doesn't know about the data context and their requirements. Fix, on top of Jason's initial work, does 1) parsing of the attributes with the right policy, and 2) using the resulting parsed attribute table from 1) instead of the nla_for_each_nested() loop (just like we used to do when still part of ifla_policy[]). Reference: http://thread.gmane.org/gmane.linux.network/368913 Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric") Reported-by: Jason Gunthorpe <[email protected]> Cc: Chris Wright <[email protected]> Cc: Sucheta Chakraborty <[email protected]> Cc: Greg Rose <[email protected]> Cc: Jeff Kirsher <[email protected]> Cc: Rony Efraim <[email protected]> Cc: Vlad Zolotarov <[email protected]> Cc: Nicolas Dichtel <[email protected]> Cc: Thomas Graf <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Vlad Zolotarov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08Revert "dev: set iflink to 0 for virtual interfaces"Nicolas Dichtel1-4/+0
This reverts commit e1622baf54df8cc958bf29d71de5ad545ea7d93c. The side effect of this commit is to add a '@NONE' after each virtual interface name with a 'ip link'. It may break existing scripts. Reported-by: Olivier Hartkopp <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Tested-by: Oliver Hartkopp <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08net: graceful exit from netif_alloc_netdev_queues()Eric Dumazet1-1/+2
User space can crash kernel with ip link add ifb10 numtxqueues 100000 type ifb We must replace a BUG_ON() by proper test and return -EINVAL for crazy values. Fixes: 60877a32bce00 ("net: allow large number of tx queues") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08bridge: mdb: start delete timer for temp static entriesSatish Ashok1-0/+3
Start the delete timer when adding temp static entries so they can expire. Signed-off-by: Satish Ashok <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: David S. Miller <[email protected]>
2015-07-08net_sched: gen_estimator: extend pps limitEric Dumazet1-6/+7
rate estimators are limited to 4 Mpps, which was fine years ago, but too small with current hardware generation. Lets use 2^5 scaling instead of 2^10 to get 128 Mpps new limit. On 64bit arch, use an "unsigned long" for temp storage and remove limit. (We do not expect 32bit arches to be able to reach this point) Tested: tc -s -d filter sh dev eth0 parent ffff: filter protocol ip pref 1 u32 filter protocol ip pref 1 u32 fh 800: ht divisor 1 filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 match 07000000/ff000000 at 12 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 166 sec Action statistics: Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0) rate 4067Mbit 11053596pps backlog 0b 0p requeues 0 Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-07-08netfilter: bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6Julien Grall1-1/+1
The commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd "netfilter: bridge: forward IPv6 fragmented packets" introduced a new function br_validate_ipv6 which take a reference on the inet6 device. Although, the reference is not released at the end. This will result to the impossibility to destroy any netdevice using ipv6 and bridge. It's possible to directly retrieve the inet6 device without taking a reference as all netfilter hooks are protected by rcu_read_lock via nf_hook_slow. Spotted while trying to destroy a Xen guest on the upstream Linux: "unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1" Signed-off-by: Julien Grall <[email protected]> Cc: Bernhard Thaler <[email protected]> Cc: Pablo Neira Ayuso <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Bob Liu <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-07-04bluetooth: fix list handlingLinus Torvalds2-2/+3
Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning") thought that the code was sabotaging the list poisoning when NULL'ing out the list pointers and removed it. But what was going on was that the bluetooth code was using NULL pointers for the list as a way to mark it empty, and that commit just broke it (and replaced the test with NULL with a "list_empty()" test on a uninitialized list instead, breaking things even further). So fix it all up to use the regular and real list_empty() handling (which does not use NULL, but a pointer to itself), also making sure to initialize the list properly (the previous NULL case was initialized implicitly by the session being allocated with kzalloc()) This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong An. [ I would normally expect to get this through the bt tree, but I'm going to release -rc1, so I'm just committing this directly - Linus ] Reported-and-tested-by: Jörg Otte <[email protected]> Cc: Alexey Dobriyan <[email protected]> Original-by: Tedd Ho-Jeong An <[email protected]> Original-by: Marcel Holtmann <[email protected]>: Signed-off-by: Linus Torvalds <[email protected]>