aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-04-30mac802154: fix ieee802154_register_hw error handlingAlexander Aring1-1/+3
Currently if ieee802154_if_add failed, we don't unregister the wpan phy which was registered before. This patch adds a correct error handling for unregister the wpan phy when ieee802154_if_add failed. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2015-04-30Bluetooth: Skip the shutdown routine if the interface is not upGabriele Mazzotta1-1/+2
Most likely, the shutdown routine requires the interface to be up. This is the case for BTUSB_INTEL: the routine tries to send a command to the interface, but since this one is down, it fails and exits once HCI_INIT_TIMEOUT has expired. Signed-off-by: Gabriele Mazzotta <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Cc: [email protected] # 4.0.x
2015-04-29tcp: update reordering first before detecting lossYuchung Cheng1-4/+2
tcp_mark_lost_retrans is not used when FACK is disabled. Since tcp_update_reordering may disable FACK, it should be called first before tcp_mark_lost_retrans. Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Nandita Dukkipati <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tcp: add TCP_CC_INFO socket optionEric Dumazet1-0/+21
Some Congestion Control modules can provide per flow information, but current way to get this information is to use netlink. Like TCP_INFO, let's add TCP_CC_INFO so that applications can issue a getsockopt() if they have a socket file descriptor, instead of playing complex netlink games. Sample usage would be : union tcp_cc_info info; socklen_t len = sizeof(info); if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1) Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Neal Cardwell <[email protected]> Acked-by: Neal Cardwell <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tcp: prepare CC get_info() access from getsockopt()Eric Dumazet5-33/+38
We would like that optional info provided by Congestion Control modules using netlink can also be read using getsockopt() This patch changes get_info() to put this information in a buffer, instead of skb, like tcp_get_info(), so that following patch can reuse this common infrastructure. Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Neal Cardwell <[email protected]> Acked-by: Neal Cardwell <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tcp: add tcpi_bytes_received to tcp_infoEric Dumazet3-4/+15
This patch tracks total number of payload bytes received on a TCP socket. This is the sum of all changes done to tp->rcv_nxt RFC4898 named this : tcpEStatsAppHCThruOctetsReceived This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_received was placed near tp->rcv_nxt for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Matt Mathis <[email protected]> Cc: Eric Salo <[email protected]> Cc: Martin Lau <[email protected]> Cc: Chris Rapier <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tcp: add tcpi_bytes_acked to tcp_infoEric Dumazet2-3/+16
This patch tracks total number of bytes acked for a TCP socket. This is the sum of all changes done to tp->snd_una, and allows for precise tracking of delivered data. RFC4898 named this : tcpEStatsAppHCThruOctetsAcked This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_acked was placed near tp->snd_una for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Cc: Matt Mathis <[email protected]> Cc: Eric Salo <[email protected]> Cc: Martin Lau <[email protected]> Cc: Chris Rapier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29net: dsa: Fix scope of eeprom-length propertyGuenter Roeck1-1/+1
eeprom-length is a switch property, not a dsa property, and thus needs to be attached to the switch node, not to the dsa node. Reported-by: Andrew Lunn <[email protected]> Fixes: 6793abb4e849 ("net: dsa: Add support for switch EEPROM access") Signed-off-by: Guenter Roeck <[email protected]> Acked-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tipc: fix problem with parallel link synchronization mechanismJon Paul Maloy1-5/+2
Currently, we try to accumulate arrived packets in the links's 'deferred' queue during the parallel link syncronization phase. This entails two problems: - With an unlucky combination of arriving packets the algorithm may go into a lockstep with the out-of-sequence handling function, where the synch mechanism is adding a packet to the deferred queue, while the out-of-sequence handling is retrieving it again, thus ending up in a loop inside the node_lock scope. - Even if this is avoided, the link will very often send out unnecessary protocol messages, in the worst case leading to redundant retransmissions. We fix this by just dropping arriving packets on the upcoming link during the synchronization phase, thus relying on the retransmission protocol to resolve the situation once the two links have arrived to a synchronized state. Reviewed-by: Erik Hugne <[email protected]> Reviewed-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29tipc: remove wrong use of NLM_F_MULTINicolas Dichtel2-12/+13
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 35b9dd7607f0 ("tipc: add bearer get/dump to new netlink api") Fixes: 7be57fc69184 ("tipc: add link get/dump to new netlink api") Fixes: 46f15c6794fb ("tipc: add media get/dump to new netlink api") CC: Richard Alpe <[email protected]> CC: Jon Maloy <[email protected]> CC: Ying Xue <[email protected]> CC: [email protected] Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29bridge/nl: remove wrong use of NLM_F_MULTINicolas Dichtel3-8/+10
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: e5a55a898720 ("net: create generic bridge ops") Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf") CC: John Fastabend <[email protected]> CC: Sathya Perla <[email protected]> CC: Subbu Seetharaman <[email protected]> CC: Ajit Khaparde <[email protected]> CC: Jeff Kirsher <[email protected]> CC: [email protected] CC: Jiri Pirko <[email protected]> CC: Scott Feldman <[email protected]> CC: Stephen Hemminger <[email protected]> CC: [email protected] Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29bridge/mdb: remove wrong use of NLM_F_MULTINicolas Dichtel1-1/+1
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 37a393bc4932 ("bridge: notify mdb changes via netlink") CC: Cong Wang <[email protected]> CC: Stephen Hemminger <[email protected]> CC: [email protected] Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29net: sched: act_connmark: don't zap skb->nfctFlorian Westphal1-2/+0
This action is meant to be passive, i.e. we should not alter skb->nfct: If nfct is present just leave it alone. Compile tested only. Cc: Jamal Hadi Salim <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29route: Use ipv4_mtu instead of raw rt_pmtuHerbert Xu1-4/+1
The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't increase PMTU with Datagram Too Big message") broke PMTU in cases where the rt_pmtu value has expired but is smaller than the new PMTU value. This obsolete rt_pmtu then prevents the new PMTU value from being installed. Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message") Reported-by: Gerd v. Egidy <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-29xfrm: fix a race in xfrm_state_lookup_byspiLi RongQing1-1/+1
The returned xfrm_state should be hold before unlock xfrm_state_lock, otherwise the returned xfrm_state maybe be released. Fixes: c454997e6[{pktgen, xfrm} Introduce xfrm_state_lookup_byspi..] Cc: Fan Du <[email protected]> Signed-off-by: Li RongQing <[email protected]> Acked-by: Fan Du <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2015-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-2/+1
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix a crash in nf_tables when dictionaries are used from the ruleset, due to memory corruption, from Florian Westphal. 2) Fix another crash in nf_queue when used with br_netfilter. Also from Florian. Both fixes are related to new stuff that got in 4.0-rc. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds20-105/+290
Pull networking fixes from David Miller: 1) mlx4 doesn't check fully for supported valid RSS hash function, fix from Amir Vadai 2) Off by one in ibmveth_change_mtu(), from David Gibson 3) Prevent altera chip from reporting false error interrupts in some circumstances, from Chee Nouk Phoon 4) Get rid of that stupid endless loop trying to allocate a FIN packet in TCP, and in the process kill deadlocks. From Eric Dumazet 5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from Eric Dumazet 6) Fix two bugs in async rhashtable resizing, from Thomas Graf 7) Fix topology server listener socket namespace bug in TIPC, from Ying Xue 8) Add some missing HAS_DMA kconfig dependencies, from Geert Uytterhoeven 9) bgmac driver intends to force re-polling but does so by returning the wrong value from it's ->poll() handler. Fix from Rafał Miłecki 10) When the creater of an rhashtable configures a max size for it, don't bark in the logs and drop insertions when that is exceeded. Fix from Johannes Berg 11) Recover from out of order packets in ppp mppe properly, from Sylvain Rochet * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) bnx2x: really disable TPA if 'disable_tpa' option is set net:treewide: Fix typo in drivers/net net/mlx4_en: Prevent setting invalid RSS hash function mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions netfilter; Add some missing default cases to switch statements in nft_reject. ppp: mppe: discard late packet in stateless mode ppp: mppe: sanity error path rework net/bonding: Make DRV macros private net: rfs: fix crash in get_rps_cpus() altera tse: add support for fixed-links. pxa168: fix double deallocation of managed resources net: fix crash in build_skb() net: eth: altera: Resolve false errors from MSGDMA to TSE ehea: Fix memory hook reference counting crashes net/tg3: Release IRQs on permanent error net: mdio-gpio: support access that may sleep inet: fix possible panic in reqsk_queue_unlink() rhashtable: don't attempt to grow when at max_size bgmac: fix requests for extra polling calls from NAPI tcp: avoid looping in tcp_send_fin() ...
2015-04-27netfilter; Add some missing default cases to switch statements in nft_reject.David S. Miller2-0/+4
This fixes: ==================== net/netfilter/nft_reject.c: In function ‘nft_reject_dump’: net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswitch] switch (priv->type) { ^ net/netfilter/nft_reject.c:61:2: warning: enumeration value ‘NFT_REJECT_ICMPX_UNREACH’ not handled in switch [-Wswi\ tch] net/netfilter/nft_reject_inet.c: In function ‘nft_reject_inet_dump’: net/netfilter/nft_reject_inet.c:105:2: warning: enumeration value ‘NFT_REJECT_TCP_RST’ not handled in switch [-Wswi\ tch] switch (priv->type) { ^ ==================== Signed-off-by: David S. Miller <[email protected]>
2015-04-26Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds10-732/+889
Pull NFS client updates from Trond Myklebust: "Another set of mainly bugfixes and a couple of cleanups. No new functionality in this round. Highlights include: Stable patches: - Fix a regression in /proc/self/mountstats - Fix the pNFS flexfiles O_DIRECT support - Fix high load average due to callback thread sleeping Bugfixes: - Various patches to fix the pNFS layoutcommit support - Do not cache pNFS deviceids unless server notifications are enabled - Fix a SUNRPC transport reconnection regression - make debugfs file creation failure non-fatal in SUNRPC - Another fix for circular directory warnings on NFSv4 "junctioned" mountpoints - Fix locking around NFSv4.2 fallocate() support - Truncating NFSv4 file opens should also sync O_DIRECT writes - Prevent infinite loop in rpcrdma_ep_create() Features: - Various improvements to the RDMA transport code's handling of memory registration - Various code cleanups" * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits) fs/nfs: fix new compiler warning about boolean in switch nfs: Remove unneeded casts in nfs NFS: Don't attempt to decode missing directory entries Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" NFS: Rename idmap.c to nfs4idmap.c NFS: Move nfs_idmap.h into fs/nfs/ NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h NFS: Add a stub for GETDEVICELIST nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes nfs: fix DIO good bytes calculation nfs: Fetch MOUNTED_ON_FILEID when updating an inode sunrpc: make debugfs file creation failure non-fatal nfs: fix high load average due to callback thread sleeping NFS: Reduce time spent holding the i_mutex during fallocate() NFS: Don't zap caches on fallocate() xprtrdma: Make rpcrdma_{un}map_one() into inline functions xprtrdma: Handle non-SEND completions via a callout xprtrdma: Add "open" memreg op xprtrdma: Add "destroy MRs" memreg op xprtrdma: Add "reset MRs" memreg op ...
2015-04-26Merge branch 'for-linus' of ↵Linus Torvalds4-24/+24
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26net: rfs: fix crash in get_rps_cpus()Eric Dumazet1-6/+6
Commit 567e4b79731c ("net: rfs: add hash collision detection") had one mistake : RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu() and get_rps_cpu(), as @next_cpu was the result of an AND with rps_cpu_mask This bug showed up on a host with 72 cpus : next_cpu was 0x7f, and the code was trying to access percpu data of an non existent cpu. In a follow up patch, we might get rid of compares against nr_cpu_ids, if we init the tables with 0. This is silly to test for a very unlikely condition that exists only shortly after table initialization, as we got rid of rps_reset_sock_flow() and similar functions that were writing this RPS_NO_CPU magic value at flow dismantle : When table is old enough, it never contains this value anymore. Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: Eric Dumazet <[email protected]> Cc: Tom Herbert <[email protected]> Cc: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-25net: fix crash in build_skb()Eric Dumazet2-13/+24
When I added pfmemalloc support in build_skb(), I forgot netlink was using build_skb() with a vmalloc() area. In this patch I introduce __build_skb() for netlink use, and build_skb() is a wrapper handling both skb->head_frag and skb->pfmemalloc This means netlink no longer has to hack skb->head_frag [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26! [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1567.700067] Dumping ftrace buffer: [ 1567.700067] (ftrace buffer empty) [ 1567.700067] Modules linked in: [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167 [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000 [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3)) [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202 [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049 [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000 [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000 [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000 [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000 [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0 [ 1567.700067] Stack: [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000 [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08 [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821 [ 1567.700067] Call Trace: [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316) [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329) [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623) [ 1567.774369] sock_write_iter (net/socket.c:823) [ 1567.774369] ? sock_sendmsg (net/socket.c:806) [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491) [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 1567.774369] ? default_llseek (fs/read_write.c:487) [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4)) [ 1567.774369] vfs_write (fs/read_write.c:539) [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577) [ 1567.774369] ? SyS_read (fs/read_write.c:577) [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636) [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261) Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Sasha Levin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-24netfilter: nf_tables: fix wrong length for jump/goto verdictsFlorian Westphal1-2/+1
NFT_JUMP/GOTO erronously sets length to sizeof(void *). We then allocate insufficient memory when such element is added to a vmap. Suggested-by: Patrick McHardy <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-04-24inet: fix possible panic in reqsk_queue_unlink()Eric Dumazet7-9/+47
[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243 There is a race when reqsk_timer_handler() and tcp_check_req() call inet_csk_reqsk_queue_unlink() on the same req at the same time. Before commit fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer"), listener spinlock was held and race could not happen. To solve this bug, we change reqsk_queue_unlink() to not assume req must be found, and we return a status, to conditionally release a refcount on the request sock. This also means tcp_check_req() in non fastopen case might or not consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have to properly handle this. (Same remark for dccp_check_req() and its callers) inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is called 4 times in tcp and 3 times in dccp. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-24tcp: avoid looping in tcp_send_fin()Eric Dumazet1-21/+29
Presence of an unbound loop in tcp_send_fin() had always been hard to explain when analyzing crash dumps involving gigantic dying processes with millions of sockets. Lets try a different strategy : In case of memory pressure, try to add the FIN flag to last packet in write queue, even if packet was already sent. TCP stack will be able to deliver this FIN after a timeout event. Note that this FIN being delivered by a retransmit, it also carries a Push flag given our current implementation. By checking sk_under_memory_pressure(), we anticipate that cooking many FIN packets might deplete tcp memory. In the case we could not allocate a packet, even with __GFP_WAIT allocation, then not sending a FIN seems quite reasonable if it allows to get rid of this socket, free memory, and not block the process from eventually doing other useful work. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-24mac80211: enable hash table shrinkingJohannes Berg1-0/+1
The hashtable behaviour change was merged into the tree at about the same time as the mac80211 use of rhashtable, but of course these don't really conflict in the normal sense. Enable hash table shrinking now. Signed-off-by: Johannes Berg <[email protected]>
2015-04-23Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust60-914/+1294
NFS: NFSoRDMA Client Changes This patch series creates an operation vector for each of the different memory registration modes. This should make it easier to one day increase credit limit, rsize, and wsize. Signed-off-by: Anna Schumaker <[email protected]>
2015-04-23Merge branch 'bugfixes'Trond Myklebust1-10/+12
* bugfixes: NFSv4: Return delegations synchronously in evict_inode SUNRPC: Fix a regression when reconnecting NFS: remount with security change should return EINVAL nfs: do not export discarded symbols NFSv4.1: don't export static symbol
2015-04-23sunrpc: make debugfs file creation failure non-fatalJeff Layton4-38/+32
v2: gracefully handle the case where some dentry pointers end up NULL and be more dilligent about zeroing out dentry pointers We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that. While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first." This patch converts all of the sunrpc debugfs setup code to be void return functins, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason. Cc: Greg Kroah-Hartman <[email protected]> Acked-by: "J. Bruce Fields" <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-04-23net: unix: garbage: fixed several comment and whitespace style issuesJason Eastman1-42/+28
fixed several comment and whitespace style issues Signed-off-by: Jason Eastman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-23tipc: fix node refcount issueErik Hugne1-1/+0
When link statistics is dumped over netlink, we iterate over the list of peer nodes and append each links statistics to the netlink msg. In the case where the dump is resumed after filling up a nlmsg, the node refcnt is decremented without having been incremented previously which may cause the node reference to be freed. When this happens, the following info/stacktrace will be generated, followed by a crash or undefined behavior. We fix this by removing the erroneous call to tipc_node_put inside the loop that iterates over nodes. [ 384.312303] INFO: trying to register non-static key. [ 384.313110] the code is fine but needs lockdep annotation. [ 384.313290] turning off the locking correctness validator. [ 384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13 [ 384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 384.313290] ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007 [ 384.313290] ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200 [ 384.313290] ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001 [ 384.313290] Call Trace: [ 384.313290] <IRQ> [<ffffffff8170adf1>] dump_stack+0x4c/0x65 [ 384.313290] [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50 [ 384.313290] [<ffffffff810a7375>] lock_acquire+0xd5/0x290 [ 384.313290] [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc] [ 384.313290] [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80 [ 384.313290] [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc] [ 384.313290] [<ffffffff810c4698>] call_timer_fn+0xb8/0x490 [ 384.313290] [<ffffffff810c45e0>] ? process_timeout+0x10/0x10 [ 384.313290] [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420 [ 384.313290] [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc] [ 384.313290] [<ffffffff8105a954>] __do_softirq+0xf4/0x630 [ 384.313290] [<ffffffff8105afdd>] irq_exit+0x5d/0x60 [ 384.313290] [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50 [ 384.313290] [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80 [ 384.313290] <EOI> [<ffffffff8100db10>] ? default_idle+0x20/0x210 [ 384.313290] [<ffffffff8100db0e>] ? default_idle+0x1e/0x210 [ 384.313290] [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10 [ 384.313290] [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530 [ 384.313290] [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200 [ 384.313290] [<ffffffff81038b0f>] start_secondary+0x13f/0x170 Fixes: 8a0f6ebe8494 ("tipc: involve reference counter for node structure") Signed-off-by: Erik Hugne <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-23tipc: fix random link reset problemErik Hugne1-1/+2
In the function tipc_sk_rcv(), the stack variable 'err' is only initialized to TIPC_ERR_NO_PORT for the first iteration over the link input queue. If a chain of messages are received from a link, failure to lookup the socket for any but the first message will cause the message to bounce back out on a random link. We fix this by properly initializing err. Signed-off-by: Erik Hugne <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-23tipc: fix topology server broken issueYing Xue1-6/+3
When a new topology server is launched in a new namespace, its listening socket is inserted into the "init ns" namespace's socket hash table rather than the one owned by the new namespace. Although the socket's namespace is forcedly changed to the new namespace later, the socket is still stored in the socket hash table of "init ns" namespace. When a client created in the new namespace connects its own topology server, the connection is failed as its server's socket could not be found from its own namespace's socket table. If __sock_create() instead of original sock_create_kern() is used to create the server's socket through specifying an expected namesapce, the socket will be inserted into the specified namespace's socket table, thereby avoiding to the topology server broken issue. Fixes: 76100a8a64bc ("tipc: fix netns refcnt leak") Reported-by: Erik Hugne <[email protected]> Signed-off-by: Ying Xue <[email protected]> Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-23mac80211: fix rhashtable conversionJohannes Berg1-1/+17
My conversion of the mac80211 station hash table to rhashtable completely broke the lookup in sta_info_get() as it no longer took into account the virtual interface. Fix that. Fixes: 7bedd0cfad4e1 ("mac80211: use rhashtable for station table") Signed-off-by: Johannes Berg <[email protected]>
2015-04-22net: do not deplete pfmemalloc reserveEric Dumazet1-2/+7
build_skb() should look at the page pfmemalloc status. If set, this means page allocator allocated this page in the expectation it would help to free other pages. Networking stack can do that only if skb->pfmemalloc is also set. Also, we must refrain using high order pages from the pfmemalloc reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for them. Under memory pressure, using order-0 pages is probably the best strategy. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22ip6_gre: use netdev_alloc_pcpu_stats()Johannes Berg1-8/+1
The code there just open-codes the same, so use the provided macro instead. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22Merge branch 'for-linus' of ↵Linus Torvalds7-17/+392
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This time around we have a collection of CephFS fixes from Zheng around MDS failure handling and snapshots, support for a new CRUSH straw2 algorithm (to sync up with userspace) and several RBD cleanups and fixes from Ilya, an error path leak fix from Taesoo, and then an assorted collection of cleanups from others" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits) rbd: rbd_wq comment is obsolete libceph: announce support for straw2 buckets crush: straw2 bucket type with an efficient 64-bit crush_ln() crush: ensuring at most num-rep osds are selected crush: drop unnecessary include from mapper.c ceph: fix uninline data function ceph: rename snapshot support ceph: fix null pointer dereference in send_mds_reconnect() ceph: hold on to exclusive caps on complete directories libceph: simplify our debugfs attr macro ceph: show non-default options only libceph: expose client options through debugfs libceph, ceph: split ceph_show_options() rbd: mark block queue as non-rotational libceph: don't overwrite specific con error msgs ceph: cleanup unsafe requests when reconnecting is denied ceph: don't zero i_wrbuffer_ref when reconnecting is denied ceph: don't mark dirty caps when there is no auth cap ceph: keep i_snap_realm while there are writers libceph: osdmap.h: Add missing format newlines ...
2015-04-22mpls: Prevent use of implicit NULL label as outgoing labelRobert Shearman1-0/+9
The reserved implicit-NULL label isn't allowed to appear in the label stack for packets, so make it an error for the control plane to specify it as an outgoing label. Suggested-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Robert Shearman <[email protected]> Reviewed-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22mpls: Per-device enabling of packet inputRobert Shearman2-2/+69
An MPLS network is a single trust domain where the edges must be in control of what labels make their way into the core. The simplest way of ensuring this is for the edge device to always impose the labels, and not allow forward labeled traffic from untrusted neighbours. This is achieved by allowing a per-device configuration of whether MPLS traffic input from that interface should be processed or not. To be secure by default, the default state is changed to MPLS being disabled on all interfaces unless explicitly enabled and no global option is provided to change the default. Whilst this differs from other protocols (e.g. IPv6), network operators are used to explicitly enabling MPLS forwarding on interfaces, and with the number of links to the MPLS core typically fairly low this doesn't present too much of a burden on operators. Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Robert Shearman <[email protected]> Reviewed-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22mpls: Per-device MPLS stateRobert Shearman2-2/+51
Add per-device MPLS state to supported interfaces. Use the presence of this state in mpls_route_add to determine that this is a supported interface. Use the presence of mpls_dev to drop packets that arrived on an unsupported interface - previously they were allowed through. Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Robert Shearman <[email protected]> Reviewed-by: "Eric W. Biederman" <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22tcp: fix possible deadlock in tcp_send_fin()Eric Dumazet1-1/+19
Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in case a huge process is killed by OOM, and tcp_mem[2] is hit. To be able to free memory we need to make progress, so this patch allows FIN packets to not care about tcp_mem[2], if skb allocation succeeded. In a follow-up patch, we might abort tcp_send_fin() infinite loop in case TIF_MEMDIE is set on this thread, as memory allocator did its best getting extra memory already. This patch reverts d22e15371811 ("tcp: fix tcp fin memory accounting") Fixes: d22e15371811 ("tcp: fix tcp fin memory accounting") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-22crush: straw2 bucket type with an efficient 64-bit crush_ln()Ilya Dryomov4-0/+306
This is an improved straw bucket that correctly avoids any data movement between items A and B when neither A nor B's weights are changed. Said differently, if we adjust the weight of item C (including adding it anew or removing it completely), we will only see inputs move to or from C, never between other items in the bucket. Notably, there is not intermediate scaling factor that needs to be calculated. The mapping function is a simple function of the item weights. The below commits were squashed together into this one (mostly to avoid adding and then yanking a ~6000 lines worth of crush_ln_table): - crush: add a straw2 bucket type - crush: add crush_ln to calculate nature log efficently - crush: improve straw2 adjustment slightly - crush: change crush_ln to provide 32 more digits - crush: fix crush_get_bucket_item_weight and bucket destroy for straw2 - crush/mapper: fix divide-by-0 in straw2 (with div64_s64() for draw = ln / w and INT64_MIN -> S64_MIN - need to create a proper compat.h in ceph.git) Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53, 32a1ead92efcd351822d22a5fc37d159c65c1338, 6289912418c4a3597a11778bcf29ed5415117ad9, 35fcb04e2945717cf5cfe150b9fa89cb3d2303a1, 6445d9ee7290938de1e4ee9563912a6ab6d8ee5f, b5921d55d16796e12d66ad2c4add7305f9ce2353. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-22crush: ensuring at most num-rep osds are selectedIlya Dryomov1-4/+12
Crush temporary buffers are allocated as per replica size configured by the user. When there are more final osds (to be selected as per rule) than the replicas, buffer overlaps and it causes crash. Now, it ensures that at most num-rep osds are selected even if more number of osds are allowed by the rule. Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0, 234b066ba04976783d15ff2abc3e81b6cc06fb10. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-22crush: drop unnecessary include from mapper.cIlya Dryomov1-1/+0
Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-1/+8
Pull networking fixes from David Miller: "Just a few fixes trickling in at this point. 1) If we see an attached socket on an skb in the ipv4 forwarding path, bail. This can happen due to races with FIB rule addition, and deletion, and we should just drop such frames. From Sebastian Pöhn. 2) pppoe receive should only accept packets destined for this hosts's MAC address. From Joakim Tjernlund. 3) Handle checksum unwrapping properly in ppp receive properly when it's encapsulated in UDP in some way, fix from Tom Herbert. 4) Fix some bugs in mv88e6xxx DSA driver resulting from the conversion from register offset constants to mnenomic macros. From Vivien Didelot. 5) Fix handling of HCA max message size in mlx4 adapters, from Eran Ben ELisha" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP tcp: add memory barriers to write space paths altera tse: Error-Bit on tx-avalon-stream always set. net: dsa: mv88e6xxx: use PORT_DEFAULT_VLAN net: dsa: mv88e6xxx: fix setup of port control 1 ppp: call skb_checksum_complete_unset in ppp_receive_frame net: add skb_checksum_complete_unset pppoe: Lacks DST MAC address check ip_forward: Drop frames with attached skb->sk
2015-04-21tcp: add memory barriers to write space paths[email protected]2-1/+5
Ensure that we either see that the buffer has write space in tcp_poll() or that we perform a wakeup from the input side. Did not run into any actual problem here, but thought that we should make things explicit. Signed-off-by: Jason Baron <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-20ip_forward: Drop frames with attached skb->skSebastian Pöhn1-0/+3
Initial discussion was: [FYI] xfrm: Don't lookup sk_policy for timewait sockets Forwarded frames should not have a socket attached. Especially tw sockets will lead to panics later-on in the stack. This was observed with TPROXY assigning a tw socket and broken policy routing (misconfigured). As a result frame enters forwarding path instead of input. We cannot solve this in TPROXY as it cannot know that policy routing is broken. v2: Remove useless comment Signed-off-by: Sebastian Poehn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-20libceph: expose client options through debugfsIlya Dryomov1-0/+24
Add a client_options attribute for showing libceph options. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph, ceph: split ceph_show_options()Ilya Dryomov1-0/+37
Split ceph_show_options() into two pieces and move the piece responsible for printing client (libceph) options into net/ceph. This way people adding a libceph option wouldn't have to remember to update code in fs/ceph. Signed-off-by: Ilya Dryomov <[email protected]>
2015-04-20libceph: don't overwrite specific con error msgsIlya Dryomov1-12/+13
- specific con->error_msg messages (e.g. "protocol version mismatch") end up getting overwritten by a catch-all "socket error on read / write", introduced in commit 3a140a0d5c4b ("libceph: report socket read/write error message") - "bad message sequence # for incoming message" loses to "bad crc" due to the fact that -EBADMSG is used for both Fix it, and tidy up con->error_msg assignments and pr_errs while at it. Signed-off-by: Ilya Dryomov <[email protected]>