aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2013-11-19quota/genetlink: use proper genetlink multicast APIsJohannes Berg1-2/+8
The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19drop_monitor/genetlink: use proper genetlink multicast APIsJohannes Berg2-4/+22
The drop monitor code is abusing the genetlink API and is statically using the generic netlink multicast group 1, even if that group belongs to somebody else (which it invariably will, since it's not reserved.) Make the drop monitor code use the proper APIs to reserve a group ID, but also reserve the group id 1 in generic netlink code to preserve the userspace API. Since drop monitor can be a module, don't clear the bit for it on unregistration. Acked-by: Neil Horman <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19genetlink: only pass array to genl_register_family_with_ops()Johannes Berg16-39/+31
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19tcp: don't update snd_nxt, when a socket is switched from repair modeAndrey Vagin1-1/+0
snd_nxt must be updated synchronously with sk_send_head. Otherwise tp->packets_out may be updated incorrectly, what may bring a kernel panic. Here is a kernel panic from my host. [ 103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150 ... [ 146.301158] Call Trace: [ 146.301158] [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0 Before this panic a tcp socket was restored. This socket had sent and unsent data in the write queue. Sent data was restored in repair mode, then the socket was switched from reapair mode and unsent data was restored. After that the socket was switched back into repair mode. In that moment we had a socket where write queue looks like this: snd_una snd_nxt write_seq |_________|________| | sk_send_head After a second switching from repair mode the state of socket was changed: snd_una snd_nxt, write_seq |_________ ________| | sk_send_head This state is inconsistent, because snd_nxt and sk_send_head are not synchronized. Bellow you can find a call trace, how packets_out can be incremented twice for one skb, if snd_nxt and sk_send_head are not synchronized. In this case packets_out will be always positive, even when sk_write_queue is empty. tcp_write_wakeup skb = tcp_send_head(sk); tcp_fragment if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq)) tcp_adjust_pcount(sk, skb, diff); tcp_event_new_data_sent tp->packets_out += tcp_skb_pcount(skb); I think update of snd_nxt isn't required, when a socket is switched from repair mode. Because it's initialized in tcp_connect_init. Then when a write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent, so it's always is in consistent state. I have checked, that the bug is not reproduced with this patch and all tests about restoring tcp connections work fine. Cc: Pavel Emelyanov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: James Morris <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Andrey Vagin <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19xfrm: Release dst if this dst is improper for vti tunnelfan.du1-0/+1
After searching rt by the vti tunnel dst/src parameter, if this rt has neither attached to any transformation nor the transformation is not tunnel oriented, this rt should be released back to ip layer. otherwise causing dst memory leakage. Signed-off-by: Fan Du <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19netlink: fix documentation typo in netlink_set_err()Johannes Berg1-1/+1
The parameter is just 'group', not 'groups', fix the documentation typo. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-19netfilter: ebt_ip6: fix source and destination matchingLuís Fernando Cornachioni Estrozi1-3/+5
This bug was introduced on commit 0898f99a2. This just recovers two checks that existed before as suggested by Bart De Schuymer. Signed-off-by: Luís Fernando Cornachioni Estrozi <[email protected]> Signed-off-by: Bart De Schuymer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-18ping: prevent NULL pointer dereference on write to msg_nameHannes Frederic Sowa1-15/+19
A plain read() on a socket does set msg->msg_name to NULL. So check for NULL pointer first. Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-18ipv6: Fix inet6_init() cleanup orderVlad Yasevich1-2/+2
Commit 6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67 net: ipv6: Add IPv6 support to the ping socket introduced a change in the cleanup logic of inet6_init and has a bug in that ipv6_packet_cleanup() may not be called. Fix the cleanup ordering. CC: Hannes Frederic Sowa <[email protected]> CC: Lorenzo Colitti <[email protected]> CC: Fabio Estevam <[email protected]> Signed-off-by: Vlad Yasevich <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-18genetlink: rename shadowed variableJohannes Berg1-5/+5
Sparse pointed out that the new flags variable I had added shadowed an existing one, rename the new one to avoid that, making the code clearer. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-18inet: prevent leakage of uninitialized memory to user in recv syscallsHannes Frederic Sowa8-38/+17
Only update *addr_len when we actually fill in sockaddr, otherwise we can return uninitialized memory from the stack to the caller in the recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL) checks because we only get called with a valid addr_len pointer either from sock_common_recvmsg or inet_recvmsg. If a blocking read waits on a socket which is concurrently shut down we now return zero and set msg_msgnamelen to 0. Reported-by: mpb <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-18net: ipv6: ndisc: Fix warning when CONFIG_SYSCTL=nFabio Estevam1-1/+1
When CONFIG_SYSCTL=n the following build warning happens: net/ipv6/ndisc.c:1730:1: warning: label 'out' defined but not used [-Wunused-label] The 'out' label is only used when CONFIG_SYSCTL=y, so move it inside the 'ifdef CONFIG_SYSCTL' block. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-18netfilter: nf_conntrack: decrement global counter after object releasePablo Neira Ayuso1-1/+2
nf_conntrack_free() decrements our counter (net->ct.count) before releasing the conntrack object. That counter is used in the nf_conntrack_cleanup_net_list path to check if it's time to kmem_cache_destroy our cache of conntrack objects. I think we have a race there that should be easier to trigger (although still hard) with CONFIG_DEBUG_OBJECTS_FREE as object releases become slowier according to the following splat: [ 1136.321305] WARNING: CPU: 2 PID: 2483 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0() [ 1136.321311] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 ... [ 1136.321390] Call Trace: [ 1136.321398] [<ffffffff8160d4a2>] dump_stack+0x45/0x56 [ 1136.321405] [<ffffffff810514e8>] warn_slowpath_common+0x78/0xa0 [ 1136.321410] [<ffffffff81051557>] warn_slowpath_fmt+0x47/0x50 [ 1136.321414] [<ffffffff812f8883>] debug_print_object+0x83/0xa0 [ 1136.321420] [<ffffffff8106aa90>] ? execute_in_process_context+0x90/0x90 [ 1136.321424] [<ffffffff812f99fb>] debug_check_no_obj_freed+0x20b/0x250 [ 1136.321429] [<ffffffff8112e7f2>] ? kmem_cache_destroy+0x92/0x100 [ 1136.321433] [<ffffffff8115d945>] kmem_cache_free+0x125/0x210 [ 1136.321436] [<ffffffff8112e7f2>] kmem_cache_destroy+0x92/0x100 [ 1136.321443] [<ffffffffa046b806>] nf_conntrack_cleanup_net_list+0x126/0x160 [nf_conntrack] [ 1136.321449] [<ffffffffa046c43d>] nf_conntrack_pernet_exit+0x6d/0x80 [nf_conntrack] [ 1136.321453] [<ffffffff81511cc3>] ops_exit_list.isra.3+0x53/0x60 [ 1136.321457] [<ffffffff815124f0>] cleanup_net+0x100/0x1b0 [ 1136.321460] [<ffffffff8106b31e>] process_one_work+0x18e/0x430 [ 1136.321463] [<ffffffff8106bf49>] worker_thread+0x119/0x390 [ 1136.321467] [<ffffffff8106be30>] ? manage_workers.isra.23+0x2a0/0x2a0 [ 1136.321470] [<ffffffff8107210b>] kthread+0xbb/0xc0 [ 1136.321472] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110 [ 1136.321477] [<ffffffff8161b8fc>] ret_from_fork+0x7c/0xb0 [ 1136.321479] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110 [ 1136.321481] ---[ end trace 25f53c192da70825 ]--- Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-18netfilter: nft_compat: fix error path in nft_parse_compat()Pablo Neira Ayuso1-6/+13
The patch 0ca743a55991: "netfilter: nf_tables: add compatibility layer for x_tables", leads to the following Smatch warning: "net/netfilter/nft_compat.c:140 nft_parse_compat() warn: signedness bug returning '(-34)'" This nft_parse_compat function returns error codes but the return type is u8 so the error codes are transformed into small positive values. The callers don't check the return. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-18netfilter: fix wrong byte order in nf_ct_seqadj_set internal informationPhil Oester1-2/+2
In commit 41d73ec053d2, sequence number adjustments were moved to a separate file. Unfortunately, the sequence numbers that are stored in the nf_ct_seqadj structure are expressed in host byte order. The necessary ntohl call was removed when the call to adjust_tcp_sequence was collapsed into nf_ct_seqadj_set. This broke the FTP NAT helper. Fix it by adding back the byte order conversions. Reported-by: Dawid Stawiarski <[email protected]> Signed-off-by: Phil Oester <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-18netfilter: synproxy: correct wscale option passingMartin Topholm1-3/+4
Timestamp are used to store additional syncookie parameters such as sack, ecn, and wscale. The wscale value we need to encode is the client's wscale, since we can't recover that later in the session. Next overwrite the wscale option so the later synproxy_send_client_synack will send the backend's wscale to the client. Signed-off-by: Martin Topholm <[email protected]> Reviewed-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-18netfilter: synproxy: send mss option to backendMartin Topholm2-0/+2
When the synproxy_parse_options is called on the client ack the mss option will not be present. Consequently mss wont be included in the backend syn packet, which falls back to 536 bytes mss. Therefore XT_SYNPROXY_OPT_MSS is explicitly flagged when recovering mss value from cookie. Signed-off-by: Martin Topholm <[email protected]> Reviewed-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2013-11-16Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2-19/+38
Pull NFS client bugfixes: - Stable fix for data corruption when retransmitting O_DIRECT writes - Stable fix for a deep recursion/stack overflow bug in rpc_release_client - Stable fix for infinite looping when mounting a NFSv4.x volume - Fix a typo in the nfs mount option parser - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is * tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix pnfs Kconfig defaults NFS: correctly report misuse of "migration" mount option. nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once SUNRPC: Avoid deep recursion in rpc_release_client SUNRPC: Fix a data corruption issue when retransmitting RPC calls
2013-11-16Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds6-28/+28
Pull nfsd changes from Bruce Fields: "This includes miscellaneous bugfixes and cleanup and a performance fix for write-heavy NFSv4 workloads. (The most significant nfsd-relevant change this time is actually in the delegation patches that went through Viro, fixing a long-standing bug that can cause NFSv4 clients to miss updates made by non-nfs users of the filesystem. Those enable some followup nfsd patches which I have queued locally, but those can wait till 3.14)" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (24 commits) nfsd: export proper maximum file size to the client nfsd4: improve write performance with better sendspace reservations svcrpc: remove an unnecessary assignment sunrpc: comment typo fix Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation" nfsd4: fix discarded security labels on setattr NFSD: Add support for NFS v4.2 operation checking nfsd4: nfsd_shutdown_net needs state lock NFSD: Combine decode operations for v4 and v4.1 nfsd: -EINVAL on invalid anonuid/gid instead of silent failure nfsd: return better errors to exportfs nfsd: fh_update should error out in unexpected cases nfsd4: need to destroy revoked delegations in destroy_client nfsd: no need to unhash_stid before free nfsd: remove_stid can be incorporated into nfs4_put_delegation nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid nfsd: nfs4_free_stid nfsd: fix Kconfig syntax sunrpc: trim off EC bytes in GSSAPI v2 unwrap gss_krb5: document that we ignore sequence number ...
2013-11-15consolidate simple ->d_delete() instancesAl Viro1-10/+1
Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: Al Viro <[email protected]>
2013-11-15pkt_sched: fq: fix pacing for small framesEric Dumazet1-4/+18
For performance reasons, sch_fq tried hard to not setup timers for every sent packet, using a quantum based heuristic : A delay is setup only if the flow exhausted its credit. Problem is that application limited flows can refill their credit for every queued packet, and they can evade pacing. This problem can also be triggered when TCP flows use small MSS values, as TSO auto sizing builds packets that are smaller than the default fq quantum (3028 bytes) This patch adds a 40 ms delay to guard flow credit refill. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <[email protected]> Cc: Maciej Żenczykowski <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Yuchung Cheng <[email protected]> Cc: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15pkt_sched: fq: warn users using defrateEric Dumazet1-6/+4
Commit 7eec4174ff29 ("pkt_sched: fq: fix non TCP flows pacing") obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notice for the users. Suggested by David Miller Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15genetlink: unify registration functionsJohannes Berg1-37/+16
Now that the ops assignment is just two variables rather than a long list iteration etc., there's no reason to separately export __genl_register_family() and __genl_register_family_with_ops(). Unify the two functions into __genl_register_family() and make genl_register_family_with_ops() call it after assigning the ops. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15Merge branch 'for-linus' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
2013-11-15macvlan: disable LRO on lower device instead of macvlanMichal Kubeček1-0/+5
A macvlan device has always LRO disabled so that calling dev_disable_lro() on it does nothing. If we need to disable LRO e.g. because - the macvlan device is inserted into a bridge - IPv6 forwarding is enabled for it - it is in a different namespace than lowerdev and IPv4 forwarding is enabled in it we need to disable LRO on its underlying device instead (as we do for 802.1q VLAN devices). v2: use newly introduced netif_is_macvlan() Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15Merge branch 'for-upstream' of ↵John W. Linville4-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
2013-11-156lowpan: Uncompression of traffic class field was incorrectJukka Rissanen1-2/+2
If priority/traffic class field in IPv6 header is set (seen when using ssh), the uncompression sets the TC and Flow fields incorrectly. Example: This is IPv6 header of a sent packet. Note the priority/TC (=1) in the first byte. 00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00 00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00 00000020: 02 1e ab ff fe 4c 52 57 This gets compressed like this in the sending side 00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16 00000010: aa 2d fe 92 86 4e be c6 .... In the receiving end, the packet gets uncompressed to this IPv6 header 00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00 00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00 00000020: ab ff fe 4c 52 57 ec c2 First four bytes are set incorrectly and we have also lost two bytes from destination address. The fix is to switch the case values in switch statement when checking the TC field. Signed-off-by: Jukka Rissanen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15tipc: fix dereference before check warningErik Hugne1-1/+2
This fixes the following Smatch warning: net/tipc/link.c:2364 tipc_link_recv_fragment() warn: variable dereferenced before check '*head' (see line 2361) A null pointer might be passed to skb_try_coalesce if a malicious sender injects orphan fragments on a link. Signed-off-by: Erik Hugne <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-15Merge tag 'virtio-next-for-linus' of ↵Linus Torvalds1-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "Nothing really exciting: some groundwork for changing virtio endian, and some robustness fixes for broken virtio devices, plus minor tweaks" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_scsi: verify if queue is broken after virtqueue_get_buf() x86, asmlinkage, lguest: Pass in globals into assembler statement virtio: mmio: fix signature checking for BE guests virtio_ring: adapt to notify() returning bool virtio_net: verify if queue is broken after virtqueue_get_buf() virtio_console: verify if queue is broken after virtqueue_get_buf() virtio_blk: verify if queue is broken after virtqueue_get_buf() virtio_ring: add new function virtqueue_is_broken() virtio_test: verify if virtqueue_kick() succeeded virtio_net: verify if virtqueue_kick() succeeded virtio_ring: let virtqueue_{kick()/notify()} return a bool virtio_ring: change host notification API virtio_config: remove virtio_config_val virtio: use size-based config accessors. virtio_config: introduce size-based accessors. virtio_ring: plug kmemleak false positive. virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
2013-11-15seq_file: remove "%n" usage from seq_file usersTetsuo Handa6-57/+52
All seq_printf() users are using "%n" for calculating padding size, convert them to use seq_setwidth() / seq_pad() pair. Signed-off-by: Tetsuo Handa <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: Joe Perches <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-11-14ipv4: fix possible seqlock deadlockEric Dumazet1-1/+1
ip4_datagram_connect() being called from process context, it should use IP_INC_STATS() instead of IP_INC_STATS_BH() otherwise we can deadlock on 32bit arches, or get corruptions of SNMP counters. Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dave Jones <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14net/hsr: Fix possible leak in 'hsr_get_node_status()'Geyslan G. Bem1-1/+1
If 'hsr_get_node_data()' returns error, going directly to 'fail' label doesn't free the memory pointed by 'skb_out'. Signed-off-by: Geyslan G. Bem <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14pkt_sched: fq: change classification of control packetsMaciej Żenczykowski1-7/+1
Initial sch_fq implementation copied code from pfifo_fast to classify a packet as a high prio packet. This clashes with setups using PRIO with say 7 bands, as one of the band could be incorrectly (mis)classified by FQ. Packets would be queued in the 'internal' queue, and no pacing ever happen for this special queue. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Maciej Żenczykowski <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14genetlink: make all genl_ops users constJohannes Berg14-18/+18
Now that genl_ops are no longer modified in place when registering, they can be made const. This patch was done mostly with spatch: @@ identifier ops; @@ +const struct genl_ops ops[] = { ... }; (except the struct thing in net/openvswitch/datapath.c) Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14genetlink: allow making ops constJohannes Berg2-19/+25
Allow making the ops array const by not modifying the ops flags on registration but rather only when ops are sent out in the family information. No users are updated yet except for the pre_doit/post_doit calls in wireless (the only ones that exist now.) Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14genetlink: register family ops as arrayJohannes Berg1-45/+33
Instead of using a linked list, use an array. This reduces the data size needed by the users of genetlink, for example in wireless (net/wireless/nl80211.c) on 64-bit it frees up over 1K of data space. Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns -EINVAL anyway, therefore no such event could ever be sent. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14genetlink: remove genl_register_ops/genl_unregister_opsJohannes Berg1-56/+1
genl_register_ops() is still needed for internal registration, but is no longer available to users of the API. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14wimax: use genl_register_family_with_ops()Johannes Berg6-119/+47
This simplifies the code since there's no longer a need to have error handling in the registration. Unfortunately it means more extern function declarations are needed, but the overall goal would seem to justify this. Due to the removal of duplication in the netlink policies, this reduces the size of wimax by almost 1k. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14ieee802154: use genl_register_family_with_ops()Johannes Berg4-94/+51
This simplifies the code since there's no longer a need to have error handling in the registration. Unfortunately it means more extern function declarations are needed, but the overall goal would seem to justify this. While at it, also fix the registration error path - if the family registration failed then it shouldn't be unregistered. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14hsr: use genl_register_family_with_ops()Johannes Berg1-29/+17
This simplifies the code since there's no longer a need to have error handling in the registration. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14ip6tnl: fix use after free of fb_tnl_devNicolas Dichtel1-5/+13
Bug has been introduced by commit bb8140947a24 ("ip6tnl: allow to use rtnl ops on fb tunnel"). When ip6_tunnel.ko is unloaded, FB device is delete by rtnl_link_unregister() and then we try to use the pointer in ip6_tnl_destroy_tunnels(). Let's add an handler for dellink, which will never remove the FB tunnel. With this patch it will no more be possible to remove it via 'ip link del ip6tnl0', but it's safer. The same fix was already proposed by Willem de Bruijn <[email protected]> for sit interfaces. CC: Willem de Bruijn <[email protected]> Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14sit/gre6: don't try to add the same route two timesNicolas Dichtel1-3/+0
addrconf_add_linklocal() already adds the link local route, so there is no reason to add it before calling this function. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14sit: link local routes are missingNicolas Dichtel1-19/+5
When a link local address was added to a sit interface, the corresponding route was not configured. This breaks routing protocols that use the link local address, like OSPFv3. To ease the code reading, I remove sit_route_add(), which only adds v4 mapped routes, and add this kind of route directly in sit_add_v4_addrs(). Thus link local and v4 mapped routes are configured in the same place. Reported-by: Li Hongjun <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14sit: fix prefix length of ll and v4mapped addressesNicolas Dichtel1-7/+4
When the local IPv4 endpoint is wilcard (0.0.0.0), the prefix length is correctly set, ie 64 if the address is a link local one or 96 if the address is a v4 mapped one. But when the local endpoint is specified, the prefix length is set to 128 for both kind of address. This patch fix this wrong prefix length. Signed-off-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14sit: fix use after free of fb_tunnel_devWillem de Bruijn1-4/+14
Bug: The fallback device is created in sit_init_net and assumed to be freed in sit_exit_net. First, it is dereferenced in that function, in sit_destroy_tunnels: struct net *net = dev_net(sitn->fb_tunnel_dev); Prior to this, rtnl_unlink_register has removed all devices that match rtnl_link_ops == sit_link_ops. Commit 205983c43700 added the line + sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; which cases the fallback device to match here and be freed before it is last dereferenced. Fix: This commit adds an explicit .delllink callback to sit_link_ops that skips deallocation at rtnl_unlink_register for the fallback device. This mechanism is comparable to the one in ip_tunnel. It also modifies sit_destroy_tunnels and its only caller sit_exit_net to avoid the offending dereference in the first place. That double lookup is more complicated than required. Test: The bug is only triggered when CONFIG_NET_NS is enabled. It causes a GPF only when CONFIG_DEBUG_SLAB is enabled. Verified that this bug exists at the mentioned commit, at davem-net HEAD and at 3.11.y HEAD. Verified that it went away after applying this patch. Fixes: 205983c43700 ("sit: allow to use rtnl ops on fb tunnel") Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14net: sctp: bug-fixing: retran_path not set properly after transports ↵Chang Xiangzhong1-2/+4
recovering (v3) When a transport recovers due to the new coming sack, SCTP should iterate all of its transport_list to locate the __two__ most recently used transport and set to active_path and retran_path respectively. The exising code does not find the two properly - In case of the following list: [most-recent] -> [2nd-most-recent] -> ... Both active_path and retran_path would be set to the 1st element. The bug happens when: 1) multi-homing 2) failure/partial_failure transport recovers Both active_path and retran_path would be set to the same most-recent one, in other words, retran_path would not take its role - an end user might not even notice this issue. Signed-off-by: Chang Xiangzhong <[email protected]> Acked-by: Vlad Yasevich <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14net-tcp: fix panic in tcp_fastopen_cache_set()Eric Dumazet1-1/+4
We had some reports of crashes using TCP fastopen, and Dave Jones gave a nice stack trace pointing to the error. Issue is that tcp_get_metrics() should not be called with a NULL dst Fixes: 1fe4c481ba637 ("net-tcp: Fast Open client - cookie cache") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dave Jones <[email protected]> Cc: Yuchung Cheng <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Tested-by: Dave Jones <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14tcp: tsq: restore minimal amount of queueingEric Dumazet2-7/+5
After commit c9eeec26e32e ("tcp: TSQ can use a dynamic limit"), several users reported throughput regressions, notably on mvneta and wifi adapters. 802.11 AMPDU requires a fair amount of queueing to be effective. This patch partially reverts the change done in tcp_write_xmit() so that the minimal amount is sysctl_tcp_limit_output_bytes. It also remove the use of this sysctl while building skb stored in write queue, as TSO autosizing does the right thing anyway. Users with well behaving NICS and correct qdisc (like sch_fq), can then lower the default sysctl_tcp_limit_output_bytes value from 128KB to 8KB. This new usage of sysctl_tcp_limit_output_bytes permits each driver authors to check how their driver performs when/if the value is set to a minimum of 4KB. Normally, line rate for a single TCP flow should be possible, but some drivers rely on timers to perform TX completion and too long TX completion delays prevent reaching full throughput. Fixes: c9eeec26e32e ("tcp: TSQ can use a dynamic limit") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Sujith Manoharan <[email protected]> Reported-by: Arnaud Ebalard <[email protected]> Tested-by: Sujith Manoharan <[email protected]> Cc: Felix Fietkau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14bridge: Fix memory leak when deleting bridge with vlan filtering enabledToshiaki Makita1-0/+1
We currently don't call br_vlan_flush() when deleting a bridge, which leads to memory leak if br->vlan_info is allocated. Steps to reproduce: while : do brctl addbr br0 bridge vlan add dev br0 vid 10 self brctl delbr br0 done We can observe the cache size of corresponding slab entry (as kmalloc-2048 in SLUB) is increased. kmemleak output: unreferenced object 0xffff8800b68a7000 (size 2048): comm "bridge", pid 2086, jiffies 4295774704 (age 47.656s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 48 9b 36 00 88 ff ff .........H.6.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815eb6ae>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8116a1ca>] kmem_cache_alloc_trace+0xca/0x220 [<ffffffffa03eddd6>] br_vlan_add+0x66/0xe0 [bridge] [<ffffffffa03e543c>] br_setlink+0x2dc/0x340 [bridge] [<ffffffff8150e481>] rtnl_bridge_setlink+0x101/0x200 [<ffffffff8150d9d9>] rtnetlink_rcv_msg+0x99/0x260 [<ffffffff81528679>] netlink_rcv_skb+0xa9/0xc0 [<ffffffff8150d938>] rtnetlink_rcv+0x28/0x30 [<ffffffff81527ccd>] netlink_unicast+0xdd/0x190 [<ffffffff8152807f>] netlink_sendmsg+0x2ff/0x740 [<ffffffff814e8368>] sock_sendmsg+0x88/0xc0 [<ffffffff814e8ac8>] ___sys_sendmsg.part.14+0x298/0x2b0 [<ffffffff814e91de>] __sys_sendmsg+0x4e/0x90 [<ffffffff814e922e>] SyS_sendmsg+0xe/0x10 [<ffffffff81601669>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-11-14bridge: Call vlan_vid_del for all vids at nbp_vlan_flushToshiaki Makita1-0/+4
We should call vlan_vid_del for all vids at nbp_vlan_flush to prevent vid_info->refcount from being leaked when detaching a bridge port. Signed-off-by: Toshiaki Makita <[email protected]> Signed-off-by: David S. Miller <[email protected]>