aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2010-04-26phonet: use phonet_pernet instead of directly net_genericJiri Pirko1-8/+15
As in for example pppoe introduce phonet_pernet and use it instead of calling net_generic directly. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-25netns: rename unregister_pernet_subsys parameterJiri Pirko1-2/+2
Stay consistent with other functions and with comment also and name pernet_operations parameter properly. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-24rps: optimize rps_get_cpu()Changli Gao1-13/+11
optimize rps_get_cpu(). don't initialize ports when we can get the ports. one memory access for ports than two. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23Merge branch 'net-next-2.6_20100423a/br/br_multicast_v3' of ↵David S. Miller4-156/+619
git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next
2010-04-23IPv6: Complete IPV6_DONTFRAG supportBrian Haley5-8/+112
Finally add support to detect a local IPV6_DONTFRAG event and return the relevant data to the user if they've enabled IPV6_RECVPATHMTU on the socket. The next recvmsg() will return no data, but have an IPV6_PATHMTU as ancillary data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23IPv6: Add dontfrag argument to relevant functionsBrian Haley7-10/+42
Add dontfrag argument to relevant functions for IPV6_DONTFRAG support, as well as allowing the value to be passed-in via ancillary cmsg data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23IPv6: data structure changes for new socket optionsBrian Haley1-0/+46
Add underlying data structure changes and basic setsockopt() and getsockopt() support for IPV6_RECVPATHMTU, IPV6_PATHMTU, and IPV6_DONTFRAG. IPV6_PATHMTU is actually fully functional at this point. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23l2tp_eth: fix memory allocationJiri Pirko1-28/+1
Since .size is set properly in "struct pernet_operations l2tp_eth_net_ops", allocating space for "struct l2tp_eth_net" by hand is not correct, even causes memory leakage. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23l2tp: fix memory allocationJiri Pirko1-28/+1
Since .size is set properly in "struct pernet_operations l2tp_net_ops", allocating space for "struct l2tp_net" by hand is not correct, even causes memory leakage. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23Merge branch 'master' into for-davemJohn W. Linville9-15/+68
Conflicts: drivers/net/wireless/ath/ath9k/phy.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-debugfs.c
2010-04-23bridge br_multicast: IPv6 MLD support.YOSHIFUJI Hideaki3-4/+429
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23bridge br_multicast: Make functions less ipv4 dependent.YOSHIFUJI Hideaki2-58/+151
Introduce struct br_ip{} to store ip address and protocol and make functions more generic so that we can support both IPv4 and IPv6 with less pain. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23ipv6 mcast: Introduce include/net/mld.h for MLD definitions.YOSHIFUJI Hideaki1-95/+40
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-22X25: Add if_x25.h and x25 to device identifiersAndrew Hendry1-16/+20
V2 Feedback from John Hughes. - Add header for userspace implementations such as xot/xoe to use - Use explicit values for interface stability - No changes to driver patches V1 - Use identifiers instead of magic numbers for X25 layer 3 to device interface. - Also fixed checkpatch notes on updated code. [ Add new user header to include/linux/Kbuild -DaveM ] Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22net: Socket filter ancilliary data access for skb->dev->typePaul LeoNerd Evans1-0/+7
Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving access to skb->dev->type, as reported in the sll_hatype field. When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all interfaces, there doesn't appear to be a way for the filter program to actually find out the underlying hardware type the packet was captured on. This patch adds such ability. This patch also handles the case where skb->dev can be NULL, such as on netlink sockets. Signed-off-by: Paul Evans <leonerd@leonerd.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22tcp: fix outsegs stat for TSO segmentsTom Herbert1-2/+3
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter. Without doing this, the counter can be off by orders of magnitude from the actual number of segments sent. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22IPv6: Generic TTL Security Mechanism (final version)Stephen Hemminger2-1/+25
This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism. Not to users of mapped address; the IPV6 and IPV4 socket options are seperate. The server does have to deal with both IPv4 and IPv6 socket options and the client has to handle the different for each family. On client: int ttl = 255; getaddrinfo(argv[1], argv[2], &hint, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET) { setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)); } else if (rp->ai_family == AF_INET6) { setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS, &ttl, sizeof(ttl))) } if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) { ... On server: int minttl = 255 - maxhops; getaddrinfo(NULL, port, &hints, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET6) setsockopt(s, IPPROTO_IPV6, IPV6_MINHOPCOUNT, &minttl, sizeof(minttl)); setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl)); if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0) break ... Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22net: Orphan and de-dst skbs earlier in xmit path.David S. Miller1-7/+8
This way GSO packets don't get handled differently. With help from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-04-22rps: immediate send IPI in process_backlog()Eric Dumazet1-34/+42
If some skb are queued to our backlog, we are delaying IPI sending at the end of net_rx_action(), increasing latencies. This defeats the queueing, since we want to quickly dispatch packets to the pool of worker cpus, then eventually deeply process our packets. It's better to send IPI before processing our packets in upper layers, from process_backlog(). Change the _and_disable_irq suffix to _and_enable_irq(), since we enable local irq in net_rps_action(), sorry for the confusion. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21ethernet: print protocol in host byte orderJohannes Berg1-1/+1
Eric's recent patch added __force, but this place would seem to require actually doing a byte order conversion so the printk is consistent across architectures. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21net: Introduce skb_orphan_try()Eric Dumazet1-1/+0
At this point, skb->destructor is not the original one (stored in DEV_GSO_CB(skb)->destructor) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21fasync: RCU and fine grained lockingEric Dumazet1-62/+11
kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21tcp: Mark v6 response packets as CHECKSUM_PARTIALDavid S. Miller1-0/+3
Otherwise we only get the checksum right for data-less TCP responses. Noticed by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21tcp: Fix ipv6 checksumming on response packets for real.David S. Miller1-2/+0
Commit 6651ffc8e8bdd5fb4b7d1867c6cfebb4f309512c ("ipv6: Fix tcp_v6_send_response transport header setting.") fixed one half of why ipv6 tcp response checksums were invalid, but it's not the whole story. If we're going to use CHECKSUM_PARTIAL for these things (which we are since commit 2e8e18ef52e7dd1af0a3bd1f7d990a1d0b249586 "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"), we can't be setting buff->csum as we always have been here in tcp_v6_send_response. We need to leave it at zero. Kill that line and checksums are good again. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21Merge branch 'master' of ↵David S. Miller10-11/+19
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-6000.c net/core/dev.c
2010-04-21net: Fix an RCU warning in dev_pick_tx()David Howells1-1/+1
Fix the following RCU warning in dev_pick_tx(): =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/core/dev.c:1993 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by swapper/0: #0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff81039e65>] run_timer_softirq+0x17b/0x278 #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff812ea3eb>] dev_queue_xmit+0x14e/0x4dc stack backtrace: Pid: 0, comm: swapper Not tainted 2.6.34-rc5-cachefs #4 Call Trace: <IRQ> [<ffffffff810516c4>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffff812ea4f6>] dev_queue_xmit+0x259/0x4dc [<ffffffff812ea3eb>] ? dev_queue_xmit+0x14e/0x4dc [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81035362>] ? local_bh_enable_ip+0xbc/0xc1 [<ffffffff812f0954>] neigh_resolve_output+0x24b/0x27c [<ffffffff8134f673>] ip6_output_finish+0x7c/0xb4 [<ffffffff81350c34>] ip6_output2+0x256/0x261 [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf [<ffffffff813517fb>] ip6_output+0xbbc/0xbcb [<ffffffff8135bc5d>] ? fib6_force_start_gc+0x2b/0x2d [<ffffffff81368acb>] mld_sendpack+0x273/0x39d [<ffffffff81368858>] ? mld_sendpack+0x0/0x39d [<ffffffff81052099>] ? mark_held_locks+0x52/0x70 [<ffffffff813692fc>] mld_ifc_timer_expire+0x24f/0x288 [<ffffffff81039ed6>] run_timer_softirq+0x1ec/0x278 [<ffffffff81039e65>] ? run_timer_softirq+0x17b/0x278 [<ffffffff813690ad>] ? mld_ifc_timer_expire+0x0/0x288 [<ffffffff81035531>] ? __do_softirq+0x69/0x140 [<ffffffff8103556a>] __do_softirq+0xa2/0x140 [<ffffffff81002e0c>] call_softirq+0x1c/0x28 [<ffffffff81004b54>] do_softirq+0x38/0x80 [<ffffffff81034f06>] irq_exit+0x45/0x47 [<ffffffff810177c3>] smp_apic_timer_interrupt+0x88/0x96 [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff810488dd>] ? __atomic_notifier_call_chain+0x0/0x86 [<ffffffff810096bf>] ? mwait_idle+0x6e/0x78 [<ffffffff810096b6>] ? mwait_idle+0x65/0x78 [<ffffffff810011cb>] cpu_idle+0x4d/0x83 [<ffffffff81380b05>] rest_init+0xb9/0xc0 [<ffffffff81380a4c>] ? rest_init+0x0/0xc0 [<ffffffff8168dcf0>] start_kernel+0x392/0x39d [<ffffffff8168d2a3>] x86_64_start_reservations+0xb3/0xb7 [<ffffffff8168d38b>] x86_64_start_kernel+0xe4/0xeb An rcu_dereference() should be an rcu_dereference_bh(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller1-1/+4
2010-04-21ipv6: Fix tcp_v6_send_response transport header setting.Herbert Xu1-1/+1
My recent patch to remove the open-coded checksum sequence in tcp_v6_send_response broke it as we did not set the transport header pointer on the new packet. Actually, there is code there trying to set the transport header properly, but it sets it for the wrong skb ('skb' instead of 'buff'). This bug was introduced by commit a8fdf2b331b38d61fb5f11f3aec4a4f9fb2dedcb ("ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20net: Remove two unnecessary exports (skbuff).Rami Rosen1-4/+2
There is no need to export skb_under_panic() and skb_over_panic() in skbuff.c, since these methods are used only in skbuff.c ; this patch removes these two exports. It also marks these functions as 'static' and removeS the extern declarations of them from include/linux/skbuff.h Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20net: Fix various endianness glitchesEric Dumazet17-61/+65
Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20bridge: add a missing ntohs()Eric Dumazet1-1/+1
grec_nsrcs is in network order, we should convert to host horder in br_multicast_igmp3_report() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20Merge branch 'master' of ↵David S. Miller2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-04-20net: sk_sleep() helperEric Dumazet40-191/+191
Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20mac80211: Fix ieee80211_sta_conn_mon_timer with hw connection monitoringJuuso Oikarinen1-0/+5
When IEEE80211_HW_CONNECTION_MONITOR is configured by the driver, starting of ieee80211_sta_conn_mon_timer should be prevented, as it is then not needed. This is currently partially the case. As it seems, when a probe-response is received from the AP the timer is still restarted, thus restarting the host based connection keep-alive mechanism. These probe-responses happen at least when scanning while associated. Fix this by preventing starting of the ieee80211_sta_conn_mon_timer in the ieee80211_rx_mgmt_probe_resp function. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20mac80211: sample survey implementation for mac80211 & hwsimHolger Schurig2-0/+21
This adds the survey function to both mac80211 itself and to mac80211_hwsim. For the latter driver, we simply invent some noise level.A real driver which cannot determine the real channel noise MUST NOT report any noise, especially not a magically conjured one :-) Signed-off-by: Holger Schurig <holgerschurig@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20ath9k: Group Key fix for VAPsDaniel Yingqiang Ma1-0/+1
When I set up multiple VAPs with ath9k, I encountered an issue that the traffic may be lost after a while. The detailed phenomenon is 1. After a while the clients connected to one of these VAPs will get into a state that no broadcast/multicast packets can be transfered successfully while the unicast packets can be transfered normally. 2. Minutes latter the unitcast packets transfer will fail as well, because the ARP entry is expired and it can't be freshed due to the broadcast trouble. It's caused by the group key overwritten and someone discussed this issue in ath9k-devel maillist before, but haven't work out a fix yet. I referred the method in madwifi, and made a patch for ath9k. The method is to set the high bit of the sender(AP)'s address, and associated that mac and the group key. It requires the hardware supports multicast frame key search. It seems true for AR9160. Not sure whether it's the correct way to fix this issue. But it seems to work in my test. The patch is attached, feel free to revise it. Signed-off-by: Daniel Yingqiang ma <yma.cool@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20net: emphasize rtnl lock required in call_netdevice_notifiersJiri Pirko1-0/+1
Since netdev_chain is guarded by rtnl_lock, ASSERT_RTNL should be present here to make sure that all callers of call_netdevice_notifiers does the locking properly. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20rps: consistent rxhashEric Dumazet1-7/+18
In case we compute a software skb->rxhash, we can generate a consistent hash : Its value will be the same in both flow directions. This helps some workloads, like conntracking, since the same state needs to be accessed in both directions. tbench + RFS + this patch gives better results than tbench with default kernel configuration (no RPS, no RFS) Also fixed some sparse warnings. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20rps: cleanupsEric Dumazet1-69/+80
struct softnet_data holds many queues, so consistent use "sd" name instead of "queue" is better. Adds a rps_ipi_queued() helper to cleanup enqueue_to_backlog() Adds a _and_irq_disable suffix to net_rps_action() name, as David suggested. incr_input_queue_head() becomes input_queue_head_incr() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19rps: static functionsEric Dumazet1-2/+2
store_rps_map() & store_rps_dev_flow_table_cnt() are static. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19mac80211: add missing newlineJohannes Berg1-1/+1
One HT debugging printk is missing a newline, add it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19mac80211: Prevent running sta_cleanup timer unnecessarilyJuuso Oikarinen2-3/+17
The sta_cleanup timer is used to periodically expire buffered frames from the tx buf. The timer is executing periodically, regardless of the need for it. This is wasting resources. Fix this simply by not restarting the sta_cleanup timer if the tx buffer was empty. Restart the timer when there is some more tx-traffic. Cc: Janne Ylälehto <janne.ylalehto@nokia.com> Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19mac80211: fix stopping RX BA session from timerJohannes Berg1-5/+13
Kalle reported that his system deadlocks since my recent work in this area. The reason quickly became apparent: we try to cancel_timer_sync() a timer from within itself. Fix that by making the function aware of the context it is called from. Reported-by: Kalle Valo <kvalo@adurom.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kalle Valo <kvalo@adurom.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19mac80211: pass HT changes to driver when off channelReinette Chatre1-0/+2
Since "mac80211: make off-channel work generic" drivers have not been notified of configuration changes after association or authentication. This caused more dependence on current state to ensure driver will be notified when configuration changes occur. One such problem arises if off-channel is in progress when HT information changes. Since HT is only enabled on the "oper_channel" the driver will never be notified of this change. Usually the driver is notified soon after of a BSS information change (BSS_CHANGED_HT) ... but since the driver did not get a notification that this is a HT channel the new BSS information does not make sense. Fix this by also changing the off-channel information when HT is enabled and thus cause driver to be notified correctly. This fixes a problem in 4965 when associated with 5GHz 40MHz channel. Without this patch the system can associate but is unable to transfer any data, not even ping. See http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2158 Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19mac80211: remove bogus TX agg state assignmentJohannes Berg1-1/+0
When the addba timer expires but has no work to do, it should not affect the state machine. If it does, TX will not see the successfully established and we can also crash trying to re-establish the session. Cc: stable@kernel.org [2.6.32, 2.6.33] Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19rps: shortcut net_rps_action()Eric Dumazet1-47/+32
net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if RPS is not active. Tom Herbert used two bitmasks to hold information needed to send IPI, but a single LIFO list seems more appropriate. Move all RPS logic into net_rps_action() to cleanup net_rx_action() code (remove two ifdefs) Move rps_remote_softirq_cpus into softnet_data to share its first cache line, filling an existing hole. In a future patch, we could call net_rps_action() from process_backlog() to make sure we send IPI before handling this cpu backlog. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds5-7/+11
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: gigaset: include cleanup cleanup packet : remove init_net restriction WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver. ip: Fix ip_dev_loopback_xmit() net: dev_pick_tx() fix fib: suppress lockdep-RCU false positive in FIB trie. tun: orphan an skb on tx forcedeth: fix tx limit2 flag check iwlwifi: work around bogus active chains detection
2010-04-18net: Introduce skb_orphan_try()Eric Dumazet1-14/+13
Transmitted skb might be attached to a socket and a destructor, for memory accounting purposes. Traditionally, this destructor is called at tx completion time, when skb is freed. When tx completion is performed by another cpu than the sender, this forces some cache lines to change ownership. XPS was an attempt to give tx completion to initial cpu. David idea is to call destructor right before giving skb to device (call to ndo_start_xmit()). Because device queues are usually small, orphaning skb before tx completion is not a big deal. Some drivers already do this, we could do it in upper level. There is one known exception to this early orphaning, called tx timestamping. It needs to keep a reference to socket until device can give a hardware or software timestamp. This patch adds a skb_orphan_try() helper, to centralize all exceptions to early orphaning in one spot, and use it in dev_hard_start_xmit(). "tbench 16" results on a Nehalem machine (2 X5570 @ 2.93GHz) before: Throughput 4428.9 MB/sec 16 procs after: Throughput 4448.14 MB/sec 16 procs UDP should get even better results, its destructor being more complex, since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18net: remove time limit in process_backlog()Eric Dumazet1-3/+2
- There is no point to enforce a time limit in process_backlog(), since other napi instances dont follow same rule. We can exit after only one packet processed... The normal quota of 64 packets per napi instance should be the norm, and net_rx_action() already has its own time limit. Note : /proc/net/core/dev_weight can be used to tune this 64 default value. - Use DEFINE_PER_CPU_ALIGNED for softnet_data definition. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-17rps: rps_sock_flow_table is mostly readEric Dumazet1-1/+1
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>