path: root/net
Age | Commit message | Author | Files | Lines
2010-10-21 | ethtool: Add support for vlan accleration. | Jesse Gross | 1 | -1/+2
Now that vlan acceleration is handled consistently regardless of usage, it is possible to enable and disable it at will. This adds support for Ethtool operations that change the offloading status for debugging purposes, similar to other forms of hardware acceleration. Signed-off-by: Jesse Gross <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-21 | vlan: Centralize handling of hardware acceleration. | Jesse Gross | 3 | -137/+44
Currently each driver that is capable of vlan hardware acceleration must be aware of the vlan groups that are configured and then pass the stripped tag to a specialized receive function. This is different from other types of hardware offload in that it places a significant amount of knowledge in the driver itself rather than keeping it in the networking core.

This makes vlan offloading function more similarly to other forms of offloading (such as checksum offloading or TSO) by doing the following:
* On receive, stripped vlans are passed directly to the network core, without attempting to check for vlan groups or reconstructing the header if no group exists.
* vlans are made less special by folding the logic into the main receive routines.
* On transmit, the device layer will add the vlan header in software if the hardware doesn't support it, instead of spreading that logic out in upper layers, such as bonding.

There are a number of advantages to this:
* Fixes all bugs with drivers incorrectly dropping vlan headers at once.
* Avoids having to disable VLAN acceleration when in promiscuous mode (good for bridging since it always puts devices in promiscuous mode).
* Keeps VLAN tag separate until given to ultimate consumer, which avoids needing to do header reconstruction as in tg3 unless absolutely necessary.
* Consolidates common code in core networking.

Signed-off-by: Jesse Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2010-10-21 | vlan: Avoid hash table lookup to find group. | Jesse Gross | 3 | -72/+11
A struct net_device always maps to zero or one vlan groups and we always know the device when we are looking up a group. We currently do a hash table lookup on the device to find the group but it is much simpler to just store a pointer. Signed-off-by: Jesse Gross <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-21 | vlan: Enable software emulation for vlan accleration. | Jesse Gross | 1 | -3/+33
Currently users of hardware vlan acceleration need to know whether the device supports it before generating packets. However, vlan acceleration will soon be available in a more flexible manner, so knowing ahead of time becomes much more difficult. This adds a software fallback path for vlan packets on devices without the necessary offloading support, similar to other types of hardware acceleration. Signed-off-by: Jesse Gross <[email protected]> Signed-off-by: David S. Miller <[email protected]>
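To make the idea of a software fallback concrete, here is a minimal user-space sketch of inserting an 802.1Q tag into a frame when the device cannot do it in hardware. It is not the kernel code from this commit: `toy_dev`, `toy_vlan_insert_tag` and `toy_xmit_prepare` are hypothetical names, and the frame is modeled as a flat byte buffer rather than an skb. The only facts relied on are the 802.1Q layout (a 4-byte tag with TPID 0x8100 placed after the two MAC addresses).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN    6       /* bytes in a MAC address */
#define TOY_VLAN_HLEN   4       /* bytes added by an 802.1Q tag */
#define TOY_ETH_P_8021Q 0x8100  /* 802.1Q tag protocol identifier */

/* Hypothetical stand-in for a device capability flag. */
struct toy_dev {
    int hw_vlan_tx;  /* nonzero if the hardware inserts the tag itself */
};

/*
 * Insert an 802.1Q tag into an untagged Ethernet frame held in buf.
 * len is the current frame length, cap is the buffer capacity.
 * Returns the new length, or 0 if the frame is too short or does not fit.
 */
size_t toy_vlan_insert_tag(uint8_t *buf, size_t len, size_t cap, uint16_t tci)
{
    const size_t off = 2 * TOY_ETH_ALEN;  /* tag follows dst and src MAC */

    if (len < off + 2 || len + TOY_VLAN_HLEN > cap)
        return 0;

    /* Shift EtherType and payload up to make room for the tag. */
    memmove(buf + off + TOY_VLAN_HLEN, buf + off, len - off);
    buf[off]     = (uint8_t)(TOY_ETH_P_8021Q >> 8);
    buf[off + 1] = (uint8_t)(TOY_ETH_P_8021Q & 0xff);
    buf[off + 2] = (uint8_t)(tci >> 8);
    buf[off + 3] = (uint8_t)(tci & 0xff);
    return len + TOY_VLAN_HLEN;
}

/* Transmit-path decision: tag in software only when hardware cannot. */
size_t toy_xmit_prepare(const struct toy_dev *dev, uint8_t *buf,
                        size_t len, size_t cap, uint16_t tci)
{
    if (dev->hw_vlan_tx)
        return len;  /* tag stays out of band for the hardware */
    return toy_vlan_insert_tag(buf, len, cap, tci);
}
```

With this shape, the code generating the packet does not need to know ahead of time whether the device supports the offload; the decision is made in one place at transmit time.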
2010-10-21 | vlan: Rename VLAN_GROUP_ARRAY_LEN to VLAN_N_VID. | Jesse Gross | 2 | -10/+10
VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs. Since vlan groups will soon be more of an implementation detail for vlan devices, rename the constant to be descriptive of its actual purpose. Signed-off-by: Jesse Gross <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-21 | ebtables: Allow filtering of hardware accelerated vlan frames. | Jesse Gross | 3 | -18/+34
An upcoming commit will allow packets with hardware vlan acceleration information to be passed through more parts of the network stack, including packets trunked through the bridge. This adds support for matching and filtering those packets through ebtables. Signed-off-by: Jesse Gross <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-21 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 | David S. Miller | 1 | -2/+2
2010-10-21 | secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not set | Eric Paris | 1 | -0/+2
When CONFIG_NF_CONNTRACK_SECMARK is not set we accidentally attempt to use the secmark field of struct nf_conn. The problem is that when that config isn't set, the field doesn't exist. Whoops. Wrap the incorrect usage in the config. Signed-off-by: Eric Paris <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-10-21 | secmark: export secctx, drop secmark in procfs | Eric Paris | 2 | -6/+50
The current secmark code exports a secmark= field which just indicates if there is special labeling on a packet or not. We drop this field as it isn't particularly useful and instead export a new field secctx= which is the actual human readable text label. Signed-off-by: Eric Paris <[email protected]> Acked-by: Patrick McHardy <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-10-21 | conntrack: export lsm context rather than internal secid via netlink | Eric Paris | 1 | -10/+36
The conntrack code can export the internal secid to userspace. These are dynamic, can change on lsm changes, and have no meaning in userspace. We should be sending lsm contexts to userspace instead. This patch sends the secctx (rather than secid) to userspace over the netlink socket. We use a new field CTA_SECCTX and stop using the old CTA_SECMARK field since it did not send particularly useful information. Signed-off-by: Eric Paris <[email protected]> Reviewed-by: Paul Moore <[email protected]> Acked-by: Patrick McHardy <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-10-21 | secmark: make secmark object handling generic | Eric Paris | 2 | -19/+17
Right now secmark has lots of direct selinux calls. Use all LSM calls and remove all SELinux specific knowledge. The only SELinux specific knowledge we leave is the mode. The only point is to make sure that other LSMs at least test this generic code before they assume it works. (They may also have to make changes if they do not represent labels as strings) Signed-off-by: Eric Paris <[email protected]> Acked-by: Paul Moore <[email protected]> Acked-by: Patrick McHardy <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-10-21 | secmark: do not return early if there was no error | Eric Paris | 1 | -1/+1
Commit 4a5a5c73 attempted to pass decent error messages back to userspace for netfilter errors. In xt_SECMARK.c however the patch screwed up and returned on 0 (aka no error) early and didn't finish setting up secmark. This results in a kernel BUG if you use SECMARK. Signed-off-by: Eric Paris <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: James Morris <[email protected]>
2010-10-20 | ceph: fix num_pages_free accounting in pagelist | Sage Weil | 1 | -0/+1
Decrement the free page counter when removing a page from the free_list. Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: don't crash when passed bad mount options | Yehuda Sadeh | 1 | -1/+1
This only happened when parse_extra_token was not passed to ceph_parse_option() (hence, only happened in rbd). Signed-off-by: Yehuda Sadeh <[email protected]>
2010-10-20 | ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor | Greg Farnum | 1 | -9/+97
These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | rbd: introduce rados block device (rbd), based on libceph | Yehuda Sadeh | 1 | -2/+1
The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <[email protected]> Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | ceph: factor out libceph from Ceph file system | Yehuda Sadeh | 29 | -0/+10662
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well:
- ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces.
- Mount option parsing and debugfs setup is correspondingly broken into two pieces.
- The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case).
- The basic supported/required feature bits can be expanded (and are by ceph_fs_client).

No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process.

Signed-off-by: Sage Weil <[email protected]>
2010-10-20 | net: avoid RCU for NOCACHE dst | Eric Dumazet | 2 | -6/+32
There is no point using RCU for a dst we allocate for a very short time (used once). Change dst_release() to take DST_NOCACHE into account, but also change skb_dst_set_noref() to force a refcount increment for such a dst.

This is a _huge_ gain, because we don't waste memory storing xx thousand dsts. Instead of queueing them to RCU, we can free them instantly. CPU caches can stay hot, re-using the same memory blocks to hold temporary dsts.

Note: remove unneeded smp_mb__before_atomic_dec(); in dst_release(), since atomic_dec_return() implies a full memory barrier.

Stress test, 160.000.000 udp frames sent, IP route cache disabled (DDOS).

Before:
real 0m38.091s
user 0m13.189s
sys 7m53.018s

After:
real 0m29.946s
user 0m12.157s
sys 7m40.605s

For reference, if IP route cache was enabled:
real 0m32.030s
user 0m10.521s
sys 8m15.243s

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2010-10-20 | net: allocate tx queues in register_netdevice | Tom Herbert | 1 | -53/+53
This patch introduces netif_alloc_netdev_queues, which is called from register_netdevice instead of alloc_netdev_mq. This makes TX queue allocation symmetric with RX allocation. Also, queue lock allocation is done in netdev_init_one_queue. Change set_real_num_tx_queues to fail if the requested number is < 1 or greater than the number of allocated queues. Signed-off-by: Tom Herbert <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
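The bounds check described for set_real_num_tx_queues can be sketched in a few lines of user-space C. This is an illustration only: `toy_netdev` and `toy_set_real_num_tx_queues` are hypothetical names, and returning `-EINVAL` is an assumption, since the commit message says the call fails but does not name the error code.

```c
#include <errno.h>

/* Hypothetical, simplified stand-in for the queue bookkeeping in a net device. */
struct toy_netdev {
    unsigned int num_tx_queues;       /* queues allocated at registration */
    unsigned int real_num_tx_queues;  /* queues currently in use */
};

/*
 * Reject a request for zero queues or for more queues than were allocated.
 * On failure the device state is left untouched.
 * The -EINVAL return value is an assumption made for this sketch.
 */
int toy_set_real_num_tx_queues(struct toy_netdev *dev, unsigned int txq)
{
    if (txq < 1 || txq > dev->num_tx_queues)
        return -EINVAL;

    dev->real_num_tx_queues = txq;
    return 0;
}
```

Validating in the setter means a driver asking for an impossible queue count gets an error back instead of silently running with an inconsistent count.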
2010-10-20 | net: cleanups in RX queue allocation | Tom Herbert | 1 | -19/+17
Clean up RX queue allocation. In netif_set_real_num_rx_queues, return an error on an attempt to set zero queues or when the requested number is greater than the number of allocated queues. In netif_alloc_rx_queues, do BUG_ON if queue_count is zero. Signed-off-by: Tom Herbert <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-20 | net: fail alloc_netdev_mq if queue count < 1 | Tom Herbert | 1 | -0/+6
In alloc_netdev_mq fail if requested queue_count < 1. Signed-off-by: Tom Herbert <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-20 | Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 | David S. Miller | 26 | -241/+388
2010-10-20 | phonet: remove the unused variable pn | Changli Gao | 1 | -1/+0
Signed-off-by: Changli Gao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-20 | netpoll: Revert napi_poll fix for bonding driver | Neil Horman | 1 | -8/+1
In an earlier patch I modified napi_poll so that devices with IFF_MASTER polled the per_cpu list instead of the device list for napi. I did this because the bonding driver has no napi instances to poll; it instead expects to check the slave devices' napi instances, which napi_poll was unaware of. Looking at this more closely, however, I now see this isn't strictly needed, as the bond driver poll_controller calls the slaves' poll_controller via netpoll_poll_dev, which recursively calls poll_napi on each slave, allowing those napi instances to get serviced. The earlier patch isn't at all harmful, it's just not needed, so let's revert it to make the code cleaner. Sorry for the noise. Signed-off-by: Neil Horman <[email protected]> Reviewed-by: WANG Cong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-19 | Fixed race condition at ip_vs.ko module init. | Eduardo Blanco | 1 | -9/+10
Lists were initialized after the module was registered. Multiple ipvsadm processes at module load triggered a race condition that resulted in a null pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex was left locked preventing all further ipvsadm commands. Signed-off-by: Eduardo J. Blanco <[email protected]> Signed-off-by: Simon Horman <[email protected]>
2010-10-19 | sunrpc: Turn list_for_each-s into the ..._entry-s | Pavel Emelyanov | 3 | -17/+7
Saves some lines of code and some brain ticks when reading them. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
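For readers unfamiliar with the two iterator styles, the following user-space sketch re-creates simplified versions of the kernel's intrusive list macros and walks the same list both ways. All `toy_` names are hypothetical re-implementations written for this example, not the kernel's `<linux/list.h>`; `__typeof__` is a GCC/Clang extension that the sketch assumes is available.

```c
#include <stddef.h>

struct toy_list_head { struct toy_list_head *next, *prev; };

/* Recover the enclosing structure from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Old style: iterate over raw list nodes. */
#define toy_list_for_each(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

/* New style: iterate directly over the enclosing structures. */
#define toy_list_for_each_entry(pos, head, member)                          \
    for (pos = toy_container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                            \
         pos = toy_container_of(pos->member.next, __typeof__(*pos), member))

void toy_list_init(struct toy_list_head *h) { h->next = h->prev = h; }

void toy_list_add_tail(struct toy_list_head *n, struct toy_list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

struct toy_item { int value; struct toy_list_head link; };

/* Before: a node cursor plus an explicit container_of in the loop body. */
int toy_sum_old(struct toy_list_head *head)
{
    struct toy_list_head *p;
    int sum = 0;

    toy_list_for_each(p, head) {
        struct toy_item *it = toy_container_of(p, struct toy_item, link);
        sum += it->value;
    }
    return sum;
}

/* After: the cursor is already the item, so the body shrinks to one line. */
int toy_sum_new(struct toy_list_head *head)
{
    struct toy_item *it;
    int sum = 0;

    toy_list_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

Both functions visit the same elements; the entry variant simply removes the extra cursor variable and the conversion step from each call site.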
2010-10-19 | sunrpc: Remove dead "else" branch from bc xprt creation | Pavel Emelyanov | 1 | -9/+4
Since the xprt in question is forcibly set to be bound the else branch of this check is unneeded. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Don't return NULL from rpcb_create | Pavel Emelyanov | 1 | -1/+1
> The reason for this is in the future, we may want to support additional
> address family types. We should, therefore, ensure that every piece of
> code that is sensitive to address families fail in some orderly manner
> to let developers know where a change is needed.

Makes sense. I was under the impression that AFs other than INET are not cared about at all :( Here's a fixed version of the patch.

Log: Its callers check for ERR_PTR.

Signed-off-by: Pavel Emelyanov <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
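The reason a NULL return is a problem when callers check for ERR_PTR can be shown with a small user-space model of the error-pointer convention. The `toy_` helpers below are hypothetical re-implementations written for this sketch, and the use of `-EAFNOSUPPORT` for an unknown address family is an assumption; the commit message does not state which error code the fixed function returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }

/* Decode the errno value from an error pointer. */
long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True only for pointers in the top TOY_MAX_ERRNO addresses; NULL is not one. */
int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

enum { TOY_AF_INET = 2, TOY_AF_INET6 = 10 };  /* illustrative family values */

struct toy_client { int family; };

/*
 * Hypothetical constructor following the convention: an unsupported
 * family yields an error pointer, never NULL. The error code is an
 * assumption made for this sketch.
 */
struct toy_client *toy_create(int family)
{
    static struct toy_client clients[2];

    switch (family) {
    case TOY_AF_INET:
        clients[0].family = family;
        return &clients[0];
    case TOY_AF_INET6:
        clients[1].family = family;
        return &clients[1];
    default:
        return toy_err_ptr(-EAFNOSUPPORT);
    }
}
```

Because `toy_is_err(NULL)` is false, a caller that only tests for an error pointer would treat a NULL result as a valid object and dereference it; returning an encoded error keeps the failure orderly.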
2010-10-19 | sunrpc: Remove useless if (task == NULL) from xprt_reserve_xprt | Pavel Emelyanov | 1 | -2/+0
The task in question is dereferenced above (and is actually never NULL). Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Remove UDP worker wrappers | Pavel Emelyanov | 1 | -34/+7
Same for UDP sockets creation paths. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Remove TCP worker wrappers | Pavel Emelyanov | 1 | -32/+7
The v4 and the v6 wrappers only pass the respective family to the xs_tcp_setup_socket. This family can be taken from the xprt's sockaddr. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Pass family to setup_socket calls | Pavel Emelyanov | 1 | -24/+8
Now we have a single socket creation routine and can call it directly from the setup_socket routines. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Merge xs_create_sock code | Pavel Emelyanov | 1 | -25/+24
After xs_bind is merged it's easy to merge its callers. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> [[email protected]: fix address family initialization] Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Merge the xs_bind code | Pavel Emelyanov | 1 | -47/+19
The only difference between xs_bind4 and xs_bind6 is the size of the sockaddr structure they use. Fortunately its size can be obtained indirectly from the transport.

Changes since v1:
* use sockaddr_storage instead of sockaddr
* use rpc_set_port instead of manual port assigning

Signed-off-by: Pavel Emelyanov <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
[[email protected]: fix address family initialization]
Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Call xs_create_sockX directly from setup_socket | Pavel Emelyanov | 1 | -32/+8
Remove now unneeded wrappers that just add type and protocol to socket creation callback. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Factor out v6 sockets creation | Pavel Emelyanov | 1 | -37/+26
Same patch for v6 protocols. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Factor out v4 sockets creation | Pavel Emelyanov | 1 | -37/+26
The UDPv4 and TCPv4 socket creation callbacks now look very similar. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Factor out udp sockets creation | Pavel Emelyanov | 1 | -40/+56
Make it look like the TCP sockets creation. Unfortunately the git diff made the patch look messy :( Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Remove duplicate xprt/transport arguments from calls | Pavel Emelyanov | 1 | -6/+6
xs_tcp_reuse_connection takes the xprt only to pass it down to xs_abort_connection. The latter can get it from the given transport itself. Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Get xprt pointer once in xs_tcp_setup_socket | Pavel Emelyanov | 1 | -6/+4
Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Remove unused sock arg from xs_next_srcport | Pavel Emelyanov | 1 | -3/+3
Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | sunrpc: Remove unused sock arg from xs_get_srcport | Pavel Emelyanov | 1 | -3/+3
Signed-off-by: Pavel Emelyanov <[email protected]> Reviewed-by: Chuck Lever <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-19 | inet: RCU changes in inetdev_by_index() | Eric Dumazet | 4 | -22/+16
Convert inetdev_by_index() to not increment in_dev refcount. Callers hold RCU or RTNL, and should not decrement in_dev refcount. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-19 | net: avoid a dev refcount in ip_mc_find_dev() | Eric Dumazet | 2 | -3/+3
We hold RTNL in ip_mc_find_dev(), no need to touch device refcount. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2010-10-19 | sunrpc: remove the big kernel lock | Arnd Bergmann | 2 | -30/+11
The sunrpc cache_ioctl function does not need the big kernel lock because it uses its own queue_lock already. rpc_pipe_ioctl apparently should be using i_lock like the other operations on the pipe file descriptor do. Signed-off-by: Arnd Bergmann <[email protected]>
2010-10-19 | ipvs: IPv6 tunnel mode | Hans Schillstrom | 1 | -79/+92
IPv6 encapsulation uses a bad source address for the tunnel, i.e. the VIP will be used as both the local address and the encapsulation destination address. Decapsulation will not accept this.

Example
LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
    (eth0 2003::1:0:1/96)
RS  (ethX 2003::1:0:5/96)

tcpdump
2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0

In the Linux IPv6 implementation you can't have a tunnel with an anycast address receiving packets (I have not tried to interpret RFC 2473). To be able to receive, the tunnel must have:
- Local address set to a multicast address or a unicast address.
- Remote address set to a unicast address.
- Loopback addresses and link-local addresses are not allowed.

This causes us to set up a tunnel in the Real Server with the LVS as the remote address; here you can't use the VIP address since it's used inside the tunnel.

Solution
Use the outgoing interface's IPv6 address (matched against the destination), i.e. use ip6_route_output() to look up the route cache and then use ipv6_dev_get_saddr(...) to set the source address of the encapsulated packet.

Additionally, cache the results in new destination fields, dst_cookie and dst_saddr, and properly check the dst returned from ip6_route_output. We now add the xfrm_lookup call only for the tunneling method where the source address is a local one.

Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
2010-10-19 | netfilter: ctnetlink: add expectation deletion events | Pablo Neira Ayuso | 2 | -11/+25
This patch allows listening for events that report destroyed expectations. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Patrick McHardy <[email protected]>
2010-10-18 | svcrdma: Cleanup DMA unmapping in error paths. | Tom Tucker | 3 | -15/+17
There are several error paths in the code that do not unmap DMA. This patch adds calls to svc_rdma_unmap_dma to free these DMA contexts. Signed-off-by: Tom Tucker <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-18 | svcrdma: Change DMA mapping logic to avoid the page_address kernel API | Tom Tucker | 3 | -38/+78
There was logic in the send path that assumed that a page containing data to send to the client has a KVA. This is not always the case and can result in data corruption when page_address returns zero and we end up DMA mapping zero. This patch changes the bus mapping logic to avoid page_address() where necessary and converts all calls from ib_dma_map_single to ib_dma_map_page in order to keep the map/unmap calls symmetric. Signed-off-by: Tom Tucker <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
2010-10-18 | sched: Fix softirq time accounting | Venkatesh Pallipadi | 1 | -1/+1
Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

The problem is that softirq processing uses local_bh_disable internally. There is no way, later in the flow, to tell whether a softirq is being processed or bh has merely been disabled. So a hardirq that arrives while bh is disabled results in time being wrongly accounted as softirq.

Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well, because account_system_time() in normal tick based accounting also uses softirq_count, which will be set even when not in softirq with bh disabled.

Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq count for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while processing softirqs. The patch below does that and adds the API in_serving_softirq(), which returns whether we are currently processing a softirq or not.

It also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. It looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
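The counting scheme described in this commit message can be modeled in user space to see why it separates "serving a softirq" from "bh disabled". The sketch below is an illustration, not the kernel implementation: all `toy_` names are hypothetical, the counter is a plain global instead of per-task state, and the shift of 8 and the 8-bit field width are assumptions chosen for the example.

```c
/* Assumed layout for this sketch: an 8-bit softirq field starting at bit 8. */
#define TOY_SOFTIRQ_SHIFT  8
#define TOY_SOFTIRQ_OFFSET (1U << TOY_SOFTIRQ_SHIFT)
#define TOY_SOFTIRQ_MASK   (0xffU << TOY_SOFTIRQ_SHIFT)

/* Stand-in for the per-context count; a single global keeps the model simple. */
unsigned int toy_preempt_count;

unsigned int toy_softirq_count(void)
{
    return toy_preempt_count & TOY_SOFTIRQ_MASK;
}

/* Nonzero while bh is disabled OR a softirq is being served. */
int toy_in_softirq(void)
{
    return toy_softirq_count() != 0;
}

/* Nonzero only while a softirq is being served: the lowest field bit is set. */
int toy_in_serving_softirq(void)
{
    return (toy_softirq_count() & TOY_SOFTIRQ_OFFSET) != 0;
}

/* Disabling bh moves the count in steps of two, so the lowest bit stays clear. */
void toy_local_bh_disable(void) { toy_preempt_count += 2 * TOY_SOFTIRQ_OFFSET; }
void toy_local_bh_enable(void)  { toy_preempt_count -= 2 * TOY_SOFTIRQ_OFFSET; }

/* Serving a softirq adds a single step, which sets the lowest bit. */
void toy_softirq_enter(void) { toy_preempt_count += TOY_SOFTIRQ_OFFSET; }
void toy_softirq_exit(void)  { toy_preempt_count -= TOY_SOFTIRQ_OFFSET; }
```

Because bh-disable nesting only ever contributes even multiples of the offset, the lowest bit of the field is set exactly when softirq processing is in progress, which is the distinction the accounting code needs.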