path: root/net/core/dev.c
Age | Commit message | Author | Files | Lines
2008-10-08 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller | 1 | -27/+16
Conflicts:
    drivers/net/e1000e/ich8lan.c
    drivers/net/e1000e/netdev.c
2008-10-07 | net: Fix netdev_run_todo dead-lock | Herbert Xu | 1 | -21/+6
Benjamin Thery tracked down a bug that explains many instances of the error "unregister_netdevice: waiting for %s to become free. Usage count = %d". It turns out that netdev_run_todo can dead-lock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows an RTNL section, and todo tasks can only be added with the RTNL held, by definition you only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
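The idea in the message above (each caller waits only for the todo items it queued itself) can be illustrated with a small userspace sketch. This is not the kernel patch: the names `todo_item`, `rtnl_lock_sketch`, `net_set_todo_sketch` and `rtnl_unlock_and_run_todo_sketch` are hypothetical, and the RTNL mutex is modeled as a plain boolean in a single thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical todo entry; the kernel queues net_device structures instead. */
struct todo_item {
    int ifindex;
    struct todo_item *next;
};

static struct todo_item *net_todo_list; /* pending items, guarded by the toy lock */
static bool rtnl_held;                  /* toy stand-in for the RTNL mutex */

static void rtnl_lock_sketch(void)
{
    rtnl_held = true;
}

/* Items may only be queued while the lock is held; returns false otherwise. */
static bool net_set_todo_sketch(struct todo_item *item)
{
    if (!rtnl_held)
        return false;
    item->next = net_todo_list;
    net_todo_list = item;
    return true;
}

/*
 * Take a private snapshot of the list while still holding the lock, drop
 * the lock, then process only the snapshot. No second lock is needed,
 * because nobody else can reach the snapshot.
 * Returns the number of items processed.
 */
static int rtnl_unlock_and_run_todo_sketch(void)
{
    struct todo_item *mine = net_todo_list;
    int done = 0;

    net_todo_list = NULL;
    rtnl_held = false;

    while (mine) {
        struct todo_item *next = mine->next;

        mine->next = NULL; /* the real code would wait for refs and free here */
        done++;
        mine = next;
    }
    return done;
}
```

A caller that queued two items under the lock processes exactly those two; a second caller arriving afterwards finds an empty list and has nothing to wait for.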
2008-10-07 | net: only invoke dev->change_rx_flags when device is UP | Patrick McHardy | 1 | -6/+10
Jesper Dangaard Brouer <[email protected]> reported a bug when setting a VLAN device down that is in promiscuous mode: When the VLAN device is set down, the promiscuous count on the real device is decremented by one by vlan_dev_stop(). When removing the promiscuous flag from the VLAN device afterwards, the promiscuous count on the real device is decremented a second time by the vlan_change_rx_flags() callback. The root cause for this is that the ->change_rx_flags() callback is invoked while the device is down. The synchronization is meant to mirror the behaviour of the ->set_rx_mode callbacks, meaning the ->open function is responsible for doing a full sync on open, the ->close() function is responsible for doing full cleanup on ->stop() and ->change_rx_flags() is meant to do incremental changes while the device is UP. Only invoke ->change_rx_flags() while the device is UP to provide the intended behaviour. Tested-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Patrick McHardy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
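The double decrement described above can be reproduced and avoided in a toy model. The sketch below is a simplification under stated assumptions: `real_sketch`, `vlan_sketch`, the `SK_*` flag values and all function names are hypothetical, and only the rule "incremental changes are propagated solely while the device is UP" is taken from the message.

```c
#include <stdbool.h>

#define SK_UP      0x1u /* hypothetical stand-in for IFF_UP */
#define SK_PROMISC 0x2u /* hypothetical stand-in for IFF_PROMISC */

struct real_sketch { int promiscuity; };
struct vlan_sketch { unsigned int flags; struct real_sketch *real; };

/* Incremental change: push the new promiscuous state to the real device. */
static void vlan_change_rx_flags_sketch(struct vlan_sketch *v, unsigned int change)
{
    if (change & SK_PROMISC)
        v->real->promiscuity += (v->flags & SK_PROMISC) ? 1 : -1;
}

/* Full sync on open. */
static void vlan_open_sketch(struct vlan_sketch *v)
{
    v->flags |= SK_UP;
    if (v->flags & SK_PROMISC)
        v->real->promiscuity++;
}

/* Full cleanup on stop. */
static void vlan_stop_sketch(struct vlan_sketch *v)
{
    if (v->flags & SK_PROMISC)
        v->real->promiscuity--;
    v->flags &= ~SK_UP;
}

/* Flag change from the core: only call the driver hook while the device is UP. */
static void set_promisc_sketch(struct vlan_sketch *v, bool on)
{
    unsigned int old = v->flags;

    if (on)
        v->flags |= SK_PROMISC;
    else
        v->flags &= ~SK_PROMISC;

    if ((v->flags & SK_UP) && ((old ^ v->flags) & SK_PROMISC))
        vlan_change_rx_flags_sketch(v, SK_PROMISC);
}
```

With the UP check in `set_promisc_sketch`, clearing the flag on a device that is already down leaves the real device's count untouched instead of decrementing it a second time.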
2008-10-01 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 | David S. Miller | 1 | -2/+4
Conflicts:
    drivers/net/wireless/ath9k/core.c
    drivers/net/wireless/ath9k/main.c
    net/core/dev.c
2008-09-30 | netdev: docbook comment update (revised) | Stephen Hemminger | 1 | -2/+44
Add more docbook comments to network device functions and clean up the comments. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-30 | netdev: use const for some name functions | Stephen Hemminger | 1 | -5/+4
dev_change_name and netdev_drivername should use const char on parameters that are read-only input values. The strcpy to newname is not needed since newname is not used later in function. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-23 | net: remove ifalias on empty given alias | Oliver Hartkopp | 1 | -0/+8
This patch removes the potentially allocated ifalias when the (new) given alias is empty. E.g. when setting echo "" > /sys/class/net/eth0/ifalias Signed-off-by: Oliver Hartkopp <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
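The behaviour described above (an empty alias frees any stored one) might look like the following in userspace terms. This is a sketch, not the kernel function: `netdev_sketch`, `set_alias_sketch` and the 256-byte limit `ALIAS_MAX_SKETCH` are assumptions, and `realloc`/`free` stand in for the kernel allocators.

```c
#include <stdlib.h>
#include <string.h>

#define ALIAS_MAX_SKETCH 256 /* assumed upper bound, for illustration only */

struct netdev_sketch { char *ifalias; };

/*
 * Store a copy of alias on the device.
 * An empty alias releases the stored one and resets the pointer.
 * Returns the stored length, 0 after removal, or -1 on error.
 */
static int set_alias_sketch(struct netdev_sketch *dev, const char *alias)
{
    size_t len = strlen(alias);
    char *buf;

    if (len >= ALIAS_MAX_SKETCH)
        return -1;

    if (len == 0) {
        free(dev->ifalias);
        dev->ifalias = NULL;
        return 0;
    }

    buf = realloc(dev->ifalias, len + 1);
    if (!buf)
        return -1;

    memcpy(buf, alias, len + 1); /* copy including the terminating NUL */
    dev->ifalias = buf;
    return (int)len;
}
```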
2008-09-22 | net: network device name ifalias support | Stephen Hemminger | 1 | -0/+23
This patch adds support for keeping an additional character alias associated with a network interface. This is useful for maintaining the SNMP ifAlias value which is a user defined value. Routers use this to hold information like which circuit or line it is connected to. It is just an arbitrary text label on the network device. There are two exposed interfaces with this patch, the value can be read/written either via netlink or sysfs. This could be maintained just by the snmp daemon, but it is more generally useful for other management tools, and the kernel is a good place to act as an agreed upon interface to store it. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20 | net: Use hton[sl]() instead of __constant_hton[sl]() where applicable | Arnaldo Carvalho de Melo | 1 | -2/+2
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-09-20 | netdev: simple_tx_hash shouldn't hash inside fragments | Alexander Duyck | 1 | -2/+4
Currently simple_tx_hash is hashing inside of UDP fragments. As a result packets are getting sent to all queues when they shouldn't be. This causes a serious performance regression which can be seen by sending UDP frames larger than the MTU on multiqueue devices. This change will make it so that fragments are hashed only as IP datagrams w/o any protocol information. Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
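To see why hashing "inside" fragments scatters one datagram across queues, consider a toy flow hash. The sketch below is an assumption-laden illustration: `flow_sketch`, `tx_hash_sketch` and the host-order `frag_off` field are hypothetical, and the XOR mix is only a placeholder. The bit values for the "more fragments" flag and the fragment offset mask follow the IPv4 header layout.

```c
#include <stdint.h>

#define SK_IP_MF     0x2000u /* "more fragments" flag in the IPv4 frag field */
#define SK_IP_OFFSET 0x1FFFu /* fragment offset mask */

/* Hypothetical, already-parsed view of a packet (all fields in host order). */
struct flow_sketch {
    uint32_t saddr, daddr;
    uint16_t frag_off;     /* flags + offset from the IPv4 header */
    uint16_t sport, dport; /* only meaningful if the packet is not a fragment */
};

/*
 * Hash on addresses only when the packet is any kind of fragment, so every
 * fragment of one datagram yields the same value. Ports are mixed in only
 * for unfragmented packets, where the transport header is known to be present.
 */
static uint32_t tx_hash_sketch(const struct flow_sketch *f)
{
    uint32_t h = f->saddr ^ f->daddr;

    if (!(f->frag_off & (SK_IP_MF | SK_IP_OFFSET)))
        h ^= ((uint32_t)f->sport << 16) | f->dport;

    return h;
}
```

Later fragments carry payload bytes where the port numbers would be, so reading "ports" from them produces arbitrary values. Skipping the port mix for fragments keeps the whole datagram on one queue.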
2008-09-08 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 | David S. Miller | 1 | -1/+6
Conflicts:
    net/mac80211/mlme.c
2008-09-08 | net: Enable TSO if supported by at least one device | Herbert Xu | 1 | -0/+6
As it stands users of netdev_compute_features (e.g., bridges/bonding) will only enable TSO if all constituent devices support it. This is unnecessarily pessimistic since even on devices that do not support hardware TSO and SG, emulated TSO still performs on a par with TSO off. This patch enables TSO if at least one constituent device supports it in hardware. The direct beneficiaries will be virtualisation that uses bridging since this means that TSO will always be enabled for communication from the host to the guests. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
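One way to express "enable TSO if at least one constituent device supports it" is to AND most feature bits across devices but OR the TSO bit. The sketch below only illustrates that rule and is not the kernel's netdev_compute_features(): the `F_*_SKETCH` bit values and `compute_features_sketch` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, not the kernel's NETIF_F_* values. */
#define F_SG_SKETCH   0x1u
#define F_CSUM_SKETCH 0x2u
#define F_TSO_SKETCH  0x4u

/*
 * Combine the feature masks of the constituent devices of a bridge or bond.
 * Ordinary features must be supported by every device (AND);
 * TSO is enabled as soon as any device offers it (OR).
 */
static uint32_t compute_features_sketch(const uint32_t *feat, size_t n)
{
    uint32_t common = ~0u;
    uint32_t any_tso = 0;
    size_t i;

    if (n == 0)
        return 0;

    for (i = 0; i < n; i++) {
        common &= feat[i];
        any_tso |= feat[i] & F_TSO_SKETCH;
    }
    return common | any_tso;
}
```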
2008-09-07 | pkt_sched: Fix qdisc state in net_tx_action() | Jarek Poplawski | 1 | -1/+6
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while the qdisc is neither run nor rescheduled, which may cause an endless loop in dev_deactivate(). Reported-by: Denys Fedoryshchenko <[email protected]> Tested-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Jarek Poplawski <[email protected]> Acked-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-08-19 | pkt_sched: Prevent livelock in TX queue running. | David S. Miller | 1 | -1/+3
If dev_deactivate() is trying to quiesce the queue, it is theoretically possible for another cpu to livelock trying to process that queue. This happens because dev_deactivate() grabs the queue spinlock as it checks the queue state, whereas net_tx_action() does a trylock and reschedules the qdisc if it hits the lock. This breaks the livelock by adding a check on __QDISC_STATE_DEACTIVATED to net_tx_action() when the trylock fails. Based upon feedback from Herbert Xu and Jarek Poplawski. Signed-off-by: David S. Miller <[email protected]>
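The decision made when the trylock fails can be modeled in a few lines. The sketch below is a single-threaded toy, assuming hypothetical names (`qdisc_sketch`, `tx_action_sketch`, `QS_*`); the lock is reduced to a boolean that says whether another CPU currently holds it. The final branch, which clears the scheduled bit for a deactivated qdisc, reflects the 2008-09-07 follow-up entry in this log rather than this commit alone.

```c
#include <stdbool.h>

enum {
    QS_SCHED       = 1u << 0, /* stand-in for __QDISC_STATE_SCHED */
    QS_DEACTIVATED = 1u << 1  /* stand-in for __QDISC_STATE_DEACTIVATED */
};

struct qdisc_sketch {
    unsigned int state;
    bool lock_held_elsewhere; /* true if a trylock would fail */
    int runs;                 /* times the queue was actually run */
    int reschedules;          /* times the qdisc was put back on the list */
};

/* One pass of the TX softirq over a single scheduled qdisc. */
static void tx_action_sketch(struct qdisc_sketch *q)
{
    if (!q->lock_held_elsewhere) {
        /* Trylock succeeded: clear SCHED and run the queue. */
        q->state &= ~QS_SCHED;
        q->runs++;
    } else if (!(q->state & QS_DEACTIVATED)) {
        /* Lock busy on a live qdisc: try again later, SCHED stays set. */
        q->reschedules++;
    } else {
        /* Lock busy because the qdisc is being torn down: do not spin. */
        q->state &= ~QS_SCHED;
    }
}
```

Without the deactivated check, the second branch would be taken forever while the deactivating CPU holds the lock and waits for the scheduled bit to clear.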
2008-08-17 | pkt_sched: Fix missed RCU unlock in dev_queue_xmit() | David S. Miller | 1 | -6/+4
Noticed by Jarek Poplawski. Signed-off-by: David S. Miller <[email protected]>
2008-08-17 | net: Change handling of the __QDISC_STATE_SCHED flag in net_tx_action(). | Jarek Poplawski | 1 | -15/+19
Change handling of the __QDISC_STATE_SCHED flag in net_tx_action() to enable proper control in dev_deactivate(). Now, if this flag is seen as unset under root_lock, it means the qdisc can't be netif_scheduled. Signed-off-by: Jarek Poplawski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-08-17 | pkt_sched: Add 'deactivated' state. | David S. Miller | 1 | -1/+8
This new state lets dev_deactivate() mark a qdisc as having been deactivated. dev_queue_xmit() and ing_filter() check for this bit and do not try to process the qdisc if the bit is set. dev_deactivate() polls the qdisc after setting the bit, waiting for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear. This isn't perfect yet, but subsequent changesets will make it so. This part is just one piece of the puzzle. Signed-off-by: David S. Miller <[email protected]>
2008-08-07 | net/core: Allow receive on active slaves. | Joe Eykholt | 1 | -2/+4
If a packet_type specifies an active slave to bonding and not just any interface, allow it to receive frames that came in on that interface. Signed-off-by: Joe Eykholt <[email protected]> Signed-off-by: Jay Vosburgh <[email protected]> Signed-off-by: Jeff Garzik <[email protected]>
2008-08-07 | net/core: Allow certain receives on inactive slave. | Joe Eykholt | 1 | -7/+8
Allow a packet_type that specifies the exact device to receive even on an inactive bonding slave device. This is important for some L2 protocols such as LLDP and FCoE. This can eventually be used for the bonding special cases as well. Signed-off-by: Joe Eykholt <[email protected]> Signed-off-by: Jay Vosburgh <[email protected]> Signed-off-by: Jeff Garzik <[email protected]>
2008-08-07 | net/core: Uninline skb_bond(). | Joe Eykholt | 1 | -20/+8
Otherwise subsequent changes need multiple return values. Signed-off-by: Joe Eykholt <[email protected]> Signed-off-by: Jay Vosburgh <[email protected]> Signed-off-by: Jeff Garzik <[email protected]>
2008-08-04 | net_sched: Add qdisc __NET_XMIT_BYPASS flag | Jarek Poplawski | 1 | -1/+0
Patrick McHardy <[email protected]> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <[email protected]> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-08-03 | net: eliminate refcounting in backlog queue | Stephen Hemminger | 1 | -7/+16
Avoid the overhead of atomic increment/decrement on each received packet. This helps performance of non-NAPI devices (like loopback). Use cleanup function to walk queue on each cpu and clean out any left over packets. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-08-03 | net: use software GSO for SG+CSUM capable netdevices | Lennert Buytenhek | 1 | -0/+4
If a netdevice does not support hardware GSO, allowing the stack to use GSO anyway and then splitting the GSO skb into MSS-sized pieces as it is handed to the netdevice for transmitting is likely still a win as far as throughput and/or CPU usage are concerned, since it reduces the number of trips through the output path. This patch enables the use of GSO on any netdevice that supports SG. If a GSO skb is then sent to a netdevice that supports SG but does not support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take care of doing the necessary GSO segmentation in software. Signed-off-by: Lennert Buytenhek <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
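The arithmetic behind "splitting the GSO skb into MSS-sized pieces" is simple enough to sketch. The helpers below are hypothetical (`gso_seg_count_sketch`, `gso_seg_len_sketch`) and operate on payload lengths only; real segmentation also replicates and fixes up headers, which is not modeled here.

```c
#include <stddef.h>

/* Number of MSS-sized pieces needed to carry payload_len bytes. */
static size_t gso_seg_count_sketch(size_t payload_len, size_t mss)
{
    if (mss == 0)
        return 0;
    return (payload_len + mss - 1) / mss; /* round up */
}

/* Payload length of piece i (0-based); 0 if i is out of range. */
static size_t gso_seg_len_sketch(size_t payload_len, size_t mss, size_t i)
{
    size_t n = gso_seg_count_sketch(payload_len, mss);

    if (i >= n)
        return 0;
    if (i + 1 < n)
        return mss;                     /* every piece but the last is full */
    return payload_len - mss * (n - 1); /* the last piece takes the remainder */
}
```

For example, a 4000-byte payload with an MSS of 1448 yields three pieces of 1448, 1448 and 1104 bytes. The large skb makes one trip through the upper layers of the output path and is split only as it is handed to the device.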
2008-08-02 | pkt_sched: Use qdisc_lock() on already sampled root qdisc. | David S. Miller | 1 | -2/+2
Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <[email protected]>
2008-07-31 | netdev: Fix lockdep warnings in multiqueue configurations. | David S. Miller | 1 | -0/+1
When support for multiple TX queues was added, the netif_tx_lock() routines were converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <[email protected]>
2008-07-30 | pkt_sched: Fix OOPS on ingress qdisc add. | David S. Miller | 1 | -2/+2
Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <[email protected]>
2008-07-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds | 1 | -5/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    netns: fix ip_rt_frag_needed rt_is_expired
    netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
    netfilter: fix double-free and use-after free
    netfilter: arptables in netns for real
    netfilter: ip{,6}tables_security: fix future section mismatch
    selinux: use nf_register_hooks()
    netfilter: ebtables: use nf_register_hooks()
    Revert "pkt_sched: sch_sfq: dump a real number of flows"
    qeth: use dev->ml_priv instead of dev->priv
    syncookies: Make sure ECN is disabled
    net: drop unused BUG_TRAP()
    net: convert BUG_TRAP to generic WARN_ON
    drivers/net: convert BUG_TRAP to generic WARN_ON
2008-07-25 | net: convert BUG_TRAP to generic WARN_ON | Ilpo Järvinen | 1 | -5/+5
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuitively, BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON(), though some might actually be promoted to BUG_ON(), but I left that to the future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-24 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 | Linus Torvalds | 1 | -3/+0
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    pkt_sched: sch_sfq: dump a real number of flows
    atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups
    netfilter: make security table depend on NETFILTER_ADVANCED
    tcp: Clear probes_out more aggressively in tcp_ack().
    e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call
    net: Update entry in af_family_clock_key_strings
    netdev: Remove warning from __netif_schedule().
    sky2: don't stop queue on shutdown
2008-07-23 | Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -2/+2
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
    NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
    cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
    NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
    NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
    NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
    cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
    cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
    cpumask: Provide a generic set of CPUMASK_ALLOC macros
    cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
    cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
    cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
    cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
    cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
    cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
    Revert "cpumask: introduce new APIs"
    cpumask: make for_each_cpu_mask a bit smaller
    net: Pass reference to cpumask variable in net/sunrpc/svc.c
    ...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23 | netdev: Remove warning from __netif_schedule(). | David S. Miller | 1 | -3/+0
It isn't helping anything and we aren't going to be able to change all the drivers that do queue wakeups in strange situations. Just letting a noop_qdisc get scheduled will work because when qdisc_run() executes via net_tx_work() it will simply find no packets pending when it makes the ->dequeue() call in qdisc_restart. Signed-off-by: David S. Miller <[email protected]>
2008-07-22 | netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep. | David S. Miller | 1 | -6/+21
The new address list lock needs to handle the same device layering issues that the _xmit_lock one does. This integrates work done by Patrick McHardy. Signed-off-by: David S. Miller <[email protected]>
2008-07-22 | net: Fix build failure with 'make mandocs'. | Dave Jones | 1 | -26/+25
The function header comments have to go with the functions they are documenting, or things go horribly wrong when we try to process them with the docbook tools.
    Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
    Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
    Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
    Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
    Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
    Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '
Signed-off-by: Dave Jones <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-21 | net: Print the module name as part of the watchdog message | Arjan van de Ven | 1 | -0/+20
As suggested by Dave: This patch adds a function to get the driver name from a struct net_device, and consequently uses this in the watchdog timeout handler to print as part of the message. Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-21 | net: use kcalloc in netdev_queue alloc | Stephen Hemminger | 1 | -2/+2
Minor nit, use size_t for allocation size and kcalloc to allocate an array. Probably makes no actual code difference. Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-21 | net: In __netif_schedule() use WARN_ON instead of BUG_ON | Linus Torvalds | 1 | -1/+2
Signed-off-by: David S. Miller <[email protected]>
2008-07-21 | net: Improve simple_tx_hash(). | David S. Miller | 1 | -13/+21
Based upon feedback from Eric Dumazet and Andi Kleen. Cure several deficiencies in simple_tx_hash() by using jhash + reciprocal multiply. 1) Eliminates expensive modulus operation. 2) Makes hash less attackable by using random seed. 3) Eliminates endianness hash distribution issues. Signed-off-by: David S. Miller <[email protected]>
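The "reciprocal multiply" in point 1 maps a 32-bit hash onto a queue index with a multiply and a shift instead of a modulus. The sketch below shows that scaling step; `scale_to_queue_sketch` is a hypothetical name, and `mix32_sketch` is a generic seeded integer mixer used here only as a placeholder for jhash, not jhash itself.

```c
#include <stdint.h>

/* Placeholder seeded mixer (NOT jhash); stands in for the real hash function. */
static uint32_t mix32_sketch(uint32_t x, uint32_t seed)
{
    x ^= seed;
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/*
 * Scale a 32-bit hash into [0, nqueues) without a division:
 * treat hash / 2^32 as a fraction in [0, 1) and multiply by nqueues.
 */
static uint16_t scale_to_queue_sketch(uint32_t hash, uint16_t nqueues)
{
    return (uint16_t)(((uint64_t)hash * nqueues) >> 32);
}
```

With four queues, a hash of 0 lands on queue 0, 0x80000000 on queue 2 and 0xFFFFFFFF on queue 3. The result is always below `nqueues` because the 64-bit product is strictly less than `nqueues` times 2^32.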
2008-07-21 | Merge branch 'linus' into cpus4096-for-linus | Ingo Molnar | 1 | -96/+284
Conflicts: net/sunrpc/svc.c Signed-off-by: Ingo Molnar <[email protected]>
2008-07-20 | net_sched: Add qdisc_enqueue wrapper | Jussi Kivilinna | 1 | -2/+2
Signed-off-by: Jussi Kivilinna <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-07-18 | pkt_sched: Manage qdisc list inside of root qdisc. | David S. Miller | 1 | -2/+0
Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse its list. The only special case is to ignore builtin qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <[email protected]>
2008-07-18 | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 | David S. Miller | 1 | -2/+2
Conflicts:
    Documentation/powerpc/booting-without-of.txt
    drivers/atm/Makefile
    drivers/net/fs_enet/fs_enet-main.c
    drivers/pci/pci-acpi.c
    net/8021q/vlan.c
    net/iucv/iucv.c
2008-07-17 | pkt_sched: Kill netdev_queue lock. | David S. Miller | 1 | -4/+5
We can simply use the qdisc->q.lock for all of the qdisc tree synchronization. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | netdevice: Move qdisc_list back into net_device proper. | David S. Miller | 1 | -0/+2
And give it its own lock. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | pkt_sched: Schedule qdiscs instead of netdev_queue. | David S. Miller | 1 | -40/+28
When we have shared qdiscs, packets come out of the qdiscs for multiple transmit queues. Therefore it doesn't make any sense to schedule the transmit queue when logically we cannot know ahead of time the TX queue of the SKB that the qdisc->dequeue() will give us. Just for sanity I added a BUG check to make sure we never get into a state where the noop_qdisc is scheduled. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | net: Implement simple sw TX hashing. | David S. Miller | 1 | -0/+52
It just xor hashes over IPv4/IPv6 addresses and ports of transport. The only assumption it makes is that skb_network_header() is set correctly. With bug fixes from Eric Dumazet. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | netdev: Add netdev->select_queue() method. | David S. Miller | 1 | -3/+6
Devices or device layers can set this to control the queue selection performed by dev_pick_tx(). This function runs under RCU protection, which allows overriding functions to have some way of synchronizing with things like dynamic ->real_num_tx_queues adjustments. This makes the spinlock prefetch in dev_queue_xmit() a little bit less effective, but that's the price right now for correctness. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | net: Use queue aware tests throughout. | David S. Miller | 1 | -16/+12
This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <[email protected]>
2008-07-17 | netdev: Allocate multiple queues for TX. | David S. Miller | 1 | -9/+31
alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <[email protected]>
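The pattern described above (an array of per-queue structures reached only through an accessor, with the legacy single-queue helpers acting on index zero) can be sketched in userspace C. All names below are hypothetical stand-ins, `calloc` replaces the kernel allocator, and the structures are far smaller than the real ones.

```c
#include <stdbool.h>
#include <stdlib.h>

struct txq_sketch { bool stopped; };

struct netdev_mq_sketch {
    struct txq_sketch *tx;      /* array of TX queues */
    unsigned int num_tx_queues; /* length of the array */
};

/* Allocate a zeroed array of count TX queues; returns 0 on success, -1 on error. */
static int alloc_queues_sketch(struct netdev_mq_sketch *dev, unsigned int count)
{
    if (count == 0)
        return -1;
    dev->tx = calloc(count, sizeof(*dev->tx));
    if (!dev->tx)
        return -1;
    dev->num_tx_queues = count;
    return 0;
}

/* Single access point, so every queue user is easy to find with grep. */
static struct txq_sketch *get_tx_queue_sketch(struct netdev_mq_sketch *dev,
                                              unsigned int index)
{
    return &dev->tx[index];
}

/* Legacy single-queue helper: degenerates into an operation on queue zero. */
static void stop_queue_sketch(struct netdev_mq_sketch *dev)
{
    get_tx_queue_sketch(dev, 0)->stopped = true;
}

/* Multiqueue-aware helper: applies the operation to every TX queue. */
static void stop_all_queues_sketch(struct netdev_mq_sketch *dev)
{
    unsigned int i;

    for (i = 0; i < dev->num_tx_queues; i++)
        get_tx_queue_sketch(dev, i)->stopped = true;
}
```

A call such as `get_tx_queue_sketch(dev, 0)` in driver code is the kind of zero-index access the message suggests grepping for when hunting code that is not yet multiqueue aware.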
2008-07-16 | Merge branch 'linus' into cpus4096 | Ingo Molnar | 1 | -2/+2
Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <[email protected]>
2008-07-15 | netdev: Do not use TX lock to protect address lists. | David S. Miller | 1 | -26/+12
Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <[email protected]>