aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2008-04-21dlm: move plock code from gfs2David Teigland3-42/+51
Move the code that handles cluster posix locks from gfs2 into the dlm so that it can be used by both gfs2 and ocfs2. Signed-off-by: David Teigland <[email protected]>
2008-04-21Fix RCU list iterator use of 'rcu_dereference()'Linus Torvalds1-33/+15
The RCU iterators used 'rcu_dereference()' on an already-fetched RCU pointer value, which defeats the whole point of the exercise. When we dereference a pointer protected by RCU, we need to make sure that we only fetch the value _once_, because if the compiler ends up re-loading it due to register pressure, the newly reloaded value could be different from the previously fetched one, and you get inconsistent results. Cleaned-up, fixed, and the pointless list_for_each_safe_rcu #define deleted by Paul Kenney. Acked-by: Herbert Xu <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-21block: fix memory hotplug and bouncing in block layerAndi Kleen1-1/+6
Only noticed this while hacking something else, no test case. blk_max_low_pfn is initialized once at bootup by the block layer from max_low_pfn. But max_low_pfn is not necessarily constant over the runtime of the system when you consider memory hotplug. What could happen if that someone adds memory later the block layer wouldn't get updated and then start bouncing memory unnecessarily. Also on 64bit blk_max_low_pfn actually isn't needed because it just disables bouncing essentially and there is no highmem. And nobody can pass pfns > max_low_pfn to the block layer, because those wouldn't have a struct page and I suspect block layer wouldn't be very happy without that. So set BLK_BOUNCE_HIGH to infinity (-1ULL) on 64bit. That avoids the problem of having to update it on memory hotadd. On 32bit I kept the same behaviour because at least on i386 memory hotadd only adds HIGHMEM, never lowmem. BLK_BOUNCE_ANY is always set to infinity on both 32 and 64bit. Signed-off-by: Andi Kleen <[email protected]> Cc: Jens Axboe <[email protected]> Acked-by: Yasunori Goto <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2008-04-21block: move the padding adjustment to blk_rq_map_sgFUJITA Tomonori1-0/+2
blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule that req->data_len (the true data length) is equal to sum(bio). It broke the scsi command completion code. commit e97a294ef6938512b655b1abf17656cf2b26f709 was introduced to fix the above issue. However, the partial completion code doesn't work with it. The commit is also a layer violation (scsi mid-layer should not know about the block layer's padding). This patch moves the padding adjustment to blk_rq_map_sg (suggested by James). The padding works like the drain buffer. This patch breaks the rule that req->data_len is equal to sum(sg), however, the drain buffer already broke it. So this patch just restores the rule that req->data_len is equal to sub(bio) without breaking anything new. Now when a low level driver needs padding, blk_rq_map_user and blk_rq_map_user_iov guarantee there's enough room for padding. blk_rq_map_sg can safely extend the last entry of a scatter list. blk_rq_map_sg must extend the last entry of a scatter list only for a request that got through bio_copy_user_iov. This patches introduces new REQ_COPY_USER flag. Signed-off-by: FUJITA Tomonori <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Mike Christie <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2008-04-21block: convert bio_copy_user to bio_copy_user_iovFUJITA Tomonori1-0/+2
This patch enables bio_copy_user to take struct sg_iovec (renamed bio_copy_user_iov). bio_copy_user uses bio_copy_user_iov internally as bio_map_user uses bio_map_user_iov. The major changes are: - adds sg_iovec array to struct bio_map_data - adds __bio_copy_iov that copy data between bio and sg_iovec. bio_copy_user_iov and bio_uncopy_user use it. Signed-off-by: FUJITA Tomonori <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Mike Christie <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2008-04-21cdrom: make unregister_cdrom() return voidAkinobu Mita1-1/+1
Now unregister_cdrom() always returns 0. Make it return void and update all callers that check the return value. Signed-off-by: Akinobu Mita <[email protected]> Cc: Adrian McMenamin <[email protected]> Cc: Borislav Petkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2008-04-21cdrom: use list_head for cdrom_device_info listAkinobu Mita1-1/+2
Use list_head for cdrom_device_info list instead of opencoded singly list handling. Signed-off-by: Akinobu Mita <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2008-04-21jiffies: add time_is_after_jiffies and others which compare with jiffiesDave Young1-0/+16
Most of time_after like macros usages just compare jiffies and another number, so here add some time_is_* macros for convenience. Signed-off-by: Dave Young <[email protected]> Cc: Alan Cox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2008-04-20PCI: clean up resource alignment managementIvan Kokshaysky1-1/+4
Done per Linus' request and suggestions. Linus has explained that better than I'll be able to explain: On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote: > Actually, before we go any further, there might be a less intrusive > alternative: add just a couple of flags to the resource flags field (we > still have something like 8 unused bits on 32-bit), and use those to > implement a generic "resource_alignment()" routine. > > Two flags would do it: > > - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device > resources) > > - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources > during probing) > > and then the case of both flags zero (or both bits set) would actually be > "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we > actually allocate the resource (so that we don't use the "start" field as > alignment incorrectly when it no longer indicates alignment). > > That wouldn't be totally generic, but it would have the nice property of > automatically at least add sanity checking for that whole "res->start has > the odd meaning of 'alignment' during probing" and remove the need for a > new field, and it would allow us to have a generic "resource_alignment()" > routine that just gets a resource pointer. Besides, I removed IORESOURCE_BUS_HAS_VGA flag which was unused for ages. Signed-off-by: Ivan Kokshaysky <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Gary Hade <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: Expose PCI VPD through sysfsBen Hutchings1-0/+3
Vital Product Data (VPD) may be exposed by PCI devices in several ways. It is generally unsafe to read this information through the existing interfaces to user-land because of stateful interfaces. This adds: - abstract operations for VPD access (struct pci_vpd_ops) - VPD state information in struct pci_dev (struct pci_vpd) - an implementation of the VPD access method specified in PCI 2.2 (in access.c) - a 'vpd' binary file in sysfs directories for PCI devices with VPD operations defined It adds a probe for PCI 2.2 VPD in pci_scan_device() and release of VPD state in pci_release_dev(). Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: add generic pci_enable_resources()Bjorn Helgaas1-0/+1
Each architecture has its own pcibios_enable_resources() implementation. These differ in many minor ways that have nothing to do with actual architectural differences. Follow-on patches will make most arches use this generic version instead. This version is based on powerpc, which seemed most up-to-date. The only functional difference from the x86 version is that this uses "!r->parent" to check for resource collisions instead of "!r->start && r->end". Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Benjamin Herrenschmidt <[email protected]> Acked-by: David Howells <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: add PCI Express ASPM supportShaohua Li3-0/+69
PCI Express ASPM defines a protocol for PCI Express components in the D0 state to reduce Link power by placing their Links into a low power state and instructing the other end of the Link to do likewise. This capability allows hardware-autonomous, dynamic Link power reduction beyond what is achievable by software-only controlled power management. However, The device should be configured by software appropriately. Enabling ASPM will save power, but will introduce device latency. This patch adds ASPM support in Linux. It introduces a global policy for ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control it. The interface can be used as a boot option too. Currently we have below setting: -default, BIOS default setting -powersave, highest power saving mode, enable all available ASPM state and clock power management -performance, highest performance, disable ASPM and clock power management By default, the 'default' policy is used currently. In my test, power difference between powersave mode and performance mode is about 1.3w in a system with 3 PCIE links. Note: some devices might not work well with aspm, either because chipset issue or device issue. The patch provide API (pci_disable_link_state), driver can disable ASPM for specific device. Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: #if 0 pci_cleanup_aer_correct_error_status()Adrian Bunk1-5/+0
#if 0 the no longer used pci_cleanup_aer_correct_error_status(). Signed-off-by: Adrian Bunk <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: remove global list of PCI devicesGreg Kroah-Hartman1-3/+0
This patch finally removes the global list of PCI devices. We are relying entirely on the list held in the driver core now, and do not need a separate "shadow" list as no one uses it. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: add is_added flag to struct pci_devGreg Kroah-Hartman1-0/+1
This lets us check if the device is really added to the driver core or not, which is what we need when walking some of the bus lists. The flag is there in anticipation of getting rid of the other PCI device list, which is what we used to check in this situation. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: clean up search.c a lotGreg Kroah-Hartman1-2/+2
This cleans up the search.c file, now using the pci list of devices that are created for the driver core, instead of relying on our separate list of devices. It's better to use the functions already created for this kind of thing, instead of rolling our own all the time. This work is done in anticipation of getting rid of that second list of pci devices all together. And it ends up saving code, always a nice benefit. This also removes one compiler warning for when CONFIG_PCI_LEGACY is enabled as we no longer internally use the deprecated functions anymore. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: remove pci_get_device_reverseGreg Kroah-Hartman1-10/+0
This removes the pci_get_device_reverse function as there should not be any need to walk pci devices backwards anymore. All users of this call are now gone from the tree, so it is safe to remove it. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: remove pci_find_presentGreg Kroah-Hartman1-2/+0
No one is using this function anymore for quite some time, so remove it. Everyone calls pci_dev_present() instead anyway... Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-20PCI: #if 0 pci_assign_resource_fixed()Adrian Bunk1-1/+0
An unused function that bloated the kernel only when CONFIG_EMBEDDED was enabled... Signed-off-by: Adrian Bunk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-21[CRYPTO] api: Make the crypto subsystem fully modularSebastian Siewior1-7/+0
Signed-off-by: Sebastian Siewior <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2008-04-20skbuff: fix missing kernel-doc notationRandy Dunlap1-0/+1
Add kernel-doc notation for ndisc_nodetype: Warning(linux-2.6.25-git2//include/linux/skbuff.h:340): No description found for parameter 'ndisc_nodetype' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2008-04-19SCSI: convert struct class_device to struct deviceTony Jones6-30/+31
It's big, but there doesn't seem to be a way to split it up smaller... Signed-off-by: Tony Jones <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Sean Hefty <[email protected]> Cc: Hal Rosenstock <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19memstick: convert struct class_device to struct deviceGreg Kroah-Hartman1-1/+1
struct class_device is going away, struct device should be used instead. Signed-off-by: Tony Jones <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Cc: Alex Dubov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19PM: Remove destroy_suspended_device()Rafael J. Wysocki4-41/+3
After 2.6.24 there was a plan to make the PM core acquire all device semaphores during a suspend/hibernation to protect itself from concurrent operations involving device objects. That proved to be too heavy-handed and we found a better way to achieve the goal, but before it happened, we had introduced the functions device_pm_schedule_removal() and destroy_suspended_device() to allow drivers to "safely" destroy a suspended device and we had adapted some drivers to use them. Now that these functions are no longer necessary, it seems reasonable to remove them and modify their users to use the normal device unregistration instead. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19Firmware: add iSCSI iBFT SupportKonrad Rzeszutek1-0/+50
Add /sysfs/firmware/ibft/[initiator|targetX|ethernetX] directories along with text properties which export the the iSCSI Boot Firmware Table (iBFT) structure. What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI tools to extract from the machine NICs the iSCSI connection information so that they can automagically mount the iSCSI share/target. Currently the iSCSI information is hard-coded in the initrd. The /sysfs entries are read-only one-name-and-value fields. The usual set of data exposed is: # for a in `find /sys/firmware/ibft/ -type f -print`; do echo -n "$a: "; cat $a; done /sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb /sys/firmware/ibft/target0/nic-assoc: 0 /sys/firmware/ibft/target0/chap-type: 0 /sys/firmware/ibft/target0/lun: 00000000 /sys/firmware/ibft/target0/port: 3260 /sys/firmware/ibft/target0/ip-addr: 192.168.79.116 /sys/firmware/ibft/target0/flags: 3 /sys/firmware/ibft/target0/index: 0 /sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01 /sys/firmware/ibft/ethernet0/vlan: 0 /sys/firmware/ibft/ethernet0/gateway: 192.168.79.254 /sys/firmware/ibft/ethernet0/origin: 0 /sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0 /sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41 /sys/firmware/ibft/ethernet0/flags: 7 /sys/firmware/ibft/ethernet0/index: 0 /sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator /sys/firmware/ibft/initiator/flags: 3 /sys/firmware/ibft/initiator/index: 0 For full details of the IBFT structure please take a look at: ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf [[email protected]: fix build] Signed-off-by: Konrad Rzeszutek <[email protected]> Cc: Mike Christie <[email protected]> Cc: Peter Jones <[email protected]> Cc: James Bottomley <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19Driver core: make device_is_registered() work for class devicesGreg Kroah-Hartman1-2/+1
device_is_registered() can use the kobject value for this, so it will now work with devices that are associated with only a class, not a bus and a driver. Cc: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19PM: Convert wakeup flag accessors to inline functionsAlan Stern3-45/+94
This patch (as1058) improves the wakeup macros in include/linux/pm.h. All but the trivial ones are converted to inline routines, which requires moving them to a separate header file since they depend on the definition of struct device. Signed-off-by: Alan Stern <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19PM: Make wakeup flags available whenever CONFIG_PM is setAlan Stern1-15/+21
The various wakeup flags and their accessor macros in struct dev_pm_info should be available whenever CONFIG_PM is enabled, not just when CONFIG_PM_SLEEP is on. Otherwise remote wakeup won't always be configurable for runtime power management. This patch (as1056b) fixes the oversight. David Brownell adds: More accurately, fixes the "regression" ... as noted sometime last summer, after 296699de6bdc717189a331ab6bbe90e05c94db06 introduced CONFIG_SUSPEND. But that didn't make the regression list for that kernel, ergo the delay in fixing it. [rjw: rebased] Signed-off-by: Alan Stern <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19PM: Handle device registrations during suspend/resumeRafael J. Wysocki1-0/+1
Modify the PM core to protect its data structures, specifically the dpm_active list, from being corrupted if a child of the currently suspending device is registered concurrently with its ->suspend() callback. In that case, since the new device (the child) is added to dpm_active after its parent, the PM core will attempt to suspend it after the parent, which is wrong. Introduce a new member of struct dev_pm_info, called 'sleeping', and use it to check if the parent of the device being added to dpm_active has been suspended, in which case the device registration fails. Also, use 'sleeping' for checking if the ordering of devices on dpm_active is correct. Introduce variable 'all_sleeping' that will be set to 'true' once all devices have been suspended and make new device registrations fail until 'all_sleeping' is reset to 'false', in order to avoid having unsuspended devices around while the system is going into a sleep state. Remove pm_sleep_rwsem which is not necessary any more. Special thanks to Alan Stern for discussions and suggestions that lead to the creation of this patch. Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19sysfs: small header file cleanup for SYSFS=nDavid Rientjes1-7/+2
Convert sysfs_remove_bin_file() to have a return type of 'void' for !CONFIG_SYSFS configurations. Also removes unnecessary colons from empty void functions. Signed-off-by: David Rientjes <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19driver core: Convert debug functions declared inline __attribute__((format ↵Joe Perches2-14/+7
(printf,x,y) to statement expression macros When DEBUG is not defined, pr_debug and dev_dbg and some other local debugging functions are specified as: "inline __attribute__((format (printf, x, y)))" This is done to validate printk arguments when not debugging. Converting these functions to macros or statement expressions "do { if (0) printk(fmt, ##arg); } while (0)" or "({ if (0) printk(fmt, ##arg); 0; }) makes at least gcc 4.2.2 produce smaller objects. This has the additional benefit of allowing the optimizer to avoid calling functions like print_mac that might have been arguments to the printk. defconfig x86 current: $ size vmlinux text data bss dec hex filename 4716770 474560 618496 5809826 58a6a2 vmlinux all converted: (More patches follow) $ size vmlinux text data bss dec hex filename 4716642 474560 618496 5809698 58a622 vmlinux Even kernel/sched.o, which doesn't even use these functions, becomes smaller. It appears that merely having an indirect include of <linux/device.h> can cause bigger objects. $ size sched.inline.o sched.if0.o text data bss dec hex filename 31385 2854 328 34567 8707 sched.inline.o 31366 2854 328 34548 86f4 sched.if0.o The current preprocessed only kernel/sched.i file contains: # 612 "include/linux/device.h" static inline __attribute__((always_inline)) int __attribute__ ((format (printf, 2, 3))) dev_dbg(struct device *dev, const char *fmt, ...) { return 0; } # 628 "include/linux/device.h" static inline __attribute__((always_inline)) int __attribute__ ((format (printf, 2, 3))) dev_vdbg(struct device *dev, const char *fmt, ...) { return 0; } Removing these unused inlines from sched.i shrinks sched.o Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19driver core: memory: semaphore to mutexDaniel Walker1-3/+2
Signed-off-by: Daniel Walker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-04-19Merge branch 'master' of ↵Haavard Skinnemoen1-0/+22
git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/usba-2.6.26 into base
2008-04-19Merge branch 'master' of ↵Haavard Skinnemoen1-0/+252
git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/tclib into base
2008-04-19make nfs_automount_list staticAdrian Bunk1-1/+0
nfs_automount_list can now become static. Signed-off-by: Adrian Bunk <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requestsTrond Myklebust1-0/+8
NFSv4 requires us to ensure that we break the TCP connection before we're allowed to retransmit a request. However in the case where we're retransmitting several requests that have been sent on the same connection, we need to ensure that we don't interfere with the attempt to reconnect and/or break the connection again once it has been established. We therefore introduce a 'connection' cookie that is bumped every time a connection is broken. This allows requests to track if they need to force a disconnection. Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19NFSv4: Reintroduce machine credsTrond Myklebust3-0/+5
We need to try to ensure that we always use the same credentials whenever we re-establish the clientid on the server. If not, the server won't recognise that we're the same client, and so may not allow us to recover state. Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()Trond Myklebust1-2/+0
With the recent change to generic creds, we can no longer use cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name instead... Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19NLM/lockd: Add a reference counter to struct nlm_rqstTrond Myklebust1-0/+1
When we replace the existing synchronous RPC calls with asynchronous calls, the reference count will be needed in order to allow us to examine the result of the RPC call. Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19NFSv4: Only increment the sequence id if the server saw itTrond Myklebust1-2/+8
It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail before the actual stateful operation has been executed (for instance in the PUTFH call). There is no way to tell from the overall status result which operations were executed from the COMPOUND. The fix is to move incrementing of the sequence id into the XDR layer, so that we do it as we process the results from the stateful operation. Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19NFS: Ensure that the write code cleans up properly when rpc_run_task() failsTrond Myklebust1-2/+2
Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19SUNRPC: Fix up xprt_write_space()Trond Myklebust1-1/+1
The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether or not we have someone waiting for buffer memory. Convert the SUNRPC layer to use the same idiom. Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In fact, the most common case will be that there is nobody waiting for buffer space. SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was limited by the application window. Ensure that we follow the same idiom as the rest of the networking layer here too. Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that write_space() doesn't keep waking things up on xprt->pending. Signed-off-by: Trond Myklebust <[email protected]>
2008-04-19sched: fair-group: de-couple load-balancing from the rb-treesPeter Zijlstra2-0/+4
De-couple load-balancing from the rb-trees, so that I can change their organization. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched: rt-group: optimize dequeue_rt_stackPeter Zijlstra1-0/+1
Now that the group hierarchy can have an arbitrary depth the O(n^2) nature of RT task dequeues will really hurt. Optimize this by providing space to store the tree path, so we can walk it the other way. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched: fair-group: SMP-nice for group schedulingPeter Zijlstra1-0/+1
Implement SMP nice support for the full group hierarchy. On each load-balance action, compile a sched_domain wide view of the full task_group tree. We compute the domain wide view when walking down the hierarchy, and readjust the weights when walking back up. After collecting and readjusting the domain wide view, we try to balance the tasks within the task_groups. The current approach is a naively balance each task group until we've moved the targeted amount of load. Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP paper. XXX: there will be some numerical issues due to the limited nature of SCHED_LOAD_SCALE wrt to representing a task_groups influence on the total weight. When the tree is deep enough, or the task weight small enough, we'll run out of bits. Signed-off-by: Peter Zijlstra <[email protected]> CC: Abhishek Chandra <[email protected]> CC: Srivatsa Vaddagiri <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched, cpuset: customize sched domains, coreHidetoshi Seto1-1/+22
[rebased for sched-devel/latest] - Add a new cpuset file, having levels: sched_relax_domain_level - Modify partition_sched_domains() and build_sched_domains() to take attributes parameter passed from cpuset. - Fill newidle_idx for node domains which currently unused but might be required if sched_relax_domain_level become higher. - We can change the default level by boot option 'relax_domain_level='. Signed-off-by: Hidetoshi Seto <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched: fix the task_group hierarchy for UID groupingPeter Zijlstra1-0/+3
UID grouping doesn't actually have a task_group representing the root of the task_group tree. Add one. Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched: allow the group scheduler to have multiple levelsDhaval Giani1-1/+1
This patch makes the group scheduler multi hierarchy aware. [[email protected]: rt-parts and assorted fixes] Signed-off-by: Dhaval Giani <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19sched: add new set_cpus_allowed_ptr functionMike Travis1-4/+11
Add a new function that accepts a pointer to the "newly allowed cpus" cpumask argument. int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask) The current set_cpus_allowed() function is modified to use the above but this does not result in an ABI change. And with some compiler optimization help, it may not introduce any additional overhead. Additionally, to enforce the read only nature of the new_mask arg, the "const" property is migrated to sub-functions called by set_cpus_allowed. This silences compiler warnings. Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19cpumask: add show cpu map functionsMike Travis1-6/+11
* Add cpu_sysdev_class functions to display the following maps with cpulist_scnprintf(). cpu_online_map cpu_present_map cpu_possible_map * Small change to include/linux/sysdev.h to allow the attribute name and label to be different (to avoid collision with the "attr_online" entry for bringing cpus on- and off-line.) Cc: H. Peter Anvin <[email protected]> Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>