aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2012-09-23rcu: Handle unbalanced rcu_node configurations with few CPUsPaul E. McKenney1-1/+1
If CONFIG_RCU_FANOUT_EXACT=y, if there are not enough CPUs (according to nr_cpu_ids) to require more than a single rcu_node structure, but if NR_CPUS is larger than would fit into a single rcu_node structure, then the current rcu_init_levelspread() code is subject to integer overflow in the eight-bit ->levelspread[] array in the rcu_state structure. In this case, the solution is -not- to increase the size of the elements in this array because the values in that array should be constrained to the number of bits in an unsigned long. Instead, this commit replaces NR_CPUS with nr_cpu_ids in the rcu_init_levelspread() function's initialization of the cprv local variable. This results in all of the arithmetic being consistently based off of the nr_cpu_ids value, thus avoiding the overflow, which was caused by the mixing of nr_cpu_ids and NR_CPUS. Reported-by: Mike Galbraith <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-09-23rcu: Simplify quiescent-state detectionPaul E. McKenney4-27/+16
The current quiescent-state detection algorithm is needlessly complex. It records the grace-period number corresponding to the quiescent state at the time of the quiescent state, which works, but it seems better to simply erase any record of previous quiescent states at the time that the CPU notices the new grace period. This has the further advantage of removing another piece of RCU for which lockless reasoning is required. Therefore, this commit makes this change. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Reduce synchronize_rcu_expedited() latencyPaul E. McKenney1-8/+22
The synchronize_rcu_expedited() function disables interrupts across a scan of all leaf rcu_node structures, which is not good for real-time scheduling latency on large systems (hundreds or especially thousands of CPUs). This commit therefore holds off CPU-hotplug operations using get_online_cpus(), and removes the prior acquisiion of the ->onofflock (which required disabling interrupts). Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Eliminate signed overflow in synchronize_rcu_expedited()Paul E. McKenney1-4/+4
In the C language, signed overflow is undefined. It is true that twos-complement arithmetic normally comes to the rescue, but if the compiler can subvert this any time it has any information about the values being compared. For example, given "if (a - b > 0)", if the compiler has enough information to realize that (for example) the value of "a" is positive and that of "b" is negative, the compiler is within its rights to optimize to a simple "if (1)", which might not be what you want. This commit therefore converts synchronize_rcu_expedited()'s work-done detection counter from signed to unsigned. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Adjust for unconditional ->completed assignmentPaul E. McKenney1-1/+3
Now that the rcu_node structures' ->completed fields are unconditionally assigned at grace-period cleanup time, they should already have the correct value for the new grace period at grace-period initialization time. This commit therefore inserts a WARN_ON_ONCE() to verify this invariant. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Add random PROVE_RCU_DELAY to grace-period initializationPaul E. McKenney1-0/+5
Preemption greatly raised the probability of certain types of race conditions, so this commit adds an anti-heisenbug to greatly increase the collision cross section, also known as the probability of occurrence. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Fix day-zero grace-period initialization/cleanup racePaul E. McKenney1-23/+17
The current approach to grace-period initialization is vulnerable to extremely low-probability races. These races stem from the fact that the old grace period is marked completed on the same traversal through the rcu_node structure that is marking the start of the new grace period. This means that some rcu_node structures will believe that the old grace period is still in effect at the same time that other rcu_node structures believe that the new grace period has already started. These sorts of disagreements can result in too-short grace periods, as shown in the following scenario: 1. CPU 0 completes a grace period, but needs an additional grace period, so starts initializing one, initializing all the non-leaf rcu_node structures and the first leaf rcu_node structure. Because CPU 0 is both completing the old grace period and starting a new one, it marks the completion of the old grace period and the start of the new grace period in a single traversal of the rcu_node structures. Therefore, CPUs corresponding to the first rcu_node structure can become aware that the prior grace period has completed, but CPUs corresponding to the other rcu_node structures will see this same prior grace period as still being in progress. 2. CPU 1 passes through a quiescent state, and therefore informs the RCU core. Because its leaf rcu_node structure has already been initialized, this CPU's quiescent state is applied to the new (and only partially initialized) grace period. 3. CPU 1 enters an RCU read-side critical section and acquires a reference to data item A. Note that this CPU believes that its critical section started after the beginning of the new grace period, and therefore will not block this new grace period. 4. CPU 16 exits dyntick-idle mode. Because it was in dyntick-idle mode, other CPUs informed the RCU core of its extended quiescent state for the past several grace periods. This means that CPU 16 is not yet aware that these past grace periods have ended. Assume that CPU 16 corresponds to the second leaf rcu_node structure -- which has not yet been made aware of the new grace period. 5. CPU 16 removes data item A from its enclosing data structure and passes it to call_rcu(), which queues a callback in the RCU_NEXT_TAIL segment of the callback queue. 6. CPU 16 enters the RCU core, possibly because it has taken a scheduling-clock interrupt, or alternatively because it has more than 10,000 callbacks queued. It notes that the second most recent grace period has completed (recall that because it corresponds to the second as-yet-uninitialized rcu_node structure, it cannot yet become aware that the most recent grace period has completed), and therefore advances its callbacks. The callback for data item A is therefore in the RCU_NEXT_READY_TAIL segment of the callback queue. 7. CPU 0 completes initialization of the remaining leaf rcu_node structures for the new grace period, including the structure corresponding to CPU 16. 8. CPU 16 again enters the RCU core, again, possibly because it has taken a scheduling-clock interrupt, or alternatively because it now has more than 10,000 callbacks queued. It notes that the most recent grace period has ended, and therefore advances its callbacks. The callback for data item A is therefore in the RCU_DONE_TAIL segment of the callback queue. 9. All CPUs other than CPU 1 pass through quiescent states. Because CPU 1 already passed through its quiescent state, the new grace period completes. Note that CPU 1 is still in its RCU read-side critical section, still referencing data item A. 10. Suppose that CPU 2 wais the last CPU to pass through a quiescent state for the new grace period, and suppose further that CPU 2 did not have any callbacks queued, therefore not needing an additional grace period. CPU 2 therefore traverses all of the rcu_node structures, marking the new grace period as completed, but does not initialize a new grace period. 11. CPU 16 yet again enters the RCU core, yet again possibly because it has taken a scheduling-clock interrupt, or alternatively because it now has more than 10,000 callbacks queued. It notes that the new grace period has ended, and therefore advances its callbacks. The callback for data item A is therefore in the RCU_DONE_TAIL segment of the callback queue. This means that this callback is now considered ready to be invoked. 12. CPU 16 invokes the callback, freeing data item A while CPU 1 is still referencing it. This scenario represents a day-zero bug for TREE_RCU. This commit therefore ensures that the old grace period is marked completed in all leaf rcu_node structures before a new grace period is marked started in any of them. That said, it would have been insanely difficult to force this race to happen before the grace-period initialization process was preemptible. Therefore, this commit is not a candidate for -stable. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Conflicts: kernel/rcutree.c
2012-09-23rcu: Make rcutree module parameters visible in sysfsPaul E. McKenney1-4/+4
The module parameters blimit, qhimark, and qlomark (and more recently, rcu_fanout_leaf) have permission masks of zero, so that their values are not visible from sysfs. This is unnecessary and inconvenient to administrators who might like an easy way to see what these values are on a running system. This commit therefore sets their permission masks to 0444, allowing them to be read but not written. Reported-by: Rusty Russell <[email protected]> Reported-by: Josh Triplett <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Control grace-period duration from sysfsPaul E. McKenney2-3/+33
Although almost everyone is well-served by the defaults, some uses of RCU benefit from shorter grace periods, while others benefit more from the greater efficiency provided by longer grace periods. Situations requiring a large number of grace periods to elapse (and wireshark startup has been called out as an example of this) are helped by lower-latency grace periods. Furthermore, in some embedded applications, people are willing to accept a small degradation in update efficiency (due to there being more of the shorter grace-period operations) in order to gain the lower latency. In contrast, those few systems with thousands of CPUs need longer grace periods because the CPU overhead of a grace period rises roughly linearly with the number of CPUs. Such systems normally do not make much use of facilities that require large numbers of grace periods to elapse, so this is a good tradeoff. Therefore, this commit allows the durations to be controlled from sysfs. There are two sysfs parameters, one named "jiffies_till_first_fqs" that specifies the delay in jiffies from the end of grace-period initialization until the first attempt to force quiescent states, and the other named "jiffies_till_next_fqs" that specifies the delay (again in jiffies) between subsequent attempts to force quiescent states. They both default to three jiffies, which is compatible with the old hard-coded behavior. At some future time, it may be possible to automatically increase the grace-period length with the number of CPUs, but we do not yet have sufficient data to do a good job. Preliminary data indicates that we should add an addiitonal jiffy to each of the delays for every 200 CPUs in the system, but more experimentation is needed. For now, the number of systems with more than 1,000 CPUs is small enough that this can be relegated to boot-time hand tuning. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Prevent force_quiescent_state() memory contentionPaul E. McKenney2-10/+38
Large systems running RCU_FAST_NO_HZ kernels see extreme memory contention on the rcu_state structure's ->fqslock field. This can be avoided by disabling RCU_FAST_NO_HZ, either at compile time or at boot time (via the nohz kernel boot parameter), but large systems will no doubt become sensitive to energy consumption. This commit therefore uses a combining-tree approach to spread the memory contention across new cache lines in the leaf rcu_node structures. This can be thought of as a tournament lock that has only a try-lock acquisition primitive. The effect on small systems is minimal, because such systems have an rcu_node "tree" consisting of a single node. In addition, this functionality is not used on fastpaths. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Adjust debugfs tracing for kthread-based quiescent-state forcingPaul E. McKenney3-30/+17
Moving quiescent-state forcing into a kthread dispenses with the need for the ->n_rp_need_fqs field, so this commit removes it. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Allow RCU quiescent-state forcing to be preemptedPaul E. McKenney1-0/+1
RCU quiescent-state forcing is currently carried out without preemption points, which can result in excessive latency spikes on large systems (many hundreds or thousands of CPUs). This patch therefore inserts a voluntary preemption point into force_qs_rnp(), which should greatly reduce the magnitude of these spikes. Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Move quiescent-state forcing into kthreadPaul E. McKenney3-138/+82
As the first step towards allowing quiescent-state forcing to be preemptible, this commit moves RCU quiescent-state forcing into the same kthread that is now used to initialize and clean up after grace periods. This is yet another step towards keeping scheduling latency down to a dull roar. Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq() and to remove the now-unused rcu_state structure fields as suggested by Peter Zijlstra. Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-09-23rcu: Segregate rcu_state fields to improve cache localityDimitri Sivanich1-1/+2
The fields in the rcu_state structure that are protected by the root rcu_node structure's ->lock can share a cache line with the fields protected by ->onofflock. This can result in excessive memory contention on large systems, so this commit applies ____cacheline_internodealigned_in_smp to the ->onofflock field in order to segregate them. Signed-off-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Dimitri Sivanich <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Provide OOM handler to motivate lazy RCU callbacksPaul E. McKenney2-1/+87
In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a large number of lazy callbacks, which as the name implies will be slow to be invoked. This can be a problem on small-memory systems, where the default 6-second sleep for CPUs having only lazy RCU callbacks could well be fatal. This commit therefore installs an OOM hander that ensures that every CPU with lazy callbacks has at least one non-lazy callback, in turn ensuring timely advancement for these callbacks. Updated to fix bug that disabled OOM killing, noted by Lai Jiangshan. Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(), thus reducing the number of IPIs, as suggested by Steven Rostedt. Also to make the for_each_online_cpu() loop be preemptible. (Later, it might be good to use smp_call_function(), as suggested by Peter Zijlstra.) Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Sasha Levin <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Prevent offline CPUs from executing RCU core codePaul E. McKenney1-0/+2
Earlier versions of RCU invoked the RCU core from the CPU_DYING notifier in order to note a quiescent state for the outgoing CPU. Because the CPU is marked "offline" during the execution of the CPU_DYING notifiers, the RCU core had to tolerate being invoked from an offline CPU. However, commit b1420f1c (Make rcu_barrier() less disruptive) left only tracing code in the CPU_DYING notifier, so the RCU core need no longer execute on offline CPUs. This commit therefore enforces this restriction. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Break up rcu_gp_kthread() into subfunctionsPaul E. McKenney1-115/+135
Then rcu_gp_kthread() function is too large and furthermore needs to have the force_quiescent_state() code pulled in. This commit therefore breaks up rcu_gp_kthread() into rcu_gp_init() and rcu_gp_cleanup(). Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Allow RCU grace-period cleanup to be preemptedPaul E. McKenney1-6/+5
RCU grace-period cleanup is currently carried out with interrupts disabled, which can result in excessive latency spikes on large systems (many hundreds or thousands of CPUs). This patch therefore makes the RCU grace-period cleanup be preemptible, including voluntary preemption points, which should eliminate those latency spikes. Similar spikes from forcing of quiescent states will be dealt with similarly by later patches. Updated to replace uses of spin_lock_irqsave() with spin_lock_irq(), as suggested by Peter Zijlstra. Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Move RCU grace-period cleanup into kthreadPaul E. McKenney1-50/+62
As a first step towards allowing grace-period cleanup to be preemptible, this commit moves the RCU grace-period cleanup into the same kthread that is now used to initialize grace periods. This is needed to keep scheduling latency down to a dull roar. [ paulmck: Get rid of stray spin_lock_irqsave() calls. ] Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Allow RCU grace-period initialization to be preemptedPaul E. McKenney1-15/+11
RCU grace-period initialization is currently carried out with interrupts disabled, which can result in 200-microsecond latency spikes on systems on which RCU has been configured for 4096 CPUs. This patch therefore makes the RCU grace-period initialization be preemptible, which should eliminate those latency spikes. Similar spikes from grace-period cleanup and the forcing of quiescent states will be dealt with similarly by later patches. Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Prevent initialization-time quiescent-state racePaul E. McKenney1-14/+0
The next step in reducing RCU's grace-period initialization latency on large systems will make this initialization preemptible. Unfortunately, making the grace-period initialization subject to interrupts (let alone preemption) exposes the following race on systems whose rcu_node tree contains more than one node: 1. CPU 31 starts initializing the grace period, including the first leaf rcu_node structures, and is then preempted. 2. CPU 0 refers to the first leaf rcu_node structure, and notes that a new grace period has started. It passes through a quiescent state shortly thereafter, and informs the RCU core of this rite of passage. 3. CPU 0 enters an RCU read-side critical section, acquiring a pointer to an RCU-protected data item. 4. CPU 31 takes an interrupt whose handler removes the data item referenced by CPU 0 from the data structure, and registers an RCU callback in order to free it. 5. CPU 31 resumes initializing the grace period, including its own rcu_node structure. In invokes rcu_start_gp_per_cpu(), which advances all callbacks, including the one registered in #4 above, to be handled by the current grace period. 6. The remaining CPUs pass through quiescent states and inform the RCU core, but CPU 0 remains in its RCU read-side critical section, still referencing the now-removed data item. 7. The grace period completes and all the callbacks are invoked, including the one that frees the data item that CPU 0 is still referencing. Oops!!! One way to avoid this race is to remove grace-period acceleration from rcu_start_gp_per_cpu(). Now, the only reason for this acceleration was to allow CPUs bringing RCU out of idle state to have their callbacks invoked after only one grace period, rather than the two grace periods that would otherwise be required. But this acceleration does not work when RCU grace-period initialization is moved to a kthread because the CPU posting the callback is no longer necessarily the CPU that is initializing the resulting grace period. This commit therefore removes this now-pointless (and soon to be dangerous) grace-period acceleration, thus avoiding the above race. Signed-off-by: Paul E. McKenney <[email protected]>
2012-09-23rcu: Move RCU grace-period initialization into a kthreadPaul E. McKenney2-64/+129
As the first step towards allowing grace-period initialization to be preemptible, this commit moves the RCU grace-period initialization into its own kthread. This is needed to keep large-system scheduling latency at reasonable levels. Also change raw_spin_lock_irqsave() to raw_spin_lock_irq() as suggested by Peter Zijlstra in review comments. Reported-by: Mike Galbraith <[email protected]> Reported-by: Dimitri Sivanich <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]>
2012-09-23rcu: Fix day-one dyntick-idle stall-warning bugPaul E. McKenney1-1/+3
Each grace period is supposed to have at least one callback waiting for that grace period to complete. However, if CONFIG_NO_HZ=n, an extra callback-free grace period is no big problem -- it will chew up a tiny bit of CPU time, but it will complete normally. In contrast, CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to sleep indefinitely, in turn indefinitely delaying completion of the callback-free grace period. Given that nothing is waiting on this grace period, this is also not a problem. That is, unless RCU CPU stall warnings are also enabled, as they are in recent kernels. In this case, if a CPU wakes up after at least one minute of inactivity, an RCU CPU stall warning will result. The reason that no one noticed until quite recently is that most systems have enough OS noise that they will never remain absolutely idle for a full minute. But there are some embedded systems with cut-down userspace configurations that consistently get into this situation. All this begs the question of exactly how a callback-free grace period gets started in the first place. This can happen due to the fact that CPUs do not necessarily agree on which grace period is in progress. If a CPU still believes that the grace period that just completed is still ongoing, it will believe that it has callbacks that need to wait for another grace period, never mind the fact that the grace period that they were waiting for just completed. This CPU can therefore erroneously decide to start a new grace period. Note that this can happen in TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system: Deadlock considerations mean that the CPU that detected the end of the grace period is not necessarily officially informed of this fact for some time. Once this CPU notices that the earlier grace period completed, it will invoke its callbacks. It then won't have any callbacks left. If no other CPU has any callbacks, we now have a callback-free grace period. This commit therefore makes CPUs check more carefully before starting a new grace period. This new check relies on an array of tail pointers into each CPU's list of callbacks. If the CPU is up to date on which grace periods have completed, it checks to see if any callbacks follow the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks follow the RCU_WAIT_TAIL segment. The reason that this works is that the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment as soon as the CPU is officially notified that the old grace period has ended. This change is to cpu_needs_another_gp(), which is called in a number of places. The only one that really matters is in rcu_start_gp(), where the root rcu_node structure's ->lock is held, which prevents any other CPU from starting or completing a grace period, so that the comparison that determines whether the CPU is missing the completion of a grace period is stable. Reported-by: Becky Bruce <[email protected]> Reported-by: Subodh Nijsure <[email protected]> Reported-by: Paul Walmsley <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Paul Walmsley <[email protected]> # OMAP3730, OMAP4430 Cc: [email protected]
2012-09-22close the race in nlmsvc_free_block()Al Viro1-2/+1
we need to grab mutex before the reference counter reaches 0 Signed-off-by: Al Viro <[email protected]>
2012-09-22do_add_mount()/umount -l racesAl Viro1-2/+8
normally we deal with lock_mount()/umount races by checking that mountpoint to be is still in our namespace after lock_mount() has been done. However, do_add_mount() skips that check when called with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The reason is that ->mnt_ns may be a temporary namespace created exactly to contain automounts a-la NFS4 referral handling. It's not the namespace of the caller, though, so check_mnt() would fail here. We still need to check that ->mnt_ns is non-NULL in that case, though. Signed-off-by: Al Viro <[email protected]>
2012-09-22pppoe: drop PPPOX_ZOMBIEs in pppoe_releaseXiaodong Xu1-1/+1
When PPPOE is running over a virtual ethernet interface (e.g., a bonding interface) and the user tries to delete the interface in case the PPPOE state is ZOMBIE, the kernel will loop forever while unregistering net_device for the reference count is not decreased to zero which should have been done with dev_put(). Signed-off-by: Xiaodong Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-22Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds4-9/+9
Pull MIPS fixes from Ralf Baechle: "Random fixes across arch/mips, essentially. One fix for an issue in get_user_pages_fast() which previously was discovered on x86, a miscalculation in the support for the MIPS MT hardware multithreading support, the RTC support for the Malta and a fix for a spurious interrupt issue that seems to bite only very special Malta configurations." * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Malta: Don't crash on spurious interrupt. MIPS: Malta: Remove RTC Data Mode bootstrap breakage MIPS: mm: Add compound tail page _mapcount when mapped MIPS: CMP/SMTC: Fix tc_id calculation
2012-09-22team: send port changed when addedJiri Pirko1-8/+24
On some hw, link is not up during adding iface to team. That causes event not being sent to userspace and that may cause confusion. Fix this bug by sending port changed event once it's added to team. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-22Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds8-53/+106
Pull ARM and clkdev fixes from Russell King: "Two patches for clkdev which resolve the long standing issue that the devm_* versions were dependent on clkdev, which they shouldn't have been. Instead, they're dependent on HAVE_CLK instead, which implies that you're providing clk_get() and clk_put(). A small fix to the ARM decompressor to ensure that the page tables are properly interpreted by the CPU, and reserve syscall 378 for kcmp (the checksyscalls.sh script is unfortunately currently broken so arch maintainers aren't getting notified of new syscalls...) Lastly, a larger fix for an issue between the common clk subsystem and smp_twd which causes warnings to be spat out." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: reserve syscall 378 for kcmp ARM: 7535/1: Reprogram smp_twd based on new common clk framework notifiers ARM: 7537/1: clk: Fix release in devm_clk_put() ARM: 7532/1: decompressor: reset SCTLR.TRE for VMSA ARMv7 cores ARM: 7534/1: clk: Make the managed clk functions generically available
2012-09-22Merge branch 'upstream-fixes' of ↵Linus Torvalds3-0/+48
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: "The most important fix is Logitech Unifying receiver regression in device enumeration fix from Nestor Lopez Casado. In addition to that, there is a small memory leak fix for Thinkpad keyboard driver from Axel Lin." * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: Fix logitech-dj: missing Unifying device issue HID: lenovo-tpkbd: Fix memory leak in tpkbd_remove_tp()
2012-09-22Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-1/+1
Pull cifs fix from Steve French. * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix return value in cifsConvertToUTF16
2012-09-22ipv4: raw: fix icmp_filter()Eric Dumazet1-6/+8
icmp_filter() should not modify its input, or else its caller would need to recompute ip_hdr() if skb->head is reallocated. Use skb_header_pointer() instead of pskb_may_pull() and change the prototype to make clear both sk and skb are const. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-22net/phy/bcm87xx: Add MODULE_LICENSE("GPL") to GPL driverPeter Hüwe1-0/+2
Currently the driver has no MODULE_LICENSE attribute in its source which results in a kernel taint if I load this: root@(none):~# modprobe bcm87xx bcm87xx: module license 'unspecified' taints kernel. Since the first lines of the source code clearly state: * This file is subject to the terms and conditions of the GNU General * Public License. See the file "COPYING" in the main directory of this * archive for more details. I think it's safe to add the MODULE_LICENSE("GPL") macro and thus remove the kernel taint. Cc: [email protected] Signed-off-by: Peter Huewe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-09-22Merge branch 'master' of ↵John W. Linville5-4/+29
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-09-22Merge remote-tracking branches 'regmap/topic/cache' and 'regmap/topic/irq' ↵Mark Brown4-26/+83
into regmap-next
2012-09-22HID: Fix logitech-dj: missing Unifying device issueNestor Lopez Casado2-0/+46
This patch fixes an issue introduced after commit 4ea5454203d991ec ("HID: Fix race condition between driver core and ll-driver"). After that commit, hid-core discards any incoming packet that arrives while hid driver's probe function is being executed. This broke the enumeration process of hid-logitech-dj, that must receive control packets in-band with the mouse and keyboard packets. Discarding mouse or keyboard data at the very begining is usually fine, but it is not the case for control packets. This patch forces a re-enumeration of the paired devices when a packet arrives that comes from an unknown device. Based on a patch originally written by Benjamin Tissoires. Cc: [email protected] # v3.2+ Signed-off-by: Nestor Lopez Casado <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-09-22HID: lenovo-tpkbd: Fix memory leak in tpkbd_remove_tp()Axel Lin1-0/+2
We need to kfree names for led_mute and led_micmute in tpkbd_remove_tp(). Signed-off-by: Axel Lin <[email protected]> Acked-by: Bernhard Seibold <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2012-09-21libceph: only kunmap kmapped pagesAlex Elder1-4/+1
In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by: Bryan K. Wright <[email protected]> Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Sage Weil <[email protected]>
2012-09-21rbd: drop dev reference on error in rbd_open()Alex Elder1-4/+3
If a read-only rbd device is opened for writing in rbd_open(), it returns without dropping the just-acquired device reference. Fix this by moving the read-only check before getting the reference. Signed-off-by: Alex Elder <[email protected]> Reviewed-by: Yehuda Sadeh <[email protected]> Reviewed-by: Josh Durgin <[email protected]>
2012-09-21x86, kvm: fix kvm's usage of kernel_fpu_begin/end()Suresh Siddha4-15/+40
Preemption is disabled between kernel_fpu_begin/end() and as such it is not a good idea to use these routines in kvm_load/put_guest_fpu() which can be very far apart. kvm_load/put_guest_fpu() routines are already called with preemption disabled and KVM already uses the preempt notifier to save the guest fpu state using kvm_put_guest_fpu(). So introduce __kernel_fpu_begin/end() routines which don't touch preemption and use them instead of kernel_fpu_begin/end() for KVM's use model of saving/restoring guest FPU state. Also with this change (and with eagerFPU model), fix the host cr0.TS vm-exit state in the case of VMX. For eagerFPU case, host cr0.TS is always clear. So no need to worry about it. For the traditional lazyFPU restore case, change the cr0.TS bit for the host state during vm-exit to be always clear and cr0.TS bit is set in the __vmx_load_host_state() when the FPU (guest FPU or the host task's FPU) state is not active. This ensures that the host/guest FPU state is properly saved, restored during context-switch and with interrupts (using irq_fpu_usable()) not stomping on the active FPU state. Signed-off-by: Suresh Siddha <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Cc: Avi Kivity <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2012-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds55-192/+253
Pull networking updates from David Miller: "More bug fixes, nothing gets past these guys" 1) More kernel info leaks found by Mathias Krause, this time in the IPSEC configuration layers. 2) When IPSEC policies change, we do not properly make sure that cached routes (which could now be stale) throughout the system will be revalidated. Fix this by generalizing the generation count invalidation scheme used by ipv4. From Nicolas Dichtel. 3) When repairing TCP sockets, we need to allow to restore not just the send window scale, but the receive one too. Extend the existing interface to achieve this in a backwards compatible way. From Andrey Vagin. 4) A fix for FCOE scatter gather feature validation erroneously caused scatter gather to be disabled for things like AOE too. From Ed L Cashin. 5) Several cases of mishandling of error pointers, from Mathias Krause, Wei Yongjun, and Devendra Naga. 6) Fix gianfar build, from Richard Cochran. 7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao Hongjiang. 8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder. 9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde. 10) The removal of the conditional compilation of the clk support code in the stmmac driver broke things. This is because the interfaces used are the ones that don't also perform the enable/disable of the clk. Fix from Stefan Roese. 11) The QFQ packet scheduler can record out of range virtual start times, resulting later in misbehavior and even crashes. Fix from Paolo Valente. 12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the receiver when the advertised receive window goes to zero. Detect this case and force the processing of the IOAT DMA queue when it happens to avoid getting stuck. Fix from Michal Kubecek. 13) batman-adv assumes that test_bit() returns only 0 or 1, but this is not true for x86 (which returns -1 or 0, via the 'sbb' instruction). Fix from Linus Lussing. 14) Fix small packet corruption in e1000, from Tushar Dave. 15) make_blackhole() in the IPSEC policy code can do one read unlock too many, fix from Li RongQing. 16) The new tcp_try_coalesce() code introduced a bug in TCP URG handling, fix from Eric Dumazet. 17) Fix memory leak in __netif_receive_skb() when doing zerocopy and when hit an OOM condition. From Michael S Tsirkin. 18) netxen blindly deferences pdev->bus->self, which is not guarenteed to be non-NULL. Fix from Nikolay Aleksandrov. 19) Fix a performance regression caused by mistakes in ipv6 checksum validation in the bnx2x driver, fix from Michal Schmidt. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits) net/stmmac: Use clk_prepare_enable and clk_disable_unprepare net: change return values from -EACCES to -EPERM net/irda: sh_sir: fix return value check in sh_sir_set_baudrate() stmmac: fix return value check in stmmac_open_ext_timer() gianfar: fix phc index build failure ipv6: fix return value check in fib6_add() bnx2x: remove false warning regarding interrupt number can: ti_hecc: fix oops during rmmod can: janz-ican3: fix support for older hardware revisions net: do not disable sg for packets requiring no checksum aoe: assert AoE packets marked as requiring no checksum at91ether: return PTR_ERR if call to clk_get fails xfrm_user: don't copy esn replay window twice for new states xfrm_user: ensure user supplied esn replay window is valid xfrm_user: fix info leak in copy_to_user_tmpl() xfrm_user: fix info leak in copy_to_user_policy() xfrm_user: fix info leak in copy_to_user_state() xfrm_user: fix info leak in copy_to_user_auth() net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200 tcp: restore rcv_wscale in a repair mode (v2) ...
2012-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds1-8/+5
Pull sparc updates from David Miller: 1) Debugging builds on 32-bit sparc need to handle the R_SPARC_DISP32 relocation, not just 64-bit sparc. From Andreas Larsson. 2) Wei Yongjun noticed that module_alloc() on sparc can return an error pointer, but that's not allowed. module_alloc() should return only a valid pointer, or NULL. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: fix the return value of module_alloc() sparc32: Enable the relocation target R_SPARC_DISP32 for sparc32
2012-09-21Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Small fixlets" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/init.c: Fix devmem_is_allowed() off by one x86/kconfig: Remove outdated reference to Intel CPUs in CONFIG_SWIOTLB
2012-09-21Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-7/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "One more timekeeping fix for v3.6" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Fix timeekeping_get_ns overflow on 32bit systems
2012-09-21Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds7-3/+61
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Small perf fixlets" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing: Don't call page_to_pfn() if page is NULL perf/x86: Fix Intel Ivy Bridge support perf/x86/ibs: Check syscall attribute flags perf/x86: Export Sandy Bridge uncore clockticks event in sysfs
2012-09-21ia64/PCI: Clear host bridge aperture struct resourceYinghai Lu1-2/+1
Use kzalloc() so the struct resource doesn't contain garbage in fields we don't initialize. Signed-off-by: Yinghai Lu <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: [email protected]
2012-09-21x86/PCI: Clear host bridge aperture struct resourceYinghai Lu1-2/+1
Use kzalloc() so the struct resource doesn't contain garbage in fields we don't initialize. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: [email protected]
2012-09-21Merge tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfsLinus Torvalds3-18/+29
Pull xfs bugfixes from Ben Myers: - fix a regression related to xfs_sync_worker racing with unmount. - fix a race while discarding xfs buffers. * tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfs: xfs: stop the sync worker before xfs_unmountfs xfs: fix race while discarding buffers [V4]
2012-09-21Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds6-139/+59
Pull drm fixes from Dave Airlie: "Fixes for big 3 drivers: nouveau: revert earlier MBP fix, put a dmi based MBP fix in its place (fixes a regression we found on some Dell eDP panels doing some internal testing) radeon: revert pll fixes, real fix is too invasive, fix scratch leak intel: 3 minor fixes, one for HDMI audio." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nouveau: add dmi quirk for gpio reset drm/radeon: Prevent leak of scratch register on resume from suspend Revert "drm/nv50-/gpio: initialise to vbios defaults during init" Revert "drm/radeon: rework pll selection (v3)" drm/i915: HDMI - Clear Audio Enable bit for Hot Plug drm/i915: Reduce a pin-leak BUG into a WARN drm/i915: enable lvds pin pairs before dpll on gen2
2012-09-21Merge branch 'for-linus' of ↵Linus Torvalds7-10/+70
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "Updates for the input subsystem. Just a few driver updates mostly dealing with recent regressions." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: edt-ft5x06 - return -EFAULT on copy_to_user() error Input: sentelic - filter out erratic movement when lifting finger Input: ambakmi - [un]prepare clocks when enabling amd disabling Input: i8042 - disable mux on Toshiba C850D Revert "input: ab8500-ponkey: Create AB8500 domain IRQ mapping" Input: imx_keypad - fix missing clk conversions Input: usbtouchscreen - initialize eGalax devices