path: root/kernel
Age | Commit message | Author | Files | Lines
2012-07-22 | workqueue: fix spurious CPU locality WARN from process_one_work() | Tejun Heo | 1 | -0/+6
25511a4776 "workqueue: reimplement CPU online rebinding to handle idle workers" added CPU locality sanity check in process_one_work(). It triggers if a worker is executing on a different CPU without UNBOUND or REBIND set. This works for all normal workers but rescuers can trigger this spuriously when they're serving the unbound or a disassociated global_cwq - rescuers don't have either flag set and thus its gcwq->cpu can be a different value including %WORK_CPU_UNBOUND. Fix it by additionally testing %GCWQ_DISASSOCIATED. Signed-off-by: Tejun Heo <[email protected]> Reported-by: "Paul E. McKenney" <[email protected]> LKML-Refence: <[email protected]>
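A rough sketch of the relaxed check, using only the flag and field names mentioned above (the exact condition in process_one_work() may differ):

    /* gcwq is the global_cwq the worker is serving (illustrative); rescuers
     * on an unbound or disassociated gcwq are exempt from the locality test. */
    WARN_ON_ONCE(!(worker->flags & (WORKER_UNBOUND | WORKER_REBIND)) &&
                 !(gcwq->flags & GCWQ_DISASSOCIATED) &&
                 raw_smp_processor_id() != gcwq->cpu);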
2012-07-22 | kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed | Tejun Heo | 1 | -21/+27
kthread_worker provides a minimalistic workqueue-like interface for users which need a dedicated worker thread (e.g. for realtime priority). It has basic queue, flush_work, flush_worker operations which mostly match the workqueue counterparts; however, due to the way flush_work() is implemented, it has a noticeable difference of not allowing work items to be freed while being executed. While the current users of kthread_worker are okay with the current behavior, the restriction does impede some valid use cases. Also, removing this difference isn't difficult and actually makes the code easier to understand. This patch reimplements flush_kthread_work() such that it uses a flush_work item instead of queue/done sequence numbers. Signed-off-by: Tejun Heo <[email protected]>
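The flush-work pattern described here can be sketched roughly as follows; this is an illustration of the idea rather than the exact kernel code:

    struct kthread_flush_work {
            struct kthread_work     work;
            struct completion       done;
    };

    static void kthread_flush_work_fn(struct kthread_work *work)
    {
            struct kthread_flush_work *fwork =
                    container_of(work, struct kthread_flush_work, work);
            complete(&fwork->done);
    }

    /* flush_kthread_work() queues a flush item right behind the work being
     * flushed and waits on fwork->done, instead of comparing queue/done
     * sequence numbers stored in the work item itself. */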
2012-07-22 | kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation | Tejun Heo | 1 | -16/+26
Make the following two non-functional changes. * Separate out insert_kthread_work() from queue_kthread_work(). * Relocate struct kthread_flush_work and kthread_flush_work_fn() definitions above flush_kthread_work(). v2: Added lockdep_assert_held() in insert_kthread_work() as suggested by Andy Walls. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Andy Walls <[email protected]>
2012-07-21 | Merge branch 'anton-kgdb' (kgdb dmesg fixups) | Linus Torvalds | 3 | -87/+88
Merge emailed kgdb dmesg fixup patches from Anton Vorontsov: "The dmesg command appears to be broken after the printk rework. The old logic in the kdb code makes no sense in terms of the current printk/logging storage format, and KDB simply hangs forever upon entering the 'dmesg' command. The first patch revives the command by switching to the kmsg_dumper iterator. As a side-effect, the code is now much simpler. A few changes were needed in printk.c: we needed an unlocked variant of the kmsg_dumper iterator, but these can surely wait for 3.6. It's probably too late even for the first patch to go to 3.5, but I'll try to convince otherwise. :-) Here we go: - The current code is broken for sure, and has no hope to work at all. It is a regression - The new code works for me, and probably works for everyone else; - If it compiles (and I urge everyone to compile-test it on your setup), it hardly can make things worse." * Merge emailed patches from Anton Vorontsov: (4 commits) kdb: Switch to nolock variants of kmsg_dump functions printk: Implement some unlocked kmsg_dump functions printk: Remove kdb_syslog_data kdb: Revive dmesg command
2012-07-21 | kdb: Switch to nolock variants of kmsg_dump functions | Anton Vorontsov | 1 | -4/+4
The locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held), so let's switch to nolock variants. Signed-off-by: Anton Vorontsov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-21 | printk: Implement some unlocked kmsg_dump functions | Anton Vorontsov | 1 | -13/+55
If used from KDB, the locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held). So, we have to implement a few routines that grab no logbuf lock. Yet we don't need these functions in modules, so we don't export them. Signed-off-by: Anton Vorontsov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
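A sketch of what such unlocked variants look like, assuming the _nolock naming used by the kdb patch above; the exact prototypes are an assumption:

    /* same iteration as the locked helpers, but the caller guarantees
     * exclusion, so logbuf_lock is never taken */
    void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper);
    bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
                                   char *line, size_t size, size_t *len);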
2012-07-21 | printk: Remove kdb_syslog_data | Anton Vorontsov | 2 | -16/+0
The function is no longer needed, so remove it. Signed-off-by: Anton Vorontsov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-21 | kdb: Revive dmesg command | Anton Vorontsov | 1 | -58/+33
The kgdb dmesg command is broken after the printk rework. The old logic in the kdb code makes no sense in terms of the current printk/logging storage format, and KDB simply hangs forever. This patch revives the command by switching to the kmsg_dumper iterator. The code is now much simpler and shorter. Signed-off-by: Anton Vorontsov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-20 | [SCSI] async: make async_synchronize_full() flush all work regardless of domain | Dan Williams | 1 | -2/+41
In response to an async related regression James noted: "My theory is that this is an init problem: The assumption in a lot of our code is that async_synchronize_full() waits for everything ... even the domain specific async schedules, which isn't true." ...so make this assumption true. Each domain, including the default one, registers itself on a global domain list when work is scheduled. Once all entries complete it exits that list. Waiting for the list to be empty syncs all in-flight work across all domains. Domains can opt-out of global syncing if they are declared as exclusive ASYNC_DOMAIN_EXCLUSIVE(). All stack-based domains have been declared exclusive since the domain may go out of scope as soon as the last work item completes. Statically declared domains are mostly ok, but async_unregister_domain() is there to close any theoretical races with pending async_synchronize_full waiters at module removal time. Signed-off-by: Dan Williams <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Reported-by: Meelis Roos <[email protected]> Reported-by: Eldad Zack <[email protected]> Tested-by: Eldad Zack <[email protected]> Signed-off-by: James Bottomley <[email protected]>
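In practice the opt-out is a matter of how the domain is declared; a hedged example (ASYNC_DOMAIN() as the non-exclusive counterpart, and the domain names, are assumptions):

    static ASYNC_DOMAIN(disk_domain);           /* registered: flushed by async_synchronize_full() */
    static ASYNC_DOMAIN_EXCLUSIVE(scan_domain); /* opted out of global syncing */

    /* work queued via async_schedule_domain(func, data, &scan_domain) is then
     * only waited for by the *_domain flush variants */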
2012-07-20 | [SCSI] async: introduce 'async_domain' type | Dan Williams | 1 | -18/+17
This is in preparation for teaching async_synchronize_full() to sync all pending async work, and not just on the async_running domain. This conversion is functionally equivalent, just embedding the existing list in a new async_domain type. The .registered attribute is used in a later patch to distinguish between domains that want to be flushed by async_synchronize_full() versus those that only expect async_synchronize_{full|cookie}_domain to be used for flushing. [jejb: add async.h to scsi_priv.h for struct async_domain] Signed-off-by: Dan Williams <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Acked-by: Mark Brown <[email protected]> Tested-by: Eldad Zack <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2012-07-19 | printk: Export struct log size and member offsets through vmcoreinfo | Vivek Goyal | 1 | -0/+9
There are tools like makedumpfile and vmcore-dmesg which can extract the kernel log buffer from a vmcore. Since we introduced structured logging, that functionality is broken. Now user space tools need to know about "struct log" and the offsets of various fields to be able to parse struct log data and extract the text message or dictionary. This patch exports some of the fields. Currently I am not exporting log "level" info as that is a bitfield and offsetof() of bitfields can't be calculated. But if people start asking for log level info in the output then we probably either need to separate out "level" or use bit shift operations for flags and level. Signed-off-by: Vivek Goyal <[email protected]> Acked-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
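The export presumably uses the standard vmcoreinfo helper macros; a sketch of what gets added (the exact field list is an assumption):

    VMCOREINFO_STRUCT_SIZE(log);
    VMCOREINFO_OFFSET(log, ts_nsec);
    VMCOREINFO_OFFSET(log, len);
    VMCOREINFO_OFFSET(log, text_len);
    VMCOREINFO_OFFSET(log, dict_len);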
2012-07-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 17 | -212/+478
Conflicts: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19 | random: remove rand_initialize_irq() | Theodore Ts'o | 1 | -17/+0
With the new interrupt sampling system, we are no longer using the timer_rand_state structure in the irq descriptor, so we can stop initializing it now. [ Merged in fixes from Sedat to find some last missing references to rand_initialize_irq() ] Signed-off-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Sedat Dilek <[email protected]>
2012-07-18 | Make wait_for_device_probe() also do scsi_complete_async_scans() | Linus Torvalds | 2 | -10/+0
Commit a7a20d103994 ("sd: limit the scope of the async probe domain") made the SCSI device probing run device discovery in its own async domain. However, as a result, the partition detection was no longer synchronized by async_synchronize_full() (which, despite the name, only synchronizes the global async space, not all of them). Which in turn meant that "wait_for_device_probe()" would not wait for the SCSI partitions to be parsed. And "wait_for_device_probe()" was what the boot time init code relied on for mounting the root filesystem. Now, most people never noticed this, because not only is it timing-dependent, but modern distributions all use initrd. So the root filesystem isn't actually on a disk at all. And then before they actually mount the final disk filesystem, they will have loaded the scsi-wait-scan module, which not only does the expected wait_for_device_probe(), but also does scsi_complete_async_scans(). [ Side note: scsi_complete_async_scans() had also been partially broken, but that was fixed in commit 43a8d39d0137 ("fix async probe regression"), so that same commit a7a20d103994 had actually broken setups even if you used scsi-wait-scan explicitly ] Solve this problem by just moving the scsi_complete_async_scans() call into wait_for_device_probe(). Everybody who wants to wait for device probing to finish really wants the SCSI probing to complete, so there's no reason not to do this. So now "wait_for_device_probe()" really does what the name implies, and properly waits for device probing to finish. This also removes the now unnecessary extra calls to scsi_complete_async_scans(). Reported-and-tested-by: Artem S. Tashkinov <[email protected]> Cc: Dan Williams <[email protected]> Cc: Alan Stern <[email protected]> Cc: James Bottomley <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: linux-scsi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-19 | PM / Sleep: Require CAP_BLOCK_SUSPEND to use wake_lock/wake_unlock | Rafael J. Wysocki | 1 | -0/+7
Require processes wanting to use the wake_lock/wake_unlock sysfs files to have the CAP_BLOCK_SUSPEND capability, which also is required for the eventpoll EPOLLWAKEUP flag to be effective, so that all interfaces related to blocking autosleep depend on the same capability. Signed-off-by: Rafael J. Wysocki <[email protected]> Cc: [email protected] Acked-by: Michael Kerrisk <[email protected]>
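Conceptually the change is a capability check in the sysfs store path, along these lines (the exact placement is an assumption):

    /* wake_lock/wake_unlock writers must be allowed to block suspend */
    if (!capable(CAP_BLOCK_SUSPEND))
            return -EPERM;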
2012-07-18 | Merge branch 'fixes' into pm-sleep | Rafael J. Wysocki | 16 | -204/+478
The 'fixes' branch contains material the next commit depends on.
2012-07-18 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -2/+6
One more time/ntp fix pulled from Ingo Molnar. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ntp: Fix STA_INS/DEL clearing bug
2012-07-18 | Merge branch 'linus' into timers/core | Ingo Molnar | 1 | -0/+1
Resolve semantic conflict in kernel/time/timekeeping.c. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-18 | Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core | Ingo Molnar | 1 | -0/+4
Pull tracing fix from Steve Rostedt. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-18 | Merge branch 'linus' into perf/core | Ingo Molnar | 14 | -200/+472
Pick up the latest ring-buffer fixes, before applying a new fix. Signed-off-by: Ingo Molnar <[email protected]>
2012-07-17 | workqueue: simplify CPU hotplug code | Tejun Heo | 1 | -54/+25
With trustee gone, CPU hotplug code can be simplified. * gcwq_claim/release_management() now grab and release gcwq lock too respectively and gained _and_lock and _and_unlock postfixes. * All CPU hotplug logic was implemented in workqueue_cpu_callback() which was called by workqueue_cpu_up/down_callback() for the correct priority. This was because up and down paths shared a lot of logic, which is no longer true. Remove workqueue_cpu_callback() and move all hotplug logic into the two actual callbacks. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: remove CPU offline trustee | Tejun Heo | 1 | -252/+36
With the previous changes, a disassociated global_cwq now can run as an unbound one on its own - it can create workers as necessary to drain remaining works after the CPU has been brought down and manage the number of workers using the usual idle timer mechanism, making the trustee completely redundant except for the actual unbinding operation. This patch removes the trustee and lets a disassociated global_cwq manage itself. Unbinding is moved to a work item (for CPU affinity) which is scheduled and flushed from CPU_DOWN_PREPARE. This patch moves nr_running clearing outside gcwq and manager locks to simplify the code. As nr_running is unused at that point, this is safe. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: don't butcher idle workers on an offline CPU | Tejun Heo | 1 | -80/+14
Currently, during CPU offlining, after all pending work items are drained, the trustee butchers all workers. Also, on CPU onlining failure, workqueue_cpu_callback() ensures that the first idle worker is destroyed. Combined, these guarantee that an offline CPU doesn't have any worker for it once all the lingering work items are finished. This guarantee isn't really necessary and makes CPU on/offlining more expensive than it needs to be, especially for platforms which use CPU hotplug for powersaving. This patch removes idle worker butchering from the trustee and lets a CPU which failed onlining keep the created first worker. The first worker is created if the CPU doesn't have any during CPU_DOWN_PREPARE and started right away. If onlining succeeds, the rebind_workers() call in CPU_ONLINE will rebind it like any other worker. If onlining fails, the worker is left alone till the next try. This makes CPU hotplugs cheaper by allowing global_cwqs to keep workers across them and simplifies the code. Note that the trustee doesn't re-arm the idle timer when it's done and thus the disassociated global_cwq will keep all workers until it comes back online. This will be improved by further patches. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: reimplement CPU online rebinding to handle idle workers | Tejun Heo | 1 | -49/+166
Currently, if there are workers left when a CPU is being brought back online, the trustee kills all idle workers and schedules rebind_work so that they re-bind to the CPU after the currently executing work is finished. This works for busy workers because concurrency management doesn't try to wake them up from scheduler callbacks, which require the target task to be on the local run queue. The busy worker bumps the concurrency counter appropriately as it clears WORKER_UNBOUND from the rebind work item and is bound to the CPU before returning to the idle state. To reduce CPU on/offlining overhead (as many embedded systems use it for powersaving) and simplify the code path, workqueue is planned to be modified to retain idle workers across CPU on/offlining. This patch reimplements CPU online rebinding such that it can also handle idle workers. As noted earlier, due to the local wakeup requirement, rebinding idle workers is tricky. All idle workers must be re-bound before scheduler callbacks are enabled. This is achieved by interlocking idle re-binding. Idle workers are requested to re-bind and then hold until all idle re-binding is complete so that no bound worker starts executing work items. Only after all idle workers are re-bound and parked does CPU_ONLINE proceed to release them and queue rebind work items to busy workers, thus guaranteeing scheduler callbacks aren't invoked until all idle workers are ready. worker_rebind_fn() is renamed to busy_worker_rebind_fn() and idle_worker_rebind() for idle workers is added. Rebinding logic is moved to rebind_workers() and is now called from CPU_ONLINE after flushing the trustee. While at it, add a CPU sanity check in worker_thread(). Note that now a worker may become idle or the manager between trustee release and rebinding during CPU_ONLINE. As the previous patch updated create_worker() so that it can be used by the regular manager while unbound and this patch implements idle re-binding, this is safe. This prepares for removal of the trustee and keeping idle workers across CPU hotplugs. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: drop @bind from create_worker() | Tejun Heo | 1 | -19/+45
Currently, create_worker()'s callers are responsible for deciding whether the newly created worker should be bound to the associated CPU and create_worker() sets WORKER_UNBOUND only for the workers for the unbound global_cwq. Creation during normal operation is always via maybe_create_worker() and @bind is true. For workers created during hotplug, @bind is false. The normal operation path is planned to be used even while the CPU is going through hotplug operations or offline and this static decision won't work. Drop @bind from create_worker() and decide whether to bind by looking at GCWQ_DISASSOCIATED. create_worker() will also set WORKER_UNBOUND automatically if disassociated. To avoid flipping GCWQ_DISASSOCIATED while create_worker() is in progress, the flag is now allowed to be changed only while holding all manager_mutexes on the global_cwq. This requires that GCWQ_DISASSOCIATED is not cleared behind the trustee's back. CPU_ONLINE no longer clears DISASSOCIATED before flushing the trustee, which clears DISASSOCIATED before rebinding remaining workers if asked to release. For cases where the trustee isn't around, CPU_ONLINE clears DISASSOCIATED after flushing the trustee. Also, now, first_idle has UNBOUND set on creation, which is explicitly cleared by CPU_ONLINE while binding it. These convolutions will soon be removed by further simplification of the CPU hotplug path. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
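The decision create_worker() now makes on its own can be sketched like this (simplified; the kthread_bind() placement is an assumption):

    if (gcwq->flags & GCWQ_DISASSOCIATED)
            worker->flags |= WORKER_UNBOUND;        /* don't pin to an offline CPU */
    else
            kthread_bind(worker->task, gcwq->cpu);  /* normal per-cpu worker */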
2012-07-17 | workqueue: use mutex for global_cwq manager exclusion | Tejun Heo | 1 | -39/+26
POOL_MANAGING_WORKERS is used to ensure that at most one worker takes the manager role at any given time on a given global_cwq. The trustee later hitched on it to assume the manager role, adding a blocking wait for the bit. As the trustee already needed a custom wait mechanism, waiting for MANAGING_WORKERS was rolled into the same mechanism. The trustee is scheduled to be removed. This patch separates out the MANAGING_WORKERS wait into a per-pool mutex. Workers use mutex_trylock() to test for the manager role and the trustee uses mutex_lock() to claim manager roles. gcwq_claim/release_management() helpers are added to grab and release the manager roles of all pools on a global_cwq. gcwq_claim_management() always grabs pool manager mutexes in ascending pool index order and uses the pool index as the lockdep subclass. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
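The trylock/lock split reads roughly as follows, with pool->manager_mutex as the assumed name of the per-pool mutex described above:

    /* a worker only volunteers for the manager role */
    if (!mutex_trylock(&pool->manager_mutex))
            return false;                   /* someone else is managing this pool */

    /* the trustee, by contrast, blocks until it owns the role */
    mutex_lock(&pool->manager_mutex);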
2012-07-17 | workqueue: ROGUE workers are UNBOUND workers | Tejun Heo | 1 | -25/+21
Currently, WORKER_UNBOUND is used to mark workers for the unbound global_cwq and WORKER_ROGUE is used to mark workers for disassociated per-cpu global_cwqs. Both are used to make the marked worker skip concurrency management and the only place they make any difference is in worker_enter_idle() where WORKER_ROGUE is used to skip scheduling idle timer, which can easily be replaced with trustee state testing. This patch replaces WORKER_ROGUE with WORKER_UNBOUND and drops WORKER_ROGUE. This is to prepare for removing trustee and handling disassociated global_cwqs as unbound. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: drop CPU_DYING notifier operation | Tejun Heo | 1 | -16/+13
Workqueue used CPU_DYING notification to mark GCWQ_DISASSOCIATED. This was necessary because workqueue's CPU_DOWN_PREPARE happened before other DOWN_PREPARE notifiers and workqueue needed to stay associated across the rest of DOWN_PREPARE. After the previous patch, workqueue's DOWN_PREPARE happens after others and can set GCWQ_DISASSOCIATED directly. Drop CPU_DYING and let the trustee set GCWQ_DISASSOCIATED after disabling concurrency management. Signed-off-by: Tejun Heo <[email protected]> Acked-by: "Rafael J. Wysocki" <[email protected]>
2012-07-17 | workqueue: perform cpu down operations from low priority cpu_notifier() | Tejun Heo | 1 | -1/+37
Currently, all workqueue cpu hotplug operations run off CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to ensure that workqueue is up and running while bringing up a CPU before other notifiers try to use workqueue on the CPU. Per-cpu workqueues are supposed to remain working and bound to the CPU for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even with workqueue offlining running with higher priority because workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which runs the per-cpu workqueue without concurrency management without explicitly detaching the existing workers. However, if the trustee needs to create new workers, it creates unbound workers which may wander off to other CPUs while CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU down is cancelled, the per-CPU workqueue may end up with workers which aren't bound to the CPU. While reliably reproducible with a convoluted artificial test-case involving scheduling and flushing CPU burning work items from CPU down notifiers, this isn't very likely to happen in the wild, and, even when it happens, the effects are likely to be hidden by the following successful CPU down. Fix it by using different priorities for up and down notifiers - high priority for up operations and low priority for down operations. Workqueue cpu hotplug operations will soon go through further cleanup. Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Acked-by: "Rafael J. Wysocki" <[email protected]>
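With the split, registration ends up looking something like the sketch below; the *_UP/*_DOWN priority names are assumptions, and the callbacks are the ones named in the 'workqueue: simplify CPU hotplug code' entry above:

    /* run before other notifiers when a CPU comes up ... */
    cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
    /* ... but after them when a CPU goes down */
    hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);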
2012-07-17 | tracing/function: Convert func_set_flag() to a switch statement | Anton Vorontsov | 1 | -6/+9
Since the function accepts just one bit, we can use the switch construction instead of if/else if/... Just a cosmetic change, there should be no functional changes. Suggested-by: Steven Rostedt <[email protected]> Signed-off-by: Anton Vorontsov <[email protected]> Acked-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
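The shape of the change, roughly (the option bit name here is illustrative):

    switch (bit) {
    case TRACE_FUNC_OPT_STACK:
            /* toggle stack tracing for the function tracer */
            break;
    default:
            return -EINVAL;
    }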
2012-07-17 | tracing/function: Introduce persistent trace option | Anton Vorontsov | 1 | -5/+20
This patch introduces 'func_ptrace' option, now available in /sys/kernel/debug/tracing/options when function tracer is selected. The patch also adds some tiny code that calls back to pstore to record the trace. The callback is no-op when PSTORE=n. Signed-off-by: Anton Vorontsov <[email protected]> Acked-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-17 | tracing: Fix initialization failure path in tracing_set_tracer() | Anton Vorontsov | 1 | -3/+4
If tracer->init() fails, current code will leave current_tracer pointing to an unusable tracer, which at best makes 'current_tracer' report inaccurate value. Fix the issue by pointing current_tracer to nop tracer, and only update current_tracer with the new one after all the initialization succeeds. Signed-off-by: Anton Vorontsov <[email protected]> Acked-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
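Inside tracing_set_tracer() the described ordering amounts to something like this (a sketch, assuming tracer_init() and nop_trace are the helpers involved):

    current_trace = &nop_trace;     /* always leave a usable tracer in place */
    ret = tracer_init(t, tr);
    if (ret)
            goto out;               /* init failed: stay on the nop tracer */
    current_trace = t;              /* publish the new tracer only on success */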
2012-07-16 | kmsg - do not flush partial lines when the console is busy | Kay Sievers | 1 | -25/+68
Fragments of continuation lines are flushed to the console immediately. In case the console is locked, the fragment must be queued up in the cont buffer. If the console is busy and the continuation line is complete, but no part of it was written to the console up to this point, we can just store the entire line as a regular record and free the buffer earlier. If the console is busy and earlier messages are already queued up, we should not flush the fragments of continuation lines, but store them after the queued up messages, to ensure the proper ordering. This keeps the console output more readable in case printk()s race against each other, or we receive over-long continuation lines we need to flush. Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-16 | kmsg - export "continuation record" flag to /dev/kmsg | Kay Sievers | 1 | -2/+21
In some cases we are forced to store individual records for a continuation line print. Export a flag to allow the external re-construction of the line. The flag allows us to apply a similar logic externally which is used internally when the console, /proc/kmsg or the syslog() output is printed. $ cat /dev/kmsg 4,165,0,-;Free swap = 0kB 4,166,0,-;Total swap = 0kB 6,167,0,c;[ 4,168,0,+;0 4,169,0,+;1 4,170,0,+;2 4,171,0,+;3 4,172,0,+;] 6,173,0,-;[0 1 2 3 ] 6,174,0,-;Console: colour VGA+ 80x25 6,175,0,-;console [tty0] enabled Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-16 | kmsg - avoid warning for CONFIG_PRINTK=n compilations | Kay Sievers | 1 | -2/+9
Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-16 | kmsg - properly print over-long continuation lines | Kay Sievers | 1 | -14/+19
Reserve PREFIX_MAX bytes in the LOG_LINE_MAX line when buffering a continuation line, to be able to properly prefix the LOG_LINE_MAX line with the syslog prefix and timestamp when printing it. Reported-By: Dave Jones <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2012-07-16 | timekeeping: Add missing update call in timekeeping_resume() | Thomas Gleixner | 1 | -0/+1
The leap second rework unearthed another issue of inconsistent data. On timekeeping_resume() the timekeeper data is updated, but nothing calls timekeeping_update(), so now the update code in the timer interrupt sees stale values. This has been the case before those changes, but then the timer interrupt was using stale data as well so this went unnoticed for quite some time. Add the missing update call, so all the data is consistent everywhere. Reported-by: Andreas Schwab <[email protected]> Reported-and-tested-by: "Rafael J. Wysocki" <[email protected]> Reported-and-tested-by: Martin Steigerwald <[email protected]> Cc: LKML <[email protected]> Cc: Linux PM list <[email protected]> Cc: John Stultz <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]>, Cc: Prarit Bhargava <[email protected]> Cc: [email protected] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-07-15 | time: Rework timekeeping functions to take timekeeper ptr as argument | John Stultz | 1 | -105/+103
As part of cleaning up the timekeeping code, this patch converts a number of internal functions to take a timekeeper ptr as an argument, so that the internal functions don't access the global timekeeper structure directly. This allows for further optimizations to reduce lock hold time later. This patch has been updated to include more consistent usage of the timekeeper value, by making sure it is always passed as an argument to non top-level functions. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
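In terms of signatures, the conversion looks like this for a typical helper (timekeeping_forward_now() is used here only as an example):

    /* before: implicitly operates on the file-scope timekeeper */
    static void timekeeping_forward_now(void);

    /* after: the caller says which timekeeper to operate on */
    static void timekeeping_forward_now(struct timekeeper *tk);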
2012-07-15 | time: Move xtime_nsec adjustment underflow handling timekeeping_adjust | John Stultz | 1 | -21/+21
When we make adjustments speeding up the clock, it's possible for xtime_nsec to underflow. We already handle this properly, but we do so from update_wall_time() instead of the more logical timekeeping_adjust(), where the possible underflow actually occurs. Thus, move the correction logic to timekeeping_adjust(), which is the function that causes the issue. This makes update_wall_time() more readable. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | time: Move arch_gettimeoffset() usage into timekeeping_get_ns() | John Stultz | 1 | -19/+10
Since we call arch_gettimeoffset() in all the accessor functions, move arch_gettimeoffset() calls into timekeeping_get_ns() and timekeeping_get_ns_raw() to simplify the code. This also makes the code easier to maintain as we don't have to worry about forgetting the arch_gettimeoffset() as has happened in the past. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | time: Refactor accumulation of nsecs to secs | John Stultz | 1 | -22/+32
We do the exact same logic moving nsecs to secs in the timekeeper in multiple places, so condense this into a single function. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | time: Condense timekeeper.xtime into xtime_sec | John Stultz | 1 | -71/+110
The timekeeper struct has a xtime_nsec, which keeps the sub-nanosecond remainder. This ends up being somewhat duplicative of the timekeeper.xtime.tv_nsec value, and we have to do extra work to keep them apart, copying the full nsec portion out and back in over and over. This patch simplifies some of the logic by taking the timekeeper xtime value and splitting it into timekeeper.xtime_sec and reuses the timekeeper.xtime_nsec for the sub-second portion (stored in higher res shifted nanoseconds). This simplifies some of the accumulation logic. And will allow for more accurate timekeeping once the vsyscall code is updated to use the shifted nanosecond remainder. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
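After the change, reading a timespec back out of the timekeeper conceptually works like this (a sketch; the helper and field names follow the description above):

    static inline struct timespec tk_xtime(struct timekeeper *tk)
    {
            struct timespec ts;

            ts.tv_sec  = tk->xtime_sec;
            /* xtime_nsec is kept in shifted (higher-resolution) nanoseconds */
            ts.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);
            return ts;
    }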
2012-07-15 | time: Explicitly use u32 instead of int for shift values | John Stultz | 1 | -3/+3
Ingo noted that using a u32 instead of int for shift values would be better to make sure the compiler doesn't unnecessarily use complex signed arithmetic. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | time: Whitespace cleanups per Ingo's requests | John Stultz | 1 | -21/+18
Ingo noted a number of places where there is inconsistent use of whitespace. This patch tries to address the main culprits. Signed-off-by: John Stultz <[email protected]> Reviewed-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | Merge branch 'timers/urgent' into timers/core | Thomas Gleixner | 22 | -394/+1061
Reason: Update to upstream changes to avoid further conflicts. Fixup a trivial merge conflict in kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-15 | ntp: Fix STA_INS/DEL clearing bug | John Stultz | 1 | -2/+6
In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I introduced a bug that kept the STA_INS or STA_DEL bit from being cleared from time_status via adjtimex() without forcing STA_PLL first. Usually once STA_INS is set, it isn't cleared until the leap second is applied, so it's unlikely this affected anyone. However, during testing I noticed it took some effort to cancel a leap second once STA_INS was set. Signed-off-by: John Stultz <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Prarit Bhargava <[email protected]> CC: [email protected] # 3.4 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2012-07-14 | random: make 'add_interrupt_randomness()' do something sane | Theodore Ts'o | 1 | -4/+3
We've been moving away from add_interrupt_randomness() for various reasons: it's too expensive to do on every interrupt, and flooding the CPU with interrupts could theoretically cause bogus floods of entropy from a somewhat externally controllable source. This solves both problems by limiting the actual randomness addition to just once a second or after 64 interrupts, whichever comes first. During that time, the interrupt cycle data is buffered up in a per-cpu pool. Also, we make sure the nonblocking pool used by urandom is initialized before we start feeding the normal input pool. This assures that /dev/urandom is returning unpredictable data as soon as possible. (Based on an original patch by Linus, but significantly modified by tytso.) Tested-by: Eric Wustrow <[email protected]> Reported-by: Eric Wustrow <[email protected]> Reported-by: Nadia Heninger <[email protected]> Reported-by: Zakir Durumeric <[email protected]> Reported-by: J. Alex Halderman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: [email protected]
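The rate limit described above boils down to a check like the following in the interrupt path (the fast_pool field names are assumptions):

    fast_pool->count++;
    /* only credit and mix into the input pool once a second or every 64 IRQs */
    if (fast_pool->count < 64 &&
        !time_after(jiffies, fast_pool->last + HZ))
            return;
    fast_pool->last = jiffies;
    /* ... mix the buffered per-cpu samples into the input pool ... */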
2012-07-14 | Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 9 | -85/+229
Pull RCU, perf, and scheduler fixes from Ingo Molnar. The RCU fix is a revert for an optimization that could cause deadlocks. One of the scheduler commits (164c33c6adee "sched: Fix fork() error path to not crash") is correct but not complete (some architectures like Tile are not covered yet) - the resulting additional fixes are still WIP and Ingo did not want to delay these pending fixes. See this thread on lkml: [PATCH] fork: fix error handling in dup_task() The perf fixes are just trivial oneliners. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix segfault with report and mixed guestmount use perf kvm: Fix regression with guest machine creation perf script: Fix format regression due to libtraceevent merge ring-buffer: Fix accounting of entries when removing pages ring-buffer: Fix crash due to uninitialized new_pages list head * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS/sched: Update scheduler file pattern sched/nohz: Rewrite and fix load-avg computation -- again sched: Fix fork() error path to not crash
2012-07-14 | VFS: Pass mount flags to sget() | David Howells | 1 | -1/+1
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>
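The resulting prototype, as described, gains a flags parameter (shown roughly; the exact argument order is an assumption):

    struct super_block *sget(struct file_system_type *type,
                             int (*test)(struct super_block *, void *),
                             int (*set)(struct super_block *, void *),
                             int flags, void *data);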
2012-07-14 | VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors | David Howells | 1 | -5/+5
copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <[email protected]> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <[email protected]> Signed-off-by: David Howells <[email protected]> Cc: Valerie Aurora <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Signed-off-by: Al Viro <[email protected]>
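For callers the change is the usual NULL-to-ERR_PTR conversion, roughly:

    mnt = clone_mnt(old, root, flag);
    if (IS_ERR(mnt))
            return PTR_ERR(mnt);    /* was: if (!mnt) return -ENOMEM; */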