aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2013-05-08arm64: Ignore the 'write' ESR flag on cache maintenance faultsCatalin Marinas1-1/+2
ESR.WnR bit is always set on data cache maintenance faults even though the page is not required to have write permission. If a translation fault (page not yet mapped) happens for read-only user address range, Linux incorrectly assumes a permission fault. This patch adds the check of the ESR.CM bit during the page fault handling to ignore the 'write' flag. Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Tim Northover <[email protected]> Cc: [email protected]
2013-05-08arm64: dts: fix #address-cells for foundation-v8Mark Rutland1-1/+1
Commit 90556ca1 ("arm64: vexpress: Add dts files for the ARMv8 RTSM models") added foundation-v8.dts, but erroneously set /cpus/#address-cells = <1> while providing two cells in each cpus/cpu@N node's reg property. As of commit ea393a2e ("arm64: smp: honour #address-size when parsing CPU reg property") we read in as many address cells as specified rather than always reading two. This means that for foundation-v8.dts, we only read the first reg cell (zero) for each cpu node, and receive a lot of warnings at boot of the form "/cpus/cpu@1: duplicate cpu reg properties in the DT". This patch corrects foundation-v8.dts to have the correct value for /cpus/#address-cells. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Pawel Moll <[email protected]> Cc: Will Deacon <[email protected]> Tested-by: Marc Zyngier <[email protected]> Acked-by: Marc Zyngier <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2013-05-08arm64: vexpress: Add support for poweroff/restartCatalin Marinas3-5/+8
This patch adds the arm_pm_poweroff definition expected by the vexpress-poweroff.c driver and enables the latter for arm64. Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Pawel Moll <[email protected]>
2013-05-08arm64: Enable support for the ARM GIC interrupt controllerCatalin Marinas1-0/+1
This patch enables ARM_GIC on the arm64 kernel. Signed-off-by: Catalin Marinas <[email protected]>
2013-05-08Merge branch 'gic' into HEADCatalin Marinas39-194/+99
* arm64-prep-gic: irqchip: gic: Perform the gic_secondary_init() call via CPU notifier irqchip: gic: Call handle_bad_irq() directly arm: Move chained_irq_(enter|exit) to a generic file arm: Move the set_handle_irq and handle_arch_irq declarations to asm/irq.h
2013-05-08ALSA: hda - Apply pin-enablement workaround to all Haswell HDMI codecsTakashi Iwai1-32/+22
This is a revised patch based on Mengdong Lin's fix patch, which is a supplement to a previous patch [1611a9c9: ALSA: hda - Add fixup for Haswell to enable all pin and convertor widgets]. Some Haswell BIOS will disable the 2nd and 3rd pin/covertor widgets when the HD-A controller changes state from D3 to D0. So when the controller resumes after a system or runtime suspend, these widgets are disabled and programming these widgets to D0 will cause H/W error and codec will not respond. In addition, we found out that some BIOS disables the pins at S3 although it shows up at boot. This confuses the driver utterly, and the hardware falls into the fatal communication error like the above. So in this patch, we apply intel_haswell_enable_all_pins() not only as a fixup to a certain device (with 8086:2010) but to all Haswell machines. The codec driver basically assumes that all pins are exposed, so it's anyway better to see them from the beginning. Even if all pins and converters are shown by this call, there should be no regression in practice: the pin default configurations are still kept, thus the disabled pins are handled as disabled by the driver properly. Signed-off-by: Mengdong Lin <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2013-05-08audit: fix message spacing printing auidEric Paris1-1/+1
The helper function didn't include a leading space, so it was jammed against the previous text in the audit record. Signed-off-by: Eric Paris <[email protected]>
2013-05-07Merge branch 'akpm' (incoming from Andrew)Linus Torvalds89-1364/+900
Merge more incoming from Andrew Morton: - Various fixes which were stalled or which I picked up recently - A large rotorooting of the AIO code. Allegedly to improve performance but I don't really have good performance numbers (I might have lost the email) and I can't raise Kent today. I held this out of 3.9 and we could give it another cycle if it's all too late/scary. I ended up taking only the first two thirds of the AIO rotorooting. I left the percpu parts and the batch completion for later. - Linus * emailed patches from Andrew Morton <[email protected]>: (33 commits) aio: don't include aio.h in sched.h aio: kill ki_retry aio: kill ki_key aio: give shared kioctx fields their own cachelines aio: kill struct aio_ring_info aio: kill batch allocation aio: change reqs_active to include unreaped completions aio: use cancellation list lazily aio: use flush_dcache_page() aio: make aio_read_evt() more efficient, convert to hrtimers wait: add wait_event_hrtimeout() aio: refcounting cleanup aio: make aio_put_req() lockless aio: do fget() after aio_get_req() aio: dprintk() -> pr_debug() aio: move private stuff out of aio.h aio: add kiocb_cancel() aio: kill return value of aio_complete() char: add aio_{read,write} to /dev/{null,zero} aio: remove retry-based AIO ...
2013-05-07aio: don't include aio.h in sched.hKent Overstreet58-7/+58
Faster kernel compiles by way of fewer unnecessary includes. [[email protected]: fix fallout] [[email protected]: fix build] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: kill ki_retryKent Overstreet2-165/+85
Thanks to Zach Brown's work to rip out the retry infrastructure, we don't need this anymore - ki_retry was only called right after the kiocb was initialized. This also refactors and trims some duplicated code, as well as cleaning up the refcounting/error handling a bit. [[email protected]: use fmode_t in aio_run_iocb()] [[email protected]: fix file_start_write/file_end_write tests] [[email protected]: coding-style fixes] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: kill ki_keyKent Overstreet2-7/+9
ki_key wasn't actually used for anything previously - it was always 0. Drop it to trim struct kiocb a bit. Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07Revert "audit: move kaudit thread start from auditd registration to kaudit init"Eric Paris1-4/+10
This reverts commit 6ff5e45985c2fcb97947818f66d1eeaf9d6600b2. Conflicts: kernel/audit.c This patch was starting a kthread for all the time. Since the follow on patches that required it didn't get finished in 3.10 time, we shouldn't ship this change in 3.10. Signed-off-by: Eric Paris <[email protected]>
2013-05-07audit: vfs: fix audit_inode call in O_CREAT case of do_lastJeff Layton1-1/+1
Jiri reported a regression in auditing of open(..., O_CREAT) syscalls. In older kernels, creating a file with open(..., O_CREAT) created audit_name records that looked like this: type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0 type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0 ...in recent kernels though, they look like this: type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0 type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0 type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0 Richard bisected to determine that the problems started with commit bfcec708, but the log messages have changed with some later audit-related patches. The problem is that this audit_inode call is passing in the parent of the dentry being opened, but audit_inode is being called with the parent flag false. This causes later audit_inode and audit_inode_child calls to match the wrong entry in the audit_names list. This patch simply sets the flag to properly indicate that this inode represents the parent. With this, the audit_names entries are back to looking like they did before. Cc: <[email protected]> # v3.7+ Reported-by: Jiri Jaburek <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Test By: Richard Guy Briggs <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-05-07audit: Make testing for a valid loginuid explicit.Eric W. Biederman4-3/+25
audit rule additions containing "-F auid!=4294967295" were failing with EINVAL because of a regression caused by e1760bd. Apparently some userland audit rule sets want to know if loginuid uid has been set and are using a test for auid != 4294967295 to determine that. In practice that is a horrible way to ask if a value has been set, because it relies on subtle implementation details and will break every time the uid implementation in the kernel changes. So add a clean way to test if the audit loginuid has been set, and silently convert the old idiom to the cleaner and more comprehensible new idiom. Cc: <[email protected]> # 3.7 Reported-By: Richard Guy Briggs <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Tested-by: Richard Guy Briggs <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-05-08MIPS: Export symbols used by KVM/MIPS moduleSanjay Lal1-0/+1
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08MIPS: ASM offsets for VCPU arch specific fields.Sanjay Lal1-0/+66
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08MIPS: If KVM is enabled then use the KVM specific routine to flush the TLBs ↵Sanjay Lal1-0/+6
on a ASID wrap. Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08MIPS: Export routines needed by the KVM module.Sanjay Lal3-2/+7
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Routines to handle specific traps/exceptions while executing the ↵Sanjay Lal2-0/+496
guest. Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Guest interrupt delivery.Sanjay Lal2-0/+292
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: COP0 accesses profiling.Sanjay Lal1-0/+82
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Release notes and KVM module MakefileSanjay Lal2-0/+44
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: MMU/TLB operations for the Guest.Sanjay Lal1-0/+932
- Note that this file is statically linked with the rest of the host kernel (KSEG0). This is because kernel modules are loaded into mapped space on MIPS and we want to make sure that we don't get any host kernel TLB faults while manipulating TLBs. - Virtual Guest TLBs are implemented as 64 entry array regardless of the number of host TLB entries. - Shadow TLBs map Guest virtual addresses to Host physical addresses. - TLB miss handling details: Guest KSEG0 TLBMISS (0x40000000 – 0x60000000): Transparent to the Guest. Guest KSEG2/3 (0x60000000 – 0x80000000) & Guest UM TLBMISS (0x00000000 – 0x40000000) Lookup in Guest/Virtual TLB If an entry doesn’t match deliver appropriate TLBMISS LD/ST exception to the guest If entry does exist in the Guest TLB and is NOT Valid Deliver TLB invalid exception to the guest If entry does exist in the Guest TLB and is VALID Inject the TLB entry into the Shadow TLB Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Privileged instruction/target branch emulation.Sanjay Lal2-0/+1853
- The Guest kernel is run in UM and privileged instructions cause a trap. - If the instruction causing the trap is in a branch delay slot, the branch needs to be emulated to figure out the PC @ which the guest will resume execution. Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: KVM Guest kernel support.Sanjay Lal7-3/+52
Both Guest kernel and Guest Userspace execute in UM. The memory map is as follows: Guest User address space: 0x00000000 -> 0x40000000 Guest Kernel Unmapped: 0x40000000 -> 0x60000000 Guest Kernel Mapped: 0x60000000 -> 0x80000000 - Guest Usermode virtual memory is limited to 1GB. Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: MIPS arch specific APIs for KVMSanjay Lal2-0/+1004
- Implements the arch specific APIs for KVM, some are stubs for MIPS - kvm_mips_handle_exit(): Main 'C' distpatch routine for handling exceptions while in "Guest" mode. - Also implements in-kernel timer interrupt support for the guest. Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Entry point for trampolining to the guest and trap handlers.Sanjay Lal1-0/+650
- __kvm_mips_vcpu_run: main entry point to enter guest, we save kernel context, load up guest context from and ERET to guest context. - mips32_exception: L1 exception handler(s), save k0/k1 and jump to main handlers. - mips32_GuestException: Generic exception handlers for exceptions/interrupts while in guest context. Save guest context, restore some kernel context and jump to main 'C' handler: kvm_mips_handle_exit() Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Arch specific KVM data structures.Sanjay Lal2-0/+722
Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08KVM/MIPS32: Infrastructure/build files.Sanjay Lal6-1/+985
- Add the KVM option to MIPS build files. - Add default config files for KVM host/guest kernels. - Change the link address for the Malta KVM Guest kernel to UM (0x40100000). - Add KVM Kconfig file with KVM/MIPS specific options Signed-off-by: Sanjay Lal <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ralf Baechle <[email protected]>
2013-05-08MIPS: IP27: Remove pfn_t.Ralf Baechle4-11/+10
In the Linux kernel traditionally pfns are represented by an unsigned long. However a few bits of the SGI IP27 platform code that were ported from IRIX are using pfn_t for historic reasons. This is conflicting with KVM's use of pfn_t. Signed-off-by: Ralf Baechle <[email protected]>
2013-05-07aio: give shared kioctx fields their own cachelinesKent Overstreet1-12/+15
[[email protected]: make reqs_active __cacheline_aligned_in_smp] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: kill struct aio_ring_infoKent Overstreet1-81/+74
struct aio_ring_info was kind of odd, the only place it's used is where it's embedded in struct kioctx - there's no real need for it. The next patch rearranges struct kioctx and puts various things on their own cachelines - getting rid of struct aio_ring_info now makes that reordering a bit clearer. Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: kill batch allocationKent Overstreet2-102/+15
Previously, allocating a kiocb required touching quite a few global (well, per kioctx) cachelines... so batching up allocation to amortize those was worthwhile. But we've gotten rid of some of those, and in another couple of patches kiocb allocation won't require writing to any shared cachelines, so that means we can just rip this code out. Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: change reqs_active to include unreaped completionsKent Overstreet1-15/+32
The aio code tries really hard to avoid having to deal with the completion ringbuffer overflowing. To do that, it has to keep track of the number of outstanding kiocbs, and the number of completions currently in the ringbuffer - and it's got to check that every time we allocate a kiocb. Ouch. But - we can improve this quite a bit if we just change reqs_active to mean "number of outstanding requests and unreaped completions" - that means kiocb allocation doesn't have to look at the ringbuffer, which is a fairly significant win. Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: use cancellation list lazilyKent Overstreet3-55/+81
Cancelling kiocbs requires adding them to a per kioctx linked list, which is one of the few things we need to take the kioctx lock for in the fast path. But most kiocbs can't be cancelled - so if we just do this lazily, we can avoid quite a bit of locking overhead. While we're at it, instead of using a flag bit switch to using ki_cancel itself to indicate that a kiocb has been cancelled/completed. This lets us get rid of ki_flags entirely. [[email protected]: remove buggy BUG()] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: use flush_dcache_page()Kent Overstreet1-28/+17
This wasn't causing problems before because it's not needed on x86, but it is needed on other architectures. Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: make aio_read_evt() more efficient, convert to hrtimersKent Overstreet1-150/+90
Previously, aio_read_event() pulled a single completion off the ringbuffer at a time, locking and unlocking each time. Change it to pull off as many events as it can at a time, and copy them directly to userspace. This also fixes a bug where if copying the event to userspace failed, we'd lose the event. Also convert it to wait_event_interruptible_hrtimeout(), which simplifies it quite a bit. [[email protected]: coding-style fixes] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07wait: add wait_event_hrtimeout()Kent Overstreet1-0/+86
Analagous to wait_event_timeout() and friends, this adds wait_event_hrtimeout() and wait_event_interruptible_hrtimeout(). Note that unlike the versions that use regular timers, these don't return the amount of time remaining when they return - instead, they return 0 or -ETIME if they timed out. because I was uncomfortable with the semantics of doing it the other way (that I could get it right, anyways). If the timer expires, there's no real guarantee that expire_time - current_time would be <= 0 - due to timer slack certainly, and I'm not sure I want to know the implications of the different clock bases in hrtimers. If the timer does expire and the code calculates that the time remaining is nonnegative, that could be even worse if the calling code then reuses that timeout. Probably safer to just return 0 then, but I could imagine weird bugs or at least unintended behaviour arising from that too. I came to the conclusion that if other users end up actually needing the amount of time remaining, the sanest thing to do would be to create a version that uses absolute timeouts instead of relative. [[email protected]: fix description of `timeout' arg] Signed-off-by: Kent Overstreet <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: refcounting cleanupKent Overstreet1-153/+119
The usage of ctx->dead was fubar - it makes no sense to explicitly check it all over the place, especially when we're already using RCU. Now, ctx->dead only indicates whether we've dropped the initial refcount. The new teardown sequence is: set ctx->dead hlist_del_rcu(); synchronize_rcu(); Now we know no system calls can take a new ref, and it's safe to drop the initial ref: put_ioctx(); We also need to ensure there are no more outstanding kiocbs. This was done incorrectly - it was being done in kill_ctx(), and before dropping the initial refcount. At this point, other syscalls may still be submitting kiocbs! Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after kioctx->users has dropped to 0 and we know no more iocbs could be submitted. [[email protected]: coding-style fixes] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: make aio_put_req() locklessKent Overstreet2-54/+36
Freeing a kiocb needed to touch the kioctx for three things: * Pull it off the reqs_active list * Decrementing reqs_active * Issuing a wakeup, if the kioctx was in the process of being freed. This patch moves these to aio_complete(), for a couple reasons: * aio_complete() already has to issue the wakeup, so if we drop the kioctx refcount before aio_complete does its wakeup we don't have to do it twice. * aio_complete currently has to take the kioctx lock, so it makes sense for it to pull the kiocb off the reqs_active list too. * A later patch is going to change reqs_active to include unreaped completions - this will mean allocating a kiocb doesn't have to look at the ringbuffer. So taking the decrement of reqs_active out of kiocb_free() is useful prep work for that patch. This doesn't really affect cancellation, since existing (usb) code that implements a cancel function still calls aio_complete() - we just have to make sure that aio_complete does the necessary teardown for cancelled kiocbs. It does affect code paths where we free kiocbs that were never submitted; they need to decrement reqs_active and pull the kiocb off the reqs_active list. This occurs in two places: kiocb_batch_free(), which is going away in a later patch, and the error path in io_submit_one. [[email protected]: coding-style fixes] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: do fget() after aio_get_req()Kent Overstreet1-13/+9
aio_get_req() will fail if we have the maximum number of requests outstanding, which depending on the application may not be uncommon. So avoid doing an unnecessary fget(). Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: dprintk() -> pr_debug()Kent Overstreet1-33/+24
Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: move private stuff out of aio.hKent Overstreet3-61/+62
Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: add kiocb_cancel()Kent Overstreet1-36/+43
Minor refactoring, to get rid of some duplicated code [[email protected]: fix warning] Signed-off-by: Kent Overstreet <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: kill return value of aio_complete()Kent Overstreet2-18/+11
Nothing used the return value, and it probably wasn't possible to use it safely for the locked versions (aio_complete(), aio_put_req()). Just kill it. Signed-off-by: Kent Overstreet <[email protected]> Acked-by: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07char: add aio_{read,write} to /dev/{null,zero}Zach Brown1-0/+35
These are handy for measuring the cost of the aio infrastructure with operations that do very little and complete immediately. Signed-off-by: Zach Brown <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: remove retry-based AIOZach Brown5-379/+31
This removes the retry-based AIO infrastructure now that nothing in tree is using it. We want to remove retry-based AIO because it is fundemantally unsafe. It retries IO submission from a kernel thread that has only assumed the mm of the submitting task. All other task_struct references in the IO submission path will see the kernel thread, not the submitting task. This design flaw means that nothing of any meaningful complexity can use retry-based AIO. This removes all the code and data associated with the retry machinery. The most significant benefit of this is the removal of the locking around the unused run list in the submission path. [[email protected]: coding-style fixes] Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Zach Brown <[email protected]> Cc: Zach Brown <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07gadget: remove only user of aio retryZach Brown1-9/+29
This removes the only in-tree user of aio retry. This will let us remove the retry code from the aio core. Removing retry is relatively easy as the USB gadget wasn't using it to retry IOs at all. It always fully submitted the IO in the context of the initial io_submit() call. It only used the AIO retry facility to get the submitter's mm context for copying the result of a read back to user space. This is easy to implement with use_mm() and a work struct, much like kvm does with async_pf_execute() for get_user_pages(). [[email protected]: coding-style fixes] Signed-off-by: Zach Brown <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Theodore Ts'o <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07aio: remove dead code from aio.hZach Brown1-24/+0
Signed-off-by: Zach Brown <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-05-07mm: remove old aio use_mm() commentZach Brown1-3/+0
Bunch of performance improvements and cleanups Zach Brown and I have been working on. The code should be pretty solid at this point, though it could of course use more review and testing. The results in my testing are pretty impressive, particularly when an ioctx is being shared between multiple threads. In my crappy synthetic benchmark, with 4 threads submitting and one thread reaping completions, I saw overhead in the aio code go from ~50% (mostly ioctx lock contention) to low single digits. Performance with ioctx per thread improved too, but I'd have to rerun those benchmarks. The reason I've been focused on performance when the ioctx is shared is that for a fair number of real world completions, userspace needs the completions aggregated somehow - in practice people just end up implementing this aggregation in userspace today, but if it's done right we can do it much more efficiently in the kernel. Performance wise, the end result of this patch series is that submitting a kiocb writes to _no_ shared cachelines - the penalty for sharing an ioctx is gone there. There's still going to be some cacheline contention when we deliver the completions to the aio ringbuffer (at least if you have interrupts being delivered on multiple cores, which for high end stuff you do) but I have a couple more patches not in this series that implement coalescing for that (by taking advantage of interrupt coalescing). With that, there's basically no bottlenecks or performance issues to speak of in the aio code. This patch: use_mm() is used in more places than just aio. There's no need to mention callers when describing the function. Signed-off-by: Zach Brown <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Cc: Felipe Balbi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Asai Thambi S P <[email protected]> Cc: Selvan Mani <[email protected]> Cc: Sam Bradshaw <[email protected]> Acked-by: Jeff Moyer <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin LaHaise <[email protected]> Reviewed-by: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>