aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2013-10-24of/irq: simplify args to irq_create_of_mappingGrant Likely7-13/+8
All the callers of irq_create_of_mapping() pass the contents of a struct of_phandle_args structure to the function. Since all the callers already have an of_phandle_args pointer, why not pass it directly to irq_create_of_mapping()? Signed-off-by: Grant Likely <[email protected]> Acked-by: Michal Simek <[email protected]> Acked-by: Tony Lindgren <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]>
2013-10-24of/irq: Replace of_irq with of_phandle_argsGrant Likely9-36/+30
struct of_irq and struct of_phandle_args are exactly the same structure. This patch makes the kernel use of_phandle_args everywhere. This in itself isn't a big deal, but it makes some follow-on patches simpler. Signed-off-by: Grant Likely <[email protected]> Acked-by: Michal Simek <[email protected]> Acked-by: Tony Lindgren <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]>
2013-10-24of/irq: Rename of_irq_map_* functions to of_irq_parse_*Grant Likely9-9/+9
The OF irq handling code has been overloading the term 'map' to refer to both parsing the data in the device tree and mapping it to the internal linux irq system. This is probably because the device tree does have the concept of an 'interrupt-map' function for translating interrupt references from one node to another, but 'map' is still confusing when the primary purpose of some of the functions are to parse the DT data. This patch renames all the of_irq_map_* functions to of_irq_parse_* which makes it clear that there is a difference between the parsing phase and the mapping phase. Kernel code can make use of just the parsing or just the mapping support as needed by the subsystem. The patch was generated mechanically with a handful of sed commands. Signed-off-by: Grant Likely <[email protected]> Acked-by: Michal Simek <[email protected]> Acked-by: Tony Lindgren <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]>
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller12-61/+169
Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2013-10-23powerpc: select ARCH_MIGHT_HAVE_PC_PARPORTMark Salter1-0/+1
Architectures which support CONFIG_PARPORT_PC should select ARCH_MIGHT_HAVE_PC_PARPORT. Signed-off-by: Mark Salter <[email protected]> CC: Benjamin Herrenschmidt <[email protected]> CC: Paul Mackerras <[email protected]> CC: [email protected]
2013-10-23powerpc: Don't corrupt user registers on 32-bitPaul Mackerras2-12/+17
Commit de79f7b9f6 ("powerpc: Put FP/VSX and VR state into structures") modified load_up_fpu() and load_up_altivec() in such a way that they now use r7 and r8. Unfortunately, the callers of these functions on 32-bit machines then return to userspace via fast_exception_return, which doesn't restore all of the volatile GPRs, but only r1, r3 -- r6 and r9 -- r12. This was causing userspace segfaults and other userspace misbehaviour on 32-bit machines. This fixes the problem by changing the register usage of load_up_fpu() and load_up_altivec() to avoid using r7 and r8 and instead use r6 and r10. This also adds comments to those functions saying which registers may be used. Signed-off-by: Paul Mackerras <[email protected]> Tested-by: Scott Wood <[email protected]> (on e500mc, so no altivec) Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: "Sorry I let so much accumulate, I was in Buffalo and wanted a few things to cook in my tree for a while before sending to you. Anyways, it's a lot of little things as usual at this stage in the game" 1) Make bonding MAINTAINERS entry reflect reality, from Andy Gospodarek. 2) Fix accidental sock_put() on timewait mini sockets, from Eric Dumazet. 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6 addresses, from François CACHEREUL. 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan Carpenter. 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric Dumazet. 6) SFC driver bug fixes from Ben Hutchings. 7) Fix TX packet scheduling wedge after channel change in ath9k driver, from Felix Fietkau. 8) Fix user after free in BPF JIT code, from Alexei Starovoitov. 9) Source address selection test is reversed in __ip_route_output_key(), fix from Jiri Benc. 10) VLAN and CAN layer mis-size netlink attributes, from Marc Kleine-Budde. 11) Fix permission checks in sysctls to use current_euid() instead of current_uid(). From Eric W Biederman. 12) IPSEC policies can go away while a timer is still pending for them, add appropriate ref-counting to fix, from Steffen Klassert. 13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth chips, from Nguyen Hong Ky and Simon Horman. 14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai. 15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama Ghorbel. 16) Fix typo in fq_change(), we were assigning "initial quantum" to "quantum". From Eric Dumazet. 17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise FQ packet scheduler does not pace those flows properly. Also from Eric Dumazet. 18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland. 19) l2tp_xmit_skb() can be called from process context, not just softirq context, so we must always make sure to BH disable around it. From Eric Dumazet. 20) On qdisc reset, we forget to purge the RB tree of SKBs in netem packet scheduler. From Stephen Hemminger. 21) Fix info leak in farsync WAN driver ioctl() handler, from Dan Carpenter and Salva Peiró. 22) Fix PHY reset and other issues in dm9000 driver, from Nikita Kiryanov and Michael Abbott. 23) When hardware can do SCTP crc32 checksums, we accidently don't disable the csum offload when IPSEC transformations have been applied. From Fan Du and Vlad Yasevich. 24) Tail loss probing in TCP leaves the socket in the wrong congestion avoidance state. From Yuchung Cheng. 25) In CPSW driver, enable NAPI before interrupts are turned on, from Markus Pargmann. 26) Integer underflow and dual-assignment in YAM hamradio driver, from Dan Carpenter. 27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must unclone it. This fixes various hard to track down crashes in drivers where the SKBs ->gso_segs was changing right from underneath the driver during TX queueing. From Eric Dumazet. 28) Fix the handling of VLAN IDs, and in particular the special IDs 0 and 4095, in the bridging layer. From Toshiaki Makita. 29) Another info leak, this time in wanxl WAN driver, from Salva Peiró. 30) Fix race in socket credential passing, from Daniel Borkmann. 31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly, from Seif Mazareeb. 32) Fix identification of fragmented frames in ipv4/ipv6 UDP Fragmentation Offload output paths, from Jiri Pirko. 33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel Elior. 34) When we removed the explicit neighbour pointer from ipv6 routes a slight regression was introduced for users such as IPVS, xt_TEE, and raw sockets. We mix up the users requested destination address with the routes assigned nexthop/gateway. From Julian Anastasov and Simon Horman. 35) Fix stack overruns in rt6_probe(), the issue is that can end up doing two full packet xmit paths at the same time when emitting neighbour discovery messages. From Hannes Frederic Sowa. 36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from Mariusz Ceier. 37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT sample, from Neal Cardwell. 38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from Steffen Klassert. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits) ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter ax88179_178a: Correct the RX error definition in RX header Revert "bridge: only expire the mdb entry when query is received" tcp: initialize passive-side sk_pacing_rate after 3WHS davinci_emac.c: Fix IFF_ALLMULTI setup mac802154: correct a typo in ieee802154_alloc_device() prototype ipv6: probe routes asynchronous in rt6_probe netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper ipv6: fill rt6i_gateway with nexthop address ipv6: always prefer rt6i_gateway if present bnx2x: Set NETIF_F_HIGHDMA unconditionally bnx2x: Don't pretend during register dump bnx2x: Lock DMAE when used by statistic flow bnx2x: Prevent null pointer dereference on error flow bnx2x: Fix config when SR-IOV and iSCSI are enabled bnx2x: Fix Coalescing configuration bnx2x: Unlock VF-PF channel on MAC/VLAN config error bnx2x: Prevent an illegal pointer dereference during panic bnx2x: Fix Maximum CoS estimation for VFs drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing ...
2013-10-19Merge 3.12-rc6 into driver-core-nextGreg Kroah-Hartman12-61/+169
We want these fixes here too. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-10-18powerpc/booke: clear DBCR0_BT in user_disable_single_step()James Yang1-1/+1
BookE version of user_disable_single_step() clears DBCR0_IC for the instruction completion debug, but did not also clear DBCR0_BT for the branch taken exception. This behavior was lost by the 2/2010 patch. Signed-off-by: James Yang <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2013-10-18powerpc: export debug registers save function for KVMBharat Bhushan2-1/+3
KVM need this function when switching from vcpu to user-space thread. My subsequent patch will use this function. Signed-off-by: Bharat Bhushan <[email protected]> Acked-by: Michael Neuling <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2013-10-18powerpc: move debug registers in a structureBharat Bhushan8-139/+144
This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <[email protected]> Acked-by: Michael Neuling <[email protected]> [[email protected]: removed obvious debug_reg comment] Signed-off-by: Scott Wood <[email protected]>
2013-10-18powerpc: remove unnecessary line continuationsBharat Bhushan1-1/+1
Signed-off-by: Bharat Bhushan <[email protected]> Acked-by: Michael Neuling <[email protected]> Signed-off-by: Scott Wood <[email protected]>
2013-10-18powerpc/kgdb: use DEFINE_PER_CPU to allocate kgdb's thread_infoTiejun Chen1-3/+3
Use DEFINE_PER_CPU to allocate thread_info statically instead of kmalloc(). This can avoid introducing more memory check codes. Signed-off-by: Tiejun Chen <[email protected]> [[email protected]: wrapped long line] Signed-off-by: Scott Wood <[email protected]>
2013-10-17kvm: powerpc: book3s: drop is_hv_enabledAneesh Kumar K.V6-8/+10
drop is_hv_enabled, because that should not be a callback property Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: Allow the HV and PR selection per virtual machineAneesh Kumar K.V13-91/+183
This moves the kvmppc_ops callbacks to be a per VM entity. This enables us to select HV and PR mode when creating a VM. We also allow both kvm-hv and kvm-pr kernel module to be loaded. To achieve this we move /dev/kvm ownership to kvm.ko module. Depending on which KVM mode we select during VM creation we take a reference count on respective module Signed-off-by: Aneesh Kumar K.V <[email protected]> [agraf: fix coding style] Signed-off-by: Alexander Graf <[email protected]>
2013-10-17Powerpc KVM work is based on a commit after rc4.Gleb Natapov26-153/+314
Merging master into next to satisfy the dependencies. Conflicts: arch/arm/kvm/reset.c
2013-10-17kvm: Add struct kvm arg to memslot APIsAneesh Kumar K.V4-10/+13
We will use that in the later patch to find the kvm ops handler Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: Support building HV and PR KVM as moduleAneesh Kumar K.V10-9/+42
Signed-off-by: Aneesh Kumar K.V <[email protected]> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: booke: Move booke related tracepoints to separate headerAneesh Kumar K.V5-207/+183
Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: pr: move PR related tracepoints to a separate headerAneesh Kumar K.V5-230/+309
This patch moves PR related tracepoints to a separate header. This enables in converting PR to a kernel module which will be done in later patches Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_opsAneesh Kumar K.V7-81/+79
This help us to identify whether we are running with hypervisor mode KVM enabled. The change is needed so that we can have both HV and PR kvm enabled in the same kernel. If both HV and PR KVM are included, interrupts come in to the HV version of the kvmppc_interrupt code, which then jumps to the PR handler, renamed to kvmppc_interrupt_pr, if the guest is a PR guest. Allowing both PR and HV in the same kernel required some changes to kvm_dev_ioctl_check_extension(), since the values returned now can't be selected with #ifdefs as much as previously. We look at is_hv_enabled to return the right value when checking for capabilities.For capabilities that are only provided by HV KVM, we return the HV value only if is_hv_enabled is true. For capabilities provided by PR KVM but not HV, we return the PR value only if is_hv_enabled is false. NOTE: in later patch we replace is_hv_enabled with a static inline function comparing kvm_ppc_ops Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: Cleanup interrupt handling codeAneesh Kumar K.V3-4/+20
With this patch if HV is included, interrupts come in to the HV version of the kvmppc_interrupt code, which then jumps to the PR handler, renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps in enabling both HV and PR, which we do in later patch Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: Add kvmppc_ops callbackAneesh Kumar K.V24-287/+748
This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: Aneesh Kumar K.V <[email protected]> [agraf: squash in booke changes] Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLEAneesh Kumar K.V10-24/+43
This help ups to select the relevant code in the kernel code when we later move HV and PR bits as seperate modules. The patch also makes the config options for PR KVM selectable Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLEAneesh Kumar K.V9-16/+16
With later patches supporting PR kvm as a kernel module, the changes that has to be built into the main kernel binary to enable PR KVM module is now selected via KVM_BOOK3S_PR_POSSIBLE Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: move book3s_64_vio_hv.c into the main kernel binaryPaul Mackerras2-4/+9
Since the code in book3s_64_vio_hv.c is called from real mode with HV KVM, and therefore has to be built into the main kernel binary, this makes it always built-in rather than part of the KVM module. It gets called from the KVM module by PR KVM, so this adds an EXPORT_SYMBOL_GPL(). Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s: remove kvmppc_handler_highmem labelPaul Mackerras2-6/+0
This label is not used now. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: E500: Add userspace debug stub supportBharat Bhushan4-18/+231
This patch adds the debug stub support on booke/bookehv. Now QEMU debug stub can use hw breakpoint, watchpoint and software breakpoint to debug guest. This is how we save/restore debug register context when switching between guest, userspace and kernel user-process: When QEMU is running -> thread->debug_reg == QEMU debug register context. -> Kernel will handle switching the debug register on context switch. -> no vcpu_load() called QEMU makes ioctls (except RUN) -> This will call vcpu_load() -> should not change context. -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs QEMU Makes RUN ioctl -> Save thread->debug_reg on STACK -> Store thread->debug_reg == vcpu->debug_reg -> load thread->debug_reg -> RUN VCPU ( So thread points to vcpu context ) Context switch happens When VCPU running -> makes vcpu_load() should not load any context -> kernel loads the vcpu context as thread->debug_regs points to vcpu context. On heavyweight_exit -> Load the context saved on stack in thread->debug_reg Currently we do not support debug resource emulation to guest, On debug exception, always exit to user space irrespective of user space is expecting the debug exception or not. If this is unexpected exception (breakpoint/watchpoint event not set by userspace) then let us leave the action on user space. This is similar to what it was before, only thing is that now we have proper exit state available to user space. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: E500: Using "struct debug_reg"Bharat Bhushan2-22/+25
For KVM also use the "struct debug_reg" defined in asm/processor.h Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: E500: exit to user space on "ehpriv 1" instructionBharat Bhushan5-6/+54
"ehpriv 1" instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with "run->debug" have relevant information. As this is the first point we are using run->debug, also defined the run->debug structure. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17powerpc: export debug registers save function for KVMBharat Bhushan2-1/+3
KVM need this function when switching from vcpu to user-space thread. My subsequent patch will use this function. Signed-off-by: Bharat Bhushan <[email protected]> Acked-by: Michael Neuling <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17powerpc: move debug registers in a structureBharat Bhushan8-140/+147
This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17powerpc: remove unnecessary line continuationsBharat Bhushan1-1/+1
Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: e500: mark page accessed when mapping a guest pageBharat Bhushan1-0/+3
Mark the guest page as accessed so that there is likely less chances of this page getting swap-out. Signed-off-by: Bharat Bhushan <[email protected]> Acked-by: Scott Wood <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: allow guest control "G" attribute in mas2Bharat Bhushan1-1/+1
"G" bit in MAS2 indicates whether the page is Guarded. There is no reason to stop guest setting "G", so allow him. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: allow guest control "E" attribute in mas2Bharat Bhushan1-1/+1
"E" bit in MAS2 bit indicates whether the page is accessed in Little-Endian or Big-Endian byte order. There is no reason to stop guest setting "E", so allow him." Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIANBharat Bhushan1-1/+1
For booke3e _PAGE_ENDIAN is not defined. Infact what is defined is "_PAGE_LENDIAN" which is wrong and that should be _PAGE_ENDIAN. There are no compilation errors as arch/powerpc/include/asm/pte-common.h defines _PAGE_ENDIAN to 0 as it is not defined anywhere. Signed-off-by: Bharat Bhushan <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S HV: Better handling of exceptions that happen in real modePaul Mackerras2-6/+27
When an interrupt or exception happens in the guest that comes to the host, the CPU goes to hypervisor real mode (MMU off) to handle the exception but doesn't change the MMU context. After saving a few registers, we then clear the "in guest" flag. If, for any reason, we get an exception in the real-mode code, that then gets handled by the normal kernel exception handlers, which turn the MMU on. This is disastrous if the MMU is still set to the guest context, since we end up executing instructions from random places in the guest kernel with hypervisor privilege. In order to catch this situation, we define a new value for the "in guest" flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real mode with guest MMU context. If the "in guest" flag is set to this value, we branch off to an emergency handler. For the moment, this just does a branch to self to stop the CPU from doing anything further. While we're here, we define another new flag value to indicate that we are in a HV guest, as distinct from a PR guest. This will be useful when we have a kernel that can support both PR and HV guests concurrently. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17kvm: powerpc: book3s hv: Fix vcore leakPaul Mackerras1-0/+10
add kvmppc_free_vcores() to free the kvmppc_vcore structures that we allocate for a guest, which are currently being leaked. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiersPaul Mackerras1-8/+32
Currently, whenever any of the MMU notifier callbacks get called, we invalidate all the shadow PTEs. This is inefficient because it means that we typically then get a lot of DSIs and ISIs in the guest to fault the shadow PTEs back in. We do this even if the address range being notified doesn't correspond to guest memory. This commit adds code to scan the memslot array to find out what range(s) of guest physical addresses corresponds to the host virtual address range being affected. For each such range we flush only the shadow PTEs for the range, on all cpus. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Mark pages accessed, and dirty if being writtenPaul Mackerras1-11/+15
The mark_page_dirty() function, despite what its name might suggest, doesn't actually mark the page as dirty as far as the MM subsystem is concerned. It merely sets a bit in KVM's map of dirty pages, if userspace has requested dirty tracking for the relevant memslot. To tell the MM subsystem that the page is dirty, we have to call kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()). This adds a call to kvm_set_pfn_dirty(), and while we are here, also adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that the page has been accessed. Since we are now using the pfn in several places, this adds a 'pfn' variable to store it and changes the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which is the same thing. This also changes a use of HPTE_R_PP to PP_RXRX. Both are 3, but PP_RXRX is more informative as being the read-only page permission bit setting. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page()Paul Mackerras3-13/+39
When the MM code is invalidating a range of pages, it calls the KVM kvm_mmu_notifier_invalidate_range_start() notifier function, which calls kvm_unmap_hva_range(), which arranges to flush all the existing host HPTEs for guest pages. However, the Linux PTEs for the range being flushed are still valid at that point. We are not supposed to establish any new references to pages in the range until the ...range_end() notifier gets called. The PPC-specific KVM code doesn't get any explicit notification of that; instead, we are supposed to use mmu_notifier_retry() to test whether we are or have been inside a range flush notifier pair while we have been getting a page and instantiating a host HPTE for the page. This therefore adds a call to mmu_notifier_retry inside kvmppc_mmu_map_page(). This call is inside a region locked with kvm->mmu_lock, which is the same lock that is called by the KVM MMU notifier functions, thus ensuring that no new notification can proceed while we are in the locked region. Inside this region we also create the host HPTE and link the corresponding hpte_cache structure into the lists used to find it later. We cannot allocate the hpte_cache structure inside this locked region because that can lead to deadlock, so we allocate it outside the region and free it if we end up not using it. This also moves the updates of vcpu3s->hpte_cache_count inside the regions locked with vcpu3s->mmu_lock, and does the increment in kvmppc_mmu_hpte_cache_map() when the pte is added to the cache rather than when it is allocated, in order that the hpte_cache_count is accurate. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Better handling of host-side read-only pagesPaul Mackerras9-40/+91
Currently we request write access to all pages that get mapped into the guest, even if the guest is only loading from the page. This reduces the effectiveness of KSM because it means that we unshare every page we access. Also, we always set the changed (C) bit in the guest HPTE if it allows writing, even for a guest load. This fixes both these problems. We pass an 'iswrite' flag to the mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether the access is a load or a store. The mmu.xlate() functions now only set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot() instead of gfn_to_pfn() so that it can indicate whether we need write access to the page, and get back a 'writable' flag to indicate whether the page is writable or not. If that 'writable' flag is clear, we then make the host HPTE read-only even if the guest HPTE allowed writing. This means that we can get a protection fault when the guest writes to a page that it has mapped read-write but which is read-only on the host side (perhaps due to KSM having merged the page). Thus we now call kvmppc_handle_pagefault() for protection faults as well as HPTE not found faults. In kvmppc_handle_pagefault(), if the access was allowed by the guest HPTE and we thus need to install a new host HPTE, we then need to remove the old host HPTE if there is one. This is done with a new function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to find and remove the old host HPTE. Since the memslot-related functions require the KVM SRCU read lock to be held, this adds srcu_read_lock/unlock pairs around the calls to kvmppc_handle_pagefault(). Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore guest HPTEs that don't permit access, and to return -EPERM for accesses that are not permitted by the page protections. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S: Move skip-interrupt handlers to common codePaul Mackerras3-50/+26
Both PR and HV KVM have separate, identical copies of the kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are used for the situation where an interrupt happens when loading the instruction that caused an exit from the guest. To eliminate this duplication and make it easier to compile in both PR and HV KVM, this moves this code to arch/powerpc/kernel/exceptions-64s.S along with other kernel interrupt handler code. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cachePaul Mackerras6-26/+39
This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache rather than having them embedded in the kvmppc_vcpu_book3s struct, which is allocated with vzalloc. The reason is to reduce the differences between PR and HV KVM in order to make is easier to have them coexist in one kernel binary. With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s struct. The pointer to the kvmppc_book3s_shadow_vcpu struct has moved from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only present for 32-bit, since it is only used for 32-bit. Signed-off-by: Paul Mackerras <[email protected]> [agraf: squash in compile fix from Aneesh] Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safePaul Mackerras5-34/+72
This adds a per-VM mutex to provide mutual exclusion between vcpus for accesses to and updates of the guest hashed page table (HPT). This also makes the code use single-byte writes to the HPT entry when updating of the reference (R) and change (C) bits. The reason for doing this, rather than writing back the whole HPTE, is that on non-PAPR virtual machines, the guest OS might be writing to the HPTE concurrently, and writing back the whole HPTE might conflict with that. Also, real hardware does single-byte writes to update R and C. The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading the HPT and updating R and/or C, and in the PAPR HPT update hcalls (H_ENTER, H_REMOVE, etc.). Having the mutex means that we don't need to use a hypervisor lock bit in the HPT update hcalls, and we don't need to be careful about the order in which the bytes of the HPTE are updated by those hcalls. The other change here is to make emulated TLB invalidations (tlbie) effective across all vcpus. To do this we call kvmppc_mmu_pte_vflush for all vcpus in kvmppc_ppc_book3s_64_tlbie(). For 32-bit, this makes the setting of the accessed and dirty bits use single-byte writes, and makes tlbie invalidate shadow HPTEs for all vcpus. With this, PR KVM can successfully run SMP guests. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Correct errors in H_ENTER implementationPaul Mackerras1-5/+14
The implementation of H_ENTER in PR KVM has some errors: * With H_EXACT not set, if the HPTEG is full, we return H_PTEG_FULL as the return value of kvmppc_h_pr_enter, but the caller is expecting one of the EMULATE_* values. The H_PTEG_FULL needs to go in the guest's R3 instead. * With H_EXACT set, if the selected HPTE is already valid, the H_ENTER call should return a H_PTEG_FULL error. This fixes these errors and also makes it write only the selected HPTE, not the whole group, since only the selected HPTE has been modified. This also micro-optimizes the calculations involving pte_index and i. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Handle PP0 page-protection bit in guest HPTEsPaul Mackerras1-0/+3
64-bit POWER processors have a three-bit field for page protection in the hashed page table entry (HPTE). Currently we only interpret the two bits that were present in older versions of the architecture. The only defined combination that has the new bit set is 110, meaning read-only for supervisor and no access for user mode. This adds code to kvmppc_mmu_book3s_64_xlate() to interpret the extra bit appropriately. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Use 64k host pages where possiblePaul Mackerras5-13/+57
Currently, PR KVM uses 4k pages for the host-side mappings of guest memory, regardless of the host page size. When the host page size is 64kB, we might as well use 64k host page mappings for guest mappings of 64kB and larger pages and for guest real-mode mappings. However, the magic page has to remain a 4k page. To implement this, we first add another flag bit to the guest VSID values we use, to indicate that this segment is one where host pages should be mapped using 64k pages. For segments with this bit set we set the bits in the shadow SLB entry to indicate a 64k base page size. When faulting in host HPTEs for this segment, we make them 64k HPTEs instead of 4k. We record the pagesize in struct hpte_cache for use when invalidating the HPTE. For now we restrict the segment containing the magic page (if any) to 4k pages. It should be possible to lift this restriction in future by ensuring that the magic 4k page is appropriately positioned within a host 64k page. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>
2013-10-17KVM: PPC: Book3S PR: Allow guest to use 64k pagesPaul Mackerras6-15/+197
This adds the code to interpret 64k HPTEs in the guest hashed page table (HPT), 64k SLB entries, and to tell the guest about 64k pages in kvm_vm_ioctl_get_smmu_info(). Guest 64k pages are still shadowed by 4k pages. This also adds another hash table to the four we have already in book3s_mmu_hpte.c to allow us to find all the PTEs that we have instantiated that match a given 64k guest page. The tlbie instruction changed starting with POWER6 to use a bit in the RB operand to indicate large page invalidations, and to use other RB bits to indicate the base and actual page sizes and the segment size. 64k pages came in slightly earlier, with POWER5++. We use one bit in vcpu->arch.hflags to indicate that the emulated cpu supports 64k pages, and another to indicate that it has the new tlbie definition. The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because the MMU capabilities depend on which CPU model we're emulating, but it is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU fd. In addition, commonly-used userspace (QEMU) calls it before setting the PVR for any VCPU. Therefore, as a best effort we look at the first vcpu in the VM and return 64k pages or not depending on its capabilities. We also make the PVR default to the host PVR on recent CPUs that support 1TB segments (and therefore multiple page sizes as well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB segment support on those CPUs. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Alexander Graf <[email protected]>