Age | Commit message (Collapse) | Author | Files | Lines |
|
Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.
This code is a copy/paste of memset(), with some modifications
in order to retrieve remaining number of bytes to be cleared,
as it needs to be returned in case of error.
On the same way as done on PPC64 in commit 17968fbbd19f1
("powerpc: 64bit optimised __clear_user"), the patch moves
__clear_user() into a dedicated file string_32.S
On a MPC885, throughput is almost doubled:
Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s
After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s
On a MPC8321, throughput is multiplied by 2.12:
Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s
After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The generic csum_ipv6_magic() generates a pretty bad result
00000000 <csum_ipv6_magic>: (PPC32)
0: 81 23 00 00 lwz r9,0(r3)
4: 81 03 00 04 lwz r8,4(r3)
8: 7c e7 4a 14 add r7,r7,r9
c: 7d 29 38 10 subfc r9,r9,r7
10: 7d 4a 51 10 subfe r10,r10,r10
14: 7d 27 42 14 add r9,r7,r8
18: 7d 2a 48 50 subf r9,r10,r9
1c: 80 e3 00 08 lwz r7,8(r3)
20: 7d 08 48 10 subfc r8,r8,r9
24: 7d 4a 51 10 subfe r10,r10,r10
28: 7d 29 3a 14 add r9,r9,r7
2c: 81 03 00 0c lwz r8,12(r3)
30: 7d 2a 48 50 subf r9,r10,r9
34: 7c e7 48 10 subfc r7,r7,r9
38: 7d 4a 51 10 subfe r10,r10,r10
3c: 7d 29 42 14 add r9,r9,r8
40: 7d 2a 48 50 subf r9,r10,r9
44: 80 e4 00 00 lwz r7,0(r4)
48: 7d 08 48 10 subfc r8,r8,r9
4c: 7d 4a 51 10 subfe r10,r10,r10
50: 7d 29 3a 14 add r9,r9,r7
54: 7d 2a 48 50 subf r9,r10,r9
58: 81 04 00 04 lwz r8,4(r4)
5c: 7c e7 48 10 subfc r7,r7,r9
60: 7d 4a 51 10 subfe r10,r10,r10
64: 7d 29 42 14 add r9,r9,r8
68: 7d 2a 48 50 subf r9,r10,r9
6c: 80 e4 00 08 lwz r7,8(r4)
70: 7d 08 48 10 subfc r8,r8,r9
74: 7d 4a 51 10 subfe r10,r10,r10
78: 7d 29 3a 14 add r9,r9,r7
7c: 7d 2a 48 50 subf r9,r10,r9
80: 81 04 00 0c lwz r8,12(r4)
84: 7c e7 48 10 subfc r7,r7,r9
88: 7d 4a 51 10 subfe r10,r10,r10
8c: 7d 29 42 14 add r9,r9,r8
90: 7d 2a 48 50 subf r9,r10,r9
94: 7d 08 48 10 subfc r8,r8,r9
98: 7d 4a 51 10 subfe r10,r10,r10
9c: 7d 29 2a 14 add r9,r9,r5
a0: 7d 2a 48 50 subf r9,r10,r9
a4: 7c a5 48 10 subfc r5,r5,r9
a8: 7c 63 19 10 subfe r3,r3,r3
ac: 7d 29 32 14 add r9,r9,r6
b0: 7d 23 48 50 subf r9,r3,r9
b4: 7c c6 48 10 subfc r6,r6,r9
b8: 7c 63 19 10 subfe r3,r3,r3
bc: 7c 63 48 50 subf r3,r3,r9
c0: 54 6a 80 3e rotlwi r10,r3,16
c4: 7c 63 52 14 add r3,r3,r10
c8: 7c 63 18 f8 not r3,r3
cc: 54 63 84 3e rlwinm r3,r3,16,16,31
d0: 4e 80 00 20 blr
0000000000000000 <.csum_ipv6_magic>: (PPC64)
0: 81 23 00 00 lwz r9,0(r3)
4: 80 03 00 04 lwz r0,4(r3)
8: 81 63 00 08 lwz r11,8(r3)
c: 7c e7 4a 14 add r7,r7,r9
10: 7f 89 38 40 cmplw cr7,r9,r7
14: 7d 47 02 14 add r10,r7,r0
18: 7d 30 10 26 mfocrf r9,1
1c: 55 29 f7 fe rlwinm r9,r9,30,31,31
20: 7d 4a 4a 14 add r10,r10,r9
24: 7f 80 50 40 cmplw cr7,r0,r10
28: 7d 2a 5a 14 add r9,r10,r11
2c: 80 03 00 0c lwz r0,12(r3)
30: 81 44 00 00 lwz r10,0(r4)
34: 7d 10 10 26 mfocrf r8,1
38: 55 08 f7 fe rlwinm r8,r8,30,31,31
3c: 7d 29 42 14 add r9,r9,r8
40: 81 04 00 04 lwz r8,4(r4)
44: 7f 8b 48 40 cmplw cr7,r11,r9
48: 7d 29 02 14 add r9,r9,r0
4c: 7d 70 10 26 mfocrf r11,1
50: 55 6b f7 fe rlwinm r11,r11,30,31,31
54: 7d 29 5a 14 add r9,r9,r11
58: 7f 80 48 40 cmplw cr7,r0,r9
5c: 7d 29 52 14 add r9,r9,r10
60: 7c 10 10 26 mfocrf r0,1
64: 54 00 f7 fe rlwinm r0,r0,30,31,31
68: 7d 69 02 14 add r11,r9,r0
6c: 7f 8a 58 40 cmplw cr7,r10,r11
70: 7c 0b 42 14 add r0,r11,r8
74: 81 44 00 08 lwz r10,8(r4)
78: 7c f0 10 26 mfocrf r7,1
7c: 54 e7 f7 fe rlwinm r7,r7,30,31,31
80: 7c 00 3a 14 add r0,r0,r7
84: 7f 88 00 40 cmplw cr7,r8,r0
88: 7d 20 52 14 add r9,r0,r10
8c: 80 04 00 0c lwz r0,12(r4)
90: 7d 70 10 26 mfocrf r11,1
94: 55 6b f7 fe rlwinm r11,r11,30,31,31
98: 7d 29 5a 14 add r9,r9,r11
9c: 7f 8a 48 40 cmplw cr7,r10,r9
a0: 7d 29 02 14 add r9,r9,r0
a4: 7d 70 10 26 mfocrf r11,1
a8: 55 6b f7 fe rlwinm r11,r11,30,31,31
ac: 7d 29 5a 14 add r9,r9,r11
b0: 7f 80 48 40 cmplw cr7,r0,r9
b4: 7d 29 2a 14 add r9,r9,r5
b8: 7c 10 10 26 mfocrf r0,1
bc: 54 00 f7 fe rlwinm r0,r0,30,31,31
c0: 7d 29 02 14 add r9,r9,r0
c4: 7f 85 48 40 cmplw cr7,r5,r9
c8: 7c 09 32 14 add r0,r9,r6
cc: 7d 50 10 26 mfocrf r10,1
d0: 55 4a f7 fe rlwinm r10,r10,30,31,31
d4: 7c 00 52 14 add r0,r0,r10
d8: 7f 80 30 40 cmplw cr7,r0,r6
dc: 7d 30 10 26 mfocrf r9,1
e0: 55 29 ef fe rlwinm r9,r9,29,31,31
e4: 7c 09 02 14 add r0,r9,r0
e8: 54 03 80 3e rotlwi r3,r0,16
ec: 7c 03 02 14 add r0,r3,r0
f0: 7c 03 00 f8 not r3,r0
f4: 78 63 84 22 rldicl r3,r3,48,48
f8: 4e 80 00 20 blr
This patch implements it in assembly for both PPC32 and PPC64
Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Segher Boessenkool <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Improve __csum_partial by interleaving loads and adds.
On a 8xx, it brings neither improvement nor degradation.
On a 83xx, it brings a 25% improvement.
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Segher Boessenkool <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops
A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%
Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.
Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
emulate_step() tests are failing if VSX is not supported or disabled.
emulate_step_test: lxvd2x : FAIL
emulate_step_test: stxvd2x : FAIL
If !CPU_FTR_VSX, emulate_step() failure is expected and testcase should
PASS with a valid justification. After patch:
emulate_step_test: lxvd2x : PASS (!CPU_FTR_VSX)
emulate_step_test: stxvd2x : PASS (!CPU_FTR_VSX)
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
emulate_step() is not checking runtime VSX feature flag before
emulating an instruction. This is causing kernel crash when kernel
is compiled with CONFIG_VSX=y but running on a machine where VSX
is not supported or disabled. Ex, while running emulate_step tests
on P6 machine:
Oops: Exception in kernel mode, sig: 4 [#1]
NIP [c000000000095c24] .load_vsrn+0x28/0x54
LR [c000000000094bdc] .emulate_loadstore+0x167c/0x17b0
Call Trace:
0x40fe240c7ae147ae (unreliable)
.emulate_loadstore+0x167c/0x17b0
.emulate_step+0x25c/0x5bc
.test_lxvd2x_stxvd2x+0x64/0x154
.test_emulate_step+0x38/0x4c
.do_one_initcall+0x5c/0x2c0
.kernel_init_freeable+0x314/0x4cc
.kernel_init+0x24/0x160
.ret_from_kernel_thread+0x58/0xb4
With fix:
emulate_step_test: lxvd2x : FAIL
emulate_step_test: stxvd2x : FAIL
Reported-by: Michael Ellerman <[email protected]>
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Replace 'op->type & INSTR_TYPE_MASK' expression with GETTYPE(op->type)
macro.
Signed-off-by: Ravi Bangoria <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Note that unlike RFI which is patched only in kernel the nospec state
reflects settings at the time the module was loaded.
Iterating all modules and re-patching every time the settings change
is not implemented.
Based on lwsync patching.
Signed-off-by: Michal Suchanek <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Based on the RFI patching. This is required to be able to disable the
speculation barrier.
Only one barrier type is supported and it does nothing when the
firmware does not enable it. Also re-patching modules is not supported
So the only meaningful thing that can be done is patching out the
speculation barrier at boot when the user says it is not wanted.
Signed-off-by: Michal Suchanek <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Some functions prototypes were missing for the non-altivec code. Add the
missing prototypes in a new header file, fix warnings treated as errors
with W=1:
arch/powerpc/lib/xor_vmx_glue.c:18:6: error: no previous prototype for ‘xor_altivec_2’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:29:6: error: no previous prototype for ‘xor_altivec_3’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:40:6: error: no previous prototype for ‘xor_altivec_4’ [-Werror=missing-prototypes]
arch/powerpc/lib/xor_vmx_glue.c:52:6: error: no previous prototype for ‘xor_altivec_5’ [-Werror=missing-prototypes]
The prototypes were already present in <asm/xor.h> but this header file is
meant to be included after <include/linux/raid/xor.h>. Trying to re-use
<asm/xor.h> directly would lead to warnings such as:
arch/powerpc/include/asm/xor.h:39:15: error: variable ‘xor_block_altivec’ has initializer but incomplete type
Trying to re-use <asm/xor.h> after <include/linux/raid/xor.h> in
xor_vmx_glue.c would in turn trigger the following warnings:
include/asm-generic/xor.h:688:34: error: ‘xor_block_32regs’ defined but not used [-Werror=unused-variable]
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.
This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.
Barriers must be inserted generally before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege, HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilge levels patched
similarly to the RFI flush patching.
Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.
Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Michal Suchánek <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
My powerpc-linux-gnu-gcc v4.4.5 compiler can't build a 32-bit kernel
any more:
arch/powerpc/lib/sstep.c: In function 'do_popcnt':
arch/powerpc/lib/sstep.c:1068: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1069: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1070: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c:1079: error: integer constant is too large for 'long' type
arch/powerpc/lib/sstep.c: In function 'do_prty':
arch/powerpc/lib/sstep.c:1117: error: integer constant is too large for 'long' type
This file gets compiled with -std=gnu89 which means a constant can be
given the type 'long' even if it won't fit. Fix the errors with a 'ULL'
suffix on the relevant constants.
Fixes: 2c979c489fee ("powerpc/lib/sstep: Add prty instruction emulation")
Fixes: dcbd19b48d31 ("powerpc/lib/sstep: Add popcnt instruction emulation")
Signed-off-by: Finn Thain <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Add a test of the relative branch patching logic in the alternate
section feature fixup code. This tests that if we branch past the last
instruction of the alternate section, the branch is not patched.
That's because the assembler will have created a branch that already
points to the first instruction after the patched section, which is
correct and needs no further patching.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We want this to remain the last test (because it's disabled by
default), so give it a non-numbered name so we don't have to renumber
it when adding new tests before it.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The code patching code has always been a bit confused about whether
it's best to use void *, unsigned int *, char *, etc. to point to
instructions. In fact in the feature fixups tests we use both unsigned
int[] and u8[] in different places.
Unfortunately the tests that use unsigned int[] calculate the size of
the code blocks using subtraction of those unsigned int pointers, and
then pass the result to memcmp(). This means we're only comparing 1/4
of the bytes we need to, because we need to multiply by
sizeof(unsigned int) to get the number of *bytes*.
The result is that the tests do all the patching and then only compare
some of the resulting code, so patching bugs that only effect that
last 3/4 of the code could slip through undetected. It turns out that
hasn't been happening, although one test had a bad expected case (see
previous commit).
Fix it for now by multiplying the size by 4 in the affected functions.
Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Epic-brown-paper-bag-by: Michael Ellerman <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The expected case for this test was wrong, the source of the alternate
code sequence is:
FTR_SECTION_ELSE
2: or 2,2,2
PPC_LCMPI r3,1
beq 3f
blt 2b
b 3f
b 1b
ALT_FTR_SECTION_END(0, 1)
3: or 1,1,1
or 2,2,2
4: or 3,3,3
So when it's patched the '3' label should still be on the 'or 1,1,1',
and the 4 label is irrelevant and can be removed.
Fixes: 362e7701fd18 ("powerpc: Add self-tests of the feature fixup code")
Signed-off-by: Michael Ellerman <[email protected]>
|
|
When we patch an alternate feature section, we have to adjust any
relative branches that branch out of the alternate section.
But currently we have a bug if we have a branch that points to past
the last instruction of the alternate section, eg:
FTR_SECTION_ELSE
1: b 2f
or 6,6,6
2:
ALT_FTR_SECTION_END(...)
nop
This will result in a relative branch at 1 with a target that equals
the end of the alternate section.
That branch does not need adjusting when it's moved to the non-else
location. Currently we do adjust it, resulting in a branch that goes
off into the link-time location of the else section, which is junk.
The fix is to not patch branches that have a target == end of the
alternate section.
Fixes: d20fe50a7b3c ("KVM: PPC: Book3S HV: Branch inside feature section")
Fixes: 9b1a735de64c ("powerpc: Add logic to patch alternative feature sections")
Cc: [email protected] # v2.6.27+
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.
This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.
Signed-off-by: Nicholas Piggin <[email protected]>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Currently the rfi-flush messages print 'Using <type> flush' for all
enabled_flush_types, but that is not necessarily true -- as now the
fallback flush is always enabled on pseries, but the fixup function
overwrites its nop/branch slot with other flush types, if available.
So, replace the 'Using <type> flush' messages with '<type> flush is
available'.
Also, print the patched flush types in the fixup function, so users
can know what is (not) being used (e.g., the slower, fallback flush,
or no flush type at all if flush is disabled via the debugfs switch).
Suggested-by: Michael Ellerman <[email protected]>
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The inline keyword was not at the beginning of the function declaration.
Fix the following warning (treated as error in W=1):
arch/powerpc/lib/sstep.c:283:1: error: ‘inline’ is not at beginning of declaration
static int nokprobe_inline copy_mem_in(u8 *dest, unsigned long ea, int nb,
arch/powerpc/lib/sstep.c:388:1: error: ‘inline’ is not at beginning of declaration
static int nokprobe_inline copy_mem_out(u8 *dest, unsigned long ea, int nb,
Signed-off-by: Mathieu Malaterre <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Merge our fixes branch from the 4.15 cycle.
Unusually the fixes branch saw some significant features merged,
notably the RFI flush patches, so we want the code in next to be
tested against that, to avoid any surprises when the two are merged.
There's also some other work on the panic handling that was reverted
in fixes and we now want to do properly in next, which would conflict.
And we also fix a few other minor merge conflicts.
|
|
feature fixups need to use patch_instruction() early in the boot,
even before the code is relocated to its final address, requiring
patch_instruction() to use PTRRELOC() in order to address data.
But feature fixups applies on code before it is set to read only,
even for modules. Therefore, feature fixups can use
raw_patch_instruction() instead.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
patch_instruction() uses almost the same sequence as
__patch_instruction()
This patch refactor it so that patch_instruction() uses
__patch_instruction() instead of duplicating code.
Signed-off-by: Christophe Leroy <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.
This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.
The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.
In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.
In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.
For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.
In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.
Tested-by: Jon Masters <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
When attempting to load a livepatch module, I got the following error:
module_64: patch_module: Expect noop after relocate, got 3c820000
The error was triggered by the following code in
unregister_netdevice_queue():
14c: 00 00 00 48 b 14c <unregister_netdevice_queue+0x14c>
14c: R_PPC64_REL24 net_set_todo
150: 00 00 82 3c addis r4,r2,0
GCC didn't insert a nop after the branch to net_set_todo() because it's
a sibling call, so it never returns. The nop isn't needed after the
branch in that case.
Signed-off-by: Josh Poimboeuf <[email protected]>
Acked-by: Naveen N. Rao <[email protected]>
Reviewed-and-tested-by: Kamalesh Babulal <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A small batch of fixes, about 50% tagged for stable and the rest for
recently merged code.
There's one more fix for the >128T handling on hash. Once a process
had requested a single mmap above 128T we would then always search
above 128T. The correct behaviour is to consider the hint address in
isolation for each mmap request.
Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in
VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly
identify P9 DD2.1 but in code that is currently not used by default.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan,
Sukadev Bhattiprolu"
* tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features
powerpc/perf: Fix IMC_MAX_PMU macro
powerpc/perf: Fix pmu_count to count only nest imc pmus
powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX
powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()
powerpc/vas: Export chip_to_vas_id()
powerpc/64s/slice: Use addr limit when computing slice mask
|
|
On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()
There is the following note in front of early_init():
* Note that the kernel may be running at an address which is different
* from the address that it was linked at, so we must use RELOC/PTRRELOC
* to access static data (including strings). -- paulus
Therefore, slab_is_available() cannot be called yet, and
text_poke_area must be addressed with PTRRELOC()
Fixes: 95902e6c8864 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Cc: [email protected] # v4.14+
Reported-by: Meelis Roos <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on
some Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a
CONFIG), we believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on
Power9.
- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"
* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...
|
|
Implement the architecture specific portitions of the UACCESS_FLUSHCACHE
API. This provides functions for the copy_user_flushcache iterator that
ensure that when the copy is finished the destination buffer contains
a copy of the original and that the destination buffer is clean in the
processor caches.
Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Implement the architecture specific cache maintence functions that make
up the "PMEM API". Currently the writeback and invalidate functions
are the same since the function of the DCBST (data cache block store)
instruction is typically interpreted as "writeback to the point of
coherency" rather than to memory. As a result implementing the API
requires a full cache flush rather than just a cache write back. This
will probably change in the not-too-distant future.
Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't
modify *regs") introduced emulate_update_regs() to perform part of what
emulate_step() was doing earlier. However, this function was not added
to the kprobes blacklist. Add it so as to prevent it from being probed.
Signed-off-by: Naveen N. Rao <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We have some dependencies & conflicts between patches in fixes and
things to go in next, both in the radix TLB flush code and the IMC PMU
driver. So merge fixes into next.
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <[email protected]>
Reviewed-by: Philippe Ombredanne <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
According to the GCC documentation, the behaviour of __builtin_clz()
and __builtin_clzl() is undefined if the value of the input argument
is zero. Without handling this special case, these builtins have been
used for emulating the following instructions:
* Count Leading Zeros Word (cntlzw[.])
* Count Leading Zeros Doubleword (cntlzd[.])
This fixes the emulated behaviour of these instructions by adding an
additional check for this special case.
Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Sandipan Das <[email protected]>
Reviewed-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This fixes the emulated behaviour of existing fixed-point shift right
algebraic instructions that are supposed to set both the CA and CA32
bits of XER when running on a system that is compliant with POWER ISA
v3.0 independent of whether the system is executing in 32-bit mode or
64-bit mode. The following instructions are affected:
* Shift Right Algebraic Word Immediate (srawi[.])
* Shift Right Algebraic Word (sraw[.])
* Shift Right Algebraic Doubleword Immediate (sradi[.])
* Shift Right Algebraic Doubleword (srad[.])
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
There are existing fixed-point arithmetic instructions that always set the
CA bit of XER to reflect the carry out of bit 0 in 64-bit mode and out of
bit 32 in 32-bit mode. In ISA v3.0, these instructions also always set the
CA32 bit of XER to reflect the carry out of bit 32.
This fixes the emulated behaviour of such instructions when running on a
system that is compliant with POWER ISA v3.0. The following instructions
are affected:
* Add Immediate Carrying (addic)
* Add Immediate Carrying and Record (addic.)
* Subtract From Immediate Carrying (subfic)
* Add Carrying (addc[.])
* Subtract From Carrying (subfc[.])
* Add Extended (adde[.])
* Subtract From Extended (subfe[.])
* Add to Minus One Extended (addme[.])
* Subtract From Minus One Extended (subfme[.])
* Add to Zero Extended (addze[.])
* Subtract From Zero Extended (subfze[.])
Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This adds definitions for the OV32 and CA32 bits of XER that
were introduced in POWER ISA v3.0. There are some existing
instructions that currently set the OV and CA bits based on
certain conditions.
The emulation behaviour of all these instructions needs to
be updated to set these new bits accordingly.
Signed-off-by: Sandipan Das <[email protected]>
Acked-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
mullw should do a 32 bit signed multiply and create a 64 bit signed
result. It currently truncates the result to 32 bits.
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
mcrf broke when we changed analyse_instr() to not modify the register
state. The instruction writes to the CR, so we need to store the result
in op->ccval, not op->val.
Fixes: 3cdfcbfd32b9 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
set_cr0() broke when we changed analyse_instr() to not modify the
register state. Instead of looking at regs->gpr[x] which has not
been updated yet, we need to look at op->val.
Fixes: 3cdfcbfd32b9 ("powerpc: Change analyse_instr so it doesn't modify *regs")
Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 350779a29f11 ("powerpc: Handle most loads and stores in
instruction emulation code", 2017-08-30) changed the register usage
in get_vr and put_vr with the aim of leaving the register number in
r3 untouched on return. Unfortunately, r6 was not a good choice, as
the callers as of 350779a29f11 store a MSR value in r6. Then, in
commit c22435a5f3d8 ("powerpc: Emulate FP/vector/VSX loads/stores
correctly when regs not live", 2017-08-30), the saving and restoring
of the MSR got moved into get_vr and put_vr. Either way, the effect
is that we put a value in MSR that only has the 0x3f8 bits non-zero,
meaning that we are switching to 32-bit mode. That leads to a crash
like this:
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x0007bea0
Oops: Kernel access of bad area, sig: 11 [#12]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: vmx_crypto binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
CPU: 6 PID: 32659 Comm: trashy_testcase Tainted: G D 4.13.0-rc2-00313-gf3026f57e6ed-dirty #23
task: c000000f1bb9e780 task.stack: c000000f1ba98000
NIP: 000000000007bea0 LR: c00000000007b054 CTR: c00000000007be70
REGS: c000000f1ba9b960 TRAP: 0400 Tainted: G D (4.13.0-rc2-00313-gf3026f57e6ed-dirty)
MSR: 10000000400010a1 <HV,ME,IR,LE> CR: 48000228 XER: 00000000
CFAR: c00000000007be74 SOFTE: 1
GPR00: c00000000007b054 c000000f1ba9bbe0 c000000000e6e000 000000000000001d
GPR04: c000000f1ba9bc00 c00000000007be70 00000000000000e8 9000000002009033
GPR08: 0000000002000000 100000000282f033 000000000b0a0900 0000000000001009
GPR12: 0000000000000000 c00000000fd42100 0706050303020100 a5a5a5a5a5a5a5a5
GPR16: 2e2e2e2e2e2de70c 2e2e2e2e2e2e2e2d 0000000000ff00ff 0606040202020000
GPR20: 000000000000005b ffffffffffffffff 0000000003020100 0000000000000000
GPR24: c000000f1ab90020 c000000f1ba9bc00 0000000000000001 0000000000000001
GPR28: c000000f1ba9bc90 c000000f1ba9bea0 000000000b0a0908 0000000000000001
NIP [000000000007bea0] 0x7bea0
LR [c00000000007b054] emulate_loadstore+0x1044/0x1280
Call Trace:
[c000000f1ba9bbe0] [c000000000076b80] analyse_instr+0x60/0x34f0 (unreliable)
[c000000f1ba9bc70] [c00000000007b7ec] emulate_step+0x23c/0x544
[c000000f1ba9bce0] [c000000000053424] arch_uprobe_skip_sstep+0x24/0x40
[c000000f1ba9bd00] [c00000000024b2f8] uprobe_notify_resume+0x598/0xba0
[c000000f1ba9be00] [c00000000001c284] do_notify_resume+0xd4/0xf0
[c000000f1ba9be30] [c00000000000bd44] ret_from_except_lite+0x70/0x74
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace a7ae7a7f3e0256b5 ]---
To fix this, we just revert to using r3 as before, since the callers
don't rely on r3 being left unmodified.
Fortunately, this can't be triggered by a misaligned load or store,
because vector loads and stores truncate misaligned addresses rather
than taking an alignment interrupt. It can be triggered using
uprobes.
Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code")
Reported-by: Anton Blanchard <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Tested-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Older compilers think val may be used uninitialized:
arch/powerpc/lib/sstep.c: In function 'emulate_loadstore':
arch/powerpc/lib/sstep.c:2758:23: error: 'val' may be used uninitialized in this function
We know better, but initialise val to 0 to avoid breaking the build.
Signed-off-by: Michael Ellerman <[email protected]>
|
|
memset() is patched after initialisation to activate the
optimised part which uses cache instructions.
Today we have a 'b 2f' to skip the optimised patch, which then gets
replaced by a NOP, implying a useless cycle consumption.
As we have a 'bne 2f' just before, we could use that instruction
for the live patching, hence removing the need to have a
dedicated 'b 2f' to be replaced by a NOP.
This patch changes the 'bne 2f' by a 'b 2f'. During init, that
'b 2f' is then replaced by 'bne 2f'
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
There is no need to extend the set value to an int when the length
is lower than 4 as in that case we only do byte stores.
We can therefore immediately branch to the part handling it.
By separating it from the normal case, we are able to eliminate
a few actions on the destination pointer.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 9445aa1a3062a ("ppc: move exports to definitions")
added EXPORT_SYMBOL() for memset() and flush_hash_pages() in
the middle of the functions.
This patch moves them at the end of the two functions.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Commit 694fc88ce271f ("powerpc/string: Implement optimized
memset variants") added memset16(), memset32() and memset64()
for the 64 bits PPC.
On 32 bits, memset64() is not relevant, and as shown below,
the generic version of memset32() gives a good code, so only
memset16() is candidate for an optimised version.
000009c0 <memset32>:
9c0: 2c 05 00 00 cmpwi r5,0
9c4: 39 23 ff fc addi r9,r3,-4
9c8: 4d 82 00 20 beqlr
9cc: 7c a9 03 a6 mtctr r5
9d0: 94 89 00 04 stwu r4,4(r9)
9d4: 42 00 ff fc bdnz 9d0 <memset32+0x10>
9d8: 4e 80 00 20 blr
The last part of memset() handling the not 4-bytes multiples
operates on bytes, making it unsuitable for handling word without
modification. As it would increase memset() complexity, it is
better to implement memset16() from scratch. In addition it
has the advantage of allowing a more optimised memset16() than what
we would have by using the memset() function.
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Michael Ellerman reported that emulate_loadstore() was trying to
access element 32 of regs->gpr[], which doesn't exist, when
emulating a string store instruction. This is because the string
load and store instructions (lswi, lswx, stswi and stswx) are
defined to wrap around from register 31 to register 0 if the number
of bytes being loaded or stored is sufficiently large. This wrapping
was not implemented in the emulation code. To fix it, we mask the
register number after incrementing it.
Reported-by: Michael Ellerman <[email protected]>
Fixes: c9f6f4ed95d4 ("powerpc: Implement emulation of string loads and stores")
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This adds emulation for the lfiwax, lfiwzx and stfiwx instructions.
This necessitated adding a new flag to indicate whether a floating
point or an integer conversion was needed for LOAD_FP and STORE_FP,
so this moves the size field in op->type up 4 bits.
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This replaces almost all of the instruction emulation code in
fix_alignment() with calls to analyse_instr(), emulate_loadstore()
and emulate_dcbz(). The only emulation code left is the SPE
emulation code; analyse_instr() etc. do not handle SPE instructions
at present.
One result of this is that we can now handle alignment faults on
all the new VSX load and store instructions that were added in POWER9.
VSX loads/stores will take alignment faults for unaligned accesses
to cache-inhibited memory.
Another effect is that we no longer rely on the DAR and DSISR values
set by the processor.
With this, we now need to include the instruction emulation code
unconditionally.
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
This moves the parts of emulate_step() that deal with emulating
load and store instructions into a new function called
emulate_loadstore(). This is to make it possible to reuse this
code in the alignment handler.
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|