Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.
I wrote a microbenchmark to test the performance gains of using
this new syscall:
http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.
64B UDP
batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)
64B raw socket
batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)
We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.
[ Add sparc syscall entries. -DaveM ]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds support for page coalescing, which is a feature on IBM Power servers
which allows for coalescing identical pages between logical partitions.
Hint text pages as coalesce candidates, since they are the most likely
pages to be able to be coalesced between partitions. This patch also
exports some page coalescing statistics available from firmware via
lparcfg.
[BenH: Moved a couple of things around to fix compile problems]
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Adapt new API.
Almost change is trivial. Most important change is the below line
because we plan to change task->cpus_allowed implementation.
- ctx->cpus_allowed = current->cpus_allowed;
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
Address Register" (CFAR), that records the address of the most recent
branch or rfid (return from interrupt) instruction for debugging purposes.
This saves the value of the CFAR in the exception entry code and stores
it in the exception frame. We also make xmon print the CFAR value in
its register dump code.
Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
field, which is only used for system calls, and use it for the CFAR value
for all exceptions/interrupts other than system calls. This means we
don't save the CFAR on system calls, which is not a great problem since
system calls tend not to happen unexpectedly, and also avoids adding the
overhead of reading the CFAR to the system call entry path.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When we take an interrupt or exception from kernel mode and the stack
pointer is obviously not a kernel address (i.e. the top bit is 0), we
switch to an emergency stack, save register values and panic. However,
on 64-bit server machines, we don't actually save the values of r9 - r13
at the time of the interrupt, but rather values corrupted by the
exception entry code for r12-r13, and nothing at all for r9-r11.
This fixes it by passing a pointer to the register save area in the paca
through to the bad_stack code in r3. The register values are saved in
one of the paca register save areas (depending on which exception this
is). Using the pointer in r3, the bad_stack code now retrieves the
saved values of r9 - r13 and stores them in the exception frame on the
emergency stack. This also stores the normal exception frame marker
("regshere") in the exception frame.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
processors the LPAR_ID and process ID (PID) of the owning process are
registered in the window context of the co-processor at initialization
time. When the icswx instruction is executed the L2 generates a cop-reg
transaction on PowerBus. The transaction has no address and the
processor does not perform an MMU access to authenticate the transaction.
The co-processor compares the LPAR_ID and the PID included in the
transaction and the LPAR_ID and PID held in the window context to
determine if the process is authorized to generate the transaction.
The OS needs to assign a 16-bit PID for the process. This cop-PID needs
to be updated during context switch. The cop-PID needs to be destroyed
when the context is destroyed.
Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com>
Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206. It
also changes the logic to select which tlbie to use to be based on this
new CPU feature bit.
This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features
(copied from MMU features).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
First step in eliminating irq_map[] table entirely
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
sysctl: net: call unregister_net_sysctl_table where needed
Revert: veth: remove unneeded ifname code from veth_newlink()
smsc95xx: fix reset check
tg3: Fix failure to enable WoL by default when possible
networking: inappropriate ioctl operation should return ENOTTY
amd8111e: trivial typo spelling: Negotitate -> Negotiate
ipv4: don't spam dmesg with "Using LC-trie" messages
af_unix: Only allow recv on connected seqpacket sockets.
mii: add support of pause frames in mii_get_an
net: ftmac100: fix scheduling while atomic during PHY link status change
usbnet: Transfer of maintainership
usbnet: add support for some Huawei modems with cdc-ether ports
bnx2: cancel timer on device removal
iwl4965: fix "Received BA when not expected"
iwlagn: fix "Received BA when not expected"
dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
usbnet: Resubmit interrupt URB if device is open
iwl4965: fix "TX Power requested while scanning"
iwlegacy: led stay solid on when no traffic
b43: trivial: update module info about ucode16_mimo firmware
...
|
|
These changes were incorrectly fixed by codespell. They were now
manually corrected.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
|
Some of the 64bit PPC CPU features are MMU-related, so this patch moves
them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to
mmu_has_feature(), and seven feature bits are freed as a result.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The calculation of the size for the exception save area of the TLB
miss handler is wrong, luckily it's too big not too small.
Rework it to make it a bit clearer, and also correct. We want 3 save
areas, each EX_TLB_SIZE _bytes_.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The MSR bit which indicates 64-bit-ness is different between server and
booke, so add a #define which gives you the right mask regardless.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The goal is to avoid adding overhead to MMIO when only PIO is needed
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allow some control over the prefetch
of data streams.
This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.
If a thread starts with non default value in the sysfs entry,
all children threads inherit this non default value even if
the sysfs value is changed later.
Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When we set up the TLB for ourselves on Book3E, we need to flush out any
old mappings established by the firmware or bootloader. At present we
attempt this with a tlbilx to flush everything, but this will leave behind
any entries with the IPROT bit set.
There are several good reason firmware might establish mappings with IPROT,
and in fact ePAPR compliant firmwares are required to establish their
initial mapped area with IPROT.
This patch, therefore adds more complex code to scan through the TLB upon
entry and flush away any entries that are not our own.
Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Add the cputable entry, regs and setup & restore entries for
the PowerPC A2 core.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
mac-fec.c was setting individual UDP address registers instead of multicast
group address registers when joining a multicast group.
This prevented from correctly receiving UDP multicast packets.
According to datasheet, replaced hash_table_high and hash_table_low
with grp_hash_table_high and grp_hash_table_low respectively.
Also renamed hash_table_* with grp_hash_table_* in struct fec declaration
for 8xx: these registers are used only for multicast there.
Tested on a MPC5121 based board.
Build tested also against mpc866_ads_defconfig.
Signed-off-by: Andrea Galbusera <gizero@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
An upcoming new ics backend will need to implement different matching
semantics to the current ones, which are essentially the RTAS ics
backends. So move the current match into the RTAS backend, and allow
other ics backends to override.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
SCOM is a side-band configuration bus implemented on some processors.
This code provides a way for code to map and operate on devices via
SCOM, while the details of how that is implemented is left up to a
SCOM "controller" in the platform code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When we start a cpu we use smp_ops->kick_cpu(), which currently
returns void, it should be able to fail. Convert it to return
int, and update all uses.
Convert all the current error cases to return -ENOENT, which is
what would eventually be returned by __cpu_up() currently when
it doesn't detect the cpu as coming up in time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On Book3E, MMU_NO_CONTEXT != 0, but the slice_mm_new_context()
macro assumes that it is. This means that the map of the
page sizes for each slice is always initialized to zeroes
(which happens to be 4k pages), rather than to the correct
default base page size value - which might be 64k.
This patch corrects the problem.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Use MMU_NO_CONTEXT as the initialiser for mm_context.id on
nohash and hash64.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Wakeup comes from the system reset handler with a potential loss of
the non-hypervisor CPU state. We save the non-volatile state on the
stack and a pointer to it in the PACA, which the system reset handler
uses to restore things
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We need to wait a bit for them to have done their CPU setup
or we might end up with translation and EE on with different
LPCR values between threads
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This uses feature sections to arrange that we always use HSPRG1
as the scratch register in the interrupt entry code rather than
SPRG2 when we're running in hypervisor mode on POWER7. This will
ensure that we don't trash the guest's SPRG2 when we are running
KVM guests. To simplify the code, we define GET_SCRATCH0() and
SET_SCRATCH0() macros like the GET_PACA/SET_PACA macros.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Rework exception macros a bit to split offset from vector and add
some basic support for HDEC, HDSI, HISI and a few more.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Pass the register type to the prolog, also provides alternate "HV"
version of hardware interrupt (0x500) and adjust LPES accordingly
We tag those interrupts by setting bit 0x2 in the trap number
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When running in Hypervisor mode (arch 2.06 or later), we store the PACA
in HSPRG0 instead of SPRG1. The architecture specifies that SPRGs may be
lost during a "nap" power management operation (though they aren't
currently on POWER7) and this enables use of SPRG1 by KVM guests.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This bit indicates that we are operating in hypervisor mode on a CPU
compliant to architecture 2.06 or later (currently server only).
We set it on POWER7 and have a boot-time CPU setup function that
clears it if MSR:HV isn't set (booting under a hypervisor).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This adds more SPR definitions used on newer processors when running
in hypervisor mode. Along with some other P7 specific bits and pieces
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This is a significant rework of the XICS driver, too significant to
conveniently break it up into a series of smaller patches to be honest.
The driver is moved to a more generic location to allow new platforms
to use it, and is broken up into separate ICP and ICS "backends". For
now we have the native and "hypervisor" ICP backends and one common
RTAS ICS backend.
The driver supports one ICP backend instanciation, and many ICS ones,
in order to accomodate future platforms with multiple possibly different
interrupt "sources" mechanisms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This problem was noticed on an MPC855T platform. Ftrace did oops
when trying to write to the kernel text segment.
Many thanks to Joakim for finding the root cause of this problem.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
e500mc does not support the HID0/MSR mechanism that is used by e500_idle
(and there are also issues with waking on certain types of interrupts).
Further, even if napping is never actually enabled, just having
CPU_FTR_CAN_NAP will cause machine_init() to overwrite the board's supplied
ppc_md.power_save().
We drop CPU_FTR_MAYBE_CAN_DOZE becuase we should use 'wait' instead on
e500mc.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
The CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS defines did not encompass
e5500 CPU features when built for 64-bit. This causes issues with
cpu_has_feature() as it utilizes the POSSIBLE & ALWAYS defines as part
of its check.
Create a unique CPU_FTRS_E5500 (as its different from CPU_FTRS_E500MC),
created a new group for 64-bit Book3e based CPUs and add CPU_FTRS_E5500
to that group.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
Fix common misspellings
|
|
Instead, keep it static, expose an accessor and use that from
the PowerMac code. Avoids easy namespace collisions and will
make it easier to consolidate with other implementations.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This allows us to stop abusing smp_ops->setup_cpu() for cleanup
tasks that have to take place after the initial boot time CPU
bringup.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The current code soft-disables, and then goes to NAP mode which
turns interrupts on. That means that if an interrupt occurs, we
will hit the masked interrupt code path which isn't what we want,
as it will return with EE off, which will either get us out of
NAP mode, or fail to enter it (according to spec).
Instead, let's just rely on the fact that it is safe to take
decrementer interrupts on an offline CPU and leave interrupts
enabled. We can also get rid of the special case in asm for
power4_cpu_offline_powersave() and just use power4_idle().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Use generic cpu_state, call idle_task_exit() properly, and
remove smp_core99_cpu_die() which isn't useful, the generic
function does the job just fine.
|
|
Remove the last remnants of cpu_enable(), everybody uses the normal
__cpu_up() path now
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Nobody uses it, besides we should always use the normal __cpu_up
path anyways
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Various thing are torn down when a CPU is hot-unplugged. That CPU
is expected to go back to start_secondary when re-plugged to re
initialize everything, such as clock sources, maps, ...
Some implementations just return from cpu_die() callback
in the idle loop when the CPU is "re-plugged". This is not enough.
We fix it using a little asm trampoline which resets the stack
and calls back into start_secondary as if we were all fresh from
boot. The trampoline already existed on ppc64, but we add it for
ppc32
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|