Age | Commit message (Collapse) | Author | Files | Lines |
|
Add base address for generic slave ID0, ID1, ID2
and introduced one more entry to align RTC module number between
twl4030 and twl6030
Signed-off-by: Balaji T K <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
This patch disables TWL4030/5030 I2C1 adn I2C4(SR) internal pull-up, to
use only the external HW resistor >=470 Ohm for the assured
functionality in HS mode.
While testing the I2C in High Speed mode, it was discovered that
without a proper pull-up resistor, there is data corruption during
multi-byte transfer. RTC(time_set) test case was used for testing.
From the analysis done, it was concluded that ideally we need a
pull-up of 1.6k Ohm(recomended) or atleast 470 Ohm or greater for
assured performance in HS mode.
Signed-off-by: Moiz Sonasath <[email protected]>
Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Use genirq to simplify IRQ handling in 88pm860x. Remove the interface of
mask/free IRQs on 88pm860x. All these work is taken by genirq. Update the
touchscreen driver of 88pm860x since IRQ handling is changed.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Remove unused definitions.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Update thread irq handler. Simply the interface of using thread irq.
Signed-off-by: Haojian Zhuang <[email protected]>
Acked-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
The WM8994 is a highly integrated ultra low power audio hub CODEC.
Since it includes on-board regulators and GPIOs it is represented
as a multi-function device, though the overwhelming majority of
the functionality is provided by the ASoC CODEC driver.
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
As a separate patch due to the large size.
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
The TWL4030_BCI_BATTERY config option originates from a patch to the
omap git tree. However inclusion in linux was seemingly rejected and
the functionality nears inclusion under a different name so this
removes the bits of the old version that made it into the mainline
kernel again.
Signed-off-by: Christoph Egger <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
This change introduces a driver for the HTC PLD chip found
on some smartphones, such as the HTC Wizard and HTC Herald.
It works through the I2C bus and acts as a GPIO extender.
Specifically:
* it can have several sub-devices, each with its own I2C
address
* Each sub-device provides 8 output and 8 input pins
* The chip attaches to one GPIO to signal when any of the
input GPIOs change -- at which point all chips must be
scanned for changes
This driver implements the GPIOs throught the kernel's
GPIO and IRQ framework. This allows any GPIO-servicing
drivers to operate on htcpld pins, such as the gpio-keys
and gpio-leds drivers.
Signed-off-by: Cory Maccarrone <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Add subdevs in MAX8925. MAX8925 includes regulator, backlight and touch
components.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Basic Max8925 support, which is a power management IC from Maxim
Semiconductor.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Enable touchscreen driver for the 88pm860x multi function core.
Signed-off-by: Haojian Zhuang <[email protected]>
Acked-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Append backlight, led & touch subdevs into 88pm860x driver.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
88PM860x is a complex PMIC device. It contains touch, charger, sound, rtc,
backlight, led, and so on.
Host communicates to 88PM860x by I2C bus. Use thread irq to support this
usage case.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
88PM8606 and 88PM8607 are two discrete chips used for power management.
Hardware designer can use them together or only one of them according to
requirement.
There's some logic tightly linked between these two chips. For example, USB
charger driver needs to access both chips by I2C interface.
Now share one driver to these two devices. Only one I2C client is identified
in platform init data. If another chip is also used, user should mark it in
companion_addr field of platform init data. Then driver could create another
I2C client for the companion chip.
All I2C operations are accessed by 860x-i2c driver. In order to support both
I2C client address, the read/write API is changed in below.
reg_read(client, offset)
reg_write(client, offset, data)
The benefit is that client drivers only need one kind of read/write API. I2C
and MFD driver can be shared in both 8606 and 8607.
Since API is changed, update API in 8607 regulator driver.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Create 88pm8607-i2c driver to support all I2C operation of 88PM8607.
Signed-off-by: Haojian Zhuang <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
This converts the AB3100 core MFD driver to use a threaded
interrupt handler instead of the explicit top/bottom-half
construction with a workqueue. This saves some code and make it
more similar to other modern MFD drivers.
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
This gives us use of the diagnostic facilities genirq provides and
will allow implementation of interrupt support for the WM8350 GPIOs.
Stub functions are provided to ease the transition of the individual
drivers, probably after additional work to pass the IRQ numbers via
the struct devices.
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
Unlike the wm8350-custom code genirq nests enable and disable calls
so we can't just unconditionally mask or unmask the interrupt,
we need to remember the state we set and only mask or unmask when
there is a real change.
Signed-off-by: Mark Brown <[email protected]>
Acked-by: Alessandro Zummo <[email protected]>
Cc: [email protected]
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
To better match genirq.
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Samuel Ortiz <[email protected]>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
[LogFS] Change magic number
[LogFS] Remove h_version field
[LogFS] Check feature flags
[LogFS] Only write journal if dirty
[LogFS] Fix bdev erases
[LogFS] Silence gcc
[LogFS] Prevent 64bit divisions in hash_index
[LogFS] Plug memory leak on error paths
[LogFS] Add MAINTAINERS entry
[LogFS] add new flash file system
Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
fs/logfs/inode.c introduced by write_inode() being changed to use
writeback_control' by commit a9185b41a4f84971b930c519f0c63bd450c4810d
("pass writeback_control to ->write_inode")
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm raid1: fix deadlock when suspending failed device
dm: eliminate some holes data structures
dm ioctl: introduce flag indicating uevent was generated
dm: free dm_io before bio_endio not after
dm table: remove unused dm_get_device range parameters
dm ioctl: only issue uevent on resume if state changed
dm raid1: always return error if all legs fail
dm mpath: refactor pg_init
dm mpath: wait for pg_init completion when suspending
dm mpath: hold io until all pg_inits completed
dm mpath: avoid storing private suspended state
dm: document when snapshot has finished merging
dm table: remove dm_get from dm_table_get_md
dm mpath: skip activate_path for failed paths
dm mpath: pass struct pgpath to pg init done
|
|
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd4: fix minor memory leak
svcrpc: treat uid's as unsigned
nfsd: ensure sockets are closed on error
Revert "sunrpc: move the close processing after do recvfrom method"
Revert "sunrpc: fix peername failed on closed listener"
sunrpc: remove unnecessary svc_xprt_put
NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
xfs_export_operations.commit_metadata
commit_metadata export operation replacing nfsd_sync_dir
lockd: don't clear sm_monitored on nsm_reboot_lookup
lockd: release reference to nsm_handle in nlm_host_rebooted
nfsd: Use vfs_fsync_range() in nfsd_commit
NFSD: Create PF_INET6 listener in write_ports
SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
NFSD: Support AF_INET6 in svc_addsock() function
SUNRPC: Use rpc_pton() in ip_map_parse()
nfsd: 4.1 has an rfc number
nfsd41: Create the recovery entry for the NFSv4.1 client
nfsd: use vfs_fsync for non-directories
...
|
|
Most of the GPIO expanders controlled by the pca953x driver are able to
report changes on the input pins through an *INT pin.
This patch implements the irq_chip functionality (edge detection only).
The driver has been tested on an Arcom Zeus.
[[email protected]: the compiler does inlining for us nowadays]
Signed-off-by: Marc Zyngier <[email protected]>
Cc: Eric Miao <[email protected]>
Cc: Haojian Zhuang <[email protected]>
Cc: David Brownell <[email protected]>
Cc: Nate Case <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
gpio_request() without initial configuration of the GPIO is normally
useless, introduce gpio_request_one() together with GPIOF_ flags for
input/output direction and initial output level.
gpio_{request,free}_array() for multiple GPIOs.
Signed-off-by: Eric Miao <[email protected]>
Cc: David Brownell <[email protected]>
Cc: Ben Nizette <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
linux/i2c/pca953x.h is a very bare include file. Fix check for multiple
includes of linux/i2c/pca953x.h, and add dependent includes into the
header file.
Signed-off-by: Olof Johansson <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Acked-by: Jean Delvare <[email protected]>
Cc: David Brownell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the MAX7300-I2C variant of the MAX7301-SPI version. Both chips share
the same core logic, so the generic part of the in-kernel SPI-driver is
refactored into a generic part. The I2C and SPI specific funtions are
then wrapped into seperate drivers picking up the generic part.
Signed-off-by: Wolfram Sang <[email protected]>
Cc: Juergen Beisert <[email protected]>
Cc: David Brownell <[email protected]>
Cc: Jean Delvare <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The driver for the mc13783 rtc needs to know if the TODA irq is pending.
Instead of tracking in the rtc driver if the irq is enabled provide that
information, too.
Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Valentin Longchamp <[email protected]>
Cc: Sascha Hauer <[email protected]>
Cc: Samuel Ortiz <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Luotao Fu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the source file group these functions together.
The mc13783 header file provides fallback implementations for the old
names to prevent build failures. When all users of the old names are
fixed to use the new names these can go away.
Signed-off-by: Uwe Kleine-König <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Valentin Longchamp <[email protected]>
Cc: Sascha Hauer <[email protected]>
Cc: Samuel Ortiz <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Luotao Fu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pass mm->flags as a coredump parameter for consistency.
---
1787 if (mm->core_state || !get_dumpable(mm)) { <- (1)
1788 up_write(&mm->mmap_sem);
1789 put_cred(cred);
1790 goto fail;
1791 }
1792
[...]
1798 if (get_dumpable(mm) == 2) { /* Setuid core dump mode */ <-(2)
1799 flag = O_EXCL; /* Stop rewrite attacks */
1800 cred->fsuid = 0; /* Dump root private */
1801 }
---
Since dumpable bits are not protected by lock, there is a chance to change
these bits between (1) and (2).
To solve this issue, this patch copies mm->flags to
coredump_params.mm_flags at the beginning of do_coredump() and uses it
instead of get_dumpable() while dumping core.
This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
dump_filter bits in mm->flags.
[[email protected]: fix merge]
Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Roland McGrath <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current ELF dumper implementation can produce broken corefiles if
program headers exceed 65535. This number is determined by the number of
vmas which the process have. In particular, some extreme programs may use
more than 65535 vmas. (If you google max_map_count, you can find some
users facing this problem.) This kind of program never be able to generate
correct coredumps.
This patch implements ``extended numbering'' that uses sh_info field of
the first section header instead of e_phnum field in order to represent
upto 4294967295 vmas.
This is supported by
AMD64-ABI(http://www.x86-64.org/documentation.html) and
Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.
Signed-off-by: Daisuke HATAYAMA <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: David Howells <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macro for hiding _multiline_ logics in functions. This patch removes
#ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For
architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
order to reduce a range of modification.
This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my firnal purpose.
Signed-off-by: Daisuke HATAYAMA <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: David Howells <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
My next patch will replace ELF_CORE_EXTRA_* macros by functions, putting
them into other newly created *.c files. Then, each files will contain
dump_write(), where each pair of binfmt_*.c and elfcore.c should be the
same. So, this patch moves them into a header file with dump_seek().
Also, the patch deletes confusing DUMP_WRITE macros in each files.
Signed-off-by: Daisuke HATAYAMA <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: David Howells <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
And bring them back to 4-bit mode during resume.
Signed-off-by: Daniel Drake <[email protected]>
Signed-off-by: Nicolas Pitre <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch series provides the core changes needed to allow SDIO cards to
remain powered and active while the host system is suspended, and let them
wake up the host system when needed. This is used to implement
wake-on-lan with SDIO wireless cards at the moment. Patches to add that
support to the libertas driver will be posted separately.
This patch:
Some SDIO cards have the ability to keep on running autonomously when the
host system is suspended, and wake it up when needed. This however
requires that the host controller preserve power to the card, and
configure itself appropriately for wake-up.
There is however 4 layers of abstractions involved: the host controller
driver, the MMC core code, the SDIO card management code, and the actual
SDIO function driver. To make things simple and manageable, host drivers
must advertise their PM capabilities with a feature bitmask, then function
drivers can query and set those features from their suspend method. Then
each layer in the suspend call chain is expected to act upon those bits
accordingly.
[[email protected]: fix typo in comment]
Signed-off-by: Nicolas Pitre <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some SDIO cards expect byte transfers not to exceed the configured block
transfer size. Add a quirk to that effect.
Patches to make use of this quirk will be sent separately.
Signed-off-by: Bing Zhao <[email protected]>
Signed-off-by: Nicolas Pitre <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The function name must be followed by a space, hypen, space, and a short
description.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
smp: Fix documentation.
Fix documentation in include/linux/smp.h: smp_processor_id()
Signed-off-by: Rakib Mullick <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'. Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.
Signed-off-by: H Hartley Sweeten <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Milton Miller <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Grant Likely <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Benny Halevy <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Ricardo Labiaga <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dependent on CONFIG_SMP the num_*_cpus() functions return unsigned or
signed values. Let them always return unsigned values to avoid strange
casts.
Fixes at least one warning:
kernel/kprobes.c: In function 'register_kretprobe':
kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Heiko Carstens <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__GFP_NOFAIL was deprecated in dab48dab, so add a comment that no new
users should be added.
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The VM currently assumes that an inactive, mapped and referenced file page
is in use and promotes it to the active list.
However, every mapped file page starts out like this and thus a problem
arises when workloads create a stream of such pages that are used only for
a short time. By flooding the active list with those pages, the VM
quickly gets into trouble finding eligible reclaim canditates. The result
is long allocation latencies and eviction of the wrong pages.
This patch reuses the PG_referenced page flag (used for unmapped file
pages) to implement a usage detection that scales with the speed of LRU
list cycling (i.e. memory pressure).
If the scanner encounters those pages, the flag is set and the page cycled
again on the inactive list. Only if it returns with another page table
reference it is activated. Otherwise it is reclaimed as 'not recently
used cache'.
This effectively changes the minimum lifetime of a used-once mapped file
page from a full memory cycle to an inactive list cycle, which allows it
to occur in linear streams without affecting the stable working set of the
system.
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: OSAKI Motohiro <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are quite a few GFP_KERNEL memory allocations made during
suspend/hibernation and resume that may cause the system to hang, because
the I/O operations they depend on cannot be completed due to the
underlying devices being suspended.
Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
gfp_allowed_mask before suspend/hibernation and restoring the original
values of these bits in gfp_allowed_mask durig the subsequent resume.
[[email protected]: fix CONFIG_PM=n linkage]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reported-by: Maxim Levitsky <[email protected]>
Cc: Sebastian Ott <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a VMA is in an inconsistent state during setup or teardown, the worst
that can happen is that the rmap code will not be able to find the page.
The mapping is in the process of being torn down (PTEs just got
invalidated by munmap), or set up (no PTEs have been instantiated yet).
It is also impossible for the rmap code to follow a pointer to an already
freed VMA, because the rmap code holds the anon_vma->lock, which the VMA
teardown code needs to take before the VMA is removed from the anon_vma
chain.
Hence, we should not need the VM_LOCK_RMAP locking at all.
Signed-off-by: Rik van Riel <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When the parent process breaks the COW on a page, both the original which
is mapped at child and the new page which is mapped parent end up in that
same anon_vma. Generally this won't be a problem, but for some workloads
it could preserve the O(N) rmap scanning complexity.
A simple fix is to ensure that, when a page which is mapped child gets
reused in do_wp_page, because we already are the exclusive owner, the page
gets moved to our own exclusive child's anon_vma.
Signed-off-by: Rik van Riel <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The old anon_vma code can lead to scalability issues with heavily forking
workloads. Specifically, each anon_vma will be shared between the parent
process and all its child processes.
In a workload with 1000 child processes and a VMA with 1000 anonymous
pages per process that get COWed, this leads to a system with a million
anonymous pages in the same anon_vma, each of which is mapped in just one
of the 1000 processes. However, the current rmap code needs to walk them
all, leading to O(N) scanning complexity for each page.
This can result in systems where one CPU is walking the page tables of
1000 processes in page_referenced_one, while all other CPUs are stuck on
the anon_vma lock. This leads to catastrophic failure for a benchmark
like AIM7, where the total number of processes can reach in the tens of
thousands. Real workloads are still a factor 10 less process intensive
than AIM7, but they are catching up.
This patch changes the way anon_vmas and VMAs are linked, which allows us
to associate multiple anon_vmas with a VMA. At fork time, each child
process gets its own anon_vmas, in which its COWed pages will be
instantiated. The parents' anon_vma is also linked to the VMA, because
non-COWed pages could be present in any of the children.
This reduces rmap scanning complexity to O(1) for the pages of the 1000
child processes, with O(N) complexity for at most 1/N pages in the system.
This reduces the average scanning cost in heavily forking workloads from
O(N) to 2.
The only real complexity in this patch stems from the fact that linking a
VMA to anon_vmas now involves memory allocations. This means vma_adjust
can fail, if it needs to attach a VMA to anon_vma structures. This in
turn means error handling needs to be added to the calling functions.
A second source of complexity is that, because there can be multiple
anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
"the" anon_vma lock. To prevent the rmap code from walking up an
incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
to make sure it is impossible to compile a kernel that needs both symbolic
values for the same bitflag.
Some test results:
Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
box with 16GB RAM and not quite enough IO), the system ends up running
>99% in system time, with every CPU on the same anon_vma lock in the
pageout code.
With these changes, AIM7 hits the cross-over point around 29.7k users.
This happens with ~99% IO wait time, there never seems to be any spike in
system time. The anon_vma lock contention appears to be resolved.
[[email protected]: cleanups]
Signed-off-by: Rik van Riel <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It was tolerable until Eric went and added 8388608.
Cc: Eric Paris <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.
POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.
In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.
POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.
So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.
Note that the random hint is not likely to help random reads performance
noticeably. And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).
In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!
Tested-by: Quentin Barnes <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: <[email protected]> [2.6.33.x]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A memmap is a directory in sysfs which includes 3 text files: start, end
and type. For example:
start: 0x100000
end: 0x7e7b1cff
type: System RAM
Interface firmware_map_add was not called explicitly. Remove it and add
function firmware_map_add_hotplug as hotplug interface of memmap.
Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
does not export memmap entry for it. We add a call in function add_memory
to function firmware_map_add_hotplug.
Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
will be called when initialize memmap and hot-add memory.
[[email protected]: un-kernedoc a no longer kerneldoc comment]
Signed-off-by: Shaohui Zheng <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Acked-by: Yasunori Goto <[email protected]>
Reviewed-by: Wu Fengguang <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|