Age | Commit message (Collapse) | Author | Files | Lines |
|
If the string corresponding to a %s specifier can change under us, we
might end up copying a \0 byte to the output buffer. There might be
callers who expect the output buffer to contain a genuine C string whose
length is exactly the snprintf return value (assuming truncation hasn't
happened or has been checked for).
We can avoid this by only passing over the source string once, stopping
the first time we meet a nul byte (or when we reach the given
precision), and then letting widen_string() handle left/right space
padding. As a small bonus, this code reuse also makes the generated
code slightly smaller.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Maurizio Lombardi <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is pure code movement, making sure the widen_string() helper is
defined before the string() function.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Maurizio Lombardi <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull out the logic in dentry_name() which handles field width space
padding, in preparation for reusing it from string(). Rename the
widen() helper to move_right(), since it is used for handling the
!(flags & LEFT) case.
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Maurizio Lombardi <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Calvin Owens <[email protected]>
Acked-by: Jan Kara <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Boot consoles are typically replaced by proper consoles during the boot
process. This can be problematic if the boot console data is part of
the init section that is reclaimed late during boot. If the proper
console does not register before this point in time, the boot console
will need to be removed (so that the freed memory is not accessed),
leaving the system without output for some time.
There are various reasons why the proper console may not register early
enough, such as deferred probe or the driver being a loadable module.
If that happens, there is some amount of time where no console messages
are visible to the user, which in turn can mean that they won't see
crashes or other potentially useful information.
To avoid this situation, only remove the boot console when it resides in
the init section. Code exists to replace the boot console by the proper
console when it is registered, keeping a seamless transition between the
boot and proper consoles.
Signed-off-by: Thierry Reding <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a helper to check if an object (given an address and a size) is part
of a section (given beginning and end addresses). For convenience, also
provide a helper that performs this check for __init data using the
__init_begin and __init_end limits.
Signed-off-by: Thierry Reding <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
stop_machine.o is only built if CONFIG_SMP=y, so this ifdef always
evaluates to true.
[[email protected]: remove now-unneeded ifdef]
Reported-by: Valentin Rothberg <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
uselib hasn't been used since libc5; glibc does not use it. Deprecate
uselib a bit more, by making the default y only if libc5 was widely used
on the plaform.
This makes arm64 kernel built with defconfig slightly smaller
bloat-o-meter:
add/remove: 0/3 grow/shrink: 0/2 up/down: 0/-1390 (-1390)
function old new delta
kernel_config_data 18164 18162 -2
uselib_flags 20 - -20
padzero 216 192 -24
sys_uselib 380 - -380
load_elf_library 964 - -964
Signed-off-by: Riku Voipio <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
IS_ERR_VALUE() already contains it and so we need to add this only to
the !ptr check. That will allow users of IS_ERR_OR_NULL(), to not add
this compiler flag.
Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As illustrated by commit a3afe70b83fd ("[S390] latencytop s390
support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
advertise an implementation of save_stack_trace_tsk.
However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y. Given
that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.
Signed-off-by: Will Deacon <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Russell King <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Helge Deller <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As all new_valid_dev() checks have been removed it's time to drop
new_valid_dev() itself.
No functional change.
Signed-off-by: Yaowei Bai <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
New_valid_dev() always returns true, so that's unnecessary to perform
new_valid_dev() checks in some filesystems. Most checks of
new_valid_dev() have been removed so let's drop this last one and then
we can remove new_valid_dev() from the source code.
No functional change.
Signed-off-by: Yaowei Bai <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rewrite abs() so that its return type does not depend on the
architecture and no unexpected type conversion happen inside of it. The
only conversion is from unsigned to signed type. char is left as a
return type but treated as a signed type regradless of it's actual
signedness.
With the old version, int arguments were promoted to long and depending
on architecture a long argument might result in s64 or long return type
(which may or may not be the same).
This came after some back and forth with Nicolas. The current macro has
different return type (for the same input type) depending on
architecture which might be midly iritating.
An alternative version would promote to int like so:
#define abs(x) __abs_choose_expr(x, long long, \
__abs_choose_expr(x, long, \
__builtin_choose_expr( \
sizeof(x) <= sizeof(int), \
({ int __x = (x); __x<0?-__x:__x; }), \
((void)0))))
I have no preference but imagine Linus might. :] Nicolas argument against
is that promoting to int causes iconsistent behaviour:
int main(void) {
unsigned short a = 0, b = 1, c = a - b;
unsigned short d = abs(a - b);
unsigned short e = abs(c);
printf("%u %u\n", d, e); // prints: 1 65535
}
Then again, no sane person expects consistent behaviour from C integer
arithmetic. ;)
Note:
__builtin_types_compatible_p(unsigned char, char) is always false, and
__builtin_types_compatible_p(signed char, char) is also always false.
Signed-off-by: Michal Nazarewicz <[email protected]>
Reviewed-by: Nicolas Pitre <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Wey-Yi Guy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
TIMER_ENTRY_STATIC and TAIL_MAPPING are defined as poison pointers which
should point to nowhere. Redefine them using POISON_POINTER_DELTA
arithmetics to make sure they really point to non-mappable area declared
by the target architecture.
Signed-off-by: Vasily Kulikov <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Solar Designer <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
PA-RISC doesn't have atomic instructions to modify page table entries, so it
takes spinlock in the TLB handler and modifies the page table entry
non-atomically. If you modify the page table entry without the spinlock, you
may race with TLB handler on another CPU and your modification may be lost.
Protect against that with usage of purge_tlb_start() and purge_tlb_end() which
handles the TLB spinlock.
Signed-off-by: Helge Deller <[email protected]>
Cc: [email protected] # v4.4
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_orig_node_free_ref.
Fixes: 72822225bd41 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_hardif_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_neigh_ifinfo_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_hardif_neigh_free_ref.
Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_neigh_node_free_ref.
Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
It is not allowed to free the memory of an object which is part of a list
which is protected by rcu-read-side-critical sections without making sure
that no other context is accessing the object anymore. This usually happens
by removing the references to this object and then waiting until the rcu
grace period is over and no one (allowedly) accesses it anymore.
But the _now functions ignore this completely. They free the object
directly even when a different context still tries to access it. This has
to be avoided and thus these functions must be removed and all functions
have to use batadv_orig_ifinfo_free_ref.
Fixes: 7351a4822d42 ("batman-adv: split out router from orig_node")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
The batadv_nc_node_free_ref function uses call_rcu to delay the free of the
batadv_nc_node object until no (already started) rcu_read_lock is enabled
anymore. This makes sure that no context is still trying to access the
object which should be removed. But batadv_nc_node also contains a
reference to orig_node which must be removed.
The reference drop of orig_node was done in the call_rcu function
batadv_nc_node_free_rcu but should actually be done in the
batadv_nc_node_release function to avoid nested call_rcus. This is
important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will
not detect the inner call_rcu as relevant for its execution. Otherwise this
barrier will most likely be inserted in the queue before the callback of
the first call_rcu was executed. The caller of rcu_barrier will therefore
continue to run before the inner call_rcu callback finished.
Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout")
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
The batadv_claim_free_ref function uses call_rcu to delay the free of the
batadv_bla_claim object until no (already started) rcu_read_lock is enabled
anymore. This makes sure that no context is still trying to access the
object which should be removed. But batadv_bla_claim also contains a
reference to backbone_gw which must be removed.
The reference drop of backbone_gw was done in the call_rcu function
batadv_claim_free_rcu but should actually be done in the
batadv_claim_release function to avoid nested call_rcus. This is important
because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not
detect the inner call_rcu as relevant for its execution. Otherwise this
barrier will most likely be inserted in the queue before the callback of
the first call_rcu was executed. The caller of rcu_barrier will therefore
continue to run before the inner call_rcu callback finished.
Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
Signed-off-by: Sven Eckelmann <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>
|
|
The multi-buffer Rx mode implemented in the past introduced
a regression that causes a data corruption for received VLAN
traffic when VLAN tag stripping is enabled. This mode is supported
only be newer chipsets (1860) and is enabled when MTU > 4096.
When this mode is enabled Rx queue contains buffers with fixed size
2048 bytes. Any incoming packet larger than 2048 is divided into
multiple buffers that are attached as skb frags in polling routine.
The driver assumes that all buffers associated with a packet except
the last one is fully used (e.g. packet with size 5000 are divided
into 3 buffers 2048 + 2048 + 904 bytes) and ignores true size reported
in completions. This assumption is usually true but not when VLAN
packet is received and VLAN tag stripping is enabled. In this case
the first buffer is 2044 bytes long but as the driver always assumes
2048 bytes then 4 extra random bytes are included between the first
and the second frag. Additionally the driver sets checksum as correct
so the packet is properly processed by the core.
The driver needs to check the size of used space in each Rx buffer
reported by FW and not blindly use the fixed value.
Cc: Rasesh Mody <[email protected]>
Signed-off-by: Ivan Vecera <[email protected]>
Reviewed-by: Rasesh Mody <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk framework updates from Michael Turquette:
"The clk framework and driver changes for 4.5 look pretty typical. The
bulk of the changes are to clk controller drivers, though some
improvements to the core and some re-usable blocks/templates also
received some love.
In this past cycle the clk maintainers developed a good workflow for
handling the common case of patch submissions containing a new
drivers, new shared Device Tree header and a new Device Tree binding
description. This requires coordination with the Device Tree
maintainers and with the architecture maintainers (typically the
arm-soc tree in our case).
This explains the increase in changes to include/dt-bindings/... and
to Documentation/devicetree/bindings/clock/... coming from the clk
tree. The same commits can be expected to come through those trees on
occasion, through the use of shared, immutable branches"
* tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
clk: fix clk-gpio.c with optional clock= DT property
clk: rockchip: fix section mismatches with new child-clocks
clk: gpio: handle error codes for of_clk_get_parent_count()
clk: gpio: fix memory leak
clk: shmobile: r8a7795: Add SATA0 clock
clk: bcm2835: Add PWM clock support
clk: bcm2835: Support for clock parent selection
clk: bcm2835: add a round up ability to the clock divisor
clk: lpc32xx: add common clock framework driver
clk: lpc18xx: add NXP specific COMMON_CLK_NXP configuration symbol
dt-bindings: clock: add NXP LPC32xx clock list for consumers
dt-bindings: clock: add description of LPC32xx USB clock controller
dt-bindings: clock: add description of LPC32xx clock controller
clk: rockchip: rk3036: include downstream muxes into fractional dividers
clk: add flag for clocks that need to be enabled on rate changes
clk: rockchip: Allow the RK3288 SPDIF clocks to change their parent
clk: rockchip: include downstream muxes into fractional dividers
clk: rockchip: handle mux dependency of fractional dividers
clk: bcm2835: Add a driver for the auxiliary peripheral clock gates.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi updates from Jean Delvare.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi_scan: Save SMBIOS Type 9 System Slots
firmware: dmi_scan: Fix dmi_find_device description
firmware: dmi_scan: Clarify dmi_save_extended_devices
firmware: dmi_scan: Optimize dmi_save_extended_devices
|
|
A spare array holding mem cgroup threshold events is kept around to make
sure we can always safely deregister an event and have an array to store
the new set of events in.
In the scenario where we're going from 1 to 0 registered events, the
pointer to the primary array containing 1 event is copied to the spare
slot, and then the spare slot is freed because no events are left.
However, it is freed before calling synchronize_rcu(), which means
readers may still be accessing threshold->primary after it is freed.
Fixed by only freeing after synchronize_rcu().
Signed-off-by: Martijn Coenen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently memory_failure() doesn't handle non anonymous thp case,
because we can hardly expect the error handling to be successful, and it
can just hit some corner case which results in BUG_ON or something
severe like that. This is also the case for soft offline code, so let's
make it in the same way.
Orignal code has a MF_COUNT_INCREASED check before put_hwpoison_page(),
but it's unnecessary because get_any_page() is already called when
running on this code, which takes a refcount of the target page
regardress of the flag. So this patch also removes it.
[[email protected]: fix build]
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
soft_offline_page() has some deeply indented code, that's the sign of
demand for cleanup. So let's do this. No functionality change.
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Page faults can race with fallocate hole punch. If a page fault happens
between the unmap and remove operations, the page is not removed and
remains within the hole. This is not the desired behavior. The race is
difficult to detect in user level code as even in the non-race case, a
page within the hole could be faulted back in before fallocate returns.
If userfaultfd is expanded to support hugetlbfs in the future, this race
will be easier to observe.
If this race is detected and a page is mapped, the remove operation
(remove_inode_hugepages) will unmap the page before removing. The unmap
within remove_inode_hugepages occurs with the hugetlb_fault_mutex held
so that no other faults will be processed until the page is removed.
The (unmodified) routine hugetlb_vmdelete_list was moved ahead of
remove_inode_hugepages to satisfy the new reference.
[[email protected]: move hugetlb_vmdelete_list()]
Signed-off-by: Mike Kravetz <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine. The
argument end is of type pgoff_t. It was being converted to a vaddr
offset and passed to unmap_hugepage_range. However, end was also being
used as an argument to the vma_interval_tree_foreach controlling loop.
In addition, the conversion of end to vaddr offset was incorrect.
hugetlb_vmtruncate_list is called as part of a file truncate or
fallocate hole punch operation.
When truncating a hugetlbfs file, this bug could prevent some pages from
being unmapped. This is possible if there are multiple vmas mapping the
file, and there is a sufficiently sized hole between the mappings. The
size of the hole between two vmas (A,B) must be such that the starting
virtual address of B is greater than (ending virtual address of A <<
PAGE_SHIFT). In this case, the pages in B would not be unmapped. If
pages are not properly unmapped during truncate, the following BUG is
hit:
kernel BUG at fs/hugetlbfs/inode.c:428!
In the fallocate hole punch case, this bug could prevent pages from
being unmapped as in the truncate case. However, for hole punch the
result is that unmapped pages will not be removed during the operation.
For hole punch, it is also possible that more pages than desired will be
unmapped. This unnecessary unmapping will cause page faults to
reestablish the mappings on subsequent page access.
Fixes: 1bfad99ab (" hugetlbfs: hugetlb_vmtruncate_list() needs to take a range")Reported-by: Hillf Danton <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]> [4.3]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Both s390 and powerpc have hit the issue of swapoff hanging, when
CONFIG_HAVE_ARCH_SOFT_DIRTY and CONFIG_MEM_SOFT_DIRTY ifdefs were not
quite as x86_64 had them. I think it would be much clearer if
HAVE_ARCH_SOFT_DIRTY was just a Kconfig option set by architectures to
determine whether the MEM_SOFT_DIRTY option should be offered, and the
actual code depend upon CONFIG_MEM_SOFT_DIRTY alone.
But won't embark on that change myself: instead make swapoff more
robust, by using pte_swp_clear_soft_dirty() on each pte it encounters,
without an explicit #ifdef CONFIG_MEM_SOFT_DIRTY. That being a no-op,
whether the bit in question is defined as 0 or the asm-generic fallback
is used, unless soft dirty is fully turned on.
Why "maybe" in maybe_same_pte()? Rename it pte_same_as_swp().
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dmitry Vyukov has reported[1] possible deadlock (triggered by his
syzkaller fuzzer):
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&hugetlbfs_i_mmap_rwsem_key);
lock(&mapping->i_mmap_rwsem);
lock(&hugetlbfs_i_mmap_rwsem_key);
lock(&mapping->i_mmap_rwsem);
Both traces points to mm_take_all_locks() as a source of the problem.
It doesn't take care about ordering or hugetlbfs_i_mmap_rwsem_key (aka
mapping->i_mmap_rwsem for hugetlb mapping) vs. i_mmap_rwsem.
huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key
and allocator can take i_mmap_rwsem if it hit reclaim. So we need to
take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from
rest of VMAs.
The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key.
[1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
MPOL_MF_LAZY is not visible from userspace since a720094ded8c ("mm:
mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now"), but
it should still skip non-migratable VMAs such as VM_IO, VM_PFNMAP, and
VM_HUGETLB VMAs, and avoid useless overhead of minor faults.
Signed-off-by: Liang Chen <[email protected]>
Signed-off-by: Gavin Guo <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove unused struct zone *z variable which appeared in 86051ca5eaf5
("mm: fix usemap initialization").
Signed-off-by: Alexander Kuleshov <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since can_do_mlock only return 1 or 0, so make it boolean.
No functional change.
[[email protected]: update declaration in mm.h]
Signed-off-by: Wang Xiaoqiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Just cleanup, no functional change.
Signed-off-by: Wang Xiaoqiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mem_cgroup_css_from_page()
In earlier versions, mem_cgroup_css_from_page() could return non-root
css on a legacy hierarchy which can go away and required rcu locking;
however, the eventual version simply returns the root cgroup if memcg is
on a legacy hierarchy and thus doesn't need rcu locking around or in it.
Remove spurious rcu lockings.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use "IS_ALIGNED" to judge the alignment, rather than directly judging.
Signed-off-by: Wang Xiaoqiang <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
allmodconfig produces following warning for me:
WARNING: vmlinux.o(.text.unlikely+0x10314): Section mismatch in reference from the function movable_node_is_enabled() to the variable .meminit.data:movable_node_enabled
The function movable_node_is_enabled() references
the variable __meminitdata movable_node_enabled.
This is often because movable_node_is_enabled lacks a __meminitdata
annotation or the annotation of movable_node_enabled is wrong.
Let's mark the function with __meminit. It fixes the warning.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
By passing a non-null flag we allow fixup_user_fault to retry, which
enables userfaultfd. As during these retries we might drop the mmap_sem
we need to check if that happened and redo the complete chain of
actions.
Signed-off-by: Dominik Dingel <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: "Jason J. Herne" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Dominik Dingel <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
During Jason's work with postcopy migration support for s390 a problem
regarding gmap faults was discovered.
The gmap code will call fixup_user_fault which will end up always in
handle_mm_fault. Till now we never cared about retries, but as the
userfaultfd code kind of relies on it. this needs some fix.
This patchset does not take care of the futex code. I will now look
closer at this.
This patch (of 2):
With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
faulting we ever unlocked mmap_sem.
This patch brings in the logic to handle retries as well as it cleans up
the current documentation. fixup_user_fault was not having the same
semantics as filemap_fault. It never indicated if a retry happened and
so a caller wasn't able to handle that case. So we now changed the
behaviour to always retry a locked mmap_sem.
Signed-off-by: Dominik Dingel <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: "Jason J. Herne" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Dominik Dingel <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that the get_user_pages() path knows how to handle dax-pmd mappings,
remove the protections that disabled dax-pmd support.
Tests available from github.com/pmem/ndctl:
make TESTS="lib/test-dax.sh lib/test-mmap.sh" check
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a wide gamut of conditions that can trigger the dax pmd path to
fallback to pte mappings. Ideally we'd have a syscall interface to
determine mapping characteristics after the fact. In the meantime
provide debug messages.
Signed-off-by: Dan Williams <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
has established a devm_memremap_pages() mapping, i.e. when the pfn_t
return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when
encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
struct dev_pagemap instance to keep the result of pfn_to_page() valid
until put_page().
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Logan Gunthorpe <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A dax-huge-page mapping while it uses some thp helpers is ultimately not
a transparent huge page. The distinction is especially important in the
get_user_pages() path. pmd_devmap() is used to distinguish dax-pmds
from pmd_huge() and pmd_trans_huge() which have slightly different
semantics.
Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
pmd_devmap() checks.
[[email protected]: fix regression in handling mlocked pages in __split_huge_pmd()]
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
get_dev_page() enables paths like get_user_pages() to pin a dynamically
mapped pfn-range (devm_memremap_pages()) while the resulting struct page
objects are in use. Unlike get_page() it may fail if the device is, or
is in the process of being, disabled. While the initial lookup of the
range may be an expensive list walk, the result is cached to speed up
subsequent lookups which are likely to be in the same mapped range.
devm_memremap_pages() now requires a reference counter to be specified
at init time. For pmem this means moving request_queue allocation into
pmem_alloc() so the existing queue usage counter can track "device
pages".
ZONE_DEVICE pages always have an elevated count and will never be on an
lru reclaim list. That space in 'struct page' can be redirected for
other uses, but for safety introduce a poison value that will always
trip __list_add() to assert. This allows half of the struct list_head
storage to be reclaimed with some assurance to back up the assumption
that the page count never goes to zero and a list_add() is never
attempted.
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Logan Gunthorpe <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Before the dynamically allocated struct pages from devm_memremap_pages()
can be put to use outside the driver, we need a mechanism to track
whether they are still in use at teardown. Towards that goal reorder
the initialization sequence to allow the 'q_usage_counter' from the
request_queue to be used by the devm_memremap_pages() implementation (in
subsequent patches).
Signed-off-by: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Similar to the conversion of vm_insert_mixed() use pfn_t in the
vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
pfn is backed by a devm_memremap_pages() mapping.
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
evaluating the PFN_MAP and PFN_DEV flags. When both are set it triggers
_PAGE_DEVMAP to be set in the resulting pte.
There are no functional changes to the gpu drivers as a result of this
conversion.
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|