Age  Commit message  Author  Files  Lines
2016-01-16lib/vsprintf.c: eliminate potential race in string()Rasmus Villemoes1-19/+9
If the string corresponding to a %s specifier can change under us, we might end up copying a \0 byte to the output buffer. There might be callers who expect the output buffer to contain a genuine C string whose length is exactly the snprintf return value (assuming truncation hasn't happened or has been checked for). We can avoid this by only passing over the source string once, stopping the first time we meet a nul byte (or when we reach the given precision), and then letting widen_string() handle left/right space padding. As a small bonus, this code reuse also makes the generated code slightly smaller. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
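The single-pass idea described above can be sketched in userspace C. This is a minimal illustration under assumptions, not the kernel's string(): the name `copy_once` and its size-based signature are hypothetical, and padding is left out.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: copy at most `precision` bytes of `s` into `buf`
 * (capacity `size`), visiting each source byte once and stopping at the
 * first NUL. The return value is the length the full output would need,
 * in the spirit of snprintf(). No NUL terminator is written.
 */
size_t copy_once(char *buf, size_t size, const char *s, size_t precision)
{
	size_t len = 0;

	while (len < precision) {
		char c = s[len];        /* the only read of this source byte */

		if (c == '\0')
			break;          /* a NUL is never copied to the output */
		if (len < size)
			buf[len] = c;   /* store only while space remains */
		len++;
	}
	return len;
}
```

Because the length is derived from the same pass that copies the bytes, the count and the copied data cannot disagree, which is the property the message asks for. A two-pass variant (measure, then copy) would not have that property if the source changed in between.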
2016-01-16lib/vsprintf.c: move string() below widen_string()Rasmus Villemoes1-31/+31
This is pure code movement, making sure the widen_string() helper is defined before the string() function. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16lib/vsprintf.c: pull out padding code from dentry_name()Rasmus Villemoes1-15/+31
Pull out the logic in dentry_name() which handles field width space padding, in preparation for reusing it from string(). Rename the widen() helper to move_right(), since it is used for handling the !(flags & LEFT) case. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Maurizio Lombardi <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16printk: do cond_resched() between lines while outputting to consolesTejun Heo3-3/+36
@console_may_schedule tracks whether console_sem was acquired through lock or trylock. If the former, we're inside a sleepable context and console_conditional_schedule() performs cond_resched(). This allows console drivers which use console_lock for synchronization to yield while performing time-consuming operations such as scrolling. However, the actual console outputting is performed while holding irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule before starting outputting lines. Also, only a few drivers call console_conditional_schedule() to begin with. This means that when a lot of lines need to be output by console_unlock(), for example on a console registration, the task doing console_unlock() may not yield for a long time on a non-preemptible kernel. If this happens with slow console devices, for example a serial console, the outputting task may occupy the cpu for a very long time. Long enough to trigger softlockup and/or RCU stall warnings, which in turn pile up more messages, sometimes enough to trigger the next cycle of warnings, incapacitating the system. Fix it by making console_unlock() insert cond_resched() between lines if @console_may_schedule. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Calvin Owens <[email protected]> Acked-by: Jan Kara <[email protected]> Cc: Dave Jones <[email protected]> Cc: Kyle McMartin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
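As a rough userspace analogy of yielding between lines, the loop below emits one line at a time and yields only when the caller says it is allowed to. Everything here is an assumption made for illustration: `emit_lines` is a hypothetical name, `sched_yield()` stands in for cond_resched(), and a `FILE *` stands in for the console.

```c
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: write `n` lines to `out` and, if `may_schedule`
 * is set, give up the CPU after each line. Returns how many times the
 * loop yielded.
 */
size_t emit_lines(FILE *out, const char *const *lines, size_t n,
		  bool may_schedule)
{
	size_t yields = 0;

	for (size_t i = 0; i < n; i++) {
		fputs(lines[i], out);   /* slow "console" output */
		fputc('\n', out);
		if (may_schedule) {
			sched_yield();  /* stand-in for cond_resched() */
			yields++;
		}
	}
	return yields;
}
```

The point of the sketch is only the placement of the yield: between lines, and conditional on a flag captured by the caller, so a long backlog no longer monopolizes the CPU in a sleepable context.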
2016-01-16printk: only unregister boot consoles when necessaryThierry Reding1-1/+25
Boot consoles are typically replaced by proper consoles during the boot process. This can be problematic if the boot console data is part of the init section that is reclaimed late during boot. If the proper console does not register before this point in time, the boot console will need to be removed (so that the freed memory is not accessed), leaving the system without output for some time. There are various reasons why the proper console may not register early enough, such as deferred probe or the driver being a loadable module. If that happens, there is some amount of time where no console messages are visible to the user, which in turn can mean that they won't see crashes or other potentially useful information. To avoid this situation, only remove the boot console when it resides in the init section. Code exists to replace the boot console by the proper console when it is registered, keeping a seamless transition between the boot and proper consoles. Signed-off-by: Thierry Reding <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16asm/sections: add helpers to check for section dataThierry Reding1-0/+65
Add a helper to check if an object (given an address and a size) is part of a section (given beginning and end addresses). For convenience, also provide a helper that performs this check for __init data using the __init_begin and __init_end limits. Signed-off-by: Thierry Reding <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
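A containment check of this kind can be sketched as follows. The name `region_contains` is hypothetical and this is not the header's actual code; it only shows one overflow-safe way to test that an object of a given size lies entirely within [begin, end).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: true if the `size` bytes starting at `obj` lie
 * entirely inside the half-open region [begin, end).
 */
bool region_contains(const void *begin, const void *end,
		     const void *obj, size_t size)
{
	uintptr_t b = (uintptr_t)begin;
	uintptr_t e = (uintptr_t)end;
	uintptr_t o = (uintptr_t)obj;

	/* Compare size against the remaining room to avoid o + size overflow. */
	return o >= b && o <= e && size <= e - o;
}
```

A convenience wrapper for init data would pass the section limits named in the message (__init_begin and __init_end) as `begin` and `end`.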
2016-01-16kernel/stop_machine.c: remove CONFIG_SMP dependenciesAndrew Morton1-4/+0
stop_machine.o is only built if CONFIG_SMP=y, so this ifdef always evaluates to true. [[email protected]: remove now-unneeded ifdef] Reported-by: Valentin Rothberg <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16uselib: default depending if libc5 was usedRiku Voipio1-1/+1
uselib hasn't been used since libc5; glibc does not use it. Deprecate uselib a bit more, by making the default y only if libc5 was widely used on the platform. This makes an arm64 kernel built with defconfig slightly smaller.

bloat-o-meter:

add/remove: 0/3 grow/shrink: 0/2 up/down: 0/-1390 (-1390)
function                                     old     new   delta
kernel_config_data                         18164   18162      -2
uselib_flags                                  20       -     -20
padzero                                      216     192     -24
sys_uselib                                   380       -    -380
load_elf_library                             964       -    -964

Signed-off-by: Riku Voipio <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16err.h: add (missing) unlikely() to IS_ERR_OR_NULL()Viresh Kumar1-1/+1
IS_ERR_VALUE() already contains it, so we need to add this only to the !ptr check. That will allow users of IS_ERR_OR_NULL() to not add this compiler hint themselves. Signed-off-by: Viresh Kumar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
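The shape of the change can be sketched in userspace C. This is an approximation under assumptions, not the header's text: `is_err_or_null` is a hypothetical name, the macros are re-declared locally, and `__builtin_expect` requires GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
#define IS_ERR_VALUE(x) unlikely((x) >= (uintptr_t)-MAX_ERRNO)

/* Hypothetical sketch: both halves of the check now carry the hint. */
bool is_err_or_null(const void *ptr)
{
	return unlikely(!ptr) || IS_ERR_VALUE((uintptr_t)ptr);
}
```

With the hint inside the helper, a caller can write `if (is_err_or_null(p))` without wrapping the call in its own unlikely().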
2016-01-16Kconfig: remove HAVE_LATENCYTOP_SUPPORTWill Deacon12-37/+0
As illustrated by commit a3afe70b83fd ("[S390] latencytop s390 support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to advertise an implementation of save_stack_trace_tsk. However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk() weak alias") a dummy implementation is provided if STACKTRACE=y. Given that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether. Signed-off-by: Will Deacon <[email protected]> Acked-by: Heiko Carstens <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Russell King <[email protected]> Cc: James Hogan <[email protected]> Cc: Michal Simek <[email protected]> Cc: Helge Deller <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16include/linux/kdev_t.h: remove new_valid_dev()Yaowei Bai1-5/+0
As all new_valid_dev() checks have been removed it's time to drop new_valid_dev() itself. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16fs/stat.c: drop the last new_valid_dev checkYaowei Bai1-1/+1
new_valid_dev() always returns true, so it is unnecessary to perform new_valid_dev() checks in some filesystems. Most checks of new_valid_dev() have been removed, so let's drop this last one and then we can remove new_valid_dev() from the source code. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16include/linux/kernel.h: change abs() macro so it uses consistent return typeMichal Nazarewicz3-24/+23
Rewrite abs() so that its return type does not depend on the architecture and no unexpected type conversions happen inside of it. The only conversion is from unsigned to signed type. char is left as a return type but treated as a signed type regardless of its actual signedness.

With the old version, int arguments were promoted to long and depending on architecture a long argument might result in s64 or long return type (which may or may not be the same).

This came after some back and forth with Nicolas. The current macro has different return type (for the same input type) depending on architecture which might be mildly irritating.

An alternative version would promote to int like so:

	#define abs(x)	__abs_choose_expr(x, long long,			\
			__abs_choose_expr(x, long,			\
			__builtin_choose_expr(				\
				sizeof(x) <= sizeof(int),		\
				({ int __x = (x); __x<0?-__x:__x; }),	\
				((void)0))))

I have no preference but imagine Linus might. :] Nicolas' argument against is that promoting to int causes inconsistent behaviour:

	int main(void) {
		unsigned short a = 0, b = 1, c = a - b;
		unsigned short d = abs(a - b);
		unsigned short e = abs(c);
		printf("%u %u\n", d, e);  // prints: 1 65535
	}

Then again, no sane person expects consistent behaviour from C integer arithmetic. ;)

Note: __builtin_types_compatible_p(unsigned char, char) is always false, and __builtin_types_compatible_p(signed char, char) is also always false.

Signed-off-by: Michal Nazarewicz <[email protected]> Reviewed-by: Nicolas Pitre <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Cc: Wey-Yi Guy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
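The type-dispatch technique the message relies on can be shown with a much smaller macro. This is a simplified sketch, not the macro from the patch: `abs_sketch` and the wrapper functions are hypothetical, only signed long long, long and int are handled, and it depends on the GCC/Clang extensions `__builtin_choose_expr`, `__builtin_types_compatible_p`, `__typeof__` and statement expressions.

```c
/*
 * Hypothetical sketch: pick the temporary's type from the argument's
 * type at compile time, so the result type follows the input type
 * instead of the architecture. Everything that is not long long or
 * long falls through to int.
 */
#define abs_sketch(x) __builtin_choose_expr(                        \
	__builtin_types_compatible_p(__typeof__(x), long long),      \
	({ long long v_ = (x); v_ < 0 ? -v_ : v_; }),                 \
	__builtin_choose_expr(                                        \
	__builtin_types_compatible_p(__typeof__(x), long),           \
	({ long v_ = (x); v_ < 0 ? -v_ : v_; }),                      \
	({ int v_ = (x); v_ < 0 ? -v_ : v_; })))

/* Thin wrappers so each branch of the macro is exercised once. */
int abs_of_int(int v)                   { return abs_sketch(v); }
long abs_of_long(long v)                { return abs_sketch(v); }
long long abs_of_long_long(long long v) { return abs_sketch(v); }
```

Only the selected branch determines the type of the whole expression, which is why the compile-time selection gives each input type its own result type. As with any abs(), the most negative value of a type has no positive counterpart and is not handled here.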
2016-01-16include/linux/poison.h: use POISON_POINTER_DELTA for poison pointersVasily Kulikov1-2/+2
TIMER_ENTRY_STATIC and TAIL_MAPPING are defined as poison pointers which should point to nowhere. Redefine them using POISON_POINTER_DELTA arithmetics to make sure they really point to non-mappable area declared by the target architecture. Signed-off-by: Vasily Kulikov <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: Solar Designer <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-16parisc: Protect huge page pte changes with spinlocksHelge Deller2-28/+52
PA-RISC doesn't have atomic instructions to modify page table entries, so it takes spinlock in the TLB handler and modifies the page table entry non-atomically. If you modify the page table entry without the spinlock, you may race with TLB handler on another CPU and your modification may be lost. Protect against that with usage of purge_tlb_start() and purge_tlb_end() which handles the TLB spinlock. Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # v4.4
2016-01-16batman-adv: Drop immediate orig_node free functionSven Eckelmann3-27/+13
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_orig_node_free_ref. Fixes: 72822225bd41 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Drop immediate batadv_hard_iface free functionSven Eckelmann2-20/+7
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_hardif_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Drop immediate neigh_ifinfo free functionSven Eckelmann1-24/+10
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_neigh_ifinfo_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Drop immediate batadv_hardif_neigh_node free functionSven Eckelmann1-33/+13
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_hardif_neigh_free_ref. Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Drop immediate batadv_neigh_node free functionSven Eckelmann1-23/+10
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_neigh_node_free_ref. Fixes: 89652331c00f ("batman-adv: split tq information in neigh_node struct") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Drop immediate batadv_orig_ifinfo free functionSven Eckelmann1-28/+31
It is not allowed to free the memory of an object which is part of a list which is protected by rcu-read-side-critical sections without making sure that no other context is accessing the object anymore. This usually happens by removing the references to this object and then waiting until the rcu grace period is over and no one (allowedly) accesses it anymore. But the _now functions ignore this completely. They free the object directly even when a different context still tries to access it. This has to be avoided and thus these functions must be removed and all functions have to use batadv_orig_ifinfo_free_ref. Fixes: 7351a4822d42 ("batman-adv: split out router from orig_node") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-16batman-adv: Avoid recursive call_rcu for batadv_nc_nodeSven Eckelmann1-11/+8
The batadv_nc_node_free_ref function uses call_rcu to delay the free of the batadv_nc_node object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_nc_node also contains a reference to orig_node which must be removed. The reference drop of orig_node was done in the call_rcu function batadv_nc_node_free_rcu but should actually be done in the batadv_nc_node_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: d56b1705e28c ("batman-adv: network coding - detect coding nodes and remove these after timeout") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
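Why a nested callback escapes the barrier can be shown with a toy queue. This is a simulation under stated assumptions, not RCU: `defer_sim` stands in for call_rcu(), `barrier_sim` for rcu_barrier(), and all names and the `owner_sim` structure are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 16

typedef void (*callback_fn)(void *arg);

/* Toy stand-in for the RCU callback list: a fixed-size FIFO. */
static callback_fn queued_fn[QUEUE_MAX];
static void *queued_arg[QUEUE_MAX];
static size_t queue_head, queue_tail;

/* Toy call_rcu(): queue a callback to run later. */
void defer_sim(callback_fn fn, void *arg)
{
	if (queue_tail < QUEUE_MAX) {
		queued_fn[queue_tail] = fn;
		queued_arg[queue_tail] = arg;
		queue_tail++;
	}
}

/* Toy rcu_barrier(): runs only callbacks queued before it was called. */
void barrier_sim(void)
{
	size_t stop = queue_tail;

	while (queue_head < stop) {
		size_t i = queue_head++;

		queued_fn[i](queued_arg[i]);
	}
}

struct owner_sim {
	bool outer_done;  /* the object itself was released */
	bool inner_done;  /* the reference it held was released */
};

static void inner_cb(void *arg)
{
	((struct owner_sim *)arg)->inner_done = true;
}

static void outer_cb_flat(void *arg)
{
	((struct owner_sim *)arg)->outer_done = true;
}

static void outer_cb_nested(void *arg)
{
	((struct owner_sim *)arg)->outer_done = true;
	defer_sim(inner_cb, arg);  /* queued after the barrier took its snapshot */
}

/* Old shape: the inner reference is dropped from inside the callback. */
void release_nested(struct owner_sim *o)
{
	defer_sim(outer_cb_nested, o);
}

/* New shape: the release function queues both before any barrier runs. */
void release_flat(struct owner_sim *o)
{
	defer_sim(inner_cb, o);
	defer_sim(outer_cb_flat, o);
}
```

After `release_nested()` followed by `barrier_sim()`, the outer callback has run but the inner one is still pending, so a caller that tears everything down after the barrier would be too early. With `release_flat()` both callbacks are already queued when the barrier starts, so both complete before it returns.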
2016-01-16batman-adv: Avoid recursive call_rcu for batadv_bla_claimSven Eckelmann1-7/+3
The batadv_claim_free_ref function uses call_rcu to delay the free of the batadv_bla_claim object until no (already started) rcu_read_lock is enabled anymore. This makes sure that no context is still trying to access the object which should be removed. But batadv_bla_claim also contains a reference to backbone_gw which must be removed. The reference drop of backbone_gw was done in the call_rcu function batadv_claim_free_rcu but should actually be done in the batadv_claim_release function to avoid nested call_rcus. This is important because rcu_barrier (e.g. batadv_softif_free or batadv_exit) will not detect the inner call_rcu as relevant for its execution. Otherwise this barrier will most likely be inserted in the queue before the callback of the first call_rcu was executed. The caller of rcu_barrier will therefore continue to run before the inner call_rcu callback finished. Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2016-01-15bna: fix Rx data corruption with VLAN stripping enabled and MTU > 4096Ivan Vecera1-13/+24
The multi-buffer Rx mode implemented in the past introduced a regression that causes data corruption for received VLAN traffic when VLAN tag stripping is enabled. This mode is supported only by newer chipsets (1860) and is enabled when MTU > 4096. When this mode is enabled, the Rx queue contains buffers with a fixed size of 2048 bytes. Any incoming packet larger than 2048 bytes is divided into multiple buffers that are attached as skb frags in the polling routine. The driver assumes that all buffers associated with a packet except the last one are fully used (e.g. a packet of size 5000 is divided into 3 buffers of 2048 + 2048 + 904 bytes) and ignores the true size reported in completions. This assumption is usually true, but not when a VLAN packet is received and VLAN tag stripping is enabled. In this case the first buffer is 2044 bytes long, but as the driver always assumes 2048 bytes, 4 extra random bytes are included between the first and the second frag. Additionally, the driver sets the checksum as correct, so the packet is properly processed by the core. The driver needs to check the size of used space in each Rx buffer reported by FW and not blindly use the fixed value. Cc: Rasesh Mody <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Rasesh Mody <[email protected]> Signed-off-by: David S. Miller <[email protected]>
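The arithmetic behind the corruption can be sketched as two ways of choosing a fragment length. The function names are hypothetical and the per-buffer lengths used in the note below (2044, 2048, 908) are illustrative values chosen to match the 2044-byte first buffer from the message; this is not the driver's code.

```c
#include <stddef.h>

#define RX_BUF_SIZE 2048u  /* fixed Rx buffer size named in the message */

/*
 * Old assumption: every buffer except the last is completely full, and
 * the last one holds whatever remains of the packet.
 */
size_t frag_len_assumed(size_t total_len, size_t nbufs, size_t idx)
{
	if (idx + 1 < nbufs)
		return RX_BUF_SIZE;
	return total_len - RX_BUF_SIZE * (nbufs - 1);
}

/* Fixed behaviour: trust the length reported for each buffer. */
size_t frag_len_reported(const size_t *reported, size_t idx)
{
	return reported[idx];
}

/* Bytes of stale buffer content a fragment would carry under the old assumption. */
size_t stray_bytes(const size_t *reported, size_t total_len,
		   size_t nbufs, size_t idx)
{
	size_t assumed = frag_len_assumed(total_len, nbufs, idx);

	return assumed > reported[idx] ? assumed - reported[idx] : 0;
}
```

For a 5000-byte packet reported as 2044 + 2048 + 908, the assumed split is 2048 + 2048 + 904, so the first fragment would include 4 bytes that were never written by the hardware, matching the "4 extra random bytes" in the message.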
2016-01-15Merge tag 'clk-for-linus-4.5' of ↵Linus Torvalds119-1351/+20637
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk framework updates from Michael Turquette:

"The clk framework and driver changes for 4.5 look pretty typical. The bulk of the changes are to clk controller drivers, though some improvements to the core and some re-usable blocks/templates also received some love.

In this past cycle the clk maintainers developed a good workflow for handling the common case of patch submissions containing a new drivers, new shared Device Tree header and a new Device Tree binding description. This requires coordination with the Device Tree maintainers and with the architecture maintainers (typically the arm-soc tree in our case).

This explains the increase in changes to include/dt-bindings/... and to Documentation/devicetree/bindings/clock/... coming from the clk tree. The same commits can be expected to come through those trees on occasion, through the use of shared, immutable branches"

* tag 'clk-for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
  clk: remove duplicated COMMON_CLK_NXP record from clk/Kconfig
  clk: fix clk-gpio.c with optional clock= DT property
  clk: rockchip: fix section mismatches with new child-clocks
  clk: gpio: handle error codes for of_clk_get_parent_count()
  clk: gpio: fix memory leak
  clk: shmobile: r8a7795: Add SATA0 clock
  clk: bcm2835: Add PWM clock support
  clk: bcm2835: Support for clock parent selection
  clk: bcm2835: add a round up ability to the clock divisor
  clk: lpc32xx: add common clock framework driver
  clk: lpc18xx: add NXP specific COMMON_CLK_NXP configuration symbol
  dt-bindings: clock: add NXP LPC32xx clock list for consumers
  dt-bindings: clock: add description of LPC32xx USB clock controller
  dt-bindings: clock: add description of LPC32xx clock controller
  clk: rockchip: rk3036: include downstream muxes into fractional dividers
  clk: add flag for clocks that need to be enabled on rate changes
  clk: rockchip: Allow the RK3288 SPDIF clocks to change their parent
  clk: rockchip: include downstream muxes into fractional dividers
  clk: rockchip: handle mux dependency of fractional dividers
  clk: bcm2835: Add a driver for the auxiliary peripheral clock gates.
  ...
2016-01-15Merge branch 'dmi-for-linus' of ↵Linus Torvalds2-20/+43
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi updates from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi_scan: Save SMBIOS Type 9 System Slots
  firmware: dmi_scan: Fix dmi_find_device description
  firmware: dmi_scan: Clarify dmi_save_extended_devices
  firmware: dmi_scan: Optimize dmi_save_extended_devices
2016-01-15memcg: only free spare array when readers are doneMartijn Coenen1-5/+6
A spare array holding mem cgroup threshold events is kept around to make sure we can always safely deregister an event and have an array to store the new set of events in. In the scenario where we're going from 1 to 0 registered events, the pointer to the primary array containing 1 event is copied to the spare slot, and then the spare slot is freed because no events are left. However, it is freed before calling synchronize_rcu(), which means readers may still be accessing threshold->primary after it is freed. Fixed by only freeing after synchronize_rcu(). Signed-off-by: Martijn Coenen <[email protected]> Cc: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm: soft-offline: exit with failure for non anonymous thpNaoya Horiguchi1-8/+8
Currently memory_failure() doesn't handle the non-anonymous thp case, because we can hardly expect the error handling to be successful, and it can just hit some corner case which results in BUG_ON or something severe like that. This is also the case for the soft offline code, so let's handle it the same way. The original code has a MF_COUNT_INCREASED check before put_hwpoison_page(), but it's unnecessary because get_any_page() has already been called by the time this code runs, and it takes a refcount of the target page regardless of the flag. So this patch also removes it. [[email protected]: fix build] Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm: soft-offline: clean up soft_offline_page()Naoya Horiguchi1-31/+47
soft_offline_page() has some deeply indented code, which is a sign that it needs cleanup. So let's do this. No functional change. Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Andi Kleen <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm/hugetlbfs: unmap pages if page fault raced with hole punchMike Kravetz1-69/+75
Page faults can race with fallocate hole punch. If a page fault happens between the unmap and remove operations, the page is not removed and remains within the hole. This is not the desired behavior. The race is difficult to detect in user level code as even in the non-race case, a page within the hole could be faulted back in before fallocate returns. If userfaultfd is expanded to support hugetlbfs in the future, this race will be easier to observe. If this race is detected and a page is mapped, the remove operation (remove_inode_hugepages) will unmap the page before removing. The unmap within remove_inode_hugepages occurs with the hugetlb_fault_mutex held so that no other faults will be processed until the page is removed. The (unmodified) routine hugetlb_vmdelete_list was moved ahead of remove_inode_hugepages to satisfy the new reference. [[email protected]: move hugetlb_vmdelete_list()] Signed-off-by: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list()Mike Kravetz1-8/+11
Hillf Danton noticed bugs in the hugetlb_vmtruncate_list routine. The argument end is of type pgoff_t. It was being converted to a vaddr offset and passed to unmap_hugepage_range. However, end was also being used as an argument to the vma_interval_tree_foreach controlling loop. In addition, the conversion of end to vaddr offset was incorrect. hugetlb_vmtruncate_list is called as part of a file truncate or fallocate hole punch operation. When truncating a hugetlbfs file, this bug could prevent some pages from being unmapped. This is possible if there are multiple vmas mapping the file, and there is a sufficiently sized hole between the mappings. The size of the hole between two vmas (A,B) must be such that the starting virtual address of B is greater than (ending virtual address of A << PAGE_SHIFT). In this case, the pages in B would not be unmapped. If pages are not properly unmapped during truncate, the following BUG is hit: kernel BUG at fs/hugetlbfs/inode.c:428! In the fallocate hole punch case, this bug could prevent pages from being unmapped as in the truncate case. However, for hole punch the result is that unmapped pages will not be removed during the operation. For hole punch, it is also possible that more pages than desired will be unmapped. This unnecessary unmapping will cause page faults to reestablish the mappings on subsequent page access. Fixes: 1bfad99ab ("hugetlbfs: hugetlb_vmtruncate_list() needs to take a range") Reported-by: Hillf Danton <[email protected]> Signed-off-by: Mike Kravetz <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dave Hansen <[email protected]> Cc: <[email protected]> [4.3] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
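One plausible way to keep page offsets and virtual addresses apart is sketched below. It is an illustration under assumptions rather than the patched function: the structures and `unmap_range` are hypothetical, a 4 KiB page is assumed, `end == 0` is taken to mean "to the end of the mapping", and the caller is assumed to pass only mappings that overlap the requested range.

```c
#define PAGE_SHIFT_SKETCH 12  /* assumed 4 KiB pages */

/* Hypothetical, reduced view of a mapping: address range plus file offset. */
struct vma_sketch {
	unsigned long vm_start;  /* first virtual address */
	unsigned long vm_end;    /* one past the last virtual address */
	unsigned long vm_pgoff;  /* file offset of vm_start, in pages */
};

struct va_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Translate the file page range [start, end) into the virtual address
 * range to unmap inside `vma`. The page offsets themselves are left
 * untouched, so they stay valid for the tree walk that found the vma.
 */
struct va_range unmap_range(const struct vma_sketch *vma,
			    unsigned long start, unsigned long end)
{
	struct va_range r;
	unsigned long v_offset = 0;

	if (vma->vm_pgoff < start)
		v_offset = (start - vma->vm_pgoff) << PAGE_SHIFT_SKETCH;
	r.start = vma->vm_start + v_offset;

	if (end == 0) {
		r.end = vma->vm_end;
	} else {
		/* Assumes end > vm_pgoff, i.e. the vma overlaps the range. */
		r.end = ((end - vma->vm_pgoff) << PAGE_SHIFT_SKETCH) +
			vma->vm_start;
		if (r.end > vma->vm_end)
			r.end = vma->vm_end;
	}
	return r;
}
```

For a four-page mapping at 0x100000 with `vm_pgoff` 10, the page range [11, 13) maps to [0x101000, 0x103000). The conversion is relative to the mapping's own file offset and is clamped to the mapping, which is what keeps a second, distant mapping from being skipped or over-unmapped.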
2016-01-15mm: make swapoff more robust against soft dirtyHugh Dickins1-14/+4
Both s390 and powerpc have hit the issue of swapoff hanging, when CONFIG_HAVE_ARCH_SOFT_DIRTY and CONFIG_MEM_SOFT_DIRTY ifdefs were not quite as x86_64 had them. I think it would be much clearer if HAVE_ARCH_SOFT_DIRTY was just a Kconfig option set by architectures to determine whether the MEM_SOFT_DIRTY option should be offered, and the actual code depend upon CONFIG_MEM_SOFT_DIRTY alone. But won't embark on that change myself: instead make swapoff more robust, by using pte_swp_clear_soft_dirty() on each pte it encounters, without an explicit #ifdef CONFIG_MEM_SOFT_DIRTY. That being a no-op, whether the bit in question is defined as 0 or the asm-generic fallback is used, unless soft dirty is fully turned on. Why "maybe" in maybe_same_pte()? Rename it pte_same_as_swp(). Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Martin Schwidefsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
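The "no-op unless soft dirty is fully turned on" behaviour can be sketched with a plain integer standing in for a pte. All names are hypothetical, and the soft-dirty bit is passed as a parameter here only so both configurations can be shown side by side; in the kernel it would be a per-architecture constant.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_sketch_t;

/* Clearing with a mask of 0 leaves the value unchanged, i.e. a no-op. */
pte_sketch_t swp_clear_soft_dirty_sketch(pte_sketch_t pte,
					 pte_sketch_t soft_dirty_mask)
{
	return pte & ~soft_dirty_mask;
}

/*
 * Compare a pte found in the page table against the swap pte being
 * searched for, ignoring the soft-dirty bit if one is defined.
 */
bool pte_same_as_swp_sketch(pte_sketch_t pte, pte_sketch_t swp_pte,
			    pte_sketch_t soft_dirty_mask)
{
	return swp_clear_soft_dirty_sketch(pte, soft_dirty_mask) == swp_pte;
}
```

With a non-zero mask, a pte that differs from the target only by the soft-dirty bit still matches, so swapoff finds it. With a mask of 0 the same code compiles down to a plain comparison, which is why no #ifdef is needed around the call.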
2016-01-15mm: fix locking order in mm_take_all_locks()Kirill A. Shutemov3-21/+37
Dmitry Vyukov has reported[1] possible deadlock (triggered by his syzkaller fuzzer):

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&mapping->i_mmap_rwsem);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&mapping->i_mmap_rwsem);

Both traces point to mm_take_all_locks() as a source of the problem. It doesn't take care about ordering of hugetlbfs_i_mmap_rwsem_key (aka mapping->i_mmap_rwsem for hugetlb mapping) vs. i_mmap_rwsem.

huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key and allocator can take i_mmap_rwsem if it hit reclaim. So we need to take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from rest of VMAs.

The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key.

[1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com

Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
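The ordering rule (all hugetlb mappings first, then the rest) amounts to a two-pass walk. The sketch below only computes the order in which locks would be taken; the structure, the `id` field and `lock_order` are hypothetical, and no real locking is performed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a mapping for ordering purposes. */
struct vma_order_sketch {
	int id;           /* identifies the mapping in the output */
	bool is_hugetlb;  /* true for hugetlbfs-backed mappings */
};

/*
 * Fill `order` with the ids in the sequence their locks would be taken:
 * first every hugetlb mapping, then every other mapping. `order` must
 * have room for `n` entries. Returns the number of entries written.
 */
size_t lock_order(const struct vma_order_sketch *vmas, size_t n, int *order)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)      /* pass 1: hugetlb mappings */
		if (vmas[i].is_hugetlb)
			order[k++] = vmas[i].id;

	for (size_t i = 0; i < n; i++)      /* pass 2: everything else */
		if (!vmas[i].is_hugetlb)
			order[k++] = vmas[i].id;

	return k;
}
```

Because every path then acquires the hugetlb class before the ordinary class, the two CPUs in the scenario above can no longer hold one lock each while waiting for the other.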
2016-01-15mm: mempolicy: skip non-migratable VMAs when setting MPOL_MF_LAZYLiang Chen1-1/+2
MPOL_MF_LAZY is not visible from userspace since a720094ded8c ("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now"), but it should still skip non-migratable VMAs such as VM_IO, VM_PFNMAP, and VM_HUGETLB VMAs, and avoid useless overhead of minor faults. Signed-off-by: Liang Chen <[email protected]> Signed-off-by: Gavin Guo <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm/page_alloc.c: remove unused struct zone *z variableAlexander Kuleshov1-2/+0
Remove unused struct zone *z variable which appeared in 86051ca5eaf5 ("mm: fix usemap initialization"). Signed-off-by: Alexander Kuleshov <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm/mlock.c: change can_do_mlock return value type to booleanWang Xiaoqiang2-5/+5
Since can_do_mlock() only returns 1 or 0, make it boolean. No functional change. [[email protected]: update declaration in mm.h] Signed-off-by: Wang Xiaoqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm/vmalloc.c: use macro IS_ALIGNED to judge the alignmentWang Xiaoqiang1-2/+2
Just cleanup, no functional change. Signed-off-by: Wang Xiaoqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
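For readers unfamiliar with the macro named in the title, the following is a small userspace sketch of what an alignment test through `IS_ALIGNED` looks like next to an open-coded mask test. The macro here is a simplified form written for this example (the kernel's version preserves the operand type), and the two helper functions are hypothetical; the exact expressions touched by the patch are not shown in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified form for illustration only. 'a' must be a power of two,
 * otherwise the mask arithmetic does not test alignment.
 */
#define IS_ALIGNED(x, a) ((((uintptr_t)(x)) & ((uintptr_t)(a) - 1)) == 0)

/* Open-coded mask test of the general kind such a cleanup replaces. */
static bool aligned_open_coded(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Same test through the macro; asserts the power-of-two precondition. */
static bool aligned_with_macro(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return IS_ALIGNED(addr, align);
}
```

Both helpers compute the same result; the macro simply names the intent at the call site.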
2016-01-15cgroup, memcg, writeback: drop spurious rcu locking around mem_cgroup_css_from_page()Tejun Heo2-5/+0
In earlier versions, mem_cgroup_css_from_page() could return non-root css on a legacy hierarchy which can go away and required rcu locking; however, the eventual version simply returns the root cgroup if memcg is on a legacy hierarchy and thus doesn't need rcu locking around or in it. Remove spurious rcu lockings. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm/page_isolation: do some cleanup in "undo_isolate_page_range"Wang Xiaoqiang1-2/+4
Use the IS_ALIGNED macro to check alignment instead of open-coding the test. Signed-off-by: Wang Xiaoqiang <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15memblock: fix section mismatchKirill A. Shutemov1-9/+9
allmodconfig produces the following warning for me:

  WARNING: vmlinux.o(.text.unlikely+0x10314): Section mismatch in reference from the function movable_node_is_enabled() to the variable .meminit.data:movable_node_enabled
  The function movable_node_is_enabled() references the variable __meminitdata movable_node_enabled. This is often because movable_node_is_enabled lacks a __meminitdata annotation or the annotation of movable_node_enabled is wrong.

Let's mark the function with __meminit. It fixes the warning.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15s390/mm: enable fixup_user_fault retryingDominik Dingel1-3/+26
By passing a non-null flag we allow fixup_user_fault to retry, which enables userfaultfd. As during these retries we might drop the mmap_sem we need to check if that happened and redo the complete chain of actions. Signed-off-by: Dominik Dingel <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: "Jason J. Herne" <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm: bring in additional flag for fixup_user_fault to signal unlockDominik Dingel4-11/+34
During Jason's work on postcopy migration support for s390, a problem regarding gmap faults was discovered. The gmap code calls fixup_user_fault(), which always ends up in handle_mm_fault(). Until now we never cared about retries, but as the userfaultfd code kind of relies on them, this needs fixing.

This patchset does not take care of the futex code. I will now look closer at this.

This patch (of 2):

With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault() to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the faulting we ever unlocked mmap_sem.

This patch brings in the logic to handle retries and cleans up the current documentation. fixup_user_fault() did not have the same semantics as filemap_fault(): it never indicated whether a retry happened, so a caller was not able to handle that case. So we now changed the behaviour to always retry a locked mmap_sem.

Signed-off-by: Dominik Dingel <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: "Jason J. Herne" <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Dominik Dingel <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
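The "give feedback if we ever unlocked" idea can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel interface: `struct demo_mm`, `demo_fixup_fault()` and `demo_resolve()` are hypothetical, and a boolean stands in for mmap_sem. It shows the caller pattern where everything derived under the lock is redone whenever the fault handler reports that the lock was dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an address space protected by one lock. */
struct demo_mm {
	bool locked;        /* stands in for mmap_sem being held */
	int faults_pending; /* faults that will need to drop the lock */
	int lookups;        /* how often the caller redid its lookup */
};

/*
 * Model of a fault fixup that may drop and retake the lock. When that
 * happens it reports it through *unlocked. It always returns with the
 * lock held again.
 */
static int demo_fixup_fault(struct demo_mm *mm, bool *unlocked)
{
	assert(mm->locked);
	if (mm->faults_pending > 0) {
		mm->faults_pending--;
		mm->locked = false; /* lock dropped while waiting */
		mm->locked = true;  /* retaken before returning */
		if (unlocked)
			*unlocked = true;
	}
	return 0;
}

/*
 * Caller pattern: anything looked up under the lock is stale once the
 * lock was dropped, so redo the whole chain until one pass completes
 * without unlocking.
 */
static int demo_resolve(struct demo_mm *mm)
{
	bool unlocked;
	int rc;

	mm->locked = true;
	do {
		unlocked = false;
		mm->lookups++; /* e.g. re-walk the translation tables */
		rc = demo_fixup_fault(mm, &unlocked);
	} while (rc == 0 && unlocked);
	mm->locked = false;
	return rc;
}
```

Passing a non-NULL flag is what opts the caller in to this behaviour in the sketch; a caller that passes NULL never learns that the lock was dropped.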
2016-01-15dax: re-enable dax pmd mappingsDan Williams2-7/+4
Now that the get_user_pages() path knows how to handle dax-pmd mappings, remove the protections that disabled dax-pmd support. Tests available from github.com/pmem/ndctl: make TESTS="lib/test-dax.sh lib/test-mmap.sh" check Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15dax: provide diagnostics for pmd mapping failuresDan Williams1-8/+57
There is a wide gamut of conditions that can trigger the dax pmd path to fall back to pte mappings. Ideally we'd have a syscall interface to determine mapping characteristics after the fact. In the meantime provide debug messages. Signed-off-by: Dan Williams <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm, x86: get_user_pages() for dax mappingsDan Williams8-39/+212
A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver has established a devm_memremap_pages() mapping, i.e. when the pfn_t returned from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when encountering _PAGE_DEVMAP during a page table walk, we look up and pin a struct dev_pagemap instance to keep the result of pfn_to_page() valid until put_page(). Signed-off-by: Dan Williams <[email protected]> Tested-by: Logan Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmdDan Williams7-27/+47
A dax-huge-page mapping while it uses some thp helpers is ultimately not a transparent huge page. The distinction is especially important in the get_user_pages() path. pmd_devmap() is used to distinguish dax-pmds from pmd_huge() and pmd_trans_huge() which have slightly different semantics. Explicitly mark the pmd_trans_huge() helpers that dax needs by adding pmd_devmap() checks. [[email protected]: fix regression in handling mlocked pages in __split_huge_pmd()] Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gupDan Williams6-8/+125
get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically mapped pfn-range (devm_memremap_pages()) while the resulting struct page objects are in use. Unlike get_page() it may fail if the device is, or is in the process of being, disabled. While the initial lookup of the range may be an expensive list walk, the result is cached to speed up subsequent lookups which are likely to be in the same mapped range. devm_memremap_pages() now requires a reference counter to be specified at init time. For pmem this means moving request_queue allocation into pmem_alloc() so the existing queue usage counter can track "device pages". ZONE_DEVICE pages always have an elevated count and will never be on an lru reclaim list. That space in 'struct page' can be redirected for other uses, but for safety introduce a poison value that will always trip __list_add() to assert. This allows half of the struct list_head storage to be reclaimed with some assurance to back up the assumption that the page count never goes to zero and a list_add() is never attempted. Signed-off-by: Dan Williams <[email protected]> Tested-by: Logan Gunthorpe <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
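The "may fail if the device is being disabled" semantics described above can be sketched with a toy reference count. Everything below is hypothetical (`struct demo_pagemap`, `demo_get_pagemap()`, `demo_put_pagemap()`, `demo_kill_pagemap()`); it is single-threaded, uses a plain counter instead of the kernel's reference-counting primitives, and leaves out the cached-lookup optimisation the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device page map covering a pfn range. */
struct demo_pagemap {
	unsigned long start_pfn;
	unsigned long end_pfn; /* exclusive */
	long refs;             /* outstanding pins */
	bool dying;            /* set once teardown has begun */
};

/*
 * Try to pin the map that covers pfn. Unlike an unconditional get, this
 * fails (returns NULL) when the pfn is outside the range or the device
 * is being disabled.
 */
static struct demo_pagemap *demo_get_pagemap(struct demo_pagemap *pgmap,
					     unsigned long pfn)
{
	if (pfn < pgmap->start_pfn || pfn >= pgmap->end_pfn)
		return NULL;
	if (pgmap->dying)
		return NULL;
	pgmap->refs++;
	return pgmap;
}

/* Drop a pin previously obtained with demo_get_pagemap(). */
static void demo_put_pagemap(struct demo_pagemap *pgmap)
{
	assert(pgmap != NULL && pgmap->refs > 0);
	pgmap->refs--;
}

/* Begin teardown: new pins are refused, existing ones must drain. */
static bool demo_kill_pagemap(struct demo_pagemap *pgmap)
{
	pgmap->dying = true;
	return pgmap->refs == 0; /* true when teardown can finish now */
}
```

The design point is that a pin taken before teardown keeps the range alive until it is dropped, while any attempt after teardown has started is refused rather than blocked.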
2016-01-15libnvdimm, pmem: move request_queue allocation earlier in probeDan Williams1-13/+20
Before the dynamically allocated struct pages from devm_memremap_pages() can be put to use outside the driver, we need a mechanism to track whether they are still in use at teardown. Towards that goal reorder the initialization sequence to allow the 'q_usage_counter' from the request_queue to be used by the devm_memremap_pages() implementation (in subsequent patches). Signed-off-by: Dan Williams <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm, dax: convert vmf_insert_pfn_pmd() to pfn_tDan Williams8-11/+30
Similar to the conversion of vm_insert_mixed(), use pfn_t in vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVMAP when the pfn is backed by a devm_memremap_pages() mapping. Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm, dax, gpu: convert vm_insert_mixed to pfn_tDan Williams10-14/+61
Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of evaluating the PFN_MAP and PFN_DEV flags. When both are set it triggers _PAGE_DEVMAP to be set in the resulting pte. There are no functional changes to the gpu drivers as a result of this conversion. Signed-off-by: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
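The "both flags set" rule from the message above can be illustrated with a small typed wrapper around a pfn. This is a sketch only: `demo_pfn_t` and the `demo_*` helpers are hypothetical, and the bit positions chosen for `DEMO_PFN_DEV` and `DEMO_PFN_MAP` are assumptions made for the example rather than the kernel's layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pfn_t model: flags live in the top bits of a 64-bit value. */
typedef struct { uint64_t val; } demo_pfn_t;

/* Bit positions are illustrative assumptions. */
#define DEMO_PFN_DEV   (UINT64_C(1) << 61)
#define DEMO_PFN_MAP   (UINT64_C(1) << 60)
#define DEMO_PFN_FLAGS (DEMO_PFN_DEV | DEMO_PFN_MAP)

/* Combine a raw pfn with its flags into one typed value. */
static demo_pfn_t demo_pfn_to_pfn_t(uint64_t pfn, uint64_t flags)
{
	demo_pfn_t p;

	assert((pfn & DEMO_PFN_FLAGS) == 0);    /* pfn must not overlap flags */
	assert((flags & ~DEMO_PFN_FLAGS) == 0); /* only known flags allowed */
	p.val = pfn | flags;
	return p;
}

/* Recover the raw pfn by masking the flag bits off. */
static uint64_t demo_pfn_t_to_pfn(demo_pfn_t p)
{
	return p.val & ~DEMO_PFN_FLAGS;
}

/* The devmap pte bit is wanted only when both flags are set. */
static bool demo_pfn_t_devmap(demo_pfn_t p)
{
	return (p.val & DEMO_PFN_FLAGS) == DEMO_PFN_FLAGS;
}
```

Wrapping the value in a struct means a raw `unsigned long` cannot be passed by accident where the flag-carrying type is expected, which is one plausible motivation for such a conversion.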