Age | Commit message | Author | Files | Lines |
|
If we just provide a helper to convert a completion code to a string, we
can combine all debugging messages into a single print.
[keep the old debug messages, for warn and grep -Mathias]
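Purely for illustration, the kind of helper and combined print meant here
could look roughly like this (the names and codes below are made up, not
the actual xhci ones):

    /* Sketch only: map a TRB completion code to a human-readable string. */
    static const char *comp_code_string(u32 code)
    {
            switch (code) {
            case 1:  return "Success";
            case 6:  return "Stall Error";
            default: return "Unknown";
            }
    }

    /* One combined debug print instead of one message per code: */
    xhci_dbg(xhci, "Command completed with status '%s' (%u)\n",
             comp_code_string(code), code);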
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Instead of using while (!list_empty()) followed by list_first_entry(), we
can actually use list_for_each_entry_safe().
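For reference, a sketch of the transformation (the list, struct and field
names here are only illustrative):

    /* Before: pop entries one at a time until the list is empty. */
    while (!list_empty(&ep_ring->td_list)) {
            td = list_first_entry(&ep_ring->td_list, struct xhci_td, td_list);
            list_del_init(&td->td_list);
            /* ... process td ... */
    }

    /* After: one iterator that is safe against removal. */
    list_for_each_entry_safe(td, tmp_td, &ep_ring->td_list, td_list) {
            list_del_init(&td->td_list);
            /* ... process td ... */
    }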
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Remove duplicate code by using trb_to_noop() when
handling Aborted commands.
Based on earlier code by Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Useful for turning both transfer and command trbs
into no-ops.
Based on earlier code by Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
xhci_unmap_td_bounce_buffer() already checks for a valid td->bounce_seg
and bails out early if that's invalid. There's no need to check for this
twice.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This way we can remove the checks for a valid ring from the call sites of
xhci_unmap_td_bounce_buffer().
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
handle_tx_event() is not releasing xhci->lock nor reacquiring it, remove
the bogus annotation.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
By extracting xhci_td_cleanup() from finish_td(), the code becomes
clearer and easier to follow.
There are no functional changes with this patch. It's merely a cleanup.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
No functional changes. This is a simple cleanup to make sure variables
are ordered in 'reverse Christmas tree' fashion. While at it, also remove
an obsolete comment which no longer applies.
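For reference, 'reverse Christmas tree' ordering just means declaring
locals from the longest line down to the shortest, e.g. (illustrative
names):

    struct xhci_virt_device *virt_dev;
    struct xhci_ring *ep_ring;
    unsigned int ep_index;
    int ret;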
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Replace list_entry() with list_first_entry() and list_for_each() with
list_for_each_entry(). This makes the code slightly more readable.
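A sketch of the two substitutions (struct and member names are
illustrative):

    /* list_entry() on the head's ->next becomes list_first_entry(): */
    cmd = list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list);

    /* list_for_each() plus list_entry() becomes list_for_each_entry(): */
    list_for_each_entry(cmd, &xhci->cmd_list, cmd_list) {
            /* ... use cmd directly ... */
    }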
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
It does no good, so let's remove it.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Instead of having several return points, let's use a local variable and
a single place to return. This makes the code slightly easier to read.
[set ret = IRQ_HANDLED in default working case -Mathias]
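The resulting shape is roughly the following (a sketch, not the actual
xhci_irq() code; event_pending() is a made-up placeholder):

    irqreturn_t ret = IRQ_NONE;

    spin_lock(&xhci->lock);
    if (!event_pending(xhci))
            goto out;

    ret = IRQ_HANDLED;      /* default for the normal working case */
    /* ... handle events ... */
out:
    spin_unlock(&xhci->lock);
    return ret;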
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Cleanup only. This patch is a mechanical rename to make sure our macros
for TRB completion codes match what the specification uses to refer to
such errors. The idea behind this is that it makes it far easier to grep
the specification and match it with implementation.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
When calling xhci_dbg_regs() we actually _do_ want to know XHCI's
version. This might help figure out why certain problems only happen
in some cases.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This is a cleanup patch only, no functional changes. The idea is just to
make sure for loops look the same all over the driver.
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Remove the unnecessary return line in xhci_pci_setup().
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Use list_is_singular() to check if cmd_list has only one entry.
[use list_empty() in queue command instead -Mathias]
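A sketch of the checks involved (illustrative, not the exact driver
code):

    /* Timeout/abort path: is the command we handle the only one pending? */
    if (list_is_singular(&xhci->cmd_list)) {
            /* nothing else queued behind it */
    }

    /* Queueing path, per the note above: use list_empty() before adding. */
    if (list_empty(&xhci->cmd_list)) {
            /* first command on the ring, e.g. arm the timeout timer */
    }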
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
No need to calculate remainder and length_field if a control transfer
has no data phase.
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Drop an unnecessary assignment in prepare_transfer().
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
In case the 'quirk-broken-port-ped' property is passed in via a device
property, we should enable the corresponding BROKEN_PED quirk flag for
the xHCI core.
[[email protected]] Updated code from platform data to device property
and added DT binding.
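A minimal sketch of what reading the property could look like (the exact
quirk flag name and device pointer are assumptions, not taken from the
patch):

    if (device_property_read_bool(&pdev->dev, "quirk-broken-port-ped"))
            xhci->quirks |= XHCI_BROKEN_PORT_PED;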
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Some devices from Texas Instruments [1] suffer from a silicon bug where
the Port Enabled/Disabled bit should not be used to silence an erroneous
device. The bug is such that if the port is disabled with the PED bit,
an IRQ for device removal (or attachment) will never fire.
Just for the sake of completeness, the actual problem lies with the SNPS
USB IP and affects all known versions up to 3.00a. A separate patch will
be added to dwc3 to enable this quirk flag if the version is <= 3.00a.
[1] - AM572x Silicon Errata http://www.ti.com/lit/er/sprz429j/sprz429j.pdf
Section i896: USB xHCI Port Disable Feature Does Not Work
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This allows someone to grep for the complete warning message as in:
xhci-hcd xhci-hcd.0.auto: USB core suspending device not in U0/U1/U2.
Signed-off-by: Alexander Stein <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The legacy 'addr_64' variable is now unused, so remove it from the
xhci_hcd structure.
Signed-off-by: Baolin Wang <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The second try was a workaround for (what we thought was) the command
ring failing to stop in the first place. But this turns out to be due
to the race that we have fixed (see "xhci: Fix race related to abort
operation"). With that fix, it is time to remove the second try.
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Checking if the command timeout timer is pending when queueing the
first command to the command ring is not really useful, remove it.
Suggested-by: Lu Baolu <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
A counter was used to find out if the stop endpoint completion raced with
the stop endpoint timeout timer. This was needed in case the stop endpoint
completion failed to delete the timer as it was running on another cpu.
The EP_STOP_CMD_PENDING flag was not enough as a new stop endpoint command
may be queued between the command completion and the timeout function,
which would set the flag back.
Instead of the separate counter, we can detect the race by checking both
the EP_STOP_CMD_PENDING flag and timer_pending() in the timeout function.
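A sketch of the timeout-side check described above (the flag and timer
names follow the text; treat the exact code as illustrative):

    spin_lock_irqsave(&xhci->lock, flags);
    if (!(ep->ep_state & EP_STOP_CMD_PENDING) ||
        timer_pending(&ep->stop_cmd_timer)) {
            /* the completion already ran, or a new stop endpoint command
             * re-armed the timer; nothing for the timeout handler to do */
            spin_unlock_irqrestore(&xhci->lock, flags);
            return;
    }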
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
We don't want to confuse halted and stalled endpoint states with
a flag indicating we are waiting for a stop endpoint command to
finish or time out.
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
No functional change; by De Morgan's law, !(A && B) is equivalent to (!A || !B).
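That is, a condition of the form below can be rewritten without changing
behaviour:

    if (!(a && b))
            do_something();

    /* is equivalent to */

    if (!a || !b)
            do_something();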
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The 'isnew' variable in 'sddr09_write_lba' function is set but never
used.
This has been detected by building the driver with W=1:
drivers/usb/storage/sddr09.c: In function ‘sddr09_write_lba’:
drivers/usb/storage/sddr09.c:873:17: warning: variable ‘isnew’ set but
not used [-Wunused-but-set-variable]
int i, result, isnew;
^
Signed-off-by: Augusto Mecking Caringi <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Declare musb_hdrc_config structures as const as they are only stored in
the config field of a musb_hdrc_platform_data structure. This field is of
type const, so musb_hdrc_config structures having this property can be
made const too.
Done using Coccinelle:
@r disable optional_qualifier@
identifier x;
position p;
@@
static struct musb_hdrc_config x@p={...};
@ok@
struct musb_hdrc_platform_data pdata;
identifier r.x;
position p;
@@
pdata.config=&x@p;
@bad@
position p != {r.p,ok.p};
identifier r.x;
@@
x@p
@depends on !bad disable optional_qualifier@
identifier r.x;
@@
+const
struct musb_hdrc_config x;
File size before:
text data bss dec hex filename
1212 338 0 1550 60e drivers/usb/musb/jz4740.o
File size after:
text data bss dec hex filename
1268 290 0 1558 616 drivers/usb/musb/jz4740.o
File size before:
text data bss dec hex filename
6151 333 16 6500 1964 drivers/usb/musb/sunxi.o
File size after:
text data bss dec hex filename
6215 269 16 6500 1964 drivers/usb/musb/sunxi.o
File size before:
text data bss dec hex filename
3668 864 0 4532 11b4 drivers/usb/musb/ux500.o
File size after:
text data bss dec hex filename
3724 808 0 4532 11b4 drivers/usb/musb/ux500.o
Signed-off-by: Bhumika Goyal <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
We need the USB fixes in here as well to handle future merge issues and
dependencies.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
With patches "dmaengine: cppi41: Fix runtime PM timeouts with USB mass
storage", and "dmaengine: cppi41: Fix oops in cppi41_runtime_resume",
the pm_runtime_get/put() in cppi41_irq() is no longer needed. We now
guarantee that cppi41 is enabled when dma is in use.
We can still get a pointless error -115 when musb is configured as a
USB peripheral. That's because we should now check the state of
is_suspended instead.
Let's just remove the now useless code and replace it with a WARN().
Signed-off-by: Tony Lindgren <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
|
|
Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
together with recent MUSB changes allowed USB and DMA on BeagleBone to idle
when no cable is connected. But looks like few corner case issues still
remain.
Looks like just by re-plugging the USB cable about ten or so times on a
BeagleBone configured in USB peripheral mode we can get warnings and
eventually trigger an oops in cppi41 DMA:
WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154
cppi41_runtime_suspend+0x28/0x38 [cppi41]
...
WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452
push_desc_queue+0x94/0x9c [cppi41]
...
Unable to handle kernel NULL pointer dereference at virtual
address 00000104
pgd = c0004000
[00000104] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
...
[<bf0d92cc>] (cppi41_runtime_resume [cppi41]) from [<c0589838>]
(__rpm_callback+0xc0/0x214)
[<c0589838>] (__rpm_callback) from [<c05899ac>] (rpm_callback+0x20/0x80)
[<c05899ac>] (rpm_callback) from [<c0589460>] (rpm_resume+0x504/0x78c)
[<c0589460>] (rpm_resume) from [<c058a1a0>] (pm_runtime_work+0x60/0xa8)
[<c058a1a0>] (pm_runtime_work) from [<c0156120>] (process_one_work+0x2b4/0x808)
This is because of a race with runtime PM and cppi41_dma_issue_pending()
as reported by Alexandre Bailon <[email protected]> in an earlier
set of patches. Based on mailing list discussions we however came to the
conclusion that a different fix from Alexandre's fix is needed in order
to guarantee that DMA is really active when we try to use it.
To fix the issue, we need to add a driver specific flag as we otherwise
can have -EINPROGRESS state set by runtime PM and can't rely on
pm_runtime_active() to tell us when we can use the DMA.
And we need to make sure the DMA transfers get triggered in the queued
order. So let's always queue the transfers, then flush the queue
from both cppi41_dma_issue_pending() and cppi41_runtime_resume()
as suggested by Grygorii Strashko <[email protected]> in an
earlier example patch.
For reference, this is also documented in Documentation/power/runtime_pm.txt
in the example at the end of the file as pointed out by Grygorii Strashko
<[email protected]>.
Based on earlier patches from Alexandre Bailon <[email protected]>
and Grygorii Strashko <[email protected]> modified based on
testing and what was discussed on the mailing lists.
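For illustration, the resulting shape is roughly the following (names
such as is_suspended and cppi41_run_queue() are used loosely from the
description above, not copied from the patch):

    /* issue_pending: always queue, and only push to hardware when active */
    spin_lock_irqsave(&cdd->lock, flags);
    list_add_tail(&cdesc->node, &cdd->pending);
    if (!cdd->is_suspended)
            cppi41_run_queue(cdd);
    spin_unlock_irqrestore(&cdd->lock, flags);

    /* runtime_resume: clear the flag and flush the queue in queued order */
    spin_lock_irqsave(&cdd->lock, flags);
    cdd->is_suspended = false;
    cppi41_run_queue(cdd);
    spin_unlock_irqrestore(&cdd->lock, flags);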
Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
Cc: Andy Shevchenko <[email protected]>
Cc: Bin Liu <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Patrick Titiano <[email protected]>
Cc: Sergei Shtylyov <[email protected]>
Reported-by: Alexandre Bailon <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Tested-by: Bin Liu <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
|
|
Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
added runtime PM support for cppi41, but had corner case issues. Some of
the issues were fixed with commit 098de42ad670 ("dmaengine: cppi41: Fix
unpaired pm runtime when only a USB hub is connected"). That fix however
caused a new regression where we can get error -115 messages with USB on
BeagleBone when connecting a USB mass storage device to a hub.
This is because when connecting a USB mass storage device to a hub, the
initial DMA transfers can take over 200ms to complete and cppi41
autosuspend delay times out.
To fix the issue, we want to implement refcounting for the chan_busy
array that contains the active dma transfers. Increasing the autosuspend
delay won't help as the delay could potentially be seconds, and it's best
to let the USB subsystem deal with the timeouts on errors.
The earlier attempt at runtime PM was buggy as the pm_runtime_get/put()
calls could get unpaired easily as they did not follow the state of
the chan_busy array as described in commit 098de42ad670 ("dmaengine:
cppi41: Fix unpaired pm runtime when only a USB hub is connected").
Let's fix the issue by adding pm_runtime_get() to where a new transfer
is added to the chan_busy array, and calls to pm_runtime_put() where
chan_busy array entry is cleared. This prevents any autosuspend timeouts
from happening while dma transfers are active.
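The pairing described above is roughly (illustrative, not the exact
driver code):

    /* where a transfer is added to the chan_busy[] array: */
    pm_runtime_get(cdd->ddev.dev);
    cdd->chan_busy[desc_num] = c;

    /* where the chan_busy[] entry is cleared again: */
    cdd->chan_busy[desc_num] = NULL;
    pm_runtime_put(cdd->ddev.dev);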
Fixes: 098de42ad670 ("dmaengine: cppi41: Fix unpaired pm runtime when
only a USB hub is connected")
Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
Cc: Andy Shevchenko <[email protected]>
Cc: Bin Liu <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Patrick Titiano <[email protected]>
Cc: Sergei Shtylyov <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Tested-by: Bin Liu <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
|
|
Anton says: In commit 4db7327194db ("powerpc: Add option to use jump
label for cpu_has_feature()") and commit c12e6f24d413 ("powerpc: Add
option to use jump label for mmu_has_feature()") we added:
BUILD_BUG_ON(!__builtin_constant_p(feature))
to cpu_has_feature() and mmu_has_feature() in order to catch usage
issues (such as cpu_has_feature(cpu_has_feature(X)), which has happened
once in the past). Unfortunately LLVM isn't smart enough to resolve
this, and it errors out.
I work around it in my clang/LLVM builds of the kernel, but I have just
discovered that it causes a lot of issues for the bcc (eBPF) trace tool
(which uses LLVM).
For now just #ifdef it away for clang builds.
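The workaround is roughly of this shape (a sketch, not the exact patch):

    #ifndef __clang__       /* clang cannot resolve this as a constant */
            BUILD_BUG_ON(!__builtin_constant_p(feature));
    #endif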
Fixes: 4db7327194db ("powerpc: Add option to use jump label for cpu_has_feature()")
Fixes: c12e6f24d413 ("powerpc: Add option to use jump label for mmu_has_feature()")
Cc: [email protected] # v4.8+
Reported-by: Anton Blanchard <[email protected]>
Tested-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
We've moved to lists.freedesktop.org from lists.01.org.
Update info in MAINTAINERS.
Signed-off-by: Zhenyu Wang <[email protected]>
|
|
According to kmem_cache_sanity_check(), spaces are not allowed in the
name of a cache and result in a kernel oops with CONFIG_DEBUG_VM.
Convert to underscores.
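For example (the cache name below is only illustrative):

    /* rejected by kmem_cache_sanity_check() with CONFIG_DEBUG_VM: */
    kmem_cache_create("gvt dma obj", size, 0, SLAB_HWCACHE_ALIGN, NULL);

    /* accepted: */
    kmem_cache_create("gvt_dma_obj", size, 0, SLAB_HWCACHE_ALIGN, NULL);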
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
|
|
Per the ABI specification[1], each mdev_supported_types entry should
have an available_instances, with an "s", not available_instance.
[1] Documentation/ABI/testing/sysfs-bus-vfio-mdev
Signed-off-by: Alex Williamson <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
|
|
This reverts commit 7611fb68062f ("thermal: thermal_hwmon: Convert to
hwmon_device_register_with_info()").
Pavel Machek reported breakage in the Nokia N900 due to this commit.
We can revisit a proper fix for the warning later.
Reported-by: Pavel Machek <[email protected]>
Signed-off-by: Fabio Estevam <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
|
|
Merge fixes from Andrew Morton:
"26 fixes"
* emailed patches from Andrew Morton <[email protected]>: (26 commits)
MAINTAINERS: add Dan Streetman to zbud maintainers
MAINTAINERS: add Dan Streetman to zswap maintainers
mm: do not export ioremap_page_range symbol for external module
mn10300: fix build error of missing fpu_save()
romfs: use different way to generate fsid for BLOCK or MTD
frv: add missing atomic64 operations
mm, page_alloc: fix premature OOM when racing with cpuset mems update
mm, page_alloc: move cpuset seqcount checking to slowpath
mm, page_alloc: fix fast-path race with cpuset update or removal
mm, page_alloc: fix check for NULL preferred_zone
kernel/panic.c: add missing \n
fbdev: color map copying bounds checking
frv: add atomic64_add_unless()
mm/mempolicy.c: do not put mempolicy before using its nodemask
radix-tree: fix private list warnings
Documentation/filesystems/proc.txt: add VmPin
mm, memcg: do not retry precharge charges
proc: add a schedule point in proc_pid_readdir()
mm: alloc_contig: re-allow CMA to compact FS pages
mm/slub.c: trace free objects at KERN_INFO
...
|
|
Add myself as zbud maintainer.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Streetman <[email protected]>
Cc: Seth Jennings <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add myself as zswap maintainer.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Streetman <[email protected]>
Acked-by: Seth Jennings <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Recently, I've found cases in which ioremap_page_range was used
incorrectly, in external modules, leading to crashes. This can be
partly attributed to the fact that ioremap_page_range is lower-level,
with fewer protections, as compared to the other functions that an
external module would typically call. Those include:
ioremap_cache
ioremap_nocache
ioremap_prot
ioremap_uc
ioremap_wc
ioremap_wt
...each of which wraps __ioremap_caller, which in turn provides a safer
way to achieve the mapping.
Therefore, stop EXPORT-ing ioremap_page_range.
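For example, an external module should map MMIO through one of the
wrappers above rather than through ioremap_page_range(), e.g.
(illustrative; res is assumed to be a struct resource pointer):

    void __iomem *regs = ioremap_nocache(res->start, resource_size(res));
    if (!regs)
            return -ENOMEM;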
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhong jiang <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Suggested-by: John Hubbard <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When CONFIG_FPU is not enabled on arch/mn10300, <asm/switch_to.h> causes
a build error with a call to fpu_save():
kernel/built-in.o: In function `.L410':
core.c:(.sched.text+0x28a): undefined reference to `fpu_save'
Fix this by including <asm/fpu.h> in <asm/switch_to.h> so that an empty
static inline fpu_save() is defined.
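That is, the fix amounts to adding the include to <asm/switch_to.h> (a
sketch of the one-liner):

    #include <asm/fpu.h>    /* provides the empty fpu_save() stub when
                             * CONFIG_FPU is not enabled */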
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 8a59f5d25265 ("fs/romfs: return f_fsid for statfs(2)") generates
a 64bit id from sb->s_bdev->bd_dev. This is only correct when romfs is
defined with CONFIG_ROMFS_ON_BLOCK. If romfs is only defined with
CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, and referencing sb->s_bdev->bd_dev
will trigger an oops.
Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y,
both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined.
Therefore when calling huge_encode_dev() to generate a 64bit id, I use
the following order to choose the parameter:
- CONFIG_ROMFS_ON_BLOCK defined:
  use sb->s_bdev->bd_dev
- CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined:
  use sb->s_dev
- both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined:
  leave the id as 0
When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev
is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index,
otherwise sb->s_dev is 0.
This is a best-effort attempt to generate a unique file system ID. If
none of the above conditions are met, the f_fsid of this romfs instance
will be 0. Generally only one romfs can be built on a single MTD block
device, so this method is enough to identify multiple romfs instances in
a computer.
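A sketch of the selection order described above (not the exact patch):

    u64 id = 0;

    #ifdef CONFIG_ROMFS_ON_BLOCK
    if (sb->s_bdev)
            id = huge_encode_dev(sb->s_bdev->bd_dev);
    #endif
    #ifdef CONFIG_ROMFS_ON_MTD
    if (!id && sb->s_dev)
            id = huge_encode_dev(sb->s_dev);
    #endif

    buf->f_fsid.val[0] = (u32)id;
    buf->f_fsid.val[1] = (u32)(id >> 32);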
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Coly Li <[email protected]>
Reported-by: Nong Li <[email protected]>
Tested-by: Nong Li <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some more atomic64 operations were missing and as a result frv
allmodconfig was failing. Add the missing operations.
Link: http://lkml.kernel.org/r/1485193844-12850-1-git-send-email-sudip.mukherjee@codethink.co.uk
Signed-off-by: Sudip Mukherjee <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers the OOM killer in a few seconds, despite lots of free memory. The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.
The problem comes from insufficient protection against cpuset changes,
which can cause get_page_from_freelist() to consider all zones as
non-eligible due to nodemask and/or current->mems_allowed. This was
masked in the past by sufficient retries, but since commit 682a3385e773
("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
the preferred_zoneref once, and don't iterate over the whole zonelist in
further attempts, thus the only eligible zones might be placed in the
zonelist before our starting point and we always miss them.
A previous patch fixed this problem for current->mems_allowed. However,
cpuset changes also update the task's mempolicy nodemask. The fix has
two parts. We have to repeat the preferred_zoneref search when we
detect cpuset update by way of seqcount, and we have to check the
seqcount before considering OOM.
[[email protected]: fix typo in comment]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Ganapatrao Kulkarni <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a preparation for the following patch to make review simpler.
While the primary motivation is a bug fix, this also simplifies the fast
path, although the moved code is only enabled when cpusets are in use.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Ganapatrao Kulkarni <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers the OOM killer in a few seconds, despite lots of free memory. The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.
One possible cause is that in the fast path we find the preferred
zoneref according to current mems_allowed, so that it points to the
middle of the zonelist, skipping e.g. zones of node 1 completely. If
the mems_allowed is updated to contain only node 1, we never reach it in
the zonelist, and trigger OOM before checking the cpuset_mems_cookie.
This patch fixes the particular case by redoing the preferred zoneref
search if we switch back to the original nodemask. The condition is
also slightly changed so that when the last non-root cpuset is removed,
we don't miss it.
Note that this is not a full fix, and more patches will follow.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Ganapatrao Kulkarni <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "fix premature OOM regression in 4.7+ due to cpuset races".
This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1]. The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice. That's why the patches try to
be not too intrusive.
Unfortunately while investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOMs
back, but that seems to be a much older and harder to fix problem. I have
posted an RFC [2] but I believe that fixing the recent regressions has a
higher priority.
Longer-term we might try to think how to fix the cpuset mess in a better
and less error prone way. I was for example very surprised to learn,
that cpuset updates change not only task->mems_allowed, but also
nodemask of mempolicies. Until now I expected the parameter to
alloc_pages_nodemask() to be stable. I wonder why we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset. I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form. So we
have both crazy complexity and overhead, AFAICS.
[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/[email protected]
This patch (of 4):
Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification. We
check the zoneref pointer, which is never NULL, when we should check the
zone pointer. Also document this in the first_zones_zonelist() comment per
Michal Hocko.
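Concretely, the difference is between checks of this shape (sketch):

    /* wrong: the zoneref pointer itself is never NULL */
    if (!ac.preferred_zoneref)
            goto no_zone;

    /* right: the zone it points to is what can be NULL */
    if (!ac.preferred_zoneref->zone)
            goto no_zone;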
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Ganapatrao Kulkarni <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|