Age | Commit message (Collapse) | Author | Files | Lines |
|
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level). Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks. This capacity was chosen empirically and it
is enough to run kernel normally.
However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes. I've
tested for long a time with number of top level entries bumped 4x
(4096). And I think I've seen overflow only once. But I don't have all
configs enabled and code coverage has not reached maximum yet. So bump
it 8x to 8192.
Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.
Here is some approx math.
128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of
alloc/free call sites are reachable with only one stack. But some
utility functions can have large fanout. Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Baozeng Ding <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.
Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
Signed-off-by: Kees Cook <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Cc: Emese Revfy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The use of config_enabled() is ambiguous. For config options,
IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.
I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.
Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:
- arch/x86/mm/kaslr.c
replace config_enabled() with IS_ENABLED() because
CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
- include/asm-generic/export.h
replace config_enabled() with __is_defined().
Then, config_enabled() can be removed now.
Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Cc: Peter Oberparleiter <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Paul Bolle <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.
The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster. This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e. accounted
for.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aristeu Rozanski <[email protected]>
Reported-by: Stefan Schimanski <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Stefan Schimanski <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds. During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).
This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache. This counter is
updated when a slab is created or destroyed. This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement. Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.
We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aruna Ramakrishna <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Recent changes to printk require KERN_CONT uses to continue logging
messages. So add KERN_CONT where necessary.
[[email protected]: coding-style fixes]
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
Reported-by: Mark Rutland <[email protected]>
Signed-off-by: Joe Perches <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:
After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().
And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.
[[email protected]: add comment]
Signed-off-by: Alexander Polakov <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Back in commit f56141e3e2d9 ("all arches, signal: move restart_block to
struct task_struct"), all architectures and core code were changed to
use task_struct::restart_block. However, when h8300 support was
subsequently restored in v4.2, it was not updated to account for this,
and maintains thread_info::restart_block, which is not kept in sync.
This patch drops the redundant restart_block from thread_info, and moves
h8300 to the common one in task_struct, ensuring that syscall restarting
always works as expected.
Fixes: f56141e3e2d9 ("all arches, signal: move restart_block to struct task_struct")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: [email protected]
Cc: <[email protected]> [4.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable(). Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().
As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Nicolai Stange <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: James Morse <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Quentin Casasnovas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a bug report that SLAB makes extreme load average due to over
2000 kworker thread.
https://bugzilla.kernel.org/show_bug.cgi?id=172981
This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg. Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock. So,
the number of kworker that try to create kmem_cache increases quietly.
synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created. So, this patch rules
out that case.
Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache")
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Doug Smythies <[email protected]>
Tested-by: Doug Smythies <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We need to wait until the percpu_ref is released before exit. Otherwise,
we sometimes lose the race and trigger this new warning that was added
in v4.9 (commit a67823c1ed10 "percpu-refcount: init ->confirm_switch
member properly"):
WARNING: CPU: 0 PID: 3629 at lib/percpu-refcount.c:107 percpu_ref_exit+0x51/0x60
[..]
Call Trace:
[<ffffffff814bf093>] dump_stack+0x85/0xc2
[<ffffffff810b15db>] __warn+0xcb/0xf0
[<ffffffff810b170d>] warn_slowpath_null+0x1d/0x20
[<ffffffff814d70c1>] percpu_ref_exit+0x51/0x60
[<ffffffffa005706a>] dax_pmem_percpu_exit+0x1a/0x50 [dax_pmem]
[<ffffffff81615f1f>] devm_action_release+0xf/0x20
Cc: <[email protected]>
Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
Signed-off-by: Dan Williams <[email protected]>
|
|
No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime,
but for build testing there is no reason not to allow them together.
This hopefully means better build coverage and fewer embarrasing silly
problems like the one fixed by commit 9db4f36e82c2 ("mm: remove unused
variable in memory hotplug") in the future.
Cc: Stephen Rothwell <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A bugfix just tried to address a randconfig build problem and introduced
a variant of the same problem: with CONFIG_LIBNVDIMM=y and
CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:
drivers/nvdimm/built-in.o: In function `to_nd_device_type':
bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_probe':
region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
drivers/nvdimm/built-in.o: In function `mode_show':
namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa55f): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa56e): undefined reference to `to_nd_dax'
This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
as it should be (it gets linked into the libnvdimm module). To fix
the original problem, I'm adding a dependency on LIBNVDIMM to
DEV_DAX_PMEM, which ensures we can't have that one built-in if the
rest is a module.
Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
When I removed the per-zone bitlock hashed waitqueues in commit
9dcb8b685fc3 ("mm: remove per-zone hashtable of bitlock waitqueues"), I
removed all the magic hotplug memory initialization of said waitqueues
too.
But when I actually _tested_ the resulting build, I stupidly assumed
that "allmodconfig" would enable memory hotplug. And it doesn't,
because it enables KASAN instead, which then disables hotplug memory
support.
As a result, my build test of the per-zone waitqueues was totally
broken, and I didn't notice that the compiler warns about the now unused
iterator variable 'i'.
I guess I should be happy that that seems to be the worst breakage from
my clearly horribly failed test coverage.
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has some driver bugfixes, module autoload fixes, and driver
enablement on some architectures"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: defer probe if bus recovery GPIOs are not ready
i2c: designware: Avoid aborted transfers with fast reacting I2C slaves
i2c: i801: Fix I2C Block Read on 8-Series/C220 and later
i2c: xgene: Avoid dma_buffer overrun
i2c: digicolor: Fix module autoload
i2c: xlr: Fix module autoload for OF registration
i2c: xlp9xx: Fix module autoload
i2c: jz4780: Fix module autoload
i2c: allow configuration of imx driver for ColdFire architecture
i2c: mark device nodes only in case of successful instantiation
i2c: rk3x: Give the tuning value 0 during rk3x_i2c_v0_calc_timings
i2c: hix5hd2: allow build with ARCH_HISI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:
"The latest Thermal Management updates for v4.9-rc3:
- Fix a regression introduced by commit
b721ca0d19(thermal/powerclamp: remove cpu whitelist), that
powerclamp driver checks cpu support in a wrong way. From: Eric
Ernst.
- Fix a problem that intel_pch_thermal driver misses passive trip
point when the PCH thermal device has an ACPI companion device
associated. From: Srinivas Pandruvada.
- Add missing support for Haswell PCH thermal sensor. From: Srinivas
Pandruvada"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/powerclamp: correct cpu support check
thermal: intel_pch_thermal: Enable Haswell PCH
thermal: intel_pch_thermal: Add an ACPI passive trip
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"A few more s390 patches for 4.9:
- a fix for an overflow in the dasd driver reported by UBSAN
- fix a regression and add hotplug memory to the zone movable again
- add ignore defines for the pkey system calls
- fix the ouput of the merged stack tracer
- replace printk with pr_cont in arch/s390 where appropriate
- remove the arch specific return_address function again
- ignore reserved channel paths at boot time
- add a missing hugetlb_bad_size call to the arch backend"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix zone calculation in arch_add_memory()
s390/dumpstack: use pr_cont within show_stack and die
s390/dumpstack: get rid of return_address again
s390/disassambler: use pr_cont where appropriate
s390/dumpstack: use pr_cont where appropriate
s390/dumpstack: restore reliable indicator for call traces
s390/mm: use hugetlb_bad_size()
s390/cio: don't register chpids in reserved state
s390: ignore pkey system calls
s390/dasd: avoid undefined behaviour
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module maintainership updates from Rusty Russell:
"(Quoting from the MAINTAINERS commit:)
Being a Linux kernel maintainer has been my proudest professional
accomplishment, spanning the last 19 years. But now we have a surfeit
of excellent hackers, and I can hand this over without regret.
I'll still be around as co-maintainer for another cycle, but Jessica
is now the one to convince if you want your patches applied. She
rocks, and is far more timely than me too!"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MAINTAINERS: Begin module maintainer transition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull oreangefs updates from Mike Marshall:
"A couple of orangefs cleanups sent in by other developers:
- use d_fsdata instead of d_time (Miklos Szeredi)
- use file_inode(file) instead of file->f_path.dentry->d_inode (Amir
Goldstein)"
* tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: don't use d_time
orangefs: user file_inode() where it is due
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This update contains fixes for most of the outstanding regressions
introduced with the 4.9-rc1 XFS merge. There is also a fix for an
iomap bug, too.
This is a quite a bit larger than I'd prefer for a -rc3, but most of
the change comes from cleaning up the new reflink copy on write code;
it's much simpler and easier to understand now. These changes fixed
several bugs in the new code, and it wasn't clear that there was an
easier/simpler way to fix them. The rest of the fixes are the usual
size you'd expect at this stage.
I've left the commits to soak in linux-next for a some extra time
because of the size before asking you to pull, no new problems with
them have been reported so I think it's all OK.
Summary:
- iomap page offset masking fix for page faults
- add IOMAP_REPORT to distinguish between read and fiemap map
requests
- cleanups to new shared data extent code
- fix mount active status on failed log recovery
- fix broken dquots in a buffer calculation
- fix locking order issues and merge xfs_reflink_remap_range and
xfs_file_share_range
- rework unmapping of CoW extents and remove now unused functions
- clean state when CoW is done"
* tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
xfs: clear cowblocks tag when cow fork is emptied
xfs: fix up inode cowblocks tracking tracepoints
fs: Do to trim high file position bits in iomap_page_mkwrite_actor
xfs: remove xfs_bunmapi_cow
xfs: optimize xfs_reflink_end_cow
xfs: optimize xfs_reflink_cancel_cow_blocks
xfs: refactor xfs_bunmapi_cow
xfs: optimize writes to reflink files
xfs: don't bother looking at the refcount tree for reads
xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
xfs: add xfs_trim_extent
iomap: add IOMAP_REPORT
xfs: merge xfs_reflink_remap_range and xfs_file_share_range
xfs: remove xfs_file_wait_for_io
xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
xfs: fix the same_inode check in xfs_file_share_range
xfs: remove the same fs check from xfs_file_share_range
libxfs: v3 inodes are only valid on crc-enabled filesystems
libxfs: clean up _calc_dquots_per_chunk
xfs: unset MS_ACTIVE if mount fails
...
|
|
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the
list because it knows all of the waiters are patiently waiting for the
commit to finish.
But, there's a small race where btrfs_sync_log can remove itself from
the list if it finds a log commit is already done. Also, it uses
list_del_init() to remove itself from the list, but there's no way to
know if btrfs_remove_all_log_ctxs has already run, so we don't know for
sure if it is safe to call list_del_init().
This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and
just calls it with the proper locking.
This is part two of the corruption fixed by cbd60aa7cd1. I should have
done this in the first place, but convinced myself the optimizations were
safe. A 12 hour run of dbench 2048 will eventually trigger a list debug
WARN_ON for the list_del_init() in btrfs_sync_log().
Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4
Reported-by: Dave Jones <[email protected]>
cc: [email protected] # 3.15+
Signed-off-by: Chris Mason <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes: one is a fatal section mismatch (reference to init
after it's discarded) and the other two are iscsi locking fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: NCR5380: no longer mark irq probing as __init
scsi: be2iscsi: Replace _bh with _irqsave/irqrestore
scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"The AHCI MSI handling change in rc1 was a bit broken and caused disk
probing failures on some machines. These three patches should fix the
issues"
David Howells comments:
"My test machine fell foul of this using a PCIe M.2-attached SSD card.
The patches fix it for me"
* 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: fix the single MSI-X case in ahci_init_one
ahci: fix nvec check
ahci: only try to use multi-MSI mode if there is more than 1 port
|
|
Pull block fixes from Jens Axboe:
"A set of fixes for this series, most notably the fix for the blk-mq
software queue regression in from this merge window.
Apart from that, a fix for an unlikely hang if a queue is flooded with
FUA requests from Ming, and a few small fixes for nbd and badblocks.
Lastly, a rename update for the proc softirq output, since the block
polling code was made generic"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: update hardware and software queues for sleeping alloc
block: flush: fix IO hang in case of flood fua req
nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
badblocks: badblocks_set/clear update unacked_exist
softirq: Display IRQ_POLL for irq-poll statistics
|
|
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:
wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().
The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).
It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.
As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.
Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.
Reported-by: Andreas Gruenbacher <[email protected]>
Tested-by: Bob Peterson <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we end up sleeping due to running out of requests, we should
update the hardware and software queues in the map ctx structure.
Otherwise we could end up having rq->mq_ctx point to the pre-sleep
context, and risk corrupting ctx->rq_list since we'll be
grabbing the wrong lock when inserting the request.
Reported-by: Dave Jones <[email protected]>
Reported-by: Chris Mason <[email protected]>
Tested-by: Chris Mason <[email protected]>
Fixes: 63581af3f31e ("blk-mq: remove non-blocking pass in blk_mq_map_request")
Signed-off-by: Jens Axboe <[email protected]>
|
|
When resizing a vt its selection may exceed the new size, resulting in
an invalid memory access [1]. Clear the selection before resizing.
[1] http://lkml.kernel.org/r/CACT4Y+acDTwy4umEvf5ROBGiRJNrxHN4Cn5szCXE5Jw-d1B=Xw@mail.gmail.com
Reported-and-tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Scot Doyle <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The regmap_update first reads the IOState register and then triggers
a write if needed. However, GPIOS might be configured as an input so
the read to IOState on this GPIO is the current state which might
be random.
Signed-off-by: Francois Berder <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Renesas RZ/G SoC also have the SCIF, SCIFA, SCIFB, and HSCIF ports and
they seem compatible with the R-Car gen2 SoC in this respect...
Document RZ/G1[ME] (also known as R8A774[35]) SoC bindings.
Signed-off-by: Sergei Shtylyov <[email protected]>
Acked-by: Simon Horman <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
NXP SC16C2552 requires that we always write a reset to the RX FIFO and
TX FIFO whenever we enable the FIFOs
Cc: [email protected]
Signed-off-by: Steve Shih <[email protected]>
Signed-off-by: David Singleton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Size of kmalloc() in vc_do_resize() is controlled by user.
Too large kmalloc() size triggers WARNING message on console.
Put a reasonable upper bound on terminal size to prevent WARNINGs.
Signed-off-by: Dmitry Vyukov <[email protected]>
CC: David Rientjes <[email protected]>
Cc: One Thousand Gnomes <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Peter Hurley <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
In the case where head == 0 on the circular buffer, there should be one
DMA buffer, not two. The second zero-length buffer would break the
lpuart driver, transfer would never complete.
Signed-off-by: Aaron Brice <[email protected]>
Acked-by: Stefan Agner <[email protected]>
Tested-by: Stefan Agner <[email protected]>
Tested-by: Bhuvanchandra DV <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The commit 4fe0d154880b ("PCI: Use positive flags in pci_alloc_irq_vectors()")
replaces flags from negative to positive values which makes mandatory to have
the last argument in pci_alloc_irq_vectors() non-zero (if we want to be no-op).
This basically drops MSI enabling in 8250_lpss driver.
Restore desired behaviour in 8250_lpss by passing PCI_IRQ_ALL_TYPES instead of
0 to pci_alloc_irq_vectors().
Fixes: 60a9244a5d14 ("serial: 8250_lpss: enable MSI for Intel Quark")
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Bryan O'Donoghue <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Commit 761ed4a94582 ('tty: serial_core: convert uart_close to use
tty_port_close') started setting the ttyport console flag for serial
drivers. This is causing crashes, hangs, or garbage output on several
platforms because the serial shutdown is skipped and IRQs are left
enabled.
Partially revert commit 761ed4a94582 and drop reporting UART tty_ports
as a console leaving the console handling to the serial_core as it was
before.
Fixes: 761ed4a94582ab29 ("tty: serial_core: convert uart_close to use tty_port_close")
Reported-by: Niklas Söderlund <[email protected]>
Reported-by: Mike Galbraith <[email protected]>
Reported-by: Mugunthan V N <[email protected]>
Cc: Peter Hurley <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
At this point, 'value' is always a byte, then this code is clearing
bit 15, which is already clear. I meant to clear bit 7.
Signed-off-by: Masahiro Yamada <[email protected]>
Reported-by: Denys Vlasenko <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Commit 1681d2116c96 ("serial: 8250_uniphier: add "\n" at the end of
error log") missed this.
Signed-off-by: Denys Vlasenko <[email protected]>
[masahiro: add commit log]
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Make sure dmi_system_id tables are NULL terminated.
Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This patch Adds the new compatible string for ZynqMP SoC.
Signed-off-by: Nava kishore Manne <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This patch Adds the new compatible string for ZynqMP SoC.
Signed-off-by: Nava kishore Manne <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
If NO_DMA=y:
drivers/built-in.o: In function `stm32_serial_remove':
stm32-usart.c:(.text+0xcea1a): undefined reference to `bad_dma_ops'
stm32-usart.c:(.text+0xcea7a): undefined reference to `bad_dma_ops'
Add a dependency on HAS_DMA to fix this.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Alexandre Torgue <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
drivers/tty/serial/stm32-usart.c: In function ‘stm32_receive_chars’:
drivers/tty/serial/stm32-usart.c:130: warning: comparison is always true due to limited range of data type
drivers/tty/serial/stm32-usart.c: In function ‘stm32_tx_dma_complete’:
drivers/tty/serial/stm32-usart.c:177: warning: comparison is always false due to limited range of data type
stm32_usart_offsets.icr is u8, while UNDEF_REG = ~0 is int, and thus
0xffffffff.
As all registers in stm32_usart_offsets are u8, change the definition of
UNDEF_REG to 0xff to fix this.
Fixes: ada8618ff3bfe183 ("serial: stm32: adding support for stm32f7")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is
divided by 2 inappropriatelly. But scr_memsetw expects size, not
count, because it divides the size by 2 on its own before doing actual
memset-by-words.
So remove the bogus division.
Signed-off-by: Jiri Slaby <[email protected]>
Cc: Petr Písař <[email protected]>
Fixes: f8df13e0a9 (tty: Clean console safely)
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This patch does a couple of things. First of all, powernv immediately
explodes when running a relocated kernel, because the system reset
exception for handling sleeps does not do correct relocated branches.
Secondly, the sleep handling code trashes the condition and cfar
registers, which we would like to preserve for debugging purposes (for
non-sleep case exception).
This patch changes the exception to use the standard format that saves
registers before any tests or branches are made. It adds the test for
idle-wakeup as an "extra" to break out of the normal exception path.
Then it branches to a relocated idle handler that calls the various
idle handling functions.
After this patch, POWER8 CPU simulator now boots powernv kernel that is
running at non-zero.
Fixes: 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode")
Cc: [email protected] # v3.0+
Signed-off-by: Nicholas Piggin <[email protected]>
Acked-by: Gautham R. Shenoy <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
Before this patch, we used tlbiel, if we ever ran only on this core.
That was mostly derived from the nohash usage of the same. But is
incorrect, the ISA 3.0 clarifies tlbiel such that:
"All TLB entries that have all of the following properties are made
invalid on the thread executing the tlbiel instruction"
ie. tlbiel only invalidates TLB entries on the current thread. So if the
mm has been used on any other thread (aka. cpu) then we must broadcast
the invalidate.
This bug could lead to invalid TLB entries if a program runs on multiple
threads of a core.
Hence use tlbiel, if we only ever ran on only the current cpu.
Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines")
Cc: [email protected] # v4.7+
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
It should be ALTIVEC, not ALIVEC.
Cyril explains: If a thread performs a transaction with altivec and then
gets preempted for whatever reason, this bug may cause the kernel to not
re-enable altivec when that thread runs again. This will result in an
altivec unavailable fault, when that fault happens inside a user
transaction the kernel has no choice but to enable altivec and doom the
transaction.
The result is that transactions using altivec may get aborted more often
than they should.
The difficulty in catching this with a selftest is my deliberate use of
the word may above. Optimisations to avoid FPU/altivec/VSX faults mean
that the kernel will always leave them on for 255 switches. This code
prevents the kernel turning it off if it got to the 256th switch (and
userspace was transactional).
Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use")
Reviewed-by: Cyril Bur <[email protected]>
Signed-off-by: Valentin Rothberg <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
|
|
The stk1160 chip needs QUIRK_AUDIO_ALIGN_TRANSFER. This patch resolves
the issue reported on the mailing list
(http://marc.info/?l=linux-sound&m=139223599126215&w=2) and also fixes
bug 180071 (https://bugzilla.kernel.org/show_bug.cgi?id=180071).
Signed-off-by: Marcel Hasler <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Since commit:
8663e24d56dc ("sched/fair: Reorder cgroup creation code")
... the variable 'rq' in alloc_fair_sched_group() is set but no longer used.
Remove it to fix the following GCC warning when building with 'W=1':
kernel/sched/fair.c:8842:13: warning: variable ‘rq’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The following commit:
3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection")
... improved objtool's ability to detect GCC switch statement jump
tables for GCC 6. However the check to allow short jumps with the
scanned range of instructions wasn't quite right. The pattern detection
should allow jumps to the indirect jump instruction itself.
This fixes the following warning:
drivers/infiniband/sw/rxe/rxe_comp.o: warning: objtool: rxe_completer()+0x315: sibling call from callable instruction with changed frame pointer
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection")
Link: http://lkml.kernel.org/r/20161026153408.2rifnw7bvoc5sex7@treble
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since BIG_KEYS can't be compiled as module it requires one of the "stdrng"
providers to be compiled into kernel. Otherwise big_key_crypto_init() fails
on crypto_alloc_rng step and next dereference of big_key_skcipher (e.g. in
big_key_preparse()) results in a NULL pointer dereference.
Fixes: 13100a72f40f5748a04017e0ab3df4cf27c809ef ('Security: Keys: Big keys stored encrypted')
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Stephan Mueller <[email protected]>
cc: Kirill Marinushkin <[email protected]>
cc: [email protected]
Signed-off-by: James Morris <[email protected]>
|
|
big_key has two separate initialisation functions, one that registers the
key type and one that registers the crypto. If the key type fails to
register, there's no problem if the crypto registers successfully because
there's no way to reach the crypto except through the key type.
However, if the key type registers successfully but the crypto does not,
big_key_rng and big_key_blkcipher may end up set to NULL - but the code
neither checks for this nor unregisters the big key key type.
Furthermore, since the key type is registered before the crypto, it is
theoretically possible for the kernel to try adding a big_key before the
crypto is set up, leading to the same effect.
Fix this by merging big_key_crypto_init() and big_key_init() and calling
the resulting function late. If they're going to be encrypted, we
shouldn't be creating big_keys before we have the facilities to do the
encryption available. The key type registration is also moved after the
crypto initialisation.
The fix also includes message printing on failure.
If the big_key type isn't correctly set up, simply doing:
dd if=/dev/zero bs=4096 count=1 | keyctl padd big_key a @s
ought to cause an oops.
Fixes: 13100a72f40f5748a04017e0ab3df4cf27c809ef ('Security: Keys: Big keys stored encrypted')
Signed-off-by: David Howells <[email protected]>
cc: Peter Hlavaty <[email protected]>
cc: Kirill Marinushkin <[email protected]>
cc: Artem Savkov <[email protected]>
cc: [email protected]
Signed-off-by: James Morris <[email protected]>
|