Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull cgroup writeback support from Jens Axboe:
"This is the big pull request for adding cgroup writeback support.
This code has been in development for a long time, and it has been
simmering in for-next for a good chunk of this cycle too. This is one
of those problems that has been talked about for at least half a
decade, finally there's a solution and code to go with it.
Also see last weeks writeup on LWN:
http://lwn.net/Articles/648292/"
* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
v9fs: fix error handling in v9fs_session_init()
bdi: fix wrong error return value in cgwb_create()
buffer: remove unusued 'ret' variable
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
...
|
|
Pull asm/scatterlist.h removal from Jens Axboe:
"We don't have any specific arch scatterlist anymore, since parisc
finally switched over. Kill the include"
* 'for-4.2/sg' of git://git.kernel.dk/linux-block:
remove scatterlist.h generation from arch Kbuild files
remove <asm/scatterlist.h>
|
|
The trace.h header when called without CONFIG_EVENT_TRACING enabled
(seldom done), will not compile because of a typo in the protocol
of trace_event_enum_update().
Cc: [email protected] # 4.1+
Signed-off-by: Steven Rostedt <[email protected]>
|
|
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
ps->infix.cnt--;
return ps->infix.string[ps->infix.tail++];
The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Cc: [email protected] # 2.6.33+
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Pull block driver updates from Jens Axboe:
"This contains:
- a few race fixes for null_blk, from Akinobu Mita.
- a series of fixes for mtip32xx, from Asai Thambi and Selvan Mani at
Micron.
- NVMe:
* Fix for missing error return on allocation failure, from Axel
Lin.
* Code consolidation and cleanups from Christoph.
* Memory barrier addition, syncing queue count and queue
pointers. From Jon Derrick.
* Various fixes from Keith, an addition to support user
issue reset from sysfs or ioctl, and automatic namespace
rescan.
* Fix from Matias, avoiding losing some request flags when
marking the request failfast.
- small cleanups and sparse fixups for ps3vram. From Geert
Uytterhoeven and Geoff Lavand.
- s390/dasd dead code removal, from Jarod Wilson.
- a set of fixes and optimizations for loop, from Ming Lei.
- conversion to blkdev_reread_part() of loop, dasd, ndb. From Ming
Lei.
- updates to cciss. From Tomas Henzl"
* 'for-4.2/drivers' of git://git.kernel.dk/linux-block: (44 commits)
mtip32xx: Fix accessing freed memory
block: nvme-scsi: Catch kcalloc failure
NVMe: Fix IO for extended metadata formats
nvme: don't overwrite req->cmd_flags on sync cmd
mtip32xx: increase wait time for hba reset
mtip32xx: fix minor number
mtip32xx: remove unnecessary sleep in mtip_ftl_rebuild_poll()
mtip32xx: fix crash on surprise removal of the drive
mtip32xx: Abort I/O during secure erase operation
mtip32xx: fix incorrectly setting MTIP_DDF_SEC_LOCK_BIT
mtip32xx: remove unused variable 'port->allocated'
mtip32xx: fix rmmod issue
MAINTAINERS: Update ps3vram block driver
block/ps3vram: Remove obsolete reference to MTD
block/ps3vram: Fix sparse warnings
NVMe: Automatic namespace rescan
NVMe: Memory barrier before queue_count is incremented
NVMe: add sysfs and ioctl controller reset
null_blk: restart request processing on completion handler
null_blk: prevent timer handler running on a different CPU where started
...
|
|
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Vince Weaver <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Cc: [email protected] # 2.6.33+
Signed-off-by: Steven Rostedt <[email protected]>
|
|
Pull core block IO update from Jens Axboe:
"Nothing really major in here, mostly a collection of smaller
optimizations and cleanups, mixed with various fixes. In more detail,
this contains:
- Addition of policy specific data to blkcg for block cgroups. From
Arianna Avanzini.
- Various cleanups around command types from Christoph.
- Cleanup of the suspend block I/O path from Christoph.
- Plugging updates from Shaohua and Jeff Moyer, for blk-mq.
- Eliminating atomic inc/dec of both remaining IO count and reference
count in a bio. From me.
- Fixes for SG gap and chunk size support for data-less (discards)
IO, so we can merge these better. From me.
- Small restructuring of blk-mq shared tag support, freeing drivers
from iterating hardware queues. From Keith Busch.
- A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the
IOPS mode the default for non-rotational storage"
* 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
cfq-iosched: move group scheduling functions under ifdef
cfq-iosched: fix the setting of IOPS mode on SSDs
blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
block, cgroup: implement policy-specific per-blkcg data
block: Make CFQ default to IOPS mode on SSDs
block: add blk_set_queue_dying() to blkdev.h
blk-mq: Shared tag enhancements
block: don't honor chunk sizes for data-less IO
block: only honor SG gap prevention for merges that contain data
block: fix returnvar.cocci warnings
block, dm: don't copy bios for request clones
block: remove management of bi_remaining when restoring original bi_end_io
block: replace trylock with mutex_lock in blkdev_reread_part()
block: export blkdev_reread_part() and __blkdev_reread_part()
suspend: simplify block I/O handling
block: collapse bio bit space
block: remove unused BIO_RW_BLOCK and BIO_EOF flags
block: remove BIO_EOPNOTSUPP
...
|
|
Pull UBI/UBIFS updates from Richard Weinberger:
"Minor fixes for UBI and UBIFS"
* tag 'upstream-4.2-rc1' of git://git.infradead.org/linux-ubifs:
UBI: Remove unnecessary `\'
UBI: Use static class and attribute groups
UBI: add a helper function for updatting on-flash layout volumes
UBI: Fastmap: Do not add vol if it already exists
UBI: Init vol->reserved_pebs by assignment
UBI: Fastmap: Rename variables to make them meaningful
UBI: Fastmap: Remove unnecessary `\'
UBI: Fastmap: Use max() to get the larger value
ubifs: fix to check error code of register_shrinker
UBI: block: Dynamically allocate minor numbers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A very large number of cleanups and bug fixes --- in particular for
the ext4 encryption patches, which is a new feature added in the last
merge window. Also fix a number of long-standing xfstest failures.
(Quota writes failing due to ENOSPC, a race between truncate and
writepage in data=journalled mode that was causing generic/068 to
fail, and other corner cases.)
Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2
performance eliminating locking when a buffer is modified more than
once during a transaction (which is very common for allocation
bitmaps, for example), in which case the state of the journalled
buffer head doesn't need to change"
[ I renamed "ext4_follow_link()" to "ext4_encrypted_follow_link()" in
the merge resolution, to make it clear that that function is _only_
used for encrypted symlinks. The function doesn't actually work for
non-encrypted symlinks at all, and they use the generic helpers
- Linus ]
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
ext4: set lazytime on remount if MS_LAZYTIME is set by mount
ext4: only call ext4_truncate when size <= isize
ext4: make online defrag error reporting consistent
ext4: minor cleanup of ext4_da_reserve_space()
ext4: don't retry file block mapping on bigalloc fs with non-extent file
ext4: prevent ext4_quota_write() from failing due to ENOSPC
ext4: call sync_blockdev() before invalidate_bdev() in put_super()
jbd2: speedup jbd2_journal_dirty_metadata()
jbd2: get rid of open coded allocation retry loop
ext4: improve warning directory handling messages
jbd2: fix ocfs2 corrupt when updating journal superblock fails
ext4: mballoc: avoid 20-argument function call
ext4: wait for existing dio workers in ext4_alloc_file_blocks()
ext4: recalculate journal credits as inode depth changes
jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail()
ext4: use swap() in mext_page_double_lock()
ext4: use swap() in memswap()
ext4: fix race between truncate and __ext4_journalled_writepage()
ext4 crypto: fail the mount if blocksize != pagesize
ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate
...
|
|
Pull sparc fixes from David Miller:
"Sparc perf stack traversal fixes from David Ahern"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: perf: Use UREG_FP rather than UREG_I6
sparc64: perf: Add sanity checking on addresses in user stack
sparc64: Convert BUG_ON to warning
sparc: perf: Disable pagefaults while walking userspace stacks
|
|
Pull Renesas H8/300 architecture re-introduction from Yoshinori Sato.
We dropped arch/h8300 two years ago as stale and old, this is a new and
more modern rewritten arch support for the same architecture.
* tag 'for-4.2' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux: (27 commits)
h8300: fix typo.
h8300: Always build dtb
h8300: Remove ARCH_WANT_IPC_PARSE_VERSION
sh-sci: Get register size from platform device
clk: h8300: fix error handling in h8s2678_pll_clk_setup()
h8300: Symbol name fix
h8300: devicetree source
h8300: configs
h8300: IRQ chip driver
h8300: clocksource
h8300: clock driver
h8300: Build scripts
h8300: library functions
h8300: Memory management
h8300: miscellaneous functions
h8300: process helpers
h8300: compressed image support
h8300: Low level entry
h8300: kernel startup
h8300: Interrupt and exceptions
...
|
|
There are two generated files: crypto/rsakey-asn1.c and crypto/raskey-asn1.h,
after the cfc2bb32b31371d6bffc6bf2da3548f20ad48c83 commit. Let's add
.gitignore to ignore *-asn1.[ch] files.
Signed-off-by: Alexander Kuleshov <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
arm64:allmodconfig fails to build as follows.
In file included from include/acpi/platform/aclinux.h:74:0,
from include/acpi/platform/acenv.h:173,
from include/acpi/acpi.h:56,
from include/linux/acpi.h:37,
from ./arch/arm64/include/asm/dma-mapping.h:21,
from include/linux/dma-mapping.h:86,
from include/linux/skbuff.h:34,
from include/crypto/algapi.h:18,
from crypto/asymmetric_keys/rsa.c:16:
include/linux/ctype.h:15:12: error: expected ‘;’, ‘,’ or ‘)’
before numeric constant
#define _X 0x40 /* hex digit */
^
crypto/asymmetric_keys/rsa.c:123:47: note: in expansion of macro ‘_X’
static int RSA_I2OSP(MPI x, size_t xLen, u8 **_X)
^
crypto/asymmetric_keys/rsa.c: In function ‘RSA_verify_signature’:
crypto/asymmetric_keys/rsa.c:256:2: error:
implicit declaration of function ‘RSA_I2OSP’
The problem is caused by an unrelated include file change, resulting in
the inclusion of ctype.h on arm64. This in turn causes the local variable
_X to conflict with macro _X used in ctype.h.
Fixes: b6197b93fa4b ("arm64 : Introduce support for ACPI _CCA object")
Cc: Suthikulpanit, Suravee <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Don't print info about missing test for the internal
helper __driver-gcm-aes-aesni
changes in v2:
- marked test as fips allowed
Signed-off-by: Tadeusz Struk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
"kzfree"
The kzfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
Signed-off-by: Tadeusz Struk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The core of the Jitter RNG is intended to be compiled with -O0. To
ensure that the Jitter RNG can be compiled on all architectures,
separate out the RNG core into a stand-alone C file that can be compiled
with -O0 which does not depend on any kernel include file.
As no kernel includes can be used in the C file implementing the core
RNG, any dependencies on kernel code must be extracted.
A second file provides the link to the kernel and the kernel crypto API
that can be compiled with the regular compile options of the kernel.
Signed-off-by: Stephan Mueller <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
David Ahern says:
====================
sparc64: perf fixes for userspace stacks
Coming back to the perf userspace callchain problem. As a reminder there are
a series of problems trying to use perf to collect callchains with scheduling
tracepoints, e.g., perf sched record -g -- <cmd>.
The first patch disables pagefaults while walking the user stack. As discussed
a couple of months ago this is the right fix, but I was puzzled as to why
processes were terminating with sigbus (and sometimes sigsegv). I believe the
root of this problem is bad addresses trying to walk the frames using frame
pointers. The bad addresses lead to faults that get handled by do_sparc64_fault
and it aborts the task though I am still puzzled as to why it gets past this
check in do_sparc64_fault:
if (in_atomic() || !mm)
goto intr_or_no_mm;
pagefault_disable bumps the preempt_count which should make in_atomic return != 0
(building kernels with preemption set to voluntar, CONFIG_PREEMPT_VOLUNTARY=y).
While this set does not fully solve the problem it does prevent a number of
pain points with the current code, most notably able to lock up the system.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
perf walks userspace callchains by following frame pointers. Use the
UREG_FP macro to make it clearer that the %fp is being used.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Processes are getting killed (sigbus or segv) while walking userspace
callchains when using perf. In some instances I have seen ufp = 0x7ff
which does not seem like a proper stack address.
This patch adds a function to run validity checks against the address
before attempting the copy_from_user. The checks are copied from the
x86 version as a start point with the addition of a 4-byte alignment
check.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pagefault handling has a BUG_ON path that panics the system. Convert it to
a warning instead. There is no need to bring down the system for this kind
of failure.
The following was hit while running:
perf sched record -g -- make -j 16
[3609412.782801] kernel BUG at /opt/dahern/linux.git/arch/sparc/mm/fault_64.c:416!
[3609412.782833] \|/ ____ \|/
[3609412.782833] "@'/ .. \`@"
[3609412.782833] /_| \__/ |_\
[3609412.782833] \__U_/
[3609412.782870] cat(4516): Kernel bad sw trap 5 [#1]
[3609412.782889] CPU: 0 PID: 4516 Comm: cat Tainted: G E 4.1.0-rc8+ #6
[3609412.782909] task: fff8000126e31f80 ti: fff8000110d90000 task.ti: fff8000110d90000
[3609412.782931] TSTATE: 0000004411001603 TPC: 000000000096b164 TNPC: 000000000096b168 Y: 0000004e Tainted: G E
[3609412.782964] TPC: <do_sparc64_fault+0x5e4/0x6a0>
[3609412.782979] g0: 000000000096abe0 g1: 0000000000d314c4 g2: 0000000000000000 g3: 0000000000000001
[3609412.783009] g4: fff8000126e31f80 g5: fff80001302d2000 g6: fff8000110d90000 g7: 00000000000000ff
[3609412.783045] o0: 0000000000aff6a8 o1: 00000000000001a0 o2: 0000000000000001 o3: 0000000000000054
[3609412.783080] o4: fff8000100026820 o5: 0000000000000001 sp: fff8000110d935f1 ret_pc: 000000000096b15c
[3609412.783117] RPC: <do_sparc64_fault+0x5dc/0x6a0>
[3609412.783137] l0: 000007feff996000 l1: 0000000000030001 l2: 0000000000000004 l3: fff8000127bd0120
[3609412.783174] l4: 0000000000000054 l5: fff8000127bd0188 l6: 0000000000000000 l7: fff8000110d9dba8
[3609412.783210] i0: fff8000110d93f60 i1: fff8000110ca5530 i2: 000000000000003f i3: 0000000000000054
[3609412.783244] i4: fff800010000081a i5: fff8000100000398 i6: fff8000110d936a1 i7: 0000000000407c6c
[3609412.783286] I7: <sparc64_realfault_common+0x10/0x20>
[3609412.783308] Call Trace:
[3609412.783329] [0000000000407c6c] sparc64_realfault_common+0x10/0x20
[3609412.783353] Disabling lock debugging due to kernel taint
[3609412.783379] Caller[0000000000407c6c]: sparc64_realfault_common+0x10/0x20
[3609412.783449] Caller[fff80001002283e4]: 0xfff80001002283e4
[3609412.783471] Instruction DUMP: 921021a0 7feaff91 901222a8 <91d02005> 82086100 02f87f7b 808a2873 81cfe008 01000000
[3609412.783542] Kernel panic - not syncing: Fatal exception
[3609412.784605] Press Stop-A (L1-A) to return to the boot prom
[3609412.784615] ---[ end Kernel panic - not syncing: Fatal exception
With this patch rather than a panic I occasionally get something like this:
perf sched record -g -m 1024 -- make -j N
where N is based on number of cpus (128 to 1024 for a T7-4 and 8 for an 8 cpu
VM on a T5-2).
WARNING: CPU: 211 PID: 52565 at /opt/dahern/linux.git/arch/sparc/mm/fault_64.c:417 do_sparc64_fault+0x340/0x70c()
address (7feffcd6000) != regs->tpc (fff80001004873c0)
Modules linked in: ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cdc_ether usbnet mii ixgbe mdio igb i2c_algo_bit i2c_core ptp crc32c_sparc64 camellia_sparc64 des_sparc64 des_generic md5_sparc64 sha512_sparc64 sha1_sparc64 uio_pdrv_genirq uio usb_storage mpt3sas scsi_transport_sas raid_class aes_sparc64 sunvnet sunvdc sha256_sparc64(E) sha256_generic(E)
CPU: 211 PID: 52565 Comm: ld Tainted: G W E 4.1.0-rc8+ #19
Call Trace:
[000000000045ce30] warn_slowpath_common+0x7c/0xa0
[000000000045ceec] warn_slowpath_fmt+0x30/0x40
[000000000098ad64] do_sparc64_fault+0x340/0x70c
[0000000000407c2c] sparc64_realfault_common+0x10/0x20
---[ end trace 62ee02065a01a049 ]---
ld[52565]: segfault at fff80001004873c0 ip fff80001004873c0 (rpc fff8000100158868) sp 000007feffcd70e1 error 30002 in libc-2.12.so[fff8000100410000+184000]
The segfault is horrible, but better than a system panic.
An 8-cpu VM on a T5-2 also showed the above traces from time to time,
so it is a general problem and not specific to the T7 or baremetal.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Page faults generated walking userspace stacks can call schedule to switch
out the task. When collecting callchains for scheduler tracepoints this
causes a deadlock as the tracepoints can be hit with the runqueue lock held:
[ 8138.159054] WARNING: CPU: 758 PID: 12488 at /opt/dahern/linux.git/arch/sparc/kernel/nmi.c:80 perfctr_irq+0x1f8/0x2b4()
[ 8138.203152] Watchdog detected hard LOCKUP on cpu 758
[ 8138.410969] CPU: 758 PID: 12488 Comm: perf Not tainted 4.0.0-rc6+ #6
[ 8138.437146] Call Trace:
[ 8138.447193] [000000000045cdd4] warn_slowpath_common+0x7c/0xa0
[ 8138.471238] [000000000045ce90] warn_slowpath_fmt+0x30/0x40
[ 8138.494189] [0000000000983e38] perfctr_irq+0x1f8/0x2b4
[ 8138.515716] [00000000004209f4] tl0_irq15+0x14/0x20
[ 8138.535791] [00000000009839ec] _raw_spin_trylock_bh+0x68/0x108
[ 8138.560180] [0000000000980018] __schedule+0xcc/0x710
[ 8138.580981] [00000000009806dc] preempt_schedule_common+0x10/0x3c
[ 8138.606082] [000000000098077c] _cond_resched+0x34/0x44
[ 8138.627603] [0000000000565990] kmem_cache_alloc_node+0x24/0x1a0
[ 8138.652345] [0000000000450b60] tsb_grow+0xac/0x488
[ 8138.672429] [0000000000985040] do_sparc64_fault+0x4dc/0x6e4
[ 8138.695736] [0000000000407c2c] sparc64_realfault_common+0x10/0x20
[ 8138.721202] [00000000006f2e24] NG4copy_from_user+0xa4/0x3c0
[ 8138.744510] [000000000044f900] perf_callchain_user+0x5c/0x6c
[ 8138.768182] [0000000000517b5c] perf_callchain+0x16c/0x19c
[ 8138.790774] [0000000000515f84] perf_prepare_sample+0x68/0x218
[ 8138.814801] ---[ end trace 42ca6294b1ff7573 ]---
As with PowerPC (b59a1bfcc240, "powerpc/perf: Disable pagefaults during
callchain stack read") disable pagefaults while walking userspace stacks.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Apparently the compiler does fine without it, but it feels safer and
clearer to add the missing attribute.
Signed-off-by: Jean Delvare <[email protected]>
|
|
Signed-off-by: Jean Delvare <[email protected]>
|
|
The dmi-sysfs module adds DMI table structures entries under
/sys/firmware/dmi/entries only, so rename documentation file to
sysfs-firmware-dmi-entries as more appropriate. Without renaming it's
confusing to differ this from sysfs-firmware-dmi-tables that adds raw
DMI table and actually adds "dmi" kobject.
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>
|
|
Some utils, like dmidecode and smbios, need to access SMBIOS entry
table area in order to get information like SMBIOS version, size, etc.
Currently it's done via /dev/mem. But for situation when /dev/mem
usage is disabled, the utils have to use dmi sysfs instead, which
doesn't represent SMBIOS entry and adds code/delay redundancy when direct
access for table is needed.
So this patch creates dmi/tables and adds SMBIOS entry point to allow
utils in question to work correctly without /dev/mem. Also patch adds
raw dmi table to simplify dmi table processing in user space, as
proposed by Jean Delvare.
Tested-by: Roy Franz <[email protected]>
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>
|
|
The SMBIOS v3 entry points specify a maximum length for the DMI table,
not the exact length. Thus there may be garbage after the end-of-table
marker, which we don't want to export to user-space. Adjust dmi_len
when we find the end-of-table marker, so that only the actual table
payload is exported.
Signed-off-by: Jean Delvare <[email protected]>
Cc: Ivan Khoronzhuk <[email protected]>
|
|
The "dmi_table" function looks like data instance, but it does DMI
table decode. This patch renames it to "dmi_decode_table" name as
more appropriate. That allows us to use "dmi_table" name for correct
purposes.
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>
|
|
I'll be maintaining the pending patches to the dmi_scan and dmi-id
drivers as a quilt tree.
Signed-off-by: Jean Delvare <[email protected]>
|
|
A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.
Fixes: fc43026278 ("dmi: add support for SMBIOS 3.0 64-bit entry point")
Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Cc: Leif Lindholm <[email protected]>
Cc: Matt Fleming <[email protected]>
|
|
Merge first patchbomb from Andrew Morton:
- a few misc things
- ocfs2 udpates
- kernel/watchdog.c feature work (took ages to get right)
- most of MM. A few tricky bits are held up and probably won't make 4.2.
* emailed patches from Andrew Morton <[email protected]>: (91 commits)
mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
mm, thp: respect MPOL_PREFERRED policy with non-local node
tmpfs: truncate prealloc blocks past i_size
mm/memory hotplug: print the last vmemmap region at the end of hot add memory
mm/mmap.c: optimization of do_mmap_pgoff function
mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
mm: kmemleak: fix delete_object_*() race when called on the same memory block
mm: kmemleak: allow safe memory scanning during kmemleak disabling
memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg: remove unused mem_cgroup->oom_wakeups
frontswap: allow multiple backends
x86, mirror: x86 enabling - find mirrored memory ranges
mm/memblock: allocate boot time data structures from mirrored memory
mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
mm: do not ignore mapping_gfp_mask in page cache allocation paths
mm/cma.c: fix typos in comments
mm/oom_kill.c: print points as unsigned int
mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull pstore updates from Tony Luck:
"Miscellaneous pstore improvements"
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ramoops: make it possible to change mem_type param.
pstore/ram: verify ramoops header before saving record
fs/pstore: Optimization function ramoops_init_przs
fs/pstore: update the backend parameter in pstore module
pstore: do not use message compression without lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"New features:
- per-file encryption (e.g., ext4)
- FALLOC_FL_ZERO_RANGE
- FALLOC_FL_COLLAPSE_RANGE
- RENAME_WHITEOUT
Major enhancement/fixes:
- recovery broken superblocks
- enhance f2fs_trim_fs with a discard_map
- fix a race condition on dentry block allocation
- fix a deadlock during summary operation
- fix a missing fiemap result
.. and many minor bug fixes and clean-ups were done"
* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
f2fs: do not trim preallocated blocks when truncating after i_size
f2fs crypto: add alloc_bounce_page
f2fs crypto: fix to handle errors likewise ext4
f2fs: drop the volatile_write flag only
f2fs: skip committing valid superblock
f2fs: setting discard option in parse_options()
f2fs: fix to return exact trimmed size
f2fs: support FALLOC_FL_INSERT_RANGE
f2fs: hide common code in f2fs_replace_block
f2fs: disable the discard option when device doesn't support
f2fs crypto: remove alloc_page for bounce_page
f2fs: fix a deadlock for summary page lock vs. sentry_lock
f2fs crypto: clean up error handling in f2fs_fname_setup_filename
f2fs crypto: avoid f2fs_inherit_context for symlink
f2fs crypto: do not set encryption policy for non-directory by ioctl
f2fs crypto: allow setting encryption policy once
f2fs crypto: check context consistent for rename2
f2fs: avoid duplicated code by reusing f2fs_read_end_io
f2fs crypto: use per-inode tfm structure
f2fs: recovering broken superblock during mount
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
"The contains some small fixes and improvements in error handling for
UDF.
Bundled is also one ext3 coding style fix and a fix in quota
documentation"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: fix udf_load_pvoldesc()
udf: remove double err declaration in udf_file_write_iter()
UDF: support NFSv2 export
fs: ext3: super: fixed a space coding style issue
quota: Update documentation
udf: Return error from udf_find_entry()
udf: Make udf_get_filename() return error instead of 0 length file name
udf: bug on exotic flag in udf_get_filename()
udf: improve error management in udf_CS0toNLS()
udf: improve error management in udf_CS0toUTF8()
udf: unicode: update function name in comments
udf: remove unnecessary test in udf_build_ustr_exact()
udf: Return -ENOMEM when allocation fails in udf_get_filename()
|
|
Pull documentation updates from Jonathan Corbet:
"The main thing here is Ingo's big subdirectory documenting feature
support for each architecture. Beyond that, it's the usual pile of
fixes, tweaks, and small additions"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
doc:md: fix typo in md.txt.
Documentation/mic/mpssd: don't build x86 userspace when cross compiling
Documentation/prctl: don't build tsc tests when cross compiling
Documentation/vDSO: don't build tests when cross compiling
Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
Doc: Change wikipedia's URL from http to https
Documentation/kernel-parameters: add missing pciserial to the earlyprintk
Doc:pps: Fix typo in pps.txt
kbuild : Fix documentation of INSTALL_HDR_PATH
Documentation: filesystems: updated struct file_operations documentation in vfs.txt
kbuild: edit explanation of clean-files variable
Doc: ja_JP: Fix typo in HOWTO
Move freefall program from Documentation/ to tools/
Documentation: ARM: EXYNOS: Describe boot loaders interface
Doc:nfc: Fix typo in nfc-hci.txt
vfs: Minor documentation fix
Doc: networking: txtimestamp: fix printf format warning
Documentation, intel_pstate: Improve legacy mode internal governors description
Documentation: extend use case for EXPORT_SYMBOL_GPL()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
"Thanks to Samuel Thibault input device (keyboard) LEDs are no longer
hardwired within the input core but use LED subsystem and so allow use
of different triggers; Hans de Goede did a large update for the ALPS
touchpad driver; we have new TI drv2665 haptics driver and DA9063
OnKey driver, and host of other drivers got various fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
Input: pixcir_i2c_ts - fix receive error
MAINTAINERS: remove non existent input mt git tree
Input: improve usage of gpiod API
tty/vt/keyboard: define LED triggers for VT keyboard lock states
tty/vt/keyboard: define LED triggers for VT LED states
Input: export LEDs as class devices in sysfs
Input: cyttsp4 - use swap() in cyttsp4_get_touch()
Input: goodix - do not explicitly set evbits in input device
Input: goodix - export id and version read from device
Input: goodix - fix variable length array warning
Input: goodix - fix alignment issues
Input: add OnKey driver for DA9063 MFD part
Input: elan_i2c - add product IDs FW names
Input: elan_i2c - add support for multi IC type and iap format
Input: focaltech - report finger width to userspace
tty: remove platform_sysrq_reset_seq
Input: synaptics_i2c - use proper boolean values
Input: psmouse - use true instead of 1 for boolean values
Input: cyapa - fix a few typos in comments
Input: stmpe-ts - enforce device tree only mode
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- New APM X-Gene SoC EDAC driver (Loc Ho)
- AMD error injection module improvements (Aravind Gopalakrishnan)
- Altera Arria 10 support (Thor Thayer)
- misc fixes and cleanups all over the place
* tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits)
EDAC: Update Documentation/edac.txt
EDAC: Fix typos in Documentation/edac.txt
EDAC, mce_amd_inj: Set MISCV on injection
EDAC, mce_amd_inj: Move bit preparations before the injection
EDAC, mce_amd_inj: Cleanup and simplify README
EDAC, altera: Do not allow suspend when EDAC is enabled
EDAC, mce_amd_inj: Make inj_type static
arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support
EDAC, altera: Add Arria10 EDAC support
EDAC, altera: Refactor for Altera CycloneV SoC
EDAC, altera: Generalize driver to use DT Memory size
EDAC, mce_amd_inj: Add README file
EDAC, mce_amd_inj: Add individual permissions field to dfs_node
EDAC, mce_amd_inj: Modify flags attribute to use string arguments
EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
EDAC, xgene: Fix cpuid abuse
EDAC, mpc85xx: Extend error address to 64 bit
EDAC, mpc8xxx: Adapt for FSL SoC
EDAC, edac_stub: Drop arch-specific include
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Here is the bulk of pin control changes for the v4.2 series: Quite a
lot of new SoC subdrivers and two new main drivers this time, apart
from that business as usual.
Details:
Core functionality:
- Enable exclusive pin ownership: it is possible to flag a pin
controller so that GPIO and other functions cannot use a single pin
simultaneously.
New drivers:
- NXP LPC18xx System Control Unit pin controller
- Imagination Pistachio SoC pin controller
New subdrivers:
- Freescale i.MX7d SoC
- Intel Sunrisepoint-H PCH
- Renesas PFC R8A7793
- Renesas PFC R8A7794
- Mediatek MT6397, MT8127
- SiRF Atlas 7
- Allwinner A33
- Qualcomm MSM8660
- Marvell Armada 395
- Rockchip RK3368
Cleanups:
- A big cleanup of the Marvell MVEBU driver rectifying it to
correspond to reality
- Drop platform device probing from the SH PFC driver, we are now a
DT only shop for SuperH
- Drop obsolte multi-platform check for SH PFC
- Various janitorial: constification, grammar etc
Improvements:
- The AT91 GPIO portions now supports the set_multiple() feature
- Split out SPI pins on the Xilinx Zynq
- Support DTs without specific function nodes in the i.MX driver"
* tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
pinctrl: rockchip: add support for the rk3368
pinctrl: rockchip: generalize perpin driver-strength setting
pinctrl: sh-pfc: r8a7794: add SDHI pin groups
pinctrl: sh-pfc: r8a7794: add MMCIF pin groups
pinctrl: sh-pfc: add R8A7794 PFC support
pinctrl: make pinctrl_register() return proper error code
pinctrl: mvebu: armada-39x: add support for Armada 395 variant
pinctrl: mvebu: armada-39x: add missing SATA functions
pinctrl: mvebu: armada-39x: add missing PCIe functions
pinctrl: mvebu: armada-38x: add ptp functions
pinctrl: mvebu: armada-38x: add ua1 functions
pinctrl: mvebu: armada-38x: add nand functions
pinctrl: mvebu: armada-38x: add sata functions
pinctrl: mvebu: armada-xp: add dram functions
pinctrl: mvebu: armada-xp: add nand rb function
pinctrl: mvebu: armada-xp: add spi1 function
pinctrl: mvebu: armada-39x: normalize ref clock naming
pinctrl: mvebu: armada-xp: rename spi to spi0
pinctrl: mvebu: armada-370: align spi1 clock pin naming
pinctrl: mvebu: armada-370: align VDD cpu-pd pin naming with datasheet
...
|
|
I've only seen this once, and I failed to capture the
lockdep backtrace, but I did some investigations.
If we are calling into the MST layer from EDID probing,
we have the mode_config mutex held, if during that EDID
probing, the MST hub goes away, then we can get a deadlock
where the connector destruction function in the driver
tries to retake the mode config mutex.
This offloads connector destruction to a workqueue,
and avoid the subsequenct lock ordering issue.
Acked-by: Daniel Vetter <[email protected]>
Cc: [email protected]
Signed-off-by: Dave Airlie <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Changes to existing drivers:
- supply MODULE_DEVICE_TABLE() to ensure probing
- constify struct; da9052_bl
- enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
- suspend/resume bugfix; lp855x_bl
- devm_gpiod_get_optional() API fixup; pwm_bl
- error handling fixup; backlight"
* tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: Change the return type of backlight_update_status() to int
backlight: pwm_bl: Simplify usage of devm_gpiod_get_optional
backlight: lp855x: Don't clear level on suspend/blank
backlight: Allow compile test of GPIO consumers if !GPIOLIB
video: backlight: da9052: Constify platform_device_id
gpio-backlight: Discover driver during boot time
|
|
Beginning at commit d52d3997f843 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc7-next-20150612 #1 Not tainted
-------------------------------
kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by systemd/1:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
#1: (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
#2: (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
stack backtrace:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
Call Trace:
dump_stack+0x4c/0x6e
lockdep_rcu_suspicious+0xe7/0x120
___might_sleep+0x1d5/0x1f0
__might_sleep+0x4d/0x90
kmem_cache_alloc+0x47/0x250
create_object+0x39/0x2e0
kmemleak_alloc_percpu+0x61/0xe0
pcpu_alloc+0x370/0x630
Additional backtrace lines are truncated. In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau,
these are the clue to the fix. Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: Kamalesh Babulal <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: <[email protected]> [3.18+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 077fcf116c8c ("mm/thp: allocate transparent hugepages on
local node"), we handle THP allocations on page fault in a special way -
for non-interleave memory policies, the allocation is only attempted on
the node local to the current CPU, if the policy's nodemask allows the
node.
This is motivated by the assumption that THP benefits cannot offset the
cost of remote accesses, so it's better to fallback to base pages on the
local node (which might still be available, while huge pages are not due
to fragmentation) than to allocate huge pages on a remote node.
The nodemask check prevents us from violating e.g. MPOL_BIND policies
where the local node is not among the allowed nodes. However, the
current implementation can still give surprising results for the
MPOL_PREFERRED policy when the preferred node is different than the
current CPU's local node.
In such case we should honor the preferred node and not use the local
node, which is what this patch does. If hugepage allocation on the
preferred node fails, we fall back to base pages and don't try other
nodes, with the same motivation as is done for the local node hugepage
allocations. The patch also moves the MPOL_INTERLEAVE check around to
simplify the hugepage specific test.
The difference can be demonstrated using in-tree transhuge-stress test
on the following 2-node machine where half memory on one node was
occupied to show the difference.
> numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 7878 MB
node 0 free: 3623 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 8045 MB
node 1 free: 7818 MB
node distances:
node 0 1
0: 10 21
1: 21 10
Before the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.197 s/loop, 0.276 ms/page, 7249.168 MiB/s 7962 succeed, 0 failed, 1786 different pages
> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.962 s/loop, 0.372 ms/page, 5376.172 MiB/s 7962 succeed, 0 failed, 3873 different pages
Number of successful THP allocations corresponds to free memory on node 0 in
the first case and node 1 in the second case, i.e. -p parameter is ignored and
cpu binding "wins".
After the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.183 s/loop, 0.274 ms/page, 7295.516 MiB/s 7962 succeed, 0 failed, 1760 different pages
> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.878 s/loop, 0.361 ms/page, 5533.638 MiB/s 7962 succeed, 0 failed, 1750 different pages
> numactl -p1 -C0 ./transhuge-stress
transhuge-stress: 4.628 s/loop, 0.581 ms/page, 3440.893 MiB/s 7962 succeed, 0 failed, 3918 different pages
The -p parameter is respected regardless of cpu binding.
> numactl -C0 ./transhuge-stress
transhuge-stress: 2.202 s/loop, 0.277 ms/page, 7230.003 MiB/s 7962 succeed, 0 failed, 1750 different pages
> numactl -C12 ./transhuge-stress
transhuge-stress: 3.020 s/loop, 0.379 ms/page, 5273.324 MiB/s 7962 succeed, 0 failed, 3916 different pages
Without -p parameter, hugepage restriction to CPU-local node works as before.
Fixes: 077fcf116c8c ("mm/thp: allocate transparent hugepages on local node")
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]> [4.0+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
One of the rocksdb people noticed that when you do something like this
fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(5M)
on tmpfs, the file would still take up 10M: which led to super fun
issues because we were getting ENOSPC before we thought we should be
getting ENOSPC. This patch fixes the problem, and mirrors what all the
other fs'es do (and was agreed to be the correct behaviour at LSF).
I tested it locally to make sure it worked properly with the following
xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
Without the patch we have "Blocks: 20480", with the patch we have the
correct value of "Blocks: 10240".
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When hot add two nodes continuously, we found the vmemmap region info is
a bit messed. The last region of node 2 is printed when node 3 hot
added, like the following:
Initmem setup node 2 [mem 0x0000000000000000-0xffffffffffffffff]
On node 2 totalpages: 0
Built 2 zonelists in Node order, mobility grouping on. Total pages: 16090539
Policy zone: Normal
init_memory_mapping: [mem 0x40000000000-0x407ffffffff]
[mem 0x40000000000-0x407ffffffff] page 1G
[ffffea1000000000-ffffea10001fffff] PMD -> [ffff8a077d800000-ffff8a077d9fffff] on node 2
[ffffea1000200000-ffffea10003fffff] PMD -> [ffff8a077de00000-ffff8a077dffffff] on node 2
...
[ffffea101f600000-ffffea101f9fffff] PMD -> [ffff8a074ac00000-ffff8a074affffff] on node 2
[ffffea101fa00000-ffffea101fdfffff] PMD -> [ffff8a074a800000-ffff8a074abfffff] on node 2
Initmem setup node 3 [mem 0x0000000000000000-0xffffffffffffffff]
On node 3 totalpages: 0
Built 3 zonelists in Node order, mobility grouping on. Total pages: 16090539
Policy zone: Normal
init_memory_mapping: [mem 0x60000000000-0x607ffffffff]
[mem 0x60000000000-0x607ffffffff] page 1G
[ffffea101fe00000-ffffea101fffffff] PMD -> [ffff8a074a400000-ffff8a074a5fffff] on node 2 <=== node 2 ???
[ffffea1800000000-ffffea18001fffff] PMD -> [ffff8a074a600000-ffff8a074a7fffff] on node 3
[ffffea1800200000-ffffea18005fffff] PMD -> [ffff8a074a000000-ffff8a074a3fffff] on node 3
[ffffea1800600000-ffffea18009fffff] PMD -> [ffff8a0749c00000-ffff8a0749ffffff] on node 3
...
The cause is the last region was missed at the and of hot add memory,
and p_start, p_end, node_start were not reset, so when hot add memory to
a new node, it will consider they are not contiguous blocks and print
the previous one. So we print the last vmemmap region at the end of hot
add memory to avoid the confusion.
Signed-off-by: Zhu Guihua <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The simple check for zero length memory mapping may be performed
earlier. So that in case of zero length memory mapping some unnecessary
code is not executed at all. It does not make the code less readable
and saves some CPU cycles.
Signed-off-by: Piotr Kwapulinski <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kmemleak memory scanning uses finer grained object->lock spinlocks
primarily to avoid races with the memory block freeing. However, the
pointer lookup in the rb tree requires the kmemleak_lock to be held.
This is currently done in the find_and_get_object() function for each
pointer-like location read during scanning. While this allows a low
latency on kmemleak_*() callbacks on other CPUs, the memory scanning is
slower.
This patch moves the kmemleak_lock outside the scan_block() loop,
acquiring/releasing it only once per scanned memory block. The
allow_resched logic is moved outside scan_block() and a new
scan_large_block() function is implemented which splits large blocks in
MAX_SCAN_SIZE chunks with cond_resched() calls in-between. A redundant
(object->flags & OBJECT_NO_SCAN) check is also removed from
scan_object().
With this patch, the kmemleak scanning performance is significantly
improved: at least 50% with lock debugging disabled and over an order of
magnitude with lock proving enabled (on an arm64 system).
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While very unlikely (usually kmemleak or sl*b bug), the create_object()
function in mm/kmemleak.c may fail to insert a newly allocated object into
the rb tree. When this happens, kmemleak disables itself and prints
additional information about the object already found in the rb tree.
Such printing is done with the parent->lock acquired, however the
kmemleak_lock is already held. This is a potential race with the scanning
thread which acquires object->lock and kmemleak_lock in a
This patch removes the locking around the 'parent' object information
printing. Such object cannot be freed or removed from object_tree_root
and object_list since kmemleak_lock is already held. There is a very
small risk that some of the object data is being modified on another CPU
but the only downside is inconsistent information printing.
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kmemleak_do_cleanup() work thread already waits for the kmemleak_scan
thread to finish via kthread_stop(). Waiting in kthread_stop() while
scan_mutex is held may lead to deadlock if kmemleak_scan_thread() also
waits to acquire for scan_mutex.
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Calling delete_object_*() on the same pointer is not a standard use case
(unless there is a bug in the code calling kmemleak_free()). However,
during kmemleak disabling (error or user triggered via /sys), there is a
potential race between kmemleak_free() calls on a CPU and
__kmemleak_do_cleanup() on a different CPU.
The current delete_object_*() implementation first performs a look-up
holding kmemleak_lock, increments the object->use_count and then
re-acquires kmemleak_lock to remove the object from object_tree_root and
object_list.
This patch simplifies the delete_object_*() mechanism to both look up
and remove an object from the object_tree_root and object_list
atomically (guarded by kmemleak_lock). This allows safe concurrent
calls to delete_object_*() on the same pointer without additional
locking for synchronising the kmemleak_free_enabled flag.
A side effect is a slight improvement in the delete_object_*() performance
by avoiding acquiring kmemleak_lock twice and incrementing/decrementing
object->use_count.
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kmemleak scanning thread can run for minutes. Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock. Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ... -> __delete_object() function until the lock is
released in scan_object().
When a kmemleak error occurs (e.g. it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects. If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()). This leads to kmemleak_scan potentially triggering a
page fault.
This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed. The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.
Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: Vignesh Radhakrishnan <[email protected]>
Tested-by: Vignesh Radhakrishnan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|