aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-04-11dcache: account external names as indirectly reclaimable memoryRoman Gushchin1-9/+30
I received a report about suspicious growth of unreclaimable slabs on some machines. I've found that it happens on machines with low memory pressure, and these unreclaimable slabs are external names attached to dentries. External names are allocated using generic kmalloc() function, so they are accounted as unreclaimable. But they are held by dentries, which are reclaimable, and they will be reclaimed under the memory pressure. In particular, this breaks MemAvailable calculation, as it doesn't take unreclaimable slabs into account. This leads to a silly situation, when a machine is almost idle, has no memory pressure and therefore has a big dentry cache. And the resulting MemAvailable is too low to start a new workload. To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is used to track the amount of memory, consumed by external names. The counter is increased in the dentry allocation path, if an external name structure is allocated; and it's decreased in the dentry freeing path. To reproduce the problem I've used the following Python script: import os for iter in range (0, 10000000): try: name = ("/some_long_name_%d" % iter) + "_" * 220 os.stat(name) except Exception: pass Without this patch: $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7811688 kB $ python indirect.py $ cat /proc/meminfo | grep MemAvailable MemAvailable: 2753052 kB With the patch: $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7809516 kB $ python indirect.py $ cat /proc/meminfo | grep MemAvailable MemAvailable: 7749144 kB [[email protected]: fix indirectly reclaimable memory accounting for CONFIG_SLOB] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix indirectly reclaimable memory accounting] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm: treat indirectly reclaimable memory as available in MemAvailableRoman Gushchin1-0/+7
Adjust /proc/meminfo MemAvailable calculation by adding the amount of indirectly reclaimable memory (rounded to the PAGE_SIZE). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTESRoman Gushchin2-0/+2
Patch series "indirectly reclaimable memory", v2. This patchset introduces the concept of indirectly reclaimable memory and applies it to fix the issue of when a big number of dentries with external names can significantly affect the MemAvailable value. This patch (of 3): Introduce a concept of indirectly reclaimable memory and adds the corresponding memory counter and /proc/vmstat item. Indirectly reclaimable memory is any sort of memory, used by the kernel (except of reclaimable slabs), which is actually reclaimable, i.e. will be released under memory pressure. The counter is in bytes, as it's not always possible to count such objects in pages. The name contains BYTES by analogy to NR_KERNEL_STACK_KB. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11sr: get/drop reference to device in revalidate and check_eventsJens Axboe1-4/+15
We can't just use scsi_cd() to get the scsi_cd structure, we have to grab a live reference to the device. For both callbacks, we're not inside an open where we already hold a reference to the device. This fixes device removal/addition under concurrent device access, which otherwise could result in the below oops. NULL pointer dereference at 0000000000000010 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: sr 12:0:0:0: [sr2] scsi-1 drive scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl sr 12:0:0:0: Attached scsi CD-ROM sr2 sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc sr 12:0:0:0: Attached scsi generic sg7 type 5 igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650 Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod] RSP: 0018:ffff883ff357bb58 EFLAGS: 00010292 RAX: ffffffffa00b07d0 RBX: ffff883ff3058000 RCX: ffff883ff357bb66 RDX: 0000000000000003 RSI: 0000000000007530 RDI: ffff881fea631000 RBP: 0000000000000000 R08: ffff881fe4d38400 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000000001b6 R12: 000000000800005d R13: 000000000800005d R14: ffff883ffd9b3790 R15: 0000000000000000 FS: 00007f7dc8e6d8c0(0000) GS:ffff883fff340000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000003ffda98005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? __invalidate_device+0x48/0x60 check_disk_change+0x4c/0x60 sr_block_open+0x16/0xd0 [sr_mod] __blkdev_get+0xb9/0x450 ? iget5_locked+0x1c0/0x1e0 blkdev_get+0x11e/0x320 ? bdget+0x11d/0x150 ? _raw_spin_unlock+0xa/0x20 ? bd_acquire+0xc0/0xc0 do_dentry_open+0x1b0/0x320 ? inode_permission+0x24/0xc0 path_openat+0x4e6/0x1420 ? cpumask_any_but+0x1f/0x40 ? flush_tlb_mm_range+0xa0/0x120 do_filp_open+0x8c/0xf0 ? __seccomp_filter+0x28/0x230 ? _raw_spin_unlock+0xa/0x20 ? __handle_mm_fault+0x7d6/0x9b0 ? list_lru_add+0xa8/0xc0 ? _raw_spin_unlock+0xa/0x20 ? __alloc_fd+0xaf/0x160 ? do_sys_open+0x1a6/0x230 do_sys_open+0x1a6/0x230 do_syscall_64+0x5a/0x100 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Reviewed-by: Lee Duncan <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-04-11s390: correct nospec auto detection init orderMartin Schwidefsky3-6/+6
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via early_initcall is done *after* the early_param functions. This overwrites any settings done with the nobp/no_spectre_v2/spectre_v2 parameters. The code patching for the kernel is done after the evaluation of the early parameters but before the early_initcall is done. The end result is a kernel image that is patched correctly but the kernel modules are not. Make sure that the nospec auto detection function is called before the early parameters are evaluated and before the code patching is done. Fixes: 6e179d64126b ("s390: add automatic detection of the spectre defense") Signed-off-by: Martin Schwidefsky <[email protected]>
2018-04-11tracing: Enforce passing in filter=NULL to create_filter()Steven Rostedt (VMware)1-14/+10
There's some inconsistency with what to set the output parameter filterp when passing to create_filter(..., struct event_filter **filterp). Whatever filterp points to, should be NULL when calling this function. The create_filter() calls create_filter_start() with a pointer to a local "filter" variable that is set to NULL. The create_filter_start() has a WARN_ON() if the passed in pointer isn't pointing to a value set to NULL. Ideally, create_filter() should pass the filterp variable it received to create_filter_start() and not hide it as with a local variable, this allowed create_filter() to fail, and not update the passed in filter, and the caller of create_filter() then tried to free filter, which was never initialized to anything, causing memory corruption. Link: http://lkml.kernel.org/r/[email protected] Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Reported-by: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-04-11trace_uprobe: Simplify probes_seq_show()Ravi Bangoria1-18/+3
Simplify probes_seq_show() function. No change in output before and after patch. Link: http://lkml.kernel.org/r/[email protected] Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-04-11trace_uprobe: Use %lx to display offsetRavi Bangoria1-1/+1
tu->offset is unsigned long, not a pointer, thus %lx should be used to print it, not the %px. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Acked-by: Masami Hiramatsu <[email protected]> Fixes: 0e4d819d0893 ("trace_uprobe: Display correct offset in uprobe_events") Suggested-by: Kees Cook <[email protected]> Signed-off-by: Ravi Bangoria <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-04-11tracing/uprobe: Add support for overlayfsHoward McLauchlan1-1/+1
uprobes cannot successfully attach to binaries located in a directory mounted with overlayfs. To verify, create directories for mounting overlayfs (upper,lower,work,merge), move some binary into merge/ and use readelf to obtain some known instruction of the binary. I used /bin/true and the entry instruction(0x13b0): $ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merge $ cd /sys/kernel/debug/tracing $ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events $ echo 1 > events/uprobes/true_entry/enable This returns 'bash: echo: write error: Input/output error' and dmesg tells us 'event trace: Could not enable event true_entry' This change makes create_trace_uprobe() look for the real inode of a dentry. In the case of normal filesystems, this simplifies to just returning the inode. In the case of overlayfs(and similar fs) we will obtain the underlying dentry and corresponding inode, upon which uprobes can successfully register. Running the example above with the patch applied, we can see that the uprobe is enabled and will output to trace as expected. Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Signed-off-by: Howard McLauchlan <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-04-11tracing: Use ARRAY_SIZE() macro instead of open coding itJérémy Lefaure1-1/+1
It is useless to re-invent the ARRAY_SIZE macro so let's use it instead of DATA_CNT. Found with Coccinelle with the following semantic patch: @r depends on (org || report)@ type T; T[] E; position p; @@ ( (sizeof(E)@p /sizeof(*E)) | (sizeof(E)@p /sizeof(E[...])) | (sizeof(E)@p /sizeof(T)) ) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérémy Lefaure <[email protected]> [ Removed useless include of kernel.h ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-04-11Merge branch 'vhost-fix-vhost_vq_access_ok-log-check'David S. Miller2-36/+38
Stefan Hajnoczi says: ==================== vhost: fix vhost_vq_access_ok() log check v3: * Rebased onto net/master and resolved conflict [DaveM] v2: * Rewrote the conditional to make the vq access check clearer [Linus] * Added Patch 2 to make the return type consistent and harder to misuse [Linus] The first patch fixes the vhost virtqueue access check which was recently broken. The second patch replaces the int return type with bool to prevent future bugs. ==================== Acked-by: Jason Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11vhost: return bool from *_access_ok() functionsStefan Hajnoczi2-35/+35
Currently vhost *_access_ok() functions return int. This is error-prone because there are two popular conventions: 1. 0 means failure, 1 means success 2. -errno means failure, 0 means success Although vhost mostly uses #1, it does not do so consistently. umem_access_ok() uses #2. This patch changes the return type from int to bool so that false means failure and true means success. This eliminates a potential source of errors. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11vhost: fix vhost_vq_access_ok() log checkStefan Hajnoczi1-3/+5
Commit d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when IOTLB is enabled") introduced a regression. The logic was originally: if (vq->iotlb) return 1; return A && B; After the patch the short-circuit logic for A was inverted: if (A || vq->iotlb) return A; return B; This patch fixes the regression by rewriting the checks in the obvious way, no longer returning A when vq->iotlb is non-NULL (which is hard to understand). Reported-by: [email protected] Cc: Jason Wang <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11vhost: Fix vhost_copy_to_user()Eric Auger1-1/+1
vhost_copy_to_user is used to copy vring used elements to userspace. We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC. Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Eric Auger <[email protected]> Acked-by: Jason Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11Merge branch 'Aquantia-atlantic-critical-fixes-04-2018'David S. Miller2-3/+21
Igor Russkikh says: ==================== Aquantia atlantic critical fixes 04/2018 Two regressions on latest 4.16 driver reported by users Some of old FW (1.5.44) had a link management logic which prevents driver to make clean reset. Driver of 4.16 has a full hardware reset implemented and that broke the link and traffic on such a cards. Second is oops on shutdown callback in case interface is already closed or was never opened. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-04-11net: aquantia: oops when shutdown on already stopped deviceIgor Russkikh1-3/+5
In case netdev is closed at the moment of pci shutdown, aq_nic_stop gets called second time. napi_disable in that case hangs indefinitely. In other case, if device was never opened at all, we get oops because of null pointer access. We should invoke aq_nic_stop conditionally, only if device is running at the moment of shutdown. Reported-by: David Arcari <[email protected]> Fixes: 90869ddfefeb ("net: aquantia: Implement pci shutdown callback") Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11net: aquantia: Regression on reset with 1.x firmwareIgor Russkikh1-0/+16
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake" is active. With this mode when motherboard gets powered (but no poweron happens yet), NIC automatically enables powersave link and watches for WOL packet. This normally allows to powerup the PC after AC power failures. Not all motherboards or bios settings gives power to PCI slots, so this mode is not enabled on all the hardware. 4.16 linux driver introduced full hardware reset sequence This is required since before that we had no NIC hardware reset implemented and there were side effects of "not clean start". But this full reset is incompatible with "dirty wake" WOL feature it keeps the PHY link in a special mode forever. As a consequence, driver sees no link and no traffic. To fix this we forcibly change FW state to idle state before doing the full reset. This makes FW to restore link state. Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWANBassem Boubaker1-0/+6
The Cinterion AHS8 is a 3G device with one embedded WWAN interface using cdc_ether as a driver. The modem is controlled via AT commands through the exposed TTYs. AT+CGDCONT write command can be used to activate or deactivate a WWAN connection for a PDP context defined with the same command. UE supports one WWAN adapter. Signed-off-by: Bassem Boubaker <[email protected]> Acked-by: Oliver Neukum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11slip: Check if rstate is initialized before uncompressingTejaswi Tanikella2-0/+6
On receiving a packet the state index points to the rstate which must be used to fill up IP and TCP headers. But if the state index points to a rstate which is unitialized, i.e. filled with zeros, it gets stuck in an infinite loop inside ip_fast_csum trying to compute the ip checsum of a header with zero length. 89.666953: <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468 89.666965: <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c 89.666978: <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0 89.666991: <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198 89.667005: <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370 89.667027: <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250 89.667040: <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60 89.667053: <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c 89.667065: <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38 89.667085: <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154 89.667098: <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c 89.667117: <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c 89.667131: <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c ./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output: ip_fast_csum at arch/arm64/include/asm/checksum.h:40 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615 Adding a variable to indicate if the current rstate is initialized. If such a packet arrives, move to toss state. Signed-off-by: Tejaswi Tanikella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11lan78xx: Avoid spurious kevent 4 "error"Phil Elwell1-1/+1
lan78xx_defer_event generates an error message whenever the work item is already scheduled. lan78xx_open defers three events - EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware of the likelihood (or certainty) of an error message, the DEV_OPEN event is added to the set of pending events directly, relying on the subsequent deferral of the EVENT_LINK_RESET call to schedule the work. Take the same precaution with EVENT_STAT_UPDATE to avoid a totally unnecessary error message. Signed-off-by: Phil Elwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11lan78xx: Correctly indicate invalid OTPPhil Elwell1-1/+2
lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP content, but the value gets overwritten before it is returned and the read goes ahead anyway. Make the read conditional as it should be and preserve the error code. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Phil Elwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11rds: MP-RDS may use an invalid c_pathKa-Cheong Poon1-5/+10
rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to send a message. Suppose the RDS connection is not yet up. In rds_send_mprds_hash(), it does if (conn->c_npaths == 0) wait_event_interruptible(conn->c_hs_waitq, (conn->c_npaths != 0)); If it is interrupted before the connection is set up, rds_send_mprds_hash() will return a non-zero hash value. Hence rds_sendmsg() will use a non-zero c_path to send the message. But if the RDS connection ends up to be non-MP capable, the message will be lost as only the zero c_path can be used. Signed-off-by: Ka-Cheong Poon <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped"Ming Lei1-1/+1
This reverts commit 127276c6ce5a30fcc806b7fe53015f4f89b62956. When all CPUs of one hw queue become offline, there still may have IOs not completed from this hctx. But blk_mq_hw_queue_mapped() is called in blk_mq_queue_tag_busy_iter(), which is used for iterating request in timeout handler, timeout event will be missed on the inactive hctx, then request may never be completed. Also the replementation of blk_mq_hw_queue_mapped() doesn't match the helper's name any more, and it should have been named as blk_mq_hw_queue_active(). Even other callers need further verification about this reimplemenation. So revert this patch now, and we can improve hw queue activate/inactivate event after adequent researching and test. Cc: Stefan Haberland <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Reported-by: Jens Axboe <[email protected]> Fixes: 127276c6ce5a30fcc ("blk-mq: reimplement blk_mq_hw_queue_mapped") Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-04-11PCI: Remove messages about reassigning resourcesDesnes A. Nunes do Rosario2-3/+0
When reassigning device resources to increase their alignment, e.g., because of a "pci=resource_alignment=" kernel parameter or because the platform aligns resources to its page size, we previously emitted messages like this: pci 0000:00:00.0: Disabling memory decoding and releasing memory resources pci 0000:00:00.0: disabling bridge mem windows These messages don't convey any useful information, so remove them. Fixes: 38274637699 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned") Signed-off-by: Desnes A. Nunes do Rosario <[email protected]> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <[email protected]>
2018-04-11Merge branches 'pm-cpuidle' and 'pm-qos'Rafael J. Wysocki16-120/+415
* pm-cpuidle: tick-sched: avoid a maybe-uninitialized warning cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC sched: idle: Do not stop the tick before cpuidle_idle_call() sched: idle: Do not stop the tick upfront in the idle loop time: tick-sched: Reorganize idle tick management code * pm-qos: PM / QoS: mark expected switch fall-throughs
2018-04-11parisc: Switch to generic COMPAT_BINFMT_ELFHelge Deller4-132/+42
Drop our own compat binfmt implementation in arch/parisc/kernel/binfmt_elf32.c in favour of the generic implementation with CONFIG_COMPAT_BINFMT_ELF. While cleaning up the dependencies, I noticed that ELF_PLATFORM was strangely defined: On a 32-bit kernel, it was defined to "PARISC", while when running in compat mode on a 64-bit kernel it was defined to "PARISC32". Since it doesn't seem to be used in glibc yet, it's now defined in both cases to "PARISC". In any case, it can be distinguished because it's either a 32-bit or a 64-bit ELF file. Signed-off-by: Helge Deller <[email protected]>
2018-04-11parisc: Move cache flush functions into .text.hot sectionHelge Deller2-5/+6
and move the disable_sr_hashing() C and assembly functions into the .init section. Signed-off-by: Helge Deller <[email protected]>
2018-04-11parisc/signal: Add FPE_CONDTRAP for conditional trap handlingHelge Deller4-12/+7
Posix and common sense requires that SI_USER not be a signal specific si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps. Signed-off-by: Helge Deller <[email protected]> Cc: Stephen Rothwell <[email protected]>
2018-04-11MAINTAINERS: Update ASPEED entry with detailsJoel Stanley1-2/+7
I am interested in all ASPEED drivers, and the previous match wasn't grabbing files in nested directories. Use N instead. Add the arm kernel mailing list so that patches get reviewed there, and the linux-aspeed list which exists only so I can use patchwork to track patches. Add Andrew as a reviewer, because he is involved in reviewing ASPEED stuff. Signed-off-by: Joel Stanley <[email protected]> Acked-by: Andrew Jeffery <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2018-04-11s390/zcrypt: Support up to 256 crypto adapters.Harald Freudenberger6-120/+237
There was an artificial restriction on the card/adapter id to only 6 bits but all the AP commands do support adapter ids with 8 bit. This patch removes this restriction to 64 adapters and now up to 256 adapter can get addressed. Some of the ioctl calls work on the max number of cards possible (which was 64). These ioctls are now deprecated but still supported. All the defines, structs and ioctl interface declarations have been kept for compabibility. There are now new ioctls (and defines for these) with an additional '2' appended which provide the extended versions with 256 cards supported. Signed-off-by: Harald Freudenberger <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2018-04-10Force log to disk before reading the AGF during a fstrimCarlos Maiolino1-7/+7
Forcing the log to disk after reading the agf is wrong, we might be calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held. This can cause a deadlock when racing a fstrim with a filesystem shutdown. The deadlock has been identified due a miscalculation bug in device-mapper dm-thin, which returns lack of space to its users earlier than the device itself really runs out of space, changing the device-mapper volume into an error state. The problem happened while filling the filesystem with a single file, triggering the bug in device-mapper, consequently causing an IO error and shutting down the filesystem. If such file is removed, and fstrim executed before the XFS finishes the shut down process, the fstrim process will end up holding the buffer lock, and going to sleep on the cil wait queue. At this point, the shut down process will try to wake up all the threads waiting on the cil wait queue, but for this, it will try to hold the same buffer log already held my the fstrim, locking up the filesystem. Signed-off-by: Carlos Maiolino <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-04-10Export __set_page_dirtyMatthew Wilcox3-14/+5
XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2018-04-10Merge remote-tracking branch ↵Benson Leung5-59/+194
'origin/ib-chrome-platform-cros-ec-sysfs-debugfs-for-v4.17' into working-branch-for-4.17 Merging Enric's cros-ec sysfs and debugfs fixes from immutable branch. Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angleGwendal Grignou3-18/+96
This adds a sysfs attribute (/sys/class/chromeos/cros_ec/kb_wake_angle) used to set and get the keyboard wake lid angle. This attribute is present only if 2 accelerometers are controlled by the EC. This patch also moves the cros_ec features check before the device is added so the features map obtained from the EC is ready on time. Signed-off-by: Gwendal Grignou <[email protected]> Signed-off-by: Enric Balletbo i Serra <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Lee Jones <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: cros_ec_debugfs: Add PD port info to debugfsShawn Nematbakhsh2-0/+75
Add info useful for debugging USB-PD port state. Signed-off-by: Shawn Nematbakhsh <[email protected]> Signed-off-by: Enric Balletbo i Serra <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Lee Jones <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: cros_ec_debugfs: Use octal permissions '0444'Enric Balletbo i Serra1-2/+2
Fixed the following checkpatch warning: WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. Signed-off-by: Enric Balletbo i Serra <[email protected]> Suggested-by: Andy Shevchenko <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: cros_ec_sysfs: use permission-specific DEVICE_ATTR variantsEnric Balletbo i Serra1-12/+12
Use DEVICE_ATTR variants for read/write attributes. This simplifies the source code, improves readbility, and reduces the chance of inconsistencies. Suggested-by: Andy Shevchenko <[email protected]> Signed-off-by: Enric Balletbo i Serra <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: cros_ec_sysfs: introduce to_cros_ec_dev define.Enric Balletbo i Serra1-6/+5
Add a define to get the cros_ec_dev from device and use it. Suggested-by: Andy Shevchenko <[email protected]> Signed-off-by: Enric Balletbo i Serra <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-10platform/chrome: cros_ec_sysfs: Modify error handlingGwendal Grignou1-21/+4
When accessing a sysfs attribute, if the EC command fails, -EPROTO is now returned instead of an error message as it is unlikely an app is parsing the error message to do something meaningful. Also, this patch makes use of cros_ec_cmd_xfer_status() instead of cros_ec_cmd_xfer() so an error message is printed in the syslog. Signed-off-by: Gwendal Grignou <[email protected]> Signed-off-by: Enric Balletbo i Serra <[email protected]> Signed-off-by: Benson Leung <[email protected]>
2018-04-11KVM: PPC: Book3S HV: trace_tlbie must not be called in realmodeNicholas Piggin1-4/+0
This crashes with a "Bad real address for load" attempting to load from the vmalloc region in realmode (faulting address is in DAR). Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994 NIP: c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580 REGS: c000000fff76dd80 TRAP: 0200 Not tainted (4.16.0-01530-g43d1859f0994) MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48082222 XER: 00000000 CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3 NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0 LR [c0000000000c2430] do_tlbies+0x230/0x2f0 I suspect the reason is the per-cpu data is not in the linear chunk. This could be restored if that was able to be fixed, but for now, just remove the tracepoints. Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions") Cc: [email protected] # v4.13+ Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-11powerpc/8xx: Fix build with hugetlbfs enabledAneesh Kumar K.V1-0/+1
8xx uses the slice code when hugetlbfs is enabled. We missed a header include on 8xx which resulted in the below build failure: config: mpc885_ads_defconfig + CONFIG_HUGETLBFS arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area': arch/powerpc/mm/slice.c:655:2: error: implicit declaration of function 'need_extra_context' arch/powerpc/mm/slice.c:656:3: error: implicit declaration of function 'alloc_extended_context' on PPC64 the mmu_context.h was included via linux/pkeys.h Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-11powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loopsNicholas Piggin1-1/+6
The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, and various lockup errors to trigger (again, BMC reboot can cause it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Depends-on: 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops") Cc: [email protected] # v3.2+ Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-10blk-mq: Avoid that submitting a bio concurrently with device removal ↵Bart Van Assche1-6/+29
triggers a crash Because blkcg_exit_queue() is now called from inside blk_cleanup_queue() it is no longer safe to access cgroup information during or after the blk_cleanup_queue() call. Hence protect the generic_make_request_checks() call with blk_queue_enter() / blk_queue_exit(). Reported-by: Ming Lei <[email protected]> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") Signed-off-by: Bart Van Assche <[email protected]> Cc: Ming Lei <[email protected]> Cc: Joseph Qi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2018-04-11Merge branch 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie29-1556/+299
into drm-next A few fixes for 4.17: - Fix a potential use after free in a error case - Fix pcie lane handling in amdgpu SI dpm - sdma pipeline sync fix - A few vega12 cleanups and fixes - Misc other fixes * 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: Fix memory leaks at amdgpu_init() error path drm/amdgpu: Fix PCIe lane width calculation drm/radeon: Fix PCIe lane width calculation drm/amdgpu/si: implement get/set pcie_lanes asic callback drm/amdgpu: Add support for SRBM selection v3 Revert "drm/amdgpu: Don't change preferred domian when fallback GTT v5" drm/amd/powerply: fix power reading on Fiji drm/amd/powerplay: Enable ACG SS feature drm/amdgpu/sdma: fix mask in emit_pipeline_sync drm/amdgpu: Fix KIQ hang on bare metal for device unbind/bind back v2. drm/amd/pp: Clean header file in vega12_smumgr.c drm/amd/pp: Remove Dead functions on Vega12 drm/amd/pp: silence a static checker warning drm/amdgpu: drop compute ring timeout setting for non-sriov only (v2) drm/amdgpu: fix typo of domain fallback
2018-04-11Merge tag 'drm-misc-next-fixes-2018-04-04' of ↵Dave Airlie1-3/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-next hda_intel: Don't declare azx PM ops if VGA_SWITCHEROO configured (Lukas) Cc: Lukas Wunner <[email protected]> Cc: Takashi Iwai <[email protected]> * tag 'drm-misc-next-fixes-2018-04-04' of git://anongit.freedesktop.org/drm/drm-misc: ALSA: hda - Silence PM ops build warning
2018-04-10swiotlb: fix unexpected swiotlb_alloc_coherent failuresTakashi Iwai1-1/+1
The code refactoring by commit 0176adb00406 ("swiotlb: refactor coherent buffer allocation") made swiotlb_alloc_buffer almost always failing due to a thinko: namely, the function evaluates the dma_coherent_ok call incorrectly and dealing as if it's invalid. This ends up with weird errors like iwlwifi probe failure or amdgpu screen flickering. This patch corrects the logic error. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088658 Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088902 Fixes: 0176adb00406 ("swiotlb: refactor coherent buffer allocation") Cc: <[email protected]> # v4.16+ Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2018-04-10NFS: advance nfs_entry cookie only after decoding completes successfullyFrank Sorenson2-4/+10
In nfs[34]_decode_dirent, the cookie is advanced as soon as it is read, but decoding may still fail later in the function, returning an error. Because the cookie has been advanced, the failing entry is not re-requested from the server, resulting in a missing directory entry. In addition, nfs v3 and v4 read the cookie at different locations in the xdr_stream, so the behavior of the two can be inconsistent. Fix these by reading the cookie into a temporary variable, and only advancing the cookie once the entire entry has been decoded from the xdr_stream successfully. Signed-off-by: Frank Sorenson <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>
2018-04-10NFSv3/acl: forget acl cache after setattrchendt1-1/+4
Sync of ACL with std permissions fail,We need to forget the ACL cache after setattr. Reproduction: #!/bin/bash touch testfile cat <<EOF >testfile #!/bin/bash echo "Test was executed" EOF chmod u=rwx testfile chmod g=rw- testfile chmod o=r-- testfile chacl u::r--,g::rwx,o:rw- testfile chmod u+w testfile ls -l testfile chacl -l testfile Output: -rw-rwxrw- 1 root root 0 Mar 28 05:29 testfile testfile [u::r--,g::rwx,o::rw-] Signed-off-by: chendt.fnst <[email protected]> Reviewed-by: Benjamin Coddington <[email protected]> Reviewed-by: Kinglong Mee <Kinglong Mee> Signed-off-by: Anna Schumaker <[email protected]>
2018-04-10NFSv4.1: Fix exclusive createTrond Myklebust1-16/+33
When we use EXCLUSIVE4_1 mode, the server returns an attribute mask where all the bits indicate which attributes were set, and where the verifier was stored. In order to figure out which attribute we have to resend, we need to clear out the attributes that are set in exclcreat_bitmask. Signed-off-by: Trond Myklebust <[email protected]> [Anna: Fixed typo NFS4_CREATE_EXCLUSIVE4 -> NFS4_CREATE_EXCLUSIVE] Signed-off-by: Anna Schumaker <[email protected]>
2018-04-10NFSv4: Declare the size up to date after it was set.Trond Myklebust2-0/+2
When we've changed the file size, then ensure we declare it to be up to date in the inode attributes. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Anna Schumaker <[email protected]>