path: root/include
Age Commit message Author Files Lines
2015-04-17mm: rcu-protected get_mm_exe_file()Konstantin Khlebnikov2-1/+2
This patch removes mm->mmap_sem from mm->exe_file read side. Also it kills dup_mm_exe_file() and moves exe_file duplication into dup_mmap() where both mmap_sems are locked. [[email protected]: fix comment typo] Signed-off-by: Konstantin Khlebnikov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Al Viro <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-17kernel/sysctl.c: threads-max observe limitsHeinrich Schuchardt1-0/+3
Users can change the maximum number of threads by writing to /proc/sys/kernel/threads-max. With the patch the value entered is checked against the same limits that apply when fork_init is called. Signed-off-by: Heinrich Schuchardt <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Guenter Roeck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-17drivers/rtc/rtc-s5m.c: add support for S2MPS13 RTCKrzysztof Kozlowski1-0/+2
The S2MPS13 RTC is almost the same as S2MPS14. The differences when updating alarm are: 1. Set WUDR+AUDR field instead of WUDR+RUDR. 2. Clear the AUDR field later (it is not auto-cleared). Signed-off-by: Krzysztof Kozlowski <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Alessandro Zummo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-17errno.h: Improve ENOSYS's commentAndy Lutomirski1-1/+10
ENOSYS is the mechanism used by user code to detect whether the running kernel implements a given system call. It should not be returned by anything except an unimplemented system call. Unfortunately, it is rather frequently used in the kernel to indicate that various new functions of existing system calls are not implemented. This should be discouraged. Improve the comment in errno.h to help clarify ENOSYS's purpose. Signed-off-by: Andy Lutomirski <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
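The detection mechanism described above can be sketched from the user-space side. This is a minimal illustration, assuming Linux and glibc's syscall() wrapper; the helper name syscall_is_implemented is hypothetical, and calling a syscall with no arguments is only harmless for argument-less calls such as getpid.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns false only when the kernel reports that
 * syscall number nr does not exist (ENOSYS). Any other errno means the
 * call exists but rejected this particular invocation.
 * The call is made with no arguments, so only probe calls for which
 * that is harmless.
 */
static bool syscall_is_implemented(long nr)
{
	long ret;

	errno = 0;
	ret = syscall(nr);
	return !(ret == -1 && errno == ENOSYS);
}
```

This probe is only reliable if ENOSYS is reserved for missing system calls, which is the point of the commit: a kernel that returns ENOSYS for an unsupported flag of an existing call would make the probe report the whole call as absent.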
2015-04-17lib/bitmap.c: bitmap_[empty,full]: remove code duplicationYury Norov1-4/+4
bitmap_empty() has its own implementation. But it's clearly as simple as: find_first_bit(src, nbits) == nbits The same is true for 'bitmap_full'. Signed-off-by: Yury Norov <[email protected]> Cc: George Spelvin <[email protected]> Cc: Alexey Klimov <[email protected]> Cc: Rasmus Villemoes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
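The identity in the message can be tried out in user space. The sketch below uses a deliberately simple bit-by-bit find_first_bit() as a stand-in for the kernel's; pairing bitmap_full() with a find_first_zero_bit() counterpart is an assumption based on the message's "the same is true" remark.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Simplified stand-in: index of the first set bit, or nbits if none. */
static unsigned long find_first_bit(const unsigned long *addr,
				    unsigned long nbits)
{
	for (unsigned long i = 0; i < nbits; i++)
		if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return nbits;
}

/* Simplified stand-in: index of the first clear bit, or nbits if none. */
static unsigned long find_first_zero_bit(const unsigned long *addr,
					 unsigned long nbits)
{
	for (unsigned long i = 0; i < nbits; i++)
		if (!(addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
			return i;
	return nbits;
}

/* A bitmap is empty when no set bit is found before nbits. */
static bool bitmap_empty(const unsigned long *src, unsigned long nbits)
{
	return find_first_bit(src, nbits) == nbits;
}

/* A bitmap is full when no clear bit is found before nbits. */
static bool bitmap_full(const unsigned long *src, unsigned long nbits)
{
	return find_first_zero_bit(src, nbits) == nbits;
}
```

Bits at or beyond nbits in the last word are ignored by both helpers, which is why the comparison against nbits is enough.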
2015-04-17kernel.h: implement DIV_ROUND_CLOSEST_ULLJavi Merino1-0/+12
We have grown a number of different implementations of DIV_ROUND_CLOSEST_ULL throughout the kernel. Move the i915 one to kernel.h so that it can be reused. Signed-off-by: Javi Merino <[email protected]> Reviewed-by: Jeff Epler <[email protected]> Cc: Jani Nikula <[email protected]> Cc: David Airlie <[email protected]> Cc: Guenter Roeck <[email protected]> Acked-by: Daniel Vetter <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Alex Elder <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Javi Merino <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Turquette <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
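As a rough illustration of what such a helper computes, here is a user-space function with the same rounding rule. The name div_round_closest_ull is hypothetical, and this is not the kernel macro itself, which (as far as I know) goes through do_div() so that it also works on 32-bit architectures.

```c
/*
 * Divide x by divisor and round to the nearest integer; exact halves
 * round up. Assumes divisor != 0 and that x + divisor / 2 does not
 * overflow unsigned long long.
 */
static unsigned long long div_round_closest_ull(unsigned long long x,
						unsigned long long divisor)
{
	return (x + divisor / 2) / divisor;
}
```

For example, 9 / 4 = 2.25 rounds down to 2, while 10 / 4 = 2.5 rounds up to 3.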
2015-04-17util_macros.h: add find_closest() macroBartosz Golaszewski1-0/+40
This series unduplicates the code used to find the member in an array closest to 'x'. The first patch adds a macro implementing the algorithm in two flavors - for arrays sorted in ascending and descending order. The second updates Documentation/CodingStyle on the naming convention for local variables in macros resembling functions. Other three patches replace duplicated code with calls to one of these macros in some hwmon drivers. This patch (of 5): Searching for the member of an array closest to 'x' is duplicated in several places. Add a new include - util_macros.h - and two macros that implement this algorithm for arrays sorted both in ascending and descending order. Uses linear search. Signed-off-by: Bartosz Golaszewski <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
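The ascending-order flavor of the search can be sketched as a plain function. This is an illustration under assumptions, not the macro from util_macros.h: the name find_closest_idx is hypothetical, the element type is fixed to long, and ties at the exact midpoint resolve to the lower index.

```c
#include <stddef.h>

/*
 * Linear search for the index of the element closest to x in an array
 * sorted in ascending order. Requires len >= 1.
 * The loop stops at the first i where x is at or below the midpoint
 * between a[i] and a[i + 1]; if no such i exists, the last index wins.
 */
static size_t find_closest_idx(long x, const long *a, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i++)
		if (x <= a[i] + (a[i + 1] - a[i]) / 2)
			break;
	return i;
}
```

A descending-order variant would flip the comparison to >=. Linear search fits the hwmon use case mentioned in the message, where the lookup tables tend to be short.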
2015-04-17lib: find_*_bit reimplementationYury Norov1-2/+2
This patchset reworks the find_bit function family to achieve better performance and to decrease the size of text. All rework is done in patch 1. Patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines. Performance tests were run in userspace with code like this:

    /* addr[] is filled from /dev/urandom */
    start = clock();
    while (ret < nbits)
        ret = find_next_bit(addr, nbits, ret + 1);
    end = clock();
    printf("%ld\t", (unsigned long) end - start);

On Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz measurements are (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

    find_next_bit:        find_first_bit:
    new      current      new      current
    26932    43151        14777    14925
    26947    43182        14521    15423
    26507    43824        15053    14705
    27329    43759        14473    14777
    26895    43367        14847    15023
    26990    43693        15103    15163
    26775    43299        15067    15232
    27282    42752        14544    15121
    27504    43088        14644    14858
    26761    43856        14699    15193
    26692    43075        14781    14681
    27137    42969        14451    15061
    ...      ...

find_next_bit performance gain is 35-40%; find_first_bit shows no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and helpful discussions.

This patch (of 3):

The new implementation takes less space in the source file (see diffstat) and in the object. For me it's 710 vs 453 bytes of text. It also shows better performance.

find_last_bit description fixed due to obvious typo.

[[email protected]: include linux/bitmap.h, per Rasmus]
Signed-off-by: Yury Norov <[email protected]>
Reviewed-by: Rasmus Villemoes <[email protected]>
Reviewed-by: George Spelvin <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Mark Salter <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Cc: Thomas Graf <[email protected]>
Cc: Valentin Rothberg <[email protected]>
Cc: Chris Wilson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
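To make the benchmarked operation concrete, here is a word-at-a-time user-space sketch of a find_next_bit-style search. It is not the code from the patch: the name find_next_bit_sketch is hypothetical, and __builtin_ctzl is a GCC/Clang builtin standing in for the kernel's __ffs().

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Return the index of the first set bit at or after start, or nbits if
 * there is none. Whole words are tested at once, so runs of zero bits
 * are skipped BITS_PER_LONG bits at a time.
 */
static unsigned long find_next_bit_sketch(const unsigned long *addr,
					  unsigned long nbits,
					  unsigned long start)
{
	unsigned long word;

	if (start >= nbits)
		return nbits;

	/* Ignore bits below start in the first word. */
	word = addr[start / BITS_PER_LONG] & (~0UL << (start % BITS_PER_LONG));
	start -= start % BITS_PER_LONG;

	while (!word) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		word = addr[start / BITS_PER_LONG];
	}

	/* word is non-zero here, so counting trailing zeros is defined. */
	start += (unsigned long)__builtin_ctzl(word);
	return start < nbits ? start : nbits;
}
```

The final clamp handles set bits that lie in the last word but beyond nbits. A loop like the one in the message would call this with ret + 1 as the start index until it returns nbits.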
2015-04-17Revert "mmc: core: Convert mmc_driver to device_driver"Ulf Hansson1-2/+12
This reverts commit 6685ac62b2f0 ("mmc: core: Convert mmc_driver to device_driver"). The reverted commit went too far in simplifying the device driver parts for mmc. Let's restore the old mmc_driver so that the driver core can sooner or later remove the ->probe(), ->remove() and ->shutdown() callbacks from struct device_driver. Note that the old ->suspend|resume() callbacks in struct mmc_driver don't need to be restored, since the mmc block layer has been converted to the modern system PM ops. Fixes: 6685ac62b2f0 ("mmc: core: Convert mmc_driver to device_driver") Signed-off-by: Ulf Hansson <[email protected]> Acked-by: Jaehoon Chung <[email protected]>
2015-04-16Merge branch 'for-linus' of ↵Linus Torvalds4-33/+80
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull third hunk of vfs changes from Al Viro: "This contains the ->direct_IO() changes from Omar + saner generic_write_checks() + dealing with fcntl()/{read,write}() races (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of repeatedly looking at ->f_flags, which can be changed by fcntl(2), check ->ki_flags - which cannot) + infrastructure bits for dhowells' d_inode annotations + Christoph's switch of /dev/loop to vfs_iter_write()" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits) block: loop: switch to VFS ITER_BVEC configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode VFS: Make pathwalk use d_is_reg() rather than S_ISREG() VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR() VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk NFS: Don't use d_inode as a variable name VFS: Impose ordering on accesses of d_inode and d_flags VFS: Add owner-filesystem positive/negative dentry checks nfs: generic_write_checks() shouldn't be done on swapout... ocfs2: use __generic_file_write_iter() mirror O_APPEND and O_DIRECT into iocb->ki_flags switch generic_write_checks() to iocb and iter ocfs2: move generic_write_checks() before the alignment checks ocfs2_file_write_iter: stop messing with ppos udf_file_write_iter: reorder and simplify fuse: ->direct_IO() doesn't need generic_write_checks() ext4_file_write_iter: move generic_write_checks() up xfs_file_aio_write_checks: switch to iocb/iov_iter generic_write_checks(): drop isblk argument blkdev_write_iter: expand generic_file_checks() call in there ...
2015-04-16Merge branch 'for_linus' of ↵Linus Torvalds3-24/+86
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and udf updates from Jan Kara: "The pull contains quota changes which complete unification of XFS and VFS quota interfaces (so tools can use either interface to manipulate any filesystem). There's also a patch to support project quotas in VFS quota subsystem from Li Xi. Finally there's a bunch of UDF fixes and cleanups and tiny cleanup in reiserfs & ext3" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits) udf: Update ctime and mtime when directory is modified udf: return correct errno for udf_update_inode() ext3: Remove useless condition in if statement. vfs: Add general support to enforce project quota limits reiserfs: fix __RASSERT format string udf: use int for allocated blocks instead of sector_t udf: remove redundant buffer_head.h includes udf: remove else after return in __load_block_bitmap() udf: remove unused variable in udf_table_free_blocks() quota: Fix maximum quota limit settings quota: reorder flags in quota state quota: paranoia: check quota tree root quota: optimize i_dquot access quota: Hook up Q_XSETQLIM for id 0 to ->set_info xfs: Add support for Q_SETINFO quota: Make ->set_info use structure with neccesary info to VFS and XFS quota: Remove ->get_xstate and ->get_xstatev callbacks gfs2: Convert to using ->get_state callback xfs: Convert to using ->get_state callback quota: Wire up Q_GETXSTATE and Q_GETXSTATV calls to work with ->get_state ...
2015-04-16Merge branch 'for-4.1/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2-48/+3
Pull block driver updates from Jens Axboe: "This is the block driver pull request for 4.1. As with the core bits, this is a relatively slow round. This pull request contains: - Various fixes and cleanups for NVMe, from Alexey Khoroshilov, Chong Yuan, myself, Keith Busch, and Murali Iyer. - Documentation and code cleanups for nbd from Markus Pargmann. - Change of brd maintainer to me, from Ross Zwisler. At least the email doesn't bounce anymore then. - Two xen-blkback fixes from Tao Chen" * 'for-4.1/drivers' of git://git.kernel.dk/linux-block: (23 commits) NVMe: Meta data handling through submit io ioctl NVMe: Add translation for block limits NVMe: Remove check for null NVMe: Fix error handling of class_create("nvme") xen-blkback: define pr_fmt macro to avoid the duplication of DRV_PFX xen-blkback: enlarge the array size of blkback name nbd: Return error pointer directly nbd: Return error code directly nbd: Remove fixme that was already fixed nbd: Restructure debugging prints nbd: Fix device bytesize type nbd: Replace kthread_create with kthread_run nbd: Remove kernel internal header Documentation: nbd: Add list of module parameters Documentation: nbd: Reformat to allow more documentation NVMe: increase depth of admin queue nvme: Fix PRP list calculation for non-4k system page size NVMe: Fix blk-mq hot cpu notification NVMe: embedded iod mask cleanup NVMe: Freeze admin queue on device failure ...
2015-04-16Merge branch 'for-4.1/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+5
Pull block layer core bits from Jens Axboe: "This is the core pull request for 4.1. Not a lot of stuff in here for this round, mostly little fixes or optimizations. This pull request contains: - An optimization that speeds up queue runs on blk-mq, especially for the case where there's a large difference between nr_cpu_ids and the actual mapped software queues on a hardware queue. From Chong Yuan. - Honor node local allocations for requests on legacy devices. From David Rientjes. - Cleanup of blk_mq_rq_to_pdu() from me. - exit_aio() fixup from me, greatly speeding up exiting multiple IO contexts off exit_group(). For my particular test case, fio exit took ~6 seconds. A typical case of both exposing RCU grace periods to user space, and serializing exit of them. - Make blk_mq_queue_enter() honor the gfp mask passed in, so we only wait if __GFP_WAIT is set. From Keith Busch. - blk-mq exports and two added helpers from Mike Snitzer, which will be used by the dm-mq code. - Cleanups of blk-mq queue init from Wei Fang and Xiaoguang Wang" * 'for-4.1/core' of git://git.kernel.dk/linux-block: blk-mq: reduce unnecessary software queue looping aio: fix serial draining in exit_aio() blk-mq: cleanup blk_mq_rq_to_pdu() blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue() block: remove redundant check about 'set->nr_hw_queues' in blk_mq_alloc_tag_set() block: allocate request memory local to request queue blk-mq: don't wait in blk_mq_queue_enter() if __GFP_WAIT isn't set blk-mq: export blk_mq_run_hw_queues blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk
2015-04-16Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2-1/+2
Pull SCSI updates from James Bottomley: "This is the usual grab bag of driver updates (lpfc, qla2xxx, storvsc, aacraid, ipr) plus an assortment of minor updates. There's also a major update to aic1542 which moves the driver into this millennium" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (106 commits) change SCSI Maintainer email sd, mmc, virtio_blk, string_helpers: fix block size units ufs: add support to allow non standard behaviours (quirks) ufs-qcom: save controller revision info in internal structure qla2xxx: Update driver version to 8.07.00.18-k qla2xxx: Restore physical port WWPN only, when port down detected for FA-WWPN port. qla2xxx: Fix virtual port configuration, when switch port is disabled/enabled. qla2xxx: Prevent multiple firmware dump collection for ISP27XX. qla2xxx: Disable Interrupt handshake for ISP27XX. qla2xxx: Add debugging info for MBX timeout. qla2xxx: Add serdes read/write support for ISP27XX qla2xxx: Add udev notification to save fw dump for ISP27XX qla2xxx: Add message for sucessful FW dump collected for ISP27XX. qla2xxx: Add support to load firmware from file for ISP 26XX/27XX. qla2xxx: Fix beacon blink for ISP27XX. qla2xxx: Increase the wait time for firmware to be ready for P3P. qla2xxx: Fix crash due to wrong casting of reg for ISP27XX. qla2xxx: Fix warnings reported by static checker. lpfc: Update version to 10.5.0.0 for upstream patch set lpfc: Update copyright to 2015 ...
2015-04-16Merge branch 'for-v4.1-rc1' of ↵Linus Torvalds2-0/+9
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA-mapping updates from Marek Szyprowski: "This contains two patches, which clarify ambiguity in the dma-mapping API" * 'for-v4.1-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: include/dma-mapping: Clarify output of dma_map_sg asm/dma-mapping-common: Clarify output of dma_map_sg_attrs
2015-04-16sparc: Break up monolithic iommu table/lock into finer graularity pools and lockSowmini Varadhan1-0/+55
Investigation of multithreaded iperf experiments on an ethernet interface shows the iommu->lock as the hottest lock identified by lockstat, with something of the order of 21M contentions out of 27M acquisitions, and an average wait time of 26 us for the lock. This is not efficient. A more scalable design is to follow the ppc model, where the iommu_table has multiple pools, each stretching over a segment of the map, and with a separate lock for each pool. This model allows for better parallelization of the iommu map search. This patch adds the iommu range alloc/free function infrastructure. Signed-off-by: Sowmini Varadhan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-16Merge tag 'stable/for-linus-4.1-rc0-tag' of ↵Linus Torvalds3-8/+65
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - use a single source list of hypercalls, generating other tables etc. at build time. - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests. - significant performance improvement to guest save/restore/migration. - scsiback/front save/restore support. - infrastructure for multi-page xenbus rings. - misc fixes. * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Try harder to get PXM information for Xen xenbus_client: Extend interface to support multi-page ring xen-pciback: also support disabling of bus-mastering and memory-write-invalidate xen: support suspend/resume in pvscsi frontend xen: scsiback: add LUN of restored domain xen-scsiback: define a pr_fmt macro with xen-pvscsi xen/mce: fix up xen_late_init_mcelog() error handling xen/privcmd: improve performance of MMAPBATCH_V2 xen: unify foreign GFN map/unmap for auto-xlated physmap guests x86/xen/apic: WARN with details. x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs xen/pciback: Don't print scary messages when unsupported by hypervisor. xen: use generated hypercall symbols in arch/x86/xen/xen-head.S xen: use generated hypervisor symbols in arch/x86/xen/trace.c xen: synchronize include/xen/interface/xen.h with xen xen: build infrastructure for generating hypercall depending symbols xen: balloon: Use static attribute groups for sysfs entries xen: pcpu: Use static attribute groups for sysfs entry
2015-04-16Merge tag 'powerpc-4.1-1' of ↵Linus Torvalds2-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. 
* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile
2015-04-16bpf: fix bpf helpers to use skb->mac_header relative offsetsAlexei Starovoitov2-3/+6
For the short-term solution, let's fix bpf helper functions to use skb->mac_header relative offsets instead of skb->data in order to get the same eBPF programs with cls_bpf and act_bpf work on ingress and egress qdisc path. We need to ensure that mac_header is set before calling into programs. This is effectively the first option from below referenced discussion. More long term solution for LD_ABS|LD_IND instructions will be more intrusive but also more beneficial than this, and implemented later as it's too risky at this point in time. I.e., we plan to look into the option of moving skb_pull() out of eth_type_trans() and into netif_receive_skb() as has been suggested as second option. Meanwhile, this solution ensures ingress can be used with eBPF, too, and that we won't run into ABI troubles later. For dealing with negative offsets inside eBPF helper functions, we've implemented bpf_skb_clone_unwritable() to test for unwritable headers. Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694 Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers") Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-16stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetreeVince Bridgers1-0/+2
Read the tx-fifo-depth and rx-fifo-depth from the devicetree. The Synopsys stmmac controller fifos are configurable per product instance, and the fifo sizes are needed to configure certain features correctly such as flow control. Signed-off-by: Vince Bridgers <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-04-16f2fs: pass checkpoint reason on roll-forward recoveryJaegeuk Kim1-0/+1
This patch adds CP_RECOVERY to retain recovery information for the checkpoint, and makes sure a checkpoint is written in this case. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-04-15target: Ensure sess_prot_type is saved across session restartNicholas Bellinger1-0/+1
The following incremental patch saves the current sess_prot_type into se_node_acl, and will always reset sess_prot_type if a previous saved value exists. So the PI setting for the fabric's session with backend devices not supporting PI is persistent across session restart. (Fix se_node_acl dereference for discovery sessions - Dan Carpenter) Reviewed-by: Martin Petersen <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-04-16cpumask: resurrect CPU_MASK_CPU0Rusty Russell1-0/+5
We removed it in 2f0f267ea072 (cpumask: remove deprecated functions.), but grep shows it still used by MIPS, and not unreasonably. Signed-off-by: Rusty Russell <[email protected]>
2015-04-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds23-230/+272
Merge second patchbomb from Andrew Morton: - the rest of MM - various misc bits - add ability to run /sbin/reboot at reboot time - printk/vsprintf changes - fiddle with seq_printf() return value * akpm: (114 commits) parisc: remove use of seq_printf return value lru_cache: remove use of seq_printf return value tracing: remove use of seq_printf return value cgroup: remove use of seq_printf return value proc: remove use of seq_printf return value s390: remove use of seq_printf return value cris fasttimer: remove use of seq_printf return value cris: remove use of seq_printf return value openrisc: remove use of seq_printf return value ARM: plat-pxa: remove use of seq_printf return value nios2: cpuinfo: remove use of seq_printf return value microblaze: mb: remove use of seq_printf return value ipc: remove use of seq_printf return value rtc: remove use of seq_printf return value power: wakeup: remove use of seq_printf return value x86: mtrr: if: remove use of seq_printf return value linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK MAINTAINERS: CREDITS: remove Stefano Brivio from B43 .mailmap: add Ricardo Ribalda CREDITS: add Ricardo Ribalda Delgado ...
2015-04-15linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASKRasmus Villemoes1-6/+2
The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional, which will generally lead to slightly better generated code (221 bytes saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL). As a small bonus, this also ensures that the nbits parameter is expanded exactly once. In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed to assume it is positive (or divisible by BITS_PER_LONG), and hence just do the simple mask. It doesn't seem to use this, and even on an architecture like x86 where the shift only depends on the lower 5 or 6 bits, and these bits are not affected by the signedness of the expression, gcc still generates code to compute the C99 mandated value of start % BITS_PER_LONG. So just use a mask explicitly, also for consistency with BITMAP_LAST_WORD_MASK. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Tejun Heo <[email protected]> Reviewed-by: George Spelvin <[email protected]> Cc: Yury Norov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
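The two mask computations can be compared directly in user space. The function names below are hypothetical; the branch-free expressions follow the approach the message describes (an explicit mask with BITS_PER_LONG - 1), written as functions rather than macros so each argument is evaluated once.

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Conditional form: special-cases nbits being a multiple of the word size. */
static unsigned long last_word_mask_cond(unsigned long nbits)
{
	return (nbits % BITS_PER_LONG) ?
		(1UL << (nbits % BITS_PER_LONG)) - 1 : ~0UL;
}

/*
 * Branch-free form: -nbits & (BITS_PER_LONG - 1) is the number of unused
 * high bits in the last word, so shifting ~0UL right by that amount
 * leaves exactly the valid low bits set.
 */
static unsigned long last_word_mask(unsigned long nbits)
{
	return ~0UL >> (-nbits & (BITS_PER_LONG - 1));
}

/* Mask selecting the bits at or above start within start's word. */
static unsigned long first_word_mask(unsigned long start)
{
	return ~0UL << (start & (BITS_PER_LONG - 1));
}
```

Because nbits is unsigned here, the negation wraps modulo 2^BITS_PER_LONG and the shift count always stays in the range 0 to BITS_PER_LONG - 1, so neither shift is undefined.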
2015-04-15lib/string_helpers.c: change semantics of string_escape_memRasmus Villemoes1-4/+4
The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. 
[[email protected]: simplify qword_add] [[email protected]: add missed curly braces] Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Andy Shevchenko <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
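The snprintf-like contract described above can be illustrated with a toy escaper. This is a sketch under assumptions, not string_escape_mem() itself: the name escape_hex is hypothetical and every input byte is escaped as \xHH, whereas the real function supports several escaping modes.

```c
#include <stddef.h>

/*
 * Escape isz bytes from src as \xHH sequences into dst, writing at most
 * osz bytes. Returns the size the full output would need, whether or not
 * it fit. Partial escape sequences are written, and no '\0' is appended.
 * dst may be NULL when osz is 0.
 */
static size_t escape_hex(const unsigned char *src, size_t isz,
			 char *dst, size_t osz)
{
	static const char hex[] = "0123456789abcdef";
	size_t out = 0;

	for (size_t i = 0; i < isz; i++) {
		const char seq[4] = {
			'\\', 'x', hex[src[i] >> 4], hex[src[i] & 0xf]
		};

		/* Count every byte, but store only those that fit. */
		for (size_t j = 0; j < sizeof(seq); j++, out++)
			if (out < osz)
				dst[out] = seq[j];
	}
	return out;
}
```

A caller detects truncation by comparing the return value with osz, and can call once with a NULL destination and osz of 0 to learn the required size before allocating, which is the pattern kasprintf() relies on.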
2015-04-15printk: comment pr_cont() stating it is only to continue a lineSteven Rostedt1-0/+5
KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used more often, and it lacks the comment stating what it is used for. It can be confused as continuing the log level, but that is not its purpose. Its purpose is to continue a line that had no newline enclosed. This should be documented by pr_cont() as well. Signed-off-by: Steven Rostedt <[email protected]> Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15kernel/reboot.c: add orderly_reboot for graceful rebootJoel Stanley1-1/+2
The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. orderly_reboot() should be used when the system wants to allow userspace to gracefully shut itself down. For cases where the system may imminently catch on fire, the existing emergency_restart() provides an immediate reboot without involving userspace. Signed-off-by: Joel Stanley <[email protected]> Cc: Fabian Frederick <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jeremy Kerr <[email protected]> Cc: David S. Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15kernel/resource.c: remove deprecated __check_region() and friendsJakub Sitnicki1-8/+0
All users of __check_region(), check_region(), and check_mem_region() are gone. We got rid of the last user in v4.0-rc1. Remove them. bloat-o-meter on x86_64 shows: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102) function old new delta __kstrtab___check_region 15 - -15 __ksymtab___check_region 16 - -16 __check_region 71 - -71 Signed-off-by: Jakub Sitnicki <[email protected]> Cc: Bjorn Helgaas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15kernel: conditionally support non-root users, groups and capabilitiesIulia Manda3-4/+60
There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under the CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, the capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much.) The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [[email protected]: coding-style fixes] Signed-off-by: Iulia Manda <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15include/linux: remove empty conditionalsRasmus Villemoes2-73/+0
Commit 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux") left behind some empty conditional blocks. Since they are useless and may cause a reader to wonder whether something is missing, remove them. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15zsmalloc: support compactionMinchan Kim1-0/+1
This patch provides core functions for migration of zsmalloc. Migration policy is simple as follows. for each size class { while { src_page = get zs_page from ZS_ALMOST_EMPTY if (!src_page) break; dst_page = get zs_page from ZS_ALMOST_FULL if (!dst_page) dst_page = get zs_page from ZS_ALMOST_EMPTY if (!dst_page) break; migrate(from src_page, to dst_page); } } For migration, we need to identify which objects in a zspage are allocated so that we can migrate them out. We could find out by iterating over the free objects in a zspage, because first_page of the zspage keeps the free objects in a singly-linked list, but that is not efficient. Instead, this patch adds a tag (i.e., OBJ_ALLOCATED_TAG) in the header of each object (i.e., handle) so we can easily check whether the object is allocated. This patch adds another status bit in the handle to synchronize between user access through zs_map_object and migration. During migration, we cannot move objects that users are using, because of data coherency between the old object and the new object. [[email protected]: zsmalloc.c needs sched.h for cond_resched()] Signed-off-by: Minchan Kim <[email protected]> Cc: Juneho Choi <[email protected]> Cc: Gunho Lee <[email protected]> Cc: Luigi Semenzato <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
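The migration policy quoted in the zsmalloc commit message can be tried out in userspace with a toy model. The sketch below is not the kernel code: struct toy_zspage, CAPACITY, the half-full threshold in classify(), and the helper names are all assumptions made for illustration, and each "page" is reduced to a count of objects in use.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8 /* objects per toy zspage (assumed for this sketch) */

enum fullness { ZS_EMPTY, ZS_ALMOST_EMPTY, ZS_ALMOST_FULL, ZS_FULL };

/* Hypothetical stand-in for a zspage: only tracks how many objects are in use. */
struct toy_zspage {
	int inuse;
};

/* Classify a page; the half-full threshold is an assumption of this sketch. */
static enum fullness classify(const struct toy_zspage *p)
{
	assert(p->inuse >= 0 && p->inuse <= CAPACITY);
	if (p->inuse == 0)
		return ZS_EMPTY;
	if (p->inuse == CAPACITY)
		return ZS_FULL;
	return p->inuse <= CAPACITY / 2 ? ZS_ALMOST_EMPTY : ZS_ALMOST_FULL;
}

/* Return the first page in the wanted class, ignoring 'skip'. */
static struct toy_zspage *find_page(struct toy_zspage *pages, size_t n,
				    enum fullness want,
				    const struct toy_zspage *skip)
{
	for (size_t i = 0; i < n; i++)
		if (&pages[i] != skip && classify(&pages[i]) == want)
			return &pages[i];
	return NULL;
}

/* Move as many objects as fit from src into dst. */
static void migrate(struct toy_zspage *src, struct toy_zspage *dst)
{
	int room = CAPACITY - dst->inuse;
	int moved = src->inuse < room ? src->inuse : room;

	src->inuse -= moved;
	dst->inuse += moved;
}

/* Run the quoted loop for one size class; returns how many pages were emptied. */
static int compact_class(struct toy_zspage *pages, size_t n)
{
	int emptied = 0;

	for (;;) {
		struct toy_zspage *src, *dst;

		src = find_page(pages, n, ZS_ALMOST_EMPTY, NULL);
		if (!src)
			break;
		dst = find_page(pages, n, ZS_ALMOST_FULL, src);
		if (!dst)
			dst = find_page(pages, n, ZS_ALMOST_EMPTY, src);
		if (!dst)
			break;
		migrate(src, dst);
		if (src->inuse == 0)
			emptied++;
	}
	return emptied;
}
```

Every pass either empties the source or fills the destination, so the number of partially filled pages shrinks each time and the loop terminates. Calling compact_class() on pages holding 2, 3, 6 and 1 objects leaves two pages empty without losing any object.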
2015-04-15dax: use pfn_mkwrite to update c/mtime + freeze protectionBoaz Harrosh1-0/+1
From: Yigal Korman <[email protected]> [v1] Without this patch, c/mtime is not updated correctly when mmap'ed page is first read from and then written to. A new xfstest is submitted for testing this (generic/080) [v2] Jan Kara has pointed out that if we add the sb_start/end_pagefault pair in the new pfn_mkwrite we are then fixing another bug where: A user could start writing to the page while filesystem is frozen. Signed-off-by: Yigal Korman <[email protected]> Signed-off-by: Boaz Harrosh <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAPBoaz Harrosh1-0/+3
This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to get notified when access is a write to a read-only PFN. This can happen if we mmap() a file then first mmap-read from it to page-in a read-only PFN, then we mmap-write to the same page. We need this functionality to fix a DAX bug, where in the scenario above we fail to set ctime/mtime though we modified the file. An xfstest is attached to this patchset that shows the failure and the fix. (A DAX patch will follow) This functionality is extra important for us, because upon dirtying of a pmem page we also want to RDMA the page to a remote cluster node. We define a new pfn_mkwrite and do not reuse page_mkwrite because 1 - The name ;-) 2 - But mainly because it would take a very long and tedious audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP users, to make sure they do not now crash. For example current DAX code (which this is for) would crash. If we wanted to reuse page_mkwrite, we would first need to patch all users so that they do not crash when there is no page, and only then enable this patch. But even if I did that I would not sleep so well at night. Adding a new vector is the safest thing to do, and is not that expensive: one extra pointer in a static function vector per driver. Also the new vector is better for performance, because otherwise we would call all current kernel vectors just to check-has-no-page-do-nothing and return. No need to call it from do_shared_fault because do_wp_page is called to change pte permissions anyway. Signed-off-by: Yigal Korman <[email protected]> Signed-off-by: Boaz Harrosh <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Jan Kara <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm/mempool.c: kasan: poison mempool elementsAndrey Ryabinin1-0/+2
Mempools keep allocated objects in reserved for situations when ordinary allocation may not be possible to satisfy. These objects shouldn't be accessed before they leave the pool. This patch poison elements when get into the pool and unpoison when they leave it. This will let KASan to detect use-after-free of mempool's elements. Signed-off-by: Andrey Ryabinin <[email protected]> Tested-by: David Rientjes <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dmitry Chernenkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: uninline and cleanup page-mapping related helpersKirill A. Shutemov2-14/+2
Most-used page->mapping helper -- page_mapping() -- has already uninlined. Let's uninline also page_rmapping() and page_anon_vma(). It saves us depending on configuration around 400 bytes in text: text data bss dec hex filename 660318 99254 410000 1169572 11d8a4 mm/built-in.o-before 659854 99254 410000 1169108 11d6d4 mm/built-in.o I also tried to make code a bit more clean. [[email protected]: coding-style fixes] Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: cma: add trace events for CMA allocations and freeingsStefan Strogin1-0/+66
Add trace events for cma_alloc() and cma_release(). The cma_alloc tracepoint is used both for successful and failed allocations, in case of allocation failure pfn=-1UL is stored and printed. Signed-off-by: Stefan Strogin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Laurent Pinchart <[email protected]> Cc: Thierry Reding <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15include/linux/mm.h: simplify flag checkBorislav Petkov1-3/+3
Flip the flag test so that it is the simplest. No functional change, just a small readability improvement: No code changed: # arch/x86/kernel/sys_x86_64.o: text data bss dec hex filename 1551 24 0 1575 627 sys_x86_64.o.before 1551 24 0 1575 627 sys_x86_64.o.after md5: 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.before.asm 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.after.asm Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: hugetlb: cleanup using paeg_huge_active()Naoya Horiguchi2-2/+7
Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [[email protected]: s/PageHugeActive/page_huge_active/] Signed-off-by: Naoya Horiguchi <[email protected]> Cc: Hugh Dickins <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm, mempool: disallow mempools based on slab caches with constructorsDavid Rientjes1-1/+2
All occurrences of mempools based on slab caches with object constructors have been removed from the tree, so disallow creating them. We can only dereference mem->ctor in mm/mempool.c without including mm/slab.h in include/linux/mempool.h. So simply note the restriction, just like the comment restricting usage of __GFP_ZERO, and warn on kernels with CONFIG_DEBUG_VM() if such a mempool is allocated from. We don't want to incur this check on every element allocation, so use VM_BUG_ON(). Signed-off-by: David Rientjes <[email protected]> Cc: Dave Kleikamp <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Sebastian Ott <[email protected]> Cc: Mikulas Patocka <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15hugetlbfs: accept subpool min_size mount option and setup accordinglyMike Kravetz1-1/+2
Make 'min_size=<value>' be an option when mounting a hugetlbfs. This option takes the same value as the 'size' option. min_size can be specified without specifying size. If both are specified, min_size must be less than or equal to size else the mount will fail. If min_size is specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: Mike Kravetz <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Aneesh Kumar <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15hugetlbfs: add minimum size tracking fields to subpool structureMike Kravetz1-1/+7
hugetlbfs allocates huge pages from the global pool as needed. Even if the global pool contains a sufficient number of pages for the filesystem size at mount time, those global pages could be grabbed for some other use. As a result, filesystem huge page allocations may fail due to lack of pages. Applications such as a database want to use huge pages for performance reasons. hugetlbfs filesystem semantics with ownership and modes work well to manage access to a pool of huge pages. However, the application would like some reasonable assurance that allocations will not fail due to a lack of huge pages. At application startup time, the application would like to configure itself to use a specific number of huge pages. Before starting, the application can check to make sure that enough huge pages exist in the system global pools. However, there are no guarantees that those pages will be available when needed by the application. What the application wants is exclusive use of a subset of huge pages. Add a new hugetlbfs mount option 'min_size=<value>' to indicate that the specified number of pages will be available for use by the filesystem. At mount time, this number of huge pages will be reserved for exclusive use of the filesystem. If there is not a sufficient number of free pages, the mount will fail. As pages are allocated to and freed from the filesystem, the number of reserved pages is adjusted so that the specified minimum is maintained. This patch (of 4): Add a field to the subpool structure to indicate the minimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool have not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages.
Signed-off-by: Mike Kravetz <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Aneesh Kumar <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: rename deactivate_page to deactivate_file_pageMinchan Kim1-1/+1
"deactivate_page" was created for file invalidation so it has too specific logic for file-backed pages. So, let's change the name of the function and date to a file-specific one and yield the generic name. Signed-off-by: Minchan Kim <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Wang, Yalin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: allow compaction of unevictable pagesEric B Munson1-0/+1
Currently, pages which are marked as unevictable are protected from compaction, but not from other types of migration. The POSIX real time extension explicitly states that mlock() will prevent a major page fault, but the spirit of this is that mlock() should give a process the ability to control sources of latency, including minor page faults. However, the mlock manpage only explicitly says that a locked page will not be written to swap and this can cause some confusion. The compaction code today does not give a developer who wants to avoid swap but wants to have large contiguous areas available any method to achieve this state. This patch introduces a sysctl for controlling compaction behavior with respect to the unevictable lru. Users who demand no page faults after a page is present can set compact_unevictable_allowed to 0 and users who need the large contiguous areas can enable compaction on locked memory by leaving the default value of 1. To illustrate this problem I wrote a quick test program that mmaps a large number of 1MB files filled with random data. These maps are created locked and read only. Then every other mmap is unmapped and I attempt to allocate huge pages to the static huge page pool. When the compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages after fragmenting memory. When the value is set to 1, allocations succeed. 
Signed-off-by: Eric B Munson <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
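The sysctl added by the compaction commit above can be inspected from userspace. The sketch below reads it through procfs; the path /proc/sys/vm/compact_unevictable_allowed follows from the sysctl name in the commit message, while read_sysctl_int() and compaction_may_move_locked_pages() are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* procfs path of the sysctl named in the commit message. */
#define COMPACT_UNEVICTABLE_PATH "/proc/sys/vm/compact_unevictable_allowed"

/*
 * Read a single integer from a sysctl-style file (hypothetical helper).
 * Returns 0 and stores the value in *out on success,
 * or -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_int(const char *path, int *out)
{
	FILE *f;
	int value;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%d", &value) == 1);
	fclose(f);
	if (!ok)
		return -1;
	*out = value;
	return 0;
}

/*
 * 1:  compaction may move pages on the unevictable LRU (the default),
 * 0:  such pages are left alone,
 * -1: the setting could not be read (for example, on an older kernel).
 */
static int compaction_may_move_locked_pages(const char *path)
{
	int v;

	if (read_sysctl_int(path, &v) != 0)
		return -1;
	return v != 0;
}
```

A latency-sensitive program could call compaction_may_move_locked_pages(COMPACT_UNEVICTABLE_PATH) at startup and warn if the result is not 0. Taking the path as a parameter keeps the helper testable against an ordinary file.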
2015-04-15mm: avoid tail page refcounting on non-THP compound pagesKirill A. Shutemov1-1/+1
THP uses tail page refcounting to be able to split huge pages at any time. Tail page refcounting is not needed for other users of compound pages and it's harmful because of overhead. We try to exclude non-THP pages from tail page refcounting using __compound_tail_refcounted() check. It excludes most common non-THP compound pages: SL*B and hugetlb, but it doesn't catch rest of __GFP_COMP users -- drivers. And it's not only about overhead. Drivers might want to use compound pages to get refcounting semantics suitable for mapping high-order pages to userspace. But tail page refcounting breaks it. Tail page refcounting uses ->_mapcount in tail pages to store GUP pins on them. It means GUP pins would affect page_mapcount() for tail pages. It's not a problem for THP, because it never maps tail pages. But unlike THP, drivers map parts of compound pages with PTEs and it makes page_mapcount() be called for tail pages. In particular, GUP pins would shift PSS up and affect /proc/kpagecount for such pages. But, I'm not aware about anything which can lead to crash or other serious misbehaviour. Since currently all THP pages are anonymous and all drivers pages are not, we can fix the __compound_tail_refcounted() check by requiring PageAnon() to enable tail page refcounting. Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: consolidate all page-flags helpers in <linux/page-flags.h>Kirill A. Shutemov4-105/+96
Currently we take a naive approach to page flags on compound pages - we set the flag on the page without consideration if the flag makes sense for tail page or for compound page in general. This patchset tries to sort this out by defining a per-flag policy on what needs to be done if a page-flag helper operates on a compound page. The last patch in the patchset also sanitizes usage of page->mapping for tail pages. We don't define the meaning of page->mapping for tail pages. Currently it's always NULL, which can be inconsistent with head page and potentially lead to problems. For now I caught one case of illegal usage of page flags or ->mapping: sound subsystem allocates pages with __GFP_COMP and maps them with PTEs. It leads to setting dirty bit on tail pages and access to tail_page's ->mapping. I don't see any bad behaviour caused by this, but worth fixing anyway. This patchset makes more sense if you take my THP refcounting into account: we will see more compound pages mapped with PTEs and we need to define behaviour of flags on compound pages to avoid bugs. This patch (of 16): We have page-flags helper function declarations/definitions spread over several header files. Let's consolidate them in <linux/page-flags.h>. Signed-off-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Steve Capper <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Jerome Marchand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15mm: refactor zone_movable_is_highmem()Zhang Zhen1-4/+4
All callers of zone_movable_is_highmem are under #ifdef CONFIG_HIGHMEM, so the else branch return 0 is not needed. Signed-off-by: Zhang Zhen <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15vfs: delete vfs_readdir function declarationZhang Zhen1-1/+0
vfs_readdir() was replaced by iterate_dir() in commit 5c0ba4e0762e ("[readdir] introduce iterate_dir() and dir_context"). Signed-off-by: Zhang Zhen <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-15Merge branch 'for-next' of ↵Linus Torvalds3-18/+27
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED subsystem updates from Bryan Wu: "In this cycle, we merged some fix and update for LED Flash class driver. Then the core code of LED Flash class driver is in the kernel now. Moreover, we also got some bug fixes, code cleanup and new drivers for LED controllers" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: leds: Don't treat the LED name as a format string leds: Use log level warn instead of info when telling about a name clash leds/led-class: Handle LEDs with the same name leds: lp8860: Fix typo in MODULE_DESCRIPTION in leds-lp8860.c leds: lp8501: Fix typo in MODULE_DESCRIPTION in leds-lp8501.c DT: leds: Add uniqueness requirement for 'label' property. dt-binding: leds: Add common LED DT bindings macros leds: add Qualcomm PM8941 WLED driver leds: add DT binding for Qualcomm PM8941 WLED block leds: pca963x: Add missing initialiation of struct led_info.flags leds: flash: Fix the size of sysfs_groups array Documentation: leds: Add description of LED Flash class extension leds: flash: document sysfs interface leds: flash: Remove synchronized flash strobe feature leds: Introduce devres helper for led_classdev_register leds: lp8860: make use of devm_gpiod_get_optional leds: Let the binding document example for leds-gpio follow the gpio bindings leds: flash: remove stray include directive leds: leds-pwm: drop one pwm_get_period() call
2015-04-15Merge tag 'sound-4.1-rc1' of ↵Linus Torvalds20-46/+649
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "There have been major modernization with the standard bus: in ALSA sequencer core and HD-audio. Also, HD-audio receives the regmap support replacing the in-house register cache code. These changes shouldn't impact the existing behavior, but rather refactoring. In addition, HD-audio got the code split to a core library part and the "legacy" driver parts. This is a preliminary work for adapting the upcoming ASoC HD-audio driver, and the whole transition is still work in progress, likely finished in 4.1. Along with them, there are many updates in ASoC area as usual, too: lots of cleanups, Intel code shuffling, etc. Here are some highlights: ALSA core: - PCM: the audio timestamp / wallclock enhancement - PCM: fixes in DPCM management - Fixes / cleanups of user-space control element management - Sequencer: modernization using the standard bus HD-audio: - Modernization using the standard bus - Regmap support - Use standard runtime PM for codec power saving - Widget-path based power-saving for IDT, VIA and Realtek codecs - Reorganized sysfs entries for each codec object - More Dell headset support ASoC: - Move of jack registration to the card level - Lots of ASoC cleanups, mainly moving things from the CODEC level to the card level - Support for DAPM routes specified by both the machine driver and DT - Continuing improvements to rcar - pcm512x enhancements - Intel platforms updates - rt5670 updates / fixes - New platforms / devices: some non-DSP Qualcomm platforms, Google's Storm platform, Maxim MAX98925 CODECs and the Ingenic JZ4780 SoC Misc: - ice1724: Improved ESI W192M support - emu10k1: Emu 1010 fixes/enhancement" * tag 'sound-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (411 commits) ALSA: hda - set GET bit when adding a vendor verb to the codec regmap ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T450 ALSA: hda - Fix
another race in runtime PM refcounting ALSA: hda - Expose codec type sysfs ALSA: ctl: fix to handle several elements added by one operation for userspace element ASoC: Intel: fix array_size.cocci warnings ASoC: n810: Automatically disconnect non-connected pins ASoC: n810: Consistently pass the card DAPM context to n810_ext_control() ASoC: davinci-evm: Use card DAPM context to access widgets ASoC: mop500_ab8500: Use card DAPM context to access widgets ASoC: wm1133-ev1: Use card DAPM context to access widgets ASoC: atmel: Improve machine driver compile test coverage ASoC: atmel: Add dependency to SND_SOC_I2C_AND_SPI where necessary ALSA: control: Fix a typo of SNDRV_CTL_ELEM_ACCESS_TLV_* with SNDRV_CTL_TLV_OP_* ALSA: usb-audio: Don't attempt to get Microsoft Lifecam Cinema sample rate ASoC: rnsd: fix build regression without CONFIG_OF ALSA: emu10k1: add toggles for E-mu 1010 optical ports ALSA: ctl: fill identical information to return value when adding userspace elements ALSA: ctl: fix a bug to return no identical information in info operation for userspace controls ALSA: ctl: confirm to return all identical information in 'activate' event ...