path: root/include/uapi
Age | Commit message | Author | Files | Lines
2016-02-09 | gpio: add a userspace chardev ABI for GPIOs | Linus Walleij | 2 | -0/+29
A new chardev that is to be used for userspace GPIO access is added in this patch. It is intended to gradually replace the horribly broken sysfs ABI.

Using a chardev has many upsides:

- All operations are per-gpiochip, which is the actual device underlying the GPIOs, making us tie in to the kernel device model properly.
- Hotpluggable GPIO controllers can come and go, as this kind of problem has been known to userspace for character devices since ages, and if a gpiochip handle is held in userspace we know we will break something, whereas the sysfs is stateless.
- The one-value-per-file rule of sysfs is really hard to maintain when you want to twist more than one knob at a time, for example have in-kernel APIs to switch several GPIO lines at the same time, and this will be possible to do with a single ioctl() from userspace, saving a lot of context switching.

We also need to add a new bus type for GPIO. This is necessary for example for userspace coldplug, where sysfs is traversed to find the boot-time device nodes and create the character devices in /dev.

This new chardev ABI is *non* *optional* and can be counted on to be present in the future, emphasizing the preference of this ABI.

The ABI only implements one single ioctl() to get the name and number of GPIO lines of a chip. Even this is debatable: see it as a minimal example for review. This ABI shall be ruthlessly reviewed and etched in stone.

The old /sys/class/gpio is still optional to compile in, but will be deprecated.

Unique device IDs are created using IDR, which is overkill and insanely scalable, but also well tested.

Cc: Johan Hovold <[email protected]> Cc: Michael Welling <[email protected]> Cc: Markus Pargmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
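As a rough illustration of the single ioctl() this commit message describes, the userspace sketch below asks a gpiochip character device for its name and line count. It assumes the `struct gpiochip_info` layout and the `GPIO_GET_CHIPINFO_IOCTL` request as found in later `<linux/gpio.h>` headers, which may not match the structure in this exact patch. The helper names `read_chip_info` and `print_chip_info` are hypothetical.

```c
#include <fcntl.h>
#include <linux/gpio.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a gpiochip character device and fill *info
 * using the chip-info ioctl. Returns 0 on success, -1 on any failure
 * (device node missing, no permission, ioctl not supported).
 */
static int read_chip_info(const char *path, struct gpiochip_info *info)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(info, 0, sizeof(*info));
	int ret = ioctl(fd, GPIO_GET_CHIPINFO_IOCTL, info);
	close(fd);
	return ret < 0 ? -1 : 0;
}

/* Print the chip name and number of lines for a path such as "/dev/gpiochip0". */
static int print_chip_info(const char *path)
{
	struct gpiochip_info info;

	if (read_chip_info(path, &info) < 0) {
		perror(path);
		return -1;
	}
	printf("%s: %u lines\n", info.name, info.lines);
	return 0;
}
```

Because the handle is a file descriptor on a specific chip, a vanished hotpluggable controller shows up as an ordinary open() or ioctl() error, which is the stateful behaviour the message contrasts with sysfs.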
2016-02-09 | bridge: mdb: add support for offloaded mdb entries | Elad Raz | 1 | -0/+2
Add a new bitmask member 'flags' to the br_mdb_entry structure, and add the MDB_FLAGS_OFFLOAD bit, which indicates that an MDB entry is offloaded to hardware. Signed-off-by: Elad Raz <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-09 | drm: prime: Honour O_RDWR during prime-handle-to-fd | Daniel Thompson | 1 | -0/+1
Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except (DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace to mmap() the resulting dma-buf even when this is supported by the DRM driver. It is trivial to relax the restriction and permit read/write access. This is safe because the flags are seldom touched by drm; mostly they are passed verbatim to dma_buf calls. v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl. Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Tiago Vignatti <[email protected]> Reviewed-by: Stéphane Marchesin <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2016-02-09 | Merge tag 'drm-intel-next-2016-01-24' of git://anongit.freedesktop.org/drm-intel into drm-next | Dave Airlie | 1 | -4/+29
- support for v3 vbt dsi blocks (Jani)
- improve mmio debug checks (Mika Kuoppala)
- reorg the ddi port translation table entries and related code (Ville)
- reorg gen8 interrupt handling for future platforms (Tvrtko)
- refactor tile width/height computations for framebuffers (Ville)
- kerneldoc integration for intel_pm.c (Jani)
- move default context from engines to device-global dev_priv (Dave Gordon)
- make seqno/irq ordering coherent with execlist (Chris)
- decouple internal engine number from UABI (Chris&Tvrtko)
- tons of small fixes all over, as usual

* tag 'drm-intel-next-2016-01-24' of git://anongit.freedesktop.org/drm-intel: (148 commits)
  drm/i915: Update DRIVER_DATE to 20160124
  drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids
  drm/i915: Decouple execbuf uAPI from internal implementation
  drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists
  drm/i915: Limit the auto arming of mmio debugs on vlv/chv
  drm/i915: Tune down "GT register while GT waking disabled" message
  drm/i915: tidy up a few leftovers
  drm/i915: abolish separate per-ring default_context pointers
  drm/i915: simplify allocation of driver-internal requests
  drm/i915: Fix NULL plane->fb oops on SKL
  drm/i915: Do not put big intel_crtc_state on the stack
  Revert "drm/i915: Add two-stage ILK-style watermark programming (v10)"
  drm/i915: add DOC: headline to RC6 kernel-doc
  drm/i915: turn some bogus kernel-doc comments to normal comments
  drm/i915/sdvo: revert bogus kernel-doc comments to normal comments
  drm/i915/gen9: Correct max save/restore register count during gpu reset with GuC
  drm/i915: Demote user facing DMC firmware load failure message
  drm/i915: use hlist_for_each_entry
  drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
  drm/i915: Don't reject primary plane windowing with color keying enabled on SKL+
  ...
2016-02-07 | Merge 4.5-rc3 into staging-next | Greg Kroah-Hartman | 1 | -1/+0
We want the upstream staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-08 | quota: add new quotactl Q_GETNEXTQUOTA | Eric Sandeen | 1 | -0/+14
Q_GETNEXTQUOTA is exactly like Q_GETQUOTA, except that it will return quota information for the id equal to or greater than the id requested. In other words, if the requested id has no quota, the command will return quota information for the next higher id which does have a quota set. If no higher id has an active quota, -ESRCH is returned.

This allows filesystems to do efficient iteration in kernelspace, much like extN filesystems do in userspace when asked to report all active quotas.

This does require a new data structure for userspace, as the current structure does not include an ID for the returned quota information.

Today, Ext4 with a hidden quota inode requires getpwent-style iterations, and for systems which have e.g. LDAP backends, this can be very slow, or even impossible if iteration is not allowed in the configuration.

Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
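The lookup rule described above (return the first id equal to or greater than the requested one, or -ESRCH when none is left) can be sketched with a toy in-memory table. This is only a model of the semantics, not the kernel code and not a real quotactl() call. `struct toy_quota`, `toy_table` and both functions are invented for illustration. The sketch also shows why the returned structure has to carry an id: the caller needs it to know where to resume.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy quota table, sorted by id. Stands in for the filesystem's quota store. */
struct toy_quota {
	uint32_t id;           /* user/group id that has a quota set */
	uint64_t block_limit;  /* illustrative limit, arbitrary units */
};

static const struct toy_quota toy_table[] = {
	{ 1000, 4096 }, { 1003, 8192 }, { 2500, 1024 },
};
static const size_t toy_table_len = sizeof(toy_table) / sizeof(toy_table[0]);

/*
 * Models the "get next" semantics: find the first entry whose id is equal to
 * or greater than the requested id. Returns 0 and fills *out on success,
 * -ESRCH if no such entry exists.
 */
static int toy_get_next_quota(uint32_t id, struct toy_quota *out)
{
	for (size_t i = 0; i < toy_table_len; i++) {
		if (toy_table[i].id >= id) {
			*out = toy_table[i];
			return 0;
		}
	}
	return -ESRCH;
}

/* Iterate over every active quota; returns how many were visited. */
static int toy_list_all_quotas(void)
{
	struct toy_quota q;
	uint32_t id = 0;
	int count = 0;

	while (toy_get_next_quota(id, &q) == 0) {
		printf("id %u: limit %llu\n", (unsigned)q.id,
		       (unsigned long long)q.block_limit);
		count++;
		if (q.id == UINT32_MAX)
			break;      /* avoid wrapping past the last id */
		id = q.id + 1;      /* resume just above the id returned */
	}
	return count;
}
```

The loop never enumerates users through getpwent(); it only follows the ids handed back by each lookup, which is the point of the new command.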
2016-02-08 | quota: add new quotactl Q_XGETNEXTQUOTA | Eric Sandeen | 1 | -0/+1
Q_XGETNEXTQUOTA is exactly like Q_XGETQUOTA, except that it will return quota information for the id equal to or greater than the id requested. In other words, if the requested id has no quota, the command will return quota information for the next higher id which does have a quota set. If no higher id has an active quota, -ESRCH is returned.

This allows filesystems to do efficient iteration in kernelspace, much like extN filesystems do in userspace when asked to report all active quotas.

The patch adds a d_id field to struct qc_dqblk so that we can pass back the id of the quota which was found, and return it to userspace.

Today, filesystems such as XFS require getpwent-style iterations, and for systems which have e.g. LDAP backends, this can be very slow, or even impossible if iteration is not allowed in the configuration.

Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2016-02-07 | ethtool: add speed/duplex validation functions | Nikolay Aleksandrov | 1 | -0/+34
Add functions which check if the speed/duplex are defined. Signed-off-by: Nikolay Aleksandrov <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-07 | net: Add support for fill_slave_info to VRF device | David Ahern | 1 | -0/+8
Allows userspace to have direct access to VRF table association versus looking up master device and its table. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-06 | bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map | Alexei Starovoitov | 1 | -0/+1
Primary use case is a histogram array of latency where bpf program computes the latency of block requests or other events and stores histogram of latency into array of 64 elements. All cpus are constantly running, so normal increment is not accurate, bpf_xadd causes cache ping-pong and this per-cpu approach allows fastest collision-free counters. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-06 | bpf: introduce BPF_MAP_TYPE_PERCPU_HASH map | Alexei Starovoitov | 1 | -0/+1
Introduce BPF_MAP_TYPE_PERCPU_HASH map type which is used to do accurate counters without need to use BPF_XADD instruction which turned out to be too costly for high-performance network monitoring. In the typical use case the 'key' is the flow tuple or other long living object that sees a lot of events per second.

bpf_map_lookup_elem() returns per-cpu area. Example:

  struct {
    u32 packets;
    u32 bytes;
  } * ptr = bpf_map_lookup_elem(&map, &key);

  /* ptr points to this_cpu area of the value, so the following
   * increments will not collide with other cpus
   */
  ptr->packets ++;
  ptr->bytes += skb->len;

bpf_update_elem() atomically creates a new element where all per-cpu values are zero initialized and this_cpu value is populated with given 'value'. Note that non-per-cpu hash map always allocates new element and then deletes old after rcu grace period to maintain atomicity of update. Per-cpu hash map updates element values in-place.

Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
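To make the per-cpu idea concrete outside the kernel, here is a plain userspace C model of one map value that is split into one slot per CPU. It is not BPF code and does not use any BPF API. The fixed CPU count, the `percpu_slots` array and both function names are assumptions made for the sketch, and summing the slots on read is just one plausible way a consumer could combine them.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* assumption: fixed CPU count for the sketch */

struct flow_stats {
	uint32_t packets;
	uint32_t bytes;
};

/* One value slot per CPU for a single key, as a per-cpu map would keep. */
static struct flow_stats percpu_slots[SKETCH_NR_CPUS];

/* What a program running on `cpu` would do: touch only its own slot. */
static void account_packet(unsigned int cpu, uint32_t len)
{
	struct flow_stats *ptr = &percpu_slots[cpu % SKETCH_NR_CPUS];

	ptr->packets++;
	ptr->bytes += len;
}

/* What a reader could do: sum the per-CPU slots to get the total. */
static struct flow_stats sum_all_cpus(void)
{
	struct flow_stats total = { 0, 0 };

	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		total.packets += percpu_slots[cpu].packets;
		total.bytes += percpu_slots[cpu].bytes;
	}
	return total;
}
```

Since each CPU writes only to its own slot, the increments need no atomic instruction, which is the cost the commit message says BPF_XADD incurred.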
2016-02-06 | net: add rx_nohandler stat counter | Jarod Wilson | 1 | -0/+4
This adds an rx_nohandler stat counter, along with a sysfs statistics node, and copies the counter out via netlink as well. CC: "David S. Miller" <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jiri Pirko <[email protected]> CC: Daniel Borkmann <[email protected]> CC: Tom Herbert <[email protected]> CC: Jay Vosburgh <[email protected]> CC: Veaceslav Falico <[email protected]> CC: Andy Gospodarek <[email protected]> CC: [email protected] Signed-off-by: Jarod Wilson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-03 | Add ioctls to enable and disable local controls on an instrument | Dave Penkler | 1 | -0/+6
These ioctls provide support for the USBTMC-USB488 control requests for REN_CONTROL, GO_TO_LOCAL and LOCAL_LOCKOUT Signed-off-by: Dave Penkler <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-03 | Add ioctl to retrieve USBTMC-USB488 capabilities | Dave Penkler | 1 | -3/+18
This is a convenience function to obtain an instrument's capabilities from its file descriptor without having to access sysfs from the user program. Signed-off-by: Dave Penkler <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-03 | Implement an ioctl to support the USBTMC-USB488 READ_STATUS_BYTE operation. | Dave Penkler | 1 | -0/+2
Background: When performing a read on an instrument that is executing a function that runs longer than the USB timeout, the instrument may hang and require a device reset to recover. The READ_STATUS_BYTE operation always returns, even when the instrument is busy, which makes it possible to poll for the appropriate condition. This capability is referred to in instrument application notes on synchronizing acquisitions for other platforms. Signed-off-by: Dave Penkler <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-01 | Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds | 1 | -1/+0
Pull libnvdimm fixes from Dan Williams:

"1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved area for storing a struct page array.

2/ Fixes for dax operations on a raw block device to prevent pagecache collisions with dax mappings.

3/ A fix for pfn_t usage in vm_insert_mixed that led to a null pointer de-reference.

These have received build success notification from the kbuild robot across 153 configs and pass the latest ndctl tests"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  phys_to_pfn_t: use phys_addr_t
  mm: fix pfn_t to page conversion in vm_insert_mixed
  block: use DAX for partition table reads
  block: revert runtime dax control of the raw block device
  fs, block: force direct-I/O for dax-enabled block devices
  devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
  libnvdimm, pfn: fix restoring memmap location
  libnvdimm: fix mode determination for e820 devices
2016-02-01 | Merge tag 'iio-for-4.6a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next | Greg Kroah-Hartman | 1 | -0/+1
Jonathan writes:

First round of new IIO device support, features and cleanups for the 4.6 cycle.

Device Support
* ad5761
  - new driver
* at91_sama5d2 ADC.
  - new driver and MAINTAINERS entry.
  - minor cleanups followed.
* atlas pH-SM
  - new driver (this has possibly the prettiest data sheet I've ever seen)
* mcp3422
  - mcp3425 ADC added.
* mcp4725
  - mcp4726 DAC added.
* mma8452
  - mma8451q accelerometer added.
* mpl115
  - mpl115a1 added (a lot bigger than it seems as this is an SPI part whereas previous parts were i2c).
* si7005
  - Hoperf th02 (seems to be a repackaged part)
* si7020
  - Hoperf th06 (seems to be a repackaged part)

New features
* Core
  - IIO_PH type. Does what it says on the tin.
* max30100
  - LED current configuration support.
* mcp320x
  - more differential measurement combinations.
* mma8452
  - free fall detection
* opt3001
  - enable operation without an IRQ line.
  - device tree docs. Somehow the original docs have disappeared down a rabbit hole, so here is a new set.
* st-sensors
  - Support active-low interrupts.

Cleanups and minor / not so minor reworks
* Documentation
  - drop some defunct ABI from the docs in staging.
* pressure / Kconfig
  - white space cleanup.
* ad7150
  - BIT macro usage
  - Alignment fixes
* ad7192
  - false indent fixed.
* ak8975
  - constify the ak_def structures
* axp288
  - drop a redundant double const.
* dht11
  - substantial reliability improvements by being more tolerant of missing start bits.
  - simplify the decoding algorithm
* mma8452
  - whitespace cleanup
* mpl115
  - don't bother setting i2c_client_data as nothing uses it.
* mpu6050
  - drop unused function parameter.
* opt3001
  - extract integration time as constants.
  - trivial refactoring.
2016-02-01 | Merge 4.5-rc2 into usb-next | Greg Kroah-Hartman | 1 | -0/+3
We want the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-01 | [media] media: v4l: Dual license v4l2-common.h under GPL v2 and BSD licenses | Sakari Ailus | 1 | -11/+35
The v4l2-common.h user space header was split off from videodev2.h, but the dual licensing of the videodev2.h (as well as other V4L2 headers) was missed. Change the license of the v4l2-common.h from GNU GPL v2 to both GNU GPL v2 and BSD. Sakari Ailus <[email protected]>: > Would you approve a license change of the patches to > include/uapi/linux/v4l2-common.h (formerly include/linux/v4l2-common.h) you > or your company have contributed from GNU GPL v2 to dual GNU GPL v2 and BSD > licenses, changing the copyright notice in the file as below (from > videodev2.h): > > -------------8<------------ > * This program is free software; you can redistribute it and/or modify > * it under the terms of the GNU General Public License as published by > * the Free Software Foundation; either version 2 of the License, or > * (at your option) any later version. > * > * This program is distributed in the hope that it will be useful, > * but WITHOUT ANY WARRANTY; without even the implied warranty of > * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > * GNU General Public License for more details. > * > * Alternatively you can redistribute this file under the terms of the > * BSD license as stated below: > * > * Redistribution and use in source and binary forms, with or without > * modification, are permitted provided that the following conditions > * are met: > * 1. Redistributions of source code must retain the above copyright > * notice, this list of conditions and the following disclaimer. > * 2. Redistributions in binary form must reproduce the above copyright > * notice, this list of conditions and the following disclaimer in > * the documentation and/or other materials provided with the > * distribution. > * 3. The names of its contributors may not be used to endorse or promote > * products derived from this software without specific prior written > * permission. 
> * > * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED > * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR > * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF > * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING > * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > -------------8<------------ Mauro Carvalho Chehab <[email protected]>: > No problem from my side. Hans Verkuil <[email protected]>: > Acked-by: Hans Verkuil <[email protected]> Aaro Koskinen <[email protected]>: > This fine also for us. > > Acked-by: Aaro Koskinen <[email protected]> Signed-off-by: Sakari Ailus <[email protected]> Acked-by: Hans Verkuil <[email protected]> Acked-by: Aaro Koskinen <[email protected]> Acked-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2016-02-01 | [media] media.h: add support for IF-PLL video/sound decoder | Mauro Carvalho Chehab | 1 | -2/+15
Very old hardware may have an analog stage tuner. Such hardware consists of a PLL that converts an RF signal into IF signals. Depending on the hardware, the video IF signal can be decoded directly by the bridge chipset. Most Conexant chips (bt8x8, cx2388x, etc) have the decoders for that internally. Yet, even on such hardware, the tuner may have its own internal TV multi-standard decoder, like tda9887. The same happens with the audio IF signal, where some bridges are capable of receiving it, while others require an external IF-PLL sound decoder, like msp3400. Those external IF-PLL audio and video decoders have their own I2C address, and use different drivers to handle them. So, they're mapped as different subdevices on Linux. Thankfully, all modern hardware comes with an IC chip that has both the RF and the IF stages on it, being capable of decoding audio and video IF signals internally. Yet, as we need to support drivers that can work with either analog or silicon tuners, we need to add two entity types for that old hardware. Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2016-01-30 | block: revert runtime dax control of the raw block device | Dan Williams | 1 | -1/+0
Dynamically enabling DAX requires that the page cache first be flushed and invalidated. This must occur atomically with the change of DAX mode, otherwise we confuse the fsync/msync tracking and violate data durability guarantees. Eliminate the possibility of DAX-disabled to DAX-enabled transitions for now and revisit this for the next cycle. Cc: Jan Kara <[email protected]> Cc: Jeff Moyer <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ross Zwisler <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-01-30 | iio: ph: add IIO_PH channel type | Matt Ranostay | 1 | -0/+1
Signed-off-by: Matt Ranostay <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>
2016-01-28 | drm/i915: Fix VCS ring selection after uapi decoupling | Tvrtko Ursulin | 1 | -4/+6
This got broken in:

  commit de1add360522c876c25ef2bbbbab1c94bdb509ab
  Author: Tvrtko Ursulin <[email protected]>
  Date: Fri Jan 15 15:12:50 2016 +0000

    drm/i915: Decouple execbuf uAPI from internal implementation

BSD ring flags need to be shifted before they can be considered indices into the ring array.

Reported by Zhipeng Gong.

v2: Simplify the code. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Zhipeng Gong <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Testcase: igt/gem_exec_basic # bdw-gt3
2016-01-26 | drm/etnaviv: add further minor features and varyings count | Russell King | 1 | -0/+3
Export further minor feature bitmasks and the varyings count from the GPU specifications registers to userspace. Acked-by: Christian Gmeiner <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Lucas Stach <[email protected]>
2016-01-25 | audit: stop an old auditd being starved out by a new auditd | Richard Guy Briggs | 1 | -0/+1
Nothing prevents a new auditd starting up and replacing a valid audit_pid when an old auditd is still running, effectively starving out the old auditd since audit_pid no longer points to the old valid auditd. If no message to auditd has been attempted since auditd died unnaturally or got killed, audit_pid will still indicate it is alive. There isn't an easy way to detect if an old auditd is still running on the existing audit_pid other than attempting to send a message to see if it fails. An -ECONNREFUSED almost certainly means it disappeared and can be replaced. Other errors are not so straightforward and may indicate transient problems that will resolve themselves and the old auditd will recover. Yet others will likely need manual intervention for which a new auditd will not solve the problem. Send a new message type (AUDIT_REPLACE) to the old auditd containing a u32 with the PID of the new auditd. If the audit replace message succeeds (or doesn't fail with certainty), fail to register the new auditd and return an error (-EEXIST). This is expected to make the patch preventing an old auditd orphaning a new auditd redundant. V3: Switch audit message type from 1000 to 1300 block. Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Paul Moore <[email protected]>
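The registration rule described in this message boils down to a small decision on the result of the replace notification. The function below is a hypothetical distillation of that rule for illustration, not the kernel's implementation. The name `sketch_may_replace_auditd` and the convention of passing the send result as a negative errno are assumptions.

```c
#include <errno.h>

/*
 * Hypothetical decision helper. `send_err` is the result of trying to deliver
 * the replace notification to the old daemon: 0 on success, negative errno on
 * failure. Returns 0 if the new daemon may register, -EEXIST otherwise.
 */
static int sketch_may_replace_auditd(int send_err)
{
	/* Only a refused connection is treated as proof the old daemon is gone. */
	if (send_err == -ECONNREFUSED)
		return 0;

	/* Delivered, or failed ambiguously: keep the old daemon registered. */
	return -EEXIST;
}
```

Treating every error other than -ECONNREFUSED as "old daemon may still be alive" errs on the side of keeping the existing auditd, matching the message's concern about transient failures.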
2016-01-25 | Revert "[media] Postpone the addition of MEDIA_IOC_G_TOPOLOGY" | Mauro Carvalho Chehab | 1 | -5/+1
Enable MEDIA_IOC_G_TOPOLOGY ioctl for Kernel 4.6. This reverts commit be0270ec89e6b9b49de7e533dd1f3a89ad34d205. Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2016-01-24 | usb: Support USB 3.1 extended port status request | Mathias Nyman | 1 | -0/+21
USB 3.1 extends the hub get-port-status request by adding different request types. The new request types return 4 additional bytes called extended port status; these bytes are returned after the regular portstatus and portchange values. The extended port status contains a speed ID for the currently used sublink speed. A table of supported Speed IDs with details about the link is provided by the hub in the device descriptor BOS SuperSpeedPlus device capability Sublink Speed Attributes. Support this new request. Ask for the extended port status after port reset if the hub supports USB 3.1. If the link is running at SuperSpeedPlus, set the device speed to USB_SPEED_SUPER_PLUS. Signed-off-by: Mathias Nyman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-01-24 | usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices | Mathias Nyman | 1 | -0/+1
Add a new USB_SPEED_SUPER_PLUS device speed, and make sure usb core can handle the new speed. In most cases the behaviour is the same as with USB_SPEED_SUPER SuperSpeed devices. In a few places we add a "Plus" string to inform the user of the new speed. Signed-off-by: Mathias Nyman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-01-22 | Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 1 | -4/+27
Pull ext4 updates from Ted Ts'o:

"Some locking and page fault bug fixes from Jan Kara, some ext4 encryption fixes from me, and Li Xi's Project Quota commits"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: clean up the flags definition in uapi/linux/fs.h
  ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
  ext4: add project quota support
  ext4: adds project ID support
  ext4 crypto: simplify interfaces to directory entry insert functions
  ext4 crypto: add missing locking for keyring_key access
  ext4: use pre-zeroed blocks for DAX page faults
  ext4: implement allocation of pre-zeroed blocks
  ext4: provide ext4_issue_zeroout()
  ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
  ext4: document lock ordering
  ext4: fix races of writeback with punch hole and zero range
  ext4: fix races between buffered IO and collapse / insert range
  ext4: move unlocked dio protection from ext4_alloc_file_blocks()
  ext4: fix races between page faults and hole punching
2016-01-22 | Merge tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs | Linus Torvalds | 1 | -0/+33
Pull more xfs updates from Dave Chinner:

"This is the second update for XFS that I mentioned in the original pull request last week. It contains a revert for a suspend regression in 4.4 and a fix for a long standing log recovery issue that has been further exposed by all the log recovery changes made in the original 4.5 merge.

There is one more thing in this pull request - one that I forgot to merge into the origin. That is, pulling the XFS_IOC_FS[GS]ETXATTR ioctl up to the VFS level so that other filesystems can also use it for modifying project quota IDs

Summary:

- promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that it can be shared with other filesystems. The ext4 project quota functionality is the first target for this. The commits in this series have not been updated with review or final SOB tags because the branch they were originally published in was needed by ext4. Those tags are:

    Reviewed-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Dave Chinner <[email protected]>

- Revert a change that is causing suspend failures.

- Fix a use-after-free that can occur on log mount failures. Been around forever, but now exposed by other changes to log recovery made in the first 4.5 merge"

* tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: log mount failures don't wait for buffers to be released
  Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
  xfs: introduce per-inode DAX enablement
  xfs: use FS_XFLAG definitions directly
  fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion
2016-01-21 | Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -1/+1
Pull NVMe updates from Jens Axboe:

"Last branch for this series is the nvme changes. It's in a separate branch to avoid splitting too much between core and NVMe changes, since NVMe is still helping drive some blk-mq changes. That said, not a huge amount of core changes in here. The grunt of the work is the continued split of the code"

* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
  uapi: update install list after nvme.h rename
  NVMe: Export NVMe attributes to sysfs group
  NVMe: Shutdown controller only for power-off
  NVMe: IO queue deletion re-write
  NVMe: Remove queue freezing on resets
  NVMe: Use a retryable error code on reset
  NVMe: Fix admin queue ring wrap
  nvme: make SG_IO support optional
  nvme: fixes for NVME_IOCTL_IO_CMD on the char device
  nvme: synchronize access to ctrl->namespaces
  nvme: Move nvme_freeze/unfreeze_queues to nvme core
  PCI/AER: include header file
  NVMe: Export namespace attributes to sysfs
  NVMe: Add pci error handlers
  block: remove REQ_NO_TIMEOUT flag
  nvme: merge iod and cmd_info
  nvme: meta_sg doesn't have to be an array
  nvme: properly free resources for cancelled command
  nvme: simplify completion handling
  nvme: special case AEN requests
  ...
2016-01-21 | Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -0/+31
Pull lightnvm fixes and updates from Jens Axboe:

"This should have been part of the drivers branch, but it arrived a bit late and wasn't based on the official core block driver branch. So they got a small scolding, but got a pass since it's still new. Hence it's in a separate branch.

This is mostly pure fixes, contained to lightnvm/, and minor feature additions"

* 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits)
  lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
  lightnvm: introduce factory reset
  lightnvm: use system block for mm initialization
  lightnvm: introduce ioctl to initialize device
  lightnvm: core on-disk initialization
  lightnvm: introduce mlc lower page table mappings
  lightnvm: add mccap support
  lightnvm: manage open and closed blocks separately
  lightnvm: fix missing grown bad block type
  lightnvm: reference rrpc lun in rrpc block
  lightnvm: introduce nvm_submit_ppa
  lightnvm: move rq->error to nvm_rq->error
  lightnvm: support multiple ppas in nvm_erase_ppa
  lightnvm: move the pages per block check out of the loop
  lightnvm: sectors first in ppa list
  lightnvm: fix locking and mempool in rrpc_lun_gc
  lightnvm: put block back to gc list on its reclaim fail
  lightnvm: check bi_error in gc
  lightnvm: return the get_bb_tbl return value
  lightnvm: refactor end_io functions for sync
  ...
2016-01-21 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 1 | -0/+3
Merge third patch-bomb from Andrew Morton:

"I'm pretty much done for -rc1 now:

- the rest of MM, basically
- lib/ updates
- checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit
- cpu_mask simplifications
- kexec, rapidio, MAINTAINERS etc, etc.
- more dma-mapping cleanups/simplifications from hch"

* emailed patches from Andrew Morton <[email protected]>: (109 commits)
  MAINTAINERS: add/fix git URLs for various subsystems
  mm: memcontrol: add "sock" to cgroup2 memory.stat
  mm: memcontrol: basic memory statistics in cgroup2 memory controller
  mm: memcontrol: do not uncharge old page in page cache replacement
  Documentation: cgroup: add memory.swap.{current,max} description
  mm: free swap cache aggressively if memcg swap is full
  mm: vmscan: do not scan anon pages if memcg swap limit is hit
  swap.h: move memcg related stuff to the end of the file
  mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
  mm: vmscan: pass memcg to get_scan_count()
  mm: memcontrol: charge swap to cgroup2
  mm: memcontrol: clean up alloc, online, offline, free functions
  mm: memcontrol: flatten struct cg_proto
  mm: memcontrol: rein in the CONFIG space madness
  net: drop tcp_memcontrol.c
  mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
  mm: memcontrol: allow to disable kmem accounting for cgroup2
  mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
  mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
  mm: memcontrol: separate kmem code from legacy tcp accounting code
  ...
2016-01-21 | Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs | Linus Torvalds | 1 | -0/+1
Pull overlayfs updates from Miklos Szeredi:

"This contains several bug fixes and a new mount option 'default_permissions' that allows read-only exported NFS filesystems to be used as lower layer"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: move super block magic number to magic.h
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  ovl: default permissions
2016-01-21Merge branch 'for-linus' of ↵Linus Torvalds1-1/+16
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This adds SEEK_HOLE and SEEK_DATA support in lseek" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
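From the userspace side, SEEK_DATA and SEEK_HOLE are ordinary lseek() whence values. The following is a minimal sketch, not taken from the fuse patch: the helper name first_data_offset() and the /tmp path template are hypothetical, and the fallback defines assume the Linux values 3 and 4. On a filesystem that does not track holes, the kernel may simply report offset 0 as data.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for headers that hide these unless _GNU_SOURCE is seen early. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: create an unlinked temporary file of `size` bytes
 * holding one byte of data at `data_off`, then ask lseek() where the
 * first data region starts. Returns that offset, or -1 on error.
 */
off_t first_data_offset(off_t size, off_t data_off)
{
    char path[] = "/tmp/seekdata-XXXXXX";
    off_t pos = -1;
    int fd;

    assert(data_off >= 0 && data_off < size);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* file disappears once fd is closed */

    /* ftruncate() leaves a hole; pwrite() puts data in the middle of it. */
    if (ftruncate(fd, size) == 0 && pwrite(fd, "x", 1, data_off) == 1)
        pos = lseek(fd, 0, SEEK_DATA);

    close(fd);
    return pos;
}
```

The returned offset is at most data_off: a filesystem that tracks holes reports the start of the block containing the written byte, while one that does not reports 0.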
2016-01-21drm/i915: Seal busy-ioctl uABI and prevent leaking of internal idsChris Wilson1-4/+29
Tvrtko was looking through the execbuffer-ioctl and noticed that the uABI was tightly coupled to our internal engine identifiers. Close inspection also revealed that we leak those internal engine identifiers through the busy-ioctl, and those internal identifiers already do not match the user identifiers. Fortuitously, there is only one user of the set of busy rings from the busy-ioctl, and they only wish to choose between the RENDER and the BLT engines. Let's fix the userspace ABI while we still can. v2: Update the uAPI documentation to explain the identifiers. Signed-off-by: Chris Wilson <[email protected]> Testcase: igt/gem_busy Cc: Tvrtko Ursulin <[email protected]> Cc: Daniel Vetter <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Acked-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2016-01-20epoll: add EPOLLEXCLUSIVE flagJason Baron1-0/+3
Currently, epoll file descriptors or epfds (the fd returned from epoll_create[1]()) that are added to a shared wakeup source are always added in a non-exclusive manner. This means that when we have multiple epfds attached to a shared fd source they are all woken up. This creates thundering herd type behavior. Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the 'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation. This new flag allows for exclusive wakeups when there are multiple epfds attached to a shared fd event source. The implementation walks the list of exclusive waiters, and queues an event to each epfd, until it finds the first waiter that has threads blocked on it via epoll_wait(). The idea is to search for threads which are idle and ready to process the wakeup events. Thus, we queue an event to at least 1 epfd, but may still potentially queue an event to all epfds that are attached to the shared fd source. Performance testing was done by Madars Vitolins using a modified version of Enduro/X. The use of the 'EPOLLEXCLUSIVE' flag reduced the run time of this particular workload from 860s down to 24s. Sample epoll_ctl text: EPOLLEXCLUSIVE Sets an exclusive wakeup mode for the epfd file descriptor that is being attached to the target file descriptor, fd. Thus, when an event occurs and multiple epfd file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more epfds will receive an event with epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not set) is for all epfds to receive an event. EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
Signed-off-by: Jason Baron <[email protected]> Tested-by: Madars Vitolins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Al Viro <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Eric Wong <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hagen Paul Pfeifer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
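The wakeup behaviour described in the epoll commit above can be observed from userspace with a short sketch. This is an illustration under stated assumptions, not code from the patch: the helper name count_notified_epfds() and the MAX_EPFDS limit are hypothetical, and the fallback define assumes the uapi value (1 << 28). The helper attaches several epoll instances to one pipe with EPOLLEXCLUSIVE, makes the pipe readable, and counts how many instances report an event.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for older libc headers; the value matches the uapi definition. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

#define MAX_EPFDS 8   /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: attach `n` epoll instances to the read end of one
 * pipe with EPOLLEXCLUSIVE, write one byte, and count how many instances
 * report an event. Returns -1 on any setup error.
 */
int count_notified_epfds(int n)
{
    int pipefd[2];
    int epfds[MAX_EPFDS];
    int created = 0, notified = -1;

    if (n < 1 || n > MAX_EPFDS || pipe(pipefd) < 0)
        return -1;

    for (; created < n; created++) {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };

        epfds[created] = epoll_create1(0);
        if (epfds[created] < 0)
            goto out;
        /* EPOLLEXCLUSIVE is only accepted with EPOLL_CTL_ADD. */
        if (epoll_ctl(epfds[created], EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
            perror("epoll_ctl(EPOLL_CTL_ADD, EPOLLEXCLUSIVE)");
            close(epfds[created]);
            goto out;
        }
    }

    /* One event on the shared source. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out;

    notified = 0;
    for (int i = 0; i < n; i++) {
        struct epoll_event got;

        /* Timeout 0: poll each epfd without blocking. */
        if (epoll_wait(epfds[i], &got, 1, 0) == 1)
            notified++;
    }
    assert(notified <= n);
out:
    for (int i = 0; i < created; i++)
        close(epfds[i]);
    close(pipefd[0]);
    close(pipefd[1]);
    return notified;
}
```

Because no thread is blocked in epoll_wait() when the byte is written, the count may be anywhere from 1 to n, which matches the "at least 1 epfd, but may still potentially queue an event to all epfds" wording of the commit message. The benefit shows up when worker threads are actually blocked on their epfds.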
2016-01-17Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: - EVM gains support for loading an x509 cert from the kernel (EVM_LOAD_X509), into the EVM trusted kernel keyring. - Smack implements 'file receive' process-based permission checking for sockets, rather than just depending on inode checks. - Misc enhancements for TPM & TPM2. - Cleanups and bugfixes for SELinux, Keys, and IMA. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits) selinux: Inode label revalidation performance fix KEYS: refcount bug fix ima: ima_write_policy() limit locking IMA: policy can be updated zero times selinux: rate-limit netlink message warnings in selinux_nlmsg_perm() selinux: export validatetrans decisions gfs2: Invalid security labels of inodes when they go invalid selinux: Revalidate invalid inode security labels security: Add hook to invalidate inode security labels selinux: Add accessor functions for inode->i_security security: Make inode argument of inode_getsecid non-const security: Make inode argument of inode_getsecurity non-const selinux: Remove unused variable in selinux_inode_init_security keys, trusted: seal with a TPM2 authorization policy keys, trusted: select hash algorithm for TPM2 chips keys, trusted: fix: *do not* allow duplicate key options tpm_ibmvtpm: properly handle interrupted packet receptions tpm_tis: Tighten IRQ auto-probing tpm_tis: Refactor the interrupt setup tpm_tis: Get rid of the duplicate IRQ probing code ...
2016-01-17Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds26-519/+1020
Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.5. I don't think I've missed anything too major, I'm mostly back at work now but I'll probably get some sleep in 5 years time. Summary: New drivers: - etnaviv: GPU driver for the 3D core on the Vivante core used in numerous ARM boards. Highlights: Core: - Atomic suspend/resume helpers - Move the headers to using userspace friendlier types. - Documentation updates - Lots of struct_mutex removal. - Bunch of DP MST fixes from AMD. Panel: - More DSI helpers - Support for some new basic panels i915: - Basic Kabylake support - DP link training and detect code refactoring - fbc/psr fixes - FIFO underrun fixes - SDE interrupt handling fixes - dma-buf/fence support in pageflip path. - GPU side for MST audio support radeon/amdgpu: - Drop UMS support - GPUVM/Scheduler optimisations - Initial Powerplay support for Tonga/Fiji/CZ/ST - ACP audio prerequisites nouveau: - GK20a instmem improvements - PCIE link speed change support msm: - DSI support for msm8960/apq8064 tegra: - Host1X support for Tegra210 SoC vc4: - 3D acceleration support armada: - Get rid of struct mutex tda998x: - Atomic modesetting support - TMDS clock limitations omapdrm: - Atomic modesetting support - improved TILER performance rockchip: - RK3036 VOP support - Atomic modesetting support - Synopsys DW MIPI DSI support exynos: - Runtime PM support - of_graph binding for DP panels - Cleanup of IPP code - Configurable plane support - Kernel panic fixes at release time" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (711 commits) drm/fb_cma_helper: Remove implicit call to disable_unused_functions drm/amdgpu: add missing irq.h include drm/vmwgfx: Fix a width / pitch mismatch on framebuffer updates drm/vmwgfx: Fix an incorrect lock check drm: nouveau: fix nouveau_debugfs_init prototype drm/nouveau/pci: fix check in nvkm_pcie_set_link drm/amdgpu: validate duplicates first drm/amdgpu: move VM page tables to the LRU end on CS v2 
drm/ttm: add ttm_bo_move_to_lru_tail function v2 drm/ttm: fix adding foreign BOs to the swap LRU drm/ttm: fix adding foreign BOs to the LRU during init v2 drm/radeon: use kobj_to_dev() drm/amdgpu: use kobj_to_dev() drm/amdgpu/cz: force vce clocks when sclks are forced drm/amdgpu/cz: force uvd clocks when sclks are forced drm/amdgpu/cz: add code to enable forcing VCE clocks drm/amdgpu/cz: add code to enable forcing UVD clocks drm/amdgpu: fix lost sync_to if scheduler is enabled. drm/amd/powerplay: fix static checker warning for return meaningless value. drm/sysfs: use kobj_to_dev() ...
2016-01-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge second patch-bomb from Andrew Morton: - more MM stuff: - Kirill's page-flags rework - Kirill's now-allegedly-fixed THP rework - MADV_FREE implementation - DAX feature work (msync/fsync). This isn't quite complete but DAX is new and it's good enough and the guys have a handle on what needs to be done - I expect this to be wrapped in the next week or two. - some vsprintf maintenance work - various other misc bits * emailed patches from Andrew Morton <[email protected]>: (145 commits) printk: change recursion_bug type to bool lib/vsprintf: factor out %pN[F] handler as netdev_bits() lib/vsprintf: refactor duplicate code to special_hex_number() printk-formats.txt: remove unimplemented %pT printk: help pr_debug and pr_devel to optimize out arguments lib/test_printf.c: test dentry printing lib/test_printf.c: add test for large bitmaps lib/test_printf.c: account for kvasprintf tests lib/test_printf.c: add a few number() tests lib/test_printf.c: test precision quirks lib/test_printf.c: check for out-of-bound writes lib/test_printf.c: don't BUG lib/kasprintf.c: add sanity check to kvasprintf lib/vsprintf.c: warn about too large precisions and field widths lib/vsprintf.c: help gcc make number() smaller lib/vsprintf.c: expand field_width to 24 bits lib/vsprintf.c: eliminate potential race in string() lib/vsprintf.c: move string() below widen_string() lib/vsprintf.c: pull out padding code from dentry_name() printk: do cond_resched() between lines while outputting to consoles ...
2016-01-17Merge tag 'sound-4.5-rc1' of ↵Linus Torvalds3-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "We've had quite busy weeks in this cycle. Looking at ALSA core, the significant changes are a few fixes wrt timer and sequencer ioctls that have been revealed by fuzzer recently. Other than that, ASoC core got a few updates about DAI link handling, but these are rather straightforward refactoring. In drivers scene, ASoC received quite lots of new drivers in addition to bunch of updates for still ongoing Intel Skylake support and topology API. HD-audio gained a new HDMI/DP hotplug notification via component. FireWire got a pile of code refactoring/updates with SCS.1x driver integration. More highlights are shown below. [ NOTE: this contains also many commits for DRM. This is due to the pull of drm stable branch into sound tree, as the base of i915 audio component work for HD-audio. The highlights below don't contain these DRM changes, as these are supposed to be pulled via drm tree in anyway sooner or later. 
] Core: - Handful fixes to harden ALSA timer and sequencer ioctls against races reported by syzkaller fuzzer - Irq description string can be unique to each card; only for HD-audio for now ASoC: - Conversion of the array of DAI links to a list for supporting dynamically adding and removing DAI links - Topology API enhancements to make everything more component based and being able to specify PCM links via topology - Some more fixes for the topology code, though it is still not final and ready for enabling in production; we really need to get to the point where that can be done - A pile of changes for Intel SkyLake drivers which hopefully deliver some useful initial functionality for systems with this chipset, though there is more work still to come - Lots of new features and cleanups for the Renesas drivers - ANC support for WM5110 - New drivers: Imagination Technologies IPs, Atmel class D speaker, Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP - Rename PCM1792a driver to be generic pcm179x HD-Audio: - Use audio component for i915 HDMI/DP hotplug handling - On-demand binding with i915 driver - bdl_pos_adj parameter adjustment for Baytrail controllers - Enable power_save_node for CX20722; this shouldn't lead to regression, hopefully - Kabylake HDMI/DP codec support - Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell machines - A few code refactoring FireWire: - Lots of code cleanup and refactoring - Integrate the support of SCS.1x devices into snd-oxfw driver; snd-scs1x driver is obsoleted USB-audio: - Fix possible NULL dereference at disconnection - A regression fix for Native Instruments devices Misc: - A few code cleanups of fm801 driver" * tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (722 commits) ALSA: timer: Code cleanup ALSA: timer: Harden slave timer list handling ALSA: hda - Add fixup for Dell Latitidue E6540 ALSA: timer: Fix race among timer 
ioctls ALSA: hda - add codec support for Kabylake display audio codec ALSA: timer: Fix double unlink of active_list ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices ALSA: hda - fix the headset mic detection problem for a Dell laptop ALSA: hda - Fix white noise on Dell Latitude E5550 ALSA: hda_intel: add card number to irq description ALSA: seq: Fix race at timer setup and close ALSA: seq: Fix missing NULL check at remove_events ioctl ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect ASoC: hdac_hdmi: remove unused hdac_hdmi_query_pin_connlist ASoC: AMD: Add missing include file ALSA: hda - Fixup inverted internal mic for Lenovo E50-80 ALSA: usb: Add native DSD support for Oppo HA-1 ASoC: Make aux_dev more like a generic component ASoC: bcm2835: cleanup includes by ordering them alphabetically ASoC: AMD: Manage ACP 2.x SRAM banks power ...
2016-01-15arch/*/include/uapi/asm/mman.h: : let MADV_FREE have same value for all ↵Chen Gang1-1/+1
architectures For uapi, macros should have the same value on all architectures wherever possible. MADV_FREE was added to mainline only recently, so it can still be redefined. At present, '8' is available on all architectures, so redefine MADV_FREE to '8'. [[email protected]: correct uniform value of MADV_FREE] Signed-off-by: Chen Gang <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: David S. Miller <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Shaohua Li <[email protected]> Cc: <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Daniel Micay <[email protected]> Cc: Jason Evans <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mika Penttil <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Russell King <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Will Deacon <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15mm: support madvise(MADV_FREE)Minchan Kim1-0/+1
Linux has no way to free pages lazily, while other operating systems already support this through madvise(MADV_FREE).

The gain is clear: under memory pressure the kernel can discard freed pages rather than swapping them out or triggering OOM.

Without memory pressure, freed pages are reused by userspace without any additional overhead (e.g. page fault + allocation + zeroing).

Jason Evans said:

: Facebook has been using MAP_UNINITIALIZED
: (https://lkml.org/lkml/2012/1/18/308) in some of its applications for
: several years, but there are operational costs to maintaining this
: out-of-tree in our kernel and in jemalloc, and we are anxious to retire it
: in favor of MADV_FREE. When we first enabled MAP_UNINITIALIZED it
: increased throughput for much of our workload by ~5%, and although the
: benefit has decreased using newer hardware and kernels, there is still
: enough benefit that we cannot reasonably retire it without a replacement.
:
: Aside from Facebook operations, there are numerous broadly used
: applications that would benefit from MADV_FREE. The ones that immediately
: come to mind are redis, varnish, and MariaDB. I don't have much insight
: into Android internals and development process, but I would hope to see
: MADV_FREE support eventually end up there as well to benefit applications
: linked with the integrated jemalloc.
:
: jemalloc will use MADV_FREE once it becomes available in the Linux kernel.
: In fact, jemalloc already uses MADV_FREE or equivalent everywhere it's
: available: *BSD, OS X, Windows, and Solaris -- every platform except Linux
: (and AIX, but I'm not sure it even compiles on AIX). The lack of
: MADV_FREE on Linux forced me down a long series of increasingly
: sophisticated heuristics for madvise() volume reduction, and even so this
: remains a common performance issue for people using jemalloc on Linux.
: Please integrate MADV_FREE; many people will benefit substantially.

How it works:

When the madvise syscall is called, the VM clears the dirty bit of the ptes in the range. If memory pressure happens, the VM checks the dirty bit in the page table; if it is still "clean", the page is a "lazyfree" page, so the VM can discard it instead of swapping it out. If there was a store to the page before the VM picks it for reclaim, the dirty bit is set, so the VM swaps the page out instead of discarding it.

One thing to note is that MADV_FREE relies on the dirty bit in the page table entry to decide whether the VM may discard the page. In other words, if the page table entry has the dirty bit set, the VM must not discard the page.

However, if for example a swap-in by read fault happens, the page table entry does not have the dirty bit set, so MADV_FREE could wrongly discard the page.

To avoid the problem, MADV_FREE did additional checks with PageDirty and PageSwapCache. This worked because a swapped-in page lives in the swap cache, and once it is evicted from the swap cache the page has the PG_dirty flag. So checking both page flags effectively prevents wrong discarding by MADV_FREE.

However, a problem with the above logic is that a swapped-in page still has PG_dirty after it is removed from the swap cache, so the VM can no longer consider the page freeable even if madvise_free is called in the future.

Look at the example below for details.

    ptr = malloc();
    memset(ptr);
    ..
    ..
    .. heavy memory pressure so all of pages are swapped out
    ..
    ..
    var = *ptr; -> a page swapped-in and could be removed from
                   swapcache. Then, page table doesn't mark
                   dirty bit and page descriptor includes PG_dirty
    ..
    ..
    madvise_free(ptr); -> It doesn't clear PG_dirty of the page.
    ..
    ..
    ..
    .. heavy memory pressure again.
    .. In this time, VM cannot discard the page because the page
    .. has *PG_dirty*

To solve the problem, this patch clears PG_dirty only if the page is owned exclusively by the current process when madvise is called, because PG_dirty represents the dirtiness of ptes in several processes, so we can clear it only if we own the page exclusively.

The first heavy users would be general-purpose allocators (e.g. jemalloc and tcmalloc, and hopefully glibc will support it); jemalloc and tcmalloc already support the feature on other OSes (e.g. FreeBSD).

    barrios@blaptop:~/benchmark/ebizzy$ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                12
    On-line CPU(s) list:   0-11
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             12
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 2
    Stepping:              3
    CPU MHz:               3200.185
    BogoMIPS:              6400.53
    Virtualization:        VT-x
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    NUMA node0 CPU(s):     0-11

    ebizzy benchmark(./ebizzy -S 10 -n 512)

    Higher avg is better.

     vanilla-jemalloc               MADV_free-jemalloc

    1 thread
    records: 10                     records: 10
    avg:     2961.90                avg:     12069.70
    std:     71.96(2.43%)           std:     186.68(1.55%)
    max:     3070.00                max:     12385.00
    min:     2796.00                min:     11746.00

    2 thread
    records: 10                     records: 10
    avg:     5020.00                avg:     17827.00
    std:     264.87(5.28%)          std:     358.52(2.01%)
    max:     5244.00                max:     18760.00
    min:     4251.00                min:     17382.00

    4 thread
    records: 10                     records: 10
    avg:     8988.80                avg:     27930.80
    std:     1175.33(13.08%)        std:     3317.33(11.88%)
    max:     9508.00                max:     30879.00
    min:     5477.00                min:     21024.00

    8 thread
    records: 10                     records: 10
    avg:     13036.50               avg:     33739.40
    std:     170.67(1.31%)          std:     5146.22(15.25%)
    max:     13371.00               max:     40572.00
    min:     12785.00               min:     24088.00

    16 thread
    records: 10                     records: 10
    avg:     11092.40               avg:     31424.20
    std:     710.60(6.41%)          std:     3763.89(11.98%)
    max:     12446.00               max:     36635.00
    min:     9949.00                min:     25669.00

    32 thread
    records: 10                     records: 10
    avg:     11067.00               avg:     34495.80
    std:     971.06(8.77%)          std:     2721.36(7.89%)
    max:     12010.00               max:     38598.00
    min:     9002.00                min:     30636.00

In summary, MADV_FREE is much faster than MADV_DONTNEED: in the runs above, average ebizzy throughput with MADV_FREE is roughly 2.6x to 4.1x that of vanilla jemalloc.

This patch (of 12):

Add core MADV_FREE implementation.

[[email protected]: small cleanups] Signed-off-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Mika Penttil <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Jason Evans <[email protected]> Cc: Daniel Micay <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Shaohua Li <[email protected]> Cc: <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: "Shaohua Li" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Gang <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: David S. Miller <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Russell King <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Will Deacon <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-15Merge tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds1-0/+9
Pull VFIO updates from Alex Williamson: - Fixes in AMD xgbe reset, spapr structure padding, type 1 flags (Dan Carpenter, Alexey Kardashevskiy, Pierre Morel) - Re-introduce no-iommu mode, with a user this time (Alex Williamson) * tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio: vfio/iommu_type1: make use of info.flags vfio: Include No-IOMMU mode vfio: Add explicit alignments in vfio_iommu_spapr_tce_create VFIO: platform: reset: fix a warning message condition
2016-01-15Merge tag 'md/4.5' of git://neil.brown.name/mdLinus Torvalds1-2/+2
Pull md updates from Neil Brown: "Mostly clustered-raid1 and raid5 journal updates, plus one Y2038 fix and other minor stuff. One patch removes me from the MAINTAINERS file and adds a record of my md maintainership to Credits" Many thanks to Neil, who has been around for a _looong_ time. * tag 'md/4.5' of git://neil.brown.name/md: (26 commits) md/raid: only permit hot-add of compatible integrity profiles Remove myself as MD Maintainer, and add to Credits. raid5-cache: handle journal hotadd in quiesce MD: add journal with array suspended md: set MD_HAS_JOURNAL in correct places md: Remove 'ready' field from mddev. md: remove unnecesary md_new_event_inintr raid5: allow r5l_io_unit allocations to fail raid5-cache: use a mempool for the metadata block raid5-cache: use a bio_set raid5-cache: add journal hot add/remove support drivers: md: use ktime_get_real_seconds() md: avoid warning for 32-bit sector_t raid5-cache: free meta_page earlier raid5-cache: simplify r5l_move_io_unit_list md: update comment for md_allow_write md-cluster: update comments for MD_CLUSTER_SEND_LOCKED_ALREADY md-cluster: Protect communication with mutexes md-cluster: Defer MD reloading to mddev->thread md-cluster: update the documentation ...
2016-01-13Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+10
Pull first round of SCSI updates from James Bottomley: "This includes driver updates from the usual suspects (bfa, arcmsr, scsi_dh_alua, lpfc, storvsc, cxlflash). The major change is the addition of the hisi_sas driver, which is an ARM platform device for SAS. The other change of note is an enormous style transformation to the atp870u driver (which is our worst written SCSI driver)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (169 commits) cxlflash: Enable device id for future IBM CXL adapter cxlflash: Resolve oops in wait_port_offline cxlflash: Fix to resolve cmd leak after host reset cxlflash: Removed driver date print cxlflash: Fix to avoid virtual LUN failover failure cxlflash: Fix to escalate LINK_RESET also on port 1 storvsc: Tighten up the interrupt path storvsc: Refactor the code in storvsc_channel_init() storvsc: Properly support Fibre Channel devices storvsc: Fix a bug in the layout of the hv_fc_wwn_packet mvsas: Add SGPIO support to Marvell 94xx mpt3sas: A correction in unmap_resources hpsa: Add box and bay information for enclosure devices hpsa: Change SAS transport devices to bus 0. hpsa: fix path_info_show cciss: print max outstanding commands as a hex value scsi_debug: Increase the reported optimal transfer length lpfc: Update version to 11.0.0.10 for upstream patch set lpfc: Use kzalloc instead of kmalloc lpfc: Delete unnecessary checks before the function call "mempool_destroy" ...
2016-01-13Merge tag 'libnvdimm-for-4.5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has appeared in -next and independently received a build success notification from the kbuild robot. The 'for-4.5/block- dax' topic branch was rebased over the weekend to drop the "block device end-of-life" rework that Al would like to see re-implemented with a notifier, and to address bug reports against the badblocks integration. There is pending feedback against "libnvdimm: Add a poison list and export badblocks" received last week. Linda identified some localized fixups that we will handle incrementally. Summary: - Media error handling: The 'badblocks' implementation that originated in md-raid is up-levelled to a generic capability of a block device. This initial implementation is limited to being consulted in the pmem block-i/o path. Later, 'badblocks' will be consulted when creating dax mappings. - Raw block device dax: For virtualization and other cases that want large contiguous mappings of persistent memory, add the capability to dax-mmap a block device directly. - Increased /dev/mem restrictions: Add an option to treat all io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is actively using an address range. This behavior is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the existing "iomem=relaxed" kernel command line option. 
- Miscellaneous fixes include a 'pfn'-device huge page alignment fix, block device shutdown crash fix, and other small libnvdimm fixes" * tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits) block: kill disk_{check|set|clear|alloc}_badblocks libnvdimm, pmem: nvdimm_read_bytes() badblocks support pmem, dax: disable dax in the presence of bad blocks pmem: fail io-requests to known bad blocks libnvdimm: convert to statically allocated badblocks libnvdimm: don't fail init for full badblocks list block, badblocks: introduce devm_init_badblocks block: clarify badblocks lifetime badblocks: rename badblocks_free to badblocks_exit libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h libnvdimm: Add a poison list and export badblocks nfit_test: Enable DSMs for all test NFITs md: convert to use the generic badblocks code block: Add badblock management for gendisks badblocks: Add core badblock management code block: fix del_gendisk() vs blkdev_ioctl crash block: enable dax for raw block devices block: introduce bdev_file_inode() restrict /dev/mem to idle io memory ranges arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug ...
2016-01-13Merge tag 'media/v4.5-2' of ↵Linus Torvalds1-18/+210
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull second batch of media updates from Mauro Carvalho Chehab: "This is the second part of the media patches. It contains the media controller next generation patches, which is the result of one year of discussions and development. It also contains patches to enable media controller support at the DVB subsystem. The goal is to improve the media controller to allow proper support for other types of Video4Linux devices (radio and TV ones) and to extend the media controller functionality to allow it to be used by other subsystems like DVB, ALSA and IIO. In order to use the new functionality, a new ioctl is needed (MEDIA_IOC_G_TOPOLOGY). As we're still discussing how to pack the struct fields of this ioctl in order to avoid compat32 issues, I decided to add a patch at the end of this series commenting out the new ioctl, in order to postpone the addition of the new ioctl to the next Kernel version (4.6). With that, no userspace visible changes should happen at the media controller API, as the existing ioctls are untouched. 
Yet, it helps DVB, ALSA and IIO developers to develop and test the patches adding media controller support there, as the core will contain all required internal changes to allow adding support for devices that belong to those subsystems" * tag 'media/v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (177 commits) [media] Postpone the addition of MEDIA_IOC_G_TOPOLOGY [media] mxl111sf: Add a tuner entity [media] dvbdev: create links on devices with multiple frontends [media] media-entitiy: add a function to create multiple links [media] dvb-usb-v2: postpone removal of media_device [media] dvbdev: Add RF connector if needed [media] dvbdev: remove two dead functions if !CONFIG_MEDIA_CONTROLLER_DVB [media] call media_device_init() before registering the V4L2 device [media] uapi/media.h: Use u32 for the number of graph objects [media] media-entity: don't sleep at media_device_register_entity() [media] media-entity: increase max number of PADs [media] media-entity.h: document the remaining functions [media] media-device.h: use just one u32 counter for object ID [media] media-entity.h fix documentation for several parameters [media] DocBook: document media_entity_graph_walk_cleanup() [media] move documentation to the header files [media] media: Move MEDIA_ENTITY_MAX_PADS from media-entity.h to media-entity.c [media] media: Remove pre-allocated entity enumeration bitmap [media] staging: v4l: davinci_vpbe: Use the new media graph walk interface [media] staging: v4l: omap4iss: Use the new media graph walk interface ...
2016-01-13Merge branch 'for-linus' of ↵Linus Torvalds2-3/+78
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - new driver for eGalaxTouch serial touchscreen - new driver for TS-4800 touchscreen - an update for Goodix touchscreen driver - PS/2 mouse module was reworked to limit number of protocols we try on pass-through ports to speed up their detection time - wacom_w8001 touchscreen driver now reports pen and touch via separate instances of input devices - other driver changes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (42 commits) Input: elantech - mark protocols v2 and v3 as semi-mt Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE Input: gpio-keys - fix check for disabling unsupported keys Input: omap-keypad - remove dead check Input: ti_am335x_tsc - fix HWPEN interrupt handling Input: omap-keypad - set tasklet data earlier Input: rohm_bu21023 - fix handling of retrying firmware update Input: ALPS - report v3 pinnacle trackstick device only if is present Input: ALPS - detect trackstick presence for v7 protocol Input: pcap_ts - use to_delayed_work Input: bma150 - constify bma150_cfg structure Input: i8042 - add Fujitsu Lifebook U745 to the nomux list Input: egalax_ts_serial - fix potential NULL dereference on error Input: uinput - sanity check on ff_effects_max and EV_FF Input: uinput - rework ABS validation Input: uinput - add new UINPUT_DEV_SETUP and UI_ABS_SETUP ioctl Input: goodix - use "inverted_[xy]" flags instead of "rotated_screen" Input: goodix - add axis swapping and axis inversion support Input: goodix - use goodix_i2c_write_u8 instead of i2c_master_send Input: goodix - add power management support ...
2016-01-13uapi: update install list after nvme.h renameMike Frysinger1-1/+1
Commit 9d99a8dda154 ("nvme: move hardware structures out of the uapi version of nvme.h") renamed nvme.h to nvme_ioctl.h, but the uapi list still refers to nvme.h. People trying to install the headers hit a failure as the header no longer exists. Cc: [email protected] Signed-off-by: Mike Frysinger <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>