aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2017-02-23Merge tag 'for-linus' of ↵Linus Torvalds5-10/+95
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull Mellanox rdma updates from Doug Ledford: "Mellanox specific updates for 4.11 merge window Because the Mellanox code required being based on a net-next tree, I keept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3. This branch contains: - Various mlx4 and mlx5 fixes and minor changes - Support for adding a tag match rule to flow specs - Support for cvlan offload operation for raw ethernet QPs - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code) - Implicit On-Demand Paging memory registration support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits) IB/mlx5: Fix configuration of port capabilities IB/mlx4: Take source GID by index from HW GID table IB/mlx5: Fix blue flame buffer size calculation IB/mlx4: Remove unused variable from function declaration IB: Query ports via the core instead of direct into the driver IB: Add protocol for USNIC IB/mlx4: Support raw packet protocol IB/mlx5: Support raw packet protocol IB/core: Add raw packet protocol IB/mlx5: Add implicit MR support IB/mlx5: Expose MR cache for mlx5_ib IB/mlx5: Add null_mkey access IB/umem: Indicate that process is being terminated IB/umem: Update on demand page (ODP) support IB/core: Add implicit MR flag IB/mlx5: Support creation of a WQ with scatter FCS offload IB/mlx5: Enable QP creation with cvlan offload IB/mlx5: Enable WQ creation and modification with cvlan offload IB/mlx5: Expose vlan offloads capabilities IB/uverbs: Enable QP creation with cvlan offload ...
2017-02-23blk-mq: use sbq wait queues instead of restart for driver tagsOmar Sandoval1-0/+2
Commit 50e1dab86aa2 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue. One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits: 1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do. 2. It benefits from the existing rolling wakeup code. 3. It avoids punting to another thread just to have it block. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-23block/sed-opal: Introduce free_opal_dev to free the structure and clean up stateScott Bauer1-0/+5
Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend. Signed-off-by: Scott Bauer <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-23Merge branch 'linus' of ↵Linus Torvalds7-12/+70
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Try to catch hash output overrun in testmgr - Introduce walksize attribute for batched walking - Make crypto_xor() and crypto_inc() alignment agnostic Algorithms: - Add time-invariant AES algorithm - Add standalone CBCMAC algorithm Drivers: - Add NEON acclerated chacha20 on ARM/ARM64 - Expose AES-CTR as synchronous skcipher on ARM64 - Add scalar AES implementation on ARM64 - Improve scalar AES implementation on ARM - Improve NEON AES implementation on ARM/ARM64 - Merge CRC32 and PMULL instruction based drivers on ARM64 - Add NEON acclerated CBCMAC/CMAC/XCBC AES on ARM64 - Add IPsec AUTHENC implementation in atmel - Add Support for Octeon-tx CPT Engine - Add Broadcom SPU driver - Add MediaTek driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (142 commits) crypto: xts - Add ECB dependency crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - remove dead MSI-X related define crypto: brcm - Avoid double free in ahash_finup() crypto: cavium - fix Kconfig dependencies crypto: cavium - cpt_bind_vq_to_grp could return an error code crypto: doc - fix typo hwrng: omap - update Kconfig help description crypto: ccm - drop unnecessary minimum 32-bit alignment crypto: ccm - honour alignmask of subordinate MAC cipher crypto: caam - fix state buffer DMA (un)mapping crypto: caam - abstract ahash request double buffering crypto: caam - fix error path for ctx_dma mapping failure crypto: caam - fix DMA API leaks for multiple setkey() calls crypto: caam - don't dma_map key for hash algorithms crypto: caam - use dma_map_sg() return code crypto: caam - replace sg_count() with sg_nents_for_len() crypto: caam - check sg_count() return value crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc() ..
2017-02-23Merge tag 'rpmsg-v4.11' of git://github.com/andersson/remoteprocLinus Torvalds3-4/+50
Pull rpmsg updates from Bjorn Andersson: "This introduces an interface for user space to directly communicate on rpmsg endpoints without the implementation of specific kernel drivers, which is useful for e.g. debug channels: * tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc: rpmsg: rpmsg_create_ept() returns NULL on error rpmsg: qcom: smd: Return positively when not enabled rpmsg: unlock on error in rpmsg_eptdev_read() rpmsg: char: add CONFIG_NET dependency rpmsg: smd: Register rpmsg user space interface for edges rpmsg: Driver for user space endpoint interface rpmsg: qcom_smd: Implement endpoint "poll" rpmsg: Introduce "poll" to endpoint ops rpmsg: qcom_smd: Add support for "label" property
2017-02-23Merge tag 'rproc-v4.11' of git://github.com/andersson/remoteprocLinus Torvalds2-3/+21
Pull remoteproc updates from Bjorn Andersson: "This introduces support for booting the dedicated sensor core in the Qualcomm MSM8996, updates the Qualcomm ADSP and Hexagon drivers to utilize SMD subdevice helpers for properly handle shutdowns and restarts of the remoteproc, add virtio support to the ST remoteproc and refactor the Qualcomm Hexagon driver to handle variations between platforms. The support code for parsing, loading and authenticating Qualcomm firmware files (MDT) is refactored and move to drivers/soc/qcom, to allow for non-remoteproc drivers to utilize this. Finally it brings some cleanups to the remoteproc core" * tag 'rproc-v4.11' of git://github.com/andersson/remoteproc: (27 commits) remoteproc: qcom: mdt_loader: Use signed type for offset remoteproc: st: add virtio communication support remoteproc: st: correct probe error management remoteproc: Modify the function names remoteproc: Reduce asynchronous request_firmware to auto-boot only remoteproc: Drop qcom_scm_pas_supported() from adsp_probe() MAINTAINERS: Add missing rpmsg include path remoteproc: qcom: Use common SMD edge handler remoteproc: qcom: wcnss: Make SMD handling common remoteproc: Move qcom_mdt_loader into drivers/soc/qcom remoteproc: qcom: mdt_loader: Refactor MDT loader remoteproc: qcom: mdt_loader: Don't overwrite firmware object remoteproc: qcom: Extract non-mdt related helper remoteproc: qcom: q6v5: Decouple driver from MDT loader remoteproc: qcom: q6v5: Remove mss supply from 8916 remoteproc: qcom: fix initializers for qcom_mss_reg_res array remoteproc: Drop firmware_loading_complete remoteproc: Add RPROC_DELETED state remoteproc: Move rproc_delete_debug_dir() to rproc_del() remoteproc: qcom: Add SLPI rproc support to load and boot slpi proc. ...
2017-02-23Merge tag 'sound-4.11-rc1' of ↵Linus Torvalds11-48/+112
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "Here is the update of sound bits for 4.11: again at this time, no big changes in ALSA and ASoC core but only cosmetic changes like consitifaction. Meanwhile, quite a lot of developments are seen in a few driver side. ALSA Core: - Clean up, consitification of some ops HD-audio: - A slight behavior change of single_cmd option - Quirks for AmigaOne X1000, Samsung Ativ Book 8, Dell AiO, ALC221 HP, and fixes for Lewisburg controller - Realtek ALC299, ALC1220 codecs Others: - USB-audio: Tascam US-16x08 DSP mixer quirk - Intel HDMI LPE audio support for Baytrail / Cherrytrail; this contains some updates in drm/i915 for the new platform binding ASoC: - Lots of updates in Intel drivers, mostly for DisplayPort and HDMI on Skylake and onwards, as well as more Baytrail / Cherrytrail boards support - Channel mapping support for HDMI - Support for AllWinner A31 and A33, Everest Semiconductor ES8328, Nuvoton NAU8540. * tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (323 commits) ALSA: usb-audio: Tidy up mixer_us16x08.c ALSA: usb-audio: Fix memory leak and corruption in mixer_us16x08.c ALSA: usb-audio: purge needless variable length array ALSA: x86: hdmi: select CONFIG_SND_PCM ALSA: x86: Don't enable runtime PM as default ALSA: x86: Use runtime PM autosuspend ALSA: usb-audio: localize function without external linkage ALSA: usb-audio: localize one-referrer variable ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk ALSA: emu10k1: constify snd_emux_operators structure ASoC: sun4i-spdif: drop unnessary snd_soc_unregister_component() ASoC: Intel: bxt: Add jack port initialize in bxt_rt298 machine ASoC: nau8825: automatic BCLK and LRC divde in master mode ASoC: hdac_hdmi: Add device id for Geminilake ASoC: Intel: Skylake: Add Geminlake IDs ASoC: rt298: Add DMI match for Geminilake reference platform ASoC: Intel: Skylake: Check device type to get endpoint configuration ASoC: Intel: bxt: Add jack port initialize in da7219_max98357a machine ASoC: Intel: Skylake: Add jack port initialize in nau88l25_ssm4567 machine ASoC: Intel: Skylake: Add jack port initialize in nau88l25_max98357a machine ...
2017-02-23Merge tag 'gpio-v4.11-1' of ↵Linus Torvalds2-13/+57
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.11 cycle Core changes: - Augment fwnode_get_named_gpiod() to configure the GPIO pin immediately after requesting it like all other APIs do. This is a treewide change also updating all users. - Pass a GPIO label down to gpiod_request() from fwnode_get_named_gpiod(). This makes debugfs and the userspace ABI correctly reflect the current in-kernel consumer of a pin taken using this abstraction. This is a treewide change also updating all users. - Rename devm_get_gpiod_from_child() to devm_fwnode_get_gpiod_from_child() to reflect the fact that this function is operating on a fwnode object. This is a treewide change also updating all users. - Make it possible to take multiple GPIOs in a single hog of device tree hogs. - The refactorings switching GPIO chips to use the .set_config() callback using standard pin control properties and providing a backend into the pin control subsystem that were also merged into the pin control tree naturally appear here too. Testing instrumentation: - A whole slew of cleanups and improvements to the mockup GPIO driver. We now have an extended userspace test exercising the subsystem, and we can inject interrupts etc from userspace to fully test the core GPIO functionality. New drivers: - New driver for the Cortina Systems Gemini GPIO controller. - New driver for the Exar XR17V352/354/358 chips. - New driver for the ACCES PCI-IDIO-16 PCI GPIO card. Driver changes: - RCAR: set the irqchip parent device, add fine-grained runtime PM support. - pca953x: support optional RESET control line on the chip. - DaVinci: cleanups and simplifications. Add support for multiple instances. - .set_multiple() and naming of lines on more or less all of the ISA/PCI GPIO controllers. - mcp23s08: refactored to use regmap as a first step to further rewrites and modernizations" * tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits) gpio: reintroduce devm_get_gpiod_from_child() gpio: pci-idio-16: Fix PCI BAR index gpio: pci-idio-16: Fix PCI device ID code gpio: mockup: implement event injecting over debugfs gpio: mockup: add a dummy irqchip gpio: mockup: implement naming the lines gpio: mockup: code shrink gpio: mockup: readability tweaks gpio: Add GPIO support for the ACCES PCI-IDIO-16 gpio: Add the devm_fwnode_get_index_gpiod_from_child() helper gpio: Rename devm_get_gpiod_from_child() gpio: mcp23s08: Select REGMAP/REGMAP_I2C to fix build error gpio: ws16c48: Add support for GPIO names gpio: gpio-mm: Add support for GPIO names gpio: 104-idio-16: Add support for GPIO names gpio: 104-idi-48: Add support for GPIO names gpio: 104-dio-48e: Add support for GPIO names gpio: ws16c48: Remove unnecessary driver_data set gpio: gpio-mm: Remove unnecessary driver_data set gpio: 104-idio-16: Remove unnecessary driver_data set ...
2017-02-23Merge branch 'for-linus' of ↵Linus Torvalds5-119/+7
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry: - a new driver for Zeitech touchscreen controller - a new driver for Samsung "touchkeys" - touchscreen driver for Moorestown platform has been removed because platform support is gone - MPU3050 accelerometer driver was removed in favor of IIO driver - miscellaneous driver cleanup and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits) Input: zet6223 - export OF device ID as module aliases Input: tsc2004/5 - switch to using generic device properties Input: tsc2004/5 - fix regulator handling Input: tsc2005 - add OF device table Input: add driver for Zeitec ZET6223 Input: joydev - do not report stale values on first open Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest Input: synaptics-rmi4 - clean up F30 implementation Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons Input: psmouse - add a custom serio protocol to send extra information Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts() Input: xpad - restore LED state after device resume Input: synaptics-rmi4 - add rmi_find_function() Input: xpad - fix stuck mode button on Xbox One S pad Input: joydev - use clamp() macro Input: refuse to register absolute devices without absinfo Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs Input: synaptics-rmi4 - add sysfs attribute update_fw_status Input: mousedev - stop offering PS/2 to userspace by default Input: tca8418 - switch to using generic device properties ...
2017-02-23Merge tag 'for-linus' of ↵Linus Torvalds16-199/+513
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "First set of updates for 4.11 kernel merge window - Add new Broadcom bnxt_re RoCE driver - rxe driver updates - ioctl cleanups - ETH_P_IBOE declaration cleanup - IPoIB changes - Add port state cache - Allow srpt driver to accept guids as port names in config - Update to hfi1 driver - Update to srp driver - Lots of misc minor changes all over" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (114 commits) RDMA/bnxt_re: fix for "bnxt_en: Update to firmware interface spec 1.7.0." rdma_cm: fail iwarp accepts w/o connection params IB/srp: Drain the send queue before destroying a QP IB/core: Add support for draining IB_POLL_DIRECT completion queues IB/srp: Improve an error path IB/srp: Make a diagnostic message more informative IB/srp: Document locking conventions IB/srp: Fix race conditions related to task management IB/srp: Avoid that duplicate responses trigger a kernel bug IB/SRP: Avoid using IB_MR_TYPE_SG_GAPS RDMA/qedr: Fix some error handling RDMA/bnxt_re: add DCB dependency IB/hns: include linux/module.h IB/vmw_pvrdma: Expose vendor error to ULPs vmw_pvrdma: switch to pci_alloc_irq_vectors IB/hfi1: use size_t for passing array length IB/ipoib: Remove redudant label IB/ipoib: remove the unnecessary memory free IB/mthca: switch to pci_alloc_irq_vectors IB/hfi1: Code reuse with memdup_copy ...
2017-02-23Merge tag 'mfd-for-linus-4.11' of ↵Linus Torvalds6-12/+396
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core Frameworks: - Add new !TOUCHSCREEN_SUN4I dependency for SUN4I_GPADC - List include/dt-bindings/mfd/* to files supported in MAINTAINERS New Drivers: - Intel Apollo Lake SPI NOR - ST STM32 Timers (Advanced, Basic and PWM) - Motorola 6556002 CPCAP (PMIC) New Device Support: - Add support for AXP221 to axp20x - Add support for Intel Gemini Lake to intel-lpss-pci - Add support for MT6323 LED to mt6397-core - Add support for COMe-bBD#, COMe-bSL6, COMe-bKL6, COMe-cAL6 and COMe-cKL6 to kempld-core New Functionality: - Add support for Analog CODAC to sun6i-prcm - Add support for Watchdog to lpc_ich Fix-ups: - Error handling improvements; axp288_charger, axp20x, ab8500-sysctrl - Adapt platform data handling; axp20x - IRQ handling improvements; arizona, axp20x - Remove superfluous code; arizona, axp20x, lpc_ich - Trivial coding style/spelling fixes; axp20x, abx500, mfd.txt - Regmap fix-ups; axp20x - DT changes; mfd.txt, aspeed-lpc, aspeed-gfx, ab8500-core, tps65912, mt6397 - Use new I2C probing mechanism; max77686 - Constification; rk808 Bug Fixes: - Stop data transfer whilst suspended; cros_ec" * tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (43 commits) mfd: lpc_ich: Enable watchdog on Intel Apollo Lake PCH mfd: lpc_ich: Remove useless comments in core part mfd: Add support for several boards to Kontron PLD driver mfd: constify regmap_irq_chip structures MAINTAINERS: Add include/dt-bindings/mfd to MFD entry mfd: cpcap: Add minimal support mfd: mt6397: Add MT6323 LED support into MT6397 driver Documentation: devicetree: Add LED subnode binding for MT6323 PMIC mfd: tps65912: Export OF device ID table as module aliases mfd: ab8500-core: Rename clock device and compatible mfd: cros_ec: Send correct suspend/resume event to EC mfd: max77686: Remove I2C device ID table mfd: max77686: Use the struct i2c_driver .probe_new instead of .probe mfd: max77686: Use of_device_get_match_data() helper mfd: max77686: Don't attempt to get i2c_device_id .data mfd: ab8500-sysctrl: Handle probe deferral mfd: intel-lpss: Add Intel Gemini Lake PCI IDs mfd: axp20x: Fix AXP806 access errors on cold boot mfd: cros_ec: Send suspend state notification to EC mfd: cros_ec: Prevent data transfer while device is suspended ...
2017-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2-1/+5
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Revisit warning logic when not applying default helper assignment. Jiri Kosina considers we are breaking existing setups and not warning our users accordinly now that automatic helper assignment has been turned off by default. So let's make him happy by spotting the warning by when we find a helper but we cannot attach, instead of warning on the former deprecated behaviour. Patch from Jiri Kosina. 2) Two patches to fix regression in ctnetlink interfaces with nfnetlink_queue. Specifically, perform more relaxed in CTA_STATUS and do not bail out if CTA_HELP indicates the same helper that we already have. Patches from Kevin Cernekee. 3) A couple of bugfixes for ipset via Jozsef Kadlecsik. Due to wrong index logic in hash set types and null pointer exception in the list:set type. 4) hashlimit bails out with correct userspace parameters due to wrong arithmetics in the code that avoids "divide by zero" when transforming the userspace timing in milliseconds to token credits. Patch from Alban Browaeys. 5) Fix incorrect NFQA_VLAN_MAX definition, patch from Ken-ichirou MATSUZAWA. 6) Don't not declare nfnetlink batch error list as static, since this may be used by several subsystems at the same time. Patch from Liping Zhang. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-02-23net/mlx4: Spoofcheck and zero MAC can't coexistEugenia Emantayev2-1/+11
Spoofcheck can't be enabled if VF MAC is zero. Vice versa, can't zero MAC if spoofcheck is on. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Eugenia Emantayev <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-23uapi: fix linux/rds.h userspace compilation errorsDmitry V. Levin1-5/+5
Consistently use types from linux/types.h to fix the following linux/rds.h userspace compilation errors: /usr/include/linux/rds.h:198:2: error: unknown type name 'u8' u8 rx_traces; /usr/include/linux/rds.h:199:2: error: unknown type name 'u8' u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX]; /usr/include/linux/rds.h:203:2: error: unknown type name 'u8' u8 rx_traces; /usr/include/linux/rds.h:204:2: error: unknown type name 'u8' u8 rx_trace_pos[RDS_MSG_RX_DGRAM_TRACE_MAX]; /usr/include/linux/rds.h:205:2: error: unknown type name 'u64' u64 rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX]; Fixes: 3289025aedc0 ("RDS: add receive message trace used by application") Signed-off-by: Dmitry V. Levin <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-23uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errorsDmitry V. Levin2-0/+3
Include <linux/in6.h> in uapi/linux/seg6.h to fix the following linux/seg6.h userspace compilation error: /usr/include/linux/seg6.h:31:18: error: array type has incomplete element type 'struct in6_addr' struct in6_addr segments[0]; Include <linux/seg6.h> in uapi/linux/seg6_iptunnel.h to fix the following linux/seg6_iptunnel.h userspace compilation error: /usr/include/linux/seg6_iptunnel.h:26:21: error: array type has incomplete element type 'struct ipv6_sr_hdr' struct ipv6_sr_hdr srh[0]; Fixes: a50a05f497a2 ("ipv6: sr: add missing Kbuild export for header files") Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-23uapi: fix linux/llc.h userspace compilation errorDmitry V. Levin1-0/+1
Include <linux/if.h> to fix the following linux/llc.h userspace compilation error: /usr/include/linux/llc.h:26:27: error: 'IFHWADDRLEN' undeclared here (not in a function) unsigned char sllc_mac[IFHWADDRLEN]; Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-23uapi: fix linux/ip6_tunnel.h userspace compilation errorsDmitry V. Levin1-0/+2
Include <linux/if.h> and <linux/in6.h> to fix the following linux/ip6_tunnel.h userspace compilation errors: /usr/include/linux/ip6_tunnel.h:23:12: error: 'IFNAMSIZ' undeclared here (not in a function) char name[IFNAMSIZ]; /* name of tunnel device */ /usr/include/linux/ip6_tunnel.h:30:18: error: field 'laddr' has incomplete type struct in6_addr laddr; /* local tunnel end-point address */ Signed-off-by: Dmitry V. Levin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds25-146/+753
Merge updates from Andrew Morton: "142 patches: - DAX updates - various misc bits - OCFS2 updates - most of MM" * emailed patches from Andrew Morton <[email protected]>: (142 commits) mm/z3fold.c: limit first_num to the actual range of possible buddy indexes mm: fix <linux/pagemap.h> stray kernel-doc notation zram: remove obsolete sysfs attrs mm/memblock.c: remove unnecessary log and clean up oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA mm: drop unused argument of zap_page_range() mm: drop zap_details::check_swap_entries mm: drop zap_details::ignore_dirty mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled mm: help __GFP_NOFAIL allocations which do not trigger OOM killer mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically mm: consolidate GFP_NOFAIL checks in the allocator slowpath lib/show_mem.c: teach show_mem to work with the given nodemask arch, mm: remove arch specific show_mem mm, page_alloc: warn_alloc print nodemask mm, page_alloc: do not report all nodes in show_mem Revert "mm: bail out in shrink_inactive_list()" mm, vmscan: consider eligible zones in get_scan_count mm, vmscan: cleanup lru size claculations mm, vmscan: do not count freed pages as PGDEACTIVATE ...
2017-02-22Merge tag 'devicetree-for-4.11' of ↵Linus Torvalds2-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: "Pretty standard stuff with dtc upstream sync being the biggest piece. - Sync dtc to upstream commit 0931cea3ba20. This picks up overlay support in dtc. - Set dma_ops for reserved memory users. - Make references to IOMMU consistent in DT bindings. - Cleanup references to pm_power_off in bindings. - Move some display bindings that snuck into the old bindings/video/ path. - Fix some wrong documentation paths caused from binding restructuring. - Vendor prefixes for Faraday and Fujitsu. - Fix an of_node ref counting leak in of_find_node_opts_by_path - Introduce new graph helper of_graph_get_remote_node() which will be used by DRM drivers in 4.12" * tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits) DT: add Faraday Tec. as vendor of: introduce of_graph_get_remote_node of: Add missing space at end of pr_fmt(). of: make of_device_make_bus_id() static of: fix of_node leak caused in of_find_node_opts_by_path dt-bindings: net: remove reference to fixed link support dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off dt-bindings: mfd: as3722: Drop reference to pm_power_off dt-bindings: display: move ANX7814 and SiI8620 bridge bindings of/unittest: Swap arguments of of_unittest_apply_overlay() Documentation: usb: fix wrong documentation paths serial: fsl-imx-uart.txt: Remove generic property devicetree: Add Fujitsu Ltd. vendor prefix Documentation: display: fix wrong documentation paths of: remove redundant memset in overlay bus:qcom : Fix typo in qcom,ebi2.txt dt-bindings: qman: Remove pool channel node Documentation: panel-dpi: fix path to display-timing.txt devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP ...
2017-02-22Merge tag 'docs-4.11' of git://git.lwn.net/linuxLinus Torvalds1-56/+54
Pull documentation updates from Jonathan Corbet: "A slightly quieter cycle for documentation this time around. Three more DocBook template files have been converted to RST; only 21 to go. There are various build improvements and the usual array of documentation improvements and fixes" * tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits) docs / driver-api: Fix structure references in device_link.rst PM / docs: Fix structure references in device.rst Add a target to check broken external links in the Documentation Documentation: Fix linux-api list typo Documentation: DocBook/Makefile comment typo Improve sparse documentation Documentation: make Makefile.sphinx no-ops quieter Documentation: DMA-ISA-LPC.txt Documentation: input: fix path to input code definitions docs: Remove the copyright year from conf.py docs: Fix a warning in the Korean HOWTO.rst translation PM / sleep / docs: Convert PM notifiers document to reST PM / core / docs: Convert sleep states API document to reST PM / core: Update kerneldoc comments in pm.h doc-rst: Fix recursive make invocation from macros doc-rst: Delete output of failed dot-SVG conversion doc-rst: Break shell command sequences on failure Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0 doc-rst: fixed cleandoc target when used with O=dir Documentation/sphinx: prevent generation of .pyc files in the source tree ...
2017-02-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-35/+102
Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...
2017-02-23Merge tag 'v4.10-rc8' into drm-nextDave Airlie42-90/+220
Linux 4.10-rc8 Backmerge Linus rc8 to fix some conflicts, but also to avoid pulling it in via a fixes pull from someone.
2017-02-22Merge tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-11/+11
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.11. We aren't introducing any major features in this release cycle except for this being the first merge window I've managed on my own. :) Changes since last update: - Various cleanups - Livelock fixes for eofblocks scanning - Improved input verification for on-disk metadata - Fix races in the copy on write remap mechanism - Fix buffer io error timeout controls - Streamlining of directio copy on write - Asynchronous discard support - Fix asserts when splitting delalloc reservations - Don't bloat bmbt when right shifting extents - Inode alignment fixes for 32k block sizes" * tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits) xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG xfs: simplify xfs_rtallocate_extent xfs: tune down agno asserts in the bmap code xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment xfs: don't reserve blocks for right shift transactions xfs: fix len comparison in xfs_extent_busy_trim xfs: fix uninitialized variable in _reflink_convert_cow xfs: split indlen reservations fairly when under reserved xfs: handle indlen shortage on delalloc extent merge xfs: resurrect debug mode drop buffered writes mechanism xfs: clear delalloc and cache on buffered write failure xfs: don't block the log commit handler for discards xfs: improve busy extent sorting xfs: improve handling of busy extents in the low-level allocator xfs: don't fail xfs_extent_busy allocation xfs: correct null checks and error processing in xfs_initialize_perag xfs: update ctime and mtime on clone destinatation inodes xfs: allocate direct I/O COW blocks in iomap_begin xfs: go straight to real allocations for direct I/O COW writes xfs: return the converted extent in __xfs_reflink_convert_cow ...
2017-02-22Merge branch 'for-linus' of ↵Linus Torvalds1-6/+15
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1]. - Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4]. - Remove unused parameter by Sergey Senozhatsky. Clean up after a past change. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] [3] http://lkml.kernel.org/r/[email protected] [4] http://lkml.kernel.org/r/[email protected] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop call_console_drivers() unused param printk: convert the rest to printk-safe printk: remove zap_locks() function printk: use printk_safe buffers in printk printk: report lost messages in printk safe/nmi contexts printk: always use deferred printk when flush printk_safe lines printk: introduce per-cpu safe_print seq buffer printk: rename nmi.c and exported api printk: use vprintk_func in vprintk() MAINTAINERS: Add printk maintainers
2017-02-22Merge tag 'modules-for-v4.11' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.11 merge window: - A few small code cleanups - Add modules git tree url to MAINTAINERS" * tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: MAINTAINERS: add tree for modules module: fix memory leak on early load_module() failures module: Optimize search_module_extables() modules: mark __inittest/__exittest as __maybe_unused livepatch/module: print notice of TAINT_LIVEPATCH module: Drop redundant declaration of struct module
2017-02-22mm: fix <linux/pagemap.h> stray kernel-doc notationRandy Dunlap1-1/+0
Delete stray (second) function description in find_lock_page() kernel-doc notation. Note: scripts/kernel-doc just ignores the second function description. Fixes: 2457aec63745e ("mm: non-atomically mark page accessed during page cache allocation where possible") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm: drop unused argument of zap_page_range()Kirill A. Shutemov1-1/+1
There's no users of zap_page_range() who wants non-NULL 'details'. Let's drop it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm: drop zap_details::check_swap_entriesKirill A. Shutemov1-1/+0
detail == NULL would give the same functionality as .check_swap_entries==true. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm: drop zap_details::ignore_dirtyKirill A. Shutemov1-1/+0
The only user of ignore_dirty is oom-reaper. But it doesn't really use it. ignore_dirty only has effect on file pages mapped with dirty pte. But oom-repear skips shared VMAs, so there's no way we can dirty file pte in them. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22lib/show_mem.c: teach show_mem to work with the given nodemaskMichal Hocko1-3/+2
show_mem() allows to filter out node specific data which is irrelevant to the allocation request via SHOW_MEM_FILTER_NODES. The filtering is done in skip_free_areas_node which skips all nodes which are not in the mems_allowed of the current process. This works most of the time as expected because the nodemask shouldn't be outside of the allocating task but there are some exceptions. E.g. memory hotplug might want to request allocations from outside of the allowed nodes (see new_node_page). Get rid of this hardcoded behavior and push the allocation mask down the show_mem path and use it instead of cpuset_current_mems_allowed. NULL nodemask is interpreted as cpuset_current_mems_allowed. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, page_alloc: warn_alloc print nodemaskMichal Hocko1-2/+2
warn_alloc is currently used for to report an allocation failure or an allocation stall. We print some details of the allocation request like the gfp mask and the request order. We do not print the allocation nodemask which is important when debugging the reason for the allocation failure as well. We alreaddy print the nodemask in the OOM report. Add nodemask to warn_alloc and print it in warn_alloc as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: cleanup lru size claculationsMichal Hocko1-1/+1
lruvec_lru_size returns the full size of the LRU list while we sometimes need a value reduced only to eligible zones (e.g. for lowmem requests). inactive_list_is_low is one such user. Later patches will add more of them. Add a new parameter to lruvec_lru_size and allow it filter out zones which are not eligible for the given context. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, thp: add new defer+madvise defrag optionDavid Rientjes1-0/+1
There is no thp defrag option that currently allows MADV_HUGEPAGE regions to do direct compaction and reclaim while all other thp allocations simply trigger kswapd and kcompactd in the background and fail immediately. The "defer" setting simply triggers background reclaim and compaction for all regions, regardless of MADV_HUGEPAGE, which makes it unusable for our userspace where MADV_HUGEPAGE is being used to indicate the application is willing to wait for work for thp memory to be available. The "madvise" setting will do direct compaction and reclaim for these MADV_HUGEPAGE regions, but does not trigger kswapd and kcompactd in the background for anybody else. For reasonable usage, there needs to be a mesh between the two options. This patch introduces a fifth mode, "defer+madvise", that will do direct reclaim and compaction for MADV_HUGEPAGE regions and trigger background reclaim and compaction for everybody else so that hugepages may be available in the near future. A proposal to allow direct reclaim and compaction for MADV_HUGEPAGE regions as part of the "defer" mode, making it a very powerful setting and avoids breaking userspace, was offered: http://marc.info/?t=148236612700003 This additional mode is a compromise. A second proposal to allow both "defer" and "madvise" to be selected at the same time was also offered: http://marc.info/?t=148357345300001. This is possible, but there was a concern that it might break existing userspaces the parse the output of the defrag mode, so the fifth option was introduced instead. This patch also cleans up the helper function for storing to "enabled" and "defrag" since the former supports three modes while the latter supports five and triple_flag_store() was getting unnecessarily messy. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: skip readahead only when swap slot cache is enabledHuang Ying1-0/+2
Because during swap off, a swap entry may have swap_map[] == SWAP_HAS_CACHE (for example, just allocated). If we return NULL in __read_swap_cache_async(), the swap off will abort. So when swap slot cache is disabled, (for swap off), we will wait for page to be put into swap cache in such race condition. This should not be a problem for swap slot cache, because swap slot cache should be drained after clearing swap_slot_cache_enabled. [[email protected]: fix memory leak in __read_swap_cache_async()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/5e2c5f6abe8e6eb0797408897b1bba80938e9b9d.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Tim Chen <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: add cache for swap slots allocationTim Chen2-0/+32
We add per cpu caches for swap slots that can be allocated and freed quickly without the need to touch the swap info lock. Two separate caches are maintained for swap slots allocated and swap slots returned. This is to allow the swap slots to be returned to the global pool in a batch so they will have a chance to be coaelesced with other slots in a cluster. We do not reuse the slots that are returned right away, as it may increase fragmentation of the slots. The swap allocation cache is protected by a mutex as we may sleep when searching for empty slots in cache. The swap free cache is protected by a spin lock as we cannot sleep in the free path. We refill the swap slots cache when we run out of slots, and we disable the swap slots cache and drain the slots if the global number of slots fall below a low watermark threshold. We re-enable the cache agian when the slots available are above a high watermark. [[email protected]: use raw_cpu_ptr over this_cpu_ptr for swap slots access] [[email protected]: add comments on locks in swap_slots.h] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/35de301a4eaa8daa2977de6e987f2c154385eb66.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: free swap slots in batchTim Chen1-0/+1
Add new functions that free unused swap slots in batches without the need to reacquire swap info lock. This improves scalability and reduce lock contention. Link: http://lkml.kernel.org/r/c25e0fcdfd237ec4ca7db91631d3b9f6ed23824e.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: allocate swap slots in batchesTim Chen1-0/+2
Currently, the swap slots are allocated one page at a time, causing contention to the swap_info lock protecting the swap partition on every page being swapped. This patch adds new functions get_swap_pages and scan_swap_map_slots to request multiple swap slots at once. This will reduces the lock contention on the swap_info lock. Also scan_swap_map_slots can operate more efficiently as swap slots often occurs in clusters close to each other on a swap device and it is quicker to allocate them together. Link: http://lkml.kernel.org/r/9fec2845544371f62c3763d43510045e33d286a6.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: skip readahead for unreferenced swap slotsTim Chen1-0/+6
We can avoid needlessly allocating page for swap slots that are not used by anyone. No pages have to be read in for these slots. Link: http://lkml.kernel.org/r/0784b3f20b9bd3aa5552219624cb78dc4ae710c9.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: split swap cache into 64MB trunksHuang, Ying1-2/+9
The patch is to improve the scalability of the swap out/in via using fine grained locks for the swap cache. In current kernel, one address space will be used for each swap device. And in the common configuration, the number of the swap device is very small (one is typical). This causes the heavy lock contention on the radix tree of the address space if multiple tasks swap out/in concurrently. But in fact, there is no dependency between pages in the swap cache. So that, we can split the one shared address space for each swap device into several address spaces to reduce the lock contention. In the patch, the shared address space is split into 64MB trunks. 64MB is chosen to balance the memory space usage and effect of lock contention reduction. The size of struct address_space on x86_64 architecture is 408B, so with the patch, 6528B more memory will be used for every 1GB swap space on x86_64 architecture. One address space is still shared for the swap entries in the same 64M trunks. To avoid lock contention for the first round of swap space allocation, the order of the swap clusters in the initial free clusters list is changed. The swap space distance between the consecutive swap clusters in the free cluster list is at least 64M. After the first round of allocation, the swap clusters are expected to be freed randomly, so the lock contention should be reduced effectively. Link: http://lkml.kernel.org/r/735bab895e64c930581ffb0a05b661e01da82bc5.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Tim Chen <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm/swap: add cluster lockHuang, Ying1-0/+6
This patch is to reduce the lock contention of swap_info_struct->lock via using a more fine grained lock in swap_cluster_info for some swap operations. swap_info_struct->lock is heavily contended if multiple processes reclaim pages simultaneously. Because there is only one lock for each swap device. While in common configuration, there is only one or several swap devices in the system. The lock protects almost all swap related operations. In fact, many swap operations only access one element of swap_info_struct->swap_map array. And there is no dependency between different elements of swap_info_struct->swap_map. So a fine grained lock can be used to allow parallel access to the different elements of swap_info_struct->swap_map. In this patch, a spinlock is added to swap_cluster_info to protect the elements of swap_info_struct->swap_map in the swap cluster and the fields of swap_cluster_info. This reduced locking contention for swap_info_struct->swap_map access greatly. Because of the added spinlock, the size of swap_cluster_info increases from 4 bytes to 8 bytes on the 64 bit and 32 bit system. This will use additional 4k RAM for every 1G swap space. Because the size of swap_cluster_info is much smaller than the size of the cache line (8 vs 64 on x86_64 architecture), there may be false cache line sharing between spinlocks in swap_cluster_info. To avoid the false sharing in the first round of the swap cluster allocation, the order of the swap clusters in the free clusters list is changed. So that, the swap_cluster_info sharing the same cache line will be placed as far as possible. After the first round of allocation, the order of the clusters in free clusters list is expected to be random. So the false sharing should be not serious. Compared with a previous implementation using bit_spin_lock, the sequential swap out throughput improved about 3.2%. Test was done on a Xeon E5 v3 system. The swap device used is a RAM simulated PMEM (persistent memory) device. To test the sequential swapping out, the test case created 32 processes, which sequentially allocate and write to the anonymous pages until the RAM and part of the swap device is used. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: initialize spinlock for swap_cluster_info] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: annotate nested locking for cluster lock] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/dbb860bbd825b1aaba18988015e8963f263c3f0d.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22powerpc: do not make the entire heap executableDenys Vlasenko1-0/+1
On 32-bit powerpc the ELF PLT sections of binaries (built with --bss-plt, or with a toolchain which defaults to it) look like this: [17] .sbss NOBITS 0002aff8 01aff8 000014 00 WA 0 0 4 [18] .plt NOBITS 0002b00c 01aff8 000084 00 WAX 0 0 4 [19] .bss NOBITS 0002b090 01aff8 0000a4 00 WA 0 0 4 Which results in an ELF load header: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000 This is all correct, the load region containing the PLT is marked as executable. Note that the PLT starts at 0002b00c but the file mapping ends at 0002aff8, so the PLT falls in the 0 fill section described by the load header, and after a page boundary. Unfortunately the generic ELF loader ignores the X bit in the load headers when it creates the 0 filled non-file backed mappings. It assumes all of these mappings are RW BSS sections, which is not the case for PPC. gcc/ld has an option (--secure-plt) to not do this, this is said to incur a small performance penalty. Currently, to support 32-bit binaries with PLT in BSS kernel maps *entire brk area* with executable rights for all binaries, even --secure-plt ones. Stop doing that. Teach the ELF loader to check the X bit in the relevant load header and create 0 filled anonymous mappings that are executable if the load header requests that. Test program showing the difference in /proc/$PID/maps: int main() { char buf[16*1024]; char *p = malloc(123); /* make "[heap]" mapping appear */ int fd = open("/proc/self/maps", O_RDONLY); int len = read(fd, buf, sizeof(buf)); write(1, buf, len); printf("%p\n", p); return 0; } Compiled using: gcc -mbss-plt -m32 -Os test.c -otest Unpatched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10690000-106c0000 rwxp 00000000 00:00 0 [heap] f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ffa90000-ffac0000 rw-p 00000000 00:00 0 [stack] 0x10690008 Patched ppc64 kernel: 00100000-00120000 r-xp 00000000 00:00 0 [vdso] 0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094 /usr/lib/libc-2.17.so 0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094 /usr/lib/libc-2.17.so 10000000-10010000 r-xp 00000000 fd:00 100674505 /home/user/test 10010000-10020000 r--p 00000000 fd:00 100674505 /home/user/test 10020000-10030000 rw-p 00010000 fd:00 100674505 /home/user/test 10180000-101b0000 rw-p 00000000 00:00 0 [heap] ^^^^ this has changed f7c60000-f7c90000 r-xp 00000000 fd:00 67898089 /usr/lib/ld-2.17.so f7c90000-f7ca0000 r--p 00020000 fd:00 67898089 /usr/lib/ld-2.17.so f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089 /usr/lib/ld-2.17.so ff860000-ff890000 rw-p 00000000 00:00 0 [stack] 0x10180008 The patch was originally posted in 2012 by Jason Gunthorpe and apparently ignored: https://lkml.org/lkml/2012/9/30/138 Lightly run-tested. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Denys Vlasenko <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Michael Ellerman <[email protected]> Tested-by: Jason Gunthorpe <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Florian Weimer <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm: page_alloc: skip over regions of invalid pfns where possiblePaul Burton1-0/+1
When using a sparse memory model memmap_init_zone() when invoked with the MEMMAP_EARLY context will skip over pages which aren't valid - ie. which aren't in a populated region of the sparse memory map. However if the memory map is extremely sparse then it can spend a long time linearly checking each PFN in a large non-populated region of the memory map & skipping it in turn. When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient information to quickly discover the next valid PFN given an invalid one by searching through the list of memory regions & skipping forwards to the first PFN covered by the memory region to the right of the non-populated region. Implement this in order to speed up memmap_init_zone() for systems with extremely sparse memory maps. James said "I have tested this patch on a virtual model of a Samurai CPU with a sparse memory map. The kernel boot time drops from 109 to 62 seconds. " Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Paul Burton <[email protected]> Tested-by: James Hartley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, compaction: add vmstats for kcompactd workDavid Rientjes1-0/+1
A "compact_daemon_wake" vmstat exists that represents the number of times kcompactd has woken up. This doesn't represent how much work it actually did, though. It's useful to understand how much compaction work is being done by kcompactd versus other methods such as direct compaction and explicitly triggered per-node (or system) compaction. This adds two new vmstats: "compact_daemon_migrate_scanned" and "compact_daemon_free_scanned" to represent the number of pages kcompactd has scanned as part of its migration scanner and freeing scanner, respectively. These values are still accounted for in the general "compact_migrate_scanned" and "compact_free_scanned" for compatibility. It could be argued that explicitly triggered compaction could also be tracked separately, and that could be added if others find it useful. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm: un-export wake_up_page functionsNicholas Piggin1-10/+2
These are no longer used outside mm/filemap.c, so un-export them and make them static where possible. These were exported specifically for NFS use in commit a4796e37c12e ("MM: export page_wakeup functions"). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Anna Schumaker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: add mm_vmscan_inactive_list_is_low tracepointMichal Hocko1-0/+40
Currently we have tracepoints for both active and inactive LRU lists reclaim but we do not have any which would tell us why we we decided to age the active list. Without that it is quite hard to diagnose active/inactive lists balancing. Add mm_vmscan_inactive_list_is_low tracepoint to tell us this information. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: enhance mm_vmscan_lru_shrink_inactive tracepointMichal Hocko1-3/+26
mm_vmscan_lru_shrink_inactive will currently report the number of scanned and reclaimed pages. This doesn't give us an idea how the reclaim went except for the overall effectiveness though. Export and show other counters which will tell us why we couldn't reclaim some pages. - nr_dirty, nr_writeback, nr_congested and nr_immediate tells us how many pages are blocked due to IO - nr_activate tells us how many pages were moved to the active list - nr_ref_keep reports how many pages are kept on the LRU due to references (mostly for the file pages which are about to go for another round through the inactive list) - nr_unmap_fail - how many pages failed to unmap All these are rather low level so they might change in future but the tracepoint is already implementation specific so no tools should be depending on its stability. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: show LRU name in mm_vmscan_lru_isolate tracepointMichal Hocko2-6/+14
mm_vmscan_lru_isolate currently prints only whether the LRU we isolate from is file or anonymous but we do not know which LRU this is. It is useful to know whether the list is active or inactive, since we are using the same function to isolate pages from both of them and it's hard to distinguish otherwise. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: show the number of skipped pages in mm_vmscan_lru_isolateMichal Hocko1-2/+6
mm_vmscan_lru_isolate shows the number of requested, scanned and taken pages. This is mostly OK but on 32b systems the number of scanned pages is quite misleading because it includes both the scanned and skipped pages. Moreover the skipped part is scaled based on the number of taken pages. Let's report the exact numbers without any additional logic and add the number of skipped pages. This should make the reported data much more easier to interpret. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: add active list aging tracepointMichal Hocko1-0/+36
Our reclaim process has several tracepoints to tell us more about how things are progressing. We are, however, missing a tracepoint to track active list aging. Introduce mm_vmscan_lru_shrink_active which reports the number of - nr_taken is number of isolated pages from the active list - nr_referenced pages which tells us that we are hitting referenced pages which are deactivated. If this is a large part of the reported nr_deactivated pages then we might be hitting into the active list too early because they might be still part of the working set. This might help to debug performance issues. - nr_active pages which tells us how many pages are kept on the active list - mostly exec file backed pages. A high number can indicate that we might be trashing on executables. [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22mm, vmscan: remove unused mm_vmscan_memcg_isolateMichal Hocko1-30/+1
Patch series "vm, vmscan: enahance vmscan tracepoints", v2. While debugging [2] I've realized that there is some room for improvements in the tracepoints set we offer currently. I had hard times to make any conclusion from the existing ones. The resulting problem turned out to be active list aging [3] and we are missing at least two tracepoints to debug such a problem. Some existing tracepoints could export more information to see _why_ the reclaim progress cannot be made not only _how much_ we could reclaim. The later could be seen quite reasonably from the vmstat counters already. It can be argued that we are showing too many implementation details in those tracepoints but I consider them way too lowlevel already to be usable by any kernel independent userspace. I would be _really_ surprised if anything but debugging tools have used them. Any feedback is highly appreciated. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] [3] http://lkml.kernel.org/r/[email protected] This patch (of 8): The trace point is not used since 925b7673cce3 ("mm: make per-memcg LRU lists exclusive") so it can be removed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>