Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC device tree updates from Arnd Bergmann:
"We get a moderate number of new machines this time, and only one new
SoC variant (Actions S700):
Actions:
- S700 Soc and CubieBoard7 development board
- Allo.com Sparky Single-board-computer
Allwinner:
- Orange Pi R1 development board
- Libre Computer Board ALL-H3-CC H3 single-board computer
ASpeed ast2x00:
- Witherspoon: OpenPower Power9 server manufactured by IBM that uses the ASPEED ast2500
- Zaius: OpenPower Power9 server manufactured by Invatech that uses the ASPEED ast2500
- Q71L: Intel Xeon server manufactured by Qanta that uses the ASPEED ast2400
AT91:
- Axentia Nattis/Natte digital signage
- sama5d2 PTC-ek Evaluation board
Freescale/NXP i.MX:
- SolidRun Humminboard2 development board
- Variscite DART-MX6 SoM and Carrier-board
- Technologic TS-4600 and TS-7970 development board
- Toradex Colibri iMX7D SoM board
- v1.5 variant of Solidrun Cubox-i and Hummingboard
Freescale/NXP Layerscape:
- Moxa UC-8410A Series industrial computer
Gemini:
- D-Link DNS-313 NAS enclosure
OMAP:
- LogicPD OMAP35xx SOM-LV devkit
- LogicPD OMAP35xx Torpedo devkit
Renesas:
- r8a77970 (V3M) Starter Kit board
- r8a7795 (M3-W) Salvator-XS board
We finally managed to get the dtc warnings under control, with no more
build-time warnings for bad device tree files. This includes fixes for
the majority of platforms, including nomadik, samsung, lpc32xx, STi,
spear, mediatek, freescale, qcom, realview, keystone, omap, kirkwood,
renesas, hisilicon, and broadcom.
Files get rearranged on a few platforms, in particular the Marvell
Armada 7K/8K device tree files are changed in preparation for future
SoC support, based on more than two of the same chips in one package,
and some boards get renamed for oxnas for consistency.
Finally, many existing SoCs gain descriptions for additional on-chip
devices that we can now support with kernel drivers:
- Allwinner A83t (drm, ethernet, i2c, ...), H3/H5 (USB-OTG)
- Amlogic AXG family (clk, pinctrl, pwm, ...), and others (vpu, hdmi)
- Aspeed clk controller support
- Freescale LS1088A, LS1021A device support
- Gemini Ethernet, PCI, TVE, panel
- Keystone gpio, qspi, more uarts
- Mediatek cpufreq, regulator, clock, reset
- Marvell thermal, cpufreq, nand
- Renesas SMP, thermal, timer, PWM, sound, phy, ipmmu
- Rockchip Mipi, GPU, display
- Samsung Exynos5433 PMU, power domain, nfc
- Spreadtrum: sc9860 clocks
- Tegra TX2 PSDI, HDMI, I2C,SMMU, display, fuse, ..."
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (690 commits)
arm64: dts: stratix10: fix SPI settings
ARM: dts: socfpga: add i2c reset signals
arm64: dts: stratix10: add USB ECC reset bit
arm64: dts: stratix10: enable USB on the devkit
ARM: dts: socfpga: disable over-current for Arria10 USB devkit
ARM: dts: Nokia N9: add support for up/down keys in the dts
ARM: dts: nomadik: add interrupt-parent for clcd
ARM: dts: Add ethernet to a bunch of platforms
ARM: dts: Add ethernet to the Gemini SoC
ARM: dts: rename oxnas dts files
ARM: dts: s5pv210: add interrupt-parent for ohci
ARM: lpc3250: fix uda1380 gpio numbers
ARM: dts: STi: Add gpio polarity for "hdmi,hpd-gpio" property
ARM: dts: dra7: Reduce shut down temperature of non-cpu thermal zones
ARM: dts: n900: Add aliases for lcd and tvout displays
ARM: dts: Update ti-sysc data for existing users
ARM: dts: Fix smartreflex compatible for omap3 shared mpu-iva instance
arm64: dts: marvell: armada-80x0: Fix pinctrl compatible string
arm: spear13xx: Fix spics gpio controller's warning
arm: spear13xx: Fix dmas cells
...
|
|
iWarp devices do not support the creation of address handles
so return AH_ATTR_TYPE_UNDEFINED for all iWarp devices.
While we are here reduce the size of port_num to u8 and add
a comment.
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Reported-by: Parav Pandit <[email protected]>
CC: Sean Hefty <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Shiraz Saleem <[email protected]>
Signed-off-by: Don Hiatt <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
packet->fecn and packet->becn are calculated in the hot path
and are never used. Remove these fields as they show to be
costly in a profile. Also, remove initialization for
becn and fecn in process_ecn() as they're unconditionally
assigned in the function and ensure fecn and becn variables
use a boolean type.
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
The packet type comparison used to find out if a packet is a bypass
packet in the hot path is an expensive operation as seen in a profile.
Determine packet's pkey and migration bit through the bypass and 9B
code paths instead.
Reviewed-by: Don Hiatt <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
In hfi1_rc_rcv(), BTH is computed for all packets received.
However, it's only used for packets received with opcodes
RDMA_WRITE_LAST and SEND_LAST, and it is a costly operation.
Compute BTH only in the RDMA_WRITE_LAST/SEND_LAST code path
and let the compiler handle endianness conversion for bitwise
operations.
Reviewed-by: Mike Marciniszyn <[email protected]>
Signed-off-by: Sebastian Sanchez <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
Propagate the ADR attribute flag from the NFIT platform capabilities
sub-table to nd_region.
Signed-off-by: Dave Jiang <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Signed-off-by: Ross Zwisler <[email protected]>
|
|
In ACPI 6.2a the platform capability structure has been added to the NFIT
tables. That provides software the ability to determine whether a system
supports the auto flushing of CPU caches on power loss. If the capability
is supported, we do not need to do dax_flush(). Plumbing the path to set the
property on per region from the NFIT tables.
This patch depends on the ACPI NFIT 6.2a platform capabilities support code
in include/acpi/actbl1.h.
Signed-off-by: Dave Jiang <[email protected]>
Reviewed-by: Ross Zwisler <[email protected]>
Signed-off-by: Ross Zwisler <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Add a console_msg_format command line option:
The value "default" keeps the old "[time stamp] text\n" format. The
value "syslog" allows to see the syslog-like "<log
level>[timestamp] text" format.
This feature was requested by people doing regression tests, for
example, 0day robot. They want to have both filtered and full logs
at hands.
- Reduce the risk of softlockup:
Pass the console owner in a busy loop.
This is a new approach to the old problem. It was first proposed by
Steven Rostedt on Kernel Summit 2017. It marks a context in which
the console_lock owner calls console drivers and could not sleep.
On the other side, printk() callers could detect this state and use
a busy wait instead of a simple console_trylock(). Finally, the
console_lock owner checks if there is a busy waiter at the end of
the special context and eventually passes the console_lock to the
waiter.
The hand-off works surprisingly well and helps in many situations.
Well, there is still a possibility of the softlockup, for example,
when the flood of messages stops and the last owner still has too
much to flush.
There is increasing number of people having problems with
printk-related softlockups. We might eventually need to get better
solution. Anyway, this looks like a good start and promising
direction.
- Do not allow to schedule in console_unlock() called from printk():
This reverts an older controversial commit. The reschedule helped
to avoid softlockups. But it also slowed down the console output.
This patch is obsoleted by the new console waiter logic described
above. In fact, the reschedule made the hand-off less effective.
- Deprecate "%pf" and "%pF" format specifier:
It was needed on ia64, ppc64 and parisc64 to dereference function
descriptors and show the real function address. It is done
transparently by "%ps" and "pS" format specifier now.
Sergey Senozhatsky found that all the function descriptors were in
a special elf section and could be easily detected.
- Remove printk_symbol() API:
It has been obsoleted by "%pS" format specifier, and this change
helped to remove few continuous lines and a less intuitive old API.
- Remove redundant memsets:
Sergey removed unnecessary memset when processing printk.devkmsg
command line option.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits)
printk: drop redundant devkmsg_log_str memsets
printk: Never set console_may_schedule in console_trylock()
printk: Hide console waiter logic into helpers
printk: Add console owner and waiter logic to load balance console writes
kallsyms: remove print_symbol() function
checkpatch: add pF/pf deprecation warning
symbol lookup: introduce dereference_symbol_descriptor()
parisc64: Add .opd based function descriptor dereference
powerpc64: Add .opd based function descriptor dereference
ia64: Add .opd based function descriptor dereference
sections: split dereference_function_descriptor()
openrisc: Fix conflicting types for _exext and _stext
lib: do not use print_symbol()
irq debug: do not use print_symbol()
sysfs: do not use print_symbol()
drivers: do not use print_symbol()
x86: do not use print_symbol()
unicore32: do not use print_symbol()
sh: do not use print_symbol()
mn10300: do not use print_symbol()
...
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
Pull VFIO updates from Alex Williamson:
- Mask INTx from user if pdev->irq is zero (Alexey Kardashevskiy)
- Capability helper cleanup (Alex Williamson)
- Allow mmaps overlapping MSI-X vector table with region capability
exposing this feature (Alexey Kardashevskiy)
- mdev static cleanups (Xiongwei Song)
* tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio:
vfio: mdev: make a couple of functions and structure vfio_mdev_driver static
vfio-pci: Allow mapping MSIX BAR
vfio: Simplify capability helper
vfio-pci: Mask INTx if a device is not capabable of enabling it
|
|
Merge KASAN word-at-a-time fixups from Andrey Ryabinin.
The word-at-a-time optimizations have caused headaches for KASAN, since
the whole point is that we access byte streams in bigger chunks, and
KASAN can be unhappy about the potential extra access at the end of the
string.
We used to have a horrible hack in dcache, and then people got
complaints from the strscpy() case. This fixes it all up properly, by
adding an explicit helper for the "access byte stream one word at a
time" case.
* emailed patches from Andrey Ryabinin <[email protected]>:
fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
lib/strscpy: Shut up KASAN false-positives in strscpy()
compiler.h: Add read_word_at_a_time() function.
compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
|
|
Sometimes we know that it's safe to do potentially out-of-bounds access
because we know it won't cross a page boundary. Still, KASAN will
report this as a bug.
Add read_word_at_a_time() function which is supposed to be used in such
cases. In read_word_at_a_time() KASAN performs relaxed check - only the
first byte of access is validated.
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of having two identical __read_once_size_nocheck() functions
with different attributes, consolidate all the difference in new macro
__no_kasan_or_inline and use it. No functional changes.
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- Convert to use memblock_virt_alloc in DT code which supports
bootmem arches. With this we can remove the arch specific
early_init_dt_alloc_memory_arch() functions.
- Enable running the DT unittests on UML
- Use SPDX license tags on DT files
- Fix early FDT kconfig ifdef logic
- Clean-up unittest Makefile
- Fix function comment for of_irq_parse_raw
- Add missing documentation for linux,initrd-{start,end} properties
- Clean-up of binding examples using uppercase hex
- Add trivial devices W83773G and Infineon TLV493D-A1B6
- Add missing STM32 SoC bindings
- Various small binding doc fixes
* tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits)
xtensa: remove arch specific early DT functions
x86: remove arch specific early_init_dt_alloc_memory_arch
nios2: remove arch specific early_init_dt_alloc_memory_arch
mips: remove arch specific early_init_dt_alloc_memory_arch
metag: remove arch specific early DT functions
cris: remove arch specific early DT functions
libfdt: remove unnecessary include directive from <linux/libfdt.h>
of: unittest: refactor Makefile
of/fdt: use memblock_virt_alloc for early alloc
of: Use SPDX license tag for DT files
of/fdt: Fix #ifdef dependency of early flattree declarations
dt-bindings: h8300 clocksource: correct spelling of pulse
dt-bindings: imx6q-pcie: Add required property for i.MX6SX
mmc: Don't reference Linux-specific OF_GPIO_ACTIVE_LOW flag in DT binding
dt-bindings: Use lower case hex in unit-addresses
dt-bindings: display: panel: Fix compatible string for Toshiba LT089AC29000
dt-bindings: Add Infineon TLV493D-A1B6
dt-bindings: mailbox: ti,message-manager: Fix interrupt name error
dt-bindings: chosen: Document linux,initrd-{start,end}
dt-bindings: arm: document supported STM32 SoC family
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input layer updates from Dmitry Torokhov:
- evdev interface has been adjusted to extend the life of timestamps on
32 bit systems to the year of 2108
- Synaptics RMI4 driver's PS/2 guest handling ha beed updated to
improve chances of detecting trackpoints on the pass-through port
- mms114 touchcsreen controller driver has been updated to support
generic device properties and work with mms152 cntrollers
- Goodix driver now supports generic touchscreen properties
- couple of drivers for AVR32 architecture are gone as the architecture
support has been removed from the kernel
- gpio-tilt driver has been removed as there are no mainline users and
the driver itself is using legacy APIs and relies on platform data
- MODULE_LINECSE/MODULE_VERSION cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
Input: goodix - use generic touchscreen_properties
Input: mms114 - fix typo in definition
Input: mms114 - use BIT() macro instead of explicit shifting
Input: mms114 - replace mdelay with msleep
Input: mms114 - add support for mms152
Input: mms114 - drop platform data and use generic APIs
Input: mms114 - mark as direct input device
Input: mms114 - do not clobber interrupt trigger
Input: edt-ft5x06 - fix error handling for factory mode on non-M06
Input: stmfts - set IRQ_NOAUTOEN to the irq flag
Input: auo-pixcir-ts - delete an unnecessary return statement
Input: auo-pixcir-ts - remove custom log for a failed memory allocation
Input: da9052_tsi - remove unused mutex
Input: docs - use PROPERTY_ENTRY_U32() directly
Input: synaptics-rmi4 - log when we create a guest serio port
Input: synaptics-rmi4 - unmask F03 interrupts when port is opened
Input: synaptics-rmi4 - do not delete interrupt memory too early
Input: ad7877 - use managed resource allocations
Input: stmfts,s6sy671 - add SPDX identifier
Input: remove atmel-wm97xx touchscreen driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big pull request for char/misc drivers for 4.16-rc1.
There's a lot of stuff in here. Three new driver subsystems were added
for various types of hardware busses:
- siox
- slimbus
- soundwire
as well as a new vboxguest subsystem for the VirtualBox hypervisor
drivers.
There's also big updates from the FPGA subsystem, lots of Android
binder fixes, the usual handful of hyper-v updates, and lots of other
smaller driver updates.
All of these have been in linux-next for a long time, with no reported
issues"
* tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits)
char: lp: use true or false for boolean values
android: binder: use VM_ALLOC to get vm area
android: binder: Use true and false for boolean values
lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN
EISA: Delete error message for a failed memory allocation in eisa_probe()
EISA: Whitespace cleanup
misc: remove AVR32 dependencies
virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES
soundwire: Fix a signedness bug
uio_hv_generic: fix new type mismatch warnings
uio_hv_generic: fix type mismatch warnings
auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
uio_hv_generic: add rescind support
uio_hv_generic: check that host supports monitor page
uio_hv_generic: create send and receive buffers
uio: document uio_hv_generic regions
doc: fix documentation about uio_hv_generic
vmbus: add monitor_id and subchannel_id to sysfs per channel
vmbus: fix ABI documentation
uio_hv_generic: use ISR callback method
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big Staging and IIO driver patches for 4.16-rc1.
There is the normal amount of new IIO drivers added, like all
releases.
The networking IPX and the ncpfs filesystem are moved into the staging
tree, as they are on their way out of the kernel due to lack of use
anymore.
The visorbus subsystem finall has started moving out of the staging
tree to the "real" part of the kernel, and the most and fsl-mc
codebases are almost ready to move out, that will probably happen for
4.17-rc1 if all goes well.
Other than that, there is a bunch of license header cleanups in the
tree, along with the normal amount of coding style churn that we all
know and love for this codebase. I also got frustrated at the
Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
huge chunks of it that were never even being used.
Full details of everything is in the shortlog.
All of these patches have been in linux-next for a while with no
reported issues"
* tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
staging: rtl8723bs: remove a couple of redundant initializations
staging: comedi: reformat lines to 80 chars or less
staging: lustre: separate a connection destroy from free struct kib_conn
Staging: rtl8723bs: Use !x instead of NULL comparison
Staging: rtl8723bs: Remove dead code
Staging: rtl8723bs: Change names to conform to the kernel code
staging: ccree: Fix missing blank line after declaration
staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
staging: comedi: dt2811: remove redundant initialization of 'ns'
staging: wilc1000: fix alignments to match open parenthesis
staging: wilc1000: removed unnecessary defined enums typedef
staging: wilc1000: remove unnecessary use of parentheses
staging: rtl8192u: remove redundant initialization of 'timeout'
staging: sm750fb: fix CamelCase for dispSet var
staging: lustre: lnet/selftest: fix compile error on UP build
staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/staging driver updates from Greg KH:
"Here is the big tty/serial driver update for 4.16-rc1.
The usual number of various serial driver fixes and updates to try to
get them to work with crazy hardware configurations (seriously, how
many different ways are hardware engineers going to come up with to
hook up a simple UART?)
There is also some serdev bugfixes and updates, as well as a
smattering of other small fixes in here.
All have been in the linux-next tree for a while, with no reported
issues"
* tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits)
tty: serial: exar: Relocate sleep wake-up handling
tty: fix data race between tty_init_dev and flush of buf
serial: imx: fix endless loop during suspend
serial: core: mark port as initialized after successful IRQ change
serdev: only match serdev devices
serdev: do not generate modaliases for controllers
serial: mxs-auart: don't use GPIOF_* with gpiod_get_direction
serial: 8250_dw: Revert "Improve clock rate setting"
MAINTAINERS: Add myself as designated reviewer for 8250_dw
gpio: serial: max310x: Support open-drain configuration for GPIOs
serdev: Fix serdev_uevent failure on ACPI enumerated serdev-controllers
serial: 8250_ingenic: Parse earlycon options
serial: 8250_ingenic: Add support for the JZ4770 SoC
serial: core: Make uart_parse_options take const char* argument
serial: 8250_of: fix return code when probe function fails to get reset
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
serial: 8250_uniphier: fix error return code in uniphier_uart_probe()
tty: n_gsm: Allow ADM response in addition to UA for control dlci
tty: omap-serial: Fix initial on-boot RTS GPIO level
tty: serial: jsm: Add one check against NULL pointer dereference
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/PHY updates from Greg KH:
"Here is the big USB and PHY driver update for 4.16-rc1.
Along with the normally expected XHCI, MUSB, and Gadget driver
patches, there are some PHY driver fixes, license cleanups, sysfs
attribute cleanups, usbip changes, and a raft of other smaller fixes
and additions.
Full details are in the shortlog.
All of these have been in the linux-next tree for a long time with no
reported issues"
* tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (137 commits)
USB: serial: pl2303: new device id for Chilitag
USB: misc: fix up some remaining DEVICE_ATTR() usages
USB: musb: fix up one odd DEVICE_ATTR() usage
USB: atm: fix up some remaining DEVICE_ATTR() usage
USB: move many drivers to use DEVICE_ATTR_WO
USB: move many drivers to use DEVICE_ATTR_RO
USB: move many drivers to use DEVICE_ATTR_RW
USB: misc: chaoskey: Use true and false for boolean values
USB: storage: remove old wording about how to submit a change
USB: storage: remove invalid URL from drivers
usb: ehci-omap: don't complain on -EPROBE_DEFER when no PHY found
usbip: list: don't list devices attached to vhci_hcd
usbip: prevent bind loops on devices attached to vhci_hcd
USB: serial: remove redundant initializations of 'mos_parport'
usb/gadget: Fix "high bandwidth" check in usb_gadget_ep_match_desc()
usb: gadget: compress return logic into one line
usbip: vhci_hcd: update 'status' file header and format
USB: serial: simple: add Motorola Tetra driver
CDC-ACM: apply quirk for card reader
usb: option: Add support for FS040U modem
...
|
|
* pci/spdx:
PCI: Add SPDX GPL-2.0+ to replace implicit GPL v2 or later statement
PCI: Add SPDX GPL-2.0+ to replace GPL v2 or later boilerplate
PCI: Add SPDX GPL-2.0 to replace COPYING boilerplate
PCI: Add SPDX GPL-2.0 to replace GPL v2 boilerplate
PCI: Add SPDX GPL-2.0 when no license was specified
|
|
syzkaller was able to generate the following XDP program ...
(18) r0 = 0x0
(61) r5 = *(u32 *)(r1 +12)
(04) (u32) r0 += (u32) 0
(95) exit
... and trigger a NULL pointer dereference in ___bpf_prog_run()
via bpf_prog_test_run_xdp() where this was attempted to run.
Reason is that recent xdp_rxq_info addition to XDP programs
updated all drivers, but not bpf_prog_test_run_xdp(), where
xdp_buff is set up. Thus when context rewriter does the deref
on the netdev it's NULL at runtime. Fix it by using xdp_rxq
from loopback dev. __netif_get_rx_queue() helper can also be
reused in various other locations later on.
Fixes: 02dd3291b2f0 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
Reported-by: [email protected]
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
If you take a GSO skb, and split it into packets, will the MAC
length (L2 + L3 + L4 headers + payload) of those packets be small
enough to fit within a given length?
Move skb_gso_mac_seglen() to skbuff.h with other related functions
like skb_gso_network_seglen() so we can use it, and then create
skb_gso_validate_mac_len to do the full calculation.
Signed-off-by: Daniel Axtens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Topic branch for stable KVM clockource under Hyper-V.
Thanks to Christoffer Dall for resolving the ARM conflict.
|
|
The function inode_cmp_iversion{+raw} is counter-intuitive, because it
returns true when the counters are different and false when these are equal.
Rename it to inode_eq_iversion{+raw}, which will returns true when
the counters are equal and false otherwise.
Signed-off-by: Goffredo Baroncelli <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
|
|
Pull documentation updates from Jonathan Corbet:
"Documentation updates for 4.16.
New stuff includes refcount_t documentation, errseq documentation,
kernel-doc support for nested structure definitions, the removal of
lots of crufty kernel-doc support for unused formats, SPDX tag
documentation, the beginnings of a manual for subsystem maintainers,
and lots of fixes and updates.
As usual, some of the changesets reach outside of Documentation/ to
effect kerneldoc comment fixes. It also adds the new LICENSES
directory, of which Thomas promises I do not need to be the
maintainer"
* tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits)
linux-next: docs-rst: Fix typos in kfigure.py
linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt
Documentation: Fix misconversion of #if
docs: add index entry for networking/msg_zerocopy
Documentation: security/credentials.rst: explain need to sort group_list
LICENSES: Add MPL-1.1 license
LICENSES: Add the GPL 1.0 license
LICENSES: Add Linux syscall note exception
LICENSES: Add the MIT license
LICENSES: Add the BSD-3-clause "Clear" license
LICENSES: Add the BSD 3-clause "New" or "Revised" License
LICENSES: Add the BSD 2-clause "Simplified" license
LICENSES: Add the LGPL-2.1 license
LICENSES: Add the LGPL 2.0 license
LICENSES: Add the GPL 2.0 license
Documentation: Add license-rules.rst to describe how to properly identify file licenses
scripts: kernel_doc: better handle show warnings logic
fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at
doc: md: Fix a file name to md-fault.c in fault-injection.txt
errseq: Add to documentation tree
...
|
|
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <[email protected]>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
|
|
Commit cbe37d093707 ("[PATCH] mm: remove PG_highmem") removed PG_highmem
to save a page flag. So the description of PG_highmem is no longer
needed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Miles Chen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dan Carpenter has noticed that mbind migration callback (new_page) can
get a NULL vma pointer and choke on it inside alloc_huge_page_vma which
relies on the VMA to get the hstate. We used to BUG_ON this case but
the BUG_+ON has been removed recently by "hugetlb, mempolicy: fix the
mbind hugetlb migration".
The proper way to handle this is to get the hstate from the migrated
page and rely on huge_node (resp. get_vma_policy) do the right thing
with null VMA. We are currently falling back to the default mempolicy
in that case which is in line what THP path is doing here.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
do_mbind migration code relies on alloc_huge_page_noerr for hugetlb
pages. alloc_huge_page_noerr uses alloc_huge_page which is a highlevel
allocation function which has to take care of reserves, overcommit or
hugetlb cgroup accounting. None of that is really required for the page
migration because the new page is only temporal and either will replace
the original page or it will be dropped. This is essentially as for
other migration call paths and there shouldn't be any reason to handle
mbind in a special way.
The current implementation is even suboptimal because the migration
might fail just because the hugetlb cgroup limit is reached, or the
overcommit is saturated.
Fix this by making mbind like other hugetlb migration paths. Add a new
migration helper alloc_huge_page_vma as a wrapper around
alloc_huge_page_nodemask with additional mempolicy handling.
alloc_huge_page_noerr has no more users and it can go.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Andrea Reale <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hugepage migration relies on __alloc_buddy_huge_page to get a new page.
This has 2 main disadvantages.
1) it doesn't allow to migrate any huge page if the pool is used
completely which is not an exceptional case as the pool is static and
unused memory is just wasted.
2) it leads to a weird semantic when migration between two numa nodes
might increase the pool size of the destination NUMA node while the
page is in use. The issue is caused by per NUMA node surplus pages
tracking (see free_huge_page).
Address both issues by changing the way how we allocate and account
pages allocated for migration. Those should temporal by definition. So
we mark them that way (we will abuse page flags in the 3rd page) and
update free_huge_page to free such pages to the page allocator. Page
migration path then just transfers the temporal status from the new page
to the old one which will be freed on the last reference. The global
surplus count will never change during this path but we still have to be
careful when migrating a per-node suprlus page. This is now handled in
move_hugetlb_state which is called from the migration path and it copies
the hugetlb specific page state and fixes up the accounting when needed
Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better
reflect its purpose. The new allocation routine for the migration path
is __alloc_migrate_huge_page.
The user visible effect of this patch is that migrated pages are really
temporal and they travel between NUMA nodes as per the migration
request:
Before migration
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
After
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
/sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1
/sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
with the previous implementation, both nodes would have nr_hugepages:1
until the page is freed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Andrea Reale <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mem_map pointer
The comment is confusing. On the one hand, it refers to 32-bit
alignment (struct page alignment on 32-bit platforms), but this would
only guarantee that the 2 lowest bits must be zero. On the other hand,
it claims that at least 3 bits are available, and 3 bits are actually
used.
This is not broken, because there is a stronger alignment guarantee,
just less obvious. Let's fix the comment to make it clear how many bits
are available and why.
Although memmap arrays are allocated in various places, the resulting
pointer is encoded eventually, so I am adding a BUG_ON() here to enforce
at runtime that all expected bits are indeed available.
I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is
sufficient, because this part of the calculation can be easily checked
at build time.
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Tesarik <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kemi Wang <[email protected]>
Cc: YASUAKI ISHIMATSU <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We waste sizeof(swp_entry_t) for zswap header when using zsmalloc as
zpool driver because zsmalloc doesn't support eviction.
Add zpool_evictable() to detect if zpool is potentially evictable, and
use it in zswap to avoid waste memory for zswap header.
[[email protected]: The zpool->" prefix is a result of copy & paste]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yu Zhao <[email protected]>
Acked-by: Dan Streetman <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Implements memfd sealing, similar to shmem:
- WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
memfd_add_seals(). write() doesn't exist for hugetlbfs.
- SHRINK: added similar check as shmem_setattr()
- GROW: added similar check as shmem_setattr() & shmem_fallocate()
Except write() operation that doesn't exist with hugetlbfs, that should
make sealing as close as it can be to shmem support.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Herrmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
hugetlbfs inode information will need to be accessed by code in
mm/shmem.c for file sealing operations. Move inode information
definition from .c file to header for needed access.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Herrmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Those functions are called for memfd files, backed by shmem or hugetlb
(the next patches will handle hugetlb).
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Herrmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "memfd: add sealing to hugetlb-backed memory", v3.
Recently, Mike Kravetz added hugetlbfs support to memfd. However, he
didn't add sealing support. One of the reasons to use memfd is to have
shared memory sealing when doing IPC or sharing memory with another
process with some extra safety. qemu uses shared memory & hugetables
with vhost-user (used by dpdk), so it is reasonable to use memfd now
instead for convenience and security reasons.
This patch (of 9):
The functions are called through shmem_fcntl() only. And no danger in
removing the EXPORTs as the routines only work with shmem file structs.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Herrmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
PG_buddy doesn't exist any more. It's called PageBuddy now.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Be really explicit about what bits / bytes are reserved for users that
want to store extra information about the pages they allocate.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Neither of these values get even close to 256; compound_dtor is
currently at a maximum of 3, and compound_order can't be over 64. No
machine has inefficient access to bytes since EV5, and while those are
still supported, we don't optimise for them any more. This does not
shrink struct page, but it removes an ifdef and frees up 2-6 bytes for
future use.
diff of pahole output:
struct callback_head callback_head; /* 32 16 */
struct {
long unsigned int compound_head; /* 32 8 */
- unsigned int compound_dtor; /* 40 4 */
- unsigned int compound_order; /* 44 4 */
+ unsigned char compound_dtor; /* 40 1 */
+ unsigned char compound_order; /* 41 1 */
}; /* 32 16 */
}; /* 32 16 */
union {
[[email protected]: add comment]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of putting the ifdef in the middle of the definition of struct
page, pull it forward to the rest of the ifdeffery around the SLUB
cmpxchg_double optimisation.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The comment on page->mapping is terse, and out of date (it does not
mention the possibility of PAGE_MAPPING_MOVABLE). Instead, point the
interested reader to page-flags.h where there is a much better comment.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The "third double word block" isn't on 32-bit systems. The layout looks
like this:
unsigned long flags;
struct address_space *mapping
pgoff_t index;
atomic_t _mapcount;
atomic_t _refcount;
which is 32 bytes on 64-bit, but 20 bytes on 32-bit. Nobody is trying to
use the fact that it's double-word aligned today, so just remove the
misleading claims.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I found the struct { union { struct { union { struct { } } } } } layout
rather confusing. Fortunately, there is an easier way to write this.
The innermost union is of four things which are the size of an int, so
the ones which are used by slab/slob/slub can be pulled up two levels to
be in the outermost union with 'counters'. That leaves us with struct {
union { struct { atomic_t; atomic_t; } } } which has the same layout,
but is easier to read.
Output from the current git version of pahole, diffed with -uw to ignore
the whitespace changes from the indentation:
}; /* 16 8 */
union {
long unsigned int counters; /* 24 8 */
- struct {
- union {
- atomic_t _mapcount; /* 24 4 */
unsigned int active; /* 24 4 */
struct {
unsigned int inuse:16; /* 24:16 4 */
@@ -21,7 +18,8 @@
unsigned int frozen:1; /* 24: 0 4 */
}; /* 24 4 */
int units; /* 24 4 */
- }; /* 24 4 */
+ struct {
+ atomic_t _mapcount; /* 24 4 */
atomic_t _refcount; /* 28 4 */
}; /* 24 8 */
}; /* 24 8 */
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Restructure struct page", v2.
This series does not attempt any grand restructuring. Instead, it cures
the worst of the indentitis, fixes the documentation and reduces the
ifdeffery. The only layout change is compound_dtor and compound_order
are each reduced to one byte.
This patch (of 8):
Instead of an ifdef block at the end of the struct, which needed its own
comment, define _struct_page_alignment up at the top where it fits
nicely with the existing comment.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 4d4bbd8526a8 ("mm, oom_reaper: skip mm structs with mmu
notifiers") prevented the oom reaper from unmapping private anonymous
memory with the oom reaper when the oom victim mm had mmu notifiers
registered.
The rationale is that doing mmu_notifier_invalidate_range_{start,end}()
around the unmap_page_range(), which is needed, can block and the oom
killer will stall forever waiting for the victim to exit, which may not
be possible without reaping.
That concern is real, but only true for mmu notifiers that have
blockable invalidate_range_{start,end}() callbacks. This patch adds a
"flags" field to mmu notifier ops that can set a bit to indicate that
these callbacks do not block.
The implementation is steered toward an expensive slowpath, such as
after the oom reaper has grabbed mm->mmap_sem of a still alive oom
victim.
[[email protected]: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Dimitri Sivanich <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of marking the pmd ready for split, invalidate the pmd. This
should take care of powerpc requirement. Only side effect is that we
mark the pmd invalid early. This can result in us blocking access to
the page a bit longer if we race against a thp split.
[[email protected]: rebased, dirty THP once]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Daney <[email protected]>
Cc: David Miller <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and access bits if CPU sets them after pmdp dereference, but
before set_pmd_at().
The patch change pmdp_invalidate() to make the entry non-present
atomically and return previous value of the entry. This value can be
used to check if CPU set dirty/accessed bits under us.
The race window is very small and I haven't seen any reports that can be
attributed to the bug. For this reason, I don't think backporting to
stable trees needed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Daney <[email protected]>
Cc: David Miller <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Do not lose dirty bit on THP pages", v4.
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and access bits if CPU sets them after pmdp dereference, but
before set_pmd_at().
The bug can lead to data loss, but the race window is tiny and I haven't
seen any reports that suggested that it happens in reality. So I don't
think it worth sending it to stable.
Unfortunately, there's no way to address the issue in a generic way. We
need to fix all architectures that support THP one-by-one.
All architectures that have THP supported have to provide atomic
pmdp_invalidate() that returns previous value.
If generic implementation of pmdp_invalidate() is used, architecture
needs to provide atomic pmdp_estabish().
pmdp_estabish() is not used out-side generic implementation of
pmdp_invalidate() so far, but I think this can change in the future.
This patch (of 12):
This is an implementation of pmdp_establish() that is only suitable for
an architecture that doesn't have hardware dirty/accessed bits. In this
case we can't race with CPU which sets these bits and non-atomic
approach is fine.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: David Daney <[email protected]>
Cc: David Miller <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|