blaster4385/linux-IllusionX - Linux kernel with personal config changes for Arch Linux

Age	Commit message	Author	Files	Lines
2020-10-20	xen/blkback: use lateeoi irq binding	Juergen Gross	2	-8/+19
	In order to reduce the chance of the system becoming unresponsive due to event storms triggered by a misbehaving blkfront, use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is also used to do purging work at regular intervals, an EOI may be sent only after having received an event. If there was no pending I/O request, flag the EOI as spurious. This is part of XSA-332. Cc: [email protected] Reported-by: Julien Grall <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Reviewed-by: Wei Liu <[email protected]>
2020-10-20	xen/events: add a new "late EOI" evtchn framework	Juergen Gross	2	-17/+155
	In order to avoid tight event channel related IRQ loops, add a new framework of "late EOI" handling: the IRQ the event channel is bound to will be masked until the event has been handled and the related driver is capable of handling another event. The driver is responsible for unmasking the event channel via the new function xen_irq_lateeoi(). This is similar to binding an event channel to a threaded IRQ, but without having to structure the driver accordingly. In order to support future special handling in case a rogue guest sends lots of unsolicited events, add a flag to xen_irq_lateeoi() which can be set by the caller to indicate the event was a spurious one. This is part of XSA-332. Cc: [email protected] Reported-by: Julien Grall <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Wei Liu <[email protected]>
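The masking rule of the late-EOI framework can be modelled in a few lines of userspace C. This is a hedged sketch, not the kernel's implementation: `sketch_deliver()` stands in for event delivery and `sketch_lateeoi()` for the driver's call to xen_irq_lateeoi(). Both names, and the idea of latching an event that arrives while masked, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an event channel with "late EOI" binding. */
struct sketch_chan {
    bool masked;           /* set while the driver still owns an event */
    bool pending;          /* an event arrived while masked */
    unsigned int handled;  /* how many times the handler ran */
    unsigned int spurious; /* EOIs the driver flagged as spurious */
};

/* Delivery: mask the channel and run the handler, or, if it is already
 * masked, just latch the event as pending. */
void sketch_deliver(struct sketch_chan *ch)
{
    if (ch->masked) {
        ch->pending = true;
        return;
    }
    ch->masked = true;
    ch->handled++;
}

/* Analogue of xen_irq_lateeoi(): the driver unmasks once it can take
 * another event, optionally flagging the handled event as spurious.
 * A latched event is delivered immediately on unmask. */
void sketch_lateeoi(struct sketch_chan *ch, bool spurious)
{
    assert(ch->masked);
    if (spurious)
        ch->spurious++;
    ch->masked = false;
    if (ch->pending) {
        ch->pending = false;
        sketch_deliver(ch);
    }
}
```

However many events arrive while the driver is busy, the handler runs again only after the driver's EOI, which is what breaks a tight IRQ loop.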
2020-10-20	xen/events: fix race in evtchn_fifo_unmask()	Juergen Gross	1	-4/+9
	Unmasking a fifo event channel can result in unmasking it twice, once directly in the kernel and once via a hypercall in case the event was pending. Fix that by doing the local unmask only if the event is not pending. This is part of XSA-332. Cc: [email protected] Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Jan Beulich <[email protected]>
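A userspace sketch of the fixed unmask logic, assuming an event word with hypothetical PENDING and MASKED bits and a simulated hypercall. The mask bit is cleared locally with a compare-and-exchange only when the event is not pending; otherwise unmasking is left entirely to the hypercall, so it happens exactly once. Bit positions and function names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PENDING (1u << 31)  /* hypothetical bit layout */
#define SKETCH_MASKED  (1u << 30)

static unsigned int sketch_hypercalls; /* simulated hypercalls issued */

/* Stand-in for the hypervisor unmask: clear MASKED (the hypervisor
 * would also take care of re-raising the pending event). */
static void sketch_hypercall_unmask(_Atomic uint32_t *word)
{
    sketch_hypercalls++;
    atomic_fetch_and(word, ~SKETCH_MASKED);
}

/* Clear MASKED only if PENDING is not set; return false if pending. */
static bool sketch_clear_masked_cond(_Atomic uint32_t *word)
{
    uint32_t old = atomic_load(word);

    do {
        if (old & SKETCH_PENDING)
            return false;
    } while (!atomic_compare_exchange_weak(word, &old, old & ~SKETCH_MASKED));
    return true;
}

void sketch_fifo_unmask(_Atomic uint32_t *word)
{
    /* Local unmask for the common case, hypercall only when pending. */
    if (!sketch_clear_masked_cond(word))
        sketch_hypercall_unmask(word);
}
```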
2020-10-20	xen/events: add a proper barrier to 2-level uevent unmasking	Juergen Gross	1	-0/+2
	A follow-up patch will require certain writes to happen before an event channel is unmasked. While the memory barrier is not strictly necessary for all the callers, the main one will need it. In order to avoid an extra memory barrier when using fifo event channels, require evtchn_unmask() to provide write ordering. The 2-level event handling unmask operation is missing an appropriate barrier, so add it. Fifo event channels are fine in this regard due to using sync_cmpxchg(). This is part of XSA-332. Cc: [email protected] Suggested-by: Julien Grall <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Julien Grall <[email protected]> Reviewed-by: Wei Liu <[email protected]>
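The ordering requirement can be sketched with C11 fences. This is an analogy under stated assumptions: the kernel uses its own barrier primitives, and `sketch_payload`/`sketch_masked` are hypothetical. The release fence before clearing the mask guarantees that a consumer which observes the channel unmasked (and then issues an acquire fence) also sees the earlier write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: data written before unmasking, and a
 * 2-level style mask flag. */
static int sketch_payload;
static atomic_bool sketch_masked = true;

/* Producer: the payload write must be visible before the unmask. */
void sketch_publish_and_unmask(int value)
{
    sketch_payload = value;
    atomic_thread_fence(memory_order_release); /* the added "barrier" */
    atomic_store_explicit(&sketch_masked, false, memory_order_relaxed);
}

/* Consumer: an acquire fence pairs with the producer's release fence. */
bool sketch_try_consume(int *out)
{
    if (atomic_load_explicit(&sketch_masked, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = sketch_payload;
    return true;
}
```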
2020-10-20	xen/events: avoid removing an event channel while handling it	Juergen Gross	1	-5/+36
	Currently, an event channel can be removed from the system while the event handling loop is active. This can lead to a race resulting in crashes or WARN() splats when trying to access the irq_info structure related to the event channel. Fix this problem by using a rwlock taken as reader in the event handling loop and as writer when deallocating the irq_info structure. As the observed problem was a NULL dereference in evtchn_from_irq(), make this function more robust against races by checking that the irq_info pointer is not NULL before dereferencing it. And finally, make all accesses to evtchn_to_irq[row][col] atomic in order to avoid seeing partial updates of an array element in irq handling. Note that irq handling can be entered only for event channels which have been valid before, so an unpopulated row isn't a problem in this regard, as rows are only ever added and never removed. This is XSA-331. Cc: [email protected] Reported-by: Marek Marczykowski-Górecki <[email protected]> Reported-by: Jinoh Kang <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Reviewed-by: Wei Liu <[email protected]>
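A minimal userspace model of the three measures above (reader/writer lock, NULL check, atomic table slots), assuming POSIX rwlocks and C11 atomics as stand-ins for the kernel primitives. The table and the functions `sketch_bind()`, `sketch_handle()` and `sketch_unbind()` are hypothetical names, and the table is one-dimensional for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_NR_CHANS 8

/* Hypothetical per-channel info, standing in for irq_info. */
struct sketch_irq_info {
    int evtchn;
    atomic_uint count;          /* readers may update it concurrently */
};

/* Slots are atomic pointers so a reader never sees a torn update. */
static _Atomic(struct sketch_irq_info *) sketch_table[SKETCH_NR_CHANS];
static pthread_rwlock_t sketch_lock = PTHREAD_RWLOCK_INITIALIZER;

int sketch_bind(int evtchn)
{
    struct sketch_irq_info *info = malloc(sizeof(*info));

    if (!info)
        return -1;
    info->evtchn = evtchn;
    atomic_init(&info->count, 0);
    atomic_store(&sketch_table[evtchn], info);
    return 0;
}

/* Event loop side: hold the lock as reader and tolerate a NULL slot. */
int sketch_handle(int evtchn)
{
    int handled = 0;
    struct sketch_irq_info *info;

    pthread_rwlock_rdlock(&sketch_lock);
    info = atomic_load(&sketch_table[evtchn]);
    if (info) {
        atomic_fetch_add(&info->count, 1);
        handled = 1;
    }
    pthread_rwlock_unlock(&sketch_lock);
    return handled;
}

/* Removal side: hold the lock as writer while clearing and freeing, so
 * no handler can still be using the structure. */
void sketch_unbind(int evtchn)
{
    pthread_rwlock_wrlock(&sketch_lock);
    free(atomic_exchange(&sketch_table[evtchn], NULL));
    pthread_rwlock_unlock(&sketch_lock);
}
```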
2020-10-19	Merge tag 'riscv-for-linus-5.10-mw0' of	Linus Torvalds	31	-241/+1212
	git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: "A handful of cleanups and new features: - A handful of cleanups for our page fault handling - Improvements to how we fill out cacheinfo - Support for EFI-based systems" * tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits) RISC-V: Add page table dump support for uefi RISC-V: Add EFI runtime services RISC-V: Add EFI stub support. RISC-V: Add PE/COFF header for EFI stub RISC-V: Implement late mapping page table allocation functions RISC-V: Add early ioremap support RISC-V: Move DT mapping outof fixmap RISC-V: Fix duplicate included thread_info.h riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault() riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration riscv: Add cache information in AUX vector riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO riscv: Set more data to cacheinfo riscv/mm/fault: Move access error check to function riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault() riscv/mm/fault: Simplify mm_fault_error() riscv/mm/fault: Move fault error handling to mm_fault_error() riscv/mm/fault: Simplify fault error handling riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault() riscv/mm/fault: Move bad area handling to bad_area() ...
2020-10-19	Merge tag 'm68knommu-for-v5.10' of	Linus Torvalds	6	-559/+402
	git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "A collection of fixes for 5.10: - switch to using asm-generic uaccess code - fix sparse warnings in signal code - fix compilation of ColdFire MMC support - support sysrq in ColdFire serial driver" * tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: serial: mcf: add sysrq capability m68knommu: include SDHC support only when hardware has it m68knommu: fix sparse warnings in signal code m68knommu: switch to using asm-generic/uaccess.h
2020-10-19	Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	37	-509/+1023
	Pull more xfs updates from Darrick Wong: "The second large pile of new stuff for 5.10, with changes even more monumental than last week! We are formally announcing the deprecation of the V4 filesystem format in 2030. All users must upgrade to the V5 format, which contains design improvements that greatly strengthen metadata validation, supports reflink and online fsck, and is the intended vehicle for handling timestamps past 2038. We're also deprecating the old Irix behavioral tweaks in September 2025. Coming along for the ride are two design changes to the deferred metadata ops subsystem. One of the improvements is to retain correct logical ordering of tasks and subtasks, which is a more logical design for upper layers of XFS and will become necessary when we add atomic file range swaps and commits. The second improvement to deferred ops improves the scalability of the log by helping the log tail to move forward during long-running operations. This reduces log contention when there are a large number of threads trying to run transactions. In addition to that, this fixes numerous small bugs in log recovery; refactors logical intent log item recovery to remove the last remaining place in XFS where we could have nested transactions; fixes a couple of ways that intent log item recovery could fail in ways that wouldn't have happened in the regular commit paths; fixes a deadlock vector in the GETFSMAP implementation (which improves its performance by 20%); and fixes serious bugs in the realtime growfs, fallocate, and bitmap handling code. Summary: - Deprecate the V4 filesystem format, some disused mount options, and some legacy sysctl knobs now that we can support dates into the 25th century. Note that removal of V4 support will not happen until the early 2030s. - Fix some problems with inode realtime flag propagation. - Fix some buffer handling issues when growing a rt filesystem. 
- Fix a problem where a BMAP_REMAP unmap call would free rt extents even though the purpose of BMAP_REMAP is to avoid freeing the blocks. - Strengthen the dabtree online scrubber to check hash values on child dabtree blocks. - Actually log new intent items created as part of recovering log intent items. - Fix a bug where quotas weren't attached to an inode undergoing bmap intent item recovery. - Fix a buffer overrun problem with specially crafted log buffer headers. - Various cleanups to type usage and slightly inaccurate comments. - More cleanups to the xattr, log, and quota code. - Don't run the (slower) shared-rmap operations on attr fork mappings. - Fix a bug where we failed to check the LSN of finobt blocks during replay and could therefore overwrite newer data with older data. - Clean up the ugly nested transaction mess that log recovery uses to stage intent item recovery in the correct order by creating a proper data structure to capture recovered chains. - Use the capture structure to resume intent item chains with the same log space and block reservations as when they were captured. - Fix a UAF bug in bmap intent item recovery where we failed to maintain our reference to the incore inode if the bmap operation needed to relog itself to continue. - Rearrange the defer ops mechanism to finish newly created subtasks of a parent task before moving on to the next parent task. - Automatically relog intent items in deferred ops chains if doing so would help us avoid pinning the log tail. This will help fix some log scaling problems now and will facilitate atomic file updates later. - Fix a deadlock in the GETFSMAP implementation by using an internal memory buffer to reduce indirect calls and copies to userspace, thereby improving its performance by ~20%. - Fix various problems when calling growfs on a realtime volume would not fully update the filesystem metadata. 
- Fix broken Kconfig asking about deprecated XFS when XFS is disabled" * tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits) xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n xfs: fix high key handling in the rt allocator's query_range function xfs: annotate grabbing the realtime bitmap/summary locks in growfs xfs: make xfs_growfs_rt update secondary superblocks xfs: fix realtime bitmap/summary file truncation when growing rt volume xfs: fix the indent in xfs_trans_mod_dquot xfs: do the ASSERT for the arguments O_{u,g,p}dqpp xfs: fix deadlock and streamline xfs_getfsmap performance xfs: limit entries returned when counting fsmap records xfs: only relog deferred intent items if free space in the log gets low xfs: expose the log push threshold xfs: periodically relog deferred intent items xfs: change the order in which child and parent defer ops are finished xfs: fix an incore inode UAF in xfs_bui_recover xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering xfs: clean up bmap intent item recovery checking xfs: xfs_defer_capture should absorb remaining transaction reservation xfs: xfs_defer_capture should absorb remaining block reservations xfs: proper replay of deferred ops queued during log recovery xfs: remove XFS_LI_RECOVERED ...
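The reordering of deferred ops ("finish newly created subtasks of a parent task before moving on to the next parent task") amounts to a depth-first traversal. The sketch below shows only that ordering, using an explicit stack; it is not how XFS's defer code is structured, and `sketch_task`/`sketch_finish_all()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX 16

/* Hypothetical task: finishing it creates up to two subtasks.
 * Tasks are indexed by id in the array passed to the executor. */
struct sketch_task {
    int id;
    int nchildren;
    int children[2];
};

/* Finish all tasks reachable from roots; record completion order.
 * Subtasks are pushed on top of the remaining parents, so a parent's
 * whole chain completes before the next parent starts. */
int sketch_finish_all(const struct sketch_task *tasks,
                      const int *roots, int nroots, int *order)
{
    int stack[SKETCH_MAX];
    int top = 0, done = 0;

    for (int i = nroots - 1; i >= 0; i--)
        stack[top++] = roots[i];

    while (top > 0) {
        const struct sketch_task *t = &tasks[stack[--top]];

        order[done++] = t->id;
        for (int i = t->nchildren - 1; i >= 0; i--) {
            assert(top < SKETCH_MAX);
            stack[top++] = t->children[i];
        }
    }
    return done;
}
```

With parents 0 and 1, where task 0 creates subtasks 2 and 3 and task 2 creates subtask 4, the completion order is 0, 2, 4, 3, 1: everything spawned by 0 finishes before 1 begins.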
2020-10-19	Merge tag 'fuse-update-5.10' of	Linus Torvalds	20	-496/+2689
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Support directly accessing host page cache from virtiofs. This can improve I/O performance for various workloads, as well as reducing the memory requirement by eliminating double caching. Thanks to Vivek Goyal for doing most of the work on this. - Allow automatic submounting inside virtiofs. This allows unique st_dev/ st_ino values to be assigned inside the guest to files residing on different filesystems on the host. Thanks to Max Reitz for the patches. - Fix an old use after free bug found by Pradeep P V K. * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits) virtiofs: calculate number of scatter-gather elements accurately fuse: connection remove fix fuse: implement crossmounts fuse: Allow fuse_fill_super_common() for submounts fuse: split fuse_mount off of fuse_conn fuse: drop fuse_conn parameter where possible fuse: store fuse_conn in fuse_req fuse: add submount support to <uapi/linux/fuse.h> fuse: fix page dereference after free virtiofs: add logic to free up a memory range virtiofs: maintain a list of busy elements virtiofs: serialize truncate/punch_hole and dax fault path virtiofs: define dax address space operations virtiofs: add DAX mmap support virtiofs: implement dax read/write operations virtiofs: introduce setupmapping/removemapping commands virtiofs: implement FUSE_INIT map_alignment field virtiofs: keep a list of free dax memory ranges virtiofs: add a mount option to enable dax virtiofs: set up virtio_fs dax_device ...
2020-10-19	Merge tag 'zonefs-5.10-rc1' of	Linus Torvalds	3	-13/+233
	git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "Add an 'explicit-open' mount option to automatically issue a REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file is opened for writing for the first time. This avoids 'insufficient zone resources' errors for write operations on some drives with limited zone resources or on ZNS drives with a limited number of active zones. From Johannes" * tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: document the explicit-open mount option zonefs: open/close zone on file open/close zonefs: provide no-lock zonefs_io_error variant zonefs: introduce helper for zone management
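A hedged model of the explicit-open behaviour: the first open for writing of a sequential zone file issues a zone open, and (as the shortlog entry "open/close zone on file open/close" suggests) the last writer's close issues a zone close. The counters stand in for commands sent to the device; all names are hypothetical and this is not zonefs code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file state for a sequential zone file. */
struct sketch_zone_file {
    unsigned int write_openers; /* current opens with write access */
    unsigned int open_cmds;     /* stand-in for REQ_OP_ZONE_OPEN sent */
    unsigned int close_cmds;    /* stand-in for a zone close sent */
};

/* First writer opens the zone explicitly when the option is set. */
void sketch_file_open(struct sketch_zone_file *zf, bool for_write,
                      bool explicit_open)
{
    if (!for_write)
        return;
    if (zf->write_openers++ == 0 && explicit_open)
        zf->open_cmds++;
}

/* Last writer closes the zone again, releasing the device resource. */
void sketch_file_release(struct sketch_zone_file *zf, bool for_write,
                         bool explicit_open)
{
    if (!for_write)
        return;
    assert(zf->write_openers > 0);
    if (--zf->write_openers == 0 && explicit_open)
        zf->close_cmds++;
}
```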
2020-10-19	rtc: r9701: set range	Alexandre Belloni	1	-5/+3
	Set range and remove the set_time check. This is a classic BCD RTC. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
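For context on "a classic BCD RTC": such chips typically store a two-digit BCD year, which, assuming a fixed 20xx century, limits them to 2000 through 2099. That is the kind of range a driver can declare once so that the RTC core rejects out-of-range times instead of a hand-written set_time check. The helpers below are userspace re-implementations in the spirit of the kernel's bin2bcd()/bcd2bin(), not the kernel code, and the fixed century is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two decimal digits packed into one byte: tens in the high nibble. */
uint8_t sketch_bin2bcd(unsigned int val)
{
    return (uint8_t)(((val / 10) << 4) | (val % 10));
}

unsigned int sketch_bcd2bin(uint8_t val)
{
    return (val >> 4) * 10 + (val & 0x0f);
}

/* A two-digit year register holds 00..99; with the century assumed
 * to be 20xx, the representable years are 2000..2099. */
bool sketch_year_in_range(int year)
{
    return year >= 2000 && year <= 2099;
}

uint8_t sketch_encode_year(int year)
{
    assert(sketch_year_in_range(year));
    return sketch_bin2bcd((unsigned int)(year - 2000));
}
```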
2020-10-19	rtc: r9701: convert to devm_rtc_allocate_device	Alexandre Belloni	1	-3/+3
	This allows further improvement of the driver. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	rtc: r9701: stop setting RWKCNT	Alexandre Belloni	1	-1/+0
	tm_wday is never checked for validity and it is not read back in r9701_get_datetime. Avoid setting it to stop tripping static checkers: drivers/rtc/rtc-r9701.c:109 r9701_set_datetime() error: undefined (user controlled) shift '1 << dt->tm_wday' Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	rtc: r9701: remove useless memset	Alexandre Belloni	1	-2/+0
	The RTC core already zeroes the struct rtc_time it passes to the driver; avoid doing it a second time. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	rtc: r9701: stop setting a default time	Alexandre Belloni	1	-22/+0
	It doesn't make sense to set the RTC to a default value at probe time. Let the core handle invalid date and time. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	rtc: r9701: remove leftover comment	Alexandre Belloni	1	-4/+0
	Commit 22652ba72453 ("rtc: stop validating rtc_time in .read_time") removed the code but not the associated comment. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	rtc: rv3032: Add a driver for Microcrystal RV-3032	Alexandre Belloni	3	-0/+936
	New driver for the Microcrystal RV-3032, including support for: - Date/time - Alarms - Low voltage detection - Trickle charge - Trimming - Clkout - RAM - EEPROM - Temperature sensor Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	dt-bindings: rtc: rv3032: add RV-3032 bindings	Alexandre Belloni	1	-0/+64
	Document the Microcrystal RV-3032 device tree bindings Signed-off-by: Alexandre Belloni <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	dt-bindings: rtc: add trickle-voltage-millivolt	Alexandre Belloni	1	-0/+6
	Some RTCs have a trickle charger that is able to output different voltages depending on the type of the connected auxiliary power (battery, supercap, ...). Add a property to specify the necessary voltage. Signed-off-by: Alexandre Belloni <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-19	io_uring: use blk_queue_nowait() to check if NOWAIT supported	Jeffle Xu	1	-1/+1
	Commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added a new helper function, blk_queue_nowait(), to check whether the bdev supports handling of REQ_NOWAIT. Since then, bio-based dm devices can also support REQ_NOWAIT, and currently only dm-linear supports that since commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable it for linear target"). Signed-off-by: Jeffle Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-10-19	vfio/pci: Clear token on bypass registration failure	Alex Williamson	1	-1/+3
	The eventfd context is used as our irqbypass token; therefore, if an eventfd is re-used, our token is the same. The irqbypass code will return -EBUSY in this case, but we'll still attempt to unregister the producer, and if that duplicate token still exists, this results in removing the wrong object. Clear the token of failed producers so that they harmlessly fall out when unregistered. Fixes: 6d7425f109d2 ("vfio: Register/unregister irq_bypass_producer") Reported-by: guomin chen <[email protected]> Tested-by: guomin chen <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
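The failure mode and fix can be reproduced with a toy registry keyed by token. This is a sketch under assumptions: `sketch_register()` refuses a duplicate token with -EBUSY and `sketch_unregister()` matches by token and ignores a NULL token, which mirrors the behaviour described above, but none of these functions are the irqbypass API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_SLOTS 4

/* Hypothetical producer identified only by its token. */
struct sketch_producer {
    const void *token;
};

static struct sketch_producer *sketch_registered[SKETCH_SLOTS];

int sketch_register(struct sketch_producer *p)
{
    int free_slot = -1;

    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (sketch_registered[i] && sketch_registered[i]->token == p->token)
            return -EBUSY;             /* same eventfd reused */
        if (!sketch_registered[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -ENOSPC;
    sketch_registered[free_slot] = p;
    return 0;
}

/* Unregister by token: a NULL token matches nothing. */
void sketch_unregister(struct sketch_producer *p)
{
    if (!p->token)
        return;
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (sketch_registered[i] && sketch_registered[i]->token == p->token) {
            sketch_registered[i] = NULL;
            return;
        }
    }
}

/* The fix in miniature: on failure, clear the token so a later
 * unregister cannot remove the other producer sharing it. */
int sketch_setup(struct sketch_producer *p, const void *eventfd_ctx)
{
    int ret;

    p->token = eventfd_ctx;
    ret = sketch_register(p);
    if (ret)
        p->token = NULL;
    return ret;
}
```

Without the `p->token = NULL` line, unregistering the failed producer would remove the first producer's entry.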
2020-10-19	vfio/fsl-mc: fix the return of the uninitialized variable ret	Diana Craciun	1	-1/+1
	The vfio_fsl_mc_reflck_attach function may return, on success path, an uninitialized variable. Fix the problem by initializing the return variable to 0. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: f2ba7e8c947b ("vfio/fsl-mc: Added lock support in preparation for interrupt handling") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Diana Craciun <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-10-19	iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built	Bartosz Golaszewski	1	-1/+1
	Since commit c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") dmar.c needs struct iommu_device to be selected. We can drop this dependency by not dereferencing struct iommu_device if IOMMU_API is not selected and by reusing the information stored in iommu->drhd->ignored instead. This fixes the following build error when IOMMU_API is not selected: drivers/iommu/intel/dmar.c: In function ‘free_iommu’: drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’ 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { ^ Fixes: c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") Signed-off-by: Bartosz Golaszewski <[email protected]> Acked-by: Lu Baolu <[email protected]> Acked-by: David Woodhouse <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
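A sketch of the pattern behind the fix, using hypothetical stand-in structures rather than the real intel-iommu ones: a member that only exists under a config option (here `ops` under `CONFIG_IOMMU_API`) must not be dereferenced unconditionally, so the decision is taken from `drhd->ignored`, which exists in every configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: 'ops' only exists with CONFIG_IOMMU_API. */
struct sketch_iommu_device {
#ifdef CONFIG_IOMMU_API
    const void *ops;
#endif
    int unused;
};

struct sketch_drhd {
    bool ignored;          /* unit skipped, e.g. no supported widths */
};

struct sketch_intel_iommu {
    struct sketch_iommu_device iommu;
    struct sketch_drhd *drhd;
};

static bool sketch_intel_iommu_enabled = true;

/* Compiles with or without CONFIG_IOMMU_API because it only reads
 * drhd->ignored, never iommu.ops. */
bool sketch_should_unregister(const struct sketch_intel_iommu *iommu)
{
    return sketch_intel_iommu_enabled && !iommu->drhd->ignored;
}
```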
2020-10-19	Merge tag 'drm-intel-next-fixes-2020-10-15' of	Dave Airlie	2	-13/+6
	git://anongit.freedesktop.org/drm/drm-intel into drm-next - Set all unused color plane offsets to ~0xfff again (Ville) - Fix TGL DKL PHY DP vswing handling (Ville) Signed-off-by: Dave Airlie <[email protected]> From: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-10-19	drm/amd/display: Fix incorrect dsc force enable logic	Eryk Brol	1	-1/+1
	[Why] Missed removing a '!' which results in incorrect behavior [How] Remove the offending '!' Signed-off-by: Eryk Brol <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-10-19	Merge tag 'amd-drm-fixes-5.10-2020-10-14' of	Dave Airlie	11	-20/+65
	git://people.freedesktop.org/~agd5f/linux into drm-next amd-drm-fixes-5.10-2020-10-14: amdgpu: - eDP fix - BACO fix - Kernel documentation fixes - SMU7 mclk fix - VCN1 hw bug workaround amdkfd: - kvfree vs kfree fix Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-10-18	Merge tag 'linux-kselftest-kunit-5.10-rc1' of	Linus Torvalds	16	-110/+444
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more KUnit updates from Shuah Khan: - add KUnit to kernel_init() and remove KUnit from init calls entirely. This addresses the concern that KUnit would not work correctly during the late init phase. - add a linker section where KUnit can put references to its test suites. This is the first step in transitioning to dispatching all KUnit tests from a centralized executor rather than having each as its own separate late_initcall. - add a centralized executor to dispatch tests rather than relying on late_initcall to schedule each test suite separately. Centralized execution is for built-in tests only; modules will execute tests when loaded. - convert bitfield test to use KUnit framework - Documentation updates for naming guidelines and how kunit_test_suite() works. - add test plan to KUnit TAP format * tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: lib: kunit: Fix compilation test when using TEST_BIT_FIELD_COMPILE lib: kunit: add bitfield test conversion to KUnit Documentation: kunit: add a brief blurb about kunit_test_suite kunit: test: add test plan to KUnit TAP format init: main: add KUnit to kernel init kunit: test: create a single centralized executor for all tests vmlinux.lds.h: add linker section for KUnit test suites Documentation: kunit: Add naming guidelines
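The linker-section approach to test registration can be illustrated in userspace. This sketch assumes GNU ld or lld, which synthesize `__start_<section>`/`__stop_<section>` symbols for sections whose names are valid C identifiers; per the commit list, the kernel instead declares its section in vmlinux.lds.h. The executor walks the section and prints a TAP plan line ("1..N") followed by one result per suite. `sketch_suite` and `SKETCH_REGISTER_SUITE` are hypothetical, not KUnit's API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical suite descriptor; real KUnit suites are richer. */
struct sketch_suite {
    const char *name;
    int (*run)(void);      /* returns 0 on success */
};

/* Put a pointer to each suite into the "sketch_suites" section. */
#define SKETCH_REGISTER_SUITE(s) \
    static const struct sketch_suite *const sketch_ptr_##s \
    __attribute__((used, section("sketch_suites"))) = &(s)

/* Section bounds provided by the linker (GNU toolchain convention). */
extern const struct sketch_suite *const __start_sketch_suites[];
extern const struct sketch_suite *const __stop_sketch_suites[];

static int sketch_pass(void) { return 0; }
static int sketch_fail(void) { return 1; }

static const struct sketch_suite suite_a = { "suite_a", sketch_pass };
static const struct sketch_suite suite_b = { "suite_b", sketch_fail };
SKETCH_REGISTER_SUITE(suite_a);
SKETCH_REGISTER_SUITE(suite_b);

/* Central executor: TAP plan first, then one line per suite.
 * Returns the number of failed suites. */
int sketch_run_all(void)
{
    const struct sketch_suite *const *it;
    int n = (int)(__stop_sketch_suites - __start_sketch_suites);
    int failed = 0, i = 1;

    printf("1..%d\n", n);
    for (it = __start_sketch_suites; it < __stop_sketch_suites; it++, i++) {
        int ok = (*it)->run() == 0;

        failed += !ok;
        printf("%s %d - %s\n", ok ? "ok" : "not ok", i, (*it)->name);
    }
    return failed;
}
```

Registration needs no central list: each suite only drops a pointer into the section, and the executor discovers them all at run time. The order within the section is up to the linker, so the output order should not be relied on.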
2020-10-18	Merge tag 'core-rcu-2020-10-12' of	Linus Torvalds	57	-421/+1582
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: - Debugging for smp_call_function() - RT raw/non-raw lock ordering fixes - Strict grace periods for KASAN - New smp_call_function() torture test - Torture-test updates - Documentation updates - Miscellaneous fixes [ This doesn't actually pull the tag - I've dropped the last merge from the RCU branch due to questions about the series. - Linus ] * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) smp: Make symbol 'csd_bug_count' static kernel/smp: Provide CSD lock timeout diagnostics smp: Add source and destination CPUs to __call_single_data rcu: Shrink each possible cpu krcp rcu/segcblist: Prevent useless GP start if no CBs to accelerate torture: Add gdb support rcutorture: Allow pointer leaks to test diagnostic code rcutorture: Hoist OOM registry up one level refperf: Avoid null pointer dereference when buf fails to allocate rcutorture: Properly synchronize with OOM notifier rcutorture: Properly set rcu_fwds for OOM handling torture: Add kvm.sh --help and update help message rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05 torture: Update initrd documentation rcutorture: Replace HTTP links with HTTPS ones locktorture: Make function torture_percpu_rwsem_init() static torture: document --allcpus argument added to the kvm.sh script rcutorture: Output number of elapsed grace periods rcutorture: Remove KCSAN stubs rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp() ...
2020-10-18	Merge tag 'mailbox-v5.10' of	Linus Torvalds	8	-57/+506
	git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: - arm: implementation of mhu as a doorbell driver and conversion of dt-bindings to json-schema - mediatek: fix platform_get_irq error handling - bcm: convert tasklets to use new tasklet_setup api - core: fix race caused by hrtimer starting inappropriately * tag 'mailbox-v5.10' of git://git.linaro.org/landing-teams/working/fujitsu/integration: mailbox: avoid timer start from callback maiblox: mediatek: Fix handling of platform_get_irq() error mailbox: arm_mhu: Add ARM MHU doorbell driver mailbox: arm_mhu: Match only if compatible is "arm,mhu" dt-bindings: mailbox: add doorbell support to ARM MHU dt-bindings: mailbox : arm,mhu: Convert to Json-schema mailbox: bcm: convert tasklets to use new tasklet_setup() API
2020-10-18	Merge branch 'for-5.10' of	Linus Torvalds	11	-26/+1118
	git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux Pull coccinelle updates from Julia Lawall. * 'for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux: coccinelle: api: add kfree_mismatch script coccinelle: iterators: Add for_each_child.cocci script scripts: coccicheck: Change default condition for parallelism scripts: coccicheck: Add quotes to improve portability coccinelle: api: kfree_sensitive: print memset position coccinelle: misc: add flexible_array.cocci script coccinelle: api: add kvmalloc script scripts: coccicheck: Change default value for parallelism coccinelle: misc: add excluded_middle.cocci script scripts: coccicheck: Improve error feedback when coccicheck fails coccinelle: api: update kzfree script to kfree_sensitive coccinelle: misc: add uninitialized_var.cocci script coccinelle: ifnullfree: add vfree(), kvfree*() functions coccinelle: api: add kobj_to_dev.cocci script coccinelle: add patch rule for dma_alloc_coherent scripts: coccicheck: Add chain mode to list of modes
2020-10-18	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	55	-439/+592
	Merge yet more updates from Andrew Morton: "Subsystems affected by this patch series: mm (memcg, migration, pagemap, gup, madvise, vmalloc), ia64, and misc" * emailed patches from Andrew Morton <[email protected]>: (31 commits) mm: remove duplicate include statement in mmu.c mm: remove the filename in the top of file comment in vmalloc.c mm: cleanup the gfp_mask handling in __vmalloc_area_node mm: remove alloc_vm_area x86/xen: open code alloc_vm_area in arch_gnttab_valloc xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv drm/i915: use vmap in i915_gem_object_map drm/i915: stop using kmap in i915_gem_object_map drm/i915: use vmap in shmem_pin_map zsmalloc: switch from alloc_vm_area to get_vm_area mm: allow a NULL fn callback in apply_to_page_range mm: add a vmap_pfn function mm: add a VM_MAP_PUT_PAGES flag for vmap mm: update the documentation for vfree mm/madvise: introduce process_madvise() syscall: an external memory hinting API pid: move pidfd_get_pid() to pid.c mm/madvise: pass mm to do_madvise selftests/vm: 10x speedup for hmm-tests binfmt_elf: take the mmap lock around find_extend_vma() mm/gup_benchmark: take the mmap lock around GUP ...
2020-10-18	Merge tag 'for-linus-5.10-rc1' of	Linus Torvalds	14	-61/+91
	git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Improve support for non-glibc systems - Vector: Add support for scripting and dynamic tap devices - Various fixes for the vector networking driver - Various fixes for time travel mode * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: vector: Add dynamic tap interfaces and scripting um: Clean up stacktrace dump um: Fix incorrect assumptions about max pid length um: Remove dead usage of TIF_IA32 um: Remove redundant NULL check um: change sigio_spinlock to a mutex um: time-travel: Return the sequence number in ACK messages um: time-travel: Fix IRQ handling in time_travel_handle_message() um: Allow static linking for non-glibc implementations um: Some fixes to build UML with musl um: vector: Use GFP_ATOMIC under spin lock um: Fix null pointer dereference in vector_user_bpf
2020-10-18	Merge tag 'for-linus-5.10-rc1-part2' of	Linus Torvalds	7	-4/+25
	git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull more ubi and ubifs updates from Richard Weinberger: "UBI: - Correctly use kthread_should_stop in ubi worker UBIFS: - Fixes for memory leaks while iterating directory entries - Fix for a user triggerable error message - Fix for a space accounting bug in authenticated mode" * tag 'for-linus-5.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: journal: Make sure to not dirty twice for auth nodes ubifs: setflags: Don't show error message when vfs_ioc_setflags_prepare() fails ubifs: ubifs_jnl_change_xattr: Remove assertion 'nlink > 0' for host inode ubi: check kthread_should_stop() after the setting of task state ubifs: dent: Fix some potential memory leaks while iterating entries ubifs: xattr: Fix some potential memory leaks while iterating entries
2020-10-18	Merge tag 'for-linus-5.10-rc1' of	Linus Torvalds	5	-21/+34
	git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull ubifs updates from Richard Weinberger: - Kernel-doc fixes - Fixes for memory leaks in authentication option parsing * tag 'for-linus-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: mount_ubifs: Release authentication resource in error handling path ubifs: Don't parse authentication mount options in remount process ubifs: Fix a memleak after dumping authentication mount options ubifs: Fix some kernel-doc warnings in tnc.c ubifs: Fix some kernel-doc warnings in replay.c ubifs: Fix some kernel-doc warnings in gc.c ubifs: Fix 'hash' kernel-doc warning in auth.c
2020-10-18	mm: remove duplicate include statement in mmu.c	Tian Tao	1	-1/+0
	asm/sections.h is included more than once; remove the one that isn't necessary. Signed-off-by: Tian Tao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	mm: remove the filename in the top of file comment in vmalloc.c	Christoph Hellwig	1	-2/+0
	No point in having the filename inside the file. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	mm: cleanup the gfp_mask handling in __vmalloc_area_node	Christoph Hellwig	1	-12/+10
	Patch series "two small vmalloc cleanups". This patch (of 2): __vmalloc_area_node currently has four different gfp_t variables to just express this simple logic: - use the passed in mask, plus __GFP_NOWARN and __GFP_HIGHMEM (if suitable) for the underlying page allocation - use just the reclaim flags from the passed in mask plus __GFP_ZERO for allocating the page array Simplify this down to just use the pre-existing nested_gfp as-is for the page array allocation, and just the passed in gfp_mask for the page allocation, after conditionally ORing __GFP_HIGHMEM into it. This also makes the allocation warning a little more correct. Also initialize two variables at the time of declaration while touching this area. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
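The simplified mask logic can be written out with plain bit operations. The flag values below are invented for illustration (real gfp_t bits differ), and `can_highmem` is a hypothetical parameter standing in for "if suitable"; this is a sketch of the described logic, not __vmalloc_area_node itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real gfp_t bits differ. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_RECLAIM_MASK 0x00fu  /* stand-in for reclaim bits */
#define SKETCH_GFP_ZERO         0x010u
#define SKETCH_GFP_HIGHMEM      0x020u
#define SKETCH_GFP_OTHER        0x100u  /* some non-reclaim caller flag */

struct sketch_masks {
    sketch_gfp_t page_array;  /* for the array of page pointers */
    sketch_gfp_t pages;       /* for the pages themselves */
};

/* Page array: only the caller's reclaim flags, plus ZERO.
 * Pages: the caller's mask, with HIGHMEM ORed in when suitable. */
struct sketch_masks sketch_vmalloc_masks(sketch_gfp_t gfp_mask, bool can_highmem)
{
    struct sketch_masks m;

    m.page_array = (gfp_mask & SKETCH_GFP_RECLAIM_MASK) | SKETCH_GFP_ZERO;
    m.pages = gfp_mask;
    if (can_highmem)
        m.pages |= SKETCH_GFP_HIGHMEM;
    return m;
}
```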
2020-10-18	mm: remove alloc_vm_area	Christoph Hellwig	3	-59/+1
	All users are gone now. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	x86/xen: open code alloc_vm_area in arch_gnttab_valloc	Christoph Hellwig	1	-7/+20
	Replace the last call to alloc_vm_area with an open coded version using an iterator in struct gnttab_vm_area instead of the triple indirection magic in alloc_vm_area. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	xen/xenbus: use apply_to_page_range directly in xenbus_map_ring_pv	Christoph Hellwig	1	-14/+16
	Replacing alloc_vm_area with get_vm_area_caller + apply_to_page_range allows filling in the phys_addr values directly instead of doing another loop over all addresses. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
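The single-pass idea can be sketched in userspace as a walker that hands each entry to a callback together with an opaque iterator struct. demo_apply_range, demo_pte_fn and struct demo_map_info are hypothetical names that only loosely mimic the shape of apply_to_page_range(); a fake entry array replaces real page tables, and each entry's own address stands in for a phys_addr value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical per-entry callback: (entry, virtual address, opaque data). */
typedef int (*demo_pte_fn)(uint64_t *pte, unsigned long addr, void *data);

/* Hypothetical walker: calls fn once per page in [start, start + size). */
static int demo_apply_range(uint64_t *ptes, unsigned long start,
			    unsigned long size, demo_pte_fn fn, void *data)
{
	for (unsigned long off = 0; off < size; off += DEMO_PAGE_SIZE) {
		int err = fn(&ptes[off / DEMO_PAGE_SIZE], start + off, data);
		if (err)
			return err;
	}
	return 0;
}

/* Iterator state carried through the walk via the opaque data pointer. */
struct demo_map_info {
	uint64_t *phys_addrs; /* one slot per page, filled during the walk */
	size_t idx;
	size_t nr;
};

static int demo_record_entry(uint64_t *pte, unsigned long addr, void *data)
{
	struct demo_map_info *info = data;

	(void)addr;
	assert(info->idx < info->nr);
	/* Record the per-entry value now instead of looping over all addresses again. */
	info->phys_addrs[info->idx++] = (uint64_t)(uintptr_t)pte;
	return 0;
}
```

The iterator index lives in the data struct, so the callback fills the output array in the same pass that visits the entries.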
2020-10-18	drm/i915: use vmap in i915_gem_object_map	Christoph Hellwig	2	-68/+60
	i915_gem_object_map implements fairly low-level vmap functionality in a driver. Split it into two helpers, one for remapping kernel memory which can use vmap, and one for I/O memory that uses vmap_pfn. The only practical difference is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't seem to be required here for the kernel memory case (and could be added to vmap using a flag if actually required). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	drm/i915: stop using kmap in i915_gem_object_map	Christoph Hellwig	1	-5/+2
	kmap for !PageHighmem is just a convoluted way to say page_address, and kunmap is a no-op in that case. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	drm/i915: use vmap in shmem_pin_map	Christoph Hellwig	1	-58/+18
	shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Switch to vmap, and use vfree to free both the vmalloc mapping and the page array, as well as to drop the references to each page. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	zsmalloc: switch from alloc_vm_area to get_vm_area	Christoph Hellwig	1	-2/+8
	Just manually pre-fault the PTEs using apply_to_page_range. Co-developed-by: Minchan Kim <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	mm: allow a NULL fn callback in apply_to_page_range	Christoph Hellwig	1	-7/+9
	Besides calling the callback on each page, apply_to_page_range also has the effect of pre-faulting all PTEs for the range. To support callers that only need the pre-faulting, make the callback optional. Based on a patch from Minchan Kim <[email protected]>. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
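A userspace sketch of an optional callback, assuming a one-level table whose entries are allocated on first touch to stand in for page-table population. All names here (struct demo_table, demo_apply_range, demo_entry_fn, demo_table_free) are hypothetical; the point is only that a NULL fn still populates the whole range.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_NR_ENTRIES 8

/* Hypothetical table whose entries are allocated on demand ("pre-faulted"). */
struct demo_table {
	int *entries[DEMO_NR_ENTRIES];
};

typedef int (*demo_entry_fn)(int *entry, size_t idx, void *data);

static int demo_apply_range(struct demo_table *t, size_t first, size_t count,
			    demo_entry_fn fn, void *data)
{
	assert(first + count <= DEMO_NR_ENTRIES);
	for (size_t i = first; i < first + count; i++) {
		if (!t->entries[i]) {
			/* Populate the entry even if no callback will look at it. */
			t->entries[i] = calloc(1, sizeof(int));
			if (!t->entries[i])
				return -1;
		}
		/* The callback is optional: NULL means "populate only". */
		if (fn) {
			int err = fn(t->entries[i], i, data);
			if (err)
				return err;
		}
	}
	return 0;
}

static void demo_table_free(struct demo_table *t)
{
	for (size_t i = 0; i < DEMO_NR_ENTRIES; i++) {
		free(t->entries[i]);
		t->entries[i] = NULL;
	}
}
```

A caller such as zsmalloc, which only needs the pre-faulting side effect, would pass NULL for both fn and data.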
2020-10-18	mm: add a vmap_pfn function	Christoph Hellwig	3	-0/+49
	Add a proper helper to remap PFNs into kernel virtual space so that drivers don't have to abuse alloc_vm_area and open coded PTE manipulation for it. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	mm: add a VM_MAP_PUT_PAGES flag for vmap	Christoph Hellwig	2	-2/+8
	Add a flag so that vmap takes ownership of the passed in page array. When vfree is called on such an allocation it will put one reference on each page, and free the page array itself. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Matthew Auld <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
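The ownership rule can be modeled with plain refcounts. This is a hypothetical userspace sketch: struct demo_page, demo_vmap, demo_vfree and DEMO_MAP_PUT_PAGES only mirror the described behavior of VM_MAP_PUT_PAGES (put one reference per page, then free the array) and are not the kernel's implementation. The page array must be heap-allocated, because the mapping frees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in: a "page" is just a reference count. */
struct demo_page {
	int refcount;
};

#define DEMO_MAP_PUT_PAGES 0x1u /* models the role of VM_MAP_PUT_PAGES */

struct demo_mapping {
	struct demo_page **pages;
	size_t nr_pages;
	unsigned int flags;
};

static struct demo_mapping *demo_vmap(struct demo_page **pages, size_t nr,
				      unsigned int flags)
{
	struct demo_mapping *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->pages = pages;
	m->nr_pages = nr;
	m->flags = flags;
	return m;
}

static void demo_vfree(struct demo_mapping *m)
{
	if (m->flags & DEMO_MAP_PUT_PAGES) {
		for (size_t i = 0; i < m->nr_pages; i++) {
			assert(m->pages[i]->refcount > 0);
			m->pages[i]->refcount--; /* put one reference per page */
		}
		free(m->pages); /* the mapping owns the array itself */
	}
	free(m);
}
```

Without the flag, demo_vfree leaves both the page references and the array to the caller, which matches the pre-existing vmap contract.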
2020-10-18	mm: update the documentation for vfree	Matthew Wilcox (Oracle)	1	-10/+11
	Patch series "remove alloc_vm_area", v4. This series removes alloc_vm_area, which was left over from the big vmalloc interface rework. It is a rather arcane interface, basically the equivalent of get_vm_area + actually faulting in all PTEs in the allocated area. It was originally added for Xen (which isn't modular to start with), and then grew users in zsmalloc and i915, which mostly seem to qualify as abuses of the interface, especially for i915, as a random driver should not set up PTE bits directly. This patch (of 11): * Document that you can call vfree() on an address returned from vmap() * Remove the note about the minimum size -- the minimum size of a vmalloc allocation is one page * Add a Context: section * Fix capitalisation * Reword the prohibition on calling from NMI context to avoid a double negative Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Matthew Auld <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-10-18	mm/madvise: introduce process_madvise() syscall: an external memory hinting API	Minchan Kim	22	-3/+117
	There is a use case where System Management Software (SMS) wants to give a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon (ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall, process_madvise(2). It uses the pidfd of an external process to give the hint. It also supports a vector of address ranges, because an Android app has thousands of VMAs due to zygote, so it would be a total waste of CPU and power to call the syscall once for each VMA. (In testing, one vectored syscall showed a 15% performance improvement over 2000 per-VMA syscalls. I think it would be bigger in real practice, because the test ran in a very cache-friendly environment.) Another potential use case for the vector range is to amortize the cost of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In the future, we could find more use cases for other advice values, so let's make that possible in the API, since we are introducing a new syscall at this moment. With that, an existing madvise(2) user could replace it with process_madvise(2) on its own pid if it wants batched address range support. Since it could affect another process's address range, only a privileged process (PTRACE_MODE_ATTACH_FSCREDS), or one for which something else (e.g., being the same UID) gives it the right to ptrace the target process, can use it successfully. The flags argument is reserved for future use if we need to extend the API. I think supporting in process_madvise every hint that madvise supports now or will support later is rather risky, because we are not sure all hints make sense from an external process, and the implementation of a hint may rely on the caller being in the current context, so it could be error-prone. Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this patch. If someone wants to add other hints, we can hear the use case and review it for each hint. That is safer for maintenance than introducing a buggy syscall that is hard to fix later.
So finally, the API is as follows: ssize_t process_madvise(int pidfd, const struct iovec *iovec, size_t vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges of an external process as well as of the local process. It provides the advice for the address ranges of the process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidfd_open(2) for further information.) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; Each iovec describes an address range beginning at iov_base and spanning iov_len bytes. The vlen argument is the number of elements in iovec. The advice is indicated in the advice argument, which currently must be one of the following if the target process specified by pidfd is external: MADV_COLD MADV_PAGEOUT Permission to provide a hint to an external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). process_madvise supports every advice value that madvise(2) has if the target process is in the same thread group as the calling process, so a user could use process_madvise(2) to extend existing madvise(2) usage with vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes if an error occurred. The caller should check the return value to determine whether partial advice occurred.
FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep: "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful." - ssp
Q.2 - How is the race (i.e., object validation) between an external process giving a hint and the hint being applied to the target process handled? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the target process can run between the time the calling process inspects the target's address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. The same applies to other APIs like move_pages and process_vm_writev. The race isn't really a problem, though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes; instead, we tell people to use flock or something. Think about mmap. It never guarantees that newly allocated address space is still valid when the user tries to access it, because other threads could unmap the memory right before. That's where we need synchronization, using other APIs or a design on the userspace side. It shouldn't be part of the API itself. If someone needs more fine-grained synchronization than process level, two ideas were suggested: cookie[2] and anon-fd[3]. Both could be implemented using the last, reserved argument of the API, but I don't think that's necessary right now, since we already have ways to prevent the race, so I don't want to add the complexity of a more fine-grained model. To keep the API extensible, the last argument (flags) is reserved so we could support it in the future if someone really needs it.
Q.3 - Why doesn't ptrace work? Injecting an madvise into the target process using ptrace would not work for us, because such an injected madvise would have to be executed by the target process, which means that process would have to be runnable, and that creates the risk of the above-mentioned race and of hinting a wrong VMA. Furthermore, we want to act on the hint in the caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even frozen, so it can't act by itself quickly enough, which causes more thrashing and kills. It also doesn't work if the target process is already ptraced (e.g., by strace, a debugger, or minidump), because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory [2] process_getinfo, for getting a cookie that is updated whenever the VMA layout of the process address space changes - Daniel Colascione - https://lore.kernel.org/lkml/[email protected]/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd used for object (i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/[email protected]/
[[email protected]: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix patch ordering issue] [[email protected]: fix arm64 whoops] [[email protected]: make process_madvise() vlen arg have type size_t, per Florian] [[email protected]: fix i386 build] [[email protected]: fix syscall numbering] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix mips build] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: pidfd_get_pid() gained an argument] [[email protected]: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Brian Geffon <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Jann Horn <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Dias <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: SeongJae Park <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Tim Murray <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Florian Weimer <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
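A minimal caller-side sketch of the API described in this commit, including the partial-advice check from RETURN VALUE. It assumes a kernel with process_madvise (5.10+) and libc headers that define SYS_process_madvise; otherwise the hypothetical wrapper reports ENOSYS. The helper names (iov_total_bytes, demo_process_madvise, demo_advise_all) are illustrative, and the MADV_COLD/MADV_PAGEOUT fallback values are assumed to match the asm-generic uapi header. Newer glibc releases also ship their own process_madvise() wrapper, which this sketch deliberately does not rely on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (values assumed from the uapi headers). */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Total number of bytes requested across an iovec array. */
static size_t iov_total_bytes(const struct iovec *iov, size_t vlen)
{
	size_t total = 0;

	assert(iov != NULL || vlen == 0);
	for (size_t i = 0; i < vlen; i++)
		total += iov[i].iov_len;
	return total;
}

/* Hypothetical raw wrapper: returns bytes advised, or -1 with errno set. */
static ssize_t demo_process_madvise(int pidfd, const struct iovec *iov,
				    size_t vlen, int advice)
{
#ifdef SYS_process_madvise
	/* flags is reserved and passed as 0. */
	return syscall(SYS_process_madvise, pidfd, iov, vlen, advice, 0U);
#else
	(void)pidfd; (void)iov; (void)vlen; (void)advice;
	errno = ENOSYS;
	return -1;
#endif
}

/* 1 if every requested byte was advised, 0 on partial advice, -1 on error. */
static int demo_advise_all(int pidfd, const struct iovec *iov, size_t vlen,
			   int advice)
{
	ssize_t done = demo_process_madvise(pidfd, iov, vlen, advice);

	if (done < 0)
		return -1;
	return (size_t)done == iov_total_bytes(iov, vlen) ? 1 : 0;
}
```

A real caller would obtain pidfd from pidfd_open(2) and pass ranges that belong to the target process; comparing the return value with the requested total is how it distinguishes full from partial advice.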
2020-10-18	pid: move pidfd_get_pid() to pid.c	Minchan Kim	3	-19/+20
	The process_madvise syscall needs the pidfd_get_pid function to translate a pidfd to a pid, so this patch moves the function to kernel/pid.c. Suggested-by: Alexander Duyck <[email protected]> Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jann Horn <[email protected]> Cc: Brian Geffon <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Dias <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: SeongJae Park <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Tim Murray <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Florian Weimer <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>