blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2019-11-06	net/mlx5e: Use correct enum to determine uplink port	Dmytro Linkin	1	-1/+2
	For vlan push action, if eswitch flow source capability is enabled, flow source value compared with MLX5_VPORT_UPLINK enum, to determine uplink port. This lead to syndrome in dmesg if try to add vlan push action. For example: $ tc filter add dev vxlan0 ingress protocol ip prio 1 flower \ enc_dst_port 4789 \ action tunnel_key unset pipe \ action vlan push id 20 pipe \ action mirred egress redirect dev ens1f0_0 $ dmesg ... [ 2456.883693] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 5273): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa9c090) Use the correct enum value MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK. Fixes: bb204dcf39fe ("net/mlx5e: Determine source port properly for vlan push action") Signed-off-by: Dmytro Linkin <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-06	net/mlx5: DR, Fix memory leak during rule creation	Alex Vesker	1	-0/+2
	During rule creation hw_ste_arr was not freed. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-06	net/mlx5: DR, Fix memory leak in modify action destroy	Alex Vesker	1	-0/+1
	The rewrite data was no freed. Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Alex Vesker <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-06	net/mlx5e: Fix eswitch debug print of max fdb flow	Roi Dayan	1	-1/+1
	The value is already the calculation so remove the log prefix. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Eli Britstein <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-11-06	HID: wacom: generic: Treat serial number and related fields as unsigned	Jason Gerecke	2	-4/+21
	The HID descriptors for most Wacom devices oddly declare the serial number and other related fields as signed integers. When these numbers are ingested by the HID subsystem, they are automatically sign-extended into 32-bit integers. We treat the fields as unsigned elsewhere in the kernel and userspace, however, so this sign-extension causes problems. In particular, the sign-extended tool ID sent to userspace as ABS_MISC does not properly match unsigned IDs used by xf86-input-wacom and libwacom. We introduce a function 'wacom_s32tou' that can undo the automatic sign extension performed by 'hid_snto32'. We call this function when processing the serial number and related fields to ensure that we are dealing with and reporting the unsigned form. We opt to use this method rather than adding a descriptor fixup in 'wacom_hid_usage_quirk' since it should be more robust in the face of future devices. Ref: https://github.com/linuxwacom/input-wacom/issues/134 Fixes: f85c9dc678 ("HID: wacom: generic: Support tool ID and additional tool types") CC: <[email protected]> # v4.10+ Signed-off-by: Jason Gerecke <[email protected]> Reviewed-by: Aaron Armstrong Skomra <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2019-11-06	drm/amdgpu: add navi14 PCI ID	Tianci.Yin	1	-0/+1
	Add the navi14 PCI device id. Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Tianci.Yin <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	Revert "drm/amd/display: setting the DIG_MODE to the correct value."	Zhan Liu	1	-9/+0
	This reverts commit 385857adb8154563840e5b0f200254126618f464. Reason for revert: Root cause of this issue is found. The workaround is not needed anymore. Signed-off-by: Zhan Liu <[email protected]> Reviewed-by: Hersen Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	drm/amd/display: Add ENGINE_ID_DIGD condition check for Navi14	Zhan Liu	1	-0/+5
	[Why] Navi10 has 6 PHY, but Navi14 only has 5 PHY, that is because there is no ENGINE_ID_DIGD in Navi14. Without this patch, many HDMI related issues (e.g. HDMI S3 resume failure, HDMI pink screen on boot) will be observed. [How] If "eng_id" is larger than ENGINE_ID_DIGD, then add "eng_id" by 1. Signed-off-by: Zhan Liu <[email protected]> Reviewed-by: Hersen Wu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	drm/amdgpu: dont schedule jobs while in reset	Shirish S	1	-1/+4
	[Why] doing kthread_park()/unpark() from drm_sched_entity_fini while GPU reset is in progress defeats all the purpose of drm_sched_stop->kthread_park. If drm_sched_entity_fini->kthread_unpark() happens AFTER drm_sched_stop->kthread_park nothing prevents from another (third) thread to keep submitting job to HW which will be picked up by the unparked scheduler thread and try to submit to HW but fail because the HW ring is deactivated. [How] grab the reset lock before calling drm_sched_entity_fini() Signed-off-by: Shirish S <[email protected]> Suggested-by: Christian König <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	drm/amdgpu/arcturus: properly set BANK_SELECT and FRAGMENT_SIZE	Alex Deucher	1	-0/+9
	These were not aligned for optimal performance for GPUVM. Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-11-06	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	15	-79/+188
	Merge more fixes from Andrew Morton: "17 fixes" Mostly mm fixes and one ocfs2 locking fix. * emailed patches from Andrew Morton <[email protected]>: mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges mm/memory_hotplug: fix updating the node span scripts/gdb: fix debugging modules compiled with hot/cold partitioning mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly MAINTAINERS: update information for "MEMORY MANAGEMENT" dump_stack: avoid the livelock of the dump_lock zswap: add Vitaly to the maintainers list mm/page_alloc.c: ratelimit allocation failure warnings more aggressively mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo mm, vmstat: hide /proc/pagetypeinfo from normal users mm/mmu_notifiers: use the right return code for WARN_ON ocfs2: protect extent tree in ocfs2_prepare_inode_for_write() mm: thp: handle page cache THP correctly in PageTransCompoundMap mm, meminit: recalculate pcpu batch and high limits after init completes mm/gup_benchmark: fix MAP_HUGETLB case mm: memcontrol: fix NULL-ptr deref in percpu stats flush
2019-11-06	arm64: Do not mask out PTE_RDONLY in pte_same()	Catalin Marinas	1	-17/+0
	Following commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"), the PTE_RDONLY bit is no longer managed by set_pte_at() but built into the PAGE_* attribute definitions. Consequently, pte_same() must include this bit when checking two PTEs for equality. Remove the arm64-specific pte_same() function, practically reverting commit 747a70e60b72 ("arm64: Fix copy-on-write referencing in HugeTLB") Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Cc: <[email protected]> # 4.14.x- Cc: Will Deacon <[email protected]> Cc: Steve Capper <[email protected]> Reported-by: John Stultz <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2019-11-06	Merge branch 'net-bcmgenet-restore-internal-EPHY-support'	David S. Miller	3	-72/+110
	Doug Berger says: ==================== net: bcmgenet: restore internal EPHY support (part 2) This is a follow up to my previous submission (see [1]). The first commit provides what is intended to be a complete solution for the issues that can result from insufficient clocking of the MAC during reset of its state machines. It should be backported to the stable releases. It is intended to replace the partial solution of commit 1f515486275a ("net: bcmgenet: soft reset 40nm EPHYs before MAC init") which is reverted by the second commit of this series and should not be back- ported as noted in [2]. The third commit corrects a timing hazard with a polled PHY that can occur when the MAC resumes and also when a v3 internal EPHY is reset by the change in commit 25382b991d25 ("net: bcmgenet: reset 40nm EPHY on energy detect"). It is expected that commit 25382b991d25 be back- ported to stable first before backporting this commit. [1] https://lkml.org/lkml/2019/10/16/1706 [2] https://lkml.org/lkml/2019/10/31/749 ==================== Signed-off-by: David S. Miller <[email protected]>
2019-11-06	net: bcmgenet: reapply manual settings to the PHY	Doug Berger	1	-1/+4
	The phy_init_hw() function may reset the PHY to a configuration that does not match manual network settings stored in the phydev structure. If the phy state machine is polled rather than event driven this can create a timing hazard where the phy state machine might alter the settings stored in the phydev structure from the value read from the BMCR. This commit follows invocations of phy_init_hw() by the bcmgenet driver with invocations of the genphy_config_aneg() function to ensure that the BMCR is written to match the settings held in the phydev structure. This prevents the risk of manual settings being accidentally altered. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-06	Revert "net: bcmgenet: soft reset 40nm EPHYs before MAC init"	Doug Berger	3	-69/+73
	This reverts commit 1f515486275a08a17a2c806b844cca18f7de5b34. This commit improved the chances of the umac resetting cleanly by ensuring that the PHY was restored to its normal operation prior to resetting the umac. However, there were still cases when the PHY might not be driving a Tx clock to the umac during this window (e.g. when the PHY detects no link). The previous commit now ensures that the unimac receives clocks from the MAC during its reset window so this commit is no longer needed. This commit also has an unintended negative impact on the MDIO performance of the UniMAC MDIO interface because it is used before the MDIO interrupts are reenabled, so it should be removed. Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-06	net: bcmgenet: use RGMII loopback for MAC reset	Doug Berger	2	-2/+33
	As noted in commit 28c2d1a7a0bf ("net: bcmgenet: enable loopback during UniMAC sw_reset") the UniMAC must be clocked while sw_reset is asserted for its state machines to reset cleanly. The transmit and receive clocks used by the UniMAC are derived from the signals used on its PHY interface. The bcmgenet MAC can be configured to work with different PHY interfaces including MII, GMII, RGMII, and Reverse MII on internal and external interfaces. Unfortunately for the UniMAC, when configured for MII the Tx clock is always driven from the PHY which places it outside of the direct control of the MAC. The earlier commit enabled a local loopback mode within the UniMAC so that the receive clock would be derived from the transmit clock which addressed the observed issue with an external GPHY disabling it's Rx clock. However, when a Tx clock is not available this loopback is insufficient. This commit implements a workaround that leverages the fact that the MAC can reliably generate all of its necessary clocking by enterring the external GPHY RGMII interface mode with the UniMAC in local loopback during the sw_reset interval. Unfortunately, this has the undesirable side efect of the RGMII GTXCLK signal being driven during the same window. In most configurations this is a benign side effect as the signal is either not routed to a pin or is already expected to drive the pin. The one exception is when an external MII PHY is expected to drive the same pin with its TX_CLK output creating output driver contention. This commit exploits the IEEE 802.3 clause 22 standard defined isolate mode to force an external MII PHY to present a high impedance on its TX_CLK output during the window to prevent any contention at the pin. The MII interface is used internally with the 40nm internal EPHY which agressively disables its clocks for power savings leading to incomplete resets of the UniMAC and many instabilities observed over the years. The workaround of this commit is expected to put an end to those problems. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-06	Merge tag 'perf-urgent-for-mingo-5.4-20191105' of ↵	Thomas Gleixner	5	-38/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf fixes from Arnaldo Carvalho de Melo: perf report/top: Jiri Olsa: - Fix time sorting for big numbers, i.e.: perf report -s time -F time,overhead --stdio was failing because the sort comparision routine was returning 'int' while that particular -s key was int64_t, fix it. perf scripting engines: Steven Rostedt (VMware): - Iterate on tep event arrays directly, fixing a bug when generating python/perl source code from a perf.data file with more than one tracepoint event. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-11-06	drm/atomic: fix self-refresh helpers crtc state dereference	Rob Clark	3	-9/+27
	drm_self_refresh_helper_update_avg_times() was incorrectly accessing the new incoming state after drm_atomic_helper_commit_hw_done(). But this state might have already been superceeded by an !nonblock atomic update resulting in dereferencing an already free'd crtc_state. TODO I think this will more or less do the right thing.. althought I'm not 100% sure if, for example, we enter psr in a nonblock commit, and then leave psr in a !nonblock commit that overtakes the completion of the nonblock commit. Not sure if this sort of scenario can happen in practice. But not crashing is better than crashing, so I guess we should either take this patch or rever the self-refresh helpers until Sean can figure out a better solution. Fixes: d4da4e33341c ("drm: Measure Self Refresh Entry/Exit times to avoid thrashing") Cc: Sean Paul <[email protected]> Signed-off-by: Rob Clark <[email protected]> [seanpaul fixed up some checkpatch warns] Signed-off-by: Sean Paul <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-11-06	configfs: calculate the depth of parent item	Honggang Li	1	-1/+1
	When create symbolic link, create_link should calculate the depth of the parent item. However, both the first and second parameters of configfs_get_target_path had been set to the target. Broken symbolic link created. $ targetcli ls / o- / ............................................................. [...] o- backstores .................................................. [...] \| o- block ...................................... [Storage Objects: 0] \| o- fileio ..................................... [Storage Objects: 2] \| \| o- vdev0 .......... [/dev/ramdisk1 (16.0MiB) write-thru activated] \| \| \| o- alua ....................................... [ALUA Groups: 1] \| \| \| o- default_tg_pt_gp ........... [ALUA state: Active/optimized] \| \| o- vdev1 .......... [/dev/ramdisk2 (16.0MiB) write-thru activated] \| \| o- alua ....................................... [ALUA Groups: 1] \| \| o- default_tg_pt_gp ........... [ALUA state: Active/optimized] \| o- pscsi ...................................... [Storage Objects: 0] \| o- ramdisk .................................... [Storage Objects: 0] o- iscsi ................................................ [Targets: 0] o- loopback ............................................. [Targets: 0] o- srpt ................................................. [Targets: 2] \| o- ib.e89a8f91cb3200000000000000000000 ............... [no-gen-acls] \| \| o- acls ................................................ [ACLs: 2] \| \| \| o- ib.e89a8f91cb3200000000000000000000 ........ [Mapped LUNs: 2] \| \| \| \| o- mapped_lun0 ............................. [BROKEN LUN LINK] \| \| \| \| o- mapped_lun1 ............................. [BROKEN LUN LINK] \| \| \| o- ib.e89a8f91cb3300000000000000000000 ........ [Mapped LUNs: 2] \| \| \| o- mapped_lun0 ............................. [BROKEN LUN LINK] \| \| \| o- mapped_lun1 ............................. [BROKEN LUN LINK] \| \| o- luns ................................................ [LUNs: 2] \| \| o- lun0 ...... [fileio/vdev0 (/dev/ramdisk1) (default_tg_pt_gp)] \| \| o- lun1 ...... [fileio/vdev1 (/dev/ramdisk2) (default_tg_pt_gp)] \| o- ib.e89a8f91cb3300000000000000000000 ............... [no-gen-acls] \| o- acls ................................................ [ACLs: 0] \| o- luns ................................................ [LUNs: 0] o- vhost ................................................ [Targets: 0] Fixes: e9c03af21cc7 ("configfs: calculate the symlink target only once") Signed-off-by: Honggang Li <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-11-06	ALSA: timer: Fix incorrectly assigned timer instance	Takashi Iwai	1	-3/+3
	The clean up commit 41672c0c24a6 ("ALSA: timer: Simplify error path in snd_timer_open()") unified the error handling code paths with the standard goto, but it introduced a subtle bug: the timer instance is stored in snd_timer_open() incorrectly even if it returns an error. This may eventually lead to UAF, as spotted by fuzzer. The culprit is the snd_timer_open() code checks the SNDRV_TIMER_IFLG_EXCLUSIVE flag with the common variable timeri. This variable is supposed to be the newly created instance, but we (ab-)used it for a temporary check before the actual creation of a timer instance. After that point, there is another check for the max number of instances, and it bails out if over the threshold. Before the refactoring above, it worked fine because the code returned directly from that point. After the refactoring, however, it jumps to the unified error path that stores the timeri variable in return -- even if it returns an error. Unfortunately this stored value is kept in the caller side (snd_timer_user_tselect()) in tu->timeri. This causes inconsistency later, as if the timer was successfully assigned. In this patch, we fix it by not re-using timeri variable but a temporary variable for testing the exclusive connection, so timeri remains NULL at that point. Fixes: 41672c0c24a6 ("ALSA: timer: Simplify error path in snd_timer_open()") Reported-and-tested-by: Tristan Madani <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2019-11-06	mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges	Johannes Weiner	1	-0/+9
	While upgrading from 4.16 to 5.2, we noticed these allocation errors in the log of the new kernel: SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC) cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0 node 0: slabs: 5, objs: 170, free: 0 slab_out_of_memory+1 ___slab_alloc+969 __slab_alloc+14 kmem_cache_alloc+346 inet_twsk_alloc+60 tcp_time_wait+46 tcp_fin+206 tcp_data_queue+2034 tcp_rcv_state_process+784 tcp_v6_do_rcv+405 __release_sock+118 tcp_close+385 inet_release+46 __sock_release+55 sock_close+17 __fput+170 task_work_run+127 exit_to_usermode_loop+191 do_syscall_64+212 entry_SYSCALL_64_after_hwframe+68 accompanied by an increase in machines going completely radio silent under memory pressure. One thing that changed since 4.16 is e699e2c6a654 ("net, mm: account sock objects to kmemcg"), which made these slab caches subject to cgroup memory accounting and control. The problem with that is that cgroups, unlike the page allocator, do not maintain dedicated atomic reserves. As a cgroup's usage hovers at its limit, atomic allocations - such as done during network rx - can fail consistently for extended periods of time. The kernel is not able to operate under these conditions. We don't want to revert the culprit patch, because it indeed tracks a potentially substantial amount of memory used by a cgroup. We also don't want to implement dedicated atomic reserves for cgroups. There is no point in keeping a fixed margin of unused bytes in the cgroup's memory budget to accomodate a consumer that is impossible to predict - we'd be wasting memory and get into configuration headaches, not unlike what we have going with min_free_kbytes. We do this for physical mem because we have to, but cgroups are an accounting game. Instead, account these privileged allocations to the cgroup, but let them bypass the configured limit if they have to. This way, we get the benefits of accounting the consumed memory and have it exert pressure on the rest of the cgroup, but like with the page allocator, we shift the burden of reclaimining on behalf of atomic allocations onto the regular allocations that can block. Link: http://lkml.kernel.org/r/[email protected] Fixes: e699e2c6a654 ("net, mm: account sock objects to kmemcg") Signed-off-by: Johannes Weiner <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> [4.18+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/memory_hotplug: fix updating the node span	David Hildenbrand	1	-0/+8
	We recently started updating the node span based on the zone span to avoid touching uninitialized memmaps. Currently, we will always detect the node span to start at 0, meaning a node can easily span too many pages. pgdat_is_empty() will still work correctly if all zones span no pages. We should skip over all zones without spanned pages and properly handle the first detected zone that spans pages. Unfortunately, in contrast to the zone span (/proc/zoneinfo), the node span cannot easily be inspected and tested. The node span gives no real guarantees when an architecture supports memory hotplug, meaning it can easily contain holes or span pages of different nodes. The node span is not really used after init on architectures that support memory hotplug. E.g., we use it in mm/memory_hotplug.c:try_offline_node() and in mm/kmemleak.c:kmemleak_scan(). These users seem to be fine. Link: http://lkml.kernel.org/r/[email protected] Fixes: 00d6c019b5bc ("mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()") Signed-off-by: David Hildenbrand <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Dan Williams <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	scripts/gdb: fix debugging modules compiled with hot/cold partitioning	Ilya Leoshkevich	1	-1/+2
	gcc's -freorder-blocks-and-partition option makes it group frequently and infrequently used code in .text.hot and .text.unlikely sections respectively. At least when building modules on s390, this option is used by default. gdb assumes that all code is located in .text section, and that .text section is located at module load address. With such modules this is no longer the case: there is code in .text.hot and .text.unlikely, and either of them might precede .text. Fix by explicitly telling gdb the addresses of code sections. It might be tempting to do this for all sections, not only the ones in the white list. Unfortunately, gdb appears to have an issue, when telling it about e.g. loadable .note.gnu.build-id section causes it to think that non-loadable .note.Linux section is loaded at address 0, which in turn causes NULL pointers to be resolved to bogus symbols. So keep using the white list approach for the time being. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly	Roman Gushchin	2	-3/+3
	page_cgroup_ino() doesn't return a valid memcg pointer for non-compound slab pages, because it depends on PgHead AND PgSlab flags to be set to determine the memory cgroup from the kmem_cache. It's correct for compound pages, but not for generic small pages. Those don't have PgHead set, so it ends up returning zero. Fix this by replacing the condition to PageSlab() && !PageTail(). Before this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/ \| grep slab 0x0000000000000080 38 0 _______S___________________________________ slab After this patch: [root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/[email protected]/ \| grep slab 0x0000000000000080 147 0 _______S___________________________________ slab Also, hwpoison_filter_task() uses output of page_cgroup_ino() in order to filter error injection events based on memcg. So if page_cgroup_ino() fails to return memcg pointer, we just fail to inject memory error. Considering that hwpoison filter is for testing, affected users are limited and the impact should be marginal. [[email protected]: changelog additions] Link: http://lkml.kernel.org/r/[email protected] Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	MAINTAINERS: update information for "MEMORY MANAGEMENT"	Song Liu	1	-0/+4
	I was trying to find the mm tree in MAINTAINERS by searching "Morton". Unfortunately, I didn't find one. And I didn't even locate the MEMORY MANAGEMENT section quickly, because Andrew's name was not listed there. Thanks to Johannes who helped me find the mm tree. Let save other's time searching around by adding: M: Andrew Morton <[email protected]> T: git git://github.com/hnaz/linux-mm.git [[email protected]: add ozlabs.org quilt trees] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Song Liu <[email protected]> Acked-by: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	dump_stack: avoid the livelock of the dump_lock	Kevin Hao	1	-1/+6
	In the current code, we use the atomic_cmpxchg() to serialize the output of the dump_stack(), but this implementation suffers the thundering herd problem. We have observed such kind of livelock on a Marvell cn96xx board(24 cpus) when heavily using the dump_stack() in a kprobe handler. Actually we can let the competitors to wait for the releasing of the lock before jumping to atomic_cmpxchg(). This will definitely mitigate the thundering herd problem. Thanks Linus for the suggestion. [[email protected]: fix comment] Link: http://lkml.kernel.org/r/[email protected] Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()") Signed-off-by: Kevin Hao <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	zswap: add Vitaly to the maintainers list	Vitaly Wool	1	-0/+1
	Per conversation with Dan, add myself to the zswap MAINTAINERS list. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vitaly Wool <[email protected]> Acked-by: Dan Streetman <[email protected]> Acked-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/page_alloc.c: ratelimit allocation failure warnings more aggressively	Johannes Weiner	1	-6/+1
	While investigating a bug related to higher atomic allocation failures, we noticed the failure warnings positively drowning the console, and in our case trigger lockup warnings because of a serial console too slow to handle all that output. But even if we had a faster console, it's unclear what additional information the current level of repetition provides. Allocation failures happen for three reasons: The machine is OOM, the VM is failing to handle reasonable requests, or somebody is making unreasonable requests (and didn't acknowledge their opportunism with __GFP_NOWARN). Having the memory dump, a callstack, and the ratelimit stats on skipped failure warnings should provide enough information to let users/admins/developers know whether something is wrong and point them in the right direction for debugging, bpftracing etc. Limit allocation failure warnings to one spew every ten seconds. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/khugepaged: fix might_sleep() warn with CONFIG_HIGHPTE=y	Ville Syrjälä	1	-3/+4
	I got some khugepaged spew on a 32bit x86: BUG: sleeping function called from invalid context at include/linux/mmu_notifier.h:346 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 25, name: khugepaged INFO: lockdep is turned off. CPU: 1 PID: 25 Comm: khugepaged Not tainted 5.4.0-rc5-elk+ #206 Hardware name: System manufacturer P5Q-EM/P5Q-EM, BIOS 2203 07/08/2009 Call Trace: dump_stack+0x66/0x8e ___might_sleep.cold.96+0x95/0xa6 __might_sleep+0x2e/0x80 collapse_huge_page.isra.51+0x5ac/0x1360 khugepaged+0x9a9/0x20f0 kthread+0xf5/0x110 ret_from_fork+0x2e/0x38 Looks like it's due to CONFIG_HIGHPTE=y pte_offset_map()->kmap_atomic() vs. mmu_notifier_invalidate_range_start(). Let's do the naive approach and just reorder the two operations. Link: http://lkml.kernel.org/r/[email protected] Fixes: 810e24e009cf71 ("mm/mmu_notifiers: annotate with might_sleep()") Signed-off-by: Ville Syrjl <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, vmstat: reduce zone->lock holding time by /proc/pagetypeinfo	Michal Hocko	1	-3/+20
	pagetypeinfo_showfree_print is called by zone->lock held in irq mode. This is not really nice because it blocks both any interrupts on that cpu and the page allocator. On large machines this might even trigger the hard lockup detector. Considering the pagetypeinfo is a debugging tool we do not really need exact numbers here. The primary reason to look at the outuput is to see how pageblocks are spread among different migratetypes and low number of pages is much more interesting therefore putting a bound on the number of pages on the free_list sounds like a reasonable tradeoff. The new output will simply tell [...] Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648 instead of Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648 The limit has been chosen arbitrary and it is a subject of a future change should there be a need for that. While we are at it, also drop the zone lock after each free_list iteration which will help with the IRQ and page allocator responsiveness even further as the IRQ lock held time is always bound to those 100k pages. [[email protected]: tweak comment text, per David Hildenbrand] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Suggested-by: Andrew Morton <[email protected]> Reviewed-by: Waiman Long <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jann Horn <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Song Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, vmstat: hide /proc/pagetypeinfo from normal users	Michal Hocko	1	-1/+1
	/proc/pagetypeinfo is a debugging tool to examine internal page allocator state wrt to fragmentation. It is not very useful for any other use so normal users really do not need to read this file. Waiman Long has noticed that reading this file can have negative side effects because zone->lock is necessary for gathering data and that a) interferes with the page allocator and its users and b) can lead to hard lockups on large machines which have very long free_list. Reduce both issues by simply not exporting the file to regular users. Link: http://lkml.kernel.org/r/[email protected] Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo") Signed-off-by: Michal Hocko <[email protected]> Reported-by: Waiman Long <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Waiman Long <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Jann Horn <[email protected]> Cc: Song Liu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/mmu_notifiers: use the right return code for WARN_ON	Jason Gunthorpe	1	-1/+1
	The return code from the op callback is actually in _ret, while the WARN_ON was checking ret which causes it to misfire. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8402ce61bec2 ("mm/mmu_notifiers: check if mmu notifier callbacks are allowed to fail") Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()	Shuning Zhang	1	-44/+90
	When the extent tree is modified, it should be protected by inode cluster lock and ip_alloc_sem. The extent tree is accessed and modified in the ocfs2_prepare_inode_for_write, but isn't protected by ip_alloc_sem. The following is a case. The function ocfs2_fiemap is accessing the extent tree, which is modified at the same time. kernel BUG at fs/ocfs2/extent_map.c:475! invalid opcode: 0000 [#1] SMP Modules linked in: tun ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue [...] CPU: 16 PID: 14047 Comm: o2info Not tainted 4.1.12-124.23.1.el6uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42040600 10/19/2018 task: ffff88019487e200 ti: ffff88003daa4000 task.ti: ffff88003daa4000 RIP: ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] Call Trace: ocfs2_fiemap+0x1e3/0x430 [ocfs2] do_vfs_ioctl+0x155/0x510 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x18/0xd8 Code: 18 48 c7 c6 60 7f 65 a0 31 c0 bb e2 ff ff ff 48 8b 4a 40 48 8b 7a 28 48 c7 c2 78 2d 66 a0 e8 38 4f 05 00 e9 28 fe ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 bb 86 ff ff ff e9 13 fe ff ff 66 0f 1f RIP ocfs2_get_clusters_nocache.isra.11+0x390/0x550 [ocfs2] ---[ end trace c8aa0c8180e869dc ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled This issue can be reproduced every week in a production environment. This issue is related to the usage mode. If others use ocfs2 in this mode, the kernel will panic frequently. [[email protected]: coding style fixes] [Fix new warning due to unused function by removing said function - Linus ] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shuning Zhang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Gang He <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: thp: handle page cache THP correctly in PageTransCompoundMap	Yang Shi	3	-7/+23
	We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit 127393fbe597 ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Yang Shi <[email protected]> Reported-by: Gang Deng <[email protected]> Tested-by: Gang Deng <[email protected]> Suggested-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> [4.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm, meminit: recalculate pcpu batch and high limits after init completes	Mel Gorman	1	-2/+8
	Deferred memory initialisation updates zone->managed_pages during the initialisation phase but before that finishes, the per-cpu page allocator (pcpu) calculates the number of pages allocated/freed in batches as well as the maximum number of pages allowed on a per-cpu list. As zone->managed_pages is not up to date yet, the pcpu initialisation calculates inappropriately low batch and high values. This increases zone lock contention quite severely in some cases with the degree of severity depending on how many CPUs share a local zone and the size of the zone. A private report indicated that kernel build times were excessive with extremely high system CPU usage. A perf profile indicated that a large chunk of time was lost on zone->lock contention. This patch recalculates the pcpu batch and high values after deferred initialisation completes for every populated zone in the system. It was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation workload -- allmodconfig and all available CPUs. mmtests configuration: config-workload-kernbench-max Configuration was modified to build on a fresh XFS partition. kernbench 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%* Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%* Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%* Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%) Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%) Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%) 5.4.0-rc3 5.4.0-rc3 vanilla resetpcpu-v2 Duration User 39766.24 49221.79 Duration System 44298.10 13361.67 Duration Elapsed 519.11 388.87 The patch reduces system CPU usage by 69.86% and total build time by 26.65%. The variance of system CPU usage is also much reduced. Before, this was the breakdown of batch and high values over all zones was: 256 batch: 1 256 batch: 63 512 batch: 7 256 high: 0 256 high: 378 512 high: 42 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After the patch: 256 batch: 1 768 batch: 63 256 high: 0 768 high: 378 [[email protected]: fix merge/linkage snafu] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Qian Cai <[email protected]> Cc: <[email protected]> [4.1+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm/gup_benchmark: fix MAP_HUGETLB case	John Hubbard	1	-1/+1
	The MAP_HUGETLB ("-H" option) of gup_benchmark fails: $ sudo ./gup_benchmark -H mmap: Invalid argument This is because gup_benchmark.c is passing in a file descriptor to mmap(), but the fd came from opening up the /dev/zero file. This confuses the mmap syscall implementation, which thinks that, if the caller did not specify MAP_ANONYMOUS, then the file must be a huge page file. So it attempts to verify that the file really is a huge page file, as you can see here: ksys_mmap_pgoff() { if (!(flags & MAP_ANONYMOUS)) { retval = -EINVAL; if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) goto out_fput; /* THIS IS WHERE WE END UP */ else if (flags & MAP_HUGETLB) { ...proceed normally, /dev/zero is ok here... ...and of course is_file_hugepages() returns "false" for the /dev/zero file. The problem is that the user space program, gup_benchmark.c, really just wants anonymous memory here. The simplest way to get that is to pass MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this patch does. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	mm: memcontrol: fix NULL-ptr deref in percpu stats flush	Shakeel Butt	1	-6/+6
	__mem_cgroup_free() can be called on the failure path in mem_cgroup_alloc(). However memcg_flush_percpu_vmstats() and memcg_flush_percpu_vmevents() which are called from __mem_cgroup_free() access the fields of memcg which can potentially be null if called from failure path from mem_cgroup_alloc(). Indeed syzbot has reported the following crash: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 30393 Comm: syz-executor.1 Not tainted 5.4.0-rc2+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:memcg_flush_percpu_vmstats+0x4ae/0x930 mm/memcontrol.c:3436 Code: 05 41 89 c0 41 0f b6 04 24 41 38 c7 7c 08 84 c0 0f 85 5d 03 00 00 44 3b 05 33 d5 12 08 0f 83 e2 00 00 00 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 0f 85 91 03 00 00 48 8b 85 10 fe ff ff 48 8b b0 90 RSP: 0018:ffff888095c27980 EFLAGS: 00010206 RAX: 0000000000000012 RBX: ffff888095c27b28 RCX: ffffc90008192000 RDX: 0000000000040000 RSI: ffffffff8340fae7 RDI: 0000000000000007 RBP: ffff888095c27be0 R08: 0000000000000000 R09: ffffed1013f0da33 R10: ffffed1013f0da32 R11: ffff88809f86d197 R12: fffffbfff138b760 R13: dffffc0000000000 R14: 0000000000000090 R15: 0000000000000007 FS: 00007f5027170700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000710158 CR3: 00000000a7b18000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __mem_cgroup_free+0x1a/0x190 mm/memcontrol.c:5021 mem_cgroup_free mm/memcontrol.c:5033 [inline] mem_cgroup_css_alloc+0x3a1/0x1ae0 mm/memcontrol.c:5160 css_create kernel/cgroup/cgroup.c:5156 [inline] cgroup_apply_control_enable+0x44d/0xc40 kernel/cgroup/cgroup.c:3119 cgroup_mkdir+0x899/0x11b0 kernel/cgroup/cgroup.c:5401 kernfs_iop_mkdir+0x14d/0x1d0 fs/kernfs/dir.c:1124 vfs_mkdir+0x42e/0x670 fs/namei.c:3807 do_mkdirat+0x234/0x2a0 fs/namei.c:3830 __do_sys_mkdir fs/namei.c:3846 [inline] __se_sys_mkdir fs/namei.c:3844 [inline] __x64_sys_mkdir+0x5c/0x80 fs/namei.c:3844 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixing this by moving the flush to mem_cgroup_free as there is no need to flush anything if we see failure in mem_cgroup_alloc(). Link: http://lkml.kernel.org/r/[email protected] Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg") Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg") Signed-off-by: Shakeel Butt <[email protected]> Reported-by: [email protected] Reviewed-by: Roman Gushchin <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-06	MAINTAINERS: update Cavium ThunderX2 maintainers	Jayachandran C	2	-1/+4
	jnair is no longer at caviumnetworks.com (or at marvell.com). This also means that Cavium ThunderX2 will now be maintained by Robert. This is probably a good time to map various email addresses used for my patches to my personal email ID, update .mailmap to do this. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jayachandran C <[email protected]> Acked-by: Robert Richter <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
2019-11-06	Merge tag 'stm32-dt-for-v5.4-fixes-2' of ↵	Olof Johansson	2	-13/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32 into arm/fixes STM32 DT fixes for v5.4, round 2 Highlights: ----------- Fixes for STM32MP157: -Fix CAN RAM mapping -Change stmfx pinctrl definition for joystick and camera. Due to stmfx pinctrl fix done in v5.4-rc cycle, camera and joystick were no longer functional. * tag 'stm32-dt-for-v5.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32: ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1 ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Olof Johansson <[email protected]>
2019-11-06	ASoC: SOF: topology: Fix bytes control size checks	Dragos Tarcatu	1	-5/+6
	When using the example SOF amp widget topology, KASAN dumps this when the AMP bytes kcontrol gets loaded: [ 9.579548] BUG: KASAN: slab-out-of-bounds in sof_control_load+0x8cc/0xac0 [snd_sof] [ 9.588194] Write of size 40 at addr ffff8882314559dc by task systemd-udevd/2411 Fix that by rejecting the topology if the bytes data size > max_size Fixes: 311ce4fe7637d ("ASoC: SOF: Add support for loading topologies") Reviewed-by: Jaska Uimonen <[email protected]> Reviewed-by: Ranjani Sridharan <[email protected]> Signed-off-by: Dragos Tarcatu <[email protected]> Signed-off-by: Pierre-Louis Bossart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-11-06	ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1	Amelie Delaunay	1	-1/+0
	Pins used for joystick are all configured as input. "push-pull" is not a valid setting for an input pin. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1	Amelie Delaunay	1	-10/+2
	"push-pull" configuration is now fully handled by the gpiolib and the STMFX pinctrl driver. There is no longer need to declare a pinctrl group to only configure "push-pull" setting for the line. It is done directly by the gpiolib. Fixes: a502b343ebd0 ("pinctrl: stmfx: update pinconf settings") Signed-off-by: Alexandre Torgue <[email protected]> Signed-off-by: Amelie Delaunay <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c	Christophe Roullier	1	-2/+2
	Split the 10Kbytes CAN message RAM to be able to use simultaneously FDCAN1 and FDCAN2 instances. First 5Kbytes are allocated to FDCAN1 and last 5Kbytes are used for FDCAN2. To do so, set the offset to 0x1400 in mram-cfg for FDCAN2. Fixes: d44d6e021301 ("ARM: dts: stm32: change CAN RAM mapping on stm32mp157c") Signed-off-by: Christophe Roullier <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-06	ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157	Patrice Chotard	1	-4/+4
	Relax qspi pins slew-rate to minimize peak currents. Fixes: 844030057339 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board") Signed-off-by: Patrice Chotard <[email protected]> Signed-off-by: Alexandre Torgue <[email protected]>
2019-11-05	Documentation: TLS: Add missing counter description	Tariq Toukan	1	-0/+4
	Add TLS TX counter description for the handshake retransmitted packets that triggers the resync procedure then skip it, going into the regular transmit flow. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	NFC: fdp: fix incorrect free object	Pan Bian	1	-1/+1
	The address of fw_vsc_cfg is on stack. Releasing it with devm_kfree() is incorrect, which may result in a system crash or other security impacts. The expected object to free is *fw_vsc_cfg. Signed-off-by: Pan Bian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: prevent load/store tearing on sk->sk_stamp	Eric Dumazet	1	-2/+2
	Add a couple of READ_ONCE() and WRITE_ONCE() to prevent load-tearing and store-tearing in sock_read_timestamp() and sock_write_timestamp() This might prevent another KCSAN report. Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe") Signed-off-by: Eric Dumazet <[email protected]> Cc: Deepa Dinamani <[email protected]> Acked-by: Deepa Dinamani <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net: qualcomm: rmnet: Fix potential UAF when unregistering	Sean Tranchetti	1	-2/+2
	During the exit/unregistration process of the RmNet driver, the function rmnet_unregister_real_device() is called to handle freeing the driver's internal state and removing the RX handler on the underlying physical device. However, the order of operations this function performs is wrong and can lead to a use after free of the rmnet_port structure. Before calling netdev_rx_handler_unregister(), this port structure is freed with kfree(). If packets are received on any RmNet devices before synchronize_net() completes, they will attempt to use this already-freed port structure when processing the packet. As such, before cleaning up any other internal state, the RX handler must be unregistered in order to guarantee that no further packets will arrive on the device. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	net/tls: fix sk_msg trim on fallback to copy mode	Jakub Kicinski	2	-8/+21
	sk_msg_trim() tries to only update curr pointer if it falls into the trimmed region. The logic, however, does not take into the account pointer wrapping that sk_msg_iter_var_prev() does nor (as John points out) the fact that msg->sg is a ring buffer. This means that when the message was trimmed completely, the new curr pointer would have the value of MAX_MSG_FRAGS - 1, which is neither smaller than any other value, nor would it actually be correct. Special case the trimming to 0 length a little bit and rework the comparison between curr and end to take into account wrapping. This bug caused the TLS code to not copy all of the message, if zero copy filled in fewer sg entries than memcopy would need. Big thanks to Alexander Potapenko for the non-KMSAN reproducer. v2: - take into account that msg->sg is a ring buffer (John). Link: https://lore.kernel.org/netdev/[email protected]/ (v1) Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Reported-by: [email protected] Reported-by: [email protected] Co-developed-by: John Fastabend <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-11-05	mlx4_core: fix wrong comment about the reason of subtract one from the max_cqes	Dotan Barak	1	-2/+1
	The reason for the pre-allocation of one CQE is to enable resizing of the CQ. Fix comment accordingly. Signed-off-by: Dotan Barak <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Vladimir Sokolovsky <[email protected]> Signed-off-by: Yuval Shaia <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>