aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-08-12mm/hugetlb: add mempolicy check in the reservation routineMuchun Song3-6/+34
In the reservation routine, we only check whether the cpuset meets the memory allocation requirements. But we ignore the mempolicy of MPOL_BIND case. If someone mmap hugetlb succeeds, but the subsequent memory allocation may fail due to mempolicy restrictions and receives the SIGBUS signal. This can be reproduced by the follow steps. 1) Compile the test case. cd tools/testing/selftests/vm/ gcc map_hugetlb.c -o map_hugetlb 2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the system. Each node will pre-allocate one huge page. echo 2 > /proc/sys/vm/nr_hugepages 3) Run test case(mmap 4MB). We receive the SIGBUS signal. numactl --membind=3D0 ./map_hugetlb 4 With this patch applied, the mmap will fail in the step 3) and throw "mmap: Cannot allocate memory". [[email protected]: include sched.h for `current'] Reported-by: Jianchao Guo <[email protected]> Suggested-by: Michal Hocko <[email protected]> Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: David Rientjes <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Baoquan He <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12kselftests: cgroup: add perpcu memory accounting testRoman Gushchin1-1/+69
Add a simple test to check the percpu memory accounting. The test creates a cgroup tree with 1000 child cgroups and checks values of memory.current and memory.stat::percpu. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tobin C. Harding <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Waiman Long <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Bixuan Cui <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm: memcg: charge memcg percpu memory to the parent cgroupRoman Gushchin1-4/+16
Memory cgroups are using large chunks of percpu memory to store vmstat data. Yet this memory is not accounted at all, so in the case when there are many (dying) cgroups, it's not exactly clear where all the memory is. Because the size of memory cgroup internal structures can dramatically exceed the size of object or page which is pinning it in the memory, it's not a good idea to simply ignore it. It actually breaks the isolation between cgroups. Let's account the consumed percpu memory to the parent cgroup. [[email protected]: add WARN_ON_ONCE()s, per Johannes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Dennis Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tobin C. Harding <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Waiman Long <[email protected]> Cc: Bixuan Cui <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm: memcg/percpu: per-memcg percpu memory statisticsRoman Gushchin4-1/+25
Percpu memory can represent a noticeable chunk of the total memory consumption, especially on big machines with many CPUs. Let's track percpu memory usage for each memcg and display it in memory.stat. A percpu allocation is usually scattered over multiple pages (and nodes), and can be significantly smaller than a page. So let's add a byte-sized counter on the memcg level: MEMCG_PERCPU_B. Byte-sized vmstat infra created for slabs can be perfectly reused for percpu case. [[email protected]: v3] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Dennis Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tobin C. Harding <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Waiman Long <[email protected]> Cc: Bixuan Cui <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12mm: memcg/percpu: account percpu memory to memory cgroupsRoman Gushchin5-40/+246
Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make a good part of the total memory. As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and big number of cgroups they can consume hundreds of megabytes. So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default. To implement the perpcu accounting it's possible to take the slab memory accounting as a model to follow. Let's introduce two types of percpu chunks: root and memcg. What makes memcg chunks different is an additional space allocated to store memcg membership information. If __GFP_ACCOUNT is passed on allocation, a memcg chunk should be be used. If it's possible to charge the corresponding size to the target memory cgroup, allocation is performed, and the memcg ownership data is recorded. System-wide allocations are performed using root chunks, so there is no additional memory overhead. To implement a fast reparenting of percpu memory on memcg removal, we don't store mem_cgroup pointers directly: instead we use obj_cgroup API, introduced for slab accounting. [[email protected]: fix CONFIG_MEMCG_KMEM=n build errors and warning] [[email protected]: move unreachable code, per Roman] [[email protected]: mm/percpu: fix 'defined but not used' warning] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Bixuan Cui <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Dennis Zhou <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tobin C. Harding <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Waiman Long <[email protected]> Cc: Bixuan Cui <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12percpu: return number of released bytes from pcpu_free_area()Roman Gushchin1-3/+10
Patch series "mm: memcg accounting of percpu memory", v3. This patchset adds percpu memory accounting to memory cgroups. It's based on the rework of the slab controller and reuses concepts and features introduced for the per-object slab accounting. Percpu memory is becoming more and more widely used by various subsystems, and the total amount of memory controlled by the percpu allocator can make a good part of the total memory. As an example, bpf maps can consume a lot of percpu memory, and they are created by a user. Also, some cgroup internals (e.g. memory controller statistics) can be quite large. On a machine with many CPUs and big number of cgroups they can consume hundreds of megabytes. So the lack of memcg accounting is creating a breach in the memory isolation. Similar to the slab memory, percpu memory should be accounted by default. Percpu allocations by their nature are scattered over multiple pages, so they can't be tracked on the per-page basis. So the per-object tracking introduced by the new slab controller is reused. The patchset implements charging of percpu allocations, adds memcg-level statistics, enables accounting for percpu allocations made by memory cgroup internals and provides some basic tests. To implement the accounting of percpu memory without a significant memory and performance overhead the following approach is used: all accounted allocations are placed into a separate percpu chunk (or chunks). These chunks are similar to default chunks, except that they do have an attached vector of pointers to obj_cgroup objects, which is big enough to save a pointer for each allocated object. On the allocation, if the allocation has to be accounted (__GFP_ACCOUNT is passed, the allocating process belongs to a non-root memory cgroup, etc), the memory cgroup is getting charged and if the maximum limit is not exceeded the allocation is performed using a memcg-aware chunk. Otherwise -ENOMEM is returned or the allocation is forced over the limit, depending on gfp (as any other kernel memory allocation). The memory cgroup information is saved in the obj_cgroup vector at the corresponding offset. On the release time the memcg information is restored from the vector and the cgroup is getting uncharged. Unaccounted allocations (at this point the absolute majority of all percpu allocations) are performed in the old way, so no additional overhead is expected. To avoid pinning dying memory cgroups by outstanding allocations, obj_cgroup API is used instead of directly saving memory cgroup pointers. obj_cgroup is basically a pointer to a memory cgroup with a standalone reference counter. The trick is that it can be atomically swapped to point at the parent cgroup, so that the original memory cgroup can be released prior to all objects, which has been charged to it. Because all charges and statistics are fully recursive, it's perfectly correct to uncharge the parent cgroup instead. This scheme is used in the slab memory accounting, and percpu memory can just follow the scheme. This patch (of 5): To implement accounting of percpu memory we need the information about the size of freed object. Return it from pcpu_free_area(). Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tobin C. Harding <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Waiman Long <[email protected]> cC: Michal Koutný[email protected]> Cc: Bixuan Cui <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12fix breakage in do_rmdir()Al Viro1-1/+1
syzbot reported and bisected a use-after-free due to the recent init cleanups. The putname() should happen only after we'd *not* branched to retry, same as it's done in do_unlinkat(). Reported-by: [email protected] Fixes: e24ab0ef689d "fs: push the getname from do_rmdir into the callers" Cc: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12ALSA: hda/hdmi: Use force connectivity quirk on another HP desktopKai-Heng Feng1-0/+1
There's another HP desktop has buggy BIOS which flags the Port Connectivity bit as no connection. Apply force connectivity quirk to enable DP/HDMI audio. Signed-off-by: Kai-Heng Feng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-08-12NFS: Fix flexfiles read failoverTrond Myklebust3-16/+40
The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over. The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list. Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers") Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12io_uring: fail poll arm on queue proc failureJens Axboe1-1/+1
Check the ipt.error value, it must have been either cleared to zero or set to another error than the default -EINVAL if we don't go through the waitqueue proc addition. Just give up on poll at that point and return failure, this will fallback to async work. io_poll_add() doesn't suffer from this failure case, as it returns the error value directly. Cc: [email protected] # v5.7+ Reported-by: [email protected] Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-08-12fs: nfs: delete repeated words in commentsRandy Dunlap2-2/+2
Drop duplicated words {the, and} in comments. Signed-off-by: Randy Dunlap <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12rpc_pipefs: convert comma to semicolonXu Wang1-1/+1
Replace a comma between expression statements by a semicolon. Signed-off-by: Xu Wang <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12nfs: Fix getxattr kernel panic and memory overflowJeffrey Mitchell2-3/+5
Move the buffer size check to decode_attr_security_label() before memcpy() Only call memcpy() if the buffer is large enough Fixes: aa9c2669626c ("NFS: Client implementation of Labeled-NFS") Signed-off-by: Jeffrey Mitchell <[email protected]> [Trond: clean up duplicate test of label->len != 0] Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12NFS: Don't return layout segments that are in useTrond Myklebust1-19/+15
If the NFS_LAYOUT_RETURN_REQUESTED flag is set, we want to return the layout as soon as possible, meaning that the affected layout segments should be marked as invalid, and should no longer be in use for I/O. Fixes: f0b429819b5f ("pNFS: Ignore non-recalled layouts in pnfs_layout_need_return()") Cc: [email protected] # v4.19+ Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12NFS: Don't move layouts to plh_return_segs list while in useTrond Myklebust1-11/+1
If the layout segment is still in use for a read or a write, we should not move it to the layout plh_return_segs list. If we do, we can end up returning the layout while I/O is still in progress. Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: [email protected] # v4.19+ Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12NFS: Add layout segment info to pnfs read/write/commit tracepointsTrond Myklebust1-6/+36
Allow the pnfs I/O tracepoints to trace which layout segment is being used. Signed-off-by: Trond Myklebust <[email protected]>
2020-08-12parisc: Implement __smp_store_release and __smp_load_acquire barriersJohn David Anglin1-0/+61
This patch implements the __smp_store_release and __smp_load_acquire barriers using ordered stores and loads. This avoids the sync instruction present in the generic implementation. Cc: <[email protected]> # 4.14+ Signed-off-by: Dave Anglin <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2020-08-12perf bench: Fix a couple of spelling mistakes in options textColin Ian King1-2/+2
There are a couple of spelling mistakes in the text. Fix these. Signed-off-by: Colin King <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12perf bench numa: Fix benchmark namesAlexander Gordeev1-18/+17
Standard benchmark names let users know the tests specifics. For example "2x1-bw-process" name tells that two processes one thread each are run and the RAM bandwidth is measured. Several benchmarks names do not correspond to their actual running configuration. Fix that and also some whitespace and comment inconsistencies. Signed-off-by: Alexander Gordeev <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12perf bench numa: Fix number of processes in "2x3-convergence" testAlexander Gordeev1-1/+1
Signed-off-by: Alexander Gordeev <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12tools headers UAPI: Sync kvm.h headers with the kernel sourcesArnaldo Carvalho de Melo1-0/+4
To pick the changes in: 3edd68399dc1 ("KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support") 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information") 23a60f834406 ("s390/kvm: diagnose 0x318 sync and reset") That do not result in any change in tooling, as the additions are not being used in any table generator. This silences these perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Adrian Hunter <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Collin Walling <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mohammed Gamal <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12tools include UAPI: Sync linux/vhost.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+2
To get the changes in: 25abc060d282 ("vhost-vdpa: support IOTLB batching hints") This doesn't result in any changes in tooling, no new ioctls to be picked up by the id->string table generators, etc. Silencing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h' diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h Cc: Adrian Hunter <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12tools headers kvm s390: Sync headers with the kernel sourcesArnaldo Carvalho de Melo1-2/+5
To pick the changes in: 23a60f834406 ("s390/kvm: diagnose 0x318 sync and reset") None of them trigger any changes in tooling, this time this is just to silence these perf build warnings: Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h' diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h Cc: Adrian Hunter <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Collin Walling <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12perf trace beauty: Use the autogenerated protocol family tableArnaldo Carvalho de Melo3-8/+13
That helps us not to lose new protocol families when they are introduced, replacing that hardcoded, dated family->string table. To recap what this allows us to do: # perf trace -e syscalls:sys_enter_socket/max-stack=10/ --filter=family==INET --max-events=1 0.000 fetchmail/41097 syscalls:sys_enter_socket(family: INET, type: DGRAM|CLOEXEC|NONBLOCK, protocol: IP) __GI___socket (inlined) reopen (/usr/lib64/libresolv-2.31.so) send_dg (/usr/lib64/libresolv-2.31.so) __res_context_send (/usr/lib64/libresolv-2.31.so) __GI___res_context_query (inlined) __GI___res_context_search (inlined) _nss_dns_gethostbyname4_r (/usr/lib64/libnss_dns-2.31.so) gaih_inet.constprop.0 (/usr/lib64/libc-2.31.so) __GI_getaddrinfo (inlined) [0x15cb2] (/usr/bin/fetchmail) # More work is still needed to allow for the more natura strace-like syscall name usage instead of the trace event name: # perf trace -e socket/max-stack=10,family==INET/ --max-events=1 I.e. to allow for modifiers to follow the syscall name and for logical expressions to be accepted as filters to use with that syscall, be it as trace event filters or BPF based ones. Using -v we can see how the trace event filter is built: # perf trace -v -e syscalls:sys_enter_socket/call-graph=dwarf/ --filter=family==INET --max-events=2 <SNIP> New filter for syscalls:sys_enter_socket: (family==0x2) && (common_pid != 41384 && common_pid != 2836) <SNIP> $ tools/perf/trace/beauty/socket.sh | grep -w 2 [2] = "INET", $ Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12perf trace beauty: Add script to autogenerate socket families tableArnaldo Carvalho de Melo2-0/+466
To use with 'perf trace', to convert the protocol families to strings, e.g: $ tools/perf/trace/beauty/socket.sh static const char *socket_families[] = { [0] = "UNSPEC", [1] = "LOCAL", [2] = "INET", [3] = "AX25", [4] = "IPX", [5] = "APPLETALK", [6] = "NETROM", [7] = "BRIDGE", [8] = "ATMPVC", [9] = "X25", [10] = "INET6", [11] = "ROSE", [12] = "DECnet", [13] = "NETBEUI", [14] = "SECURITY", [15] = "KEY", [16] = "NETLINK", [17] = "PACKET", [18] = "ASH", [19] = "ECONET", [20] = "ATMSVC", [21] = "RDS", [22] = "SNA", [23] = "IRDA", [24] = "PPPOX", [25] = "WANPIPE", [26] = "LLC", [27] = "IB", [28] = "MPLS", [29] = "CAN", [30] = "TIPC", [31] = "BLUETOOTH", [32] = "IUCV", [33] = "RXRPC", [34] = "ISDN", [35] = "PHONET", [36] = "IEEE802154", [37] = "CAIF", [38] = "ALG", [39] = "NFC", [40] = "VSOCK", [41] = "KCM", [42] = "QIPCRTR", [43] = "SMC", [44] = "XDP", }; $ This uses a copy of include/linux/socket.h that is kept in a directory to be used just for these table generation scripts and for checking if the kernel has a new file that maybe gets something new for these tables. This allows us to: - Avoid accessing files outside tools/, in the kernel sources, that may be changed in unexpected ways and thus break these scripts. - Notice when those files change and thus check if the changes don't break those scripts, update them to automatically get the new definitions, a new socket family, for instance. - Not add then to the tools/include/ where it may end up used while building the tools and end up requiring dragging yet more stuff from the kernel or plain break the build in some of the myriad environments where perf may be built. This will replace the previous static array in tools/perf/ that was dated and was already missing the AF_KCM, AF_QIPCRTR, AF_SMC and AF_XDP families. The next cset will wire this up to the perf build process. At some point this must be made into a library to be used in places such as libtraceevent, bpftrace, etc. Cc: Adrian Hunter <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-08-12rtc: pcf2127: fix alarm handlingAlexandre Belloni1-18/+19
Fix multiple issues when handling alarms: - Use threaded interrupt to avoid scheduling when atomic - Stop matching on week day as it may not be set correctly - Avoid parsing the DT interrupt and use what is provided by the i2c or spi subsystem - Avoid returning IRQ_NONE in case of error in the interrupt handler - Never write WDTF as specified in the datasheet - Set uie_unsupported, as for the pcf85063, setting alarms every seconds is not working correctly and confuses the RTC. Signed-off-by: Alexandre Belloni <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-08-12rtc: pcf2127: add alarm supportLiam Beguin1-0/+134
Add alarm support for the pcf2127 RTC chip family. Tested on pca2129. Signed-off-by: Liam Beguin <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Reviewed-by: Bruno Thomsen <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-08-12rtc: pcf2127: add pca2129 device idLiam Beguin2-0/+5
The PCA2129 is the automotive grade version of the PCF2129. add it to the list of compatibles. Signed-off-by: Liam Beguin <[email protected]> Signed-off-by: Alexandre Belloni <[email protected]> Reviewed-by: Bruno Thomsen <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-08-12genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()Guenter Roeck1-2/+6
rearm_wake_irq() does not unlock the irq descriptor if the interrupt is not suspended or if wakeup is not enabled on it. Restucture the exit conditions so the unlock is always ensured. Fixes: 3a79bc63d9075 ("PCI: irq: Introduce rearm_wake_irq()") Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2020-08-12btrfs: trim: fix underflow in trim length to prevent access beyond device ↵Qu Wenruo3-0/+20
boundary [BUG] The following script can lead to tons of beyond device boundary access: mkfs.btrfs -f $dev -b 10G mount $dev $mnt trimfs $mnt btrfs filesystem resize 1:-1G $mnt trimfs $mnt [CAUSE] Since commit 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit"), we try to avoid trimming ranges that's already trimmed. So we check device->alloc_state by finding the first range which doesn't have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. But if we shrunk the device, that bits are not cleared, thus we could easily got a range starts beyond the shrunk device size. This results the returned @start and @end are all beyond device size, then we call "end = min(end, device->total_bytes -1);" making @end smaller than device size. Then finally we goes "len = end - start + 1", totally underflow the result, and lead to the beyond-device-boundary access. [FIX] This patch will fix the problem in two ways: - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device This is the root fix - Add extra safety check when trimming free device extents We check and warn if the returned range is already beyond current device. Link: https://github.com/kdave/btrfs-progs/issues/282 Fixes: 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") CC: [email protected] # 5.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-08-12ALSA: hda/realtek - Fix unused variable warningTakashi Iwai1-2/+0
The previous fix forgot to remove the unused variable that triggers a compile warning now: sound/pci/hda/patch_realtek.c: In function 'alc285_fixup_hp_gpio_led': sound/pci/hda/patch_realtek.c:4163:19: warning: unused variable 'spec' [-Wunused-variable] Fix it. Fixes: 404690649e6a ("ALSA: hda - reverse the setting value in the micmute_led_set") Reported-by: Stephen Rothwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-08-12drm/ttm: revert "drm/ttm: make TT creation purely optional v3"Christian König4-22/+31
This reverts commit 2ddef17678bc2ea1d20517dd2b4ed4aa967ffa8b. As it turned out VMWGFX needs a much wider audit to fix this. Signed-off-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-08-12Merge branch 'vmwgfx-next-5.9' of ↵Dave Airlie6-30/+15
git://people.freedesktop.org/~sroland/linux into drm-next The drm_mode_config_reset patches are very important fixing a recently introduced kernel crash, the others fix various older issues which are a bit less serious in practice. Signed-off-by: Dave Airlie <[email protected]> From: "Roland Scheidegger (VMware)" <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-08-12parisc: mask out enable and reserved bits from sba imaskSven Schnelle1-1/+1
When using kexec the SBA IOMMU IBASE might still have the RE bit set. This triggers a WARN_ON when trying to write back the IBASE register later, and it also makes some mask calculations fail. Cc: <[email protected]> Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Helge Deller <[email protected]>
2020-08-11Merge tag 'tag-chrome-platform-for-v5.9' of ↵Linus Torvalds10-128/+483
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Benson Leung: "cros_ec_typec: - Add support for switch control and alternate modes to the Chrome EC Type C port driver - Add basic suspend/resume support sensorhub: - Fix timestamp overflow issue - Fix legacy timestamp spreading on Nami systems cros_ec_proto: - After removing all users of, stop exporting cros_ec_cmd_xfer - Check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK and ignore wakeups on old ECs misc: - Documentation warning cleanup - Fix double unlock issue in ishtp" * tag 'tag-chrome-platform-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (21 commits) platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK platform/chrome: cros_ec_proto: ignore unnecessary wakeups on old ECs platform/chrome: cros_ec_sensorhub: Simplify legacy timestamp spreading platform/chrome: cros_ec_proto: Do not export cros_ec_cmd_xfer() platform/chrome: cros_ec_typec: Unregister partner on error platform/chrome: cros_ec_sensorhub: Fix EC timestamp overflow platform/chrome: cros_ec_typec: Add PM support platform/chrome: cros_ec_typec: Use workqueue for port update platform/chrome: cros_ec_typec: Add a dependency on USB_ROLE_SWITCH platform/chrome: cros_ec_ishtp: Fix a double-unlock issue platform/chrome: cros_ec_rpmsg: Document missing struct parameters platform/chrome: cros_ec_spi: Document missing function parameters platform/chrome: cros_ec_typec: Add TBT compat support platform/chrome: cros_ec: Add TBT pd_ctrl fields platform/chrome: cros_ec_typec: Make configure_mux static platform/chrome: cros_ec_typec: Support DP alt mode platform/chrome: cros_ec_typec: Add USB mux control platform/chrome: cros_ec_typec: Register PD CTRL cmd v2 platform/chrome: cros_ec: Update mux state bits platform/chrome: cros_ec_typec: Register Type C switches ...
2020-08-11Merge tag 'for-linus-5.9-ofs1' of ↵Linus Torvalds2-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "A fix and a cleanup... The fix: Al Viro pointed out that I had broken some acl functionality with one of my previous patches. And the cleanup: Jing Xiangfeng found and removed a needless variable assignment" * tag 'for-linus-5.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: remove unnecessary assignment to variable ret orangefs: posix acl fix...
2020-08-11Merge tag 'zonefs-5.9-rc1' of ↵Linus Torvalds3-14/+27
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: "A single change for this cycle adding support for zone capacities smaller than the zone size, from Johannes" * tag 'zonefs-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: update documentation to reflect zone size vs capacity zonefs: add zone-capacity support
2020-08-12exfat: retain 'VolumeFlags' properlyTetsuhiro Kohada6-28/+47
MediaFailure and VolumeDirty should be retained if these are set before mounting. In '3.1.13.3 Media Failure Field' of exfat specification describe: If, upon mounting a volume, the value of this field is 1, implementations which scan the entire volume for media failures and record all failures as "bad" clusters in the FAT (or otherwise resolve media failures) may clear the value of this field to 0. Therefore, We should not clear MediaFailure without scanning volume. In '8.1 Recommended Write Ordering' of exfat specification describe: Clear the value of the VolumeDirty field to 0, if its value prior to the first step was 0. Therefore, We should not clear VolumeDirty after mounting. Also rename ERR_MEDIUM to MEDIA_FAILURE. Signed-off-by: Tetsuhiro Kohada <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-08-12exfat: optimize exfat_zeroed_cluster()Tetsuhiro Kohada1-43/+10
Replace part of exfat_zeroed_cluster() with exfat_update_bhs(). And remove exfat_sync_bhs(). Signed-off-by: Tetsuhiro Kohada <[email protected]> Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-08-12exfat: add error check when updating dir-entriesTetsuhiro Kohada4-7/+12
Add error check when synchronously updating dir-entries. Suggested-by: Sungjong Seo <[email protected]> Signed-off-by: Tetsuhiro Kohada <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-08-12exfat: write multiple sectors at onceTetsuhiro Kohada3-6/+29
Write multiple sectors at once when updating dir-entries. Add exfat_update_bhs() for that. It wait for write completion once instead of sector by sector. It's only effective if sync enabled. Signed-off-by: Tetsuhiro Kohada <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-08-12exfat: remove EXFAT_SB_DIRTY flagTetsuhiro Kohada7-35/+26
This flag is set/reset in exfat_put_super()/exfat_sync_fs() to avoid sync_blockdev(). - exfat_put_super(): Before calling this, the VFS has already called sync_filesystem(), so sync is never performed here. - exfat_sync_fs(): After calling this, the VFS calls sync_blockdev(), so, it is meaningless to check EXFAT_SB_DIRTY or to bypass sync_blockdev() here. Remove the EXFAT_SB_DIRTY check to ensure synchronization. And remove the code related to the flag. Signed-off-by: Tetsuhiro Kohada <[email protected]> Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Namjae Jeon <[email protected]>
2020-08-11Merge branch 'net-initialize-fastreuse-on-inet_inherit_port'David S. Miller3-44/+58
Tim Froidcoeur says: ==================== net: initialize fastreuse on inet_inherit_port In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind behaviour or in the incorrect reuse of a bind. the kernel keeps track for each bind_bucket if all sockets in the bind_bucket support SO_REUSEADDR or SO_REUSEPORT in two fastreuse flags. These flags allow skipping the costly bind_conflict check when possible (meaning when all sockets have the proper SO_REUSE option). For every socket added to a bind_bucket, these flags need to be updated. As soon as a socket that does not support reuse is added, the flag is set to false and will never go back to true, unless the bind_bucket is deleted. Note that there is no mechanism to re-evaluate these flags when a socket is removed (this might make sense when removing a socket that would not allow reuse; this leaves room for a future patch). For this optimization to work, it is mandatory that these flags are properly initialized and updated. When a child socket is created from a listen socket in __inet_inherit_port, the TPROXY case could create a new bind bucket without properly initializing these flags, thus preventing the optimization to work. Alternatively, a socket not allowing reuse could be added to an existing bind bucket without updating the flags, causing bind_conflict to never be called as it should. Patch 1/2 refactors the fastreuse update code in inet_csk_get_port into a small helper function, making the actual fix tiny and easier to understand. Patch 2/2 calls this new helper when __inet_inherit_port decides to create a new bind_bucket or use a different bind_bucket than the one of the listen socket. v4: - rebase on latest linux/net master branch v3: - remove company disclaimer from automatic signature v2: - remove unnecessary cast ==================== Signed-off-by: David S. Miller <[email protected]>
2020-08-11net: initialize fastreuse on inet_inherit_portTim Froidcoeur1-0/+1
In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind behaviour or in the incorrect reuse of a bind. the kernel keeps track for each bind_bucket if all sockets in the bind_bucket support SO_REUSEADDR or SO_REUSEPORT in two fastreuse flags. These flags allow skipping the costly bind_conflict check when possible (meaning when all sockets have the proper SO_REUSE option). For every socket added to a bind_bucket, these flags need to be updated. As soon as a socket that does not support reuse is added, the flag is set to false and will never go back to true, unless the bind_bucket is deleted. Note that there is no mechanism to re-evaluate these flags when a socket is removed (this might make sense when removing a socket that would not allow reuse; this leaves room for a future patch). For this optimization to work, it is mandatory that these flags are properly initialized and updated. When a child socket is created from a listen socket in __inet_inherit_port, the TPROXY case could create a new bind bucket without properly initializing these flags, thus preventing the optimization to work. Alternatively, a socket not allowing reuse could be added to an existing bind bucket without updating the flags, causing bind_conflict to never be called as it should. Call inet_csk_update_fastreuse when __inet_inherit_port decides to create a new bind_bucket or use a different bind_bucket than the one of the listen socket. Fixes: 093d282321da ("tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()") Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Tim Froidcoeur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-11net: refactor bind_bucket fastreuse into helperTim Froidcoeur2-44/+57
Refactor the fastreuse update code in inet_csk_get_port into a small helper function that can be called from other places. Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Tim Froidcoeur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-11net: phy: marvell10g: fix null pointer dereferenceMarek Behún1-11/+7
Commit c3e302edca24 ("net: phy: marvell10g: fix temperature sensor on 2110") added a check for PHY ID via phydev->drv->phy_id in a function which is called by devres at a time when phydev->drv is already set to null by phy_remove function. This null pointer dereference can be triggered via SFP subsystem with a SFP module containing this Marvell PHY. When the SFP interface is put down, the SFP subsystem removes the PHY. Fixes: c3e302edca24 ("net: phy: marvell10g: fix temperature sensor on 2110") Signed-off-by: Marek Behún <[email protected]> Cc: Maxime Chevallier <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: Baruch Siach <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-11net: Fix potential memory leak in proto_register()Miaohe Lin1-10/+15
If we failed to assign proto idx, we free the twsk_slab_name but forget to free the twsk_slab. Add a helper function tw_prot_cleanup() to free these together and also use this helper function in proto_unregister(). Fixes: b45ce32135d1 ("sock: fix potential memory leak in proto_register()") Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-11Merge tag 'arm64-fixes' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix recordmcount build failure on non-arm64 (caused by an arm64 patch)" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: recordmcount: Fix build failure on non arm64
2020-08-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds60-408/+3886
Pull virtio updates from Michael Tsirkin: - IRQ bypass support for vdpa and IFC - MLX5 vdpa driver - Endianness fixes for virtio drivers - Misc other fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits) vdpa/mlx5: fix up endian-ness for mtu vdpa: Fix pointer math bug in vdpasim_get_config() vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config() vdpa/mlx5: fix memory allocation failure checks vdpa/mlx5: Fix uninitialised variable in core/mr.c vdpa_sim: init iommu lock virtio_config: fix up warnings on parisc vdpa/mlx5: Add VDPA driver for supported mlx5 devices vdpa/mlx5: Add shared memory registration code vdpa/mlx5: Add support library for mlx5 VDPA implementation vdpa/mlx5: Add hardware descriptive header file vdpa: Modify get_vq_state() to return error code net/vdpa: Use struct for set/get vq state vdpa: remove hard coded virtq num vdpasim: support batch updating vhost-vdpa: support IOTLB batching hints vhost-vdpa: support get/set backend features vhost: generialize backend features setting/getting vhost-vdpa: refine ioctl pre-processing vDPA: dont change vq irq after DRIVER_OK ...
2020-08-11Merge tag 'for-v5.9' of ↵Linus Torvalds12-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A couple of minor documentation updates only for this release" * tag 'for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: LSM: drop duplicated words in header file comments Replace HTTP links with HTTPS ones: security