aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-06-10cgroup: Fix css_task_iter_advance_css_set() cset skip conditionTejun Heo1-1/+1
While adding handling for dying task group leaders c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") added an inverted cset skip condition to css_task_iter_advance_css_set(). It should skip cset if it's completely empty but was incorrectly testing for the inverse condition for the dying_tasks list. Fix it. Signed-off-by: Tejun Heo <[email protected]> Fixes: c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") Reported-by: [email protected]
2019-06-10mm/hmm: Hold a mmgrab from hmm to mmJason Gunthorpe1-1/+0
So long as a struct hmm pointer exists, so should the struct mm it is linked too. Hold the mmgrab() as soon as a hmm is created, and mmdrop() it once the hmm refcount goes to zero. Since mmdrop() (ie a 0 kref on struct mm) is now impossible with a !NULL mm->hmm delete the hmm_hmm_destroy(). Signed-off-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Ralph Campbell <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Philip Yang <[email protected]>
2019-06-10cgroup/bfq: revert bfq.weight symlink changeJens Axboe1-29/+4
There's some discussion on how to do this the best, and Tejun prefers that BFQ just create the file itself instead of having cgroups support a symlink feature. Hence revert commit 54b7b868e826 and 19e9da9e86c4 for 5.2, and this can be done properly for 5.3. Signed-off-by: Jens Axboe <[email protected]>
2019-06-09fork: add clone3Christian Brauner1-48/+153
This adds the clone3 system call. As mentioned several times already (cf. [7], [8]) here's the promised patchset for clone3(). We recently merged the CLONE_PIDFD patchset (cf. [1]). It took the last free flag from clone(). Independent of the CLONE_PIDFD patchset a time namespace has been discussed at Linux Plumber Conference last year and has been sent out and reviewed (cf. [5]). It is expected that it will go upstream in the not too distant future. However, it relies on the addition of the CLONE_NEWTIME flag to clone(). The only other good candidate - CLONE_DETACHED - is currently not recyclable as we have identified at least two large or widely used codebases that currently pass this flag (cf. [2], [3], and [4]). Given that CLONE_PIDFD grabbed the last clone() flag the time namespace is effectively blocked. clone3() has the advantage that it will unblock this patchset again. In general, clone3() is extensible and allows for the implementation of new features. The idea is to keep clone3() very simple and close to the original clone(), specifically, to keep on supporting old clone()-based workloads. We know there have been various creative proposals how a new process creation syscall or even api is supposed to look like. Some people even going so far as to argue that the traditional fork()+exec() split should be abandoned in favor of an in-kernel version of spawn(). Independent of whether or not we personally think spawn() is a good idea this patchset has and does not want to have anything to do with this. One stance we take is that there's no real good alternative to clone()+exec() and we need and want to support this model going forward; independent of spawn(). The following requirements guided clone3(): - bump the number of available flags - move arguments that are currently passed as separate arguments in clone() into a dedicated struct clone_args - choose a struct layout that is easy to handle on 32 and on 64 bit - choose a struct layout that is extensible - give new flags that currently need to abuse another flag's dedicated return argument in clone() their own dedicated return argument (e.g. CLONE_PIDFD) - use a separate kernel internal struct kernel_clone_args that is properly typed according to current kernel conventions in fork.c and is different from the uapi struct clone_args - port _do_fork() to use kernel_clone_args so that all process creation syscalls such as fork(), vfork(), clone(), and clone3() behave identical (Arnd suggested, that we can probably also port do_fork() itself in a separate patchset.) - ease of transition for userspace from clone() to clone3() This very much means that we do *not* remove functionality that userspace currently relies on as the latter is a good way of creating a syscall that won't be adopted. - do not try to be clever or complex: keep clone3() as dumb as possible In accordance with Linus suggestions (cf. [11]), clone3() has the following signature: /* uapi */ struct clone_args { __aligned_u64 flags; __aligned_u64 pidfd; __aligned_u64 child_tid; __aligned_u64 parent_tid; __aligned_u64 exit_signal; __aligned_u64 stack; __aligned_u64 stack_size; __aligned_u64 tls; }; /* kernel internal */ struct kernel_clone_args { u64 flags; int __user *pidfd; int __user *child_tid; int __user *parent_tid; int exit_signal; unsigned long stack; unsigned long stack_size; unsigned long tls; }; long sys_clone3(struct clone_args __user *uargs, size_t size) clone3() cleanly supports all of the supported flags from clone() and thus all legacy workloads. The advantage of sticking close to the old clone() is the low cost for userspace to switch to this new api. Quite a lot of userspace apis (e.g. pthreads) are based on the clone() syscall. With the new clone3() syscall supporting all of the old workloads and opening up the ability to add new features should make switching to it for userspace more appealing. In essence, glibc can just write a simple wrapper to switch from clone() to clone3(). There has been some interest in this patchset already. We have received a patch from the CRIU corner for clone3() that would set the PID/TID of a restored process without /proc/sys/kernel/ns_last_pid to eliminate a race. /* User visible differences to legacy clone() */ - CLONE_DETACHED will cause EINVAL with clone3() - CSIGNAL is deprecated It is superseeded by a dedicated "exit_signal" argument in struct clone_args freeing up space for additional flags. This is based on a suggestion from Andrei and Linus (cf. [9] and [10]) /* References */ [1]: b3e5838252665ee4cfa76b82bdf1198dca81e5be [2]: https://dxr.mozilla.org/mozilla-central/source/security/sandbox/linux/SandboxFilter.cpp#343 [3]: https://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_create.c#n233 [4]: https://sources.debian.org/src/blcr/0.8.5-2.3/cr_module/cr_dump_self.c/?hl=740#L740 [5]: https://lore.kernel.org/lkml/[email protected]/ [6]: https://lore.kernel.org/lkml/[email protected]/ [7]: https://lore.kernel.org/lkml/CAHrFyr5HxpGXA2YrKza-oB-GGwJCqwPfyhD-Y5wbktWZdt0sGQ@mail.gmail.com/ [8]: https://lore.kernel.org/lkml/[email protected]/ [9]: https://lore.kernel.org/lkml/[email protected]/ [10]: https://lore.kernel.org/lkml/CAHk-=whQP-Ykxi=zSYaV9iXsHsENa+2fdj-zYKwyeyed63Lsfw@mail.gmail.com/ [11]: https://lore.kernel.org/lkml/CAHk-=wieuV4hGwznPsX-8E0G2FKhx3NjZ9X3dTKh5zKd+iqOBw@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Acked-by: Serge Hallyn <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Jann Horn <[email protected]> Cc: David Howells <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Adrian Reber <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Florian Weimer <[email protected]> Cc: [email protected]
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds30-170/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-08Merge tag 'char-misc-5.2-rc4' of ↵Linus Torvalds3-31/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.2-rc4 to resolve a number of reported issues. The most "notable" one here is the kernel headers in proc^Wsysfs fixes. Those changes move the header file info into sysfs and fixes the build issues that you reported. Other than that, a bunch of small habanalabs driver fixes, some fpga driver fixes, and a few other tiny driver fixes. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: habanalabs: Read upper bits of trace buffer from RWPHI habanalabs: Fix virtual address access via debugfs for 2MB pages fpga: zynqmp-fpga: Correctly handle error pointer habanalabs: fix bug in checking huge page optimization habanalabs: Avoid using a non-initialized MMU cache mutex habanalabs: fix debugfs code uapi/habanalabs: add opcode for enable/disable device debug mode habanalabs: halt debug engines on user process close test_firmware: Use correct snprintf() limit genwqe: Prevent an integer overflow in the ioctl parport: Fix mem leak in parport_register_dev_model fpga: dfl: expand minor range when registering chrdev region fpga: dfl: Add lockdep classes for pdata->lock fpga: dfl: afu: Pass the correct device to dma_mapping_error() fpga: stratix10-soc: fix use-after-free on s10_init() w1: ds2408: Fix typo after 49695ac46861 (reset on output_write retry with readback) kheaders: Do not regenerate archive if config is not changed kheaders: Move from proc to sysfs lkdtm/bugs: Adjust recursion test to avoid elision lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
2019-06-08Merge tag 'for-linus-20190608' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+29
Pull block fixes from Jens Axboe: - Allow symlink from the bfq.weight cgroup parameter to the general weight (Angelo) - Damien is new skd maintainer (Bart) - NVMe pull request from Sagi, with a few small fixes. - Ensure we set DMA segment size properly, dma-debug is now tripping on these (Christoph) - Remove useless debugfs_create() return check (Greg) - Remove redundant unlikely() check on IS_ERR() (Kefeng) - Fixup request freeing on exit (Ming) * tag 'for-linus-20190608' of git://git.kernel.dk/linux-block: block, bfq: add weight symlink to the bfq.weight cgroup parameter cgroup: let a symlink too be created with a cftype file block: free sched's request pool in blk_cleanup_queue nvme-rdma: use dynamic dma mapping per command nvme: Fix u32 overflow in the number of namespace list calculation mmc: also set max_segment_size in the device mtip32xx: also set max_segment_size in the device rsxx: don't call dma_set_max_seg_size nvme-pci: don't limit DMA segement size block: Drop unlikely before IS_ERR(_OR_NULL) block: aoe: no need to check return value of debugfs_create functions nvmet: fix data_len to 0 for bdev-backed write_zeroes MAINTAINERS: Hand over skd maintainership nvme-tcp: fix queue mapping when queue count is limited nvme-rdma: fix queue mapping when queue count is limited
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-4/+16
Daniel Borkmann says: ==================== pull-request: bpf 2019-06-07 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix several bugs in riscv64 JIT code emission which forgot to clear high 32-bits for alu32 ops, from Björn and Luke with selftests covering all relevant BPF alu ops from Björn and Jiong. 2) Two fixes for UDP BPF reuseport that avoid calling the program in case of __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption that skb->data is pointing to transport header, from Martin. 3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog workqueue, and a missing restore of sk_write_space when psock gets dropped, from Jakub and John. 4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it breaks standard applications like DNS if reverse NAT is not performed upon receive, from Daniel. 5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6 fails to verify that the length of the tuple is long enough, from Lorenz. 6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for {un,}successful probe) as that is expected to be propagated as an fd to load_sk_storage_btf() and thus closing the wrong descriptor otherwise, from Michal. 7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir. 8) Minor misc fixes in docs, samples and selftests, from various others. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-06-07Merge tag 'pm-5.2-rc4' of ↵Linus Torvalds3-2/+17
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a crash during resume from hibernation introduced during the 4.19 cycle, cause the new Performance and Energy Bias Hint (EPB) code to be built only if CONFIG_PM is set and add a few missing kerneldoc comments. Specifics: - Fix a crash that occurs when a kernel with 'nosmt' in the command line is used to resume the system from hibernation (as the "restore" kernel), because memory mapping differences between the restore and image kernels cause SMT siblings to be woken up from idle states and subsequently they try to fetch instructions from incorrect memory locations (Jiri Kosina). - Cause the new Performance and Energy Bias Hint (EPB) code to be built only if CONFIG_PM is set, because that code is not really necessary otherwise (Rafael Wysocki). - Add kerneldoc comments to documents some helper functions related to system-wide suspend to avoid possible confusion regarding their purpose (Rafael Wysocki)" * tag 'pm-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/power: Fix 'nosmt' vs hibernation triple fault during resume PM: sleep: Add kerneldoc comments to some functions x86: intel_epb: Do not build when CONFIG_PM is unset
2019-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller34-301/+137
Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <[email protected]>
2019-06-07kernel: module: Use struct_size() helperGustavo A. R. Silva1-2/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct module_sect_attrs { ... struct module_sect_attr attrs[0]; }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(*sect_attrs) + nloaded * sizeof(sect_attrs->attrs[0] with: struct_size(sect_attrs, attrs, nloaded) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-06-07Merge branch 'pm-x86'Rafael J. Wysocki2-2/+11
* pm-x86: x86/power: Fix 'nosmt' vs hibernation triple fault during resume x86: intel_epb: Do not build when CONFIG_PM is unset
2019-06-07cgroup: let a symlink too be created with a cftype fileAngelo Ruocco1-4/+29
This commit enables a cftype to have a symlink (of any name) that points to the file associated with the cftype. Signed-off-by: Angelo Ruocco <[email protected]> Signed-off-by: Paolo Valente <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-06-06bpf: fix unconnected udp hooksDaniel Borkmann2-4/+16
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e3779 ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrey Ignatov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Acked-by: Martynas Pumputis <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-06-05cgroup: css_task_iter_skip()'d iterators must be advanced before accessedTejun Heo1-0/+4
b636fd38dc40 ("cgroup: Implement css_task_iter_skip()") introduced css_task_iter_skip() which is used to fix task iterations skipping dying threadgroup leaders with live threads. Skipping is implemented as a subportion of full advancing but css_task_iter_next() forgot to fully advance a skipped iterator before determining the next task to visit causing it to return invalid task pointers. Fix it by making css_task_iter_next() fully advance the iterator if it has been skipped since the previous iteration. Signed-off-by: Tejun Heo <[email protected]> Reported-by: syzbot Link: http://lkml.kernel.org/r/[email protected] Fixes: b636fd38dc40 ("cgroup: Implement css_task_iter_skip()")
2019-06-05ptrace: move clearing of TIF_SYSCALL_EMU flag to coreSudeep Holla1-0/+3
While the TIF_SYSCALL_EMU is set in ptrace_resume independent of any architecture, currently only powerpc and x86 unset the TIF_SYSCALL_EMU flag in ptrace_disable which gets called from ptrace_detach. Let's move the clearing of TIF_SYSCALL_EMU flag to __ptrace_unlink which gets executed from ptrace_detach and also keep it along with or close to clearing of TIF_SYSCALL_TRACE. Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441Thomas Gleixner10-52/+10
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): distributed under the terms of the gnu gpl version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 2 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): distribute under gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 8 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner7-18/+7
Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 363Thomas Gleixner1-1/+1
Based on 1 normalized pattern(s): released under terms in gpl version 2 see copying extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 5 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333Thomas Gleixner1-13/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 136 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295Thomas Gleixner8-72/+8
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282Thomas Gleixner1-10/+1
Based on 1 normalized pattern(s): this software is licensed under the terms of the gnu general public license version 2 as published by the free software foundation and may be copied distributed and modified under those terms this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 285 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Alexios Zavras <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-05livepatch: Use static buffer for debugging messages under rq lockPetr Mladek1-2/+1
The err_buf array uses 128 bytes of stack space. Move it off the stack by making it static. It's safe to use a shared buffer because klp_try_switch_task() is called under klp_mutex. Acked-by: Miroslav Benes <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Signed-off-by: Petr Mladek <[email protected]>
2019-06-05signal: improve commentsChristian Brauner1-6/+5
Improve the comments for pidfd_send_signal(). First, the comment still referred to a file descriptor for a process as a "task file descriptor" which stems from way back at the beginning of the discussion. Replace this with "pidfd" for consistency. Second, the wording for the explanation of the arguments to the syscall was a bit inconsistent, e.g. some used the past tense some used present tense. Make the wording more consistent. Signed-off-by: Christian Brauner <[email protected]>
2019-06-05kernel/module.c: Only return -EEXIST for modules that have finished loadingPrarit Bhargava1-4/+2
Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and linux guests boot with repeated errors: amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2) amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2) amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2) amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2) amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2) amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2) The warnings occur because the module code erroneously returns -EEXIST for modules that have failed to load and are in the process of being removed from the module list. module amd64_edac_mod has a dependency on module edac_mce_amd. Using modules.dep, systemd will load edac_mce_amd for every request of amd64_edac_mod. When the edac_mce_amd module loads, the module has state MODULE_STATE_UNFORMED and once the module load fails and the state becomes MODULE_STATE_GOING. Another request for edac_mce_amd module executes and add_unformed_module() will erroneously return -EEXIST even though the previous instance of edac_mce_amd has MODULE_STATE_GOING. Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which fails because of unknown symbols from edac_mce_amd. add_unformed_module() must wait to return for any case other than MODULE_STATE_LIVE to prevent a race between multiple loads of dependent modules. Signed-off-by: Prarit Bhargava <[email protected]> Signed-off-by: Barret Rhoden <[email protected]> Cc: David Arcari <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Jessica Yu <[email protected]>
2019-06-04bpf: remove redundant assignment to errColin Ian King2-2/+2
The variable err is assigned with the value -EINVAL that is never read and it is re-assigned a new value later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-03gcov: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-22/+2
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also delete the dentry variable as it is never needed. Acked-by: Peter Oberparleiter <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03dma-direct: provide generic support for uncached kernel segmentsChristoph Hellwig1-2/+15
A few architectures support uncached kernel segments. In that case we get an uncached mapping for a given physica address by using an offset in the uncached segement. Implement support for this scheme in the generic dma-direct code instead of duplicating it in arch hooks. Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-03dma-contiguous: use fallback alloc_pages for single pagesNicolin Chen1-1/+10
The addresses within a single page are always contiguous, so it's not so necessary to always allocate one single page from CMA area. Since the CMA area has a limited predefined size of space, it may run out of space in heavy use cases, where there might be quite a lot CMA pages being allocated for single pages. However, there is also a concern that a device might care where a page comes from -- it might expect the page from CMA area and act differently if the page doesn't. This patch tries to use the fallback alloc_pages path, instead of one-page size allocations from the global CMA area in case that a device does not have its own CMA area. This'd save resources from the CMA global area for more CMA allocations, and also reduce CMA fragmentations resulted from trivial allocations. Signed-off-by: Nicolin Chen <[email protected]> Tested-by: dann frazier <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-03dma-contiguous: add dma_{alloc,free}_contiguous() helpersNicolin Chen2-20/+51
Both dma_alloc_from_contiguous() and dma_release_from_contiguous() are very simply implemented, but requiring callers to pass certain parameters like count and align, and taking a boolean parameter to check __GFP_NOWARN in the allocation flags. So every function call duplicates similar work: unsigned long order = get_order(size); size_t count = size >> PAGE_SHIFT; page = dma_alloc_from_contiguous(dev, count, order, gfp & __GFP_NOWARN); [...] dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT); Additionally, as CMA can be used only in the context which permits sleeping, most of callers do a gfpflags_allow_blocking() check and a corresponding fallback allocation of normal pages upon any false result: if (gfpflags_allow_blocking(flag)) page = dma_alloc_from_contiguous(); if (!page) page = alloc_pages(); [...] if (!dma_release_from_contiguous(dev, page, count)) __free_pages(page, get_order(size)); So this patch simplifies those function calls by abstracting these operations into the two new functions: dma_{alloc,free}_contiguous. As some callers of dma_{alloc,release}_from_contiguous() might be complicated, this patch just implements these two new functions to kernel/dma/direct.c only as an initial step. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Tested-by: dann frazier <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2019-06-03kprobes: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-19/+6
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03fail_function: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-18/+5
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Kees Cook <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: zhong jiang <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03blktrace: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-6/+0
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jens Axboe <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03trace: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-4/+0
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Steven Rostedt <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03locking/lock_events: Use raw_cpu_{add,inc}() for statsPeter Zijlstra1-41/+4
Instead of playing silly games with CONFIG_DEBUG_PREEMPT toggling between this_cpu_*() and __this_cpu_*() use raw_cpu_*(), which is exactly what we want here. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tim Chen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: huang ying <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Fix merging of hlocks with non-zero referencesImre Deak1-9/+9
The sequence static DEFINE_WW_CLASS(test_ww_class); struct ww_acquire_ctx ww_ctx; struct ww_mutex ww_lock_a; struct ww_mutex ww_lock_b; struct ww_mutex ww_lock_c; struct mutex lock_c; ww_acquire_init(&ww_ctx, &test_ww_class); ww_mutex_init(&ww_lock_a, &test_ww_class); ww_mutex_init(&ww_lock_b, &test_ww_class); ww_mutex_init(&ww_lock_c, &test_ww_class); mutex_init(&lock_c); ww_mutex_lock(&ww_lock_a, &ww_ctx); mutex_lock(&lock_c); ww_mutex_lock(&ww_lock_b, &ww_ctx); ww_mutex_lock(&ww_lock_c, &ww_ctx); mutex_unlock(&lock_c); (*) ww_mutex_unlock(&ww_lock_c); ww_mutex_unlock(&ww_lock_b); ww_mutex_unlock(&ww_lock_a); ww_acquire_fini(&ww_ctx); (**) will trigger the following error in __lock_release() when calling mutex_release() at **: DEBUG_LOCKS_WARN_ON(depth <= 0) The problem is that the hlock merging happening at * updates the references for test_ww_class incorrectly to 3 whereas it should've updated it to 4 (representing all the instances for ww_ctx and ww_lock_[abc]). Fix this by updating the references during merging correctly taking into account that we can have non-zero references (both for the hlock that we merge into another hlock or for the hlock we are merging into). Signed-off-by: Imre Deak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Fix OOO unlock when hlocks need mergingImre Deak1-12/+29
The sequence static DEFINE_WW_CLASS(test_ww_class); struct ww_acquire_ctx ww_ctx; struct ww_mutex ww_lock_a; struct ww_mutex ww_lock_b; struct mutex lock_c; struct mutex lock_d; ww_acquire_init(&ww_ctx, &test_ww_class); ww_mutex_init(&ww_lock_a, &test_ww_class); ww_mutex_init(&ww_lock_b, &test_ww_class); mutex_init(&lock_c); ww_mutex_lock(&ww_lock_a, &ww_ctx); mutex_lock(&lock_c); ww_mutex_lock(&ww_lock_b, &ww_ctx); mutex_unlock(&lock_c); (*) ww_mutex_unlock(&ww_lock_b); ww_mutex_unlock(&ww_lock_a); ww_acquire_fini(&ww_ctx); triggers the following WARN in __lock_release() when doing the unlock at *: DEBUG_LOCKS_WARN_ON(curr->lockdep_depth != depth - 1); The problem is that the WARN check doesn't take into account the merging of ww_lock_a and ww_lock_b which results in decreasing curr->lockdep_depth by 2 not only 1. Note that the following sequence doesn't trigger the WARN, since there won't be any hlock merging. ww_acquire_init(&ww_ctx, &test_ww_class); ww_mutex_init(&ww_lock_a, &test_ww_class); ww_mutex_init(&ww_lock_b, &test_ww_class); mutex_init(&lock_c); mutex_init(&lock_d); ww_mutex_lock(&ww_lock_a, &ww_ctx); mutex_lock(&lock_c); mutex_lock(&lock_d); ww_mutex_lock(&ww_lock_b, &ww_ctx); mutex_unlock(&lock_d); ww_mutex_unlock(&ww_lock_b); ww_mutex_unlock(&ww_lock_a); mutex_unlock(&lock_c); ww_acquire_fini(&ww_ctx); In general both of the above two sequences are valid and shouldn't trigger any lockdep warning. Fix this by taking the decrement due to the hlock merging into account during lock release and hlock class re-setting. Merging can't happen during lock downgrading since there won't be a new possibility to merge hlocks in that case, so add a WARN if merging still happens then. Signed-off-by: Imre Deak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03x86/power: Fix 'nosmt' vs hibernation triple fault during resumeJiri Kosina2-2/+11
As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid. That results in triple fault shortly after cr3 is switched, and machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: 4.19+ <[email protected]> # v4.19+ Debugged-by: Thomas Gleixner <[email protected]> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina <[email protected]> Acked-by: Pavel Machek <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2019-06-03perf/core: Add attr_groups_update into struct pmuJiri Olsa1-0/+6
Adding attr_update attribute group into pmu, to allow having multiple attribute groups for same group name. This will allow us to update "events" or "format" directories with attributes that depend on various HW conditions. For example having group_format_extra group that updates "format" directory only if pmu version is 2 and higher: static umode_t exra_is_visible(struct kobject *kobj, struct attribute *attr, int i) { return x86_pmu.version >= 2 ? attr->mode : 0; } static struct attribute_group group_format_extra = { .name = "format", .is_visible = exra_is_visible, }; Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03perf/core: Allow non-privileged uprobe for user processesSong Liu2-3/+3
Currently, non-privileged user could only use uprobe with kernel.perf_event_paranoid = -1 However, setting perf_event_paranoid to -1 leaks other users' processes to non-privileged uprobes. To introduce proper permission control of uprobes, we are building the following system: A daemon with CAP_SYS_ADMIN is in charge to create uprobes via tracefs; Users asks the daemon to create uprobes; Then user can attach uprobe only to processes owned by the user. This patch allows non-privileged user to attach uprobe to processes owned by the user. The following example shows how to use uprobe with non-privileged user. This is based on Brendan's blog post [1] 1. Create uprobe with root: sudo perf probe -x 'readline%return +0($retval):string' 2. Then non-root user can use the uprobe as: perf record -vvv -e probe_bash:readline__return -p <pid> sleep 20 perf script [1] http://www.brendangregg.com/blog/2015-06-28/linux-ftrace-uprobe.html Signed-off-by: Song Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Remove !dir in lock irq usage checkYuyang Du1-1/+1
In mark_lock_irq(), the following checks are performed: ---------------------------------- | -> | unsafe | read unsafe | |----------------------------------| | safe | F B | F* B* | |----------------------------------| | read safe | F? B* | - | ---------------------------------- Where: F: check_usage_forwards B: check_usage_backwards *: check enabled by STRICT_READ_CHECKS ?: check enabled by the !dir condition From checking point of view, the special F? case does not make sense, whereas it perhaps is made for peroformance concern. As later patch will address this issue, remove this exception, which makes the checks consistent later. With STRICT_READ_CHECKS = 1 which is default, there is no functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Adjust new bit cases in mark_lockYuyang Du1-14/+7
The new bit can be any possible lock usage except it is garbage, so the cases in switch can be made simpler. Warn early on if wrong usage bit is passed without taking locks. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Consolidate lock usage bit initializationYuyang Du1-8/+14
Lock usage bit initialization is consolidated into one function mark_usage(). Trivial readability improvement. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Check redundant dependency only when CONFIG_LOCKDEP_SMALLYuyang Du1-0/+4
As Peter has put it all sound and complete for the cause, I simply quote: "It (check_redundant) was added for cross-release (which has since been reverted) which would generate a lot of redundant links (IIRC) but having it makes the reports more convoluted -- basically, if we had an A-B-C relation, then A-C will not be added to the graph because it is already covered. This then means any report will include B, even though a shorter cycle might have been possible." This would increase the number of direct dependencies. For a simple workload (make clean; reboot; make vmlinux -j8), the data looks like this: CONFIG_LOCKDEP_SMALL: direct dependencies: 6926 !CONFIG_LOCKDEP_SMALL: direct dependencies: 9052 (+30.7%) Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Refactorize check_noncircular and check_redundantYuyang Du1-44/+74
These two functions now handle different check results themselves. A new check_path function is added to check whether there is a path in the dependency graph. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Remove unused argument in __lock_releaseYuyang Du1-2/+2
The @nested is not used in __release_lock so remove it despite that it is not used in lock_release in the first place. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Remove redundant argument in check_deadlockYuyang Du1-3/+3
In check_deadlock(), the third argument read comes from the second argument hlock so that it can be removed. No functional change. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-06-03locking/lockdep: Update comments on dependency searchYuyang Du1-11/+10
The breadth-first search is implemented as flat-out non-recursive now, but the comments are still describing it as recursive, update the comments in that regard. Signed-off-by: Yuyang Du <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>