path: root/kernel
Age | Commit message | Author | Files | Lines
2020-12-04 | tracing: Fix userstacktrace option for instances | Steven Rostedt (VMware) | 1 | -5/+8
When instances gained the ability to use their own options, the userstacktrace option was left hardcoded to the top-level trace array. This effectively turned the instance userstacktrace option into a nop, confusing users who set it only to see nothing happen (I was confused when it happened to me!). Cc: [email protected] Fixes: 16270145ce6b ("tracing: Add trace options for core options to instances") Signed-off-by: Steven Rostedt (VMware) <[email protected]>
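A sketch of the shape of such a fix: check the instance's own flags instead of the global ones in the user-stack recording path (abridged; the trace_array parameter and field names follow the kernel's tracing code, but the recording logic is elided):

    /* Pass the trace_array so each instance honors its own option. */
    static void ftrace_trace_userstack(struct trace_array *tr,
                                       struct trace_buffer *buffer,
                                       unsigned long flags, int pc)
    {
            if (!(tr->trace_flags & TRACE_ITER_USERSTACKTRACE))
                    return;
            /* ... record the user stack trace into 'buffer' ... */
    }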
2020-12-04 | bpf: Add a bpf_sock_from_file helper | Florent Revest | 1 | -0/+20
While eBPF programs can check whether a file is a socket by comparing file->f_op == &socket_file_ops, they cannot convert the void private_data pointer to a struct socket BTF pointer. In order to do this, a new helper wrapping sock_from_file() is added. This is useful to tracing programs, but also to other program types inheriting this set of helpers, such as iterators or LSM programs. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: KP Singh <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
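A minimal sketch of the kernel side of such a helper (assuming the preparatory change that made sock_from_file() take only the file pointer; the bpf_func_proto wiring with BTF-typed argument and return is elided):

    BPF_CALL_1(bpf_sock_from_file, struct file *, file)
    {
            /* sock_from_file() returns the struct socket backing 'file',
             * or NULL if the file is not a socket. */
            return (unsigned long) sock_from_file(file);
    }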
2020-12-04 | block: remove the request_queue argument to the request based tracepoints | Christoph Hellwig | 1 | -27/+17
The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04 | block: remove the request_queue argument to the block_bio_remap tracepoint | Christoph Hellwig | 1 | -9/+5
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04 | block: remove the request_queue argument to the block_split tracepoint | Christoph Hellwig | 1 | -3/+2
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04 | block: simplify and extend the block_bio_merge tracepoint class | Christoph Hellwig | 1 | -31/+10
The block_bio_merge tracepoint class can be reused for most bio-based tracepoints. For that it just needs to lose the superfluous q and rq parameters. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04 | block: remove the unused block_sleeprq tracepoint | Christoph Hellwig | 1 | -22/+0
The block_sleeprq tracepoint was only used by the legacy request code. Remove it now that the legacy request code is gone. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-12-04 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski | 22 | -586/+529
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03 | bpf: Allow to specify kernel module BTFs when attaching BPF programs | Andrii Nakryiko | 2 | -31/+56
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03 | bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier | Andrii Nakryiko | 3 | -54/+112
Remove a permeating assumption throughout the BPF verifier that it only deals with vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows gradually adding support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames the btf_id() function to btf_obj_id() to minimize the naming clash between btf_id denoting a BTF *type* ID and a BTF *object*'s ID. Also, although btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs do need it, so apply BTF refcounting universally when a BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc.). This makes for simpler clean-up code. Now that a BTF type ID is not enough to uniquely identify a BTF type, extend the BPF trampoline key to include the BTF object ID. To differentiate it from a target program's BPF ID, set the 31st bit of the type ID. BTF type IDs (at least currently) are not allowed to take the full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
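A sketch of the extended trampoline key described above (close to the verifier helper introduced around this change; abridged):

    /* Combine the BTF object ID and BTF type ID into one u64 key.
     * Bit 31 of the low word marks a BTF type ID, so the key cannot
     * collide with one built from a target BPF program ID. */
    static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
                                                 struct btf *btf, u32 btf_id)
    {
            if (tgt_prog)
                    return ((u64)tgt_prog->aux->id << 32) | btf_id;
            return ((u64)btf_obj_id(btf) << 32) | 0x80000000 | btf_id;
    }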
2020-12-03 | bpf: Keep module's btf_data_size intact after load | Andrii Nakryiko | 1 | -1/+0
Having the real btf_data_size stored in struct module is beneficial for quickly determining which kernel modules have an associated BTF object and which don't. There is no harm in keeping this info, as opposed to keeping an invalid pointer. Fixes: 607c543f939d ("bpf: Sanitize BTF data pointer after module is loaded") Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03 | bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() | Andrii Nakryiko | 1 | -3/+5
__module_address() needs to be called with preemption disabled or with module_mutex taken. preempt_disable() is enough for read-only uses, which is what this fix does. Also, module_put() does an internal check for NULL, so drop the explicit check as well. Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules") Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
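A sketch of what the fixed function plausibly looks like under those constraints (preemption disabled around __module_address(), no explicit NULL check before module_put()):

    static void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
    {
            struct module *mod;

            preempt_disable();
            mod = __module_address((unsigned long)btp);
            module_put(mod);        /* module_put() tolerates NULL */
            preempt_enable();
    }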
2020-12-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 6 | -16/+60
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03 | fsnotify: generalize handle_inode_event() | Amir Goldstein | 3 | -3/+3
The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backend conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback, which also happens to be buggy. Generalize the handle_inode_event() arguments and add the check for the FS_EXCL_UNLINK flag to the generic helper, so that the inotify backend can be converted to use the simple interface. Link: https://lore.kernel.org/r/[email protected] CC: [email protected] Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2020-12-03 | perf/core: Fix arch_perf_get_page_size() | Peter Zijlstra | 1 | -65/+38
The (new) page-table walker in arch_perf_get_page_size() is broken in various ways. Specifically, while it is used in a lockless manner, it doesn't depend on CONFIG_HAVE_FAST_GUP, nor does it use the proper lockless offset methods, nor is it careful to read each entry only once. Also the hugetlb support is broken due to calling pte_page() without first checking pte_special(). Rewrite the whole thing to be a proper lockless page-table walker and employ the new pXX_leaf_size() pgtable functions to determine the pagetable size without looking at the page frames. Fixes: 51b646b2d9f8 ("perf,mm: Handle non-page-table-aligned hugetlbfs") Fixes: 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Kan Liang <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
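A sketch of one level of such a lockless walk (hypothetical helper name; the real walker descends from pgd to pte and applies READ_ONCE() and the leaf-size helpers at every level):

    static u64 perf_get_pmd_page_size(pmd_t *pmdp)
    {
            pmd_t pmd = READ_ONCE(*pmdp);   /* read the entry exactly once */

            if (!pmd_present(pmd))
                    return 0;
            if (pmd_leaf(pmd))
                    return pmd_leaf_size(pmd); /* size from the entry, not the page frame */
            /* ... otherwise descend to the PTE level ... */
            return 0;
    }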
2020-12-02 | audit: replace atomic_add_return() | Yejune Deng | 1 | -1/+1
atomic_inc_return() is a little neater Signed-off-by: Yejune Deng <[email protected]> Signed-off-by: Paul Moore <[email protected]>
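The change itself is a one-liner; abridged, the pattern in the audit serial counter looks roughly like:

    unsigned int audit_serial(void)
    {
            static atomic_t serial = ATOMIC_INIT(0);

            /* was: return atomic_add_return(1, &serial); */
            return atomic_inc_return(&serial);
    }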
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for bpf progs | Roman Gushchin | 2 | -69/+12
Do not use rlimit-based memory accounting for bpf progs. It has been replaced with memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting infra for bpf maps | Roman Gushchin | 1 | -80/+16
Remove the rlimit-based accounting infrastructure code, which is not used anymore. To provide backward compatibility, use an approximation of the bpf map memory footprint as the "memlock" value, available to a user via map info. The approximation is based on the maximal number of elements and the key and value sizes. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
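A sketch of the approximation described (function name assumed from the bpf syscall code; abridged):

    /* Rough upper bound on the map's memory footprint, exposed as the
     * "memlock" value in map info for backward compatibility. */
    static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
    {
            unsigned long size;

            size = round_up(map->key_size + map->value_size, 8);

            return round_up(map->max_entries * size, PAGE_SIZE);
    }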
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for bpf local storage maps | Roman Gushchin | 1 | -10/+0
Do not use rlimit-based memory accounting for bpf local storage maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for stackmap maps | Roman Gushchin | 1 | -13/+3
Do not use rlimit-based memory accounting for stackmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for bpf ringbuffer | Roman Gushchin | 1 | -20/+4
Do not use rlimit-based memory accounting for the bpf ringbuffer. It has been replaced with the memcg-based memory accounting. bpf_ringbuf_alloc() can't return anything except ERR_PTR(-ENOMEM) or a valid pointer, so to simplify the code make it return NULL in the first case. This allows dropping a couple of lines in ringbuf_map_alloc() and also makes it look similar to other memory allocating functions like kmalloc(). Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
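A sketch of the simplified caller in ringbuf_map_alloc() as described (abridged):

    rb_map->rb = bpf_ringbuf_alloc(attr->max_entries, rb_map->map.numa_node);
    if (!rb_map->rb) {
            /* NULL is now the only failure mode, so map it to -ENOMEM. */
            kfree(rb_map);
            return ERR_PTR(-ENOMEM);
    }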
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for reuseport_array maps | Roman Gushchin | 1 | -10/+2
Do not use rlimit-based memory accounting for reuseport_array maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for queue_stack_maps maps | Roman Gushchin | 1 | -12/+4
Do not use rlimit-based memory accounting for queue_stack maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for lpm_trie maps | Roman Gushchin | 1 | -13/+0
Do not use rlimit-based memory accounting for lpm_trie maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for hashtab maps | Roman Gushchin | 1 | -18/+1
Do not use rlimit-based memory accounting for hashtab maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for devmap maps | Roman Gushchin | 1 | -16/+2
Do not use rlimit-based memory accounting for devmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for cgroup storage maps | Roman Gushchin | 1 | -20/+1
Do not use rlimit-based memory accounting for cgroup storage maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for cpumap maps | Roman Gushchin | 1 | -15/+1
Do not use rlimit-based memory accounting for cpumap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for bpf_struct_ops maps | Roman Gushchin | 1 | -16/+3
Do not use rlimit-based memory accounting for bpf_struct_ops maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Eliminate rlimit-based memory accounting for arraymap maps | Roman Gushchin | 1 | -20/+4
Do not use rlimit-based memory accounting for arraymap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Memcg-based memory accounting for bpf local storage maps | Roman Gushchin | 1 | -4/+6
Account memory used by bpf local storage maps: per-socket, per-inode and per-task storages. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Memcg-based memory accounting for bpf ringbuffer | Roman Gushchin | 1 | -7/+4
Enable the memcg-based memory accounting for the memory used by the bpf ringbuffer. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Memcg-based memory accounting for lpm_trie maps | Roman Gushchin | 1 | -3/+3
Include lpm trie and lpm trie node objects into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Refine memcg-based memory accounting for hashtab maps | Roman Gushchin | 1 | -10/+14
Include percpu objects and the size of map metadata into the accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Refine memcg-based memory accounting for devmap maps | Roman Gushchin | 1 | -3/+4
Include map metadata and the node size (struct bpf_dtab_netdev) into the accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Memcg-based memory accounting for cgroup storage maps | Roman Gushchin | 1 | -12/+11
Account memory used by cgroup storage maps including metadata structures. Account the percpu memory for the percpu flavor of cgroup storage. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Refine memcg-based memory accounting for cpumap maps | Roman Gushchin | 1 | -9/+12
Include metadata and percpu data into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Refine memcg-based memory accounting for arraymap maps | Roman Gushchin | 1 | -3/+3
Include percpu arrays and auxiliary data into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Memcg-based memory accounting for bpf maps | Roman Gushchin | 1 | -1/+5
This patch enables memcg-based memory accounting for memory allocated by __bpf_map_area_alloc(), which is used by many types of bpf maps for large initial memory allocations. Please note that __bpf_map_area_alloc() should not be used outside of map creation paths without setting the active memory cgroup to the map's memory cgroup. Following patches in the series will refine the accounting for some of the map types. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | bpf: Prepare for memcg-based memory accounting for bpf maps | Roman Gushchin | 1 | -0/+63
Bpf maps can be updated from an interrupt context, and in such a case there is no process which can be charged. This makes the memory accounting of bpf maps non-trivial. Fortunately, after commit 4127c6504f25 ("mm: kmem: enable kernel memcg accounting from interrupt contexts") and commit b87d8cefe43c ("mm, memcg: rework remote charging API to support nesting") it's finally possible. To make the ownership model simple and consistent, the memory cgroup of the current process is recorded when the map is created. All subsequent allocations related to the bpf map are charged to the same memory cgroup. This includes allocations made by any processes (even if they belong to a different cgroup) and from interrupts. This commit introduces 3 new helpers, which will be used by following commits to enable the accounting of bpf maps memory: - bpf_map_kmalloc_node() - bpf_map_kzalloc() - bpf_map_alloc_percpu() They wrap popular memory allocation functions: they set the active memory cgroup to the map's memory cgroup, add __GFP_ACCOUNT to the passed gfp flags, call into the corresponding memory allocation function, and restore the original active memory cgroup. These helpers are supposed to be used everywhere except the map creation path. During map creation, when the map structure itself is being allocated, it cannot be passed to these helpers; in that case the default memory allocation functions are used with the __GFP_ACCOUNT flag. Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
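A sketch of one of these wrappers as described (abridged; the real code is additionally guarded by the memcg config option):

    void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size,
                               gfp_t flags, int node)
    {
            struct mem_cgroup *old_memcg;
            void *ptr;

            /* Charge the map's memcg, not whoever happens to be current. */
            old_memcg = set_active_memcg(map->memcg);
            ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
            set_active_memcg(old_memcg);

            return ptr;
    }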
2020-12-02 | bpf: Memcg-based memory accounting for bpf progs | Roman Gushchin | 1 | -5/+5
Include memory used by bpf programs into the memcg-based accounting. This covers the memory used by the programs themselves, auxiliary data, statistics and bpf line info. The memory cgroup containing the process which loads the program gets charged. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | mm: memcontrol: Use helpers to read page's memcg data | Roman Gushchin | 1 | -3/+4
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: the PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from the page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. It also formalizes accesses to page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks, and removes a couple of obsolete functions. As a result the code becomes more robust, with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers which are barely used. This creates an obstacle to reusing some bits of the pointer for storing additional information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commit uses 2 existing helpers and introduces a new helper, converting all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at an objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to an unsigned long memcg_data. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02 | irq: Call tick_irq_enter() inside HARDIRQ_OFFSET | Frederic Weisbecker | 1 | -9/+5
Now that account_hardirq_enter() is called after HARDIRQ_OFFSET has been incremented, there is nothing left that prevents us from also moving tick_irq_enter() after HARDIRQ_OFFSET is incremented. The desired outcome is to remove the nasty hack that prevents softirqs from being raised through ksoftirqd instead of the hardirq bottom half. Also tick_irq_enter() then becomes appropriately covered by lockdep. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02 | irqtime: Move irqtime entry accounting after irq offset incrementation | Frederic Weisbecker | 2 | -10/+14
IRQ time entry is currently accounted before HARDIRQ_OFFSET or SOFTIRQ_OFFSET are incremented. This is convenient to decide to which index the cputime to account is dispatched. Unfortunately it prevents tick_irq_enter() from being called under HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ entry accounting due to the necessary clock catch up. As a result we don't benefit from appropriate lockdep coverage on tick_irq_enter(). To prepare for fixing this, move the IRQ entry cputime accounting after the preempt offset is incremented. This requires the cputime dispatch code to handle the extra offset. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
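A sketch of the dispatch with the extra offset handled (close to the irqtime path this change touches; abridged):

    void irqtime_account_irq(struct task_struct *curr, unsigned int offset)
    {
            struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
            unsigned int pc;
            s64 delta;

            if (!sched_clock_irqtime)
                    return;

            delta = sched_clock_cpu(smp_processor_id()) - irqtime->irq_start_time;
            irqtime->irq_start_time += delta;
            /* The offset was already added to preempt_count; subtract it
             * to classify the context that was interrupted. */
            pc = irq_count() - offset;

            if (pc & HARDIRQ_MASK)
                    irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
            else if ((pc & SOFTIRQ_OFFSET) && curr != this_cpu_ksoftirqd())
                    irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
    }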
2020-12-02 | sched/vtime: Consolidate IRQ time accounting | Frederic Weisbecker | 1 | -4/+9
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatches the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02 | s390/vtime: Use the generic IRQ entry accounting | Frederic Weisbecker | 1 | -11/+2
s390 has its own version of IRQ entry accounting because it doesn't account the idle time the same way the other architectures do. Only the actual idle sleep time is accounted as idle time, the rest of the idle task execution is accounted as system time. Make the generic IRQ entry accounting aware of architectures that have their own way of accounting idle time and convert s390 to use it. This prepares s390 to get involved in further consolidations of IRQ time accounting. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02 | sched/cputime: Remove symbol exports from IRQ time accounting | Frederic Weisbecker | 1 | -2/+0
account_irq_enter_time() and account_irq_exit_time() are not called from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ cputime accounting functions called from there. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02 | entry: Add syscall_exit_to_user_mode_work() | Sven Schnelle | 1 | -2/+12
This is the same as syscall_exit_to_user_mode() but without calling exit_to_user_mode(). This can be used if there is an architectural reason to avoid the combo function, e.g. restarting a syscall without returning to userspace. Before returning to user space the caller has to invoke exit_to_user_mode(). [ tglx: Amended comments ] Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
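A sketch of the resulting split (noinstr and instrumentation annotations omitted; __syscall_exit_to_user_mode_work() is the shared inline worker assumed by this sketch):

    /* For archs that must do extra work (e.g. restart a syscall) before
     * leaving the kernel; the caller invokes exit_to_user_mode() itself. */
    void syscall_exit_to_user_mode_work(struct pt_regs *regs)
    {
            __syscall_exit_to_user_mode_work(regs);
    }

    /* The combo interface: same work plus the final exit transition. */
    void syscall_exit_to_user_mode(struct pt_regs *regs)
    {
            __syscall_exit_to_user_mode_work(regs);
            __exit_to_user_mode();
    }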
2020-12-02 | entry: Add exit_to_user_mode() wrapper | Sven Schnelle | 1 | -12/+6
Called from architecture specific code when syscall_exit_to_user_mode() is not suitable. It simply calls __exit_to_user_mode(). This way __exit_to_user_mode() can still be inlined because it is declared static __always_inline. [ tglx: Amended comments and moved it to a different place in the header ] Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02 | entry: Add enter_from_user_mode() wrapper | Sven Schnelle | 1 | -10/+6
To be called from architecture-specific code if the combo interfaces are not suitable. It simply calls __enter_from_user_mode(). This way __enter_from_user_mode() can still be inlined because it is declared static __always_inline. [ tglx: Amend comments and move it to a different location in the header ] Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
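Both this wrapper and the exit_to_user_mode() one from the previous entry are trivial shims; a sketch (annotations abridged):

    noinstr void enter_from_user_mode(struct pt_regs *regs)
    {
            __enter_from_user_mode(regs);   /* stays inlined: static __always_inline */
    }

    noinstr void exit_to_user_mode(void)
    {
            __exit_to_user_mode();
    }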