aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-02-12swiotlb: add debugfs to track swiotlb buffer usageDongli Zhang1-0/+44
The device driver will not be able to do dma operations once swiotlb buffer is full, either because the driver is using so many IO TLB blocks inflight, or because there is memory leak issue in device driver. To export the swiotlb buffer usage via debugfs would help the user estimate the size of swiotlb buffer to pre-allocate or analyze device driver memory leak issue. Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2019-02-12swiotlb: fix comment on swiotlb_bounce()Dongli Zhang1-1/+1
Fix the comment as swiotlb_bounce() is used to copy from original dma location to swiotlb buffer during swiotlb_tbl_map_single(), while to copy from swiotlb buffer to original dma location during swiotlb_tbl_unmap_single(). Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2019-02-12bpf: offload: add priv field for driversJakub Kicinski1-1/+9
Currently bpf_offload_dev does not have any priv pointer, forcing the drivers to work backwards from the netdev in program metadata. This is not great given programs are conceptually associated with the offload device, and it means one or two unnecessary deferences. Add a priv pointer to bpf_offload_dev. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-02-11tracing: probeevent: Correctly update remaining space in dynamic areaAndreas Ziegler1-2/+4
Commit 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area") improved the string fetching mechanism by returning the number of required bytes after copying the argument to the dynamic area. However, this return value is now only used to increment the pointer inside the dynamic area but misses updating the 'maxlen' variable which indicates the remaining space in the dynamic area. This means that fetch_store_string() always reads the *total* size of the dynamic area from the data_loc pointer instead of the *remaining* size (and passes it along to strncpy_from_{user,unsafe}) even if we're already about to copy data into the middle of the dynamic area. Link: http://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: [email protected] Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area") Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Andreas Ziegler <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-02-11tracing: Change the function format to display function names by perfChangbin Du1-22/+19
Here is an example for this change. $ sudo perf record -e 'ftrace:function' --filter='ip==schedule' $ sudo perf report The output of perf before this patch: \# Samples: 100 of event 'ftrace:function' \# Event count (approx.): 100 \# \# Overhead Trace output \# ........ ...................................... \# 51.00% ffffffff81f6aaa0 <-- ffffffff81158e8d 29.00% ffffffff81f6aaa0 <-- ffffffff8116ccb2 8.00% ffffffff81f6aaa0 <-- ffffffff81f6f2ed 4.00% ffffffff81f6aaa0 <-- ffffffff811628db 4.00% ffffffff81f6aaa0 <-- ffffffff81f6ec5b 2.00% ffffffff81f6aaa0 <-- ffffffff81f6f21a 1.00% ffffffff81f6aaa0 <-- ffffffff811b04af 1.00% ffffffff81f6aaa0 <-- ffffffff8143ce17 After this patch: \# Samples: 36 of event 'ftrace:function' \# Event count (approx.): 36 \# \# Overhead Trace output \# ........ ............................................ \# 38.89% schedule <-- schedule_hrtimeout_range_clock 27.78% schedule <-- worker_thread 13.89% schedule <-- schedule_timeout 11.11% schedule <-- smpboot_thread_fn 5.56% schedule <-- rcu_gp_kthread 2.78% schedule <-- exit_to_usermode_loop Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Changbin Du <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2019-02-11bpf: fix lockdep false positive in stackmapAlexei Starovoitov1-1/+7
Lockdep warns about false positive: [ 11.211460] ------------[ cut here ]------------ [ 11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0) [ 11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280 [ 11.213134] Modules linked in: [ 11.214954] RIP: 0010:lock_release+0x1ad/0x280 [ 11.223508] Call Trace: [ 11.223705] <IRQ> [ 11.223874] ? __local_bh_enable+0x7a/0x80 [ 11.224199] up_read+0x1c/0xa0 [ 11.224446] do_up_read+0x12/0x20 [ 11.224713] irq_work_run_list+0x43/0x70 [ 11.225030] irq_work_run+0x26/0x50 [ 11.225310] smp_irq_work_interrupt+0x57/0x1f0 [ 11.225662] irq_work_interrupt+0xf/0x20 since rw_semaphore is released in a different task vs task that locked the sema. It is expected behavior. Fix the warning with up_read_non_owner() and rwsem_release() annotation. Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context") Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-02-11perf/x86: Add check_period PMU callbackJiri Olsa1-0/+16
Vince (and later on Ravi) reported crashes in the BTS code during fuzzing with the following backtrace: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:perf_prepare_sample+0x8f/0x510 ... Call Trace: <IRQ> ? intel_pmu_drain_bts_buffer+0x194/0x230 intel_pmu_drain_bts_buffer+0x160/0x230 ? tick_nohz_irq_exit+0x31/0x40 ? smp_call_function_single_interrupt+0x48/0xe0 ? call_function_single_interrupt+0xf/0x20 ? call_function_single_interrupt+0xa/0x20 ? x86_schedule_events+0x1a0/0x2f0 ? x86_pmu_commit_txn+0xb4/0x100 ? find_busiest_group+0x47/0x5d0 ? perf_event_set_state.part.42+0x12/0x50 ? perf_mux_hrtimer_restart+0x40/0xb0 intel_pmu_disable_event+0xae/0x100 ? intel_pmu_disable_event+0xae/0x100 x86_pmu_stop+0x7a/0xb0 x86_pmu_del+0x57/0x120 event_sched_out.isra.101+0x83/0x180 group_sched_out.part.103+0x57/0xe0 ctx_sched_out+0x188/0x240 ctx_resched+0xa8/0xd0 __perf_event_enable+0x193/0x1e0 event_function+0x8e/0xc0 remote_function+0x41/0x50 flush_smp_call_function_queue+0x68/0x100 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x3e/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> The reason is that while event init code does several checks for BTS events and prevents several unwanted config bits for BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows to create BTS event without those checks being done. Following sequence will cause the crash: If we create an 'almost' BTS event with precise_ip and callchains, and it into a BTS event it will crash the perf_prepare_sample() function because precise_ip events are expected to come in with callchain data initialized, but that's not the case for intel_pmu_drain_bts_buffer() caller. Adding a check_period callback to be called before the period is changed via PERF_EVENT_IOC_PERIOD. It will deny the change if the event would become BTS. Plus adding also the limit_period check as well. Reported-by: Vince Weaver <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11futex: Convert futex_pi_state.refcount to refcount_tElena Reshetova1-7/+8
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable futex_pi_state.refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. Please check Documentation/core-api/refcount-vs-atomic.rst for more information. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the futex_pi_state.refcount it might make a difference in following places: - get_pi_state() and exit_pi_state_list(): increment in refcount_inc_not_zero() only guarantees control dependency on success vs. fully ordered atomic counterpart - put_pi_state(): decrement in refcount_dec_and_test() provides RELEASE ordering and ACQUIRE ordering on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman33-144/+409
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-02-11sched/fair: Remove unused 'sd' parameter from select_idle_smt()Viresh Kumar1-3/+3
The 'sd' parameter isn't getting used in select_idle_smt(), drop it. Signed-off-by: Viresh Kumar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vincent Guittot <[email protected]> Link: http://lkml.kernel.org/r/f91c5e118183e79d4a982e9ac4ce5e47948f6c1b.1549536337.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment blockValentin Schneider1-9/+2
The comment block for that function lists the heuristics for triggering a nohz kick, but the most recent ones (blocked load updates, misfit) aren't included, and some of them (LLC nohz logic, asym packing) are no longer in sync with the code. The conditions are either simple enough or properly commented, so get rid of that list instead of letting it grow. Signed-off-by: Valentin Schneider <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Explain LLC nohz kick conditionValentin Schneider1-2/+7
Provide a comment explaining the LLC related nohz kick in nohz_balancer_kick(). Signed-off-by: Valentin Schneider <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Simplify nohz_balancer_kick()Valentin Schneider1-6/+2
Calling 'nohz_balance_exit_idle(rq)' will always clear 'rq->cpu' from 'nohz.idle_cpus_mask' if it is set. Since it is called at the top of 'nohz_balancer_kick()', 'rq->cpu' will never be set in 'nohz.idle_cpus_mask' if it is accessed in the rest of the function. Combine the 'sched_domain_span()' with 'nohz.idle_cpus_mask' and drop the '(i == cpu)' check since 'rq->cpu' will never be iterated over. While at it, clean up a condition alignment. Signed-off-by: Valentin Schneider <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/topology: Fix percpu data types in struct sd_data & struct s_dataLuc Van Oostenryck1-1/+1
The percpu members of struct sd_data and s_data are declared as: struct ... ** __percpu member; So their type is: __percpu pointer to pointer to struct ... But looking at how they're used, their type should be: pointer to __percpu pointer to struct ... and they should thus be declared as: struct ... * __percpu *member; So fix the placement of '__percpu' in the definition of these structures. This addresses a bunch of Sparse's warnings like: warning: incorrect type in initializer (different address spaces) expected void const [noderef] <asn:3> *__vpp_verify got struct sched_domain ** Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Simplify post_init_entity_util_avg() by calling it with a ↵Dietmar Eggemann3-24/+18
task_struct pointer argument Since commit: d03266910a53 ("sched/fair: Fix task group initialization") the utilization of a sched entity representing a task group is no longer initialized to any other value than 0. So post_init_entity_util_avg() is only used for tasks, not for sched_entities. Make this clear by calling it with a task_struct pointer argument which also eliminates the entity_is_task(se) if condition in the fork path and get rid of the stale comment in remove_entity_load_avg() accordingly. Signed-off-by: Dietmar Eggemann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Morten Rasmussen <[email protected]> Cc: Patrick Bellasi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Fix O(nr_cgroups) in the load balancing pathVincent Guittot1-9/+34
This re-applies the commit reverted here: commit c40f7d74c741 ("sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c") I.e. now that cfs_rq can be safely removed/added in the list, we can re-apply: commit a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path") Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11sched/fair: Optimize update_blocked_averages()Vincent Guittot1-5/+21
Removing a cfs_rq from rq->leaf_cfs_rq_list can break the parent/child ordering of the list when it will be added back. In order to remove an empty and fully decayed cfs_rq, we must remove its children too, so they will be added back in the right order next time. With a normal decay of PELT, a parent will be empty and fully decayed if all children are empty and fully decayed too. In such a case, we just have to ensure that the whole branch will be added when a new task is enqueued. This is default behavior since : commit f6783319737f ("sched/fair: Fix insertion in rq->leaf_cfs_rq_list") In case of throttling, the PELT of throttled cfs_rq will not be updated whereas the parent will. This breaks the assumption made above unless we remove the children of a cfs_rq that is throttled. Then, they will be added back when unthrottled and a sched_entity will be enqueued. As throttled cfs_rq are now removed from the list, we can remove the associated test in update_blocked_averages(). Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-02-11Merge tag 'v5.0-rc6' into sched/core, to pick up fixesIngo Molnar26-114/+282
Signed-off-by: Ingo Molnar <[email protected]>
2019-02-10bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sockMartin KaFai Lau1-2/+29
This patch adds a helper function BPF_FUNC_tcp_sock and it is currently available for cg_skb and sched_(cls|act): struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk); int cg_skb_foo(struct __sk_buff *skb) { struct bpf_tcp_sock *tp; struct bpf_sock *sk; __u32 snd_cwnd; sk = skb->sk; if (!sk) return 1; tp = bpf_tcp_sock(sk); if (!tp) return 1; snd_cwnd = tp->snd_cwnd; /* ... */ return 1; } A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide read-only access. bpf_tcp_sock has all the existing tcp_sock's fields that has already been exposed by the bpf_sock_ops. i.e. no new tcp_sock's fields are exposed in bpf.h. This helper returns a pointer to the tcp_sock. If it is not a tcp_sock or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it returns NULL. Hence, the caller needs to check for NULL before accessing it. The current use case is to expose members from tcp_sock to allow a cg_skb_bpf_prog to provide per cgroup traffic policing/shaping. Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-02-10bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helperMartin KaFai Lau1-40/+92
In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)" before accessing the fields in sock. For example, in __netdev_pick_tx: static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { /* ... */ struct sock *sk = skb->sk; if (queue_index != new_index && sk && sk_fullsock(sk) && rcu_access_pointer(sk->sk_dst_cache)) sk_tx_queue_set(sk, new_index); /* ... */ return queue_index; } This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff" where a few of the convert_ctx_access() in filter.c has already been accessing the skb->sk sock_common's fields, e.g. sock_ops_convert_ctx_access(). "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier. Some of the fileds in "bpf_sock" will not be directly accessible through the "__sk_buff->sk" pointer. It is limited by the new "bpf_sock_common_is_valid_access()". e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock are not allowed. The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)" can be used to get a sk with all accessible fields in "bpf_sock". This helper is added to both cg_skb and sched_(cls|act). int cg_skb_foo(struct __sk_buff *skb) { struct bpf_sock *sk; sk = skb->sk; if (!sk) return 1; sk = bpf_sk_fullsock(sk); if (!sk) return 1; if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP) return 1; /* some_traffic_shaping(); */ return 1; } (1) The sk is read only (2) There is no new "struct bpf_sock_common" introduced. (3) Future kernel sock's members could be added to bpf_sock only instead of repeatedly adding at multiple places like currently in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc. (4) After "sk = skb->sk", the reg holding sk is in type PTR_TO_SOCK_COMMON_OR_NULL. (5) After bpf_sk_fullsock(), the return type will be in type PTR_TO_SOCKET_OR_NULL which is the same as the return type of bpf_sk_lookup_xxx(). However, bpf_sk_fullsock() does not take refcnt. The acquire_reference_state() is only depending on the return type now. To avoid it, a new is_acquire_function() is checked before calling acquire_reference_state(). (6) The WARN_ON in "release_reference_state()" is no longer an internal verifier bug. When reg->id is not found in state->refs[], it means the bpf_prog does something wrong like "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has never been acquired by calling "bpf_sk_fullsock(skb->sk)". A -EINVAL and a verbose are done instead of WARN_ON. A test is added to the test_verifier in a later patch. Since the WARN_ON in "release_reference_state()" is no longer needed, "__release_reference_state()" is folded into "release_reference_state()" also. Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-02-10bpf: Fix narrow load on a bpf_sock returned from sk_lookup()Martin KaFai Lau1-4/+7
By adding this test to test_verifier: { "reference tracking: access sk->src_ip4 (narrow load)", .insns = { BPF_SK_LOOKUP, BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2), BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), BPF_EMIT_CALL(BPF_FUNC_sk_release), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, The above test loads 2 bytes from sk->src_ip4 where sk is obtained by bpf_sk_lookup_tcp(). It hits an internal verifier error from convert_ctx_accesses(): [root@arch-fb-vm1 bpf]# ./test_verifier 665 665 Failed to load prog 'Invalid argument'! 0: (b7) r2 = 0 1: (63) *(u32 *)(r10 -8) = r2 2: (7b) *(u64 *)(r10 -16) = r2 3: (7b) *(u64 *)(r10 -24) = r2 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 7: (bf) r2 = r10 8: (07) r2 += -48 9: (b7) r3 = 36 10: (b7) r4 = 0 11: (b7) r5 = 0 12: (85) call bpf_sk_lookup_tcp#84 13: (bf) r6 = r0 14: (15) if r0 == 0x0 goto pc+3 R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1 15: (69) r2 = *(u16 *)(r0 +26) 16: (bf) r1 = r6 17: (85) call bpf_sk_release#86 18: (95) exit from 14 to 18: safe processed 20 insns (limit 131072), stack depth 48 bpf verifier is misconfigured Summary: 0 PASSED, 0 SKIPPED, 1 FAILED The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly loaded (meaning load any 1 or 2 bytes of the src_ip4) by marking info->ctx_field_size. However, this marked ctx_field_size is not used. This patch fixes it. Due to the recent refactoring in test_verifier, this new test will be added to the bpf-next branch (together with the bpf_tcp_sock patchset) to avoid merge conflict. Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type") Cc: Joe Stringer <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Joe Stringer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-02-10softirq: Don't skip softirq execution when softirq thread is parkingMatthias Kaehlcke1-1/+2
When a CPU is unplugged the kernel threads of this CPU are parked (see smpboot_park_threads()). kthread_park() is used to mark each thread as parked and wake it up, so it can complete the process of parking itselfs (see smpboot_thread_fn()). If local softirqs are pending on interrupt exit invoke_softirq() is called to process the softirqs, however it skips processing when the softirq kernel thread of the local CPU is scheduled to run. The softirq kthread is one of the threads that is parked when a CPU is unplugged. Parking the kthread wakes it up, however only to complete the parking process, not to process the pending softirqs. Hence processing of softirqs at the end of an interrupt is skipped, but not done elsewhere, which can result in warnings about pending softirqs when a CPU is unplugged: /sys/devices/system/cpu # echo 0 > cpu4/online [ ... ] NOHZ: local_softirq_pending 02 [ ... ] NOHZ: local_softirq_pending 202 [ ... ] CPU4: shutdown [ ... ] psci: CPU4 killed. Don't skip processing of softirqs at the end of an interrupt when the softirq thread of the CPU is parking. Signed-off-by: Matthias Kaehlcke <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-10kthread: Add __kthread_should_park()Matthias Kaehlcke1-1/+7
kthread_should_park() is used to check if the calling kthread ('current') should park, but there is no function to check whether an arbitrary kthread should be parked. The latter is required to plug a CPU hotplug race vs. a parking ksoftirqd thread. The new __kthread_should_park() receives a task_struct as parameter to check if the corresponding kernel thread should be parked. Call __kthread_should_park() from kthread_should_park() to avoid code duplication. Signed-off-by: Matthias Kaehlcke <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: "Paul E . McKenney" <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Stephen Boyd <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-10genirq: Avoid summation loops for /proc/statThomas Gleixner3-4/+23
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. This can be largely avoided for interrupts which are not marked as 'PER_CPU' interrupts by simply adding a per interrupt summation counter which is incremented along with the per interrupt per cpu counter. The PER_CPU interrupts need to avoid that and use only per cpu accounting because they share the interrupt number and the interrupt descriptor and concurrent updates would conflict or require unwanted synchronization. Reported-by: Waiman Long <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Waiman Long <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Cc: Davidlohr Bueso <[email protected]> Cc: Miklos Szeredi <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Randy Dunlap <[email protected]> Link: https://lkml.kernel.org/r/[email protected] 8<------------- v2: Undo the unintentional layout change of struct irq_desc. include/linux/irqdesc.h | 1 + kernel/irq/chip.c | 12 ++++++++++-- kernel/irq/internals.h | 8 +++++++- kernel/irq/irqdesc.c | 7 ++++++- 4 files changed, 24 insertions(+), 4 deletions(-)
2019-02-10Merge tag 'y2038-new-syscalls' of ↵Thomas Gleixner14-157/+153
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038 Pull y2038 - time64 system calls from Arnd Bergmann: This series finally gets us to the point of having system calls with 64-bit time_t on all architectures, after a long time of incremental preparation patches. There was actually one conversion that I missed during the summer, i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes and review comments. The following system calls are now added on all 32-bit architectures using the same system call numbers: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 Each one of these corresponds directly to an existing system call that includes a 'struct timespec' argument, or a structure containing a timespec or (in case of clock_adjtime) timeval. Not included here are new versions of getitimer/setitimer and getrusage/waitid, which are planned for the future but only needed to make a consistent API rather than for correct operation beyond y2038. These four system calls are based on 'timeval', and it has not been finally decided what the replacement kernel interface will use instead. So far, I have done a lot of build testing across most architectures, which has found a number of bugs. Runtime testing so far included testing LTP on 32-bit ARM with the existing system calls, to ensure we do not regress for existing binaries, and a test with a 32-bit x86 build of LTP against a modified version of the musl C library that has been adapted to the new system call interface [3]. This library can be used for testing on all architectures supported by musl-1.1.21, but it is not how the support is getting integrated into the official musl release. Official musl support is planned but will require more invasive changes to the library. Link: https://lore.kernel.org/lkml/[email protected]/T/ Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
2019-02-10Merge tag 'y2038-syscall-cleanup' of ↵Thomas Gleixner1-0/+4
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038 Pull preparatory work for y2038 changes from Arnd Bergmann: System call unification and cleanup The system call tables have diverged a bit over the years, and a number of the recent additions never made it into all architectures, for one reason or another. This is an attempt to clean it up as far as we can without breaking compatibility, doing a number of steps: - Add system calls that have not yet been integrated into all architectures but that we definitely want there. This includes {,f}statfs64() and get{eg,eu,g,p,u,pp}id() on alpha, which have been missing traditionally. - The s390 compat syscall handling is cleaned up to be more like what we do on other architectures, while keeping the 31-bit pointer extension. This was merged as a shared branch by the s390 maintainers and is included here in order to base the other patches on top. - Add the separate ipc syscalls on all architectures that traditionally only had sys_ipc(). This version is done without support for IPC_OLD that is we have in sys_ipc. The new semtimedop_time64 syscall will only be added here, not in sys_ipc - Add syscall numbers for a couple of syscalls that we probably don't need everywhere, in particular pkey_* and rseq, for the purpose of symmetry: if it's in asm-generic/unistd.h, it makes sense to have it everywhere. I expect that any future system calls will get assigned on all platforms together, even when they appear to be specific to a single architecture. - Prepare for having the same system call numbers for any future calls. In combination with the generated tables, this hopefully makes it easier to add new calls across all architectures together. All of the above are technically separate from the y2038 work, but are done as preparation before we add the new 64-bit time_t system calls everywhere, providing a common baseline set of system calls. I expect that glibc and other libraries that want to use 64-bit time_t will require linux-5.1 kernel headers for building in the future, and at a much later point may also require linux-5.1 or a later version as the minimum kernel at runtime. Having a common baseline then allows the removal of many architecture or kernel version specific workarounds.
2019-02-10genirq/affinity: Move allocation of 'node_to_cpumask' to ↵Ming Lei1-14/+13
irq_build_affinity_masks() 'node_to_cpumask' is just one temparay variable for irq_build_affinity_masks(), so move it into irq_build_affinity_masks(). No functioanl change. Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Bjorn Helgaas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jens Axboe <[email protected]> Cc: [email protected] Cc: Sagi Grimberg <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-02-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A couple of kernel side fixes: - Fix the Intel uncore driver on certain hardware configurations - Fix a CPU hotplug related memory allocation bug - Remove a spurious WARN() ... plus also a handful of perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script python: Add Python3 support to tests/attr.py perf trace: Support multiple "vfs_getname" probes perf symbols: Filter out hidden symbols from labels perf symbols: Add fallback definitions for GELF_ST_VISIBILITY() tools headers uapi: Sync linux/in.h copy from the kernel sources perf clang: Do not use 'return std::move(something)' perf mem/c2c: Fix perf_mem_events to support powerpc perf tests evsel-tp-sched: Fix bitwise operator perf/core: Don't WARN() for impossible ring-buffer sizes perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() perf/x86/intel/uncore: Add Node ID mask
2019-02-10Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2-17/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "An rtmutex (PI-futex) deadlock scenario fix, plus a locking documentation fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Handle early deadlock return correctly futex: Fix barrier comment
2019-02-09bpf: Fix narrow load on a bpf_sock returned from sk_lookup()Martin KaFai Lau1-4/+7
By adding this test to test_verifier: { "reference tracking: access sk->src_ip4 (narrow load)", .insns = { BPF_SK_LOOKUP, BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2), BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), BPF_EMIT_CALL(BPF_FUNC_sk_release), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, The above test loads 2 bytes from sk->src_ip4 where sk is obtained by bpf_sk_lookup_tcp(). It hits an internal verifier error from convert_ctx_accesses(): [root@arch-fb-vm1 bpf]# ./test_verifier 665 665 Failed to load prog 'Invalid argument'! 0: (b7) r2 = 0 1: (63) *(u32 *)(r10 -8) = r2 2: (7b) *(u64 *)(r10 -16) = r2 3: (7b) *(u64 *)(r10 -24) = r2 4: (7b) *(u64 *)(r10 -32) = r2 5: (7b) *(u64 *)(r10 -40) = r2 6: (7b) *(u64 *)(r10 -48) = r2 7: (bf) r2 = r10 8: (07) r2 += -48 9: (b7) r3 = 36 10: (b7) r4 = 0 11: (b7) r5 = 0 12: (85) call bpf_sk_lookup_tcp#84 13: (bf) r6 = r0 14: (15) if r0 == 0x0 goto pc+3 R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1 15: (69) r2 = *(u16 *)(r0 +26) 16: (bf) r1 = r6 17: (85) call bpf_sk_release#86 18: (95) exit from 14 to 18: safe processed 20 insns (limit 131072), stack depth 48 bpf verifier is misconfigured Summary: 0 PASSED, 0 SKIPPED, 1 FAILED The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly loaded (meaning load any 1 or 2 bytes of the src_ip4) by marking info->ctx_field_size. However, this marked ctx_field_size is not used. This patch fixes it. Due to the recent refactoring in test_verifier, this new test will be added to the bpf-next branch (together with the bpf_tcp_sock patchset) to avoid merge conflict. Fixes: c64b7983288e ("bpf: Add PTR_TO_SOCKET verifier type") Cc: Joe Stringer <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Joe Stringer <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-02-09Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', ↵Paul E. McKenney17-335/+131
'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD doc.2019.01.26a: Documentation updates. fixes.2019.01.26a: Miscellaneous fixes. sil.2019.01.26a: Removal of a few more spin_is_locked() instances. spdx.2019.02.09a: Add SPDX identifiers to RCU files srcu.2019.01.26a: SRCU updates. torture.2019.01.26a: Torture-test updates.
2019-02-09locking/locktorture: Convert to SPDX license identifierPaul E. McKenney1-16/+3
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09torture: Convert to SPDX license identifierPaul E. McKenney1-16/+3
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/update: Convert to SPDX license identifierPaul E. McKenney1-15/+2
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/tree: Convert to SPDX license identifierPaul E. McKenney4-61/+9
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> [ paulmck: Update .h file SPDX comment format per Joe Perches. ] Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/tiny: Convert to SPDX license identifierPaul E. McKenney1-15/+2
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/sync: Convert to SPDX license identifierPaul E. McKenney1-14/+1
Replace the license boiler plate with a SPDX license identifier. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/srcu: Convert to SPDX license identifierPaul E. McKenney2-30/+4
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/rcutorture: Convert to SPDX license identifierPaul E. McKenney1-16/+3
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/rcu_segcblist: Convert to SPDX license identifierPaul E. McKenney2-30/+4
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/rcuperf: Convert to SPDX license identifierPaul E. McKenney1-16/+3
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-09rcu/rcu.h: Convert to SPDX license identifierPaul E. McKenney1-15/+2
Replace the license boiler plate with a SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2019-02-08Merge branch 'for-linus' of ↵Linus Torvalds1-4/+59
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal fixes from Eric Biederman: "This contains four small fixes for signal handling. A missing range check, a regression fix, prioritizing signals we have already started a signal group exit for, and better detection of synchronous signals. The confused decision of which signals to handle failed spectacularly when a timer was pointed at SIGBUS and the stack overflowed. Resulting in an unkillable process in an infinite loop instead of a SIGSEGV and core dump" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Better detection of synchronous signals signal: Always notice exiting tasks signal: Always attempt to allocate siginfo for SIGSTOP signal: Make siginmask safe when passed a signal of 0
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller17-81/+129
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <[email protected]>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds7-30/+50
Pull networking fixes from David Miller: "This pull request is dedicated to the upcoming snowpocalypse parts 2 and 3 in the Pacific Northwest: 1) Drop profiles are broken because some drivers use dev_kfree_skb* instead of dev_consume_skb*, from Yang Wei. 2) Fix IWLWIFI kconfig deps, from Luca Coelho. 3) Fix percpu maps updating in bpftool, from Paolo Abeni. 4) Missing station release in batman-adv, from Felix Fietkau. 5) Fix some networking compat ioctl bugs, from Johannes Berg. 6) ucc_geth must reset the BQL queue state when stopping the device, from Mathias Thore. 7) Several XDP bug fixes in virtio_net from Toshiaki Makita. 8) TSO packets must be sent always on queue 0 in stmmac, from Jose Abreu. 9) Fix socket refcounting bug in RDS, from Eric Dumazet. 10) Handle sparse cpu allocations in bpf selftests, from Martynas Pumputis. 11) Make sure mgmt frames have enough tailroom in mac80211, from Felix Feitkau. 12) Use safe list walking in sctp_sendmsg() asoc list traversal, from Greg Kroah-Hartman. 13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL ccid, from Eric Dumazet. 14) Need to reload WoL password into bcmsysport device after deep sleeps, from Florian Fainelli. 15) Remove filter from mask before freeing in cls_flower, from Petr Machata. 16) Missing release and use after free in error paths of s390 qeth code, from Julian Wiedmann. 17) Fix lockdep false positive in dsa code, from Marc Zyngier. 18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn. 19) Fix EQ firmware assert in qed driver, from Manish Chopra. 20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: dsa: b53: Fix for failure when irq is not defined in dt sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach() geneve: should not call rt6_lookup() when ipv6 was disabled net: Don't default Cavium PTP driver to 'y' net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles net/mlx5e: Don't overwrite pedit action when multiple pedit used net/mlx5e: Update hw flows when encap source mac changed qed*: Advance drivers version to 8.37.0.20 qed: Change verbosity for coalescing message. qede: Fix system crash on configuring channels. qed: Consider TX tcs while deriving the max num_queues for PF. ...
2019-02-08Merge tag 'driver-core-5.0-rc6' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes for 5.0-rc6. Well, not so much "driver core" as "debugfs". There's a lot of outstanding debugfs cleanup patches coming in through different subsystem trees, and in that process the debugfs core was found that it really should return errors when something bad happens, to prevent random files from showing up in the root of debugfs afterward. So debugfs was fixed up to handle this properly, and then two fixes for the relay and blk-mq code was needed as it was making invalid assumptions about debugfs return values. There's also a cacheinfo fix in here that resolves a tiny issue. All of these have been in linux-next for over a week with no reported problems" * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: blk-mq: protect debugfs_create_files() from failures relay: check return of create_buf_file() properly debugfs: debugfs_lookup() should return NULL if not found debugfs: return error values, not NULL debugfs: fix debugfs_rename parameter checking cacheinfo: Keep the old value if of_property_read_u32 fails
2019-02-08printk: Export console_printkPrarit Bhargava1-0/+1
The fbcon can be built as a module and requires console_printk. Export console_printk. Signed-off-by: Prarit Bhargava <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Acked-by: Petr Mladek <[email protected]> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
2019-02-08futex: Handle early deadlock return correctlyThomas Gleixner2-15/+50
commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") changed the locking rules in the futex code so that the hash bucket lock is not longer held while the waiter is enqueued into the rtmutex wait list. This made the lock and the unlock path symmetric, but unfortunately the possible early exit from __rt_mutex_proxy_start() due to a detected deadlock was not updated accordingly. That allows a concurrent unlocker to observe inconsitent state which triggers the warning in the unlock path. futex_lock_pi() futex_unlock_pi() lock(hb->lock) queue(hb_waiter) lock(hb->lock) lock(rtmutex->wait_lock) unlock(hb->lock) // acquired hb->lock hb_waiter = futex_top_waiter() lock(rtmutex->wait_lock) __rt_mutex_proxy_start() ---> fail remove(rtmutex_waiter); ---> returns -EDEADLOCK unlock(rtmutex->wait_lock) // acquired wait_lock wake_futex_pi() rt_mutex_next_owner() --> returns NULL --> WARN lock(hb->lock) unqueue(hb_waiter) The problem is caused by the remove(rtmutex_waiter) in the failure case of __rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the hash bucket but no waiter on the rtmutex, i.e. inconsistent state. The original commit handles this correctly for the other early return cases (timeout, signal) by delaying the removal of the rtmutex waiter until the returning task reacquired the hash bucket lock. Treat the failure case of __rt_mutex_proxy_start() in the same way and let the existing cleanup code handle the eventual handover of the rtmutex gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter removal for the failure case, so that the other callsites are still operating correctly. Add proper comments to the code so all these details are fully documented. Thanks to Peter for helping with the analysis and writing the really valuable code comments. Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex") Reported-by: Heiko Carstens <[email protected]> Co-developed-by: Peter Zijlstra <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Heiko Carstens <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: [email protected] Cc: Stefan Liebler <[email protected]> Cc: Sebastian Sewior <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-02-08futex: Fix barrier commentDavidlohr Bueso1-2/+2
The current comment for the barrier that guarantees that waiter increment is always before taking the hb spinlock (barrier (A)) needs to be fixed as it is misplaced. This is obviously referring to hb_waiters_inc, which is a full barrier. Reported-by: Peter Zijlstra <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-02-07audit: hide auditsc_get_stamp and audit_serial prototypesRichard Guy Briggs1-0/+5
auditsc_get_stamp() and audit_serial() are internal audit functions so move their prototypes from include/linux/audit.h to kernel/audit.h so they are not visible to the rest of the kernel. Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Paul Moore <[email protected]>