|
This reverts commit 2e9906f84fc7c99388bb7123ade167250d50f1c0.
It turned out that commit 2e9906f84fc7 ("tracing: Add "(fault)"
name injection to kernel probes") did not work correctly and probe
events still show just '(fault)' (instead of '"(fault)"'). Also, the
current '(fault)' output states more explicitly that the access faulted.
This also moves the FAULT_STRING macro to trace.h so that synthetic
events can keep using it, and uses it in trace_probe.c too.
Link: https://lore.kernel.org/all/168908495772.123124.1250788051922100079.stgit@devnote2/
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Cc: Tom Zanussi <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix to update the dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses the dynamic data. Also move arg->dynamic out of
unlikely(). Without this fix, the dynamic data address becomes wrong if
process_fetch_insn() returns an error in the !arg->dynamic case.
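A minimal sketch of the idea, assuming the names used in kernel/trace/trace_probe_tmpl.h
(the exact patch may differ):

	ret = process_fetch_insn(arg->code, rec, dl, base);
	/* only a fetcharg that stores dynamic data consumes the dynamic area */
	if (arg->dynamic && likely(ret > 0)) {
		dyndata += ret;
		maxlen -= ret;
	}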
Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@devnote2/
Suggested-by: Steven Rostedt <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 9178412ddf5a ("tracing: probeevent: Return consumed bytes of dynamic area")
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
Fix not to count the error code (which is a negative value) toward the total
used length of the array, because it can mess up the return code of
process_fetch_insn_bottom(). Also clear the 'ret' value because it will
be used for calculating the next data_loc entry.
Link: https://lore.kernel.org/all/168908493827.123124.2175257289106364229.stgit@devnote2/
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 9b960a38835f ("tracing: probeevent: Unify fetch_insn processing common part")
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
If an array is specified with the ustring or symstr type, the length of the
strings is accumulated in both 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value to avoid double counting.
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@devnote2/
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string")
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
Add a comment explaining why fprobe_kprobe_handler() exits if any other
kprobe is running.
Link: https://lore.kernel.org/all/168874788299.159442.2485957441413653858.stgit@devnote2/
Suggested-by: Steven Rostedt <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
|
|
The DMA direction should be taken into account in dma_unmap_page() when
unmapping integrity data.
Fix this DMA direction; the issue was reported in Guangwu's test.
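A hedged sketch of the call this implies in the nvme-pci unmap path; the direction
passed to dma_unmap_page() must match the one used when mapping the integrity
metadata (field and helper names are assumptions, not necessarily the exact patch):

	/* unmap the integrity metadata with the request's actual DMA direction */
	dma_unmap_page(dev->dev, iod->meta_dma,
		       rq_integrity_vec(req)->bv_len, rq_dma_dir(req));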
Reported-by: Guangwu Zhang <[email protected]>
Fixes: 4aedb705437f ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data")
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turns out they are extremely
common for cheap end user NVMe devices.
Relax our check for them so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead. When in doubt
we'd still like to see quirk entries to disable the potential for
changing supposedly stable device identifier links, but this will at least
allow users who have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.
Fixes: 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: [email protected] # 6.0+
Co-developed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
|
|
kmemleak reports:
unreferenced object 0xffff88814d14e200 (size 256):
comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
hex dump (first 32 bytes):
04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00 ................
0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff .........Z......
backtrace:
[<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
[<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
[<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
[<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
[<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
[<ffffffff9bf03a43>] vfs_read+0x143/0x520
[<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
[<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
[<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
When reading the file 'trace_pipe', 'iter->temp' is allocated or reallocated
in trace_find_next_entry() but is not freed before 'trace_pipe' is closed.
To fix it, free 'iter->temp' in tracing_release_pipe().
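A minimal sketch of the fix described, placed in tracing_release_pipe()
(exact placement in the function may differ):

	/* iter->temp is (re)allocated by trace_find_next_entry(); drop it on close */
	kfree(iter->temp);
	iter->temp = NULL;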
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Fixes: ff895103a84ab ("tracing: Save off entry when peeking at next entry")
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
select_idle_capacity() not only looks for an idle CPU that fits the
waking task, but also for the CPU with the highest bandwidth when no CPU fits.
Start the loop with the target CPU so it will be selected first when no CPU fits
but several CPUs share the same bandwidth. Starting with the target CPU
prevents the task from migrating between CPUs with the same bandwidth at every
wakeup when no CPU fits.
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There has been a case where the SD_SHARE_CPUCAPACITY sched group flag
in a parent domain was not set and propagated properly when a degenerate
domain is removed.
Add a dump of a CPU's domain sched group flags to make debugging easier
in the future.
Usage:
cat /debug/sched/domains/cpu0/domain1/groups_flags
to dump cpu0 domain1's sched group flags.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lore.kernel.org/r/ed1749262d94d95a8296c86a415999eda90bcfe3.1688770494.git.tim.c.chen@linux.intel.com
|
|
With the SMT vs non-SMT balancing issues sorted, also enable the
cluster domain for Hybrid machines.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
|
|
should_we_balance() traverses the group_balance_mask (AND'ed with
lb_env::cpus) starting from lower-numbered CPUs, looking for the first idle CPU.
In hybrid x86 systems, the siblings of SMT cores get CPU numbers before
non-SMT cores:
[0, 1] [2, 3] [4, 5] 6 7 8 9
b i b i b i b i i i
In the figure above, CPUs in brackets are siblings of an SMT core. The
rest are non-SMT cores. 'b' indicates a busy CPU, 'i' indicates an
idle CPU.
We should let a CPU on a fully idle core get the first chance to do idle
load balancing, as it has more CPU capacity than an idle SMT CPU whose
sibling is busy. So for the figure above, if we are running
should_we_balance() for CPU 1, we should return false to let CPU 7, which is
on an idle core, have the first chance to do idle load balancing.
A partially busy (i.e., of type group_has_spare) local group with SMT
cores will often have only one SMT sibling busy. If the destination CPU
is a non-SMT core, partially busy, lower-numbered, SMT cores should not
be considered when finding the first idle CPU.
However, in should_we_balance(), when we first encounter an idle SMT CPU in a
partially busy core, we prematurely break the search for the first idle CPU.
Higher-numbered, non-SMT cores are not given the chance to have
idle balance done on their behalf. Those CPUs will only be considered
for idle balancing by chance via CPU_NEWLY_IDLE.
Instead, consider the idle state of the whole SMT core.
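A rough sketch of the search described above (simplified, with assumed helpers
such as is_core_idle(); not the exact patch): prefer the first CPU on a fully
idle core, and fall back to an idle CPU with a busy SMT sibling only if no such
core exists.

	int idle_smt = -1, cpu;

	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
		if (!idle_cpu(cpu))
			continue;

		if (!is_core_idle(cpu)) {
			if (idle_smt == -1)
				idle_smt = cpu;	/* idle CPU, but its sibling is busy */
			continue;
		}

		/* the first CPU on a fully idle core should do the balancing */
		return cpu == env->dst_cpu;
	}

	/* no fully idle core found; fall back to the idle SMT CPU, if any */
	return idle_smt == env->dst_cpu;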
Signed-off-by: Ricardo Neri <[email protected]>
Co-developed-by: Tim Chen <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/807bdd05331378ea3bf5956bda87ded1036ba769.1688770494.git.tim.c.chen@linux.intel.com
|
|
In the current prefer sibling load balancing code, there is an implicit
assumption that the busiest sched group and local sched group are
equivalent, hence the number of tasks to be moved is simply the difference in
the number of tasks between the two groups (i.e. the imbalance) divided by two.
However, we may have a different number of cores between the cluster groups,
say when we take a CPU offline or we have hybrid groups. In that case,
we should balance between the two groups such that the #tasks/#cores ratio
is the same between both groups. Hence the imbalance
computed will need to reflect this.
Adjust the sibling imbalance computation to take the above considerations
into account.
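Illustratively, choosing the number of tasks to move so that tasks per core end
up roughly equal on both sides gives the following sketch (not the exact kernel
computation):

	/*
	 * (busiest_tasks - imbalance) / busiest_cores ==
	 * (local_tasks   + imbalance) / local_cores
	 *
	 * solving for imbalance:
	 */
	static long sibling_imbalance_sketch(long busiest_tasks, long busiest_cores,
					     long local_tasks, long local_cores)
	{
		long imbalance = local_cores * busiest_tasks -
				 busiest_cores * local_tasks;

		return imbalance / (local_cores + busiest_cores);
	}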
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/4eacbaa236e680687dae2958378a6173654113df.1688770494.git.tim.c.chen@linux.intel.com
|
|
When balancing sibling domains that have different numbers of cores,
the tasks in the respective sibling domains should be proportional to the
number of cores in each domain. In preparation for implementing such a
policy, record the number of cores in a scheduling group.
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/04641eeb0e95c21224352f5743ecb93dfac44654.1688770494.git.tim.c.chen@linux.intel.com
|
|
On hybrid CPUs with scheduling clusters enabled, we will need to
consider balancing between SMT CPU clusters and Atom core clusters.
Below shows such a hybrid x86 CPU with 4 big cores and 8 Atom cores.
Each scheduling cluster spans an L2 cache.
--L2-- --L2-- --L2-- --L2-- ----L2---- -----L2------
[0, 1] [2, 3] [4, 5] [6, 7] [8 9 10 11] [12 13 14 15]
Big    Big    Big    Big    Atom        Atom
core   core   core   core   Module      Module
If the busiest group is a big core with both SMT CPUs busy, we should do
active load balancing if the destination group has idle CPU cores. Such a
condition is considered by asym_active_balance() in load balancing, but not
when looking for the busiest group and computing the load imbalance.
Add this consideration in find_busiest_group() and calculate_imbalance().
In addition, update the logic determining the busier group when one group
is SMT and the other group is non-SMT but both groups are partially busy
with idle CPUs. The busier group should be the group with idle cores rather
than the group with one busy SMT CPU. We do not want to make the SMT group
the busiest one, pull the only task off the SMT CPU, and cause the whole core
to go empty.
Otherwise, suppose that in the search for the busiest group we first encounter
an SMT group with 1 task and set it as the busiest. The destination
group is an Atom cluster with 1 task, and when we next encounter an Atom
cluster group with 3 tasks, we will not pick this Atom cluster over the
SMT group, even though we should. As a result, we do not load balance
the busier Atom cluster (with 3 tasks) towards the local Atom cluster
(with 1 task). And it doesn't make sense to pick the 1-task SMT group
as the busier group, as we also should not pull the task off the SMT core
towards the 1-task Atom cluster and make the SMT core completely empty.
Signed-off-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/e24f35d142308790f69be65930b82794ef6658a2.1688770494.git.tim.c.chen@linux.intel.com
|
|
The static key psi_cgroups_enabled is only used inside file psi.c.
Make it static.
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
As core scheduling was introduced, a new idle state was defined:
force idle, i.e. running the idle task while nr_running is greater than zero.
If a CPU is in the force idle state, idle_cpu() will return zero. This
result makes sense in some scenarios, e.g., load balancing,
showacpu when dumping, and judging whether the RCU boost kthread is starving.
But this will cause errors in other scenarios, e.g., tick_irq_exit():
When force idle, rq->curr == rq->idle but rq->nr_running > 0, so
idle_cpu() returns 0. In tick_irq_exit(), if idle_cpu()
is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
not become 1 again after being set to 0 in tick_nohz_irq_enter().
ts->idle_sleeptime won't be updated in update_ts_time_stats() if
ts->idle_active is 0 when it should be 1. As a result,
ts->idle_sleeptime ends up less than the actual value, and finally
the idle time in /proc/stat is less than the actual value.
To solve this problem, introduce sched_core_idle_cpu(), which
returns 1 when force idle. Audit all users of idle_cpu(), and
change idle_cpu() into sched_core_idle_cpu() in
tick_irq_exit().
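A minimal sketch of what such a helper could look like, assuming the
core-scheduling helpers in kernel/sched/core.c (the exact patch may differ):

	int sched_core_idle_cpu(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		/* force idle: the idle task runs while nr_running > 0 */
		if (sched_core_enabled(rq) && rq->curr == rq->idle)
			return 1;

		return idle_cpu(cpu);
	}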
v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
function tick_irq_exit(). And modify the corresponding commit log.
Signed-off-by: Cruz Zhao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Joel Fernandes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We currently export the total throttled time for cgroups that are given
a bandwidth limit. This patch extends this accounting to also account
the total time that each child cgroup has been throttled.
This is useful to understand the degree to which children have been
affected by the throttling control. Children which are not runnable
during the entire throttled period, for example, will not show any
self-throttling time during this period.
Expose this in a new interface, 'cpu.stat.local', which is similar to
how non-hierarchical events are accounted in 'memory.events.local'.
Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It is easy for a cfs_rq to become throttled even when it has no enqueued
entities (for example, if we have just put_prev()'d the last runnable
task of the cfs_rq, and the cfs_rq is out of quota).
Avoid accounting this time towards total throttle time, since it
otherwise falsely inflates the stats.
Note that the dequeue path is special, since we normally disallow
migrations when a task is in a throttled hierarchy (see
throttled_lb_pair()).
Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In put_task_struct(), a spin_lock is indirectly acquired under the kernel
stock. When running the kernel in real-time (RT) configuration, the
operation is dispatched to a preemptible context call to ensure
guaranteed preemption. However, if PROVE_RAW_LOCK_NESTING is enabled
and __put_task_struct() is called while holding a raw_spinlock, lockdep
incorrectly reports an "Invalid lock context" in the stock kernel.
This false splat occurs because lockdep is unaware of the different
route taken under RT. To address this issue, override the inner wait
type to prevent the false lockdep splat.
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Sebastian Andrzej Siewior <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
locks. Therefore, it can't be called from a non-preemptible context.
One practical example is a splat inside inactive_task_timer(), which is
called in an interrupt context:
CPU: 1 PID: 2848 Comm: life Kdump: loaded Tainted: G W ---------
Hardware name: HP ProLiant DL388p Gen8, BIOS P70 07/15/2012
Call Trace:
dump_stack_lvl+0x57/0x7d
mark_lock_irq.cold+0x33/0xba
mark_lock+0x1e7/0x400
mark_usage+0x11d/0x140
__lock_acquire+0x30d/0x930
lock_acquire.part.0+0x9c/0x210
rt_spin_lock+0x27/0xe0
refill_obj_stock+0x3d/0x3a0
kmem_cache_free+0x357/0x560
inactive_task_timer+0x1ad/0x340
__run_hrtimer+0x8a/0x1a0
__hrtimer_run_queues+0x91/0x130
hrtimer_interrupt+0x10f/0x220
__sysvec_apic_timer_interrupt+0x7b/0xd0
sysvec_apic_timer_interrupt+0x4f/0xd0
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0033:0x7fff196bf6f5
Instead of calling __put_task_struct() directly, we defer it using
call_rcu(). A more natural approach would use a workqueue, but since
in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
the code would become more complex because we would need to put the
work_struct instance in the task_struct and initialize it when we
allocate a new task_struct.
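A hedged sketch of the deferral described above (field and helper names are
assumptions, not necessarily the exact patch):

	static void __put_task_struct_rcu_cb(struct rcu_head *rhp)
	{
		struct task_struct *task = container_of(rhp, struct task_struct, rcu);

		__put_task_struct(task);
	}

	static inline void put_task_struct(struct task_struct *t)
	{
		if (!refcount_dec_and_test(&t->usage))
			return;

		if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
			__put_task_struct(t);	/* may take sleeping locks */
		else
			call_rcu(&t->rcu, __put_task_struct_rcu_cb);
	}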
The issue is reproducible with stress-ng:
while true; do
stress-ng --sched deadline --sched-period 1000000000 \
--sched-runtime 800000000 --sched-deadline \
1000000000 --mmapfork 23 -t 20
done
Reported-by: Hu Chunyu <[email protected]>
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Valentin Schneider <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
ceph_frame_desc::fd_lens is an int array. decode_preamble() thus
effectively casts u32 -> int but the checks for segment lengths are
written as if on unsigned values. While reading in HELLO or one of the
AUTH frames (before authentication is completed), arithmetic in
head_onwire_len() can get duped by negative ctrl_len and produce
head_len which is less than CEPH_PREAMBLE_LEN but still positive.
This would lead to a buffer overrun in prepare_read_control() as the
preamble gets copied to the newly allocated buffer of size head_len.
Cc: [email protected]
Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Reported-by: Thelford Williams <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
|
|
Pedro Tammela says:
====================
net/sched: fixes for sch_qfq
Patch 1 fixes a regression introduced in 6.4 where the MTU size could be
bigger than 'lmax'.
Patch 3 fixes an issue where the code doesn't account for qdisc_pkt_len()
returning a size bigger than 'lmax'.
Patches 2 and 4 are selftests for the issues above.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
A packet with stab overhead greater than QFQ_MAX_LMAX should be dropped
by the QFQ qdisc as it can't handle such lengths.
Signed-off-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Zhengchao Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
		       struct sk_buff **to_free)
{
	unsigned int len = qdisc_pkt_len(skb), gso_segs;
	// ...
	if (unlikely(cl->agg->lmax < len)) {
		pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
			 cl->agg->lmax, len, cl->common.classid);
		err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
		if (err) {
			cl->qstats.drops++;
			return qdisc_drop(skb, sch, to_free);
		}
		// ...
	}
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the range QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
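A hedged sketch of the kind of guard this implies in qfq_enqueue() (not
necessarily the exact patch):

	unsigned int len = qdisc_pkt_len(skb);

	/* the stab-adjusted length must stay within what QFQ can represent */
	if (unlikely(len > QFQ_MAX_LMAX)) {
		cl->qstats.drops++;
		return qdisc_drop(skb, sch, to_free);
	}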
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
QFQ only supports a certain bound on the MTU size, so make sure
we check for this requirement in the tests.
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Zhengchao Shao <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
25369891fcef deletes a check for the case where no 'lmax' is
specified, which 3037933448f6 previously fixed, as 'lmax'
could be set to the device's MTU without any bounds checking
against QFQ_MIN_LMAX and QFQ_MAX_LMAX. Therefore, reintroduce the check.
Fixes: 25369891fcef ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
A recent change to start counting SuperH IRQ #s from 16 breaks support
for the Hitachi HD64461 companion chip.
Move the offchip IRQ base and HD64461 IRQ # by 16 in order to
accommodate the new virq numbering rules.
Fixes: a8ac2961148e ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Artur Rojek <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
|
|
Take into account the virq offset when translating cascaded interrupts.
Fixes: a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Link: https://lore.kernel.org/r/7d0cb246c9f1cd24bb1f637ec5cb67e799a4c3b8.1688908227.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
|
|
Take into account the virq offset when translating cascaded IRL
interrupts.
Fixes: a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Link: https://lore.kernel.org/r/4fcb0d08a2b372431c41e04312742dc9e41e1be4.1688908186.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
|
|
When booting rts7751r2dplus_defconfig on QEMU, the system hangs due to
an interrupt storm on IRQ 20. IRQ 20 aka event 0x280 is a cascaded IRL
interrupt, which maps to IRQ_VOYAGER, the interrupt used by the Silicon
Motion SM501 multimedia companion chip. As rts7751r2d_irq_demux() does
not take into account the new virq offset, the interrupt is no longer
translated, leading to an unhandled interrupt.
Fix this by taking into account the virq offset when translating
cascaded IRL interrupts.
Fixes: a8ac2961148e8c72 ("sh: Avoid using IRQ0 on SH3 and SH4")
Reported-by: Guenter Roeck <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: John Paul Adrian Glaubitz <[email protected]>
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/2c99d5df41c40691f6c407b7b6a040d406bc81ac.1688901306.git.geert+renesas@glider.be
Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
|
|
Use the new cifs_smb_ses_inc_refcount() helper to get an active reference
on @ses and @ses->dfs_root_ses (if set). This will prevent
@ses->dfs_root_ses from being put in the next call to cifs_put_smb_ses()
and thus potentially causing a use-after-free bug.
Fixes: 8e3554150d6c ("cifs: fix sharing of DFS connections")
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
pSMB->hdr.Protocol is an array of size 4 bytes, hence when the compiler
analyzes this line of code
parm_data = ((char *) &pSMB->hdr.Protocol) + offset;
it legitimately complains about the fact that offset points outside the
bounds of the array. Notice that the compiler gives priority to the object
as an array, rather than merely the address of one more byte in a structure
to which offset should be added (which seems to be the actual intention
the original implementation).
Fix this by explicitly instructing the compiler to treat the code as a
sequence of bytes in struct smb_com_transaction2_spi_req, and not as an
array accessed through pointer notation.
Notice that ((char *)pSMB) + sizeof(pSMB->hdr.smb_buf_length) points to
the same address as ((char *) &pSMB->hdr.Protocol), therefore this results
in no differences in binary output.
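Putting the two forms side by side (both taken from the explanation above; the
computed addresses are identical):

	/* before: indexes past the 4-byte Protocol[] array, tripping -Wstringop-overflow */
	parm_data = ((char *) &pSMB->hdr.Protocol) + offset;

	/* after: same address, expressed as bytes into the containing struct */
	parm_data = ((char *)pSMB) + sizeof(pSMB->hdr.smb_buf_length) + offset;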
Fixes the following -Wstringop-overflow warnings when built s390
architecture with defconfig (GCC 13):
CC [M] fs/smb/client/cifssmb.o
In function 'cifs_init_ace',
inlined from 'posix_acl_to_cifs' at fs/smb/client/cifssmb.c:3046:3,
inlined from 'cifs_do_set_acl' at fs/smb/client/cifssmb.c:3191:15:
fs/smb/client/cifssmb.c:2987:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
2987 | cifs_ace->cifs_e_perm = local_ace->e_perm;
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
In file included from fs/smb/client/cifssmb.c:27:
fs/smb/client/cifspdu.h: In function 'cifs_do_set_acl':
fs/smb/client/cifspdu.h:384:14: note: at offset [7, 11] into destination object 'Protocol' of size 4
384 | __u8 Protocol[4];
| ^~~~~~~~
In function 'cifs_init_ace',
inlined from 'posix_acl_to_cifs' at fs/smb/client/cifssmb.c:3046:3,
inlined from 'cifs_do_set_acl' at fs/smb/client/cifssmb.c:3191:15:
fs/smb/client/cifssmb.c:2988:30: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
2988 | cifs_ace->cifs_e_tag = local_ace->e_tag;
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
fs/smb/client/cifspdu.h: In function 'cifs_do_set_acl':
fs/smb/client/cifspdu.h:384:14: note: at offset [6, 10] into destination object 'Protocol' of size 4
384 | __u8 Protocol[4];
| ^~~~~~~~
This helps with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/310
Fixes: dc1af4c4b472 ("cifs: implement set acl method")
Cc: [email protected]
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-12
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 7 files changed, 93 insertions(+), 28 deletions(-).
The main changes are:
1) Fix max stack depth check for async callbacks, from Kumar.
2) Fix inconsistent JIT image generation, from Björn.
3) Use trusted arguments in XDP hints kfuncs, from Larysa.
4) Fix memory leak in cpu_map_update_elem, from Pu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
riscv, bpf: Fix inconsistent JIT image generation
selftests/bpf: Add selftest for check_stack_max_depth bug
bpf: Fix max stack depth check for async callbacks
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix ethernet header length field after stripping the mesh header
Cc: [email protected]
Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/
Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Reported-and-tested-by: Nicolas Escande <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If there is a failure during rtw89_fw_h2c_raw(), rtw89_debug_priv_send_h2c
should return a negative error code instead of a positive count.
Fix this bug by returning the correct error code.
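A hedged sketch of the control flow the fix implies (simplified; everything
except rtw89_fw_h2c_raw() is an assumption):

	ret = rtw89_fw_h2c_raw(rtwdev, h2c, count);
	if (ret)
		return ret;	/* propagate the error instead of the byte count */

	return count;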
Fixes: e3ec7017f6a2 ("rtw89: add Realtek 802.11ax driver")
Signed-off-by: Zhang Shurong <[email protected]>
Acked-by: Ping-Ke Shih <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
For some device types like TXGBE_ID_XAUI, *checksum computed in
txgbe_calc_eeprom_checksum() is larger than TXGBE_EEPROM_SUM. Remove the
limit on the size of *checksum.
Fixes: 049fe5365324 ("net: txgbe: Add operations to interact with firmware")
Fixes: 5e2ea7801fac ("net: txgbe: Fix unsigned comparison to zero in txgbe_calc_eeprom_checksum()")
Signed-off-by: Jiawen Wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull OpenRISC fix from Stafford Horne:
- During the 6.4 cycle my fpu support work broke ABI compatibility in
the sigcontext struct. This was noticed by musl libc developers after
the release. This fix restores the ABI.
* tag 'for-linus' of https://github.com/openrisc/linux:
openrisc: Union fpcsr and oldmask in sigcontext to unbreak userspace ABI
|
|
Hist triggers can have referenced variables without having direct
variable fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
a hist trigger with variables that are still being referenced. This results
in a bug that is easily reproducible like so:
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>
We hit the bug because when the second hist trigger was created,
has_hist_vars() returned false because the trigger did not have
variables of its own. As a result, save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on, when we attempted to
remove the first histogram, find_any_var_ref() failed to detect that it was
still being used because it did not find the second trigger in the hist_vars
list.
With this change we wait until trigger actions are created so we can take
into consideration whether the hist trigger has variable references. Also,
now we check the return value of save_hist_vars() and fail trigger creation
if save_hist_vars() fails.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Eric Dumazet says[1]:
-------
Speaking of psched_mtu(), I see that net/sched/sch_pie.c is using it
without holding RTNL, so dev->mtu can be changed underneath.
KCSAN could issue a warning.
-------
Annotate dev->mtu with READ_ONCE() so KCSAN doesn't issue a warning.
[1] https://lore.kernel.org/all/CANn89iJoJO5VtaJ-2=_d2aOQhb0Xw8iBT_Cxqp2HyuS-zj6azw@mail.gmail.com/
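A minimal sketch of the annotation, based on the psched_mtu() helper in
include/net/pkt_sched.h (the exact patch may differ):

	static inline unsigned int psched_mtu(const struct net_device *dev)
	{
		/* dev->mtu may change without RTNL held; READ_ONCE() keeps KCSAN quiet */
		return READ_ONCE(dev->mtu) + dev->hard_header_len;
	}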
v1 -> v2: Fix commit message
Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ENA adapters on our instances occasionally reset. One recently
logged a UBSAN failure to the console in the process:
UBSAN: shift-out-of-bounds in build/linux/drivers/net/ethernet/amazon/ena/ena_com.c:540:13
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 28 PID: 70012 Comm: kworker/u72:2 Kdump: loaded not tainted 5.15.117
Hardware name: Amazon EC2 c5d.9xlarge/, BIOS 1.0 10/16/2017
Workqueue: ena ena_fw_reset_device [ena]
Call Trace:
<TASK>
dump_stack_lvl+0x4a/0x63
dump_stack+0x10/0x16
ubsan_epilogue+0x9/0x36
__ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e
? __const_udelay+0x43/0x50
ena_delay_exponential_backoff_us.cold+0x16/0x1e [ena]
wait_for_reset_state+0x54/0xa0 [ena]
ena_com_dev_reset+0xc8/0x110 [ena]
ena_down+0x3fe/0x480 [ena]
ena_destroy_device+0xeb/0xf0 [ena]
ena_fw_reset_device+0x30/0x50 [ena]
process_one_work+0x22b/0x3d0
worker_thread+0x4d/0x3f0
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
Apparently, the reset delays are getting so large they can trigger a
UBSAN panic.
Looking at the code, the current timeout is capped at 5000us. Using a
base value of 100us, the current code will overflow after (1<<29). Even
at values before 32, this function wraps around, perhaps
unintentionally.
Cap the value of the exponent used for this backoff at (1<<16) which is
larger than currently necessary, but large enough to support bigger
values in the future.
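A hedged sketch of the cap described (the constant and the surrounding variable
names are assumptions):

	#define ENA_MAX_BACKOFF_DELAY_EXP	16U

	/* clamp the exponent so the (1U << exp) multiplier cannot overflow a u32 */
	exp = min_t(u32, exp, ENA_MAX_BACKOFF_DELAY_EXP);
	delay_us = min_t(u32, base_delay_us * (1U << exp), max_delay_us);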
Cc: [email protected]
Fixes: 4bb7f4cf60e3 ("net: ena: reduce driver load time")
Signed-off-by: Krister Johansen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Shay Agroskin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Commit 6eb4bd92c1ce ("kallsyms: strip LTO suffixes from static functions")
stripped all function/variable suffixes starting with '.' regardless
of whether those suffixes are generated in LTO mode or not. In fact,
as far as I know, in LTO mode, when a static function/variable is
promoted to the global scope, a '.llvm.<...>' suffix is added.
The existing mechanism breaks live patching for an LTO kernel even if
no <symbol>.llvm.<...> symbols are involved. For example, for the following
kernel symbols:
$ grep bpf_verifier_vlog /proc/kallsyms
ffffffff81549f60 t bpf_verifier_vlog
ffffffff8268b430 d bpf_verifier_vlog._entry
ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
ffffffff82e12a1f d bpf_verifier_vlog.__already_done
'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
'__already_done' are static variables used inside 'bpf_verifier_vlog',
so llvm promotes them to file-level static with prefix 'bpf_verifier_vlog.'.
Note that the func-level to file-level static function promotion also
happens without LTO.
Given the symbol name 'bpf_verifier_vlog', with an LTO kernel the current
mechanism will return 4 symbols to the live patching subsystem, which the
live patching subsystem cannot handle. With a non-LTO kernel, only one symbol
is returned.
In [1], we have a lengthy discussion, the suggestion is to separate two
cases:
(1). new symbols with suffix which are generated regardless of whether
LTO is enabled or not, and
(2). new symbols with suffix generated only when LTO is enabled.
cleanup_symbol_name() should only remove suffixes for case (2).
Case (1) should not be changed so it can work uniformly with or without LTO.
This patch removes only the LTO-generated suffix '.llvm.<...>' so live patching
and tracing should work the same way as for a non-LTO kernel.
cleanup_symbol_name() in scripts/kallsyms.c is also changed to use the same
filtering pattern so both the kernel and the kallsyms tool have the same
expectation on the order of symbols.
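A minimal sketch of the narrowed filter described, shown as an assumption rather
than the exact patch:

	static void cleanup_symbol_name(char *s)
	{
		char *res;

		if (!IS_ENABLED(CONFIG_LTO_CLANG))
			return;

		/* strip only the LTO-generated '.llvm.<hash>' suffix, nothing else */
		res = strstr(s, ".llvm.");
		if (res)
			*res = '\0';
	}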
[1] https://lore.kernel.org/live-patching/[email protected]/T/#u
Fixes: 6eb4bd92c1ce ("kallsyms: strip LTO suffixes from static functions")
Reported-by: Song Liu <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Reviewed-by: Zhen Lei <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into fixes
pinctrl: renesas: Fixes for v6.5
- Fix handling of non-unique pin control configuration subnode names
on the RZ/V2M and RZ/G2L SoC families.
Signed-off-by: Linus Walleij <[email protected]>
|
|
Debounce handling is done in two different entry points in the driver.
Unify this to make sure that it's always handled the same.
Tested-by: Jan Visser <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
pinctrl-amd currently tries to program bit 19 of all GPIOs to select
either a 4kΩ or 8kΩ pull up, but this isn't what bit 19 does. Bit
19 is marked as reserved, even in the latest platform documentation.
Drop this programming functionality.
Tested-by: Jan Visser <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
On ASUS TUF A16 it is reported that the ITE5570 ACPI device connected to
GPIO 7 is causing an interrupt storm. This issue doesn't happen on
Windows.
Comparing the GPIO register configuration between Windows and Linux,
bit 20 has been configured as a pull up on Windows, but not on Linux.
Checking the GPIO declaration from the firmware, it is clear it *should* have
been a pull up on Linux as well.
```
GpioInt (Level, ActiveLow, Exclusive, PullUp, 0x0000,
"\\_SB.GPIO", 0x00, ResourceConsumer, ,)
{ // Pin list
0x0007
}
```
On Linux, amd_gpio_set_config() is currently only used for programming
the debounce. Actually, the GPIO core calls it with all the arguments
that are supported by a GPIO; pinctrl-amd just responds with `-ENOTSUPP`
to the rest.
To solve this issue expand amd_gpio_set_config() to support the other
arguments amd_pinconf_set() supports, namely `PIN_CONFIG_BIAS_PULL_DOWN`,
`PIN_CONFIG_BIAS_PULL_UP`, and `PIN_CONFIG_DRIVE_STRENGTH`.
Reported-by: Nik P <[email protected]>
Reported-by: Nathan Schulte <[email protected]>
Reported-by: Friedrich Vock <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217336
Reported-by: [email protected]
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217493
Link: https://lore.kernel.org/linux-input/[email protected]/
Tested-by: Jan Visser <[email protected]>
Fixes: 2956b5d94a76 ("pinctrl / gpio: Introduce .set_config() callback for GPIO chips")
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
It's uncommon to use debounce on any other pin, but technically
we should only set debounce to 0 when working off GPIO0.
Cc: [email protected]
Tested-by: Jan Visser <[email protected]>
Fixes: 968ab9261627 ("pinctrl: amd: Detect internal GPIO0 debounce handling")
Signed-off-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
The stack_trace event is an event created by the tracing subsystem to
store stack traces. It originally just contained a hard coded array of 8
words to hold the stack, and a "size" to know how many entries are there.
This is exported to user space as:
name: kernel_stack
ID: 4
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int size; offset:8; size:4; signed:1;
field:unsigned long caller[8]; offset:16; size:64; signed:0;
print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",
(void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
(void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
(void *)REC->caller[6], (void *)REC->caller[7]
Where the user space tracers could parse the stack. The library was
updated for this specific event to only look at the size, and not the
array. But some older users still look at the array (note, the older code
still checks to make sure the array fits inside the event that it read.
That is, if only 4 words were saved, the parser would not read the fifth
word because it will see that it was outside of the event size).
This event was changed a while ago to be more dynamic, and would save a
full stack even if it was greater than 8 words. It does this by simply
allocating more ring buffer to hold the extra words. Then it copies in the
stack via:
memcpy(&entry->caller, fstack->calls, size);
As the entry is struct stack_entry, that is created by a macro to both
create the structure and export this to user space, it still had the caller
field of entry defined as: unsigned long caller[8].
When the stack is greater than 8, the FORTIFY_SOURCE code notices that the
amount being copied is greater than the destination array and complains
about it. It has no idea that the destination points into the ring buffer,
which has the required allocation.
To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
ptr = ring_buffer_event_data(event);
entry = ptr;
ptr += offsetof(typeof(*entry), caller);
memcpy(ptr, fstack->calls, size);
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Reported-by: Sven Schnelle <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.
After commit 20e5227e9f55 ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers are accounted for when allocating ftrace pages but skipped
before being added to the ftrace pages, which may result in some pages not
being used. Then, after commit 706c81f87f84 ("ftrace: Remove extra helper
functions"), a warning may occur at:
WARN_ON(pg->next);
To fix it, only warn for the case where no pointers were skipped but the
pages were not used up, then free those unused pages after releasing
ftrace_lock.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Fixes: 706c81f87f84 ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
1ba6113a90a0 removed a lot of the kernel GPU channel, but method 0x128
was important as otherwise the GPU spams us with `CACHE_ERROR` messages.
We use the blit subchannel inside our vblank handling, so we should keep
at least this part.
v2: Only do it for NV11+ GPUs
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/201
Fixes: 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers")
Signed-off-by: Karol Herbst <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|