Age | Commit message (Collapse) | Author | Files | Lines |
|
Intel's Skylake Server CPUs have a different LLC topology than previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
into two "slices", each containing half the cores, half the LLC, and one
memory controller and each slice is enumerated to Linux as a NUMA
node. This is similar to how the cores and LLC were arranged for the
Cluster-On-Die (CoD) feature.
CoD allowed the same cache line to be present in each half of the LLC.
But, with SNC, each line is only ever present in *one* slice. This means
that the portion of the LLC *available* to a CPU depends on the data being
accessed:
Remote socket: entire package LLC is shared
Local socket->local slice: data goes into local slice LLC
Local socket->remote slice: data goes into remote-slice LLC. Slightly
higher latency than local slice LLC.
The biggest implication from this is that a process accessing all
NUMA-local memory only sees half the LLC capacity.
The CPU describes its cache hierarchy with the CPUID instruction. One of
the CPUID leaves enumerates the "logical processors sharing this
cache". This information is used for scheduling decisions so that tasks
move more freely between CPUs sharing the cache.
But, the CPUID for the SNC configuration discussed above enumerates the LLC
as being shared by the entire package. This is not 100% precise because the
entire cache is not usable by all accesses. But, it *is* the way the
hardware enumerates itself, and this is not likely to change.
The userspace visible impact of all the above is that the sysfs info
reports the entire LLC as being available to the entire package. As noted
above, this is not true for local socket accesses. This patch does not
correct the sysfs info. It is the same, pre and post patch.
The current code emits the following warning:
sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
The warning is coming from the topology_sane() check in smpboot.c because
the topology is not matching the expectations of the model for obvious
reasons.
To fix this, add a vendor and model specific check to never call
topology_sane() for these systems. Also, just like "Cluster-on-Die" disable
the "coregroup" sched_domain_topology_level and use NUMA information from
the SRAT alone.
This is OK at least on the hardware we are immediately concerned about
because the LLC sharing happens at both the slice and at the package level,
which are also NUMA boundaries.
Signed-off-by: Alison Schofield <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Igor Mammedov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tim Chen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
If the get_callchain_buffers fails to allocate the buffer it will
decrease the nr_callchain_events right away.
There's no point of checking the allocation error for
nr_callchain_events > 1. Removing that check.
Signed-off-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The syzbot hit KASAN bug in perf_callchain_store having the entry stored
behind the allocated bounds [1].
We miss the sample_max_stack check for the initial event that allocates
callchain buffers. This missing check allows to create an event with
sample_max_stack value bigger than the global sysctl maximum:
# sysctl -a | grep perf_event_max_stack
kernel.perf_event_max_stack = 127
# perf record -vv -C 1 -e cycles/max-stack=256/ kill
...
perf_event_attr:
size 112
...
sample_max_stack 256
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4
Note the '-C 1', which forces perf record to create just single event.
Otherwise it opens event for every cpu, then the sample_max_stack check
fails on the second event and all's fine.
The fix is to run the sample_max_stack check also for the first event
with callchains.
[1] https://marc.info/?l=linux-kernel&m=152352732920874&w=2
Reported-by: [email protected]
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 97c79a38cd45 ("perf core: Per event callchain limit")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Return immediately when we find issue in the user stack checks. The
error value could get overwritten by following check for
PERF_SAMPLE_REGS_INTR.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: 60e2364e60e8 ("perf: Add ability to sample machine state on interrupt")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
'perf list' with flags -d and -v print a description (-d) or a very
verbose explanation (-v) of CPU specific counter events. These
descriptions are provided with the json files in directory
pmu-events/arch/s390/*.json.
Display of these descriptions on s390 requires the corresponding json
files.
On s390 this does not work because function is_pmu_core() does not
detect the s390 directory name where the CPU specific events are listed.
On x86 it is:
/sys/bus/event_source/devices/cpu
whereas on s390 it is:
/sys/bus/event_source/devices/cpum_cf
/sys/bus/event_source/devices/cpum_sf
Fix this by adding s390 directory name testing to function
is_pmu_core(). This is the same approach as taken for the ARM platform.
Output before:
[root@s35lp76 perf]# ./perf list -d pmu
List of pre-defined events (to be used in -e):
cpum_cf/AES_BLOCKED_CYCLES/ [Kernel PMU event]
cpum_cf/AES_BLOCKED_FUNCTIONS/ [Kernel PMU event]
cpum_cf/AES_CYCLES/ [Kernel PMU event]
cpum_cf/AES_FUNCTIONS/ [Kernel PMU event]
....
cpum_cf/TX_NC_TEND/ [Kernel PMU event]
cpum_cf/VX_BCD_EXECUTION_SLOTS/ [Kernel PMU event]
cpum_sf/SF_CYCLES_BASIC/ [Kernel PMU event]
Output after:
[root@s35lp76 perf]# ./perf list -d pmu
List of pre-defined events (to be used in -e):
cpum_cf/AES_BLOCKED_CYCLES/ [Kernel PMU event]
cpum_cf/AES_BLOCKED_FUNCTIONS/ [Kernel PMU event]
cpum_cf/AES_CYCLES/ [Kernel PMU event]
cpum_cf/AES_FUNCTIONS/ [Kernel PMU event]
....
cpum_cf/TX_NC_TEND/ [Kernel PMU event]
cpum_cf/VX_BCD_EXECUTION_SLOTS/ [Kernel PMU event]
cpum_sf/SF_CYCLES_BASIC/ [Kernel PMU event]
3906:
bcd_dfp_execution_slots
[BCD DFP Execution Slots]
decimal_instructions
[Decimal Instructions]
dtlb2_gpage_writes
[DTLB2 GPAGE Writes]
dtlb2_hpage_writes
[DTLB2 HPAGE Writes]
dtlb2_misses
[DTLB2 Misses]
dtlb2_writes
[DTLB2 Writes]
itlb2_misses
[ITLB2 Misses]
itlb2_writes
[ITLB2 Writes]
l1c_tlb2_misses
[L1C TLB2 Misses]
.....
cfvn 3:
cpu_cycles
[CPU Cycles]
instructions
[Instructions]
l1d_dir_writes
[L1D Directory Writes]
l1d_penalty_cycles
[L1D Penalty Cycles]
l1i_dir_writes
[L1I Directory Writes]
l1i_penalty_cycles
[L1I Penalty Cycles]
problem_state_cpu_cycles
[Problem State CPU Cycles]
problem_state_instructions
[Problem State Instructions]
....
csvn generic:
aes_blocked_cycles
[AES Blocked Cycles]
aes_blocked_functions
[AES Blocked Functions]
aes_cycles
[AES Cycles]
aes_functions
[AES Functions]
dea_blocked_cycles
[DEA Blocked Cycles]
dea_blocked_functions
[DEA Blocked Functions]
....
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Hendrik Brueckner <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Append 'p' sign to 'S' tag designating the type of context switch out event so
'Sp' means preemption context switch. Documentation is extended to cover
new presentation changes.
$ perf script --show-switch-events -F +misc -I -i perf.data:
hdparm 4073 [004] U 762.198265: 380194 cycles:ppp: 7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
hdparm 4073 [004] K 762.198366: 441572 cycles:ppp: ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
hdparm 4073 [004] S 762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0
swapper 0 [004] 762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 4073/4073
swapper 0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 4073/4073
hdparm 4073 [004] 762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0
swapper 0 [007] K 762.198514: 2303073 cycles:ppp: ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
swapper 0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 1134/1134
kworker/u16:18 1134 [007] 762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0
kworker/u16:18 1134 [007] S 762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0
Signed-off-by: Alexey Budankov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Print additional 'preempt' tag for PERF_RECORD_SWITCH[_CPU_WIDE] OUT records when
event header misc field contains PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit set
designating preemption context switch out event:
tools/perf/perf report -D -i perf.data | grep _SWITCH
0 768361415226 0x27f076 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 8/8
4 768362216813 0x28f45e [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0
4 768362217824 0x28f486 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 4073/4073
0 768362414027 0x27f0ce [0x28]: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 8/8
0 768362414367 0x27f0f6 [0x28]: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0
Signed-off-by: Alexey Budankov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Store preempting context switch out event into Perf trace as a part of
PERF_RECORD_SWITCH[_CPU_WIDE] record.
Percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are running
on a machine;
The event is treated as preemption one when task->state value of the
thread being switched out is TASK_RUNNING. Event type encoding is
implemented using PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit;
Signed-off-by: Alexey Budankov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Sync the following tooling headers with the latest kernel version:
tools/arch/arm/include/uapi/asm/kvm.h
- New ABI: KVM_REG_ARM_*
tools/arch/x86/include/asm/required-features.h
- Removal of NEED_LA57 dependency
tools/arch/x86/include/uapi/asm/kvm.h
- New KVM ABI: KVM_SYNC_X86_*
tools/include/uapi/asm-generic/mman-common.h
- New ABI: MAP_FIXED_NOREPLACE flag
tools/include/uapi/linux/bpf.h
- New ABI: BPF_F_SEQ_NUMBER functions
tools/include/uapi/linux/if_link.h
- New ABI: IFLA tun and rmnet support
tools/include/uapi/linux/kvm.h
- New ABI: hyperv eventfd and CONN_ID_MASK support plus header cleanups
tools/include/uapi/sound/asound.h
- New ABI: SNDRV_PCM_FORMAT_FIRST PCM format specifier
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
- The x86 system call table description changed due to the ptregs changes and the renames, in:
d5a00528b58c: syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
5ac9efa3c50d: syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
ebeb8c82ffaf: syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
Also fix the x86 syscall table warning:
-Warning: Kernel ABI header at 'tools/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
+Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
None of these changes impact existing tooling code, so we only have to copy the kernel version.
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Budankov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Brian Robbins <[email protected]>
Cc: Clark Williams <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Dmitriy Vyukov <[email protected]> <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Li Zhijian <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin Liška <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Miguel Bernal Marin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Takuya Yamamoto <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Richter <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: William Cohen <[email protected]>
Cc: Yonghong Song <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
xenbus_command_reply() did not actually copy the response string and
leaked stack content instead.
Fixes: 9a6161fe73bd ("xen: return xenstore command failures via response instead of rc")
Signed-off-by: Simon Gaiser <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
|
|
This is the sync up with the canonical definition of the sound
protocol in Xen:
1. Protocol version was referenced in the protocol description,
but missed its definition. Fixed by adding a constant
for current protocol version.
2. Some of the request descriptions have "reserved" fields
missed: fixed by adding corresponding entries.
3. Extend the size of the requests and responses to 64 octets.
Bump protocol version to 2.
4. Add explicit back and front synchronization
In order to provide explicit synchronization between backend and
frontend the following changes are introduced in the protocol:
- add new ring buffer for sending asynchronous events from
backend to frontend to report number of bytes played by the
frontend (XENSND_EVT_CUR_POS)
- introduce trigger events for playback control: start/stop/pause/resume
- add "req-" prefix to event-channel and ring-ref to unify naming
of the Xen event channels for requests and events
5. Add explicit back and front parameter negotiation
In order to provide explicit stream parameter negotiation between
backend and frontend the following changes are introduced in the protocol:
add XENSND_OP_HW_PARAM_QUERY request to read/update
configuration space for the parameters given: request passes
desired parameter's intervals/masks and the response to this request
returns allowed min/max intervals/masks to be used.
Signed-off-by: Oleksandr Andrushchenko <[email protected]>
Signed-off-by: Oleksandr Grytsov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Takashi Iwai <[email protected]>
Signed-off-by: Boris Ostrovsky <[email protected]>
|
|
We might need to do some actions before the shadow variable is freed.
For example, we might need to remove it from a list or free some data
that it points to.
This is already possible now. The user can get the shadow variable
by klp_shadow_get(), do the necessary actions, and then call
klp_shadow_free().
This patch allows to do it a more elegant way. The user could implement
the needed actions in a callback that is passed to klp_shadow_free()
as a parameter. The callback usually does reverse operations to
the constructor callback that can be called by klp_shadow_*alloc().
It is especially useful for klp_shadow_free_all(). There we need to do
these extra actions for each found shadow variable with the given ID.
Note that the memory used by the shadow variable itself is still released
later by rcu callback. It is needed to protect internal structures that
keep all shadow variables. But the destructor is called immediately.
The shadow variable must not be access anyway after klp_shadow_free()
is called. The user is responsible to protect this any suitable way.
Be aware that the destructor is called under klp_shadow_lock. It is
the same as for the contructor in klp_shadow_alloc().
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
The existing API allows to pass a sample data to initialize the shadow
data. It works well when the data are position independent. But it fails
miserably when we need to set a pointer to the shadow structure itself.
Unfortunately, we might need to initialize the pointer surprisingly
often because of struct list_head. It is even worse because the list
might be hidden in other common structures, for example, struct mutex,
struct wait_queue_head.
For example, this was needed to fix races in ALSA sequencer. It required
to add mutex into struct snd_seq_client. See commit b3defb791b26ea06
("ALSA: seq: Make ioctls race-free") and commit d15d662e89fc667b9
("ALSA: seq: Fix racy pool initializations")
This patch makes the API more safe. A custom constructor function and data
are passed to klp_shadow_*alloc() functions instead of the sample data.
Note that ctor_data are no longer a template for shadow->data. It might
point to any data that might be necessary when the constructor is called.
Also note that the constructor is called under klp_shadow_lock. It is
an internal spin_lock that synchronizes alloc() vs. get() operations,
see klp_shadow_get_or_alloc(). On one hand, this adds a risk of ABBA
deadlocks. On the other hand, it allows to do some operations safely.
For example, we could add the new structure into an existing list.
This must be done only once when the structure is allocated.
Reported-by: Nicolai Stange <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Acked-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
declarations as well.
Signed-off-by: Dou Liyang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
RongQing reported that there are some X2APIC id 0xffffffff in his machine's
ACPI MADT table, which makes the number of possible CPU inaccurate.
The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
Architecture x2APIC Specification", Chapter 2.4.1.
Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.
Reported-by: Li RongQing <[email protected]>
Signed-off-by: Dou Liyang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:
hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.
tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref
This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.
On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.
Use div64_u64() to avoid the problem.
[ tglx: Fixes whitespace mangled patch and rewrote changelog ]
Signed-off-by: Xiaoming Gao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The commit that switched x86 to dma_direct_ops stopped using and building
this file, but accidentally left it in the tree. Remove it.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
If there is no d-cache-size property in the device tree, l1d_size could
be zero. We don't actually expect that to happen, it's only been seen
on mambo (simulator) in some configurations.
A zero-size l1d_size leads to the loop in the asm wrapping around to
2^64-1, and then walking off the end of the fallback area and
eventually causing a page fault which is fatal.
Just default to 64K which is correct on some CPUs, and sane enough to
not cause a crash on others.
Fixes: aa8a5e0062ac9 ('powerpc/64s: Add support for RFI flush of L1-D cache')
Signed-off-by: Madhavan Srinivasan <[email protected]>
[mpe: Rewrite comment and change log]
Signed-off-by: Michael Ellerman <[email protected]>
|
|
After merging the netfilter tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:
net/netfilter/nf_conntrack_extend.c: In function 'nf_ct_ext_add':
net/netfilter/nf_conntrack_extend.c:74:2: error: implicit declaration of function 'kmemleak_not_leak' [-Werror=implicit-function-declaration]
kmemleak_not_leak(old);
^~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
Fixes: 114aa35d06d4 ("netfilter: conntrack: silent a memory leak warning")
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The struct sigaction for user space in arch/s390/include/uapi/asm/signal.h
is ill defined. The kernel uses two structures 'struct sigaction' and
'struct old_sigaction', the correlation in the kernel for both 31 and
64 bit is as follows
sys_sigaction -> struct old_sigaction
sys_rt_sigaction -> struct sigaction
The correlation of the (single) uapi definition for 'struct sigaction'
under '#ifndef __KERNEL__':
31-bit: sys_sigaction -> uapi struct sigaction
31-bit: sys_rt_sigaction -> no structure available
64-bit: sys_sigaction -> no structure available
64-bit: sys_rt_sigaction -> uapi struct sigaction
This is quite confusing. To make it a bit less confusing make the
uapi definition of 'struct sigaction' usable for sys_rt_sigaction for
both 31-bit and 64-bit.
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
It may be useful to compile host programs with different flags (e.g.
hardening). Ensure that objtool picks up the appropriate flags.
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/05a360681176f1423cb2fde8faae3a0a0261afc5.1523560825.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now exynos_drm_fb is just an empty wrapper around drm_framebuffer, we
can drop it.
Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
|
|
This can be calculated from the GEM BO DMA address as well as the offset
stored in the base framebuffer.
Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
|
|
Since drm_framebuffer can now store GEM objects directly, place them
there rather than in our own subclass. As this makes the framebuffer
create_handle and destroy functions the same as the GEM framebuffer
helper, we can reuse those.
Signed-off-by: Daniel Stone <[email protected]>
Signed-off-by: Inki Dae <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
|
|
This warning message is not very helpful, as the return value should
already show information about the error. Also, this message will
spam dmesg if the user space does testing in a loop, like:
for x in {0..5}
do
echo p:xx xx+$x >> /sys/kernel/debug/tracing/kprobe_events
done
Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make lib/textsearch.c usable as kernel-doc.
Add textsearch() function family to kernel-api documentation.
Fix kernel-doc warnings in <linux/textsearch.h>:
../include/linux/textsearch.h:65: warning: Incorrect use of kernel-doc format:
* get_next_block - fetch next block of data
../include/linux/textsearch.h:82: warning: Incorrect use of kernel-doc format:
* finish - finalize/clean a series of get_next_block() calls
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Eric Dumazet says:
====================
tipc: Better check user provided attributes
syzbot reported a crash in __tipc_nl_net_set()
While fixing it, I also had to fix an old bug involving TIPC_NLA_NET_ADDR
====================
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
syzbot reported a crash in __tipc_nl_net_set() caused by NULL dereference.
We need to check that both TIPC_NLA_NET_NODEID and TIPC_NLA_NET_NODEID_W1
are present.
We also need to make sure userland provided u64 attributes.
Fixes: d50ccc2d3909 ("tipc: add 128-bit node identifier")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Before syzbot/KMSAN bites, add the missing policy for TIPC_NLA_NET_ADDR
Fixes: 27c21416727a ("tipc: add net set to new netlink api")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc build fix from Helge Deller:
"Fix build error because of missing binfmt_elf32.o file which is still
mentioned in the Makefile"
* 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix missing binfmt_elf32.o build error
|
|
The MIPS kernel memset / bzero implementation includes a small_memset
branch which is used when the region to be set is smaller than a long (4
bytes on 32bit, 8 bytes on 64bit). The current small_memset
implementation uses a simple store byte loop to write the destination.
There are 2 issues with this implementation:
1. When EVA mode is active, user and kernel address spaces may overlap.
Currently the use of the sb instruction means kernel mode addressing is
always used and an intended write to userspace may actually overwrite
some critical kernel data.
2. If the write triggers a page fault, for example by calling
__clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
is triggered.
Fix these issues by replacing the sb instruction with the EX() macro,
which will emit EVA compatible instuctions as required. Additionally
implement a fault fixup for small_memset which sets a2 to the number of
bytes that could not be cleared (as defined by __clear_user).
Reported-by: Chuanhua Lei <[email protected]>
Signed-off-by: Matt Redfearn <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/18975/
Signed-off-by: James Hogan <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull missed timer updates from Thomas Gleixner:
"This is a branch which got forgotten during the merge window, but it
contains only fixes and hardware enablement. No fundamental changes.
- Various fixes for the imx-tpm clocksource driver
- A new timer driver for the NCPM7xx SoC family"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/imx-tpm: Add different counter width support
clocksource/drivers/imx-tpm: Correct some registers operation flow
clocksource/drivers/imx-tpm: Fix typo of clock name
dt-bindings: timer: tpm: fix typo of clock name
clocksource/drivers/npcm: Add NPCM7xx timer driver
dt-binding: timer: document NPCM7xx timer DT bindings
|
|
Both ecryptfs_filldir() and ecryptfs_readlink_lower() use
ecryptfs_decode_and_decrypt_filename() to translate lower filenames to
upper filenames. The function correctly passes up lower filenames,
unchanged, when filename encryption isn't in use. However, it was also
passing up lower filenames when the filename wasn't encrypted or
when decryption failed. Since 88ae4ab9802e, eCryptfs refuses to lookup
lower plaintext names when filename encryption is enabled so this
resulted in a situation where userspace would see lower plaintext
filenames in calls to getdents(2) but then not be able to lookup those
filenames.
An example of this can be seen when enabling filename encryption on an
eCryptfs mount at the root directory of an Ext4 filesystem:
$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
ls: cannot access '/upper/lost+found': No such file or directory
? lost+found
12 test
With this change, the lower lost+found dentry is ignored:
$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
12 test
Additionally, some potentially noisy error/info messages in the related
code paths are turned into debug messages so that the logs can't be
easily filled.
Fixes: 88ae4ab9802e ("ecryptfs_lookup(): try either only encrypted or plaintext name")
Reported-by: Guenter Roeck <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Tyler Hicks <[email protected]>
|
|
Pull kvm fixes from Paolo Bonzini:
"Bug fixes, plus a new test case and the associated infrastructure for
writing nested virtualization tests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: add vmx_tsc_adjust_test
kvm: x86: move MSR_IA32_TSC handling to x86.c
X86/KVM: Properly update 'tsc_offset' to represent the running guest
kvm: selftests: add -std=gnu99 cflags
x86: Add check for APIC access address for vmentry of L2 guests
KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available
kvm: selftests: fix spelling mistake: "divisable" and "divisible"
X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
|
|
The |= operator will let us end up with an invalid PTE. Use
the correct &= instead.
[ The bug was also independently reported by Shuah Khan ]
Fixes: fb43d6cb91ef ('x86/mm: Do not auto-massage page protections')
Acked-by: Andy Lutomirski <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It would allocate memory in this function when the cork->opt is NULL. But
the memory isn't freed if failed in the latter rt check, and return error
directly. It causes the memleak if its caller is ip_make_skb which also
doesn't free the cork->opt when meet a error.
Now move the rt check ahead to avoid the memleak.
Signed-off-by: Gao Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In both HDMI and DP, device count is represented by 6:0 bits of a
register(BInfo/Bstatus)
So macro for bitmasking the device_count is fixed(0x3F->0x7F).
v3:
Retained the Rb-ed.
v4:
%s/drm\/i915/drm [rodrigo]
v5:
Added "Fixes:" and HDCP keyword in subject [Rodrigo, Sean Paul]
Signed-off-by: Ramalingam C <[email protected]>
Fixes: 495eb7f877ab drm: Add some HDCP related #defines
cc: Sean Paul <[email protected]>
Reviewed-by: Sean Paul <[email protected]>
Signed-off-by: Sean Paul <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The test checks the behavior of setting MSR_IA32_TSC in a nested guest,
and the TSC_OFFSET VMCS field in general. It also introduces the testing
infrastructure for Intel nested virtualization.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This is not specific to Intel/AMD anymore. The TSC offset is available
in vcpu->arch.tsc_offset.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always
captures the TSC_OFFSET of the running guest whether it is the L1 or L2
guest.
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Jim Mattson <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: KarimAllah Ahmed <[email protected]>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
set->name must be free'd here in case ops->init fails.
Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
rules in nftables a free'd using kfree, but protected by rcu, i.e. we
must wait for a grace period to elapse.
Normal removal patch does this, but nf_tables_newrule() doesn't obey
this rule during error handling.
It calls nft_trans_rule_add() *after* linking rule, and, if that
fails to allocate memory, it unlinks the rule and then kfree() it --
this is unsafe.
Switch order -- first add rule to transaction list, THEN link it
to public list.
Note: nft_trans_rule_add() uses GFP_KERNEL; it will not fail so this
is not a problem in practice (spotted only during code review).
Fixes: 0628b123c96d12 ("netfilter: nfnetlink: add batch support and use it from nf_tables")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
We get a new link error with CONFIG_NFT_REJECT_INET=y and CONFIG_NF_REJECT_IPV6=m
after larger parts of the nftables modules are linked together:
net/netfilter/nft_reject_inet.o: In function `nft_reject_inet_eval':
nft_reject_inet.c:(.text+0x17c): undefined reference to `nf_send_unreach6'
nft_reject_inet.c:(.text+0x190): undefined reference to `nf_send_reset6'
The problem is that with NF_TABLES_INET set, we implicitly try to use
the ipv6 version as well for NFT_REJECT, but when CONFIG_IPV6 is set to
a loadable module, it's impossible to reach that.
The best workaround I found is to express the above as a Kconfig
dependency, forcing NFT_REJECT itself to be 'm' in that particular
configuration.
Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The following memory leak is false postive:
unreferenced object 0xffff8f37f156fb38 (size 128):
comm "softirq", pid 0, jiffies 4294899665 (age 11.292s)
hex dump (first 32 bytes):
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
00 00 00 00 30 00 20 00 48 6b 6b 6b 6b 6b 6b 6b ....0. .Hkkkkkkk
backtrace:
[<000000004fda266a>] __kmalloc_track_caller+0x10d/0x141
[<000000007b0a7e3c>] __krealloc+0x45/0x62
[<00000000d08e0bfb>] nf_ct_ext_add+0xdc/0x133
[<0000000099b47fd8>] init_conntrack+0x1b1/0x392
[<0000000086dc36ec>] nf_conntrack_in+0x1ee/0x34b
[<00000000940592de>] nf_hook_slow+0x36/0x95
[<00000000d1bd4da7>] nf_hook.constprop.43+0x1c3/0x1dd
[<00000000c3673266>] __ip_local_out+0xae/0xb4
[<000000003e4192a6>] ip_local_out+0x17/0x33
[<00000000b64356de>] igmp_ifc_timer_expire+0x23e/0x26f
[<000000006a8f3032>] call_timer_fn+0x14c/0x2a5
[<00000000650c1725>] __run_timers.part.34+0x150/0x182
[<0000000090e6946e>] run_timer_softirq+0x2a/0x4c
[<000000004d1e7293>] __do_softirq+0x1d1/0x3c2
[<000000004643557d>] irq_exit+0x53/0xa2
[<0000000029ddee8f>] smp_apic_timer_interrupt+0x22a/0x235
because __krealloc() is not supposed to release the old
memory and it is released later via kfree_rcu(). Since this is
the only external user of __krealloc(), just mark it as not leak
here.
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
In order to remove the race caught by syzbot [1], we need
to lock the socket before using po->tp_version as this could
change under us otherwise.
This means lock_sock() and release_sock() must be done by
packet_set_ring() callers.
[1] :
BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:53
kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
__msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
SyS_setsockopt+0x76/0xa0 net/socket.c:1828
do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x449099
RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
Local variable description: ----req_u@packet_setsockopt
Variable was created at:
packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Due to a firmware bug, the hypervisor can send an interrupt to a
transmit or receive queue just prior to a partition migration, not
allowing the device enough time to handle it and send an EOI. When
the partition migrates, the interrupt is lost but an "EOI-pending"
flag for the interrupt line is still set in firmware. No further
interrupts will be sent until that flag is cleared, effectively
freezing that queue. To workaround this, the driver will disable the
hardware interrupt and send an H_EOI signal prior to re-enabling it.
This will flush the pending EOI and allow the driver to continue
operation.
Signed-off-by: Thomas Falcon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.
Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disconnect() calls
tcp_write_queue_purge().
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Subash Abhinov Kasiviswanathan <[email protected]>
Reported-by: Sami Farin <[email protected]>
Tested-by: Sami Farin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Enable test cases for the kernel's fallback to label-less mode.
Signed-off-by: Dan Williams <[email protected]>
|
|
Sysfs userspace tooling generally expects the kernel to emit a newlines
when reading sysfs attributes.
Signed-off-by: Dan Williams <[email protected]>
|
|
The nfit_test.1 bus provides a pmem topology without blk-aperture
enabling, so it presents different failure modes for label space
handling. Allow custom DSM command error injection.
Signed-off-by: Dan Williams <[email protected]>
|