Age | Commit message (Collapse) | Author | Files | Lines |
|
LoongArch architecture changes for 6.2 depend on the acpi and irqchip
changes to work, so merge them to create a base.
|
|
memcg_write_event_control() accesses the dentry->d_name of the specified
control fd to route the write call. As a cgroup interface file can't be
renamed, it's safe to access d_name as long as the specified file is a
regular cgroup file. Also, as these cgroup interface files can't be
removed before the directory, it's safe to access the parent too.
Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a
call to __file_cft() which verified that the specified file is a regular
cgroupfs file before further accesses. The cftype pointer returned from
__file_cft() was no longer necessary and the commit inadvertently
dropped the file type check with it allowing any file to slip through.
With the invarients broken, the d_name and parent accesses can now race
against renames and removals of arbitrary files and cause
use-after-free's.
Fix the bug by resurrecting the file type check in __file_cft(). Now
that cgroupfs is implemented through kernfs, checking the file
operations needs to go through a layer of indirection. Instead, let's
check the superblock and dentry type.
Signed-off-by: Tejun Heo <[email protected]>
Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft")
Cc: [email protected] # v3.14+
Reported-by: Jann Horn <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For supporting post MSI-X enable allocations and for the upcoming PCI/IMS
support a separate interface is required which allows not only the
allocation of a specific index, but also the allocation of any, i.e. the
next free index. The latter is especially required for IMS because IMS
completely does away with index to functionality mappings which are
often found in MSI/MSI-X implementation.
But even with MSI-X there are devices where only the first few indices have
a fixed functionality and the rest is freely assignable by software,
e.g. to queues.
msi_domain_alloc_irq_at() is also different from the range based interfaces
as it always enforces that the MSI descriptor is allocated by the core code
and not preallocated by the caller like the PCI/MSI[-X] enable code path
does.
msi_domain_alloc_irq_at() can be invoked with the index argument set to
MSI_ANY_INDEX which makes the core code pick the next free index. The irq
domain can provide a prepare_desc() operation callback in it's
msi_domain_ops to do domain specific post allocation initialization before
the actual Linux interrupt and the associated interrupt descriptor and
hierarchy alloccations are conducted.
The function also takes an optional @icookie argument which is of type
union msi_instance_cookie. This cookie is not used by the core code and is
stored in the allocated msi_desc::data::icookie. The meaning of the cookie
is completely implementation defined. In case of IMS this might be a PASID
or a pointer to a device queue, but for the MSI core it's opaque and not
used in any way.
The function returns a struct msi_map which on success contains the
allocated index number and the Linux interrupt number so the caller can
spare the index to Linux interrupt number lookup.
On failure map::index contains the error code and map::virq is 0.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The existing MSI domain ops msi_prepare() and set_desc() turned out to be
unsuitable for implementing IMS support.
msi_prepare() does not operate on the MSI descriptors. set_desc() lacks
an irq_domain pointer and has a completely different purpose.
Introduce a prepare_desc() op which allows IMS implementations to amend an
MSI descriptor which was allocated by the core code, e.g. by adjusting the
iomem base or adding some data based on the allocated index. This is way
better than requiring that all IMS domain implementations preallocate the
MSI descriptor and then allocate the interrupt.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX
interrupt domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Per device domains provide the real domain size to the core code. This
allows range checking on insertion of MSI descriptors and also paves the
way for dynamic index allocations which are required e.g. for IMS. This
avoids external mechanisms like bitmaps on the device side and just
utilizes the core internal MSI descriptor storxe for it.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
proc_skip_spaces() seems to think it is working on C strings, and ends
up being just a wrapper around skip_spaces() with a really odd calling
convention.
Instead of basing it on skip_spaces(), it should have looked more like
proc_skip_char(), which really is the exact same function (except it
skips a particular character, rather than whitespace). So use that as
inspiration, odd coding and all.
Now the calling convention actually makes sense and works for the
intended purpose.
Reported-and-tested-by: Kyle Zeng <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
proc_get_long() is passed a size_t, but then assigns it to an 'int'
variable for the length. Let's not do that, even if our IO paths are
limited to MAX_RW_COUNT (exactly because of these kinds of type errors).
So do the proper test in the rigth type.
Reported-by: Kyle Zeng <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Provide an interface to match a per device domain bus token. This allows to
query which type of domain is installed for a particular domain id. Will be
used for PCI to avoid frequent create/remove cycles for the MSI resp. MSI-X
domains.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that all prerequsites are in place, provide the actual interfaces for
creating and removing per device interrupt domains.
MSI device interrupt domains are created from the provided
msi_domain_template which is duplicated so that it can be modified for the
particular device.
The name of the domain and the name of the interrupt chip are composed by
"$(PREFIX)$(CHIPNAME)-$(DEVNAME)"
$PREFIX: The optional prefix provided by the underlying MSI parent domain
via msi_parent_ops::prefix.
$CHIPNAME: The name of the irq_chip in the template
$DEVNAME: The name of the device
The domain is further initialized through a MSI parent domain callback which
fills in the required functionality for the parent domain or domains further
down the hierarchy. This initialization can fail, e.g. when the requested
feature or MSI domain type cannot be supported.
The domain pointer is stored in the pointer array inside of msi_device_data
which is attached to the domain.
The domain can be removed via the API or left for disposal via devres when
the device is torn down. The API removal is useful e.g. for PCI to have
seperate domains for MSI and MSI-X, which are mutually exclusive and always
occupy the default domain id slot.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Split the functionality of msi_create_irq_domain() so it can
be reused for creating per device irq domains.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
To allow proper range checking especially for dynamic allocations add a
size field to struct msi_domain_info. If the field is 0 then the size is
unknown or unlimited (up to MSI_MAX_INDEX) to provide backwards
compability.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
MSI parent domains must have some control over the MSI domains which are
built on top. On domain creation they need to fill in e.g. architecture
specific chip callbacks or msi domain ops to make the outermost domain
parent agnostic which is obviously required for architecture independence
etc.
The structure contains:
1) A bitfield which exposes the supported functional features. This
allows to check for features and is also used in the initialization
callback to mask out unsupported features when the actual domain
implementation requests a broader range, e.g. on x86 PCI multi-MSI
is only supported by remapping domains but not by the underlying
vector domain. The PCI/MSI code can then always request multi-MSI
support, but the resulting feature set after creation might not
have it set.
2) An optional string prefix which is put in front of domain and chip
names during creation of the MSI domain. That allows to keep the
naming schemes e.g. on x86 where PCI-MSI domains have a IR- prefix
when interrupt remapping is enabled.
3) An initialization callback to sanity check the domain info of
the to be created MSI domain, to restrict features and to
apply changes in MSI ops and interrupt chip callbacks to
accomodate to the particular MSI parent implementation and/or
the underlying hierarchy.
Add a conveniance function to delegate the initialization from the
MSI parent domain to an underlying domain in the hierarchy.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that all users are converted remove the old interfaces.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Provide two sorts of interfaces to handle the different use cases:
- msi_domain_alloc_irqs_range():
Handles a caller defined precise range
- msi_domain_alloc_irqs_all():
Allocates all interrupts associated to a domain by scanning the
allocated MSI descriptors
The latter is useful for the existing PCI/MSI support which does not have
range information available.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Provide two sorts of interfaces to handle the different use cases:
- msi_domain_free_irqs_range():
Handles a caller defined precise range
- msi_domain_free_irqs_all():
Frees all interrupts associated to a domain
The latter is useful for device teardown and to handle the legacy MSI support
which does not have any range information available.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Allocating simple interrupt descriptors in the core code has to be multi
device irqdomain aware for the upcoming PCI/IMS support.
Change the interfaces to take a domain id into account. Use the internal
control struct for transport of arguments.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Change the descriptor free functions to take a domain id to prepare for the
upcoming multi MSI domain per device support.
To avoid changing and extending the interfaces over and over use an core
internal control struct and hand the pointer through the various functions.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Change the descriptor allocation and insertion functions to take a domain
id to prepare for the upcoming multi MSI domain per device support.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This reflects the functionality better. No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In preparation of the upcoming per device multi MSI domain support, change
the interface to support lookups based on domain id and zero based index
within the domain.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
To support multiple MSI interrupt domains per device it is necessary to
segment the xarray MSI descriptor storage. Each domain gets up to
MSI_MAX_INDEX entries.
Change the iterators so they operate with domain ids and take the domain
offsets into account.
The publicly available iterators which are mostly used in legacy
implementations and the PCI/MSI core default to MSI_DEFAULT_DOMAIN (0)
which is the id for the existing "global" domains.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With the upcoming per device MSI interrupt domain support it is necessary
to store the domain pointers per device.
Instead of delegating that storage to device drivers or subsystems add a
domain pointer to the msi_dev_domain array in struct msi_device_data.
This pointer is also used to take care of tearing down the irq domains when
msi_device_data is cleaned up via devres.
The interfaces into the MSI core will be changed from irqdomain pointer
based interfaces to domain id based interfaces to support multiple MSI
domains on a single device (e.g. PCI/MSI[-X] and PCI/IMS.
Once the per device domain support is complete the irq domain pointer in
struct device::msi.domain will not longer contain a pointer to the "global"
MSI domain. It will contain a pointer to the MSI parent domain instead.
It would be a horrible maze of conditionals to evaluate all over the place
which domain pointer should be used, i.e. the "global" one in
device::msi::domain or one from the internal pointer array.
To avoid this evaluate in msi_setup_device_data() whether the irq domain
which is associated to a device is a "global" or a parent MSI domain. If it
is global then copy the pointer into the first entry of the msi_dev_domain
array.
This allows to convert interfaces and implementation to domain ids while
keeping everything existing working.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The upcoming support for multiple MSI domains per device requires storage
for the MSI descriptors and in a second step storage for the irqdomain
pointers.
Move the xarray into a separate data structure msi_dev_domain and create an
array with size 1 in msi_device_data, which can be expanded later when the
support for per device domains is implemented.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In the upcoming per device MSI domain concept the MSI parent domains are
not allowed to be used as regular MSI domains where the MSI allocation/free
operations are applicable.
Add appropriate checks.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
irq_domain::dev is a misnomer as it's usually the rule that a device
pointer points to something which is directly related to the instance.
irq_domain::dev can point to some other device for power management to
ensure that this underlying device is not powered down when an interrupt is
allocated.
The upcoming per device MSI domains really require a pointer to the device
which instantiated the irq domain and not to some random other device which
is required for power management down the chain.
Rename irq_domain::dev to irq_domain::pm_dev and fixup the few sites which
use that pointer.
Conversion was done with the help of coccinelle.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It's truly a MSI only flag and for the upcoming per device MSI domains this
must be in the MSI flags so it can be set during domain setup without
exposing this quirk outside of x86.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Fix a use-after-free case where the perf pending task callback would
see an already freed event
* tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix perf_pending_task() UaF
|
|
Fault injection tests trigger warnings like this:
kernfs: can not remove 'chip_name', no directory
WARNING: CPU: 0 PID: 253 at fs/kernfs/dir.c:1616 kernfs_remove_by_name_ns+0xce/0xe0
RIP: 0010:kernfs_remove_by_name_ns+0xce/0xe0
Call Trace:
<TASK>
remove_files.isra.1+0x3f/0xb0
sysfs_remove_group+0x68/0xe0
sysfs_remove_groups+0x41/0x70
__kobject_del+0x45/0xc0
kobject_del+0x29/0x40
free_desc+0x42/0x70
irq_free_descs+0x5e/0x90
The reason is that the interrupt descriptor sysfs handling does not roll
back on a failing kobject_add() during allocation. If the descriptor is
freed later on, kobject_del() is invoked with a not added kobject resulting
in the above warnings.
A proper rollback in case of a kobject_add() failure would be the straight
forward solution. But this is not possible due to the way how interrupt
descriptor sysfs handling works.
Interrupt descriptors are allocated before sysfs becomes available. So the
sysfs files for the early allocated descriptors are added later in the boot
process. At this point there can be nothing useful done about a failing
kobject_add(). For consistency the interrupt descriptor allocation always
treats kobject_add() failures as non-critical and just emits a warning.
To solve this problem, keep track in the interrupt descriptor whether
kobject_add() was successful or not and make the invocation of
kobject_del() conditional on that.
[ tglx: Massage changelog, comments and use a state bit. ]
Fixes: ecb3f394c5db ("genirq: Expose interrupt information through sysfs")
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
|
|
Per syzbot it is possible for perf_pending_task() to run after the
event is free()'d. There are two related but distinct cases:
- the task_work was already queued before destroying the event;
- destroying the event itself queues the task_work.
The first cannot be solved using task_work_cancel() since
perf_release() itself might be called from a task_work (____fput),
which means the current->task_works list is already empty and
task_work_cancel() won't be able to find the perf_pending_task()
entry.
The simplest alternative is extending the perf_event lifetime to cover
the task_work.
The second is just silly, queueing a task_work while you know the
event is going away makes no sense and is easily avoided by
re-arranging how the event is marked STATE_DEAD and ensuring it goes
through STATE_OFF on the way down.
Reported-by: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Marco Elver <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix osnoise duration type to 64bit not 32bit
- Have histogram triggers be able to handle an unexpected NULL pointer
for the record event, which can happen when the histogram first
starts up
- Clear out ring buffers when dynamic events are removed, as the type
that is saved in the ring buffer is used to read the event, and a
stale type that is reused by another event could cause use after free
issues
- Trivial comment fix
- Fix memory leak in user_event_create()
* tag 'trace-v6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Free buffers when a used dynamic event is removed
tracing: Add tracing_reset_all_online_cpus_unlocked() function
tracing: Fix race where histograms can be called before the event
tracing/osnoise: Fix duration type
tracing/user_events: Fix memory leak in user_event_create()
tracing/hist: add in missing * in comment blocks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
"Two more fixes to the perf sigtrap handling:
- output the address in the sample only when it has been requested
- handle the case where user-only events can hit in kernel and thus
upset the sigtrap sanity checking"
* tag 'perf_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Consider OS filter fail
perf: Fixup SIGTRAP and sample_flags interaction
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert a recent change in the schedutil cpufreq governor that
had not been expected to make any functional difference, but turned
out to introduce a performance regression, fix an initialization issue
in the amd-pstate driver and make it actually replace the venerable
ACPI cpufreq driver on the supported systems by default.
Specifics:
- Revert a recent schedutil cpufreq governor change that introduced a
performace regression on Pixel 6 (Sam Wu)
- Fix amd-pstate driver initialization after running the kernel via
kexec (Wyes Karny)
- Turn amd-pstate into a built-in driver which allows it to take
precedence over acpi-cpufreq by default on supported systems and
amend it with a mechanism to disable this behavior (Perry Yuan)
- Update amd-pstate documentation in accordance with the other
changes made to it (Perry Yuan)"
* tag 'pm-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: add amd-pstate kernel command line options
Documentation: amd-pstate: add driver working mode introduction
cpufreq: amd-pstate: add amd-pstate driver parameter for mode selection
cpufreq: amd-pstate: change amd-pstate driver to be built-in type
cpufreq: amd-pstate: cpufreq: amd-pstate: reset MSR_AMD_PERF_CTL register at init
Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"24 MM and non-MM hotfixes. 8 marked cc:stable and 16 for post-6.0
issues.
There have been a lot of hotfixes this cycle, and this is quite a
large batch given how far we are into the -rc cycle. Presumably a
reflection of the unusually large amount of MM material which went
into 6.1-rc1"
* tag 'mm-hotfixes-stable-2022-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits)
test_kprobes: fix implicit declaration error of test_kprobes
nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty
mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
swapfile: fix soft lockup in scan_swap_map_slots
hugetlb: fix __prep_compound_gigantic_page page flag setting
kfence: fix stack trace pruning
proc/meminfo: fix spacing in SecPageTables
mm: multi-gen LRU: retry folios written back while isolated
mailmap: update email address for Satya Priya
mm/migrate_device: return number of migrating pages in args->cpages
kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
MAINTAINERS: update Alex Hung's email address
mailmap: update Alex Hung's email address
mm: mmap: fix documentation for vma_mas_szero
mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
mm/memory: return vm_fault_t result from migrate_to_ram() callback
mm: correctly charge compressed memory to its memcg
ipc/shm: call underlying open/close vm_ops
gcov: clang: fix the buffer overflow issue
...
|
|
Some PMUs (notably the traditional hardware kind) have boundary issues
with the OS filter. Specifically, it is possible for
perf_event_attr::exclude_kernel=1 events to trigger in-kernel due to
SKID or errata.
This can upset the sigtrap logic some and trigger the WARN.
However, if this invalid sample is the first we must not loose the
SIGTRAP, OTOH if it is the second, it must not override the
pending_addr with a (possibly) invalid one.
Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs")
Reported-by: Pengfei Xu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The perf_event_attr::sigtrap functionality relies on data->addr being
set. However commit 7b0846301531 ("perf: Use sample_flags for addr")
changed this to only initialize data->addr when not 0.
Fixes: 7b0846301531 ("perf: Use sample_flags for addr")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/Y3426b4OimE%2FI5po%40hirez.programming.kicks-ass.net
|
|
After 65536 dynamic events have been added and removed, the "type" field
of the event then uses the first type number that is available (not
currently used by other events). A type number is the identifier of the
binary blobs in the tracing ring buffer (known as events) to map them to
logic that can parse the binary blob.
The issue is that if a dynamic event (like a kprobe event) is traced and
is in the ring buffer, and then that event is removed (because it is
dynamic, which means it can be created and destroyed), if another dynamic
event is created that has the same number that new event's logic on
parsing the binary blob will be used.
To show how this can be an issue, the following can crash the kernel:
# cd /sys/kernel/tracing
# for i in `seq 65536`; do
echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
# done
For every iteration of the above, the writing to the kprobe_events will
remove the old event and create a new one (with the same format) and
increase the type number to the next available on until the type number
reaches over 65535 which is the max number for the 16 bit type. After it
reaches that number, the logic to allocate a new number simply looks for
the next available number. When an dynamic event is removed, that number
is then available to be reused by the next dynamic event created. That is,
once the above reaches the max number, the number assigned to the event in
that loop will remain the same.
Now that means deleting one dynamic event and created another will reuse
the previous events type number. This is where bad things can happen.
After the above loop finishes, the kprobes/foo event which reads the
do_sys_openat2 function call's first parameter as an integer.
# echo 1 > kprobes/foo/enable
# cat /etc/passwd > /dev/null
# cat trace
cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
# echo 0 > kprobes/foo/enable
Now if we delete the kprobe and create a new one that reads a string:
# echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
And now we can the trace:
# cat trace
sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
And dmesg has:
==================================================================
BUG: KASAN: use-after-free in string+0xd4/0x1c0
Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x77
print_report+0x17f/0x47b
kasan_report+0xad/0x130
string+0xd4/0x1c0
vsnprintf+0x500/0x840
seq_buf_vprintf+0x62/0xc0
trace_seq_printf+0x10e/0x1e0
print_type_string+0x90/0xa0
print_kprobe_event+0x16b/0x290
print_trace_line+0x451/0x8e0
s_show+0x72/0x1f0
seq_read_iter+0x58e/0x750
seq_read+0x115/0x160
vfs_read+0x11d/0x460
ksys_read+0xa9/0x130
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc2e972ade2
Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
</TASK>
The buggy address belongs to the physical page:
page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
This was found when Zheng Yejian sent a patch to convert the event type
number assignment to use IDA, which gives the next available number, and
this bug showed up in the fuzz testing by Yujie Liu and the kernel test
robot. But after further analysis, I found that this behavior is the same
as when the event type numbers go past the 16bit max (and the above shows
that).
As modules have a similar issue, but is dealt with by setting a
"WAS_ENABLED" flag when a module event is enabled, and when the module is
freed, if any of its events were enabled, the ring buffer that holds that
event is also cleared, to prevent reading stale events. The same can be
done for dynamic events.
If any dynamic event that is being removed was enabled, then make sure the
buffers they were enabled in are now cleared.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework")
Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events")
Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in")
Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced")
Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
Reported-by: Zheng Yejian <[email protected]>
Reported-by: Yujie Liu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently the tracing_reset_all_online_cpus() requires the
trace_types_lock held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function is needed where the lock
is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it without being held, and also add a lockdep_assert to
make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.
This could be solved with adding memory barriers in the hot path, or to
make sure that all the triggers requiring a record check for NULL. The
latter was chosen.
Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Tom Zanussi <[email protected]>
Cc: [email protected]
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently, in clang version of gcov code when module is getting removed
gcov_info_add() incorrectly adds the sfn_ptr->counter to all the
dst->functions and it result in the kernel panic in below crash report.
Fix this by properly handling it.
[ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000
[ 8.899100][ T599] Mem abort info:
[ 8.899102][ T599] ESR = 0x9600004f
[ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8.899105][ T599] SET = 0, FnV = 0
[ 8.899107][ T599] EA = 0, S1PTW = 0
[ 8.899108][ T599] FSC = 0x0f: level 3 permission fault
[ 8.899110][ T599] Data abort info:
[ 8.899111][ T599] ISV = 0, ISS = 0x0000004f
[ 8.899113][ T599] CM = 0, WnR = 1
[ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000
[ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787
[ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP
[ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0
....
..,
[ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1
[ 8.899547][ T599] Hardware name: XXX (DT)
[ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8
[ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8
[ 8.899559][ T599] sp : ffffffc00e733b00
[ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470
[ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000
[ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000
[ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058
[ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785
[ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785
[ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80
[ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000
[ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f
[ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0
[ 8.899589][ T599] Call trace:
[ 8.899590][ T599] gcov_info_add+0x9c/0xb8
[ 8.899592][ T599] gcov_module_notifier+0xbc/0x120
[ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c
[ 8.899598][ T599] do_init_module+0x2a8/0x33c
[ 8.899600][ T599] load_module+0x23cc/0x261c
[ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194
[ 8.899604][ T599] invoke_syscall+0x94/0x2bc
[ 8.899607][ T599] el0_svc_common+0x1d8/0x34c
[ 8.899609][ T599] do_el0_svc+0x40/0x54
[ 8.899611][ T599] el0_svc+0x94/0x2f0
[ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec
[ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8
[ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c)
[ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]---
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e178a5beb369 ("gcov: clang support")
Signed-off-by: Mukesh Ojha <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Tested-by: Peter Oberparleiter <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: <[email protected]> [5.2+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
The duration type is a 64 long value, not an int. This was
causing some long noise to report wrong values.
Change the duration to a 64 bits value.
Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org
Cc: [email protected]
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Before current_user_event_group(), it has allocated memory and save it
in @name, this should freed before return error.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e5d271812e7a ("tracing/user_events: Move pages/locks into groups to prepare for namespaces")
Signed-off-by: Xiu Jianfeng <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Beau Belgrave <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
There are a couple of missing * in comment blocks. Fix these.
Cleans up two clang warnings:
kernel/trace/trace_events_hist.c:986: warning: bad line:
kernel/trace/trace_events_hist.c:3229: warning: bad line:
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
This reverts commit 6d5afdc97ea71958287364a1f1d07e59ef151b11.
On a Pixel 6 device, it is observed that this commit increases
latency by approximately 50ms, or 20%, in migrating a task
that requires full CPU utilization from a LITTLE CPU to Fmax
on a big CPU. Reverting this change restores the latency back
to its original baseline value.
Fixes: 6d5afdc97ea7 ("cpufreq: schedutil: Move max CPU capacity to sugov_policy")
Signed-off-by: Sam Wu <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
bpf_selem_alloc function is used by inode_storage, sk_storage and
task_storage maps to set map value, for these map types, there may
be a spin lock in the map value, so if we use memcpy to copy the whole
map value from user, the spin lock field may be initialized incorrectly.
Since the spin lock field is zeroed by kzalloc, call copy_map_value
instead of memcpy to skip copying the spin lock field to fix it.
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Signed-off-by: Xu Kuohai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing/probes fixes from Steven Rostedt:
- Fix possible NULL pointer dereference on trace_event_file in
kprobe_event_gen_test_exit()
- Fix NULL pointer dereference for trace_array in
kprobe_event_gen_test_exit()
- Fix memory leak of filter string for eprobes
- Fix a possible memory leak in rethook_alloc()
- Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which
can cause a possible use-after-free
- Fix warning in eprobe filter creation
- Fix eprobe filter creation as it picked the wrong event for the
fields
* tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/eprobe: Fix eprobe filter to make a filter correctly
tracing/eprobe: Fix warning in filter creation
kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case
rethook: fix a potential memleak in rethook_alloc()
tracing/eprobe: Fix memory leak of filter string
tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()
tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix polling to block on watermark like the reads do, as user space
applications get confused when the select says read is available, and
then the read blocks
- Fix accounting of ring buffer dropped pages as it is what is used to
determine if the buffer is empty or not
- Fix memory leak in tracing_read_pipe()
- Fix struct trace_array warning about being declared in parameters
- Fix accounting of ftrace pages used in output at start up.
- Fix allocation of dyn_ftrace pages by subtracting one from order
instead of diving it by 2
- Static analyzer found a case were a pointer being used outside of a
NULL check (rb_head_page_deactivate())
- Fix possible NULL pointer dereference if kstrdup() fails in
ftrace_add_mod()
- Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
- Fix bad pointer dereference in register_synth_event() on error path
- Remove unused __bad_type_size() method
- Fix possible NULL pointer dereference of entry in list 'tr->err_log'
- Fix NULL pointer deference race if eprobe is called before the event
setup
* tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix race where eprobes can be called before the event
tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'
tracing: Remove unused __bad_type_size() method
tracing: Fix wild-memory-access in register_synth_event()
tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()
ftrace: Fix null pointer dereference in ftrace_add_mod()
ring_buffer: Do not deactivate non-existant pages
ftrace: Optimize the allocation for mcount entries
ftrace: Fix the possible incorrect kernel message
tracing: Fix warning on variable 'struct trace_array'
tracing: Fix memory leak in tracing_read_pipe()
ring-buffer: Include dropped pages in counting dirty patches
tracing/ring-buffer: Have polling block on watermark
|
|
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Linux Trace Kernel <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: [email protected]
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Reported-by: Rafael Mendonca <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Fix a small race on the task's exit path where there's a
misunderstanding whether the task holds rq->lock or not
- Prevent processes from getting killed when using deprecated or
unknown rseq ABI flags in order to be able to fuzz the rseq() syscall
with syzkaller
* tag 'sched_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix race in task_call_func()
rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered
|