path: root/include/trace/events
Age | Commit message | Author | Files | Lines
2022-12-12 | f2fs: refactor extent_cache to support for read and more | Jaegeuk Kim | 1 | -26/+36
This patch prepares extent_cache to be ready for addition. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2022-12-12 | Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds | 1 | -0/+112
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.2. While it may seem backwards, the CXL update this time around includes some focus on CXL 1.x enabling, where the work to date had been with CXL 2.0 (VH topologies) in mind.

First generation CXL can mostly be supported via BIOS, similar to DDR; however, it became clear there are use cases for OS-native CXL error handling, and some CXL 3.0 endpoint features can be deployed on CXL 1.x hosts (Restricted CXL Host (RCH) topologies). So, this update brings RCH topologies into the Linux CXL device model.

In support of the ongoing CXL 2.0+ enabling, two new core kernel facilities are added. One is the ability for the kernel to flag collisions between userspace access to PCI configuration registers and kernel accesses. This is brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware mailbox over config-cycles. The other is a cpu_cache_invalidate_memregion() API that maps to wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest VMs and on architectures that do not support it yet. The CXL paths that need it, dynamic memory region creation and security commands (erase / unlock), are disabled when it is not present.

As for CXL 2.0+, this cycle the subsystem gains support for Persistent Memory Security commands, error handling in response to PCIe AER notifications, and support for the "XOR" host bridge interleave algorithm.

Summary:
 - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock
 - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers
 - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1)
 - Add handling and reporting of CXL errors reported via the PCIe AER mechanism
 - Add support for CXL Persistent Memory Security commands
 - Add support for the "XOR" algorithm for CXL host bridge interleave
 - Rework / simplify CXL to NVDIMM interactions
 - Miscellaneous cleanups and fixes"

* tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
  cxl/region: Fix memdev reuse check
  cxl/pci: Remove endian confusion
  cxl/pci: Add some type-safety to the AER trace points
  cxl/security: Drop security command ioctl uapi
  cxl/mbox: Add variable output size validation for internal commands
  cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
  cxl/security: Fix Get Security State output payload endian handling
  cxl: update names for interleave ways conversion macros
  cxl: update names for interleave granularity conversion macros
  cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
  tools/testing/cxl: Require cache invalidation bypass
  cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
  cxl/region: Fix spelling mistake "memergion" -> "memregion"
  cxl/regs: Fix sparse warning
  cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
  tools/testing/cxl: Add an RCH topology
  cxl/port: Add RCD endpoint port enumeration
  cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
  tools/testing/cxl: Add XOR Math support to cxl_test
  cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
  ...
2022-12-12 | Merge branches 'clk-mediatek', 'clk-trace', 'clk-qcom' and 'clk-microchip' into clk-next | Stephen Boyd | 1 | -0/+43
- Tracepoints for clk_rate_request structures

* clk-mediatek:
  clk: mediatek: fix dependency of MT7986 ADC clocks
  clk: mediatek: Change PLL register API for MT8186
  clk: mediatek: Add new clock driver to handle FHCTL hardware
  dt-bindings: clock: mediatek: Add new bindings of MediaTek frequency hopping
  clk: mediatek: Export PLL operations symbols
  clk: mediatek: mt8186-topckgen: Add GPU clock mux notifier
  clk: mediatek: mt8186-mfg: Propagate rate changes to parent
  clk: mediatek: mt8195-topckgen: Drop flags for main/univpll fixed factors
  clk: mediatek: mt8192: Drop flags for main/univpll fixed factors
  clk: mediatek: mt6795-topckgen: Drop flags for main/sys/univpll fixed factors
  clk: mediatek: mt8173: Drop flags for main/sys/univpll fixed factors
  clk: mediatek: mt8183: Drop flags for sys/univpll fixed factors
  clk: mediatek: mt8183: Compress top_divs array entries
  clk: mediatek: mt8186-topckgen: Drop flags for main/univpll fixed factors
  clk: mediatek: clk-mtk: Allow specifying flags on mtk_fixed_factor clocks

* clk-trace:
  clk: Add trace events for rate requests
  clk: Store clk_core for clk_rate_request

* clk-qcom: (69 commits)
  clk: qcom: rpmh: add support for SM6350 rpmh IPA clock
  clk: qcom: mmcc-msm8974: use parent_hws/_data instead of parent_names
  clk: qcom: mmcc-msm8974: move clock parent tables down
  clk: qcom: mmcc-msm8974: use ARRAY_SIZE instead of specifying num_parents
  clk: qcom: gcc-msm8974: use parent_hws/_data instead of parent_names
  clk: qcom: gcc-msm8974: move clock parent tables down
  clk: qcom: gcc-msm8974: use ARRAY_SIZE instead of specifying num_parents
  dt-bindings: clocks: qcom,mmcc: define clocks/clock-names for MSM8974
  dt-bindings: clock: split qcom,gcc-msm8974,-msm8226 to the separate file
  clk: qcom: gcc-ipq4019: switch to devm_clk_notifier_register
  clk: qcom: rpmh: remove usage of platform name
  clk: qcom: rpmh: rename VRM clock data
  clk: qcom: rpmh: rename ARC clock data
  clk: qcom: rpmh: support separate symbol name for the RPMH clocks
  clk: qcom: rpmh: remove platform names from BCM clocks
  clk: qcom: rpmh: drop all _ao names
  clk: qcom: rpmh: reuse common duplicate clocks
  clk: qcom: rpmh: group clock definitions together
  clk: qcom: rpm: drop the platform from clock definitions
  clk: qcom: rpm: drop the _clk suffix completely
  ...

* clk-microchip:
  clk: microchip: enable the MPFS clk driver by default if SOC_MICROCHIP_POLARFIRE
  clk: microchip: check for null return of devm_kzalloc()
2022-12-11 | mm/khugepaged: add tracepoint to collapse_file() | Gautam Menghani | 1 | -0/+38
"mm_khugepaged_collapse_file" for capturing is_shmem. Currently, is_shmem is not being captured. Capturing is_shmem is useful as it can indicate if tmpfs is being used as a backing store instead of persistent storage. Add the tracepoint in collapse_file() named "mm_khugepaged_collapse_file" for capturing is_shmem. [[email protected]: swap is_shmem and addr to save space, per Steven Rostedt] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gautam Menghani <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> [tracing] Cc: David Hildenbrand <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-10 | SUNRPC: Make the svc_authenticate tracepoint conditional | Chuck Lever | 1 | -1/+3
Clean up: Simplify the tracepoint's only call site. Also, I noticed that when svc_authenticate() returns SVC_COMPLETE, it leaves rq_auth_stat set to an error value. That doesn't need to be recorded in the trace log. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
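For context, a TRACE_EVENT_CONDITION tracepoint fires only when its TP_CONDITION evaluates true, which is what lets the call site drop an open-coded check. A minimal sketch of the pattern follows; the fields and condition are illustrative assumptions, not the event's actual layout:

    /* Sketch only: the event is recorded only when TP_CONDITION holds. */
    TRACE_EVENT_CONDITION(svc_authenticate,
        TP_PROTO(const struct svc_rqst *rqst, int auth_res),
        TP_ARGS(rqst, auth_res),

        /* Skip the common, uninteresting results. */
        TP_CONDITION(auth_res != SVC_OK && auth_res != SVC_COMPLETE),

        TP_STRUCT__entry(
            __field(u32, xid)
            __field(unsigned long, svc_status)
            __field(unsigned long, auth_stat)
        ),

        TP_fast_assign(
            __entry->xid = be32_to_cpu(rqst->rq_xid);
            __entry->svc_status = auth_res;
            __entry->auth_stat = be32_to_cpu(rqst->rq_auth_stat);
        ),

        TP_printk("xid=0x%08x auth_res=%lu auth_stat=%lu",
                  __entry->xid, __entry->svc_status, __entry->auth_stat)
    );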
2022-12-10 | trace: Relocate event helper files | Chuck Lever | 7 | -687/+4
Steven Rostedt says:
> The include/trace/events/ directory should only hold files that
> are to create events, not headers that hold helper functions.
>
> Can you please move them out of include/trace/events/ as that
> directory is "special" in the creation of events.
Signed-off-by: Chuck Lever <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Acked-by: Anna Schumaker <[email protected]>
2022-12-09 | Merge tag 'v6.1-rc8' into rdma.git for-next | Jason Gunthorpe | 1 | -4/+4
For dependencies in following patches Signed-off-by: Jason Gunthorpe <[email protected]>
2022-12-08 | ext4: disable fast-commit of encrypted dir operations | Eric Biggers | 1 | -2/+5
fast-commit of create, link, and unlink operations in encrypted directories is completely broken because the unencrypted filenames are being written to the fast-commit journal instead of the encrypted filenames. These operations can't be replayed, as encryption keys aren't present at journal replay time. It is also an information leak. Until if/when we can get this working properly, make encrypted directory operations ineligible for fast-commit. Note that fast-commit operations on encrypted regular files continue to be allowed, as they seem to work. Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: <[email protected]> # v5.10+ Signed-off-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2022-12-08 | jbd2: use the correct print format | Bixuan Cui | 1 | -22/+22
The print format error was found when using ftrace event:
  <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
  <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
Use the correct print format for transaction, head and tid.
Fixes: 879c5e6b7cb4 ('jbd2: convert instrumentation from markers to tracepoints') Signed-off-by: Bixuan Cui <[email protected]> Reviewed-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
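The root cause is that tid_t is an unsigned int, so a %d conversion misrenders large transaction IDs as negative. The shape of the fix, sketched on an illustrative event (the real patch touches every jbd2 TP_printk):

    /* Before: signed conversion wraps large tids negative. */
    TP_printk("dev %d,%d transaction %d sync %d",
              MAJOR(__entry->dev), MINOR(__entry->dev),
              __entry->transaction, __entry->sync_commit)

    /* After: %u matches the unsigned tid_t type. */
    TP_printk("dev %d,%d transaction %u sync %d",
              MAJOR(__entry->dev), MINOR(__entry->dev),
              __entry->transaction, __entry->sync_commit)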
2022-12-08 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 1 | -0/+2
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07 | clk: Add trace events for rate requests | Maxime Ripard | 1 | -0/+43
It is currently fairly difficult to follow which clk_rate_requests are issued and how they have been modified once done. Indeed, there are multiple paths that can be taken, some functions are recursive and will just forward the request to their parent, etc. Adding a lot of debug prints is just not very convenient, so let's add trace events for the clock requests, one before they are submitted and one after they are returned. That way we can simply toggle the tracing on without modifying the kernel code and without affecting performance or the kernel logs too much. Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stephen Boyd <[email protected]>
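A hedged sketch of how a before/after pair of request events can be built from one event class; the event and field names are illustrative assumptions, not copied from the patch:

    /* Sketch: one event class, instantiated before and after the request. */
    DECLARE_EVENT_CLASS(clk_rate_request,
        TP_PROTO(struct clk_rate_request *req),
        TP_ARGS(req),

        TP_STRUCT__entry(
            __string(name, req->core ? req->core->name : "none")
            __field(unsigned long, min)
            __field(unsigned long, max)
            __field(unsigned long, prate)
        ),

        TP_fast_assign(
            __assign_str(name, req->core ? req->core->name : "none");
            __entry->min = req->min_rate;
            __entry->max = req->max_rate;
            __entry->prate = req->best_parent_rate;
        ),

        TP_printk("%s min %lu max %lu, parent rate %lu",
                  __get_str(name), __entry->min, __entry->max,
                  __entry->prate)
    );

    DEFINE_EVENT(clk_rate_request, clk_rate_request_start,
        TP_PROTO(struct clk_rate_request *req),
        TP_ARGS(req)
    );

    DEFINE_EVENT(clk_rate_request, clk_rate_request_done,
        TP_PROTO(struct clk_rate_request *req),
        TP_ARGS(req)
    );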
2022-12-07 | fscache: Fix oops due to race with cookie_lru and use_cookie | Dave Wysochanski | 1 | -0/+2
If a cookie expires from the LRU and the LRU_DISCARD flag is set, but the state machine has not run yet, it's possible another thread can call fscache_use_cookie and begin to use it. When the cookie_worker finally runs, it will see the LRU_DISCARD flag set and transition the cookie->state to LRU_DISCARDING, which will then withdraw the cookie. Once the cookie is withdrawn and the object is removed, the oops below occurs because the object associated with the cookie is now NULL. Fix the oops by clearing the LRU_DISCARD bit if another thread uses the cookie before the cookie_worker runs.

BUG: kernel NULL pointer dereference, address: 0000000000000008
...
CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G E 6.0.0-5.dneg.x86_64 #1
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
...
Call Trace:
 netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
 process_one_work+0x217/0x3e0
 worker_thread+0x4a/0x3b0
 kthread+0xd6/0x100

Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Reported-by: Daire Byrne <[email protected]> Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: David Howells <[email protected]> Tested-by: Daire Byrne <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ # v1 Link: https://lore.kernel.org/r/[email protected]/ # v2 Signed-off-by: Linus Torvalds <[email protected]>
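The gist of the fix, as a heavily hedged sketch: in fscache_use_cookie(), a pending deferred discard is cancelled before the worker can act on it. The flag and tracing-helper names below are assumptions about fscache internals, not a quote of the patch:

    /* Sketch (assumed names): a new user claims the cookie, so cancel
     * any LRU discard that was queued but not yet processed. */
    if (test_and_clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags))
        fscache_see_cookie(cookie, fscache_cookie_see_lru_discard_clear);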
2022-12-07 | fscache,cachefiles: add prepare_ondemand_read() callback | Jingbo Xu | 1 | -13/+14
Add prepare_ondemand_read() callback dedicated for the on-demand read scenario, so that callers from this scenario can be decoupled from netfs_io_subrequest. The original cachefiles_prepare_read() is now refactored to a generic routine accepting a parameter list instead of netfs_io_subrequest. There's no logic change, except that the debug id of subrequest and request is removed from trace_cachefiles_prep_read(). Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Jingbo Xu <[email protected]> Acked-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-12-06 | cxl/pci: Add some type-safety to the AER trace points | Dan Williams | 1 | -8/+8
The first argument to the CXL AER trace points is the source device. Pass a 'const struct device *' rather than a 'const char *' for more type precision / safety. Cc: Jonathan Cameron <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Steven Rostedt <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Link: https://lore.kernel.org/r/167030091477.4045167.15174636482098463885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <[email protected]>
2022-12-06 | pwm/tracing: Also record trace events for failed API calls | Uwe Kleine-König | 1 | -10/+10
Record and report an error code for the events. This allows reporting failed calls without ambiguity and so gives a more complete picture. Acked-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
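A sketch of what recording the result can look like for one event, with the state fields trimmed; the names are assumptions rather than the patch's exact layout:

    TRACE_EVENT(pwm_apply,
        TP_PROTO(struct pwm_device *pwm, const struct pwm_state *state,
                 int err),
        TP_ARGS(pwm, state, err),

        TP_STRUCT__entry(
            __field(unsigned int, hwpwm)
            __field(u64, period)
            __field(u64, duty_cycle)
            __field(int, err)
        ),

        TP_fast_assign(
            __entry->hwpwm = pwm->hwpwm;
            __entry->period = state->period;
            __entry->duty_cycle = state->duty_cycle;
            __entry->err = err;   /* the new field: API result, 0 or -errno */
        ),

        TP_printk("pwm-%u period=%llu duty_cycle=%llu err=%d",
                  __entry->hwpwm, __entry->period,
                  __entry->duty_cycle, __entry->err)
    );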
2022-12-05 | btrfs: switch extent_io_tree::private_data to btrfs_inode and rename | David Sterba | 1 | -15/+12
The extent_io_tree::private_data was meant to be a preparatory work for the metadata inode rework but that never materialized. Now it's used only for an inode so it's better to change the appropriate type and rename it. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-12-03 | cxl/pci: add tracepoint events for CXL RAS | Dave Jiang | 1 | -0/+112
Add tracepoint events for recording the CXL uncorrectable and correctable errors. For uncorrectable errors, there is additional data of 512B from the header log register (CXL spec rev3 8.2.4.16.7). The trace event takes in a dynamic array that dumps the entire Header Log data. If multiple errors are set in the status register, then the 'first error' field (CXL spec rev3 8.2.4.16.6) is read from the Error Capabilities and Control Register in order to determine the error. This implementation does not include CXL IDE Error details. Cc: Steven Rostedt <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Link: https://lore.kernel.org/r/166974413388.1608150.5875712482260436188.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dan Williams <[email protected]>
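A hedged sketch of how a trace event can carry the variable-length header log via __dynamic_array(); the event and field names are illustrative:

    /* Sketch: dump a variable-length header log with the event. */
    TRACE_EVENT(cxl_aer_uncorrectable_error,
        TP_PROTO(const struct device *dev, u32 status, u32 fe,
                 u32 *hl, u32 hl_len),
        TP_ARGS(dev, status, fe, hl, hl_len),

        TP_STRUCT__entry(
            __string(dev_name, dev_name(dev))
            __field(u32, status)
            __field(u32, first_error)
            /* reserves hl_len u32 slots in the ring-buffer record */
            __dynamic_array(u32, header_log, hl_len)
        ),

        TP_fast_assign(
            __assign_str(dev_name, dev_name(dev));
            __entry->status = status;
            __entry->first_error = fe;
            memcpy(__get_dynamic_array(header_log), hl,
                   hl_len * sizeof(u32));
        ),

        TP_printk("%s: status: '%u' first_error: '%u'",
                  __get_str(dev_name), __entry->status,
                  __entry->first_error)
    );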
2022-12-01 | ext4: split ext4_journal_start trace for debug | changfengnan | 1 | -11/+46
We might want to know in detail why the jbd2 thread is doing heavy IO, so split the ext4_journal_start trace into ext4_journal_start_sb and ext4_journal_start_inode, showing the inode number and handle type when possible. Signed-off-by: changfengnan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2022-12-01 | blk-iocost: Trace vtime_base_rate instead of vtime_rate | Kemeng Shi | 1 | -2/+2
Since commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation") the original vtime_rate is renamed to vtime_base_rate, and the current vtime_rate is the original vtime_rate with compensation. The rate shown in tracepoints is a mix of vtime_rate and vtime_base_rate:
1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows vtime_base_rate.
2) Function iocg_activate shows vtime_rate by calling TRACE_IOCG_PATH(iocg_activate...
3) Function ioc_check_iocgs shows vtime_rate by calling TRACE_IOCG_PATH(iocg_idle...
Trace vtime_base_rate instead of vtime_rate because:
1) Before commit ac33e91e2daca ("blk-iocost: implement vtime loss compensation"), the traced rate was without compensation, so keep showing the rate without compensation.
2) vtime_base_rate is more stable, while vtime_rate depends heavily on the excess budget in the current period, which may change abruptly in the next period.
Signed-off-by: Kemeng Shi <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-12-01 | rxrpc: Transmit ACKs at the point of generation | David Howells | 1 | -3/+0
For ACKs generated inside the I/O thread, transmit the ACK at the point of generation. Where the ACK is generated outside of the I/O thread, it's offloaded to the I/O thread to transmit it. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Trace/count transmission underflows and cwnd resets | David Howells | 1 | -0/+38
Add a tracepoint to log when a cwnd reset occurs due to lack of transmission on a call. Add stat counters to count transmission underflows (ie. when we have tx window space, but sendmsg doesn't manage to keep up), cwnd resets and transmission failures. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Make the I/O thread take over the call and local processor work | David Howells | 1 | -22/+20
Move the functions from the call->processor and local->processor work items into the domain of the I/O thread. The call event processor, now called from the I/O thread, then takes over the job of cranking the call state machine, processing incoming packets and transmitting DATA, ACK and ABORT packets. In a future patch, rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it for later transmission. The call event processor becomes purely received-skb driven. It only transmits things in response to events. We use "pokes" to queue a dummy skb to make it do things like start/resume transmitting data. Timer expiry also results in pokes. The connection event processor becomes similar, though crypto events, such as dealing with CHALLENGE and RESPONSE packets, are offloaded to a work item to avoid doing crypto in the I/O thread. The local event processor is removed and VERSION response packets are generated directly from the packet parser. Similarly, ABORTs generated in response to protocol errors will be transmitted immediately rather than being pushed onto a queue for later transmission.
Changes:
========
ver #2)
 - Fix a couple of introduced lock context imbalances.
Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Simplify skbuff accounting in receive path | David Howells | 1 | -1/+2
A received skbuff needs a ref when it gets put on a call data queue or conn packet queue, and rxrpc_input_packet() and co. jump through a lot of hoops to avoid double-dropping the skbuff ref so that we can avoid getting a ref when we queue the packet. Change this so that the skbuff ref is unconditionally dropped by the caller of rxrpc_input_packet(). An additional ref is then taken on the packet if it is pushed onto a queue. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Move DATA transmission into call processor work item | David Howells | 1 | -1/+5
Move DATA transmission into the call processor work item. In a future patch, this will be called from the I/O thread rather than being its own work item. This will allow DATA transmission to be driven directly by incoming ACKs, pokes and timers as those are processed. The Tx queue is also split: the queue of packets prepared by sendmsg is now placed in call->tx_sendmsg and the packet dispatcher decants the packets into call->tx_buffer as space becomes available in the transmission window. This allows sendmsg to run ahead of the available space to try and prevent an underflow in transmission. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Copy client call parameters into rxrpc_call earlier | David Howells | 1 | -0/+3
Copy client call parameters into rxrpc_call earlier so that the call can be used to convey them to the connection code - which can then be offloaded to the I/O thread. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Implement a mechanism to send an event notification to a call | David Howells | 1 | -0/+52
Provide a means by which an event notification can be sent to a call such that the I/O thread can process it rather than it being done in a separate workqueue. This will allow a lot of locking to be removed. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Don't hold a ref for connection workqueue | David Howells | 1 | -6/+5
Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
(1) Don't give a ref to the work item.
(2) Simplify handling of service connections by adding a separate active count so that the refcount isn't also used for this.
(3) Connection destruction for both client and service connections can then be cleaned up by putting rxrpc_put_connection() out of line and making a tidy progression through the destruction code (offloaded to a workqueue if put from softirq or processor function context). The RCU part of the cleanup then only deals with the freeing at the end.
(4) Make rxrpc_queue_conn() return immediately if it sees the active count is -1 rather than queuing the connection.
(5) Make sure that the cleanup routine waits for the work item to complete.
(6) Stash the rxrpc_net pointer in the conn struct so that the rcu free routine can use it, even if the local endpoint has been freed.
Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the connection work item is mostly going to go away with the main event work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Don't hold a ref for call timer or workqueue | David Howells | 1 | -5/+1
Currently, rxrpc gives the call timer a ref on the call when it starts it and this is passed along to the workqueue by the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
Fix this by:
(1) Don't give a ref to the timer.
(2) Making the expiration function not do anything if the refcount is 0. Note that this is more of an optimisation.
(3) Make sure that the cleanup routine waits for the timer to complete.
However, this has the consequence that the timer cannot give a ref to the work item. Therefore the following fixes are also necessary:
(4) Don't give a ref to the work item.
(5) Make the work item return asap if it sees the ref count is 0.
(6) Make sure that the cleanup routine waits for the work item to complete.
Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the call work item is going to go away with the work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: trace: Don't use __builtin_return_address for sk_buff tracing | David Howells | 1 | -24/+33
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the sk_buff tracepoint. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
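The general shape of that pattern, sketched with invented enum values: each get/put call site passes a 'why' value, and the tracepoint decodes it symbolically rather than saving a return address:

    /* Sketch: call sites pass an enum describing the point of
     * interest; the values here are invented for illustration. */
    enum rxrpc_skb_trace {
        rxrpc_skb_get_conn_work,
        rxrpc_skb_put_purge,
    };

    #define rxrpc_skb_traces \
        { rxrpc_skb_get_conn_work, "GET conn-work" }, \
        { rxrpc_skb_put_purge,     "PUT purge" }

    /* In the tracepoint, the enum renders as a readable string: */
    TP_printk("s=%08x %s ref=%d",
              __entry->skb_debug_id,
              __print_symbolic(__entry->why, rxrpc_skb_traces),
              __entry->ref)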
2022-12-01 | rxrpc: Trace rxrpc_bundle refcount | David Howells | 1 | -0/+34
Add a tracepoint for the rxrpc_bundle refcounting. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing | David Howells | 1 | -34/+49
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_call tracepoint. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing | David Howells | 1 | -21/+37
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn tracepoint. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing | David Howells | 1 | -17/+26
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer tracepoint. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing | David Howells | 1 | -13/+36
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_local tracepoint. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01 | rxrpc: Remove the [k_]proto() debugging macros | David Howells | 1 | -0/+60
Remove the kproto() and _proto() debugging macros in preference to using tracepoints for this. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-11-30 | mm: convert mm's rss stats into percpu_counter | Shakeel Butt | 1 | -4/+4
Currently mm_struct maintains rss_stats which are updated on page fault and the unmapping codepaths. For the page fault codepath the updates are cached per thread, with a batch threshold (TASK_RSS_EVENTS_THRESH) of 64. The reason for caching is performance for multithreaded applications; otherwise the rss_stats updates may become a hotspot for such applications. However this optimization comes at the cost of an error margin in the rss stats. The rss_stats for applications with a large number of threads can be very skewed. At worst the error margin is (nr_threads * 64), and we have a lot of applications with 100s of threads, so the error margin can be very high. Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32. Recently we started seeing unbounded errors in rss_stats for specific applications which use TCP rx zerocopy. It seems like the vm_insert_pages() codepath does not sync rss_stats at all. This patch converts the rss_stats into percpu_counter, reducing the error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2). This conversion also enables us to get accurate stats in situations where accuracy is more important than the cpu cost. This patch does not make such tradeoffs - we can just use percpu_counter_add_local() for the updates and percpu_counter_sum() (or percpu_counter_sync() + percpu_counter_read()) for the readers. At the moment the readers are the procfs interface, the oom killer and memory reclaim, which I think are not performance critical and should be OK with a slow read. However I think we can make that change in a separate patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Cc: Marek Szyprowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
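A sketch of the updater/reader split described above, assuming rss_stat becomes an array of struct percpu_counter in mm_struct (simplified; the real patch reworks many call sites):

    #include <linux/percpu_counter.h>

    /* Hot path: cheap, cpu-local update with no shared-cacheline bounce. */
    static inline void add_mm_counter(struct mm_struct *mm, int member,
                                      long value)
    {
        percpu_counter_add_local(&mm->rss_stat[member], value);
    }

    /* Slow readers (procfs, oom killer, reclaim): fold all cpus for an
     * accurate value. */
    static inline unsigned long get_mm_counter(struct mm_struct *mm,
                                               int member)
    {
        s64 val = percpu_counter_sum(&mm->rss_stat[member]);

        return val < 0 ? 0 : val;
    }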
2022-11-30 | Merge branch 'mm-hotfixes-stable' into mm-stable | Andrew Morton | 1 | -4/+4
2022-11-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 1 | -4/+4
tools/lib/bpf/ringbuf.c
  927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap")
  b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c")
  https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29 | mm: Add PG_arch_3 page flag | Peter Collingbourne | 1 | -0/+1
As with PG_arch_2, this flag is only allowed on 64-bit architectures due to the shortage of bits available. It will be used by the arm64 MTE code in subsequent patches. Signed-off-by: Peter Collingbourne <[email protected]> Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Steven Price <[email protected]> [[email protected]: added flag preserving in __split_huge_page_tail()] Signed-off-by: Catalin Marinas <[email protected]> Reviewed-by: Steven Price <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-29 | mm: Do not enable PG_arch_2 for all 64-bit architectures | Catalin Marinas | 1 | -4/+4
Commit 4beba9486abd ("mm: Add PG_arch_2 page flag") introduced a new page flag for all 64-bit architectures. However, even if an architecture is 64-bit, it may still have limited spare bits in the 'flags' member of 'struct page'. This may happen if an architecture enables SPARSEMEM without SPARSEMEM_VMEMMAP as is the case with the newly added loongarch. This architecture port needs 19 more bits for the sparsemem section information and, while it is currently fine with PG_arch_2, adding any more PG_arch_* flags will trigger build-time warnings. Add a new CONFIG_ARCH_USES_PG_ARCH_X option which can be selected by architectures that need more PG_arch_* flags beyond PG_arch_1. Select it on arm64. Signed-off-by: Catalin Marinas <[email protected]> [[email protected]: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <[email protected]> Reported-by: kernel test robot <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Steven Price <[email protected]> Reviewed-by: Steven Price <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-24 | net: use %pS for kfree_skb tracing event location | Stanislav Fomichev | 1 | -1/+1
For the cases where 'reason' doesn't give any clue, it's still nice to be able to track the kfree_skb caller location. %p doesn't help much so let's use %pS which prints the symbol+offset. Signed-off-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
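Reduced to the one field that changes, the difference is just the conversion specifier; with pointer hashing, %p output is effectively opaque, while %pS resolves the stored address to symbol+offset:

    /* Before: a (hashed) raw pointer, not much help. */
    TP_printk("location=%p", __entry->location)

    /* After: symbol+offset, e.g. location=unix_stream_connect+0x1a0/0x4f0 */
    TP_printk("location=%pS", __entry->location)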
2022-11-22 | mm/khugepaged: refactor mm_khugepaged_scan_file tracepoint to remove filename from function call | Gautam Menghani | 1 | -4/+4
Refactor the mm_khugepaged_scan_file tracepoint to move the filename dereference to the tracepoint definition, to maintain consistency with other tracepoints [1]. [1]: lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: d41fd2016ed07 ("mm/khugepaged: add tracepoint to hpage_collapse_scan_file()") Signed-off-by: Gautam Menghani <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Zach O'Keefe <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-21 | asm-generic/io: Add _RET_IP_ to MMIO trace for more accurate debug info | Sai Prakash Ranjan | 1 | -16/+27
Due to compiler optimizations like inlining, there are cases where MMIO traces using _THIS_IP_ for caller information might not be sufficient to provide accurate debug traces.

1) With optimizations (seen with GCC): _THIS_IP_ works fine and prints the caller information, since the accessor will be inlined into the caller and we get debug traces showing who made the MMIO access, for example:
  rwmmio_read: qcom_smmu_tlb_sync+0xe0/0x1b0 width=32 addr=0xffff8000087447f4
  rwmmio_post_read: qcom_smmu_tlb_sync+0xe0/0x1b0 width=32 val=0x0 addr=0xffff8000087447f4

2) Without optimizations (seen with Clang): _THIS_IP_ is not sufficient, as it prints only the MMIO accessor itself, which is of not much use since it is not inlined, for example:
  rwmmio_read: readl+0x4/0x80 width=32 addr=0xffff8000087447f4
  rwmmio_post_read: readl+0x48/0x80 width=32 val=0x4 addr=0xffff8000087447f4

So in order to handle this second case as well, irrespective of the compiler optimizations, add _RET_IP_ to the MMIO trace to make it provide more accurate debug information in all these scenarios.

Before:
  rwmmio_read: readl+0x4/0x80 width=32 addr=0xffff8000087447f4
  rwmmio_post_read: readl+0x48/0x80 width=32 val=0x4 addr=0xffff8000087447f4

After:
  rwmmio_read: qcom_smmu_tlb_sync+0xe0/0x1b0 -> readl+0x4/0x80 width=32 addr=0xffff8000087447f4
  rwmmio_post_read: qcom_smmu_tlb_sync+0xe0/0x1b0 -> readl+0x4/0x80 width=32 val=0x0 addr=0xffff8000087447f4

Fixes: 210031971cdd ("asm-generic/io: Add logging support for MMIO accessors") Signed-off-by: Sai Prakash Ranjan <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
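A simplified sketch of one accessor with both addresses passed through; the logging helpers' exact signatures after the patch are assumed here:

    /* Sketch: trace both the inlined PC and the caller's return
     * address so at least one of them is meaningful regardless of
     * whether the accessor got inlined. */
    static inline u32 readl(const volatile void __iomem *addr)
    {
        u32 val;

        log_read_mmio(32, addr, _THIS_IP_, _RET_IP_);
        __io_br();
        val = __le32_to_cpu((__le32 __force)__raw_readl(addr));
        __io_ar(val);
        log_post_read_mmio(val, 32, addr, _THIS_IP_, _RET_IP_);
        return val;
    }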
2022-11-21 | fs: dlm: add dst nodeid for msg tracing | Alexander Aring | 1 | -16/+22
In DLM, when we send a dlm message it is easy to add the lock resource name, but an additional lookup is required when tracing the receive message side. The idea here is to move the lookup work to the user, using a lookup to match the right send message with the received message. Note that DLM can't drop any message, which is guaranteed by a special session layer. For doing the lookup, a 3-tuple is required as a unique identification: dst nodeid, src nodeid and sequence number. This patch adds the destination nodeid to the dlm message trace points. The source nodeid is given by the h_nodeid field inside the header. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
2022-11-21 | fs: dlm: rename seq to h_seq for msg tracing | Alexander Aring | 1 | -22/+22
This patch renames seq to h_seq as it is named in the dlm header structure. Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David Teigland <[email protected]>
2022-11-11 | Merge branch 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Leon Romanovsky | 1 | -0/+66
Long Li says:
====================
Introduce Microsoft Azure Network Adapter (MANA) RDMA driver [netdev prep]
The first 11 patches which modify the MANA Ethernet driver to support RDMA driver.
====================

* 'mana-shared-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Define data structures for protection domain and memory registration
  net: mana: Define data structures for allocating doorbell page from GDMA
  net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
  net: mana: Define max values for SGL entries
  net: mana: Move header files to a common location
  net: mana: Record port number in netdev
  net: mana: Export Work Queue functions for use by RDMA driver
  net: mana: Set the DMA device max segment size
  net: mana: Handle vport sharing between devices
  net: mana: Record the physical address for doorbell page region
  net: mana: Add support for auxiliary device

Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Leon Romanovsky <[email protected]>
2022-11-10 | IB/mad: Don't call to function that might sleep while in atomic context | Leonid Ravich | 1 | -9/+4
Tracepoints are not allowed to sleep; as such, the following splat is generated due to a call to ib_query_pkey() in atomic context.

WARNING: CPU: 0 PID: 1888000 at kernel/trace/ring_buffer.c:2492 rb_commit+0xc1/0x220
CPU: 0 PID: 1888000 Comm: kworker/u9:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.3.1.el8.x86_64 #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module_el8.3.0+555+a55c8938 04/01/2014
Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
RIP: 0010:rb_commit+0xc1/0x220
RSP: 0000:ffffa8ac80f9bca0 EFLAGS: 00010202
RAX: ffff8951c7c01300 RBX: ffff8951c7c14a00 RCX: 0000000000000246
RDX: ffff8951c707c000 RSI: ffff8951c707c57c RDI: ffff8951c7c14a00
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8951c7c01300 R11: 0000000000000001 R12: 0000000000000246
R13: 0000000000000000 R14: ffffffff964c70c0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8951fbc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f20e8f39010 CR3: 000000002ca10005 CR4: 0000000000170ef0
Call Trace:
 ring_buffer_unlock_commit+0x1d/0xa0
 trace_buffer_unlock_commit_regs+0x3b/0x1b0
 trace_event_buffer_commit+0x67/0x1d0
 trace_event_raw_event_ib_mad_recv_done_handler+0x11c/0x160 [ib_core]
 ib_mad_recv_done+0x48b/0xc10 [ib_core]
 ? trace_event_raw_event_cq_poll+0x6f/0xb0 [ib_core]
 __ib_process_cq+0x91/0x1c0 [ib_core]
 ib_cq_poll_work+0x26/0x80 [ib_core]
 process_one_work+0x1a7/0x360
 ? create_worker+0x1a0/0x1a0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x116/0x130
 ? kthread_flush_work_fn+0x10/0x10
 ret_from_fork+0x35/0x40
---[ end trace 78ba8509d3830a16 ]---

Fixes: 821bf1de45a1 ("IB/MAD: Add recv path trace point") Signed-off-by: Leonid Ravich <[email protected]> Link: https://lore.kernel.org/r/Y2t5feomyznrVj7V@leonid-Inspiron-3421 Signed-off-by: Leon Romanovsky <[email protected]>
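The underlying rule is that TP_fast_assign() runs inside the trace ring-buffer reservation with preemption disabled, so nothing called from it may sleep. A hedged sketch of the usual cure: resolve the value with a non-sleeping lookup before the trace call and pass it in. ib_get_cached_pkey() is a real non-sleeping IB core helper, but whether this patch uses it or simply drops the field is not shown here, and the extra tracepoint parameter is an assumption:

    u16 pkey;

    /* ib_get_cached_pkey() reads a software cache under a spinlock and
     * is atomic-safe, unlike ib_query_pkey() which may sleep. */
    if (ib_get_cached_pkey(port_priv->device, port_priv->port_num,
                           pkey_index, &pkey))
        pkey = 0xffff; /* invalid-pkey marker on lookup failure */

    /* Assumed refactor: pkey passed in via TP_PROTO instead of being
     * looked up inside TP_fast_assign(). */
    trace_ib_mad_recv_done_handler(qp_info, wc, pkey);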
2022-11-08 | mm: vmalloc: add free_vmap_area_noflush trace event | Uladzislau Rezki (Sony) | 1 | -0/+34
This event is used in order to validate/debug the start address of a freed VA, and the number of currently outstanding and maximum allowed areas. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08 | mm: vmalloc: add purge_vmap_area_lazy trace event | Uladzislau Rezki (Sony) | 1 | -0/+33
It is for debug purposes, tracking the number of freed vmap areas and the range they occur in. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-08 | mm: vmalloc: add alloc_vmap_area trace event | Uladzislau Rezki (Sony) | 1 | -0/+56
Patch series "Add basic trace events for vmap/vmalloc (v2)", v2. This small series adds some basic trace events for the vmap/vmalloc code. Since currently we lack any, it is sometimes hard to start debugging vmap code when an issue is reported or occurs. For example https://lore.kernel.org/linux-mm/Y0p8BZIiDXLQbde%2F@pc636/T/ The final patch adds two reviewers for vmalloc code. This patch (of 7): It is for debug purposes and for validation of passed parameters. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleksiy Avramchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
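A sketch of an allocation event consistent with that description: it records what was asked for (size, alignment, search range) and the outcome (address or failure). The exact field names are assumptions:

    TRACE_EVENT(alloc_vmap_area,
        TP_PROTO(unsigned long addr, unsigned long size,
                 unsigned long align, unsigned long vstart,
                 unsigned long vend, int failed),
        TP_ARGS(addr, size, align, vstart, vend, failed),

        TP_STRUCT__entry(
            __field(unsigned long, addr)
            __field(unsigned long, size)
            __field(unsigned long, align)
            __field(unsigned long, vstart)
            __field(unsigned long, vend)
            __field(int, failed)
        ),

        TP_fast_assign(
            __entry->addr = addr;
            __entry->size = size;
            __entry->align = align;
            __entry->vstart = vstart;
            __entry->vend = vend;
            __entry->failed = failed;
        ),

        TP_printk("va_start=0x%lx size=%lu align=%lu vstart=0x%lx vend=0x%lx failed=%d",
                  __entry->addr, __entry->size, __entry->align,
                  __entry->vstart, __entry->vend, __entry->failed)
    );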