aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-07-10xtensa: ISS: add comment about etherdev freeingMax Filippov1-0/+1
iss_net_configure explicitly frees etherdev in all error return paths except one where register_netdevice fails. In that remaining error return path the etherdev is freed by the iss_net_pdev_release callback triggered by the platform_device_unregister call. Add a comment stating that. Signed-off-by: Max Filippov <[email protected]>
2023-07-10jbd2: remove __journal_try_to_free_buffer()Zhang Yi1-24/+7
__journal_try_to_free_buffer() has only one caller and it's logic is much simple now, so just remove it and open code in jbd2_journal_try_to_free_buffers(). Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10jbd2: fix a race when checking checkpoint buffer busyZhang Yi3-15/+41
Before removing checkpoint buffer from the t_checkpoint_list, we have to check both BH_Dirty and BH_Lock bits together to distinguish buffers have not been or were being written back. But __cp_buffer_busy() checks them separately, it first check lock state and then check dirty, the window between these two checks could be raced by writing back procedure, which locks buffer and clears buffer dirty before I/O completes. So it cannot guarantee checkpointing buffers been written back to disk if some error happens later. Finally, it may clean checkpoint transactions and lead to inconsistent filesystem. jbd2_journal_forget() and __journal_try_to_free_buffer() also have the same problem (journal_unmap_buffer() escape from this issue since it's running under the buffer lock), so fix them through introducing a new helper to try holding the buffer lock and remove really clean buffer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Cc: [email protected] Suggested-by: Jan Kara <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10jbd2: Fix wrongly judgement for buffer head removing while doing checkpointZhihao Cheng1-15/+17
Following process, jbd2_journal_commit_transaction // there are several dirty buffer heads in transaction->t_checkpoint_list P1 wb_workfn jbd2_log_do_checkpoint if (buffer_locked(bh)) // false __block_write_full_page trylock_buffer(bh) test_clear_buffer_dirty(bh) if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) if (buffer_write_io_error(bh)) // false >> bh IO error occurs << jbd2_cleanup_journal_tail __jbd2_update_log_tail jbd2_write_superblock // The bh won't be replayed in next mount. , which could corrupt the ext4 image, fetch a reproducer in [Link]. Since writeback process clears buffer dirty after locking buffer head, we can fix it by try locking buffer and check dirtiness while buffer is locked, the buffer head can be removed if it is neither dirty nor locked. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10jbd2: remove journal_clean_one_cp_list()Zhang Yi2-66/+21
journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost the same, so merge them into journal_shrink_one_cp_list(), remove the nr_to_scan parameter, always scan and try to free the whole checkpoint list. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10jbd2: remove t_checkpoint_io_listZhang Yi3-48/+3
Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint() now, it's time to remove the whole t_checkpoint_io_list logic. Signed-off-by: Zhang Yi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10jbd2: recheck chechpointing non-dirty bufferZhang Yi1-73/+29
There is a long-standing metadata corruption issue that happens from time to time, but it's very difficult to reproduce and analyse, benefit from the JBD2_CYCLE_RECORD option, we found out that the problem is the checkpointing process miss to write out some buffers which are raced by another do_get_write_access(). Looks below for detail. jbd2_log_do_checkpoint() //transaction X //buffer A is dirty and not belones to any transaction __buffer_relink_io() //move it to the IO list __flush_batch() write_dirty_buffer() do_get_write_access() clear_buffer_dirty __jbd2_journal_file_buffer() //add buffer A to a new transaction Y lock_buffer(bh) //doesn't write out __jbd2_journal_remove_checkpoint() //finish checkpoint except buffer A //filesystem corrupt if the new transaction Y isn't fully write out. Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint() have already handles waiting for buffers under IO and re-added new transaction to complete commit, and it also removing cleaned buffers, this makes sure the list will eventually get empty. So it's fine to leave buffers on the t_checkpoint_list while flushing out and completely stop using the t_checkpoint_io_list. Cc: [email protected] Suggested-by: Jan Kara <[email protected]> Signed-off-by: Zhang Yi <[email protected]> Tested-by: Zhihao Cheng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-07-10tracing/user_events: Fix struct arg size match checkBeau Belgrave1-0/+3
When users register an event the name of the event and it's argument are checked to ensure they match if the event already exists. Normally all arguments are in the form of "type name", except for when the type starts with "struct ". In those cases, the size of the struct is passed in addition to the name, IE: "struct my_struct a 20" for an argument that is of type "struct my_struct" with a field name of "a" and has the size of 20 bytes. The current code does not honor the above case properly when comparing a match. This causes the event register to fail even when the same string was used for events that contain a struct argument within them. The example above "struct my_struct a 20" generates a match string of "struct my_struct a" omitting the size field. Add the struct size of the existing field when generating a comparison string for a struct field to ensure proper match checking. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: e6f89a149872 ("tracing/user_events: Ensure user provided strings are safely formatted") Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-07-10x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()YueHaibing1-1/+0
This is now unused, so can remove it. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: YueHaibing <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-07-11fprobe: Ensure running fprobe_exit_handler() finished before calling ↵Masami Hiramatsu (Google)3-0/+17
rethook_free() Ensure running fprobe_exit_handler() has finished before calling rethook_free() in the unregister_fprobe() so that caller can free the fprobe right after unregister_fprobe(). unregister_fprobe() ensured that all running fprobe_entry/exit_handler() have finished by calling unregister_ftrace_function() which synchronizes RCU. But commit 5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered") changed to call rethook_free() after unregister_ftrace_function(). So call rethook_stop() to make rethook disabled before unregister_ftrace_function() and ensure it again. Here is the possible code flow that can call the exit handler after unregister_fprobe(). ------ CPU1 CPU2 call unregister_fprobe(fp) ... __fprobe_handler() rethook_hook() on probed function unregister_ftrace_function() return from probed function rethook hooks find rh->handler == fprobe_exit_handler call fprobe_exit_handler() rethook_free(): set rh->handler = NULL; return from unreigster_fprobe; call fp->exit_handler() <- (*) ------ (*) At this point, the exit handler is called after returning from unregister_fprobe(). This fixes it as following; ------ CPU1 CPU2 call unregister_fprobe() ... rethook_stop(): set rh->handler = NULL; __fprobe_handler() rethook_hook() on probed function unregister_ftrace_function() return from probed function rethook hooks find rh->handler == NULL return from rethook rethook_free() return from unreigster_fprobe; ------ Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/ Fixes: 5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered") Cc: [email protected] Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]>
2023-07-10arm64: ftrace: Add direct call trampoline samples supportFlorent Revest6-0/+151
The ftrace samples need per-architecture trampoline implementations to save and restore argument registers around the calls to my_direct_func* and to restore polluted registers (eg: x30). These samples also include <asm/asm-offsets.h> which, on arm64, is not necessary and redefines previously defined macros (resulting in warnings) so these includes are guarded by !CONFIG_ARM64. Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-07-10samples: ftrace: Save required argument registers in sample trampolinesFlorent Revest1-6/+8
The ftrace-direct-too sample traces the handle_mm_fault function whose signature changed since the introduction of the sample. Since: commit bce617edecad ("mm: do page fault accounting in handle_mm_fault") handle_mm_fault now has 4 arguments. Therefore, the sample trampoline should save 4 argument registers. s390 saves all argument registers already so it does not need a change but x86_64 needs an extra push and pop. This also evolves the signature of the tracing function to make it mirror the signature of the traced function. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: bce617edecad ("mm: do page fault accounting in handle_mm_fault") Reviewed-by: Steven Rostedt (Google) <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-07-10openrisc: Union fpcsr and oldmask in sigcontext to unbreak userspace ABIStafford Horne2-4/+6
With commit 27267655c531 ("openrisc: Support floating point user api") I added an entry to the struct sigcontext which caused an unwanted change to the userspace ABI. To fix this we use the previously unused oldmask field space for the floating point fpcsr state. We do this with a union to restore the ABI back to the pre kernel v6.4 ABI and keep API compatibility. This does mean if there is some code somewhere that is setting oldmask in an OpenRISC specific userspace sighandler it would end up setting the floating point register status, but I think it's unlikely as oldmask was never functional before. Fixes: 27267655c531 ("openrisc: Support floating point user api") Reported-by: Szabolcs Nagy <[email protected]> Closes: https://lore.kernel.org/openrisc/[email protected]/ Signed-off-by: Stafford Horne <[email protected]>
2023-07-10Merge tag 'linux-watchdog-6.5-rc2' of ↵Linus Torvalds1-0/+42
git://www.linux-watchdog.org/linux-watchdog Pull watchdog update from Wim Van Sebroeck: - Add Loongson-1 watchdog dt-bindings * tag 'linux-watchdog-6.5-rc2' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: Add Loongson-1 watchdog
2023-07-10Merge tag 'v6.5-p2' of ↵Linus Torvalds3-9/+22
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix a couple of regressions in af_alg and incorrect return values in crypto/asymmetric_keys/public_key" * tag 'v6.5-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_hash - Fix race between MORE and non-MORE sends KEYS: asymmetric: Fix error codes crypto: af_alg - Fix merging of written data into spliced pages
2023-07-10block: remove dead struc request->completion_data fieldJens Axboe1-3/+3
It's no longer used. While in there, also update the comment as to why it can coexist with the rb_node. Signed-off-by: Jens Axboe <[email protected]>
2023-07-10nvme: fix the NVME_ID_NS_NVM_STS_MASK definitionAnkit Kumar1-1/+1
As per NVMe command set specification 1.0c Storage tag size is 7 bits. Fixes: 4020aad85c67 ("nvme: add support for enhanced metadata") Signed-off-by: Ankit Kumar <[email protected]> Reviewed-by: Kanchan Joshi <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-07-10igc: Fix inserting of empty frame for launchtimeFlorian Kauer1-1/+1
The insertion of an empty frame was introduced with commit db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") in order to ensure that the current cycle has at least one packet if there is some packet to be scheduled for the next cycle. However, the current implementation does not properly check if a packet is already scheduled for the current cycle. Currently, an empty packet is always inserted if and only if txtime >= end_of_cycle && txtime > last_tx_cycle but since last_tx_cycle is always either the end of the current cycle (end_of_cycle) or the end of a previous cycle, the second part (txtime > last_tx_cycle) is always true unless txtime == last_tx_cycle. What actually needs to be checked here is if the last_tx_cycle was already written within the current cycle, so an empty frame should only be inserted if and only if txtime >= end_of_cycle && end_of_cycle > last_tx_cycle. This patch does not only avoid an unnecessary insertion, but it can actually be harmful to insert an empty packet if packets are already scheduled in the current cycle, because it can lead to a situation where the empty packet is actually processed as the first packet in the upcoming cycle shifting the packet with the first_flag even one cycle into the future, finally leading to a TX hang. The TX hang can be reproduced on a i225 with: sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header */ 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), /* IPv4 Header */ 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum /* UDP Header */ 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum /* Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <32> TDT <3c> next_to_use <3c> next_to_clean <32> buffer_info[next_to_clean] time_stamp <ffff26a8> next_to_watch <00000000632a1828> jiffies <ffff27f8> desc.status <1048000> Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-10igc: Fix launchtime before start of cycleFlorian Kauer1-1/+1
It is possible (verified on a running system) that frames are processed by igc_tx_launchtime with a txtime before the start of the cycle (baset_est). However, the result of txtime - baset_est is written into a u32, leading to a wrap around to a positive number. The following launchtime > 0 check will only branch to executing launchtime = 0 if launchtime is already 0. Fix it by using a s32 before checking launchtime > 0. Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-10igc: No strict mode in pure launchtime/CBS offloadFlorian Kauer1-2/+22
The flags IGC_TXQCTL_STRICT_CYCLE and IGC_TXQCTL_STRICT_END prevent the packet transmission over slot and cycle boundaries. This is important for taprio offload where the slots and cycles correspond to the slots and cycles configured for the network. However, the Qbv offload feature of the i225 is also used for enabling TX launchtime / ETF offload. In that case, however, the cycle has no meaning for the network and is only used internally to adapt the base time register after a second has passed. Enabling strict mode in this case would unnecessarily prevent the transmission of certain packets (i.e. at the boundary of a second) and thus interferes with the ETF qdisc that promises transmission at a certain point in time. Similar to ETF, this also applies to CBS offload that also should not be influenced by strict mode unless taprio offload would be enabled at the same time. This fully reverts commit d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling") but its commit message only describes what was already implemented before that commit. The difference to a plain revert of that commit is that it now copes with the base_time = 0 case that was fixed with commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") In particular, enabling strict mode leads to TX hang situations under high traffic if taprio is applied WITHOUT taprio offload but WITH ETF offload, e.g. as in sudo tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x1 \ txtime-delay 500000 \ clockid CLOCK_TAI sudo tc qdisc replace dev enp1s0 parent 100:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload \ skip_sock_check and traffic generator sudo trafgen -i traffic.cfg -o enp1s0 --cpp -n0 -q -t1400ns with traffic.cfg #define ETH_P_IP 0x0800 { /* Ethernet Header */ 0x30, 0x1f, 0x9a, 0xd0, 0xf0, 0x0e, # MAC Dest - adapt as needed 0x24, 0x5e, 0xbe, 0x57, 0x2e, 0x36, # MAC Src - adapt as needed const16(ETH_P_IP), /* IPv4 Header */ 0b01000101, 0, # IPv4 version, IHL, TOS const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header)) const16(2), # IPv4 ident 0b01000000, 0, # IPv4 flags, fragmentation off 64, # IPv4 TTL 17, # Protocol UDP csumip(14, 33), # IPv4 checksum /* UDP Header */ 10, 0, 48, 1, # IP Src - adapt as needed 10, 0, 48, 10, # IP Dest - adapt as needed const16(5555), # UDP Src Port const16(6666), # UDP Dest Port const16(1008), # UDP length (UDP header 8 bytes + payload length) csumudp(14, 34), # UDP checksum /* Payload */ fill('W', 1000), } and the observed message with that is for example igc 0000:01:00.0 enp1s0: Detected Tx Unit Hang Tx Queue <0> TDH <d0> TDT <f0> next_to_use <f0> next_to_clean <d0> buffer_info[next_to_clean] time_stamp <ffff661f> next_to_watch <00000000245a4efb> jiffies <ffff6e48> desc.status <1048000> Fixes: d8f45be01dd9 ("igc: Use strict cycles for Qbv scheduling") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-11kernel: kprobes: Remove unnecessary ‘0’ valuesLi zeming1-3/+3
it is assigned first, so it does not need to initialize the assignment. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Li zeming <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-07-11kprobes: Remove unnecessary ‘NULL’ values from correct_ret_addrLi zeming1-1/+1
The 'correct_ret_addr' pointer is always set in the later code, no need to initialize it at definition time. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Li zeming <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-07-11fprobe: add unlock to match a succeeded ftrace_test_recursion_trylockZe Gao1-1/+3
Unlock ftrace recursion lock when fprobe_kprobe_handler() is failed because of some running kprobe. Link: https://lore.kernel.org/all/[email protected]/ Fixes: 3cc4e2c5fbae ("fprobe: make fprobe_kprobe_handler recursion free") Reported-by: Yafang <[email protected]> Closes: https://lore.kernel.org/linux-trace-kernel/CALOAHbC6UpfFOOibdDiC7xFc5YFUgZnk3MZ=3Ny6we=AcrNbew@mail.gmail.com/ Signed-off-by: Ze Gao <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Yafang Shao <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-07-10nvmet: use PAGE_SECTORS_SHIFTDamien Le Moal2-3/+3
Replace occurences of the pattern "PAGE_SHIFT - 9" in the passthru and loop targets with PAGE_SECTORS_SHIFT. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-07-11kernel/trace: Fix cleanup logic of enable_trace_eprobeTzvetomir Stoyanov (VMware)1-2/+16
The enable_trace_eprobe() function enables all event probes, attached to given trace probe. If an error occurs in enabling one of the event probes, all others should be roll backed. There is a bug in that roll back logic - instead of all event probes, only the failed one is disabled. Link: https://lore.kernel.org/all/[email protected]/ Reported-by: Dan Carpenter <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Tzvetomir Stoyanov (VMware) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-07-10nvme: add BOGUS_NID quirk for Samsung SM953Pankaj Raghav1-0/+2
Add the quirk as SM953 is reporting bogus namespace ID. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217593 Reported-by: Clemens Springsguth <[email protected]> Tested-by: Clemens Springsguth <[email protected]> Signed-off-by: Pankaj Raghav <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-07-10igc: Handle already enabled taprio offload for basetime 0Florian Kauer1-1/+1
Since commit e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") it is possible to enable taprio offload with a basetime of 0. However, the check if taprio offload is already enabled (and thus -EALREADY should be returned for igc_save_qbv_schedule) still relied on adapter->base_time > 0. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and base-time = 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 # The second call should fail with "Error: Device failed to setup taprio offload." # But that only happens if base-time was != 0 sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time 0 \ sched-entry S 01 300000 \ flags 0x2 Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-10igc: Do not enable taprio offload for invalid argumentsFlorian Kauer1-12/+6
Only set adapter->taprio_offload_enable after validating the arguments. Otherwise, it stays set even if the offload was not enabled. Since the subsequent code does not get executed in case of invalid arguments, it will not be read at first. However, by activating and then deactivating another offload (e.g. ETF/TX launchtime offload), taprio_offload_enable is read and erroneously keeps the offload feature of the NIC enabled. This can be reproduced as follows: # TAPRIO offload (flags == 0x2) and negative base-time leading to expected -ERANGE sudo tc qdisc replace dev enp1s0 parent root handle 100 stab overhead 24 taprio \ num_tc 1 \ map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \ queues 1@0 \ base-time -1000 \ sched-entry S 01 300000 \ flags 0x2 # IGC_TQAVCTRL is 0x0 as expected (iomem=relaxed for reading register) sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 # Activate ETF offload sudo tc qdisc replace dev enp1s0 parent root handle 6666 mqprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 \ hw 0 sudo tc qdisc add dev enp1s0 parent 6666:1 etf \ clockid CLOCK_TAI \ delta 500000 \ offload # IGC_TQAVCTRL is 0x9 as expected sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 # Deactivate ETF offload again sudo tc qdisc delete dev enp1s0 parent 6666:1 # IGC_TQAVCTRL should now be 0x0 again, but is observed as 0x9 sudo pcimem /sys/bus/pci/devices/0000:01:00.0/resource0 0x3570 w*1 Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-10igc: Rename qbv_enable to taprio_offload_enableFlorian Kauer3-5/+5
In the current implementation the flags adapter->qbv_enable and IGC_FLAG_TSN_QBV_ENABLED have a similar name, but do not have the same meaning. The first one is used only to indicate taprio offload (i.e. when igc_save_qbv_schedule was called), while the second one corresponds to the Qbv mode of the hardware. However, the second one is also used to support the TX launchtime feature, i.e. ETF qdisc offload. This leads to situations where adapter->qbv_enable is false, but the flag IGC_FLAG_TSN_QBV_ENABLED is set. This is prone to confusion. The rename should reduce this confusion. Since it is a pure rename, it has no impact on functionality. Fixes: e17090eb2494 ("igc: allow BaseTime 0 enrollment for Qbv") Signed-off-by: Florian Kauer <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Tested-by: Naama Meir <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2023-07-10cifs: if deferred close is disabled then close files immediatelyBharath SM1-2/+2
If defer close timeout value is set to 0, then there is no need to include files in the deferred close list and utilize the delayed worker for closing. Instead, we can close them immediately. Signed-off-by: Bharath SM <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]> Cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-07-10of: make OF_EARLY_FLATTREE depend on HAS_IOMEMBaoquan He1-1/+1
On s390 systems (aka mainframes), it has classic channel devices for networking and permanent storage that are currently even more common than PCI devices. Hence it could have a fully functional s390 kernel with CONFIG_PCI=n, then the relevant iomem mapping functions [including ioremap(), devm_ioremap(), etc.] are not available. In LKP error report at below on s390: ------ ld: kernel/dma/coherent.o: in function `dma_init_coherent_memory': coherent.c:(.text+0x102): undefined reference to `memremap' ld: coherent.c:(.text+0x226): undefined reference to `memunmap' ld: kernel/dma/coherent.o: in function `dma_declare_coherent_memory': coherent.c:(.text+0x8b8): undefined reference to `memunmap' ld: kernel/dma/coherent.o: in function `dma_release_coherent_memory': coherent.c:(.text+0x9aa): undefined reference to `memunmap' ------ In the config file, several Kconfig options are: ------ '# CONFIG_PCI is not set' CONFIG_OF_EARLY_FLATTREE=y CONFIG_DMA_DECLARE_COHERENT=y ------ So, enabling OF_EARLY_FLATTREE will select DMA_DECLARE_COHERENT and cause above building errors even though they are not needed because CONFIG_PCI is disabled. Here let OF_EARLY_FLATTREE depend on HAS_IOMEM so that it won't be built to cause compiling error if PCI is unset. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Baoquan He <[email protected]> Cc: Rob Herring <[email protected]> Cc: Frank Rowand <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-07-10platform/x86: int3472/discrete: set variable ↵Tom Rix1-1/+1
skl_int3472_regulator_second_sensor storage-class-specifier to static smatch reports drivers/platform/x86/intel/int3472/clk_and_regulator.c:263:28: warning: symbol 'skl_int3472_regulator_second_sensor' was not declared. Should it be static? This variable is only used in its defining file, so it should be static. Signed-off-by: Tom Rix <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2023-07-10platform/x86/intel/tpmi: Prevent overflow for cap_offsetSrinivas Pandruvada1-3/+1
cap_offset is a u16 field, so multiplying with TPMI_CAP_OFFSET_UNIT (which is equal to 1024) to covert to bytes will cause overflow. This will be a problem once more TPMI features are added. This field is not used except for calculating pfs->vsec_offset. So, leave cap_offset field unchanged and multiply with TPMI_CAP_OFFSET_UNIT while calculating pfs->vsec_offset. Signed-off-by: Srinivas Pandruvada <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2023-07-10platform/x86: wmi: Replace open coded guid_parse_and_compare()Andy Shevchenko1-5/+1
Even though we have no issues in the code, let's replace the open coded guid_parse_and_compare(). Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Armin Wolf <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2023-07-10platform/x86: wmi: Break possible infinite loop when parsing GUIDAndy Shevchenko1-10/+12
The while-loop may break on one of the two conditions, either ID string is empty or GUID matches. The second one, may never be reached if the parsed string is not correct GUID. In such a case the loop will never advance to check the next ID. Break possible infinite loop by factoring out guid_parse_and_compare() helper which may be moved to the generic header for everyone later on and preventing from similar mistake in the future. Interestingly that firstly it appeared when WMI was turned into a bus driver, but later when duplicated GUIDs were checked, the while-loop has been replaced by for-loop and hence no mistake made again. Fixes: a48e23385fcf ("platform/x86: wmi: add context pointer field to struct wmi_device_id") Fixes: 844af950da94 ("platform/x86: wmi: Turn WMI into a bus driver") Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Armin Wolf <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2023-07-10drm/ttm: never consider pinned BOs for eviction&swapChristian König1-0/+6
There is a small window where we have already incremented the pin count but not yet moved the bo from the lru to the pinned list. Signed-off-by: Christian König <[email protected]> Reported-by: Pelloux-Prayer, Pierre-Eric <[email protected]> Tested-by: Pelloux-Prayer, Pierre-Eric <[email protected]> Acked-by: Alex Deucher <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-07-10spi: bcm63xx: fix max prepend lengthJonas Gorski1-1/+1
The command word is defined as following: /* Command */ #define SPI_CMD_COMMAND_SHIFT 0 #define SPI_CMD_DEVICE_ID_SHIFT 4 #define SPI_CMD_PREPEND_BYTE_CNT_SHIFT 8 #define SPI_CMD_ONE_BYTE_SHIFT 11 #define SPI_CMD_ONE_WIRE_SHIFT 12 If the prepend byte count field starts at bit 8, and the next defined bit is SPI_CMD_ONE_BYTE at bit 11, it can be at most 3 bits wide, and thus the max value is 7, not 15. Fixes: b17de076062a ("spi/bcm63xx: work around inability to keep CS up") Signed-off-by: Jonas Gorski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-07-10MAINTAINERS: Add myself as a maintainer for Microchip SPIRyan Wanner1-1/+1
Tudor is not with Microchip anymore. I have worked lately with Microchip SPI drivers replacing Tudor with myself as this maintainer. Signed-off-by: Ryan Wanner <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Acked-by: Tudor Ambarus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-07-10pinctrl: renesas: rzg2l: Handle non-unique subnode namesBiju Das1-8/+20
Currently, sd1 and sd0 have unique subnode names 'sd1_mux' and 'sd0_mux'. If we change these to non-unique subnode names such as 'mux' this can lead to the below conflict as the RZ/G2L pin control driver considers only the names of the subnodes. pinctrl-rzg2l 11030000.pinctrl: pin P47_0 already requested by 11c00000.mmc; cannot claim for 11c10000.mmc pinctrl-rzg2l 11030000.pinctrl: pin-376 (11c10000.mmc) status -22 pinctrl-rzg2l 11030000.pinctrl: could not request pin 376 (P47_0) from group mux on device pinctrl-rzg2l renesas_sdhi_internal_dmac 11c10000.mmc: Error applying setting, reverse things back Fix this by constructing unique names from the node names of both the pin control configuration node and its child node, where appropriate. Based on the work done by Geert for the RZ/V2M pinctrl driver. Fixes: c4c4637eb57f ("pinctrl: renesas: Add RZ/G2L pin and gpio controller driver") Signed-off-by: Biju Das <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Geert Uytterhoeven <[email protected]>
2023-07-10pinctrl: renesas: rzv2m: Handle non-unique subnode namesGeert Uytterhoeven1-8/+20
The eMMC and SDHI pin control configuration nodes in DT have subnodes with the same names ("data" and "ctrl"). As the RZ/V2M pin control driver considers only the names of the subnodes, this leads to conflicts: pinctrl-rzv2m b6250000.pinctrl: pin P8_2 already requested by 85000000.mmc; cannot claim for 85020000.mmc pinctrl-rzv2m b6250000.pinctrl: pin-130 (85020000.mmc) status -22 renesas_sdhi_internal_dmac 85020000.mmc: Error applying setting, reverse things back Fix this by constructing unique names from the node names of both the pin control configuration node and its child node, where appropriate. Reported by: Fabrizio Castro <[email protected]> Fixes: 92a9b825257614af ("pinctrl: renesas: Add RZ/V2M pin and gpio controller driver") Signed-off-by: Geert Uytterhoeven <[email protected]> Tested-by: Fabrizio Castro <[email protected]> Link: https://lore.kernel.org/r/607bd6ab4905b0b1b119a06ef953fa1184505777.1688396717.git.geert+renesas@glider.be
2023-07-10HID: amd_sfh: Fix for shift-out-of-boundsBasavaraj Natikar1-2/+18
Shift operation of 'exp' and 'shift' variables exceeds the maximum number of shift values in the u32 range leading to UBSAN shift-out-of-bounds. ... [ 6.120512] UBSAN: shift-out-of-bounds in drivers/hid/amd-sfh-hid/sfh1_1/amd_sfh_desc.c:149:50 [ 6.120598] shift exponent 104 is too large for 64-bit type 'long unsigned int' [ 6.120659] CPU: 4 PID: 96 Comm: kworker/4:1 Not tainted 6.4.0amd_1-next-20230519-dirty #10 [ 6.120665] Hardware name: AMD Birman-PHX/Birman-PHX, BIOS SFH_with_HPD_SEN.FD 04/05/2023 [ 6.120667] Workqueue: events amd_sfh_work_buffer [amd_sfh] [ 6.120687] Call Trace: [ 6.120690] <TASK> [ 6.120694] dump_stack_lvl+0x48/0x70 [ 6.120704] dump_stack+0x10/0x20 [ 6.120707] ubsan_epilogue+0x9/0x40 [ 6.120716] __ubsan_handle_shift_out_of_bounds+0x10f/0x170 [ 6.120720] ? psi_group_change+0x25f/0x4b0 [ 6.120729] float_to_int.cold+0x18/0xba [amd_sfh] [ 6.120739] get_input_rep+0x57/0x340 [amd_sfh] [ 6.120748] ? __schedule+0xba7/0x1b60 [ 6.120756] ? __pfx_get_input_rep+0x10/0x10 [amd_sfh] [ 6.120764] amd_sfh_work_buffer+0x91/0x180 [amd_sfh] [ 6.120772] process_one_work+0x229/0x430 [ 6.120780] worker_thread+0x4a/0x3c0 [ 6.120784] ? __pfx_worker_thread+0x10/0x10 [ 6.120788] kthread+0xf7/0x130 [ 6.120792] ? __pfx_kthread+0x10/0x10 [ 6.120795] ret_from_fork+0x29/0x50 [ 6.120804] </TASK> ... Fix this by adding the condition to validate shift ranges. Fixes: 93ce5e0231d7 ("HID: amd_sfh: Implement SFH1.1 functionality") Cc: [email protected] Tested-by: Kai-Heng Feng <[email protected]> Signed-off-by: Basavaraj Natikar <[email protected]> Signed-off-by: Akshata MukundShetty <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-07-10HID: amd_sfh: Rename the float32 variableBasavaraj Natikar1-6/+6
As float32 is also used in other places as a data type, it is necessary to rename the float32 variable in order to avoid confusion. Cc: [email protected] Tested-by: Kai-Heng Feng <[email protected]> Signed-off-by: Basavaraj Natikar <[email protected]> Signed-off-by: Akshata MukundShetty <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Benjamin Tissoires <[email protected]>
2023-07-10sched/psi: use kernfs polling functions for PSI trigger pollingSuren Baghdasaryan4-11/+28
Destroying psi trigger in cgroup_file_release causes UAF issues when a cgroup is removed from under a polling process. This is happening because cgroup removal causes a call to cgroup_file_release while the actual file is still alive. Destroying the trigger at this point would also destroy its waitqueue head and if there is still a polling process on that file accessing the waitqueue, it will step on the freed pointer: do_select vfs_poll do_rmdir cgroup_rmdir kernfs_drain_open_files cgroup_file_release cgroup_pressure_release psi_trigger_destroy wake_up_pollfree(&t->event_wait) // vfs_poll is unblocked synchronize_rcu kfree(t) poll_freewait -> UAF access to the trigger's waitqueue head Patch [1] fixed this issue for epoll() case using wake_up_pollfree(), however the same issue exists for synchronous poll() case. The root cause of this issue is that the lifecycles of the psi trigger's waitqueue and of the file associated with the trigger are different. Fix this by using kernfs_generic_poll function when polling on cgroup-specific psi triggers. It internally uses kernfs_open_node->poll waitqueue head with its lifecycle tied to the file's lifecycle. This also renders the fix in [1] obsolete, so revert it. [1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()") Fixes: 0e94682b73bf ("psi: introduce psi monitor") Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Lu Jialin <[email protected]> Signed-off-by: Suren Baghdasaryan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-07-10sched/fair: Use recent_used_cpu to test p->cpus_ptrMiaohe Lin1-1/+1
When checking whether a recently used CPU can be a potential idle candidate, recent_used_cpu should be used to test p->cpus_ptr as p->recent_used_cpu is not equal to recent_used_cpu and candidate decision is made based on recent_used_cpu here. Fixes: 89aafd67f28c ("sched/fair: Use prev instead of new target as recent_used_cpu") Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Phil Auld <[email protected]> Acked-by: Mel Gorman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-07-10iov_iter: Mark copy_iovec_from_user() noclonePeter Zijlstra1-1/+1
Extend commit 50f9a76ef127 ("iov_iter: Mark copy_compat_iovec_from_user() noinline") to also cover copy_iovec_from_user(). Different compiler versions cause the same problem on different functions. lib/iov_iter.o: warning: objtool: .altinstr_replacement+0x1f: redundant UACCESS disable lib/iov_iter.o: warning: objtool: iovec_from_user+0x84: call to copy_iovec_from_user.part.0() with UACCESS enabled lib/iov_iter.o: warning: objtool: __import_iovec+0x143: call to copy_iovec_from_user.part.0() with UACCESS enabled Fixes: 50f9a76ef127 ("iov_iter: Mark copy_compat_iovec_from_user() noinline") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Borislav Petkov (AMD) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-07-10objtool: initialize all of struct elfMichal Kubecek1-1/+1
Function elf_open_read() only zero initializes the initial part of allocated struct elf; num_relocs member was recently added outside the zeroed part so that it was left uninitialized, resulting in build failures on some systems. The partial initialization is a relic of times when struct elf had large hash tables embedded. This is no longer the case so remove the trap and initialize the whole structure instead. Fixes: eb0481bbc4ce ("objtool: Fix reloc_hash size") Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-07-10x86/fineibt: Poison ENDBR at +0Peter Zijlstra1-0/+16
Alyssa noticed that when building the kernel with CFI_CLANG+IBT and booting on IBT enabled hardware to obtain FineIBT, the indirect functions look like: __cfi_foo: endbr64 subl $hash, %r10d jz 1f ud2 nop 1: foo: endbr64 This is because the compiler generates code for kCFI+IBT. In that case the caller does the hash check and will jump to +0, so there must be an ENDBR there. The compiler doesn't know about FineIBT at all; also it is possible to actually use kCFI+IBT when booting with 'cfi=kcfi' on IBT enabled hardware. Having this second ENDBR however makes it possible to elide the CFI check. Therefore, we should poison this second ENDBR when switching to FineIBT mode. Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT") Reported-by: "Milburn, Alyssa" <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-07-10x86: Rewrite ret_from_fork() in CBrian Gerst4-49/+40
When kCFI is enabled, special handling is needed for the indirect call to the kernel thread function. Rewrite the ret_from_fork() function in C so that the compiler can properly handle the indirect call. Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-07-10x86/32: Remove schedule_tail_wrapper()Brian Gerst1-23/+10
The unwinder expects a return address at the very top of the kernel stack just below pt_regs and before any stack frame is created. Instead of calling a wrapper, set up a return address as if ret_from_fork() was called from the syscall entry code. Signed-off-by: Brian Gerst <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-07-10x86/cfi: Extend ENDBR sealing to kCFIPeter Zijlstra1-1/+43
Kees noted that IBT sealing could be extended to kCFI. Fundamentally it is the list of functions that do not have their address taken and are thus never called indirectly. It doesn't matter that objtool uses IBT infrastructure to determine this list, once we have it it can also be used to clobber kCFI hashes and avoid kCFI indirect calls. Suggested-by: Kees Cook <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Link: https://lkml.kernel.org/r/20230622144321.494426891%40infradead.org