aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-28Merge tag 'trace-v6.4' of ↵Linus Torvalds35-518/+1959
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There are users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. * tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits) ring-buffer: Sync IRQ works before buffer destruction tracing: Add missing spaces in trace_print_hex_seq() ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus recordmcount: Fix memory leaks in the uwrite function tracing/user_events: Limit max fault-in attempts tracing/user_events: Prevent same address and bit per process tracing/user_events: Ensure bit is cleared on unregister tracing/user_events: Ensure write index cannot be negative seq_buf: Add seq_buf_do_printk() helper tracing: Fix print_fields() for __dyn_loc/__rel_loc tracing/user_events: Set event filter_type from type ring-buffer: Clearly check null ptr returned by rb_set_head_page() tracing: Unbreak user events tracing/user_events: Use print_format_fields() for trace output tracing/user_events: Align structs with tabs for readability tracing/user_events: Limit global user_event count tracing/user_events: Charge event allocs to cgroups tracing/user_events: Update documentation for ABI tracing/user_events: Use write ABI in example tracing/user_events: Add ABI self-test ...
2023-04-28Merge tag 'smp-core-2023-04-27' of ↵Linus Torvalds33-281/+217
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28Merge tag 'sched-core-2023-04-27' of ↵Linus Torvalds21-350/+1424
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Allow unprivileged PSI poll()ing - Fix performance regression introduced by mm_cid - Improve livepatch stalls by adding livepatch task switching to cond_resched(). This resolves livepatching busy-loop stalls with certain CPU-bound kthreads - Improve sched_move_task() performance on autogroup configs - On core-scheduling CPUs, avoid selecting throttled tasks to run - Misc cleanups, fixes and improvements * tag 'sched-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Fix local_clock() before sched_clock_init() sched/rt: Fix bad task migration for rt tasks sched: Fix performance regression introduced by mm_cid sched/core: Make sched_dynamic_mutex static sched/psi: Allow unprivileged polling of N*2s period sched/psi: Extract update_triggers side effect sched/psi: Rename existing poll members in preparation sched/psi: Rearrange polling code in preparation sched/fair: Fix inaccurate tally of ttwu_move_affine vhost: Fix livepatch timeouts in vhost_worker() livepatch,sched: Add livepatch task switching to cond_resched() livepatch: Skip task_call_func() for current task livepatch: Convert stack entries array to percpu sched: Interleave cfs bandwidth timers for improved single thread performance at low utilization sched/core: Reduce cost of sched_move_task when config autogroup sched/core: Avoid selecting the task that is throttled to run when core-sched enable sched/topology: Make sched_energy_mutex,update static
2023-04-28Merge tag 'perf-core-2023-04-27' of ↵Linus Torvalds5-4/+32
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: - Add Intel Granite Rapids support - Add uncore events for Intel SPR IMC PMU - Fix perf IRQ throttling bug * tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add events for Intel SPR IMC PMU perf/core: Fix hardlockup failure caused by perf throttle perf/x86/cstate: Add Granite Rapids support perf/x86/msr: Add Granite Rapids perf/x86/intel: Add Granite Rapids
2023-04-28Merge tag 'objtool-core-2023-04-27' of ↵Linus Torvalds84-684/+641
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ...
2023-04-28NFSv4.2: Rework scratch handling for READ_PLUSAnna Schumaker3-7/+15
Instead of using a tiny, static scratch buffer, we should use a kmalloc()-ed buffer that is allocated when checking for read plus usage. This lets us use the buffer before decoding any part of the READ_PLUS operation instead of setting it right before segment decoding, meaning it should be a little more robust. Signed-off-by: Anna Schumaker <[email protected]>
2023-04-28block: Skip destroyed blkg when restart in blkg_destroy_all()Tao Su1-0/+3
Kernel hang in blkg_destroy_all() when total blkg greater than BLKG_DESTROY_BATCH_SIZE, because of not removing destroyed blkg in blkg_list. So the size of blkg_list is same after destroying a batch of blkg, and the infinite 'restart' occurs. Since blkg should stay on the queue list until blkg_free_workfn(), skip destroyed blkg when restart a new round, which will solve this kernel hang issue and satisfy the previous will to restart. Reported-by: Xiangfei Ma <[email protected]> Tested-by: Xiangfei Ma <[email protected]> Tested-by: Farrah Chen <[email protected]> Signed-off-by: Tao Su <[email protected]> Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") Suggested-and-reviewed-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-28fs/9p: Fix bit operation logic errorEric Van Hensbergen2-2/+2
This re-introduces a fix that somehow got dropped during rebase of the current series in for-next. When writeback is enabled, opens are forced to support both read and write operations but with the logic error other flags may be dropped unintentionaly. Reported-by: Christophe Jaillet <[email protected]> Signed-off-by: Eric Van Hensbergen <[email protected]>
2023-04-28ext4: clean up error handling in __ext4_fill_super()Theodore Ts'o1-22/+29
There were two ways to return an error code; one was via setting the 'err' variable, and the second, if err was zero, was via the 'ret' variable. This was both confusing and fragile, and when code was factored out of __ext4_fill_super(), some of the error codes returned by the original code was replaced by -EINVAL, and in one case, the error code was placed by 0, triggering a kernel null pointer dereference. Clean this up by removing the 'ret' variable, leaving only one way to set the error code to be returned, and restore the errno codes that were returned via the the mount system call as they were before we started refactoring __ext4_fill_super(). Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jason Yan <[email protected]>
2023-04-28ext4: reflect error codes from ext4_multi_mount_protect() to its callersTheodore Ts'o2-8/+17
This will allow more fine-grained errno codes to be returned by the mount system call. Cc: Andreas Dilger <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]>
2023-04-28ext4: fix lost error code reporting in __ext4_fill_super()Theodore Ts'o1-1/+2
When code was factored out of __ext4_fill_super() into ext4_percpu_param_init() the error return was discarded. This meant that it was possible for __ext4_fill_super() to return zero, indicating success, without the struct super getting completely filled in, leading to a potential NULL pointer dereference. Reported-by: [email protected] Fixes: 1f79467c8a6b ("ext4: factor out ext4_percpu_param_init() ...") Link: https://syzkaller.appspot.com/bug?id=6dac47d5e58af770c0055f680369586ec32e144c Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jason Yan <[email protected]>
2023-04-28ext4: fix unused iterator variable warningsNathan Chancellor1-4/+3
When CONFIG_QUOTA is disabled, there are warnings around unused iterator variables: fs/ext4/super.c: In function 'ext4_put_super': fs/ext4/super.c:1262:13: error: unused variable 'i' [-Werror=unused-variable] 1262 | int i, err; | ^ fs/ext4/super.c: In function '__ext4_fill_super': fs/ext4/super.c:5200:22: error: unused variable 'i' [-Werror=unused-variable] 5200 | unsigned int i; | ^ cc1: all warnings being treated as errors The kernel has updated to GNU11, allowing the variables to be declared within the for loop. Do so to clear up the warnings. Fixes: dcbf87589d90 ("ext4: factor out ext4_flex_groups_free()") Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/20230420-ext4-unused-variables-super-c-v1-1-138b6db6c21c@kernel.org Signed-off-by: Theodore Ts'o <[email protected]>
2023-04-28ext4: fix use-after-free read in ext4_find_extent for bigalloc + inlineYe Bin1-1/+2
Syzbot found the following issue: loop0: detected capacity change from 0 to 2048 EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none. ================================================================== BUG: KASAN: use-after-free in ext4_ext_binsearch_idx fs/ext4/extents.c:768 [inline] BUG: KASAN: use-after-free in ext4_find_extent+0x76e/0xd90 fs/ext4/extents.c:931 Read of size 4 at addr ffff888073644750 by task syz-executor420/5067 CPU: 0 PID: 5067 Comm: syz-executor420 Not tainted 6.2.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 ext4_ext_binsearch_idx fs/ext4/extents.c:768 [inline] ext4_find_extent+0x76e/0xd90 fs/ext4/extents.c:931 ext4_clu_mapped+0x117/0x970 fs/ext4/extents.c:5809 ext4_insert_delayed_block fs/ext4/inode.c:1696 [inline] ext4_da_map_blocks fs/ext4/inode.c:1806 [inline] ext4_da_get_block_prep+0x9e8/0x13c0 fs/ext4/inode.c:1870 ext4_block_write_begin+0x6a8/0x2290 fs/ext4/inode.c:1098 ext4_da_write_begin+0x539/0x760 fs/ext4/inode.c:3082 generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772 ext4_buffered_write_iter+0x122/0x3a0 fs/ext4/file.c:285 ext4_file_write_iter+0x1d0/0x18f0 call_write_iter include/linux/fs.h:2186 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f4b7a9737b9 RSP: 002b:00007ffc5cac3668 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4b7a9737b9 RDX: 00000000175d9003 RSI: 0000000020000200 RDI: 0000000000000004 RBP: 00007f4b7a933050 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000079f R11: 0000000000000246 R12: 00007f4b7a9330e0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Above issue is happens when enable bigalloc and inline data feature. As commit 131294c35ed6 fixed delayed allocation bug in ext4_clu_mapped for bigalloc + inline. But it only resolved issue when has inline data, if inline data has been converted to extent(ext4_da_convert_inline_data_to_extent) before writepages, there is no EXT4_STATE_MAY_INLINE_DATA flag. However i_data is still store inline data in this scene. Then will trigger UAF when find extent. To resolve above issue, there is need to add judge "ext4_has_inline_data(inode)" in ext4_clu_mapped(). Fixes: 131294c35ed6 ("ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline") Reported-by: [email protected] Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Ye Bin <[email protected]> Reviewed-by: Tudor Ambarus <[email protected]> Tested-by: Tudor Ambarus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-04-28Merge tag 'x86_mm_for_6.4' of ↵Linus Torvalds35-149/+1701
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 LAM (Linear Address Masking) support from Dave Hansen: "Add support for the new Linear Address Masking CPU feature. This is similar to ARM's Top Byte Ignore and allows userspace to store metadata in some bits of pointers without masking it out before use" * tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA selftests/x86/lam: Add test cases for LAM vs thread creation selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking selftests/x86/lam: Add inherit test cases for linear-address masking selftests/x86/lam: Add io_uring test cases for linear-address masking selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking x86/mm/iommu/sva: Make LAM and SVA mutually exclusive iommu/sva: Replace pasid_valid() helper with mm_valid_pasid() mm: Expose untagging mask in /proc/$PID/status x86/mm: Provide arch_prctl() interface for LAM x86/mm: Reduce untagged_addr() overhead for systems without LAM x86/uaccess: Provide untagged_addr() and remove tags before address check mm: Introduce untagged_addr_remote() x86/mm: Handle LAM on context switch x86: CPUID and CR3/CR4 flags for Linear Address Masking x86: Allow atomic MM_CONTEXT flags setting x86/mm: Rework address range check in get_user() and put_user()
2023-04-28writeback: fix call of incorrect macroMaxim Korotkov1-1/+1
the variable 'history' is of type u16, it may be an error that the hweight32 macro was used for it I guess macro hweight16 should be used Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 2a81490811d0 ("writeback: implement foreign cgroup inode detection") Signed-off-by: Maxim Korotkov <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-04-28Merge tag 'md-next-2023-04-28' of ↵Jens Axboe2-4/+47
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.4/block Pull MD fixes from Song: "1. Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0, by Jan Kara. 2. Fix bitmap offset types, which fixes an issue introduced in this merge window, by Jonathan Derrick." * tag 'md-next-2023-04-28' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: Fix bitmap offset type in sb writer md/raid5: Improve performance for sequential IO
2023-04-28Merge tag 'x86_tdx_for_6.4' of ↵Linus Torvalds4-42/+51
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tdx update from Dave Hansen: "The original tdx hypercall assembly code took two flags in %RSI to tweak its behavior at runtime. PeterZ recently axed one flag in commit e80a48bade61 ("x86/tdx: Remove TDX_HCALL_ISSUE_STI"). Kill the other flag too and tweak the 'output' mode with an assembly macro instead. This results in elimination of one push/pop pair and overall easier to read assembly. - Do conditional __tdx_hypercall() 'output' processing via an assembly macro argument rather than a runtime register" * tag 'x86_tdx_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Drop flags from __tdx_hypercall()
2023-04-28Merge tag 'x86_fpu_for_6.4' of ↵Linus Torvalds2-0/+103
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Dave Hansen: "There's no _actual_ kernel functionality here. This expands the documentation around AMX support including some code examples. The example code also exposed the fact that hardware architecture constants as part of the ABI, but there's no easy place that they get defined for apps. Adding them to a uabi header will eventually make life easier for consumers of the ABI. Summary: - Improve AMX documentation along with example code - Explicitly make some hardware constants part of the uabi" * tag 'x86_fpu_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/x86: Explain the state component permission for guests Documentation/x86: Add the AMX enabling example x86/arch_prctl: Add AMX feature numbers as ABI constants Documentation/x86: Explain the purpose for dynamic features
2023-04-28Merge tag 'x86_cache_for_6.4' of ↵Linus Torvalds1-24/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resctrl update from Dave Hansen: "Reduce redundant counter reads with resctrl refactoring" * tag 'x86_cache_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Avoid redundant counter read in __mon_event_count()
2023-04-28Merge tag 'x86_cleanups_for_v6.4_rc1' of ↵Linus Torvalds9-63/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Unify duplicated __pa() and __va() definitions - Simplify sysctl tables registration - Remove unused symbols - Correct function name in comment * tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Centralize __pa()/__va() definitions x86: Simplify one-level sysctl registration for itmt_kern_table x86: Simplify one-level sysctl registration for abi_table2 x86/platform/intel-mid: Remove unused definitions from intel-mid.h x86/uaccess: Remove memcpy_page_flushcache() x86/entry: Change stale function name in comment to error_return()
2023-04-28md: Fix bitmap offset type in sb writerJonathan Derrick1-3/+3
Bitmap offset is allowed to be negative, indicating that bitmap precedes metadata. Change the type back from sector_t to loff_t to satisfy conditionals and calculations. Reported-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/linux-raid/CAPhsuW6HuaUJ5WcyPajVgUfkQFYp2D_cy1g6qxN4CU_gP2=z7g@mail.gmail.com/ Fixes: 10172f200b67 ("md: Fix types in sb writer") Signed-off-by: Jonathan Derrick <[email protected]> Suggested-by: Yu Kuai <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-28md/raid5: Improve performance for sequential IOJan Kara1-1/+44
Commit 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()") changed the order in which requests for underlying disks are created. Since for large sequential IO adding of requests frequently races with md_raid5 thread submitting bios to underlying disks, this results in a change in IO pattern because intermediate states of new order of request creation result in more smaller discontiguous requests. For RAID5 on top of three rotational disks our performance testing revealed this results in regression in write throughput: iozone -a -s 131072000 -y 4 -q 8 -i 0 -i 1 -R before 7e55c60acfbb: KB reclen write rewrite read reread 131072000 4 493670 525964 524575 513384 131072000 8 540467 532880 512028 513703 after 7e55c60acfbb: KB reclen write rewrite read reread 131072000 4 421785 456184 531278 509248 131072000 8 459283 456354 528449 543834 To reduce the amount of discontiguous requests we can start generating requests with the stripe with the lowest chunk offset as that has the best chance of being adjacent to IO queued previously. This improves the performance to: KB reclen write rewrite read reread 131072000 4 497682 506317 518043 514559 131072000 8 514048 501886 506453 504319 restoring big part of the regression. Fixes: 7e55c60acfbb ("md/raid5: Pivot raid5_make_request()") Cc: [email protected] # v6.0+ Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Logan Gunthorpe <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-04-28lsm: move hook comments docs to security/security.cRandy Dunlap3-5/+5
Fix one kernel-doc warning, but invesigating that led to other kernel-doc movement (lsm_hooks.h to security.c) that needs to be fixed also. include/linux/lsm_hooks.h:1: warning: no structured comments found Fixes: e261301c851a ("lsm: move the remaining LSM hook comments to security/security.c") Fixes: 1cd2aca64a5d ("lsm: move the io_uring hook comments to security/security.c") Fixes: 452b670c7222 ("lsm: move the perf hook comments to security/security.c") Fixes: 55e853201a9e ("lsm: move the bpf hook comments to security/security.c") Fixes: b14faf9c94a6 ("lsm: move the audit hook comments to security/security.c") Fixes: 1427ddbe5cc1 ("lsm: move the binder hook comments to security/security.c") Fixes: 43fad2821876 ("lsm: move the sysv hook comments to security/security.c") Fixes: ecc419a44535 ("lsm: move the key hook comments to security/security.c") Fixes: 742b99456e86 ("lsm: move the xfrm hook comments to security/security.c") Fixes: ac318aed5498 ("lsm: move the Infiniband hook comments to security/security.c") Fixes: 4a49f592e931 ("lsm: move the SCTP hook comments to security/security.c") Fixes: 6b6bbe8c02a1 ("lsm: move the socket hook comments to security/security.c") Fixes: 2c2442fd46cd ("lsm: move the AF_UNIX hook comments to security/security.c") Fixes: 2bcf51bf2f03 ("lsm: move the netlink hook comments to security/security.c") Fixes: 130c53bfee4b ("lsm: move the task hook comments to security/security.c") Fixes: a0fd6480de48 ("lsm: move the file hook comments to security/security.c") Fixes: 9348944b775d ("lsm: move the kernfs hook comments to security/security.c") Fixes: 916e32584dfa ("lsm: move the inode hook comments to security/security.c") Fixes: 08526a902cc4 ("lsm: move the filesystem hook comments to security/security.c") Fixes: 36819f185590 ("lsm: move the fs_context hook comments to security/security.c") Fixes: 1661372c912d ("lsm: move the program execution hook comments to security/security.c") Signed-off-by: Randy Dunlap <[email protected]> Cc: Paul Moore <[email protected]> Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: [email protected] Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Cc: KP Singh <[email protected]> Cc: [email protected] Signed-off-by: Paul Moore <[email protected]>
2023-04-28ext4: fix i_disksize exceeding i_size problem in paritally written caseZhihao Cheng1-0/+3
It is possible for i_disksize can exceed i_size, triggering a warning. generic_perform_write copied = iov_iter_copy_from_user_atomic(len) // copied < len ext4_da_write_end | ext4_update_i_disksize | new_i_size = pos + copied; | WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize | generic_write_end | copied = block_write_end(copied, len) // copied = 0 | if (unlikely(copied < len)) | if (!PageUptodate(page)) | copied = 0; | if (pos + copied > inode->i_size) // return false if (unlikely(copied == 0)) goto again; if (unlikely(iov_iter_fault_in_readable(i, bytes))) { status = -EFAULT; break; } We get i_disksize greater than i_size here, which could trigger WARNING check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio: ext4_dio_write_iter iomap_dio_rw __iomap_dio_rw // return err, length is not aligned to 512 ext4_handle_inode_extension WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2 RIP: 0010:ext4_file_write_iter+0xbc7 Call Trace: vfs_write+0x3b1 ksys_write+0x77 do_syscall_64+0x39 Fix it by updating 'copied' value before updating i_disksize just like ext4_write_inline_data_end() does. A reproducer can be found in the buganizer link below. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209 Fixes: 64769240bd07 ("ext4: Add delayed allocation support in data=writeback mode") Signed-off-by: Zhihao Cheng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2023-04-28tpm: Re-enable TPM chip boostrapping non-tpm_tis TPM driversJarkko Sakkinen4-11/+28
TPM chip bootstrapping was removed from tpm_chip_register(), and it was relocated to tpm_tis_core. This breaks all drivers which are not based on tpm_tis because the chip will not get properly initialized. Take the corrective steps: 1. Rename tpm_chip_startup() as tpm_chip_bootstrap() and make it one-shot. 2. Call tpm_chip_bootstrap() in tpm_chip_register(), which reverts the things as tehy used to be. Cc: Lino Sanfilippo <[email protected]> Fixes: 548eb516ec0f ("tpm, tpm_tis: startup chip before testing for interrupts") Reported-by: Pengfei Xu <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Tested-by: Pengfei Xu <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]>
2023-04-28crypto: engine - fix crypto_queue backlog handlingOlivier Bacon2-3/+6
CRYPTO_TFM_REQ_MAY_BACKLOG tells the crypto driver that it should internally backlog requests until the crypto hw's queue becomes full. At that point, crypto_engine backlogs the request and returns -EBUSY. Calling driver such as dm-crypt then waits until the complete() function is called with a status of -EINPROGRESS before sending a new request. The problem lies in the call to complete() with a value of -EINPROGRESS that is made when a backlog item is present on the queue. The call is done before the successful execution of the crypto request. In the case that do_one_request() returns < 0 and the retry support is available, the request is put back in the queue. This leads upper drivers to send a new request even if the queue is still full. The problem can be reproduced by doing a large dd into a crypto dm-crypt device. This is pretty easy to see when using Freescale CAAM crypto driver and SWIOTLB dma. Since the actual amount of requests that can be hold in the queue is unlimited we get IOs error and dma allocation. The fix is to call complete with a value of -EINPROGRESS only if the request is not enqueued back in crypto_queue. This is done by calling complete() later in the code. In order to delay the decision, crypto_queue is modified to correctly set the backlog pointer when a request is enqueued back. Fixes: 6a89f492f8e5 ("crypto: engine - support for parallel requests based on retry mechanism") Co-developed-by: Sylvain Ouellet <[email protected]> Signed-off-by: Sylvain Ouellet <[email protected]> Signed-off-by: Olivier Bacon <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-04-28crypto: sun8i-ss - Fix a test in sun8i_ss_setup_ivs()Christophe JAILLET1-1/+1
SS_ENCRYPTION is (0 << 7 = 0), so the test can never be true. Use a direct comparison to SS_ENCRYPTION instead. The same king of test is already done the same way in sun8i_ss_run_task(). Fixes: 359e893e8af4 ("crypto: sun8i-ss - rework handling of IV") Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-04-28ALSA: emu10k1: use more existing defines instead of open-coded numbersOswald Buddenhagen7-65/+69
Using the *_MASK defines for "maximal value" is debatable. I got the idea from FreeBSD, and it sorta makes sense to me. Some hunks look a bit incomplete, because code that is going to be subsequently removed is not touched here. Signed-off-by: Oswald Buddenhagen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-04-28net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpuAngelo Dureghello1-0/+1
Add rsvd2cpu capability for mv88e6321 model, to allow proper bpdu processing. Signed-off-by: Angelo Dureghello <[email protected]> Fixes: 51c901a775621 ("net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU") Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28net: ipv6: fix skb hash for some RST packetsAntoine Tenart1-1/+1
The skb hash comes from sk->sk_txhash when using TCP, except for some IPv6 RST packets. This is because in tcp_v6_send_reset when not in TIME_WAIT the hash is taken from sk->sk_hash, while it should come from sk->sk_txhash as those two hashes are not computed the same way. Packetdrill script to test the above, 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > (flowlabel 0x1) S 0:0(0) <...> // Wrong ack seq, trigger a rst. +0 < S. 0:0(0) ack 0 win 4000 // Check the flowlabel matches prior one from SYN. +0 > (flowlabel 0x1) R 0:0(0) <...> Fixes: 9258b8b1be2e ("ipv6: tcp: send consistent autoflowlabel in RST packets") Signed-off-by: Antoine Tenart <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28selftests: srv6: make srv6_end_dt46_l3vpn_test more robustAndrea Mayer1-5/+5
On some distributions, the rp_filter is automatically set (=1) by default on a netdev basis (also on VRFs). In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using the table associated with the VRF bound to that tunnel. During lookup operations, the rp_filter can lead to packet loss when activated on the VRF. Therefore, we chose to make this selftest more robust by explicitly disabling the rp_filter during tests (as it is automatically set by some Linux distributions). Fixes: 03a0b567a03d ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior") Reported-by: Hangbin Liu <[email protected]> Signed-off-by: Andrea Mayer <[email protected]> Tested-by: Hangbin Liu <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28atlantic:hw_atl2:hw_atl2_utils_fw: Remove unnecessary (void*) conversionswuych1-2/+2
Pointer variables of void * type do not require type cast. Signed-off-by: wuych <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28sit: update dev->needed_headroom in ipip6_tunnel_bind_dev()Cong Wang1-3/+5
When a tunnel device is bound with the underlying device, its dev->needed_headroom needs to be updated properly. IPv4 tunnels already do the same in ip_tunnel_bind_dev(). Otherwise we may not have enough header room for skb, especially after commit b17f709a2401 ("gue: TX support for using remote checksum offload option"). Fixes: 32b8a8e59c9c ("sit: add IPv4 over IPv4 support") Reported-by: Palash Oswal <[email protected]> Link: https://lore.kernel.org/netdev/CAGyP=7fDcSPKu6nttbGwt7RXzE3uyYxLjCSE97J64pRxJP8jPA@mail.gmail.com/ Cc: Kuniyuki Iwashima <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Cong Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28net/sched: cls_api: remove block_cb from driver_list before freeingVlad Buslov1-0/+1
Error handler of tcf_block_bind() frees the whole bo->cb_list on error. However, by that time the flow_block_cb instances are already in the driver list because driver ndo_setup_tc() callback is called before that up the call chain in tcf_block_offload_cmd(). This leaves dangling pointers to freed objects in the list and causes use-after-free[0]. Fix it by also removing flow_block_cb instances from driver_list before deallocating them. [0]: [ 279.868433] ================================================================== [ 279.869964] BUG: KASAN: slab-use-after-free in flow_block_cb_setup_simple+0x631/0x7c0 [ 279.871527] Read of size 8 at addr ffff888147e2bf20 by task tc/2963 [ 279.873151] CPU: 6 PID: 2963 Comm: tc Not tainted 6.3.0-rc6+ #4 [ 279.874273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 279.876295] Call Trace: [ 279.876882] <TASK> [ 279.877413] dump_stack_lvl+0x33/0x50 [ 279.878198] print_report+0xc2/0x610 [ 279.878987] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.879994] kasan_report+0xae/0xe0 [ 279.880750] ? flow_block_cb_setup_simple+0x631/0x7c0 [ 279.881744] ? mlx5e_tc_reoffload_flows_work+0x240/0x240 [mlx5_core] [ 279.883047] flow_block_cb_setup_simple+0x631/0x7c0 [ 279.884027] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.885037] ? tcf_block_setup+0x6b0/0x6b0 [ 279.885901] ? mutex_lock+0x7d/0xd0 [ 279.886669] ? __mutex_unlock_slowpath.constprop.0+0x2d0/0x2d0 [ 279.887844] ? ingress_init+0x1c0/0x1c0 [sch_ingress] [ 279.888846] tcf_block_get_ext+0x61c/0x1200 [ 279.889711] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.890682] ? clsact_init+0x2b0/0x2b0 [sch_ingress] [ 279.891701] qdisc_create+0x401/0xea0 [ 279.892485] ? qdisc_tree_reduce_backlog+0x470/0x470 [ 279.893473] tc_modify_qdisc+0x6f7/0x16d0 [ 279.894344] ? tc_get_qdisc+0xac0/0xac0 [ 279.895213] ? mutex_lock+0x7d/0xd0 [ 279.896005] ? __mutex_lock_slowpath+0x10/0x10 [ 279.896910] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.897770] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.898672] ? __sys_sendmsg+0xb5/0x140 [ 279.899494] ? do_syscall_64+0x3d/0x90 [ 279.900302] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.901337] ? kasan_save_stack+0x2e/0x40 [ 279.902177] ? kasan_save_stack+0x1e/0x40 [ 279.903058] ? kasan_set_track+0x21/0x30 [ 279.903913] ? kasan_save_free_info+0x2a/0x40 [ 279.904836] ? ____kasan_slab_free+0x11a/0x1b0 [ 279.905741] ? kmem_cache_free+0x179/0x400 [ 279.906599] netlink_rcv_skb+0x12c/0x360 [ 279.907450] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 279.908360] ? netlink_ack+0x1550/0x1550 [ 279.909192] ? rhashtable_walk_peek+0x170/0x170 [ 279.910135] ? kmem_cache_alloc_node+0x1af/0x390 [ 279.911086] ? _copy_from_iter+0x3d6/0xc70 [ 279.912031] netlink_unicast+0x553/0x790 [ 279.912864] ? netlink_attachskb+0x6a0/0x6a0 [ 279.913763] ? netlink_recvmsg+0x416/0xb50 [ 279.914627] netlink_sendmsg+0x7a1/0xcb0 [ 279.915473] ? netlink_unicast+0x790/0x790 [ 279.916334] ? iovec_from_user.part.0+0x4d/0x220 [ 279.917293] ? netlink_unicast+0x790/0x790 [ 279.918159] sock_sendmsg+0xc5/0x190 [ 279.918938] ____sys_sendmsg+0x535/0x6b0 [ 279.919813] ? import_iovec+0x7/0x10 [ 279.920601] ? kernel_sendmsg+0x30/0x30 [ 279.921423] ? __copy_msghdr+0x3c0/0x3c0 [ 279.922254] ? import_iovec+0x7/0x10 [ 279.923041] ___sys_sendmsg+0xeb/0x170 [ 279.923854] ? copy_msghdr_from_user+0x110/0x110 [ 279.924797] ? ___sys_recvmsg+0xd9/0x130 [ 279.925630] ? __perf_event_task_sched_in+0x183/0x470 [ 279.926656] ? ___sys_sendmsg+0x170/0x170 [ 279.927529] ? ctx_sched_in+0x530/0x530 [ 279.928369] ? update_curr+0x283/0x4f0 [ 279.929185] ? perf_event_update_userpage+0x570/0x570 [ 279.930201] ? __fget_light+0x57/0x520 [ 279.931023] ? __switch_to+0x53d/0xe70 [ 279.931846] ? sockfd_lookup_light+0x1a/0x140 [ 279.932761] __sys_sendmsg+0xb5/0x140 [ 279.933560] ? __sys_sendmsg_sock+0x20/0x20 [ 279.934436] ? fpregs_assert_state_consistent+0x1d/0xa0 [ 279.935490] do_syscall_64+0x3d/0x90 [ 279.936300] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.937311] RIP: 0033:0x7f21c814f887 [ 279.938085] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 279.941448] RSP: 002b:00007fff11efd478 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 279.942964] RAX: ffffffffffffffda RBX: 0000000064401979 RCX: 00007f21c814f887 [ 279.944337] RDX: 0000000000000000 RSI: 00007fff11efd4e0 RDI: 0000000000000003 [ 279.945660] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 279.947003] R10: 00007f21c8008708 R11: 0000000000000246 R12: 0000000000000001 [ 279.948345] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [ 279.949690] </TASK> [ 279.950706] Allocated by task 2960: [ 279.951471] kasan_save_stack+0x1e/0x40 [ 279.952338] kasan_set_track+0x21/0x30 [ 279.953165] __kasan_kmalloc+0x77/0x90 [ 279.954006] flow_block_cb_setup_simple+0x3dd/0x7c0 [ 279.955001] tcf_block_offload_cmd.isra.0+0x189/0x2d0 [ 279.956020] tcf_block_get_ext+0x61c/0x1200 [ 279.956881] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.957873] qdisc_create+0x401/0xea0 [ 279.958656] tc_modify_qdisc+0x6f7/0x16d0 [ 279.959506] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.960392] netlink_rcv_skb+0x12c/0x360 [ 279.961216] netlink_unicast+0x553/0x790 [ 279.962044] netlink_sendmsg+0x7a1/0xcb0 [ 279.962906] sock_sendmsg+0xc5/0x190 [ 279.963702] ____sys_sendmsg+0x535/0x6b0 [ 279.964534] ___sys_sendmsg+0xeb/0x170 [ 279.965343] __sys_sendmsg+0xb5/0x140 [ 279.966132] do_syscall_64+0x3d/0x90 [ 279.966908] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.968407] Freed by task 2960: [ 279.969114] kasan_save_stack+0x1e/0x40 [ 279.969929] kasan_set_track+0x21/0x30 [ 279.970729] kasan_save_free_info+0x2a/0x40 [ 279.971603] ____kasan_slab_free+0x11a/0x1b0 [ 279.972483] __kmem_cache_free+0x14d/0x280 [ 279.973337] tcf_block_setup+0x29d/0x6b0 [ 279.974173] tcf_block_offload_cmd.isra.0+0x226/0x2d0 [ 279.975186] tcf_block_get_ext+0x61c/0x1200 [ 279.976080] ingress_init+0x112/0x1c0 [sch_ingress] [ 279.977065] qdisc_create+0x401/0xea0 [ 279.977857] tc_modify_qdisc+0x6f7/0x16d0 [ 279.978695] rtnetlink_rcv_msg+0x5fe/0x9d0 [ 279.979562] netlink_rcv_skb+0x12c/0x360 [ 279.980388] netlink_unicast+0x553/0x790 [ 279.981214] netlink_sendmsg+0x7a1/0xcb0 [ 279.982043] sock_sendmsg+0xc5/0x190 [ 279.982827] ____sys_sendmsg+0x535/0x6b0 [ 279.983703] ___sys_sendmsg+0xeb/0x170 [ 279.984510] __sys_sendmsg+0xb5/0x140 [ 279.985298] do_syscall_64+0x3d/0x90 [ 279.986076] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 279.987532] The buggy address belongs to the object at ffff888147e2bf00 which belongs to the cache kmalloc-192 of size 192 [ 279.989747] The buggy address is located 32 bytes inside of freed 192-byte region [ffff888147e2bf00, ffff888147e2bfc0) [ 279.992367] The buggy address belongs to the physical page: [ 279.993430] page:00000000550f405c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147e2a [ 279.995182] head:00000000550f405c order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 279.996713] anon flags: 0x200000000010200(slab|head|node=0|zone=2) [ 279.997878] raw: 0200000000010200 ffff888100042a00 0000000000000000 dead000000000001 [ 279.999384] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 280.000894] page dumped because: kasan: bad access detected [ 280.002386] Memory state around the buggy address: [ 280.003338] ffff888147e2be00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.004781] ffff888147e2be80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.006224] >ffff888147e2bf00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.007700] ^ [ 280.008592] ffff888147e2bf80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 280.010035] ffff888147e2c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 280.011564] ================================================================== Fixes: 59094b1e5094 ("net: sched: use flow block API") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28mISDN: Use list_count_nodes()Christophe JAILLET1-13/+2
count_list_member() really looks the same as list_count_nodes(), so use the latter instead of hand writing it. The first one return an int and the other a size_t, but that should be fine. It is really unlikely that we get so many parties in a conference. Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28tcp: fix skb_copy_ubufs() vs BIG TCPEric Dumazet1-6/+14
David Ahern reported crashes in skb_copy_ubufs() caused by TCP tx zerocopy using hugepages, and skb length bigger than ~68 KB. skb_copy_ubufs() assumed it could copy all payload using up to MAX_SKB_FRAGS order-0 pages. This assumption broke when BIG TCP was able to put up to 512 KB per skb. We did not hit this bug at Google because we use CONFIG_MAX_SKB_FRAGS=45 and limit gso_max_size to 180000. A solution is to use higher order pages if needed. v2: add missing __GFP_COMP, or we leak memory. Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536") Reported-by: David Ahern <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/T/ Signed-off-by: Eric Dumazet <[email protected]> Cc: Xin Long <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Coco Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28net/ncsi: clear Tx enable mode when handling a Config required AENCosmo Chou1-0/+1
ncsi_channel_is_tx() determines whether a given channel should be used for Tx or not. However, when reconfiguring the channel by handling a Configuration Required AEN, there is a misjudgment that the channel Tx has already been enabled, which results in the Enable Channel Network Tx command not being sent. Clear the channel Tx enable flag before reconfiguring the channel to avoid the misjudgment. Fixes: 8d951a75d022 ("net/ncsi: Configure multi-package, multi-channel modes with failover") Signed-off-by: Cosmo Chou <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-04-28i3c: ast2600: fix register setting for 545 ohm pullupsJeremy Kerr1-2/+2
The 2k register setting is zero, OR-ing it in doesn't parallel the 2k and 750 ohm pullups. We need a separate value for the 545 ohm setting. Reported-by: Lukwinski Zbigniew <[email protected]> Signed-off-by: Jeremy Kerr <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: ast2600: enable IBI supportJeremy Kerr1-0/+21
The ast2600 i3c hardware is capable of IBIs, but we need a workaround for a hardware issue with the I3C state machine handling IBI payloads of specific lengths when PEC is not enabled. To avoid this, we need to unconditionally enable PECs, at the consquence of losing a byte of data when the device does not send a PEC. Enable IBIs on the ast2600 platform, including an implementation of the PEC workaround, which prints a warning when triggered. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/ba923b96d6d129024c975e8a0472c5b2fcb3af32.1680161823.git.jk@codeconstruct.com.au Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: dw: Add a platform facility for IBI PEC workaroundsJeremy Kerr2-0/+29
On the AST2600 i3c controller, we'll need to apply a workaround for a hardware issue with IBI payloads. Introduce a platform hook to allow dw i3c platform implementations to modify the DAT entry in IBI enable/disable to allow this workaround in a future change. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/d5d76a8d2336d2a71886537f42e71d51db184df6.1680161823.git.jk@codeconstruct.com.au Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: dw: Add support for in-band interruptsJeremy Kerr2-3/+289
This change adds support for receiving and dequeueing i3c IBIs. By setting struct dw_i3c_master->ibi_capable before probe, a platform implementation can select the IBI-enabled version of the i3c_master_ops, enabling the global IBI infrastrcture for that controller. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/79daeefd7ccb7c935d0c159149df21a6c9a73ffa.1680161823.git.jk@codeconstruct.com.au Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: dw: Turn DAT array entry into a structJeremy Kerr2-12/+21
In an upcoming change, we will want to store additional data about the devices we have in the data address table. Change the type of the DAT entries into a struct, which currently just has the address data. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/9dc0d9e2857e851a0cf04819df48e5d31921f83e.1680161823.git.jk@codeconstruct.com.au Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: dw: Create a generic fifo read functionJeremy Kerr1-4/+10
In a future change we'll want to read from the IBI FIFO too, so turn dw_i3c_read_rx_fifo() into a generic read with the FIFO register as a parameter. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/827204789583dd86addffb47ecaeab9d67cf95d5.1680161823.git.jk@codeconstruct.com.au Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: Allow OF-alias-based persistent bus numberingJeremy Kerr1-5/+25
Parse the /aliases node to assign any fixed bus numbers, as is done with the i2c subsystem. Numbering for non-aliased busses will start after the highest fixed bus number. This allows an alias node such as: aliases { i3c0 = &bus_a, i3c4 = &bus_b, }; to set the numbering for a set of i3c controllers: /* fixed-numbered bus, assigned "i3c-0" */ bus_a: i3c-master { }; /* another fixed-numbered bus, assigned "i3c-4" */ bus_b: i3c-master { }; /* dynamic-numbered bus, likely assigned "i3c-5" */ bus_c: i3c-master { }; If no i3c device aliases are present, the numbering will stay as-is, starting from 0. Signed-off-by: Jeremy Kerr <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: ast2600: Add AST2600 platform-specific driverJeremy Kerr4-0/+189
Now that we have platform-specific infrastructure for the dw i3c driver, add platform support for the ASPEED AST2600 SoC. The AST2600 has a small set of "i3c global" registers, providing platform-level i3c configuration outside of the i3c core. For the ast2600, we need a couple of extra setup operations: - on probe: find the i3c global register set and parse the SDA pullup resistor values - on init: set the pullups accordingly, and set the i3c instance IDs Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28dt-bindings: i3c: Add AST2600 i3c controllerJeremy Kerr1-0/+72
Add a devicetree binding for the ast2600 i3c controller hardware. This is heavily based on the designware i3c core, plus a reset facility and two platform-specific properties: - sda-pullup-ohms: to specify the value of the configurable pullup resistors on the SDA line - aspeed,global-regs: to reference the (ast2600-specific) i3c global register block, and the device index to use within it. Reviewed-by: Krzysztof Kozlowski <[email protected]> (on v1) Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28i3c: dw: Add infrastructure for platform-specific implementationsJeremy Kerr2-34/+97
The dw i3c core can be integrated into various SoC devices. Platforms that use this core may need a little configuration that is specific to that platform. Add some infrastructure to allow platform-specific behaviour: common probe/remove functions, a set of platform hook operations, and a pointer for platform-specific data in struct dw_i3c_master. Move the common api into a new (i3c local) header file. Platforms will provide their own struct platform_driver, which allocates struct dw_i3c_master, does any platform-specific probe behaviour, and calls into the common probe. A future change will add new platform support that uses this infrastructure. Signed-off-by: Jeremy Kerr <[email protected]> Reviewed-by: Joel Stanley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28rtc: armada38x: use devm_platform_ioremap_resource_byname()Ye Xingchen1-5/+2
Convert platform_get_resource_byname(),devm_ioremap_resource() to a single call to devm_platform_ioremap_resource_byname(), as this is exactly what this function does. Signed-off-by: Ye Xingchen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28rtc: sunplus: use devm_platform_ioremap_resource_byname()Ye Xingchen1-2/+1
Convert platform_get_resource_byname(),devm_ioremap_resource() to a single call to devm_platform_ioremap_resource_byname(), as this is exactly what this function does. Signed-off-by: Ye Xingchen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2023-04-28rtc: jz4740: Make sure clock provider gets removedLars-Peter Clausen1-1/+2
The jz4740 RTC driver registers a clock provider, but never removes it. This leaves a stale clock provider behind that references freed clocks when the device is unbound. Use the managed `devm_of_clk_add_hw_provider()` instead of `of_clk_add_hw_provider()` to make sure the provider gets automatically removed on unbind. Fixes: 5ddfa148de8c ("rtc: jz4740: Register clock provider for the CLK32K pin") Signed-off-by: Lars-Peter Clausen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>