aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-02-21Merge tag 'v6.2' into iommufd.git for-nextJason Gunthorpe54-123/+415
Resolve conflicts from the signature change in iommu_map: - drivers/infiniband/hw/usnic/usnic_uiom.c Switch iommu_map_atomic() to iommu_map(.., GFP_ATOMIC) - drivers/vfio/vfio_iommu_type1.c Following indenting change for GFP_KERNEL Signed-off-by: Jason Gunthorpe <[email protected]>
2023-02-20Merge tag 'sched-core-2023-02-20' of ↵Linus Torvalds23-53/+290
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve the scalability of the CFS bandwidth unthrottling logic with large number of CPUs. - Fix & rework various cpuidle routines, simplify interaction with the generic scheduler code. Add __cpuidle methods as noinstr to objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks. - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query previously issued registrations. - Limit scheduler slice duration to the sysctl_sched_latency period, to improve scheduling granularity with a large number of SCHED_IDLE tasks. - Debuggability enhancement on sys_exit(): warn about disabled IRQs, but also enable them to prevent a cascade of followup problems and repeat warnings. - Fix the rescheduling logic in prio_changed_dl(). - Micro-optimize cpufreq and sched-util methods. - Micro-optimize ttwu_runnable() - Micro-optimize the idle-scanning in update_numa_stats(), select_idle_capacity() and steal_cookie_task(). - Update the RSEQ code & self-tests - Constify various scheduler methods - Remove unused methods - Refine __init tags - Documentation updates - Misc other cleanups, fixes * tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) sched/rt: pick_next_rt_entity(): check list_entry sched/deadline: Add more reschedule cases to prio_changed_dl() sched/fair: sanitize vruntime of entity being placed sched/fair: Remove capacity inversion detection sched/fair: unlink misfit task from cpu overutilized objtool: mem*() are not uaccess safe cpuidle: Fix poll_idle() noinstr annotation sched/clock: Make local_clock() noinstr sched/clock/x86: Mark sched_clock() noinstr x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read() x86/atomics: Always inline arch_atomic64*() cpuidle: tracing, preempt: Squash _rcuidle tracing cpuidle: tracing: Warn about !rcu_is_watching() cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG cpuidle: drivers: firmware: psci: Dont instrument suspend code KVM: selftests: Fix build of rseq test exit: Detect and fix irq disabled state in oops cpuidle, arm64: Fix the ARM64 cpuidle logic cpuidle: mvebu: Fix duplicate flags assignment sched/fair: Limit sched slice duration ...
2023-02-20Merge tag 'perf-core-2023-02-20' of ↵Linus Torvalds1-49/+123
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: - Optimize perf_sample_data layout - Prepare sample data handling for BPF integration - Update the x86 PMU driver for Intel Meteor Lake - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids) discovery breakage - Fix the x86 Zhaoxin PMU driver - Cleanups * tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf/x86/intel/uncore: Add Meteor Lake support x86/perf/zhaoxin: Add stepping check for ZXC perf/x86/intel/ds: Fix the conversion from TSC to perf time perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table perf/x86/uncore: Add a quirk for UPI on SPR perf/x86/uncore: Ignore broken units in discovery table perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name perf/x86/uncore: Factor out uncore_device_to_die() perf/core: Call perf_prepare_sample() before running BPF perf/core: Introduce perf_prepare_header() perf/core: Do not pass header for sample ID init perf/core: Set data->sample_flags in perf_prepare_sample() perf/core: Add perf_sample_save_brstack() helper perf/core: Add perf_sample_save_raw_data() helper perf/core: Add perf_sample_save_callchain() helper perf/core: Save the dynamic parts of sample data size x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation perf/core: Change the layout of perf_sample_data perf/x86/msr: Add Meteor Lake support perf/x86/cstate: Add Meteor Lake support ...
2023-02-20Merge tag 'locking-core-2023-02-20' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - rwsem micro-optimizations - spinlock micro-optimizations - cleanups, simplifications * tag 'locking-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vduse: Remove include of rwlock.h locking/lockdep: Remove lockdep_init_map_crosslock. x86/ACPI/boot: Use try_cmpxchg() in __acpi_{acquire,release}_global_lock() x86/PAT: Use try_cmpxchg() in set_page_memtype() locking/rwsem: Disable preemption in all down_write*() and up_write() code paths locking/rwsem: Disable preemption in all down_read*() and up_read() code paths locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath locking/qspinlock: Micro-optimize pending state waiting for unlock
2023-02-20net/sched: cls_api: Support hardware miss to tc actionPaul Blakey4-15/+28
For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20net/sched: Rename user cookie and act cookiePaul Blakey2-3/+3
struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20Merge tag 'ieee802154-for-net-next-2023-02-20' of ↵Jakub Kicinski6-42/+197
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2023-02-20 Miquel Raynal build upon his earlier work and introduced two new features into the ieee802154 stack. Beaconing to announce existing PAN's and passive scanning to discover the beacons and associated PAN's. The matching changes to the userspace configuration tool have been posted as well and will be released together with the kernel release. Arnd Bergmann and Dmitry Torokhov worked on converting the at86rf230 and cc2520 drivers away from the unused platform_data usage and towards the new gpiod API. (I had to add a revert as Dmitry found a regression on an already pushed tree on my side). Changes since v1 (pull request 2023-02-02) - Netlink API extack and NLA_POLICY* usage as suggested by Jakub - Removed always true condition found by kernel test robot - Simplify device removal with running background job for scanning - Fix problems with beacon sending in some cases by using the MLME tx path * tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: ieee802154: Drop device trackers mac802154: Fix an always true condition mac802154: Send beacons using the MLME Tx path ieee802154: Change error code on monitor scan netlink request ieee802154: Convert scan error messages to extack ieee802154: Use netlink policies when relevant on scan parameters ieee802154: at86rf230: switch to using gpiod API ieee802154: at86rf230: drop support for platform data Revert "at86rf230: convert to gpio descriptors" cc2520: move to gpio descriptors mac802154: Avoid superfluous endianness handling at86rf230: convert to gpio descriptors mac802154: Handle basic beaconing ieee802154: Add support for user beaconing requests mac802154: Handle passive scanning mac802154: Add MLME Tx locked helpers mac802154: Prepare forcing specific symbol duration ieee802154: Introduce a helper to validate a channel ieee802154: Define a beacon frame header ieee802154: Add support for user scanning requests ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20net/mlx4_en: Introduce flexible array to silence overflow warningKees Cook1-0/+1
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY memcpy() warning on ppc64 platform: In function ‘fortify_memcpy_chk’, inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2, inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4, inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3: ./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 513 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same behaviour on x86 you can get if you use "__always_inline" instead of "inline" for skb_copy_from_linear_data() in skbuff.h The call here copies data into inlined tx destricptor, which has 104 bytes (MAX_INLINE) space for data payload. In this case "spc" is known in compile-time but the destination is used with hidden knowledge (real structure of destination is different from that the compiler can see). That cause the fortify warning because compiler can check bounds, but the real bounds are different. "spc" can't be bigger than 64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined tx descriptor. The fact that "inl" points into inlined tx descriptor is determined earlier in mlx4_en_xmit(). Avoid confusing the compiler with "inl + 1" constructions to get to past the inl header by introducing a flexible array "data" to the struct so that the compiler can see that we are not dealing with an array of inl structs, but rather, arbitrary data following the structure. There are no changes to the structure layout reported by pahole, and the resulting machine code is actually smaller. Reported-by: Josef Oskera <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Fixes: f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time") Cc: Yishai Hadas <[email protected]> Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20Merge tag 'for-netdev' of ↵Jakub Kicinski4-20/+93
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-20vringh: fix a typo in comments for vringh_kiovShunsuke Mie1-1/+1
Probably it is a simple copy error from struct vring_iov. Signed-off-by: Shunsuke Mie <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-02-20vdpa: introduce get_vq_dma_device()Jason Wang1-0/+6
This patch introduces a new method to query the dma device that is use for a specific virtqueue. Reviewed-by: Eli Cohen <[email protected]> Tested-by: Eli Cohen <[email protected]> Signed-off-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-02-20virtio_ring: per virtqueue dma deviceJason Wang1-0/+16
This patch introduces a per virtqueue dma device. This will be used for virtio devices whose virtqueue are backed by different underlayer devices. One example is the vDPA that where the control virtqueue could be implemented through software mediation. Some of the work are actually done before since the helper like vring_dma_device(). This work left are: - Let vring_dma_device() return the per virtqueue dma device instead of the vdev's parent. - Allow passing a dma_device when creating the virtqueue through a new helper, old vring creation helper will keep using vdev's parent. Reviewed-by: Eli Cohen <[email protected]> Tested-by: Eli Cohen <[email protected]> Signed-off-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-02-20vhost-vdpa: uAPI to resume the deviceSebastien Boeuf1-0/+8
This new ioctl adds support for resuming the device from userspace. This is required when trying to restore the device in a functioning state after it's been suspended. It is already possible to reset a suspended device, but that means the device must be reconfigured and all the IOMMU/IOTLB mappings must be recreated. This new operation allows the device to be resumed without going through a full reset. This is particularly useful when trying to perform offline migration of a virtual machine (also known as snapshot/restore) as it allows the VMM to resume the virtual machine back to a running state after the snapshot is performed. Acked-by: Jason Wang <[email protected]> Signed-off-by: Sebastien Boeuf <[email protected]> Message-Id: <73b75fb87d25cff59768b4955a81fe7ffe5b4770.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2023-02-20vhost-vdpa: Introduce RESUME backend feature bitSebastien Boeuf1-0/+2
Userspace knows if the device can be resumed or not by checking this feature bit. It's only exposed if the vdpa driver backend implements the resume() operation callback. Userspace trying to negotiate this feature when it hasn't been exposed will result in an error. Acked-by: Jason Wang <[email protected]> Signed-off-by: Sebastien Boeuf <[email protected]> Message-Id: <b18db236ba3d990cdb41278eb4703be9201d9514.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2023-02-20vdpa: Add resume operationSebastien Boeuf1-1/+5
Add a new operation to allow a vDPA device to be resumed after it has been suspended. Trying to resume a device that wasn't suspended will result in a no-op. This operation is optional. If it's not implemented, the associated backend feature bit will not be exposed. And if the feature bit is not exposed, invoking this operation will return an error. Acked-by: Jason Wang <[email protected]> Signed-off-by: Sebastien Boeuf <[email protected]> Message-Id: <6e05c4b31b47f3e29cb2bd7ebd56c81f84b8f48a.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]>
2023-02-20PCI: Add SolidRun vendor IDAlvaro Karsz1-0/+2
Add SolidRun vendor ID to pci_ids.h The vendor ID is used in 2 different source files, the SNET vDPA driver and PCI quirks. Signed-off-by: Alvaro Karsz <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-02-20Merge tag 'asm-generic-6.3' of ↵Linus Torvalds2-3/+14
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "Only three minor changes: a cross-platform series from Mike Rapoport to consolidate asm/agp.h between architectures, and a correctness change for __generic_cmpxchg_local() from Matt Evans" * tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: char/agp: introduce asm-generic/agp.h char/agp: consolidate {alloc,free}_gatt_pages() locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value
2023-02-20Merge tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds10-1/+698
Pull SoC DT updates from Arnd Bergmann: "About a quarter of the changes are for 32-bit arm, mostly filling in device support for existing machines and adding minor cleanups, mostly for Qualcomm and Samsung based machines. Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from Rockchips that have been around for a while but were lacking kernel support so far: RV1126 is a Vision SoC with an NPU and is used in the Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is design for TV boxes and so far only comes with a dts for its refernece design. The other 32-bit boards that were added are two ASpeed AST2600 based BMC boards, the Microchip sam9x60_curiosity development board (Armv5 based!), the Enclustra PE1 FPGA-SoM baseboard, and a few more boards for i.MX53 and i.MX6ULL. On the RISC-V side, there are fewer patches, but a total of ten new single-board computers based on variations of the Allwinner D1/T113 chip, plus one more board based on Microchip Polarfire. As usual, arm64 has by far the most changes here, with over 700 non-merge changesets, among them over 400 alone for Qualcomm. The newly added SoCs this time are all recent high-end embedded SoCs for various markets, each on comes with support for its reference board: - Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones - Qualcomm QDU1000/QRU1000 5G RAN platform - Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs - TI J784S4 for industrial and automotive applications In total, there are 46 new arm64 machines: - Reference platforms for each of the five new SoCs - Three Amlogic based development boards - Six embedded machines based on NXP i.MX8MM and i.MX8MP - The Mediatek mt7986a based Banana Pi R3 router - Six tablets based on Qualcomm MSM8916 (Snapdragon 410), SM6115 (Snapdragon 662) and SM8250 (Snapdragon 865) - Two LTE dongles, also based on MSM8916 - Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610), SDM450 and SDM632 - Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c) - Nine development boards based on Rockchips RK3588, RK3568, RK3566 and RK3328. - Five development machines based on TI K3 (AM642/AM654/AM68/AM69) The cleanup of dtc warnings continues across all platforms, adding to the total number of changes" * tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (1035 commits) dt-bindings: riscv: correct starfive visionfive 2 compatibles ARM: dts: socfpga: Add enclustra PE1 devicetree dt-bindings: altera: Add enclustra mercury PE1 arm64: dts: qcom: msm8996: align RPM G-Link clock-controller node with bindings arm64: dts: qcom: qcs404: align RPM G-Link node with bindings arm64: dts: qcom: ipq6018: align RPM G-Link node with bindings arm64: dts: qcom: sm8550: remove invalid interconnect property from cryptobam arm64: dts: qcom: sc7280: Adjust zombie PWM frequency arm64: dts: qcom: sc8280xp-pmics: Specify interrupt parent explicitly arm64: dts: qcom: sm7225-fairphone-fp4: enable remaining i2c busses arm64: dts: qcom: sm7225-fairphone-fp4: move status property down arm64: dts: qcom: pmk8350: Use the correct PON compatible arm64: dts: qcom: sc8280xp-x13s: Enable external display arm64: dts: qcom: sc8280xp-crd: Introduce pmic_glink arm64: dts: qcom: sc8280xp: Add USB-C-related DP blocks arm64: dts: qcom: sm8350-hdk: enable GPU arm64: dts: qcom: sm8350: add GPU, GMU, GPU CC and SMMU nodes arm64: dts: qcom: sm8350: finish reordering nodes arm64: dts: qcom: sm8350: move more nodes to correct place arm64: dts: qcom: sm8350: reorder device nodes ...
2023-02-21x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe rangeYang Jihong1-0/+1
When arch_prepare_optimized_kprobe calculating jump destination address, it copies original instructions from jmp-optimized kprobe (see __recover_optprobed_insn), and calculated based on length of original instruction. arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when checking whether jmp-optimized kprobe exists. As a result, setup_detour_execution may jump to a range that has been overwritten by jump destination address, resulting in an inval opcode error. For example, assume that register two kprobes whose addresses are <func+9> and <func+11> in "func" function. The original code of "func" function is as follows: 0xffffffff816cb5e9 <+9>: push %r12 0xffffffff816cb5eb <+11>: xor %r12d,%r12d 0xffffffff816cb5ee <+14>: test %rdi,%rdi 0xffffffff816cb5f1 <+17>: setne %r12b 0xffffffff816cb5f5 <+21>: push %rbp 1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1. After the optimization, "func" code changes to: 0xffffffff816cc079 <+9>: push %r12 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp Now op1->flags == KPROBE_FLAG_OPTIMATED; 2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2. register_kprobe(kp2) register_aggr_kprobe alloc_aggr_kprobe __prepare_optimized_kprobe arch_prepare_optimized_kprobe __recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn, // jump address = <func+14> 3. disable kp1: disable_kprobe(kp1) __disable_kprobe ... if (p == orig_p || aggr_kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized orig_p->flags |= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED ... 4. unregister kp2 __unregister_kprobe_top ... if (!kprobe_disabled(ap) && !kprobes_all_disarmed) { optimize_kprobe(op) ... if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return return; p->kp.flags |= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED } "func" code now is: 0xffffffff816cc079 <+9>: int3 0xffffffff816cc07a <+10>: push %rsp 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp 5. if call "func", int3 handler call setup_detour_execution: if (p->flags & KPROBE_FLAG_OPTIMIZED) { ... regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX; ... } The code for the destination address is 0xffffffffa021072c: push %r12 0xffffffffa021072e: xor %r12d,%r12d 0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14> However, <func+14> is not a valid start instruction address. As a result, an error occurs. Link: https://lore.kernel.org/all/[email protected]/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Signed-off-by: Yang Jihong <[email protected]> Cc: [email protected] Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-02-21x86/kprobes: Fix __recover_optprobed_insn check optimizing logicYang Jihong1-0/+1
Since the following commit: commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. The __recover_optprobed_insn check logic is incorrect, a kprobe in the unoptimizing state may be incorrectly determined as unoptimizing. As a result, incorrect instructions are copied. The optprobe_queued_unopt function needs to be exported for invoking in arch directory. Link: https://lore.kernel.org/all/[email protected]/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Cc: [email protected] Signed-off-by: Yang Jihong <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
2023-02-21Merge tag 'drm-misc-next-fixes-2023-02-16' of ↵Dave Airlie1-1/+5
git://anongit.freedesktop.org/drm/drm-misc into drm-next Short summary of fixes pull: Contains fixes for DP MST and the panel orientation on an Lenovo IdeaPad model. Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/Y+4H4C4E6cZcM9+J@linux-uq9g
2023-02-20Merge tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds3-6/+0
Pull ARM SoC updates from Arnd Bergmann: "The majority of the changes are for the OMAP2 platform, mostly removing some dead code that got left behind from previous cleanups. Aside from that, there are very minor updates and correctness fixes for Zynq, i.MX, Samsung, Broadcom, AT91, ep93xx, and OMAP1" * tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits) dt-bindings: soc: samsung: exynos-pmu: allow phys as child ARM: imx: mach-imx6ul: add imx6ulz support ARM: imx: Call ida_simple_remove() for ida_simple_get arm64: drop redundant "ARMv8" from Kconfig option title ARM: ep93xx: Convert to use descriptors for GPIO LEDs ARM: s3c: fix s3c64xx_set_timer_source prototype ARM: OMAP2+: Fix spelling typos in comment ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h> ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h> ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init() ARM: BCM63xx: remove useless goto statement ARM: omap2: make functions static ARM: omap2: remove unused omap2_pm_init ARM: omap2: remove unused declarations ARM: omap2: remove unused functions ARM: omap2: smartreflex: remove on_init control ARM: omap2: remove APLL control ARM: omap2: simplify clock2xxx header ARM: omap2: remove unused omap_hwmod_reset.c ARM: omap2: remove unused headers ...
2023-02-20Merge tag 'arm-boardfile-remove-6.3' of ↵Linus Torvalds55-3604/+1
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC boardfile updates from Arnd Bergmann "Unused boardfile removal for 6.3 This is a follow-up to the deprecation of most of the old-style board files that was merged in linux-6.0, removing them for good. This branch is almost exclusively dead code removal based on those annotations. Some device driver removals went through separate subsystem trees, but the majority is in the same branch, in order to better handle dependencies between the patches and avoid breaking bisection. Unfortunately that leads to merge conflicts against other changes in the subsystem trees, but they should all be trivial to resolve by removing the files. See commit 7d0d3fa7339e ("Merge tag 'arm-boardfiles-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc") for the description of which machines were marked unused and are now removed. The only removals that got postponed are Terastation WXL (mv78xx0) and Jornada720 (StrongARM1100), which turned out to still have potential users" * tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (91 commits) mmc: omap: drop TPS65010 dependency ARM: pxa: restore mfp-pxa320.h usb: ohci-omap: avoid unused-variable warning ARM: debug: remove references in DEBUG_UART_8250_SHIFT to removed configs ARM: s3c: remove obsolete s3c-cpu-freq header MAINTAINERS: adjust SAMSUNG SOC CLOCK DRIVERS after s3c24xx support removal MAINTAINERS: update file entries after arm multi-platform rework and mach-pxa removal ARM: remove CONFIG_UNUSED_BOARD_FILES mfd: remove htc-pasic3 driver w1: remove ds1wm driver usb: remove ohci-tmio driver fbdev: remove w100fb driver fbdev: remove tmiofb driver mmc: remove tmio_mmc driver mfd: remove ucb1400 support mfd: remove toshiba tmio drivers rtc: remove v3020 driver power: remove pda_power supply driver ASoC: pxa: remove unused board support pcmcia: remove unused pxa/sa1100 drivers ...
2023-02-20netfs: Add a function to extract an iterator into a scatterlistDavid Howells1-0/+4
Provide a function for filling in a scatterlist from the list of pages contained in an iterator. If the iterator is UBUF- or IOBUF-type, the pages have a pin taken on them (as FOLL_PIN). If the iterator is BVEC-, KVEC- or XARRAY-type, no pin is taken on the pages and it is left to the caller to manage their lifetime. It cannot be assumed that a ref can be validly taken, particularly in the case of a KVEC iterator. Signed-off-by: David Howells <[email protected]> cc: Jeff Layton <[email protected]> cc: Steve French <[email protected]> cc: Shyam Prasad N <[email protected]> cc: Rohith Surabattula <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-02-20netfs: Add a function to extract a UBUF or IOVEC into a BVEC iteratorDavid Howells1-0/+4
Add a function to extract the pages from a user-space supplied iterator (UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by getting a pin on them (as FOLL_PIN) as we go. This is useful in three situations: (1) A userspace thread may have a sibling that unmaps or remaps the process's VM during the operation, changing the assignment of the pages and potentially causing an error. Retaining the pages keeps some pages around, even if this occurs; futher, we find out at the point of extraction if EFAULT is going to be incurred. (2) Pages might get swapped out/discarded if not retained, so we want to retain them to avoid the reload causing a deadlock due to a DIO from/to an mmapped region on the same file. (3) The iterator may get passed to sendmsg() by the filesystem. If a fault occurs, we may get a short write to a TCP stream that's then tricky to recover from. We don't deal with other types of iterator here, leaving it to other mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe lock). Signed-off-by: David Howells <[email protected]> cc: Jeff Layton <[email protected]> cc: Steve French <[email protected]> cc: Shyam Prasad N <[email protected]> cc: Rohith Surabattula <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-02-20iov_iter: Add a function to extract a page list from an iteratorDavid Howells1-1/+26
Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a pin added or nothing, depending on the type of iterator. Add a second function, iov_iter_extract_will_pin(), to determine how the cleanup should be done. There are two cases: (1) ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins (FOLL_PIN) obtained on them so that a concurrent fork() will forcibly copy the page so that DMA is done to/from the parent's buffer and is unavailable to/unaffected by the child process. iov_iter_extract_will_pin() will return true for this case. The caller should use something like unpin_user_page() to dispose of the page. (2) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_will_pin() will return false. The pages don't need additional disposal. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-02-20iov_iter: Define flags to qualify page extraction.David Howells1-2/+8
Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-02-20splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPEDavid Howells1-0/+3
Implement a function, direct_file_splice(), that deals with this by using an ITER_BVEC iterator instead of an ITER_PIPE iterator as the former won't free its buffers when reverted. The function bulk allocates all the buffers it thinks it is going to use in advance, does the read synchronously and only then trims the buffer down. The pages we did use get pushed into the pipe. This fixes a problem with the upcoming iov_iter_extract_pages() function, whereby pages extracted from a non-user-backed iterator such as ITER_PIPE aren't pinned. __iomap_dio_rw(), however, calls iov_iter_revert() to shorten the iterator to just the bufferage it is going to use - which has the side-effect of freeing the excess pipe buffers, even though they're attached to a bio and may get written to by DMA (thanks to Hillf Danton for spotting this[1]). This then causes memory corruption that is particularly noticeable when the syzbot test[2] is run. The test boils down to: out = creat(argv[1], 0666); ftruncate(out, 0x800); lseek(out, 0x200, SEEK_SET); in = open(argv[1], O_RDONLY | O_DIRECT | O_NOFOLLOW); sendfile(out, in, NULL, 0x1dd00); run repeatedly in parallel. What I think is happening is that ftruncate() occasionally shortens the DIO read that's about to be made by sendfile's splice core by reducing i_size. This should be more efficient for DIO read by virtue of doing a bulk page allocation, but slightly less efficient by ignoring any partial page in the pipe. Reported-by: [email protected] Signed-off-by: David Howells <[email protected]> Reviewed-by: Jens Axboe <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/[email protected]/ [2] Signed-off-by: Steve French <[email protected]>
2023-02-20splice: Add a func to do a splice from a buffered file without ITER_PIPEDavid Howells2-0/+23
Provide a function to do splice read from a buffered file, pulling the folios out of the pagecache directly by calling filemap_get_pages() to do any required reading and then pasting the returned folios into the pipe. A helper function is provided to do the actual folio pasting and will handle multipage folios by splicing as many of the relevant subpages as will fit into the pipe. The code is loosely based on filemap_read() and might belong in mm/filemap.c with that as it needs to use filemap_get_pages(). Signed-off-by: David Howells <[email protected]> Reviewed-by: Jens Axboe <[email protected]> cc: Christoph Hellwig <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2023-02-20Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds8-120/+228
Pull block updates from Jens Axboe: - NVMe updates via Christoph: - Small improvements to the logging functionality (Amit Engel) - Authentication cleanups (Hannes Reinecke) - Cleanup and optimize the DMA mapping cod in the PCIe driver (Keith Busch) - Work around the command effects for Format NVM (Keith Busch) - Misc cleanups (Keith Busch, Christoph Hellwig) - Fix and cleanup freeing single sgl (Keith Busch) - MD updates via Song: - Fix a rare crash during the takeover process - Don't update recovery_cp when curr_resync is ACTIVE - Free writes_pending in md_stop - Change active_io to percpu - Updates to drbd, inching us closer to unifying the out-of-tree driver with the in-tree one (Andreas, Christoph, Lars, Robert) - BFQ update adding support for multi-actuator drives (Paolo, Federico, Davide) - Make brd compliant with REQ_NOWAIT (me) - Fix for IOPOLL and queue entering, fixing stalled IO waiting on timeouts (me) - Fix for REQ_NOWAIT with multiple bios (me) - Fix memory leak in blktrace cleanup (Greg) - Clean up sbitmap and fix a potential hang (Kemeng) - Clean up some bits in BFQ, and fix a bug in the request injection (Kemeng) - Clean up the request allocation and issue code, and fix some bugs related to that (Kemeng) - ublk updates and fixes: - Add support for unprivileged ublk (Ming) - Improve device deletion handling (Ming) - Misc (Liu, Ziyang) - s390 dasd fixes (Alexander, Qiheng) - Improve utility of request caching and fixes (Anuj, Xiao) - zoned cleanups (Pankaj) - More constification for kobjs (Thomas) - blk-iocost cleanups (Yu) - Remove bio splitting from drivers that don't need it (Christoph) - Switch blk-cgroups to use struct gendisk. Some of this is now incomplete as select late reverts were done. (Christoph) - Add bvec initialization helpers, and convert callers to use that rather than open-coding it (Christoph) - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin, Matthew, Ulf, Zhong) * tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits) brd: use radix_tree_maybe_preload instead of radix_tree_preload block: use proper return value from bio_failfast() block: bio-integrity: Copy flags when bio_integrity_payload is cloned block: Fix io statistics for cgroup in throttle path brd: mark as nowait compatible brd: check for REQ_NOWAIT and set correct page allocation mask brd: return 0/-error from brd_insert_page() block: sync mixed merged request's failfast with 1st bio's Revert "blk-cgroup: pin the gendisk in struct blkcg_gq" Revert "blk-cgroup: pass a gendisk to blkg_lookup" Revert "blk-cgroup: delay blk-cgroup initialization until add_disk" Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release" Revert "blk-cgroup: move the cgroup information to struct gendisk" nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl block: ublk: check IO buffer based on flag need_get_data s390/dasd: Fix potential memleak in dasd_eckd_init() s390/dasd: sort out physical vs virtual pointers usage block: Remove the ALLOC_CACHE_SLACK constant block: make kobj_type structures constant ...
2023-02-20Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull io_uring ITER_UBUF conversion from Jens Axboe: "Since we now have ITER_UBUF available, switch to using it for single ranges as it's more efficient than ITER_IOVEC for that" * tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux: block: use iter_ubuf for single range iov_iter: move iter_ubuf check inside restore WARN io_uring: use iter_ubuf for single range imports io_uring: switch network send/recv to ITER_UBUF iov: add import_ubuf()
2023-02-20Merge tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds2-10/+19
Pull io_uring updates from Jens Axboe: - Cleanup series making the async prep and handling of REQ_F_FORCE_ASYNC easier to follow and verify (Dylan) - Enable specifying specific flags for OP_MSG_RING (Breno) - Enable use of KASAN with the internal request cache (Breno) - Split the opcode definition structs into a hot and cold part (Breno) - OP_MSG_RING fixes (Pavel, me) - Fix an issue with IOPOLL cancelation and PREEMPT_NONE (me) - Handle TIF_NOTIFY_RESUME for the io-wq threads that never return to userspace (me) - Add support for using io_uring_register() with a registered ring fd (Josh) - Improve handling of poll on the ring fd (Pavel) - Series improving the task_work handling (Pavel) - Misc cleanups, fixes, improvements (Dmitrii, Quanfa, Richard, Pavel, me) * tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linux: (51 commits) io_uring: Support calling io_uring_register with a registered ring fd io_uring,audit: don't log IORING_OP_MADVISE io_uring: mark task TASK_RUNNING before handling resume/task work io_uring: always go async for unsupported open flags io_uring: always go async for unsupported fadvise flags io_uring: for requests that require async, force it io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async io_uring: add reschedule point to handle_tw_list() io_uring: add a conditional reschedule to the IOPOLL cancelation loop io_uring: return normal tw run linking optimisation io_uring: refactor tctx_task_work io_uring: refactor io_put_task helpers io_uring: refactor req allocation io_uring: improve io_get_sqe io_uring: kill outdated comment about overflow flush io_uring: use user visible tail in io_uring_poll() io_uring: pass in io_issue_def to io_assign_file() io_uring: Enable KASAN for request cache io_uring: handle TIF_NOTIFY_RESUME when checking for task_work io_uring/msg-ring: ensure flags passing works for task_work completions ...
2023-02-20of: dynamic: add lifecycle docbook info to node creation functionsFrank Rowand1-0/+11
The existing docbook comments for the functions related to creating a devicetree node do not explain the reference count of a newly created node, how decrementing the reference count to zero will free the associated memory, and the caller's responsibility to call of_node_put() on the node. Explain what happens when the reference count is decremented to zero. Signed-off-by: Frank Rowand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2023-02-20Merge tag 'for-6.3-tag' of ↵Linus Torvalds3-25/+109
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The usual mix of performance improvements and new features. The core change is reworking how checksums are processed, with followup cleanups and simplifications. There are two minor changes in block layer and iomap code. Features: - block group allocation class heuristics: - pack files by size (up to 128k, up to 8M, more) to avoid fragmentation in block groups, assuming that file size and life time is correlated, in particular this may help during balance - with tracepoints and extensible in the future Performance: - send: cache directory utimes and only emit the command when necessary - speedup up to 10x - smaller final stream produced (no redundant utimes commands issued) - compatibility not affected - fiemap: skip backref checks for shared leaves - speedup 3x on sample filesystem with all leaves shared (e.g. on snapshots) - micro optimized b-tree key lookup, speedup in metadata operations (sample benchmark: fs_mark +10% of files/sec) Core changes: - change where checksumming is done in the io path: - checksum and read repair does verification at lower layer - cascaded cleanups and simplifications - raid56 refactoring and cleanups Fixes: - sysfs: make sure that a run-time change of a feature is correctly tracked by the feature files - scrub: better reporting of tree block errors Other: - locally enable -Wmaybe-uninitialized after fixing all warnings - misc cleanups, spelling fixes Other code: - block: export bio_split_rw - iomap: remove IOMAP_F_ZONE_APPEND" * tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits) btrfs: make kobj_type structures constant btrfs: remove the bdev argument to btrfs_rmap_block btrfs: don't rely on unchanging ->bi_bdev for zone append remaps btrfs: never return true for reads in btrfs_use_zone_append btrfs: pass a btrfs_bio to btrfs_use_append btrfs: set bbio->file_offset in alloc_new_bio btrfs: use file_offset to limit bios size in calc_bio_boundaries btrfs: do unsigned integer division in the extent buffer binary search loop btrfs: eliminate extra call when doing binary search on extent buffer btrfs: raid56: handle endio in scrub_rbio btrfs: raid56: handle endio in recover_rbio btrfs: raid56: handle endio in rmw_rbio btrfs: raid56: submit the read bios from scrub_assemble_read_bios btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios btrfs: raid56: fold recover_assemble_read_bios into recover_rbio btrfs: raid56: add a bio_list_put helper btrfs: raid56: wait for I/O completion in submit_read_bios btrfs: raid56: simplify code flow in rmw_rbio btrfs: raid56: simplify error handling and code flow in raid56_parity_write btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback ...
2023-02-20include/linux/migrate.h: remove unneeded externsAndrew Morton1-8/+8
As suggested by Matthew. Suggested-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-20mm: change to return bool for isolate_movable_page()Baolin Wang1-3/+3
Now the isolate_movable_page() can only return 0 or -EBUSY, and no users will care about the negative return value, thus we can convert the isolate_movable_page() to return a boolean value to make the code more clear when checking the movable page isolation state. No functional changes intended. [[email protected]: remove unneeded comment, per Matthew] Link: https://lkml.kernel.org/r/cb877f73f4fff8d309611082ec740a7065b1ade0.1676424378.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Linus Torvalds <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-20mm: hugetlb: change to return bool for isolate_hugetlb()Baolin Wang1-3/+3
Now the isolate_hugetlb() only returns 0 or -EBUSY, and most users did not care about the negative value, thus we can convert the isolate_hugetlb() to return a boolean value to make code more clear when checking the hugetlb isolation state. Moreover converts 2 users which will consider the negative value returned by isolate_hugetlb(). No functional changes intended. [[email protected]: shorten locked section, per SeongJae Park] Link: https://lkml.kernel.org/r/12a287c5bebc13df304387087bbecc6421510849.1676424378.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Acked-by: Linus Torvalds <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-20Merge tag 'fsnotify_for_v6.3-rc1' of ↵Linus Torvalds3-5/+39
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Support for auditing decisions regarding fanotify permission events" * tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify,audit: Allow audit to use the full permission event response fanotify: define struct members to hold response decision context fanotify: Ensure consistent variable type for response
2023-02-20Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds2-21/+81
Pull fsverity updates from Eric Biggers: "Fix the longstanding implementation limitation that fsverity was only supported when the Merkle tree block size, filesystem block size, and PAGE_SIZE were all equal. Specifically, add support for Merkle tree block sizes less than PAGE_SIZE, and make ext4 support fsverity on filesystems where the filesystem block size is less than PAGE_SIZE. Effectively, this means that fsverity can now be used on systems with non-4K pages, at least on ext4. These changes have been tested using the verity group of xfstests, newly updated to cover the new code paths. Also update fs/verity/ to support verifying data from large folios. There's also a similar patch for fs/crypto/, to support decrypting data from large folios, which I'm including in here to avoid a merge conflict between the fscrypt and fsverity branches" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fscrypt: support decrypting data from large folios fsverity: support verifying data from large folios fsverity.rst: update git repo URL for fsverity-utils ext4: allow verity with fs block size < PAGE_SIZE fs/buffer.c: support fsverity in block_read_full_folio() f2fs: simplify f2fs_readpage_limit() ext4: simplify ext4_readpage_limit() fsverity: support enabling with tree block size < PAGE_SIZE fsverity: support verification with tree block size < PAGE_SIZE fsverity: replace fsverity_hash_page() with fsverity_hash_block() fsverity: use EFBIG for file too large to enable verity fsverity: store log2(digest_size) precomputed fsverity: simplify Merkle tree readahead size calculation fsverity: use unsigned long for level_start fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG fsverity: pass pos and size to ->write_merkle_tree_block fsverity: optimize fsverity_cleanup_inode() on non-verity files fsverity: optimize fsverity_prepare_setattr() on non-verity files fsverity: optimize fsverity_file_open() on non-verity files
2023-02-20Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds1-9/+0
Pull fscrypt updates from Eric Biggers: "Simplify the implementation of the test_dummy_encryption mount option by adding the 'test dummy key' on-demand" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: clean up fscrypt_add_test_dummy_key() fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super() f2fs: stop calling fscrypt_add_test_dummy_key() ext4: stop calling fscrypt_add_test_dummy_key() fscrypt: add the test dummy encryption key on-demand
2023-02-20Merge tag 'erofs-for-6.3-rc1' of ↵Linus Torvalds1-6/+11
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "The most noticeable feature for this cycle is per-CPU kthread decompression since Android use cases need low-latency I/O handling in order to ensure the app runtime performance, currently unbounded workqueue latencies are not quite good for production on many aarch64 hardwares and thus we need to introduce a deterministic expectation for these. Decompression is CPU-intensive and it is sleepable for EROFS, so other alternatives like decompression under softirq contexts are not considered. More details are in the corresponding commit message. Others are random cleanups around the whole codebase and we will continue to clean up further in the next few months. Due to Lunar New Year holidays, some other new features were not completely reviewed and solidified as expected and we may delay them into the next version. Summary: - Add per-cpu kthreads for low-latency decompression for Android use cases - Get rid of tagged pointer helpers since they are rarely used now - Several code cleanups to reduce codebase - Documentation and MAINTAINERS updates" * tag 'erofs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (21 commits) erofs: fix an error code in z_erofs_init_zip_subsystem() erofs: unify anonymous inodes for blob erofs: relinquish volume with mutex held erofs: maintain cookies of share domain in self-contained list erofs: remove unused device mapping in meta routine MAINTAINERS: erofs: Add Documentation/ABI/testing/sysfs-fs-erofs Documentation/ABI: sysfs-fs-erofs: update supported features erofs: remove unused EROFS_GET_BLOCKS_RAW flag erofs: update print symbols for various flags in trace erofs: make kobj_type structures constant erofs: add per-cpu threads for decompression as an option erofs: tidy up internal.h erofs: get rid of z_erofs_do_map_blocks() forward declaration erofs: move zdata.h into zdata.c erofs: remove tagged pointer helpers erofs: avoid tagged pointers to mark sync decompression erofs: get rid of erofs_inode_datablocks() erofs: simplify iloc() erofs: get rid of debug_one_dentry() erofs: remove linux/buffer_head.h dependency ...
2023-02-20Merge tag 'fs.v6.3' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs hardening update from Christian Brauner: "Jan pointed out that during shutdown both filp_close() and super block destruction will use basic printk logging when bugs are detected. This causes issues in a few scenarios: - Tools like syzkaller cannot figure out that the logged message indicates a bug. - Users that explicitly opt in to have the kernel bug on data corruption by selecting CONFIG_BUG_ON_DATA_CORRUPTION should see the kernel crash when they did actually select that option. - When there are busy inodes after the superblock is shut down later access to such a busy inodes walks through freed memory. It would be better to cleanly crash instead. All of this can be addressed by using the already existing CHECK_DATA_CORRUPTION() macro in these places when kernel bugs are detected. Its logging improvement is useful for all users. Otherwise this only has a meaningful behavioral effect when users do select CONFIG_BUG_ON_DATA_CORRUPTION which means this is backward compatible for regular users" * tag 'fs.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds15-384/+205
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-20Merge tag 'iversion-v6.3' of ↵Linus Torvalds3-39/+31
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull i_version updates from Jeff Layton: "This overhauls how we handle i_version queries from nfsd. Instead of having special routines and grabbing the i_version field directly out of the inode in some cases, we've moved most of the handling into the various filesystems' getattr operations. As a bonus, this makes ceph's change attribute usable by knfsd as well. This should pave the way for future work to make this value queryable by userland, and to make it more resilient against rolling back on a crash" * tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: nfsd: remove fetch_iversion export operation nfsd: use the getattr operation to fetch i_version nfsd: move nfsd4_change_attribute to nfsfh.c ceph: report the inode version in getattr if requested nfs: report the inode version in getattr if requested vfs: plumb i_version handling into struct kstat fs: clarify when the i_version counter must be updated fs: uninline inode_query_iversion
2023-02-20Merge tag 'locks-v6.3' of ↵Linus Torvalds4-431/+442
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "The main change here is that I've broken out most of the file locking definitions into a new header file. I also went ahead and completed the removal of locks_inode function" * tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove locks_inode filelock: move file locking definitions to separate header file
2023-02-20Merge tag 'tpm-v6.3-rc1' of ↵Linus Torvalds3-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "In additon to bug fixes, these are noteworthy changes: - In TPM I2C drivers, migrate from probe() to probe_new() (a new driver model in I2C). - TPM CRB: Pluton support - Add duplicate hash detection to the blacklist keyring in order to give more meaningful klog output than e.g. [1]" Link: https://askubuntu.com/questions/1436856/ubuntu-22-10-blacklist-problem-blacklisting-hash-13-message-on-boot [1] * tag 'tpm-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: add vendor flag to command code validation tpm: Add reserved memory event log tpm: Use managed allocation for bios event log tpm: tis_i2c: Convert to i2c's .probe_new() tpm: tpm_i2c_nuvoton: Convert to i2c's .probe_new() tpm: tpm_i2c_infineon: Convert to i2c's .probe_new() tpm: tpm_i2c_atmel: Convert to i2c's .probe_new() tpm: st33zp24: Convert to i2c's .probe_new() KEYS: asymmetric: Fix ECDSA use via keyctl uapi certs: don't try to update blacklist keys KEYS: Add new function key_create() certs: make blacklisted hash available in klog tpm_crb: Add support for CRB devices based on Pluton crypto: certs: fix FIPS selftest dependency
2023-02-20filemap: Remove lock_page_killable()Matthew Wilcox (Oracle)1-10/+0
There are no more callers; remove this function before any more appear. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Steve French <[email protected]>
2023-02-20Merge tag 'remove-get_kernel_pages-for-6.3' of ↵Linus Torvalds2-3/+4
https://git.linaro.org/people/jens.wiklander/linux-tee Pull TEE update from Jens Wiklander: "Remove get_kernel_pages() Vmalloc page support is removed from shm_get_kernel_pages() and the get_kernel_pages() call is replaced by calls to get_page(). With no remaining callers of get_kernel_pages() the function is removed" [ This looks like it's just some random 'tee' cleanup, but the bigger picture impetus for this is really to to to remove historical confusion with mixed use of kernel virtual addresses and 'struct page' pointers. Kernel virtual pointers in the vmalloc space is then particularly confusing - both for looking up a page pointer (when trying to then unify a "virtual address or page" interface) and _particularly_ when mixed with HIGHMEM support and the kmap*() family of remapping. This is particularly true with HIGHMEM getting much less test coverage with 32-bit architectures being increasingly legacy targets. So we actively wanted to remove get_kernel_pages() to make sure nobody else used it too, and thus the 'tee' part is "finally remove last user". See also commit 6647e76ab623 ("v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails") for a totally different version of a conceptually similar "let's stop this confusion of different ways of referring to memory". - Linus ] * tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee: mm: Remove get_kernel_pages() tee: Remove call to get_kernel_pages() tee: Remove vmalloc page support highmem: Enhance is_kmap_addr() to check kmap_local_page() mappings
2023-02-20SUNRPC: Remove ->xpo_secure_port()Chuck Lever1-1/+0
There's no need for the cost of this extra virtual function call during every RPC transaction: the RQ_SECURE bit can be set properly in ->xpo_recvfrom() instead. Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2023-02-20SUNRPC: Clean up the svc_xprt_flags() macroChuck Lever1-14/+14
Make this macro more conventional: - Use BIT() instead of open-coding " 1UL << " - Don't display the "XPT_" in every flag name Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>