aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-11Merge branch 'getsockopt-parameter-validation'David S. Miller6-10/+13
Gavrilov Ilia says: ==================== fix incorrect parameter validation in the *_get_sockopt() functions This v2 series fix incorrent parameter validation in *_get_sockopt() functions in several places. version 2 changes: - reword the patch description - add two patches for net/kcm and net/x25 ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-11net/x25: fix incorrect parameter validation in the x25_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11net: kcm: fix incorrect parameter validation in the kcm_getsockopt) functionGavrilov Ilia1-1/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11udp: fix incorrect parameter validation in the udp_lib_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reviewed-by: Tom Parkin <[email protected]> Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11ipmr: fix incorrect parameter validation in the ip_mroute_getsockopt() functionGavrilov Ilia1-1/+3
The 'olr' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'olr' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11tcp: fix incorrect parameter validation in the do_tcp_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <[email protected]> Reviewed-by: Jason Xing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11Merge branch 'qmc-hdlc'David S. Miller6-0/+960
Herve Codina says: ==================== Add support for QMC HDLC This series introduces the QMC HDLC support. Patches were previously sent as part of a full feature series and were previously reviewed in that context: "Add support for QMC HDLC, framer infrastructure and PEF2256 framer" [1] In order to ease the merge, the full feature series has been split and needed parts were merged in v6.8-rc1: - "Prepare the PowerQUICC QMC and TSA for the HDLC QMC driver" [2] - "Add support for framer infrastructure and PEF2256 framer" [3] This series contains patches related to the QMC HDLC part (QMC HDLC driver): - Introduce the QMC HDLC driver (patches 1 and 2) - Add timeslots change support in QMC HDLC (patch 3) - Add framer support as a framer consumer in QMC HDLC (patch 4) Compare to the original full feature series, a modification was done on patch 3 in order to use a coherent prefix in the commit title. I kept the patches unsquashed as they were previously sent and reviewed. Of course, I can squash them if needed. Compared to the previous iteration: https://lore.kernel.org/linux-kernel/[email protected]/ this v7 series mainly: - Rename a variable. - Fix reverse xmas tree declarations. - Add 'Acked-by' tag. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-11net: wan: fsl_qmc_hdlc: Add framer supportHerve Codina1-5/+234
Add framer support in the fsl_qmc_hdlc driver in order to be able to signal carrier changes to the network stack based on the framer status Also use this framer to provide information related to the E1/T1 line interface on IF_GET_IFACE and configure the line interface according to IF_IFACE_{E1,T1} information. Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11net: wan: fsl_qmc_hdlc: Add runtime timeslots changes supportHerve Codina1-1/+150
QMC channels support runtime timeslots changes but nothing is done at the QMC HDLC driver to handle these changes. Use existing IFACE ioctl in order to configure the timeslots to use. Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11lib/bitmap: Introduce bitmap_scatter() and bitmap_gather() helpersAndy Shevchenko2-0/+143
These helpers scatters or gathers a bitmap with the help of the mask position bits parameter. bitmap_scatter() does the following: src: 0000000001011010 |||||| +------+||||| | +----+|||| | |+----+||| | || +-+|| | || | || mask: ...v..vv...v..vv ...0..11...0..10 dst: 0000001100000010 and bitmap_gather() performs this one: mask: ...v..vv...v..vv src: 0000001100000010 ^ ^^ ^ 0 | || | 10 | || > 010 | |+--> 1010 | +--> 11010 +----> 011010 dst: 0000000000011010 bitmap_gather() can the seen as the reverse bitmap_scatter() operation. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Co-developed-by: Herve Codina <[email protected]> Signed-off-by: Herve Codina <[email protected]> Acked-by: Yury Norov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11MAINTAINERS: Add the Freescale QMC HDLC driver entryHerve Codina1-0/+7
After contributing the driver, add myself as the maintainer for the Freescale QMC HDLC driver. Signed-off-by: Herve Codina <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11net: wan: Add support for QMC HDLCHerve Codina3-0/+432
The QMC HDLC driver provides support for HDLC using the QMC (QUICC Multichannel Controller) to transfer the HDLC data. Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-11Merge branch '100GbE' of ↵David S. Miller5-9/+210
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-11Merge tag 'riscv-dt-fixes-for-v6.8-final' of ↵Arnd Bergmann9-38/+22
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt RISC-V Devicetree fixes for v6.8-final Starfive: The previous cleanup broke boot on the jh7100 as the driver depended on the fallback clock name created based on the node-name when clock-output-names is not present. Add clock-output-names to restore working order. Generic: BUILTIN_DTB has been broken for ages on any platform other than the nommu Canaan k210 SoC as the first dtb built (in alphanumerical order), would get built into the image. This didn't get fixed for ages because nobody actually cared about running it other than the k210 enough to fix it. The folks doing Sophgo SG2042 development have come along and fixed it, as they want to use builtin dtbs. linux-boot on that platform reuses the dtb it was provided by OpenSBI when booting linux proper, which is unfortunately not possible to boot a mainline kernel with. Signed-off-by: Conor Dooley <[email protected]> * tag 'riscv-dt-fixes-for-v6.8-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig riscv: dts: starfive: jh7100: fix root clock names Link: https://lore.kernel.org/r/20240306-waltz-facial-9e4e1b792053@spud Signed-off-by: Arnd Bergmann <[email protected]>
2024-03-11dt-bindings: opp: drop maxItems from inner itemsDavid Heidelberg1-2/+0
With recent changes within matrix dimensions calculation, dropping maxItems: 1 provides a warning-free run. Fixes warning such as: arch/arm64/boot/dts/qcom/sdm845-oneplus-enchilada.dtb: opp-table: opp-200000000:opp-hz:0: [200000000, 0, 0, 150000000, 0, 0, 0, 0, 300000000] is too long Fixes: 3cb16ad69bef ("dt-bindings: opp: accept array of frequencies") Suggested-by: Rob Herring <[email protected]> Signed-off-by: David Heidelberg <[email protected]> Reviewed-by: Dhruva Gole <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Viresh Kumar <[email protected]>
2024-03-11OPP: debugfs: Fix warning around icc_get_name()Viresh Kumar1-2/+4
If the kernel isn't built with interconnect support, icc_get_name() returns NULL and we get following warning: drivers/opp/debugfs.c: In function 'bw_name_read': drivers/opp/debugfs.c:43:42: error: '%.62s' directive argument is null [-Werror=format-overflow=] i = scnprintf(buf, sizeof(buf), "%.62s\n", icc_get_name(path)); Fix it. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 0430b1d5704b0 ("opp: Expose bandwidth information via debugfs") Signed-off-by: Viresh Kumar <[email protected]> Reviewed-by: Dhruva Gole <[email protected]>
2024-03-11OPP: debugfs: Fix warning with W=1 buildsViresh Kumar1-4/+4
We currently get the following warning: debugfs.c:105:54: error: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 8 [-Werror=format-truncation=] snprintf(name, sizeof(name), "supply-%d", i); ^~ debugfs.c:105:46: note: directive argument in the range [-2147483644, 2147483646] snprintf(name, sizeof(name), "supply-%d", i); ^~~~~~~~~~~ debugfs.c:105:17: note: 'snprintf' output between 9 and 19 bytes into a destination of size 15 snprintf(name, sizeof(name), "supply-%d", i); Fix this and other potential issues it by allocating larger arrays. Use the exact string format to allocate the arrays without getting into these issues again. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Viresh Kumar <[email protected]> Reviewed-by: Dhruva Gole <[email protected]>
2024-03-11cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.hViresh Kumar2-20/+16
Move the declaration of functions defined in the OPP core to pm_opp.h. These were added to cpufreq.h as it was the only user of the APIs, but that was a mistake perhaps. Fix it. Signed-off-by: Viresh Kumar <[email protected]>
2024-03-11OPP: Extend dev_pm_opp_data with turbo supportSibi Sankar2-0/+3
Let's extend the dev_pm_opp_data with a turbo variable, to allow users to specify if it's a boost frequency for a dynamically added OPP. Signed-off-by: Sibi Sankar <[email protected]> Signed-off-by: Viresh Kumar <[email protected]>
2024-03-10mailbox: imx: support i.MX95 Generic/ELE/V2X MUPeng Fan1-0/+3
Add i.MX95 Generic/ELE/V2X MU support, its register layout is same as i.MX8ULP, but the Parameter registers would show different TR/RR. Since the driver already supports get TR/RR from Parameter registers, not hardcoding the number, this patch just add the compatible entry to reuse i.MX8ULP S4 cfg data. Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2024-03-10mailbox: imx: populate sub-nodesPeng Fan1-0/+3
Some MUs such as i.MX95 MU, have internal SRAM which could be used for SCMI shared memory, so populate the sub-nodes to use the SRAM. Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2024-03-10mailbox: imx: get RR/TR registers num from Parameter registerPeng Fan1-11/+36
i.MX8ULP, i.MX93 MU has a Parameter register encoded as below: BIT: 15 --- 8 | 7 --- 0 RR_NUM TR_NUM So to make driver easy to support more variants, get the RR/TR registers number from Parameter register. The patch only adds support the specific MU, such as ELE MU. For generic MU, not add support for number larger than 4. Reviewed-by: Sascha Hauer <[email protected]> Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2024-03-10mailbox: imx: support return value of initPeng Fan1-11/+24
There will be changes that init may fail, so adding return value for init function. Reviewed-by: Sascha Hauer <[email protected]> Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2024-03-10dt-bindings: mailbox: fsl,mu: add i.MX95 Generic/ELE/V2X MU compatiblePeng Fan1-1/+57
Add i.MX95 Generic, Secure Enclave and V2X Message Unit compatible string. And the MUs in AONMIX has internal RAMs for SCMI shared buffer usage. Reviewed-by: Conor Dooley <[email protected]> Signed-off-by: Peng Fan <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2024-03-10Linux 6.8Linus Torvalds1-1/+1
2024-03-10hwmon: (dell-smm) Add XPS 9315 to fan control whitelistArmin Wolf1-0/+13
A user reported that on this machine, disabling BIOS fan control is necessary in order to change the fan speed. Signed-off-by: Armin Wolf <[email protected]> Acked-by: Pali Rohár <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2024-03-10Merge tag 'trace-ring-buffer-v6.8-rc7' of ↵Linus Torvalds4-94/+120
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Do not allow large strings (> 4096) as single write to trace_marker The size of a string written into trace_marker was determined by the size of the sub-buffer in the ring buffer. That size is dependent on the PAGE_SIZE of the architecture as it can be mapped into user space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of the string of writing into trace_marker 64K. One of the selftests looks at the size of the ring buffer sub-buffers and writes that plus more into the trace_marker. The write will take what it can and report back what it consumed so that the user space application (like echo) will write the rest of the string. The string is stored in the ring buffer and can be read via the "trace" or "trace_pipe" files. The reading of the ring buffer uses vsnprintf(), which uses a precision "%.*s" to make sure it only reads what is stored in the buffer, as a bug could cause the string to be non terminated. With the combination of the precision change and the PAGE_SIZE of 64K allowing huge strings to be added into the ring buffer, plus the test that would actually stress that limit, a bug was reported that the precision used was too big for "%.*s" as the string was close to 64K in size and the max precision of vsnprintf is 32K. Linus suggested not to have that precision as it could hide a bug if the string was again stored without a nul byte. Another issue that was brought up is that the trace_seq buffer is also based on PAGE_SIZE even though it is not tied to the architecture limit like the ring buffer sub-buffer is. Having it be 64K * 2 is simply just too big and wasting memory on systems with 64K page sizes. It is now hardcoded to 8K which is what all other architectures with 4K PAGE_SIZE has. Finally, the write to trace_marker is now limited to 4K as there is no reason to write larger strings into trace_marker. - ring_buffer_wait() should not loop. The ring_buffer_wait() does not have the full context (yet) on if it should loop or not. Just exit the loop as soon as its woken up and let the callers decide to loop or not (they already do, so it's a bit redundant). - Fix shortest_full field to be the smallest amount in the ring buffer that a waiter is waiting for. The "shortest_full" field is updated when a new waiter comes in and wants to wait for a smaller amount of data in the ring buffer than other waiters. But after all waiters are woken up, it's not reset, so if another waiter comes in wanting to wait for more data, it will be woken up when the ring buffer has a smaller amount from what the previous waiters were waiting for. - The wake up all waiters on close is incorrectly called frome .release() and not from .flush() so it will never wake up any waiters as the .release() will not get called until all .read() calls are finished. And the wakeup is for the waiters in those .read() calls. * tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use .flush() call to wake up readers ring-buffer: Fix resetting of shortest_full ring-buffer: Fix waking up ring buffer readers tracing: Limit trace_marker writes to just 4K tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE tracing: Remove precision vsnprintf() check from print event
2024-03-10Merge tag 'phy-fixes3-6.8' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: - fixes for Qualcomm qmp-combo driver for ordering of drm and type-c switch registartion due to drivers might not probe defer after having registered child devices to avoid triggering a probe deferral loop. This fixes internal display on Lenovo ThinkPad X13s * tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: qcom-qmp-combo: fix type-c switch registration phy: qcom-qmp-combo: fix drm bridge registration
2024-03-10tracing: Use .flush() call to wake up readersSteven Rostedt (Google)1-6/+15
The .release() function does not get called until all readers of a file descriptor are finished. If a thread is blocked on reading a file descriptor in ring_buffer_wait(), and another thread closes the file descriptor, it will not wake up the other thread as ring_buffer_wake_waiters() is called by .release(), and that will not get called until the .read() is finished. The issue originally showed up in trace-cmd, but the readers are actually other processes with their own file descriptors. So calling close() would wake up the other tasks because they are blocked on another descriptor then the one that was closed(). But there's other wake ups that solve that issue. When a thread is blocked on a read, it can still hang even when another thread closed its descriptor. This is what the .flush() callback is for. Have the .flush() wake up the readers. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: linke li <[email protected]> Cc: Rabin Vincent <[email protected]> Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-03-10ring-buffer: Fix resetting of shortest_fullSteven Rostedt (Google)1-7/+23
The "shortest_full" variable is used to keep track of the waiter that is waiting for the smallest amount on the ring buffer before being woken up. When a tasks waits on the ring buffer, it passes in a "full" value that is a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to 100% full buffer. As all waiters are on the same wait queue, the wake up happens for the waiter with the smallest percentage. The problem is that the smallest_full on the cpu_buffer that stores the smallest amount doesn't get reset when all the waiters are woken up. It does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace). This means that tasks may be woken up more often then when they want to be. Instead, have the shortest_full field get reset just before waking up all the tasks. If the tasks wait again, they will update the shortest_full before sleeping. Also add locking around setting of shortest_full in the poll logic, and change "work" to "rbwork" to match the variable name for rb_irq_work structures that are used in other places. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: linke li <[email protected]> Cc: Rabin Vincent <[email protected]> Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-03-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-15/+120
Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10ring-buffer: Fix waking up ring buffer readersSteven Rostedt (Google)1-71/+68
A task can wait on a ring buffer for when it fills up to a specific watermark. The writer will check the minimum watermark that waiters are waiting for and if the ring buffer is past that, it will wake up all the waiters. The waiters are in a wait loop, and will first check if a signal is pending and then check if the ring buffer is at the desired level where it should break out of the loop. If a file that uses a ring buffer closes, and there's threads waiting on the ring buffer, it needs to wake up those threads. To do this, a "wait_index" was used. Before entering the wait loop, the waiter will read the wait_index. On wakeup, it will check if the wait_index is different than when it entered the loop, and will exit the loop if it is. The waker will only need to update the wait_index before waking up the waiters. This had a couple of bugs. One trivial one and one broken by design. The trivial bug was that the waiter checked the wait_index after the schedule() call. It had to be checked between the prepare_to_wait() and the schedule() which it was not. The main bug is that the first check to set the default wait_index will always be outside the prepare_to_wait() and the schedule(). That's because the ring_buffer_wait() doesn't have enough context to know if it should break out of the loop. The loop itself is not needed, because all the callers to the ring_buffer_wait() also has their own loop, as the callers have a better sense of what the context is to decide whether to break out of the loop or not. Just have the ring_buffer_wait() block once, and if it gets woken up, exit the function and let the callers decide what to do next. Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: linke li <[email protected]> Cc: Rabin Vincent <[email protected]> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-03-10erofs: support compressed inodes over fscacheJingbo Xu4-20/+77
Since fscache can utilize iov_iter to write dest buffers, bio_vec can be used in this way too. To simplify this, pseudo bios are prepared and bio_vec will be filled with bio_add_page(). And a common .bi_end_io will be called directly to handle I/O completions. Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-03-10erofs: make iov_iter describe target buffers over fscacheJingbo Xu1-112/+123
So far the fscache mode supports uncompressed data only, and the data read from fscache is put directly into the target page cache. As the support for compressed data in fscache mode is going to be introduced, rework the fscache internals so that the following compressed part could make the raw data read from fscache be directed to the target buffer it wants, decompress the raw data, and finally fill the page cache with the decompressed data. As the first step, a new structure, i.e. erofs_fscache_io (io), is introduced to describe a generic read request from the fscache, while the caller can specify the target buffer it wants in the iov_iter structure (io->iter). Besides, the caller can also specify its completion callback and private data through erofs_fscache_io, which will be called to make further handling, e.g. unlocking the page cache for uncompressed data or decompressing the read raw data, when the read request from the fscache completes. Now erofs_fscache_read_io_async() serves as a generic interface for reading raw data from fscache for both compressed and uncompressed data. The erofs_fscache_rq structure is kept to describe a request to fill the page cache in the specified range. Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-03-10erofs: fix lockdep false positives on initializing erofs_pseudo_mntBaokun Li3-31/+15
Lockdep reported the following issue when mounting erofs with a domain_id: ============================================ WARNING: possible recursive locking detected 6.8.0-rc7-xfstests #521 Not tainted -------------------------------------------- mount/396 is trying to acquire lock: ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 but task is already holding lock: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&type->s_umount_key#50/1); lock(&type->s_umount_key#50/1); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by mount/396: #0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 #1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3}, at: erofs_fscache_register_fs+0x3d/0x270 [erofs] stack backtrace: CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521 Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 validate_chain+0x5c4/0xa00 __lock_acquire+0x6a9/0xd50 lock_acquire+0xcd/0x2b0 down_write_nested+0x45/0xd0 alloc_super+0xe3/0x3d0 sget_fc+0x62/0x2f0 vfs_get_super+0x21/0x90 vfs_get_tree+0x2c/0xf0 fc_mount+0x12/0x40 vfs_kern_mount.part.0+0x75/0x90 kern_mount+0x24/0x40 erofs_fscache_register_fs+0x1ef/0x270 [erofs] erofs_fc_fill_super+0x213/0x380 [erofs] This is because the file_system_type of both erofs and the pseudo-mount point of domain_id is erofs_fs_type, so two successive calls to alloc_super() are considered to be using the same lock and trigger the warning above. Therefore add a nodev file_system_type called erofs_anon_fs_type in fscache.c to silence this complaint. Because kern_mount() takes a pointer to struct file_system_type, not its (string) name. So we don't need to call register_filesystem(). In addition, call init_pseudo() in erofs_anon_init_fs_context() as suggested by Al Viro, so that we can remove erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and erofs_anon_context_ops. Suggested-by: Al Viro <[email protected]> Fixes: a9849560c55e ("erofs: introduce a pseudo mnt to manage shared cookies") Signed-off-by: Baokun Li <[email protected]> Reviewed-and-tested-by: Jingbo Xu <[email protected]> Reviewed-by: Yang Erkun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2024-03-10erofs: refine managed cache operations to foliosGao Xiang6-48/+34
Convert erofs_try_to_free_all_cached_pages() and z_erofs_cache_release_folio(). Besides, erofs_page_is_managed() is moved to zdata.c and renamed as erofs_folio_is_managed(). Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-10erofs: convert z_erofs_submissionqueue_endio() to foliosGao Xiang1-11/+11
Use bio_for_each_folio() to iterate over each folio in the bio and there is no large folios for now. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-10erofs: convert z_erofs_fill_bio_vec() to foliosGao Xiang1-35/+36
Introduce a folio member to `struct z_erofs_bvec` and convert most of z_erofs_fill_bio_vec() to folios, which is still straight-forward. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-10erofs: get rid of `justfound` debugging tagGao Xiang1-17/+3
`justfound` is introduced to identify cached folios that are just added to compressed bvecs so that more checks can be applied in the I/O submission path. EROFS is quite now stable compared to the codebase at that stage. `justfound` becomes a burden for upcoming features. Drop it. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-10erofs: convert z_erofs_do_read_page() to foliosGao Xiang1-16/+15
It is a straight-forward conversion. Besides, it's renamed as z_erofs_scan_folio(). Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-10erofs: convert z_erofs_onlinepage_.* to foliosGao Xiang1-28/+22
Online folios are locked file-backed folios which will eventually keep decoded (e.g. decompressed) data of each inode for end users to utilize. It may belong to a few pclusters and contain other data (e.g. compressed data for inplace I/Os) temporarily in a time-sharing manner to reduce memory footprints for low-ended storage devices with high latencies under heary I/O pressure. Apart from folio_end_read() usage, it's a straight-forward conversion. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-09exec: Simplify remove_arg_zero() error pathKees Cook1-7/+3
We don't need the "out" label any more, so remove "ret" and return directly on error. Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Kees Cook <[email protected]> --- Cc: Eric Biederman <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jan Kara <[email protected]> Cc: [email protected] Cc: [email protected]
2024-03-09pstore/zone: Don't clear memory twiceChristophe JAILLET1-1/+0
There is no need to call memset(..., 0, ...) on memory allocated by kcalloc(). It is already zeroed. Remove the redundant call. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/fa2597400051c18c6ca11187b0e4b906729991b2.1709972649.git.christophe.jaillet@wanadoo.fr Signed-off-by: Kees Cook <[email protected]>
2024-03-09NFSD: Clean up nfsd4_encode_replay()Chuck Lever2-16/+31
Replace open-coded encoding logic with the use of conventional XDR utility functions. Add a tracepoint to make replays observable in field troubleshooting situations. The WARN_ON is removed. A stack trace is of little use, as there is only one call site for nfsd4_encode_replay(), and a buffer length shortage here is unlikely. Signed-off-by: Chuck Lever <[email protected]>
2024-03-09Merge tag 'i2c-for-6.8-rc8' of ↵Linus Torvalds3-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Two patches from Heiner for the i801 are targeting muxes discovered while working on some other features. Essentially, there is a reordering when adding optional slaves and proper cleanup upon registering a mux device. Christophe fixes the exit path in the wmt driver that was leaving the clocks hanging, and the last fix from Tommy avoids false error reports in IRQ" * tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: aspeed: Fix the dummy irq expected print i2c: wmt: Fix an error handling path in wmt_i2c_probe() i2c: i801: Avoid potential double call to gpiod_remove_lookup_table i2c: i801: Fix using mux_pdev before it's set
2024-03-09Merge tag 'firewire-fixes-6.8-final' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A fix to suppress a warning about unreleased IRQ for 1394 OHCI hardware when disabling MSI. In Linux kernel v6.5, a PCI driver for 1394 OHCI hardware was optimized into the managed device resources. Edmund Raile points out that the change brings the warning about unreleased IRQ at the call of pci_disable_msi(), since the API expects that the relevant IRQ has already been released in advance. As long as the API is called in .remove callback of PCI device operation, it is prohibited to maintain the IRQ as the part of managed device resource. As a workaround, the IRQ is explicitly released at .remove callback, before the call of pci_disable_msi(). pci_disable_msi() is legacy API nowadays in PCI MSI implementation. I have a plan to replace it with the modern API in the development for the future version of Linux kernel. So at present I keep them as is" * tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: ohci: prevent leak of left-over IRQ on unbind
2024-03-09SEV: disable SEV-ES DebugSwap by defaultPaolo Bonzini1-2/+5
The DebugSwap feature of SEV-ES provides a way for confidential guests to use data breakpoints. However, because the status of the DebugSwap feature is recorded in the VMSA, enabling it by default invalidates the attestation signatures. In 6.10 we will introduce a new API to create SEV VMs that will allow enabling DebugSwap based on what the user tells KVM to do. Contextually, we will change the legacy KVM_SEV_ES_INIT API to never enable DebugSwap. For compatibility with kernels that pre-date the introduction of DebugSwap, as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable the feature by default. If anybody wants to use it, for now they can enable the sev_es_debug_swap_enabled module parameter, but this will result in a warning. Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES") Cc: [email protected] Signed-off-by: Paolo Bonzini <[email protected]>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini5-6/+28
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini3-0/+78
KVM x86 fixes for 6.8, round 2: - When emulating an atomic access, mark the gfn as dirty in the memslot to fix a bug where KVM could fail to mark the slot as dirty during live migration, ultimately resulting in guest data corruption due to a dirty page not being re-copied from the source to the target. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention. Contending mmu_lock is especially problematic on preemptible kernels, as KVM may yield mmu_lock in response to the contention, which severely degrades overall performance due to vCPUs making it difficult for the task that triggered invalidation to make forward progress. Note, due to another kernel bug, this fix isn't limited to preemtible kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield contended rwlocks and spinlocks. https://lore.kernel.org/all/[email protected]