Age | Commit message (Collapse) | Author | Files | Lines |
|
Gavrilov Ilia says:
====================
fix incorrect parameter validation in the *_get_sockopt() functions
This v2 series fix incorrent parameter validation in *_get_sockopt()
functions in several places.
version 2 changes:
- reword the patch description
- add two patches for net/kcm and net/x25
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Signed-off-by: Gavrilov Ilia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Gavrilov Ilia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Reviewed-by: Tom Parkin <[email protected]>
Signed-off-by: Gavrilov Ilia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'olr' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'olr' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'len' variable can't be negative when assigned the result of
'min_t' because all 'min_t' parameters are cast to unsigned int,
and then the minimum one is chosen.
To fix the logic, check 'len' as read from 'optlen',
where the types of relevant variables are (signed) int.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Gavrilov Ilia <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Herve Codina says:
====================
Add support for QMC HDLC
This series introduces the QMC HDLC support.
Patches were previously sent as part of a full feature series and were
previously reviewed in that context:
"Add support for QMC HDLC, framer infrastructure and PEF2256 framer" [1]
In order to ease the merge, the full feature series has been split and
needed parts were merged in v6.8-rc1:
- "Prepare the PowerQUICC QMC and TSA for the HDLC QMC driver" [2]
- "Add support for framer infrastructure and PEF2256 framer" [3]
This series contains patches related to the QMC HDLC part (QMC HDLC
driver):
- Introduce the QMC HDLC driver (patches 1 and 2)
- Add timeslots change support in QMC HDLC (patch 3)
- Add framer support as a framer consumer in QMC HDLC (patch 4)
Compare to the original full feature series, a modification was done on
patch 3 in order to use a coherent prefix in the commit title.
I kept the patches unsquashed as they were previously sent and reviewed.
Of course, I can squash them if needed.
Compared to the previous iteration:
https://lore.kernel.org/linux-kernel/[email protected]/
this v7 series mainly:
- Rename a variable.
- Fix reverse xmas tree declarations.
- Add 'Acked-by' tag.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add framer support in the fsl_qmc_hdlc driver in order to be able to
signal carrier changes to the network stack based on the framer status
Also use this framer to provide information related to the E1/T1 line
interface on IF_GET_IFACE and configure the line interface according to
IF_IFACE_{E1,T1} information.
Signed-off-by: Herve Codina <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
QMC channels support runtime timeslots changes but nothing is done at
the QMC HDLC driver to handle these changes.
Use existing IFACE ioctl in order to configure the timeslots to use.
Signed-off-by: Herve Codina <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These helpers scatters or gathers a bitmap with the help of the mask
position bits parameter.
bitmap_scatter() does the following:
src: 0000000001011010
||||||
+------+|||||
| +----+||||
| |+----+|||
| || +-+||
| || | ||
mask: ...v..vv...v..vv
...0..11...0..10
dst: 0000001100000010
and bitmap_gather() performs this one:
mask: ...v..vv...v..vv
src: 0000001100000010
^ ^^ ^ 0
| || | 10
| || > 010
| |+--> 1010
| +--> 11010
+----> 011010
dst: 0000000000011010
bitmap_gather() can the seen as the reverse bitmap_scatter() operation.
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Co-developed-by: Herve Codina <[email protected]>
Signed-off-by: Herve Codina <[email protected]>
Acked-by: Yury Norov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After contributing the driver, add myself as the maintainer for the
Freescale QMC HDLC driver.
Signed-off-by: Herve Codina <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The QMC HDLC driver provides support for HDLC using the QMC (QUICC
Multichannel Controller) to transfer the HDLC data.
Signed-off-by: Herve Codina <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following:`ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt
RISC-V Devicetree fixes for v6.8-final
Starfive:
The previous cleanup broke boot on the jh7100 as the driver depended on
the fallback clock name created based on the node-name when
clock-output-names is not present. Add clock-output-names to restore
working order.
Generic:
BUILTIN_DTB has been broken for ages on any platform other than the
nommu Canaan k210 SoC as the first dtb built (in alphanumerical order),
would get built into the image. This didn't get fixed for ages because
nobody actually cared about running it other than the k210 enough to
fix it. The folks doing Sophgo SG2042 development have come along and
fixed it, as they want to use builtin dtbs. linux-boot on that platform
reuses the dtb it was provided by OpenSBI when booting linux proper,
which is unfortunately not possible to boot a mainline kernel with.
Signed-off-by: Conor Dooley <[email protected]>
* tag 'riscv-dt-fixes-for-v6.8-final' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: Move BUILTIN_DTB_SOURCE to common Kconfig
riscv: dts: starfive: jh7100: fix root clock names
Link: https://lore.kernel.org/r/20240306-waltz-facial-9e4e1b792053@spud
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
With recent changes within matrix dimensions calculation,
dropping maxItems: 1 provides a warning-free run.
Fixes warning such as:
arch/arm64/boot/dts/qcom/sdm845-oneplus-enchilada.dtb: opp-table: opp-200000000:opp-hz:0: [200000000, 0, 0, 150000000, 0, 0, 0, 0, 300000000] is too long
Fixes: 3cb16ad69bef ("dt-bindings: opp: accept array of frequencies")
Suggested-by: Rob Herring <[email protected]>
Signed-off-by: David Heidelberg <[email protected]>
Reviewed-by: Dhruva Gole <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
|
|
If the kernel isn't built with interconnect support, icc_get_name()
returns NULL and we get following warning:
drivers/opp/debugfs.c: In function 'bw_name_read':
drivers/opp/debugfs.c:43:42: error: '%.62s' directive argument is null [-Werror=format-overflow=]
i = scnprintf(buf, sizeof(buf), "%.62s\n", icc_get_name(path));
Fix it.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: 0430b1d5704b0 ("opp: Expose bandwidth information via debugfs")
Signed-off-by: Viresh Kumar <[email protected]>
Reviewed-by: Dhruva Gole <[email protected]>
|
|
We currently get the following warning:
debugfs.c:105:54: error: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 8 [-Werror=format-truncation=]
snprintf(name, sizeof(name), "supply-%d", i);
^~
debugfs.c:105:46: note: directive argument in the range [-2147483644, 2147483646]
snprintf(name, sizeof(name), "supply-%d", i);
^~~~~~~~~~~
debugfs.c:105:17: note: 'snprintf' output between 9 and 19 bytes into a destination of size 15
snprintf(name, sizeof(name), "supply-%d", i);
Fix this and other potential issues it by allocating larger arrays.
Use the exact string format to allocate the arrays without getting into
these issues again.
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Viresh Kumar <[email protected]>
Reviewed-by: Dhruva Gole <[email protected]>
|
|
Move the declaration of functions defined in the OPP core to pm_opp.h.
These were added to cpufreq.h as it was the only user of the APIs, but
that was a mistake perhaps. Fix it.
Signed-off-by: Viresh Kumar <[email protected]>
|
|
Let's extend the dev_pm_opp_data with a turbo variable, to allow users to
specify if it's a boost frequency for a dynamically added OPP.
Signed-off-by: Sibi Sankar <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
|
|
Add i.MX95 Generic/ELE/V2X MU support, its register layout is same as
i.MX8ULP, but the Parameter registers would show different
TR/RR. Since the driver already supports get TR/RR from Parameter
registers, not hardcoding the number, this patch just add
the compatible entry to reuse i.MX8ULP S4 cfg data.
Signed-off-by: Peng Fan <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
|
|
Some MUs such as i.MX95 MU, have internal SRAM which could be used
for SCMI shared memory, so populate the sub-nodes to use the SRAM.
Signed-off-by: Peng Fan <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
|
|
i.MX8ULP, i.MX93 MU has a Parameter register encoded as below:
BIT: 15 --- 8 | 7 --- 0
RR_NUM TR_NUM
So to make driver easy to support more variants, get the RR/TR
registers number from Parameter register.
The patch only adds support the specific MU, such as ELE MU.
For generic MU, not add support for number larger than 4.
Reviewed-by: Sascha Hauer <[email protected]>
Signed-off-by: Peng Fan <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
|
|
There will be changes that init may fail, so adding return value for
init function.
Reviewed-by: Sascha Hauer <[email protected]>
Signed-off-by: Peng Fan <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
|
|
Add i.MX95 Generic, Secure Enclave and V2X Message Unit compatible string.
And the MUs in AONMIX has internal RAMs for SCMI shared buffer usage.
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Peng Fan <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>
|
|
|
|
A user reported that on this machine, disabling BIOS fan control
is necessary in order to change the fan speed.
Signed-off-by: Armin Wolf <[email protected]>
Acked-by: Pali Rohár <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Do not allow large strings (> 4096) as single write to trace_marker
The size of a string written into trace_marker was determined by the
size of the sub-buffer in the ring buffer. That size is dependent on
the PAGE_SIZE of the architecture as it can be mapped into user
space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of
the string of writing into trace_marker 64K.
One of the selftests looks at the size of the ring buffer sub-buffers
and writes that plus more into the trace_marker. The write will take
what it can and report back what it consumed so that the user space
application (like echo) will write the rest of the string. The string
is stored in the ring buffer and can be read via the "trace" or
"trace_pipe" files.
The reading of the ring buffer uses vsnprintf(), which uses a
precision "%.*s" to make sure it only reads what is stored in the
buffer, as a bug could cause the string to be non terminated.
With the combination of the precision change and the PAGE_SIZE of 64K
allowing huge strings to be added into the ring buffer, plus the test
that would actually stress that limit, a bug was reported that the
precision used was too big for "%.*s" as the string was close to 64K
in size and the max precision of vsnprintf is 32K.
Linus suggested not to have that precision as it could hide a bug if
the string was again stored without a nul byte.
Another issue that was brought up is that the trace_seq buffer is
also based on PAGE_SIZE even though it is not tied to the
architecture limit like the ring buffer sub-buffer is. Having it be
64K * 2 is simply just too big and wasting memory on systems with 64K
page sizes. It is now hardcoded to 8K which is what all other
architectures with 4K PAGE_SIZE has.
Finally, the write to trace_marker is now limited to 4K as there is
no reason to write larger strings into trace_marker.
- ring_buffer_wait() should not loop.
The ring_buffer_wait() does not have the full context (yet) on if it
should loop or not. Just exit the loop as soon as its woken up and
let the callers decide to loop or not (they already do, so it's a bit
redundant).
- Fix shortest_full field to be the smallest amount in the ring buffer
that a waiter is waiting for. The "shortest_full" field is updated
when a new waiter comes in and wants to wait for a smaller amount of
data in the ring buffer than other waiters. But after all waiters are
woken up, it's not reset, so if another waiter comes in wanting to
wait for more data, it will be woken up when the ring buffer has a
smaller amount from what the previous waiters were waiting for.
- The wake up all waiters on close is incorrectly called frome
.release() and not from .flush() so it will never wake up any waiters
as the .release() will not get called until all .read() calls are
finished. And the wakeup is for the waiters in those .read() calls.
* tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Use .flush() call to wake up readers
ring-buffer: Fix resetting of shortest_full
ring-buffer: Fix waking up ring buffer readers
tracing: Limit trace_marker writes to just 4K
tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE
tracing: Remove precision vsnprintf() check from print event
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- fixes for Qualcomm qmp-combo driver for ordering of drm and type-c
switch registartion due to drivers might not probe defer after having
registered child devices to avoid triggering a probe deferral loop.
This fixes internal display on Lenovo ThinkPad X13s
* tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: qcom-qmp-combo: fix type-c switch registration
phy: qcom-qmp-combo: fix drm bridge registration
|
|
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on another descriptor then the
one that was closed(). But there's other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: linke li <[email protected]>
Cc: Rabin Vincent <[email protected]>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a tasks waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the smallest_full on the cpu_buffer that stores the
smallest amount doesn't get reset when all the waiters are woken up. It
does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often then when they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: linke li <[email protected]>
Cc: Rabin Vincent <[email protected]>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Pull kvm fixes from Paolo Bonzini:
"KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
writable from userspace, so there would be no way to write to a
read-only guest_memfd).
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely for development and testing.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term
plan is to support confidential VMs with deterministic private
memory (SNP and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD dirty logging test that caused false
passes.
x86 fixes:
- Fix missing marking of a guest page as dirty when emulating an
atomic access.
- Check for mmu_notifier invalidation events before faulting in the
pfn, and before acquiring mmu_lock, to avoid unnecessary work and
lock contention with preemptible kernels (including
CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).
- Disable AMD DebugSwap by default, it breaks VMSA signing and will
be re-enabled with a better VM creation API in 6.10.
- Do the cache flush of converted pages in svm_register_enc_region()
before dropping kvm->lock, to avoid a race with unregistering of
the same region and the consequent use-after-free issue"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
SEV: disable SEV-ES DebugSwap by default
KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
|
|
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs. One trivial one and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to the
ring_buffer_wait() also has their own loop, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: linke li <[email protected]>
Cc: Rabin Vincent <[email protected]>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Since fscache can utilize iov_iter to write dest buffers, bio_vec can
be used in this way too.
To simplify this, pseudo bios are prepared and bio_vec will be filled
with bio_add_page(). And a common .bi_end_io will be called directly
to handle I/O completions.
Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
|
|
So far the fscache mode supports uncompressed data only, and the data
read from fscache is put directly into the target page cache. As the
support for compressed data in fscache mode is going to be introduced,
rework the fscache internals so that the following compressed part
could make the raw data read from fscache be directed to the target
buffer it wants, decompress the raw data, and finally fill the page
cache with the decompressed data.
As the first step, a new structure, i.e. erofs_fscache_io (io), is
introduced to describe a generic read request from the fscache, while
the caller can specify the target buffer it wants in the iov_iter
structure (io->iter). Besides, the caller can also specify its
completion callback and private data through erofs_fscache_io, which
will be called to make further handling, e.g. unlocking the page cache
for uncompressed data or decompressing the read raw data, when the read
request from the fscache completes. Now erofs_fscache_read_io_async()
serves as a generic interface for reading raw data from fscache for both
compressed and uncompressed data.
The erofs_fscache_rq structure is kept to describe a request to fill the
page cache in the specified range.
Signed-off-by: Jingbo Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
|
|
Lockdep reported the following issue when mounting erofs with a domain_id:
============================================
WARNING: possible recursive locking detected
6.8.0-rc7-xfstests #521 Not tainted
--------------------------------------------
mount/396 is trying to acquire lock:
ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
at: alloc_super+0xe3/0x3d0
but task is already holding lock:
ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
at: alloc_super+0xe3/0x3d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&type->s_umount_key#50/1);
lock(&type->s_umount_key#50/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by mount/396:
#0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
at: alloc_super+0xe3/0x3d0
#1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3},
at: erofs_fscache_register_fs+0x3d/0x270 [erofs]
stack backtrace:
CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521
Call Trace:
<TASK>
dump_stack_lvl+0x64/0xb0
validate_chain+0x5c4/0xa00
__lock_acquire+0x6a9/0xd50
lock_acquire+0xcd/0x2b0
down_write_nested+0x45/0xd0
alloc_super+0xe3/0x3d0
sget_fc+0x62/0x2f0
vfs_get_super+0x21/0x90
vfs_get_tree+0x2c/0xf0
fc_mount+0x12/0x40
vfs_kern_mount.part.0+0x75/0x90
kern_mount+0x24/0x40
erofs_fscache_register_fs+0x1ef/0x270 [erofs]
erofs_fc_fill_super+0x213/0x380 [erofs]
This is because the file_system_type of both erofs and the pseudo-mount
point of domain_id is erofs_fs_type, so two successive calls to
alloc_super() are considered to be using the same lock and trigger the
warning above.
Therefore add a nodev file_system_type called erofs_anon_fs_type in
fscache.c to silence this complaint. Because kern_mount() takes a
pointer to struct file_system_type, not its (string) name. So we don't
need to call register_filesystem(). In addition, call init_pseudo() in
erofs_anon_init_fs_context() as suggested by Al Viro, so that we can
remove erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and
erofs_anon_context_ops.
Suggested-by: Al Viro <[email protected]>
Fixes: a9849560c55e ("erofs: introduce a pseudo mnt to manage shared cookies")
Signed-off-by: Baokun Li <[email protected]>
Reviewed-and-tested-by: Jingbo Xu <[email protected]>
Reviewed-by: Yang Erkun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
|
|
Convert erofs_try_to_free_all_cached_pages() and
z_erofs_cache_release_folio().
Besides, erofs_page_is_managed() is moved to zdata.c and renamed
as erofs_folio_is_managed().
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Use bio_for_each_folio() to iterate over each folio in the bio and
there is no large folios for now.
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Introduce a folio member to `struct z_erofs_bvec` and convert most
of z_erofs_fill_bio_vec() to folios, which is still straight-forward.
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
`justfound` is introduced to identify cached folios that are just added
to compressed bvecs so that more checks can be applied in the I/O
submission path.
EROFS is quite now stable compared to the codebase at that stage.
`justfound` becomes a burden for upcoming features. Drop it.
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It is a straight-forward conversion. Besides, it's renamed as
z_erofs_scan_folio().
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Online folios are locked file-backed folios which will eventually
keep decoded (e.g. decompressed) data of each inode for end users to
utilize. It may belong to a few pclusters and contain other data (e.g.
compressed data for inplace I/Os) temporarily in a time-sharing manner
to reduce memory footprints for low-ended storage devices with high
latencies under heary I/O pressure.
Apart from folio_end_read() usage, it's a straight-forward conversion.
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
We don't need the "out" label any more, so remove "ret" and return
directly on error.
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
---
Cc: Eric Biederman <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
There is no need to call memset(..., 0, ...) on memory allocated by
kcalloc(). It is already zeroed.
Remove the redundant call.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/fa2597400051c18c6ca11187b0e4b906729991b2.1709972649.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Kees Cook <[email protected]>
|
|
Replace open-coded encoding logic with the use of conventional XDR
utility functions. Add a tracepoint to make replays observable in
field troubleshooting situations.
The WARN_ON is removed. A stack trace is of little use, as there is
only one call site for nfsd4_encode_replay(), and a buffer length
shortage here is unlikely.
Signed-off-by: Chuck Lever <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two patches from Heiner for the i801 are targeting muxes discovered
while working on some other features. Essentially, there is a
reordering when adding optional slaves and proper cleanup upon
registering a mux device.
Christophe fixes the exit path in the wmt driver that was leaving the
clocks hanging, and the last fix from Tommy avoids false error reports
in IRQ"
* tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Fix the dummy irq expected print
i2c: wmt: Fix an error handling path in wmt_i2c_probe()
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
i2c: i801: Fix using mux_pdev before it's set
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
"A fix to suppress a warning about unreleased IRQ for 1394 OHCI
hardware when disabling MSI.
In Linux kernel v6.5, a PCI driver for 1394 OHCI hardware was
optimized into the managed device resources. Edmund Raile points out
that the change brings the warning about unreleased IRQ at the call of
pci_disable_msi(), since the API expects that the relevant IRQ has
already been released in advance.
As long as the API is called in .remove callback of PCI device
operation, it is prohibited to maintain the IRQ as the part of managed
device resource. As a workaround, the IRQ is explicitly released at
.remove callback, before the call of pci_disable_msi().
pci_disable_msi() is legacy API nowadays in PCI MSI implementation. I
have a plan to replace it with the modern API in the development for
the future version of Linux kernel. So at present I keep them as is"
* tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: prevent leak of left-over IRQ on unbind
|
|
The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints. However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures. In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
Contextually, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.
For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default. If anybody wants to use it, for now they can enable
the sev_es_debug_swap_enabled module parameter, but this will result in a
warning.
Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
https://github.com/kvm-x86/linux into HEAD
KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
avoid creating ABI that KVM can't sanely support.
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely a development and testing vehicle, and
come with zero guarantees.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
is to support confidential VMs with deterministic private memory (SNP
and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
|
|
KVM x86 fixes for 6.8, round 2:
- When emulating an atomic access, mark the gfn as dirty in the memslot
to fix a bug where KVM could fail to mark the slot as dirty during live
migration, ultimately resulting in guest data corruption due to a dirty
page not being re-copied from the source to the target.
- Check for mmu_notifier invalidation events before faulting in the pfn,
and before acquiring mmu_lock, to avoid unnecessary work and lock
contention. Contending mmu_lock is especially problematic on preemptible
kernels, as KVM may yield mmu_lock in response to the contention, which
severely degrades overall performance due to vCPUs making it difficult
for the task that triggered invalidation to make forward progress.
Note, due to another kernel bug, this fix isn't limited to preemtible
kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield
contended rwlocks and spinlocks.
https://lore.kernel.org/all/[email protected]
|