https://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clocksource/events updates from Daniel Lezcano:
- Refactor resource allocation on the Exynos_mct driver without
functional changes (Marek Szyprowski)
- Add imx8ulp compatible string for the NXP TPM driver (Jacky Bai)
- Fix a comma introduced in error by restoring the original semicolon
in the Exynos_mct driver (Will Deacon)
- Add OSTM driver support on Renesas. The reset line must be
deasserted before accessing the registers. This change depends on an
external change resulting in a shared immutable branch
'reset/of-get-optional-exclusive' from
git://git.pengutronix.de/pza/linux (Biju Das)
- Make the OSTM Kconfig option visible to the user so it can be
disabled when the ARM architected timer is enabled (Biju Das)
- Tag two variables in the iMX sysctr driver __ro_after_init (Peng Fan)
- Set the cpumask to cpu_possible_mask in order to have full benefit
of the DYNIRQ flag on iMX sysctr (Peng Fan)
- Tag __maybe_unused a variable in the Pistachio timer driver in order
to fix a warning reported by the kernel test robot (Drew Fustini)
- Add MStar MSC313e timer support and the ssd20xd-based variant, as
well as the DT bindings (Romain Perier)
- Remove the incompatible compatible string for the rk3066 (Johan
Jonker)
- Fix dtbs_check warnings on the cadence ttc driver by adding the power
domain bindings (Michal Simek)
Link: https://lore.kernel.org/lkml/[email protected]
|
|
Move NFS to using fscache DIO API instead of the old upstream I/O API as
that has been removed. This is a stopgap solution as the intention is that
at some time in the future, the cache will move to using larger blocks and
won't be able to store individual pages in order to deal with the potential
for data corruption due to the backing filesystem being able to insert/remove
bridging blocks of zeros into its extent list[1].
NFS then reads and writes cache pages synchronously, one page at a time.
The preferred change would be to use the netfs lib, but the new I/O API can
be used directly. It's just that as the cache now needs to track data for
itself, caching blocks may exceed page size...
This code is somewhat borrowed from my "fallback I/O" patchset[2].
Changes
=======
ver #3:
- Restore lost =n fallback for nfs_fscache_release_page()[2].
Signed-off-by: David Howells <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Acked-by: Jeff Layton <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: Anna Schumaker <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected] [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/163906981318.143852.17220018647843475985.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967184451.1823006.6450645559828329590.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021577632.640689.11069627070150063812.stgit@warthog.procyon.org.uk/ # v4
|
|
Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.
The following changes have been made:
(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.
(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.
For nfs, I've made it render the volume name string as:
"nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"
(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.
fscache_acquire_cookie() is passed the same keying and coherency
information as before.
(4) fscache_enable/disable_cookie() have been removed.
Call fscache_use_cookie() and fscache_unuse_cookie() when a file is
opened or closed to prevent a cache file from being culled and to keep
resources to hand that are needed to do I/O.
If a file is opened for writing, we invalidate it with
FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache,
thereby making it cease caching until all currently open files are
closed. This should give the same behaviour as the upstream code.
Making the cache store local modifications isn't straightforward for
NFS, so that's left for future patches.
(5) fscache_invalidate() now needs to be given uptodate auxiliary data and
a file size. It also takes a flag to indicate if this was due to a
DIO write.
(6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file
to which a DIO write is made.
(7) Call fscache_note_page_release() from nfs_release_page().
(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.
(9) The functions to read and write data to/from the cache are stubbed out
pending a conversion to use netfslib.
Changes
=======
ver #3:
- Added missing =n fallback for nfs_fscache_release_file()[1][2].
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.
- Remove NFS_INO_FSCACHE as it's no longer used.
- Need to unuse a cookie on file-release, not inode-clear.
Signed-off-by: Dave Wysochanski <[email protected]>
Co-developed-by: David Howells <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Acked-by: Jeff Layton <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: Anna Schumaker <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4
|
|
- new driver to support the LetSketch device (Hans de Goede)
|
|
- locking performance improvement for hidraw code (André Almeida)
|
|
- support for USI style pens (Tero Kristo, Mika Westerberg)
- quirk for devices that need inverted X/Y axes (Alistair Francis)
- small core code cleanups and deduplication (Benjamin Tissoires)
|
|
Merge in fixes directly in prep for the 5.17 merge window.
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace kfree_skb() with kfree_skb_reason() in __udp4_lib_rcv().
A new drop reason, 'SKB_DROP_REASON_UDP_CSUM', is added for UDP
checksum errors.
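A minimal sketch of the call-site pattern (the wrapper function is
illustrative; kfree_skb_reason() and the reason constant are from this
series):

  static void udp_drop_bad_csum(struct sk_buff *skb)
  {
          /* was: kfree_skb(skb) - the tracepoint got no context */
          kfree_skb_reason(skb, SKB_DROP_REASON_UDP_CSUM);
  }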
Signed-off-by: Menglong Dong <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace kfree_skb() with kfree_skb_reason() in tcp_v4_rcv(). The
following drop reasons are added:
SKB_DROP_REASON_NO_SOCKET
SKB_DROP_REASON_PKT_TOO_SMALL
SKB_DROP_REASON_TCP_CSUM
SKB_DROP_REASON_TCP_FILTER
After this patch, 'kfree_skb' event will print message like this:
$ TASK-PID CPU# ||||| TIMESTAMP FUNCTION
$ | | | ||||| | |
<idle>-0 [000] ..s1. 36.113438: kfree_skb: skbaddr=(____ptrval____) protocol=2048 location=(____ptrval____) reason: NO_SOCKET
The reason for the skb drop is printed too.
Signed-off-by: Menglong Dong <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Introduce the interface kfree_skb_reason(), which is able to pass
the reason why the skb is dropped to the 'kfree_skb' tracepoint.
Add the 'reason' field to 'trace_kfree_skb', so that users can get
more detailed information about abnormal skbs with 'drop_monitor' or
eBPF.
All drop reasons are defined in the enum 'skb_drop_reason', and
they will be printed as strings in the 'kfree_skb' tracepoint in the
format 'reason: XXX'.
( Maybe the reasons should be defined in a uapi header file, so that
user space can use them? )
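A sketch of the interface shape, assembled from the reasons named in
this series (see include/linux/skbuff.h for the authoritative
definitions; member order here is illustrative):

  enum skb_drop_reason {
          SKB_DROP_REASON_NOT_SPECIFIED,
          SKB_DROP_REASON_NO_SOCKET,
          SKB_DROP_REASON_PKT_TOO_SMALL,
          SKB_DROP_REASON_TCP_CSUM,
          SKB_DROP_REASON_TCP_FILTER,
          SKB_DROP_REASON_UDP_CSUM,
          SKB_DROP_REASON_MAX,
  };

  void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);

  /* plain kfree_skb() becomes a wrapper that records no reason */
  static inline void kfree_skb(struct sk_buff *skb)
  {
          kfree_skb_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);
  }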
Signed-off-by: Menglong Dong <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Netfilter conntrack maintains NAT flags per connection indicating
whether NAT was configured for the connection. Openvswitch maintains
NAT flags on the per packet flow key ct_state field, indicating
whether NAT was actually executed on the packet.
When a packet misses from tc to ovs the conntrack NAT flags are set.
However, NAT was not necessarily executed on the packet because the
connection's state might still be in NEW state. As such, openvswitch
wrongly assumes that NAT was executed and sets incorrect flow key
NAT flags.
Fix this by flagging to openvswitch which NAT was actually done in
act_ct via tc_skb_ext and tc_skb_cb to the openvswitch module, so
that the packet flow key NAT flags will be correctly set.
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Paul Blakey <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next. This
includes one patch to update ovs and act_ct to use nf_ct_put() instead
of nf_conntrack_put().
1) Add netns_tracker to nfnetlink_log and masquerade, from Eric Dumazet.
2) Remove redundant rcu read-side lock in nf_tables packet path.
3) Replace BUG() by WARN_ON_ONCE() in nft_payload.
4) Consolidate rule verdict tracing.
5) Replace WARN_ON() by WARN_ON_ONCE() in nf_tables core.
6) Make counter support built-in in nf_tables.
7) Add new field to conntrack object to identify locally generated
traffic, from Florian Westphal.
8) Prevent NAT from shadowing well-known ports, from Florian Westphal.
9) Merge nf_flow_table_{ipv4,ipv6} into nf_flow_table_inet, also from
Florian.
10) Remove redundant pointer in nft_pipapo AVX2 support, from Colin Ian King.
11) Replace opencoded max() in conntrack, from Jiapeng Chong.
12) Update conntrack to use refcount_t API, from Florian Westphal.
13) Move ip_ct_attach indirection into the nf_ct_hook structure.
14) Constify several pointer objects in the netfilter codebase,
from Florian Westphal.
15) Tree-wide replacement of nf_conntrack_put() by nf_ct_put(), also
from Florian.
16) Fix egress splat due to incorrect rcu notation, from Florian.
17) Move stateful fields of connlimit, last, quota, numgen and limit
out of the expression data area.
18) Build a blob to represent the ruleset in nf_tables, this is a
requirement of the new register tracking infrastructure.
19) Add NFT_REG32_NUM to define the maximum number of 32-bit registers.
20) Add register tracking infrastructure to skip redundant
store-to-register operations, this includes support for payload,
meta and bitwise expressions.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: (32 commits)
netfilter: nft_meta: cancel register tracking after meta update
netfilter: nft_payload: cancel register tracking after payload update
netfilter: nft_bitwise: track register operations
netfilter: nft_meta: track register operations
netfilter: nft_payload: track register operations
netfilter: nf_tables: add register tracking infrastructure
netfilter: nf_tables: add NFT_REG32_NUM
netfilter: nf_tables: add rule blob layout
netfilter: nft_limit: move stateful fields out of expression data
netfilter: nft_limit: rename stateful structure
netfilter: nft_numgen: move stateful fields out of expression data
netfilter: nft_quota: move stateful fields out of expression data
netfilter: nft_last: move stateful fields out of expression data
netfilter: nft_connlimit: move stateful fields out of expression data
netfilter: egress: avoid a lockdep splat
net: prefer nf_ct_put instead of nf_conntrack_put
netfilter: conntrack: avoid useless indirection during conntrack destruction
netfilter: make function op structures const
netfilter: core: move ip_ct_attach indirection to struct nf_ct_hook
netfilter: conntrack: convert to refcount_t api
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
include/linux/netfilter_netdev.h:97 suspicious rcu_dereference_check() usage!
2 locks held by sd-resolve/1100:
0: ..(rcu_read_lock_bh){1:3}, at: ip_finish_output2
1: ..(rcu_read_lock_bh){1:3}, at: __dev_queue_xmit
__dev_queue_xmit+0 ..
The helper has two callers, one uses rcu_read_lock, the other
rcu_read_lock_bh(). Annotate the dereference to reflect this.
Fixes: 42df6e1d221dd ("netfilter: Introduce egress hook")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
nf_ct_put() results in a useless indirection:
nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu read lock +
indirect call of ct_hooks->destroy().
There are two _put helpers:
nf_ct_put and nf_conntrack_put. The latter is what should be used in
code that MUST NOT cause a linker dependency on the conntrack module
(e.g. calls from core network stack).
Everyone else should call nf_ct_put() instead.
A followup patch will convert a few nf_conntrack_put() calls to
nf_ct_put(), in particular from modules that already have a conntrack
dependency such as act_ct or even nf_conntrack itself.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
No functional changes, these structures should be const.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
ip_ct_attach predates struct nf_ct_hook, so we can place it there and
remove the exported symbol.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Convert nf_conn reference counting from atomic_t to refcount_t based api.
refcount_t api provides more runtime sanity checks and will warn on
certain constructs, e.g. refcount_inc() on a zero reference count, which
usually indicates use-after-free.
For this reason, template allocation is changed to init the refcount to
1 and the subsequent add operations are removed.
Likewise, init_conntrack() is changed to set the initial refcount to 1
instead of calling refcount_inc().
This is safe because the new entry is not (yet) visible to other cpus.
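A minimal sketch of the pattern on a generic object (not the literal
nf_conn diff):

  #include <linux/refcount.h>
  #include <linux/slab.h>

  struct obj {
          refcount_t use;
  };

  static struct obj *obj_alloc(gfp_t gfp)
  {
          struct obj *o = kzalloc(sizeof(*o), gfp);

          /* init to 1 while the object is invisible to other CPUs;
           * refcount_inc() from zero would WARN
           */
          if (o)
                  refcount_set(&o->use, 1);
          return o;
  }

  static void obj_put(struct obj *o)
  {
          if (refcount_dec_and_test(&o->use))
                  kfree(o);
  }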
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
My patch to rework oabi fcntl64() introduced a harmless
sparse warning when file locking is disabled:
arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected struct flock64 [noderef] __user *user @@ got struct flock64 * @@
arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: expected struct flock64 [noderef] __user *user
arch/arm/kernel/sys_oabi-compat.c:251:51: sparse: got struct flock64 *
arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: sparse: incorrect type in argument 4 (different address spaces) @@ expected struct flock64 [noderef] __user *user @@ got struct flock64 * @@
arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: expected struct flock64 [noderef] __user *user
arch/arm/kernel/sys_oabi-compat.c:265:55: sparse: got struct flock64 *
When file locking is enabled, everything works correctly and the
right data gets passed, but the stub declarations in linux/fs.h
did not get modified when the calling conventions changed in an
earlier patch.
Reported-by: kernel test robot <[email protected]>
Fixes: 7e2d8c29ecdd ("ARM: 9111/1: oabi-compat: rework fcntl64() emulation")
Fixes: a75d30c77207 ("fs/locks: pass kernel struct flock to fcntl_getlk/setlk")
Cc: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
|
|
Move the 'inline' keyword to the front of 'void'.
Remove a warning found by clang (make W=1 LLVM=1):
./include/linux/blk-mq.h:259:1: warning: ‘inline’ is not at beginning of
declaration
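For illustration (hypothetical declaration, not the actual blk-mq one):

  /* warns: 'inline' is not at beginning of declaration */
  static void inline blk_mq_example(void) { }

  /* fixed */
  static inline void blk_mq_example(void) { }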
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
If something went wrong during the TPM firmware upgrade, like a power
failure or the firmware image file getting corrupted, the TPM might end
up in Upgrade or Failure mode upon the next start. The state is
persistent across TPM power cycles/restarts.
According to TPM specification:
* If the TPM is in Upgrade mode, it will answer with TPM2_RC_UPGRADE
to all commands except TPM2_FieldUpgradeData(). It may also accept
other commands if it is able to complete them using the previously
installed firmware.
* If the TPM is in Failure mode, it will allow performing TPM
initialization but will not provide any crypto operations. It will
happily respond to Field Upgrade calls.
Change the behavior of tpm2_auto_startup() so that it detects the active
running mode of the TPM by adding the following checks. If the
tpm2_do_selftest() call returns TPM2_RC_UPGRADE, the TPM is in Upgrade
mode.
If the TPM is in Failure mode, it will successfully respond to both
tpm2_do_selftest() and tpm2_startup() calls but will fail to answer
tpm2_get_cc_attrs_tbl(). Use this fact to conclude that the TPM is in
Failure mode.
If the TPM is detected to be in Upgrade or Failure mode, the function
sets the TPM_CHIP_FLAG_FIRMWARE_UPGRADE_MODE flag.
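A simplified sketch of the detection flow (return-code handling is
illustrative, not the literal diff):

  static int tpm2_detect_mode_sketch(struct tpm_chip *chip)
  {
          int rc = tpm2_do_selftest(chip);

          if (rc == TPM2_RC_UPGRADE) {
                  /* Upgrade mode: only field-upgrade commands work */
                  chip->flags |= TPM_CHIP_FLAG_FIRMWARE_UPGRADE_MODE;
                  return 0;
          }
          if (rc)
                  return rc;
          if (tpm2_get_cc_attrs_tbl(chip)) {
                  /* selftest and startup succeeded but this failed:
                   * assume Failure mode
                   */
                  chip->flags |= TPM_CHIP_FLAG_FIRMWARE_UPGRADE_MODE;
          }
          return 0;
  }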
The TPM_CHIP_FLAG_FIRMWARE_UPGRADE_MODE flag is used later during driver
initialization/deinitialization to disable functionality which makes no
sense or will fail in the current TPM state. The following functionality
is affected:
* Do not register TPM as a hwrng
* Do not register sysfs entries which provide information impossible to
obtain in limited mode
* Do not register resource managed character device
Signed-off-by: axelj <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
|
|
NFSv4.1 supports an optional lock notification feature which notifies
the client when a lock comes available. (Normally NFSv4 clients just
poll for locks if necessary.) To make that work, we need to request a
blocking lock from the filesystem.
We turned that off for NFS in commit f657f8eef3ff ("nfs: don't atempt
blocking locks on nfs reexports") [sic] because it actually blocks the
nfsd thread while waiting for the lock.
Thanks to Vasily Averin for pointing out that NFS isn't the only
filesystem with that problem.
Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
does the right thing. Simplest is just to assume that any filesystem
that defines its own ->lock is not safe to request a blocking lock from.
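A sketch of the resulting check (the helper name is illustrative):

  /* Only request a blocking lock when the filesystem relies on the
   * generic posix_lock_file(), i.e. defines no ->lock method.
   */
  static inline bool export_lock_may_block(const struct file *file)
  {
          return !file->f_op->lock;
  }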
So, this patch mostly reverts commit f657f8eef3ff ("nfs: don't atempt
blocking locks on nfs reexports") [sic] and commit b840be2f00c0 ("lockd:
don't attempt blocking locks on nfs reexports"), and instead uses a
check of ->lock (Vasily's suggestion) to decide whether to support
blocking lock notifications on a given filesystem. Also add a little
documentation.
Perhaps someday we could add back an export flag to allow
filesystems with "good" ->lock methods to support blocking lock
notifications.
Reported-by: Vasily Averin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[ cel: Description rewritten to address checkpatch nits ]
[ cel: Fixed warning when SUNRPC debugging is disabled ]
[ cel: Fixed NULL check ]
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Vasily Averin <[email protected]>
|
|
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
All profile_handoff_task does is notify the task_free_notifier chain.
The helpers task_handoff_register and task_handoff_unregister are used
to add and delete entries from that chain and are never called.
So remove the dead code and make it much easier to read and reason
about __put_task_struct.
Suggested-by: Al Viro <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
When I say remove I mean remove. All profile_task_exit and
profile_munmap do is call a blocking notifier chain. The helpers
profile_task_register and profile_task_unregister are not called
anywhere in the tree. Which means this is all dead code.
So remove the dead code and make it easier to read do_exit.
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
This helper is misleading. It tests for an ongoing exec as well as
the process having received a fatal signal.
Sometimes it is appropriate to treat an on-going exec differently than
a process that is shutting down due to a fatal signal. In particular
taking the fast path out of exit_signals instead of retargeting
signals is not appropriate during exec, and not changing the the exit
code in do_group_exit during exec.
Removing the helper makes it more obvious what is going on as both
cases must be coded for explicitly.
While removing the helper fix the two cases where I have observed
using signal_group_exit resulted in the wrong result.
In exit_signals only test for SIGNAL_GROUP_EXIT so that signals are
retargeted during an exec.
In do_group_exit use 0 as the exit code during an exec as de_thread
does not set group_exit_code. As best as I can determine,
group_exit_code is set to 0 most of the time during de_thread.
During a thread group stop group_exit_code is set to the
stop signal and when the thread group receives SIGCONT group_exit_code
is reset to 0.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
The only remaining user of group_exit_task is exec. Rename the field
so that it is clear which part of the code uses it.
Update the comment above the definition of group_exec_task to document
how it is currently used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
After the previous cleanups "signal->core_state" is set whenever
SIGNAL_GROUP_COREDUMP is set and "signal->core_state" is tested
whenever the code wants to know if a coredump is in progress. The
remaining tests of SIGNAL_GROUP_COREDUMP also test to see if
SIGNAL_GROUP_EXIT is set. Similarly the only place that sets
SIGNAL_GROUP_COREDUMP also sets SIGNAL_GROUP_EXIT.
Which makes SIGNAL_GROUP_COREDUMP unnecessary and redundant. So stop
setting SIGNAL_GROUP_COREDUMP, stop testing SIGNAL_GROUP_COREDUMP, and
remove its definition.
With the setting of SIGNAL_GROUP_COREDUMP gone, coredump_finish no
longer needs to clear SIGNAL_GROUP_COREDUMP out of signal->flags
by setting SIGNAL_GROUP_EXIT.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
When CONFIG_SYSFS is not enabled, provide stubs for the helper functions
so as not to break their callers. A sketch of the pattern follows.
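The helper name below is hypothetical; the pattern is what matters:

  #ifdef CONFIG_SYSFS
  ssize_t power_supply_charge_behaviour_show_example(struct device *dev,
                                                     char *buf);
  #else
  static inline ssize_t
  power_supply_charge_behaviour_show_example(struct device *dev, char *buf)
  {
          return -EOPNOTSUPP;     /* callers still build without sysfs */
  }
  #endif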
Fixes: 539b9c94ac83 ("power: supply: add helpers for charge_behaviour sysfs")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
The point of using set_child_tid to hold the kthread pointer was that
it already did what is necessary. There are now restrictions on when
set_child_tid can be initialized and when set_child_tid can be used in
schedule_tail. Which indicates that continuing to use set_child_tid
to hold the kthread pointer is a bad idea.
Instead of continuing to use the set_child_tid field of task_struct
generalize the pf_io_worker field of task_struct and use it to hold
the kthread pointer.
Rename pf_io_worker (which is a void * pointer) to worker_private so
it can be used to store the kthread's struct kthread pointer. Update the
kthread code to store the kthread pointer in the worker_private field.
Remove the places where set_child_tid had to be dealt with carefully
because kthreads also used it.
Link: https://lkml.kernel.org/r/CAHk-=wgtFAA9SbVYg0gR1tqPMC17-NYcs0GQkaYg1bGhh1uJQQ@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We currently store large folios as 2^N consecutive entries. While this
consumes rather more memory than necessary, it also turns out to be buggy.
A writeback operation which starts within a tail page of a dirty folio will
not write back the folio as the xarray's dirty bit is only set on the
head index. With multi-index entries, the dirty bit will be found no
matter where in the folio the operation starts.
This does end up simplifying the page cache slightly, although not as
much as I had hoped.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
Add a new helper function to help iterate over multi-index entries.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
All of its callers now call folio_batch_remove_exceptionals().
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
find_lock_entries() already only returned the head page of folios, so
convert it to return a folio_batch instead of a pagevec. That cascades
through converting truncate_inode_pages_range() to
delete_from_page_cache_batch() and page_cache_delete_batch().
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
The callers have all been converted to work on folios, so convert
find_get_entries() to return a batch of folios instead of pages.
We also now return multiple large folios in a single call.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
|
|
Convert all callers of truncate_inode_page() to call
truncate_inode_folio() instead, and move the declaration to mm/internal.h.
Move the assertion that the caller is not passing in a tail page to
generic_error_remove_page(). We can't entirely remove the struct page
from the callers yet because the page pointer in the pvec might be a
shadow/dax/swap entry instead of actually a page.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
Convert both callers of unmap_mapping_page() to call unmap_mapping_folio()
instead. Also move zap_details from linux/mm.h to mm/memory.c
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"There is only the amdgpu runtime pm regression fix in here:
amdgpu:
- suspend/resume fix
- fix runtime PM regression"
* tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: disable runpm if we are the primary adapter
fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb
drm/amd/pm: keep the BACO feature enabled for suspend
|
|
Commit b42faeee718c ("spi: Add a PTP system timestamp
to the transfer structure") added an include of ptp_clock_kernel.h
to spi.h for struct ptp_system_timestamp but a forward declaration
is enough. Let's use that to limit the number of objects we have
to rebuild every time we touch networking headers.
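A sketch of the change (abridged; the header only ever uses a pointer
to the struct, so the full definition is not needed):

  /* include/linux/spi/spi.h */
  struct ptp_system_timestamp;    /* forward declaration suffices */

  struct spi_transfer {
          /* ...other fields elided... */
          struct ptp_system_timestamp *ptp_sts;
  };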
Signed-off-by: Jakub Kicinski <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
This adds basic support for delivering 2 level event channels to a guest.
Initially, it only supports delivery via the IRQ routing table, triggered
by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast()
function which will use the pre-mapped shared_info page if it already
exists and is still valid, while the slow path through the irqfd_inject
workqueue will remap the shared_info page if necessary.
It sets the bits in the shared_info page but not the vcpu_info; that is
deferred to __kvm_xen_has_interrupt() which raises the vector to the
appropriate vCPU.
Add a 'verbose' mode to xen_shinfo_test while adding test cases for this.
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
This can be used in two modes. There is an atomic mode where the cached
mapping is accessed while holding the rwlock, and a mode where the
physical address is used by a vCPU in guest mode.
For the latter case, an invalidation will wake the vCPU with the new
KVM_REQ_GPC_INVALIDATE, and the architecture will need to refresh any
caches it still needs to access before entering guest mode again.
Only one vCPU can be targeted by the wake requests; it's simple enough
to make it wake all vCPUs or even a mask but I don't see a use case for
that additional complexity right now.
Invalidation happens from the invalidate_range_start MMU notifier, which
needs to be able to sleep in order to wake the vCPU and wait for it.
This means that revalidation potentially needs to "wait" for the MMU
operation to complete and the invalidate_range_end notifier to be
invoked. Like the vCPU when it takes a page fault in that period, we
just spin — fixing that in a future patch by implementing an actual
*wait* may be another part of shaving this particularly hirsute yak.
As noted in the comments in the function itself, the only case where
the invalidate_range_start notifier is expected to be called *without*
being able to sleep is when the OOM reaper is killing the process. In
that case, we expect the vCPU threads already to have exited, and thus
there will be nothing to wake, and no reason to wait. So we clear the
KVM_REQUEST_WAIT bit and send the request anyway, then complain loudly
if there actually *was* anything to wake up.
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
The various kvm_write_guest() and mark_page_dirty() functions must only
ever be called in the context of an active vCPU, because if dirty ring
tracking is enabled it may simply oops when kvm_get_running_vcpu()
returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it.
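A simplified sketch of the added guard (placement and exact form are
illustrative):

  /* in mark_page_dirty_in_slot(), roughly: */
  struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

  if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
          return;         /* dirty tracking needs a running vCPU */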
This oops was reported by "butt3rflyh4ck" <[email protected]> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/
That actual bug will be fixed under separate cover but this warning
should help to prevent new ones from being added.
Signed-off-by: David Woodhouse <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.16
- Simplification of the 'vcpu first run' by integrating it into
KVM's 'pid change' flow
- Refactoring of the FP and SVE state tracking, also leading to
a simpler state and less shared data between EL1 and EL2 in
the nVHE case
- Tidy up the header file usage for the nvhe hyp object
- New HYP unsharing mechanism, finally allowing pages to be
unmapped from the Stage-1 EL2 page-tables
- Various pKVM cleanups around refcounting and sharing
- A couple of vgic fixes for bugs that would trigger once
the vcpu xarray rework is merged, but not sooner
- Add minimal support for ARMv8.7's PMU extension
- Rework kvm_pgtable initialisation ahead of the NV work
- New selftest for IRQ injection
- Teach selftests about the lack of default IPA space and
page sizes
- Expand sysreg selftest to deal with Pointer Authentication
- The usual bunch of cleanups and doc update
|
|
Add a stat counter of culling events, whereby the cache backend culls a file
to make space (when asked by cachefilesd in this case), and display it in
/proc/fs/fscache/stats.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819654165.215744.3797804661644212436.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906961387.143852.9291157239960289090.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967168266.1823006.14436200166581605746.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021567619.640689.4339228906248763197.stgit@warthog.procyon.org.uk/ # v4
|
|
Add stat counters of no-space events that caused caching not to happen and
display them in /proc/fs/fscache/stats.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819653216.215744.17210522251617386509.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906958369.143852.7257100711818401748.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967166917.1823006.14842444049198947892.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021566184.640689.4417328329632709265.stgit@warthog.procyon.org.uk/ # v4
|
|
Store the volume coherency data in an xattr and check it when we rebind the
volume. If it doesn't match, the cache volume is moved to the graveyard and
rebuilt anew.
Changes
=======
ver #4:
- Remove a couple of debugging prints.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/163967164397.1823006.2950539849831291830.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021563138.640689.15851092065380543119.stgit@warthog.procyon.org.uk/ # v4
|
|
Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
the kernel to prevent cachefiles or other kernel services from interfering
with that file.
Alter rmdir to reject attempts to remove a directory marked with this flag.
This is used by cachefiles to prevent cachefilesd from removing them.
Using S_SWAPFILE instead isn't really viable as that has other effects in
the I/O paths.
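A sketch of the rmdir-side check (placement and helper name are
illustrative):

  static bool may_rmdir_sketch(struct dentry *dentry)
  {
          /* refuse to remove a directory pinned by an in-kernel user */
          return !(d_inode(dentry)->i_flags & S_KERNEL_FILE);
  }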
Changes
=======
ver #3:
- Check for the object pointer being NULL in the tracepoints rather than
the caller.
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021541207.640689.564689725898537127.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to change the size of the storage attached to a cookie,
to match the size of the file being cached when it's changed by truncate or
fallocate:
void fscache_resize_cookie(struct fscache_cookie *cookie,
loff_t new_size);
This acts synchronously and is expected to run under the inode lock of the
caller.
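A hedged usage sketch from a network filesystem's setattr path (variable
names, including 'cookie', are illustrative):

  if (attr->ia_valid & ATTR_SIZE) {
          truncate_setsize(inode, attr->ia_size);
          fscache_resize_cookie(cookie, attr->ia_size);
  }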
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819621839.215744.7895597119803515402.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906922387.143852.16394459879816147793.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128998.1823006.10740669081985775576.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021527861.640689.3466382085497236267.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to be called from a network filesystem's releasepage
method to indicate that a page has been released that might have been a
reflection of data upon the server - and now that data must be reloaded
from the server or the cache.
This is used to end an optimisation for empty files, in particular files
that have just been created locally, whereby we know there cannot yet be
any data that we would need to read from the server or the cache.
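A hedged usage sketch (how the cookie is obtained is filesystem-specific;
example_i_cookie() is hypothetical):

  static int example_releasepage(struct page *page, gfp_t gfp)
  {
          struct inode *inode = page->mapping->host;

          /* the released page may have mirrored server data */
          fscache_note_page_release(example_i_cookie(inode));
          return 1;       /* page can be released */
  }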
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819617128.215744.4725572296135656508.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906920354.143852.7511819614661372008.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128061.1823006.611781655060034988.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021525963.640689.9264556596205140044.stgit@warthog.procyon.org.uk/ # v4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-01-06
1) Expose FEC per lane block counters via ethtool
2) Trivial fixes/updates/cleanup to mlx5e netdev driver
3) Fix htmldoc build warning
4) Spread mlx5 SFs (sub-functions) to all available CPU cores: Commits 1..5
Shay Drory Says:
================
Before this patchset, mlx5 subfunctions shared the same IRQs (MSI-X)
with their peer subfunctions, causing them to use the same CPU cores.
At large scale, this is very undesirable: SFs use a small number of CPU
cores and all of them will be packed on the same CPU cores, not
utilizing all CPU cores in the system.
In this patchset we want to achieve two things:
a) Spread IRQs used by SFs to all CPU cores.
b) Pack fewer SFs into the same IRQ, which will result in multiple IRQs
per core.
In this patchset, we spread SFs over all online CPUs available to mlx5
IRQs in a round-robin manner: whenever an SF is created, pick the next
CPU core with the fewest SF IRQs bound to it; SFs share IRQs on the
same core until a certain limit is reached, at which point we request a
new IRQ and add it to that CPU core's IRQ pool; when out of IRQs, pick
any IRQ with the fewest SF users. A sketch of the selection follows.
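A sketch of the least-loaded selection (illustrative, not the mlx5
code; sf_irqs_per_cpu is an assumed bookkeeping array):

  static int pick_least_loaded_cpu(const int *sf_irqs_per_cpu)
  {
          int cpu, best = cpumask_first(cpu_online_mask);

          for_each_online_cpu(cpu)
                  if (sf_irqs_per_cpu[cpu] < sf_irqs_per_cpu[best])
                          best = cpu;
          return best;
  }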
This enhancement is done in order to achieve a better distribution of
the SFs over all the available CPUs, which reduces application latency,
as shown below.
Machine details:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0
x16.
Baseline test description:
Single SF on the system. One instance of netperf is running on-top the
SF.
Numbers: latency = 15.136 usec, CPU Util = 35%
Test description:
There are 250 SFs on the system. There are 3 instances of netperf
running, on-top three different SFs, in parallel.
Perf numbers:
#  netperf    SFs          latency(usec)      latency     CPU
   affinity   affinity     (lower is better)  increase %  utilization
1  cpu=0      cpu={0}      ~23 (app 1-3)      35%         75%
2  cpu=0,2,4  cpu={0}      app 1: 21.625      30%         68% (CPU 0)
                           app 2-3: 16.5       9%         15% (CPU 2,4)
3  cpu=0      cpu={0,2,4}  app 1: ~16           7%        84% (CPU 0)
                           app 2-3: ~17.9     14%         22% (CPU 2,4)
4  cpu=0,2,4  cpu={0,2,4}  15.2 (app 1-3)      0%         33% (CPU 0,2,4)
- The first two entries (#1 and #2) show the current state, i.e. SFs
are using the same CPU. The last two entries (#3 and #4) show the
latency reduction improvement of this patch, i.e. SFs are on different
CPUs.
- Whenever we use several CPUs, in case there is a different CPU
utilization, write the utilization of each CPU separately.
- Whenever the latency results of the netperf instances were different,
write the latency of each netperf instance separately.
Commands:
- for netperf CPU=0:
$ for i in {1..3}; do taskset -c 0 netperf -H 1${i}.1.1.1 -t TCP_RR -- \
-o RT_LATENCY -r8 & done
- for netperf CPU=0,2,4
$ for i in {1..3}; do taskset -c $(( ($i - 1) * 2 )) netperf -H \
1${i}.1.1.1 -t TCP_RR -- -o RT_LATENCY -r8 & done
================
====================
Signed-off-by: David S. Miller <[email protected]>
|