|
The kernel/sysctl.c file is a kitchen sink where everyone leaves their dirty
dishes, which makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move dnotify sysctls to dnotify.c and use the new
register_sysctl_init() to register the sysctl interface.
[[email protected]: adjust the commit log to justify the move]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Acked-by: Jan Kara <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kernel/sysctl.c file is a kitchen sink where everyone leaves their dirty
dishes, which makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Move aio sysctl to aio.c and use the new register_sysctl_init() to
register the sysctl interface for aio.
[[email protected]: adjust commit log to justify the move]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kernel/sysctl.c file is a kitchen sink where everyone leaves their dirty
dishes, which makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
So move hung_task sysctl interface to hung_task.c and use
register_sysctl() to register the sysctl interface.
[[email protected]: commit log refresh and fixed 2-3 0day reported compile issues]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
sysctl has helpers which let us specify boundary values for a min or max
int value. Since these are used for a boundary check only they don't
change, so move these variables to sysctl_vals to avoid adding duplicate
variables. This will help with our cleanup of kernel/sysctl.c.
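The idea of sharing read-only boundary values can be illustrated with a small userspace sketch. Everything named here (`shared_vals`, the `SHARED_*` macros, `struct knob`, `knob_write()`) is a hypothetical stand-in chosen for illustration, not the kernel's actual table, macros, or handler; the point is only that several knobs can reference one constant array instead of each defining its own min/max variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared table of read-only boundary values. */
static const int shared_vals[] = { 0, 1, 100 };

#define SHARED_ZERO        (&shared_vals[0])
#define SHARED_ONE         (&shared_vals[1])
#define SHARED_ONE_HUNDRED (&shared_vals[2])

/* Hypothetical knob: the tunable variable plus optional bounds. */
struct knob {
	int *data;
	const int *min;	/* NULL means no lower bound */
	const int *max;	/* NULL means no upper bound */
};

/* Store val if it lies within the bounds; return 0 on success, -1 if rejected. */
static int knob_write(const struct knob *k, int val)
{
	assert(k != NULL && k->data != NULL);
	if (k->min && val < *k->min)
		return -1;
	if (k->max && val > *k->max)
		return -1;
	*k->data = val;
	return 0;
}
```

Because the bounds are only ever read, any number of knobs can point at the same entries, which is what makes the per-file duplicate variables unnecessary.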
[[email protected]: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"]
[[email protected]: major rebase]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysctl.c kitchen sink. People keep
stuffing their sysctls in that file and this creates a maintenance
burden. So this effort is aimed at placing sysctls where they actually
belong.
I'm going to split patches up into series as there is quite a bit of
work.
This first set adds register_sysctl_init() for registering a
sysctl on the init path, adds const where missing in a few places,
generalizes common values so they are easier to share, and starts
moving a few kernel/sysctl.c entries out to where they belong.
The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.
I'll only post the first two patch sets for now. We can address the
rest once the first two patch sets get completely reviewed / Acked.
This patch (of 9):
The kernel/sysctl.c file is a kitchen sink where everyone leaves their dirty
dishes, which makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Today though folks heavily rely on tables in kernel/sysctl.c so they can
easily just extend this table with their needed sysctls. In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.
We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable. So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered. It would be counterproductive to stop
boot if a simple sysctl registration failed.
Provide a helper for init then, and document the recommended init levels
to use for callers of this routine. We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.
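The "safe to fail on init" behaviour described above can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel helper itself: `fake_register()`, `fake_register_init()`, `struct fake_header` and the `simulate_failure` parameter are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a registration handle. */
struct fake_header { const char *path; };

/* Counts successful registrations so callers can observe the outcome. */
static int registered_count;

/* Hypothetical stand-in for a registration call: returns NULL on failure. */
static struct fake_header *fake_register(const char *path, int simulate_failure)
{
	static struct fake_header hdr;

	if (simulate_failure)
		return NULL;
	hdr.path = path;
	registered_count++;
	return &hdr;
}

/* Init-path wrapper: report a failed registration but never propagate it. */
static void fake_register_init(const char *path, const char *table_name,
			       int simulate_failure)
{
	struct fake_header *hdr;

	assert(path != NULL && table_name != NULL);
	hdr = fake_register(path, simulate_failure);
	if (!hdr) {
		fprintf(stderr, "failed to register %s at %s\n", table_name, path);
		return;	/* initialization continues without this knob */
	}
}
```

The wrapper returns `void` on purpose: the caller has nothing to check, so a missing knob cannot abort initialization.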
[[email protected]: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Sebastian Reichel <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Qing Wang <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Antti Palosaari <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Clemens Ladisch <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Julia Lawall <[email protected]>
Cc: Lukas Middendorf <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Phillip Potter <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This fixes the FIXME in migrate_vma_check_page().
Before migrating a page, migration code will take a reference and check
that there are no unexpected page references, failing the migration if there
are. When a thread faults on a migration entry, it will take a temporary
reference to the page to wait for the page to become unlocked, signifying
the migration entry has been removed.
This reference is dropped just prior to waiting on the page lock,
however the extra reference can cause migration failures so it is
desirable to avoid taking it.
As migration code already has a reference to the migrating page an extra
reference to wait on PG_locked is unnecessary so long as the reference
can't be dropped whilst setting up the wait.
When faulting on a migration entry the ptl is taken to check the
migration entry. Removing a migration entry also requires the ptl, and
migration code won't drop its page reference until after the migration
entry has been removed. Therefore retaining the ptl of a migration
entry is sufficient to ensure the page has a reference. Reworking
migration_entry_wait() to hold the ptl until the wait setup is complete
means the extra page reference is no longer needed.
[[email protected]: v5]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a comment into fscache_note_page_release() to explain how the
page-release optimisation logic works[1]. It's not entirely obvious as it
has nothing to do with whether or not the netfs file contains data.
FSCACHE_COOKIE_NO_DATA_TO_READ is set if we have no data in the cache yet
(ie. the backing file lookup was negative, the file is 0 length or the
cookie got invalidated). It means that we have no data in the cache, not
that the file is necessarily empty on the server.
FSCACHE_COOKIE_HAVE_DATA is set once we've stored data in the backing file.
From that point on, we have data we *could* read - however, it's covered by
pages in the netfs pagecache until such time as one of those covering pages
is released.
So if we've written data to the cache (HAVE_DATA) and there wasn't any data
in the cache when we started (NO_DATA_TO_READ), it may no longer be true
that we can skip reading from the cache.
Read skipping is done by cachefiles_prepare_read().
Note that tracking is not done on a per-page basis, but only on a per-file
basis.
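The flag interplay described above can be modelled in a few lines. This is a userspace sketch with hypothetical names (`struct cookie_model`, `COOKIE_*` bits, `note_page_release()`, `can_skip_cache_read()`); it mirrors the per-file logic in the text rather than reproducing the fscache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits modelling the two cookie flags named above. */
#define COOKIE_NO_DATA_TO_READ (1u << 0)
#define COOKIE_HAVE_DATA       (1u << 1)

struct cookie_model { unsigned int flags; };

/*
 * Called when a covering pagecache page is released. If data has been
 * stored in the cache since lookup, reads can no longer be skipped.
 */
static void note_page_release(struct cookie_model *c)
{
	if (c &&
	    (c->flags & COOKIE_HAVE_DATA) &&
	    (c->flags & COOKIE_NO_DATA_TO_READ))
		c->flags &= ~COOKIE_NO_DATA_TO_READ;
}

/* A read may skip the cache only while NO_DATA_TO_READ is still set. */
static bool can_skip_cache_read(const struct cookie_model *c)
{
	assert(c != NULL);
	return (c->flags & COOKIE_NO_DATA_TO_READ) != 0;
}
```

Note that the state is per cookie (per file), so releasing any one covering page ends read skipping for the whole file.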
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/164251408479.3435901.9540165422908194636.stgit@warthog.procyon.org.uk/ # v1
|
|
Pull block fixes from Jens Axboe:
"Various little minor fixes that should go into this release:
- Fix issue with cloned bios and IO accounting (Christoph)
- Remove redundant assignments (Colin, GuoYong)
- Fix an issue with the mq-deadline async_depth sysfs interface (me)
- Fix brd module loading race (Tetsuo)
- Shared tag map wakeup fix (Laibin)
- End of bdev read fix (OGAWA)
- srcu leak fix (Ming)"
* tag 'block-5.17-2022-01-21' of git://git.kernel.dk/linux-block:
block: fix async_depth sysfs interface for mq-deadline
block: Fix wrong offset in bio_truncate()
block: assign bi_bdev for cloned bios in blk_rq_prep_clone
block: cleanup q->srcu
block: Remove unnecessary variable assignment
brd: remove brd_devices_mutex mutex
aoe: remove redundant assignment on variable n
loop: remove redundant initialization of pointer node
blk-mq: fix tag_get wait task can't be awakened
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Two new drivers this cycle and a significant rework of the CMOS driver
make the bulk of the changes.
I also carry powerpc changes with the agreement of Michael.
New drivers:
- Sunplus SP7021 RTC
- Nintendo GameCube, Wii and Wii U RTC
Driver updates:
- cmos: refactor UIP handling and presence check, fix century
- rs5c372: offset correction support, report low voltage
- rv8803: Epson RX8804 support"
* tag 'rtc-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (33 commits)
rtc: sunplus: fix return value in sp_rtc_probe()
rtc: cmos: Evaluate century appropriate
rtc: gamecube: Fix an IS_ERR() vs NULL check
rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
dt-bindings: rtc: qcom-pm8xxx-rtc: update register numbers
rtc: pxa: fix null pointer dereference
rtc: ftrtc010: Use platform_get_irq() to get the interrupt
rtc: Move variable into switch case statement
rtc: pcf2127: Fix typo in comment
dt-bindings: rtc: Add Sunplus RTC json-schema
rtc: Add driver for RTC in Sunplus SP7021
rtc: rs5c372: fix incorrect oscillation value on r2221tl
rtc: rs5c372: add offset correction support
rtc: cmos: avoid UIP when writing alarm time
rtc: cmos: avoid UIP when reading alarm time
rtc: mc146818-lib: refactor mc146818_does_rtc_work
rtc: mc146818-lib: refactor mc146818_get_time
rtc: mc146818-lib: extract mc146818_avoid_UIP
rtc: mc146818-lib: fix RTC presence check
rtc: Check return value from mc146818_get_time()
...
|
|
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built-in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanical, so that should be cleaned up later.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
|
|
We can get rid of all the empty stubs because all these functions call
of_property_read_variable_u{8,16,32,64}_array() which already have an
empty stub if CONFIG_OF is not defined.
Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Make all the smaller variants of of_parse_phandle() static inline.
This also lets us remove the empty function stubs if CONFIG_OF is not
defined.
Suggested-by: Rob Herring <[email protected]>
Signed-off-by: Michael Walle <[email protected]>
[robh: move index < 0 check into __of_parse_phandle_with_args]
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Pull ceph updates from Ilya Dryomov:
"The highlight is the new mount "device" string syntax implemented by
Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there are a couple of fixes for edge cases in quota and
a new mount option for turning on unbuffered I/O mode globally instead
of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"
* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
ceph: move CEPH_SUPER_MAGIC definition to magic.h
ceph: remove redundant Lsx caps check
ceph: add new "nopagecache" option
ceph: don't check for quotas on MDS stray dirs
ceph: drop send metrics debug message
rbd: make const pointer spaces a static const array
ceph: Fix incorrect statfs report for small quota
ceph: mount syntax module parameter
doc: document new CephFS mount device syntax
ceph: record updated mon_addr on remount
ceph: new device mount syntax
libceph: rename parse_fsid() to ceph_parse_fsid() and export
libceph: generalize addr/ip parsing based on delimiter
|
|
The link extended sub-states are assigned as an enum, which is integer
sized, but read from a union as a u8. This works for small values on
little-endian systems, but on big-endian systems it always gives 0. Fix
the variable in the union to match the enum size.
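The failure mode can be reproduced in userspace with a small model. The enum, unions and helper names below are hypothetical; the sketch assumes the compiler gives the enum a 4-byte representation (checked with `_Static_assert`), which holds for common toolchains but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum; the large enumerator rules out a one-byte representation. */
enum substate_model {
	SUBSTATE_NONE = 0,
	SUBSTATE_EXAMPLE = 3,
	SUBSTATE_WIDE = 0x10000,
};

/* Assumption of this sketch: the enum occupies four bytes. */
_Static_assert(sizeof(enum substate_model) == sizeof(uint32_t),
	       "sketch assumes a 4-byte enum");

/* Buggy layout: the enum shares storage with a one-byte view. */
union buggy_view {
	enum substate_model as_enum;
	uint8_t as_u8;
};

/* Fixed layout: the generic view is as wide as the enum. */
union fixed_view {
	enum substate_model as_enum;
	uint32_t as_u32;
};

/* Reads only the first byte in memory: the low byte on little endian, the high byte on big endian. */
static unsigned int read_buggy(enum substate_model s)
{
	union buggy_view v;

	memset(&v, 0, sizeof(v));
	v.as_enum = s;
	return v.as_u8;
}

/* Reads the full width, so the value survives on either byte order. */
static unsigned int read_fixed(enum substate_model s)
{
	union fixed_view v;

	memset(&v, 0, sizeof(v));
	v.as_enum = s;
	return v.as_u32;
}

/* Returns 1 when the least significant byte is stored first. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

On a little-endian host `read_buggy(SUBSTATE_EXAMPLE)` happens to return 3, which is why the bug went unnoticed there; on a big-endian host the first byte is the most significant one and the result is 0.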
Fixes: ecc31c60240b ("ethtool: Add link extended state")
Signed-off-by: Moshe Tal <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading the `/proc/net/ptype`
file. This is a minor information leak, as packet sockets are
namespace aware.
Add a net pointer in `packet_type` to keep the net namespace of the
corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.
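The visibility rule can be sketched as a simple predicate. The types and the function `ptype_visible()` are hypothetical models for illustration; the real check sits inside the seq_file show routine and compares namespaces with kernel helpers rather than raw pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace. */
struct net_model { int id; };

/* Hypothetical stand-in for a packet_type entry. */
struct packet_type_model {
	const struct net_model *owner_net;	/* NULL: not tied to a namespace */
};

/*
 * An entry is shown to a reader if it carries no owning namespace,
 * or if its owning namespace is the reader's namespace.
 */
static bool ptype_visible(const struct packet_type_model *pt,
			  const struct net_model *reader)
{
	assert(pt != NULL && reader != NULL);
	return pt->owner_net == NULL || pt->owner_net == reader;
}
```

Entries with a NULL pointer stay visible everywhere, so handlers that are not created by packet sockets are unaffected by the check.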
Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.")
Signed-off-by: Congyu Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf.
Quite a handful of old regression fixes but most of those are
pre-5.16.
Current release - regressions:
- fix memory leaks in the skb free deferral scheme if upper layer
protocols are used, i.e. in-kernel TCP readers like TLS
Current release - new code bugs:
- nf_tables: fix NULL check typo in _clone() functions
- change the default to y for Vertexcom vendor Kconfig
- a couple of fixes to incorrect uses of ref tracking
- two fixes for constifying netdev->dev_addr
Previous releases - regressions:
- bpf:
- various verifier fixes mainly around register offset handling
when passed to helper functions
- fix mount source displayed for bpffs (none -> bpffs)
- bonding:
- fix extraction of ports for connection hash calculation
- fix bond_xmit_broadcast return value when some devices are down
- phy: marvell: add Marvell specific PHY loopback
- sch_api: don't skip qdisc attach on ingress, prevent ref leak
- htb: restore minimal packet size handling in rate control
- sfp: fix high power modules without diagnostic monitoring
- mscc: ocelot:
- don't let phylink re-enable TX PAUSE on the NPI port
- don't dereference NULL pointers with shared tc filters
- smsc95xx: correct reset handling for LAN9514
- cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
- phy: micrel: use kszphy_suspend/_resume for irq aware devices,
avoid races with the interrupt
Previous releases - always broken:
- xdp: check prog type before updating BPF link
- smc: resolve various races around abnormal connection termination
- sit: allow encapsulated IPv6 traffic to be delivered locally
- axienet: fix init/reset handling, add missing barriers, read the
right status words, stop queues correctly
- add missing dev_put() in sock_timestamping_bind_phc()
Misc:
- ipv4: prevent accidentally passing RTO_ONLINK to
ip_route_output_key_hash() by sanitizing flags
- ipv4: avoid quadratic behavior in netns dismantle
- stmmac: dwmac-oxnas: add support for OX810SE
- fsl: xgmac_mdio: add workaround for erratum A-009885"
* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
ipv4: avoid quadratic behavior in netns dismantle
net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
dt-bindings: net: Document fsl,erratum-a009885
net/fsl: xgmac_mdio: Add workaround for erratum A-009885
net: mscc: ocelot: fix using match before it is set
net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
net: axienet: increase default TX ring size to 128
net: axienet: fix for TX busy handling
net: axienet: fix number of TX ring slots for available check
net: axienet: Fix TX ring slot available check
net: axienet: limit minimum TX ring size
net: axienet: add missing memory barriers
net: axienet: reset core on initialization prior to MDIO access
net: axienet: Wait for PhyRstCmplt after core reset
net: axienet: increase reset timeout
bpf, selftests: Add ringbuf memory type confusion test
...
|
|
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <[email protected]>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
|
|
Delay accounting does not track the delay of memory compaction. When there
is not enough free memory, tasks can spend part of their time
waiting for compaction.
To measure the impact of direct memory compaction on tasks, measure the delay
when allocating memory through memory compaction.
Also update tools/accounting/getdelays.c:
    / # ./getdelays_next -di -p 304
    print delayacct stats ON
    printing IO accounting
    PID     304

    CPU         count     real total  virtual total    delay total  delay average
                  277      780000000      849039485       18877296        0.068ms
    IO          count    delay total  delay average
                    0              0            0ms
    SWAP        count    delay total  delay average
                    0              0            0ms
    RECLAIM     count    delay total  delay average
                    5    11088812685         2217ms
    THRASHING   count    delay total  delay average
                    0              0            0ms
    COMPACT     count    delay total  delay average
                    3          72758            0ms
    watch: read=0, write=0, cancelled_write=0
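The accounting pattern and the averages in the output above can be checked with a short userspace sketch. The struct and function names are hypothetical, timestamps are passed in explicitly instead of being read from a clock, and the truncating millisecond average is an assumption chosen because it reproduces the `2217ms` and `0ms` figures shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task record for one delay category. */
struct delay_model {
	uint64_t start_ns;	/* timestamp taken when the wait begins */
	uint64_t delay_ns;	/* accumulated delay */
	uint32_t count;		/* number of completed waits */
};

/* Mark the beginning of a wait (for example, entering direct compaction). */
static void delay_start(struct delay_model *d, uint64_t now_ns)
{
	d->start_ns = now_ns;
}

/* Mark the end of a wait and fold the elapsed time into the totals. */
static void delay_end(struct delay_model *d, uint64_t now_ns)
{
	assert(now_ns >= d->start_ns);
	d->delay_ns += now_ns - d->start_ns;
	d->count++;
}

/* Average delay in whole milliseconds (truncated); 0 when nothing was recorded. */
static uint64_t average_ms(uint64_t total_ns, uint64_t count)
{
	if (count == 0)
		return 0;
	return total_ns / count / 1000000u;
}
```

With the sample figures, RECLAIM gives 11088812685 ns / 5 = 2217762537 ns per event, which truncates to 2217 ms, and COMPACT gives 72758 ns / 3, well under a millisecond.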
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: wangyong <[email protected]>
Reviewed-by: Jiang Xuexin <[email protected]>
Reviewed-by: Zhang Wenya <[email protected]>
Reviewed-by: Yang Yang <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The flags field in struct task_delay_info is used to distinguish
between swapin and blkio delay accounting. But after patch "delayacct:
support swapin delay accounting for swapping without blkio", there is no
need to do that, since swapin and blkio delay accounting use their own
functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zeal Robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a task is created after delayacct is enabled, the kernel will do all
the delay accounting for that task. The problem is that if the user disables
delayacct by setting /proc/sys/kernel/task_delayacct to zero, only blkio
delay accounting is disabled.
Now disable all kinds of delay accounting when
/proc/sys/kernel/task_delayacct is set to zero.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently delayacct accounts swapin delay only for swapping that causes
blkio. If we use zram for swapping, tools/accounting/getdelays can't
get any SWAP delay.
It's useful to get zram swapin delay information, for example to adjust
the compression algorithm or /proc/sys/vm/swappiness.
For reference, PSI accounts any kind of swapping by doing its work in
swap_readpage(), no matter whether swapping causes blkio. Let delayacct
do similar work.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "test_hash.c: refactor into KUnit", v3.
We refactored the lib/test_hash.c file into KUnit as part of the student
group LKCAMP [1] introductory hackathon for kernel development.
This test was pointed out to our group by Daniel Latypov [2], so its full
conversion into a pure KUnit test was our goal in this patch series, but
we ran into many problems relating to it not being split as unit tests,
which complicated matters a bit, as the reasoning behind the original
tests is quite cryptic for those unfamiliar with hash implementations.
Some interesting developments we'd like to highlight are:
- In patch 1/5 we noticed that there was an unused define directive
that could be removed.
- In patch 4/5 we noticed how stringhash and hash tests are all under
the lib/test_hash.c file, which might cause some confusion, and we
also broke those kernel config entries up.
Overall KUnit developments have been made in the other patches in this
series:
In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as
to make it more compatible with the KUnit style, whilst preserving the
original idea of the maintainer who designed it (i.e. George Spelvin),
which might be undesirable for unit tests, but we assume it is enough
for a first patch.
This patch (of 5):
Currently, there exist hash_32() and __hash_32() functions, which were
introduced in a patch [1] targeting architecture specific optimizations.
These functions can be overridden on a per-architecture basis to achieve
such optimizations. They must set their corresponding define directive
(HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header
files can deal with these overrides properly.
As the supported 32-bit architectures that have their own hash function
implementation (i.e. m68k, Microblaze, H8/300, pa-risc) have only been
making use of the (more general) __hash_32() function (which only lacks
a right shift operation when compared to the hash_32() function), remove
the define directive corresponding to the arch-specific hash_32()
implementation.
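As a rough userspace illustration of the relationship described above (not the kernel source itself), the sketch below shows hash_32() as __hash_32() followed by a right shift. The sketch_ prefixed names are hypothetical; the multiplier is the GOLDEN_RATIO_32 constant from include/linux/hash.h.

```c
#include <stdint.h>

/* Multiplier used by the generic 32-bit multiplicative hash. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Sketch of __hash_32(): a plain multiplicative hash with no shift. */
static inline uint32_t sketch___hash_32(uint32_t val)
{
	return val * GOLDEN_RATIO_32;
}

/*
 * Sketch of hash_32(): keep only the top `bits` bits of __hash_32().
 * `bits` must be in the range 1..32, otherwise the shift is undefined.
 */
static inline uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	return sketch___hash_32(val) >> (32 - bits);
}
```

Because the only difference is the final shift, an architecture that overrides __hash_32() gets an optimized hash_32() for free, which is presumably why a separate hash_32() override is not needed.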
[1] https://lore.kernel.org/lkml/[email protected]/
[[email protected]: hash_32_generic() becomes hash_32()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: David Gow <[email protected]>
Tested-by: David Gow <[email protected]>
Co-developed-by: Augusto Durães Camargo <[email protected]>
Signed-off-by: Augusto Durães Camargo <[email protected]>
Co-developed-by: Enzo Ferreira <[email protected]>
Signed-off-by: Enzo Ferreira <[email protected]>
Signed-off-by: Isabella Basso <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Daniel Latypov <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce list_is_head() in a similar (*) way to how it's done for
list_entry_is_head(). Make use of it in list.h.
*) it's done as an inline function and not a macro, to be aligned with the
other list_is_*() APIs; while at it, make all three have the same
style.
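A minimal userspace sketch of the three list_is_*() helpers in one style may help here. The struct list_head below is a stand-in for the kernel type, and the bodies are an assumption about what "the same style" looks like, not a copy of list.h.

```c
/* Stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* True if @list is the first entry after @head. */
static inline int list_is_first(const struct list_head *list,
				const struct list_head *head)
{
	return list->prev == head;
}

/* True if @list is the last entry before @head. */
static inline int list_is_last(const struct list_head *list,
			       const struct list_head *head)
{
	return list->next == head;
}

/* True if @list is the list head itself, i.e. iteration wrapped around. */
static inline int list_is_head(const struct list_head *list,
			       const struct list_head *head)
{
	return list == head;
}
```

Keeping these as inline functions rather than macros gives type checking on both arguments.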
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Heikki Krogerus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comms of the percpu threads on CPU10~19
all have the same name "cfs_migration/1", which will confuse the user.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed. But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
task comm, for example,
jbd2/nvme0n1p2-
xfs-reclaim/sdf
Currently there are so many truncated kthreads:
rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...
We can shorten these names to work around this problem, but that may
not be applicable to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-'
for example: it is a nice name, and it is not a good idea to shorten it.
One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another more conservative approach is introducing a
new pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path. Finally
we made a decision to use the second approach. See also the discussions
in this thread:
https://lore.kernel.org/lkml/[email protected]/
After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:
rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8
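The truncation described above can be reproduced in userspace with a 16-byte buffer. This is only an illustration of the size limit; format_comm() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* 15 visible characters plus the terminating NUL, as for task->comm. */
#define TASK_COMM_LEN 16

/*
 * Format a per-CPU thread name into a comm-sized buffer. snprintf()
 * silently drops whatever does not fit, which is the effect seen on
 * the truncated kthread names.
 */
static void format_comm(char *comm, unsigned int cpu)
{
	snprintf(comm, TASK_COMM_LEN, "cfs_migration/%u", cpu);
}
```

Since "cfs_migration/" already takes 14 characters, only one digit of the CPU number survives, so CPUs 10 through 19 all end up as "cfs_migration/1".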
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As the sched:sched_switch tracepoint args are derived from the kernel,
we'd better keep them the same as in the kernel. So the macro TASK_COMM_LEN
is converted to an enum, so that all BPF programs can get it through
BTF.
A BPF program which wants to use TASK_COMM_LEN should include the
header vmlinux.h. Regarding test_stacktrace_map and
test_tracepoint, as the types defined in linux/bpf.h are also defined in
vmlinux.h, we don't need to include linux/bpf.h again.
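A small sketch of the difference, assuming a hypothetical sample struct: a #define disappears during preprocessing, whereas an enum constant is part of the type information that BTF can describe, and it still works as an array bound.

```c
/* Before (sketch): preprocessor-only, not visible in type information. */
/* #define TASK_COMM_LEN 16 */

/* After (sketch): an anonymous enum constant with the same value. */
enum {
	TASK_COMM_LEN = 16,
};

/* Hypothetical record mirroring the comm fields of a sched_switch event. */
struct sched_switch_sample {
	char prev_comm[TASK_COMM_LEN];
	char next_comm[TASK_COMM_LEN];
};
```

Existing users that write char buf[TASK_COMM_LEN] compile unchanged, since an enum constant is an integer constant expression.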
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is better to use get_task_comm() instead of the open coded string
copy as we do in other places.
struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore. Below is the verification of vmcore,
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5]
0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7]
> 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8]
> 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9]
> 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10]
> 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11]
> 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12]
> 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13]
> 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14]
> 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15]
It works well as expected.
Some comments are added to explain why we use the hard-coded 16.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Include a note at the top to discourage people from including it in
headers.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
The rest of the changes are induced by the above and may not be split.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Arend van Spriel <[email protected]> [brcmfmac]
Acked-by: Kalle Valo <[email protected]>
Cc: Arend van Spriel <[email protected]>
Cc: Franky Lin <[email protected]>
Cc: Hante Meuleman <[email protected]>
Cc: Chi-hsien Lin <[email protected]>
Cc: Wright Feng <[email protected]>
Cc: Chung-hsien Hsu <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Heikki Krogerus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change the proc_create[_data]() stubs which are used when CONFIG_PROC_FS
is not set from #defines to static inline stubs.
This should fix clang -Werror builds failing due to errors like this:
drivers/platform/x86/thinkpad_acpi.c:918:30: error: unused variable
'dispatch_proc_ops' [-Werror,-Wunused-const-variable]
Fixing this in include/linux/proc_fs.h should ensure that the same issue
is also fixed in any other drivers hitting the same -Werror issue.
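To illustrate why the stub style matters, here is a simplified userspace sketch. The two stub names and the simplified struct definitions are hypothetical and do not reproduce include/linux/proc_fs.h; only dispatch_proc_ops is taken from the error message above.

```c
#include <stddef.h>

struct proc_dir_entry;
struct proc_ops { int placeholder; };

/*
 * Macro-style stub: the arguments vanish during preprocessing, so a
 * variable passed as `ops` is never referenced and the compiler may
 * report it as unused.
 */
#define proc_create_macro_stub(name, mode, parent, ops) \
	((struct proc_dir_entry *)NULL)

/*
 * Inline stub: the arguments are still evaluated and type-checked,
 * so the variable counts as used even though nothing is created.
 */
static inline struct proc_dir_entry *
proc_create_inline_stub(const char *name, unsigned int mode,
			struct proc_dir_entry *parent,
			const struct proc_ops *ops)
{
	(void)name;
	(void)mode;
	(void)parent;
	(void)ops;
	return NULL;
}

static const struct proc_ops dispatch_proc_ops = { 0 };

/* With the inline stub, dispatch_proc_ops is referenced here. */
static struct proc_dir_entry *register_entry(void)
{
	return proc_create_inline_stub("dispatch", 0444, NULL,
				       &dispatch_proc_ops);
}
```

The inline variant costs nothing at run time, since the compiler can discard the call entirely.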
[[email protected]: fix CONFIG_PROC_FS=n]
[[email protected]: fix arch/sparc/kernel/led.c]
[[email protected]: fix build]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: David Howells <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte. This patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but is overridden on x86, which has its own
implementation.
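The __weak override pattern can be sketched in userspace as follows, assuming GCC or Clang on an ELF target. All sketch_ names and the page counter are hypothetical; the real function walks page tables rather than counting.

```c
/* Userspace spelling of the kernel's __weak annotation (GCC/Clang). */
#define SKETCH_WEAK __attribute__((weak))

static unsigned long populated_pages;

/*
 * Generic fallback. If another object file provides a non-weak
 * definition with the same name, the linker picks that one instead.
 */
SKETCH_WEAK void sketch_pcpu_populate_pte(unsigned long addr)
{
	(void)addr;
	populated_pages++;
}

/* Caller is unaware of which implementation ends up being linked in. */
static void sketch_populate_range(unsigned long start, unsigned long end,
				  unsigned long page_size)
{
	for (unsigned long addr = start; addr < end; addr += page_size)
		sketch_pcpu_populate_pte(addr);
}
```

An architecture that needs special handling only has to define the function without the weak attribute; no #ifdef is needed at the call site.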
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the previous patch, we can add generic pcpu first chunk
allocate and free functions to clean up the duplicated definitions on each
architecture.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t; pcpu
first chunk allocation will call it to allocate memblock memory on the
corresponding node. This is preparation for the next patch.
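The callback plumbing can be sketched in userspace as below. The two typedef shapes are written from memory of the percpu headers and should be treated as assumptions; toy_alloc(), toy_cpu_to_node() and last_node are purely hypothetical, with malloc() standing in for a node-aware memblock allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Maps a CPU number to the NUMA node it belongs to (assumed shape). */
typedef int (pcpu_fc_cpu_to_node_fn_t)(int cpu);

/* Allocation callback that now also receives the mapping (assumed shape). */
typedef void *(*pcpu_fc_alloc_fn_t)(unsigned int cpu, size_t size,
				    size_t align,
				    pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);

static int last_node = -1;

/* Hypothetical topology: four CPUs per node. */
static int toy_cpu_to_node(int cpu)
{
	return cpu / 4;
}

/* Toy allocator: records the chosen node, then falls back to malloc(). */
static void *toy_alloc(unsigned int cpu, size_t size, size_t align,
		       pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn)
{
	(void)align;
	last_node = cpu_to_nd_fn((int)cpu);
	return malloc(size);
}
```

Passing the mapping as an argument lets one generic allocator serve every architecture, with only the CPU-to-node lookup left arch-specific.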
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move the seemingly generic block_vcpu_list from kvm_vcpu to vcpu_vmx, and
rename the list and all associated variables to clarify that it tracks
the set of vCPUs that need to be poked on a posted interrupt to the wakeup
vector. The list is not used to track _all_ vCPUs that are blocking, and
the term "blocked" can be misleading as it may refer to a blocking
condition in the host or the guest, whereas the PI wakeup case is
specifically for the vCPUs that are actively blocking from within the
guest.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove kvm_vcpu.pre_pcpu as it no longer has any users. No functional
change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Bring in fix for VT-d posted interrupts before further changing the code in 5.17.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add new kconfig target 'make mod2noconfig', which will be useful to
speed up the build and test iteration.
- Raise the minimum supported version of LLVM to 11.0.0
- Refactor certs/Makefile
- Change the format of include/config/auto.conf to stop double-quoting
string type CONFIG options.
- Fix ARCH=sh builds in dash
- Separate compression macros for general purposes (cmd_bzip2 etc.) and
the ones for decompressors (cmd_bzip2_with_size etc.)
- Misc Makefile cleanups
* tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: add cmd_file_size
arch: decompressor: remove useless vmlinux.bin.all-y
kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
kbuild: drop $(size_append) from cmd_zstd
sh: rename suffix-y to suffix_y
doc: kbuild: fix default in `imply` table
microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV}
certs: move scripts/extract-cert to certs/
kbuild: do not quote string values in include/config/auto.conf
kbuild: do not include include/config/auto.conf from shell scripts
certs: simplify $(srctree)/ handling and remove config_filename macro
kbuild: stop using config_filename in scripts/Makefile.modsign
certs: remove misleading comments about GCC PR
certs: refactor file cleaning
certs: remove unneeded -I$(srctree) option for system_certificates.o
certs: unify duplicated cmd_extract_certs and improve the log
certs: use $< and $@ to simplify the key generation rule
kbuild: remove headers_check stub
kbuild: move headers_check.pl to usr/include/
certs: use if_changed to re-generate the key when the key type is changed
...
|
|
The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument, and thus both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.
While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other
helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now
only enforce a register type of PTR_TO_MEM.
This can lead to potential type confusion since it would allow other PTR_TO_MEM
memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve().
Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that:
- bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC
- bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC
but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM
- however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM
to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their
func proto description
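The three rules above can be modelled with a toy type encoding. The SKETCH_ constants, bit layout and helper names are hypothetical and do not match the verifier's real values; the point is only that a flag composed onto a base type lets strict sinks and lenient helpers apply different checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 8 bits are the base type, higher bits are flags. */
enum {
	SKETCH_BASE_TYPE_BITS = 8,
	SKETCH_PTR_TO_MEM = 1,
	SKETCH_MEM_ALLOC = 1 << (SKETCH_BASE_TYPE_BITS + 2),
};

/* Strip the flag bits and return only the base register type. */
static inline uint32_t sketch_base_type(uint32_t type)
{
	return type & ((1u << SKETCH_BASE_TYPE_BITS) - 1);
}

/* Sinks such as submit/discard: require the exact flagged type. */
static bool sketch_ok_for_alloc_mem_arg(uint32_t reg_type)
{
	return reg_type == (SKETCH_PTR_TO_MEM | SKETCH_MEM_ALLOC);
}

/* Other helpers: any PTR_TO_MEM is acceptable, flagged or not. */
static bool sketch_ok_for_plain_mem_arg(uint32_t reg_type)
{
	return sketch_base_type(reg_type) == SKETCH_PTR_TO_MEM;
}
```

In this model, memory that did not come from a reserve call carries no SKETCH_MEM_ALLOC bit and is therefore rejected by the sinks.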
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
Generalize the check_ctx_reg() helper function into a more generic named one
so that it can be reused for other register types as well to check whether
their offset is non-zero. No functional change.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"A bunch of new support and few updates to drivers:
New support:
- DMA_MEMCPY_SG support is brought back as we have a user in Xilinx
driver
- Support for TI J721S2 SoC in k3-udma driver
- Support for Ingenic MDMA and BDMA in the JZ4760
- Support for Renesas r8a779f0 dmac
Updates:
- We are finally getting rid of slave_id, so this brings in the
changes across tree for that
- updates for idxd driver
- at_xdmac driver cleanup"
* tag 'dmaengine-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (60 commits)
dt-bindings: dma-controller: Split interrupt fields in example
dmaengine: pch_dma: Remove usage of the deprecated "pci-dma-compat.h" API
dmaengine: at_xdmac: Fix race over irq_status
dmaengine: at_xdmac: Remove a level of indentation in at_xdmac_tasklet()
dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
dmaengine: at_xdmac: Fix lld view setting
dmaengine: at_xdmac: Remove a level of indentation in at_xdmac_advance_work()
dmaengine: at_xdmac: Fix concurrency over xfers_list
dmaengine: at_xdmac: Move the free desc to the tail of the desc list
dmaengine: at_xdmac: Fix race for the tx desc callback
dmaengine: at_xdmac: Fix concurrency over chan's completed_cookie
dmaengine: at_xdmac: Print debug message after realeasing the lock
dmaengine: at_xdmac: Start transfer for cyclic channels in issue_pending
dmaengine: at_xdmac: Don't start transactions at tx_submit level
dmaengine: idxd: deprecate token sysfs attributes for read buffers
dmaengine: idxd: change bandwidth token to read buffers
dmaengine: idxd: fix wq settings post wq disable
dmaengine: idxd: change MSIX allocation based on per wq activation
dmaengine: idxd: fix descriptor flushing locking
dmaengine: idxd: embed irq_entry in idxd_wq struct
...
|
|
Since commit 2279f540ea7d ("sched/deadline: Fix priority
inheritance with multiple scheduling classes"), we should not
keep it here.
Signed-off-by: Hui Su <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
With a write operation on psi files replacing the old trigger with a new
one, the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and a pending poll()
will stumble on trigger->event_wait which was destroyed.
Fix this by disallowing redefinition of an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with an EBUSY error.
Also bypass a check for psi_disabled in psi_trigger_destroy() as the
flag can be flipped after the trigger is created, leading to a memory
leak.
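The new write semantics can be modelled with a very small sketch. The struct and function names are hypothetical and the real code also deals with locking and parsing; only the "second write fails with EBUSY" rule is shown.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-file state: at most one trigger for the fd's lifetime. */
struct psi_fd_sketch {
	void *trigger;
};

/*
 * Install a trigger on first write; refuse to replace an existing one
 * so that a waitqueue someone may be polling on is never freed early.
 */
static int sketch_psi_write(struct psi_fd_sketch *fd, void *new_trigger)
{
	if (fd->trigger)
		return -EBUSY;

	fd->trigger = new_trigger;
	return 0;
}
```

A user that wants a different trigger has to close the file descriptor and open a new one.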
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: [email protected]
Suggested-by: Linus Torvalds <[email protected]>
Analyzed-by: Eric Biggers <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Time readers that cannot take locks (due to NMI etc.) currently make
use of perf_event::shadow_ctx_time, which, for that event gives:
time' = now + (time - timestamp)
or, alternatively arranged:
time' = time + (now - timestamp)
IOW, the progression of time since the last time the shadow_ctx_time
was updated.
There are problems with this:
A) the shadow_ctx_time is per-event, even though the ctx_time it
reflects is obviously per context. The direct consequence of this
is that the context needs to iterate all events all the time to
keep the shadow_ctx_time in sync.
B) even with the prior point, the context itself might not be active
meaning its time should not advance to begin with.
C) shadow_ctx_time isn't consistently updated when ctx_time is.
There are 3 users of this stuff, that suffer differently from this:
- calc_timer_values()
- perf_output_read()
- perf_event_update_userpage() /* A */
- perf_event_read_local() /* A,B */
In particular, perf_output_read() doesn't suffer at all, because it's
sample driven and hence only relevant when the event is actually
running.
The same was supposed to be true for perf_event_update_userpage(),
after all self-monitoring implies the context is active *HOWEVER*, as
per commit f79256532682 ("perf/core: fix userpage->time_enabled of
inactive events") this goes wrong when combined with counter
overcommit, in that case those events that do not get scheduled when
the context becomes active (task events typically) miss out on the
EVENT_TIME update and ENABLED time is inflated (for a little while)
with the time the context was inactive. Once the event gets rotated
in, this gets corrected, leading to a non-monotonic timeflow.
perf_event_read_local() made things even worse, it can request time at
any point, suffering all the problems perf_event_update_userpage()
does and more. Because while perf_event_update_userpage() is limited
by the context being active, perf_event_read_local() users have no
such constraint.
Therefore, completely overhaul things and do away with
perf_event::shadow_ctx_time. Instead have regular context time updates
keep track of this offset directly and provide perf_event_time_now()
to complement perf_event_time().
perf_event_time_now() will, in addition to being context wide, also
take into account if the context is active. For inactive context, it
will not advance time.
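The intended behaviour can be sketched as plain arithmetic. The struct and function names below are hypothetical and the real code keeps its state differently; the sketch only shows "time + (now - timestamp)" for an active context and frozen time for an inactive one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of a context's time accounting. */
struct ctx_time_sketch {
	bool active;		/* is the context currently scheduled in? */
	uint64_t time;		/* accumulated context time at last update */
	uint64_t timestamp;	/* clock value at last update */
};

/*
 * Lockless-reader view of context time: advance by the clock delta
 * only while the context is active, otherwise report the frozen value.
 */
static uint64_t sketch_time_now(const struct ctx_time_sketch *ctx,
				uint64_t now)
{
	if (!ctx->active)
		return ctx->time;

	return ctx->time + (now - ctx->timestamp);
}
```

For example, with time = 100 and timestamp = 1000, a read at now = 1250 yields 350 for an active context and 100 for an inactive one. The equivalent form now + (time - timestamp) gives the same result in unsigned 64-bit arithmetic.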
This latter property means the cgroup perf_cgroup_info context needs
to grow additional state to track this.
Additionally, since all this is strictly per-cpu, we can use barrier()
to order context activity vs context time.
Fixes: 7d9285e82db5 ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Song Liu <[email protected]>
Tested-by: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA updates from Damien Le Moal:
"A larger than usual set of changes for this cycle. The bulk of the
changes are part of a rework of libata messages and debugging features
from Hannes. In more detail, the changes are as follows.
- Small code cleanups in the pata_ali driver (unnecessary variable
initialization and simplified return statement), from Jason and
Colin.
- Switch to using struct_group() in the sata_fsl driver, from Kees.
- Convert many sysfs attribute show functions to use sysfs_emit()
instead of snprintf(), from me.
- sata_dwc_460ex driver code cleanups, from Andy.
- Improve DMA setup and remove superfluous error message in
libahci_platform, from Andy
- A small code cleanup in libata to use min() instead of open coding
test, from Changcheng.
- Rework of libata messages from Hannes. This is especially focused
on replacing compile time defined debugging messages (DPRINTK() and
VPRINTK()) with regular dynamic debugging messages (pr_debug()) and
tracepoint events. Both libata-core and many drivers are updated
to have a consistent debugging level control for all drivers.
- Extend compile test support to as many drivers as possible in ATA
Kconfig to improve compile test coverage, from me.
- Fixes to avoid compile time warnings (W=1) and sparse warnings in
sata_fsl and ahci_xgene drivers, from me.
- Fix the interface of the read_id() port operation method to clarify
that the data buffer passed as an argument is little endian. This
avoids sparse warnings in the pata_netcell, pata_it821x,
ahci_xgene, ahci_cevaxi and ahci_brcm drivers. From me.
- Small code cleanup in the pata_octeon_cf driver, from Minghao.
- Improved IRQ configuration code in pata_of_platform, from Lad.
- Simplified implementation of __ata_scsi_queuecmd(), from Wenchao.
- Debounce delay flag renaming, from Paul.
- Add support for AMD A85 FCH (Hudson D4) AHCI adapters, from Paul"
* tag 'ata-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (106 commits)
ata: pata_ali: remove redundant return statement
ata: ahci: Add support for AMD A85 FCH (Hudson D4)
ata: libata: Rename link flag ATA_LFLAG_NO_DB_DELAY
ata: libata-scsi: simplify __ata_scsi_queuecmd()
ata: pata_of_platform: Use platform_get_irq_optional() to get the interrupt
ata: pata_samsung_cf: add compile test support
ata: pata_pxa: add compile test support
ata: pata_imx: add compile test support
ata: pata_ftide010: add compile test support
ata: pata_cs5535: add compile test support
ata: pata_octeon_cf: remove redundant val variable
ata: fix read_id() ata port operation interface
ata: ahci_xgene: use correct type for port mmio address
ata: sata_fsl: fix cmdhdr_tbl_entry and prde struct definitions
ata: sata_fsl: fix scsi host initialization
ata: pata_bk3710: add compile test support
ata: ahci_seattle: add compile test support
ata: ahci_xgene: add compile test support
ata: ahci_tegra: add compile test support
ata: ahci_sunxi: add compile test support
...
|
|
Pull virtio updates from Michael Tsirkin:
"virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes.
- partial support for < MAX_ORDER - 1 granularity for virtio-mem
- driver_override for vdpa
- sysfs ABI documentation for vdpa
- multiqueue config support for mlx5 vdpa
- and misc fixes, cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits)
vdpa/mlx5: Fix tracking of current number of VQs
vdpa/mlx5: Fix is_index_valid() to refer to features
vdpa: Protect vdpa reset with cf_mutex
vdpa: Avoid taking cf_mutex lock on get status
vdpa/vdpa_sim_net: Report max device capabilities
vdpa: Use BIT_ULL for bit operations
vdpa/vdpa_sim: Configure max supported virtqueues
vdpa/mlx5: Report max device capabilities
vdpa: Support reporting max device capabilities
vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
vdpa: Add support for returning device configuration information
vdpa/mlx5: Support configuring max data virtqueue
vdpa/mlx5: Fix config_attr_mask assignment
vdpa: Allow to configure max data virtqueues
vdpa: Read device configuration only if FEATURES_OK
vdpa: Sync calls set/get config/status with cf_mutex
vdpa/mlx5: Distribute RX virtqueues in RQT object
vdpa: Provide interface to read driver features
vdpa: clean up get_config_size ret value handling
virtio_ring: mark ring unused on error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"This is a continuation of the rework of device power management macros
used for declaring device power management callbacks (Paul Cercueil)"
* tag 'pm-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
iio: pressure: bmp280: Use new PM macros
PM: runtime: Add EXPORT[_GPL]_RUNTIME_DEV_PM_OPS macros
PM: runtime: Add DEFINE_RUNTIME_DEV_PM_OPS() macro
PM: core: Add EXPORT[_GPL]_SIMPLE_DEV_PM_OPS macros
PM: core: Remove static qualifier in DEFINE_SIMPLE_DEV_PM_OPS macro
PM: core: Remove DEFINE_UNIVERSAL_DEV_PM_OPS() macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"The most significant item here is the Platform Firmware Runtime Update
and Telemetry (PFRUT) support designed to allow certain pieces of the
platform firmware to be updated on the fly, among other things.
Also important is the e820 handling change on x86 that should work
around PCI BAR allocation issues on some systems shipping since 2019.
The rest is just a handful of assorted fixes and cleanups on top of
the ACPI material merged previously.
Specifics:
- Add support for the Platform Firmware Runtime Update and
Telemetry (PFRUT) interface based on ACPI to allow certain pieces
of the platform firmware to be updated without restarting the
system and to provide a mechanism for collecting platform firmware
telemetry data (Chen Yu, Dan Carpenter, Yang Yingliang).
- Ignore E820 reservations covering PCI host bridge windows on
sufficiently recent x86 systems to avoid issues with allocating PCI
BARs on systems where the E820 reservations cover the entire PCI
host bridge memory window returned by the _CRS object in the
system's ACPI tables (Hans de Goede).
- Fix and clean up acpi_scan_init() (Rafael Wysocki).
- Add more sanity checking to ACPI SPCR tables parsing (Mark
Langsdorf).
- Fix up ACPI APD (AMD Soc) driver initialization (Jiasheng Jiang).
- Drop unnecessary "static" from the ACPI PCC address space handling
driver added recently (kernel test robot)"
* tag 'acpi-5.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PCC: pcc_ctx can be static
ACPI: scan: Rename label in acpi_scan_init()
ACPI: scan: Simplify initialization of power and sleep buttons
ACPI: scan: Change acpi_scan_init() return value type to void
ACPI: SPCR: check if table->serial_port.access_width is too wide
ACPI: APD: Check for NULL pointer after calling devm_ioremap()
x86/PCI: Ignore E820 reservations for bridge windows on newer systems
ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl()
ACPI: pfr_update: Fix return value check in pfru_write()
ACPI: tools: Introduce utility for firmware updates/telemetry
ACPI: Introduce Platform Firmware Runtime Telemetry driver
ACPI: Introduce Platform Firmware Runtime Update device driver
efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
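The E820 item in the pull message above can be illustrated with a small model. This is only a sketch in Python, not the kernel's code: the function name `allocatable_ranges`, the `ignore_e820` flag, and the example addresses are all hypothetical, and how the kernel decides that a system is "sufficiently recent" is not modelled here. It shows why a reservation that covers the entire host bridge window leaves no room for PCI BARs unless the reservation is ignored.

```python
from typing import List, Tuple

# Inclusive (start, end) physical address range. Hypothetical type for this sketch.
Range = Tuple[int, int]


def allocatable_ranges(window: Range,
                       e820_reserved: List[Range],
                       ignore_e820: bool) -> List[Range]:
    """Return the parts of a host bridge window that remain usable for BARs.

    ignore_e820 stands in for the "sufficiently recent system" decision,
    which this sketch does not attempt to reproduce.
    """
    if ignore_e820:
        # Reservations are disregarded: the whole _CRS window stays usable.
        return [window]

    free = [window]
    for r_start, r_end in e820_reserved:
        next_free = []
        for start, end in free:
            if r_end < start or r_start > end:
                # No overlap with this free piece: keep it unchanged.
                next_free.append((start, end))
                continue
            if r_start > start:
                # Keep the part below the reservation.
                next_free.append((start, r_start - 1))
            if r_end < end:
                # Keep the part above the reservation.
                next_free.append((r_end + 1, end))
        free = next_free
    return free


# Made-up example values: a window fully covered by one reservation.
WINDOW = (0x80000000, 0xDFFFFFFF)
COVERING = [(0x70000000, 0xEFFFFFFF)]
```

With these example values, `allocatable_ranges(WINDOW, COVERING, ignore_e820=False)` returns an empty list, which is the failure mode the pull message describes, while passing `ignore_e820=True` returns the whole window.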
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull more slab updates from Vlastimil Babka:
"Finish the conversion to struct slab by removing slab-specific fields
from struct page.
The first slab update (see merge commit ca1a46d6f506) did most of the
conversion, but there was also a series in the iommu tree removing the
iommu's usage of struct page 'freelist' field, blocking the final
struct page cleanup.
Now that the iommu changes have been merged, we can finish the job"
* tag 'slab-for-5.17-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: Remove slab from struct page
|
|
Merge support for the Platform Firmware Runtime Update and Telemetry
interface based on ACPI.
The interface provided here allows updating certain pieces of the
platform firmware without restarting the system and collecting
platform firmware telemetry data.
This also includes a utility for accessing the new interface from user
space.
* acpi-pfrut:
ACPI: pfr_telemetry: Fix info leak in pfrt_log_ioctl()
ACPI: pfr_update: Fix return value check in pfru_write()
ACPI: tools: Introduce utility for firmware updates/telemetry
ACPI: Introduce Platform Firmware Runtime Telemetry driver
ACPI: Introduce Platform Firmware Runtime Update device driver
efi: Introduce EFI_FIRMWARE_MANAGEMENT_CAPSULE_HEADER and corresponding structures
|
|
Prior to Linux v5.4 devtmpfs used mount_single() which treats the given
mount options as "remount" options, so it updates the configuration of
the single super_block on each mount.
Since that was changed, the mount options used for devtmpfs are ignored.
This is a regression which affects systemd - which mounts devtmpfs with
"-o mode=755,size=4m,nr_inodes=1m".
This patch restores the "remount" effect by calling reconfigure_single().
Fixes: d401727ea0d7 ("devtmpfs: don't mix {ramfs,shmem}_fill_super() with mount_single()")
Acked-by: Christian Brauner <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
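The behaviour described in the commit message above can be modelled in a few lines. This is a simplified sketch in Python, not kernel code: `SingleSuperblock`, `parse_options`, `mount`, and the placeholder default values are hypothetical names chosen for illustration. It contrasts a mount path that applies the given options to the one shared superblock on every mount (the "remount" effect) with a path that ignores them.

```python
def parse_options(option_string):
    """Split a comma-separated mount option string into a dict of strings."""
    options = {}
    for item in option_string.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key] = value
    return options


class SingleSuperblock:
    """Stand-in for a filesystem that has exactly one superblock instance."""

    def __init__(self, defaults):
        self.config = dict(defaults)

    def reconfigure(self, options):
        # Treat the options like "remount" options: update the existing config.
        self.config.update(options)


def mount(superblock, option_string, apply_options):
    """Model one mount call against the shared superblock.

    apply_options=True models the "remount" effect the patch restores;
    apply_options=False models the regression, where options are ignored.
    """
    if apply_options:
        superblock.reconfigure(parse_options(option_string))
    return superblock


# Placeholder defaults; the real devtmpfs defaults are not stated in the source.
DEFAULTS = {"mode": "default", "size": "default", "nr_inodes": "default"}
SYSTEMD_OPTIONS = "mode=755,size=4m,nr_inodes=1m"
```

Calling `mount(SingleSuperblock(DEFAULTS), SYSTEMD_OPTIONS, apply_options=False)` leaves the configuration at its defaults, whereas `apply_options=True` updates `mode`, `size`, and `nr_inodes` to the values systemd passes.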
|
|
Pull NTB updates from Jon Mason:
"New AMD PCI ID for NTB, and a number of bug fixes for ntb_hw_switchtec
for Linux v5.17"
* tag 'ntb-5.17' of git://github.com/jonmason/ntb:
ntb_hw_switchtec: Fix a minor issue in config_req_id_table()
ntb_hw_switchtec: Remove code for disabling ID protection
ntb_hw_switchtec: Update the way of getting VEP instance ID
ntb_hw_switchtec: AND with the part_map for a valid tpart_vec
ntb_hw_switchtec: Fix bug with more than 32 partitions
ntb_hw_switchtec: Fix pff ioread to read into mmio_part_cfg_all
ntb_hw_switchtec: fix the spelling of "its"
NTB/msi: Fix ntbm_msi_request_threaded_irq() kernel-doc comment
ntb_hw_amd: Add NTB PCI ID for new gen CPU
|