Age | Commit message (Collapse) | Author | Files | Lines |
|
Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix
the incorrect comments in code. Also fixed some zone->lru_lock comment
error from ancient time. etc.
I struggled to understand the comment above move_pages_to_lru() (surely
it never calls page_referenced()), and eventually realized that most of
it had got separated from shrink_active_list(): move that comment back.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Chen, Rong A" <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add relock_page_lruvec() to replace repeated same code, no functional
change.
When testing for relock we can avoid the need for RCU locking if we simply
compare the page pgdat and memcg pointers versus those that the lruvec is
holding. By doing this we can avoid the extra pointer walks and accesses
of the memory cgroup.
In addition we can avoid the checks entirely if lruvec is currently NULL.
[[email protected]: use page_memcg()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Chen, Rong A" <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch moves per node lru_lock into lruvec, thus bring a lru_lock for
each of memcg per node. So on a large machine, each of memcg don't have
to suffer from per node pgdat->lru_lock competition. They could go fast
with their self lru_lock.
After move memcg charge before lru inserting, page isolation could
serialize page's memcg, then per memcg lruvec lock is stable and could
replace per node lru lock.
In isolate_migratepages_block(), compact_unlock_should_abort and
lock_page_lruvec_irqsave are open coded to work with compact_control.
Also add a debug func in locking which may give some clues if there are
sth out of hands.
Daniel Jordan's testing show 62% improvement on modified readtwice case on
his 2P * 10 core * 2 HT broadwell box.
https://lore.kernel.org/lkml/[email protected]/
Hugh Dickins helped on the patch polish, thanks!
[[email protected]: fix comment typo]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: use page_memcg()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Rong Chen <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, compaction would get the lru_lock and then do page isolation
which works fine with pgdat->lru_lock, since any page isoltion would
compete for the lru_lock. If we want to change to memcg lru_lock, we have
to isolate the page before getting lru_lock, thus isoltion would block
page's memcg change which relay on page isoltion too. Then we could
safely use per memcg lru_lock later.
The new page isolation use previous introduced TestClearPageLRU() + pgdat
lru locking which will be changed to memcg lru lock later.
Hugh Dickins <[email protected]> fixed following bugs in this patch's early
version:
Fix lots of crashes under compaction load: isolate_migratepages_block()
must clean up appropriately when rejecting a page, setting PageLRU again
if it had been cleared; and a put_page() after get_page_unless_zero()
cannot safely be done while holding locked_lruvec - it may turn out to be
the final put_page(), which will take an lruvec lock when PageLRU.
And move __isolate_lru_page_prepare back after get_page_unless_zero to
make trylock_page() safe: trylock_page() is not safe to use at this time:
its setting PG_locked can race with the page being freed or allocated
("Bad page"), and can also erase flags being set by one of those "sole
owners" of a freshly allocated page who use non-atomic __SetPageFlag().
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Johannes Weiner <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: "Chen, Rong A" <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently lru_lock still guards both lru list and page's lru bit, that's
ok. but if we want to use specific lruvec lock on the page, we need to
pin down the page's lruvec/memcg during locking. Just taking lruvec lock
first may be undermined by the page's memcg charge/migration. To fix this
problem, we will clear the lru bit out of locking and use it as pin down
action to block the page isolation in memcg changing.
So now a standard steps of page isolation is following:
1, get_page(); #pin the page avoid to be free
2, TestClearPageLRU(); #block other isolation like memcg change
3, spin_lock on lru_lock; #serialize lru list access
4, delete page from lru list;
This patch start with the first part: TestClearPageLRU, which combines
PageLRU check and ClearPageLRU into a macro func TestClearPageLRU. This
function will be used as page isolation precondition to prevent other
isolations some where else. Then there are may !PageLRU page on lru list,
need to remove BUG() checking accordingly.
There 2 rules for lru bit now:
1, the lru bit still indicate if a page on lru list, just in some
temporary moment(isolating), the page may have no lru bit when
it's on lru list. but the page still must be on lru list when the
lru bit set.
2, have to remove lru bit before delete it from lru list.
As Andrew Morton mentioned this change would dirty cacheline for a page
which isn't on the LRU. But the loss would be acceptable in Rong Chen
<[email protected]> report:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Johannes Weiner <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "per memcg lru lock", v21.
This patchset includes 3 parts:
1) some code cleanup and minimum optimization as preparation
2) use TestCleanPageLRU as page isolation's precondition
3) replace per node lru_lock with per memcg per node lru_lock
Current lru_lock is one for each of node, pgdat->lru_lock, that guard
for lru lists, but now we had moved the lru lists into memcg for long
time. Still using per node lru_lock is clearly unscalable, pages on
each of memcgs have to compete each others for a whole lru_lock. This
patchset try to use per lruvec/memcg lru_lock to repleace per node lru
lock to guard lru lists, make it scalable for memcgs and get performance
gain.
Currently lru_lock still guards both lru list and page's lru bit, that's
ok. but if we want to use specific lruvec lock on the page, we need to
pin down the page's lruvec/memcg during locking. Just taking lruvec
lock first may be undermined by the page's memcg charge/migration. To
fix this problem, we could take out the page's lru bit clear and use it
as pin down action to block the memcg changes. That's the reason for
new atomic func TestClearPageLRU. So now isolating a page need both
actions: TestClearPageLRU and hold the lru_lock.
The typical usage of this is isolate_migratepages_block() in
compaction.c we have to take lru bit before lru lock, that serialized
the page isolation in memcg page charge/migration which will change
page's lruvec and new lru_lock in it.
The above solution suggested by Johannes Weiner, and based on his new
memcg charge path, then have this patchset. (Hugh Dickins tested and
contributed much code from compaction fix to general code polish, thanks
a lot!).
Daniel Jordan's testing show 62% improvement on modified readtwice case
on his 2P * 10 core * 2 HT broadwell box on v18, which has no much
different with this v20.
https://lore.kernel.org/lkml/[email protected]/
Thanks to Hugh Dickins and Konstantin Khlebnikov, they both brought this
idea 8 years ago, and others who gave comments as well: Daniel Jordan,
Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc.
Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang. Hugh Dickins also shared his kbuild-swap case.
This patch (of 19):
lru_add_page_tail() is only used in huge_memory.c, defining it in other
file with a CONFIG_TRANSPARENT_HUGEPAGE macro restrict just looks weird.
Let's move it THP. And make it static as Hugh Dickins suggested.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: "Chen, Rong A" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mika Penttilä <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging / IIO driver updates from Greg KH:
"Here is the big staging and IIO driver pull request for 5.11-rc1
Lots of different things in here:
- loads of driver updates
- so many coding style cleanups
- new IIO drivers
- Android ION code is finally removed from the tree
- wimax drivers are moved to staging on their way out of the kernel
Nothing really exciting, just the constant grind of kernel development :)
All have been in linux-next for a while with no reported issues"
* tag 'staging-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (341 commits)
staging: olpc_dcon: Do not call platform_device_unregister() in dcon_probe()
staging: most: Fix spelling mistake "tranceiver" -> "transceiver"
staging: qlge: remove duplicate word in comment
staging: comedi: mf6x4: Fix AI end-of-conversion detection
staging: greybus: Add TODO item about modernizing the pwm code
pinctrl: ralink: add a pinctrl driver for the rt2880 family
dt-bindings: pinctrl: rt2880: add binding document
staging: rtl8723bs: remove ELEMENT_ID enum
staging: rtl8723bs: remove unused macros
staging: rtl8723bs: replace EID_EXTCapability
staging: rtl8723bs: replace EID_BSSIntolerantChlReport
staging: rtl8723bs: replace EID_BSSCoexistence
staging: rtl8723bs: replace _MME_IE_
staging: rtl8723bs: replace _WAPI_IE_
staging: rtl8723bs: replace _EXT_SUPPORTEDRATES_IE_
staging: rtl8723bs: replace _ERPINFO_IE_
staging: rtl8723bs: replace _CHLGETXT_IE_
staging: rtl8723bs: replace _COUNTRY_IE_
staging: rtl8723bs: replace _IBSS_PARA_IE_
staging: rtl8723bs: replace _TIM_IE_
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big char/misc driver update for 5.11-rc1.
Continuing the tradition of previous -rc1 pulls, there seems to be
more and more tiny driver subsystems flowing through this tree.
Lots of different things, all of which have been in linux-next for a
while with no reported issues:
- extcon driver updates
- habannalab driver updates
- mei driver updates
- uio driver updates
- binder fixes and features added
- soundwire driver updates
- mhi bus driver updates
- phy driver updates
- coresight driver updates
- fpga driver updates
- speakup driver updates
- slimbus driver updates
- various small char and misc driver updates"
* tag 'char-misc-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (305 commits)
extcon: max77693: Fix modalias string
extcon: fsa9480: Support TI TSU6111 variant
extcon: fsa9480: Rewrite bindings in YAML and extend
dt-bindings: extcon: add binding for TUSB320
extcon: Add driver for TI TUSB320
slimbus: qcom: fix potential NULL dereference in qcom_slim_prg_slew()
siox: Make remove callback return void
siox: Use bus_type functions for probe, remove and shutdown
spmi: Add driver shutdown support
spmi: fix some coding style issues at the spmi core
spmi: get rid of a warning when built with W=1
uio: uio_hv_generic: use devm_kzalloc() for private data alloc
uio: uio_fsl_elbc_gpcm: use device-managed allocators
uio: uio_aec: use devm_kzalloc() for uio_info object
uio: uio_cif: use devm_kzalloc() for uio_info object
uio: uio_netx: use devm_kzalloc() for or uio_info object
uio: uio_mf624: use devm_kzalloc() for uio_info object
uio: uio_sercos3: use device-managed functions for simple allocs
uio: uio_dmem_genirq: finalize conversion of probe to devm_ handlers
uio: uio_dmem_genirq: convert simple allocations to device-managed
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core updates for 5.11-rc1
This time there was a lot of different work happening here for some
reason:
- redo of the fwnode link logic, speeding it up greatly
- auxiliary bus added (this was a tag that will be pulled in from
other trees/maintainers this merge window as well, as driver
subsystems started to rely on it)
- platform driver core cleanups on the way to fixing some long-time
api updates in future releases
- minor fixes and tweaks.
All have been in linux-next with no (finally) reported issues. Testing
there did helped in shaking issues out a lot :)"
* tag 'driver-core-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits)
driver core: platform: don't oops in platform_shutdown() on unbound devices
ACPI: Use fwnode_init() to set up fwnode
misc: pvpanic: Replace OF headers by mod_devicetable.h
misc: pvpanic: Combine ACPI and platform drivers
usb: host: sl811: Switch to use platform_get_mem_or_io()
vfio: platform: Switch to use platform_get_mem_or_io()
driver core: platform: Introduce platform_get_mem_or_io()
dyndbg: fix use before null check
soc: fix comment for freeing soc_dev_attr
driver core: platform: use bus_type functions
driver core: platform: change logic implementing platform_driver_probe
driver core: platform: reorder functions
driver core: make driver_probe_device() static
driver core: Fix a couple of typos
driver core: Reorder devices on successful probe
driver core: Delete pointless parameter in fwnode_operations.add_links
driver core: Refactor fw_devlink feature
efi: Update implementation of add_links() to create fwnode links
of: property: Update implementation of add_links() to create fwnode links
driver core: Use device's fwnode to check if it is waiting for suppliers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the "large" set of tty and serial patches for 5.11-rc1.
Nothing major at all, some cleanups and some driver removals, always a
nice sign:
- build warning cleanups
- vt locking and logic unwinding and cleanups
- tiny serial driver fixes and updates
- removal of the synclink serial driver as it's no longer needed
- removal of dead termiox code
All of this has been in linux-next for a while with no reported issues"
* tag 'tty-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (89 commits)
serial: 8250_pci: Drop bogus __refdata annotation
tty: serial: meson: enable console as module
serial: 8250_omap: Avoid FIFO corruption caused by MDR1 access
serial: imx: Move imx_uart_probe_dt() content into probe()
serial: imx: Remove unneeded of_device_get_match_data() NULL check
tty: Fix whitespace inconsistencies in vt_io_ioctl
serial_core: Check for port state when tty is in error state
dt-bindings: serial: Update DT binding docs to support SiFive FU740 SoC
tty: use const parameters in port-flag accessors
tty: use assign_bit() in port-flag accessors
earlycon: drop semicolon from earlycon macro
tty: Remove dead termiox code
tty/serial/imx: Enable TXEN bit in imx_poll_init().
tty : serial: jsm: Fixed file by adding spacing
tty: serial: uartlite: Support probe deferral
earlycon: simplify earlycon-table implementation
tty: serial: bcm63xx: lower driver dependencies
serial: mxs-auart: Remove unneeded platform_device_id
serial: 8250-mtk: Fix reference leak in mtk8250_probe
serial: imx: Remove unused .id_table support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt updates from Greg KH:
"Here is the big USB and thunderbolt pull request for 5.11-rc1.
Nothing major in here, just the grind of constant development to
support new hardware and fix old issues:
- thunderbolt updates for new USB4 hardware
- cdns3 major driver updates
- lots of typec updates and additions as more hardware is available
- usb serial driver updates and fixes
- other tiny USB driver updates
All have been in linux-next with no reported issues"
* tag 'usb-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (172 commits)
usb: phy: convert comma to semicolon
usb: ucsi: convert comma to semicolon
usb: typec: tcpm: convert comma to semicolon
usb: typec: tcpm: Update vbus_vsafe0v on init
usb: typec: tcpci: Enable bleed discharge when auto discharge is enabled
usb: typec: Add class for plug alt mode device
USB: typec: tcpci: Add Bleed discharge to POWER_CONTROL definition
USB: typec: tcpm: Add a 30ms room for tPSSourceOn in PR_SWAP
USB: typec: tcpm: Fix PR_SWAP error handling
USB: typec: tcpm: Hard Reset after not receiving a Request
USB: gadget: f_fs: remove likely/unlikely
usb: gadget: f_fs: Re-use SS descriptors for SuperSpeedPlus
USB: gadget: f_midi: setup SuperSpeed Plus descriptors
USB: gadget: f_acm: add support for SuperSpeed Plus
USB: gadget: f_rndis: fix bitrate for SuperSpeed and above
usb: typec: intel_pmc_mux: Configure cable generation value for USB4
MAINTAINERS: Add myself as a reviewer for CADENCE USB3 DRD IP DRIVER
usb: chipidea: ci_hdrc_imx: Use of_device_get_match_data()
usb: chipidea: usbmisc_imx: Use of_device_get_match_data()
usb: cdns3: fix NULL pointer dereference on no platform data
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"Lots of changes (slightly more code increase than usual) at this time,
while most of code changes are ASoC driver-specific.
Here are some highlights:
Core:
- The new auxiliary bus implementation for Intel DSP, which will be
used by other drivers as well
- Lots of ASoC core cleanups and refactoring
- UBSAN and KCSAN fixes in rawmidi, sequencer and a few others
- Compress-offload API enhancement for the pause during draining
HD- and USB-audio:
- Enhancements of the USB-audio implicit feedback support, including
better full-duplex operations
- Continued CA0132 improvements and fixes
- A few new quirk entries, HDMI audio fixes
ASoC:
- Support for boot time selection of Intel DSP firmware, which should
help distros/users testing new stuff more easily; the kconfig was
moved to boot time option, too
- Some basic DPCM support in audio graph card
- Removal of old pre-DT Freescale drivers
- Support for Allwinner H6 I2S, Analog Devices ADAU1372, Intel
Alderlake-S, GMediatek MT8192, NXP i.MX HDMI and XCVR, Realtek
RT715, Qualcomm SM8250 and simple GPIO based muxes"
* tag 'sound-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (445 commits)
ALSA: pcm: oss: Fix potential out-of-bounds shift
ALSA: usb-audio: Fix potential out-of-bounds shift
ALSA: hda/ca0132 - Add ZxR surround DAC setup.
ALSA: hda/ca0132 - Add 8051 PLL write helper functions.
ALSA: hda/hdmi: packet buffer index must be set before reading value
ASoC: SOF: imx: update kernel-doc description
ASoC: mediatek: mt8183: delete some unreachable code
ASoC: mediatek: mt8183: add PM ops to machine drivers
ASoC: topology: Fix wrong size check
ASoC: topology: Add missing size check
ASoC: SOF: Intel: hda: fix the condition passed to sof_dev_dbg_or_err
ASoC: SOF: modify the SOF_DBG flags
ASoC: SOF: Intel: hda: remove duplicated status dump
ASoC: rt1015p: delay 300ms after SDB pulling high for calibration
ASoC: rt1015p: move SDB control from trigger to DAPM
ASoC: wm_adsp: remove "ctl" from list on error in wm_adsp_create_control()
ALSA: usb-audio: Fix control 'access overflow' errors from chmap
ALSA: hda/hdmi: always print pin NIDs as hexadecimal
ALSA: hda/realtek - Add supported for more Lenovo ALC285 Headset Button
ALSA: hda/ca0132 - Remove now unnecessary DSP setup functions.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- support "prefer busy polling" NAPI operation mode, where we defer
softirq for some time expecting applications to periodically busy
poll
- AF_XDP: improve efficiency by more batching and hindering the
adjacency cache prefetcher
- af_packet: make packet_fanout.arr size configurable up to 64K
- tcp: optimize TCP zero copy receive in presence of partial or
unaligned reads making zero copy a performance win for much smaller
messages
- XDP: add bulk APIs for returning / freeing frames
- sched: support fragmenting IP packets as they come out of conntrack
- net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs
BPF:
- BPF switch from crude rlimit-based to memcg-based memory accounting
- BPF type format information for kernel modules and related tracing
enhancements
- BPF implement task local storage for BPF LSM
- allow the FENTRY/FEXIT/RAW_TP tracing programs to use
bpf_sk_storage
Protocols:
- mptcp: improve multiple xmit streams support, memory accounting and
many smaller improvements
- TLS: support CHACHA20-POLY1305 cipher
- seg6: add support for SRv6 End.DT4/DT6 behavior
- sctp: Implement RFC 6951: UDP Encapsulation of SCTP
- ppp_generic: add ability to bridge channels directly
- bridge: Connectivity Fault Management (CFM) support as is defined
in IEEE 802.1Q section 12.14.
Drivers:
- mlx5: make use of the new auxiliary bus to organize the driver
internals
- mlx5: more accurate port TX timestamping support
- mlxsw:
- improve the efficiency of offloaded next hop updates by using
the new nexthop object API
- support blackhole nexthops
- support IEEE 802.1ad (Q-in-Q) bridging
- rtw88: major bluetooth co-existance improvements
- iwlwifi: support new 6 GHz frequency band
- ath11k: Fast Initial Link Setup (FILS)
- mt7915: dual band concurrent (DBDC) support
- net: ipa: add basic support for IPA v4.5
Refactor:
- a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
Siewior
- phy: add support for shared interrupts; get rid of multiple driver
APIs and have the drivers write a full IRQ handler, slight growth
of driver code should be compensated by the simpler API which also
allows shared IRQs
- add common code for handling netdev per-cpu counters
- move TX packet re-allocation from Ethernet switch tag drivers to a
central place
- improve efficiency and rename nla_strlcpy
- number of W=1 warning cleanups as we now catch those in a patchwork
build bot
Old code removal:
- wan: delete the DLCI / SDLA drivers
- wimax: move to staging
- wifi: remove old WDS wifi bridging support"
* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
net: hns3: fix expression that is currently always true
net: fix proc_fs init handling in af_packet and tls
nfc: pn533: convert comma to semicolon
af_vsock: Assign the vsock transport considering the vsock address flags
af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
vsock_addr: Check for supported flag values
vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
vm_sockets: Add flags field in the vsock address data structure
net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
nfc: s3fwrn5: Release the nfc firmware
net: vxget: clean up sparse warnings
mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
mlxsw: reg: Add Router LPM Cache Enable Register
mlxsw: reg: Add Router LPM Cache ML Delete Register
mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
mlxsw: reg: Add XM Router M Table Register
...
|
|
- Unify ECAM constants in native PCI Express drivers (Krzysztof Wilczyński)
- Add thunder-pem constant for custom ".bus_shift" initialiser (Krzysztof
Wilczyński)
- Convert iproc to use new ECAM constants (Krzysztof Wilczyński)
- Change vmd __iomem pointers from "char *" to "void *" (Krzysztof
Wilczyński)
- Remove unused xgene .bus_shift initialisers (Krzysztof Wilczyński)
* pci/ecam:
PCI: xgene: Removed unused ".bus_shift" initialisers from pci-xgene.c
PCI: vmd: Update type of the __iomem pointers
PCI: iproc: Convert to use the new ECAM constants
PCI: thunder-pem: Add constant for custom ".bus_shift" initialiser
PCI: Unify ECAM constants in native PCI Express drivers
|
|
- Add sysfs attribute for device power state (Maximilian Luz)
- Rename pci_wakeup_bus() to pci_resume_bus() (Mika Westerberg)
- Do not generate wakeup event when runtime resuming bus (Mika Westerberg)
* pci/pm:
PCI/PM: Do not generate wakeup event when runtime resuming device
PCI/PM: Rename pci_wakeup_bus() to pci_resume_bus()
PCI: Add sysfs attribute for device power state
|
|
- Update kernel-doc to match function prototypes (Mauro Carvalho Chehab)
- Bounds-check "pci=resource_alignment=" requests (Bjorn Helgaas)
- Fix integer overflow in "pci=resource_alignment=" requests (Colin Ian
King)
- Remove unused HAVE_PCI_SET_MWI definition (Heiner Kallweit)
- Reduce pci_set_cacheline_size() message to debug level (Heiner Kallweit)
* pci/misc:
PCI: Reduce pci_set_cacheline_size() message to debug level
PCI: Remove unused HAVE_PCI_SET_MWI
PCI: Fix overflow in command-line resource alignment requests
PCI: Bounds-check command-line resource alignment requests
PCI: Fix kernel-doc markup
# Conflicts:
# drivers/pci/pci-driver.c
|
|
- Stop writing AER Capability when we don't own it (Sean V Kelley)
- Bind RCEC devices to the Port driver (Qiuxu Zhuo)
- Cache the RCEC RA Capability offset (Sean V Kelley)
- Add pci_walk_bridge() (Sean V Kelley)
- Clear AER status only when we control AER (Sean V Kelley)
- Recover from RCEC AER errors (Sean V Kelley)
- Add pcie_link_rcec() to associate RCiEPs with RCECs (Sean V Kelley)
- Recover from RCiEP AER errors (Sean V Kelley)
- Add pcie_walk_rcec() for RCEC AER handling (Sean V Kelley)
- Add pcie_walk_rcec() for RCEC PME handling (Sean V Kelley)
- Add RCEC AER error injection support (Qiuxu Zhuo)
* pci/err:
PCI/AER: Add RCEC AER error injection support
PCI/PME: Add pcie_walk_rcec() to RCEC PME handling
PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
PCI/ERR: Recover from RCiEP AER errors
PCI/ERR: Add pcie_link_rcec() to associate RCiEPs
PCI/ERR: Recover from RCEC AER errors
PCI/ERR: Clear AER status only when we control AER
PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
PCI/ERR: Avoid negated conditional for clarity
PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
PCI/ERR: Simplify by computing pci_pcie_type() once
PCI/ERR: Simplify by using pci_upstream_bridge()
PCI/ERR: Rename reset_link() to reset_subordinates()
PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities()
PCI/ERR: Bind RCEC devices to the Root Port driver
PCI/AER: Write AER Capability only when we control it
|
|
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <[email protected]>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
|
|
Patch series "Control over userfaultfd kernel-fault handling", v6.
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.
It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel. For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.
This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.
The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
This patch (of 2):
userfaultfd handles page faults from both user and kernel code. Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.
A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Colascione <[email protected]>
Signed-off-by: Lokesh Gidra <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Jeff Vander Stoep <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: "Joel Fernandes (Google)" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kalesh Singh <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Stephen Smalley <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_PAGE_POISONING_ZERO uses the zero pattern instead of 0xAA. It was
introduced by commit 1414c7f4f7d7 ("mm/page_poisoning.c: allow for zero
poisoning"), noting that using zeroes retains the benefit of sanitizing
content of freed pages, with the benefit of not having to zero them again
on alloc, and the downside of making some forms of corruption (stray
writes of NULLs) harder to detect than with the 0xAA pattern. Together
with CONFIG_PAGE_POISONING_NO_SANITY it made possible to sanitize the
contents on free without checking it back on alloc.
These days we have the init_on_free() option to achieve sanitization with
zeroes and to save clearing on alloc (and without checking on alloc).
Arguably if someone does choose to check the poison for corruption on
alloc, the savings of not clearing the page are secondary, and it makes
sense to always use the 0xAA poison pattern. Thus, remove the
CONFIG_PAGE_POISONING_ZERO option for being redundant.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mateusz Nosek <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Page poisoning used to be incompatible with hibernation, as the state of
poisoned pages was lost after resume, thus enabling CONFIG_HIBERNATION
forces CONFIG_PAGE_POISONING_NO_SANITY. For the same reason, the
poisoning with zeroes variant CONFIG_PAGE_POISONING_ZERO used to disable
hibernation. The latter restriction was removed by commit 1ad1410f632d
("PM / Hibernate: allow hibernation with PAGE_POISONING_ZERO") and
similarly for init_on_free by commit 18451f9f9e58 ("PM: hibernate: fix
crashes with init_on_free=1") by making sure free pages are cleared after
resume.
We can use the same mechanism to instead poison free pages with
PAGE_POISON after resume. This covers both zero and 0xAA patterns. Thus
we can remove the Kconfig restriction that disables page poison sanity
checking when hibernation is enabled.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]> [hibernation]
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mateusz Nosek <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 11c9c7edae06 ("mm/page_poison.c: replace bool variable with static
key") changed page_poisoning_enabled() to a static key check. However,
the function is not inlined, so each check still involves a function call
with overhead not eliminated when page poisoning is disabled.
Analogically to how debug_pagealloc is handled, this patch converts
page_poisoning_enabled() back to boolean check, and introduces
page_poisoning_enabled_static() for fast paths. Both functions are
inlined.
The function kernel_poison_pages() is also called unconditionally and does
the static key check inside. Remove it from there and put it to callers.
Also split it to two functions kernel_poison_pages() and
kernel_unpoison_pages() instead of the confusing bool parameter.
Also optimize the check that enables page poisoning instead of
debug_pagealloc for architectures without proper debug_pagealloc support.
Move the check to init_mem_debugging_and_hardening() to enable a single
static key instead of having two static branches in
page_poisoning_enabled_static().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Mateusz Nosek <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
init_on_alloc/free parameters
Patch series "cleanup page poisoning", v3.
I have identified a number of issues and opportunities for cleanup with
CONFIG_PAGE_POISON and friends:
- interaction with init_on_alloc and init_on_free parameters depends on
the order of parameters (Patch 1)
- the boot time enabling uses static key, but inefficienty (Patch 2)
- sanity checking is incompatible with hibernation (Patch 3)
- CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
init_on_free (Patch 4)
- CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
have init_on_free (Patch 5)
This patch (of 5):
Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
produces a warning in dmesg that page_poison takes precedence. However,
as these warnings are printed in early_param handlers for
init_on_alloc/free, they are not printed if page_poison is enabled later
on the command line (handlers are called in the order of their
parameters), or when init_on_alloc/free is always enabled by the
respective config option - before the page_poison early param handler is
called, it is not considered to be enabled. This is inconsistent.
We can remove the dependency on order by making the init_on_* parameters
only set a boolean variable, and postponing the evaluation after all early
params have been processed. Introduce a new
init_mem_debugging_and_hardening() function for that, and move the related
debug_pagealloc processing there as well.
As a result init_mem_debugging_and_hardening() knows always accurately if
init_on_* and/or page_poison options were enabled. Thus we can also
optimize want_init_on_alloc() and want_init_on_free(). We don't need to
check page_poisoning_enabled() there, we can instead not enable the
init_on_* static keys at all, if page poisoning is enabled. This results
in a simpler and more effective code.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mateusz Nosek <[email protected]>
Cc: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The migrate_prep{_local} never fails, so it is pointless to have return
value and check the return value.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Song Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can only kmap() one subpage of a THP at a time, so loop over all
relevant subpages, skipping ones which don't need to be zeroed. This is
too large to inline when THPs are enabled and we actually need highmem, so
put it in highmem.c.
[[email protected]: start1 was allowed to be less than start2]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Naresh Kamboju <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
defer_compaction() and compaction_deferred() and compaction_restarting()
in mm/compaction.c won't be used in other files, so make them static, and
remove the declaration in the header file.
Take the chance to fix a typo.
Link: https://lkml.kernel.org/r/20201123170801.GA9625@rlk
Signed-off-by: Hui Su <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Mateusz Nosek <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The external function definitions don't need the "extern" keyword. Remove
them so future changes don't copy the function definition style.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ralph Campbell <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We haven't had 'dontuse' flags since 2002. Replace this obsolete warning
with a hopefully more useful one.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
They are not used anymore.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
All per-cpu pagesets for a zone use the same high and batch values, that
are duplicated there just for performance (locality) reasons. This patch
adds the same variables also to struct zone as a shared copy.
This will be useful later for making possible to disable pcplists
temporarily by setting high value to 0, while remembering the values for
restoring them later. But we can also immediately benefit from not
updating pagesets of all possible cpus in case the newly recalculated
values (after sysctl change or memory online/offline) are actually
unchanged from the previous ones.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For architectures that enable ARCH_HAS_SET_MEMORY having the ability to
verify that a page is mapped in the kernel direct map can be useful
regardless of hibernation.
Add RISC-V implementation of kernel_page_present(), update its forward
declarations and stubs to be a part of set_memory API and remove ugly
ifdefery in inlcude/linux/mm.h around current declarations of
kernel_page_present().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The design of DEBUG_PAGEALLOC presumes that __kernel_map_pages() must
never fail. With this assumption is wouldn't be safe to allow general
usage of this function.
Moreover, some architectures that implement __kernel_map_pages() have this
function guarded by #ifdef DEBUG_PAGEALLOC and some refuse to map/unmap
pages when page allocation debugging is disabled at runtime.
As all the users of __kernel_map_pages() were converted to use
debug_pagealloc_map_pages() it is safe to make it available only when
DEBUG_PAGEALLOC is set.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a page may be
not present in the direct map and has to be explicitly mapped before it
could be copied.
Introduce hibernate_map_page() and hibernation_unmap_page() that will
explicitly use set_direct_map_{default,invalid}_noflush() for
ARCH_HAS_SET_DIRECT_MAP case and debug_pagealloc_{map,unmap}_pages() for
DEBUG_PAGEALLOC case.
The remapping of the pages in safe_copy_page() presumes that it only
changes protection bits in an existing PTE and so it is safe to ignore
return value of set_direct_map_{default,invalid}_noflush().
Still, add a pr_warn() so that future changes in set_memory APIs will not
silently break hibernation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "arch, mm: improve robustness of direct map manipulation", v7.
During recent discussion about KVM protected memory, David raised a
concern about usage of __kernel_map_pages() outside of DEBUG_PAGEALLOC
scope [1].
Indeed, for architectures that define CONFIG_ARCH_HAS_SET_DIRECT_MAP it is
possible that __kernel_map_pages() would fail, but since this function is
void, the failure will go unnoticed.
Moreover, there's lack of consistency of __kernel_map_pages() semantics
across architectures as some guard this function with #ifdef
DEBUG_PAGEALLOC, some refuse to update the direct map if page allocation
debugging is disabled at run time and some allow modifying the direct map
regardless of DEBUG_PAGEALLOC settings.
This set straightens this out by restoring dependency of
__kernel_map_pages() on DEBUG_PAGEALLOC and updating the call sites
accordingly.
Since currently the only user of __kernel_map_pages() outside
DEBUG_PAGEALLOC is hibernation, it is updated to make direct map accesses
there more explicit.
[1] https://lore.kernel.org/lkml/[email protected]
This patch (of 4):
When CONFIG_DEBUG_PAGEALLOC is enabled, it unmaps pages from the kernel
direct mapping after free_pages(). The pages than need to be mapped back
before they could be used. Theese mapping operations use
__kernel_map_pages() guarded with with debug_pagealloc_enabled().
The only place that calls __kernel_map_pages() without checking whether
DEBUG_PAGEALLOC is enabled is the hibernation code that presumes
availability of this function when ARCH_HAS_SET_DIRECT_MAP is set. Still,
on arm64, __kernel_map_pages() will bail out when DEBUG_PAGEALLOC is not
enabled but set_direct_map_invalid_noflush() may render some pages not
present in the direct map and hibernation code won't be able to save such
pages.
To make page allocation debugging and hibernation interaction more robust,
the dependency on DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP has to be
made more explicit.
Start with combining the guard condition and the call to
__kernel_map_pages() into debug_pagealloc_map_pages() and
debug_pagealloc_unmap_pages() functions to emphasize that
__kernel_map_pages() should not be called without DEBUG_PAGEALLOC and use
these new functions to map/unmap pages when page allocation debugging is
enabled.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: "Edgecombe, Rick P" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
ARM is the only architecture that defines CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
which in turn enables memmap_valid_within() function that is intended to
verify existence of struct page associated with a pfn when there are holes
in the memory map.
However, the ARCH_HAS_HOLES_MEMORYMODEL also enables HAVE_ARCH_PFN_VALID
and arch-specific pfn_valid() implementation that also deals with the holes
in the memory map.
The only two users of memmap_valid_within() call this function after
a call to pfn_valid() so the memmap_valid_within() check becomes redundant.
Remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and memmap_valid_within() and rely
entirely on ARM's implementation of pfn_valid() that is now enabled
unconditionally.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Meelis Roos <[email protected]>
Cc: Michael Schmitz <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The ia64 implementation of __early_pfn_to_nid() essentially relies on the
same data as the generic implementation.
The correspondence between memory ranges and nodes is set in memblock
during early memory initialization in register_active_ranges() function.
The initialization of sparsemem that requires early_pfn_to_nid() happens
later and it can use the memblock information like the other architectures.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Meelis Roos <[email protected]>
Cc: Michael Schmitz <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A current "lazy drain" model suffers from at least two issues.
First one is related to the unsorted list of vmap areas, thus in order to
identify the [min:max] range of areas to be drained, it requires a full
list scan. What is a time consuming if the list is too long.
Second one and as a next step is about merging all fragments with a free
space. What is also a time consuming because it has to iterate over
entire list which holds outstanding lazy areas.
See below the "preemptirqsoff" tracer that illustrates a high latency. It
is ~24676us. Our workloads like audio and video are effected by such long
latency:
<snip>
tracer: preemptirqsoff
preemptirqsoff latency trace v1.1.5 on 4.9.186-perf+
--------------------------------------------------------------------
latency: 24676 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 P:8)
-----------------
| task: crtc_commit:112-261 (uid:0 nice:0 policy:1 rt_prio:16)
-----------------
=> started at: __purge_vmap_area_lazy
=> ended at: __purge_vmap_area_lazy
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| / delay
cmd pid ||||| time | caller
\ / ||||| \ | /
crtc_com-261 1...1 1us*: _raw_spin_lock <-__purge_vmap_area_lazy
[...]
crtc_com-261 1...1 24675us : _raw_spin_unlock <-__purge_vmap_area_lazy
crtc_com-261 1...1 24677us : trace_preempt_on <-__purge_vmap_area_lazy
crtc_com-261 1...1 24683us : <stack trace>
=> free_vmap_area_noflush
=> remove_vm_area
=> __vunmap
=> vfree
=> drm_property_free_blob
=> drm_mode_object_unreference
=> drm_property_unreference_blob
=> __drm_atomic_helper_crtc_destroy_state
=> sde_crtc_destroy_state
=> drm_atomic_state_default_clear
=> drm_atomic_state_clear
=> drm_atomic_state_free
=> complete_commit
=> _msm_drm_commit_work_cb
=> kthread_worker_fn
=> kthread
=> ret_from_fork
<snip>
To address those two issues we can redesign a purging of the outstanding
lazy areas. Instead of queuing vmap areas to the list, we replace it by
the separate rb-tree. In hat case an area is located in the tree/list in
ascending order. It will give us below advantages:
a) Outstanding vmap areas are merged creating bigger coalesced blocks,
thus it becomes less fragmented.
b) It is possible to calculate a flush range [min:max] without scanning
all elements. It is O(1) access time or complexity;
c) The final merge of areas with the rb-tree that represents a free
space is faster because of (a). As a result the lock contention is
also reduced.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Oleksiy Avramchenko <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: huang ying <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Extracted from slab.h, which seems to have the most complete version
including the correct might_sleep() check. Roll it out to slob.c.
Motivated by a discussion with Paul about possibly changing call_rcu
behaviour to allocate memory, but only roughly every 500th call.
There are a lot fewer places in the kernel that care about whether
allocating memory is allowed or not (due to deadlocks with reclaim code)
than places that care whether sleeping is allowed. But debugging these
also tends to be a lot harder, so nice descriptive checks could come in
handy. I might have some use eventually for annotations in drivers/gpu.
Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does
not consult the PF_MEMALLOC flags. But there is no flag equivalent for
GFP_NOWAIT, hence this check can't go wrong due to
memalloc_no*_save/restore contexts. Willy is working on a patch series
which might change this:
https://lore.kernel.org/linux-mm/[email protected]/
I think best would be if that updates gfpflags_allow_blocking(), since
there's a ton of callers all over the place for that already.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Vetter <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Christian König <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Thomas Hellström (Intel) <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rename the callback to reflect that it's not called *on* or *after* split,
but rather some time before the splitting to check if it's possible.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As kernel expect to see only one of such mappings, any further operations
on the VMA-copy may be unexpected by the kernel. Maybe it's being on the
safe side, but there doesn't seem to be any expected use-case for this, so
restrict it now.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Code outside mm/ should not be calling free_unref_page(). Also move
free_unref_page_list().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The goal of these tracepoints is to be able to debug lock contention
issues. This lock is acquired on most (all?) mmap / munmap / page fault
operations, so a multi-threaded process which does a lot of these can
experience significant contention.
We trace just before we start acquisition, when the acquisition returns
(whether it succeeded or not), and when the lock is released (or
downgraded). The events are broken out by lock type (read / write).
The events are also broken out by memcg path. For container-based
workloads, users often think of several processes in a memcg as a single
logical "task", so collecting statistics at this level is useful.
The end goal is to get latency information. This isn't directly included
in the trace events. Instead, users are expected to compute the time
between "start locking" and "acquire returned", using e.g. synthetic
events or BPF. The benefit we get from this is simpler code.
Because we use tracepoint_enabled() to decide whether or not to trace,
this patch has effectively no overhead unless tracepoints are enabled at
runtime. If tracepoints are enabled, there is a performance impact, but
how much depends on exactly what e.g. the BPF program does.
[[email protected]: fix use-after-free race and css ref leak in tracepoints]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: in-depth examples of tracepoint_enabled() usage, and per-cpu-per-context buffer design]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Despite a comment that said that page fault accounting would be charged to
whatever task_struct* was passed into __access_remote_vm(), the tsk
argument was actually unused.
Making page fault accounting actually use this task struct is quite a
project, so there is no point in keeping the tsk argument.
Delete both the comment, and the argument.
[[email protected]: changelog addition]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For many workloads, pagetable consumption is significant and it makes
sense to expose it in the memory.stat for the memory cgroups. However at
the moment, the pagetables are accounted per-zone. Converting them to
per-node and using the right interface will correctly account for the
memory cgroups as well.
[[email protected]: export __mod_lruvec_page_state to modules for arch/mips/kvm/]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "memcg: add pagetable comsumption to memory.stat", v2.
Many workloads consumes significant amount of memory in pagetables. One
specific use-case is the user space network driver which mmaps the
application memory to provide zero copy transfer. This driver can consume
a large amount memory in page tables. This patch series exposes the
pagetable comsumption for each memory cgroup.
This patch (of 2):
This does not change any functionality and only move the functions which
update the lruvec stats to vmstat.h from memcontrol.h. The main reason
for this patch is to be able to use these functions in the page table
contructor function which is defined in mm.h and we can not include the
memcontrol.h in that file. Also this is a better place for this interface
in general. The lruvec abstraction, while invented for memcg, isn't
specific to memcg at all.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The *_lruvec_slab_state is also suitable for pages allocated from buddy,
not just for the slab objects. But the function name seems to tell us
that only slab object is applicable. So we can rename the keyword of slab
to kmem.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the deprecation of the non-hierarchical mode of the memory controller
there are no more examples of broken hierarchies left.
Let's remove the cgroup core code which was supposed to print warnings
about creating of broken hierarchies.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1.
The non-hierarchical cgroup v1 mode is a legacy of early days
of the memory controller and doesn't bring any value today.
However, it complicates the code and creates many edge cases
all over the memory controller code.
It's a good time to deprecate it completely. This patchset removes
the internal logic, adjusts the user interface and updates
the documentation. The alt patch removes some bits of the cgroup
core code, which become obsolete.
Michal Hocko said:
"All that we know today is that we have a warning in place to complain
loudly when somebody relies on use_hierarchy=0 with a deeper
hierarchy. For all those years we have seen _zero_ reports that would
describe a sensible usecase.
Moreover we (SUSE) have backported this warning into old distribution
kernels (since 3.0 based kernels) to extend the coverage and didn't
hear even for users who adopt new kernels only very slowly. The only
report we have seen so far was a LTP test suite which doesn't really
reflect any real life usecase"
This patch (of 3):
The non-hierarchical cgroup v1 mode is a legacy of early days of the
memory controller and doesn't bring any value today. However, it
complicates the code and creates many edge cases all over the memory
controller code.
It's a good time to deprecate it completely.
Functionally this patch enabled is by default for all cgroups and forbids
switching it off. Nothing changes if cgroup v2 is used: hierarchical mode
was enforced from scratch.
To protect the ABI memory.use_hierarchy interface is preserved with a
limited functionality: reading always returns "1", writing of "1" passes
silently, writing of any other value fails with -EINVAL and a warning to
dmesg (on the first occasion).
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch fixes/removes some obsolete comments in the code related
to the kernel memory accounting:
- kmem_cache->memcg_params.memcg_caches has been removed by commit
9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for
all accounted allocations")
- memcg->kmemcg_id is not used as a gate for kmem accounting since
commit 0b8f73e10428 ("mm: memcontrol: clean up alloc, online,
offline, free functions")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic
v2"), the code to check the secondary MMU's page table access bit is
broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the
secondary MMU's page table before the check. More specifically for those
secondary MMUs which unmap the memory in
mmu_notifier_invalidate_range_start() like kvm.
However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the
absence of TTU_IGNORE_ACCESS and it explicitly performs the page table
access check before trying to unmap the page. So, at worst the reclaim
will miss accesses in a very short window if we remove page table access
check in unmapping code.
There is an unintented consequence of !(TTU_IGNORE_ACCESS) for the memcg
reclaim. From memcg reclaim the page_referenced() only account the
accesses from the processes which are in the same memcg of the target page
but the unmapping code is considering accesses from all the processes, so,
decreasing the effectiveness of memcg reclaim.
The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
code.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|