Age | Commit message (Collapse) | Author | Files | Lines |
|
ice_send_event_to_aux() eventually descends to mutex_lock()
(-> might_sched()), so it must not be called under non-task
context. However, at least two fixes have happened already for the
bug splats occurred due to this function being called from atomic
context.
To make the emergency landings softer, bail out early when executed
in non-task context emitting a warn splat only once. This way we
trade some events being potentially lost for system stability and
avoid any related hangs and crashes.
Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <[email protected]>
Tested-by: Michal Kubiak <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
Acked-by: Tony Nguyen <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():
[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229] <IRQ>
[ 2101.065252] dump_stack+0x41/0x60
[ 2101.068587] __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060] __schedule+0x6a4/0x830
[ 2101.076570] schedule+0x35/0xa0
[ 2101.079727] schedule_preempt_disabled+0xa/0x10
[ 2101.084284] __mutex_lock.isra.7+0x310/0x420
[ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232] __handle_irq_event_percpu+0x40/0x180
[ 2101.106965] handle_irq_event_percpu+0x30/0x80
[ 2101.111434] handle_irq_event+0x36/0x53
[ 2101.115292] handle_edge_irq+0x82/0x190
[ 2101.119148] handle_irq+0x1c/0x30
[ 2101.122480] do_IRQ+0x49/0xd0
[ 2101.125465] common_interrupt+0xf/0xf
[ 2101.129146] </IRQ>
...
As Andrew correctly mentioned previously[0], the following call
ladder happens:
ice_misc_intr() <- hardirq
ice_send_event_to_aux()
device_lock()
mutex_lock()
might_sleep()
might_resched() <- oops
Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).
[0] https://lore.kernel.org/all/[email protected]
Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <[email protected]>
Tested-by: Michal Kubiak <[email protected]>
Acked-by: Tony Nguyen <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
prestera_module_init()
Add the missing destroy_workqueue() before return from
prestera_module_init() in the error handling case.
Fixes: 4394fbcb78cf ("net: marvell: prestera: handle fib notifications")
Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
All packets on ingress (except for jumbo) are terminated with a 4-bytes
CRC checksum. It's the responsability of the driver to strip those 4
bytes. Unfortunately a change dating back to March 2017 re-shuffled some
code and made the CRC stripping code effectively dead.
This change re-orders that part a bit such that the datalen is
immediately altered if needed.
Fixes: 4902a92270fb ("drivers: net: xgene: Add workaround for errata 10GE_8/ENET_11")
Cc: [email protected]
Signed-off-by: Stephane Graber <[email protected]>
Tested-by: Stephane Graber <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Raw NAND core changes:
* Rework of_get_nand_bus_width()
* Remove of_get_nand_on_flash_bbt() wrapper
* Protect access to rawnand devices while in suspend
* bindings: Document the wp-gpios property
Rax NAND controller driver changes:
* atmel: Fix refcount issue in atmel_nand_controller_init
* nandsim:
- Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern
- Merge repeat codes in ns_switch_state
- Replace overflow check with kzalloc to single kcalloc
* rockchip: Fix platform_get_irq.cocci warning
* stm32_fmc2: Add NAND Write Protect support
* pl353: Set the nand chip node as the flash node
* brcmnand: Fix sparse warnings in bcma_nand
* omap_elm: Remove redundant variable 'errors'
* gpmi:
- Support fast edo timings for mx28
- Validate controller clock rate
- Fix controller timings setting
* brcmnand:
- Add BCMA shim
- BCMA controller uses command shift of 0
- Allow platform data instantation
- Add platform data structure for BCMA
- Allow working without interrupts
- Move OF operations out of brcmnand_init_cs()
- Avoid pdev in brcmnand_init_cs()
- Allow SoC to provide I/O operations
- Assign soc as early as possible
Onenand changes:
* Check for error irq
Signed-off-by: Miquel Raynal <[email protected]>
|
|
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <[email protected]>
Reviewed-by: Matti Vaittinen <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
It is not recommened to use platform_get_resource(pdev, IORESOURCE_IRQ)
for requesting IRQ's resources any more, as they can be not ready yet in
case of DT-booting.
platform_get_irq() instead is a recommended way for getting IRQ even if
it was not retrieved earlier.
It also makes code simpler because we're getting "int" value right away
and no conversion from resource to int is required.
The print function dev_err() is redundant because platform_get_irq()
already prints an error.
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Minghao Chi (CGEL ZTE) <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
x86/ACPI boards with an arizona WM5102 codec ship with either Windows or
Android as factory installed OS.
The ACPI fwnode for the codec on Android boards misses 2 things compared
to the Windows boards (this is hardcoded in the Android board kernels):
1. There is no CLKE ACPI method to enabe the 32 KHz clock the codec needs
for jack-detection.
2. The GPIOs used by the codec are not listed in the fwnode for the codec.
The ACPI tables on x86/ACPI boards shipped with Android being incomplete
happens a lot. The special drivers/platform/x86/x86-android-tablets.c
module contains DMI based per model handling to compensate for this.
This module will enable the 32KHz clock through the pinctrl framework
to fix 1. and it will also register a gpio-lookup table for all GPIOs
needed by the codec + machine driver, including the GPIOs coming from
the codec itself.
Add an arizona_spi_acpi_android_probe() function which waits for the
x86-android-tablets to have set things up before continue with probing
the arizona WM5102 codec.
Acked-by: Charles Keepax <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
x86/ACPI boards with an arizona WM5102 codec ship with either Windows or
Android as factory installed OS.
The ACPI fwnode describing the codec differs depending on the factory OS,
and the current arizona_spi_acpi_probe() function is tailored for use
with the Windows board ACPI tables.
Split out the Windows board ACPI tables specific bits into a new
arizona_spi_acpi_windows_probe() function in preparation for also
adding support for the Android board ACPI tables.
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Charles Keepax <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the missing iounmap() before return from asic3_mfd_probe
in the error handling case.
Fixes: 64e8867ba809 ("mfd: tmio_mmc hardware abstraction for CNF area")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The SPI driver wants to know the exact type of the controller.
Provide this information to it, hence it allows to fix the
Intel Cannon Lake and others in the future.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Fix "unused variable 'atmel_flexcom_pm_ops' [-Wunused-const-variable]"
compilation warning by using __maybe_unused on PM ops.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Claudiu Beznea <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
- integration of first part of DIGImend [1] patches in order to vastly
improve Linux support of tablets (Nikolai Kondrashov, José Expósito)
[1] https://github.com/DIGImend/digimend-kernel-drivers
|
|
- driver for SiGma Micro keyboards (Desmond Lim)
|
|
- driver for Razer Blackwidow keyboards (Jelle van der Waa)
|
|
- fixes for handling unnumbered reports fully correctly (Angela Czubak
Dmitry Torokhov)
- untangling of intermingled code for sending and handling output reports
in __i2c_hid_command() (Dmitry Torokhov)
|
|
|
|
- rework of generic input handling which ultimately makes the processing of
tablet events more generic and reliable (Benjamin Tissoires)
|
|
- Apple magic keyboard support improvements for newer models (José Expósito)
- Apple T2 Macs support improvements (Aun-Ali Zaidi, Paul Pawlowski)
|
|
- dead code elimination (Christophe JAILLET)
|
|
Add quirks to not fail the initialization and to have quick resume
latency after cold/warm reboot.
Signed-off-by: Monish Kumar R <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Allow reading /sys/module/nvme/parameters/use_threaded_interrupts to see
if the use_threaded_interrupts module parameter is in use.
Signed-off-by: Xin Hao <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
commit 2f4c9ba23b88 ("nvme: export zoned namespaces without Zone Append
support read-only") marks zoned namespaces without append support
read-only. It does iso by setting NVME_NS_FORCE_RO in ns->flags in
nvme_update_zone_info and checking for that flag later in
nvme_update_disk_info to mark the disk as read-only.
But commit 73d90386b559 ("nvme: cleanup zone information initialization")
rearranged nvme_update_disk_info to be called before
nvme_update_zone_info and thus not marking the disk as read-only.
The call order cannot be just reverted because nvme_update_zone_info sets
certain queue parameters such as zone_write_granularity that depend on the
prior call to nvme_update_disk_info.
Remove the call to set_disk_ro in nvme_update_disk_info. and call
set_disk_ro after nvme_update_zone_info and nvme_update_disk_info to set
the permission for ZNS drives correctly. The same applies to the
multipath disk path.
Fixes: 73d90386b559 ("nvme: cleanup zone information initialization")
Signed-off-by: Pankaj Raghav <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
IFLA_GENEVE_INNER_PROTO_INHERIT
Add missing netlink attribute policy and size calculation.
Also enable strict validation from this new attribute onwards.
Fixes: 435fe1c0c1f7 ("net: geneve: support IPv4/IPv6 as inner protocol")
Signed-off-by: Eyal Birger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull filesystem folio updates from Matthew Wilcox:
"Primarily this series converts some of the address_space operations to
take a folio instead of a page.
Notably:
- a_ops->is_partially_uptodate() takes a folio instead of a page and
changes the type of the 'from' and 'count' arguments to make it
obvious they're bytes.
- a_ops->invalidatepage() becomes ->invalidate_folio() and has a
similar type change.
- a_ops->launder_page() becomes ->launder_folio()
- a_ops->set_page_dirty() becomes ->dirty_folio() and adds the
address_space as an argument.
There are a couple of other misc changes up front that weren't worth
separating into their own pull request"
* tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits)
fs: Remove aops ->set_page_dirty
fb_defio: Use noop_dirty_folio()
fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
fs: Convert __set_page_dirty_buffers to block_dirty_folio
nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()
mm: Convert swap_set_page_dirty() to swap_dirty_folio()
ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio
f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio
f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio
f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio
afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()
btrfs: Convert extent_range_redirty_for_io() to use folios
fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
btrfs: Convert from set_page_dirty to dirty_folio
fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()
fs: Add aops->dirty_folio
fs: Remove aops->launder_page
orangefs: Convert launder_page to launder_folio
nfs: Convert from launder_page to launder_folio
fuse: Convert from launder_page to launder_folio
...
|
|
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/[email protected]/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/[email protected]/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
|
|
When merged with Linus tree, the cited patch below will cause the
following build warning:
In function 'fortify_memset_chk',
inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3:
include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
242 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix that by grouping the fields to memeset in struct_group() to avoid
the false alarm.
Fixes: 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode")
Reported-by: Stephen Rothwell <[email protected]>
Suggested-by: Stephen Rothwell <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When we're copying the PPAG table into the cmd structure we're failing
if the table doesn't exist in ACPI or is invalid, or if the FW doesn't
support PPAG setting etc.
This is wrong because those are valid scenarios. Fix this by not
failing in those cases.
Fixes: e8e10a37c51c ("iwlwifi: acpi: move ppag code from mvm to fw/acpi")
Tested-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Acked-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/iwlwifi.20220322173828.fa47f369b717.I6a9c65149c2c3c11337f3a802dff22f514a3a436@changeid
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Merge updates from Andrew Morton:
- A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs
- Most the MM patches which precede the patches in Willy's tree: kasan,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.
* emailed patches from Andrew Morton <[email protected]>: (227 commits)
mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Docs/ABI/testing: add DAMON sysfs interface ABI document
Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
selftests/damon: add a test for DAMON sysfs interface
mm/damon/sysfs: support DAMOS stats
mm/damon/sysfs: support DAMOS watermarks
mm/damon/sysfs: support schemes prioritization
mm/damon/sysfs: support DAMOS quotas
mm/damon/sysfs: support DAMON-based Operation Schemes
mm/damon/sysfs: support the physical address space monitoring
mm/damon/sysfs: link DAMON for virtual address spaces monitoring
mm/damon: implement a minimal stub for sysfs-based DAMON interface
mm/damon/core: add number of each enum type values
mm/damon/core: allow non-exclusive DAMON start/stop
Docs/damon: update outdated term 'regions update interval'
Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
Docs/vm/damon: call low level monitoring primitives the operations
mm/damon: remove unnecessary CONFIG_DAMON option
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
mm/damon/dbgfs-test: fix is_target_id() change
...
|
|
Let's make it clearer at which places we actually add and remove memory
blocks -- streamlining the terminology -- and highlight which memory block
start out online and which start out as offline.
* rename add_memory_block -> add_boot_memory_block
* rename init_memory_block -> add_memory_block
* rename unregister_memory -> remove_memory_block
* rename register_memory -> __add_memory_block
* add add_hotplug_memory_block
* mark add_boot_memory_block with __init (suggested by Oscar)
__add_memory_block() is a pure helper for add_memory_block(), remove
the somewhat obvious comment.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
test_pages_in_a_zone() is just another nasty PFN walker that can easily
stumble over ZONE_DEVICE memory ranges falling into the same memory block
as ordinary system RAM: the memmap of parts of these ranges might possibly
be uninitialized. In fact, we observed (on an older kernel) with UBSAN:
UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50
index 7 is out of range for type 'zone [5]'
CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...]
Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019
Call Trace:
dump_stack+0x9a/0xf0
ubsan_epilogue+0x9/0x7a
__ubsan_handle_out_of_bounds+0x13a/0x181
test_pages_in_a_zone+0x3c4/0x500
show_valid_zones+0x1fa/0x380
dev_attr_show+0x43/0xb0
sysfs_kf_seq_show+0x1c5/0x440
seq_read+0x49d/0x1190
vfs_read+0xff/0x300
ksys_read+0xb8/0x170
do_syscall_64+0xa5/0x4b0
entry_SYSCALL_64_after_hwframe+0x6a/0xdf
RIP: 0033:0x7f01f4439b52
We seem to stumble over a memmap that contains a garbage zone id. While
we could try inserting pfn_to_online_page() calls, it will just make
memory offlining slower, because we use test_pages_in_a_zone() to make
sure we're offlining pages that all belong to the same zone.
Let's just get rid of this PFN walker and determine the single zone of a
memory block -- if any -- for early memory blocks during boot. For memory
onlining, we know the single zone already. Let's avoid any additional
memmap scanning and just rely on the zone information available during
boot.
For memory hot(un)plug, we only really care about memory blocks that:
* span a single zone (and, thereby, a single node)
* are completely System RAM (IOW, no holes, no ZONE_DEVICE)
If one of these conditions is not met, we reject memory offlining.
Hotplugged memory blocks (starting out offline), always meet both
conditions.
There are three scenarios to handle:
(1) Memory hot(un)plug
A memory block with zone == NULL cannot be offlined, corresponding to
our previous test_pages_in_a_zone() check.
After successful memory onlining/offlining, we simply set the zone
accordingly.
* Memory onlining: set the zone we just used for onlining
* Memory offlining: set zone = NULL
So a hotplugged memory block starts with zone = NULL. Once memory
onlining is done, we set the proper zone.
(2) Boot memory with !CONFIG_NUMA
We know that there is just a single pgdat, so we simply scan all zones
of that pgdat for an intersection with our memory block PFN range when
adding the memory block. If more than one zone intersects (e.g., DMA and
DMA32 on x86 for the first memory block) we set zone = NULL and
consequently mimic what test_pages_in_a_zone() used to do.
(3) Boot memory with CONFIG_NUMA
At the point in time we create the memory block devices during boot, we
don't know yet which nodes *actually* span a memory block. While we could
scan all zones of all nodes for intersections, overlapping nodes complicate
the situation and scanning all nodes is possibly expensive. But that
problem has already been solved by the code that sets the node of a memory
block and creates the link in the sysfs --
do_register_memory_block_under_node().
So, we hook into the code that sets the node id for a memory block. If
we already have a different node id set for the memory block, we know
that multiple nodes *actually* have PFNs falling into our memory block:
we set zone = NULL and consequently mimic what test_pages_in_a_zone() used
to do. If there is no node id set, we do the same as (2) for the given
node.
Note that the call order in driver_init() is:
-> memory_dev_init(): create memory block devices
-> node_dev_init(): link memory block devices to the node and set the
node id
So in summary, we detect if there is a single zone responsible for this
memory block and we consequently store the zone in that case in the
memory block, updating it during memory onlining/offlining.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Rafael Parra <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rafael Parra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
register_memory_block_under_node()
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2.
I remember talking to Michal in the past about removing
test_pages_in_a_zone(), which we use for:
* verifying that a memory block we intend to offline is really only managed
by a single zone. We don't support offlining of memory blocks that are
managed by multiple zones (e.g., multiple nodes, DMA and DMA32)
* exposing that zone to user space via
/sys/devices/system/memory/memory*/valid_zones
Now that I identified some more cases where test_pages_in_a_zone() might
go wrong, and we received an UBSAN report (see patch #3), let's get rid of
this PFN walker.
So instead of detecting the zone at runtime with test_pages_in_a_zone() by
scanning the memmap, let's determine and remember for each memory block if
it's managed by a single zone. The stored zone can then be used for the
above two cases, avoiding a manual lookup using test_pages_in_a_zone().
This avoids eventually stumbling over uninitialized memmaps in corner
cases, especially when ZONE_DEVICE ranges partly fall into memory block
(that are responsible for managing System RAM).
Handling memory onlining is easy, because we online to exactly one zone.
Handling boot memory is more tricky, because we want to avoid scanning all
zones of all nodes to detect possible zones that overlap with the physical
memory region of interest. Fortunately, we already have code that
determines the applicable nodes for a memory block, to create sysfs links
-- we'll hook into that.
Patch #1 is a simple cleanup I had laying around for a longer time.
Patch #2 contains the main logic to remove test_pages_in_a_zone() and
further details.
[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]
This patch (of 2):
Let's adjust the stale terminology, making it match
unregister_memory_block_under_nodes() and
do_register_memory_block_under_node(). We're dealing with memory block
devices, which span 1..X memory sections.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Oscar Salvador <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rafael Parra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
node_dev_init()
... and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls. All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().
This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.
Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.
The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device. The latter should be the case for the current users of
topology_init().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]> (sparc64)
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
succeeded
If register_memory() fails, we freed the memory block but already added
the memory block to the group list, not good. Let's defer adding the
block to the memory group to after registering the memory block device.
We do handle it properly during unregister_memory(), but that's not
called when the registration fails.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When the hwpoison page meets the filter conditions, it should not be
regarded as successful memory_failure() processing for mce handler, but
should return a distinct value, otherwise mce handler regards the error
page has been identified and isolated, which may lead to calling
set_mce_nospec() to change page attribute, etc.
Here memory_failure() return -EOPNOTSUPP to indicate that the error
event is filtered, mce handler should not take any action for this
situation and hwpoison injector should treat as correct.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: luofei <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some places in the kernel don't really expect pageblock_order >=
MAX_ORDER, and it looks like this is only possible in corner cases:
1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order
pages via __free_pages_core(), which cannot possibly work.
2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE
start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger
pageblock_order, we could have a single pageblock partially managed by
two zones.
3) compaction code runs into __fragmentation_index() with order
>= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1]
4) mm/page_reporting.c won't be reporting any pages with default
page_reporting_order == pageblock_order, as we'll be skipping the
reporting loop inside page_reporting_process_zone().
5) __rmqueue_fallback() will never be able to steal with
ALLOC_NOFRAGMENT.
pageblock_order >= MAX_ORDER is weird either way: it's a pure
optimization for making alloc_contig_range(), as used for allcoation of
gigantic pages, a little more reliable to succeed. However, if there is
demand for somewhat reliable allocation of gigantic pages, affected
setups should be using CMA or boottime allocations instead.
So let's make sure that pageblock_order < MAX_ORDER and simplify.
[1] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: John Garry via iommu <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: enforce pageblock_order < MAX_ORDER".
Having pageblock_order >= MAX_ORDER seems to be able to happen in corner
cases and some parts of the kernel are not prepared for it.
For example, Aneesh has shown [1] that such kernels can be compiled on
ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will
run into a WARN_ON_ONCE(order >= MAX_ORDER) in comapction code right
during boot.
We can get pageblock_order >= MAX_ORDER when the default hugetlb size is
bigger than the maximum allocation granularity of the buddy, in which
case we are no longer talking about huge pages but instead gigantic
pages.
Having pageblock_order >= MAX_ORDER can only make alloc_contig_range()
of such gigantic pages more likely to succeed.
Reliable use of gigantic pages either requires boot time allcoation or
CMA, no need to overcomplicate some places in the kernel to optimize for
corner cases that are broken in other areas of the kernel.
This patch (of 2):
Let's enforce pageblock_order < MAX_ORDER and simplify.
Especially patch #1 can be regarded a cleanup before:
[PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range
alignment. [2]
[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Acked-by: Rob Herring <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: John Garry via iommu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
At each login the user forces the kernel to create a new terminal and
allocate up to ~1Kb memory for the tty-related structures.
By default it's allowed to create up to 4096 ptys with 1024 reserve for
initial mount namespace only and the settings are controlled by host
admin.
Though this default is not enough for hosters with thousands of
containers per node. Host admin can be forced to increase it up to
NR_UNIX98_PTY_MAX = 1<<20.
By default container is restricted by pty mount_opt.max = 1024, but
admin inside container can change it via remount. As a result, one
container can consume almost all allowed ptys and allocate up to 1Gb of
unaccounted memory.
It is not enough per-se to trigger OOM on host, however anyway, it
allows to significantly exceed the assigned memcg limit and leads to
troubles on the over-committed node.
It makes sense to account for them to restrict the host's memory
consumption from inside the memcg-limited container.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Slaby <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Theodore Ts'o <[email protected]> [ext4]
Acked-by: Roman Gushchin <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Fam Zheng <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kari Argillander <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These functions are no longer useful as no BDIs report congestions any
more.
Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.
So replace the calls by 'false' and simplify the code - and remove the
functions.
[[email protected]: fix build]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: NeilBrown <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]> [nilfs]
Cc: Anna Schumaker <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Ilya Dryomov <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Paolo Valente <[email protected]>
Cc: Philipp Reisner <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
- Revert "PCI: xgene: Use inbound resources for setup" (Marc Zyngier)
- Revert "PCI: xgene: Fix IB window setup" (Marc Zyngier)
* remotes/lorenzo/pci/xgene:
PCI: xgene: Revert "PCI: xgene: Fix IB window setup"
PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
|
|
- Add DT binding and endpoint driver support for UniPhier NX1 SoC (Kunihiko
Hayashi)
* remotes/lorenzo/pci/uniphier:
PCI: uniphier-ep: Add NX1 support
PCI: uniphier-ep: Add SoC data structure
dt-bindings: PCI: uniphier-ep: Add bindings for NX1 SoC
|
|
- Finish transition to L1 state in rcar_pcie_config_access() because R-Car
can't do it on its own (Marek Vasut)
- Return PCI_ERROR_RESPONSE for reads that trigger PCIe errors (Marek
Vasut)
* remotes/lorenzo/pci/rcar:
PCI: rcar: Use PCI_SET_ERROR_RESPONSE after read which triggered an exception
PCI: rcar: Finish transition to L1 state in rcar_pcie_config_access()
|
|
- Save pointer to device match data instead of copying it (Dmitry
Baryshkov)
- Add ddrss_sf_tbu flag to device match data instead of checking OF
compatible string (Dmitry Baryshkov)
- Add SM8450 SoC PCIe DT bindings (Dmitry Baryshkov)
- Add SM8450 PCIe support (Dmitry Baryshkov)
* remotes/lorenzo/pci/qcom:
PCI: qcom: Add SM8450 PCIe support
PCI: qcom: Add ddrss_sf_tbu flag
PCI: qcom: Remove redundancy between qcom_pcie and qcom_pcie_cfg
dt-bindings: pci: qcom: Document PCIe bindings for SM8450
|
|
- Add Pali Rohár as pci-mvebu.c maintainer (Pali Rohár)
- Make struct pci_bridge_emul_ops const (Pali Rohár)
- Rename PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR to
PCI_BRIDGE_EMUL_NO_PREFMEM_FORWARD since it doesn't apply to BARs (Pali
Rohár)
- Add new flag PCI_BRIDGE_EMUL_NO_IO_FORWARD for bridges that don't support
IO forwarding (Pali Rohár)
- Add Kconfig help text for CONFIG_PCI_MVEBU (Pali Rohár)
- Remove duplicate nports assignment (Pali Rohár)
- Set PCI_BRIDGE_EMUL_NO_IO_FORWARD when IO is unsupported (Pali Rohár)
- Initialize vendor, device and revision of emulated bridge (Pali Rohár)
- Fix Data Link Layer Link Active reporting on emulated bridge (Pali Rohár)
- Rearrange tests in bridge emulation for easier maintenance (Russell King)
- Add emulated bridge support for PCIe extended capabilities (Russell King)
- Add emulated bridge support for bridge Subsystem Vendor ID capability
(Pali Rohár)
- Configure Maximum Link Width based on DT "num-lanes" property (Pali
Rohár)
- Emulate bridge Subsystem Vendor ID capability (Pali Rohár)
- Emulate AER Capability (Pali Rohár)
- Use PCI core bridge->ops and bridge->child_ops to separate config
accesses to Root Port vs downstream devices (Pali Rohár)
- Unmask all INTx interrupts; they're reported via a single shared GIC
source (Pali Rohár)
- Add INTx support (Pali Rohár)
* remotes/lorenzo/pci/mvebu:
PCI: mvebu: Implement support for legacy INTx interrupts
PCI: mvebu: Fix macro names and comments about legacy interrupts
dt-bindings: PCI: mvebu: Update information about intx interrupts
PCI: mvebu: Use child_ops API
PCI: mvebu: Add support for Advanced Error Reporting registers on emulated bridge
PCI: mvebu: Add support for PCI Bridge Subsystem Vendor ID on emulated bridge
PCI: mvebu: Correctly configure x1/x4 mode
dt-bindings: PCI: mvebu: Add num-lanes property
PCI: pci-bridge-emul: Add support for PCI Bridge Subsystem Vendor ID capability
PCI: pci-bridge-emul: Add support for PCIe extended capabilities
PCI: pci-bridge-emul: Re-arrange register tests
PCI: mvebu: Fix reporting Data Link Layer Link Active on emulated bridge
PCI: mvebu: Update comment for PCI_EXP_LNKCTL register on emulated bridge
PCI: mvebu: Update comment for PCI_EXP_LNKCAP register on emulated bridge
PCI: mvebu: Properly initialize vendor, device and revision of emulated bridge
PCI: mvebu: Set PCI_BRIDGE_EMUL_NO_IO_FORWARD when IO is unsupported
PCI: mvebu: Remove duplicate nports assignment
PCI: mvebu: Add help string for CONFIG_PCI_MVEBU option
PCI: pci-bridge-emul: Add support for new flag PCI_BRIDGE_EMUL_NO_IO_FORWARD
PCI: pci-bridge-emul: Rename PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR to PCI_BRIDGE_EMUL_NO_PREFMEM_FORWARD
PCI: pci-bridge-emul: Make struct pci_bridge_emul_ops as const
MAINTAINERS: Add Pali Rohár as pci-mvebu.c maintainer
|
|
- Add generic SZ_1T macro instead of a local one in pci-xgene.c (Christophe
Leroy)
* remotes/lorenzo/pci/misc:
sizes.h: Add SZ_1T macro
|
|
- Allow host controller driver to probe successfully (as other drivers do)
even if link is currently down (Fabio Estevam)
- Enable i.MX6QP PCIe power management (Richard Zhu)
- Invoke PHY exit function after PHY power off (Richard Zhu)
- Assert i.MX8MM CLKREQ# even if no device present to avoid boot hangs
(Richard Zhu)
* remotes/lorenzo/pci/imx6:
PCI: imx6: Assert i.MX8MM CLKREQ# even if no device present
PCI: imx6: Invoke the PHY exit function after PHY power off
PCI: imx6: Enable i.MX6QP PCIe power management support
PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
|
|
- Avoid retarget interrupt hypercall in irq_unmask() on ARM64 (Boqun Feng)
* remotes/lorenzo/pci/hv:
PCI: hv: Avoid the retarget interrupt hypercall in irq_unmask() on ARM64
|
|
- Drop redundant '-gpios' from DT GPIO lookup (Ben Dooks)
- Force 2.5GT/s for initial device probe to workaround enumeration issue on
SiFive Unmatched board (Ben Dooks)
* pci/host/fu740:
PCI: fu740: Force 2.5GT/s for initial device probe
PCI: fu740: Drop redundant '-gpios' from DT GPIO lookup
|
|
- Fix alignment fault error in copy tests (Hou Zhiqiang)
- Fix misused goto label (Li Chen)
* remotes/lorenzo/pci/endpoint:
PCI: endpoint: Fix misused goto label
PCI: endpoint: Fix alignment fault error in copy tests
|