Age | Commit message (Collapse) | Author | Files | Lines |
|
dump_user_range() is used to copy the user page to a coredump file, but if
a hardware memory error occurred during copy, which called from
__kernel_write_iter() in dump_user_range(), it crashes,
CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425
pc : __memcpy+0x110/0x260
lr : _copy_from_iter+0x3bc/0x4c8
...
Call trace:
__memcpy+0x110/0x260
copy_page_from_iter+0xcc/0x130
pipe_write+0x164/0x6d8
__kernel_write_iter+0x9c/0x210
dump_user_range+0xc8/0x1d8
elf_core_dump+0x308/0x368
do_coredump+0x2e8/0xa40
get_signal+0x59c/0x788
do_signal+0x118/0x1f8
do_notify_resume+0xf0/0x280
el0_da+0x130/0x138
el0t_64_sync_handler+0x68/0xc0
el0t_64_sync+0x188/0x190
Generally, the '->write_iter' of file ops will use copy_page_from_iter()
and copy_page_from_iter_atomic(), change memcpy() to copy_mc_to_kernel()
in both of them to handle #MC during source read, which stop coredump
processing and kill the task instead of kernel panic, but the source
address may not always a user address, so introduce a new copy_mc flag in
struct iov_iter{} to indicate that the iter could do a safe memory copy,
also introduce the helpers to set/cleck the flag, for now, it's only used
in coredump's dump_user_range(), but it could expand to any other
scenarios to fix the similar issue.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Tong Tiangen <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
__pageblock_pfn_to_page()
Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which
checks whether the given zone contains holes, and uses
pfn_to_online_page() to validate if the start pfn is online and valid, as
well as using pfn_valid() to validate the end pfn.
However, the __pageblock_pfn_to_page() function may return non-NULL even
if the end pfn of a pageblock is in a memory hole in some situations. For
example, if the pageblock order is MAX_ORDER, which will fall into 2
sub-sections, and the end pfn of the pageblock may be hole even though the
start pfn is online and valid.
See below memory layout as an example and suppose the pageblock order is
MAX_ORDER.
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x0000001fa7ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x0000001fa3c7ffff]
[ 0.000000] node 0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff]
[ 0.000000] node 0: [mem 0x0000001fa4000000-0x0000001fa402ffff]
[ 0.000000] node 0: [mem 0x0000001fa4030000-0x0000001fa40effff]
[ 0.000000] node 0: [mem 0x0000001fa40f0000-0x0000001fa73cffff]
[ 0.000000] node 0: [mem 0x0000001fa73d0000-0x0000001fa745ffff]
[ 0.000000] node 0: [mem 0x0000001fa7460000-0x0000001fa746ffff]
[ 0.000000] node 0: [mem 0x0000001fa7470000-0x0000001fa758ffff]
[ 0.000000] node 0: [mem 0x0000001fa7590000-0x0000001fa7dfffff]
Focus on the last memory range, and there is a hole for the range [mem
0x0000001fa7590000-0x0000001fa7dfffff]. That means the last pageblock
will contain the range from 0x1fa7c00000 to 0x1fa7ffffff, since the
pageblock must be 4M aligned. And in this pageblock, these pfns will fall
into 2 sub-section (the sub-section size is 2M aligned).
So, the 1st sub-section (indicates pfn range: 0x1fa7c00000 - 0x1fa7dfffff
) in this pageblock is valid by calling subsection_map_init() in
free_area_init(), but the 2nd sub-section (indicates pfn range:
0x1fa7e00000 - 0x1fa7ffffff ) in this pageblock is not valid.
This did not break anything until now, but the zone continuous is fragile
in this possible scenario. So as previous discussion[1], it is better to
add some comments to explain this possible issue in case there are some
future pfn walkers that rely on this.
[1] https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's factor out actual disabling of KSM. The existing "mm->def_flags &=
~VM_MERGEABLE;" was essentially a NOP and can be dropped, because
def_flags should never include VM_MERGEABLE. Note that we don't currently
prevent re-enabling KSM.
This should now be faster in case KSM was never enabled, because we only
conditionally iterate all VMAs. Further, it certainly looks cleaner.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Janosch Frank <[email protected]>
Acked-by: Stefan Roesch <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1
will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting
MADV_MERGEABLE would.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Stefan Roesch <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup
disabling KSM", v2.
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
This patch (of 3):
Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear
the VM_MERGEABLE flag from all VMAs -- just like KSM would. Of course,
only do that if we previously set PR_SET_MEMORY_MERGE=1.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Stefan Roesch <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Claudio Imbrenda <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Janosch Frank <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The *folio_sz in damon_pa_young() will be used(as last_folio_sz) by
__damon_pa_check_access(), so it's need to be updated, fix missing branch.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Omit one line by unified folio_put(), and make code more clear.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/damon/paddr: minor code improvement", v3.
Unify folio_put() to make code more clear, and also fix minor issue in
damon_pa_young().
This patch (of 3):
Omit three lines by unified folio_put(), and make code more clear.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Mostly drivers! Nothing special: some new Qualcomm chips as usual, and
the new NXP S32 and nVidia BlueField-3.
Core changes:
- Make a lot of pin controllers with GPIO and irqchips immutable,
i.e. not living structs, but const structs. This is driving a
changed initiated by the irqchip maintainers.
New drivers:
- New driver for the NXP S32 SoC pin controller
- As part of a thorough cleanup and restructuring of the
Ralink/Mediatek drivers, the Ralink MIPS pin control drivers were
folded into the Mediatek directory and the family is renamed
"mtmips". The Ralink chips live on as Mediatek MIPS family where
new variants can be added. As part of this work also the device
tree bindings were reworked.
- New subdriver for the Qualcomm SM7150 SoC.
- New subdriver for the Qualcomm IPQ9574 SoC.
- New driver for the nVidia BlueField-3 SoC.
- Support for the Qualcomm PMM8654AU mixed signal circuit GPIO.
- Support for the Qualcomm PMI632 mixed signal circuit GPIO.
Improvements:
- Add some missing pins and generic cleanups on the Renesas r8a779g0
and r8a779g0 pin controllers. Generic Renesas extension for power
source selection on several SoCs.
- Misc cleanups for the Atmel AT91 and AT91-PIO4 pin controllers
- Make the GPIO mode work on the Qualcomm SM8550-lpass-lpi driver.
- Several device tree binding cleanups as the binding YAML syntax is
solidifying"
* tag 'pinctrl-v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (153 commits)
pinctrl-bcm2835.c: fix race condition when setting gpio dir
dt-bindings: pinctrl: qcom,sm8150: Drop duplicate function value "atest_usb2"
dt-bindings: pinctrl: qcom: Add few missing functions
pinctrl: qcom: spmi-gpio: Add PMI632 support
dt-bindings: pinctrl: qcom,pmic-gpio: add PMI632
pinctrl: wpcm450: select MFD_SYSCON
pinctrl: qcom ssbi-gpio: Convert to immutable irq_chip
pinctrl: qcom ssbi-mpp: Convert to immutable irq_chip
pinctrl: qcom spmi-mpp: Convert to immutable irq_chip
pinctrl: plgpio: Convert to immutable irq_chip
pinctrl: pistachio: Convert to immutable irq_chip
pinctrl: pic32: Convert to immutable irq_chip
pinctrl: sx150x: Convert to immutable irq_chip
pinctrl: stmfx: Convert to immutable irq_chip
pinctrl: st: Convert to immutable irq_chip
pinctrl: mcp23s08: Convert to immutable irq_chip
pinctrl: equilibrium: Convert to immutable irq_chip
pinctrl: npcm7xx: Convert to immutable irq_chip
pinctrl: armada-37xx: Convert to immutable irq_chip
pinctrl: nsp: Convert to immutable irq_chip
...
|
|
Pull VFIO updates from Alex Williamson:
- Expose and allow R/W access to the PCIe DVSEC capability through
vfio-pci, as we already do with the legacy vendor capability
(K V P Satyanarayana)
- Fix kernel-doc issues with structure definitions (Simon Horman)
- Clarify ordering of operations relative to the kvm-vfio device for
driver dependencies against the kvm pointer (Yi Liu)
* tag 'vfio-v6.4-rc1' of https://github.com/awilliam/linux-vfio:
docs: kvm: vfio: Suggest KVM_DEV_VFIO_GROUP_ADD vs VFIO_GROUP_GET_DEVICE_FD ordering
vfio: correct kdoc for ops structures
vfio/pci: Add DVSEC PCI Extended Config Capability to user visible list.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
"Three fixes to AFS directory handling:
- Make sure that afs_read_dir() sees any increase in file size if the
file unexpectedly changed on the server (e.g. due to another client
making a change).
- Make afs_getattr() always return the server's dir file size, not
the locally edited one, so that pagecache eviction doesn't cause
the dir file size to change unexpectedly.
- Prevent afs_read_dir() from getting into an endless loop if the
server indicates that the directory file size is larger than
expected"
* tag 'afs-fixes-20230502' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Avoid endless loop if file is larger than expected
afs: Fix getattr to report server i_size on dirs, not local size
afs: Fix updating of i_size with dv jump from server
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Add / improve Device Tree bindings
- Convert (int) .remove functions to (void) .remove_new
- Rid 'defined but not used' warnings
- Remove ineffective casts and pointer stubs
- Use specifically crafted API for testing DT property presence"
* tag 'backlight-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: as3711: Use of_property_read_bool() for boolean properties
backlight: hx8357: Use of_property_present() for testing DT property presence
backlight: arcxcnn_bl: Drop of_match_ptr for ID table
backlight: lp855x: Mark OF related data as maybe unused
backlight: sky81452-backlight: Convert to platform remove callback returning void
backlight: rt4831-backlight: Convert to platform remove callback returning void
backlight: qcom-wled: Convert to platform remove callback returning void
backlight: pwm_bl: Convert to platform remove callback returning void
backlight: mt6370-backlight: Convert to platform remove callback returning void
backlight: lp8788_bl: Convert to platform remove callback returning void
backlight: lm3533_bl: Convert to platform remove callback returning void
backlight: led_bl: Convert to platform remove callback returning void
backlight: hp680_bl: Convert to platform remove callback returning void
backlight: da9052_bl: Convert to platform remove callback returning void
backlight: cr_bllcd: Convert to platform remove callback returning void
backlight: adp5520_bl: Convert to platform remove callback returning void
backlight: aat2870_bl: Convert to platform remove callback returning void
backlight: qcom-wled: Add PMI8950 compatible
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers:
- Add support for Renesas RZ/G2L MTU3
New Device Support:
- Add support for Lenovo Yoga Book X90F to Intel CHT WC
- Add support for MAX5970 and MAX5978 to Simple MFD (I2C)
- Add support for Meteor Lake PCH-S LPSS PCI to Intel LPSS PCI
- Add support for AXP15060 PMIC to X-Powers PMIC collection
Remove Device Support:
- Remove support for Samsung 5M8751 and S5M8763 PMIC devices
New Functionality:
- Convert deprecated QCOM IRQ Chip to config registers
- Add support for 32-bit address spaces to Renesas SMUs
Fix-ups:
- Make use of APIs / MACROs designed to simplify and demystify
- Add / improve Device Tree bindings
- Memory saving struct layout optimisations
- Remove old / deprecated functionality
- Factor out unassigned register addresses from ranges
- Trivial: Spelling fixes, renames and coding style fixes
- Rid 'defined but not used' warnings
- Remove ineffective casts and pointer stubs
Bug Fixes:
- Fix incorrectly non-inverted mask/unmask IRQs on QCOM platforms
- Remove MODULE_*() helpers from non-tristate drivers
- Do not attempt to use out-of-range memory addresses associated with io_base
- Provide missing export helpers
- Fix remap bulk read optimisation fallout
- Fix memory leak issues in error paths"
* tag 'mfd-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (88 commits)
dt-bindings: mfd: ti,j721e-system-controller: Add SoC chip ID
leds: bd2606mvv: Driver for the Rohm 6 Channel i2c LED driver
dt-bindings: mfd: qcom,spmi-pmic: Document flash LED controller
dt-bindings: mfd: x-powers,axp152: Document the AXP15060 variant
mfd: axp20x: Add support for AXP15060 PMIC
dt-bindings: mfd: x-powers,axp152: Document the AXP313a variant
counter: rz-mtu3-cnt: Unlock on error in rz_mtu3_count_ceiling_write()
dt-bindings: mfd: dlg,da9063: Document voltage monitoring
dt-bindings: mfd: stm32: Remove unnecessary blank lines
dt-bindings: mfd: qcom,spmi-pmic: Use generic ADC node name in examples
dt-bindings: mfd: syscon: Add nuvoton,ma35d1-sys compatible
MAINTAINERS: Add entries for Renesas RZ/G2L MTU3a counter driver
counter: Add Renesas RZ/G2L MTU3a counter driver
Documentation: ABI: sysfs-bus-counter: add cascade_counts_enable and external_input_phase_clock_select
mfd: Add Renesas RZ/G2L MTU3a core driver
dt-bindings: timer: Document RZ/G2L MTU3a bindings
mfd: rsmu_i2c: Convert to i2c's .probe_new() again
mfd: intel-lpss: Add Intel Meteor Lake PCH-S LPSS PCI IDs
mfd: dln2: Fix memory leak in dln2_probe()
mfd: axp20x: Fix axp288 writable-ranges
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"New Drivers:
- Add support for MediaTek MT6370 LED Indicator
- Add support for MediaTek MT6370 Flashlight
- Add support for QCOM PMIC Flash
- Add support for Rohm BD2606MVV Charge Pump LED
New Device Support:
- Add support for PMK8550 PWM to QCOM LPG
New Functionality:
- Add support for high resolution PWM to QCOM LPG
Fix-ups:
- Kconfig 'depends' and 'select' dependency changes
- Remove unused / irrelevant includes
- Remove unnecessary checks (already performed further into the call stack)
- Trivial: Fix commentary, simplify error messages
- Rid 'defined but not used' warnings
- Provide documentation
- Explicitly provide include files
Bug Fixes:
- Mark GPIO LED as BROKEN
- Fix Kconfig entries
- Fix various Smatch staticify reports
- Fix error handling (or a lack there of)"
* tag 'leds-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (30 commits)
leds: bd2606mvv: Driver for the Rohm 6 Channel i2c LED driver
dt-bindings: leds: Add ROHM BD2606MVV LED
docs: leds: ledtrig-oneshot: Fix spelling mistake
leds: pwm-multicolor: Simplify an error message
dt-bindings: leds: Convert PCA9532 to dtschema
leds: rgb: leds-qcom-lpg: Add support for PMK8550 PWM
leds: rgb: leds-qcom-lpg: Add support for high resolution PWM
dt-bindings: leds-qcom-lpg: Add qcom,pmk8550-pwm compatible string
leds: tca6507: Fix error handling of using fwnode_property_read_string
leds: flash: Set variables mvflash_{3,4}ch_regs storage-class-specifier to static
leds: rgb: mt6370: Correct config name to select in LEDS_MT6370_RGB
MAINTAINERS: Add entry for LED devices documentation
Documentation: leds: MT6370: Use bullet lists for timing variables
Documentation: leds: mt6370: Properly wrap hw_pattern chart
Documentation: leds: Add MT6370 doc to the toctree
leds: rgb: mt6370: Fix implicit declaration for FIELD_GET
docs: leds: Add MT6370 RGB LED pattern document
leds: flash: mt6370: Add MediaTek MT6370 flashlight support
leds: rgb: mt6370: Add MediaTek MT6370 current sink type LED Indicator support
dt-bindings: leds: spmi-flash-led: Add pm6150l compatible
...
|
|
Translate Documentation/process/adding-syscalls.rst into Spanish.
Co-developed-by: Mauricio Fuentes <[email protected]>
Signed-off-by: Mauricio Fuentes <[email protected]>
Signed-off-by: Carlos Bilbao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Changing my email address in CREDITS to be consistent with what's in use
in MAINTAINERS and mailmap. Also removed extra date information from the
CREDITS entry since I'm a maintainer for MPTCP again.
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Commit 6538b8ea886e ("x86_64: expand kernel stack to 16K")
expanded kernel stack for x86_64 but left the wrong documentation,
update it.
Signed-off-by: Yan Yan <[email protected]>
Reviewed-by: Lai Jiangshan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
There is a non-printable unicode char '\u202a' or "0xe2 0x80 0xaa" in hex
in the translation doc. It is unnecessary and should be removed for better
text formatting when using editors like vi.
Signed-off-by: Tao Liu <[email protected]>
Reviewed-by: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Spell "Reviewed" properly in Documentation/Translations/jp/SubmittingPatches
Signed-off-by: Deming Wang <[email protected]>
Reviewed-by: Akira Yokosawa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
Non-scalar time was removed from the ktime hybrid union in v3.17, and
the union itself followed suit in v4.10.
Make it clear that ktime_t is always a 64bit scalar type, to avoid
confusing the casual reader.
While at it, fix a spelling mistake.
Fixes: 24e4a8c3e8868874 ("ktime: Kill non-scalar ktime_t implementation for 2038")
Fixes: 2456e855354415bf ("ktime: Get rid of the union")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/59250a3d1c2c827b5c1833169a6e652ca6a784e6.1683021785.git.geert+renesas@glider.be
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
afs_read_dir fetches an amount of data that's based on what the inode
size is thought to be. If the file on the server is larger than what
was fetched, the code rechecks i_size and retries. If the local i_size
was not properly updated, this can lead to an endless loop of fetching
i_size from the server and noticing each time that the size is larger on
the server.
If it is known that the remote size is larger than i_size, bump up the
fetch size to that size.
Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
Signed-off-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
|
|
Fix afs_getattr() to report the server's idea of the file size of a
directory rather than the local size. The local size may differ as we edit
the local copy to avoid having to redownload it and we may end up with a
differently structured blob of a different size.
However, if the directory is discarded from the pagecache we then download
it again and the user may see the directory file size apparently change.
Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...")
Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
|
|
If the data version returned from the server is larger than expected,
the local data is invalidated, but we may still want to note the remote
file size.
Since we're setting change_size, we have to also set data_changed
for the i_size to get updated.
Fixes: 3f4aa9818163 ("afs: Fix EOF corruption")
Signed-off-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
|
|
TCP_Server_Info::hostname may be updated once or many times during
reconnect, so protect its access outside reconnect path as well and
then prevent any potential use-after-free bugs.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Print full device name (UNC + optional prefix) from @old_ctx->source
when printing info about mount.
Before patch
mount.cifs //srv/share/dir /mnt -o ...
dmesg
...
CIFS: Attempting to mount \\srv\share
After patch
mount.cifs //srv/share/dir /mnt -o ...
dmesg
...
CIFS: Attempting to mount //srv/share/dir
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Use @ses->ses_lock to protect access of @ses->ses_status.
Cc: [email protected]
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The name lengths were incorrect for two create contexts.
SMB2_CREATE_APP_INSTANCE_ID
SMB2_CREATE_APP_INSTANCE_VERSION
Update the definitions for these two to match the protocol specs.
Acked-by: Paulo Alcantara (SUSE) <[email protected]>
Reviewed-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
This Asus Zenbook laptop uses Realtek HDA codec combined with
2xCS35L41 Amplifiers using I2C with External Boost.
Signed-off-by: Mark Asselstine <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
We have observed a btrfs filesystem corruption on workloads using
no-holes and encoded writes via send stream v2. The symptom is that a
file appears to be truncated to the end of its last aligned extent, even
though the final unaligned extent and even the file extent and otherwise
correctly updated inode item have been written.
So if we were writing out a 1MiB+X file via 8 128K extents and one
extent of length X, i_size would be set to 1MiB, but the ninth extent,
nbyte, etc. would all appear correct otherwise.
The source of the race is a narrow (one line of code) window in which a
no-holes fs has read in an updated i_size, but has not yet set a shared
disk_i_size variable to write. Therefore, if two ordered extents run in
parallel (par for the course for receive workloads), the following
sequence can play out: (following "threads" a bit loosely, since there
are callbacks involved for endio but extra threads aren't needed to
cause the issue)
ENC-WR1 (second to last) ENC-WR2 (last)
------- -------
btrfs_do_encoded_write
set i_size = 1M
submit bio B1 ending at 1M
endio B1
btrfs_inode_safe_disk_i_size_write
local i_size = 1M
falls off a cliff for some reason
btrfs_do_encoded_write
set i_size = 1M+X
submit bio B2 ending at 1M+X
endio B2
btrfs_inode_safe_disk_i_size_write
local i_size = 1M+X
disk_i_size = 1M+X
disk_i_size = 1M
btrfs_delayed_update_inode
btrfs_delayed_update_inode
And the delayed inode ends up filled with nbytes=1M+X and isize=1M, and
writes respect i_size and present a corrupted file missing its last
extents.
Fix this by holding the inode lock in the no-holes case so that a thread
can't sneak in a write to disk_i_size that gets overwritten with an out
of date i_size.
Fixes: 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree")
CC: [email protected] # 5.10+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Boris Burkov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Currently, the .got section is placed within the output section .text.
However, when .got is non-empty, the SHF_WRITE flag is set for .text
when linked by lld. GNU ld recognizes .text as a special section and
ignores the SHF_WRITE flag. By renaming .text, we can also get the
SHF_WRITE flag.
The kernel has performed R_AARCH64_RELATIVE resolving very early, and can
then assume that .got is read-only. Let's move .got to the vmlinux_rodata
pseudo-segment.
As Ard Biesheuvel notes:
"This matters to consumers of the vmlinux ELF representation of the
kernel image, such as syzkaller, which disregards writable PT_LOAD
segments when resolving code symbols. The kernel itself does not care
about this distinction, but given that the GOT contains data and not
code, it does not require executable permissions, and therefore does
not belong in .text to begin with."
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Fangrui Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
commit d54170812ef1 ("arm64: fix .idmap.text assertion for large kernels")
modified some of the section assembler directives that declare
.idmap.text to be SHF_ALLOC instead of
SHF_ALLOC|SHF_WRITE|SHF_EXECINSTR.
This patch fixes up the remaining stragglers that were left behind. Add
Fixes tag so that this doesn't precede related change in stable.
Fixes: d54170812ef1 ("arm64: fix .idmap.text assertion for large kernels")
Reported-by: Greg Thelen <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The pointer auth hwcaps are not getting reported to userspace, as they
are missing the .matches field. Add the field back.
Fixes: 876e3c8efe79 ("arm64/cpufeature: Pull out helper for CPUID register definitions")
Signed-off-by: Kristina Martsenko <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
When is_valid_tracepoint() returns 1, need to call put_events_file() to
free `dir_path`.
Fixes: 25a7d914274de386 ("perf parse-events: Use get/put_events_file()")
Signed-off-by: Yang Jihong <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The current implementation supports coresight trace decode for a range
of CPUs, if the first CPU is CPU0.
Perf report segfaults, if tried for sparse CPUs list and also for
any range of CPUs(non zero first CPU).
Adding a fix to perf report for any range of CPUs and for sparse list.
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
With the following bash and make versions:
$ make --version
GNU Make 4.2.1
Built for aarch64-unknown-linux-gnu
$ bash --version
GNU bash, version 5.0.17(1)-release (aarch64-unknown-linux-gnu)
This error is encountered when running the build-test target:
$ make -C tools/perf build-test
tests/make:181: *** unterminated call to function 'shell': missing ')'. Stop.
make: *** [Makefile:103: build-test] Error 2
Fix it by escaping the # which was causing make to interpret the rest of
the line as a comment leaving the unclosed opening bracket.
Fixes: 56d5229471ee1634 ("tools build: Pass libbpf feature only if libbpf 1.0+")
Signed-off-by: James Clark <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
When cross-analyzing perf data recorded on an another platform, massive
unsupported target platform errors are printed. So let's show this message
as warning and only once.
Signed-off-by: Changbin Du <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Hui Wang <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
using it
Include reason parameter that was added in commit c504e5c2f9648a1e
("net: skb: introduce kfree_skb_reason()")
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sriram Yagnaraman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Before this, the raw ip is printed for non-callchain and dso offset for
callchain. This inconsistent output for address may confuse people. And
mostly what we expect is the raw ip.
'dso offset' is printed in callchain:
$ perf script
...
ls 1341034 2739463.008343: 2162417 cycles:
ffffffff99d657a7 [unknown] ([unknown])
ffffffff99e00b67 [unknown] ([unknown])
235d3 memset+0x53 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) # dso offset
a61b _dl_map_object+0x1bb (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
raw ip is printed for non-callchain:
$ perf script -G
...
ls 1341034 2739463.008876: 2053304 cycles: ffffffffc1596923 [unknown] ([unknown])
ls 1341034 2739463.009381: 1917049 cycles: 14def8e149e6 __strcoll_l+0xd96 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) # raw ip
Let's have consistent output for it. Later I'll add a new field 'dsoff' to
print dso offset.
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Hui Wang <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In elf_read_build_id(), if gnu build_id is found, should return the size of
the actually copied data. If descsz is greater thanBuild_ID_SIZE,
write_buildid data access may occur.
Fixes: be96ea8ffa788dcc ("perf symbols: Fix issue with binaries using 16-bytes buildids (v2)")
Reported-by: Will Ochowicz <[email protected]>
Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Will Ochowicz <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/lkml/CWLP265MB49702F7BA3D6D8F13E4B1A719C649@CWLP265MB4970.GBRP265.PROD.OUTLOOK.COM/T/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It should mention scandirat() instead of scandir().
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It should free entries (not only the array) filled by scandirat()
after use.
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
It seems BPF CO-RE reloc doesn't work well with the pattern that gets
the field-offset only. Use offsetof() to make it explicit so that
the compiler would generate the correct code.
Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: [email protected]
Co-developed-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The BPF CO-RE's ignore suffix rule requires three underscores.
Otherwise it'd fail like below:
$ sudo perf lock contention -ab
libbpf: prog 'collect_lock_syms': BPF program load failed: Invalid argument
libbpf: prog 'collect_lock_syms': -- BEGIN PROG LOAD LOG --
reg type unsupported for arg#0 function collect_lock_syms#380
; int BPF_PROG(collect_lock_syms)
0: (b7) r6 = 0 ; R6_w=0
1: (b7) r7 = 0 ; R7_w=0
2: (b7) r9 = 1 ; R9_w=1
3: <invalid CO-RE relocation>
failed to resolve CO-RE relocation <byte_off> [381] struct rq__new.__lock (0:0 @ offset 0)
Fixes: 0c1228486befa3d6 ("perf lock contention: Support pre-5.14 kernels")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Checking the config via ifdef incorrectly compiles out the report
functions when CRYPTO_USER is set to =m. Fix it by using IS_ENABLED()
instead.
Fixes: c0f9e01dd266 ("crypto: api - Check CRYPTO_USER instead of NET for report")
Signed-off-by: Ondrej Mosnacek <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
|
|
The recent fix to ensure atomicity of lookup and allocation inadvertently
broke the pool refill mechanism.
Prior to that change debug_objects_activate() and debug_objecs_assert_init()
invoked debug_objecs_init() to set up the tracking object for statically
initialized objects. That's not longer the case and debug_objecs_init() is
now the only place which does pool refills.
Depending on the number of statically initialized objects this can be
enough to actually deplete the pool, which was observed by Ido via a
debugobjects OOM warning.
Restore the old behaviour by adding explicit refill opportunities to
debug_objects_activate() and debug_objecs_assert_init().
Fixes: 63a759694eed ("debugobject: Prevent init race with static objects")
Reported-by: Ido Schimmel <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/871qk05a9d.ffs@tglx
|
|
Automation complains:
warning: symbol '__pcpu_scope_misaligned_access_speed' was not declared. Should it be static?
cpufeature.c doesn't actually include the header of the same name, as it
had not previously used anything from it.
The per-cpu variable is declared there, so include it to silence the
complaints.
Fixes: 62a31d6e38bd ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Conor Dooley <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Link: https://lore.kernel.org/r/20230420-wound-gizzard-2b2b589d9bea@spud
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for Novatek touch controllers
- a new driver for power button for NXP BBNSM
- a skeleton KUnit tests for the input core
- improvements to Xpad game controller driver to support more devices
- improvements to edt-ft5x06, hideep and other drivers
* tag 'input-for-v6.4-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (42 commits)
Revert "Input: xpad - fix support for some third-party controllers"
dt-bindings: input: pwm-beeper: convert to dt schema
Input: xpad - fix PowerA EnWired Controller guide button
Input: xpad - add constants for GIP interface numbers
Input: synaptics-rmi4 - fix function name in kerneldoc
Input: raspberrypi-ts - fix refcount leak in rpi_ts_probe
Input: edt-ft5x06 - select REGMAP_I2C
Input: melfas_mip4 - report palm touches
Input: cma3000_d0x - remove unneeded code
Input: edt-ft5x06 - calculate points data length only once
Input: edt-ft5x06 - unify the crc check
Input: edt-ft5x06 - convert to use regmap API
Input: edt-ft5x06 - don't print error messages with dev_dbg()
Input: edt-ft5x06 - remove code duplication
Input: edt-ft5x06 - don't recalculate the CRC
Input: edt-ft5x06 - add spaces to ensure format specification
Input: edt-ft5x06 - remove unnecessary blank lines
Input: edt-ft5x06 - fix indentation
Input: tsc2007 - enable cansleep pendown GPIO
Input: Add KUnit tests for some of the input core helper functions
...
|
|
The recent introduction of relocatable kernels prepared the move of
.rela.dyn to the init section, but actually forgot to do so, so do it
here.
Before this patch: "Freeing unused kernel image (initmem) memory: 2592K"
After this patch: "Freeing unused kernel image (initmem) memory: 6288K"
The difference corresponds to the size of the .rela.dyn section:
"[42] .rela.dyn RELA ffffffff8197e798 0127f798
000000000039c660 0000000000000018 A 47 0 8"
Fixes: 559d1e45a16d ("riscv: Use --emit-relocs in order to move .rela.dyn in init")
Signed-off-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The dt-binding was defined before the extraction of csr access and
fence.i into their own extensions, and thus the presence of the I
base extension implies Zicsr and Zifencei.
There's no harm in adding them obviously, but for backwards
compatibility with DTs that existed prior to that extraction, software
is unable to differentiate between "i" and "i_zicsr_zifencei" without
any further information.
Signed-off-by: Conor Dooley <[email protected]>
Acked-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/20230427-fence-blurred-c92fb69d4137@wendy
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
syzbot reported this warning from the faux inodegc shrinker that tries
to kick off inodegc work:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444
RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444
Call Trace:
__queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672
mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746
xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline]
xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191
do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853
shrink_slab+0x175/0x660 mm/vmscan.c:1013
shrink_one+0x502/0x810 mm/vmscan.c:5343
shrink_many mm/vmscan.c:5394 [inline]
lru_gen_shrink_node mm/vmscan.c:5511 [inline]
shrink_node+0x2064/0x35f0 mm/vmscan.c:6459
kswapd_shrink_node mm/vmscan.c:7262 [inline]
balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452
kswapd+0x677/0xd60 mm/vmscan.c:7712
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
This warning corresponds to this code in __queue_work:
/*
* For a draining wq, only works from the same workqueue are
* allowed. The __WQ_DESTROYING helps to spot the issue that
* queues a new work item to a wq after destroy_workqueue(wq).
*/
if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) &&
WARN_ON_ONCE(!is_chained_work(wq))))
return;
For this to trip, we must have a thread draining the inodedgc workqueue
and a second thread trying to queue inodegc work to that workqueue.
This can happen if freezing or a ro remount race with reclaim poking our
faux inodegc shrinker and another thread dropping an unlinked O_RDONLY
file:
Thread 0 Thread 1 Thread 2
xfs_inodegc_stop
xfs_inodegc_shrinker_scan
xfs_is_inodegc_enabled
<yes, will continue>
xfs_clear_inodegc_enabled
xfs_inodegc_queue_all
<list empty, do not queue inodegc worker>
xfs_inodegc_queue
<add to list>
xfs_is_inodegc_enabled
<no, returns>
drain_workqueue
<set WQ_DRAINING>
llist_empty
<no, will queue list>
mod_delayed_work_on(..., 0)
__queue_work
<sees WQ_DRAINING, kaboom>
In other words, everything between the access to inodegc_enabled state
and the decision to poke the inodegc workqueue requires some kind of
coordination to avoid the WQ_DRAINING state. We could perhaps introduce
a lock here, but we could also try to eliminate WQ_DRAINING from the
picture.
We could replace the drain_workqueue call with a loop that flushes the
workqueue and queues workers as long as there is at least one inode
present in the per-cpu inodegc llists. We've disabled inodegc at this
point, so we know that the number of queued inodes will eventually hit
zero as long as xfs_inodegc_start cannot reactivate the workers.
There are four callers of xfs_inodegc_start. Three of them come from the
VFS with s_umount held: filesystem thawing, failed filesystem freezing,
and the rw remount transition. The fourth caller is mounting rw (no
remount or freezing possible).
There are three callers ofs xfs_inodegc_stop. One is unmounting (no
remount or thaw possible). Two of them come from the VFS with s_umount
held: fs freezing and ro remount transition.
Hence, it is correct to replace the drain_workqueue call with a loop
that drains the inodegc llists.
Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
|