Age | Commit message (Collapse) | Author | Files | Lines |
|
The SYS_FIELD_ macros are not safe for assembly contexts, move them inside
the guarded section.
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The SYS_FIELD_ macros in sysreg.h use definitions from bitfield.h but there
is no direct inclusion of it, add one to ensure that sysreg.h is directly
usable.
Signed-off-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
fw_level
Though acpi_find_last_cache_level() always returned signed value and the
document states it will return any errors caused by lack of a PPTT table,
it never returned negative values before.
Commit 0c80f9e165f8 ("ACPI: PPTT: Leave the table mapped for the runtime usage")
however changed it by returning -ENOENT if no PPTT was found. The value
returned from acpi_find_last_cache_level() is then assigned to unsigned
fw_level.
It will result in the number of cache leaves calculated incorrectly as
a huge value which will then cause the following warning from __alloc_pages
as the order would be great than MAX_ORDER because of incorrect and huge
cache leaves value.
| WARNING: CPU: 0 PID: 1 at mm/page_alloc.c:5407 __alloc_pages+0x74/0x314
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-10393-g7c2a8d3ac4c0 #73
| pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __alloc_pages+0x74/0x314
| lr : alloc_pages+0xe8/0x318
| Call trace:
| __alloc_pages+0x74/0x314
| alloc_pages+0xe8/0x318
| kmalloc_order_trace+0x68/0x1dc
| __kmalloc+0x240/0x338
| detect_cache_attributes+0xe0/0x56c
| update_siblings_masks+0x38/0x284
| store_cpu_topology+0x78/0x84
| smp_prepare_cpus+0x48/0x134
| kernel_init_freeable+0xc4/0x14c
| kernel_init+0x2c/0x1b4
| ret_from_fork+0x10/0x20
Fix the same by changing fw_level to be signed integer and return the
error from init_cache_level() early in case of error.
Reported-and-Tested-by: Bruno Goncalves <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
The AMU counter AMEVCNTR01 (constant counter) should increment at the same
rate as the system counter. On affected Cortex-A510 cores, AMEVCNTR01
increments incorrectly giving a significantly higher output value. This
results in inaccurate task scheduler utilization tracking and incorrect
feedback on CPU frequency.
Work around this problem by returning 0 when reading the affected counter
in key locations that results in disabling all users of this counter from
using it either for frequency invariance or as FFH reference counter. This
effect is the same to firmware disabling affected counters.
Details on how the two features are affected by this erratum:
- AMU counters will not be used for frequency invariance for affected
CPUs and CPUs in the same cpufreq policy. AMUs can still be used for
frequency invariance for unaffected CPUs in the system. Although
unlikely, if no alternative method can be found to support frequency
invariance for affected CPUs (cpufreq based or solution based on
platform counters) frequency invariance will be disabled. Please check
the chapter on frequency invariance at
Documentation/scheduler/sched-capacity.rst for details of its effect.
- Given that FFH can be used to fetch either the core or constant counter
values, restrictions are lifted regarding any of these counters
returning a valid (!0) value. Therefore FFH is considered supported
if there is a least one CPU that support AMUs, independent of any
counters being disabled or affected by this erratum. Clarifying
comments are now added to the cpc_ffh_supported(), cpu_read_constcnt()
and cpu_read_corecnt() functions.
The above is achieved through adding a new erratum: ARM64_ERRATUM_2457168.
Signed-off-by: Ionela Voinescu <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: James Morse <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
On arm64, "rodata=full" has been suppored (but not documented) since
commit:
c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.
Unfortunately, this split meant that since commit:
f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions")
... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).
Further, "rodata=full" has been broken since commit:
0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter.
This patch fixes this breakage by:
* Moving the core parameter parser to an __early_param(), such that it
is available early.
* Adding an (optional) arch hook which arm64 can use to parse "full".
* Updating the documentation to mention that "full" is valid for arm64.
* Having the core parameter parser handle "on" and "off" explicitly,
such that any undocumented values (e.g. typos such as "ful") are
reported as errors rather than being silently accepted.
Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.
Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: Mark Rutland <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Jagdish Gediya <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Will Deacon <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Replace wrong 'FIQ EL1h' comment with 'FIQ EL1t'.
Signed-off-by: Kuan-Ying Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
Unify horizontal spacing (remove extra newlines) which
are sensitive to visual presentation by Sphinx.
Fixes: 5e64b862c482 ("arm64/sme: Basic enumeration support")
Signed-off-by: Martin Liska <[email protected]>
Reviewed-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
|
|
smb3 fallocate punch hole was not grabbing the inode or filemap_invalidate
locks so could have race with pagemap reinstantiating the page.
Cc: [email protected]
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
smb3 fallocate zero range was not grabbing the inode or filemap_invalidate
locks so could have race with pagemap reinstantiating the page.
Cc: [email protected]
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Although commit 88f1669019bd ("scsi: sd: Rework asynchronous resume support")
eliminates a delay for some ATA disks after resume, it causes resume of ATA
disks to fail on other setups. See also:
* "Resume process hangs for 5-6 seconds starting sometime in 5.16"
(https://bugzilla.kernel.org/show_bug.cgi?id=215880).
* Geert's regression report
(https://lore.kernel.org/linux-scsi/[email protected]/).
This is what I understand about this issue:
* During resume, ata_port_pm_resume() starts the SCSI error handler. This
changes the SCSI host state into SHOST_RECOVERY and causes
scsi_queue_rq() to return BLK_STS_RESOURCE.
* sd_resume() calls sd_start_stop_device() for ATA devices. That function
in turn calls sd_submit_start() which tries to submit a START STOP UNIT
command. That command can only be submitted after the SCSI error handler
has changed the SCSI host state back to SHOST_RUNNING.
* The SCSI error handler runs on its own thread and calls
schedule_work(&(ap->scsi_rescan_task)). That causes
ata_scsi_dev_rescan() to be called from the context of a kernel
workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - maybe
because all available tags have been allocated by sd_submit_start()
calls (this is a guess).
Link: https://lore.kernel.org/r/[email protected]
Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support")
Cc: Damien Le Moal <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: [email protected]
Reported-by: Geert Uytterhoeven <[email protected]>
Reported-by: [email protected]
Reported-and-tested-by: Vlastimil Babka <[email protected]>
Tested-by: John Garry <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Jonathan Toppins says:
====================
bonding: 802.3ad: fix no transmission of LACPDUs
Configuring a bond in a specific order can leave the bond in a state
where it never transmits LACPDUs.
The first patch adds some kselftest infrastructure and the reproducer
that demonstrates the problem. The second patch fixes the issue. The
new third patch makes ad_ticks_per_sec a static const and removes the
passing of this variable via the stack.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The value is only ever set once in bond_3ad_initialize and only ever
read otherwise. There seems to be no reason to set the variable via
bond_3ad_initialize when setting the global variable will do. Change
ad_ticks_per_sec to a const to enforce its read-only usage.
Signed-off-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.
To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.
line 44: ip link add fbond type bond mode 4 miimon 200 \
xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
bond->dev->dev_addr = new_mac
line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
params.ad_actor_system = 0
call stack:
bond_option_ad_actor_sys_prio()
-> bond_3ad_update_ad_actor_settings()
-> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
-> ad.system.sys_mac_addr = bond->dev->dev_addr; because
params.ad_actor_system == 0
results:
ad.system.sys_mac_addr = bond->dev->dev_addr
line 60: ip link set veth1-bond down master fbond
given:
params.ad_actor_system = 0
params.mode = BOND_MODE_8023AD
ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
bond_enslave
-> bond_3ad_initialize(); because first slave
-> if ad.system.sys_mac_addr != bond->dev->dev_addr
return
results:
Nothing is run in bond_3ad_initialize() because dev_addr equals
sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
never initialized anywhere else.
The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings
changes immediately") which sets ad.system.sys_mac_addr if any one of
the bonding parameters whos set function calls
bond_3ad_update_ad_actor_settings(). This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond mac
address, this causes the if check to never be true.
Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This creates a test collection in drivers/net/bonding for bonding
specific kernel selftests.
The first test is a reproducer that provisions a bond and given the
specific order in how the ip-link(8) commands are issued the bond never
transmits an LACPDU frame on any of its slaves.
Signed-off-by: Jonathan Toppins <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Since priv->rx_mapping[i] is maped in moxart_mac_open(), we
should unmap it from moxart_mac_stop(). Fixes 2 warnings.
1. During error unwinding in moxart_mac_probe(): "goto init_fail;",
then moxart_mac_free_memory() calls dma_unmap_single() with
priv->rx_mapping[i] pointers zeroed.
WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:963 check_unmap+0x704/0x980
DMA-API: moxart-ethernet 92000000.mac: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=1600 bytes]
CPU: 0 PID: 1 Comm: swapper Not tainted 5.19.0+ #60
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0xbc/0x1f0
__warn from warn_slowpath_fmt+0x94/0xc8
warn_slowpath_fmt from check_unmap+0x704/0x980
check_unmap from debug_dma_unmap_page+0x8c/0x9c
debug_dma_unmap_page from moxart_mac_free_memory+0x3c/0xa8
moxart_mac_free_memory from moxart_mac_probe+0x190/0x218
moxart_mac_probe from platform_probe+0x48/0x88
platform_probe from really_probe+0xc0/0x2e4
2. After commands:
ip link set dev eth0 down
ip link set dev eth0 up
WARNING: CPU: 0 PID: 55 at kernel/dma/debug.c:570 add_dma_entry+0x204/0x2ec
DMA-API: moxart-ethernet 92000000.mac: cacheline tracking EEXIST, overlapping mappings aren't supported
CPU: 0 PID: 55 Comm: ip Not tainted 5.19.0+ #57
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0xbc/0x1f0
__warn from warn_slowpath_fmt+0x94/0xc8
warn_slowpath_fmt from add_dma_entry+0x204/0x2ec
add_dma_entry from dma_map_page_attrs+0x110/0x328
dma_map_page_attrs from moxart_mac_open+0x134/0x320
moxart_mac_open from __dev_open+0x11c/0x1ec
__dev_open from __dev_change_flags+0x194/0x22c
__dev_change_flags from dev_change_flags+0x14/0x44
dev_change_flags from devinet_ioctl+0x6d4/0x93c
devinet_ioctl from inet_ioctl+0x1ac/0x25c
v1 -> v2:
Extraneous change removed.
Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Sergei Antonov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
For some MAC drivers, they set the mac_managed_pm to true in its
->ndo_open() callback. So before the mac_managed_pm is set to true,
we still want to leverage the mdio_bus_phy_suspend()/resume() for
the phy device suspend and resume. In this case, the phy device is
in PHY_READY, and we shouldn't warn about this. It also seems that
the check of mac_managed_pm in WARN_ON is redundant since we already
check this in the entry of mdio_bus_phy_resume(), so drop it.
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Xiaolei Wang <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In ipa_smem_init(), a Qualcomm SMEM region is allocated (if needed)
and then its virtual address is fetched using qcom_smem_get(). The
physical address associated with that region is also fetched.
The physical address is adjusted so that it is page-aligned, and an
attempt is made to update the size of the region to compensate for
any non-zero adjustment.
But that adjustment isn't done properly. The physical address is
aligned twice, and as a result the size is never actually adjusted.
Fix this by *not* aligning the "addr" local variable, and instead
making the "phys" local variable be the adjusted "addr" value.
Fixes: a0036bb413d5b ("net: ipa: define SMEM memory region for IPA")
Signed-off-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
DSA has multiple ways of specifying a MAC connection to an internal PHY.
One requires a DT description like this:
port@0 {
reg = <0>;
phy-handle = <&internal_phy>;
phy-mode = "internal";
};
(which is IMO the recommended approach, as it is the clearest
description)
but it is also possible to leave the specification as just:
port@0 {
reg = <0>;
}
and if the driver implements ds->ops->phy_read and ds->ops->phy_write,
the DSA framework "knows" it should create a ds->slave_mii_bus, and it
should connect to a non-OF-based internal PHY on this MDIO bus, at an
MDIO address equal to the port address.
There is also an intermediary way of describing things:
port@0 {
reg = <0>;
phy-handle = <&internal_phy>;
};
In case 2, DSA calls phylink_connect_phy() and in case 3, it calls
phylink_of_phy_connect(). In both cases, phylink_create() has been
called with a phy_interface_t of PHY_INTERFACE_MODE_NA, and in both
cases, PHY_INTERFACE_MODE_NA is translated into phy->interface.
It is important to note that phy_device_create() initializes
dev->interface = PHY_INTERFACE_MODE_GMII, and so, when we use
phylink_create(PHY_INTERFACE_MODE_NA), no one will override this, and we
will end up with a PHY_INTERFACE_MODE_GMII interface inherited from the
PHY.
All this means that in order to maintain compatibility with device tree
blobs where the phy-mode property is missing, we need to allow the
"gmii" phy-mode and treat it as "internal".
Fixes: 2c709e0bdad4 ("net: dsa: microchip: ksz8795: add phylink support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216320
Reported-by: Craig McQueen <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Alvin Šipraga <[email protected]>
Tested-by: Rasmus Villemoes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
|
|
Audit_alloc_mark() assign pathname to audit_mark->path, on error path
from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory
of audit_mark->path, but the caller of audit_alloc_mark will free
the pathname again, so there will be double free problem.
Fix this by resetting audit_mark->path to NULL pointer on error path
from fsnotify_add_inode_mark().
Cc: [email protected]
Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()")
Signed-off-by: Gaosheng Cui <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
Unlock before returning if mlx5_device_enable_sriov() fails.
Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Call mlx5e_fs_vlan_free(fs) before kvfree(fs).
Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Use the list_for_each_entry_safe() macro to prevent dereferencing "obj"
after it has been freed.
Fixes: c4dfe704f53f ("net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Unlock before returning on this error path.
Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The cited commit reintroduced the ability to set hw-tc-offload
in switchdev mode by reusing NIC mode calls without modifying it
to support both modes, this can cause an illegal memory access
when trying to turn hw-tc-offload off.
Fix this by using the right TC_FLAG when checking if tc rules
are installed while disabling hw-tc-offload.
Fixes: d3cbd4254df8 ("net/mlx5e: Add ndo_set_feature for uplink representor")
Signed-off-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
There is a missing policer validation when offloading police action
with tc action api. Add it.
Fixes: 7d1a5ce46e47 ("net/mlx5e: TC, Support tc action api for police")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Driver caches packet merge type in mlx5e_params instance which must be
in perfect sync with the netdev_feature's bit.
Prior to this patch, in certain conditions (*) LRO state was set in
mlx5e_params, while netdev_feature's bit was off. Causing the LRO to
be applied on the RQs (HW level).
(*) This can happen only on profile init (mlx5e_build_nic_params()),
when RQ expect non-linear SKB and PCI is fast enough in comparison to
link width.
Solution: remove setting of packet merge type from
mlx5e_build_nic_params() as netdev features are not updated.
Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Reviewed-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.
kernel log:
======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc8+ #2 Not tainted
------------------------------------------------------
kworker/u20:0/8 is trying to acquire lock:
ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
but task is already holding lock:
ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(¬ifier->n_head)->rwsem){++++}-{3:3}:
down_write+0x90/0x150
blocking_notifier_chain_register+0x53/0xa0
mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
mlx5_init_one+0x261/0x490 [mlx5_core]
probe_one+0x430/0x680 [mlx5_core]
local_pci_probe+0xd6/0x170
work_for_cpu_fn+0x4e/0xa0
process_one_work+0x7c2/0x1340
worker_thread+0x6f6/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
-> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
__lock_acquire+0x2fc7/0x6720
lock_acquire+0x1c1/0x550
__mutex_lock+0x12c/0x14b0
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
bus_for_each_drv+0x123/0x1a0
__device_attach+0x1a3/0x460
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
__auxiliary_device_add+0x88/0xc0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
process_one_work+0x7c2/0x1340
worker_thread+0x59d/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
*** DEADLOCK ***
4 locks held by kworker/u20:0/8:
#0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
#1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
#2: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
#3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
stack backtrace:
CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
check_noncircular+0x278/0x300
? print_circular_bug+0x460/0x460
? lock_chain_count+0x20/0x20
? register_lock_class+0x1880/0x1880
__lock_acquire+0x2fc7/0x6720
? register_lock_class+0x1880/0x1880
? register_lock_class+0x1880/0x1880
lock_acquire+0x1c1/0x550
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
__mutex_lock+0x12c/0x14b0
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? _raw_read_unlock+0x1f/0x30
? mutex_lock_io_nested+0x1320/0x1320
? __ioremap_caller.constprop.0+0x306/0x490
? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
? iounmap+0x160/0x160
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
? auxiliary_match_id+0xe9/0x140
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
? driver_allows_async_probing+0x140/0x140
bus_for_each_drv+0x123/0x1a0
? bus_for_each_dev+0x1a0/0x1a0
? lockdep_hardirqs_on_prepare+0x286/0x400
? trace_hardirqs_on+0x2d/0x100
__device_attach+0x1a3/0x460
? device_driver_attach+0x1e0/0x1e0
? kobject_uevent_env+0x22d/0xf10
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
? dev_set_name+0xab/0xe0
? __fw_devlink_link_to_suppliers+0x260/0x260
? memset+0x20/0x40
? lockdep_init_map_type+0x21a/0x7d0
__auxiliary_device_add+0x88/0xc0
? auxiliary_device_init+0x86/0xa0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
? lock_downgrade+0x6e0/0x6e0
? lockdep_hardirqs_on_prepare+0x286/0x400
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0x230/0x230
? rwlock_bug.part.0+0x90/0x90
worker_thread+0x59d/0xec0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When the driver unloads, give/reclaim_pages may fail as PF driver in
teardown flow, current code will lead to the following kernel log print
'failed reclaiming pages: err 0'.
Fix it to get same behavior as before the cited commits,
by calling mlx5_cmd_check before handling error state.
mlx5_cmd_check will verify if the returned error is an actual error
needed to be handled by the driver or not and will return an
appropriate value.
Fixes: 8d564292a166 ("net/mlx5: Remove redundant error on reclaim pages")
Fixes: 4dac2f10ada0 ("net/mlx5: Remove redundant notify fail on give pages")
Signed-off-by: Roy Novich <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The lag_lock is taken from both process and softirq contexts which results
lockdep warning[0] about potential deadlock. However, just disabling
softirqs by using *_bh spinlock API is not enough since it will cause
warning in some contexts where the lock is obtained with hard irqs
disabled. To fix the issue save current irq state, disable them before
obtaining the lock an re-enable irqs from saved state after releasing it.
[0]:
[Sun Aug 7 13:12:29 2022] ================================
[Sun Aug 7 13:12:29 2022] WARNING: inconsistent lock state
[Sun Aug 7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
[Sun Aug 7 13:12:29 2022] --------------------------------
[Sun Aug 7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[Sun Aug 7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[Sun Aug 7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_nic_enable+0x114/0x470 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_resume+0x105/0x160 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5e_probe+0xac3/0x14f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] auxiliary_bus_probe+0x9d/0xe0
[Sun Aug 7 13:12:29 2022] really_probe+0x1e0/0xaa0
[Sun Aug 7 13:12:29 2022] __driver_probe_device+0x219/0x480
[Sun Aug 7 13:12:29 2022] driver_probe_device+0x49/0x130
[Sun Aug 7 13:12:29 2022] __driver_attach+0x1e4/0x4d0
[Sun Aug 7 13:12:29 2022] bus_for_each_dev+0x11e/0x1a0
[Sun Aug 7 13:12:29 2022] bus_add_driver+0x3f4/0x5a0
[Sun Aug 7 13:12:29 2022] driver_register+0x20f/0x390
[Sun Aug 7 13:12:29 2022] __auxiliary_driver_register+0x14e/0x260
[Sun Aug 7 13:12:29 2022] mlx5e_init+0x38/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
[Sun Aug 7 13:12:29 2022] do_one_initcall+0xc4/0x400
[Sun Aug 7 13:12:29 2022] do_init_module+0x18a/0x620
[Sun Aug 7 13:12:29 2022] load_module+0x563a/0x7040
[Sun Aug 7 13:12:29 2022] __do_sys_finit_module+0x122/0x1d0
[Sun Aug 7 13:12:29 2022] do_syscall_64+0x3d/0x90
[Sun Aug 7 13:12:29 2022] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[Sun Aug 7 13:12:29 2022] irq event stamp: 3596508
[Sun Aug 7 13:12:29 2022] hardirqs last enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
[Sun Aug 7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
[Sun Aug 7 13:12:29 2022] softirqs last enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022]
other info that might help us debug this:
[Sun Aug 7 13:12:29 2022] Possible unsafe locking scenario:
[Sun Aug 7 13:12:29 2022] CPU0
[Sun Aug 7 13:12:29 2022] ----
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022] <Interrupt>
[Sun Aug 7 13:12:29 2022] lock(lag_lock);
[Sun Aug 7 13:12:29 2022]
*** DEADLOCK ***
[Sun Aug 7 13:12:29 2022] 4 locks held by swapper/0/0:
[Sun Aug 7 13:12:29 2022] #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
[Sun Aug 7 13:12:29 2022] #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
[Sun Aug 7 13:12:29 2022] #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
[Sun Aug 7 13:12:29 2022]
stack backtrace:
[Sun Aug 7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
[Sun Aug 7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[Sun Aug 7 13:12:29 2022] Call Trace:
[Sun Aug 7 13:12:29 2022] <IRQ>
[Sun Aug 7 13:12:29 2022] dump_stack_lvl+0x57/0x7d
[Sun Aug 7 13:12:29 2022] mark_lock.part.0.cold+0x5f/0x92
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? unwind_next_frame+0x1c4/0x1b50
[Sun Aug 7 13:12:29 2022] ? secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? stack_access_ok+0x1d0/0x1d0
[Sun Aug 7 13:12:29 2022] ? start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] __lock_acquire+0x1260/0x6720
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? mark_lock.part.0+0xed/0x3060
[Sun Aug 7 13:12:29 2022] ? stack_trace_save+0x91/0xc0
[Sun Aug 7 13:12:29 2022] lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x400/0x400
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] _raw_spin_lock+0x2c/0x40
[Sun Aug 7 13:12:29 2022] ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] atomic_notifier_call_chain+0xd7/0x1d0
[Sun Aug 7 13:12:29 2022] br_switchdev_fdb_notify+0xea/0x100
[Sun Aug 7 13:12:29 2022] ? br_switchdev_set_port_flag+0x310/0x310
[Sun Aug 7 13:12:29 2022] fdb_notify+0x11b/0x150
[Sun Aug 7 13:12:29 2022] br_fdb_update+0x34c/0x570
[Sun Aug 7 13:12:29 2022] ? lock_chain_count+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_fdb_add_local+0x50/0x50
[Sun Aug 7 13:12:29 2022] ? br_allowed_ingress+0x5f/0x1070
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] br_handle_frame_finish+0x786/0x18e0
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? sctp_inet_bind_verify+0x4d/0x190
[Sun Aug 7 13:12:29 2022] ? xlog_unpack_data+0x2e0/0x310
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_hook_thresh+0x227/0x380 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? setup_pre_routing+0x460/0x460 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
[Sun Aug 7 13:12:29 2022] ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
[Sun Aug 7 13:12:29 2022] br_handle_frame+0x8a9/0x1270
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? br_handle_local_finish+0x20/0x20
[Sun Aug 7 13:12:29 2022] ? bond_handle_frame+0xf9/0xac0 [bonding]
[Sun Aug 7 13:12:29 2022] ? br_handle_frame_finish+0x18e0/0x18e0
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_core+0x7c0/0x2c70
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] ? generic_xdp_tx+0x5b0/0x5b0
[Sun Aug 7 13:12:29 2022] ? __lock_acquire+0xd6f/0x6720
[Sun Aug 7 13:12:29 2022] ? register_lock_class+0x1880/0x1880
[Sun Aug 7 13:12:29 2022] ? check_chain_key+0x24a/0x580
[Sun Aug 7 13:12:29 2022] __netif_receive_skb_list_core+0x2d7/0x8a0
[Sun Aug 7 13:12:29 2022] ? lock_acquire+0x1c1/0x550
[Sun Aug 7 13:12:29 2022] ? process_backlog+0x960/0x960
[Sun Aug 7 13:12:29 2022] ? lockdep_hardirqs_on_prepare+0x129/0x400
[Sun Aug 7 13:12:29 2022] ? kvm_clock_get_cycles+0x14/0x20
[Sun Aug 7 13:12:29 2022] netif_receive_skb_list_internal+0x5f4/0xd60
[Sun Aug 7 13:12:29 2022] ? do_xdp_generic+0x150/0x150
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
[Sun Aug 7 13:12:29 2022] napi_complete_done+0x188/0x710
[Sun Aug 7 13:12:29 2022] mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
[Sun Aug 7 13:12:29 2022] ? __queue_work+0x53c/0xeb0
[Sun Aug 7 13:12:29 2022] __napi_poll+0x9f/0x540
[Sun Aug 7 13:12:29 2022] net_rx_action+0x420/0xb70
[Sun Aug 7 13:12:29 2022] ? napi_threaded_poll+0x470/0x470
[Sun Aug 7 13:12:29 2022] ? __common_interrupt+0x79/0x1a0
[Sun Aug 7 13:12:29 2022] __do_softirq+0x271/0x92c
[Sun Aug 7 13:12:29 2022] irq_exit_rcu+0x11a/0x170
[Sun Aug 7 13:12:29 2022] common_interrupt+0x7d/0xa0
[Sun Aug 7 13:12:29 2022] </IRQ>
[Sun Aug 7 13:12:29 2022] <TASK>
[Sun Aug 7 13:12:29 2022] asm_common_interrupt+0x22/0x40
[Sun Aug 7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
[Sun Aug 7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
[Sun Aug 7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
[Sun Aug 7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
[Sun Aug 7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
[Sun Aug 7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
[Sun Aug 7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
[Sun Aug 7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
[Sun Aug 7 13:12:29 2022] ? default_idle_call+0xcc/0x460
[Sun Aug 7 13:12:29 2022] default_idle_call+0xec/0x460
[Sun Aug 7 13:12:29 2022] do_idle+0x394/0x450
[Sun Aug 7 13:12:29 2022] ? arch_cpu_idle_exit+0x40/0x40
[Sun Aug 7 13:12:29 2022] cpu_startup_entry+0x19/0x20
[Sun Aug 7 13:12:29 2022] rest_init+0x156/0x250
[Sun Aug 7 13:12:29 2022] arch_call_rest_init+0xf/0x15
[Sun Aug 7 13:12:29 2022] start_kernel+0x3a7/0x3c5
[Sun Aug 7 13:12:29 2022] secondary_startup_64_no_verify+0xcd/0xdb
[Sun Aug 7 13:12:29 2022] </TASK>
Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Make sure to modify the rule for uplink forwarding only for the case
where destination vport number is MLX5_VPORT_UPLINK.
Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered.
Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and
ldev->pf[MLX5_LAG_P1].dev have valid pointers when
MLX5_LAG_FLAG_NDEVS_READY is set.
The core issue is asymmetry in setting MLX5_LAG_FLAG_NDEVS_READY and
clearing it. Setting it is done wrongly when both
ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev are set;
clearing it is done right when either of ldev->pf[i].netdev is cleared.
Consider the following scenario:
1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer
2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and
ldev->pf[MLX5_LAG_P1].netdev with valid pointers. This results in
MLX5_LAG_FLAG_NDEVS_READY is set.
3. PF0 is unloaded before setting dev->pf[MLX5_LAG_P0].netdev.
MLX5_LAG_FLAG_NDEVS_READY remains set.
Further execution of mlx5_do_bond() will result in null pointer
dereference when calling mlx5_lag_is_multipath()
This patch fixes the following call trace actually encountered:
[ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8
[ 1293.478756] #PF: supervisor read access in kernel mode
[ 1293.481320] #PF: error_code(0x0000) - not-present page
[ 1293.483686] PGD 0 P4D 0
[ 1293.484434] Oops: 0000 [#1] SMP PTI
[ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1
[ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
[ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core]
[ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8
[ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202
[ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000
[ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000
[ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c
[ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0
[ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858
[ 1293.508753] FS: 0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000
[ 1293.510782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0
[ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 8a66e4585979 ("net/mlx5: Change ownership model for lag")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When querying mlx5 non-uplink representors capabilities with ethtool
rx-vlan-offload is marked as "off [fixed]". However, it is actually always
enabled because mlx5e_params->vlan_strip_disable is 0 by default when
initializing struct mlx5e_params instance. Fix the issue by explicitly
setting the vlan_strip_disable to 'true' for non-uplink representors.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- NFS: Fix another fsync() issue after a server reboot
Bugfixes:
- NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
- NFS: Fix missing unlock in nfs_unlink()
- Add sanity checking of the file type used by __nfs42_ssc_open
- Fix a case where we're failing to set task->tk_rpc_status
Cleanups:
- Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the
fsync() fix"
* tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: RPC level errors should set task->tk_rpc_status
NFSv4.2 fix problems with __nfs42_ssc_open
NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES
NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds
NFS: Fix another fsync() issue after a server reboot
NFS: Fix missing unlock in nfs_unlink()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull idmapping fixes from Christian Brauner:
- Since Seth joined as co-maintainer for idmapped mounts we decided to
use a shared git tree. Konstantin suggested we use vfs/idmapping.git
on kernel.org under the vfs/ namespace. So this updates the tree in
the maintainers file.
- Ensure that POSIX ACLs checking, getting, and setting works correctly
for filesystems mountable with a filesystem idmapping that want to
support idmapped mounts.
Since no filesystems mountable with an fs_idmapping do yet support
idmapped mounts there is no problem. But this could change in the
future, so add a check to refuse to create idmapped mounts when the
mounter is not privileged over the mount's idmapping.
- Check that caller is privileged over the idmapping that will be
attached to a mount.
Currently no FS_USERNS_MOUNT filesystems support idmapped mounts,
thus this is not a problem as only CAP_SYS_ADMIN in init_user_ns is
allowed to set up idmapped mounts. But this could change in the
future, so add a check to refuse to create idmapped mounts when the
mounter is not privileged over the mount's idmapping.
- Fix POSIX ACLs for ntfs3. While looking at our current POSIX ACL
handling in the context of some overlayfs work I went through a range
of other filesystems checking how they handle them currently and
encountered a few bugs in ntfs3.
I've sent this some time ago and the fixes haven't been picked up
even though the pull request for other ntfs3 fixes got sent after.
This should really be fixed as right now POSIX ACLs are broken in
certain circumstances for ntfs3.
* tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
ntfs: fix acl handling
fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts
MAINTAINERS: update idmapping tree
acl: handle idmapped mounts for idmapped filesystems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fix from Jeff Layton:
"Just a single patch for a bugfix in the flock() codepath, introduced
by a patch that went in recently"
* tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: Fix dropped call to ->fl_release_private()
|
|
Commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates
CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to
use asm_volatile_goto when compiling __GEN_RMWcc.
However, asm_volatile_goto is not declared in the perf tool headers,
which causes a compilation error:
In file included from tools/arch/x86/include/asm/atomic.h:7,
from tools/include/asm/atomic.h:6,
from tools/include/linux/atomic.h:5,
from tools/include/linux/refcount.h:41,
from tools/lib/perf/include/internal/cpumap.h:5,
from tools/perf/util/cpumap.h:7,
from tools/perf/util/env.h:7,
from tools/perf/util/header.h:12,
from pmu-events/pmu-events.c:9:
tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’:
tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration]
asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
^~~~~~~~~~~~~~~~~
Define asm_volatile_goto in compiler_types.h if not declared, like the
main kernel header files do.
Fixes: a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Dylan and Jens reported a problem where they had an io_uring test that
was returning short reads, and bisected it to ee5b46a353af ("btrfs:
increase direct io read size limit to 256 sectors").
The root cause is their test was doing larger reads via io_uring with
NOWAIT and async. This was triggering a page fault during the direct
read, however the first page was able to work just fine and thus we
submitted a 4k read for a larger iocb.
Btrfs allows for partial IO's in this case specifically because we don't
allow page faults, and thus we'll attempt to do any io that we can,
submit what we could, come back and fault in the rest of the range and
try to do the remaining IO.
However for !is_sync_kiocb() we'll call ->ki_complete() as soon as the
partial dio is done, which is incorrect. In the sync case we can exit
the iomap code, submit more io's, and return with the amount of IO we
were able to complete successfully.
We were always doing short reads in this case, but for NOWAIT we were
getting saved by the fact that we were limiting direct reads to
sectorsize, and if we were larger than that we would return EAGAIN.
Fix the regression by simply returning EAGAIN in the NOWAIT case with
larger reads, that way io_uring can retry and get the larger IO and have
the fault logic handle everything properly.
This still leaves the AIO short read case, but that existed before this
change. The way to properly fix this would be to handle partial iocb
completions, but that's a lot of work, for now deal with the regression
in the most straightforward way possible.
Reported-by: Dylan Yudaken <[email protected]>
Fixes: ee5b46a353af ("btrfs: increase direct io read size limit to 256 sectors")
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
[BUG]
Zygo reported on latest development branch, he could hit
ASSERT()/BUG_ON() caused crash when doing RAID5 recovery (intentionally
corrupt one disk, and let btrfs to recover the data during read/scrub).
And The following minimal reproducer can cause extent state leakage at
rmmod time:
mkfs.btrfs -f -d raid5 -m raid5 $dev1 $dev2 $dev3 -b 1G > /dev/null
mount $dev1 $mnt
fsstress -w -d $mnt -n 25 -s 1660807876
sync
fssum -A -f -w /tmp/fssum.saved $mnt
umount $mnt
# Wipe the dev1 but keeps its super block
xfs_io -c "pwrite -S 0x0 1m 1023m" $dev1
mount $dev1 $mnt
fssum -r /tmp/fssum.saved $mnt > /dev/null
umount $mnt
rmmod btrfs
This will lead to the following extent states leakage:
BTRFS: state leak: start 499712 end 503807 state 5 in tree 1 refs 1
BTRFS: state leak: start 495616 end 499711 state 5 in tree 1 refs 1
BTRFS: state leak: start 491520 end 495615 state 5 in tree 1 refs 1
BTRFS: state leak: start 487424 end 491519 state 5 in tree 1 refs 1
BTRFS: state leak: start 483328 end 487423 state 5 in tree 1 refs 1
BTRFS: state leak: start 479232 end 483327 state 5 in tree 1 refs 1
BTRFS: state leak: start 475136 end 479231 state 5 in tree 1 refs 1
BTRFS: state leak: start 471040 end 475135 state 5 in tree 1 refs 1
[CAUSE]
Since commit 7aa51232e204 ("btrfs: pass a btrfs_bio to
btrfs_repair_one_sector"), we always use btrfs_bio->file_offset to
determine the file offset of a page.
But that usage assume that, one bio has all its page having a continuous
page offsets.
Unfortunately that's not true, btrfs only requires the logical bytenr
contiguous when assembling its bios.
From above script, we have one bio looks like this:
fssum-27671 submit_one_bio: bio logical=217739264 len=36864
fssum-27671 submit_one_bio: r/i=5/261 page_offset=466944 <<<
fssum-27671 submit_one_bio: r/i=5/261 page_offset=724992 <<<
fssum-27671 submit_one_bio: r/i=5/261 page_offset=729088
fssum-27671 submit_one_bio: r/i=5/261 page_offset=733184
fssum-27671 submit_one_bio: r/i=5/261 page_offset=737280
fssum-27671 submit_one_bio: r/i=5/261 page_offset=741376
fssum-27671 submit_one_bio: r/i=5/261 page_offset=745472
fssum-27671 submit_one_bio: r/i=5/261 page_offset=749568
fssum-27671 submit_one_bio: r/i=5/261 page_offset=753664
Note that the 1st and the 2nd page has non-contiguous page offsets.
This means, at repair time, we will have completely wrong file offset
passed in:
kworker/u32:2-19927 btrfs_repair_one_sector: r/i=5/261 page_off=729088 file_off=475136 bio_offset=8192
Since the file offset is incorrect, we latter incorrectly set the extent
states, and no way to really release them.
Thus later it causes the leakage.
In fact, this can be even worse, since the file offset is incorrect, we
can hit cases like the incorrect file offset belongs to a HOLE, and
later cause btrfs_num_copies() to trigger error, finally hit
BUG_ON()/ASSERT() later.
[FIX]
Add an extra condition in btrfs_bio_add_page() for uncompressed IO.
Now we will have more strict requirement for bio pages:
- They should all have the same mapping
(the mapping check is already implied by the call chain)
- Their logical bytenr should be adjacent
This is the same as the old condition.
- Their page_offset() (file offset) should be adjacent
This is the new check.
This would result a slightly increased amount of bios from btrfs
(needs holes and inside the same stripe boundary to trigger).
But this would greatly reduce the confusion, as it's pretty common
to assume a btrfs bio would only contain continuous page cache.
Later we may need extra cleanups, as we no longer needs to handle gaps
between page offsets in endio functions.
Currently this should be the minimal patch to fix commit 7aa51232e204
("btrfs: pass a btrfs_bio to btrfs_repair_one_sector").
Reported-by: Zygo Blaxell <[email protected]>
Fixes: 7aa51232e204 ("btrfs: pass a btrfs_bio to btrfs_repair_one_sector")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
When punching a hole into a file range that is adjacent with a hole and we
are not using the no-holes feature, we expand the range of the adjacent
file extent item that represents a hole, to save metadata space.
However we don't update the generation of hole file extent item, which
means a full fsync will not log that file extent item if the fsync happens
in a later transaction (since commit 7f30c07288bb9e ("btrfs: stop copying
old file extents when doing a full fsync")).
For example, if we do this:
$ mkfs.btrfs -f -O ^no-holes /dev/sdb
$ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0xab 2M 2M" /mnt/foobar
$ sync
We end up with 2 file extent items in our file:
1) One that represents the hole for the file range [0, 2M), with a
generation of 7;
2) Another one that represents an extent covering the range [2M, 4M).
After that if we do the following:
$ xfs_io -c "fpunch 2M 2M" /mnt/foobar
We end up with a single file extent item in the file, which represents a
hole for the range [0, 4M) and with a generation of 7 - because we end
dropping the data extent for range [2M, 4M) and then update the file
extent item that represented the hole at [0, 2M), by increasing
length from 2M to 4M.
Then doing a full fsync and power failing:
$ xfs_io -c "fsync" /mnt/foobar
<power failure>
will result in the full fsync not logging the file extent item that
represents the hole for the range [0, 4M), because its generation is 7,
which is lower than the generation of the current transaction (8).
As a consequence, after mounting again the filesystem (after log replay),
the region [2M, 4M) does not have a hole, it still points to the
previous data extent.
So fix this by always updating the generation of existing file extent
items representing holes when we merge/expand them. This solves the
problem and it's the same approach as when we merge prealloc extents that
got written (at btrfs_mark_extent_written()). Setting the generation to
the current transaction's generation is also what we do when merging
the new hole extent map with the previous one or the next one.
A test case for fstests, covering both cases of hole file extent item
merging (to the left and to the right), will be sent soon.
Fixes: 7f30c07288bb9e ("btrfs: stop copying old file extents when doing a full fsync")
CC: [email protected] # 5.18+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In btrfs_get_dev_args_from_path(), btrfs_get_bdev_and_sb() can fail if
the path is invalid. In this case, btrfs_get_dev_args_from_path()
returns directly without freeing args->uuid and args->fsid allocated
before, which causes memory leak.
To fix these possible leaks, when btrfs_get_bdev_and_sb() fails,
btrfs_put_dev_args_from_path() is called to clean up the memory.
Reported-by: TOTE Robot <[email protected]>
Fixes: faa775c41d655 ("btrfs: add a btrfs_get_dev_args_from_path helper")
CC: [email protected] # 5.16
Reviewed-by: Boris Burkov <[email protected]>
Signed-off-by: Zixuan Fu <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
For a filesystem which has btrfs read-only property set to true, all
write operations including xattr should be denied. However, security
xattr can still be changed even if btrfs ro property is true.
This happens because xattr_permission() does not have any restrictions
on security.*, system.* and in some cases trusted.* from VFS and
the decision is left to the underlying filesystem. See comments in
xattr_permission() for more details.
This patch checks if the root is read-only before performing the set
xattr operation.
Testcase:
DEV=/dev/vdb
MNT=/mnt
mkfs.btrfs -f $DEV
mount $DEV $MNT
echo "file one" > $MNT/f1
setfattr -n "security.one" -v 2 $MNT/f1
btrfs property set /mnt ro true
setfattr -n "security.one" -v 1 $MNT/f1
umount $MNT
CC: [email protected] # 4.9+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Ice driver allocates per cpu XDP queues so that redirect path can safely
use smp_processor_id() as an index to the array. At the same time
though, XDP rings are used to pick NAPI context to call napi_schedule()
or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
num_possible_cpus() of underlying platform is 44, then this means queue
vectors with correlated NAPI contexts will carry several XDP queues.
This in turn can result in a broken behavior where NAPI context of
interest will never be scheduled and AF_XDP socket will not process any
traffic.
To fix this, let us change the way how XDP rings are assigned to Rx
rings and use this information later on when setting
ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
queue vector and walk through Tx ring's linked list. Once we stumble
upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.
Previous [0] approach of fixing this issue was for txonly scenario
because of the described grouping of XDP rings across queue vectors. So,
relying on Rx ring meant that NAPI context could be scheduled with a
queue vector without XDP ring with associated XSK pool.
[0]: https://lore.kernel.org/netdev/[email protected]/
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Fix the following scenario:
1. ethtool -L $IFACE rx 8 tx 96
2. xdpsock -q 10 -t -z
Above refers to a case where user would like to attach XSK socket in
txonly mode at a queue id that does not have a corresponding Rx queue.
At this moment ice's XSK logic is tightly bound to act on a "queue pair",
e.g. both Tx and Rx queues at a given queue id are disabled/enabled and
both of them will get XSK pool assigned, which is broken for the presented
queue configuration. This results in the splat included at the bottom,
which is basically an OOB access to Rx ring array.
To fix this, allow using the ids only in scope of "combined" queues
reported by ethtool. However, logic should be rewritten to allow such
configurations later on, which would end up as a complete rewrite of the
control path, so let us go with this temporary fix.
[420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
[420160.566359] #PF: supervisor read access in kernel mode
[420160.572657] #PF: error_code(0x0000) - not-present page
[420160.579002] PGD 0 P4D 0
[420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
[420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G OE 5.19.0-rc7+ #10
[420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420160.693211] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420160.703645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420160.740707] PKRU: 55555554
[420160.745960] Call Trace:
[420160.750962] <TASK>
[420160.755597] ? kmalloc_large_node+0x79/0x90
[420160.762703] ? __kmalloc_node+0x3f5/0x4b0
[420160.769341] xp_assign_dev+0xfd/0x210
[420160.775661] ? shmem_file_read_iter+0x29a/0x420
[420160.782896] xsk_bind+0x152/0x490
[420160.788943] __sys_bind+0xd0/0x100
[420160.795097] ? exit_to_user_mode_prepare+0x20/0x120
[420160.802801] __x64_sys_bind+0x16/0x20
[420160.809298] do_syscall_64+0x38/0x90
[420160.815741] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[420160.823731] RIP: 0033:0x7fa86a0dd2fb
[420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
[420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
[420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
[420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
[420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
[420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
[420160.919817] </TASK>
[420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
[420160.977576] CR2: 0000000000000082
[420160.985037] ---[ end trace 0000000000000000 ]---
[420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
[420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
[420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
[420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
[420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
[420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
[420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
[420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
[420161.201505] FS: 00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
[420161.213628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
[420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[420161.257052] PKRU: 55555554
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
When the pn532 uart device is detaching, the pn532_uart_remove()
is called. But there are no functions in pn532_uart_remove() that
could delete the cmd_timeout timer, which will cause use-after-free
bugs. The process is shown below:
(thread 1) | (thread 2)
| pn532_uart_send_frame
pn532_uart_remove | mod_timer(&pn532->cmd_timeout,...)
... | (wait a time)
kfree(pn532) //FREE | pn532_cmd_timeout
| pn532_uart_send_frame
| pn532->... //USE
This patch adds del_timer_sync() in pn532_uart_remove() in order to
prevent the use-after-free bugs. What's more, the pn53x_unregister_nfc()
is well synchronized, it sets nfc_dev->shutting_down to true and there
are no syscalls could restart the cmd_timeout timer.
Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The change that made IPMODIFY and DIRECT ops work together needed access
to the ops_references_ip() function, which it pulled out of the module
only code. But now if both CONFIG_MODULES and
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not set, we get the below
warning:
‘ops_references_rec’ defined but not used.
Since ops_references_rec() only calls ops_references_ip() replace the
usage of ops_references_rec() with ops_references_ip() and encompass the
function with an #ifdef of DIRECT_CALLS || MODULES being defined.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function")
Signed-off-by: Wang Jingjin <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Hayes Wang says:
====================
r8152: fix flow control settings
These patches fix the settings of RX FIFO about flow control.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The RX FIFO would be changed when suspending, so the related settings
have to be modified, too. Otherwise, the flow control would work
abnormally.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216333
Reported-by: Mark Blakeney <[email protected]>
Fixes: cdf0b86b250f ("r8152: fix a WOL issue")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The units of PLA_RX_FIFO_FULL and PLA_RX_FIFO_EMPTY are 16 bytes.
Fixes: 195aae321c82 ("r8152: support new chips")
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 3b3fd068c56e3fbea30090859216a368398e39bf added NULL check for
`rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to
check rose_loopback_neigh->loopback.
It thus prevents *all* rose connect.
The reason is that a special rose_neigh loopback has a NULL device.
/proc/net/rose_neigh illustrates it via rose_neigh_show() function :
[...]
seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu",
rose_neigh->number,
(rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign),
rose_neigh->dev ? rose_neigh->dev->name : "???",
rose_neigh->count,
/proc/net/rose_neigh displays special rose_loopback_neigh->loopback as
callsign RSLOOP-0:
addr callsign dev count use mode restart t0 tf digipeaters
00001 RSLOOP-0 ??? 1 2 DCE yes 0 0
By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called
even in case rose_loopback_neigh->dev is NULL. This repairs rose connections.
Verification with rose client application FPAC:
FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h)
F6BVP-4 (Commands = ?) : u
Users - AX.25 Level 2 sessions :
Port Callsign Callsign AX.25 state ROSE state NetRom status
axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------
Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()")
Signed-off-by: Bernard Pidoux <[email protected]>
Suggested-by: Francois Romieu <[email protected]>
Cc: Thomas DL9SAU Osterried <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|