aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-03-15perf: Fix check before add_event_to_groups() in perf_group_detach()Budimir Markovic1-1/+1
Events should only be added to a groups rb tree if they have not been removed from their context by list_del_event(). Since remove_on_exec made it possible to call list_del_event() on individual events before they are detached from their group, perf_group_detach() should check each sibling's attach_state before calling add_event_to_groups() on it. Fixes: 2e498d0a74e5 ("perf: Add support for event removal on exec") Signed-off-by: Budimir Markovic <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/ZBFzvQV9tEqoHEtH@gentoo
2023-03-15perf: fix perf_event_context->timeSong Liu1-1/+1
Time readers rely on perf_event_context->[time|timestamp|timeoffset] to get accurate time_enabled and time_running for an event. The difference between ctx->timestamp and ctx->time is the among of time when the context is not enabled. __update_context_time(ctx, false) is used to increase timestamp, but not time. Therefore, it should only be called in ctx_sched_in() when EVENT_TIME was not enabled. Fixes: 09f5e7dc7ad7 ("perf: Fix perf_event_read_local() time") Signed-off-by: Song Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-03-15perf/core: Fix perf_output_begin parameter is incorrectly invoked in ↵Yang Jihong1-1/+1
perf_event_bpf_output syzkaller reportes a KASAN issue with stack-out-of-bounds. The call trace is as follows: dump_stack+0x9c/0xd3 print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 __perf_event_header__init_id+0x34/0x290 perf_event_header__init_id+0x48/0x60 perf_output_begin+0x4a4/0x560 perf_event_bpf_output+0x161/0x1e0 perf_iterate_sb_cpu+0x29e/0x340 perf_iterate_sb+0x4c/0xc0 perf_event_bpf_event+0x194/0x2c0 __bpf_prog_put.constprop.0+0x55/0xf0 __cls_bpf_delete_prog+0xea/0x120 [cls_bpf] cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf] process_one_work+0x3c2/0x730 worker_thread+0x93/0x650 kthread+0x1b8/0x210 ret_from_fork+0x1f/0x30 commit 267fb27352b6 ("perf: Reduce stack usage of perf_output_begin()") use on-stack struct perf_sample_data of the caller function. However, perf_event_bpf_output uses incorrect parameter to convert small-sized data (struct perf_bpf_event) into large-sized data (struct perf_sample_data), which causes memory overwriting occurs in __perf_event_header__init_id. Fixes: 267fb27352b6 ("perf: Reduce stack usage of perf_output_begin()") Signed-off-by: Yang Jihong <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-03-15btrfs: zoned: drop space_info->active_total_bytesNaohiro Aota4-43/+9
The space_info->active_total_bytes is no longer necessary as we now count the region of newly allocated block group as zone_unusable. Drop its usage. Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes") CC: [email protected] # 6.1+ Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: zoned: count fresh BG region as zone unusableNaohiro Aota2-6/+26
The naming of space_info->active_total_bytes is misleading. It counts not only active block groups but also full ones which are previously active but now inactive. That confusion results in a bug not counting the full BGs into active_total_bytes on mount time. For a background, there are three kinds of block groups in terms of activation. 1. Block groups never activated 2. Block groups currently active 3. Block groups previously active and currently inactive (due to fully written or zone finish) What we really wanted to exclude from "total_bytes" is the total size of BGs #1. They seem empty and allocatable but since they are not activated, we cannot rely on them to do the space reservation. And, since BGs #1 never get activated, they should have no "used", "reserved" and "pinned" bytes. OTOH, BGs #3 can be counted in the "total", since they are already full we cannot allocate from them anyway. For them, "total_bytes == used + reserved + pinned + zone_unusable" should hold. Tracking #2 and #3 as "active_total_bytes" (current implementation) is confusing. And, tracking #1 and subtract that properly from "total_bytes" every time you need space reservation is cumbersome. Instead, we can count the whole region of a newly allocated block group as zone_unusable. Then, once that block group is activated, release [0 .. zone_capacity] from the zone_unusable counters. With this, we can eliminate the confusing ->active_total_bytes and the code will be common among regular and the zoned mode. Also, no additional counter is needed with this approach. Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes") CC: [email protected] # 6.1+ Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: use temporary variable for space_info in btrfs_update_block_groupJosef Bacik1-10/+12
We do cache->space_info->counter += num_bytes; everywhere in here. This is makes the lines longer than they need to be, and will be especially noticeable when we add the active tracking in, so add a temp variable for the space_info so this is cleaner. Reviewed-by: Naohiro Aota <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKINGJosef Bacik3-8/+4
This flag only gets set when we're doing active zone tracking, and we're going to need to use this flag for things related to this behavior. Rename the flag to represent what it actually means for the file system so it can be used in other ways and still make sense. Reviewed-by: Naohiro Aota <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profileNaohiro Aota1-2/+12
btrfs_can_activate_zone() returns true if at least one device has one zone available for activation. This is OK for the single profile, but not OK for DUP profile. We need two zones to create a DUP block group. Fix it by properly handling the case with the profile flags. Fixes: 265f7237dd25 ("btrfs: zoned: allow DUP on meta-data block groups") CC: [email protected] # 6.1+ Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filenameSweet Tea Dorminy1-1/+6
Commit 1ec49744ba83 ("btrfs: turn on -Wmaybe-uninitialized") exposed that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename() only returns negative error values or 0. This ultimately results in a maybe-uninitialized warning in btrfs_lookup_dentry(). Change to only return negative error values or 0 from fscrypt_setup_filename() at the relevant call site, and assert that no positive error codes are returned (which would have wider implications involving other users). Reported-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Sweet Tea Dorminy <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15btrfs: handle missing chunk mapping more gracefullyQu Wenruo1-1/+2
[BUG] During my scrub rework, I did a stupid thing like this: bio->bi_iter.bi_sector = stripe->logical; btrfs_submit_bio(fs_info, bio, stripe->mirror_num); Above bi_sector assignment is using logical address directly, which lacks ">> SECTOR_SHIFT". This results a read on a range which has no chunk mapping. This results the following crash: BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536 assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387 Sure this is all my fault, but this shows a possible problem in real world, that some bit flip in file extents/tree block can point to unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT is not configured, cause invalid pointer access. [PROBLEMS] In the above call chain, we just don't handle the possible error from btrfs_get_chunk_map() inside __btrfs_map_block(). [FIX] The fix is straightforward, replace the ASSERT() with proper error handling (callers handle errors already). Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-03-15Merge tag 'linux-kselftest-fixes-6.3-rc3' of ↵Linus Torvalds2-4/+11
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "A fix to amd-pstate test Makefile and a fix to LLVM build for x86 in kselftest common lib.mk" * tag 'linux-kselftest-fixes-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: fix LLVM build for i386 and x86_64 selftests: amd-pstate: fix TEST_FILES
2023-03-15Merge branch 'md-fixes' of ↵Jens Axboe2-9/+12
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.3 Pull MD fixes from Song: "This set contains two fixes for old issues (by Neil) and one fix for 6.3 (by Xiao)." * 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: select BLOCK_LEGACY_AUTOLOAD md: avoid signed overflow in slot_store() md: Free resources in __md_stop
2023-03-15md: select BLOCK_LEGACY_AUTOLOADNeilBrown1-0/+4
When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to activate new arrays unless "CREATE names=yes" appears in mdadm.conf As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD for when MD is selected - at least until mdadm is updated and the updates widely available. Cc: [email protected] # v5.18+ Fixes: fbdee71bb5d8 ("block: deprecate autoloading based on dev_t") Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2023-03-15block: count 'ios' and 'sectors' when io is done for bio-based deviceYu Kuai4-20/+15
While using iostat for raid, I observed very strange 'await' occasionally, and turns out it's due to that 'ios' and 'sectors' is counted in bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct(). I'm not sure why they are ccounted like that but I think this behaviour is obviously wrong because user will get wrong disk stats. Fix the problem by counting 'ios' and 'sectors' when io is done, like what rq-based device does. Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15block: sunvdc: add check for mdesc_grab() returning NULLLiang He1-0/+2
In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: style cleanup] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15nvmet: avoid potential UAF in nvmet_req_complete()Damien Le Moal1-1/+3
An nvme target ->queue_response() operation implementation may free the request passed as argument. Such implementation potentially could result in a use after free of the request pointer when percpu_ref_put() is called in nvmet_req_complete(). Avoid such problem by using a local variable to save the sq pointer before calling __nvmet_req_complete(), thus avoiding dereferencing the req pointer after that function call. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-trace: show more opcode namesMinwoo Im1-0/+5
We have more commands to show in the trace. Sync up. Signed-off-by: Minwoo Im <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-tcp: add nvme-tcp pdu size build protectionSagi Grimberg1-0/+9
Make sure that we don't somehow mess up the wire structures in the spec. Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-tcp: fix opcode reporting in the timeout handlerSagi Grimberg1-6/+18
For non in-capsule writes we reuse the request pdu space for a h2cdata pdu in order to avoid over allocating space (either preallocate or dynamically upon receving an r2t pdu). However if the request times out the core expects to find the opcode in the start of the request, which we override. In order to prevent that, without sacrificing additional 24 bytes per request, we just use the tail of the command pdu space instead (last 24 bytes from the 72 bytes command pdu). That should make the command opcode always available, and we get away from allocating more space. If in the future we would need the last 24 bytes of the nvme command available we would need to allocate a dedicated space for it in the request, but until then we can avoid doing so. Reported-by: Akinobu Mita <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Tested-by: Akinobu Mita <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620Philipp Geulen1-0/+2
Added a quirk to fix Lexar NM620 1TB SSD reporting duplicate NGUIDs. Signed-off-by: Philipp Geulen <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000Elmer Miroslav Mosher Golovin1-0/+2
Added a quirk to fix the Netac NV3000 SSD reporting duplicate NGUIDs. Cc: <[email protected]> Signed-off-by: Elmer Miroslav Mosher Golovin <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme-pci: fixing memory leak in probe teardown pathIrvin Cote1-0/+1
In case the nvme_probe teardown path is triggered the ctrl ref count does not reach 0 thus creating a memory leak upon failure of nvme_probe. Signed-off-by: Irvin Cote <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15nvme: fix handling single range discard requestMing Lei1-9/+19
When investigating one customer report on warning in nvme_setup_discard, we observed the controller(nvme/tcp) actually exposes queue_max_discard_segments(req->q) == 1. Obviously the current code can't handle this situation, since contiguity merge like normal RW request is taken. Fix the issue by building range from request sector/nr_sectors directly. Fixes: b35ba01ea697 ("nvme: support ranged discard requests") Signed-off-by: Ming Lei <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15MAINTAINERS: repair malformed T: entries in NVM EXPRESS DRIVERSLukas Bulwahn1-4/+4
The T: entries shall be composed of a SCM tree type (git, hg, quilt, stgit or topgit) and location. Add the SCM tree type to the T: entry, and reorder the file entries in alphabetical order. Fixes: b508fc354f6d ("nvme: update maintainers information") Signed-off-by: Lukas Bulwahn <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2023-03-15io_uring/sqpoll: Do not set PF_NO_SETAFFINITY on sqpoll threadsMichal Koutný1-1/+0
Users may specify a CPU where the sqpoll thread would run. This may conflict with cpuset operations because of strict PF_NO_SETAFFINITY requirement. That flag is unnecessary for polling "kernel" threads, see the reasoning in commit 01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers"). Drop the flag on poll threads too. Fixes: 01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers") Link: https://lore.kernel.org/all/20230314162559.pnyxdllzgw7jozgx@blackpad/ Signed-off-by: Michal Koutný <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15block: null_blk: cleanup null_queue_rq()Damien Le Moal1-15/+14
Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return. Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Pankaj Raghav <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15block: null_blk: Fix handling of fake timeout requestDamien Le Moal1-3/+3
When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timedout request. The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same meachanism as null_blk internal timeout feature, using the fake_timeout field of null_blk commands. Reported-by: Akinobu Mita <[email protected]> Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-03-15wifi: mac80211: Serialize ieee80211_handle_wake_tx_queue()Alexander Wetzel3-0/+8
ieee80211_handle_wake_tx_queue must not run concurrent multiple times. It calls ieee80211_txq_schedule_start() and the drivers migrated to iTXQ do not expect overlapping drv_tx() calls. This fixes 'c850e31f79f0 ("wifi: mac80211: add internal handler for wake_tx_queue")', which introduced ieee80211_handle_wake_tx_queue. Drivers started to use it with 'a790cc3a4fad ("wifi: mac80211: add wake_tx_queue callback to drivers")'. But only after fixing an independent bug with '4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")' problematic concurrent calls really happened and exposed the initial issue. Fixes: c850e31f79f0 ("wifi: mac80211: add internal handler for wake_tx_queue") Reported-by: Thomas Mann <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217119 Link: https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/[email protected]> CC: <[email protected]> Signed-off-by: Alexander Wetzel <[email protected]> Link: https://lore.kernel.org/r/[email protected] [add missing spin_lock_init() noticed by Felix] Signed-off-by: Johannes Berg <[email protected]>
2023-03-15wifi: mwifiex: mark OF related data as maybe unusedKrzysztof Kozlowski2-2/+2
The driver can be compile tested with !CONFIG_OF making certain data unused: drivers/net/wireless/marvell/mwifiex/sdio.c:498:34: error: ‘mwifiex_sdio_of_match_table’ defined but not used [-Werror=unused-const-variable=] drivers/net/wireless/marvell/mwifiex/pcie.c:175:34: error: ‘mwifiex_pcie_of_match_table’ defined but not used [-Werror=unused-const-variable=] Signed-off-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-15powerpc/mm: Fix false detection of read faultsRussell Currey1-3/+8
To support detection of read faults with Radix execute-only memory, the vma_is_accessible() check in access_error() (which checks for PROT_NONE) was replaced with a check to see if VM_READ was missing, and if so, returns true to assert the fault was caused by a bad read. This is incorrect, as it ignores that both VM_WRITE and VM_EXEC imply read on powerpc, as defined in protection_map[]. This causes mappings containing VM_WRITE or VM_EXEC without VM_READ to misreport the cause of page faults, since the MMU is still allowing reads. Correct this by restoring the original vma_is_accessible() check for PROT_NONE mappings, and adding a separate check for Radix PROT_EXEC-only mappings. Fixes: 395cac7752b9 ("powerpc/mm: Support execute-only memory on the Radix MMU") Reported-by: Michal Suchánek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Benjamin Gray <[email protected]> Signed-off-by: Russell Currey <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-03-15drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion againMarek Szyprowski1-1/+1
devm_regulator_get_enable_optional() returns -ENODEV if requested optional regulator is not present. Adjust code for that, because in the 67d0a30128c9 I've incorrectly assumed that it also returns 0 when regulator is not present. Reported-by: Ricardo Cañuelo <[email protected]> Fixes: 67d0a30128c9 ("drm/meson: dw-hdmi: Fix devm_regulator_*get_enable*() conversion") Signed-off-by: Marek Szyprowski <[email protected]> Acked-by: Martin Blumenstingl <[email protected]> Acked-by: Neil Armstrong <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-03-15drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdocLiu Ying1-2/+2
The returned array size for input formats is set through atomic_get_input_bus_fmts()'s 'num_input_fmts' argument, so use 'num_input_fmts' to represent the array size in the function's kdoc, not 'num_output_fmts'. Fixes: 91ea83306bfa ("drm/bridge: Fix the bridge kernel doc") Fixes: f32df58acc68 ("drm/bridge: Add the necessary bits to support bus format negotiation") Signed-off-by: Liu Ying <[email protected]> Reviewed-by: Robert Foss <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-03-15Merge branch 'mtk_eth_soc-SGMII-fixes'David S. Miller2-12/+20
Daniel Golle says: ==================== net: ethernet: mtk_eth_soc: minor SGMII fixes This small series brings two minor fixes for the SGMII unit found in MediaTek's router SoCs. The first patch resets the PCS internal state machine on major configuration changes, just like it is also done in MediaTek's SDK. The second patch makes sure we only write values and restart AN if actually needed, thus preventing unnesseray loss of an existing link in some cases. Both patches have previously been submitted as part of the series "net: ethernet: mtk_eth_soc: various enhancements" which grew a bit too big and it has correctly been criticized that some of the patches should rather go as fixes to net-next. This new series tries to address this. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-03-15net: ethernet: mtk_eth_soc: only write values if neededDaniel Golle1-12/+12
Only restart auto-negotiation and write link timer if actually necessary. This prevents losing the link in case of minor changes. Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII") Reviewed-by: Russell King (Oracle) <[email protected]> Tested-by: Bjørn Mork <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-15net: ethernet: mtk_eth_soc: reset PCS stateDaniel Golle2-0/+8
Reset the internal PCS state machine when changing interface mode. This prevents confusing the state machine when changing interface modes, e.g. from SGMII to 2500Base-X or vice-versa. Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII") Reviewed-by: Russell King (Oracle) <[email protected]> Tested-by: Bjørn Mork <[email protected]> Signed-off-by: Daniel Golle <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-15net: usb: smsc75xx: Limit packet length to skb->lenSzymon Heidrich1-1/+2
Packet length retrieved from skb data may be larger than the actual socket buffer length (up to 9026 bytes). In such case the cloned skb passed up the network stack will leak kernel memory contents. Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Szymon Heidrich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-15Merge branch 'net-smc-fixes'David S. Miller2-1/+2
Wenjia Zhang says: ==================== net/smc: Fixes 2023-03-01 The 1st patch solves the problem that CLC message initialization was not properly reversed in error handling path. And the 2nd one fixes the possible deadlock triggered by cancel_delayed_work_sync(). ==================== Signed-off-by: David S. Miller <[email protected]>
2023-03-15net/smc: Fix device de-init sequenceStefan Raspl1-0/+1
CLC message initialization was not properly reversed in error handling path. Reported-and-suggested-by: Alexander Gordeev <[email protected]> Signed-off-by: Stefan Raspl <[email protected]> Signed-off-by: Wenjia Zhang <[email protected]> Reviewed-by: Tony Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-15net/smc: fix deadlock triggered by cancel_delayed_work_syn()Wenjia Zhang1-1/+1
The following LOCKDEP was detected: Workqueue: events smc_lgr_free_work [smc] WARNING: possible circular locking dependency detected 6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted ------------------------------------------------------ kworker/3:0/176251 is trying to acquire lock: 00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}, at: __flush_workqueue+0x7a/0x4f0 but task is already holding lock: 0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_work+0x76/0xf0 __cancel_work_timer+0x170/0x220 __smc_lgr_terminate.part.0+0x34/0x1c0 [smc] smc_connect_rdma+0x15e/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #3 (smc_client_lgr_pending){+.+.}-{3:3}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __mutex_lock+0x96/0x8e8 mutex_lock_nested+0x32/0x40 smc_connect_rdma+0xa4/0x418 [smc] __smc_connect+0x234/0x480 [smc] smc_connect+0x1d6/0x230 [smc] __sys_connect+0x90/0xc0 __do_sys_socketcall+0x186/0x370 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #2 (sk_lock-AF_SMC){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 lock_sock_nested+0x46/0xa8 smc_tx_work+0x34/0x50 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}: __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 process_one_work+0x2bc/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 -> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}: check_prev_add+0xd8/0xe88 validate_chain+0x70c/0xb20 __lock_acquire+0x58e/0xbd8 lock_acquire.part.0+0xe2/0x248 lock_acquire+0xac/0x1c8 __flush_workqueue+0xaa/0x4f0 drain_workqueue+0xaa/0x158 destroy_workqueue+0x44/0x2d8 smc_lgr_free+0x9e/0xf8 [smc] process_one_work+0x30c/0x730 worker_thread+0x62/0x420 kthread+0x138/0x150 __ret_from_fork+0x3c/0x58 ret_from_fork+0xa/0x40 other info that might help us debug this: Chain exists of: (wq_completion)smc_tx_wq-00000000#2 --> smc_client_lgr_pending --> (work_completion)(&(&lgr->free_work)->work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&lgr->free_work)->work)); lock(smc_client_lgr_pending); lock((work_completion) (&(&lgr->free_work)->work)); lock((wq_completion)smc_tx_wq-00000000#2); *** DEADLOCK *** 2 locks held by kworker/3:0/176251: #0: 0000000080183548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x232/0x730 #1: 0000037fffe97dc8 ((work_completion) (&(&lgr->free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x232/0x730 stack backtrace: CPU: 3 PID: 176251 Comm: kworker/3:0 Not tainted Hardware name: IBM 8561 T01 701 (z/VM 7.2.0) Call Trace: [<000000002983c3e4>] dump_stack_lvl+0xac/0x100 [<0000000028b477ae>] check_noncircular+0x13e/0x160 [<0000000028b48808>] check_prev_add+0xd8/0xe88 [<0000000028b49cc4>] validate_chain+0x70c/0xb20 [<0000000028b4bd26>] __lock_acquire+0x58e/0xbd8 [<0000000028b4cf6a>] lock_acquire.part.0+0xe2/0x248 [<0000000028b4d17c>] lock_acquire+0xac/0x1c8 [<0000000028addaaa>] __flush_workqueue+0xaa/0x4f0 [<0000000028addf9a>] drain_workqueue+0xaa/0x158 [<0000000028ae303c>] destroy_workqueue+0x44/0x2d8 [<000003ff8029af26>] smc_lgr_free+0x9e/0xf8 [smc] [<0000000028adf3d4>] process_one_work+0x30c/0x730 [<0000000028adf85a>] worker_thread+0x62/0x420 [<0000000028aeac50>] kthread+0x138/0x150 [<0000000028a63914>] __ret_from_fork+0x3c/0x58 [<00000000298503da>] ret_from_fork+0xa/0x40 INFO: lockdep is turned off. =================================================================== This deadlock occurs because cancel_delayed_work_sync() waits for the work(&lgr->free_work) to finish, while the &lgr->free_work waits for the work(lgr->tx_wq), which needs the sk_lock-AF_SMC, that is already used under the mutex_lock. The solution is to use cancel_delayed_work() instead, which kills off a pending work. Fixes: a52bcc919b14 ("net/smc: improve termination processing") Signed-off-by: Wenjia Zhang <[email protected]> Reviewed-by: Jan Karcher <[email protected]> Reviewed-by: Karsten Graul <[email protected]> Reviewed-by: Tony Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-15mlxsw: spectrum: Fix incorrect parsing depth after reloadIdo Schimmel2-0/+16
Spectrum ASICs have a configurable limit on how deep into the packet they parse. By default, the limit is 96 bytes. There are several cases where this parsing depth is not enough and there is a need to increase it. For example, timestamping of PTP packets and a FIB multipath hash policy that requires hashing on inner fields. The driver therefore maintains a reference count that reflects the number of consumers that require an increased parsing depth. During reload_down() the parsing depth reference count does not necessarily drop to zero, but the parsing depth itself is restored to the default during reload_up() when the firmware is reset. It is therefore possible to end up in situations where the driver thinks that the parsing depth was increased (reference count is non-zero), when it is not. Fix by making sure that all the consumers that increase the parsing depth reference count also decrease it during reload_down(). Specifically, make sure that when the routing code is de-initialized it drops the reference count if it was increased because of a FIB multipath hash policy that requires hashing on inner fields. Add a warning if the reference count is not zero after the driver was de-initialized and explicitly reset it to zero during initialization for good measures. Fixes: 2d91f0803b84 ("mlxsw: spectrum: Add infrastructure for parsing configuration") Reported-by: Maksym Yaremchuk <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15veth: rely on rtnl_dereference() instead of on rcu_dereference() in ↵Lorenzo Bianconi1-1/+1
veth_set_xdp_features() Fix the following kernel warning in veth_set_xdp_features routine relying on rtnl_dereference() instead of on rcu_dereference(): ============================= WARNING: suspicious RCU usage 6.3.0-rc1-00144-g064d70527aaa #149 Not tainted ----------------------------- drivers/net/veth.c:1265 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/135: (net/core/rtnetlink.c:6172) stack backtrace: CPU: 1 PID: 135 Comm: ip Not tainted 6.3.0-rc1-00144-g064d70527aaa #149 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107) lockdep_rcu_suspicious (include/linux/context_tracking.h:152) veth_set_xdp_features (drivers/net/veth.c:1265 (discriminator 9)) veth_newlink (drivers/net/veth.c:1892) ? veth_set_features (drivers/net/veth.c:1774) ? kasan_save_stack (mm/kasan/common.c:47) ? kasan_save_stack (mm/kasan/common.c:46) ? kasan_set_track (mm/kasan/common.c:52) ? alloc_netdev_mqs (include/linux/slab.h:737) ? rcu_read_lock_sched_held (kernel/rcu/update.c:125) ? trace_kmalloc (include/trace/events/kmem.h:54) ? __xdp_rxq_info_reg (net/core/xdp.c:188) ? alloc_netdev_mqs (net/core/dev.c:10657) ? rtnl_create_link (net/core/rtnetlink.c:3312) rtnl_newlink_create (net/core/rtnetlink.c:3440) ? rtnl_link_get_net_capable.constprop.0 (net/core/rtnetlink.c:3391) __rtnl_newlink (net/core/rtnetlink.c:3657) ? lock_downgrade (kernel/locking/lockdep.c:5321) ? rtnl_link_unregister (net/core/rtnetlink.c:3487) rtnl_newlink (net/core/rtnetlink.c:3671) rtnetlink_rcv_msg (net/core/rtnetlink.c:6174) ? rtnl_link_fill (net/core/rtnetlink.c:6070) ? mark_usage (kernel/locking/lockdep.c:4914) ? mark_usage (kernel/locking/lockdep.c:4914) netlink_rcv_skb (net/netlink/af_netlink.c:2574) ? rtnl_link_fill (net/core/rtnetlink.c:6070) ? netlink_ack (net/netlink/af_netlink.c:2551) ? lock_acquire (kernel/locking/lockdep.c:467) ? net_generic (include/linux/rcupdate.h:805) ? netlink_deliver_tap (include/linux/rcupdate.h:805) netlink_unicast (net/netlink/af_netlink.c:1340) ? netlink_attachskb (net/netlink/af_netlink.c:1350) netlink_sendmsg (net/netlink/af_netlink.c:1942) ? netlink_unicast (net/netlink/af_netlink.c:1861) ? netlink_unicast (net/netlink/af_netlink.c:1861) sock_sendmsg (net/socket.c:727) ____sys_sendmsg (net/socket.c:2501) ? kernel_sendmsg (net/socket.c:2448) ? __copy_msghdr (net/socket.c:2428) ___sys_sendmsg (net/socket.c:2557) ? mark_usage (kernel/locking/lockdep.c:4914) ? do_recvmmsg (net/socket.c:2544) ? lock_acquire (kernel/locking/lockdep.c:467) ? find_held_lock (kernel/locking/lockdep.c:5159) ? __lock_release (kernel/locking/lockdep.c:5345) ? __might_fault (mm/memory.c:5625) ? lock_downgrade (kernel/locking/lockdep.c:5321) ? __fget_light (include/linux/atomic/atomic-arch-fallback.h:227) __sys_sendmsg (include/linux/file.h:31) ? __sys_sendmsg_sock (net/socket.c:2572) ? rseq_get_rseq_cs (kernel/rseq.c:275) ? lockdep_hardirqs_on_prepare.part.0 (kernel/locking/lockdep.c:4263) do_syscall_64 (arch/x86/entry/common.c:50) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f0d1aadeb17 Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Suggested-by: Eric Dumazet <[email protected]> Reported-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/T/#me4c9d8e985ec7ebee981cfdb5bc5ec651ef4035d Signed-off-by: Lorenzo Bianconi <[email protected]> Reported-by: [email protected] Reviewed-by: Eric Dumazet <[email protected]> Tested-by: Matthieu Baerts <[email protected]> Link: https://lore.kernel.org/r/dfd6a9a7d85e9113063165e1f47b466b90ad7b8a.1678748579.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15nfc: st-nci: Fix use after free bug in ndlc_remove due to race conditionZheng Wang1-2/+4
This bug influences both st_nci_i2c_remove and st_nci_spi_remove. Take st_nci_i2c_remove as an example. In st_nci_i2c_probe, it called ndlc_probe and bound &ndlc->sm_work with llt_ndlc_sm_work. When it calls ndlc_recv or timeout handler, it will finally call schedule_work to start the work. When we call st_nci_i2c_remove to remove the driver, there may be a sequence as follows: Fix it by finishing the work before cleanup in ndlc_remove CPU0 CPU1 |llt_ndlc_sm_work st_nci_i2c_remove | ndlc_remove | st_nci_remove | nci_free_device| kfree(ndev) | //free ndlc->ndev | |llt_ndlc_rcv_queue |nci_recv_frame |//use ndlc->ndev Fixes: 35630df68d60 ("NFC: st21nfcb: Add driver for STMicroelectronics ST21NFCB NFC chip") Signed-off-by: Zheng Wang <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15Merge branch 'tcp-fix-bind-regression-for-dual-stack-wildcard-address'Jakub Kicinski4-1/+123
Kuniyuki Iwashima says: ==================== tcp: Fix bind() regression for dual-stack wildcard address. The first patch fixes the regression reported in [0], and the second patch adds a test for similar cases to catch future regression. [0]: https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15selftest: Add test for bind() conflicts.Kuniyuki Iwashima3-0/+116
The test checks if (IPv4, IPv6) address pair properly conflict or not. * IPv4 * 0.0.0.0 * 127.0.0.1 * IPv6 * :: * ::1 If the IPv6 address is [::], the second bind() always fails. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15tcp: Fix bind() conflict check for dual-stack wildcard address.Kuniyuki Iwashima1-1/+7
Paul Holzinger reported [0] that commit 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket") introduced a bind() regression. Paul also gave a nice repro that calls two types of bind() on the same port, both of which now succeed, but the second call should fail: bind(fd1, ::, port) + bind(fd2, 127.0.0.1, port) The cited commit added address family tests in three functions to fix the uninit-value KMSAN report. [1] However, the test added to inet_bind2_bucket_match_addr_any() removed a necessary conflict check; the dual-stack wildcard address no longer conflicts with an IPv4 non-wildcard address. If tb->family is AF_INET6 and sk->sk_family is AF_INET in inet_bind2_bucket_match_addr_any(), we still need to check if tb has the dual-stack wildcard address. Note that the IPv4 wildcard address does not conflict with IPv6 non-wildcard addresses. [0]: https://lore.kernel.org/netdev/[email protected]/ [1]: https://lore.kernel.org/netdev/CAG_fn=Ud3zSW7AZWXc+asfMhZVL5ETnvuY44Pmyv4NPv-ijN-A@mail.gmail.com/ Fixes: 5456262d2baa ("net: Fix incorrect address comparison when searching for a bind2 bucket") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: Paul Holzinger <[email protected]> Link: https://lore.kernel.org/netdev/CAG_fn=Ud3zSW7AZWXc+asfMhZVL5ETnvuY44Pmyv4NPv-ijN-A@mail.gmail.com/ Reviewed-by: Eric Dumazet <[email protected]> Tested-by: Paul Holzinger <[email protected]> Reviewed-by: Martin KaFai Lau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status failsHeiner Kallweit1-1/+4
If genphy_read_status fails then further access to the PHY may result in unpredictable behavior. To prevent this bail out immediately if genphy_read_status fails. Fixes: 4223dbffed9f ("net: phy: smsc: Re-enable EDPD mode for LAN87xx") Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15net: tunnels: annotate lockless accesses to dev->needed_headroomEric Dumazet3-10/+12
IP tunnels can apparently update dev->needed_headroom in their xmit path. This patch takes care of three tunnels xmit, and also the core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA() helpers. More changes might be needed for completeness. BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1: ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0: ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653 process_one_work+0x3e6/0x750 kernel/workqueue.c:2390 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0dd4 -> 0x0e14 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Workqueue: mld mld_ifc_work Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15ice: avoid bonding causing auxiliary plug/unplug under RTNL lockDave Ertman2-20/+13
RDMA is not supported in ice on a PF that has been added to a bonded interface. To enforce this, when an interface enters a bond, we unplug the auxiliary device that supports RDMA functionality. This unplug currently happens in the context of handling the netdev bonding event. This event is sent to the ice driver under RTNL context. This is causing a deadlock where the RDMA driver is waiting for the RTNL lock to complete the removal. Defer the unplugging/re-plugging of the auxiliary device to the service task so that it is not performed under the RTNL lock context. Cc: [email protected] # 6.1.x Reported-by: Jaroslav Pulchart <[email protected]> Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi7vb621A@mail.gmail.com/ Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave") Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed") Signed-off-by: Dave Ertman <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-15kbuild: deb-pkg: use dh_listpackages to know enabled packagesMasahiro Yamada2-29/+39
Use dh_listpackages to get a list of all binary packages. With this, debian/control lists which binary packages will be produced. Previously, ARCH=um listed linux-libc-dev in debian/control, but it was not generated because each of mkdebian and builddeb independently maintained the if-conditionals. Another motivation is to allow scripts/package/builddeb to get the package name (linux-image-*, etc.) dynamically from debian/control. This will also allow the BuildProfile to control the generation of the binary packages. Signed-off-by: Masahiro Yamada <[email protected]>
2023-03-15kbuild: deb-pkg: split image and debug objects staging out into functionsMasahiro Yamada1-106/+116
Prepare for the refactoring in the next commit. Signed-off-by: Masahiro Yamada <[email protected]>