aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-09-15ipv6: lockless IPV6_MINHOPCOUNT implementationEric Dumazet1-16/+15
Add one missing READ_ONCE() annotation in do_ipv6_getsockopt() and make IPV6_MINHOPCOUNT setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MTU implementationEric Dumazet2-16/+18
np->frag_size can be read/written without holding socket lock. Add missing annotations and make IPV6_MTU setsockopt() lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MULTICAST_HOPS implementationEric Dumazet6-25/+21
This fixes data-races around np->mcast_hops, and make IPV6_MULTICAST_HOPS lockless. Note that np->mcast_hops is never negative, thus can fit an u8 field instead of s16. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_MULTICAST_LOOP implementationEric Dumazet8-25/+30
Add inet6_{test|set|clear|assign}_bit() helpers. Note that I am using bits from inet->inet_flags, this might change in the future if we need more flags. While solving data-races accessing np->mc_loop, this patch also allows to implement lockless accesses to np->mcast_hops in the following patch. Also constify sk_mc_loop() argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_UNICAST_HOPS implementationEric Dumazet6-24/+16
Some np->hop_limit accesses are racy, when socket lock is not held. Add missing annotations and switch to full lockless implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni948-11119/+23357
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14Merge tag 'net-6.6-rc2' of ↵Linus Torvalds36-207/+352
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Quite unusually, this does not contains any fix coming from subtrees (nf, ebpf, wifi, etc). Current release - regressions: - bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active() Previous releases - regressions: - ipv4: fix one memleak in __inet_del_ifa() - tcp: fix bind() regressions for v4-mapped-v6 addresses. - tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() - dsa: fixes for SJA1105 FDB regressions - veth: update XDP feature set when bringing up device - igb: fix hangup when enabling SR-IOV Previous releases - always broken: - kcm: fix memory leak in error path of kcm_sendmsg() - smc: fix data corruption in smcr_port_add - microchip: fix possible memory leak for vcap_dup_rule()" * tag 'net-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits) kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). net: renesas: rswitch: Add spin lock protection for irq {un}mask net: renesas: rswitch: Fix unmasking irq condition igb: clean up in all error paths when enabling SR-IOV ixgbe: fix timestamp configuration code selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c. selftest: tcp: Move expected_errno into each test case in bind_wildcard.c. selftest: tcp: Fix address length in bind_wildcard.c. tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address. tcp: Fix bind() regression for v4-mapped-v6 wildcard address. tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any). ipv6: fix ip6_sock_set_addr_preferences() typo veth: Update XDP feature set when bringing up device net: macb: fix sleep inside spinlock net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988 net: ethernet: mtk_eth_soc: fix uninitialized variable kcm: Fix memory leak in error path of kcm_sendmsg() r8152: check budget for r8152_poll() net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset ...
2023-09-14ipv6: mcast: Remove redundant comparison in igmp6_mcf_get_next()Gavrilov Ilia1-2/+0
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084100.1502379-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14ipv4: igmp: Remove redundant comparison in igmp_mcf_get_next()Gavrilov Ilia1-2/+0
The 'state->im' value will always be non-zero after the 'while' statement, so the check can be removed. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230912084039.1501984-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14Merge branch 'udp-round-of-data-races-fixes'Paolo Abeni13-105/+118
Eric Dumazet says: ==================== udp: round of data-races fixes This series is inspired by multiple syzbot reports. Many udp fields reads or writes are racy. Add a proper udp->udp_flags and move there all flags needing atomic safety. Also add missing READ_ONCE()/WRITE_ONCE() when lockless readers need access to specific fields. ==================== Link: https://lore.kernel.org/r/20230912091730.1591459-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udplite: fix various data-racesEric Dumazet4-23/+27
udp->pcflag, udp->pcslen and udp->pcrlen reads/writes are racy. Move udp->pcflag to udp->udp_flags for atomicity, and add READ_ONCE()/WRITE_ONCE() annotations for pcslen and pcrlen. Fixes: ba4e58eca8aa ("[NET]: Supporting UDP-Lite (RFC 3828) in Linux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udplite: remove UDPLITE_BITEric Dumazet3-5/+2
This flag is set but never read, we can remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: annotate data-races around udp->encap_typeEric Dumazet6-15/+17
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GROEric Dumazet5-20/+12
Move udp->encap_enabled to udp->udp_flags. Add udp_test_and_set_bit() helper to allow lockless udp_tunnel_encap_enable() implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flagsEric Dumazet2-8/+10
These are read locklessly, move them to udp_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: add missing WRITE_ONCE() around up->encap_rcvEric Dumazet1-2/+4
UDP_ENCAP_ESPINUDP_NON_IKE setsockopt() writes over up->encap_rcv while other cpus read it. Fixes: 067b207b281d ("[UDP]: Cleanup UDP encapsulation code") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->gro_enabled to udp->udp_flagsEric Dumazet4-7/+7
syzbot reported that udp->gro_enabled can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->no_check6_rx to udp->udp_flagsEric Dumazet3-10/+10
syzbot reported that udp->no_check6_rx can be read locklessly. Use one atomic bit from udp->udp_flags. Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: move udp->no_check6_tx to udp->udp_flagsEric Dumazet3-9/+9
syzbot reported that udp->no_check6_tx can be read locklessly. Use one atomic bit from udp->udp_flags Fixes: 1c19448c9ba6 ("net: Make enabling of zero UDP6 csums more restrictive") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14udp: introduce udp->udp_flagsEric Dumazet3-16/+30
According to syzbot, it is time to use proper atomic flags for various UDP flags. Add udp_flags field, and convert udp->corkflag to first bit in it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: ethernet: mtk_wed: check update_wo_rx_stats in mtk_wed_update_rx_stats()Lorenzo Bianconi1-0/+3
Check if update_wo_rx_stats function pointer is properly set in mtk_wed_update_rx_stats routine before accessing it. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b0d233386e059bccb59f18f69afb79a7806e5ded.1694507226.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: ethernet: mtk_eth_soc: rely on mtk_pse_port definitions in ↵Lorenzo Bianconi1-2/+2
mtk_flow_set_output_device Similar to ethernet ports, rely on mtk_pse_port definitions for pse wdma ports as well. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b86bdb717e963e3246c1dec5f736c810703cf056.1694506814.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: wangxun: move MDIO bus implementation to the libraryJiawen Wu6-172/+106
Move similar code of accessing MDIO bus from txgbe/ngbe to libwx. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230912031424.721386-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().Kuniyuki Iwashima1-7/+8
syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by updating kcm_tx_msg(head)->last_skb if partial data is copied so that the following sendmsg() will resume from the skb. However, we cannot know how many bytes were copied when we get the error. Thus, we could mess up the MSG_MORE queue. When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we do so for UDP by udp_flush_pending_frames(). Even without this change, when the error occurred, the following sendmsg() resumed from a wrong skb and the queue was messed up. However, we have yet to get such a report, and only syzkaller stumbled on it. So, this can be changed safely. Note this does not change SOCK_SEQPACKET behaviour. Fixes: c821a88bd720 ("kcm: Fix memory leak in error path of kcm_sendmsg()") Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14Merge branch 'net-renesas-rswitch-fix-a-lot-of-redundant-irq-issue'Paolo Abeni2-4/+18
Yoshihiro Shimoda says: ==================== net: renesas: rswitch: Fix a lot of redundant irq issue After this patch series was applied, a lot of redundant interrupts no longer occur. For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider Before the patches are applied: about 800,000 times happened After the patches were applied: about 100,000 times happened ==================== Link: https://lore.kernel.org/r/20230912014936.3175430-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: renesas: rswitch: Add spin lock protection for irq {un}maskYoshihiro Shimoda2-0/+14
Add spin lock protection for irq {un}mask registers' control. After napi_complete_done() and this protection were applied, a lot of redundant interrupts no longer occur. For example: when "iperf3 -c <ipaddr> -R" on R-Car S4-8 Spider Before the patches are applied: about 800,000 times happened After the patches were applied: about 100,000 times happened Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14net: renesas: rswitch: Fix unmasking irq conditionYoshihiro Shimoda1-4/+4
Fix unmasking irq condition by using napi_complete_done(). Otherwise, redundant interrupts happen. Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14atl1c: Work around the DMA RX overflow issueSieng-Piaw Liew2-54/+16
This is based on alx driver commit 881d0327db37 ("net: alx: Work around the DMA RX overflow issue"). The alx and atl1c drivers had RX overflow error which was why a custom allocator was created to avoid certain addresses. The simpler workaround then created for alx driver, but not for atl1c due to lack of tester. Instead of using a custom allocator, check the allocated skb address and use skb_reserve() to move away from problematic 0x...fc0 address. Tested on AR8131 on Acer 4540. Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com> Link: https://lore.kernel.org/r/20230912010711.12036-1-liew.s.piaw@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14Merge branch 'vsock-handle-writes-to-shutdowned-socket'Paolo Abeni2-0/+141
Arseniy Krasnov says: ==================== vsock: handle writes to shutdowned socket this small patchset adds POSIX compliant behaviour on writes to the socket which was shutdowned with 'shutdown()' (both sides - local with SHUT_WR flag, peer - with SHUT_RD flag). According POSIX we must send SIGPIPE in such cases (but SIGPIPE is not send when MSG_NOSIGNAL is set). First patch is implemented in the same way as net/ipv4/tcp.c:tcp_sendmsg_locked(). It uses 'sk_stream_error()' function which handles EPIPE error. Another way is to use code from net/unix/af_unix.c:unix_stream_sendmsg() where same logic from 'sk_stream_error()' is implemented "from scratch", but it doesn't check 'sk_err' field. I think error from this field has more priority to be returned from syscall. So I guess it is better to reuse currently implemented 'sk_stream_error()' function. Test is also added. ==================== Link: https://lore.kernel.org/r/20230911202027.1928574-1-avkrasnov@salutedevices.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14test/vsock: shutdowned socket testArseniy Krasnov1-0/+138
This adds two tests for 'shutdown()' call. It checks that SIGPIPE is sent when MSG_NOSIGNAL is not set and vice versa. Both flags SHUT_WR and SHUT_RD are tested. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-14vsock: send SIGPIPE on write to shutdowned socketArseniy Krasnov1-0/+3
POSIX requires to send SIGPIPE on write to SOCK_STREAM socket which was shutdowned with SHUT_WR flag or its peer was shutdowned with SHUT_RD flag. Also we must not send SIGPIPE if MSG_NOSIGNAL flag is set. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-13Merge tag 'pmdomain-v6.6-rc1' of ↵Linus Torvalds87-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull genpm / pmdomain rename from Ulf Hansson: "This renames the genpd subsystem to pmdomain. As discussed on LKML, using 'genpd' as the name of a subsystem isn't very self-explanatory and the acronym itself that means Generic PM Domain, is known only by a limited group of people. The suggestion to improve the situation is to rename the subsystem to 'pmdomain', which there seems to be a good consensus around using. Ideally it should indicate that its purpose is to manage Power Domains or 'PM domains' as we often also use within the Linux Kernel terminology" * tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: Rename the genpd subsystem to pmdomain
2023-09-13Merge tag 'tpmdd-v6.6-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fix from Jarkko Sakkinen. * tag 'tpmdd-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Fix typo in tpmrm class definition
2023-09-13Merge tag 'parisc-for-6.6-rc2' of ↵Linus Torvalds23-76/+207
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: - fix reference to exported symbols for parisc64 [Masahiro Yamada] - Block-TLB (BTLB) support on 32-bit CPUs - sparse and build-warning fixes * tag 'parisc-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: linux/export: fix reference to exported functions for parisc64 parisc: BTLB: Initialize BTLB tables at CPU startup parisc: firmware: Simplify calling non-PA20 functions parisc: BTLB: _edata symbol has to be page aligned for BTLB support parisc: BTLB: Add BTLB insert and purge firmware function wrappers parisc: BTLB: Clear possibly existing BTLB entries parisc: Prepare for Block-TLB support on 32-bit kernel parisc: shmparam.h: Document aliasing requirements of PA-RISC parisc: irq: Make irq_stack_union static to avoid sparse warning parisc: drivers: Fix sparse warning parisc: iosapic.c: Fix sparse warnings parisc: ccio-dma: Fix sparse warnings parisc: sba-iommu: Fix sparse warnigs parisc: sba: Fix compile warning wrt list of SBA devices parisc: sba_iommu: Fix build warning if procfs if disabled
2023-09-13Merge tag 'trace-v6.6-rc1' of ↵Linus Torvalds12-46/+152
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add missing LOCKDOWN checks for eventfs callers When LOCKDOWN is active for tracing, it causes inconsistent state when some functions succeed and others fail. - Use dput() to free the top level eventfs descriptor There was a race between accesses and freeing it. - Fix a long standing bug that eventfs exposed due to changing timings by dynamically creating files. That is, If a event file is opened for an instance, there's nothing preventing the instance from being removed which will make accessing the files cause use-after-free bugs. - Fix a ring buffer race that happens when iterating over the ring buffer while writers are active. Check to make sure not to read the event meta data if it's beyond the end of the ring buffer sub buffer. - Fix the print trigger that disappeared because the test to create it was looking for the event dir field being filled, but now it has the "ef" field filled for the eventfs structure. - Remove the unused "dir" field from the event structure. - Fix the order of the trace_dynamic_info as it had it backwards for the offset and len fields for which one was for which endianess. - Fix NULL pointer dereference with eventfs_remove_rec() If an allocation fails in one of the eventfs_add_*() functions, the caller of it in event_subsystem_dir() or event_create_dir() assigns the result to the structure. But it's assigning the ERR_PTR and not NULL. This was passed to eventfs_remove_rec() which expects either a good pointer or a NULL, not ERR_PTR. The fix is to not assign the ERR_PTR to the structure, but to keep it NULL on error. - Fix list_for_each_rcu() to use list_for_each_srcu() in dcache_dir_open_wrapper(). One iteration of the code used RCU but because it had to call sleepable code, it had to be changed to use SRCU, but one of the iterations was missed. - Fix synthetic event print function to use "as_u64" instead of passing in a pointer to the union. To fix big/little endian issues, the u64 that represented several types was turned into a union to define the types properly. * tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec() tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper() tracing/synthetic: Print out u64 values properly tracing/synthetic: Fix order of struct trace_dynamic_info selftests/ftrace: Fix dependencies for some of the synthetic event tests tracing: Remove unused trace_event_file dir field tracing: Use the new eventfs descriptor for print trigger ring-buffer: Do not attempt to read past "commit" tracefs/eventfs: Free top level files on removal ring-buffer: Avoid softlockup in ring_buffer_resize() tracing: Have event inject files inc the trace array ref count tracing: Have option files inc the trace array ref count tracing: Have current_trace inc the trace array ref count tracing: Have tracing_max_latency inc the trace array ref count tracing: Increase trace array ref count on enable and filter files tracefs/eventfs: Use dput to free the toplevel events directory tracefs/eventfs: Add missing lockdown checks tracefs: Add missing lockdown check to tracefs_create_dir()
2023-09-13Merge branch 'selftests-classid'David S. Miller4-16/+120
Pedro Tammela says: ==================== selftests/tc-testing: add tests covering classid Patches 1-3 add missing tests covering classid behaviour on tdc for cls_fw, cls_route and cls_fw. This behaviour was recently fixed by valis[0]. Patch 4 comes from the development done in the previous patches as it turns out cls_route never returns meaningful errors. [0] https://lore.kernel.org/all/20230729123202.72406-1-jhs@mojatatu.com/ v2->v3: https://lore.kernel.org/all/20230825155148.659895-1-pctammela@mojatatu.com/ - Added changes that were left in the working tree (Jakub) - Fixed two typos in commit message titles - Added Victor tags v1->v2: https://lore.kernel.org/all/20230818163544.351104-1-pctammela@mojatatu.com/ - Drop u32 updates ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13net/sched: cls_route: make netlink errors meaningfulPedro Tammela1-16/+21
Use netlink extended ack and parsing policies to return more meaningful errors instead of the relying solely on errnos. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftests/tc-testing: cls_u32: add tests for classidPedro Tammela1-0/+25
As discussed in '3044b16e7c6f', cls_u32 was handling the use of classid incorrectly. Add a test to check if it's conforming to the correct behaviour. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftests/tc-testing: cls_route: add tests for classidPedro Tammela1-0/+25
As discussed in 'b80b829e9e2c', cls_route was handling the use of classid incorrectly. Add a test to check if it's conforming to the correct behaviour. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftests/tc-testing: cls_fw: add tests for classidPedro Tammela1-0/+49
As discussed in '76e42ae83199', cls_fw was handling the use of classid incorrectly. Add a few tests to check if it's conforming to the correct behaviour. Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13igb: clean up in all error paths when enabling SR-IOVCorinna Vinschen1-1/+4
After commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), removing the igb module could hang or crash (depending on the machine) when the module has been loaded with the max_vfs parameter set to some value != 0. In case of one test machine with a dual port 82580, this hang occurred: [ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1 [ 233.093257] igb 0000:41:00.1: IOV Disabled [ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0 [ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000 [ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First) [ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata) [ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000 [ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First) [ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c [ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback) [ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0 [ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed [ 234.157244] igb 0000:41:00.0: IOV Disabled [ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds. [ 371.627489] Not tainted 6.4.0-dirty #2 [ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this. [ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0 [ 371.650330] Call Trace: [ 371.653061] <TASK> [ 371.655407] __schedule+0x20e/0x660 [ 371.659313] schedule+0x5a/0xd0 [ 371.662824] schedule_preempt_disabled+0x11/0x20 [ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0 [ 371.673237] ? __pfx_aer_root_reset+0x10/0x10 [ 371.678105] report_error_detected+0x25/0x1c0 [ 371.682974] ? __pfx_report_normal_detected+0x10/0x10 [ 371.688618] pci_walk_bus+0x72/0x90 [ 371.692519] pcie_do_recovery+0xb2/0x330 [ 371.696899] aer_process_err_devices+0x117/0x170 [ 371.702055] aer_isr+0x1c0/0x1e0 [ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0 [ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10 [ 371.715496] irq_thread_fn+0x20/0x60 [ 371.719491] irq_thread+0xe6/0x1b0 [ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10 [ 371.728255] ? __pfx_irq_thread+0x10/0x10 [ 371.732731] kthread+0xe2/0x110 [ 371.736243] ? __pfx_kthread+0x10/0x10 [ 371.740430] ret_from_fork+0x2c/0x50 [ 371.744428] </TASK> The reproducer was a simple script: #!/bin/sh for i in `seq 1 5`; do modprobe -rv igb modprobe -v igb max_vfs=1 sleep 1 modprobe -rv igb done It turned out that this could only be reproduce on 82580 (quad and dual-port), but not on 82576, i350 and i210. Further debugging showed that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because dev->is_physfn is 0 on 82580. Prior to commit 50f303496d92 ("igb: Enable SR-IOV after reinit"), igb_enable_sriov() jumped into the "err_out" cleanup branch. After this commit it only returned the error code. So the cleanup didn't take place, and the incorrect VF setup in the igb_adapter structure fooled the igb driver into assuming that VFs have been set up where no VF actually existed. Fix this problem by cleaning up again if pci_enable_sriov() fails. Fixes: 50f303496d92 ("igb: Enable SR-IOV after reinit") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13ixgbe: fix timestamp configuration codeVadim Fedorenko1-13/+15
The commit in fixes introduced flags to control the status of hardware configuration while processing packets. At the same time another structure is used to provide configuration of timestamper to user-space applications. The way it was coded makes this structures go out of sync easily. The repro is easy for 82599 chips: [root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1 current settings: tx_type 0 rx_filter 0 new settings: tx_type 1 rx_filter 12 The eth0 device is properly configured to timestamp any PTPv2 events. [root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1 current settings: tx_type 1 rx_filter 12 SIOCSHWTSTAMP failed: Numerical result out of range The requested time stamping mode is not supported by the hardware. The error is properly returned because HW doesn't support all packets timestamping. But the adapter->flags is cleared of timestamp flags even though no HW configuration was done. From that point no RX timestamps are received by user-space application. But configuration shows good values: [root@hostname ~]# hwstamp_ctl -i eth0 current settings: tx_type 1 rx_filter 12 Fix the issue by applying new flags only when the HW was actually configured. Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13MAINTAINERS: update tg3 maintainer listAndy Gospodarek1-2/+1
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Prashant Sreedharan <prashant.sreedharan@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13net: hinic: Use devm_kasprintf()Christophe JAILLET1-5/+3
Use devm_kasprintf() instead of hand writing it. This is less verbose and less error prone. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13pmdomain: Rename the genpd subsystem to pmdomainUlf Hansson87-12/+12
It has been pointed out that naming a subsystem "genpd" isn't very self-explanatory and the acronym itself that means Generic PM Domain, is known only by a limited group of people. In a way to improve the situation, let's rename the subsystem to pmdomain, which ideally should indicate that this is about so called Power Domains or "PM domains" as we often also use within the Linux Kernel terminology. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230912221127.487327-1-ulf.hansson@linaro.org
2023-09-13Merge branch 'tcp-bind-fixes'David S. Miller3-27/+82
Kuniyuki Iwashima says: ==================== tcp: Fix bind() regression for v4-mapped-v6 address Since bhash2 was introduced, bind() is broken in two cases related to v4-mapped-v6 address. This series fixes the regression and adds test to cover the cases. Changes: v2: * Added patch 1 to factorise duplicated comparison (Eric Dumazet) v1: https://lore.kernel.org/netdev/20230911165106.39384-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftest: tcp: Add v4-mapped-v6 cases in bind_wildcard.c.Kuniyuki Iwashima1-0/+46
We add these 8 test cases in bind_wildcard.c to check bind() conflicts. 1st bind() 2nd bind() --------- --------- 0.0.0.0 ::FFFF:0.0.0.0 ::FFFF:0.0.0.0 0.0.0.0 0.0.0.0 ::FFFF:127.0.0.1 ::FFFF:127.0.0.1 0.0.0.0 127.0.0.1 ::FFFF:0.0.0.0 ::FFFF:0.0.0.0 127.0.0.1 127.0.0.1 ::FFFF:127.0.0.1 ::FFFF:127.0.0.1 127.0.0.1 All test passed without bhash2 and with bhash2 and this series. Before bhash2: $ uname -r 6.0.0-rc1-00393-g0bf73255d3a3 $ ./bind_wildcard ... # PASSED: 16 / 16 tests passed. Just after bhash2: $ uname -r 6.0.0-rc1-00394-g28044fc1d495 $ ./bind_wildcard ... ok 15 bind_wildcard.v4_local_v6_v4mapped_local.v4_v6 not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4 # FAILED: 15 / 16 tests passed. On net.git: $ ./bind_wildcard ... not ok 14 bind_wildcard.v4_local_v6_v4mapped_any.v6_v4 not ok 16 bind_wildcard.v4_local_v6_v4mapped_local.v6_v4 # FAILED: 13 / 16 tests passed. With this series: $ ./bind_wildcard ... # PASSED: 16 / 16 tests passed. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftest: tcp: Move expected_errno into each test case in bind_wildcard.c.Kuniyuki Iwashima1-10/+10
This is a preparation patch for the following patch. Let's define expected_errno in each test case so that we can add other test cases easily. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13selftest: tcp: Fix address length in bind_wildcard.c.Kuniyuki Iwashima1-1/+1
The selftest passes the IPv6 address length for an IPv4 address. We should pass the correct length. Note inet_bind_sk() does not check if the size is larger than sizeof(struct sockaddr_in), so there is no real bug in this selftest. Fixes: 13715acf8ab5 ("selftest: Add test for bind() conflicts.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-13tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.Kuniyuki Iwashima1-1/+6
Since bhash2 was introduced, the example below does not work as expected. These two bind() should conflict, but the 2nd bind() now succeeds. from socket import * s1 = socket(AF_INET6, SOCK_STREAM) s1.bind(('::ffff:127.0.0.1', 0)) s2 = socket(AF_INET, SOCK_STREAM) s2.bind(('127.0.0.1', s1.getsockname()[1])) During the 2nd bind() in inet_csk_get_port(), inet_bind2_bucket_find() fails to find the 1st socket's tb2, so inet_bind2_bucket_create() allocates a new tb2 for the 2nd socket. Then, we call inet_csk_bind_conflict() that checks conflicts in the new tb2 by inet_bhash2_conflict(). However, the new tb2 does not include the 1st socket, thus the bind() finally succeeds. In this case, inet_bind2_bucket_match() must check if AF_INET6 tb2 has the conflicting v4-mapped-v6 address so that inet_bind2_bucket_find() returns the 1st socket's tb2. Note that if we bind two sockets to 127.0.0.1 and then ::FFFF:127.0.0.1, the 2nd bind() fails properly for the same reason mentinoed in the previous commit. Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>