aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2024-03-01iommu: constify fwnode in iommu_ops_from_fwnode()Krzysztof Kozlowski1-2/+2
Make pointer to fwnode_handle a pointer to const for code safety. Signed-off-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2024-03-01iommu: constify of_phandle_args in xlateKrzysztof Kozlowski1-2/+2
The xlate callbacks are supposed to translate of_phandle_args to proper provider without modifying the of_phandle_args. Make the argument pointer to const for code safety and readability. Signed-off-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2024-03-01pidfs: convert to path_from_stashed() helperChristian Brauner2-0/+2
Moving pidfds from the anonymous inode infrastructure to a separate tiny in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes selinux denials and thus various userspace components that make heavy use of pidfds to fail as pidfds used anon_inode_getfile() which aren't subject to any LSM hooks. But dentry_open() is and that would cause regressions. The failures that are seen are selinux denials. But the core failure is dbus-broker. That cascades into other services failing that depend on dbus-broker. For example, when dbus-broker fails to start polkit and all the others won't be able to work because they depend on dbus-broker. The reason for dbus-broker failing is because it doesn't handle failures for SO_PEERPIDFD correctly. Last kernel release we introduced SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit and others to receive a pidfd for the peer of an AF_UNIX socket. This is the first time in the history of Linux that we can safely authenticate clients in a race-free manner. dbus-broker immediately made use of this but messed up the error checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD. That's obviously problematic not just because of LSM denials but because of seccomp denials that would prevent SO_PEERPIDFD from working; or any other new error code from there. So this is catching a flawed implementation in dbus-broker as well. It has to fallback to the old pid-based authentication when SO_PEERPIDFD doesn't work no matter the reasons otherwise it'll always risk such failures. So overall that LSM denial should not have caused dbus-broker to fail. It can never assume that a feature released one kernel ago like SO_PEERPIDFD can be assumed to be available. So, the next fix separate from the selinux policy update is to try and fix dbus-broker at [3]. That should make it into Fedora as well. In addition the selinux reference policy should also be updated. See [4] for that. If Selinux is in enforcing mode in userspace and it encounters anything that it doesn't know about it will deny it by default. And the policy is entirely in userspace including declaring new types for stuff like nsfs or pidfs to allow it. For now we continue to raise S_PRIVATE on the inode if it's a pidfs inode which means things behave exactly like before. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630 Link: https://github.com/fedora-selinux/selinux-policy/pull/2050 Link: https://github.com/bus1/dbus-broker/pull/343 [3] Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4] Reported-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <[email protected]>
2024-03-01nsfs: convert to path_from_stashed() helperChristian Brauner2-2/+2
Use the newly added path_from_stashed() helper for nsfs. Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <[email protected]>
2024-03-01pidfd: add pidfsChristian Brauner2-2/+11
This moves pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This has been on my todo for quite a while as it will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows: * statx() on pidfds becomes useful for the first time. * pidfds can be compared simply via statx() and then comparing inode numbers. * pidfds have unique inode numbers for the system lifetime. * struct pid is now stashed in inode->i_private instead of file->private_data. This means it is now possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. * file->private_data is freed up for per-file options for pidfds. * Each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. Even if we were to move to anon_inode_create_getfile() which creates new inodes we'd still be associating the same struct pid with multiple different inodes. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode for each struct pid and we reuse that inode for all pidfds. We use iget_locked() to find that inode again based on the inode number which isn't recycled. We allocate a new dentry for each pidfd that uses the same inode. That is similar to anonymous inodes which reuse the same inode for thousands of dentries. For pidfds we're talking way less than that. There usually won't be a lot of concurrent openers of the same struct pid. They can probably often be counted on two hands. I know that systemd does use separate pidfd for the same struct pid for various complex process tracking issues. So I think with that things actually become way simpler. Especially because we don't have to care about lookup. Dentries and inodes continue to be always deleted. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs which uses a similar stashing mechanism just for namespaces. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-03-01net: bql: fix building with BQL disabledArnd Bergmann1-0/+10
It is now possible to disable BQL, but that causes the cpsw driver to break: drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue' 297 | dql_avail(&netif_txq->dql), There is already a helper function in net/sch_generic.h that could be used to help here. Move its implementation into the common linux/netdevice.h along with the other bql interfaces and change both users over to the new interface. Fixes: ea7f3cfaa588 ("net: bql: allow the config to be disabled") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-01Simplify net_dbg_ratelimited() dummyGeert Uytterhoeven1-4/+1
There is no need to wrap calls to the no_printk() helper inside an always-false check, as no_printk() already does that internally. Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-01ipv6: add ipv6_devconf_read_txrx cacheline_groupEric Dumazet1-4/+9
IPv6 TX and RX fast path use the following fields: - disable_ipv6 - hop_limit - mtu6 - forwarding - disable_policy - proxy_ndp Place them in a group to increase data locality. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-01Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memoryMichael Kelley1-1/+21
The VMBUS_RING_SIZE macro adds space for a ring buffer header to the requested ring buffer size. The header size is always 1 page, and so its size varies based on the PAGE_SIZE for which the kernel is built. If the requested ring buffer size is a large power-of-2 size and the header size is small, the resulting size is inefficient in its use of memory. For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory allocator, and wastes 508 Kbytes of memory. In such situations, the exact size of the ring buffer isn't that important, and it's OK to allocate the 4 Kbyte header at the beginning of the 512 Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted. Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer sizes. "Large" is somewhat arbitrarily defined as 8 times the size of the ring buffer header (which is of size PAGE_SIZE). For example, for 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first 4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer header. In both cases, smaller sizes add space for the header so the ring size isn't reduced too much by using part of the space for the header. For example, with a 64 Kbyte page size, we don't want a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half of the space for the header. In such a case, the memory allocation is less efficient, but it's the best that can be done. While the new algorithm slightly changes the amount of space allocated for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't known to be sensitive to small changes in ring buffer size, so there shouldn't be any effect. Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL") Fixes: 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502 Cc: [email protected] Signed-off-by: Michael Kelley <[email protected]> Reviewed-by: Saurabh Sengar <[email protected]> Reviewed-by: Dexuan Cui <[email protected]> Tested-by: Souradeep Chakrabarti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]> Message-ID: <[email protected]>
2024-02-29lib/string_helpers: Add flags param to string_get_size()Andy Shevchenko1-3/+7
The new flags parameter allows controlling - Whether or not the units suffix is separated by a space, for compatibility with sort -h - Whether or not to append a B suffix - we're not always printing bytes. Co-developed-by: Kent Overstreet <[email protected]> Signed-off-by: Kent Overstreet <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Kent Overstreet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski12-28/+32
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-29workqueue: Drain BH work items on hot-unplugged CPUsTejun Heo1-0/+1
Boqun pointed out that workqueues aren't handling BH work items on offlined CPUs. Unlike tasklet which transfers out the pending tasks from CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending which is problematic. Note that this behavior is specific to BH workqueues as the non-BH per-CPU workers just become unbound when the CPU goes offline. This patch fixes the issue by draining the pending BH work items from an offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context, it's not as easy to transfer the pending work items from one pool to another. Instead, run BH work items which execute the offlined pools on an online CPU. Note that this assumes that no further BH work items will be queued on the offlined CPUs. This assumption is shared with tasklet and should be fine for conversions. However, this issue also exists for per-CPU workqueues which will just keep executing work items queued after CPU offline on unbound workers and workqueue should reject per-CPU and BH work items queued on offline CPUs. This will be addressed separately later. Signed-off-by: Tejun Heo <[email protected]> Reported-and-reviewed-by: Boqun Feng <[email protected]> Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux
2024-02-29overflow: Use POD in check_shl_overflow()Andy Shevchenko1-1/+1
The check_shl_overflow() uses u64 type that is defined in types.h. Instead of including that header, just switch to use POD type directly. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29kernel.h: Move lib/cmdline.c prototypes to string.hAndy Shevchenko2-6/+8
The lib/cmdline.c is basically a set of some small string parsers which are wide used in the kernel. Their prototypes belong to the string.h rather then kernel.h. Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29fortify: Improve buffer overflow reportingKees Cook1-26/+30
Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to help accelerate debugging efforts. The calculations are all just sitting in registers anyway, so pass them along to the function to be reported. For example, before: detected buffer overflow in memcpy and after: memcpy: detected buffer overflow: 4096 byte read of buffer size 1 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29fortify: Provide KUnit counters for failure testingKees Cook1-20/+23
The standard C string APIs were not designed to have a failure mode; they were expected to always succeed without memory safety issues. Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop processing, as truncating a read or write may provide an even worse system state. However, this creates a problem for testing under things like KUnit, which needs a way to survive failures. When building with CONFIG_KUNIT, provide a failure path for all users of fortify_panic, and track whether the failure was a read overflow or a write overflow, for KUnit tests to examine. Inspired by similar logic in the slab tests. Signed-off-by: Kees Cook <[email protected]>
2024-02-29fortify: Split reporting and avoid passing string pointerKees Cook1-21/+60
In preparation for KUnit testing and further improvements in fortify failure reporting, split out the report and encode the function and access failure (read or write overflow) into a single u8 argument. This mainly ends up saving a tiny bit of space in the data segment. For a defconfig with FORTIFY_SOURCE enabled: $ size gcc/vmlinux.before gcc/vmlinux.after text data bss dec hex filename 26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before 26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-02-29refcount: Annotated intentional signed integer wrap-aroundKees Cook1-3/+6
Mark the various refcount_t functions with __signed_wrap, as we depend on the wrapping behavior to detect the overflow and perform saturation. Silences warnings seen with the LKDTM REFCOUNT_* tests: UBSAN: signed-integer-overflow in ../include/linux/refcount.h:189:11 2147483647 + 1 cannot be represented in type 'int' Reviewed-by: Miguel Ojeda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29lib/string_choices: Add str_plural() helperMichal Wajdeczko1-0/+11
Add str_plural() helper to replace existing open implementations used by many drivers and help improve future user facing messages. Signed-off-by: Michal Wajdeczko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-02-29overflow: Introduce wrapping_assign_add() and wrapping_assign_sub()Kees Cook1-0/+32
This allows replacements of the idioms "var += offset" and "var -= offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers respectively. They will avoid wrap-around sanitizer instrumentation. Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Marco Elver <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-02-29overflow: Introduce wrapping_add(), wrapping_sub(), and wrapping_mul()Kees Cook1-0/+48
Provide helpers that will perform wrapping addition, subtraction, or multiplication without tripping the arithmetic wrap-around sanitizers. The first argument is the type under which the wrap-around should happen with. In other words, these two calls will get very different results: wrapping_mul(int, 50, 50) == 2500 wrapping_mul(u8, 50, 50) == 196 Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Marco Elver <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-02-29overflow: Adjust check_*_overflow() kern-doc to reflect resultsKees Cook1-12/+9
The check_*_overflow() helpers will return results with potentially wrapped-around values. These values have always been checked by the selftests, so avoid the confusing language in the kern-doc. The idea of "safe for use" was relative to the expectation of whether or not the caller wants a wrapped value -- the calculation itself will always follow arithmetic wrapping rules. Reviewed-by: Gustavo A. R. Silva <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-02-29kernel.h: Move upper_*_bits() and lower_*_bits() to wordpart.hAndy Shevchenko2-28/+31
The wordpart.h header is collecting APIs related to the handling parts of the word (usually in byte granularity). The upper_*_bits() and lower_*_bits() are good candidates to be moved to there. This helps to clean up header dependency hell with regard to kernel.h as the latter gathers completely unrelated stuff together and slows down compilation (especially when it's included into other header). Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Randy Dunlap <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds3-10/+13
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29cgroup/cpuset: Remove cpuset_do_slab_mem_spread()Xiongwei Song1-10/+0
The SLAB allocator has been removed sine 6.8-rc1 [1], so there is no user with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD is marked as unused by [2]. Here we can remove cpuset_do_slab_mem_spread(). For more details, please check [3]. [1] https://lore.kernel.org/linux-mm/[email protected]/ [2] https://lore.kernel.org/linux-kernel/[email protected]/T/ [3] https://lore.kernel.org/lkml/[email protected]/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350 Signed-off-by: Xiongwei Song <[email protected]> Reviewed-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2024-02-29dpll: fix build failure due to rcu_dereference_check() on unknown typeEric Dumazet1-4/+4
Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-29cpufreq: Limit resolving a frequency to policy min/maxShivnandan Kumar1-1/+14
Resolving a frequency to an efficient one should not transgress policy->max (which can be set for thermal reason) and policy->min. Currently, there is possibility where scaling_cur_freq can exceed scaling_max_freq when scaling_max_freq is an inefficient frequency. Add a check to ensure that resolving a frequency will respect policy->min/max. Cc: All applicable <[email protected]> Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E") Signed-off-by: Shivnandan Kumar <[email protected]> [ rjw: Whitespace adjustment, changelog edits ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-02-29f2fs: stop checkpoint when get a out-of-bounds segmentZhiguo Niu1-0/+1
There is low probability that an out-of-bounds segment will be got on a small-capacity device. In order to prevent subsequent write requests allocating block address from this invalid segment, which may cause unexpected issue, stop checkpoint should be performed. Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT. Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should move it outside segmap_lock spinlock coverage in get_new_segment(). Signed-off-by: Zhiguo Niu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2024-02-29Merge tag 'nf-24-02-29' of ↵Paolo Abeni1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-29gpio: nomadik: support mobileye,eyeq5-gpioThéo Lebrun1-0/+1
We create a custom compatible for the STA2X11 IP block as integrated into the Mobileye EyeQ5 platform. Its wake and alternate functions have been disabled, we want to avoid touching those registers. We both do: (1) early return in functions that do not support the platform, but with warnings, and (2) avoid calling those functions in the first place. We ensure that pinctrl-nomadik is not used with this STA2X11 variant. Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Théo Lebrun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2024-02-29gpio: nomadik: extract GPIO platform driver from drivers/pinctrl/nomadik/Théo Lebrun1-0/+276
Previously, drivers/pinctrl/nomadik/pinctrl-nomadik.c registered two platform drivers: pinctrl & GPIO. Move the GPIO aspect to the drivers/gpio/ folder, as would be expected. Both drivers are intertwined for a reason; pinctrl requires access to GPIO registers for pinmuxing, pull-disable, disabling interrupts while setting the muxing and wakeup control. Information sharing is done through a shared array containing GPIO chips and a few helper functions. That shared array is not touched from gpio-nomadik when CONFIG_PINCTRL_NOMADIK is not defined. Make no change to the code that moved into gpio-nomadik; there should be no behavior change following. A few functions are shared and header comments are added. Checkpatch warnings are addressed. NUM_BANKS is renamed to NMK_MAX_BANKS. It is supported to compile gpio-nomadik without pinctrl-nomadik. The opposite is not true. Signed-off-by: Théo Lebrun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2024-02-28tcp: remove some holes in struct tcp_sockEric Dumazet1-4/+4
By moving some fields around, this patch shrinks holes size from 56 to 32, saving 24 bytes on 64bit arches. After the patch pahole gives the following for 'struct tcp_sock': /* size: 2304, cachelines: 36, members: 162 */ /* sum members: 2234, holes: 6, sum holes: 32 */ /* sum bitfield members: 34 bits, bit holes: 5, sum bit holes: 14 bits */ /* padding: 32 */ /* paddings: 3, sum paddings: 10 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 12 */ Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28inet: annotate devconf data-racesEric Dumazet1-6/+8
Add READ_ONCE() in ipv4_devconf_get() and corresponding WRITE_ONCE() in ipv4_devconf_set() Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros, and use them when reading devconf fields. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28clk: Add a devm variant of clk_rate_exclusive_get()Uwe Kleine-König1-0/+12
This allows to simplify drivers that use clk_rate_exclusive_get() in their probe routine as calling clk_rate_exclusive_put() is cared for automatically. Signed-off-by: Uwe Kleine-König <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: Russell King (Oracle) <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2024-02-29netfilter: bridge: confirm multicast packets before passing them up the stackFlorian Westphal1-0/+1
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges. Example: macvlan0 | br0 / \ ethX ethY ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table. 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices. The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. The Macvlan broadcast worker and normal confirm path will race. This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine. But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting. Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again. This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook. Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry. The downside is that this disables NAT and conntrack helpers. Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example: -m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5 For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings. Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change. Suggested-by: Pablo Neira Ayuso <[email protected]> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-02-28SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned intDai Ngo1-1/+1
When the NFS client is under extreme load the rpc_wait_queue.qlen counter can be overflowed. Here is an instant of the backlog queue overflow in a real world environment shown by drgn helper: rpc_task_stats(rpc_clnt): ------------------------- rpc_clnt: 0xffff92b65d2bae00 rpc_xprt: 0xffff9275db64f000 Queue: sending[64887] pending[524] backlog[30441] binding[0] XMIT task: 0xffff925c6b1d8e98 WRITE: 750654 __dta_call_status_580: 65463 __dta_call_transmit_status_579: 1 call_reserveresult: 685189 nfs_client_init_is_complete: 1 COMMIT: 584 call_reserveresult: 573 __dta_call_status_580: 11 ACCESS: 1 __dta_call_status_580: 1 GETATTR: 10 __dta_call_status_580: 4 call_reserveresult: 6 751249 tasks for server 111.222.333.444 Total tasks: 751249 count_rpc_wait_queues(xprt): ---------------------------- **** rpc_xprt: 0xffff9275db64f000 num_reqs: 65511 wait_queue: xprt_binding[0] cnt: 0 wait_queue: xprt_binding[1] cnt: 0 wait_queue: xprt_binding[2] cnt: 0 wait_queue: xprt_binding[3] cnt: 0 rpc_wait_queue[xprt_binding].qlen: 0 maxpriority: 0 wait_queue: xprt_sending[0] cnt: 0 wait_queue: xprt_sending[1] cnt: 64887 wait_queue: xprt_sending[2] cnt: 0 wait_queue: xprt_sending[3] cnt: 0 rpc_wait_queue[xprt_sending].qlen: 64887 maxpriority: 3 wait_queue: xprt_pending[0] cnt: 524 wait_queue: xprt_pending[1] cnt: 0 wait_queue: xprt_pending[2] cnt: 0 wait_queue: xprt_pending[3] cnt: 0 rpc_wait_queue[xprt_pending].qlen: 524 maxpriority: 0 wait_queue: xprt_backlog[0] cnt: 0 wait_queue: xprt_backlog[1] cnt: 685801 wait_queue: xprt_backlog[2] cnt: 0 wait_queue: xprt_backlog[3] cnt: 0 rpc_wait_queue[xprt_backlog].qlen: 30441 maxpriority: 3 [task cnt mismatch] There is no effect on operations when this overflow occurs. However it causes confusion when trying to diagnose the performance problem. Signed-off-by: Dai Ngo <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2024-02-28SUNRPC: Add a transport callback to handle dequeuing of an RPC requestTrond Myklebust1-0/+1
Add a transport level callback to allow it to handle the consequences of dequeuing the request that was in the process of being transmitted. For something like a TCP connection, we may need to disconnect if the request was partially transmitted. Signed-off-by: Trond Myklebust <[email protected]>
2024-02-28sched/topology: Rename SD_SHARE_PKG_RESOURCES to SD_SHARE_LLCAlex Shi2-5/+5
SD_SHARE_PKG_RESOURCES is a bit of a misnomer: its naming suggests that it's sharing all 'package resources' - while in reality it's specifically for sharing the LLC only. Rename it to SD_SHARE_LLC to reduce confusion. [ mingo: Rewrote the confusing changelog as well. ] Suggested-by: Valentin Schneider <[email protected]> Signed-off-by: Alex Shi <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Ricardo Neri <[email protected]> Reviewed-by: Barry Song <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-28swiotlb: add debugfs to track swiotlb transient pool usageZhangPeng1-0/+3
Introduce a new debugfs interface io_tlb_transient_nslabs. The device driver can create a new swiotlb transient memory pool once default memory pool is full. To export the swiotlb transient memory pool usage via debugfs would help the user estimate the size of transient swiotlb memory pool or analyze device driver memory leak issue. Signed-off-by: ZhangPeng <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2024-02-28net: ethtool: eee: Remove legacy _u32 from keeeAndrew Lunn1-3/+0
All MAC drivers have been converted to use the link mode members of keee. So remove the _u32 values, and the code in the ethtool core to convert the legacy _u32 values to link modes. Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-02-28locking/mutex: Simplify <linux/mutex.h>Waiman Long1-6/+2
CONFIG_DEBUG_MUTEXES and CONFIG_PREEMPT_RT are mutually exclusive. They can't be both set at the same time. Move up the mutex_destroy() function declaration and the __DEBUG_MUTEX_INITIALIZER() macro above the "#ifndef CONFIG_PREEMPT_RT" section to eliminate duplicated mutex_destroy() declaration. Also remove the duplicated mutex_trylock() function declaration in the CONFIG_PREEMPT_RT section. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Boqun Feng <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-28bitfield: suppress "dubious: x & !y" sparse warningJohannes Berg1-1/+2
There's a somewhat common pattern of using FIELD_PREP() even for single bits, e.g. cmd->info1 |= FIELD_PREP(HTT_SRNG_SETUP_CMD_INFO1_RING_FLAGS_MSI_SWAP, !!(params.flags & HAL_SRNG_FLAGS_MSI_SWAP)); which might as well be written as if (params.flags & HAL_SRNG_FLAGS_MSI_SWAP) cmd->info1 |= HTT_SRNG_SETUP_CMD_INFO1_RING_FLAGS_MSI_SWAP; (since info1 is fully initialized to start with), but in a long chain of FIELD_PREP() this really seems fine. However, it triggers a sparse warning, in the check in the macro for whether a constant value fits into the mask, as this contains a "& (_val)". In this case, this really is always intentional, so just suppress the warning by adding "0+" to the expression, indicating explicitly that this is correct. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://msgid.link/20240223100146.d243b6b1a9a1.I033828b1187c6bccf086e31400f7e933bb8373e7@changeid
2024-02-28fbdev: Clean up include statements in header fileThomas Zimmermann1-4/+4
Include mutex.h, printk.h and types.h, remove several unnecessary include statements, and sort the list alphabetically. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-28fbdev: Clean up forward declarations in header fileThomas Zimmermann1-3/+5
Add forward declarations for struct i2c_adapter and struct module, and sort the list alphabetically. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-28fbdev: Do not include <linux/slab.h> in headerThomas Zimmermann1-1/+1
Forward declare struct page and remove the include statement. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-28fbdev: Do not include <linux/notifier.h> in headerThomas Zimmermann1-1/+1
Forward declare struct notifier_block and remove the include statement. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-28fbdev: Do not include <linux/fs.h> in headerThomas Zimmermann1-1/+1
Forward declare struct inode and remove the include statement. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-28fbdev: Do not include <linux/backlight.h> in headerThomas Zimmermann1-1/+1
Forward declare struct backlight_device and remove the include statement. Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Acked-by: Helge Deller <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-27Merge tag 'mm-hotfixes-stable-2024-02-27-14-52' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Six hotfixes. Three are cc:stable and the remainder address post-6.7 issues or aren't considered appropriate for backporting" * tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/debug_vm_pgtable: fix BUG_ON with pud advanced test mm: cachestat: fix folio read-after-free in cache walk MAINTAINERS: add memory mapping entry with reviewers mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index kasan: revert eviction of stack traces in generic mode stackdepot: use variable size records for non-evictable entries
2024-02-27libfs: Drop generic_set_encrypted_ci_d_opsGabriel Krisman Bertazi1-1/+0
No filesystems depend on it anymore, and it is generally a bad idea. Since all dentries should have the same set of dentry operations in case-insensitive capable filesystems, it should be propagated through ->s_d_op. Reviewed-by: Eric Biggers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gabriel Krisman Bertazi <[email protected]>