aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-28Merge tag 'filelock-v6.6' of ↵Linus Torvalds3-5/+159
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: - new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will find info about whatever lock happens to be first in the given range, regardless of type. - an OFD lock selftest - bugfix involving a UAF in a tracepoint - comment typo fix * tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock fs/locks: Fix typo selftests: add OFD lock tests fs/locks: F_UNLCK extension for F_OFD_GETLK
2023-08-28Merge tag 'v6.6-fs.proc.uapi' of ↵Linus Torvalds3-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull procfs fixes from Christian Brauner: "Mode changes to files under /proc/<pid>/ aren't supported ever since commit 6d76fa58b050 ("Don't allow chmod() on the /proc/<pid>/ files"). Due to an oversight in commit 1b3044e39a89 ("procfs: fix pthread cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD, mode changes on /proc/thread-self/comm were accidently allowed. Similar, mode changes for all files beneath /proc/<pid>/net/ are blocked but mode changes on /proc/<pid>/net itself were accidently allowed. Both issues come down to not using the generic proc_setattr() helper which blocks all mode changes. This is rectified with this pull request. This also removes a strange nolibc test that abused /proc/<pid>/net for testing mode changes. Using procfs for this test never made a lot of sense given procfs has special semantics for almost everything anway. Both changes are minor user-visible changes. It is however very unlikely that mode changes on proc/<pid>/net and /proc/thread-self/comm are something that userspace relies on" * tag 'v6.6-fs.proc.uapi' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: procfs: block chmod on /proc/thread-self/comm proc: use generic setattr() for /proc/$PID/net selftests/nolibc: drop test chmod_net
2023-08-28Merge tag 'v6.6-vfs.autofs' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull autofs fixes from Christian Brauner: "This fixes a memory leak in autofs reported by syzkaller and a missing conversion from uninterruptible to interruptible wake up when autofs is in catatonic mode" * tag 'v6.6-vfs.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: autofs: use wake_up() instead of wake_up_interruptible(() autofs: fix memory leak of waitqueues in autofs_catatonic_mode
2023-08-28Merge tag 'v6.6-vfs.fchmodat2' of ↵Linus Torvalds25-7/+196
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fchmodat2 system call from Christian Brauner: "This adds the fchmodat2() system call. It is a revised version of the fchmodat() system call, adding a missing flag argument. Support for both AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH are included. Adding this system call revision has been a longstanding request but so far has always fallen through the cracks. While the kernel implementation of fchmodat() does not have a flag argument the libc provided POSIX-compliant fchmodat(3) version does. Both glibc and musl have to implement a workaround in order to support AT_SYMLINK_NOFOLLOW (see [1] and [2]). The workaround is brittle because it relies not just on O_PATH and O_NOFOLLOW semantics and procfs magic links but also on our rather inconsistent symlink semantics. This gives userspace a proper fchmodat2() system call that libcs can use to properly implement fchmodat(3) and allows them to get rid of their hacks. In this case it will immediately benefit them as the current workaround is already defunct because of aformentioned inconsistencies. In addition to AT_SYMLINK_NOFOLLOW, give userspace the ability to use AT_EMPTY_PATH with fchmodat2(). This is already possible with fchownat() so there's no reason to not also support it for fchmodat2(). The implementation is simple and comes with selftests. Implementation of the system call and wiring up the system call are done as separate patches even though they could arguably be one patch. But in case there are merge conflicts from other system call additions it can be beneficial to have separate patches" Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fchmodat.c;h=17eca54051ee28ba1ec3f9aed170a62630959143;hb=a492b1e5ef7ab50c6fdd4e4e9879ea5569ab0a6c#l35 [1] Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c?id=718f363bc2067b6487900eddc9180c84e7739f80#n28 [2] * tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: fchmodat2: remove duplicate unneeded defines fchmodat2: add support for AT_EMPTY_PATH selftests: Add fchmodat2 selftest arch: Register fchmodat2, usually as syscall 452 fs: Add fchmodat2() Non-functional cleanup of a "__user * filename"
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds40-629/+1043
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-28Merge tag 'v6.6-vfs.misc' of ↵Linus Torvalds44-243/+458
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ...
2023-08-28Merge tag 'v6.6-vfs.tmpfs' of ↵Linus Torvalds19-297/+1412
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull libfs and tmpfs updates from Christian Brauner: "This cycle saw a lot of work for tmpfs that required changes to the vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this cycle. Things will go back to mm next cycle. Features ======== - By far the biggest work is the quota support for tmpfs. New tmpfs quota infrastructure is added to support it and a new QFMT_SHMEM uapi option is exposed. This offers user and group quotas to tmpfs (project quotas will be added later). Similar to other filesystems tmpfs quota are not supported within user namespaces yet. - Add support for user xattrs. While tmpfs already supports security xattrs (security.*) and POSIX ACLs for a long time it lacked support for user xattrs (user.*). With this pull request tmpfs will be able to support a limited number of user xattrs. This is accompanied by a fix (see below) to limit persistent simple xattr allocations. - Add support for stable directory offsets. Currently tmpfs relies on the libfs provided cursor-based mechanism for readdir. This causes issues when a tmpfs filesystem is exported via NFS. NFS clients do not open directories. Instead, each server-side readdir operation opens the directory, reads it, and then closes it. Since the cursor state for that directory is associated with the opened file it is discarded after each readdir operation. Such directory offsets are not just cached by NFS clients but also various userspace libraries based on these clients. As it stands there is no way to invalidate the caches when directory offsets have changed and the whole application depends on unchanging directory offsets. At LSFMM we discussed how to solve this problem and decided to support stable directory offsets. libfs now allows filesystems like tmpfs to use an xarrary to map a directory offset to a dentry. This mechanism is currently only used by tmpfs but can be supported by others as well. Fixes ===== - Change persistent simple xattrs allocations in libfs from GFP_KERNEL to GPF_KERNEL_ACCOUNT so they're subject to memory cgroup limits. Since this is a change to libfs it affects both tmpfs and kernfs. - Correctly verify {g,u}id mount options. A new filesystem context is created via fsopen() which records the namespace that becomes the owning namespace of the superblock when fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are mountable in namespaces. However, fsconfig() calls can occur in a namespace different from the namespace where fsopen() has been called. Currently, when fsconfig() is called to set {g,u}id mount options the requested {g,u}id is mapped into a k{g,u}id according to the namespace where fsconfig() was called from. The resulting k{g,u}id is not guaranteed to be resolvable in the namespace of the filesystem (the one that fsopen() was called in). This means it's possible for an unprivileged user to create files owned by any group in a tmpfs mount since it's possible to set the setid bits on the tmpfs directory. The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid such bugs. The new mount api's cross-namespace delegation abilities are already widely used. Having talked to a bunch of userspace this is the most faithful solution with minimal regression risks" * tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs mm: invalidation check mapping before folio_contains tmpfs: trivial support for direct IO tmpfs,xattr: enable limited user extended attributes tmpfs: track free_ispace instead of free_inodes xattr: simple_xattr_set() return old_xattr to be freed tmpfs: verify {g,u}id mount options correctly shmem: move spinlock into shmem_recalc_inode() to fix quota support libfs: Remove parent dentry locking in offset_iterate_dir() libfs: Add a lock class for the offset map's xa_lock shmem: stable directory offsets shmem: Refactor shmem_symlink() libfs: Add directory operations for stable offsets shmem: fix quota lock nesting in huge hole handling shmem: Add default quota limit mount options shmem: quota support shmem: prepare shmem quota infrastructure quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks shmem: make shmem_get_inode() return ERR_PTR instead of NULL shmem: make shmem_inode_acct_block() return error
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds252-1049/+1267
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-28Merge tag 'v6.6-vfs.fs_context' of ↵Linus Torvalds5-71/+107
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mount API updates from Christian Brauner: "This introduces FSCONFIG_CMD_CREATE_EXCL which allows userspace to implement something like $ mount -t ext4 --exclusive /dev/sda /B which fails if a superblock for the requested filesystem does already exist instead of silently reusing an existing superblock. Without it, in the sequence $ move-mount -f xfs -o source=/dev/sda4 /A $ move-mount -f xfs -o noacl,source=/dev/sda4 /B the initial mounter will create a superblock. The second mounter will reuse the existing superblock, creating a bind-mount (see [1] for the source of the move-mount binary). The problem is that reusing an existing superblock means all mount options other than read-only and read-write will be silently ignored even if they are incompatible requests. For example, the second mount has requested no POSIX ACL support but since the existing superblock is reused POSIX ACL support will remain enabled. Such silent superblock reuse can easily become a security issue. After adding support for FSCONFIG_CMD_CREATE_EXCL to mount(8) in util-linux this can be fixed: $ move-mount -f xfs --exclusive -o source=/dev/sda4 /A $ move-mount -f xfs --exclusive -o noacl,source=/dev/sda4 /B Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed This requires the new mount api. With the old mount api it would be necessary to plumb this through every legacy filesystem's file_system_type->mount() method. If they want this feature they are most welcome to switch to the new mount api" Link: https://github.com/brauner/move-mount-beneath [1] Link: https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner Link: https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner Link: https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner Link: https://lore.kernel.org/lkml/20230824-anzog-allheilmittel-e8c63e429a79@brauner/ * tag 'v6.6-vfs.fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add FSCONFIG_CMD_CREATE_EXCL fs: add vfs_cmd_reconfigure() fs: add vfs_cmd_create() super: remove get_tree_single_reconf()
2023-08-28Merge branch 'devlink-finish-file-split-and-get-retire-leftover-c'Jakub Kicinski15-9540/+9710
Jiri Pirko says: ==================== devlink: finish file split and get retire leftover.c This patchset finishes a move Jakub started and Moshe continued in the past. I was planning to do this for a long time, so here it is, finally. This patchset does not change any behaviour. It just splits leftover.c into per-object files and do necessary changes, like declaring functions used from other code, on the way. The last 3 patches are pushing the rest of the code into appropriate existing files. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: move devlink_notify_register/unregister() to dev.cJiri Pirko4-64/+30
At last, move the last bits out of leftover.c, the devlink_notify_register/unregister() functions to dev.c Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: move small_ops definition into netlink.cJiri Pirko3-253/+251
Move the generic netlink small_ops definition where they are consumed, into netlink.c Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: move tracepoint definitions into core.cJiri Pirko2-6/+6
Move remaining tracepoint definitions to most suitable file core.c. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push linecard related code into separate fileJiri Pirko4-615/+626
Cut out another chunk from leftover.c and put linecard related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push rate related code into separate fileJiri Pirko4-719/+728
Cut out another chunk from leftover.c and put rate related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push trap related code into separate fileJiri Pirko4-1862/+1873
Cut out another chunk from leftover.c and put trap related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: use tracepoint_enabled() helperJiri Pirko1-1/+1
In preparation for the trap code move, use tracepoint_enabled() helper instead of trace_devlink_trap_report_enabled() which would not be defined in that scope. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push region related code into separate fileJiri Pirko4-1257/+1267
Cut out another chunk from leftover.c and put region related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push param related code into separate fileJiri Pirko4-860/+875
Cut out another chunk from leftover.c and put param related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push resource related code into separate fileJiri Pirko4-575/+582
Cut out another chunk from leftover.c and put resource related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push dpipe related code into separate fileJiri Pirko4-895/+926
Cut out another chunk from leftover.c and put dpipe related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: move and rename devlink_dpipe_send_and_alloc_skb() helperJiri Pirko3-25/+26
Since both dpipe and resource code is using this helper, in preparation for code split to separate files, move devlink_dpipe_send_and_alloc_skb() helper into netlink.c. Rename it on the way. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push shared buffer related code into separate fileJiri Pirko4-991/+1006
Cut out another chunk from leftover.c and put sb related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push port related code into separate fileJiri Pirko4-1530/+1547
Cut out another chunk from leftover.c and put port related code into a separate file. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28devlink: push object register/unregister notifications into separate helpersJiri Pirko1-84/+163
In preparations of leftover.c split to individual files, avoid need to have object structures exposed in devl_internal.h and allow to have them maintained in object files. The register/unregister notifications need to know the structures to iterate lists. To avoid the need, introduce per-object register/unregister notification helpers and use them. Signed-off-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-28Merge tag 'opp-updates-6.6' of ↵Rafael J. Wysocki8-229/+344
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull OPP updates for 6.6 from Viresh Kumar: "- Minor core cleanup and addition of new frequency related APIs (Viresh Kumar and Manivannan Sadhasivam). - Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)." * tag 'opp-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: dt-bindings: cpufreq: Convert ti-cpufreq to json schema dt-bindings: opp: Convert ti-omap5-opp-supply to json schema OPP: Fix argument name in doc comment dt-bindings: opp: Increase maxItems for opp-hz property OPP: Fix passing 0 to PTR_ERR in _opp_attach_genpd() OPP: Fix potential null ptr dereference in dev_pm_opp_get_required_pstate() OPP: Reuse dev_pm_opp_get_freq_indexed() OPP: Update _read_freq() to return the correct frequency OPP: Add dev_pm_opp_find_freq_exact_indexed() OPP: Introduce dev_pm_opp_get_freq_indexed() API OPP: Introduce dev_pm_opp_find_freq_{ceil/floor}_indexed() APIs OPP: Rearrange entries in pm_opp.h
2023-08-28Merge branch 'pm-cpufreq'Rafael J. Wysocki32-102/+108
Merge ARM cpufreq updates for 6.6: - Migrate various platforms to use remove callback returning void (Yangtao Li). - Add online/offline/exit hooks for Tegra driver (Sumit Gupta). - Explicitly include correct DT includes (Rob Herring). - Frequency domain updates for qcom-hw driver (Neil Armstrong). - Modify AMD pstate driver return the highest_perf value (Meng Li). - Generic cleanups for cppc, mediatek and powernow driver (Liao Chang, Konrad Dybcio). - Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino Del Regno, Konrad Dybcio). - brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva). * pm-cpufreq: (33 commits) cpufreq: tegra194: remove opp table in exit hook cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit() cpufreq: tegra194: add online/offline hooks cpufreq: qcom-cpufreq-hw: add support for 4 freq domains dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases. cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value cpufreq: mediatek-hw: Remove unused define cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug cpufreq: blocklist MSM8998 in cpufreq-dt-platdev cpufreq: omap: Convert to platform remove callback returning void cpufreq: qoriq: Convert to platform remove callback returning void cpufreq: acpi: Convert to platform remove callback returning void cpufreq: tegra186: Convert to platform remove callback returning void cpufreq: qcom-nvmem: Convert to platform remove callback returning void cpufreq: kirkwood: Convert to platform remove callback returning void cpufreq: pcc-cpufreq: Convert to platform remove callback returning void ...
2023-08-28Merge tag 'cpufreq-arm-updates-6.6' of ↵Rafael J. Wysocki32-102/+108
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 6.6 from Viresh Kumar: "- Migrate various platforms to use remove callback returning void (Yangtao Li). - Add online/offline/exit hooks for Tegra driver (Sumit Gupta). - Explicitly include correct DT includes (Rob Herring). - Frequency domain updates for qcom-hw driver (Neil Armstrong). - Modify AMD pstate driver return the highest_perf value (Meng Li). - Generic cleanups for cppc, mediatek and powernow driver (Liao Chang and Konrad Dybcio). - Add more platforms to cpufreq-arm driver's blocklist (AngeloGioacchino Del Regno and Konrad Dybcio). - brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)." * tag 'cpufreq-arm-updates-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (33 commits) cpufreq: tegra194: remove opp table in exit hook cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit() cpufreq: tegra194: add online/offline hooks cpufreq: qcom-cpufreq-hw: add support for 4 freq domains dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases. cpufreq: Prefer to print cpuid in MIN/MAX QoS register error message cpufreq: amd-pstate-ut: Modify the function to get the highest_perf value cpufreq: mediatek-hw: Remove unused define cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev cpufreq: brcmstb-avs-cpufreq: Fix -Warray-bounds bug cpufreq: blocklist MSM8998 in cpufreq-dt-platdev cpufreq: omap: Convert to platform remove callback returning void cpufreq: qoriq: Convert to platform remove callback returning void cpufreq: acpi: Convert to platform remove callback returning void cpufreq: tegra186: Convert to platform remove callback returning void cpufreq: qcom-nvmem: Convert to platform remove callback returning void cpufreq: kirkwood: Convert to platform remove callback returning void cpufreq: pcc-cpufreq: Convert to platform remove callback returning void ...
2023-08-28Merge remote-tracking branch 'linux-efi/urgent' into efi/nextArd Biesheuvel1-1/+1
2023-08-28cpufreq: tegra194: remove opp table in exit hookSumit Gupta1-0/+12
Add exit hook and remove OPP table when the device gets unregistered. This will fix the error messages when the CPU FREQ driver module is removed and then re-inserted. It also fixes these messages while onlining the first CPU from a policy whose all CPU's were previously offlined. debugfs: File 'cpu5' in directory 'opp' already present! debugfs: File 'cpu6' in directory 'opp' already present! debugfs: File 'cpu7' in directory 'opp' already present! Fixes: f41e1442ac5b ("cpufreq: tegra194: add OPP support and set bandwidth") Signed-off-by: Sumit Gupta <[email protected]> [ Viresh: Dropped irrelevant change from it ] Signed-off-by: Viresh Kumar <[email protected]>
2023-08-28Merge tag 'irqchip-6.6' of ↵Thomas Gleixner33-41/+35
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - Fix for Loongsoon eiointc init error handling - Fix a bunch of warning showing up when -Wmissing-prototypes is set - A set of fixes for drivers checking for 0 as a potential return value from platform_get_irq() - Another set of patches converting existing code to the use of helpers such as of_address_count() and devm_platform_get_and_ioremap_resource() - A tree-wide cleanup of drivers including of_*.h without discrimination - Added support for the Amlogic C3 SoCs Link: https://lore.kernel.org/lkml/[email protected]
2023-08-28inet: fix IP_TRANSPARENT error handlingEric Dumazet1-5/+3
My recent patch forgot to change error handling for IP_TRANSPARENT socket option. WARNING: bad unlock balance detected! 6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Not tainted ------------------------------------- syz-executor151/5028 is trying to release lock (sk_lock-AF_INET) at: [<ffffffff88213983>] sockopt_release_sock+0x53/0x70 net/core/sock.c:1073 but there are no more locks to release! other info that might help us debug this: 1 lock held by syz-executor151/5028: stack backtrace: CPU: 0 PID: 5028 Comm: syz-executor151 Not tainted 6.5.0-rc7-syzkaller-01717-g59da9885767a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 __lock_release kernel/locking/lockdep.c:5438 [inline] lock_release+0x4b5/0x680 kernel/locking/lockdep.c:5781 sock_release_ownership include/net/sock.h:1824 [inline] release_sock+0x175/0x1b0 net/core/sock.c:3527 sockopt_release_sock+0x53/0x70 net/core/sock.c:1073 do_ip_setsockopt+0x12c1/0x3640 net/ipv4/ip_sockglue.c:1364 ip_setsockopt+0x59/0xe0 net/ipv4/ip_sockglue.c:1419 raw_setsockopt+0x218/0x290 net/ipv4/raw.c:833 __sys_setsockopt+0x2cd/0x5b0 net/socket.c:2305 __do_sys_setsockopt net/socket.c:2316 [inline] __se_sys_setsockopt net/socket.c:2313 [inline] Fixes: 4bd0623f04ee ("inet: move inet->transparent to inet->inet_flags") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Simon Horman <[email protected]> Cc: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28selftests: bonding: create directly devices in the target namespacesZhengchao Shao1-5/+3
If failed to set link1_1 to netns client, we should delete link1_1 in the cleanup path. But if set link1_1 to netns client successfully, delete link1_1 will report warning. So it will be safer creating directly the devices in the target namespaces. Reported-by: Hangbin Liu <[email protected]> Closes: https://lore.kernel.org/all/ZNyJx1HtXaUzOkNA@Laptop-X1/ Signed-off-by: Zhengchao Shao <[email protected]> Acked-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28r8169: fix ASPM-related issues on a number of systems with NIC version from ↵Heiner Kallweit1-4/+0
RTL8168h This effectively reverts 4b5f82f6aaef. On a number of systems ASPM L1 causes tx timeouts with RTL8168h, see referenced bug report. Fixes: 4b5f82f6aaef ("r8169: enable ASPM L1/L1.1 from RTL8168h") Cc: [email protected] Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217814 Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28ethernet: tg3: remove unreachable codeMikhail Kobuk1-4/+1
'tp->irq_max' value is either 1 [L16336] or 5 [L16354], as indicated in tg3_get_invariants(). Therefore, 'i' can't exceed 4 in tg3_init_one() that makes (i <= 4) always true. Moreover, 'intmbx' value set at the last iteration is not used later in it's scope. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 78f90dcf184b ("tg3: Move napi_add calls below tg3_get_invariants") Signed-off-by: Mikhail Kobuk <[email protected]> Reviewed-by: Alexey Khoroshilov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28net: Make consumed action consistent in sch_handle_egressDaniel Borkmann1-0/+2
While looking at TC_ACT_* handling, the TC_ACT_CONSUMED is only handled in sch_handle_ingress but not sch_handle_egress. This was added via cd11b164073b ("net/tc: introduce TC_ACT_REINSERT.") and e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible") and later got renamed into TC_ACT_CONSUMED via 720f22fed81b ("net: sched: refactor reinsert action"). The initial work was targeted for ovs back then and only needed on ingress, and the mirred action module also restricts it to only that. However, given it's an API contract it would still make sense to make this consistent to sch_handle_ingress and handle it on egress side in the same way, that is, setting return code to "success" and returning NULL back to the caller as otherwise an action module sitting on egress returning TC_ACT_CONSUMED could lead to an UAF when untreated. Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28net: Fix skb consume leak in sch_handle_egressDaniel Borkmann1-0/+1
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}: [...] unreferenced object 0xffff88818bcb4f00 (size 232): comm "softirq", pid 0, jiffies 4299085078 (age 134.028s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff ..pa.....A1..... backtrace: [<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400 [<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0 [<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0 [<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870 [<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0 [<ffffffff9b6ba24e>] ip_append_data+0xee/0x190 [<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470 [<ffffffff9b7e4030>] icmp_reply+0x900/0xa00 [<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230 [<ffffffff9b7e444d>] icmp_echo+0xcd/0x190 [<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10 [<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0 [<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450 [<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0 [<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420 [<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920 [...] I was able to reproduce this via: ip link add dev dummy0 type dummy ip link set dev dummy0 up tc qdisc add dev eth0 clsact tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0 ping 1.1.1.1 <stolen> After the fix, there are no kmemleak reports with the reproducer. This is in line with what is also done on the ingress side, and from debugging the skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible that these are two different skbs with both skb_unref(skb) as true. The two seen skbs are due to mirred doing a skb_clone() internally as use_reinsert is false in tcf_mirred_act() for egress. This was initially reported by Gal. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Gal Pressman <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28dccp: Fix out of bounds access in DCCP error handlerJann Horn2-9/+19
There was a previous attempt to fix an out-of-bounds access in the DCCP error handlers, but that fix assumed that the error handlers only want to access the first 8 bytes of the DCCP header. Actually, they also look at the DCCP sequence number, which is stored beyond 8 bytes, so an explicit pskb_may_pull() is required. Fixes: 6706a97fec96 ("dccp: fix out of bound access in dccp_v4_err()") Fixes: 1aa9d1a0e7ee ("ipv6: dccp: fix out of bound access in dccp_v6_err()") Cc: [email protected] Signed-off-by: Jann Horn <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28Merge branch 'octeontx2-af-misc-mac-block-changes'David S. Miller3-4/+15
Hariprasad Kelam says: ==================== octeontx2-af: misc MAC block changes This series of patches adds recent changes added in MAC (CGX/RPM) block. Patch1: Adds new LMAC mode supported by CN10KB silicon Patch2: In a scenario where system boots with no cgx devices, currently AF driver treats this as error as a result no interfaces will work. This patch relaxes this check, such that non cgx mapped netdev devices will work. Patch3: This patch adds required lmac validation in MAC block APIs. Patch4: Prints error message incase, no netdev is mapped with given cgx,lmac pair. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-08-28octeontx2-af: print error message incase of invalid pf mappingHariprasad Kelam1-0/+5
During AF driver initialization, it creates a mapping between pf to cgx,lmac pair. Whenever there is a physical link change, using this mapping driver forwards the message to the associated netdev. This patch prints error message incase of cgx,lmac pair is not associated with any pf netdev. Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: Sunil Kovvuri Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28octeontx2-af: Add validation of lmacHariprasad Kelam1-3/+7
With the addition of new MAC blocks like CN10K RPM and CN10KB RPM_USX, LMACs are noncontiguous. Though in most of the functions, lmac validation checks exist but in few functions they are missing. This patch adds the same. Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: Sunil Kovvuri Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28octeontx2-af: Don't treat lack of CGX interfaces as errorSunil Goutham1-1/+1
Don't treat lack of CGX LMACs on the system as a error. Instead ignore it so that LBK VFs are created and can be used. Signed-off-by: Sunil Goutham <[email protected]> Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28octeontx2-af: CN10KB: Add USGMII LMAC modeHariprasad Kelam2-0/+2
Upon physical link change, firmware reports to the kernel about the change along with the details like speed, lmac_type_id, etc. Kernel derives lmac_type based on lmac_type_id received from firmware. This patch extends current lmac list with new USGMII mode supported by CN10KB RPM block. Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: Sunil Kovvuri Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28dt-bindings: net: dsa: marvell: fix wrong model in compatibility listAlexis Lothoré1-1/+1
Fix wrong switch name in compatibility list. 88E6163 switch does not exist and is in fact 88E6361 Fixes: 9229a9483d80 ("dt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list") Signed-off-by: Alexis Lothoré <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Conor Dooley <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()Liao Chang1-1/+2
Since the 'cpus' field of policy structure will become empty in the cpufreq core API, it is better to use 'related_cpus' in the exit() callback of driver. Fixes: c3274763bfc3 ("cpufreq: powernow-k8: Initialize per-cpu data-structures properly") Signed-off-by: Liao Chang <[email protected]> Signed-off-by: Viresh Kumar <[email protected]>
2023-08-28cpufreq: tegra194: add online/offline hooksSumit Gupta1-0/+17
Implement the light-weight tear down and bring up helpers to reduce the amount of work to do on CPU offline/online operation. This change helps to make the hotplugging paths much faster. Suggested-by: Viresh Kumar <[email protected]> Signed-off-by: Sumit Gupta <[email protected]> Link: https://lore.kernel.org/lkml/20230816033402.3abmugb5goypvllm@vireshk-i7/ [ Viresh: Fixed rebase conflict ] Signed-off-by: Viresh Kumar <[email protected]>
2023-08-28igb: set max size RX buffer when store bad packet is enabledRadoslaw Tyl1-4/+7
Increase the RX buffer size to 3K when the SBP bit is on. The size of the RX buffer determines the number of pages allocated which may not be sufficient for receive frames larger than the set MTU size. Cc: [email protected] Fixes: 89eaefb61dc9 ("igb: Support RX-ALL feature flag.") Reported-by: Manfred Rudigier <[email protected]> Signed-off-by: Radoslaw Tyl <[email protected]> Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28netrom: Deny concurrent connect().Kuniyuki Iwashima1-0/+5
syzkaller reported null-ptr-deref [0] related to AF_NETROM. This is another self-accept issue from the strace log. [1] syz-executor creates an AF_NETROM socket and calls connect(), which is blocked at that time. Then, sk->sk_state is TCP_SYN_SENT and sock->state is SS_CONNECTING. [pid 5059] socket(AF_NETROM, SOCK_SEQPACKET, 0) = 4 [pid 5059] connect(4, {sa_family=AF_NETROM, sa_data="..." <unfinished ...> Another thread calls connect() concurrently, which finally fails with -EINVAL. However, the problem here is the socket state is reset even while the first connect() is blocked. [pid 5060] connect(4, NULL, 0 <unfinished ...> [pid 5060] <... connect resumed>) = -1 EINVAL (Invalid argument) As sk->state is TCP_CLOSE and sock->state is SS_UNCONNECTED, the following listen() succeeds. Then, the first connect() looks up itself as a listener and puts skb into the queue with skb->sk itself. As a result, the next accept() gets another FD of itself as 3, and the first connect() finishes. [pid 5060] listen(4, 0 <unfinished ...> [pid 5060] <... listen resumed>) = 0 [pid 5060] accept(4, NULL, NULL <unfinished ...> [pid 5060] <... accept resumed>) = 3 [pid 5059] <... connect resumed>) = 0 Then, accept4() is called but blocked, which causes the general protection fault later. [pid 5059] accept4(4, NULL, 0x20000400, SOCK_NONBLOCK <unfinished ...> After that, another self-accept occurs by accept() and writev(). [pid 5060] accept(4, NULL, NULL <unfinished ...> [pid 5061] writev(3, [{iov_base=...}] <unfinished ...> [pid 5061] <... writev resumed>) = 99 [pid 5060] <... accept resumed>) = 6 Finally, the leader thread close()s all FDs. Since the three FDs reference the same socket, nr_release() does the cleanup for it three times, and the remaining accept4() causes the following fault. [pid 5058] close(3) = 0 [pid 5058] close(4) = 0 [pid 5058] close(5) = -1 EBADF (Bad file descriptor) [pid 5058] close(6) = 0 [pid 5058] <... exit_group resumed>) = ? [ 83.456055][ T5059] general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN To avoid the issue, we need to return an error for connect() if another connect() is in progress, as done in __inet_stream_connect(). [0]: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 0 PID: 5059 Comm: syz-executor.0 Not tainted 6.5.0-rc5-syzkaller-00194-gace0ab3a4b54 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5012 Code: 45 85 c9 0f 84 cc 0e 00 00 44 8b 05 11 6e 23 0b 45 85 c0 0f 84 be 0d 00 00 48 ba 00 00 00 00 00 fc ff df 4c 89 d1 48 c1 e9 03 <80> 3c 11 00 0f 85 e8 40 00 00 49 81 3a a0 69 48 90 0f 84 96 0d 00 RSP: 0018:ffffc90003d6f9e0 EFLAGS: 00010006 RAX: ffff8880244c8000 RBX: 1ffff920007adf6c RCX: 0000000000000003 RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000018 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000018 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f51d519a6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f51d5158d58 CR3: 000000002943f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5726 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3a/0x50 kernel/locking/spinlock.c:162 prepare_to_wait+0x47/0x380 kernel/sched/wait.c:269 nr_accept+0x20d/0x650 net/netrom/af_netrom.c:798 do_accept+0x3a6/0x570 net/socket.c:1872 __sys_accept4_file net/socket.c:1913 [inline] __sys_accept4+0x99/0x120 net/socket.c:1943 __do_sys_accept4 net/socket.c:1954 [inline] __se_sys_accept4 net/socket.c:1951 [inline] __x64_sys_accept4+0x96/0x100 net/socket.c:1951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f51d447cae9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f51d519a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000120 RAX: ffffffffffffffda RBX: 00007f51d459bf80 RCX: 00007f51d447cae9 RDX: 0000000020000400 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007f51d44c847a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f51d459bf80 R15: 00007ffc25c34e48 </TASK> Link: https://syzkaller.appspot.com/text?tag=CrashLog&x=152cdb63a80000 [1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=666c97e4686410e79649 Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-28dt-bindings: net: xilinx_gmii2rgmii: Convert to json schemaPranavi Somisetty2-35/+55
Convert the Xilinx GMII to RGMII Converter device tree binding documentation to json schema. This converter is usually used as gem <---> gmii2rgmii <---> external phy and, it's phy-handle should point to the phandle of the external phy. Signed-off-by: Pranavi Somisetty <[email protected]> Signed-off-by: Harini Katakam <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-08-27Merge branch 'tls-expand-tls_cipher_size_desc-to-simplify-getsockopt-setsockopt'Jakub Kicinski8-435/+278
Sabrina Dubroca says: ==================== tls: expand tls_cipher_size_desc to simplify getsockopt/setsockopt Commit 2d2c5ea24243 ("net/tls: Describe ciphers sizes by const structs") introduced tls_cipher_size_desc to describe the size of the fields of the per-cipher crypto_info structs, and commit ea7a9d88ba21 ("net/tls: Use cipher sizes structs") used it, but only in tls_device.c and tls_device_fallback.c, and skipped converting similar code in tls_main.c and tls_sw.c. This series expands tls_cipher_size_desc (renamed to tls_cipher_desc to better fit this expansion) to fully describe a cipher: - offset of the fields within the per-cipher crypto_info - size of the full struct (for copies to/from userspace) - offload flag - algorithm name used by SW crypto With these additions, we can remove ~350L of switch (crypto_info->cipher_type) { ... } from tls_set_device_offload, tls_sw_fallback_init, do_tls_getsockopt_conf, do_tls_setsockopt_conf, tls_set_sw_offload (mainly do_tls_getsockopt_conf and tls_set_sw_offload). This series also adds the ARIA ciphers to the tls selftests, and some more getsockopt/setsockopt tests to cover more of the code changed by this series. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>