blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-05-02	sysctl: remove register_sysctl_paths()	Luis Chamberlain	1	-51/+4
	The deprecation for register_sysctl_paths() is over. We can rejoice as we nuke register_sysctl_paths(). The routine register_sysctl_table() was the only user left of register_sysctl_paths(), so we can now just open code and move the implementation over to what used to be to __register_sysctl_paths(). The old dynamic struct ctl_table_set set is now the point to sysctl_table_root.default_set. The old dynamic const struct ctl_path path was being used in the routine register_sysctl_paths() with a static: static const struct ctl_path null_path[] = { {} }; Since this is a null path we can now just simplfy the old routine and remove its use as its always empty. This saves us a total of 230 bytes. $ ./scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 2/7 grow/shrink: 1/1 up/down: 1015/-1245 (-230) Function old new delta register_leaf_sysctl_tables.constprop - 524 +524 register_sysctl_table 22 497 +475 __pfx_register_leaf_sysctl_tables.constprop - 16 +16 null_path 8 - -8 __pfx_register_sysctl_paths 16 - -16 __pfx_register_leaf_sysctl_tables 16 - -16 __pfx___register_sysctl_paths 16 - -16 __register_sysctl_base 29 12 -17 register_sysctl_paths 18 - -18 register_leaf_sysctl_tables 534 - -534 __register_sysctl_paths 620 - -620 Total: Before=21259666, After=21259436, chg -0.00% Signed-off-by: Luis Chamberlain <[email protected]>
2023-04-27	Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of ↵	Linus Torvalds	1	-1/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly singleton patches all over the place. Series of note are: - updates to scripts/gdb from Glenn Washburn - kexec cleanups from Bjorn Helgaas" * tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits) mailmap: add entries for Paul Mackerras libgcc: add forward declarations for generic library routines mailmap: add entry for Oleksandr ocfs2: reduce ioctl stack usage fs/proc: add Kthread flag to /proc/$pid/status ia64: fix an addr to taddr in huge_pte_offset() checkpatch: introduce proper bindings license check epoll: rename global epmutex scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry() scripts/gdb: create linux/vfs.py for VFS related GDB helpers uapi/linux/const.h: prefer ISO-friendly __typeof__ delayacct: track delays from IRQ/SOFTIRQ scripts/gdb: timerlist: convert int chunks to str scripts/gdb: print interrupts scripts/gdb: raise error with reduced debugging information scripts/gdb: add a Radix Tree Parser lib/rbtree: use '+' instead of '\|' for setting color. proc/stat: remove arch_idle_time() checkpatch: check for misuse of the link tags checkpatch: allow Closes tags with links ...
2023-04-13	proc_sysctl: enhance documentation	Luis Chamberlain	1	-5/+20
	Expand documentation to clarify: o that paths don't need to exist for the new API callers o clarify that we require callers to keep the memory of the table around during the lifetime of the sysctls o annotate routines we are trying to deprecate and later remove Cc: [email protected] # v5.17 Cc: Christian Brauner <[email protected]> Cc: Kefeng Wang <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-04-13	sysctl: clarify register_sysctl_init() base directory order	Luis Chamberlain	1	-4/+1
	Relatively new docs which I added which hinted the base directories needed to be created before is wrong, remove that incorrect comment. This has been hinted before by Eric twice already [0] [1], I had just not verified that until now. Now that I've verified that updates the docs to relax the context described. [0] https://lkml.kernel.org/r/[email protected] [1] https://lkml.kernel.org/r/[email protected] Cc: [email protected] # v5.17 Cc: Christian Brauner <[email protected]> Cc: Kefeng Wang <[email protected]> Suggested-by: Eric W. Biederman <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-04-13	proc_sysctl: move helper which creates required subdirectories	Luis Chamberlain	1	-24/+32
	Move the code which creates the subdirectories for a ctl table into a helper routine so to make it easier to review. Document the goal. This creates no functional changes. Reviewed-by: John Johansen <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-04-13	proc_sysctl: update docs for __register_sysctl_table()	Luis Chamberlain	1	-3/+11
	Update the docs for __register_sysctl_table() to make it clear no child entries can be passed. When the child is true these are non-leaf entries on the ctl table and sysctl treats these as directories. The point to __register_sysctl_table() is to deal only with directories not part of the ctl table where thay may riside, to be simple and avoid recursion. While at it, hint towards using long on extra1 and extra2 later. Cc: [email protected] # v5.17 Cc: Christian Brauner <[email protected]> Cc: Kefeng Wang <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-04-08	proc: remove mark_inode_dirty() in .setattr()	Chao Yu	1	-1/+0
	procfs' .setattr() has updated i_uid, i_gid and i_mode into proc dirent, we don't need to call mark_inode_dirty() for delayed update, remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Chao Yu <[email protected]> Reviewed-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-21	sysctl: fix proc_dobool() usability	Ondrej Mosnacek	1	-0/+6
	Currently proc_dobool expects a (bool ) in table->data, but sizeof(int) in table->maxsize, because it uses do_proc_dointvec() directly. This is unsafe for at least two reasons: 1. A sysctl table definition may use { .data = &variable, .maxsize = sizeof(variable) }, not realizing that this makes the sysctl unusable (see the Fixes: tag) and that they need to use the completely counterintuitive sizeof(int) instead. 2. proc_dobool() will currently try to parse an array of values if given .maxsize >= 2sizeof(int), but will try to write values of type bool by offsets of sizeof(int), so it will not work correctly with neither an (int ) nor a (bool ). There is no .maxsize validation to prevent this. Fix this by: 1. Constraining proc_dobool() to allow only one value and .maxsize == sizeof(bool). 2. Wrapping the original struct ctl_table in a temporary one with .data pointing to a local int variable and .maxsize set to sizeof(int) and passing this one to proc_dointvec(), converting the value to/from bool as needed (using proc_dou8vec_minmax() as an example). 3. Extending sysctl_check_table() to enforce proc_dobool() expectations. 4. Fixing the proc_dobool() docstring (it was just copy-pasted from proc_douintvec, apparently...). 5. Converting all existing proc_dobool() users to set .maxsize to sizeof(bool) instead of sizeof(int). Fixes: 83efeeeb3d04 ("tty: Allow TIOCSTI to be disabled") Fixes: a2071573d634 ("sysctl: introduce new proc handler proc_dobool") Signed-off-by: Ondrej Mosnacek <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-01-19	fs: port ->permission() to pass mnt_idmap	Christian Brauner	1	-1/+1
	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
2023-01-19	fs: port ->getattr() to pass mnt_idmap	Christian Brauner	1	-2/+2
	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
2023-01-19	fs: port ->setattr() to pass mnt_idmap	Christian Brauner	1	-3/+3
	Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
2022-09-08	kernel/sysctl.c: move sysctl_vals and sysctl_long_vals to sysctl.c	Liu Shixin	1	-7/+0
	sysctl_vals and sysctl_long_vals are declared even if sysctl is disabled. Move its definition to sysctl.c to make sure their integrity in any case. Signed-off-by: Liu Shixin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-09-08	proc: remove initialization assignment	Li zeming	1	-1/+1
	The allocation address of the core_parent pointer variable is first executed in the function, no initialization assignment is required. Signed-off-by: Li zeming <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-26	Merge tag 'sysctl-5.19-rc1' of ↵	Linus Torvalds	1	-39/+50
	git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "For two kernel releases now kernel/sysctl.c has been being cleaned up slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and all this caused merge conflicts with one susbystem or another. This tree was put together to help try to avoid conflicts with these cleanups going on different trees at time. So nothing exciting on this pull request, just cleanups. Thanks a lot to the Uniontech and Huawei folks for doing some of this nasty work" * tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits) sched: Fix build warning without CONFIG_SYSCTL reboot: Fix build warning without CONFIG_SYSCTL kernel/kexec_core: move kexec_core sysctls into its own file sysctl: minor cleanup in new_dir() ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n fs/proc: Introduce list_for_each_table_entry for proc sysctl mm: fix unused variable kernel warning when SYSCTL=n latencytop: move sysctl to its own file ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y ftrace: Fix build warning ftrace: move sysctl_ftrace_enabled to ftrace.c kernel/do_mount_initrd: move real_root_dev sysctls to its own file kernel/delayacct: move delayacct sysctls to its own file kernel/acct: move acct sysctls to its own file kernel/panic: move panic sysctls to its own file kernel/lockdep: move lockdep sysctls to its own file mm: move page-writeback sysctls to their own file mm: move oom_kill sysctls to their own file kernel/reboot: move reboot sysctls to its own file sched: Move energy_aware sysctls to topology.c ...
2022-05-04	memcg: accounting for objects allocated for new netdevice	Vasily Averin	1	-1/+1
	Creating a new netdevice allocates at least ~50Kb of memory for various kernel objects, but only ~5Kb of them are accounted to memcg. As a result, creating an unlimited number of netdevice inside a memcg-limited container does not fall within memcg restrictions, consumes a significant part of the host's memory, can cause global OOM and lead to random kills of host processes. The main consumers of non-accounted memory are: ~10Kb 80+ kernfs nodes ~6Kb ipv6_add_dev() allocations 6Kb __register_sysctl_table() allocations 4Kb neigh_sysctl_register() allocations 4Kb __devinet_sysctl_register() allocations 4Kb __addrconf_sysctl_register() allocations Accounting of these objects allows to increase the share of memcg-related memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice on typical VM with default Fedora 35 kernel) and this should be enough to somehow protect the host from misuse inside container. Other related objects are quite small and may not be taken into account to minimize the expected performance degradation. It should be separately mentonied ~300 bytes of percpu allocation of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes it can become the main consumer of memory. This patch does not enables kernfs accounting as it affects other parts of the kernel and should be discussed separately. However, even without kernfs, this patch significantly improves the current situation and allows to take into account more than half of all netdevice allocations. Signed-off-by: Vasily Averin <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-03	net: sysctl: introduce sysctl SYSCTL_THREE	Tonghao Zhang	1	-1/+1
	This patch introdues the SYSCTL_THREE. KUnit: [00:10:14] ================ sysctl_test (10 subtests) ================= [00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data [00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset [00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero [00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max [00:10:14] =================== [PASSED] sysctl_test =================== ./run_kselftest.sh -c sysctl ... ok 1 selftests: sysctl: sysctl.sh Cc: Luis Chamberlain <[email protected]> Cc: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: David Ahern <[email protected]> Cc: Simon Horman <[email protected]> Cc: Julian Anastasov <[email protected]> Cc: Pablo Neira Ayuso <[email protected]> Cc: Jozsef Kadlecsik <[email protected]> Cc: Florian Westphal <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Lorenz Bauer <[email protected]> Cc: Akhmat Karakotov <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-04-25	sysctl: minor cleanup in new_dir()	Vasily Averin	1	-1/+0
	Byte zeroing is not required here, since memory was allocated by kzalloc() Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-04-21	fs/proc: Introduce list_for_each_table_entry for proc sysctl	Meng Tang	1	-38/+50
	Use the list_for_each_table_entry macro to optimize the scenario of traverse ctl_table. This make the code neater and easier to understand. Suggested-by: Davidlohr Bueso<[email protected]> Signed-off-by: Meng Tang <[email protected]> [updated the sysctl_check_table() hunk due to some changes upstream] Signed-off-by: Luis Chamberlain <[email protected]>
2022-01-22	kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases()	Luis Chamberlain	1	-2/+2
	Rename sysctl_init() to sysctl_init_bases() so to reflect exactly what this is doing. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Cc: Al Viro <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Christian Brauner <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: add and use base directory declarer and registration helper	Luis Chamberlain	1	-0/+9
	Patch series "sysctl: add and use base directory declarer and registration helper". In this patch series we start addressing base directories, and so we start with the "fs" sysctls. The end goal is we end up completely moving all "fs" sysctl knobs out from kernel/sysctl. This patch (of 6): Add a set of helpers which can be used to declare and register base directory sysctls on their own. We do this so we can later move each of the base sysctl directories like "fs", "kernel", etc, to their own respective files instead of shoving the declarations and registrations all on kernel/sysctl.c. The lazy approach has caught up and with this, we just end up extending the list of base directories / sysctls on one file and this makes maintenance difficult due to merge conflicts from many developers. The declarations are used first by kernel/sysctl.c for registration its own base which over time we'll try to clean up. It will be used in the next patch to demonstrate how to cleanly deal with base sysctl directories. [[email protected]: null-terminate the ctl_table arrays] Link: https://lkml.kernel.org/r/YafJY3rXDYnjK/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Cc: Al Viro <[email protected]> Cc: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Xiaoming Ni <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Eric Biggers <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: move maxolduid as a sysctl specific const	Luis Chamberlain	1	-1/+1
	The maxolduid value is only shared for sysctl purposes for use on a max range. Just stuff this into our shared const array. [[email protected]: fix sysctl_vals[], per Mickaël] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Signed-off-by: Mickaël Salaün <[email protected]> Cc: Al Viro <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: "J. Bruce Fields" <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: share unsigned long const values	Luis Chamberlain	1	-0/+3
	Provide a way to share unsigned long values. This will allow others to not have to re-invent these values. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Cc: Al Viro <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: David Airlie <[email protected]> Cc: Douglas Gilbert <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jan Kara <[email protected]> Cc: Joel Becker <[email protected]> Cc: John Ogness <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Phillip Potter <[email protected]> Cc: Qing Wang <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: add helper to register a sysctl mount point	Luis Chamberlain	1	-0/+14
	The way to create a subdirectory on top of sysctl_mount_point is a bit obscure, and why we do that even so more. Provide a helper which makes it clear why we do this. [[email protected]: export register_sysctl_mount_point() to modules] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Luis Chamberlain <[email protected]> Suggested-by: "Eric W. Biederman" <[email protected]> Cc: Al Viro <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: David Airlie <[email protected]> Cc: Douglas Gilbert <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jan Kara <[email protected]> Cc: Joel Becker <[email protected]> Cc: John Ogness <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Phillip Potter <[email protected]> Cc: Qing Wang <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: move some boundary constants from sysctl.c to sysctl_vals	Xiaoming Ni	1	-1/+1
	sysctl has helpers which let us specify boundary values for a min or max int value. Since these are used for a boundary check only they don't change, so move these variables to sysctl_vals to avoid adding duplicate variables. This will help with our cleanup of kernel/sysctl.c. [[email protected]: update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%"] [[email protected]: major rebase] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiaoming Ni <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Jan Kara <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Qing Wang <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Phillip Potter <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Douglas Gilbert <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: John Ogness <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-22	sysctl: add a new register_sysctl_init() interface	Xiaoming Ni	1	-0/+33
	Patch series "sysctl: first set of kernel/sysctl cleanups", v2. Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysct.c kitchen sink. People keeps stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong. I'm going to split patches up into series as there is quite a bit of work. This first set adds register_sysctl_init() for uses of registerting a sysctl on the init path, adds const where missing to a few places, generalizes common values so to be more easy to share, and starts the move of a few kernel/sysctl.c out where they belong. The majority of rework on v2 in this first patch set is 0-day fixes. Eric Biederman's feedback is later addressed in subsequent patch sets. I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked. This patch (of 9): The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Today though folks heavily rely on tables on kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization. We special-case the initialization use of register_sysctl() since it is safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should not stop if the sysctls don't end up getting registered. It would be counter productive to stop boot if a simple sysctl registration failed. Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls. [[email protected]: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiaoming Ni <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Qing Wang <[email protected]> Cc: Benjamin LaHaise <[email protected]> Cc: Al Viro <[email protected]> Cc: Jan Kara <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Antti Palosaari <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Clemens Ladisch <[email protected]> Cc: David Airlie <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Julia Lawall <[email protected]> Cc: Lukas Middendorf <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Phillip Potter <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Douglas Gilbert <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: John Ogness <[email protected]> Cc: Martin K. Petersen <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Steven Rostedt (VMware) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20	sysctl: remove redundant ret assignment	luo penghao	1	-1/+0
	Subsequent if judgments will assign new values to ret, so the statement here should be deleted The clang_analyzer complains as follows: fs/proc/proc_sysctl.c: Value stored to 'ret' is never read Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: luo penghao <[email protected]> Reported-by: Zeal Robot <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-01-20	sysctl: fix duplicate path separator in printed entries	Geert Uytterhoeven	1	-4/+4
	sysctl_print_dir() always terminates the printed path name with a slash, so printing a slash before the file part causes a duplicate like in sysctl duplicate entry: /kernel//perf_user_access Fix this by dropping the extra slash. Link: https://lkml.kernel.org/r/e3054d605dc56f83971e4b6d2f5fa63a978720ad.1641551872.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Christian Brauner <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Iurii Zaikin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-03	Merge branch 'work.namei' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs name lookup updates from Al Viro: "Small namei.c patch series, mostly to simplify the rules for nameidata state. It's actually from the previous cycle - but I didn't post it for review in time... Changes visible outside of fs/namei.c: file_open_root() calling conventions change, some freed bits in LOOKUP_... space" * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namei: make sure nd->depth is always valid teach set_nameidata() to handle setting the root as well take LOOKUP_{ROOT,ROOT_GRABBED,JUMPED} out of LOOKUP_... space switch file_open_root() to struct path
2021-05-06	proc/sysctl: fix function name error in comments	zhouchuangao	1	-1/+1
	The function name should be modified to register_sysctl_paths instead of register_sysctl_table_path. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: zhouchuangao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-29	Merge tag 'kbuild-v5.13' of ↵	Linus Torvalds	1	-6/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Evaluate $(call cc-option,...) etc. only for build targets - Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux - Remove unnecessary --gcc-toolchains Clang flag because the --prefix flag finds the toolchains - Do not pass Clang's --prefix flag when using the integrated as - Check the assembler version in Kconfig time - Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up some dependencies in Kconfig - Fix invalid Module.symvers creation when building only modules without vmlinux - Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is set, but there is no module to build - Refactor module installation Makefile - Support zstd for module compression - Convert alpha and ia64 to use generic shell scripts to generate the syscall headers - Add a new elfnote to indicate if the kernel was built with LTO, which will be used by pahole - Flatten the directory structure under include/config/ so CONFIG options and filenames match - Change the deb source package name from linux-$(KERNELRELEASE) to linux-upstream * tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits) kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test kbuild: deb-pkg: change the source package name to linux-upstream tools: do not include scripts/Kbuild.include kbuild: redo fake deps at include/config/*.h kbuild: remove TMPO from try-run MAINTAINERS: add pattern for dummy-tools kbuild: add an elfnote for whether vmlinux is built with lto ia64: syscalls: switch to generic syscallhdr.sh ia64: syscalls: switch to generic syscalltbl.sh alpha: syscalls: switch to generic syscallhdr.sh alpha: syscalls: switch to generic syscalltbl.sh sysctl: use min() helper for namecmp() kbuild: add support for zstd compressed modules kbuild: remove CONFIG_MODULE_COMPRESS kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst kbuild: move module strip/compression code into scripts/Makefile.modinst kbuild: refactor scripts/Makefile.modinst kbuild: rename extmod-prefix to extmod_prefix kbuild: check module name conflict for external modules as well kbuild: show the target directory for depmod log ...
2021-04-25	sysctl: use min() helper for namecmp()	Masahiro Yamada	1	-6/+1
	Make it slightly readable by using min(). Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Kees Cook <[email protected]>
2021-04-07	switch file_open_root() to struct path	Al Viro	1	-1/+1
	... and provide file_open_root_mnt(), using the root of given mount. Signed-off-by: Al Viro <[email protected]>
2021-03-25	sysctl: add proc_dou8vec_minmax()	Eric Dumazet	1	-0/+6
	Networking has many sysctls that could fit in one u8. This patch adds proc_dou8vec_minmax() for this purpose. Note that the .extra1 and .extra2 fields are pointing to integers, because it makes conversions easier. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-02-26	proc: use kvzalloc for our kernel buffer	Josef Bacik	1	-2/+2
	Since sysctl: pass kernel pointers to ->proc_handler we have been pre-allocating a buffer to copy the data from the proc handlers into, and then copying that to userspace. The problem is this just blindly kzalloc()'s the buffer size passed in from the read, which in the case of our 'cat' binary was 64kib. Order-4 allocations are not awesome, and since we can potentially allocate up to our maximum order, so use kvzalloc for these buffers. [[email protected]: changelog tweaks] Link: https://lkml.kernel.org/r/6345270a2c1160b89dd5e6715461f388176899d1.1612972413.git.josef@toxicpanda.com Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> CC: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-23	Merge tag 'idmapped-mounts-v5.12' of ↵	Linus Torvalds	1	-6/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-01-24	proc_sysctl: fix oops caused by incorrect command parameters	Xiaoming Ni	1	-1/+6
	The process_sysctl_arg() does not check whether val is empty before invoking strlen(val). If the command line parameter () is incorrectly configured and val is empty, oops is triggered. For example: "hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is triggered. The call stack is as follows: Kernel command line: .... hung_task_panic ...... Call trace: __pi_strlen+0x10/0x98 parse_args+0x278/0x344 do_sysctl_args+0x8c/0xfc kernel_init+0x5c/0xf4 ret_from_fork+0x10/0x30 To fix it, check whether "val" is empty when "phram" is a sysctl field. Error codes are returned in the failure branch, and error logs are generated by parse_args(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line") Signed-off-by: Xiaoming Ni <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Heiner Kallweit <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: <[email protected]> [5.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-01-24	fs: make helpers idmap mount aware	Christian Brauner	1	-3/+6
	Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2021-01-24	stat: handle idmapped mounts	Christian Brauner	1	-1/+1
	The generic_fillattr() helper fills in the basic attributes associated with an inode. Enable it to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace before we store the uid and gid. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: James Morris <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2021-01-24	attr: handle idmapped mounts	Christian Brauner	1	-2/+2
	When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/[email protected] Cc: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Al Viro <[email protected]> Cc: [email protected] Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2020-09-08	sysctl: Convert to iter interfaces	Matthew Wilcox (Oracle)	1	-24/+24
	Using the read_iter/write_iter interfaces allows for in-kernel users to set sysctls without using set_fs(). Also, the buffer is a string, so give it the real type of 'char ', not void . [AV: Christoph's fixup folded in] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-07-03	Call sysctl_head_finish on error	Matthew Wilcox (Oracle)	1	-3/+3
	This error path returned directly instead of calling sysctl_head_finish(). Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-06-10	Merge branch 'work.sysctl' of ↵	Linus Torvalds	1	-0/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysctl fixes from Al Viro: "Fixups to regressions in sysctl series" * 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sysctl: reject gigantic reads/write to sysctl files cdrom: fix an incorrect __user annotation on cdrom_sysctl_info trace: fix an incorrect __user annotation on stack_trace_sysctl random: fix an incorrect __user annotation on proc_do_entropy net/sysctl: remove leftover __user annotations on neigh_proc_dointvec* net/sysctl: use cpumask_parse in flow_limit_cpu_sysctl
2020-06-10	sysctl: reject gigantic reads/write to sysctl files	Christoph Hellwig	1	-0/+4
	Instead of triggering a WARN_ON deep down in the page allocator just give up early on allocations that are way larger than the usual sysctl values. Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Reported-by: Vegard Nossum <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-06-08	kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases	Guilherme G. Piccoli	1	-2/+5
	After a recent change introduced by Vlastimil's series [0], kernel is able now to handle sysctl parameters on kernel command line; also, the series introduced a simple infrastructure to convert legacy boot parameters (that duplicate sysctls) into sysctl aliases. This patch converts the watchdog parameters softlockup_panic and {hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure. It fixes the documentation too, since the alias only accepts values 0 or 1, not the full range of integers. We also took the opportunity here to improve the documentation of the previously converted hung_task_panic (see the patch series [0]) and put the alias table in alphabetical order. [0] http://lkml.kernel.org/r/[email protected] Signed-off-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Kees Cook <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Luis Chamberlain <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-08	kernel/hung_task convert hung_task_panic boot parameter to sysctl	Vlastimil Babka	1	-0/+1
	We can now handle sysctl parameters on kernel command line and have infrastructure to convert legacy command line options that duplicate sysctl to become a sysctl alias. This patch converts the hung_task_panic parameter. Note that the sysctl handler is more strict and allows only 0 and 1, while the legacy parameter allowed any non-zero value. But there is little reason anyone would not be using 1. Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christian Brauner <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Guilherme G . Piccoli" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Ivan Teterevkov <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-08	kernel/sysctl: support handling command line aliases	Vlastimil Babka	1	-7/+41
	We can now handle sysctl parameters on kernel command line, but historically some parameters introduced their own command line equivalent, which we don't want to remove for compatibility reasons. We can, however, convert them to the generic infrastructure with a table translating the legacy command line parameters to their sysctl names, and removing the one-off param handlers. This patch adds the support and makes the first conversion to demonstrate it, on the (deprecated) numa_zonelist_order parameter. Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Christian Brauner <[email protected]> Cc: David Rientjes <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Guilherme G . Piccoli" <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Ivan Teterevkov <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-08	kernel/sysctl: support setting sysctl parameters from kernel command line	Vlastimil Babka	1	-0/+107
	Patch series "support setting sysctl parameters from kernel command line", v3. This series adds support for something that seems like many people always wanted but nobody added it yet, so here's the ability to set sysctl parameters via kernel command line options in the form of sysctl.vm.something=1 The important part is Patch 1. The second, not so important part is an attempt to clean up legacy one-off parameters that do the same thing as a sysctl. I don't want to remove them completely for compatibility reasons, but with generic sysctl support the idea is to remove the one-off param handlers and treat the parameters as aliases for the sysctl variants. I have identified several parameters that mention sysctl counterparts in Documentation/admin-guide/kernel-parameters.txt but there might be more. The conversion also has varying level of success: - numa_zonelist_order is converted in Patch 2 together with adding the necessary infrastructure. It's easy as it doesn't really do anything but warn on deprecated value these days. - hung_task_panic is converted in Patch 3, but there's a downside that now it only accepts 0 and 1, while previously it was any integer value - nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic, so there's no straighforward conversion possible - traceoff_on_warning is a flag without value and it would be required to handle that somehow in the conversion infractructure, which seems pointless for a single flag This patch (of 5): A recently proposed patch to add vm_swappiness command line parameter in addition to existing sysctl [1] made me wonder why we don't have a general support for passing sysctl parameters via command line. Googling found only somebody else wondering the same [2], but I haven't found any prior discussion with reasons why not to do this. Settings the vm_swappiness issue aside (the underlying issue might be solved in a different way), quick search of kernel-parameters.txt shows there are already some that exist as both sysctl and kernel parameter - hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning. A general mechanism would remove the need to add more of those one-offs and might be handy in situations where configuration by e.g. /etc/sysctl.d/ is impractical. Hence, this patch adds a new parse_args() pass that looks for parameters prefixed by 'sysctl.' and tries to interpret them as writes to the corresponding sys/ files using an temporary in-kernel procfs mount. This mechanism was suggested by Eric W. Biederman [3], as it handles all dynamically registered sysctl tables, even though we don't handle modular sysctls. Errors due to e.g. invalid parameter name or value are reported in the kernel log. The processing is hooked right before the init process is loaded, as some handlers might be more complicated than simple setters and might need some subsystems to be initialized. At the moment the init process can be started and eventually execute a process writing to /proc/sys/ then it should be also fine to do that from the kernel. Sysctls registered later on module load time are not set by this mechanism - it's expected that in such scenarios, setting sysctl values from userspace is practical enough. [1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/ [2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter [3] https://lore.kernel.org/r/[email protected]/ Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Iurii Zaikin <[email protected]> Cc: Ivan Teterevkov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: "Guilherme G . Piccoli" <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Christian Brauner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-27	sysctl: pass kernel pointers to ->proc_handler	Christoph Hellwig	1	-18/+29
	Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Andrey Ignatov <[email protected]> Signed-off-by: Al Viro <[email protected]>
2020-02-24	proc: Use d_invalidate in proc_prune_siblings_dcache	Eric W. Biederman	1	-4/+4
	The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases for the dcache or even think of removing mounts from the dcache. For that behavior d_invalidate is needed. To use d_invalidate replace d_prune_aliases with d_find_alias followed by d_invalidate and dput. For completeness the directory and the non-directory cases are separated because in theory (although not in currently in practice for proc) directories can only ever have a single dentry while non-directories can have hardlinks and thus multiple dentries. As part of this separation use d_find_any_alias for directories to spare d_find_alias the extra work of doing that. Plus the differences between d_find_any_alias and d_find_alias makes it clear why the directory and non-directory code and not share code. To make it clear these routines now invalidate dentries rename proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename proc_sys_prune_dcache proc_sys_invalidate_dcache. V2: Split the directory and non-directory cases. To make this code robust to future changes in proc. Signed-off-by: "Eric W. Biederman" <[email protected]>
2020-02-20	proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache	Eric W. Biederman	1	-34/+1
	This prepares the way for allowing the pid part of proc to use this dcache pruning code as well. Signed-off-by: Eric W. Biederman <[email protected]>