aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-02-09PM: EM: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-02-09time/debug: Fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-08kernel/fail_function: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Andrew Morton <[email protected]> Reviewed-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-08kernel/power/energy_model.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-4/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: "Rafael J. Wysocki" <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Len Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-08kernel/time/test_udelay.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-07sched/topology: Introduce sched_numa_hop_mask()Valentin Schneider1-0/+33
Tariq has pointed out that drivers allocating IRQ vectors would benefit from having smarter NUMA-awareness - cpumask_local_spread() only knows about the local node and everything outside is in the same bucket. sched_domains_numa_masks is pretty much what we want to hand out (a cpumask of CPUs reachable within a given distance budget), introduce sched_numa_hop_mask() to export those cpumasks. Link: http://lore.kernel.org/r/[email protected] Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Yury Norov <[email protected]> Signed-off-by: Yury Norov <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-07sched: add sched_numa_find_nth_cpu()Yury Norov1-0/+57
The function finds Nth set CPU in a given cpumask starting from a given node. Leveraging the fact that each hop in sched_domains_numa_masks includes the same or greater number of CPUs than the previous one, we can use binary search on hops instead of linear walk, which makes the overall complexity of O(log n) in terms of number of cpumask_weight() calls. Signed-off-by: Yury Norov <[email protected]> Acked-by: Tariq Toukan <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Peter Lafreniere <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-07tracing: Allow boot instances to have snapshot buffersSteven Rostedt (Google)1-7/+72
Add to ftrace_boot_snapshot, "=<instance>" name, where the instance will get a snapshot buffer, and will take a snapshot at the end of boot (which will save the boot traces). Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add trace_array_puts() to write into instanceSteven Rostedt (Google)1-10/+17
Add a generic trace_array_puts() that can be used to "trace_puts()" into an allocated trace_array instance. This is just another variant of trace_array_printk(). Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add enabling of events to boot instancesSteven Rostedt (Google)3-5/+10
Add the format of: trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall That will create the "foo" instance and enable the sched_switch event (here were the "sched" system is explicitly specified), the irq_handler_entry event, and all events under the system initcall. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Add creation of instances at boot command lineSteven Rostedt (Google)1-0/+50
Add kernel command line to add tracing instances. This only creates instances at boot but still does not enable any events to them. Later changes will extend this command line to add enabling of events, filters, and triggers. As well as possibly redirecting trace_printk()! Link: https://lkml.kernel.org/r/[email protected] Cc: Randy Dunlap <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Andrew Morton <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing: Fix trace_event_raw_event_synth() if else statementSteven Rostedt (Google)1-2/+2
The test to check if the field is a stack is to be done if it is not a string. But the code had: } if (event->fields[i]->is_stack) { and not } else if (event->fields[i]->is_stack) { which would cause it to always be tested. Worse yet, this also included an "else" statement that was only to be called if the field was not a string and a stack, but this code allows it to be called if it was a string (and not a stack). Also fixed some whitespace issues. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Tom Zanussi <[email protected]> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Reported-by: kernel test robot <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]>
2023-02-07tracing: Acquire buffer from temparary trace sequenceLinyu Yuan1-0/+23
there is one dwc3 trace event declare as below, DECLARE_EVENT_CLASS(dwc3_log_event, TP_PROTO(u32 event, struct dwc3 *dwc), TP_ARGS(event, dwc), TP_STRUCT__entry( __field(u32, event) __field(u32, ep0state) __dynamic_array(char, str, DWC3_MSG_MAX) ), TP_fast_assign( __entry->event = event; __entry->ep0state = dwc->ep0state; ), TP_printk("event (%08x): %s", __entry->event, dwc3_decode_event(__get_str(str), DWC3_MSG_MAX, __entry->event, __entry->ep0state)) ); the problem is when trace function called, it will allocate up to DWC3_MSG_MAX bytes from trace event buffer, but never fill the buffer during fast assignment, it only fill the buffer when output function are called, so this means if output function are not called, the buffer will never used. add __get_buf(len) which acquiree buffer from iter->tmp_seq when trace output function called, it allow user write string to acquired buffer. the mentioned dwc3 trace event will changed as below, DECLARE_EVENT_CLASS(dwc3_log_event, TP_PROTO(u32 event, struct dwc3 *dwc), TP_ARGS(event, dwc), TP_STRUCT__entry( __field(u32, event) __field(u32, ep0state) ), TP_fast_assign( __entry->event = event; __entry->ep0state = dwc->ep0state; ), TP_printk("event (%08x): %s", __entry->event, dwc3_decode_event(__get_buf(DWC3_MSG_MAX), DWC3_MSG_MAX, __entry->event, __entry->ep0state)) );. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Linyu Yuan <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07tracing/osnoise: No need for schedule_hrtimeout rangeDavidlohr Bueso1-1/+1
No slack time is being passed, just use schedule_hrtimeout(). Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-07Merge tag 'trace-v6.2-rc6' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix regression in poll() and select() With the fix that made poll() and select() block if read would block caused a slight regression in rasdaemon, as it needed that kind of behavior. Add a way to make that behavior come back by writing zero into the 'buffer_percentage', which means to never block on read" * tag 'trace-v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
2023-02-07fanotify,audit: Allow audit to use the full permission event responseRichard Guy Briggs1-3/+15
This patch passes the full response so that the audit function can use all of it. The audit function was updated to log the additional information in the AUDIT_FANOTIFY record. Currently the only type of fanotify info that is defined is an audit rule number, but convert it to hex encoding to future-proof the field. Hex encoding suggested by Paul Moore <[email protected]>. The {subj,obj}_trust values are {0,1,2}, corresponding to no, yes, unknown. Sample records: type=FANOTIFY msg=audit(1600385147.372:590): resp=2 fan_type=1 fan_info=3137 subj_trust=3 obj_trust=5 type=FANOTIFY msg=audit(1659730979.839:284): resp=1 fan_type=0 fan_info=0 subj_trust=2 obj_trust=2 Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/3075502.aeNJFYEL58@x2 Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <bcb6d552e517b8751ece153e516d8b073459069c.1675373475.git.rgb@redhat.com>
2023-02-07fanotify: Ensure consistent variable type for responseRichard Guy Briggs1-1/+1
The user space API for the response variable is __u32. This patch makes sure that the whole path through the kernel uses u32 so that there is no sign extension or truncation of the user space response. Suggested-by: Steve Grubb <[email protected]> Link: https://lore.kernel.org/r/12617626.uLZWGnKmhe@x2 Signed-off-by: Richard Guy Briggs <[email protected]> Acked-by: Paul Moore <[email protected]> Tested-by: Steve Grubb <[email protected]> Acked-by: Steve Grubb <[email protected]> Signed-off-by: Jan Kara <[email protected]> Message-Id: <3778cb0b3501bc4e686ba7770b20eb9ab0506cf4.1675373475.git.rgb@redhat.com>
2023-02-06cpuset: Call set_cpus_allowed_ptr() with appropriate mask for taskWill Deacon1-7/+11
set_cpus_allowed_ptr() will fail with -EINVAL if the requested affinity mask is not a subset of the task_cpu_possible_mask() for the task being updated. Consequently, on a heterogeneous system with cpusets spanning the different CPU types, updates to the cgroup hierarchy can silently fail to update task affinities when the effective affinity mask for the cpuset is expanded. For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are the only cores capable of executing 32-bit tasks. Attaching a 32-bit task to a cpuset containing CPUs 0-2 will correctly affine the task to CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend the affinity mask of the 32-bit task because update_tasks_cpumask() will pass the full 0-3 mask to set_cpus_allowed_ptr(). Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater and use it to mask the 'effective_cpus' mask with the possible mask for each task being updated. Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") Signed-off-by: Will Deacon <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-02-06cgroup/cpuset: Don't filter offline CPUs in cpuset_cpus_allowed() for top ↵Waiman Long1-2/+25
cpuset tasks Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask"), relax_compatible_cpus_allowed_ptr() is calling __sched_setaffinity() unconditionally. This helps to expose a bug in the current cpuset hotplug code where the cpumasks of the tasks in the top cpuset are not updated at all when some CPUs become online or offline. It is likely caused by the fact that some of the tasks in the top cpuset, like percpu kthreads, cannot have their cpu affinity changed. One way to reproduce this as suggested by Peter is: - boot machine - offline all CPUs except one - taskset -p ffffffff $$ - online all CPUs Fix this by allowing cpuset_cpus_allowed() to return a wider mask that includes offline CPUs for those tasks that are in the top cpuset. For tasks not in the top cpuset, the old rule applies and only online CPUs will be returned in the mask since hotplug events will update their cpumasks accordingly. Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") Reported-by: Will Deacon <[email protected]> Originally-from: Peter Zijlstra (Intel) <[email protected]> Tested-by: Will Deacon <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-02-06genirq/ipi-mux: Use irq_domain_alloc_irqs()Marc Zyngier1-2/+1
Using __irq_domain_alloc_irqs() is an unnecessary complexity. Use irq_domain_alloc_irqs(), which is simpler and makes the code more readable. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2023-02-06trace/blktrace: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-2/+2
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Jens Axboe <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-02-06rtmutex: Ensure that the top waiter is always woken upWander Lairson Costa1-2/+3
Let L1 and L2 be two spinlocks. Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2. Let T2 be the task holding L2. Let T3 be a task trying to acquire L1. The following events will lead to a state in which the wait queue of L2 isn't empty, but no task actually holds the lock. T1 T2 T3 == == == spin_lock(L1) | raw_spin_lock(L1->wait_lock) | rtlock_slowlock_locked(L1) | | task_blocks_on_rt_mutex(L1, T3) | | | orig_waiter->lock = L1 | | | orig_waiter->task = T3 | | | raw_spin_unlock(L1->wait_lock) | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3) spin_unlock(L2) | | | | | rt_mutex_slowunlock(L2) | | | | | | raw_spin_lock(L2->wait_lock) | | | | | | wakeup(T1) | | | | | | raw_spin_unlock(L2->wait_lock) | | | | | | | | waiter = T1->pi_blocked_on | | | | waiter == rt_mutex_top_waiter(L2) | | | | waiter->task == T1 | | | | raw_spin_lock(L2->wait_lock) | | | | dequeue(L2, waiter) | | | | update_prio(waiter, T1) | | | | enqueue(L2, waiter) | | | | waiter != rt_mutex_top_waiter(L2) | | | | L2->owner == NULL | | | | wakeup(T1) | | | | raw_spin_unlock(L2->wait_lock) T1 wakes up T1 != top_waiter(L2) schedule_rtlock() If the deadline of T1 is updated before the call to update_prio(), and the new deadline is greater than the deadline of the second top waiter, then after the requeue, T1 is no longer the top waiter, and the wrong task is woken up which will then go back to sleep because it is not the top waiter. This can be reproduced in PREEMPT_RT with stress-ng: while true; do stress-ng --sched deadline --sched-period 1000000000 \ --sched-runtime 800000000 --sched-deadline \ 1000000000 --mmapfork 23 -t 20 done A similar issue was pointed out by Thomas versus the cases where the top waiter drops out early due to a signal or timeout, which is a general issue for all regular rtmutex use cases, e.g. futex. The problematic code is in rt_mutex_adjust_prio_chain(): // Save the top waiter before dequeue/enqueue prerequeue_top_waiter = rt_mutex_top_waiter(lock); rt_mutex_dequeue(lock, waiter); waiter_update_prio(waiter, task); rt_mutex_enqueue(lock, waiter); // Lock has no owner? if (!rt_mutex_owner(lock)) { // Top waiter changed ----> if (prerequeue_top_waiter != rt_mutex_top_waiter(lock)) ----> wake_up_state(waiter->task, waiter->wake_state); This only takes the case into account where @waiter is the new top waiter due to the requeue operation. But it fails to handle the case where @waiter is not longer the top waiter due to the requeue operation. Ensure that the new top waiter is woken up so in all cases so it can take over the ownerless lock. [ tglx: Amend changelog, add Fixes tag ] Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter") Signed-off-by: Wander Lairson Costa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2023-02-06posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()Uros Bizjak1-7/+6
Use atomic64_try_cmpxchg() instead of atomic64_cmpxchg() in __update_gt_cputime(). The x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg() (and related move instruction in front of cmpxchg()). Also, atomic64_try_cmpxchg() implicitly assigns old *ptr value to "old" when cmpxchg() fails. There is no need to re-read the value in the loop. No functional change intended. Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-06Merge 6.2-rc7 into char-misc-nextGreg Kroah-Hartman21-81/+144
We need the char-misc driver fixes in here as other patches depend on them. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-05Merge tag 'char-misc-6.2-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a number of small char/misc/whatever driver fixes. They include: - IIO driver fixes for some reported problems - nvmem driver fixes - fpga driver fixes - debugfs memory leak fix in the hv_balloon and irqdomain code (irqdomain change was acked by the maintainer) All have been in linux-next with no reported problems" * tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits) kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup() HV: hv_balloon: fix memory leak with using debugfs_lookup() nvmem: qcom-spmi-sdam: fix module autoloading nvmem: core: fix return value nvmem: core: fix cell removal on error nvmem: core: fix device node refcounting nvmem: core: fix registration vs use race nvmem: core: fix cleanup after dev_set_name() nvmem: core: remove nvmem_config wp_gpio nvmem: core: initialise nvmem->id early nvmem: sunxi_sid: Always use 32-bit MMIO reads nvmem: brcm_nvram: Add check for kzalloc iio: imu: fxos8700: fix MAGN sensor scale and unit iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN iio: imu: fxos8700: fix failed initialization ODR mode assignment iio: imu: fxos8700: fix incorrect ODR mode readback iio: light: cm32181: Fix PM support on system with 2 I2C resources iio: hid: fix the retval in gyro_3d_capture_sample iio: hid: fix the retval in accel_3d_capture_sample iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m ...
2023-02-05Merge tag 'perf_urgent_for_v6.2_rc7' of ↵Linus Torvalds1-22/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Lock the proper critical section when dealing with perf event context * tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_event_pmu_context serialization
2023-02-05genirq: Add mechanism to multiplex a single HW IPIAnup Patel3-0/+213
All RISC-V platforms have a single HW IPI provided by the INTC local interrupt controller. The HW method to trigger INTC IPI can be through external irqchip (e.g. RISC-V AIA), through platform specific device (e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call). To support multiple IPIs on RISC-V, add a generic IPI multiplexing mechanism which help us create multiple virtual IPIs using a single HW IPI. This generic IPI multiplexing is inspired by the Apple AIC irqchip driver and it is shared by various RISC-V irqchip drivers. Signed-off-by: Anup Patel <[email protected]> Reviewed-by: Hector Martin <[email protected]> Tested-by: Hector Martin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-03blk-cgroup: store a gendisk to throttle in struct task_structChristoph Hellwig1-1/+1
Switch from a request_queue pointer and reference to a gendisk once for the throttle information in struct task_struct. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Andreas Herrmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-02-03livepatch,x86: Clear relocation targets on a module removalSong Liu1-13/+49
Josh reported a bug: When the object to be patched is a module, and that module is rmmod'ed and reloaded, it fails to load with: module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8) livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd' The livepatch module has a relocation which references a symbol in the _previous_ loading of nfsd. When apply_relocate_add() tries to replace the old relocation with a new one, it sees that the previous one is nonzero and it errors out. He also proposed three different solutions. We could remove the error check in apply_relocate_add() introduced by commit eda9cec4c9a1 ("x86/module: Detect and skip invalid relocations"). However the check is useful for detecting corrupted modules. We could also deny the patched modules to be removed. If it proved to be a major drawback for users, we could still implement a different approach. The solution would also complicate the existing code a lot. We thus decided to reverse the relocation patching (clear all relocation targets on x86_64). The solution is not universal and is too much arch-specific, but it may prove to be simpler in the end. Reported-by: Josh Poimboeuf <[email protected]> Originally-by: Miroslav Benes <[email protected]> Signed-off-by: Song Liu <[email protected]> Acked-by: Miroslav Benes <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Josh Poimboeuf <[email protected]> Reviewed-by: Joe Lawrence <[email protected]> Tested-by: Joe Lawrence <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-03kernel/printk/index.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Chris Down <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: John Ogness <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Reviewed-by: John Ogness <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-02-02kexec: introduce sysctl parameters kexec_load_limit_*Ricardo Ribalda3-7/+95
kexec allows replacing the current kernel with a different one. This is usually a source of concerns for sysadmins that want to harden a system. Linux already provides a way to disable loading new kexec kernel via kexec_load_disabled, but that control is very coard, it is all or nothing and does not make distinction between a panic kexec and a normal kexec. This patch introduces new sysctl parameters, with finer tuning to specify how many times a kexec kernel can be loaded. The sysadmin can set different limits for kexec panic and kexec reboot kernels. The value can be modified at runtime via sysctl, but only with a stricter value. With these new parameters on place, a system with loadpin and verity enabled, using the following kernel parameters: sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a good warranty that if initrd tries to load a panic kernel, a malitious user will have small chances to replace that kernel with a different one, even if they can trigger timeouts on the disk where the panic kernel lives. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Guilherme G. Piccoli <[email protected]> # Steam Deck Cc: Joel Fernandes (Google) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02kexec: factor out kexec_load_permittedRicardo Ribalda3-3/+12
Both syscalls (kexec and kexec_file) do the same check, let's factor it out. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Guilherme G. Piccoli <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02userns: fix a struct's kernel-doc notationRandy Dunlap1-1/+1
Use the 'struct' keyword for a struct's kernel-doc notation to avoid a kernel-doc warning: kernel/user_namespace.c:232: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * idmap_key struct holds the information necessary to find an idmapping in a Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02kthread_worker: check all delayed works when destroy kthread workerZqiang1-0/+5
When destroying a kthread worker warn if there are still some pending delayed works. This indicates that the caller should clear all pending delayed works before destroying the kthread worker. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zqiang <[email protected]> Acked-by: Tejun Heo <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-03kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()Greg Kroah-Hartman1-1/+1
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. To make things simpler, just call debugfs_lookup_and_remove() instead which handles all of the logic at once. Cc: Thomas Gleixner <[email protected]> Cc: stable <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-02-02mm: implement memory-deny-write-execute as a prctlJoey Gouly1-0/+33
Patch series "mm: In-kernel support for memory-deny-write-execute (MDWE)", v2. The background to this is that systemd has a configuration option called MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter. Its aim is to prevent a user task from inadvertently creating an executable mapping that is (or was) writeable. Since such BPF filter is stateless, it cannot detect mappings that were previously writeable but subsequently changed to read-only. Therefore the filter simply rejects any mprotect(PROT_EXEC). The side-effect is that on arm64 with BTI support (Branch Target Identification), the dynamic loader cannot change an ELF section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect(). For libraries, it can resort to unmapping and re-mapping but for the main executable it does not have a file descriptor. The original bug report in the Red Hat bugzilla - [3] - and subsequent glibc workaround for libraries - [4]. This series adds in-kernel support for this feature as a prctl PR_SET_MDWE, that is inherited on fork(). The prctl denies PROT_WRITE | PROT_EXEC mappings. Like the systemd BPF filter it also denies adding PROT_EXEC to mappings. However unlike the BPF filter it only denies it if the mapping didn't previous have PROT_EXEC. This allows to PROT_EXEC -> PROT_EXEC | PROT_BTI with mprotect(), which is a problem with the BPF filter. This patch (of 2): The aim of such policy is to prevent a user task from creating an executable mapping that is also writeable. An example of mmap() returning -EACCESS if the policy is enabled: mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0); Similarly, mprotect() would return -EACCESS below: addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC); The BPF filter that systemd MDWE uses is stateless, and disallows mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to be enabled if it was already PROT_EXEC, which allows the following case: addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0); mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI); where PROT_BTI enables branch tracking identification on arm64. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joey Gouly <[email protected]> Co-developed-by: Catalin Marinas <[email protected]> Signed-off-by: Catalin Marinas <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Jeremy Linton <[email protected]> Cc: Kees Cook <[email protected]> Cc: Lennart Poettering <[email protected]> Cc: Mark Brown <[email protected]> Cc: nd <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: Topi Miettinen <[email protected]> Cc: Zbigniew Jędrzejewski-Szmek <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02mm: remove munlock_vma_page()Matthew Wilcox (Oracle)1-1/+0
All callers now have a folio and can call munlock_vma_folio(). Update the documentation to refer to munlock_vma_folio(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02mm: remove 'First tail page' members from struct pageMatthew Wilcox (Oracle)1-2/+2
All former users now use the folio equivalents, so remove them from the definition of struct page. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only exportAlistair Popple1-1/+1
mmu_notifier_range_update_to_read_only() was originally introduced in commit c6d23413f81b ("mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper") as an optimisation for device drivers that know a range has only been mapped read-only. However there are no users of this feature so remove it. As it is the only user of the struct mmu_notifier_range.vma field remove that also. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Ira Weiny <[email protected]> Cc: Jerome Glisse <[email protected]> Cc: John Hubbard <[email protected]> Cc: Ralph Campbell <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-02-02bpf: devmap: check XDP features in __xdp_enqueue routineLorenzo Bianconi1-3/+13
Check if the destination device implements ndo_xdp_xmit callback relying on NETDEV_XDP_ACT_NDO_XMIT flags. Moreover, check if the destination device supports XDP non-linear frame in __xdp_enqueue and is_valid_dst routines. This patch allows to perform XDP_REDIRECT on non-linear XDP buffers. Acked-by: Jesper Dangaard Brouer <[email protected]> Co-developed-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/26a94c33520c0bfba021b3fbb2cb8c1e69bf53b8.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-02bpf: Drop always true do_idr_lock parameter to bpf_map_free_idTobias Klauser2-18/+7
The do_idr_lock parameter to bpf_map_free_id was introduced by commit bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID"). However, all callers set do_idr_lock = true since commit 1e0bd5a091e5 ("bpf: Switch bpf_map ref counter to atomic64_t so bpf_map_inc() never fails"). While at it also inline __bpf_map_put into its only caller bpf_map_put now that do_idr_lock can be dropped from its signature. Signed-off-by: Tobias Klauser <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-02Merge branch 'stall.2023.01.09a' into HEADPaul E. McKenney7-4/+93
stall.2023.01.09a: RCU CPU stall-warning updates.
2023-02-02Merge branches 'doc.2023.01.05a', 'fixes.2023.01.23a', 'kvfree.2023.01.03a', ↵Paul E. McKenney15-400/+919
'srcu.2023.01.03a', 'srcu-always.2023.02.02a', 'tasks.2023.01.03a', 'torture.2023.01.05a' and 'torturescript.2023.01.03a' into HEAD doc.2023.01.05a: Documentation update. fixes.2023.01.23a: Miscellaneous fixes. kvfree.2023.01.03a: kvfree_rcu() updates. srcu.2023.01.03a: SRCU updates. srcu-always.2023.02.02a: Finish making SRCU be unconditionally available. tasks.2023.01.03a: Tasks-RCU updates. torture.2023.01.05a: Torture-test updates. torturescript.2023.01.03a: Torture-test scripting updates.
2023-02-02kernel/notifier: Remove CONFIG_SRCUPaul E. McKenney1-3/+0
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in conditional compilation based on CONFIG_SRCU. Therefore, remove the #ifdef. Signed-off-by: Paul E. McKenney <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: "Michał Mirosław" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Alan Stern <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Reviewed-by: John Ogness <[email protected]>
2023-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski17-30/+70
net/core/gro.c 7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO") b1a78b9b9886 ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-02Merge tag 'net-6.2-rc7' of ↵Linus Torvalds5-12/+23
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, can and netfilter. Current release - regressions: - phy: fix null-deref in phy_attach_direct - mac802154: fix possible double free upon parsing error Previous releases - regressions: - bpf: preserve reg parent/live fields when copying range info, prevent mis-verification of programs as safe - ip6: fix GRE tunnels not generating IPv6 link local addresses - phy: dp83822: fix null-deref on DP83825/DP83826 devices - sctp: do not check hb_timer.expires when resetting hb_timer - eth: mtk_sock: fix SGMII configuration after phylink conversion Previous releases - always broken: - eth: xdp: execute xdp_do_flush() before napi_complete_done() - skb: do not mix page pool and page referenced frags in GRO - bpf: - fix a possible task gone issue with bpf_send_signal[_thread]() - fix an off-by-one bug in bpf_mem_cache_idx() to select the right cache - add missing btf_put to register_btf_id_dtor_kfuncs - sockmap: fon't let sock_map_{close,destroy,unhash} call itself - gso: fix null-deref in skb_segment_list() - mctp: purge receive queues on sk destruction - fix UaF caused by accept on already connected socket in exotic socket families - tls: don't treat list head as an entry in tls_is_tx_ready() - netfilter: br_netfilter: disable sabotage_in hook after first suppression - wwan: t7xx: fix runtime PM implementation Misc: - MAINTAINERS: spring cleanup of networking maintainers" * tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) mtk_sgmii: enable PCS polling to allow SFP work net: mediatek: sgmii: fix duplex configuration net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration MAINTAINERS: update SCTP maintainers MAINTAINERS: ipv6: retire Hideaki Yoshifuji mailmap: add John Crispin's entry MAINTAINERS: bonding: move Veaceslav Falico to CREDITS net: openvswitch: fix flow memory leak in ovs_flow_cmd_new net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC virtio-net: Keep stop() to follow mirror sequence of open() selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq can: isotp: split tx timer into transmission and timeout can: isotp: handle wait_event_interruptible() return values can: raw: fix CAN FD frame transmissions over CAN XL devices can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap() ...
2023-02-02tracing: Fix poll() and select() do not work on per_cpu trace_pipe and ↵Shiju Jose1-3/+0
trace_pipe_raw poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work since kernel 6.1-rc6. This issue is seen after the commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have polling block on watermark"). This issue is firstly detected and reported, when testing the CXL error events in the rasdaemon and also erified using the test application for poll() and select(). This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(), in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the percentage of pages are available. The default value set for the buffer_percent is 50 in the kernel/trace/trace.c. As a fix, allow userspace application could set buffer_percent as 0 through the buffer_percent_fops, so that the task will wake up as soon as data is added to any of the specific cpu buffer. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: [email protected] Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark") Signed-off-by: Shiju Jose <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-02-02Merge tag 'coresight-next-v6.3' of ↵Greg Kroah-Hartman1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: Updates for v6.3 - Dynamic TraceID allocation scheme for CoreSight trace source. Allows systems with > 44 CPUs to use the ETMs. TraceID is advertised via AUX_OUTPUT_HWID packets in perf.data. Also allows allocating trace-ids for non-CPU bound trace components (e.g., Qualcomm TPDA). - Support for Qualcomm TPDA and TPDM CoreSight devices. - Support for Ultrasoc SMB CoreSight Sink buffer. - Fixes for HiSilicon PTT driver - MAINTAINERS update: Add Reviewer for HiSilicon PTT driver - Bug fixes for CTI power management and sysfs mode - Fix CoreSight ETM4x TRCSEQRSTEVRn access Signed-off-by: Suzuki K Poulose <[email protected]> * tag 'coresight-next-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (35 commits) coresight: tmc: Don't enable TMC when it's not ready. coresight: tpda: fix return value check in tpda_probe() Coresight: tpda/tpdm: remove incorrect __exit annotation coresight: perf: Output trace id only once coresight: Fix uninitialised variable use in coresight_disable Documentation: coresight: tpdm: Add dummy comment after sysfs list Documentation: coresight: Extend title heading syntax in TPDM and TPDA documentation Documentation: trace: Add documentation for TPDM and TPDA dt-bindings: arm: Adds CoreSight TPDA hardware definitions Coresight: Add TPDA link driver coresight-tpdm: Add integration test support coresight-tpdm: Add DSB dataset support dt-bindings: arm: Add CoreSight TPDM hardware Coresight: Add coresight TPDM source driver coresight: core: Use IDR for non-cpu bound sources' paths. coresight: trace-id: Add debug & test macros to Trace ID allocation coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID kernel: events: Export perf_report_aux_output_id() coresight: trace id: Remove legacy get trace ID function. coresight: etmX.X: stm: Remove trace_id() callback ...
2023-02-02bpf: Add __bpf_kfunc tag to all kfuncsDavid Vernet5-56/+57
Now that we have the __bpf_kfunc tag, we should use add it to all existing kfuncs to ensure that they'll never be elided in LTO builds. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-01-31cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()Waiman Long1-1/+2
It was found that the check to see if a partition could use up all the cpus from the parent cpuset in update_parent_subparts_cpumask() was incorrect. As a result, it is possible to leave parent with no effective cpu left even if there are tasks in the parent cpuset. This can lead to system panic as reported in [1]. Fix this probem by updating the check to fail the enabling the partition if parent's effective_cpus is a subset of the child's cpus_allowed. Also record the error code when an error happens in update_prstate() and add a test case where parent partition and child have the same cpu list and parent has task. Enabling partition in the child will fail in this case. [1] https://www.spinics.net/lists/cgroups/msg36254.html Fixes: f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to partition & cpus changes") Cc: [email protected] # v6.1 Reported-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>