<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blaster4385/linux-IllusionX/kernel/sched/ext.c, branch v6.12.10</title>
<subtitle>Linux kernel with personal config changes for arch linux</subtitle>
<id>https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.10</id>
<link rel='self' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.10'/>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/'/>
<updated>2025-01-17T12:40:49Z</updated>
<entry>
<title>sched_ext: idle: Refresh idle masks during idle-to-idle transitions</title>
<updated>2025-01-17T12:40:49Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-01-10T22:16:31Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=cfe32daafd6ca0b23e3d2d299e249e78506d6779'/>
<id>urn:sha1:cfe32daafd6ca0b23e3d2d299e249e78506d6779</id>
<content type='text'>
[ Upstream commit a2a3374c47c428c0edb0bbc693638d4783f81e31 ]

With the consolidation of put_prev_task/set_next_task(), see
commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the
first set_next_task()"), we are now skipping the transition between
these two functions when the previous and the next tasks are the same.

As a result, the scx idle state of a CPU is updated only when
transitioning to or from the idle thread. While this is generally
correct, it can lead to uneven and inefficient core utilization in
certain scenarios [1].

A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu()
selects and marks an idle CPU as busy, followed by a wake-up via
scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU
continues running the idle thread, returns to idle, but remains marked
as busy, preventing it from being selected again as an idle CPU (until a
task eventually runs on it and releases the CPU).

For example, running a workload that uses 20% of each CPU, combined with
an scx scheduler using proactive wake-ups, results in the following core
utilization:

 CPU 0: 25.7%
 CPU 1: 29.3%
 CPU 2: 26.5%
 CPU 3: 25.5%
 CPU 4:  0.0%
 CPU 5: 25.5%
 CPU 6:  0.0%
 CPU 7: 10.5%

To address this, refresh the idle state also in pick_task_idle(), during
idle-to-idle transitions, but only trigger ops.update_idle() on actual
state changes to prevent unnecessary updates to the scx scheduler and
maintain balanced state transitions.

With this change in place, the core utilization in the previous example
becomes the following:

 CPU 0: 18.8%
 CPU 1: 19.4%
 CPU 2: 18.0%
 CPU 3: 18.7%
 CPU 4: 19.3%
 CPU 5: 18.9%
 CPU 6: 18.7%
 CPU 7: 19.3%

[1] https://github.com/sched-ext/scx/pull/1139

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: switch class when preempted by higher priority scheduler</title>
<updated>2025-01-17T12:40:49Z</updated>
<author>
<name>Honglei Wang</name>
<email>jameshongleiwang@126.com</email>
</author>
<published>2025-01-08T02:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=e7960da6f2f438d907c17d463364dae6d242f775'/>
<id>urn:sha1:e7960da6f2f438d907c17d463364dae6d242f775</id>
<content type='text'>
[ Upstream commit 68e449d849fd50bd5e61d8bd32b3458dbd3a3df6 ]

ops.cpu_release() function, if defined, must be invoked when preempted by
a higher priority scheduler class task. This scenario was skipped in
commit f422316d7466 ("sched_ext: Remove switch_class_scx()"). Let's fix
it.

Fixes: f422316d7466 ("sched_ext: Remove switch_class_scx()")
Signed-off-by: Honglei Wang &lt;jameshongleiwang@126.com&gt;
Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Replace rq_lock() to raw_spin_rq_lock() in scx_ops_bypass()</title>
<updated>2025-01-17T12:40:48Z</updated>
<author>
<name>Changwoo Min</name>
<email>changwoo@igalia.com</email>
</author>
<published>2025-01-08T15:08:06Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=d9e446dd63cee7161717a6a8414ba9c6435af764'/>
<id>urn:sha1:d9e446dd63cee7161717a6a8414ba9c6435af764</id>
<content type='text'>
[ Upstream commit 6268d5bc10354fc2ab8d44a0cd3b042d49a0417e ]

scx_ops_bypass() iterates all CPUs to re-enqueue all the scx tasks.
For each CPU, it acquires a lock using rq_lock() regardless of whether
a CPU is offline or the CPU is currently running a task in a higher
scheduler class (e.g., deadline). The rq_lock() is supposed to be used
for online CPUs, and the use of rq_lock() may trigger an unnecessary
warning in rq_pin_lock(). Therefore, replace rq_lock() to
raw_spin_rq_lock() in scx_ops_bypass().

Without this change, we observe the following warning:

===== START =====
[    6.615205] rq-&gt;balance_callback &amp;&amp; rq-&gt;balance_callback != &amp;balance_push_callback
[    6.615208] WARNING: CPU: 2 PID: 0 at kernel/sched/sched.h:1730 __schedule+0x1130/0x1c90
=====  END  =====

Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
Signed-off-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: initialize kit-&gt;cursor.flags</title>
<updated>2025-01-09T12:33:51Z</updated>
<author>
<name>Henry Huang</name>
<email>henry.hj@antgroup.com</email>
</author>
<published>2024-12-22T15:43:16Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=41db022612b6308b093a62bb37a62a29e248d095'/>
<id>urn:sha1:41db022612b6308b093a62bb37a62a29e248d095</id>
<content type='text'>
commit 35bf430e08a18fdab6eb94492a06d9ad14c6179b upstream.

struct bpf_iter_scx_dsq *it maybe not initialized.
If we didn't call scx_bpf_dsq_move_set_vtime and scx_bpf_dsq_move_set_slice
before scx_bpf_dsq_move, it would cause unexpected behaviors:
1. Assign a huge slice into p-&gt;scx.slice
2. Assign a invalid vtime into p-&gt;scx.dsq_vtime

Signed-off-by: Henry Huang &lt;henry.hj@antgroup.com&gt;
Fixes: 6462dd53a260 ("sched_ext: Compact struct bpf_iter_scx_dsq_kern")
Cc: stable@vger.kernel.org # v6.12
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix invalid irq restore in scx_ops_bypass()</title>
<updated>2025-01-09T12:33:50Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-12-11T21:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=786362ce60d79967875f43e0ba55ad7a5376c133'/>
<id>urn:sha1:786362ce60d79967875f43e0ba55ad7a5376c133</id>
<content type='text'>
commit 18b2093f4598d8ee67a8153badc93f0fa7686b8a upstream.

While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
rq_unlock() which could re-enable IRQ prematurely leading to the following
warning:

  raw_local_irq_restore() called with IRQs enabled
  WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
  ...
  Sched_ext: create_dsq (enabling)
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : warn_bogus_irq_restore+0x30/0x40
  lr : warn_bogus_irq_restore+0x30/0x40
  ...
  Call trace:
   warn_bogus_irq_restore+0x30/0x40 (P)
   warn_bogus_irq_restore+0x30/0x40 (L)
   scx_ops_bypass+0x224/0x3b8
   scx_ops_enable.isra.0+0x2c8/0xaa8
   bpf_scx_reg+0x18/0x30
  ...
  irq event stamp: 33739
  hardirqs last  enabled at (33739): [&lt;ffff8000800b699c&gt;] scx_ops_bypass+0x174/0x3b8
  hardirqs last disabled at (33738): [&lt;ffff800080d48ad4&gt;] _raw_spin_lock_irqsave+0xb4/0xd8

Drop the stray _irqrestore().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Ihor Solodrai &lt;ihor.solodrai@pm.me&gt;
Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
Cc: stable@vger.kernel.org # v6.12
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu_dfl()</title>
<updated>2024-12-14T19:03:41Z</updated>
<author>
<name>Changwoo Min</name>
<email>multics69@gmail.com</email>
</author>
<published>2024-11-09T06:29:05Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=cd38a8f52940d41c2563d19c86f81e0b89374bb7'/>
<id>urn:sha1:cd38a8f52940d41c2563d19c86f81e0b89374bb7</id>
<content type='text'>
[ Upstream commit f39489fea677ad78ca4ce1ab2d204a6639b868dc ]

When getting an LLC CPU mask in the default CPU selection policy,
scx_select_cpu_dfl(), a pointer to the sched_domain is dereferenced
using rcu_read_lock() without holding rcu_read_lock(). Such an unprotected
dereference often causes the following warning and can cause an invalid
memory access in the worst case.

Therefore, protect dereference of a sched_domain pointer using a pair
of rcu_read_lock() and unlock().

[   20.996135] =============================
[   20.996345] WARNING: suspicious RCU usage
[   20.996563] 6.11.0-virtme #17 Tainted: G        W
[   20.996576] -----------------------------
[   20.996576] kernel/sched/ext.c:3323 suspicious rcu_dereference_check() usage!
[   20.996576]
[   20.996576] other info that might help us debug this:
[   20.996576]
[   20.996576]
[   20.996576] rcu_scheduler_active = 2, debug_locks = 1
[   20.996576] 4 locks held by kworker/8:1/140:
[   20.996576]  #0: ffff8b18c00dd348 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x4a0/0x590
[   20.996576]  #1: ffffb3da01f67e58 ((work_completion)(&amp;dev-&gt;power.work)){+.+.}-{0:0}, at: process_one_work+0x1ba/0x590
[   20.996576]  #2: ffffffffa316f9f0 (&amp;rcu_state.gp_wq){..-.}-{2:2}, at: swake_up_one+0x15/0x60
[   20.996576]  #3: ffff8b1880398a60 (&amp;p-&gt;pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0x59/0x7d0
[   20.996576]
[   20.996576] stack backtrace:
[   20.996576] CPU: 8 UID: 0 PID: 140 Comm: kworker/8:1 Tainted: G        W          6.11.0-virtme #17
[   20.996576] Tainted: [W]=WARN
[   20.996576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   20.996576] Workqueue: pm pm_runtime_work
[   20.996576] Sched_ext: simple (disabling+all), task: runnable_at=-6ms
[   20.996576] Call Trace:
[   20.996576]  &lt;IRQ&gt;
[   20.996576]  dump_stack_lvl+0x6f/0xb0
[   20.996576]  lockdep_rcu_suspicious.cold+0x4e/0x96
[   20.996576]  scx_select_cpu_dfl+0x234/0x260
[   20.996576]  select_task_rq_scx+0xfb/0x190
[   20.996576]  select_task_rq+0x47/0x110
[   20.996576]  try_to_wake_up+0x110/0x7d0
[   20.996576]  swake_up_one+0x39/0x60
[   20.996576]  rcu_core+0xb08/0xe50
[   20.996576]  ? srso_alias_return_thunk+0x5/0xfbef5
[   20.996576]  ? mark_held_locks+0x40/0x70
[   20.996576]  handle_softirqs+0xd3/0x410
[   20.996576]  irq_exit_rcu+0x78/0xa0
[   20.996576]  sysvec_apic_timer_interrupt+0x73/0x80
[   20.996576]  &lt;/IRQ&gt;
[   20.996576]  &lt;TASK&gt;
[   20.996576]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   20.996576] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x70
[   20.996576] Code: f5 53 48 8b 74 24 10 48 89 fb 48 83 c7 18 e8 11 b4 36 ff 48 89 df e8 99 0d 37 ff f7 c5 00 02 00 00 75 17 9c 58 f6 c4 02 75 2b &lt;65&gt; ff 0d 5b 55 3c 5e 74 16 5b 5d e9 95 8e 28 00 e8 a5 ee 44 ff 9c
[   20.996576] RSP: 0018:ffffb3da01f67d20 EFLAGS: 00000246
[   20.996576] RAX: 0000000000000002 RBX: ffffffffa4640220 RCX: 0000000000000040
[   20.996576] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa1c7b27b
[   20.996576] RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
[   20.996576] R10: 0000000000000001 R11: 000000000000021c R12: 0000000000000246
[   20.996576] R13: ffff8b1881363958 R14: 0000000000000000 R15: ffff8b1881363800
[   20.996576]  ? _raw_spin_unlock_irqrestore+0x4b/0x70
[   20.996576]  serial_port_runtime_resume+0xd4/0x1a0
[   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
[   20.996576]  __rpm_callback+0x44/0x170
[   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
[   20.996576]  rpm_callback+0x55/0x60
[   20.996576]  ? __pfx_serial_port_runtime_resume+0x10/0x10
[   20.996576]  rpm_resume+0x582/0x7b0
[   20.996576]  pm_runtime_work+0x7c/0xb0
[   20.996576]  process_one_work+0x1fb/0x590
[   20.996576]  worker_thread+0x18e/0x350
[   20.996576]  ? __pfx_worker_thread+0x10/0x10
[   20.996576]  kthread+0xe2/0x110
[   20.996576]  ? __pfx_kthread+0x10/0x10
[   20.996576]  ret_from_fork+0x34/0x50
[   20.996576]  ? __pfx_kthread+0x10/0x10
[   20.996576]  ret_from_fork_asm+0x1a/0x30
[   20.996576]  &lt;/TASK&gt;
[   21.056592] sched_ext: BPF scheduler "simple" disabled (unregistered from user space)

Signed-off-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context</title>
<updated>2024-12-05T13:01:35Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-09T19:40:25Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=acf588f9b6fb560e986365c6b175aaf589ef1f2a'/>
<id>urn:sha1:acf588f9b6fb560e986365c6b175aaf589ef1f2a</id>
<content type='text'>
[ Upstream commit 72b85bf6a7f6f6c38c39a1e5b04bc1da1bf5016e ]

4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
added four kfuncs for dispatching while iterating. They are allowed from the
dispatch and unlocked contexts but two of the kfuncs were only added in the
dispatch section. Add missing declarations in the unlocked section.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/ext: Remove sched_fork() hack</title>
<updated>2024-12-05T13:01:23Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-10-28T13:20:35Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=09162013082267af54bb39091b523a8daaa28955'/>
<id>urn:sha1:09162013082267af54bb39091b523a8daaa28955</id>
<content type='text'>
[ Upstream commit 0f0d1b8e5010bfe1feeb4d78d137e41946a5370d ]

Instead of solving the underlying problem of the double invocation of
__sched_fork() for idle tasks, sched-ext decided to hack around the issue
by partially clearing out the entity struct to preserve the already
enqueued node. A provided analysis and solution has been ignored for four
months.

Now that someone else has taken care of cleaning it up, remove the
disgusting hack and clear out the full structure. Remove the comment in the
structure declaration as well, as there is no requirement for @node being
the last element anymore.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/87ldy82wkc.ffs@tglx
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-11-15T17:59:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-15T17:59:51Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=d79944b0948c3a5e80229606e36281d6ef746b21'/>
<id>urn:sha1:d79944b0948c3a5e80229606e36281d6ef746b21</id>
<content type='text'>
Pull sched_ext fix from Tejun Heo:
 "One more fix for v6.12-rc7

  ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
  the operation to call kfuncs which shouldn't be allowed. Fix it by
  using SCX_KF_REST instead, which is trivial and low risk"

* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
</content>
</entry>
<entry>
<title>sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST</title>
<updated>2024-11-14T18:50:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-14T18:50:58Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=a4af89cc50f3c1035c1e0dfb50948a23107f3e95'/>
<id>urn:sha1:a4af89cc50f3c1035c1e0dfb50948a23107f3e95</id>
<content type='text'>
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Vernet &lt;void@manifault.com&gt;
Cc: Zhao Mengmeng &lt;zhaomzhao@126.com&gt;
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f7081d ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
</content>
</entry>
</feed>
