Commit graph

2967 commits

Author SHA1 Message Date
Krzysztof Kozlowski
c0a1ef9c5b thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
Terminating for_each_available_child_of_node() loop requires dropping OF
node reference, so bailing out on errors misses this.  Solve the OF node
reference leak with scoped for_each_available_child_of_node_scoped().

Fixes: 3fd6d6e2b4 ("thermal/of: Rework the thermal device tree initialization")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-22 20:58:49 +02:00
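
The leak fixed by the commit above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct node`, `node_get()`, `node_put()`, `find_leaky()` and `find_scoped()` are hypothetical stand-ins, and the compiler `cleanup` attribute (GCC/Clang) is assumed as the mechanism that drops the reference when the loop variable goes out of scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted OF node. */
struct node {
	int refcount;
	int id;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Cleanup handler: invoked when the annotated variable leaves scope. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

/* Leaky variant: the early return skips node_put() for the current child. */
static int find_leaky(struct node *children, size_t count, int bad_id)
{
	for (size_t i = 0; i < count; i++) {
		struct node *child = node_get(&children[i]);

		if (child->id == bad_id)
			return -1;	/* reference on child is never dropped */
		node_put(child);
	}
	return 0;
}

/* Scoped variant: the reference is dropped on every exit path. */
static int find_scoped(struct node *children, size_t count, int bad_id)
{
	for (size_t i = 0; i < count; i++) {
		struct node *child __attribute__((cleanup(node_put_cleanup))) =
			node_get(&children[i]);

		if (child->id == bad_id)
			return -1;	/* cleanup handler still runs here */
	}
	return 0;
}
```

In the leaky variant the node that triggers the error keeps a reference count of 1 after the function returns; in the scoped variant all counts return to 0.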
Krzysztof Kozlowski
662b52b761 thermal: of: Fix OF node leak in thermal_of_zone_register()
thermal_of_zone_register() calls of_thermal_zone_find() which will
iterate over OF nodes with for_each_available_child_of_node() to find
matching thermal zone node.  When it finds such, it exits the loop and
returns the node.  Prematurely ending for_each_available_child_of_node()
loops requires dropping OF node reference, thus success of
of_thermal_zone_find() means that caller must drop the reference.

Fixes: 3fd6d6e2b4 ("thermal/of: Rework the thermal device tree initialization")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-22 20:58:49 +02:00
Krzysztof Kozlowski
afc954fd22 thermal: of: Fix OF node leak in thermal_of_trips_init() error path
Terminating for_each_child_of_node() loop requires dropping OF node
reference, so bailing out after thermal_of_populate_trip() error misses
this.  Solve the OF node reference leak with scoped
for_each_child_of_node_scoped().

Fixes: d0c75fa2c1 ("thermal/of: Initialize trip points separately")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-22 20:58:49 +02:00
Yang Ruibin
57df60e1f9 thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir()
The debugfs_create_dir() return value is never NULL, it is either a
valid pointer or an error one.

Use IS_ERR() to check it.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Signed-off-by: Yang Ruibin <11162571@vivo.com>
Link: https://patch.msgid.link/20240821075934.12145-1-11162571@vivo.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-21 20:38:44 +02:00
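
Why a NULL check is the wrong test for an error-pointer API can be shown with a small user-space model. This is a sketch under assumptions: `err_ptr()`, `ptr_err()`, `is_err()` and `create_dir()` are local, hypothetical helpers that imitate the kernel's error-pointer convention; they are not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (top 4095 addresses). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creator that never returns NULL, only valid or error pointers. */
static void *create_dir(int fail)
{
	static int dir;

	return fail ? err_ptr(-ENOMEM) : (void *)&dir;
}

/* Wrong: an error pointer is non-NULL, so the failure goes unnoticed. */
static int setup_with_null_check(int fail)
{
	void *d = create_dir(fail);

	if (!d)
		return -1;
	return 0;
}

/* Right: the error is detected and its code is propagated. */
static int setup_with_is_err_check(int fail)
{
	void *d = create_dir(fail);

	if (is_err(d))
		return (int)ptr_err(d);
	return 0;
}
```

With `fail` set, the NULL-check variant reports success while the `is_err()` variant returns `-ENOMEM`.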
Rafael J. Wysocki
6e6f58a170 thermal: gov_bang_bang: Use governor_data to reduce overhead
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.

For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.

However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.

To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
2024-08-16 13:13:59 +02:00
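
The caching idea in the commit above can be sketched as follows. This is an illustrative user-space model only: `struct zone`, `gov_manage()`, `gov_update_tz()` and the `ZONE_EVENT_*` names are hypothetical, and the counter merely stands in for the trip walk done by the real governor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical notification reasons used only by this sketch. */
enum zone_event {
	ZONE_EVENT_CDEV_BOUND,
	ZONE_EVENT_RESUME,
	ZONE_EVENT_OTHER,
};

struct zone {
	bool governor_data;		/* set once the trips have been walked */
	unsigned int trip_walks;	/* counts how often the walk really ran */
};

/* Walk the trips only if nothing has been cached yet. */
static void gov_manage(struct zone *z)
{
	if (z->governor_data)
		return;

	z->trip_walks++;	/* stands in for the loop over all trips */
	z->governor_data = true;
}

/* Invalidate the cached state when a rewalk is actually needed. */
static void gov_update_tz(struct zone *z, enum zone_event ev)
{
	if (ev == ZONE_EVENT_CDEV_BOUND || ev == ZONE_EVENT_RESUME)
		z->governor_data = false;
}
```

Repeated `gov_manage()` calls cost a single flag check until a binding change or a resume event clears the flag.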
Rafael J. Wysocki
5f64b4a1ab thermal: gov_bang_bang: Add .manage() callback
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.

Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback.  [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]

To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.

Fixes: 530c932bdf ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.net/
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
2024-08-16 13:13:49 +02:00
Rafael J. Wysocki
84248e35d9 thermal: gov_bang_bang: Split bang_bang_control()
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
2024-08-16 13:13:42 +02:00
Rafael J. Wysocki
b9b6ee6fe2 thermal: gov_bang_bang: Call __thermal_cdev_update() directly
Instead of clearing the "updated" flag for each cooling device
affected by the trip point crossing in bang_bang_control() and
walking all thermal instances to run thermal_cdev_update() for all
of the affected cooling devices, call __thermal_cdev_update()
directly for each of them.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/13583081.uLZWGnKmhe@rjwysocki.net
2024-08-16 13:13:33 +02:00
Rafael J. Wysocki
d955d7cecb Merge branch 'thermal-intel'
Merge fixes for the int340x thermal driver handling of MSI IRQs:

 - Fix MSI error path cleanup in int340x, allow it to work with a
   subset of thermal MSI IRQs if some of them are not working and
   make it free all MSI IRQs on module exit (Srinivas Pandruvada).

* thermal-intel:
  thermal: intel: int340x: Free MSI IRQ vectors on module exit
  thermal: intel: int340x: Allow limited thermal MSI support
  thermal: intel: int340x: Fix kernel warning during MSI cleanup
2024-07-31 12:31:27 +02:00
Rafael J. Wysocki
f844793f2d thermal: trip: Avoid skipping trips in thermal_zone_set_trips()
Say there are 3 trip points A, B, C sorted in ascending temperature
order with no hysteresis.  If the zone temperature is exactly equal to
B, thermal_zone_set_trips() will set the boundaries to A and C and the
hardware will not catch any crossing of B (either way) until either A
or C is crossed and the boundaries are changed.

To avoid that, use non-strict inequalities when comparing the trip
threshold to the zone temperature in thermal_zone_set_trips().

In the example above, it will cause both boundaries to be set to B,
which is desirable because an interrupt will trigger when the zone
temperature becomes different from B regardless of which way it goes.
That will allow a new interval to be set depending on the direction of
the zone temperature change.

Fixes: 893bae9223 ("thermal: trip: Make thermal_zone_set_trips() use trip thresholds")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12509184.O9o76ZdvQC@rjwysocki.net
2024-07-31 12:30:58 +02:00
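
The A/B/C example from the commit above can be reproduced with a small sketch. It assumes trips without hysteresis, as in the commit message; `struct window` and `pick_window()` are hypothetical names and the code is a simplified model, not the body of thermal_zone_set_trips().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Interrupt window programmed into the hardware. */
struct window {
	int low;
	int high;
};

/*
 * Pick the closest threshold on each side of the zone temperature.
 * With strict comparisons a threshold equal to temp is skipped; with
 * non-strict ones it becomes both the low and the high boundary.
 */
static struct window pick_window(const int *thresholds, size_t count,
				 int temp, bool strict)
{
	struct window w = { INT_MIN, INT_MAX };

	for (size_t i = 0; i < count; i++) {
		int t = thresholds[i];
		bool below = strict ? t < temp : t <= temp;
		bool above = strict ? t > temp : t >= temp;

		if (below && t > w.low)
			w.low = t;
		if (above && t < w.high)
			w.high = t;
	}
	return w;
}
```

For thresholds A = 40000, B = 50000, C = 60000 and a zone temperature of exactly 50000, the strict variant yields the window (A, C) and never notices a crossing of B, while the non-strict variant yields (B, B), so any change away from B raises an interrupt.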
Srinivas Pandruvada
f8ce49be27 thermal: intel: int340x: Free MSI IRQ vectors on module exit
On module exit call proc_thermal_free_msi() to free vectors allocated by
pci_alloc_irq_vectors().

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-4-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:58 +02:00
Srinivas Pandruvada
b85a2d300a thermal: intel: int340x: Allow limited thermal MSI support
On some Lunar Lake pre-production systems, not all the MSI thermal
vectors are valid. In that case instead of failing module load, continue
with partial thermal interrupt support.

pci_alloc_irq_vectors() can return fewer vectors than the expected
maximum.  In that case, call devm_request_threaded_irq() only for the
vectors that were actually allocated.

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Reported-by: Yijun Shen <Yijun.Shen@dell.com>
Tested-by: Yijun Shen <Yijun.Shen@dell.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:51 +02:00
Srinivas Pandruvada
b630a04121 thermal: intel: int340x: Fix kernel warning during MSI cleanup
On some pre-production Lunar Lake systems, there is a kernel
warning:

remove_proc_entry: removing non-empty directory 'irq/172'
WARNING: CPU: 0 PID: 501 at fs/proc/generic.c:717
	remove_proc_entry+0x1b4/0x1e0
...
...
remove_proc_entry+0x1b4/0x1e0
report_bug+0x182/0x1b0
handle_bug+0x51/0xa0
exc_invalid_op+0x18/0x80
asm_exc_invalid_op+0x1b/0x20
remove_proc_entry+0x1b4/0x1e0
remove_proc_entry+0x1b4/0x1e0
unregister_irq_proc+0xf2/0x120
free_desc+0x41/0xe0
irq_domain_free_irqs+0x138/0x1c0
irq_free_descs+0x52/0x80
irq_domain_free_irqs+0x151/0x1c0
msi_domain_free_locked.part.0+0x17e/0x1c0
msi_domain_free_irqs_all_locked+0x74/0xc0
pci_msi_teardown_msi_irqs+0x50/0x60
pci_free_msi_irqs+0x12/0x40
pci_free_irq_vectors+0x58/0x70

On these systems, not all the MSI thermal vectors are valid. This causes
devm_request_threaded_irq() to fail for some vectors. As part of the
clean up on this error, pci_free_irq_vectors() is called without calling
devm_free_irq(). This causes the above warning.

Add a function proc_thermal_free_msi() to call devm_free_irq() for all
successfully registered IRQ handlers, then call pci_free_irq_vectors().
Call this function for MSI cleanup.

Fixes: 7a9a8c5faf ("thermal: intel: int340x: Support MSI interrupt for Lunar Lake")
Reported-by: Yijun Shen <Yijun.shen@dell.com>
Tested-by: Yijun Shen <Yijun.shen@dell.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://patch.msgid.link/20240723140228.865919-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-30 16:36:39 +02:00
Rafael J. Wysocki
f7c1b0e4ae thermal: core: Back off when polling thermal zones on errors
Commit a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attempts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes).  However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason.  It is also not really useful to continuously poll thermal zones
that never respond.

To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value.  At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.

Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone.  In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
2024-07-24 12:40:23 +02:00
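
The back-off policy described in the commit above can be sketched in user space. The initial delay, the maximum delay and the doubling step below are assumptions chosen for illustration, not values taken from the commit; `struct zone` and `zone_read_failed()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define RECHECK_DELAY_INIT_MS	250U	/* assumed starting delay */
#define RECHECK_DELAY_MAX_MS	120000U	/* assumed upper bound */

struct zone {
	unsigned int recheck_delay_ms;
	unsigned int failures;
	unsigned int log_messages;
	bool enabled;
};

static void zone_init(struct zone *z)
{
	z->recheck_delay_ms = RECHECK_DELAY_INIT_MS;
	z->failures = 0;
	z->log_messages = 0;
	z->enabled = true;
}

/* Called after every temperature check that fails. */
static void zone_read_failed(struct zone *z)
{
	if (!z->enabled)
		return;

	if (z->failures++ == 0)
		z->log_messages++;	/* log only the first failure */

	if (z->recheck_delay_ms >= RECHECK_DELAY_MAX_MS) {
		z->enabled = false;	/* give up; user space may re-enable */
		z->log_messages++;	/* second and last message */
		return;
	}

	/* Grow the delay (doubling is an assumption) and clamp it. */
	z->recheck_delay_ms *= 2;
	if (z->recheck_delay_ms > RECHECK_DELAY_MAX_MS)
		z->recheck_delay_ms = RECHECK_DELAY_MAX_MS;
}
```

With these assumed numbers the delay reaches the maximum after nine failures and the tenth failure disables the zone, with exactly two log messages emitted in total.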
Rafael J. Wysocki
e5f98896ef thermal: trip: Split thermal_zone_device_set_mode()
Pull a wrapper around thermal zone .change_mode() callback out of
thermal_zone_device_set_mode() because it will be used elsewhere
subsequently.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2206793.irdbgypaU6@rjwysocki.net
2024-07-23 14:18:15 +02:00
Rafael J. Wysocki
e528be3c87 thermal: core: Allow thermal zones to tell the core to ignore them
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.

Before commit a8a2617744 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core hardly handled errors returned by it at all.
However, commit a8a2617744 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.

That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.

Unfortunately, the simple approach taken by commit a8a2617744,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down.  If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.

To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way.  Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.

On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.

Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-18 13:35:55 +02:00
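
The "return 0 but report an invalid temperature" convention from the commit above can be modelled as below. This is a sketch under assumptions: `TEMP_INVALID` is a local sentinel whose numeric value is chosen for illustration, and `struct zone`, `zone_get_temp()` and `zone_update()` are hypothetical names rather than the driver or core functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sentinel: below absolute zero in millidegrees Celsius. */
#define TEMP_INVALID	(-274000)

struct zone {
	bool fw_running;	/* is the device able to report a temperature? */
	int sensor_mdeg;	/* what the sensor would report */
	int temperature;	/* last value accepted by the core */
	unsigned int warnings;	/* messages that would go to the kernel log */
};

/* Driver side: no error code, just a sentinel when there is nothing to read. */
static int zone_get_temp(const struct zone *z, int *temp)
{
	*temp = z->fw_running ? z->sensor_mdeg : TEMP_INVALID;
	return 0;
}

/* Core side: real errors are logged, the sentinel is ignored silently. */
static void zone_update(struct zone *z)
{
	int temp;
	int ret = zone_get_temp(z, &temp);

	if (ret) {
		z->warnings++;	/* not reached by this driver, kept for contrast */
		return;
	}
	if (temp == TEMP_INVALID)
		return;		/* zone asked to be ignored for now */

	z->temperature = temp;
}
```

While the firmware is down the cached temperature stays untouched and nothing is logged; once it is running, updates proceed as usual.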
Rafael J. Wysocki
281cfec53b Merge branch 'thermal-intel'
Merge updates of Intel thermal drivers for 6.11-rc1:

 - Switch Intel thermal drivers to new Intel CPU model defines (Tony
   Luck).

 - Clean up the int3400 and int3403 drivers (Erick Archer and David Alan
   Gilbert).

 - Improve intel_pch_thermal kernel log messages printed during suspend
   to idle (Zhang Rui).

 - Make the intel_tcc_cooling driver use a model-specific bitmask for
   TCC offset (Ricardo Neri).

 - Add DLVR and MSI interrupt support for the Lunar Lake platform to the
   int340x thermal driver (Srinivas Pandruvada).

 - Enable workload type hints (WLT) support and power floor interrupt
   support for the Lunar Lake platform in int340x (Srinivas Pandruvada).

 - Make the HFI thermal driver use package scope for HFI instances as
   per the Intel SDM (Zhang Rui).

* thermal-intel:
  thermal: intel: hfi: Give HFI instances package scope
  thermal: intel: int340x: Enable WLT and power floor support for Lunar Lake
  thermal: intel: int340x: Support MSI interrupt for Lunar Lake
  thermal: intel: int340x: Remove unnecessary calls to free irq
  thermal: intel: int340x: Add DLVR support for Lunar Lake
  thermal: intel: int340x: Capability to map user space to firmware values
  thermal: intel: int340x: Cleanup of DLVR sysfs on driver remove
  thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offset
  thermal: intel: intel_tcc: Add model checks for temperature registers
  thermal: intel: intel_pch: Improve cooling log
  thermal: int3403: remove unused struct 'int3403_performance_state'
  thermal: int3400: Use sizeof(*pointer) instead of sizeof(type)
  thermal: intel: intel_soc_dts_thermal: Switch to new Intel CPU model defines
  thermal: intel: intel_tcc_cooling: Switch to new Intel CPU model defines
2024-07-15 20:44:31 +02:00
Rafael J. Wysocki
ab33da3a22 Merge branch 'thermal-core'
Merge updates related to the thermal core for 6.11-rc1:

 - Redesign the .set_trip_temp() thermal zone callback to take a trip
   pointer instead of a trip ID and update its users (Rafael Wysocki).

 - Avoid using invalid combinations of polling_delay and passive_delay
   thermal zone parameters (Rafael Wysocki).

 - Update a cooling device registration function to take a const
   argument (Krzysztof Kozlowski).

 - Make the uniphier thermal driver use thermal_zone_for_each_trip() for
   walking trip points (Rafael Wysocki).

* thermal-core:
  thermal: core: Add sanity checks for polling_delay and passive_delay
  thermal: trip: Fold __thermal_zone_get_trip() into its caller
  thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback
  thermal: imx: Drop critical trip check from imx_set_trip_temp()
  thermal: trip: Add conversion macros for thermal trip priv field
  thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()
  thermal: core: Change passive_delay and polling_delay data type
  thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
  thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
2024-07-15 20:43:21 +02:00
Raphael Gallais-Pou
e61cc85edb thermal/drivers/sti: Cleanup code related to stih416
The "st,stih416-mpe-thermal" compatible seems to appear neither in the
device tree nor in the documentation.
Remove compatible and related code.

Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com>
Link: https://lore.kernel.org/r/20240708161840.102004-1-rgallaispou@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
d5c38eec5d thermal/drivers/generic-adc: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-12-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
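
The simplification that the dev_err_probe() commits in this series rely on is that the helper both reports the error and returns it, so an error branch collapses into a single return statement, and that probe deferral is not reported as an error. The user-space model below illustrates that pattern; `err_probe()`, `get_channel()` and `probe()` are hypothetical stand-ins, and `EPROBE_DEFER` is defined locally because it is a kernel-internal value.

```c
#include <assert.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno, defined locally here */

static int logged;	/* number of error messages printed so far */

/* Print the message unless the probe is merely deferred; return err. */
static int err_probe(const char *dev, int err, const char *msg)
{
	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s\n", dev, err, msg);
		logged++;
	}
	return err;
}

/* Hypothetical resource lookup whose result is injected by the caller. */
static int get_channel(int result)
{
	return result;
}

/* Error handling shrinks to one statement per failing call. */
static int probe(int channel_result)
{
	int ret = get_channel(channel_result);

	if (ret < 0)
		return err_probe("adc-thermal", ret, "failed to get channel");
	return 0;
}
```

A hard failure is logged once and its code is returned unchanged, whereas a deferral is returned without an error message.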
Krzysztof Kozlowski
f637bfe26c thermal/drivers/generic-adc: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-11-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
bc55630c65 thermal/drivers/qcom-tsens: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-10-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
ecfee9176b thermal/drivers/qcom-spmi-adc-tm5: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-9-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:41 +02:00
Krzysztof Kozlowski
d0b297e76b thermal/drivers/imx: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-8-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
e9ac90242b thermal/drivers/imx: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-7-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
3e1a0680bb thermal/drivers/hisi: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-6-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
ca6176693f thermal/drivers/exynos: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-5-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
4a6cf76edf thermal/drivers/exynos: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.  While touching devm_kzalloc(), use preferred sizeof(*)
syntax.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-4-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
9d55cb3ba3 thermal/drivers/broadcom: Simplify with dev_err_probe()
Error handling in probe() can be a bit simpler with dev_err_probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-3-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
fd972a1745 thermal/drivers/broadcom: Simplify probe() with local dev variable
Simplify the probe() function by using local 'dev' instead of
&pdev->dev.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-2-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
Krzysztof Kozlowski
e90c369cc2 thermal/drivers/broadcom: Fix race between removal and clock disable
During the probe, driver enables clocks necessary to access registers
(in get_temp()) and then registers thermal zone with managed-resources
(devm) interface.  Removal of device is not done in reversed order,
because:
1. Clock will be disabled in driver remove() callback - thermal zone is
   still registered and accessible to users,
2. devm interface will unregister thermal zone.

This leaves short window between (1) and (2) for accessing the
get_temp() callback with disabled clock.

Fix this by enabling clock also via devm-interface, so entire cleanup
path will be in proper, reversed order.

Fixes: 8454c8c09c ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:40 +02:00
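
The ordering argument in the commit above can be sketched with a tiny last-in, first-out cleanup list. This is a user-space model under assumptions: `struct dev`, `dev_add_action()`, `dev_release_all()` and `struct sensor` are hypothetical and only imitate the idea of managed (devm) resources being released in reverse order of registration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *data);

/* Minimal managed-resource list: actions run in reverse order. */
struct dev {
	action_fn actions[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

static int dev_add_action(struct dev *d, action_fn fn, void *data)
{
	if (d->count == MAX_ACTIONS)
		return -1;
	d->actions[d->count] = fn;
	d->data[d->count] = data;
	d->count++;
	return 0;
}

static void dev_release_all(struct dev *d)
{
	while (d->count > 0) {
		d->count--;
		d->actions[d->count](d->data[d->count]);
	}
}

struct sensor {
	int clock_on;
	int zone_registered;
	int unsafe_window;	/* set if the zone outlived its clock */
};

static void clock_disable(void *data)
{
	struct sensor *s = data;

	s->clock_on = 0;
}

static void zone_unregister(void *data)
{
	struct sensor *s = data;

	if (!s->clock_on)
		s->unsafe_window = 1;	/* zone was reachable with clock off */
	s->zone_registered = 0;
}

/* Both resources are managed, so teardown mirrors the setup order. */
static void sensor_probe(struct dev *d, struct sensor *s)
{
	s->clock_on = 1;
	dev_add_action(d, clock_disable, s);
	s->zone_registered = 1;
	dev_add_action(d, zone_unregister, s);
}
```

Because the clock was registered first, it is disabled last, so the zone is never reachable while its clock is off.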
Chen-Yu Tsai
a5d4afb92e thermal/drivers/mediatek/lvts_thermal: Provide default calibration data
On some pre-production hardware, the SoCs do not contain calibration
data for the thermal sensors. The downstream drivers provide default
values that sort of work, instead of having the thermal sensors not
work at all.

Port the default values to the upstream driver. These values are from
the ChromeOS kernels, which sadly do not cover the MT7988.

Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20240620092306.2352606-1-wenst@chromium.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Julien Panis
be3e224ec5 dt-bindings: thermal: mediatek: Fix thermal zone definitions for MT8188
Fix thermal zone names for consistency with the other SoCs:
- GPU0 must be used as the first GPU item.
- SOCx deal with audio DSP, video, and infra subsystems.

The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.

The definitions can be safely modified here because they are used only
in the LVTS driver, which is modified accordingly, and have not yet
been included in a released kernel.

Fixes: 78c88534e5 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8188")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-2-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Julien Panis
6b04928e83 dt-bindings: thermal: mediatek: Fix thermal zone definition for MT8186
Fix a thermal zone name for consistency with the other SoCs:
MFG contains GPU, the latter is more specific and must be used here.

The naming must be fixed "atomically" so compilation does not break.
As a result, the change is made in the dt-bindings and in the LVTS
driver within a single commit, despite the checkpatch warning.

The definition can be safely modified here because it is used only
in the LVTS driver, which is modified accordingly, and has not yet
been included in a released kernel.

Fixes: a2ca202350 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8186")
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-1-8c8e3c7a3643@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Théo Lebrun
854a8e208c thermal/drivers/k3_j72xx_bandgap: Implement suspend/resume support
This adds suspend-to-RAM support.

The derived_table is kept as-is, so the resume is only about
pm_runtime_* calls and restoring the same registers as the probe.

Extract the hardware initialization procedure to a function called at
both probe-time & resume-time.

The probe-time loop is split in two to ensure doing the hardware
initialization before registering thermal zones. That ensures our
callbacks cannot be called while in bad state.

The 100ms delay in the hardware initialization sequence was removed.
It was initially added to be sure the thresholds are programmed before
enabling the interrupt, but in fact it's not needed (tested on J7200
platform).

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
Link: https://lore.kernel.org/r/20240425153238.498750-1-thomas.richard@bootlin.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15 13:31:39 +02:00
Niklas Söderlund
f996e2b17a thermal/drivers/renesas/rcar: Add dependency on OF
The R-Car thermal driver depends on OF, describe this.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240506154011.344324-3-niklas.soderlund+renesas@ragnatech.se
2024-07-15 13:31:39 +02:00
Niklas Söderlund
9d617949d4 thermal/drivers/renesas: Group all renesas thermal drivers together
Move all Renesas thermal drivers to a vendor specific directory.

All drivers are moved verbatim apart from the updated include path for
thermal_hwmon.h.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20240506154011.344324-2-niklas.soderlund+renesas@ragnatech.se
2024-07-15 13:31:38 +02:00
Rafael J. Wysocki
3669716401 thermal: core: Add sanity checks for polling_delay and passive_delay
If polling_delay is nonzero and passive_delay is greater than
polling_delay, the thermal zone temperature will be updated less
often when tz->passive is nonzero, which is not as expected.  Make
the thermal zone registration fail with -EINVAL in that case as
this is a clear thermal zone configuration mistake.

If polling_delay is nonzero and passive_delay is 0, which is regarded
as a valid thermal zone configuration, the thermal zone will use polling
except when tz->passive is nonzero.  However, the expected behavior in
that case is to continue temperature polling with the same delay value
regardless of tz->passive, so set passive_delay to the polling_delay
value then.
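
The two rules above can be sketched as a standalone check. This is only an illustration under stated assumptions: check_delays() is a hypothetical helper name, not a function in the thermal core, and the real check happens inside thermal zone registration.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a kernel function) mirroring the rules
 * described in the changelog above.
 */
static int check_delays(unsigned int *passive_delay, unsigned int polling_delay)
{
	if (polling_delay) {
		/* Passive polling slower than regular polling: reject. */
		if (*passive_delay > polling_delay)
			return -EINVAL;

		/* No passive delay given: keep polling at the same rate. */
		if (!*passive_delay)
			*passive_delay = polling_delay;
	}

	return 0;
}
```

With polling_delay equal to 0 the helper leaves passive_delay untouched, since neither rule applies then.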

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5802156.DvuYhMxLoT@rjwysocki.net
2024-07-12 15:14:57 +02:00
Rafael J. Wysocki
5b674baa59 thermal: trip: Fold __thermal_zone_get_trip() into its caller
Because __thermal_zone_get_trip() is only called by thermal_zone_get_trip()
now, fold the former into the latter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/22339769.EfDdHjke4D@rjwysocki.net
2024-07-12 15:14:56 +02:00
Rafael J. Wysocki
0728c81087 thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback
Out of several drivers implementing the .set_trip_temp() thermal zone
operation, three don't actually use the trip ID argument passed to it,
two call __thermal_zone_get_trip() to get a struct thermal_trip
corresponding to the given trip ID, and the others use the trip ID as an
index into their own data structures with the assumption that it will
always match the ordering of entries in the trips table passed to the
core during thermal zone registration, which is fragile and not really
guaranteed.

Even though the trip IDs used by the core are in fact their indices in the
trips table passed to it by the thermal zone creator, that is purely a
matter of convenience and should not be relied on for correctness.

For this reason, modify trip_point_temp_store() to pass a (const) trip
pointer to .set_trip_temp() and adjust the drivers implementing it
accordingly.

This helps to simplify the drivers invoking __thermal_zone_get_trip()
from their .set_trip_temp() callback functions because they will not
need to do it now and the other drivers can store their internal
trip indices in the priv field in struct thermal_trip and their
.set_trip_temp() callback functions can get those indices from there.

The intel_quark_dts thermal driver can instead use the trip type to
determine the requisite trip index.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8392906.T7Z3S40VBb@rjwysocki.net
[ rjw: Add missing colon and 2 empty code lines ]
[ rjw: Add missing change in imx_thermal.c and adjust the changelog ]
[ rjw: Drop an unused local variable ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-12 15:14:01 +02:00
Rafael J. Wysocki
81caa5d519 thermal: imx: Drop critical trip check from imx_set_trip_temp()
Because the IMX thermal driver does not flag its critical trip as
writable, imx_set_trip_temp() will never be invoked for it and so the
critical trip check can be dropped from there.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2272035.iZASKD2KPV@rjwysocki.net
2024-07-10 13:33:33 +02:00
Rafael J. Wysocki
462be1c353 Merge back thermal control material for 6.11.
2024-07-10 13:01:38 +02:00
Zhang Rui
b755367602 thermal: intel: hfi: Give HFI instances package scope
The Intel Software Developer's Manual defines the scope of HFI (registers
and memory buffer) as a package. Use package scope(*) in the software
representation of an HFI instance.

Using die scope in HFI instances has the effect of creating multiple
conflicting instances for the same package: each instance allocates its
own memory buffer and configures the same package-level registers.
Specifically, only one of the allocated memory buffers can be set in the
MSR_IA32_HW_FEEDBACK_PTR register. CPUs get incorrect HFI data from the
table.

The problem does not affect current HFI-capable platforms because they
all have single-die processors.

(*) We used die scope for HFI instances because there had been
    processors with packages enumerated as dies. None of those systems
    supported HFI, though. If such a system emerged, it would need to
    be quirked.

Co-developed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://patch.msgid.link/20240703055445.125362-1-rui.zhang@intel.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-09 18:29:07 +02:00
Rafael J. Wysocki
d1fbf18a0f thermal: trip: Add conversion macros for thermal trip priv field
Some drivers will need to store integers in the priv field of struct
thermal_trip, so add conversion macros for doing this in a consistent
way and switch over the int340x_thermal driver that already does it and
uses custom conversion functions to using the new macros.
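
A minimal sketch of what such conversion macros could look like, assuming the integer is round-tripped through uintptr_t. The macro names and struct demo_trip below are illustrative stand-ins, not the identifiers added by this commit.

```c
#include <stdint.h>

/* Illustrative names only; the kernel's macros may be spelled differently. */
#define DEMO_INT_TO_TRIP_PRIV(val)	((void *)(uintptr_t)(val))
#define DEMO_TRIP_PRIV_TO_INT(priv)	((uintptr_t)(priv))

/* Cut-down stand-in for struct thermal_trip. */
struct demo_trip {
	int temperature;
	void *priv;	/* driver-private data, here a small integer */
};

/* Store a driver-internal index in the trip's priv field. */
static void demo_trip_set_index(struct demo_trip *trip, unsigned int index)
{
	trip->priv = DEMO_INT_TO_TRIP_PRIV(index);
}

/* Read the driver-internal index back from the priv field. */
static unsigned int demo_trip_get_index(const struct demo_trip *trip)
{
	return (unsigned int)DEMO_TRIP_PRIV_TO_INT(trip->priv);
}
```

Going through uintptr_t avoids compiler warnings about casting between pointers and integers of different sizes.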

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3297884.aeNJFYEL58@rjwysocki.net
2024-07-09 18:22:59 +02:00
Rafael J. Wysocki
463b86fed2 thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()
Introduce a new helper function thermal_trip_is_bound_to_cdev() for
checking whether or not a given trip point has been bound to a given
cooling device.

The primary user of it will be the Tegra thermal driver.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/13545762.uLZWGnKmhe@rjwysocki.net
2024-07-09 18:22:59 +02:00
Rafael J. Wysocki
d05374dee2 thermal: core: Change passive_delay and polling_delay data type
It is better to use unsigned int as the data type for the passive_delay
and polling_delay arguments of thermal_zone_device_register_with_trips()
because they are implicitly cast to unsigned int anyway in
thermal_set_delay_jiffies() and if they happen to be negative at that
point, the resulting behavior may not be as desired.

Update the thermal_zone_device_register_with_trips() definition
accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5803791.DvuYhMxLoT@rjwysocki.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-09 18:19:24 +02:00
Rafael J. Wysocki
94eacc1c58 thermal: core: Fix list sorting in __thermal_zone_device_update()
The order in which lists are sorted in __thermal_zone_device_update()
is reverse with respect to what it should be due to a mistake in
thermal_trip_notify_cmp().

Fix it and observe that it is not necessary to sort the lists in
different orders.  They can both be sorted in ascending order if
way_down_list is walked in reverse order which allows the code to
be slightly more straightforward (and less prone to silly mistakes).
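
The idea can be sketched in userspace C with plain arrays instead of kernel lists. This is a hedged illustration: demo_cmp() and demo_order_crossings() are hypothetical names, and the thresholds are bare integers rather than trip descriptors.

```c
#include <stdlib.h>

/* Ascending comparison, used for both lists. */
static int demo_cmp(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Sort both arrays in ascending order, then emit the "way up" entries
 * front to back and the "way down" entries back to front, so that
 * downward crossings come out in descending order without a second
 * comparison function.
 */
static void demo_order_crossings(int *way_up, size_t n_up,
				 int *way_down, size_t n_down, int *out)
{
	size_t i, k = 0;

	qsort(way_up, n_up, sizeof(*way_up), demo_cmp);
	qsort(way_down, n_down, sizeof(*way_down), demo_cmp);

	for (i = 0; i < n_up; i++)
		out[k++] = way_up[i];

	for (i = n_down; i > 0; i--)
		out[k++] = way_down[i - 1];
}
```

Using a single comparison function removes the chance of getting the sign wrong in one of two nearly identical helpers.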

Fixes: 7454f2c42c ("thermal: core: Sort trip point crossing notifications by temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
2024-07-08 17:24:22 +02:00
Rafael J. Wysocki
a8a2617744 thermal: core: Call monitor_thermal_zone() if zone temperature is invalid
Commit 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip()
if zone temperature is invalid") caused __thermal_zone_device_update()
to return early if the current thermal zone temperature was invalid.

This was done to avoid running handle_thermal_trip() and governor
callbacks in that case which led to confusion.  However, it went too
far because monitor_thermal_zone() still needs to be called even when
the zone temperature is invalid to ensure that it will be updated
eventually in case thermal polling is enabled and the driver has no
other means to notify the core of zone temperature changes (for example,
it does not register an interrupt handler or ACPI notifier).

Also if the .set_trips() zone callback is expected to set up monitoring
interrupts for a thermal zone, it has to be provided with valid
boundaries and that can only happen if the zone temperature is known.

Accordingly, to ensure that __thermal_zone_device_update() will
run again after a failing zone temperature check, make it call
monitor_thermal_zone() regardless of whether or not the zone
temperature is valid and make the latter schedule a thermal zone
temperature update if the zone temperature is invalid even if
polling is not enabled for the thermal zone.
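
One way to picture the resulting decision is a helper that picks the delay before the next temperature check. This is a sketch under assumptions, not the core's code: struct demo_zone, demo_next_check_ms() and the invalid-temperature placeholder are made up for illustration, the precedence between the branches is assumed, and the 250 ms value follows the maintainer note in the trailers below.

```c
/* Placeholder values for illustration only. */
#define DEMO_TEMP_INVALID	(-274000)
#define DEMO_RECHECK_DELAY_MS	250

/* Cut-down stand-in for a thermal zone. */
struct demo_zone {
	int temperature;
	int passive;			/* nonzero while passive cooling is active */
	unsigned int passive_delay_ms;
	unsigned int polling_delay_ms;
};

/* Return the delay before the next check, or 0 for "do not schedule". */
static unsigned int demo_next_check_ms(const struct demo_zone *tz)
{
	if (tz->passive > 0)
		return tz->passive_delay_ms;

	if (tz->polling_delay_ms)
		return tz->polling_delay_ms;

	/* No polling configured, but the temperature is unknown: retry. */
	if (tz->temperature == DEMO_TEMP_INVALID)
		return DEMO_RECHECK_DELAY_MS;

	return 0;
}
```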

Fixes: 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net
[ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04 19:01:59 +02:00
Krzysztof Kozlowski
4acab508eb thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
The 'type' string passed to thermal_of_cooling_device_register() is a
'const char *', so do the same in the devm interface.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240703083141.96013-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04 14:25:20 +02:00
Nícolas F. R. A. Prado
aaa18ff54b thermal: gov_power_allocator: Return early in manage if trip_max is NULL
Commit da781936e7 ("thermal: gov_power_allocator: Allow binding
without trip points") allowed the governor to bind even when trip_max
is NULL. This allows a NULL pointer dereference to happen in the manage
callback.

Add an early return to prevent it, since the governor is expected to not do
anything in this case.
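
The shape of such a guard can be sketched as follows. The types and demo_manage() are hypothetical simplifications. The real manage callback works on the governor's own data structures, and the boolean result here exists only to make the sketch observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the governor's data. */
struct demo_trip {
	int temperature;
};

struct demo_governor_data {
	const struct demo_trip *trip_max;	/* may be NULL: no trip points */
};

/*
 * Return true if the zone is at or above the maximum trip. With no
 * trip_max there is nothing to manage, so return before dereferencing.
 */
static bool demo_manage(const struct demo_governor_data *data, int zone_temp)
{
	if (!data->trip_max)
		return false;

	return zone_temp >= data->trip_max->temperature;
}
```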

Fixes: da781936e7 ("thermal: gov_power_allocator: Allow binding without trip points")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://patch.msgid.link/20240702-power-allocator-null-trip-max-v1-1-47a60dc55414@collabora.com
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04 13:35:50 +02:00