fix(sac): make temperature a property to fix checkpoint resume bug (#2877)

* fix(sac): make temperature a property to fix checkpoint resume bug

Temperature was stored as a plain float and not restored after loading a checkpoint, causing incorrect loss computations until update_temperature() was called. Changed to a property that always computes from log_alpha, ensuring correct behavior after checkpoint loading.

* simplify docstrings
2026-05-21 19:49:49 +00:00 · 2026-01-30 12:23:22 +01:00
parent 3409ef0dc2
commit 04cbf669cf
3 changed files with 8 additions and 9 deletions
@@ -545,9 +545,6 @@ def add_actor_information_and_train(
                training_infos["temperature_grad_norm"] = temp_grad_norm
                training_infos["temperature"] = policy.temperature

-                # Update temperature
-                policy.update_temperature()
-
        # Push policy to actors if needed
        if time.time() - last_time_policy_pushed > policy_parameters_push_frequency:
            push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)
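The following minimal sketch shows the bug described in the commit message and why the property makes the `policy.update_temperature()` call removed in this diff unnecessary. It is an illustration under stated assumptions, not the project's actual SAC policy code. The classes `CachedTemperaturePolicy` and `PropertyTemperaturePolicy` and their dict-based `state_dict`/`load_state_dict` methods are hypothetical stand-ins, and plain floats with `math` replace tensors so the example runs without PyTorch.

```python
import math


class CachedTemperaturePolicy:
    """Hypothetical stand-in for the old behaviour: temperature is a cached float."""

    def __init__(self, initial_temperature: float = 1.0) -> None:
        self.log_alpha = math.log(initial_temperature)
        # Cached copy; only refreshed when update_temperature() is called.
        self.temperature = initial_temperature

    def update_temperature(self) -> None:
        self.temperature = math.exp(self.log_alpha)

    def state_dict(self) -> dict:
        # Only the learnable parameter is saved, like a checkpoint of weights.
        return {"log_alpha": self.log_alpha}

    def load_state_dict(self, state: dict) -> None:
        # Restores log_alpha but leaves the cached temperature stale.
        self.log_alpha = state["log_alpha"]


class PropertyTemperaturePolicy:
    """Hypothetical stand-in for the new behaviour: temperature derives from log_alpha."""

    def __init__(self, initial_temperature: float = 1.0) -> None:
        self.log_alpha = math.log(initial_temperature)

    @property
    def temperature(self) -> float:
        # Recomputed on every access, so it can never disagree with log_alpha.
        return math.exp(self.log_alpha)

    def state_dict(self) -> dict:
        return {"log_alpha": self.log_alpha}

    def load_state_dict(self, state: dict) -> None:
        self.log_alpha = state["log_alpha"]


# Simulate a checkpoint saved after training lowered the temperature to 0.1.
checkpoint = {"log_alpha": math.log(0.1)}

old = CachedTemperaturePolicy()
old.load_state_dict(checkpoint)
print(old.temperature)  # stale value from __init__ until update_temperature() runs

new = PropertyTemperaturePolicy()
new.load_state_dict(checkpoint)
print(new.temperature)  # approximately 0.1, consistent with the restored log_alpha
```

In this sketch the property removes the second copy of state entirely, so there is nothing for a training loop to keep in sync after a checkpoint load. The trade-offs are that `temperature` becomes read-only (assigning to it raises `AttributeError`) and each access recomputes `exp(log_alpha)`, which is cheap.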