fix(sac): make temperature a property to fix checkpoint resume bug (#2877)

* fix(sac): make temperature a property to fix checkpoint resume bug

Temperature was stored as a plain float and not restored after loading
a checkpoint, causing incorrect loss computations until update_temperature()
was called. Changed to a property that always computes from log_alpha,
ensuring correct behavior after checkpoint loading.

* simplify docstrings
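
The stale-value problem described in the first bullet can be illustrated with a minimal, dependency-free sketch. Both classes below are hypothetical stand-ins, not the project's policy class: they assume the temperature is exp(log_alpha), as is conventional for SAC, and that only log_alpha is stored in the checkpoint. The load_state_dict methods here are simplified placeholders for whatever checkpoint-loading path the real code uses.

```python
import math


class StaleTemperaturePolicy:
    """Hypothetical pre-fix behavior: temperature cached as a plain float."""

    def __init__(self, log_alpha: float = 0.0):
        self.log_alpha = log_alpha
        # Cached once at construction; nothing keeps it in sync with log_alpha.
        self.temperature = math.exp(log_alpha)

    def update_temperature(self) -> None:
        # The cached value is only refreshed when this is called explicitly.
        self.temperature = math.exp(self.log_alpha)

    def load_state_dict(self, state: dict) -> None:
        # Assumption: only log_alpha is checkpointed, so the cache goes stale.
        self.log_alpha = state["log_alpha"]


class PropertyTemperaturePolicy:
    """Hypothetical post-fix behavior: temperature derived on every access."""

    def __init__(self, log_alpha: float = 0.0):
        self.log_alpha = log_alpha

    @property
    def temperature(self) -> float:
        # Always computed from the current log_alpha, so it cannot go stale.
        return math.exp(self.log_alpha)

    def load_state_dict(self, state: dict) -> None:
        self.log_alpha = state["log_alpha"]


# A checkpoint whose learned temperature is 0.2 (illustrative value).
checkpoint = {"log_alpha": math.log(0.2)}

stale = StaleTemperaturePolicy()
stale.load_state_dict(checkpoint)

fixed = PropertyTemperaturePolicy()
fixed.load_state_dict(checkpoint)

print("cached float after resume:", stale.temperature)
print("property after resume:", fixed.temperature)
```

With the property variant there is no separate refresh step to forget, which would explain why the explicit policy.update_temperature() call is dropped from the training loop in the hunk below.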
Michel Aractingi
2026-01-30 12:23:22 +01:00
committed by GitHub
parent 3409ef0dc2
commit 04cbf669cf
3 changed files with 8 additions and 9 deletions
@@ -545,9 +545,6 @@ def add_actor_information_and_train(
 training_infos["temperature_grad_norm"] = temp_grad_norm
 training_infos["temperature"] = policy.temperature
-# Update temperature
-policy.update_temperature()
 # Push policy to actors if needed
 if time.time() - last_time_policy_pushed > policy_parameters_push_frequency:
     push_actor_policy_to_queue(parameters_queue=parameters_queue, policy=policy)