cleanup

2026-05-22 20:19:43 +00:00 · 2025-10-14 14:48:18 +02:00
parent d3f1ece680
commit 4170d1b6f1
2 changed files with 7 additions and 6 deletions
@@ -22,7 +22,6 @@ You can specify all parameters directly in the command without running `accelera
 accelerate launch \
  --multi_gpu \
  --num_processes=2 \
  --mixed_precision=fp16 \
  $(which lerobot-train) \
  --dataset.repo_id=${HF_USER}/my_dataset \
  --policy.type=act \
@@ -75,10 +74,6 @@ When you launch training with accelerate:
 3. **Gradient synchronization**: Gradients are synchronized across GPUs during backpropagation
 4. **Single process logging**: Only the main process logs to wandb and saves checkpoints
 ## Mixed Precision Training
 For faster training, you can enable mixed precision (fp16 or bf16). This is configured during `accelerate config` or by passing `--mixed_precision=fp16` to `accelerate launch`. LeRobot's `use_amp` setting is automatically handled when using accelerate.
 ## Learning Rate and Training Steps Scaling
 **Important:** LeRobot does **NOT** automatically scale learning rates or training steps based on the number of GPUs. This gives you full control over your training hyperparameters.
@@ -145,8 +145,14 @@ def train(cfg: TrainPipelineConfig, accelerator: Accelerator | None = None):
    # It will automatically detect if running in distributed mode or single-process mode
    # We set step_scheduler_with_optimizer=False to prevent accelerate from adjusting
    # the lr_scheduler steps based on the num_processes
    # We set find_unused_parameters=True to handle models with conditional computation paths
    if accelerator is None:
-        accelerator = Accelerator(step_scheduler_with_optimizer=False)
+        from accelerate.utils import DistributedDataParallelKwargs
        ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
        accelerator = Accelerator(
            step_scheduler_with_optimizer=False,
            kwargs_handlers=[ddp_kwargs]
        )
    # Determine if this is the main process (for logging and checkpointing)
    # When using accelerate, only the main process should log to avoid duplicate outputs