diff --git a/docs/source/il_robots.mdx b/docs/source/il_robots.mdx
index e0e2388e1..36f97b8fc 100644
--- a/docs/source/il_robots.mdx
+++ b/docs/source/il_robots.mdx
@@ -514,6 +514,83 @@ Additionally you can provide extra `tags` or specify a `license` for your model
 
 If your local computer doesn't have a powerful GPU you could utilize Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act).
 
+#### Train using Hugging Face Jobs
+
+Hugging Face Jobs lets you easily select hardware and run training in the cloud. If you don't have a powerful GPU, need more VRAM, or just want to train a model faster, use HF Jobs. It is pay-as-you-go: you are billed per second of use. See the [Jobs documentation](https://huggingface.co/docs/hub/jobs) for pricing and additional information.
+
+To run the training, use this command:
+
+
+
+```bash
+hf jobs run \
+  --flavor a10g-small \
+  --timeout 4h \
+  --secrets HF_TOKEN \
+  huggingface/lerobot-gpu:latest \
+  -- \
+  python -m lerobot.scripts.lerobot_train \
+    --dataset.repo_id=username/dataset \
+    --policy.type=act \
+    --steps=5000 \
+    --batch_size=16 \
+    --policy.device=cuda \
+    --policy.repo_id=username/your_policy \
+    --log_freq=100
+```
+
+
+
+
+```python
+from huggingface_hub import run_job, get_token
+
+run_name = "act_so101_hf_jobs"
+dataset_id = "username/dataset"
+user_hub_id = "username"
+
+command_args = [
+    "python", "-m", "lerobot.scripts.lerobot_train",
+    "--dataset.repo_id", dataset_id,
+    "--policy.type", "act",
+    "--steps", "5000",
+    "--batch_size", "16",
+    "--num_workers", "4",
+    "--policy.device", "cuda",
+    "--log_freq", "100",
+    "--save_freq", "1000",
+    "--save_checkpoint", "true",
+    "--wandb.enable", "false",
+    "--policy.repo_id", f"{user_hub_id}/{run_name}"
+]
+
+print(f"Submitting job '{run_name}' to Hugging Face Infrastructure...")
+
+job_info = run_job(
+    image="huggingface/lerobot-gpu:latest",
+    command=command_args,
+    flavor="a10g-small",
+    timeout="4h",
+    secrets={"HF_TOKEN": get_token()}
+)
+
+print("\n🚀 Job successfully launched!")
+print(f"🔹 Job ID: {job_info.id}")
+print(f"🔗 Live UI Dashboard & Logs: {job_info.url}")
+```
+
+
+
+
+
+You can modify `--flavor` to use different hardware, for example `t4-small`, `a100-large`, or `h200`. Use `hf jobs hardware` to see the full list with pricing.
+Depending on the model you want to train and the hardware you selected, you can also adjust `--batch_size` and `--num_workers`.
+For longer training sessions, increase `--timeout`.
+
+Once training has started, go to [Jobs](https://huggingface.co/settings/jobs) to check whether your job is running and to view its output. Scheduling can take a few minutes, so be patient.
+
+After training, the model is pushed to the Hub and you can use it like any other model with LeRobot.
+
 #### Upload policy checkpoints
 
 Once training is done, upload the latest checkpoint with: