Training
LeRobot Tutorial Catalog
- 1. Install Environment
- 2. Configure Parameters
- 3. Remote Control Operation
- 4. Record Dataset
- 5. Visualize Dataset
- 6. Replay Dataset
- 7. Training
- 8. Assessment
7. Training
- Note: The following operations are performed inside the container
- Note: You need to navigate to the lerobot directory to execute the following code
7.1 Parameter Description
The following parameters can be adjusted according to actual needs.
- --policy.type: the model type to train; start with act for the basics.
- --local_files_only: whether to train on local data; the default is true, meaning local data is used.
- --wandb.enable: whether to upload training logs to the wandb platform; the default is false, meaning it is not used.
- --resume: whether to resume a previous training run; true means resume.
- --config_path: when resuming, the path to the training parameter file, e.g. "/home/ws/lerobot/outputs/train/act_roarm_m3_test/checkpoints/last/pretrained_model/train_config.json".
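For example, resuming an interrupted run is a minimal two-flag invocation (a sketch; the checkpoint path assumes the output directory used in the training commands below):
python lerobot/scripts/train.py \
  --resume=true \
  --config_path=/home/ws/lerobot/outputs/train/act_roarm_m3_test/checkpoints/last/pretrained_model/train_config.json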
If you uploaded your dataset to the Hugging Face Hub with --control.push_to_hub=true during recording, you can pull it from the Hub to train on other devices.
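A minimal sketch of training from the Hub, assuming the dataset was pushed as ${HF_USER}/roarm_m3_test; per the parameter description above, --local_files_only=false makes the script fetch the data from the Hub instead of reading the local cache:
huggingface-cli login    # authenticate once if not already logged in
python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/roarm_m3_test \
  --policy.type=act \
  --output_dir=outputs/train/act_roarm_m3_test \
  --job_name=act_roarm_m3_test \
  --local_files_only=false \
  --wandb.enable=false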
7.2 Training Dataset Preparation
7.2.1 Compression
If the data has not been uploaded to the Hub, you need to package the dataset for subsequent training. By default, the dataset is saved under /root/.cache/huggingface/lerobot.
tar czvf roarm_m3_datasets.tar.gz /root/.cache/huggingface/lerobot
Next, download the generated archive (roarm_m3_datasets.tar.gz) to your local device.
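MobaXterm's file panel works for this, but any transfer tool does; for instance, from the local machine (the user, IP, and archive path below are placeholders for your setup):
scp root@<device-ip>:/path/to/roarm_m3_datasets.tar.gz .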
7.2.2 Decompression
Select the file and upload it to the root directory of the training environment.
Place the packaged dataset in the training container, go to the root directory, and decompress it to /root/.cache/huggingface/lerobot.
cd / && tar xzvf /root/roarm_m3_datasets.tar.gz
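You can then confirm the dataset landed in the expected location:
ls /root/.cache/huggingface/lerobot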
7.3 Windows System Local GPU Training
- Note: Local GPU training on Windows requires an NVIDIA GPU.
7.3.1 Training Environment Setup
On Windows, use Docker Desktop with the pre-configured Ubuntu 22.04 image RoArm-M3-AI-Kit-GPU-x86_64; you do not need to install and configure the environment manually. First, load the image:
docker load -i RoArm-M3-AI-Kit-GPU-x86_64.tar
Create a container from the image; the "--shm-size=4g" shared-memory allocation can be adjusted according to the memory available on the system.
docker run --gpus all -it -p 2200:22 -p 9090:9090 --shm-size=4g --ulimit memlock=-1 --ulimit stack=67108864 --env NVIDIA_DRIVER_CAPABILITIES=all -w /home/ws/lerobot --name lerobot dudulrx0601/lerobot_gpu
Then, after entering the container, enable remote login.
service ssh start
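If you exit the container later, it can be restarted and re-entered with standard Docker commands (the name lerobot comes from the docker run command above):
docker start lerobot
docker exec -it lerobot /bin/bash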
Use MobaXterm to remotely log in to your local container.
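Because the docker run command maps container port 22 to host port 2200, the MobaXterm session (or any SSH client) connects along these lines; you may first need to set a root password inside the container with passwd:
ssh -p 2200 root@127.0.0.1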
Go to the LeRobot directory.
cd /home/ws/lerobot
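The ${HF_USER} placeholder used in the commands that follow must be set in your shell; one way to set it, assuming you are logged in via huggingface-cli login:
export HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER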
If you did not previously upload the dataset to the Hub, you can open http://127.0.0.1:9090 in your browser to visualize it locally and confirm that it is the dataset you want to train on:
python lerobot/scripts/visualize_dataset_html.py \
--repo-id ${HF_USER}/roarm_m3_test \
--host 0.0.0.0
7.3.2 Training
Use Nvidia GPU for local dataset training:
python lerobot/scripts/train.py \
--dataset.repo_id=${HF_USER}/roarm_m3_test \
--policy.type=act \
--output_dir=outputs/train/act_roarm_m3_test \
--job_name=act_roarm_m3_test \
--local_files_only=true \
--wandb.enable=false
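While training runs, you can verify in a second terminal that the NVIDIA GPU is actually in use:
nvidia-smi    # or: watch -n 1 nvidia-smi for a live view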
7.4 Jetson Orin System Local GPU Training
On the Ubuntu system of the Jetson Orin, use the Ubuntu 22.04 image RoArm-M3-AI-Kit-GPU-arm64 that comes pre-configured with the LeRobot environment; you do not need to install and configure the environment yourself.
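By analogy with the x86_64 flow in section 7.3.1, loading the image would look like this (the .tar file name here is an assumption based on the image name):
docker load -i RoArm-M3-AI-Kit-GPU-arm64.tar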
Use Jetson Orin GPU for local dataset training:
python lerobot/scripts/train.py \
--dataset.repo_id=${HF_USER}/roarm_m3_test \
--policy.type=act \
--output_dir=outputs/train/act_roarm_m3_test \
--job_name=act_roarm_m3_test \
--local_files_only=true \
--wandb.enable=false
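Checkpoints are written under the output directory; the latest one, which the --config_path resume example in section 7.1 points at, can be inspected with:
ls outputs/train/act_roarm_m3_test/checkpoints/last/pretrained_model/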
