Unitree RL Lab Official Example Training Guide

This note summarizes the basic workflow for the official Unitree-G1-29dof-Velocity example in the unitree_rl_lab repository. The immediate goal is to get the G1 humanoid example running correctly before moving on to parameter tuning, policy export, and deployment on real hardware.

1. Goal

Run the official velocity-tracking example for the G1 29-DoF humanoid. The core task name is:

Unitree-G1-29dof-Velocity

The minimum acceptance criteria are:

  • the G1 robot model loads correctly
  • the Unitree-G1-29dof-Velocity training task starts successfully
  • training can be resumed from a checkpoint

2. Key prerequisite

Before starting, make sure that the versions of Isaac Lab and Isaac Sim match. If neither Isaac Sim nor Isaac Lab is installed yet, refer to the installation guide.


Note: Isaac Sim is the underlying simulation platform, built on PhysX. It provides the physical world but is not responsible for training. Isaac Lab is the training framework built on top of Isaac Sim; within the provided physics simulation environment, it defines tasks and trains policies with reinforcement learning to produce .pt files. MuJoCo, by contrast, is another independent physics world. It is used to load the exported ONNX and deploy.yaml files and test whether the policy truly generalizes under different physical dynamics, rather than merely adapting to the specific physics characteristics of Isaac Sim.

3. Clone the official repository

git clone https://github.com/unitreerobotics/unitree_rl_lab.git
cd unitree_rl_lab

4. Install unitree_rl_lab

First activate the environment used for Isaac Lab:

conda activate env_isaaclab

Then run the following command inside the unitree_rl_lab directory:

./unitree_rl_lab.sh -i

After installation completes, open a new shell so that environment variables and shell completion are refreshed correctly.

5. Download robot model assets

The official repository provides two options:

  • Option 1: use USD assets
  • Option 2: use URDF assets

If the installed versions of Isaac Sim / Isaac Lab are relatively recent, the official recommendation is generally to use the URDF route. A common approach is to clone unitree_ros directly:

git clone https://github.com/unitreerobotics/unitree_ros.git

If unitree_model is used instead, make sure git lfs is enabled:

sudo apt update
sudo apt install git-lfs
git lfs install
git clone https://github.com/unitreerobotics/unitree_model.git

Then edit unitree.py under the unitree_rl_lab directory:

nano source/unitree_rl_lab/unitree_rl_lab/assets/robots/unitree.py

Update the paths so they point to the actual locations on the local machine, for example:

UNITREE_ROS_DIR = "/home/your-name/repos/unitree_ros/unitree_ros"
UNITREE_MODEL_DIR = "/home/your-name/repos/unitree_usd"

In unitree.py, different asset-loading modes can be selected for different task models. For example:

UNITREE_G1_29DOF_CFG = UnitreeArticulationCfg(
    spawn=UnitreeUrdfFileCfg(
        asset_path=f"{UNITREE_ROS_DIR}/robots/g1_description/g1_29dof_rev_1_0.urdf",
    ),
    # spawn=UnitreeUsdFileCfg(
    #     usd_path=f"{UNITREE_MODEL_DIR}/G1/29dof/usd/g1_29dof_rev_1_0/g1_29dof_rev_1_0.usd",
    # ),
)
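Since most loading failures trace back to these two path variables (see Section 10.2), a quick pre-flight check can save a slow Isaac Sim startup. The helper below is a hypothetical sketch, not part of the repository; the relative URDF path mirrors the example above.

```python
import os

# Hypothetical pre-flight check (not part of unitree_rl_lab): verify that
# UNITREE_ROS_DIR exists and actually contains the URDF before Isaac Sim
# spends minutes starting up only to fail on a bad path.
def check_asset_paths(unitree_ros_dir: str, urdf_relpath: str) -> str:
    """Return the full URDF path, failing fast if either piece is wrong."""
    if not os.path.isdir(unitree_ros_dir):
        raise FileNotFoundError(f"UNITREE_ROS_DIR does not exist: {unitree_ros_dir}")
    urdf_path = os.path.join(unitree_ros_dir, urdf_relpath)
    if not os.path.isfile(urdf_path):
        raise FileNotFoundError(f"URDF not found: {urdf_path}")
    return urdf_path
```

Running this once before training turns a vague Isaac Sim asset error into an immediate, readable path error.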

6. Verify that the task is registered

After installation and asset-path configuration, confirm that the task is registered before starting training. Run the following command from the unitree_rl_lab directory:

./unitree_rl_lab.sh -l

Under normal conditions, the output should list multiple Unitree-related tasks. The one discussed here is:

Unitree-G1-29dof-Velocity

7. Run the official G1 training example

Once the task appears in the list, training can be launched directly. Run the following command from the unitree_rl_lab repository root:

PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-G1-29dof-Velocity \
  --headless

It is usually better to start with headless mode because it places fewer requirements on the local graphical environment and is more convenient for verifying that the training pipeline works end to end.

If training starts successfully, the terminal will print output similar to the following:

################################################################################
                         Learning iteration 3032/52300

                            Total steps: 72056832
                       Steps per second: 85467
                        Collection time: 1.083s
                          Learning time: 0.067s
                        Mean value loss: 0.0059
                    Mean surrogate loss: -0.0096
                      Mean entropy loss: 9.5192
                            Mean reward: 35.93
                    Mean episode length: 995.45
                        Mean action std: 0.35
        Episode_Reward/track_lin_vel_xy: 0.7915
         Episode_Reward/track_ang_vel_z: 0.2411
                   Episode_Reward/alive: 0.1496
    Episode_Reward/base_linear_velocity: -0.0144
   Episode_Reward/base_angular_velocity: -0.0342
               Episode_Reward/joint_vel: -0.1077
               Episode_Reward/joint_acc: -0.0405
             Episode_Reward/action_rate: -0.3637
          Episode_Reward/dof_pos_limits: -0.0013
                  Episode_Reward/energy: -0.0027
    Episode_Reward/joint_deviation_arms: -0.1064
  Episode_Reward/joint_deviation_waists: -0.0595
    Episode_Reward/joint_deviation_legs: -0.0833
     Episode_Reward/flat_orientation_l2: -0.0097
             Episode_Reward/base_height: -0.0004
                    Episode_Reward/gait: 0.5041
              Episode_Reward/feet_slide: -0.0308
          Episode_Reward/feet_clearance: 0.9713
      Episode_Reward/undesired_contacts: -0.0007
              Curriculum/terrain_levels: 0.0424
          Curriculum/lin_vel_cmd_levels: 0.4000
     Metrics/base_velocity/error_vel_xy: 0.4542
    Metrics/base_velocity/error_vel_yaw: 1.0218
           Episode_Termination/time_out: 0.9807
        Episode_Termination/base_height: 0.0000
    Episode_Termination/bad_orientation: 0.0193
--------------------------------------------------------------------------------
                         Iteration time: 1.15s
                           Time elapsed: 00:13:56
                                    ETA: 15:37:30

8. Logs and output files

After training starts, outputs are generally written under logs/rsl_rl/ in the repository root, organized by task name and timestamp.
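Because run directories are timestamped and checkpoints are numbered, the most recent checkpoint can be located programmatically. The sketch below is illustrative only, assuming the default layout described above; it is not a script shipped with the repository.

```python
import os

# Illustrative helper assuming the default logs/rsl_rl/<task>/<timestamp>/
# layout; not part of unitree_rl_lab. Timestamped run names sort
# chronologically, and checkpoints are numbered model_<iteration>.pt.
def latest_checkpoint(log_root: str, task: str) -> str:
    task_dir = os.path.join(log_root, task)
    runs = sorted(d for d in os.listdir(task_dir)
                  if os.path.isdir(os.path.join(task_dir, d)))
    run_dir = os.path.join(task_dir, runs[-1])  # newest run directory
    ckpts = sorted(
        (f for f in os.listdir(run_dir)
         if f.startswith("model_") and f.endswith(".pt")),
        key=lambda f: int(f[len("model_"):-len(".pt")]),  # sort numerically
    )
    return os.path.join(run_dir, ckpts[-1])  # highest iteration number
```

The numeric sort key matters: a plain lexicographic sort would rank model_900.pt after model_2300.pt.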

9. MuJoCo simulation commands

Before launching MuJoCo, the trained .pt file must first be converted to an .onnx file; that export step is not covered here. Then place the .onnx file and the deploy.yaml file into their respective directories:

  • put the .onnx file into unitree_rl_lab/deploy/robots/g1_29dof/config/policy/velocity/v0/exported
  • put the deploy.yaml file into unitree_rl_lab/deploy/robots/g1_29dof/config/policy/velocity/v0/params

9.1 unitree_mujoco configuration

The MuJoCo configuration file used in this setup is unitree_mujoco/simulate/config.yaml.

A recommended configuration is:

robot: "g1"
robot_scene: "scene_29dof.xml"
domain_id: 0
interface: "lo"
use_joystick: 0
joystick_type: "xbox"
joystick_device: "/dev/input/js0"
joystick_bits: 16
print_scene_information: 1
enable_elastic_band: 1

Notes:

  • robot must be set to g1
  • robot_scene is best set to scene_29dof.xml
  • this setup uses a keyboard rather than a gamepad, so use_joystick: 0
  • switching input from a joystick to a keyboard is outside the scope of this note
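As a quick sanity check before launching, the flat key: value layout of this file can be verified with a few lines of Python. The parser below is a deliberately naive illustration (it ignores nesting, lists, and YAML types) and is not a substitute for a real YAML library.

```python
# Deliberately naive parser for the flat "key: value" layout shown above;
# an illustration only, not a substitute for a real YAML library.
def parse_flat_yaml(text: str) -> dict:
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        cfg[key.strip()] = value.strip().strip('"')
    return cfg
```

For the keyboard-driven setup described here, a parsed config should satisfy cfg["robot"] == "g1" and cfg["use_joystick"] == "0".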

9.2 Launch MuJoCo

cd /home/your-name/repos/unitree_mujoco/simulate/build
./unitree_mujoco

9.3 Launch the controller

Open another terminal and run:

cd /home/your-name/repos/unitree_rl_lab/deploy/robots/g1_29dof/build
./g1_ctrl --network lo

If communication is working correctly, the g1_ctrl terminal should show something like:

Connected to robot.
FSM: Start FixStand

9.4 What to do after launch

This is one of the easiest places to get confused. It is important to distinguish which key works in which window.

9.5 Keys in the MuJoCo window

These keys only work when the MuJoCo window has focus:

  • 8 or the Down arrow key
    • increase the elastic-band length so the robot lowers gradually
  • 9
    • toggle the elastic band on or off

Notes:

  • pressing 8 does not make the robot drop to the ground immediately; each press increases the elastic-band length incrementally
  • if 9 is pressed before the robot is actually bearing weight, it may fall directly

9.6 Keys in the g1_ctrl terminal

These keys are not pressed in the MuJoCo window. They are entered in the terminal that is running ./g1_ctrl:

  • W move forward
  • S move backward
  • A move left
  • D move right
  • Q turn left
  • E turn right

10. Common issues

10.1 Unitree-G1-29dof-Velocity does not appear in the task list

First check whether the correct Isaac Lab environment is active and whether unitree_rl_lab has actually been installed into the current Python environment.

10.2 Robot model loading fails

Start by checking UNITREE_ROS_DIR or UNITREE_MODEL_DIR. In most cases, the problem is an incorrect path rather than a code issue.

10.3 The first launch is unusually slow

The first run of an Isaac Lab / Isaac Sim task may spend considerable time on shader caching, asset initialization, and extension loading. That does not necessarily mean the process is stuck.

11. Pitfall notes

11.1 Wrong training working directory

It is easy to assume that the training command should be run from the directory containing source/.../__init__.py, but that location is where the environment is registered, not the runtime entry point.

Training, replay, and task-list commands should all be executed from the unitree_rl_lab repository root, which is the directory that contains unitree_rl_lab.sh, scripts/, and source/:

cd unitree_rl_lab

11.2 Incorrect use of the -p argument

If the following command is used:

./unitree_rl_lab.sh -p scripts/rsl_rl/train.py

the result will be an error similar to:

play.py: error: unrecognized arguments

In the official repository, -p is used for play mode rather than for passing a script path. The training script should be invoked directly with Python:

python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity

11.3 unitree_rl_lab package not loaded correctly

If running the script directly produces:

ModuleNotFoundError: No module named 'unitree_rl_lab'

that usually means the current Python environment cannot resolve the unitree_rl_lab package correctly.

The recommended first step is an editable install:

pip install -e source/unitree_rl_lab

11.4 unitree_rl_lab.tasks module loading failure

If the following error continues to appear:

No module named 'unitree_rl_lab.tasks'

that usually does not mean the package is missing. More often, PYTHONPATH is pointing at the wrong directory level.

For this repository, a more reasonable temporary command is:

PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py --headless --task Unitree-G1-29dof-Velocity

If this workflow needs to be kept long term, an editable install is still preferable to relying permanently on a manually set PYTHONPATH.

11.5 Incorrect UNITREE_MODEL_DIR path configuration

When configuring the G1 asset path, if UNITREE_MODEL_DIR is set to a specific .usd file instead of the model root directory, path concatenation errors will occur. The error message often looks like:

...g1_29dof_rev_1_0.usd/G1/29dof/...

The reason is that the code appends subpaths to UNITREE_MODEL_DIR. This variable therefore needs to point to the model root directory rather than to an individual file.
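The failure mode is easy to reproduce in isolation. The guard below is a hypothetical illustration: the f-string mirrors the commented-out UnitreeUsdFileCfg example in Section 5, and the check rejects a UNITREE_MODEL_DIR that already points at a file.

```python
# Hypothetical guard illustrating the bug: the loader builds the USD path
# with an f-string like the commented-out UnitreeUsdFileCfg example, so a
# UNITREE_MODEL_DIR that already ends in .usd yields a nonsense path such
# as ".../g1_29dof_rev_1_0.usd/G1/29dof/...".
def resolve_usd_path(unitree_model_dir: str) -> str:
    if unitree_model_dir.endswith(".usd"):
        raise ValueError(
            f"UNITREE_MODEL_DIR must be the model root directory, not a file: {unitree_model_dir}"
        )
    return f"{unitree_model_dir}/G1/29dof/usd/g1_29dof_rev_1_0/g1_29dof_rev_1_0.usd"
```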

11.6 Errors when loading USD files

When loading a custom USD file, the following error may appear:

No contact sensors added to the prim

Although the error is raised from contact-sensor-related logic, in this setup the more likely root cause is a version and interface mismatch between the unitree_rl_lab training script and rsl-rl 5.0.1. The practical fix is to add compatibility handling for rsl-rl 5.0.1. Two changes are required.

Step 1: add legacy configuration compatibility handling in train.py.

Path: scripts/rsl_rl/train.py

Below """Rest everything follows.""", after AppLauncher starts, change the code to:

app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

# Add the following content
from isaaclab_rl.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlVecEnvWrapper,
    handle_deprecated_rsl_rl_cfg,
)

Then, after the following line in main():

agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)

# Add the following content
agent_cfg = handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version)

Step 2: add obs_groups in rsl_rl_ppo_cfg.py.

Path: source/unitree_rl_lab/unitree_rl_lab/tasks/locomotion/agents/rsl_rl_ppo_cfg.py

Add the following line to BasePPORunnerCfg:

obs_groups = {"actor": ["policy"], "critic": ["policy"]}

This line can be placed near experiment_name or policy, for example:

@configclass
class BasePPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24
    max_iterations = 50000
    save_interval = 100
    experiment_name = ""
    empirical_normalization = False
    obs_groups = {"actor": ["policy"], "critic": ["policy"]}
    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[512, 256, 128],
    )

11.7 Standard deviation error in the action distribution

Problem description: this issue has only been observed when training the go1-velocity task on an RTX 5090. The same configuration trains normally on the quadruped setup. The workaround below is worth trying, but it is not guaranteed to resolve the problem.

During training, an error may occur if the standard deviation becomes negative:

RuntimeError: normal expects all elements of std >= 0.0
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

The default configuration uses noise_std_type="scalar". After migration to rsl_rl 5.0.1, this parameter may be updated by gradients during PPO training and can fall below 0. Once that happens, torch.normal raises the std >= 0.0 error.
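The difference between the two parameterizations can be seen without touching rsl_rl at all. The sketch below is a minimal illustration, not the rsl_rl implementation (which operates on torch tensors): with "log", the learned parameter passes through exp(), so the resulting standard deviation is positive for any real value.

```python
import math

# Minimal illustration (not the rsl_rl implementation): with
# noise_std_type="scalar" the learned parameter IS the std, so a bad
# gradient step can push it negative; with "log" the parameter is log(std)
# and exp() keeps the resulting std positive for any real input.
def std_from_param(param: float, noise_std_type: str) -> float:
    if noise_std_type == "log":
        return math.exp(param)  # always > 0
    return param  # "scalar": nothing prevents param < 0
```

A negative std under "scalar" is exactly what triggers the torch.normal error above, whereas under "log" even a strongly negative parameter only produces a very small positive std.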

Path: source/unitree_rl_lab/unitree_rl_lab/tasks/locomotion/agents/rsl_rl_ppo_cfg.py

In BasePPORunnerCfg, replace the original policy with:

    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        noise_std_type="log",
        actor_obs_normalization=False,
        critic_obs_normalization=False,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )

Note: if training has already failed because the standard deviation dropped below 0, updating policy alone does not allow that run to resume. In that case, either restart training from scratch or convert the checkpoint to the new format before resuming.

12. Supplement: task launch, GPU selection, and resume training

12.1 Standard launch

If the goal is simply to start Unitree-G1-29dof-Velocity, run:

PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-G1-29dof-Velocity \
  --headless

If a local graphical environment is available and the training window needs to be observed, --headless can be omitted.

12.2 Select a specific GPU

If the machine has multiple GPUs, CUDA_VISIBLE_DEVICES can be used to control which physical GPU is exposed to the current process. For example:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-Go2-Velocity \
  --headless \
  --device cuda:0

CUDA_VISIBLE_DEVICES=1 PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-G1-29dof-Velocity \
  --headless \
  --device cuda:0

Points to note:

  • CUDA_VISIBLE_DEVICES=1 means physical GPU 1 is exposed to the current process
  • --device cuda:0 should not be changed here; it means device 0 within the process-visible device list

This pattern is useful for manually placing training jobs on specific GPUs in a multi-GPU machine, but it is still single-process, single-GPU training rather than distributed multi-GPU training.
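The remapping can be made concrete with a tiny illustrative helper (this is not a CUDA API, just the index arithmetic): CUDA_VISIBLE_DEVICES selects and reorders the physical GPUs a process may see, and --device cuda:N then indexes into that visible list.

```python
# Illustrative helper (not a CUDA API): CUDA_VISIBLE_DEVICES selects and
# reorders the physical GPUs visible to a process; --device cuda:N then
# indexes into that visible list.
def physical_gpu(cuda_visible_devices: str, logical_index: int) -> int:
    visible = [int(x) for x in cuda_visible_devices.split(",")]
    return visible[logical_index]
```

For example, physical_gpu("1", 0) is physical GPU 1, matching the second launch command above, and with CUDA_VISIBLE_DEVICES="2,0" the logical device cuda:1 would be physical GPU 0.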

12.3 Resume training

If training is interrupted, or if training needs to continue from existing weights, use --resume, --load_run, and --checkpoint. For example:

CUDA_VISIBLE_DEVICES=1 PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-G1-29dof-Velocity \
  --headless \
  --device cuda:0 \
  --resume \
  --load_run "2026-03-27_21-17-37" \
  --checkpoint "model_2300.pt"

The parameters mean:

  • --resume: continue training from an existing run
  • --load_run: specify the run directory to load
  • --checkpoint: specify the model file to restore

The corresponding run directory and model files can usually be found under logs/rsl_rl/<task-name>/<timestamp-directory>/.

12.4 Resuming other tasks

The same method also applies to other tasks. For example, to continue training Unitree-Go2-Velocity:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PWD/source/unitree_rl_lab python scripts/rsl_rl/train.py \
  --task Unitree-Go2-Velocity \
  --headless \
  --device cuda:0 \
  --resume \
  --load_run "2026-03-27_20-59-40" \
  --checkpoint "model_300.pt"

Note that --task, --load_run, and --checkpoint must match each other. Training outputs from different tasks should not be mixed.
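Because a mismatched --load_run or --checkpoint only fails after the slow simulator startup, a small pre-flight check is convenient. The helper below is a hypothetical sketch assuming the default logs/rsl_rl/<task-name>/<timestamp>/ layout from Section 8.

```python
import os

# Hypothetical pre-flight check assuming the default logs/rsl_rl layout:
# confirm that the checkpoint named by --load_run/--checkpoint exists
# before paying the simulator startup cost.
def resume_paths_ok(log_root: str, task: str, load_run: str, checkpoint: str) -> bool:
    return os.path.isfile(os.path.join(log_root, task, load_run, checkpoint))
```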
