Universal Humanoid Motion Representations for Physics-Based Control



Abstract

We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. Due to the high dimensionality of humanoid control as well as the inherent difficulties of reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles (e.g. locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability in complex tasks. Our work closes this gap, significantly increasing the coverage of the motion representation space. To achieve this, we first learn a motion imitator that can imitate all of the human motion in a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator. This is achieved using an encoder-decoder structure with a variational information bottleneck. Additionally, we jointly learn a prior conditioned on proprioception (the humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks. By sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks using natural and realistic human behavior. We demonstrate the effectiveness of our motion representation by solving generative tasks and motion tracking using VR controllers.
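To make the architecture described above concrete, below is a minimal PyTorch sketch of an encoder-decoder with a variational information bottleneck and a proprioception-conditioned prior. All module names, layer sizes, and the latent dimension are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn


class MotionLatentModel(nn.Module):
    """Sketch: encoder / prior / decoder for a variational motion representation."""

    def __init__(self, obs_dim=358, goal_dim=358, act_dim=69, z_dim=32):
        super().__init__()
        # Encoder: proprioception + imitation goal -> latent Gaussian parameters.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 512), nn.SiLU(), nn.Linear(512, 2 * z_dim))
        # Learned prior: proprioception only -> latent Gaussian parameters.
        self.prior = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.SiLU(), nn.Linear(512, 2 * z_dim))
        # Decoder: proprioception + latent -> low-level motor action.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 512), nn.SiLU(), nn.Linear(512, act_dim))

    def forward(self, obs, goal):
        mu_e, logvar_e = self.encoder(torch.cat([obs, goal], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior(obs).chunk(2, -1)
        # Reparameterized sample from the encoder distribution.
        z = mu_e + torch.randn_like(mu_e) * (0.5 * logvar_e).exp()
        action = self.decoder(torch.cat([obs, z], -1))
        # KL(encoder || prior) acts as the variational information bottleneck term.
        var_e, var_p = logvar_e.exp(), logvar_p.exp()
        kl = 0.5 * (logvar_p - logvar_e + (var_e + (mu_e - mu_p) ** 2) / var_p - 1).sum(-1)
        return action, kl
```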


  1. MoCap Motion Imitation
  2. Random Motion Generation
  3. Motion Tracking Downstream Task
  4. Generative Downstream Tasks
  5. Comparison with VQ-latent space


MoCap Motion Imitation

In this section, we visualize motion imitation results from PHC+ and PULSE (distilled from PHC+) as a sanity check. PHC+ can imitate ALL of its training data as well as recover from fail-states such as lying fallen on the ground. PULSE largely inherits these abilities through online distillation.
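As a rough illustration of what one online distillation step could look like (in the spirit of DAgger, with the student driving the simulation while regressing onto the teacher's actions), here is a hedged sketch; the `env`, `phc_plus`, and `pulse` interfaces and the KL weight are placeholders, not the actual implementation.

```python
import torch


def distill_step(env, phc_plus, pulse, optimizer, kl_weight=1e-3):
    """One online distillation step: the student acts, the teacher supervises."""
    obs, goal = env.observe()                   # proprioception + imitation target (hypothetical API)
    with torch.no_grad():
        teacher_action = phc_plus(obs, goal)    # expert action from the pre-trained imitator
    student_action, kl = pulse(obs, goal)       # encoder -> latent -> decoder (see sketch above)
    loss = (student_action - teacher_action).pow(2).mean() + kl_weight * kl.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    env.step(student_action.detach())           # roll out with the student so it visits its own states
    return loss.item()
```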

PHC+: train data imitation
PHC+: test data + fail-state recovery
PULSE: train data imitation
PULSE: test data + fail-state recovery

Random Motion Generation

Switching Between Imitation and Generation

Here we show that we can dynamically switch between random motion generation and imitation, thanks to the fail-state recovery ability of PULSE. The video on the left shows that we begin with imitation, then switch to random motion sampling, and then switch back to imitation.

Random Sampling

In this section, we visualize 8 humanoids together using noise sampled from the prior. We also show that we can vary the style of the sampled motion by changing the standard deviation used for sampling. With a small std (the learned prior usually computes a small variance), the sampled motion is smooth and stable; in this case, the humanoid can sometimes stand still for a long time before starting to move again. With a larger std (e.g. 0.22), the motion becomes more erratic and energetic, and the humanoid falls down more often. Luckily, the humanoid can get up by sampling the recovery skill. Notice that this behavior originates from training with PHC+, which has the ability to recover from fallen states; the get-up behavior comes entirely from random sampling from the prior.
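As a rough sketch of how such random rollouts can be produced, the snippet below samples a latent from the learned prior at every step, optionally overriding the prior's standard deviation with a fixed value; the environment API and the model interface follow the earlier sketch and are assumptions.

```python
import torch


@torch.no_grad()
def rollout_random(model, env, steps=1000, std_override=None):
    """Generate motion by sampling latents from the proprioception-conditioned prior."""
    obs = env.reset()
    for _ in range(steps):
        mu_p, logvar_p = model.prior(obs).chunk(2, -1)
        # Small std (the prior's own) -> smooth, stable motion; a larger fixed
        # std (e.g. 0.22) -> more erratic, energetic motion with occasional falls.
        std = (0.5 * logvar_p).exp() if std_override is None else torch.full_like(mu_p, std_override)
        z = mu_p + std * torch.randn_like(mu_p)
        obs = env.step(model.decoder(torch.cat([obs, z], -1)))
```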

Random generation, using a small std.
Random generation, using a big std.

We can enable inter-human collision and generate human-to-human interactions.

Random generation, using a small std.
Random generation, using a big std.

Comparison with SOTA on Motion Generation

In this section, we compare with SOTA generative and latent space models, both kinematics-based (HuMoR) and physics-based (ASE and CALM). Compared to HuMoR, our method can generate stable, long-term, and physically plausible motion, while in our experiments more than 50% of the motions generated by HuMoR (out of 200) were physically implausible. Compared to other physics-based latent spaces, our representation has more coverage and can generate more natural and realistic motion, even though the training data is the same.

Comparison with the kinematics-based motion latent space HuMoR. We use the same sequence as HuMoR for the initial state. At 00:45 and 1:55, HuMoR generates implausible motion.
Comparison with the physics-based motion latent spaces ASE and CALM, as well as training our model from scratch using RL (without distillation).

Training Visualization

Here we visualize sampling behavior from our latent space during training (tasks: reach and speed). For all downstream tasks, we use a fixed standard deviation of 0.22 during training. Using our latent space as the action space for hierarchical RL, the agent samples realistic human behavior throughout training.
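A minimal sketch of this hierarchical setup is given below, assuming the task policy outputs a latent mean that is perturbed with the fixed 0.22 standard deviation and decoded by the frozen decoder; whether the high-level output is additionally combined with the prior mean, and the exact interfaces, are details we gloss over here.

```python
import torch

FIXED_STD = 0.22  # exploration std used for all downstream tasks


@torch.no_grad()
def hrl_step(task_policy, decoder, env, obs, task_obs):
    """One step of a downstream task policy acting through the frozen latent space."""
    mu = task_policy(torch.cat([obs, task_obs], -1))   # high-level action lives in latent space
    z = mu + FIXED_STD * torch.randn_like(mu)          # Gaussian exploration with fixed std
    action = decoder(torch.cat([obs, z], -1))          # frozen decoder maps latent -> motor action
    return env.step(action)
```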

Motion Tracking Downstream Task

In this section, we showcase the VR controller tracking task, where we track the 6DOF poses of the two hand controllers and the headset. This is a challenging task, as it requires the policy to perform free-form motion tracking to match the controllers. We show that our latent space has enough coverage of the motor skills in AMASS to solve this task, and that it can be applied to real-world captures. The input is visualized as three red dots.
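For intuition, the tracking input can be thought of as the three tracker poses expressed relative to the humanoid; the sketch below shows one plausible way to assemble such a goal vector, with frame conventions and normalization being assumptions rather than the exact formulation.

```python
import torch


def vr_goal_obs(head, left, right, root_pos):
    """Each tracker reading is a (position[3], quaternion[4]) 6DOF pose."""
    goal = []
    for pos, rot in (head, left, right):
        goal.append(pos - root_pos)   # tracker position relative to the humanoid root
        goal.append(rot)              # tracker orientation (could also be made root-relative)
    return torch.cat(goal, dim=-1)    # 3 x (3 + 4) = 21-D goal vector
```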

Real-world data capture
Comparison with other latent space models

Here we visualize tracking performance on the synthetic data (generated from AMASS) that is used to train the tracker. The input is visualized as three red dots.

AMASS train data
AMASS test data

Generative Downstream Tasks

In this section, we show results of applying our method to downstream generative tasks.

Terrain

On the challenging terrain traversal task, our method demonstrates agile human behavior using only a simple trajectory-following reward (without any additional adversarial rewards as in PACER). Applying ASE at 30Hz can partially solve this task, though the motion can be jerky. CALM could not solve this task due to the lack of a style reward. Training from scratch produces unnatural motion.
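As an illustration of what a simple trajectory-following reward could look like, the sketch below scores the horizontal distance between the humanoid's root and the next waypoint; the exponential form and the gain are assumptions, not the exact reward used.

```python
import torch


def trajectory_reward(root_xy, target_xy, k=2.0):
    """Exponential reward on horizontal distance to the next trajectory waypoint."""
    # No adversarial / style term is added; natural motion comes from the latent space itself.
    return torch.exp(-k * (root_xy - target_xy).pow(2).sum(-1))
```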

Task: Trajectory following and terrain traversal
Task: Trajectory following and terrain traversal

Strike

Task: Strike block
Task: Strike block

Reach

The red dot in the air indicates the reach target.

Task: Reach
Task: Reach

Speed

The red block on the ground indicates the target speed.

Task: X-direction speed
Task: X-direction speed

Comparison with VQ-latent space

Here we show our attempt at using a vector quantized (VQ) latent space. While the policy can achieve a high imitation success rate after distillation, it exhibits micro-jitters when standing still. This is a result of the controller rapidly switching between different discrete latent codes. One could increase the latent space size and the number of codes to ameliorate this behavior, but doing so would defeat the purpose of using a quantized latent space, as the discrete space becomes increasingly expressive and closer to a continuous one.
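For reference, a generic VQ-VAE-style nearest-code lookup is sketched below (codebook size and straight-through details are standard choices, not specific to our experiment); the comment marks where the rapid code switching that causes the micro-jitter arises.

```python
import torch


def quantize(z_e, codebook):
    """z_e: (B, D) encoder output; codebook: (K, D) learned codes."""
    dists = torch.cdist(z_e, codebook)   # (B, K) distance to every code
    idx = dists.argmin(-1)               # nearest-code assignment
    z_q = codebook[idx]                  # quantized latent fed to the decoder
    # When z_e lies near the boundary between two codes (e.g. while standing
    # still), idx can flip from frame to frame, producing the micro-jitter.
    return z_e + (z_q - z_e).detach()    # straight-through estimator
```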