VIGOR : Visual Goal-In-Context Inference for Unified Humanoid Fall Safety

University of Michigan

TL;DR: A teacher-student framework that distills terrain-aware recovery goals into a deployable egocentric-depth policy.

Why Fall Safety?

Reliable fall recovery is essential for enabling humanoid robots to operate in real, cluttered environments. Unlike quadrupeds and wheeled systems, humanoids experience high-energy impacts, complex whole-body contact, and significant viewpoint changes during a fall, and failure to stand up leaves the robot effectively inoperable. Existing approaches either treat impact mitigation and standing as separate modules or assume flat terrain and blind proprioceptive sensing, limiting robustness and safety in unstructured settings. We introduce a unified framework for fall mitigation and stand-up recovery with perceptual awareness using egocentric vision. Our method employs a privileged teacher that learns recovery strategies conditioned on sparse demonstration priors and local terrain geometry, and a deployable student that infers the teacher’s goal representation from egocentric depth and short-term proprioception via distillation. This enables the student to reason about both how to fall and how to get up while avoiding unsafe contacts. Simulation experiments demonstrate higher success and safer recovery across diverse non-flat terrains and disturbance conditions compared to blind baselines.

Teaser is Under Construction

Vision-Enabled Fall Recovery

We investigate the impact of vision on fall recovery by comparing our visual policy to a blind, proprioception-only policy. The controller is engaged only after the robot has tilted past 20 degrees, ensuring recovery begins from a true fall. The blind policy exhibits more unsafe behaviors, including more frequent head and neck contacts and erratic motions.
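The tilt trigger described above can be sketched as follows. This is a minimal illustration, not the project's released code: it assumes the base orientation is available as a scalar-first (w, x, y, z) quaternion and measures the angle between the base's up-axis and world up.

```python
import math
import numpy as np

TILT_TRIGGER_DEG = 20.0  # engagement threshold described on this page


def tilt_angle_deg(base_quat_wxyz):
    """Angle (degrees) between the robot base's up-axis and world up.

    base_quat_wxyz: orientation of the base in the world frame, scalar-first.
    """
    w, x, y, z = base_quat_wxyz
    # Third column of the rotation matrix: the base z-axis in world coordinates.
    up_world_z = 1.0 - 2.0 * (x * x + y * y)
    cos_tilt = np.clip(up_world_z, -1.0, 1.0)
    return math.degrees(math.acos(cos_tilt))


def should_engage(base_quat_wxyz):
    """True once the robot has tilted past the trigger threshold."""
    return tilt_angle_deg(base_quat_wxyz) > TILT_TRIGGER_DEG
```

For example, an upright base (identity quaternion) gives zero tilt, while a 30-degree roll about the x-axis exceeds the threshold and engages the controller.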

Unsafe: Blind policy

Safe: Visual policy (ours)

Unified fall mitigation + stand-up recovery with egocentric vision.

VIGOR Framework

Our visual fall-recovery policy training pipeline has four key components:

  • 1. Motion retargeting: Human fall-recovery demonstrations are kinematically retargeted to the robot.
  • 2. Terrain alignment: References are used directly on flat terrain and coarsely projected onto uneven terrain to provide sparse tracking targets.
  • 3. Teacher policy learning: A privileged teacher is trained with RL and learns a Visual Goal-in-Context representation that captures the immediate recovery pose and local terrain.
  • 4. Goal-in-context distillation: A student distills the teacher’s terrain-aware recovery behavior from egocentric depth and short-term proprioceptive history for deployment.
Figure: Overview of the VIGOR training pipeline.
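The distillation step (4) above can be sketched in a few lines. This is a toy illustration under loud assumptions: both encoders are stand-in linear maps (the real teacher is trained with RL and the student is a learned network), and all dimensions (`GOAL_DIM`, `PRIV_DIM`, `OBS_DIM`) are invented for the example. It shows only the core idea: regress the student's prediction, computed from deployable observations, onto the frozen teacher's goal-in-context embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- chosen for illustration only.
GOAL_DIM = 32            # goal-in-context embedding size
PRIV_DIM = 64            # privileged state + local terrain features
OBS_DIM = 3072 + 300     # flattened egocentric depth + proprioceptive history

# Frozen "teacher" encoder: privileged inputs -> goal embedding.
W_teacher = rng.normal(scale=0.1, size=(GOAL_DIM, PRIV_DIM))

# Trainable "student" encoder: deployable observations -> predicted embedding.
W_student = np.zeros((GOAL_DIM, OBS_DIM))


def distillation_loss(W, obs, target):
    """MSE between the teacher's goal embedding and the student's prediction."""
    return float(np.mean((W @ obs - target) ** 2))


# One privileged rollout sample paired with its deployable observation.
priv = rng.normal(size=PRIV_DIM)
obs = rng.normal(size=OBS_DIM)
target = W_teacher @ priv  # teacher's goal-in-context for this state

loss_before = distillation_loss(W_student, obs, target)

# Plain gradient descent on the student (real training would batch rollouts
# and use a deep network, but the supervision signal is the same).
lr = 1e-4
for _ in range(200):
    err = W_student @ obs - target
    W_student -= lr * 2.0 * np.outer(err, obs) / GOAL_DIM

loss_after = distillation_loss(W_student, obs, target)
```

The key design point mirrored here is that the student never sees the privileged inputs: at deployment it reconstructs the teacher's terrain-aware recovery goal from egocentric depth and short-term proprioception alone.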

Fall Safety

We evaluate our policy on fall mitigation tasks across diverse terrains. The robot starts in a default pose, and the policy is engaged once the robot has tilted past 20 degrees.

Forward fall

Backward fall

Sideways fall

Toward stairs

Diagonal on stairs

Diagonal on stairs

Stones

Off platform

Stones

Stand-Up Recovery

We also evaluate our policy on stand-up tasks across many different terrains with diverse initial configurations.

Flat

Stones

Box (face up)

Legs up (face up)

Stairs (face up)

Stairs (face down)

Gap (face down)

Gap (face up)

Legs up (face down)

Simulation Experiments

Under Construction

BibTeX

@article{test,
  author    = {TODO},
  title     = {TODO},
  journal   = {TODO},
  year      = {TODO}
}