VIGOR: Visual Goal-In-Context Inference for Unified Humanoid Fall Safety

TL;DR: A teacher-student framework that distills terrain-aware recovery goals into a deployable egocentric-depth policy.

Why Fall Safety?

Unified recovery demands perception, control, and robust contact strategy.

Reliable fall recovery is critical for humanoids operating in cluttered environments. Unlike quadrupeds or wheeled robots, humanoids experience high-energy impacts, complex whole-body contact, and large viewpoint changes during a fall, where the robot must rapidly interpret the surrounding scene to recover safely. Existing methods fragment fall safety into separate problems such as fall avoidance, impact mitigation, and stand-up recovery, or rely on end-to-end policies trained without visual perception through reinforcement learning or imitation learning, often on flat terrain, limiting scalability and generalization. We present a unified fall safety approach for all phases of fall recovery. It builds on two insights: 1) natural human fall and recovery poses are highly constrained and transferable from flat to complex terrain through alignment, and 2) fast whole-body reactions require integrated perceptual-motor representations. We train a privileged teacher using sparse human demonstrations on flat and simulated complex terrains, and distill it into a deployable student that relies only on egocentric depth and proprioception. The student reacts by matching the teacher’s goal-in-context latent representation, combining the next target pose with the locally perceived terrain. Experiments in simulation and on a real Unitree G1 humanoid demonstrate robust zero-shot fall safety across diverse non-flat environments without real-world fine-tuning.

Vision-Enabled Fall Recovery

Visual context reduces hazardous contacts and stabilizes recovery behavior.

We investigate the impact of vision on fall recovery by comparing our visual policy to a blind, proprioception-only policy. The controller is activated only after the robot has tilted 20 degrees, so that every trial starts from a genuine fall state. The blind policy exhibits more unsafe behaviors, including more head and neck contacts and erratic motions.

Unsafe Blind policy

Safe Visual policy (ours)

Interactive 3D Rollouts

Compare privileged teacher, depth student, and blind student rollouts side by side.

Teacher
Student (Depth)
Student (Blind)

        VIGOR Framework

        Teacher-student distillation with terrain-aware goal inference.

        Our visual fall-recovery policy training pipeline has four key components:

        1. Motion retargeting: Human fall–recovery demonstrations are kinematically retargeted to the robot.
        2. Terrain alignment: References are used directly on flat terrain and coarsely projected onto uneven terrain to provide sparse tracking targets.
        3. Teacher policy learning: A privileged teacher is trained with RL and learns a Visual Goal-in-Context representation that captures the immediate recovery pose and local terrain.
        4. Goal-in-context distillation: A student distills the teacher’s terrain-aware recovery behavior from egocentric depth and short-term proprioceptive history for deployment.
        Our method diagram
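
The distillation step can be illustrated with a minimal sketch. It assumes the student is supervised by a mean-squared error between its latent and the teacher's goal-in-context latent, plus an action-imitation term. The function names, the `action_weight` parameter, and the action term itself are hypothetical and chosen for illustration. The page only states that the student matches the teacher's latent representation.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(z_student, z_teacher, a_student, a_teacher, action_weight=1.0):
    """Hypothetical student objective: latent matching plus action imitation.

    z_*: goal-in-context latents (teacher: privileged inputs; student: depth + proprioception).
    a_*: actions produced by each policy for the same state.
    action_weight: assumed trade-off coefficient, not taken from the source.
    """
    latent_term = mse(z_student, z_teacher)   # match the teacher's latent
    action_term = mse(a_student, a_teacher)   # assumed auxiliary imitation term
    return latent_term + action_weight * action_term
```

In a real training loop the latents and actions would be batched tensors and the loss would be differentiated through the student encoder. Plain lists are used here only to keep the example dependency-free.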

        Factorized Data Generation

        Decouple motion priors and terrain geometry for scalable coverage.

        By factorizing motion priors and terrain variation, we enable scalable recovery learning without exhaustive pose–terrain coverage.

        Human motions (sparse demos as structural priors) × Terrain bank (independent geometry variation) = Projected keypoints (scalable pose × terrain coverage)
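
The factorization above can be sketched as a Cartesian product of a small motion set and a terrain bank. This is an illustrative sketch under stated assumptions, not the authors' pipeline. Terrains are modeled as hypothetical height functions `h(x, y)`, and "coarse projection" is assumed to mean lifting each reference keypoint by the terrain height beneath it. All names (`project_keypoints`, `build_targets`, `flat`, `step`) are invented for the example.

```python
from itertools import product


def flat(x, y):
    """Flat terrain: zero height everywhere."""
    return 0.0


def step(x, y):
    """Hypothetical terrain with a 0.2 m step for x > 0.5."""
    return 0.2 if x > 0.5 else 0.0


def project_keypoints(keypoints, height_fn):
    """Lift each (x, y, z) keypoint by the terrain height under it (assumed projection rule)."""
    return [(x, y, z + height_fn(x, y)) for (x, y, z) in keypoints]


def build_targets(motions, terrains):
    """Pair every motion with every terrain and project its frames.

    motions: dict name -> list of frames, each frame a list of (x, y, z) keypoints.
    terrains: dict name -> height function h(x, y).
    Returns a dict keyed by (motion_name, terrain_name).
    """
    targets = {}
    for (m_name, frames), (t_name, height_fn) in product(motions.items(), terrains.items()):
        targets[(m_name, t_name)] = [project_keypoints(f, height_fn) for f in frames]
    return targets
```

With M motions and T terrains this yields M × T target sequences from only M demonstrations, which is the scaling argument the diagram makes.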

        Fall Safety

        Impact mitigation under forward, backward, lateral, and uneven-terrain falls.

        We evaluate our policy on fall mitigation tasks across diverse terrains. We start the robot in a default pose, and engage the policy only after the robot tilts 20 degrees to test genuine fall recovery.
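
The activation rule can be sketched as follows. This assumes tilt is measured as the angle between the robot's body z-axis and world up, computed from the gravity direction expressed in the body frame (for example from an IMU). The latching behavior and all names here are assumptions made for illustration. Only the 20-degree threshold comes from the text.

```python
import math

TILT_THRESHOLD_DEG = 20.0  # activation threshold stated in the text


def tilt_angle_deg(gravity_body):
    """Angle in degrees between the body z-axis and world up.

    gravity_body: gravity direction in the body frame; upright is (0, 0, -1).
    """
    gx, gy, gz = gravity_body
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        raise ValueError("gravity vector must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_tilt = max(-1.0, min(1.0, -gz / norm))
    return math.degrees(math.acos(cos_tilt))


class TiltTrigger:
    """Engages once the tilt reaches the threshold and stays engaged (assumed latch)."""

    def __init__(self, threshold_deg=TILT_THRESHOLD_DEG):
        self.threshold_deg = threshold_deg
        self.active = False

    def update(self, gravity_body):
        """Feed the latest gravity reading; returns True once the policy should run."""
        if not self.active and tilt_angle_deg(gravity_body) >= self.threshold_deg:
            self.active = True
        return self.active
```

A latch is used so the policy keeps running through the recovery even after the torso returns toward upright. Whether the actual evaluation harness behaves this way is not stated on the page.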

        Forward fall

        Backward fall

        Sideways fall

        Toward stairs

        Diagonal on stairs

        Diagonal on stairs

        Stones

        Off Platform

        Stones

        Stand-Up Recovery

        Robust stand-up from diverse poses and terrain constraints.

        We also evaluate our policy on stand-up tasks across varied terrains and diverse initial configurations.

        Flat

        Stones

        Box (face up)

        Legs up (face up)

        Stairs (face up)

        Stairs (face down)

        Gap (face down)

        Gap (face up)

        Legs up (face down)

        BibTeX

        @misc{azulay2026vigor,
          title        = {VIGOR: Visual Goal-In-Context Inference for Unified Humanoid Fall Safety},
          author       = {Osher Azulay and Zhengjie Xu and Andrew Scheffer and Stella X. Yu},
          year         = {2026},
          eprint       = {2602.16511},
          archivePrefix= {arXiv},
          primaryClass = {cs.RO},
          url          = {https://arxiv.org/abs/2602.16511}
        }