AURA: Autonomous Upskilling with Retrieval-Augmented Agents

TL;DR: AURA uses retrieval-augmented generation (RAG) and a statically verified YAML schema to iteratively generate rewards, domain randomizations, and training configurations for curriculum RL.


Simulation Rollouts


Hardware Rollouts

Hardware deployment rollouts showing zero-shot transfer from simulation to the real robot.

AURA Framework


AURA enables prompt-to-policy deployment through specialized LLM agents. A High-Level Planner queries past experiences from a vector database (VDB) to design a multi-stage workflow, which Stage-Level LLMs expand into schema-validated YAML files encoding rewards, randomizations, and training configurations. After GPU-accelerated training with MuJoCo-MJX, user feedback on deployment rollouts is attached to the curriculum and embedded into the VDB, enabling iterative improvement across tasks and embodiments.
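The stage files themselves are not reproduced on this page, so the snippet below is only a minimal sketch of the static-verification step: a hypothetical stage schema (the rewards, domain_randomization, and training fields and their shapes are assumptions, not AURA's published schema) is applied to LLM-generated YAML using PyYAML and the jsonschema library, so malformed output is rejected before any GPU time is spent.

import yaml  # PyYAML: parses the LLM-generated stage file
from jsonschema import validate  # raises ValidationError on bad input

# Hypothetical schema for one curriculum stage; the real AURA schema's
# field names and required keys are assumptions here.
STAGE_SCHEMA = {
    "type": "object",
    "required": ["rewards", "domain_randomization", "training"],
    "properties": {
        "rewards": {  # every reward term maps to a numeric weight
            "type": "object",
            "additionalProperties": {"type": "number"},
        },
        "domain_randomization": {  # each parameter gets a [low, high] range
            "type": "object",
            "additionalProperties": {
                "type": "array",
                "items": {"type": "number"},
                "minItems": 2,
                "maxItems": 2,
            },
        },
        "training": {
            "type": "object",
            "required": ["num_timesteps", "learning_rate"],
            "properties": {
                "num_timesteps": {"type": "integer", "minimum": 1},
                "learning_rate": {"type": "number", "exclusiveMinimum": 0},
            },
        },
    },
}

def verify_stage(yaml_text: str) -> dict:
    """Statically verify one generated stage before launching training."""
    stage = yaml.safe_load(yaml_text)
    validate(instance=stage, schema=STAGE_SCHEMA)  # raises on violations
    return stage

stage = verify_stage("""
rewards: {tracking_lin_vel: 1.0, torque_penalty: -0.0002}
domain_randomization: {floor_friction: [0.6, 1.4]}
training: {num_timesteps: 200000000, learning_rate: 3.0e-4}
""")

Because the check is purely static, a failed generation can be bounced back to the Stage-Level LLM along with the validation error instead of consuming a training run.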

Iteration Graphs

Performance comparison graphs

Survival and linear-velocity-tracking scores across iterations, used to evaluate locomotion policy quality on a custom humanoid. The plots show AURA's policy-quality improvements over five iterations compared to MuJoCo Playground's expert-designed rewards. AURA Blind generates rewards from scratch (the VDB is initialized empty), while AURA Tune modifies and improves an existing reward designed for another embodiment (the VDB is initialized with MuJoCo Playground's expert-designed Berkeley Humanoid rewards, domain randomizations, and training configuration).
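To make the two variants concrete, here is a toy sketch of the vector database under both initializations; the entry format, embedding dimension, and cosine-similarity retrieval are illustrative assumptions, not AURA's actual implementation.

import numpy as np

class CurriculumVDB:
    """Toy vector store: (embedding, curriculum) pairs retrieved by
    cosine similarity, standing in for AURA's experience database."""

    def __init__(self):
        self.entries = []  # list of (unit-norm embedding, curriculum dict)

    def add(self, embedding: np.ndarray, curriculum: dict) -> None:
        self.entries.append((embedding / np.linalg.norm(embedding), curriculum))

    def query(self, embedding: np.ndarray, k: int = 3) -> list:
        q = embedding / np.linalg.norm(embedding)
        ranked = sorted(self.entries, key=lambda e: -float(e[0] @ q))
        return [curriculum for _, curriculum in ranked[:k]]

# AURA Blind: no prior experience, rewards are generated from scratch.
blind_vdb = CurriculumVDB()

# AURA Tune: seeded with an expert curriculum from another embodiment
# (a placeholder here for MuJoCo Playground's Berkeley Humanoid rewards,
# domain randomizations, and training configuration).
tune_vdb = CurriculumVDB()
tune_vdb.add(
    np.random.default_rng(0).normal(size=128),  # stand-in task embedding
    {"embodiment": "berkeley_humanoid", "source": "mujoco_playground"},
)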

Framework Comparisons

Berkeley Humanoid performance comparison graphs

Policy evaluation across metrics. Episode survival length and linear velocity tracking evaluate a velocity-command-following task on the Berkeley Humanoid; cube-pushing success rate is evaluated on the UR5e environment. *CurricuLLM's Berkeley Humanoid and Fetch-Push results are those reported in its paper. **MuJoCo Playground's cube-pushing success is reported using Franka Emika Panda rewards on the UR5e embodiment, which should not be expected to succeed; AURA instead adapts the Franka expert reward and training configuration into an effective curriculum for the UR5e.
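For reference, the two locomotion metrics can be computed from rollout logs roughly as below; the exponential tracking kernel is the form commonly used in MuJoCo Playground-style locomotion rewards, and its use here (along with the sigma value) is an assumption rather than the paper's exact evaluation code.

import numpy as np

def survival_length(done_flags: np.ndarray) -> int:
    """Steps survived before the first termination (full episode if none)."""
    terminated = np.flatnonzero(done_flags)
    return int(terminated[0]) if terminated.size else len(done_flags)

def lin_vel_tracking(cmd_xy: np.ndarray, vel_xy: np.ndarray,
                     sigma: float = 0.25) -> float:
    """Mean of exp(-||v_cmd - v||^2 / sigma) over the episode, the
    exponential kernel common in MuJoCo Playground locomotion tasks."""
    sq_err = np.sum((cmd_xy - vel_xy) ** 2, axis=-1)
    return float(np.mean(np.exp(-sq_err / sigma)))

# Example on a 3-step rollout: good tracking, then termination at step 2.
done = np.array([0, 0, 1], dtype=bool)
cmd = np.array([[0.5, 0.0]] * 3)
vel = np.array([[0.5, 0.0], [0.4, 0.1], [0.0, 0.0]])
print(survival_length(done), lin_vel_tracking(cmd, vel))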

Framework Training-Launch Comparisons

Generation bar graph

Training-launch success rate comparing AURA and its ablated variants. All evaluations are conducted with GPT-4.1, as the original models used by the baselines were deprecated at the time of assessment. *CurricuLLM is evaluated on generating rewards for Berkeley Humanoid locomotion. **Eureka's 12% is its training-launch success rate on the ANYmal task, which is closest in complexity to humanoid robot tasks; across all embodiments available in Eureka's examples, its success rate is 49%, with simpler tasks generating more successfully.
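Training-launch success rate here can be read as the fraction of generation attempts that pass the schema check and start a training run without error; the sketch below formalizes that reading with hypothetical generate_curriculum and launch_training stubs (neither is AURA's real API).

import random

def generate_curriculum(task: str, seed: int) -> dict:
    """Hypothetical stand-in for the LLM generation + schema check."""
    random.seed(seed)
    if random.random() < 0.1:  # simulate an occasional invalid generation
        raise ValueError("schema violation")
    return {"task": task, "seed": seed}

def launch_training(curriculum: dict) -> None:
    """Hypothetical stand-in: would start a MuJoCo-MJX training run."""

def training_launch_success_rate(task: str, n_trials: int = 50) -> float:
    """Fraction of generated curricula that launch training cleanly."""
    launched = 0
    for seed in range(n_trials):
        try:
            launch_training(generate_curriculum(task, seed))
            launched += 1
        except Exception:
            pass  # schema violation, bad config, or immediate crash
    return launched / n_trials

print(training_launch_success_rate("humanoid_locomotion"))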

Learning Curve Graphs

Learning curve graphs

The learning curves above show the training convergence of each framework's policies.

Full Project Video


Appendix

BibTeX

@article{aura2025,
  title={AURA: Autonomous Upskilling with Retrieval-Augmented Agents},
  author={Anonymous Authors},
  journal={Under Review},
  year={2025}
}