🎬 RAPO++ Text-to-Video Prompt Optimization
This demo showcases Stage 1 (RAPO): Retrieval-Augmented Prompt Optimization using relation graphs.
How it works:
- Enter a simple text-to-video prompt
- The system retrieves contextually relevant modifiers from a relation graph
- Your prompt is enhanced with specific actions and atmospheric details
- Use the optimized prompt for better T2V generation results!
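The retrieval-and-augment step above can be sketched in a few lines. This is a minimal toy illustration, not the repository's implementation: the relation graph here is a hand-written dictionary standing in for the paper's graph, and `retrieve_modifiers`/`optimize_prompt` are hypothetical names.

```python
# Toy stand-in for a relation graph: a key word maps to place and
# action modifiers that co-occur with it. The real graph is built
# from training-set captions; this dictionary is purely illustrative.
RELATION_GRAPH = {
    "walking": {
        "place": ["on a forest trail", "along a rainy street"],
        "action": ["swinging their arms", "carrying an umbrella"],
    },
    "driving": {
        "place": ["on a coastal highway"],
        "action": ["kicking up dust"],
    },
}

def retrieve_modifiers(prompt: str, n_places: int = 1, per_place: int = 2):
    """Collect modifiers for every graph key that appears in the prompt."""
    mods = []
    for key, groups in RELATION_GRAPH.items():
        if key in prompt.lower():
            mods += groups["place"][:n_places]
            mods += groups["action"][:per_place]
    return mods

def optimize_prompt(prompt: str) -> str:
    """Append retrieved modifiers; leave the prompt unchanged if none match."""
    mods = retrieve_modifiers(prompt)
    return prompt if not mods else prompt + ", " + ", ".join(mods)

print(optimize_prompt("A person walking"))
# -> "A person walking, on a forest trail, swinging their arms, carrying an umbrella"
```

In the actual system an LLM merges the retrieved modifiers into fluent text rather than simply concatenating them, but the retrieve-then-rewrite flow is the same.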
Example prompts to try:
- "A person walking"
- "A car driving"
- "Someone cooking"
- "A group of people talking"
Based on the paper: RAPO++ (arXiv:2510.20206)
Parameters:
- Number of Places to Retrieve: 1-5
- Modifiers per Place: 1-10
About RAPO++
RAPO++ is a three-stage framework for text-to-video generation prompt optimization:
- Stage 1 (RAPO): Retrieval-Augmented Prompt Optimization using relation graphs (demonstrated here)
- Stage 2 (SSPO): Self-Supervised Prompt Optimization with test-time iterative refinement
- Stage 3: LLM fine-tuning on collected feedback data
The system is model-agnostic and works with various T2V models (Wan2.1, Open-Sora-Plan, HunyuanVideo, etc.).
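How the three stages compose can be sketched as a simple pipeline. Every function below is a hypothetical stand-in with stubbed-out internals (the scorer, graph lookup, and fine-tuning are placeholders), included only to show the data flow between stages, assuming Stage 2 collects (prompt, score) feedback pairs that Stage 3 trains on.

```python
def stage1_rapo(prompt: str) -> str:
    """Stage 1: retrieval-augmented rewrite (graph lookup stubbed out)."""
    return prompt + ", with detailed motion and atmosphere"

def stage2_sspo(prompt: str, rounds: int = 2):
    """Stage 2: test-time refinement loop; collects (prompt, score) feedback."""
    feedback = []
    for i in range(rounds):
        score = len(prompt)  # stand-in for a real video-quality scorer
        feedback.append((prompt, score))
        prompt = f"{prompt} (refined pass {i + 1})"
    return prompt, feedback

def stage3_finetune(feedback) -> str:
    """Stage 3 placeholder: report how much feedback was collected."""
    return f"fine-tune LLM on {len(feedback)} feedback pairs"

final_prompt, fb = stage2_sspo(stage1_rapo("A car driving"))
summary = stage3_finetune(fb)
```

Because each stage only consumes and produces plain prompt strings plus feedback records, the underlying T2V model can be swapped freely, which is what makes the framework model-agnostic.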
Papers:
- RAPO (CVPR 2025): The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
- RAPO++ (arXiv:2510.20206): Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
Project Page: https://whynothaha.github.io/RAPO_plus_github/
GitHub: https://github.com/Vchitect/RAPO