Explorable 3D world generation from a single image or text prompt is a cornerstone of spatial intelligence. Recent works use video models to achieve wide-scope, generalizable 3D world generation. However, existing approaches often suffer from a limited reconstruction scope and suboptimal visual quality. In this work, we propose Matrix-3D, a framework that uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation by combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that takes scene mesh renders as conditions, enabling high-quality and geometrically consistent scene video generation. To lift the generated panoramic scene video to a 3D world, we propose two separate pipelines: a feed-forward large reconstruction model for rapid 3D scene reconstruction and an optimization-based pipeline for accurate and detailed 3D scene reconstruction. For efficiency, the feed-forward panoramic 3D reconstruction model projects video latents and camera poses to predict omnidirectional 3D Gaussian Splatting attributes; to facilitate convergence, we adopt a two-stage training strategy and supervise the model with rendered panoramic novel views. For effectiveness, the optimization-based pipeline complements it when accuracy and detail matter more than speed. Training these models requires panoramic videos with camera poses, yet no existing panoramic video dataset provides them. We therefore introduce the Matrix-Pano dataset, the first large-scale synthetic collection, comprising 116,759 high-quality static panoramic video sequences with various annotations. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves state-of-the-art performance in panoramic video generation and 3D world generation.
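The abstract does not spell out the panoramic geometry behind either the render conditions or the omnidirectional Gaussians. As a minimal sketch, assuming an equirectangular (longitude/latitude) panorama with a per-pixel ray-distance map and a y-up, z-forward camera convention, the code below lifts panoramic pixels to world-space 3D points (for example, candidate Gaussian centers) and re-renders them into a panorama at another camera pose with a z-buffer. The empty pixels it leaves are the kind of holes a conditional generator must fill. All function names and conventions here are illustrative assumptions, not Matrix-3D's code, and Matrix-3D conditions on scene mesh renders rather than the point splats used here.

```python
import numpy as np


def equirect_rays(height: int, width: int) -> np.ndarray:
    """Unit ray direction for each pixel center of an equirectangular panorama.

    Assumed convention: x right, y up, z forward; longitude runs from -pi
    (left edge) to +pi (right edge), latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    v, u = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    lon = u / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - v / height * np.pi
    return np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1
    )  # shape (H, W, 3)


def unproject_panorama(distance: np.ndarray, cam_to_world: np.ndarray) -> np.ndarray:
    """Lift a per-pixel ray-distance map to world-space points, shape (H*W, 3)."""
    h, w = distance.shape
    pts_cam = equirect_rays(h, w) * distance[..., None]
    rot, trans = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam.reshape(-1, 3) @ rot.T + trans


def render_points_panorama(points, colors, world_to_cam, height, width):
    """Z-buffered splat of colored points into an equirectangular view.

    Returns (image, mask); mask is False where no point landed (a hole).
    """
    rot, trans = world_to_cam[:3, :3], world_to_cam[:3, 3]
    p = points @ rot.T + trans
    dist = np.linalg.norm(p, axis=-1)
    keep = dist > 1e-8  # skip points sitting exactly at the camera center
    p, dist, colors = p[keep], dist[keep], colors[keep]
    d = p / dist[:, None]
    lon = np.arctan2(d[:, 0], d[:, 2])
    lat = np.arcsin(np.clip(d[:, 1], -1.0, 1.0))
    u = np.floor((lon + np.pi) / (2.0 * np.pi) * width).astype(int) % width
    v = np.clip(np.floor((np.pi / 2.0 - lat) / np.pi * height).astype(int), 0, height - 1)
    flat = v * width + u
    # Sort by pixel index, then by distance, so the first entry per pixel is the nearest point.
    order = np.lexsort((dist, flat))
    _, first = np.unique(flat[order], return_index=True)
    winners = order[first]
    image = np.zeros((height * width, colors.shape[-1]), dtype=colors.dtype)
    mask = np.zeros(height * width, dtype=bool)
    image[flat[winners]] = colors[winners]
    mask[flat[winners]] = True
    return image.reshape(height, width, -1), mask.reshape(height, width)
```

Rendering from the same pose that produced the points reproduces the panorama exactly; moving the camera exposes disoccluded regions where `mask` is False.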
[Figure: Input Image / Text, Panoramic Video, 3D Scene. Example text prompt: "an impressionistic winter landscape".]
[Figure: Input Image, S-curve Trajectory, Straight Trajectory, Diagonal Right-Front.]
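The page names the camera trajectories (S-curve, straight, diagonal right-front) without giving their parameters. The sketch below shows one plausible way to generate such paths as 4x4 camera-to-world poses. The path length, the S-curve amplitude, the y-up/z-forward axes, and aligning each camera's heading with the path tangent are all assumptions made for illustration; the helper names are hypothetical.

```python
import numpy as np


def straight(n: int, length: float = 2.0) -> np.ndarray:
    """Positions moving straight ahead along +z."""
    s = np.linspace(0.0, 1.0, n)
    return np.stack([np.zeros(n), np.zeros(n), length * s], axis=-1)


def diagonal_right_front(n: int, length: float = 2.0) -> np.ndarray:
    """Positions moving at 45 degrees between forward (+z) and right (+x)."""
    s = np.linspace(0.0, 1.0, n) * length / np.sqrt(2.0)
    return np.stack([s, np.zeros(n), s], axis=-1)


def s_curve(n: int, length: float = 2.0, amplitude: float = 0.5) -> np.ndarray:
    """Forward motion with one full sideways sine period, giving an S shape."""
    s = np.linspace(0.0, 1.0, n)
    return np.stack([amplitude * np.sin(2.0 * np.pi * s), np.zeros(n), length * s], axis=-1)


def poses_along(positions: np.ndarray) -> np.ndarray:
    """4x4 camera-to-world poses whose forward (+z) axis follows the path tangent."""
    tangent = np.gradient(positions, axis=0)
    yaw = np.arctan2(tangent[:, 0], tangent[:, 2])  # heading in the ground (x-z) plane
    c, s = np.cos(yaw), np.sin(yaw)
    poses = np.tile(np.eye(4), (len(positions), 1, 1))
    poses[:, 0, 0], poses[:, 0, 2] = c, s  # rotation about the y (up) axis
    poses[:, 2, 0], poses[:, 2, 2] = -s, c
    poses[:, :3, 3] = positions
    return poses
```

For a panoramic camera the heading only decides where the panorama is centered, since every frame already covers all viewing directions.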
Matrix-3D generates 3D scenes with a larger range than WorldLabs.
[Figure: Input Image, WorldLabs Result, HunyuanWorld 1.0, Matrix-3D Result.]
Our proposed optimization-based pipeline enables accurate and detailed 3D scene reconstruction, while our feed-forward variant provides fast and efficient reconstruction.
[Figure: Input Image, Ours (Feed-forward), Ours (Optimization).]
3D worlds generated by Matrix-3D can be explored in any direction, which enables an endless exploration strategy. Given an input image and an initial trajectory, users generate the first segment of the 3D scene. They can then look around, change direction, and continue exploring along a second trajectory. Repeating this process lets users navigate the 3D scene freely and indefinitely in any direction.
[Figure: Input Image, First Exploration, Second Exploration, Combined Video.]
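The page does not describe how consecutive exploration segments are stitched together. A minimal sketch, assuming each segment is a sequence of 4x4 camera-to-world poses in a y-up, z-forward convention and that a new segment is authored relative to its own start frame: the new segment is re-anchored at the last pose of the previous one after an in-place turn ("look around"), then concatenated. The helper names are hypothetical, and this covers only the camera-path bookkeeping, not the generation of the second segment's content.

```python
import numpy as np


def yaw_matrix(theta: float) -> np.ndarray:
    """4x4 rotation about the y (up) axis; turning in place, no translation."""
    c, s = np.cos(theta), np.sin(theta)
    m = np.eye(4)
    m[0, 0], m[0, 2], m[2, 0], m[2, 2] = c, s, -s, c
    return m


def straight_local(n: int, length: float) -> np.ndarray:
    """Poses moving forward along +z, expressed in the segment's own start frame."""
    poses = np.tile(np.eye(4), (n, 1, 1))
    poses[:, 2, 3] = np.linspace(0.0, length, n)
    return poses


def continue_exploration(previous: np.ndarray, local_segment: np.ndarray, turn: float) -> np.ndarray:
    """Append a segment that starts where `previous` ended, after turning by `turn` radians.

    `previous` holds world-space poses (N, 4, 4); `local_segment` holds poses
    relative to its own first frame, which is assumed to be the identity.
    """
    anchor = previous[-1] @ yaw_matrix(turn)  # look around at the last pose
    new_world = anchor @ local_segment  # (4, 4) @ (M, 4, 4) broadcasts to (M, 4, 4)
    # new_world[0] sits at the same position as previous[-1]; with panoramic
    # frames a pure rotation needs no new content, so it is dropped.
    return np.concatenate([previous, new_world[1:]], axis=0)
```

Usage: `continue_exploration(straight_local(5, 2.0), straight_local(3, 1.0), np.pi / 2)` walks 2 units forward, turns right by 90 degrees, and walks 1 more unit, ending at position (1, 0, 2).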
- Compared with state-of-the-art 360° video generation methods, Matrix-3D delivers superior visual quality and plausible geometric structure in the generated panoramic videos.
- Our method also outperforms previous camera-controlled video generation approaches in visual quality and camera controllability.
@article{yang2025matrix3d,
title = {Matrix-3D: Omnidirectional Explorable 3D World Generation},
author = {Zhongqi Yang and Wenhang Ge and Yuqi Li and Jiaqi Chen and Haoyuan Li and Mengyin An and Fei Kang and Hua Xue and Baixin Xu and Yuyang Yin and Eric Li and Yang Liu and Yikai Wang and Hao-Xiang Guo and Yahui Zhou},
journal = {arXiv preprint arXiv:2508.08086},
year = {2025}
}