Matrix-3D: Omnidirectional Explorable 3D World Generation

Zhongqi Yang1*, Wenhang Ge2*, Yuqi Li1,3*, Jiaqi Chen1†, Haoyuan Li1†, Mengyin An1, Fei Kang1, Hua Xue1, Baixin Xu1, Yuyang Yin1, Eric Li1, Yang Liu1, Yikai Wang4, Hao-Xiang Guo1‡, Yahui Zhou1

1Skywork AI

2Hong Kong University of Science and Technology (Guangzhou)

3Institute of Computing Technology, Chinese Academy of Sciences

4School of Artificial Intelligence, Beijing Normal University
*Equal contribution. †Equal contribution. ‡Corresponding author and project lead.

Abstract

Explorable 3D world generation from a single image or text prompt forms a cornerstone of spatial intelligence. Recent works utilize video models to achieve wide-scope and generalizable 3D world generation. However, existing approaches often suffer from limited reconstruction scope and suboptimal visual quality. In this work, we propose Matrix-3D, a framework that uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation by combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that uses scene mesh renders as its condition, enabling high-quality and geometrically consistent scene video generation. To lift the panoramic scene video to a 3D world, we propose two separate pipelines: a feed-forward large reconstruction model for rapid 3D scene reconstruction, and an optimization-based pipeline for accurate and detailed 3D scene reconstruction. The feed-forward panoramic 3D reconstruction model targets efficiency: it maps video latents and camera poses to omnidirectional 3D Gaussian Splatting attributes. To facilitate convergence, we adopt a two-stage training strategy and supervise the model using rendered panoramic novel views. The optimization-based reconstruction method targets effectiveness. Because no existing panoramic video dataset provides associated camera poses, we also introduce the Matrix-Pano dataset to facilitate effective training. It is the first large-scale synthetic collection of its kind, comprising 116,759 high-quality static panoramic video sequences with various annotations. Extensive experiments demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art performance in panoramic video generation and 3D world generation.



Overview of Matrix-3D



Given trajectory guidance in the form of scene mesh renderings and corresponding masks—obtained by rendering an estimated mesh along a user-defined camera trajectory—we train an image-to-video diffusion model to generate high-quality panoramic videos that precisely follow the specified trajectory. The generated 2D panoramic content is then lifted into an omnidirectional, explorable 3D world using a large-scale panorama reconstruction model.
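The trajectory guidance described above can be illustrated with a small NumPy sketch. It is not the authors' renderer: instead of rendering an estimated mesh, it splats the panorama's per-pixel depth points into a camera moved along the trajectory, which yields a comparable pair of outputs (a partial render and a validity mask). The function names (`equirect_rays`, `project_to_equirect`, `warp_panorama`), the axis convention (x right, y up, z forward), and the pure-translation camera are all assumptions made for illustration.

```python
import numpy as np

def equirect_rays(height, width):
    """Unit ray direction for every pixel centre of an equirectangular panorama."""
    u = (np.arange(width) + 0.5) / width
    v = (np.arange(height) + 0.5) / height
    lon = (u - 0.5) * 2.0 * np.pi          # longitude in (-pi, pi)
    lat = (0.5 - v) * np.pi                # latitude in (-pi/2, pi/2)
    lon, lat = np.meshgrid(lon, lat)
    # Assumed convention: x right, y up, z forward.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)    # shape (H, W, 3)

def project_to_equirect(points, height, width):
    """Map camera-frame points (N, 3) to pixel rows, columns, and distances."""
    dist = np.linalg.norm(points, axis=-1)
    lon = np.arctan2(points[:, 0], points[:, 2])
    lat = np.arcsin(np.clip(points[:, 1] / np.maximum(dist, 1e-9), -1.0, 1.0))
    col = np.floor((lon / (2.0 * np.pi) + 0.5) * width).astype(int) % width
    row = np.floor((0.5 - lat / np.pi) * height).astype(int)
    return np.clip(row, 0, height - 1), col, dist

def warp_panorama(rgb, depth, cam_position):
    """Splat a panorama with depth into a camera translated to cam_position.

    Returns (render, mask); mask is True wherever at least one point landed.
    """
    h, w = depth.shape
    channels = rgb.shape[-1]
    # Source camera sits at the origin, so ray * depth gives world points.
    points = (equirect_rays(h, w) * depth[..., None]).reshape(-1, 3)
    rel = points - np.asarray(cam_position, dtype=float)
    row, col, dist = project_to_equirect(rel, h, w)
    flat = row * w + col
    # Z-buffer: sort by target pixel, then by distance, and keep the nearest point.
    order = np.lexsort((dist, flat))
    flat_sorted = flat[order]
    is_first = np.ones(flat_sorted.shape[0], dtype=bool)
    is_first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    winners = order[is_first]
    render = np.zeros((h * w, channels), dtype=rgb.dtype)
    mask = np.zeros(h * w, dtype=bool)
    render[flat[winners]] = rgb.reshape(-1, channels)[winners]
    mask[flat[winners]] = True
    return render.reshape(h, w, channels), mask.reshape(h, w)
```

Calling `warp_panorama` once per camera position along a path gives a sequence of renders and masks. Pixels where the mask is False are regions the source panorama never observed, which is the content a conditional video model would have to synthesize.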

Matrix-Pano Dataset: Scalable Synthetic Panoramic Videos

Existing 3D scene datasets are often limited in scale, inconsistent in quality, and lack accurate camera and geometric annotations. Meanwhile, collecting real-world 3D scene data remains costly. To address these challenges, we introduce the Matrix-Pano dataset—a scalable synthetic panoramic video dataset designed for generating high-quality, explorable panoramic sequences.

Data Samples

Matrix-Pano Dataset Examples

Automated Trajectory Generation & Capture

Trajectory and Camera Control

Scale & Open Source Plan

Through a rigorous multi-stage generation and filtering process, we retained 116,759 high-quality panoramic video sequences, each annotated with its corresponding 3D exploration path. A curated subset will be open-sourced to promote research and development in panoramic video generation and 3D scene understanding.

Geometric & Textural Consistency

Input Image / Text | Panoramic Video | 3D Scene
Text prompt: an impressionistic winter landscape

Fine-Grained Trajectory Control

Generate different 3D scenes based on different user-specified camera trajectories. Each row shows an input panorama (top) and the rendered 3D scene video (bottom), corresponding to different camera paths.

Input Image | S-curve Trajectory | Straight Trajectory | Diagonal Right-Front
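As a rough illustration of what a user-specified camera trajectory might look like, the sketch below generates camera centres for the three path shapes named above. The function names, the parameters (`length`, `amplitude`, `angle_deg`), and the axis convention (x right, y up, z forward) are hypothetical; the page does not specify how Matrix-3D parameterizes its trajectories.

```python
import numpy as np

def straight_path(num_frames, length):
    """Camera centres moving straight ahead along +z."""
    t = np.linspace(0.0, 1.0, num_frames)
    zeros = np.zeros_like(t)
    return np.stack([zeros, zeros, length * t], axis=-1)

def s_curve_path(num_frames, length, amplitude):
    """Forward motion with one full sine period of sideways sway."""
    t = np.linspace(0.0, 1.0, num_frames)
    x = amplitude * np.sin(2.0 * np.pi * t)
    return np.stack([x, np.zeros_like(t), length * t], axis=-1)

def diagonal_path(num_frames, length, angle_deg=45.0):
    """Straight motion turned toward +x (right-front) by angle_deg."""
    t = np.linspace(0.0, 1.0, num_frames)
    a = np.deg2rad(angle_deg)
    return np.stack(
        [length * t * np.sin(a), np.zeros_like(t), length * t * np.cos(a)],
        axis=-1,
    )
```

Each function returns an array of shape `(num_frames, 3)`, one camera centre per video frame. Camera orientation is left out here; a fuller sketch would pair each centre with a rotation.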

Large-Scale 3D Scene Generation

Matrix-3D can generate 3D scenes that cover a larger explorable range than those produced by WorldLabs.

Input Image | WorldLabs Result | HunyuanWorld 1.0 | Matrix-3D Result

3D Scene Reconstruction: panorama LRM vs. 3DGS optimization

Our proposed optimization-based pipeline enables accurate and detailed 3D scene reconstruction, while our feed-forward variant provides fast and efficient reconstruction.

Input Image | Ours (Feed-forward) | Ours (Optimization)




Endless Exploration

3D worlds generated by Matrix-3D allow exploration in any direction, facilitating the development of an endless exploration strategy. Given an input image and an initial trajectory path, users can generate the first segment of the 3D scene. Subsequently, users can look around, change direction, and continue exploration along a second trajectory. This approach enables endless exploration, allowing users to freely navigate the 3D scene in any direction.

Input Image | First Exploration | Second Exploration | Combined Video
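The chaining of exploration segments can be sketched at the level of camera paths alone. The code below covers only the geometric bookkeeping: the second segment starts where the first one ended, after the user turns by some yaw angle. The helper names (`forward_segment`, `yaw_rotation`, `continue_path`) and the axis convention (x right, y up, z forward) are assumptions, and the scene generation that Matrix-3D performs for each segment is not modelled.

```python
import numpy as np

def forward_segment(num_frames, length):
    """Local path that starts at the origin and moves along +z."""
    t = np.linspace(0.0, 1.0, num_frames)
    zeros = np.zeros_like(t)
    return np.stack([zeros, zeros, length * t], axis=-1)

def yaw_rotation(yaw_deg):
    """Rotation about the up axis (y); positive yaw turns forward (+z) toward right (+x)."""
    a = np.deg2rad(yaw_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def continue_path(first_segment, local_segment, yaw_deg):
    """Append a second segment that begins at the end of the first, after turning."""
    start = first_segment[-1]
    second = start + local_segment @ yaw_rotation(yaw_deg).T
    # local_segment starts at the origin, so second[0] duplicates the joint point.
    return np.concatenate([first_segment, second[1:]], axis=0)

first = forward_segment(3, 2.0)
combined = continue_path(first, forward_segment(3, 2.0), yaw_deg=90.0)
```

Here the user walks two units forward, turns 90 degrees to the right, and walks two more units, so `combined` holds five camera centres. The same call can be repeated with `combined` as the new `first_segment` to keep extending the path.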









Comparison


Comparison of Panoramic Video Generation and Camera-Guided Generation Models



- Compared with SOTA 360-video generation methods, Matrix-3D delivers superior visual quality and plausible geometric structure in the generated panoramic videos.
- Our method also outperforms previous camera-controlled video generation approaches in visual quality and camera controllability.



BibTeX

@article{yang2025matrix3d,
  title   = {Matrix-3D: Omnidirectional Explorable 3D World Generation},
  author  = {Zhongqi Yang and Wenhang Ge and Yuqi Li and Jiaqi Chen and Haoyuan Li and Mengyin An and Fei Kang and Hua Xue and Baixin Xu and Yuyang Yin and Eric Li and Yang Liu and Yikai Wang and Hao-Xiang Guo and Yahui Zhou},
  journal = {arXiv preprint arXiv:2508.08086},
  year    = {2025}
}