Matrix-3D

Explorable 3D world generation from a single image or text prompt is a cornerstone of spatial intelligence. Recent works use video models to achieve wide-scope and generalizable 3D world generation, but existing approaches often suffer from limited reconstruction scope and suboptimal visual quality. In this work, we propose Matrix-3D, a framework that uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation by combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that takes scene mesh renders as its condition, enabling high-quality and geometrically consistent scene video generation. To lift the panoramic scene video into a 3D world, we propose two separate pipelines: a feed-forward large reconstruction model for rapid 3D scene reconstruction, and an optimization-based pipeline for accurate and detailed 3D scene reconstruction. The feed-forward panoramic 3D reconstruction model targets efficiency: it maps video latents and camera poses to omnidirectional 3D Gaussian Splatting attributes. To facilitate its convergence, we adopt a two-stage training strategy and supervise the model with rendered panoramic novel views. The optimization-based reconstruction method targets effectiveness. Because no existing panoramic video dataset provides associated camera poses, we also introduce the Matrix-Pano dataset to facilitate effective training. It is the first large-scale synthetic collection of its kind, comprising 116,759 high-quality static panoramic video sequences with various annotations. Extensive experiments demonstrate the effectiveness of the proposed framework, which achieves state-of-the-art performance in panoramic video generation and 3D world generation.
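To make the panoramic representation concrete, the sketch below shows how a pixel of an equirectangular panorama can be mapped to a viewing ray and, given a range value, lifted to a 3D point. This is a minimal illustration, not the Matrix-3D implementation: the function names `pixel_to_ray` and `lift_to_3d`, the axis convention (+y up, +z forward), and the translation-only camera are all assumptions made for this example.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray direction.

    Assumed convention (not taken from the paper): longitude runs from
    -pi to pi left to right, latitude from +pi/2 to -pi/2 top to bottom,
    +y is up and +z is the forward direction at the image centre.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def lift_to_3d(u, v, depth, width, height, cam_pos=(0.0, 0.0, 0.0)):
    """Back-project one panorama pixel to a 3D point.

    `depth` is the distance along the ray (range), and the camera is
    assumed to be translated to `cam_pos` without any rotation.
    """
    dx, dy, dz = pixel_to_ray(u, v, width, height)
    return (
        cam_pos[0] + depth * dx,
        cam_pos[1] + depth * dy,
        cam_pos[2] + depth * dz,
    )
```

Because every direction on the sphere corresponds to some pixel, a single panorama with per-pixel range covers the full surroundings of the camera, which is what makes this representation attractive for wide-coverage scene generation.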

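Demo gallery columns: Input Image / Text, Panoramic Video, 3D Scene. Example text prompt: "an impressionistic winter landscape".

Trajectory control gallery columns: Input Image, S-curve Trajectory, Straight Trajectory, Diagonal Right-Front.

Comparison gallery columns: Input Image, WorldLabs Result, HunyuanWorld 1.0, Matrix-3D Result.

Exploration gallery columns: Input Image, First Exploration, Second Exploration, Combined Video.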

Matrix-3D: Omnidirectional Explorable 3D World Generation

Abstract

Overview of Matrix-3D

Matrix-Pano Dataset: Scalable Synthetic Panoramic Videos

Data Samples

Automated Trajectory Generation & Capture

Scale & Open Source Plan

Geometric & Textural Consistency

Fine-Grained Trajectory Control

Matrix-3D generates different 3D scenes from different user-specified camera trajectories. Each row shows an input panorama (top) and the rendered 3D scene video (bottom) for one camera path.
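As a rough illustration of what a user-specified camera trajectory might look like, the sketch below generates ground-plane camera positions for a straight path, a diagonal path, and an S-curve, the path types named on this page. The function names, the parameters, and the sine-based S-curve are hypothetical choices for this example; the page does not specify how Matrix-3D parameterizes its trajectories.

```python
import math


def straight_trajectory(num_frames, distance, heading_deg=0.0):
    """Return (x, z) camera positions along a straight line.

    A heading of 0 degrees moves forward along +z; positive headings
    turn to the right (+x), so 45 degrees gives a diagonal
    right-front path. These conventions are assumptions.
    """
    heading = math.radians(heading_deg)
    steps = max(num_frames - 1, 1)
    path = []
    for t in range(num_frames):
        travelled = distance * t / steps
        path.append((travelled * math.sin(heading), travelled * math.cos(heading)))
    return path


def s_curve_trajectory(num_frames, distance, amplitude):
    """Return (x, z) positions that move forward with one sine period of sway."""
    steps = max(num_frames - 1, 1)
    path = []
    for t in range(num_frames):
        s = t / steps  # normalized progress in [0, 1]
        path.append((amplitude * math.sin(2.0 * math.pi * s), distance * s))
    return path
```

For example, `straight_trajectory(49, 3.0, heading_deg=45.0)` yields a diagonal right-front path and `s_curve_trajectory(49, 3.0, 0.5)` an S-curve of the same forward extent; the frame count and distances here are arbitrary.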

Large-Scale 3D Scene Generation

3D Scene Reconstruction: panorama LRM vs. 3DGS optimization

Endless Exploration

Comparison

Comparison of Panoramic Video Generation and Camera Guided Generation Models

BibTeX

Reconstruction gallery columns: Input Image, Ours (Feed-forward), Ours (Optimization).