Single-View to 3D Scene in Under a Second -- Powered by Apple's SHARP Research
Monocular 3D Reconstruction implements a fast single-image-to-3D-scene pipeline, converting 2D photographs into renderable Gaussian-style 3D representations. It is useful for research replication, rapid prototyping of real-time 3D content generation, and evaluating the feasibility of monocular reconstruction for AR/VR and visualization applications.
📸 From a single photo to a 3D scene in under a second! This project implements Apple's SHARP research, converting any 2D photograph into a high-quality 3D Gaussian Splatting representation with a single feedforward neural network. 🧠 The core pipeline comprises an image encoder, a multi-resolution decoder, monocular depth estimation, and Gaussian parameter prediction, producing a metric-scale 3D model with absolute depth in one inference pass. 🎮 Ships with an interactive web viewer built on SuperSplat (Next.js): upload a photo and explore the generated 3D scene in the browser in full 360°. 🎬 Supports CUDA-accelerated camera-trajectory video rendering to .mp4. ⚡ Runs inference on CPU, CUDA, and Apple MPS (Metal). 📊 Reduces LPIPS by 25-34% and DISTS by 21-43% versus prior state of the art, demonstrating strong zero-shot generalization. 🏗️ Suited to computer vision researchers, 3D content creators, and developers who need to generate 3D assets from photos quickly.
Generating a full 3D scene from a single photograph has long been one of the hardest problems in computer vision. This project implements Apple's SHARP approach (Sharp Monocular View Synthesis), which produces photorealistic 3D Gaussian representations from a single 2D image in one feedforward pass -- no multi-view capture, no scanning, no waiting. It pairs the ML inference pipeline with a custom web-based 3D viewer built on SuperSplat for immediate interactive exploration of generated scenes.
Single 2D Image (any photograph)
|
v
+--------------------------------------------------+
| SHARP Neural Network |
| |
| Image Encoder --> Multi-Resolution Decoder |
| --> Monocular Depth Estimation |
| --> Gaussian Parameter Prediction (NDC) |
| --> Unproject to Metric 3D Space |
| |
| Single feedforward pass, < 1 second on GPU |
+--------------------------------------------------+
|
v
3D Gaussian Splat (.ply) -- metric scale, absolute depth
|
+---> Web Viewer (Next.js + SuperSplat)
| - Interactive 3D exploration in browser
| - Upload image, view result immediately
|
+---> Video Rendering (gsplat, CUDA)
- Camera trajectory animation
- .mp4 output
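The "Unproject to Metric 3D Space" step in the diagram above corresponds to standard pinhole back-projection: given a per-pixel metric depth map and camera intrinsics, pixel coordinates are lifted into 3D camera space. A minimal NumPy sketch of that idea (the function name and intrinsics values are illustrative, not taken from the SHARP codebase):

```python
import numpy as np

def unproject_to_metric(depth, fx, fy, cx, cy):
    """Lift a metric depth map (H, W) into 3D points (H, W, 3)
    in the camera frame using the pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) / fx * depth   # X = (u - cx) * Z / fx
    y = (v - cy) / fy * depth   # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1)

# Example: a flat surface 2 m in front of the camera
depth = np.full((4, 4), 2.0)
pts = unproject_to_metric(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(pts.shape)   # (4, 4, 3)
print(pts[2, 2])   # pixel on the optical axis maps to (0, 0, 2)
```

Because the predicted depth is metric (absolute scale), the resulting points live in real-world units, which is what makes physically plausible camera movements possible during rendering.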
- Sub-second 3D generation from any single photograph via a single neural network forward pass
- Metric-scale output with absolute depth -- enables real-world camera movements
- 3DGS-compatible (.ply format) works with any Gaussian Splat renderer
- Interactive web viewer for immediate 3D scene exploration in the browser
- Multi-device inference -- runs on CPU, CUDA, and Apple MPS (Metal)
- Zero-shot generalization across datasets, reducing LPIPS by 25-34% and DISTS by 21-43% vs. prior state of the art
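The .ply output uses the standard 3DGS vertex layout, which is why it loads in any Gaussian Splat renderer. A minimal sketch of that binary layout for a single Gaussian, using the conventional 3DGS attribute names (the exact attribute set SHARP emits may differ, e.g. higher-order SH coefficients):

```python
import io
import struct

# One Gaussian: position, DC color (SH band 0), opacity logit,
# log-scales, and rotation quaternion -- 14 float32 properties.
fields = ["x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2",
          "opacity", "scale_0", "scale_1", "scale_2",
          "rot_0", "rot_1", "rot_2", "rot_3"]
values = [0.0, 0.5, 2.0, 1.2, 0.8, 0.3,
          2.5, -4.0, -4.0, -4.0,
          1.0, 0.0, 0.0, 0.0]

header = "ply\nformat binary_little_endian 1.0\nelement vertex 1\n"
header += "".join(f"property float {f}\n" for f in fields)
header += "end_header\n"

# Write header + packed record to an in-memory buffer
buf = io.BytesIO()
buf.write(header.encode("ascii"))
buf.write(struct.pack(f"<{len(values)}f", *values))

# Read it back: skip past the header, unpack one record
raw = buf.getvalue()
body = raw.index(b"end_header\n") + len(b"end_header\n")
record = struct.unpack(f"<{len(fields)}f", raw[body:body + 4 * len(fields)])
splat = dict(zip(fields, record))
print(splat["z"])  # 2.0
```

Opacity is stored as a logit and scales as logarithms in this convention; renderers apply sigmoid/exp when rasterizing.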
| Layer | Technology |
|---|---|
| ML Framework | PyTorch |
| Model | Encoder-Decoder with Gaussian Head (SHARP) |
| 3D Representation | 3D Gaussian Splatting |
| CLI | Click |
| Web Viewer | Next.js (App Router), SuperSplat, Three.js |
| Video Rendering | gsplat (CUDA only) |
| Package Manager | pip, pyproject.toml |
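Multi-device inference (CPU, CUDA, Apple MPS) usually comes down to a small device-selection step at startup. A sketch of such a fallback chain (the helper name is illustrative; the actual CLI's selection logic may differ):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS (Metal), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# With PyTorch this would typically be driven by:
#   pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
print(pick_device(False, True))  # mps
```

Note that video rendering via gsplat remains CUDA-only regardless of which device inference runs on.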
# Create environment
conda create -n sharp python=3.13
conda activate sharp
# Install dependencies
pip install -r requirements.txt
# Run prediction (model downloads automatically on first run)
sharp predict -i /path/to/image.jpg -o /path/to/output/
# Render camera trajectory video (CUDA GPU required)
sharp predict -i /path/to/image.jpg -o /path/to/output/ --render

# Run the web viewer
cd viewer
npm install
npm run dev
# Open http://localhost:3000

monocular-3d-reconstruction/
src/
sharp/
cli/
predict.py # Main prediction CLI
render.py # Camera trajectory video rendering
models/
encoders/ # Image feature encoders
decoders/ # Multi-resolution convolutional decoders (UNet, etc.)
gaussian_decoder.py # 3D Gaussian parameter prediction
predictor.py # End-to-end RGB Gaussian predictor
monodepth.py # Monocular depth estimation module
composer.py # Model composition
heads.py, blocks.py # Neural network building blocks
utils/ # I/O, Gaussian ops, logging
viewer/
src/
app/ # Next.js App Router
api/generate/route.ts # Image-to-3D API endpoint
api/ply/route.ts # PLY file serving
components/
SuperSplatViewer.tsx # SuperSplat-based 3D viewer
GaussianSplatViewer.tsx # Custom Gaussian Splat viewer
EmbeddedViewer.tsx # Embedded viewer wrapper
public/supersplat/ # SuperSplat viewer assets
supersplat-viewer-source/ # SuperSplat viewer source code
data/ # Sample images and teaser assets
Based on: Sharp Monocular View Synthesis in Less Than a Second -- Mescheder, Dong, Li, Bai, Santos, Hu, Lecouat, Zhen, Delaunoy, Fang, Tsin, Richter, Koltun (Apple, 2025).
arXiv:2512.10685 | Project Page
Built by Huang Akai (Kai) -- Creative Technologist, Founder @ Universal FAW Labs