
akaiHuang/monocular-3d-reconstruction


Monocular 3D Reconstruction

Single-View to 3D Scene in Under a Second -- Powered by Apple's SHARP Research

About

Monocular 3D Reconstruction implements a fast single-image-to-3D-scene pipeline, converting 2D photographs into renderable 3D representations based on Gaussian Splatting. It is useful for research replication, rapid prototyping of real-time 3D content generation, and evaluating the feasibility of monocular reconstruction for AR/VR and visualization applications.

📋 Quick Summary

  📸 From a single photo to a 3D scene in under a second! This project implements Apple's SHARP research: a single feedforward neural network converts any 2D photograph into a high-quality 3D Gaussian Splatting representation.
  🧠 Core components include an image encoder, a multi-resolution decoder, monocular depth estimation, and Gaussian parameter prediction; one inference pass produces a metric-scale 3D model with absolute depth.
  🎮 A built-in SuperSplat-based interactive web viewer (Next.js) lets users upload a photo and immediately explore the 3D scene 360 degrees in the browser.
  🎬 Supports CUDA-accelerated camera trajectory video rendering with .mp4 output.
  ⚡ Inference runs on CPU, CUDA, and Apple MPS (Metal).
  📊 Relative to prior state of the art, LPIPS drops by 25-34% and DISTS by 21-43%, demonstrating strong zero-shot generalization.
  🏗️ Suited to computer vision researchers, 3D content creators, and developers who need to generate 3D assets from photos quickly.


🤔 Why This Exists

Generating a full 3D scene from a single photograph has long been one of the hardest problems in computer vision. This project implements Apple's SHARP approach (Sharp Monocular View Synthesis), which produces photorealistic 3D Gaussian representations from a single 2D image in one feedforward pass -- no multi-view capture, no scanning, no waiting. It pairs the ML inference pipeline with a custom web-based 3D viewer built on SuperSplat for immediate interactive exploration of generated scenes.

🏗️ Architecture

Single 2D Image (any photograph)
        |
        v
+--------------------------------------------------+
|  SHARP Neural Network                             |
|                                                   |
|  Image Encoder --> Multi-Resolution Decoder       |
|       --> Monocular Depth Estimation              |
|       --> Gaussian Parameter Prediction (NDC)     |
|       --> Unproject to Metric 3D Space            |
|                                                   |
|  Single feedforward pass, < 1 second on GPU       |
+--------------------------------------------------+
        |
        v
3D Gaussian Splat (.ply) -- metric scale, absolute depth
        |
        +---> Web Viewer (Next.js + SuperSplat)
        |         - Interactive 3D exploration in browser
        |         - Upload image, view result immediately
        |
        +---> Video Rendering (gsplat, CUDA)
                  - Camera trajectory animation
                  - .mp4 output
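
The final step of the pipeline above unprojects per-pixel depth predictions from image space into metric 3D. A minimal pinhole-camera sketch of that operation (the intrinsics, values, and function name here are illustrative, not the project's actual code):

```python
# Unproject a pixel (u, v) with predicted metric depth d into 3D camera
# coordinates, given pinhole intrinsics K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
def unproject(u, v, depth, fx, fy, cx, cy):
    """Return the 3D point (X, Y, Z) in camera space, in metres."""
    x = (u - cx) / fx * depth   # back-project along the x ray
    y = (v - cy) / fy * depth   # back-project along the y ray
    return (x, y, depth)

# The principal-point pixel always unprojects onto the optical axis.
point = unproject(u=320.0, v=240.0, depth=2.5, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Because SHARP predicts absolute (metric) depth, the resulting points live at real-world scale, which is what makes physically plausible camera movements possible downstream.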

Key Capabilities

  • Sub-second 3D generation from any single photograph via a single neural network forward pass
  • Metric-scale output with absolute depth -- enables real-world camera movements
  • 3DGS-compatible (.ply format) works with any Gaussian Splat renderer
  • Interactive web viewer for immediate 3D scene exploration in the browser
  • Multi-device inference -- runs on CPU, CUDA, and Apple MPS (Metal)
  • Zero-shot generalization across datasets, reducing LPIPS by 25-34% and DISTS by 21-43% vs. prior state of the art
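
Because the output is a standard 3DGS-style .ply, any splat renderer can consume it. A stdlib-only sketch of the binary header and per-vertex layout such files use (property names follow the common 3DGS convention; real files carry many more per-splat fields such as f_dc_*, scale_*, and rot_*):

```python
import struct

# A minimal binary-little-endian PLY with a few common per-splat fields.
PROPS = ["x", "y", "z", "opacity"]

def write_ply(splats):
    """Serialize a list of (x, y, z, opacity) tuples to PLY bytes."""
    header = ["ply", "format binary_little_endian 1.0",
              f"element vertex {len(splats)}"]
    header += [f"property float {p}" for p in PROPS]
    header.append("end_header")
    body = b"".join(struct.pack("<4f", *s) for s in splats)
    return ("\n".join(header) + "\n").encode("ascii") + body

def read_ply(data):
    """Parse the vertex count from the header, then unpack one record per vertex."""
    head, _, body = data.partition(b"end_header\n")
    n = int(next(line for line in head.decode().splitlines()
                 if line.startswith("element vertex")).split()[-1])
    return [struct.unpack_from("<4f", body, i * 16) for i in range(n)]

splats = [(0.0, 0.5, 2.0, 0.875), (1.0, -0.25, 3.5, 0.5)]
data = write_ply(splats)
```

In practice a library such as plyfile handles this parsing; the point is only that the format is plain and renderer-agnostic.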

🛠️ Tech Stack

Layer              Technology
ML Framework       PyTorch
Model              Encoder-Decoder with Gaussian Head (SHARP)
3D Representation  3D Gaussian Splatting
CLI                Click
Web Viewer         Next.js (App Router), SuperSplat, Three.js
Video Rendering    gsplat (CUDA only)
Package Manager    pip, pyproject.toml
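
Multi-device support (CPU / CUDA / Apple MPS) usually reduces to a small device-selection helper. A sketch using PyTorch's standard availability checks (this is not the project's actual code):

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS (Metal), then fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    # Guarded lookup: torch.backends.mps only exists on builds with MPS support
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
```

Note that video rendering via gsplat remains CUDA-only; CPU and MPS cover prediction but not the trajectory renderer.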

🏁 Quick Start

ML Pipeline

# Create environment
conda create -n sharp python=3.13
conda activate sharp

# Install dependencies
pip install -r requirements.txt

# Run prediction (model downloads automatically on first run)
sharp predict -i /path/to/image.jpg -o /path/to/output/

# Render camera trajectory video (CUDA GPU required)
sharp predict -i /path/to/image.jpg -o /path/to/output/ --render
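
The --render path animates a camera along a trajectory around the scene. A back-of-envelope sketch of generating a circular orbit of camera positions (illustrative only; the project's actual trajectory code may differ):

```python
import math

def orbit_positions(radius: float, height: float, n_frames: int):
    """Camera centres evenly spaced on a horizontal circle of the given radius."""
    positions = []
    for i in range(n_frames):
        theta = 2.0 * math.pi * i / n_frames
        positions.append((radius * math.cos(theta), height, radius * math.sin(theta)))
    return positions

# 60 positions -> e.g. a 2-second orbit at 30 fps once each frame is rendered
poses = orbit_positions(radius=2.0, height=0.5, n_frames=60)
```

Because the splat is metric-scale, the orbit radius is in real-world metres, so trajectories behave like physical camera moves.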

Web Viewer

cd viewer
npm install
npm run dev
# Open http://localhost:3000

📁 Project Structure

monocular-3d-reconstruction/
  src/
    sharp/
      cli/
        predict.py               # Main prediction CLI
        render.py                # Camera trajectory video rendering
      models/
        encoders/                # Image feature encoders
        decoders/                # Multi-resolution convolutional decoders (UNet, etc.)
        gaussian_decoder.py      # 3D Gaussian parameter prediction
        predictor.py             # End-to-end RGB Gaussian predictor
        monodepth.py             # Monocular depth estimation module
        composer.py              # Model composition
        heads.py, blocks.py      # Neural network building blocks
      utils/                     # I/O, Gaussian ops, logging
  viewer/
    src/
      app/                       # Next.js App Router
        api/generate/route.ts    # Image-to-3D API endpoint
        api/ply/route.ts         # PLY file serving
      components/
        SuperSplatViewer.tsx      # SuperSplat-based 3D viewer
        GaussianSplatViewer.tsx   # Custom Gaussian Splat viewer
        EmbeddedViewer.tsx       # Embedded viewer wrapper
    public/supersplat/           # SuperSplat viewer assets
  supersplat-viewer-source/      # SuperSplat viewer source code
  data/                          # Sample images and teaser assets

Research Reference

Based on: Sharp Monocular View Synthesis in Less Than a Second -- Mescheder, Dong, Li, Bai, Santos, Hu, Lecouat, Zhen, Delaunoy, Fang, Tsin, Richter, Koltun (Apple, 2025).

arXiv:2512.10685 | Project Page


Built by Huang Akai (Kai) -- Creative Technologist, Founder @ Universal FAW Labs
