A practical, hands-on exploration of four state-of-the-art deep learning models for computer vision — implemented, compared, and demonstrated on diverse real-world images. This project highlights strengths, trade-offs, and unique capabilities such as zero-shot detection and segmentation, serving as a showcase of my technical skills for portfolios and CVs.
In computer vision, two key challenges are:
- Detection — identifying what objects are present.
- Segmentation — outlining their exact boundaries.
Different models excel in different areas: some prioritize speed, others accuracy, and some offer flexible, zero-shot capabilities. This project’s goal was to:
- Implement and run four leading models: DeepLabV3, YOLOv8, Segment Anything Model (SAM), and GroundingDINO + SAM.
- Process the same input images across all models, saving:
  - Visual outputs
  - Structured data (class labels, bounding boxes, confidence scores); a sketch of this record format follows the list
- Compare architectures, performance, and outputs, emphasizing the strengths of semantic segmentation, real-time detection, and prompt-based zero-shot segmentation.
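As a rough illustration of the structured data each run can store next to its visual output, the snippet below writes one detection record to JSON. The field names, example values, and output path are hypothetical, not the notebook's exact schema.

```python
# Hypothetical per-image record (labels, boxes, scores); schema and values are illustrative only.
import json

record = {
    "image": "images/input/example.jpg",
    "model": "yolov8",
    "detections": [
        {"label": "person", "confidence": 0.91, "bbox_xyxy": [34, 50, 210, 480]},
    ],
}

with open("images/yolo/example.json", "w") as f:
    json.dump(record, f, indent=2)
```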
The project is implemented as a comparative Jupyter Notebook, with each section dedicated to one model. The same curated image set is processed through all pipelines for side-by-side evaluation.
- Model: `deeplabv3_resnet101` (pre-trained on COCO).
- Task: Dense, pixel-level classification with predefined classes.
- Output: High-precision masks and segmentation maps.
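A minimal sketch of what this DeepLabV3 inference step can look like with TorchVision; the weights enum assumes torchvision >= 0.13 and the image path is a placeholder, so treat it as an illustration rather than the notebook's exact code.

```python
# Sketch: semantic segmentation with TorchVision's pretrained DeepLabV3 (ResNet-101 backbone).
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101, DeepLabV3_ResNet101_Weights

model = deeplabv3_resnet101(weights=DeepLabV3_ResNet101_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("images/input/example.jpg").convert("RGB")   # placeholder path
batch = preprocess(image).unsqueeze(0)                          # shape (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)["out"][0]                             # (num_classes, H, W)
segmentation_map = logits.argmax(dim=0)                         # per-pixel class indices
```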
- Model: `vit-h` variant of SAM.
- Enhancement: Integrated OpenAI CLIP for zero-shot semantic labeling of SAM’s class-agnostic masks.
- Result: Fully automated panoptic segmentation with meaningful class names.
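A minimal sketch of this SAM + CLIP combination, assuming the official `segment-anything` and OpenAI `clip` packages; the checkpoint path, candidate labels, and crop-based labeling strategy below are assumptions, not necessarily the notebook's exact approach.

```python
# Sketch: class-agnostic SAM masks + zero-shot CLIP labels.
# Checkpoint path, image path, and candidate labels are assumptions.
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="download_model/sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("images/input/example.jpg").convert("RGB"))
masks = mask_generator.generate(image)   # list of dicts: 'segmentation', 'bbox' (XYWH), 'area', ...

# Label the largest mask by cropping its bounding box and scoring it against text prompts.
clip_model, clip_preprocess = clip.load("ViT-B/32", device="cpu")
candidates = ["a person", "a horse", "a dog", "a tree", "the sky"]   # hypothetical label set
largest = max(masks, key=lambda m: m["area"])
x, y, w, h = (int(v) for v in largest["bbox"])
crop = Image.fromarray(image[y:y + h, x:x + w])

with torch.no_grad():
    img_feat = clip_model.encode_image(clip_preprocess(crop).unsqueeze(0))
    txt_feat = clip_model.encode_text(clip.tokenize(candidates))
img_feat /= img_feat.norm(dim=-1, keepdim=True)
txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
print(candidates[(img_feat @ txt_feat.T).argmax().item()])   # best-matching zero-shot label
```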
- Model: YOLOv8 (Ultralytics).
- Strength: Real-time detection with bounding boxes & class labels.
- Use Case: Benchmarked against semantic and panoptic methods.
- Extra: Applied to both original and SAM-segmented images.
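A minimal sketch of YOLOv8 inference via the `ultralytics` API; the nano checkpoint `yolov8n.pt` and the image path are placeholders.

```python
# Sketch: YOLOv8 detection with the ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                             # weights auto-download on first use
results = model("images/input/example.jpg")            # one Results object per image

for box in results[0].boxes:
    label = results[0].names[int(box.cls)]             # class name
    conf = float(box.conf)                             # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()              # bounding box in pixel coordinates
    print(label, round(conf, 2), [round(v, 1) for v in (x1, y1, x2, y2)])
```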
- Pipeline:
  - GroundingDINO: Text-prompt-based detection (e.g., “a person on a horse”).
  - SAM: Precise segmentation masks for the detected boxes.
- Benefit: Promptable, zero-shot segmentation without retraining.
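A minimal sketch of how the two stages can be chained; the config/checkpoint paths, thresholds, and CPU device are assumptions (the prompt reuses the example above), not the notebook's exact settings.

```python
# Sketch: GroundingDINO text-prompted detection -> SAM box-prompted segmentation.
# Paths under download_model/ and the thresholds are assumptions.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("download_model/GroundingDINO_SwinT_OGC.py",
                  "download_model/groundingdino_swint_ogc.pth", device="cpu")
sam = sam_model_registry["vit_h"](checkpoint="download_model/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Stage 1: text-prompted detection.
image_source, image = load_image("images/input/example.jpg")   # (numpy RGB, normalized tensor)
boxes, logits, phrases = predict(model=dino, image=image,
                                 caption="a person on a horse",
                                 box_threshold=0.35, text_threshold=0.25,
                                 device="cpu")

# Stage 2: segment each detected box with SAM (normalized cxcywh -> absolute xyxy).
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]),
                         in_fmt="cxcywh", out_fmt="xyxy")
predictor.set_image(image_source)
masks, scores, _ = predictor.predict(box=boxes_xyxy[0].numpy(), multimask_output=False)
```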
| Category | Tools / Frameworks |
|---|---|
| Core Frameworks | PyTorch, TorchVision |
| Models & Architectures | YOLOv8 (ultralytics), SAM, DeepLabV3, GroundingDINO, CLIP, Transformers |
| Utilities & Processing | Pillow, OpenCV, NumPy, Matplotlib, requests, supervision |
| Environment | Jupyter Notebook, pip |
- Type: Curated collection of 7 diverse real-world images.
- Location: `images/input/`
- Scenes: Group gatherings, action shots, wildlife, landscapes.
- Purpose: Test generalization and robustness across varied, non-benchmark examples.
git clone <repository-url>
cd <repository-name>

pip install torch torchvision pillow matplotlib
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install ultralytics
pip install groundingdino-py supervision

- Place images in `images/input/`
- First run will auto-download pretrained model weights into `download_model/`
Open and execute `segmention_yolo_deepleb.ipynb` cell by cell.
Outputs are saved in:
- `images/deeplab_segmented/` — DeepLabV3
- `images/segmented/` — SAM
- `images/yolo/` — YOLOv8
- `images/groundingdino_sam/` — GroundingDINO + SAM
| Model | Notable Strengths | Example Performance |
|---|---|---|
| DeepLabV3 | Strong pixel-level segmentation | 7 images in 4.63s on GPU (0.66s/image) |
| SAM | Extremely high-quality masks | 88 masks in 53.48s on CPU (heavy model) |
| YOLOv8 | Real-time detection | Performance depends on hardware |
| GroundingDINO + SAM | Flexible, prompt-based segmentation | Accurate zero-shot results |
- Learned practical trade-offs between semantic, panoptic, and real-time approaches.
- Composing GroundingDINO + SAM revealed the power of multi-model pipelines.
- SAM + CLIP integration turned a class-agnostic model into a zero-shot labeling tool.
- Running heavy models like SAM (`vit-h`) on CPU highlighted the importance of balancing accuracy and compute resources.
Mehran Asgari 📧 [email protected] 🌐 GitHub Profile
Licensed under the Apache 2.0 License — see LICENSE for details.