
Commit 98613dc

MKhalusova authored and raghavanone committed
Depth estimation task guide (huggingface#22205)
* added doc to toc, auto tip with supported models, mention of task guide in model docs
* make style
* removed "see also"
* minor fix
1 parent e057108 commit 98613dc

5 files changed: +153 -1 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -85,6 +85,8 @@
     title: Zero-shot object detection
   - local: tasks/zero_shot_image_classification
     title: Zero-shot image classification
+  - local: tasks/monocular_depth_estimation
+    title: Depth estimation
   title: Computer Vision
 - sections:
   - local: tasks/image_captioning

docs/source/en/model_doc/dpt.mdx

Lines changed: 2 additions & 1 deletion
@@ -33,7 +33,8 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DPT.
 
 - Demo notebooks for [`DPTForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DPT).
-- See also: [Semantic segmentation task guide](./tasks/semantic_segmentation)
+- [Semantic segmentation task guide](../tasks/semantic_segmentation)
+- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation.mdx)
 
 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

docs/source/en/model_doc/glpn.mdx

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GLPN.
 
 - Demo notebooks for [`GLPNForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/GLPN).
+- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation.mdx)
 
 ## GLPNConfig

docs/source/en/tasks/monocular_depth_estimation.mdx

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Monocular depth estimation

Monocular depth estimation is a computer vision task that involves predicting the depth information of a scene from a
single image. In other words, it is the process of estimating the distance of objects in a scene from
a single camera viewpoint.

Monocular depth estimation has various applications, including 3D reconstruction, augmented reality, autonomous driving,
and robotics. It is a challenging task as it requires the model to understand the complex relationships between objects
in the scene and the corresponding depth information, which can be affected by factors such as lighting conditions,
occlusion, and texture.

<Tip>
The task illustrated in this tutorial is supported by the following model architectures:

<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->

[DPT](../model_doc/dpt), [GLPN](../model_doc/glpn)

<!--End of the generated tip-->

</Tip>

In this guide you'll learn how to:

* create a depth estimation pipeline
* run depth estimation inference by hand

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install -q transformers
```
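
The hands-on section later in this guide also uses PyTorch, Pillow, and NumPy for inference and visualization. If they aren't already part of your environment, you can install them as well (the standard PyPI package names are assumed here):

```bash
pip install -q torch pillow numpy
```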

## Depth estimation pipeline

The simplest way to try out inference with a model supporting depth estimation is to use the corresponding [`pipeline`].
Instantiate a pipeline from a [checkpoint on the Hugging Face Hub](https://huggingface.co/models?pipeline_tag=depth-estimation&sort=downloads):

```py
>>> from transformers import pipeline

>>> checkpoint = "vinvino02/glpn-nyu"
>>> depth_estimator = pipeline("depth-estimation", model=checkpoint)
```

Next, choose an image to analyze:

```py
>>> from PIL import Image
>>> import requests

>>> url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/depth-estimation-example.jpg" alt="Photo of a busy street"/>
</div>

Pass the image to the pipeline.

```py
>>> predictions = depth_estimator(image)
```

The pipeline returns a dictionary with two entries. The first one, called `predicted_depth`, is a tensor with the values
being the depth expressed in meters for each pixel.
The second one, `depth`, is a PIL image that visualizes the depth estimation result.
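
Before looking at the visualization, you can optionally inspect the raw tensor itself, for example its shape and value range (the exact numbers depend on the image and checkpoint, so none are shown here):

```py
>>> predicted_depth = predictions["predicted_depth"]
>>> predicted_depth.shape, predicted_depth.min(), predicted_depth.max()
```
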
Let's take a look at the visualized result:

```py
>>> predictions["depth"]
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/depth-visualization.png" alt="Depth estimation visualization"/>
</div>

## Depth estimation inference by hand

Now that you've seen how to use the depth estimation pipeline, let's see how we can replicate the same result by hand.

Start by loading the model and associated processor from a [checkpoint on the Hugging Face Hub](https://huggingface.co/models?pipeline_tag=depth-estimation&sort=downloads).
Here we'll use the same checkpoint as before:

```py
>>> from transformers import AutoImageProcessor, AutoModelForDepthEstimation

>>> checkpoint = "vinvino02/glpn-nyu"

>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)
>>> model = AutoModelForDepthEstimation.from_pretrained(checkpoint)
```

Prepare the image input for the model using the `image_processor` that will take care of the necessary image transformations
such as resizing and normalization:

```py
>>> pixel_values = image_processor(image, return_tensors="pt").pixel_values
```
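
If you're curious about what the processor produced, the result is a batched float tensor whose spatial size depends on the checkpoint's preprocessing configuration, so no specific dimensions are assumed here:

```py
>>> pixel_values.shape  # (batch_size, num_channels, height, width)
```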

Pass the prepared inputs through the model:

```py
>>> import torch

>>> with torch.no_grad():
...     outputs = model(pixel_values)
...     predicted_depth = outputs.predicted_depth
```

Visualize the results:

```py
>>> import numpy as np

>>> # interpolate to original size
>>> prediction = torch.nn.functional.interpolate(
...     predicted_depth.unsqueeze(1),
...     size=image.size[::-1],
...     mode="bicubic",
...     align_corners=False,
... ).squeeze()
>>> output = prediction.numpy()

>>> formatted = (output * 255 / np.max(output)).astype("uint8")
>>> depth = Image.fromarray(formatted)
>>> depth
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/depth-visualization.png" alt="Depth estimation visualization"/>
</div>
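
The scaling above turns relative depth into an 8-bit grayscale image. If you'd prefer a colored depth map, one option (not part of the original guide, and assuming `matplotlib` is installed) is to apply a matplotlib colormap to the normalized prediction:

```py
>>> import matplotlib.cm as cm

>>> # normalize to [0, 1], map through a colormap, and drop the alpha channel
>>> normalized = output / np.max(output)
>>> colored = (cm.plasma(normalized)[..., :3] * 255).astype("uint8")
>>> Image.fromarray(colored)
```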

utils/check_task_guides.py

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ def _find_text_in_file(filename, start_prompt, end_prompt):
     "translation.mdx": transformers_module.models.auto.modeling_auto.MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES,
     "video_classification.mdx": transformers_module.models.auto.modeling_auto.MODEL_FOR_VIDEO_CLASSIFICATION_MAPPING_NAMES,
     "document_question_answering.mdx": transformers_module.models.auto.modeling_auto.MODEL_FOR_DOCUMENT_QUESTION_ANSWERING_MAPPING_NAMES,
+    "monocular_depth_estimation.mdx": transformers_module.models.auto.modeling_auto.MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES,
 }
 
 # This list contains model types used in some task guides that are not in `CONFIG_MAPPING_NAMES` (therefore not in any
