Using the Vision API

Authentication and Configuration

For an overview of authentication in google-cloud-python, see :doc:`google-cloud-auth`.
In addition to configuring authentication, set the :envvar:`GOOGLE_CLOUD_PROJECT` environment variable to the project you want to interact with. If :envvar:`GOOGLE_CLOUD_PROJECT` is not set, the project ID from the JSON credentials file is used.
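
The lookup order described above can be sketched as follows. This is only an illustration, not the library's implementation: ``resolve_project`` is a hypothetical helper, and it assumes the JSON credentials file stores the project under a ``project_id`` key, as service account key files do.

```python
import json
import os


def resolve_project(credentials_path=None, environ=None):
    """Hypothetical helper mirroring the documented lookup order."""
    environ = os.environ if environ is None else environ
    # 1. An explicitly set environment variable wins.
    project = environ.get('GOOGLE_CLOUD_PROJECT')
    if project:
        return project
    # 2. Otherwise fall back to the project ID in the JSON credentials
    #    file (assumed to live under the "project_id" key).
    if credentials_path is not None:
        with open(credentials_path) as handle:
            return json.load(handle).get('project_id')
    # 3. Nothing configured.
    return None
```

Passing ``environ`` explicitly keeps the helper easy to test without touching the real process environment.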

On Google App Engine or Google Compute Engine, this configuration is detected automatically.
After configuring your environment, create a :class:`~google.cloud.vision.client.Client`.

>>> from google.cloud import vision
>>> client = vision.Client()

Alternatively, pass in the project and credentials explicitly (``creds`` below stands for a credentials object you have already obtained).

>>> from google.cloud import vision
>>> client = vision.Client(project='my-project', credentials=creds)

Creating an :class:`~google.cloud.vision.image.Image`

The :class:`~google.cloud.vision.image.Image` class is used to load image data from sources such as a Google Cloud Storage URI, raw bytes, or a file.

From a Google Cloud Storage URI

>>> from google.cloud import vision
>>> client = vision.Client()
>>> image = client.image(source_uri='gs://my-test-bucket/image.jpg')

From a filename

>>> image = client.image(filename='image.jpg')

From raw bytes

>>> with open('./image.jpg', 'rb') as image_file:
...     bytes_image = client.image(content=image_file.read())
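
The three sources above map to three different keyword arguments. As a small sketch, a hypothetical helper (``image_kwargs`` is not part of the library) could pick the keyword from the value it is given. It assumes Python 3, where ``bytes`` and ``str`` are distinct types.

```python
def image_kwargs(source):
    """Hypothetical helper: choose the ``client.image`` keyword for a source."""
    if isinstance(source, bytes):
        # Raw image bytes, e.g. the result of ``image_file.read()``.
        return {'content': source}
    if source.startswith('gs://'):
        # Google Cloud Storage URI.
        return {'source_uri': source}
    # Anything else is treated as a local file path.
    return {'filename': source}


print(image_kwargs('gs://my-test-bucket/image.jpg'))
print(image_kwargs('image.jpg'))
```

The result could then be unpacked into the call, as in ``client.image(**image_kwargs('image.jpg'))``.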

Manual Detection

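You can also call :meth:`~google.cloud.vision.image.Image.detect` directly with an explicit list of features, each pairing a feature type with a maximum number of results.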

>>> from google.cloud import vision
>>> from google.cloud.vision.feature import Feature
>>> from google.cloud.vision.feature import FeatureTypes
>>> client = vision.Client()
>>> image = client.image(source_uri='gs://my-test-bucket/image.jpg')
>>> features = [Feature(FeatureTypes.FACE_DETECTION, 5),
...             Feature(FeatureTypes.LOGO_DETECTION, 3)]
>>> annotations = image.detect(features)
>>> len(annotations.faces)
2
>>> for face in annotations.faces:
...     print(face.joy_likelihood)
0.94099093
0.54453093
>>> len(annotations.logos)
2
>>> for logo in annotations.logos:
...     print(logo.description)
google
github
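
Each feature pairs a detection type with a maximum number of results. To illustrate what such a request carries, the sketch below builds a body in the shape of the Vision REST API's ``images:annotate`` request. ``build_annotate_request`` is a hypothetical helper, and whether the client library assembles its request in exactly this way is an assumption.

```python
import json


def build_annotate_request(gcs_uri, features):
    """Hypothetical helper: ``features`` is a list of (type, max_results)."""
    return {
        'requests': [{
            'image': {'source': {'gcsImageUri': gcs_uri}},
            'features': [
                {'type': feature_type, 'maxResults': max_results}
                for feature_type, max_results in features
            ],
        }]
    }


body = build_annotate_request(
    'gs://my-test-bucket/image.jpg',
    [('FACE_DETECTION', 5), ('LOGO_DETECTION', 3)],
)
print(json.dumps(body, indent=2))
```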

Face Detection

:meth:`~google.cloud.vision.image.Image.detect_faces` searches for faces in an image and returns, for each face, the image coordinates of every facial landmark that was detected.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> image = client.image(source_uri='gs://my-test-bucket/image.jpg')
>>> faces = image.detect_faces(limit=10)
>>> first_face = faces[0]
>>> first_face.landmarks.left_eye.landmark_type
<LandmarkTypes.LEFT_EYE: 'LEFT_EYE'>
>>> first_face.landmarks.left_eye.position.x_coordinate
1301.2404
>>> first_face.detection_confidence
0.9863683
>>> first_face.joy
<Likelihood.VERY_LIKELY: 'VERY_LIKELY'>
>>> first_face.anger
<Likelihood.VERY_UNLIKELY: 'VERY_UNLIKELY'>

Label Detection

:meth:`~google.cloud.vision.image.Image.detect_labels` attempts to label objects in an image. If the image contains a car, a person, and a dog, label detection attempts to identify each of those objects and scores its certainty from 0.0 to 1.0.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> image = client.image(source_uri='gs://my-storage-bucket/image.jpg')
>>> labels = image.detect_labels(limit=3)
>>> labels[0].description
'automobile'
>>> labels[0].score
0.9863683
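
Because every label carries a score between 0.0 and 1.0, a common follow-up step is to keep only the confident ones. The sketch below uses a local ``Label`` stand-in rather than the library's own class, and the 0.8 threshold and the sample data (apart from the first label) are illustrative assumptions.

```python
from collections import namedtuple

# Stand-in for the label objects returned by ``detect_labels``; only the
# two attributes shown in the example above are modelled.
Label = namedtuple('Label', ['description', 'score'])


def confident_labels(labels, min_score=0.8):
    """Return descriptions of labels scoring at least ``min_score``."""
    return [label.description for label in labels if label.score >= min_score]


labels = [
    Label('automobile', 0.9863683),
    Label('vehicle', 0.85),   # made-up sample value
    Label('toy', 0.42),       # made-up sample value
]
print(confident_labels(labels))  # ['automobile', 'vehicle']
```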

Landmark Detection

:meth:`~google.cloud.vision.image.Image.detect_landmarks` attempts to detect landmarks such as Mount Rushmore and the Sydney Opera House. When available, the API also returns their known geographic locations.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> landmarks = image.detect_landmarks()
>>> landmarks[0].description
'Sydney Opera House'
>>> landmarks[0].locations[0].latitude
-33.857123
>>> landmarks[0].locations[0].longitude
151.213921
>>> landmarks[0].bounds.vertices[0].x_coordinate
78
>>> landmarks[0].bounds.vertices[0].y_coordinate
162

Logo Detection

With :meth:`~google.cloud.vision.image.Image.detect_logos`, you can identify brand logos in an image. The shape and location of each detected logo can be found by iterating through the vertices of its bounds.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> logos = image.detect_logos(limit=3)
>>> print(len(logos))
3
>>> first_logo = logos[0]
>>> first_logo.description
'Google'
>>> first_logo.score
0.9795432
>>> print(len(first_logo.bounds.vertices))
4
>>> first_logo.bounds.vertices[0].x_coordinate
78
>>> first_logo.bounds.vertices[0].y_coordinate
62
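
Iterating through the vertices is enough to derive a simple bounding box. The sketch below uses a local ``Vertex`` stand-in with the same ``x_coordinate`` and ``y_coordinate`` attributes as in the example above. ``bounding_box`` is a hypothetical helper, and all sample vertices except the first are made up.

```python
from collections import namedtuple

# Stand-in for the vertex objects found under ``bounds.vertices``.
Vertex = namedtuple('Vertex', ['x_coordinate', 'y_coordinate'])


def bounding_box(vertices):
    """Return (left, top, width, height) enclosing all vertices."""
    xs = [vertex.x_coordinate for vertex in vertices]
    ys = [vertex.y_coordinate for vertex in vertices]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


# Only the first vertex (78, 62) comes from the example above.
vertices = [Vertex(78, 62), Vertex(200, 62), Vertex(200, 150), Vertex(78, 150)]
print(bounding_box(vertices))  # (78, 62, 122, 88)
```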

Safe Search Detection

:meth:`~google.cloud.vision.image.Image.detect_safe_search` rates the entire contents of the image in four categories.

adult: Likelihood that the image contains adult content.
spoof: Likelihood that an obvious modification was made to the image's canonical version to make it appear funny or offensive.
medical: Likelihood that this is a medical image.
violence: Likelihood that the image contains violent content.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> safe_search_results = image.detect_safe_search()
>>> safe_search = safe_search_results[0]
>>> safe_search.adult
<Likelihood.VERY_UNLIKELY: 'VERY_UNLIKELY'>
>>> safe_search.spoof
<Likelihood.POSSIBLE: 'POSSIBLE'>
>>> safe_search.medical
<Likelihood.VERY_LIKELY: 'VERY_LIKELY'>
>>> safe_search.violence
<Likelihood.LIKELY: 'LIKELY'>
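
Likelihood values are ordered from very unlikely to very likely, so a result can be reduced to the categories that meet some threshold. The sketch below defines its own ``Likelihood`` and ``SafeSearch`` stand-ins. The library's enum shown above carries string values, so the integer ordering here is an assumption made for illustration, and ``flagged_categories`` is a hypothetical helper.

```python
from collections import namedtuple
from enum import IntEnum


class Likelihood(IntEnum):
    """Local stand-in ordered from least to most likely."""
    UNKNOWN = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


SafeSearch = namedtuple('SafeSearch', ['adult', 'spoof', 'medical', 'violence'])


def flagged_categories(safe_search, threshold=Likelihood.LIKELY):
    """Return the names of categories rated at or above ``threshold``."""
    return [
        name for name in safe_search._fields
        if getattr(safe_search, name) >= threshold
    ]


# Same ratings as in the example above.
result = SafeSearch(
    adult=Likelihood.VERY_UNLIKELY,
    spoof=Likelihood.POSSIBLE,
    medical=Likelihood.VERY_LIKELY,
    violence=Likelihood.LIKELY,
)
print(flagged_categories(result))  # ['medical', 'violence']
```

With the real result objects, each value would first have to be mapped onto an ordered scale like this one, for example by looking it up by name.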

Text Detection

:meth:`~google.cloud.vision.image.Image.detect_text` performs OCR to find text in an image.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> texts = image.detect_text()
>>> texts[0].locale
'en'
>>> texts[0].description
'some text in the image'
>>> texts[1].description
'some other text in the image'

Image Properties

:meth:`~google.cloud.vision.image.Image.detect_properties` processes the image and determines its dominant colors.

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> properties = image.detect_properties()
>>> colors = properties.colors
>>> first_color = colors[0]
>>> first_color.red
244.0
>>> first_color.blue
134.0
>>> first_color.score
0.65519291
>>> first_color.pixel_fraction
0.758658
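
Channel values come back as floats, so displaying a dominant color usually means rounding it into a hex string. The sketch below uses a local ``Color`` stand-in. It assumes a ``green`` attribute alongside the ``red`` and ``blue`` shown above and a 0-255 channel range, and the green sample value is made up.

```python
from collections import namedtuple

# Stand-in for a dominant color; ``green`` is assumed to exist.
Color = namedtuple('Color', ['red', 'green', 'blue'])


def to_hex(color):
    """Format a color with 0-255 float channels as ``#rrggbb``."""
    channels = (color.red, color.green, color.blue)
    return '#' + ''.join(
        # Round to the nearest integer and clamp into the 0-255 range.
        '{:02x}'.format(max(0, min(255, int(round(channel)))))
        for channel in channels
    )


print(to_hex(Color(red=244.0, green=180.0, blue=134.0)))  # #f4b486
```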

No results found

If a detection finds nothing in the image, an empty list is returned. This behavior is the same for all detection types.

Example with :meth:`~google.cloud.vision.image.Image.detect_logos`:

>>> from google.cloud import vision
>>> client = vision.Client()
>>> with open('./image.jpg', 'rb') as image_file:
...     image = client.image(content=image_file.read())
>>> logos = image.detect_logos(limit=3)
>>> logos
[]
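
Since an empty list is a normal outcome, indexing the result directly (as in ``logos[0]``) raises ``IndexError`` when nothing is found. A small guard avoids that; ``first_or_none`` is a hypothetical helper, not part of the library.

```python
def first_or_none(results):
    """Return the first detection result, or None when the list is empty."""
    return results[0] if results else None


logos = []  # what a detection call returns when nothing is found
first_logo = first_or_none(logos)
if first_logo is None:
    print('no logos detected')
```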
