New metric: CLIP IQA #1931 (merged)

Commits (51):
9197f0f  starting point (SkafteNicki)
0a70ff7  links (SkafteNicki)
dcd713f  small correction to clip score (SkafteNicki)
6256ba2  implementation update (SkafteNicki)
d94ab1e  some testing (SkafteNicki)
5294fba  fix implementation (SkafteNicki)
d8d09e2  update (SkafteNicki)
c39cc68  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
e53da97  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6a59e89  changelog (SkafteNicki)
86af666  merge master (SkafteNicki)
7e60e4c  fix ruff (SkafteNicki)
d566614  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
b15ca90  working module (SkafteNicki)
f39c554  improve testing (SkafteNicki)
baf2b0a  merge master (SkafteNicki)
d1880b3  change transformers check (SkafteNicki)
04ef55a  fix doc mistake (SkafteNicki)
23e3025  add plotting functionality (SkafteNicki)
659607f  fix conditional imports (SkafteNicki)
06ad2b9  add requirement (SkafteNicki)
aa4a2aa  fix typing issues (SkafteNicki)
5238d83  change requirement and checks on import (SkafteNicki)
5edafa6  fix (SkafteNicki)
9cc8604  Merge branch 'master' into newmetric/clip_iqa (Borda)
0b5e455  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
e93a694  add link ref (SkafteNicki)
3e1d746  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
ca4ecb4  skip on missing (SkafteNicki)
518c5bb  F401 (SkafteNicki)
0dbdd2e  skip on older versions (SkafteNicki)
67ad04e  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
4a48b8f  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
6e3957b  Merge branch 'master' into newmetric/clip_iqa (Borda)
f0c85a3  Merge branch 'master' into newmetric/clip_iqa (Borda)
8255256  Merge branch 'master' into newmetric/clip_iqa (Borda)
65665d8  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
9fea319  fix mistake (SkafteNicki)
39654ce  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
c0dfc97  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
3b2d10e  another skip (SkafteNicki)
d68944f  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
3ebdc23  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
5e0797c  Merge branch 'master' into newmetric/clip_iqa (mergify[bot])
9dc4327  move files (SkafteNicki)
df6fbd2  small corrections after moving files (SkafteNicki)
68d6fd3  Uodate cmd (Borda)
6396b67  Apply suggestions from code review (Borda)
04fe382  Merge branch 'master' into newmetric/clip_iqa (SkafteNicki)
5ebe5e9  improve based on suggestions (SkafteNicki)
71c5dac  change import in doctests (SkafteNicki)
@@ -0,0 +1,24 @@
.. customcarditem::
    :header: CLIP IQA
    :image: https://pl-flash-data.s3.amazonaws.com/assets/thumbnails/image_classification.svg
    :tags: Image

.. include:: ../links.rst

########################################
CLIP Image Quality Assessment (CLIP-IQA)
########################################

Module Interface
________________

.. autoclass:: torchmetrics.multimodal.CLIPImageQualityAssessment
    :noindex:
    :exclude-members: update, compute

Functional Interface
____________________

.. autofunction:: torchmetrics.functional.multimodal.clip_image_quality_assessment
    :noindex:
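A minimal usage sketch of the module interface documented above (a hedged example only: it assumes the class accepts the same data_range and prompts arguments as the functional version shown further down):

import torch
from torchmetrics.multimodal import CLIPImageQualityAssessment

# two random images in [0, 255]; data_range tells the metric how to rescale them to [0, 1]
imgs = torch.randint(255, (2, 3, 224, 224)).float()
metric = CLIPImageQualityAssessment(data_range=255, prompts=("quality",))
metric.update(imgs)
print(metric.compute())  # tensor of shape (2,) with per-image quality scores in [0, 1]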
@@ -0,0 +1,326 @@
# Copyright The Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, List, Literal, Tuple, Union

import torch
from torch import Tensor

from torchmetrics.functional.multimodal.clip_score import _get_clip_model_and_processor
from torchmetrics.utilities.checks import _SKIP_SLOW_DOCTEST, _try_proceed_with_timeout
from torchmetrics.utilities.imports import _PIQ_GREATER_EQUAL_0_8, _TRANSFORMERS_GREATER_EQUAL_4_10

if _TRANSFORMERS_GREATER_EQUAL_4_10:
    from transformers import CLIPModel as _CLIPModel
    from transformers import CLIPProcessor as _CLIPProcessor

    def _download_clip() -> None:
        _CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
        _CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

    if _SKIP_SLOW_DOCTEST and not _try_proceed_with_timeout(_download_clip):
        __doctest_skip__ = ["clip_image_quality_assessment"]
else:
    __doctest_skip__ = ["clip_image_quality_assessment"]
    _CLIPModel = None
    _CLIPProcessor = None

if not _PIQ_GREATER_EQUAL_0_8:
    __doctest_skip__ = ["clip_image_quality_assessment"]
_PROMPTS: Dict[str, Tuple[str, str]] = {
    "quality": ("Good photo.", "Bad photo."),
    "brightness": ("Bright photo.", "Dark photo."),
    "noisiness": ("Clean photo.", "Noisy photo."),
    "colorfullness": ("Colorful photo.", "Dull photo."),
    "sharpness": ("Sharp photo.", "Blurry photo."),
    "contrast": ("High contrast photo.", "Low contrast photo."),
    "complexity": ("Complex photo.", "Simple photo."),
    "natural": ("Natural photo.", "Synthetic photo."),
    "happy": ("Happy photo.", "Sad photo."),
    "scary": ("Scary photo.", "Peaceful photo."),
    "new": ("New photo.", "Old photo."),
    "warm": ("Warm photo.", "Cold photo."),
    "real": ("Real photo.", "Abstract photo."),
    "beutiful": ("Beautiful photo.", "Ugly photo."),
    "lonely": ("Lonely photo.", "Sociable photo."),
    "relaxing": ("Relaxing photo.", "Stressful photo."),
}


def _get_clip_iqa_model_and_processor(
    model_name_or_path: Literal[
        "clip_iqa",
        "openai/clip-vit-base-patch16",
        "openai/clip-vit-base-patch32",
        "openai/clip-vit-large-patch14-336",
        "openai/clip-vit-large-patch14",
    ]
) -> Tuple[_CLIPModel, _CLIPProcessor]:
    """Extract the CLIP model and processor from the model name or path."""
    if model_name_or_path == "clip_iqa":
        if not _PIQ_GREATER_EQUAL_0_8:
            raise ValueError(
                "For metric `clip_iqa` to work with argument `model_name_or_path` set to default value `'clip_iqa'`"
                ", package `piq` version v0.8.0 or later must be installed. Either install with `pip install piq` or"
                " `pip install torchmetrics[image]`"
            )

        import piq

        model = piq.clip_iqa.clip.load().eval()
        # any model checkpoint can be used here because the tokenizer is the same for all
        processor = _CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
        return model, processor
    return _get_clip_model_and_processor(model_name_or_path)
def _clip_iqa_format_prompts(prompts: Tuple[Union[str, Tuple[str, str]]] = ("quality",)) -> Tuple[List[str], List[str]]:
    """Converts the provided keywords into a list of prompts for the model to calculate the anchor vectors.

    Args:
        prompts: A string, list of strings or tuple of strings. If a string is provided, it must be one of the
            available prompts. If a list of strings is provided, all strings must be one of the available prompts.
            If a tuple of strings is provided, it must be of length 2 and the first string must be a positive prompt
            and the second string must be a negative prompt.

    Returns:
        Tuple containing a list of prompts and a list of the names of the prompts. The first list is double the
        length of the second list.

    Examples::

        >>> # single prompt
        >>> _clip_iqa_format_prompts(("quality",))
        (['Good photo.', 'Bad photo.'], ['quality'])
        >>> # multiple prompts
        >>> _clip_iqa_format_prompts(("quality", "brightness"))
        (['Good photo.', 'Bad photo.', 'Bright photo.', 'Dark photo.'], ['quality', 'brightness'])
        >>> # Custom prompts
        >>> _clip_iqa_format_prompts(("quality", ("Super good photo.", "Super bad photo.")))
        (['Good photo.', 'Bad photo.', 'Super good photo.', 'Super bad photo.'], ['quality', 'user_defined_0'])

    """
    if not isinstance(prompts, tuple):
        raise ValueError("Argument `prompts` must be a tuple containing strings or tuples of strings")

    prompts_names: List[str] = []
    prompts_list: List[str] = []
    count = 0
    for p in prompts:
        if not isinstance(p, (str, tuple)):
            raise ValueError("Argument `prompts` must be a tuple containing strings or tuples of strings")
        if isinstance(p, str):
            if p not in _PROMPTS:
                raise ValueError(
                    f"All elements of `prompts` must be one of {_PROMPTS.keys()} if not custom tuple prompts, got {p}."
                )
            prompts_names.append(p)
            prompts_list.extend(_PROMPTS[p])
        if isinstance(p, tuple) and len(p) != 2:
            raise ValueError("If a tuple is provided in argument `prompts`, it must be of length 2")
        if isinstance(p, tuple):
            prompts_names.append(f"user_defined_{count}")
            prompts_list.extend(p)
            count += 1

    return prompts_list, prompts_names
def _clip_iqa_get_anchor_vectors(
    model_name_or_path: str,
    model: _CLIPModel,
    processor: _CLIPProcessor,
    prompts_list: List[str],
    device: Union[str, torch.device],
) -> Tensor:
    """Calculates the anchor vectors for the CLIP IQA metric.

    Args:
        model_name_or_path: string indicating the version of the CLIP model to use.
        model: The CLIP model
        processor: The CLIP processor
        prompts_list: A list of prompts
        device: The device to use for the calculation

    """
    if model_name_or_path == "clip_iqa":
        text_processed = processor(text=prompts_list)
        anchors_text = torch.zeros(
            len(prompts_list), processor.tokenizer.model_max_length, dtype=torch.long, device=device
        )
        for i, tp in enumerate(text_processed["input_ids"]):
            anchors_text[i, : len(tp)] = torch.tensor(tp, dtype=torch.long, device=device)

        anchors = model.encode_text(anchors_text).float()
    else:
        text_processed = processor(text=prompts_list, return_tensors="pt", padding=True)
        anchors = model.get_text_features(
            text_processed["input_ids"].to(device), text_processed["attention_mask"].to(device)
        )
    return anchors / anchors.norm(p=2, dim=-1, keepdim=True)
def _clip_iqa_update(
    model_name_or_path: str,
    images: Tensor,
    model: _CLIPModel,
    processor: _CLIPProcessor,
    data_range: Union[int, float],
    device: Union[str, torch.device],
) -> Tensor:
    """Update function for CLIP IQA."""
    images = images / float(data_range)
    if model_name_or_path == "clip_iqa":
        # standard CLIP per-channel normalization constants
        default_mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
        default_std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)
        images = (images - default_mean) / default_std
        img_features = model.encode_image(images.float(), pos_embedding=False).float()
    else:
        processed_input = processor(images=[i.cpu() for i in images], return_tensors="pt", padding=True)
        img_features = model.get_image_features(processed_input["pixel_values"].to(device))
    return img_features / img_features.norm(p=2, dim=-1, keepdim=True)
def _clip_iqa_compute(
    img_features: Tensor,
    anchors: Tensor,
    prompts_names: List[str],
    format_as_dict: bool = True,
) -> Union[Tensor, Dict[str, Tensor]]:
    """Final computation of CLIP IQA."""
    logits_per_image = 100 * img_features @ anchors.t()
    # anchors are stored as (positive, negative) pairs, so reshape to (N, num_prompts, 2) and take the
    # softmax probability of the positive prompt in each pair
    probs = logits_per_image.reshape(logits_per_image.shape[0], -1, 2).softmax(-1)[:, :, 0]
    if len(prompts_names) == 1:
        return probs.squeeze()
    if format_as_dict:
        return {p: probs[:, i] for i, p in enumerate(prompts_names)}
    return probs
def clip_image_quality_assessment(
    images: Tensor,
    model_name_or_path: Literal[
        "clip_iqa",
        "openai/clip-vit-base-patch16",
        "openai/clip-vit-base-patch32",
        "openai/clip-vit-large-patch14-336",
        "openai/clip-vit-large-patch14",
    ] = "clip_iqa",
    data_range: Union[int, float] = 1.0,
    prompts: Tuple[Union[str, Tuple[str, str]]] = ("quality",),
) -> Union[Tensor, Dict[str, Tensor]]:
    """Calculates `CLIP-IQA`_, which can be used to measure the visual content of images.

    The metric is based on the `CLIP`_ model, which is a neural network trained on a variety of (image, text) pairs
    to be able to generate a vector representation of the image and the text that is similar if the image and text
    are semantically similar.

    The metric works by calculating the cosine similarity between user provided images and pre-defined prompts. The
    prompts always come in pairs of "positive" and "negative" such as "Good photo." and "Bad photo.". By calculating
    the similarity between image embeddings and both the "positive" and "negative" prompt, the metric can determine
    which prompt the image is more similar to. The metric then returns the probability that the image is more
    similar to the first prompt than the second prompt.

    Built-in prompts are:
        * quality: "Good photo." vs "Bad photo."
        * brightness: "Bright photo." vs "Dark photo."
        * noisiness: "Clean photo." vs "Noisy photo."
        * colorfullness: "Colorful photo." vs "Dull photo."
        * sharpness: "Sharp photo." vs "Blurry photo."
        * contrast: "High contrast photo." vs "Low contrast photo."
        * complexity: "Complex photo." vs "Simple photo."
        * natural: "Natural photo." vs "Synthetic photo."
        * happy: "Happy photo." vs "Sad photo."
        * scary: "Scary photo." vs "Peaceful photo."
        * new: "New photo." vs "Old photo."
        * warm: "Warm photo." vs "Cold photo."
        * real: "Real photo." vs "Abstract photo."
        * beutiful: "Beautiful photo." vs "Ugly photo."
        * lonely: "Lonely photo." vs "Sociable photo."
        * relaxing: "Relaxing photo." vs "Stressful photo."

    Args:
        images: Either a single [N, C, H, W] tensor or a list of [C, H, W] tensors
        model_name_or_path: string indicating the version of the CLIP model to use. By default this argument is set
            to ``clip_iqa`` which corresponds to the model used in the original paper. Other available models are
            `"openai/clip-vit-base-patch16"`, `"openai/clip-vit-base-patch32"`, `"openai/clip-vit-large-patch14-336"`
            and `"openai/clip-vit-large-patch14"`
        data_range: The maximum value of the input tensor. For example, if the input images are in range [0, 255],
            data_range should be 255. The images are normalized by this value.
        prompts: A string, tuple of strings or nested tuple of strings. If a string is provided, it must be one of
            the available prompts. If a tuple of strings is provided, all strings must be one of the available
            prompts. If a nested tuple of strings is provided, it must be of length 2 and the first string must be a
            positive prompt and the second string must be a negative prompt.

    .. note:: If using the default `clip_iqa` model, the package `piq` must be installed. Either install with
        `pip install piq` or `pip install torchmetrics[image]`.

    Returns:
        A tensor of shape ``(N,)`` if a single prompt is provided. If a list of prompts is provided, a dictionary
        with the prompts as keys and tensors of shape ``(N,)`` as values.

    Raises:
        ModuleNotFoundError:
            If transformers package is not installed or version is lower than 4.10.0
        ValueError:
            If not all images have format [C, H, W]
        ValueError:
            If prompts is a tuple and it is not of length 2
        ValueError:
            If prompts is a string and it is not one of the available prompts
        ValueError:
            If prompts is a list of strings and not all strings are one of the available prompts

    Example::
        Single prompt:

        >>> from torchmetrics.functional.image import clip_image_quality_assessment
        >>> import torch
        >>> _ = torch.manual_seed(42)
        >>> imgs = torch.randint(255, (2, 3, 224, 224)).float()
        >>> clip_image_quality_assessment(imgs, prompts=("quality",))
        tensor([0.8894, 0.8902])

    Example::
        Multiple prompts:

        >>> from torchmetrics.functional.image import clip_image_quality_assessment
        >>> import torch
        >>> _ = torch.manual_seed(42)
        >>> imgs = torch.randint(255, (2, 3, 224, 224)).float()
        >>> clip_image_quality_assessment(imgs, prompts=("quality", "brightness"))
        {'quality': tensor([0.8894, 0.8902]), 'brightness': tensor([0.5507, 0.5208])}

    Example::
        Custom prompts. Must always be a tuple of length 2, with a positive and negative prompt.

        >>> from torchmetrics.functional.image import clip_image_quality_assessment
        >>> import torch
        >>> _ = torch.manual_seed(42)
        >>> imgs = torch.randint(255, (2, 3, 224, 224)).float()
        >>> clip_image_quality_assessment(imgs, prompts=(("Super good photo.", "Super bad photo."), "brightness"))
        {'user_defined_0': tensor([0.9652, 0.9629]), 'brightness': tensor([0.5507, 0.5208])}

    """
    prompts_list, prompts_names = _clip_iqa_format_prompts(prompts)

    model, processor = _get_clip_iqa_model_and_processor(model_name_or_path)
    device = images.device
    model = model.to(device)

    with torch.inference_mode():
        anchors = _clip_iqa_get_anchor_vectors(model_name_or_path, model, processor, prompts_list, device)
        img_features = _clip_iqa_update(model_name_or_path, images, model, processor, data_range, device)
        return _clip_iqa_compute(img_features, anchors, prompts_names)
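For intuition, here is a small self-contained sketch of the scoring step implemented in _clip_iqa_compute, using random unit vectors in place of real CLIP embeddings (illustrative only, not part of the diff above):

import torch

# 2 images and 2 prompt pairs -> 4 anchor vectors stored as (positive, negative, positive, negative)
img_features = torch.nn.functional.normalize(torch.randn(2, 512), dim=-1)
anchors = torch.nn.functional.normalize(torch.randn(4, 512), dim=-1)
logits = 100 * img_features @ anchors.t()               # (2, 4) scaled cosine similarities
probs = logits.reshape(2, -1, 2).softmax(-1)[:, :, 0]   # (2, 2) probability assigned to each positive prompt
print(probs)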