GitHub - muktac5/Visual-Goal-Guidance

Visual Goal-Guidance: Step Inference for Instruction Retrieval

Description: Given a text-based goal and a set of images, the idea is to retrieve all the images that correspond to steps leading up to the goal, and then to order them by next best action.
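The two stages described above (retrieve the relevant images, then order them) can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: `retrieve_steps`, `order_steps`, the 0.5 threshold, and both toy scoring functions are assumptions made for this example. In practice the scores would come from a trained model.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signatures; neither comes from the repository.
RelevanceFn = Callable[[str, str], float]
NextStepFn = Callable[[str, Sequence[str], str], float]


def retrieve_steps(goal: str, images: Sequence[str],
                   relevance: RelevanceFn, threshold: float = 0.5) -> List[str]:
    """Keep every image whose relevance to the goal reaches the threshold."""
    return [img for img in images if relevance(goal, img) >= threshold]


def order_steps(goal: str, steps: Sequence[str],
                next_step_score: NextStepFn) -> List[str]:
    """Greedily pick the best next action until no steps remain."""
    ordered: List[str] = []
    remaining = list(steps)
    while remaining:
        best = max(remaining, key=lambda s: next_step_score(goal, ordered, s))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Toy stand-ins for model outputs (made-up numbers, for illustration only).
TOY_RELEVANCE = {"pour_water.jpg": 0.8, "boil_water.jpg": 0.9,
                 "ride_bike.jpg": 0.1, "add_tea_bag.jpg": 0.7}
TOY_POSITION = {"boil_water.jpg": 0, "pour_water.jpg": 1, "add_tea_bag.jpg": 2}


def toy_relevance(goal: str, image: str) -> float:
    return TOY_RELEVANCE.get(image, 0.0)


def toy_next_step_score(goal: str, chosen: Sequence[str], candidate: str) -> float:
    # Higher is better: prefer the step whose position matches the next open slot.
    return -abs(TOY_POSITION[candidate] - len(chosen))


goal = "make tea"
images = ["pour_water.jpg", "boil_water.jpg", "ride_bike.jpg", "add_tea_bag.jpg"]
steps = retrieve_steps(goal, images, toy_relevance)
ordered = order_steps(goal, steps, toy_next_step_score)
print(ordered)
```

The greedy loop is only one way to read "next best action" ordering. A real system might instead score whole sequences or compare steps pairwise.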

Existing Implementation: Given a textual goal and 4 images, identify the 1 image that corresponds to a step leading up to the goal. Link: https://aclanthology.org/2021.emnlp-main.165.pdf
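As a rough illustration of this 4-way selection setup, the sketch below picks the highest-scoring of four candidate images and reports accuracy over a few examples. The names `choose_step_image`, `accuracy`, and `toy_score` are hypothetical, and the token-overlap scorer is only a stand-in for a learned goal-image matching model.

```python
from typing import Callable, List, Sequence, Tuple

# (goal, four candidate images, index of the correct image)
Example = Tuple[str, Sequence[str], int]


def choose_step_image(goal: str, candidates: Sequence[str],
                      score: Callable[[str, str], float]) -> int:
    """Return the index of the candidate that scores highest for the goal."""
    if len(candidates) != 4:
        raise ValueError("this setup expects exactly 4 candidate images")
    return max(range(4), key=lambda i: score(goal, candidates[i]))


def accuracy(examples: List[Example],
             score: Callable[[str, str], float]) -> float:
    """Fraction of examples where the chosen image is the labeled one."""
    correct = sum(choose_step_image(goal, cands, score) == label
                  for goal, cands, label in examples)
    return correct / len(examples)


def toy_score(goal: str, image: str) -> float:
    # Stand-in scorer: count tokens shared by the goal and the file name.
    goal_tokens = set(goal.lower().split())
    image_tokens = set(image.rsplit(".", 1)[0].lower().split("_"))
    return float(len(goal_tokens & image_tokens))


examples: List[Example] = [
    ("bake bread",
     ["knead_bread_dough.jpg", "wash_car.jpg", "paint_wall.jpg", "plant_tree.jpg"], 0),
    ("wash car",
     ["bake_cake.jpg", "fold_shirt.jpg", "rinse_car.jpg", "read_book.jpg"], 2),
    # The shallow scorer is fooled here: "tree_house" shares a token with the goal.
    ("plant tree",
     ["dig_hole.jpg", "tree_house.jpg", "cook_rice.jpg", "mow_lawn.jpg"], 0),
]

acc = accuracy(examples, toy_score)
print(f"accuracy: {acc:.2f}")
```

The third example is deliberately one the toy scorer gets wrong, which shows why surface similarity alone is a weak signal for step inference.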

Our primary implementation is in the following notebooks: "Goal_step_relevance_LLava.ipynb", "Intent_data_prep_for_step_ordering.ipynb", and "Step_ordering.ipynb".

We explored several models, including ViT and CLIP. We then moved to an end-to-end trained large multimodal model that integrates a vision encoder with Vicuna for general visual and language understanding.
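CLIP-style models place text and images in a shared embedding space, so a goal and an image can be compared by cosine similarity. The sketch below shows only that ranking step, under the assumption that embeddings have already been computed. The three-dimensional vectors are made up for illustration, and `cosine` and `rank_images` are hypothetical helper names, not functions from the notebooks.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(goal_vec: Sequence[float],
                image_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort images by similarity to the goal embedding, best match first."""
    scored = [(name, cosine(goal_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up embeddings; real ones would come from a text encoder and an image encoder.
goal_vec = [1.0, 0.0, 0.0]
image_vecs = {
    "park_car.jpg": [0.0, 0.0, 1.0],
    "chop_scallion.jpg": [0.8, 0.3, 0.0],
    "chop_celery.jpg": [0.9, 0.1, 0.0],
}

ranked = rank_images(goal_vec, image_vecs)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Visually similar items tend to have nearby embeddings and therefore close scores, so a ranking like this can separate unrelated images easily while leaving near-duplicates hard to tell apart.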

Notably, this approach performed well without prior training, and with a smaller model than the custom integration of an LLM and CLIP built with Microsoft GIT. Intent detection from images remains challenging, for example when distinguishing visually similar objects such as celery, scallion, and asparagus.

Repository files (23 commits):
COCO_LLM_VIT.ipynb
Goal_step_relevance_LLava.ipynb
Intent_Detection_using_CLIP.ipynb
Intent_Detection_using_Vision_Transformer.ipynb
Intent_data_prep_for_step_ordering.ipynb
README.md
Step_ordering.ipynb
VGSI_Existing_triplet_network_implementation.ipynb
Wikihow_Data_Analysis.ipynb
sample_test_data_on_intent_generated_from_dress_images - val.csv
