The current kalliste pipeline uses BLIP2 for image understanding, but newer Vision-Language Models (VLMs) offer significantly better performance and capabilities.
## Current State
- Using BLIP2 for image captioning and understanding
- BLIP2 is from 2023 and has been superseded by more capable models
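Decoupling the pipeline from BLIP2 first would make any later swap cheaper. A minimal sketch of a provider-agnostic captioning interface (all names here are hypothetical, not part of the kalliste codebase; the BLIP2 subclass is a stub rather than a real model wrapper):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CaptionResult:
    """Caption plus provenance, so outputs from different VLMs stay comparable."""
    caption: str
    model_name: str


class ImageCaptioner(ABC):
    """Interface the pipeline codes against; one subclass per VLM backend."""

    @abstractmethod
    def caption(self, image_path: str) -> CaptionResult:
        ...


class Blip2Captioner(ImageCaptioner):
    """Stub standing in for the current BLIP2 backend (a real implementation
    would load the model and run inference here)."""

    def caption(self, image_path: str) -> CaptionResult:
        return CaptionResult(caption=f"stub caption for {image_path}",
                             model_name="blip2")
```

With this seam in place, a GPT-4o or LLaVA backend becomes another `ImageCaptioner` subclass rather than a pipeline rewrite.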
## Proposed Alternatives
Consider evaluating and potentially migrating to:
- GPT-4V / GPT-4o - Excellent multimodal capabilities, API-based
- Claude 3.5 Sonnet - Strong vision capabilities, good for analysis
- LLaVA-1.5/1.6 - Open source, good performance, self-hostable
- InternVL - Strong open source option
- Qwen-VL - Alibaba's VLM, good multilingual support
- CogVLM - Good open source alternative
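The candidate list above splits naturally along one axis that drives the cost and licensing criteria below: API-based versus self-hostable. A small registry capturing that (a sketch; the attributes shown are the ones from this issue, nothing more):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMCandidate:
    """One row per candidate from the list above."""
    name: str
    open_source: bool
    api_based: bool


CANDIDATES = [
    VLMCandidate("GPT-4o", open_source=False, api_based=True),
    VLMCandidate("Claude 3.5 Sonnet", open_source=False, api_based=True),
    VLMCandidate("LLaVA-1.6", open_source=True, api_based=False),
    VLMCandidate("InternVL", open_source=True, api_based=False),
    VLMCandidate("Qwen-VL", open_source=True, api_based=False),
    VLMCandidate("CogVLM", open_source=True, api_based=False),
]

# Candidates that can run on our own hardware (relevant to the cost criterion).
self_hostable = [c.name for c in CANDIDATES if c.open_source]
```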
## Evaluation Criteria
- Performance: Accuracy on image understanding tasks
- Speed: Inference time and throughput
- Cost: API costs vs self-hosting requirements
- Integration: Ease of integration with existing pipeline
- Capabilities: Support for complex reasoning, multiple images, etc.
- Licensing: Commercial usage requirements
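To turn these criteria into a ranking, a simple weighted score works well enough for a first pass. A sketch, assuming per-criterion scores normalized to 0-1; the weights shown are placeholders to be agreed on, not a recommendation:

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores (each in 0..1) into one number in 0..1."""
    total = sum(weights.values())
    return sum(scores[k] * w for k, w in weights.items()) / total


# Placeholder weights mirroring the criteria above; tune before use.
WEIGHTS = {
    "performance": 0.3,
    "speed": 0.2,
    "cost": 0.2,
    "integration": 0.1,
    "capabilities": 0.1,
    "licensing": 0.1,
}
```

Scoring every candidate with the same rubric keeps the comparison honest even when some are API-based and some are self-hosted.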
## Implementation Plan
- Research and benchmark candidate VLMs
- Build an evaluation dataset from existing kalliste use cases
- Implement a proof of concept with the top 2-3 candidates
- Analyze performance and cost
- Define a migration plan with fallback options
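The benchmarking step above can be sketched as a small harness that runs any captioner over the evaluation set and records latency (a sketch with hypothetical names; in practice `captioner_fn` would wrap a real model call):

```python
import time


def benchmark(captioner_fn, image_paths):
    """Run a captioner over an evaluation set, recording per-image latency.

    captioner_fn: callable taking an image path and returning a caption string.
    Returns one record per image: path, caption, and wall-clock latency.
    """
    results = []
    for path in image_paths:
        start = time.perf_counter()
        caption = captioner_fn(path)
        results.append({
            "image": path,
            "caption": caption,
            "latency_s": time.perf_counter() - start,
        })
    return results
```

Running the same harness against each proof-of-concept backend gives directly comparable speed numbers, and the captured captions can be scored for quality separately.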
## Benefits
- Better image understanding and description quality
- More sophisticated reasoning about image content
- Potential for advanced features (object counting, spatial reasoning, etc.)
- Future-proofing the pipeline with more capable models
This upgrade could significantly improve the quality of kalliste's image processing and analysis capabilities.