A tool that analyzes TikTok videos and identifies the frames with the clearest images of outfits, using GPT-4 Vision. It clusters frames that show the same outfit, so each unique outfit in a video is identified.
- Download TikTok videos from URLs
- Extract frames at customizable intervals
- Analyze frames using GPT-4 Vision to identify outfits
- Score frames based on outfit clarity and completeness
- Intelligently cluster frames by unique outfits
- Display the best representative frame for each distinct outfit
- Export detailed analysis as JSON with outfit grouping
- Clone this repository
- Install dependencies:
pip install -e .

Run the tool:

python main.py VIDEO_URL_OR_PATH --output-dir ./results

Arguments:

- video_source: URL or local path to the TikTok video (required)
- --output-dir: Directory to save extracted frames and results
- --api-key: OpenAI API key (alternatively, set the OPENAI_API_KEY environment variable)
- --sample-rate: Extract every Nth frame (default: 15)
- --top-n: Number of top frames to display (default: 20, only affects fallback behavior)
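
For readers who want to see how the documented arguments fit together, here is a minimal sketch of a parser that mirrors them. It is not taken from the project's `main.py`: the function names `build_parser` and `resolve_api_key` are hypothetical, and the `./results` default for `--output-dir` is an assumption borrowed from the usage example, since no default is documented.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Find the clearest outfit frames in a TikTok video."
    )
    parser.add_argument(
        "video_source", help="URL or local path to the TikTok video"
    )
    # Assumption: the README does not state a default output directory.
    parser.add_argument(
        "--output-dir",
        default="./results",
        help="Directory to save extracted frames and results",
    )
    parser.add_argument(
        "--api-key",
        default=None,
        help="OpenAI API key (falls back to OPENAI_API_KEY)",
    )
    parser.add_argument(
        "--sample-rate", type=int, default=15, help="Extract every Nth frame"
    )
    parser.add_argument(
        "--top-n", type=int, default=20, help="Number of top frames to display"
    )
    return parser


def resolve_api_key(cli_value, env=None):
    """Prefer the --api-key value, then the OPENAI_API_KEY environment variable."""
    env = os.environ if env is None else env
    return cli_value or env.get("OPENAI_API_KEY")
```

With this sketch, `build_parser().parse_args(["video.mp4", "--sample-rate", "10"])` yields `sample_rate == 10` and leaves `top_n` at its documented default of 20.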
# Using a TikTok URL
python main.py https://www.tiktok.com/@username/video/1234567890 --output-dir ./outfit_results
# Using a local video file
python main.py /path/to/video.mp4 --sample-rate 10 --top-n 3

- The script downloads the TikTok video (if a URL is provided)
- Frames are extracted at regular intervals (specified by sample rate)
- GPT-4 Vision analyzes each frame for outfits, providing detailed descriptions and clarity scores
- The frames are ranked by their outfit clarity scores
- GPT-4 Vision analyzes the top-scoring frames a second time to cluster them into unique outfits
- The best frame from each unique outfit cluster is selected and displayed
- All results, including outfit clusters, are saved to the output directory
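
The sampling and ranking steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: `FrameResult`, `sample_frame_indices`, and `rank_frames` are hypothetical names, sampling is assumed to start at frame 0, and the clarity score is treated as a plain number where higher is better (the real scale is not documented here). The GPT-4 Vision calls are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class FrameResult:
    index: int        # frame number within the video
    clarity: float    # hypothetical clarity score, higher is clearer
    description: str  # outfit description returned by the vision model


def sample_frame_indices(total_frames: int, sample_rate: int) -> list[int]:
    """Return every Nth frame index, assuming sampling starts at frame 0."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    return list(range(0, total_frames, sample_rate))


def rank_frames(results: list[FrameResult], top_n: int) -> list[FrameResult]:
    """Sort frames by clarity, best first, and keep at most top_n of them."""
    ranked = sorted(results, key=lambda r: r.clarity, reverse=True)
    return ranked[:top_n]
```

For a 50-frame clip at the default sample rate of 15, `sample_frame_indices(50, 15)` selects frames 0, 15, 30, and 45. A lower sample rate means more frames and therefore more vision requests per video.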
The tool uses GPT-4 Vision to identify when multiple frames show the same outfit from different angles or timestamps. This ensures that:
- If a video contains 4 different outfits, all 4 will be represented in the results
- The highest quality frame for each unique outfit is prioritized
- Related frames (same outfit, different angles) are grouped together
- The output focuses on outfit diversity rather than just frame quality
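
One way to picture the grouping described above is a function that takes frames already labeled with an outfit cluster and keeps the clearest frame per cluster. This is a sketch only: the dictionary keys `frame`, `outfit_id`, and `clarity` and the shape of the summary are hypothetical, and the project's actual JSON layout may differ.

```python
import json
from collections import defaultdict


def best_frame_per_outfit(frames: list[dict]) -> dict:
    """Group frames by outfit and pick the highest-clarity frame in each group.

    Each input dict is assumed to carry 'frame', 'outfit_id', and 'clarity'.
    """
    clusters = defaultdict(list)
    for frame in frames:
        clusters[frame["outfit_id"]].append(frame)

    summary = {}
    for outfit_id, members in clusters.items():
        # Sort a copy so the caller's list order is left untouched.
        ordered = sorted(members, key=lambda f: f["clarity"], reverse=True)
        summary[outfit_id] = {
            "best_frame": ordered[0]["frame"],
            "related_frames": [f["frame"] for f in ordered[1:]],
        }
    return summary


def to_json(summary: dict) -> str:
    """Serialize the grouping so it can be written to the output directory."""
    return json.dumps(summary, indent=2)
```

Because selection happens per cluster, an outfit that only appears in mediocre frames still gets one representative, which is what keeps the output focused on outfit diversity rather than raw frame quality.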
- Python 3.10+
- OpenAI API key with access to GPT-4 Vision
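
A quick pre-flight check for these two requirements might look like the sketch below. The function name `check_requirements` is hypothetical, and the check only confirms that a key is present in the environment; it cannot tell whether that key actually has GPT-4 Vision access.

```python
import os
import sys


def check_requirements(version_info=sys.version_info, env=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (or pass --api-key)")
    return problems
```

Calling `check_requirements()` with no arguments inspects the running interpreter and the current environment.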
MIT