[Feature] Integrate LazyLLM framework to support more text/image models #158
Conversation
…ext model and image model
Summary of Changes

Hello @uestcsh917-art, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the system's support for various AI model providers by integrating the LazyLLM framework. The change lets users flexibly select and switch between different text generation, image generation, and image captioning models, improving the system's versatility and extensibility. Through a unified interface and simplified configuration, developers can more easily plug in advanced AI capabilities from different vendors, laying the groundwork for future feature iterations.

Highlights
self._lazyllm_api_keys = lazyllm_api_keys
self._lazyllm_image_caption_source = lazyllm_image_caption_source
In the __init__ method, _lazyllm_api_keys is initialized to None. However, in the _get_lazyllm_client method, the line api_key = self._lazyllm_api_keys.get(source, "") calls .get() on _lazyllm_api_keys directly; if _lazyllm_api_keys is None, this raises an AttributeError. It is recommended to add a None check before accessing _lazyllm_api_keys.
Suggested change:
- self._lazyllm_api_keys = lazyllm_api_keys
+ self._lazyllm_api_keys = lazyllm_api_keys or {}
  self._lazyllm_image_caption_source = lazyllm_image_caption_source
# import lazyllm
from typing import Optional, List
from PIL import Image
from .base import ImageProvider
from config import get_config
# from lazyllm.components.formatter import decode_query_with_filepaths
# from lazyllm import LOG
The lazyllm-related import statements at the top of the file are commented out, yet the same modules are re-imported inside a try-except block in the __init__ method. This is redundant and potentially confusing. It is recommended to delete the commented-out import statements and keep only the imports inside the try-except block, to keep the code clear.
Suggested change:
- # import lazyllm
  from typing import Optional, List
  from PIL import Image
  from .base import ImageProvider
  from config import get_config
- # from lazyllm.components.formatter import decode_query_with_filepaths
- # from lazyllm import LOG
text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'siliconflow')
image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'siliconflow')
In the _get_provider_config method, the default values of LAZYLLM_TEXT_SOURCE and LAZYLLM_IMAGE_SOURCE are set to 'siliconflow'. However, in .env.example and backend/config.py, the defaults for these variables are 'deepseek' and 'doubao' respectively. This inconsistency in default values could cause configuration confusion. It is recommended to align the defaults here with those in .env.example and backend/config.py.
Suggested change:
- text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'siliconflow')
- image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'siliconflow')
+ text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'deepseek')
+ image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'doubao')
README.md
Outdated
LAZYLLM_TEXT_SOURCE=deepseek # text generation vendor
LAZYLLM_IMAGE_SOURCE=doubao # image generation vendor
The LazyLLM configuration example in README.md is missing an explanation of LAZYLLM_IMAGE_CAPTION_SOURCE. Since .env.example includes this configuration item, it is recommended to document it in README.md as well, for completeness and clarity.
Suggested change:
  LAZYLLM_TEXT_SOURCE=deepseek # text generation vendor
  LAZYLLM_IMAGE_SOURCE=doubao # image generation vendor
+ LAZYLLM_IMAGE_CAPTION_SOURCE=qwen # image caption vendor
@Anionex Hi, I have synced the latest code for this PR and resolved all conflicts. The new feature makes it easy to integrate many models from different providers (including doubao, qwen, siliconflow, ...) via the LazyLLM framework. Could you please take some time to review it? I will make timely revisions if there are any issues.
Thanks🚀, I'll review it soon.
/gemini review
Code Review
This PR successfully integrates the LazyLLM framework, extending support for various text and image models, which is a valuable enhancement. However, several issues require attention. Critical security vulnerabilities related to predictable temporary filenames, race conditions, and resource leakage were identified. Additionally, the review highlighted an uninitialized variable that could lead to runtime errors, inconsistencies in default configurations, unused imports and parameters, and hardcoded paths in demo scripts. Addressing these points will significantly improve the code's robustness, reliability, and maintainability.
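The temporary-file issues called out above (predictable names, race conditions, leaked files) are the kind of thing Python's tempfile module addresses. A minimal, hedged sketch of the pattern rather than the PR's actual code (the function name and suffix are illustrative):

```python
import os
import tempfile

def save_upload_securely(data: bytes, suffix: str = ".png") -> str:
    """Write bytes to an atomically created temp file with an unpredictable name."""
    # mkstemp() creates the file with O_EXCL, so there is no window between
    # choosing the name and creating the file, and the name cannot be guessed.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:  # take ownership of the fd; always closed
            f.write(data)
    except Exception:
        os.unlink(path)                 # do not leak the file on failure
        raise
    return path
```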
Some reasoning models include <think>...</think> blocks in their output. Add strip_think_tags() utility and apply it in all text providers and image caption generation to clean these tags from responses.
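A minimal sketch of what such a strip_think_tags() utility could look like; the regex and placement here are illustrative and the PR's implementation may differ:

```python
import re

# Matches <think>...</think> blocks, including multi-line content.
_THINK_TAG_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> reasoning blocks emitted by some models."""
    if not text:
        return text
    return _THINK_TAG_RE.sub("", text).strip()
```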
Use the user's configured resolution from settings instead of hardcoding 1K, which is unsupported by some models like seedream-4-5. Update the test description to say '固定分辨率' ("fixed resolution") instead of '1K'.
Include vendor SDKs (dashscope for qwen, zhipuai for glm) in the lazyllm optional dependency group so they install together with lazyllm.
All lazyllm vendor APIs (qwen/dashscope, doubao, siliconflow) expect width*height format. Convert resolution (1K/2K/4K) + aspect ratio to actual pixel dimensions instead of passing shorthand values.
When LazyLLM format is selected, hide irrelevant API Base URL / API Key fields and show vendor source dropdowns (text/image/caption model) with dynamic per-vendor API key inputs. Includes backend model fields, migration, controller sync logic, and i18n support.
ff15ccf to 51512eb
Doubao API requires 'WIDTHxHEIGHT' format (with 'x'), not 'WIDTH*HEIGHT'.
Some models (doubao-seedream) require >= 3686400 total pixels. Scale up dimensions when needed and round them up to a multiple of 64.
- qwen: uses '*' separator, max 2048px per dimension
- doubao: uses 'x' separator, min 3686400 total pixels
- Scale up for doubao minimum, cap for qwen maximum (see the sketch below)
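A hedged sketch of how these constraints could be turned into a size string; the constants mirror the bullets above, but the function name and exact rounding policy are assumptions, not necessarily what the PR implements:

```python
def calc_dimensions(resolution: str, aspect_ratio: str, vendor: str) -> str:
    """Convert '1K'/'2K'/'4K' plus a 'W:H' ratio into a vendor-specific size string."""
    base = {"1K": 1024, "2K": 2048, "4K": 4096}[resolution]
    w_ratio, h_ratio = (int(p) for p in aspect_ratio.split(":"))
    # Start from a pixel budget of base*base at the requested aspect ratio.
    scale = (base * base / (w_ratio * h_ratio)) ** 0.5
    width, height = w_ratio * scale, h_ratio * scale

    if vendor == "doubao" and width * height < 3_686_400:
        # doubao-seedream needs at least 3,686,400 total pixels: scale up.
        factor = (3_686_400 / (width * height)) ** 0.5
        width, height = width * factor, height * factor
    elif vendor == "qwen" and max(width, height) > 2048:
        # qwen caps each dimension at 2048 px: scale down.
        cap = 2048 / max(width, height)
        width, height = width * cap, height * cap

    def round64(v: float) -> int:
        # Round up to a multiple of 64, a common requirement for image models.
        return max(64, int(-(-v // 64) * 64))

    sep = "x" if vendor == "doubao" else "*"  # doubao wants WIDTHxHEIGHT, qwen WIDTH*HEIGHT
    return f"{round64(width)}{sep}{round64(height)}"
```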
/gemini review
Code Review
This pull request introduces support for the LazyLLM framework as a new AI provider option, alongside existing OpenAI and Gemini formats. Key changes include adding LazyLLM-specific configuration fields for text, image, and image caption model sources, as well as vendor API keys, to the .env.example file, backend configuration, and database settings. The backend now includes new LazyLLMTextProvider and LazyLLMImageProvider classes, along with a lazyllm_env.py utility for API key management. The FileParserService has been updated to support image captioning via LazyLLM, and a strip_think_tags utility was added to clean AI responses across all text providers. The README.md and README_EN.md files were updated to reflect the new configuration options and revised Docker logging commands. The frontend Settings page was modified to include UI elements for selecting LazyLLM as a provider and configuring its specific model sources and API keys. Review comments highlighted a potential environment variable injection vulnerability in _sync_settings_to_config due to unvalidated vendor strings, suggested removing sudo from docker compose commands in the documentation for broader applicability, recommended refactoring complex image dimension calculation logic in LazyLLMImageProvider into a separate helper method, and advised reusing the strip_think_tags function in _generate_single_caption to avoid code duplication.
1. Security: Add a whitelist for lazyllm vendor names to prevent environment variable injection via arbitrary vendor strings (see the sketch below)
2. Refactor: Extract image dimension calculation into a standalone _calculate_image_dimensions() function with clear vendor constraints documented in a VENDOR_IMAGE_CONSTRAINTS dict
3. DRY: Reuse strip_think_tags() from the text providers module instead of duplicating the regex logic in file_parser_service
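For point 1, a hedged sketch of how such a whitelist check might guard the env-variable sync; the constant and function names are illustrative, and the LAZYLLM_<SOURCE>_API_KEY naming is an assumption based on the lazyllm_env.py key-management utility described above:

```python
import os

# Only vendor names on this whitelist may be used to build env-variable names.
ALLOWED_LAZYLLM_SOURCES = {"deepseek", "doubao", "qwen", "siliconflow", "glm"}

def set_lazyllm_api_key(source: str, api_key: str) -> None:
    """Export a vendor API key, rejecting arbitrary vendor strings."""
    if source not in ALLOWED_LAZYLLM_SOURCES:
        raise ValueError(f"Unsupported LazyLLM source: {source!r}")
    os.environ[f"LAZYLLM_{source.upper()}_API_KEY"] = api_key
```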
Follow-up improvements based on the original PR include the following fixes and enhancements: 🔧 Bug fixes
🔒 Security fixes
📦 Dependencies
✅ Test verification
Feature Overview
Added LazyLLM framework integration, making it easy to call text/image generation models from different vendors.
1. Text generation (LazyllmTextProvider)
The LazyllmTextProvider interface automatically routes requests to the specified model and vendor, making it easy to switch and extend.
2. Image generation and editing (LazyllmImageProvider)
3. Image content captioning
Implementation details
- Added lazyllm_provider under backend/services/ai_provider/text, which calls text generation models from different vendors via the lazyllm package (see the sketch below)
- Added lazyllm_provider under backend/services/ai_provider/image, which calls image editing models from different vendors via the lazyllm package
- Added lazyllm-based VLM model integration in backend/services/file_parser_service.py, with the corresponding API added in backend/controllers/reference_file_controller.py
- Added lazyllm and its related dependencies to pyproject.toml
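For reference, a hedged sketch of the kind of routing the text provider performs, built on lazyllm's OnlineChatModule; the exact class usage and parameters in the PR may differ:

```python
import lazyllm

def build_text_module(source: str, model: str):
    """Route to a vendor-hosted chat model through the lazyllm package.

    `source` selects the vendor (e.g. 'deepseek', 'qwen', 'siliconflow') and
    `model` the concrete model name; lazyllm reads the matching vendor API key
    from the environment.
    """
    return lazyllm.OnlineChatModule(source=source, model=model)

# Hypothetical usage:
# chat = build_text_module("deepseek", "deepseek-chat")
# reply = chat("Summarize this paragraph ...")
```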
Notes

The dependencies pillow>=12.0.0 and google-genai>=1.52.0 in pyproject.toml conflict with lazyllm's dependencies. If you want to use lazyllm, change them to pillow<12.0.0,>=8.0.0 and google-genai<1.52.0, then run uv pip install '.[sdk]' to install lazyllm and the required SDKs. The compatibility issue has since been fixed.
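Concretely, the adjusted constraints described above would look roughly like this pyproject.toml fragment; the optional-dependency group name sdk comes from the install command above (earlier review notes refer to a lazyllm group), and the vendor SDK entries echo the earlier note about dashscope/zhipuai:

```toml
[project]
dependencies = [
    "pillow<12.0.0,>=8.0.0",   # relaxed so lazyllm's pins can be satisfied
    "google-genai<1.52.0",
]

[project.optional-dependencies]
sdk = [
    "lazyllm",
    "dashscope",  # qwen SDK
    "zhipuai",    # glm SDK
]
```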