
[Feature]integrate LazyLLM framework to support more text/image model#158

Merged
Anionex merged 39 commits into Anionex:main from uestcsh917-art:feature/integrate-sensetime-LazyLLM-framework
Feb 13, 2026

Conversation

Contributor

@uestcsh917-art uestcsh917-art commented Jan 12, 2026

Feature Overview

Adds LazyLLM framework integration, making it easy to call text/image generation models from different vendors.

1. Text generation (LazyllmTextProvider)

  • Supports text generation with mainstream LLMs such as Qwen, Deepseek, doubao, GLM, MINIMAX, and sensenova.
  • A unified LazyllmTextProvider interface automatically routes requests to the specified model and vendor, making it easy to switch and extend.

2. Image generation and editing (LazyllmImageProvider)

  • Integrates the LazyLLM framework's image generation and editing capabilities; currently supports mainstream image-editing models from vendors and platforms such as doubao, qwen, and siliconflow.
  • Compatible with multiple resolutions and aspect ratios; automatically handles API response formats for better robustness and usability.
  • Improved exception handling and logging make it easier to diagnose image-generation failures.

3. Image captioning

  • Integrates VLM models within the LazyLLM framework to process images in user-supplied reference files and generate text descriptions.

Implementation details

  1. Added lazyllm_provider under backend/services/ai_provider/text, which calls vendors' text generation models via the lazyllm package.
  2. Added lazyllm_provider under backend/services/ai_provider/image, which calls vendors' image-editing models via the lazyllm package.
  3. Added a LazyLLM-based VLM integration in backend/services/file_parser_service.py, plus the corresponding API import in backend/controllers/reference_file_controller.py.
  4. Added lazyllm and its related dependencies to pyproject.toml.
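As an illustration of the routing in point 1, here is a minimal sketch of what such a provider wrapper could look like. This is not the PR's actual code: `lazyllm.OnlineChatModule` with a `source` parameter is an assumption about the lazyllm API, and the vendor list is taken from this PR's description.

```python
from typing import Optional

# Vendors mentioned in this PR; the real provider may support more.
SUPPORTED_TEXT_SOURCES = {"qwen", "deepseek", "doubao", "glm", "minimax",
                          "sensenova", "siliconflow"}


class LazyllmTextProvider:
    """Hypothetical sketch of a provider that routes (source, model)
    pairs through the lazyllm framework."""

    def __init__(self, source: str, model: Optional[str] = None):
        if source not in SUPPORTED_TEXT_SOURCES:
            raise ValueError(f"unsupported LazyLLM text source: {source!r}")
        self.source, self.model = source, model
        self._client = None  # created lazily so import/key errors surface late

    def _get_client(self):
        if self._client is None:
            import lazyllm  # imported here so the dependency stays optional
            self._client = lazyllm.OnlineChatModule(source=self.source,
                                                    model=self.model)
        return self._client

    def generate(self, prompt: str) -> str:
        # lazyllm modules are callable; pass the prompt straight through.
        return self._get_client()(prompt)
```

Keeping the lazyllm import inside the lazy client constructor means the backend still starts when the optional dependency is not installed.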

Notes

The dependencies pillow>=12.0.0 and google-genai>=1.52.0 in pyproject.toml conflict with lazyllm's dependencies. To use lazyllm, change them to pillow<12.0.0,>=8.0.0 and google-genai<1.52.0, then run uv pip install '.[sdk]' to install lazyllm and the necessary SDKs.
The compatibility issue has since been fixed.
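The constraint changes described in the note above would look roughly like this in pyproject.toml (a sketch; the section layout and surrounding dependency list depend on the actual project file):

```toml
[project]
dependencies = [
    # relaxed from "pillow>=12.0.0" so it resolves with lazyllm's pins
    "pillow<12.0.0,>=8.0.0",
    # pinned below 1.52.0 for the same reason
    "google-genai<1.52.0",
]
```

followed by uv pip install '.[sdk]' to pull in lazyllm and the vendor SDKs.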

@gemini-code-assist
Contributor

Summary of Changes

Hello @uestcsh917-art, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to significantly expand the system's support for various AI model providers by integrating the LazyLLM framework. The change lets users flexibly select and switch between different models for text generation, image generation, and image captioning, improving the system's versatility and extensibility. With a unified interface and simplified configuration, developers can more easily plug in advanced AI capabilities from different vendors, laying the groundwork for future iterations.

Highlights

  • LazyLLM framework integration: Introduces the LazyLLM framework to support a wider range of text and image generation models, improving the system's versatility and extensibility.
  • Multi-vendor text models: A unified LazyllmTextProvider interface supports text generation with mainstream LLMs including Qwen, Deepseek, doubao, GLM, MINIMAX, and sensenova.
  • Multi-vendor image models: Integrates the LazyLLM framework's image generation and editing capabilities; currently supports vendors such as doubao, qwen, and siliconflow.
  • Image captioning: Integrates VLM models within the LazyLLM framework to generate text descriptions from user-supplied reference images.
  • Configuration and dependency updates: Updates .env.example, README.md, and backend/config.py with LazyLLM-related configuration and API key settings, and adds lazyllm and its related SDK dependencies to pyproject.toml.
  • Dependency-conflict guidance: Documents how to resolve the pillow and google-genai dependency conflicts so LazyLLM installs and runs smoothly.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates the LazyLLM framework to support a wider range of text and image generation models, significantly enhancing the application's flexibility and capabilities. The changes include configuration updates, new provider implementations for text and image generation, and LazyLLM integration for image captioning in the file parser service. Documentation in README.md and .env.example has also been updated to reflect the new configuration options.

Comment on lines 86 to 87
self._lazyllm_api_keys = lazyllm_api_keys
self._lazyllm_image_caption_source = lazyllm_image_caption_source

high

__init__方法中,_lazyllm_api_keys被初始化为None。然而,在_get_lazyllm_client方法中,api_key = self._lazyllm_api_keys.get(source, "")这行代码直接访问了_lazyllm_api_keys.get()方法,如果_lazyllm_api_keysNone,则会引发AttributeError。建议在访问_lazyllm_api_keys之前添加一个None检查。

Suggested change
- self._lazyllm_api_keys = lazyllm_api_keys
+ self._lazyllm_api_keys = lazyllm_api_keys or {}
  self._lazyllm_image_caption_source = lazyllm_image_caption_source

Comment on lines 15 to 21
# import lazyllm
from typing import Optional, List
from PIL import Image
from .base import ImageProvider
from config import get_config
# from lazyllm.components.formatter import decode_query_with_filepaths
# from lazyllm import LOG

medium

The lazyllm-related imports at the top of the file are commented out, but they are re-imported inside a try-except block in __init__. This is redundant and potentially confusing. Consider deleting the commented-out imports and keeping only the ones inside the try-except block for clarity.

Suggested change
- # import lazyllm
  from typing import Optional, List
  from PIL import Image
  from .base import ImageProvider
  from config import get_config
- # from lazyllm.components.formatter import decode_query_with_filepaths
- # from lazyllm import LOG

Comment on lines 185 to 186
text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'siliconflow')
image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'siliconflow')

medium

_get_provider_config方法中,LAZYLLM_TEXT_SOURCELAZYLLM_IMAGE_SOURCE的默认值被设置为'siliconflow'。然而,在.env.examplebackend/config.py中,这些变量的默认值分别是'deepseek''doubao'。这种默认值的不一致可能会导致配置上的混淆。建议将此处的默认值与.env.examplebackend/config.py中的默认值保持一致。

Suggested change
- text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'siliconflow')
- image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'siliconflow')
+ text_source = _get_config_value('LAZYLLM_TEXT_SOURCE', 'deepseek')
+ image_source = _get_config_value('LAZYLLM_IMAGE_SOURCE', 'doubao')

README.md Outdated
Comment on lines 219 to 220
LAZYLLM_TEXT_SOURCE=deepseek # text generation vendor
LAZYLLM_IMAGE_SOURCE=doubao # image generation vendor

medium

The LazyLLM configuration example in README.md is missing documentation for LAZYLLM_IMAGE_CAPTION_SOURCE. Since .env.example includes this setting, consider adding it to README.md for completeness and clarity.

Suggested change
  LAZYLLM_TEXT_SOURCE=deepseek # text generation vendor
  LAZYLLM_IMAGE_SOURCE=doubao # image generation vendor
+ LAZYLLM_IMAGE_CAPTION_SOURCE=qwen # image caption vendor

@uestcsh917-art uestcsh917-art changed the title [Feature]integrate lazy llm framework to support more text/image model [Feature]integrate LazyLLM framework to support more text/image model Jan 13, 2026
@uestcsh917-art
Contributor Author

@Anionex Hi, I have synced the latest code for this PR and resolved all conflicts. The new feature makes it easy to integrate models from many different providers (including doubao, qwen, siliconflow, ...) via the LazyLLM framework. Could you please take some time to review it? I will make timely revisions if there are any issues.

@Anionex
Owner

Anionex commented Jan 23, 2026

Thanks🚀, I’ll review it soon.

@Anionex
Owner

Anionex commented Jan 29, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR successfully integrates the LazyLLM framework, extending support for various text and image models, which is a valuable enhancement. However, several issues require attention. Critical security vulnerabilities related to predictable temporary filenames, race conditions, and resource leakage were identified. Additionally, the review highlighted an uninitialized variable that could lead to runtime errors, inconsistencies in default configurations, unused imports and parameters, and hardcoded paths in demo scripts. Addressing these points will significantly improve the code's robustness, reliability, and maintainability.

Some reasoning models include <think>...</think> blocks in their output.
Add strip_think_tags() utility and apply it in all text providers and
image caption generation to clean these tags from responses.
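The utility described in this commit message can be sketched as a simple regex pass (the function name comes from the commit message above; the exact pattern and whitespace handling in the merged code may differ):

```python
import re

# Non-greedy match so multiple <think>...</think> blocks are each removed;
# DOTALL lets a block span newlines.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Strip <think>...</think> reasoning blocks that some reasoning
    models prepend to their answers, then trim leftover leading space."""
    return _THINK_RE.sub("", text).lstrip()
```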
Use the user's configured resolution from settings instead of
hardcoding 1K, which is unsupported by some models like seedream-4-5.
Update test description to say '固定分辨率' (fixed resolution) instead of '1K'.
Include vendor SDKs (dashscope for qwen, zhipuai for glm) in the
lazyllm optional dependency group so they install together with lazyllm.
All lazyllm vendor APIs (qwen/dashscope, doubao, siliconflow) expect
width*height format. Convert resolution (1K/2K/4K) + aspect ratio to
actual pixel dimensions instead of passing shorthand values.
When LazyLLM format is selected, hide irrelevant API Base URL / API Key
fields and show vendor source dropdowns (text/image/caption model) with
dynamic per-vendor API key inputs. Includes backend model fields,
migration, controller sync logic, and i18n support.
@Anionex Anionex force-pushed the feature/integrate-sensetime-LazyLLM-framework branch from ff15ccf to 51512eb on February 13, 2026 13:45
Doubao API requires 'WIDTHxHEIGHT' format (with 'x'), not 'WIDTH*HEIGHT'.
Some models (doubao-seedream) require >= 3686400 total pixels.
Scale up dimensions when needed and round up to 64-pixel alignment.
- qwen: uses '*' separator, max 2048px per dimension
- doubao: uses 'x' separator, min 3686400 total pixels
- Scale up for doubao minimum, cap for qwen maximum
@Anionex
Owner

Anionex commented Feb 13, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the LazyLLM framework as a new AI provider option, alongside existing OpenAI and Gemini formats. Key changes include adding LazyLLM-specific configuration fields for text, image, and image caption model sources, as well as vendor API keys, to the .env.example file, backend configuration, and database settings. The backend now includes new LazyLLMTextProvider and LazyLLMImageProvider classes, along with a lazyllm_env.py utility for API key management. The FileParserService has been updated to support image captioning via LazyLLM, and a strip_think_tags utility was added to clean AI responses across all text providers. The README.md and README_EN.md files were updated to reflect the new configuration options and revised Docker logging commands. The frontend Settings page was modified to include UI elements for selecting LazyLLM as a provider and configuring its specific model sources and API keys.

Review comments highlighted a potential environment variable injection vulnerability in _sync_settings_to_config due to unvalidated vendor strings, suggested removing sudo from docker compose commands in the documentation for broader applicability, recommended refactoring complex image dimension calculation logic in LazyLLMImageProvider into a separate helper method, and advised reusing the strip_think_tags function in _generate_single_caption to avoid code duplication.

1. Security: Add whitelist for lazyllm vendor names to prevent
   environment variable injection via arbitrary vendor strings

2. Refactor: Extract image dimension calculation into a standalone
   _calculate_image_dimensions() function with clear vendor constraints
   documented in VENDOR_IMAGE_CONSTRAINTS dict

3. DRY: Reuse strip_think_tags() from text providers module instead
   of duplicating regex logic in file_parser_service
@Anionex
Owner

Anionex commented Feb 13, 2026

Follow-up improvements

The following fixes and enhancements were made on top of the original PR:

🔧 Bug fixes

  1. Image resolution compatibility - vendors expect different formats for the size parameter:

    • qwen: uses the * separator, sides limited to 512-2048px
    • doubao: uses the x separator, requires at least 3,686,400 total pixels
    • Refactored into a _calculate_image_dimensions() function that adapts automatically per vendor
  2. Stripping <think> tags - some reasoning models (such as DeepSeek) include <think>...</think> blocks in their output; a strip_think_tags() cleanup has been added to all text providers

  3. Test configuration - image model tests now use the user-configured resolution instead of a hardcoded 1K, which some models do not support

🔒 Security fixes

  1. Environment variable injection protection - added an ALLOWED_LAZYLLM_VENDORS whitelist to prevent arbitrary vendor names from being used to overwrite system environment variables
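A minimal sketch of such a whitelist guard (the constant name comes from the summary above; the helper name, the LAZYLLM_<VENDOR>_API_KEY naming scheme, and the vendor list are illustrative assumptions):

```python
import os

# Vendors mentioned in this PR; the merged whitelist may differ.
ALLOWED_LAZYLLM_VENDORS = {
    "qwen", "deepseek", "doubao", "glm", "minimax", "sensenova", "siliconflow",
}


def set_vendor_api_key(vendor: str, api_key: str) -> None:
    """Export a vendor API key into the environment, rejecting vendor
    names outside the whitelist so a crafted vendor string cannot be
    used to overwrite arbitrary environment variables."""
    vendor = vendor.lower()
    if vendor not in ALLOWED_LAZYLLM_VENDORS:
        raise ValueError(f"unsupported LazyLLM vendor: {vendor!r}")
    os.environ[f"LAZYLLM_{vendor.upper()}_API_KEY"] = api_key
```

Validating against a fixed set, rather than sanitizing the string, keeps the attack surface to exactly the environment variables the application intends to manage.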

📦 Dependencies

  1. Vendor SDKs - added dashscope and zhipuai to the lazyllm optional dependency group, so uv sync --extra lazyllm installs everything in one step

✅ Test verification

  • qwen-image-edit-max: 2048*1152
  • doubao-seedream-4-5: 2560x1472

@Anionex Anionex merged commit 7b8d583 into Anionex:main Feb 13, 2026
2 checks passed