Merged
39 commits
55602c0
[feature]add lazyllm framework to integrate different providers for t…
Jan 8, 2026
2dff12f
new feature:add sensetime-lazyllm provider,integrate different image/…
Jan 9, 2026
b675350
Merge remote-tracking branch 'upstream/main' into feature/integrate-s…
Jan 9, 2026
a32c199
fix bug
Jan 9, 2026
2b1601f
Merge remote-tracking branch 'upstream/main' into feature/integrate-s…
Jan 12, 2026
bd92bd3
add feature:image description
Jan 12, 2026
add3d66
format update
Jan 12, 2026
2d2ed67
format update
Jan 12, 2026
d0b3a31
fix bugs / update format
Jan 13, 2026
cd5816c
fix the conflict of lazyllm
Jan 19, 2026
8614adb
Merge remote-tracking branch 'upstream/main' into feature/integrate-s…
Jan 19, 2026
3fed3de
resolve package conflict of lazyllm / add lazyllm test demo
Jan 19, 2026
11a1a56
files recovery
Jan 19, 2026
76b6b4a
Merge remote-tracking branch 'upstream/main' into feature/integrate-s…
Jan 22, 2026
5a78efa
[feature]Optimize the api_key configuration with the namespace featur…
Jan 22, 2026
4ad861d
fix bug
Jan 23, 2026
ef833a8
Merge remote-tracking branch 'upstream/main' into feature/integrate-s…
Jan 23, 2026
3ab7684
fix bug
Jan 23, 2026
03bb352
update version
Jan 30, 2026
f3a8758
fix bug:permission verification for image_caption_model in lazyllm
Jan 30, 2026
e2807ce
Merge branch 'main' into feature/integrate-sensetime-LazyLLM-framework
uestcsh917-art Feb 3, 2026
4bd34c3
merge: sync main into pr-158
Anionex Feb 12, 2026
3aad109
fix: complete lazyllm provider integration and config consistency
Anionex Feb 12, 2026
b0e95d1
fix: support vendor-prefixed lazyllm api keys
Anionex Feb 12, 2026
106006f
fix: require vendor-prefixed lazyllm api keys
Anionex Feb 12, 2026
c232acb
fix: remove AGENTS.md from pr-158
Anionex Feb 12, 2026
0b7fde1
feat(backend): register lazyllm providers and add volcengine dependency
Anionex Feb 13, 2026
d9be474
fix(backend): strip <think> tags from AI responses
Anionex Feb 13, 2026
a92cc69
fix(settings): use configured resolution for image model test
Anionex Feb 13, 2026
600da40
feat(deps): add dashscope and zhipuai to lazyllm optional dependencies
Anionex Feb 13, 2026
f8ef7c8
fix(backend): use pixel dimensions for lazyllm image resolution
Anionex Feb 13, 2026
2e797fc
feat(settings): add lazyllm provider frontend configuration
Anionex Feb 12, 2026
677b14f
fix(settings): move lazyllm vendor API key inputs right after source …
Anionex Feb 13, 2026
51512eb
style(settings): use orange gradient for LazyLLM format button
Anionex Feb 13, 2026
2778209
fix(backend): use 'x' separator for lazyllm image resolution dimensions
Anionex Feb 13, 2026
9ec8180
fix(backend): ensure minimum pixel count for lazyllm image resolution
Anionex Feb 13, 2026
a6db47c
fix(backend): handle vendor-specific image size formats and limits
Anionex Feb 13, 2026
5de6b7b
refactor(backend): address code review feedback
Anionex Feb 13, 2026
c26ee5d
chore: trigger CI
Anionex Feb 13, 2026
21 changes: 18 additions & 3 deletions .env.example
@@ -21,9 +21,27 @@ OPENAI_TIMEOUT=300.0
# Maximum number of retries; fewer retries avoid accumulated timeouts. Default: 2
OPENAI_MAX_RETRIES=2

# LazyLLM format configuration (used when AI_PROVIDER_FORMAT=lazyllm)
# Choose the providers for the text generation, image generation, and image caption models
TEXT_MODEL_SOURCE=qwen
IMAGE_MODEL_SOURCE=qwen
IMAGE_CAPTION_MODEL_SOURCE=qwen

# API keys for domestic providers (only configure the providers matching the models you use)
# Uniform naming: {SOURCE}_API_KEY
DOUBAO_API_KEY=your-doubao-api-key
QWEN_API_KEY=your-qwen-api-key
DEEPSEEK_API_KEY=your-deepseek-api-key
GLM_API_KEY=your-glm-api-key
SILICONFLOW_API_KEY=your-siliconflow-api-key
SENSENOVA_API_KEY=your-sensenova-api-key
MINIMAX_API_KEY=your-minimax-api-key

# AI model configuration
TEXT_MODEL=gemini-3-flash-preview
IMAGE_MODEL=gemini-3-pro-image-preview
# Image caption model configuration (generates descriptions for images in parsed files)
IMAGE_CAPTION_MODEL=gemini-3-flash-preview

# Flask configuration
LOG_LEVEL=INFO
@@ -43,9 +61,6 @@ MAX_IMAGE_WORKERS=8
MINERU_TOKEN=your-mineru-token
MINERU_API_BASE=https://mineru.net

# Image caption model configuration (generates descriptions for images in parsed files)
IMAGE_CAPTION_MODEL=gemini-3-flash-preview

# Editable export service configuration
BAIDU_OCR_API_KEY=you-baidu-api-key

86 changes: 41 additions & 45 deletions README.md
@@ -114,7 +114,6 @@
### 5. Freely editable PPTX export (Beta, iterating)
- **Export images as high-fidelity PPT pages with clean backgrounds and freely editable images and text**
- See https://github.com/Anionex/banana-slides/issues/121 for related updates
- **If the editable PPT result is poor, e.g. overlapping text or missing text styles, it is usually caused by a configuration issue; see [Editable PPTX export FAQ and troubleshooting](https://github.com/Anionex/banana-slides/issues/121#issuecomment-3708872527)**
<img width="1000" alt="image" src="https://github.com/user-attachments/assets/a85d2d48-1966-4800-a4bf-73d17f914062" />

<br>
@@ -143,7 +142,6 @@
* Fixed export-related 500 errors, reference-file association ordering, outline/page data misalignment, task polling hitting the wrong project, infinite polling during description generation, an image-preview memory leak, and partial-failure handling in batch deletion.
* Improved the format example hints, HTTP error message copy, and modal close experience; cleaned up old-project localStorage; removed a redundant prompt on first project creation.
* Various other improvements and fixes

- 【1-4】: v0.3.0 released, a comprehensive upgrade to editable PPTX export:
* Restores font size, color, bold, and other text styles from images as faithfully as possible;
* Recognizes text content inside tables;
@@ -155,25 +153,6 @@
- 【12-27】: Added support for an image-free template mode and high-quality text presets; PPT page styles can now be controlled with pure text descriptions


## **🔧 FAQ**

1. **Generated page text is garbled or unclear**
- Choose a higher output resolution (the openai format may not support raising the resolution; the gemini format is recommended). In testing, raising the resolution from 1k to 2k before generating pages significantly improves text rendering quality.
- Make sure the page description includes the exact text to be rendered.

2. **Exported editable PPT looks poor, e.g. overlapping text or missing styles**
- In 90% of cases this is an API configuration problem. See the troubleshooting steps in [issue 121](https://github.com/Anionex/banana-slides/issues/121).

3. **Are free-tier Gemini API keys supported?**
- The free tier only supports text generation, not image generation.

4. **503 errors or Retry Error during content generation**
- Use the commands in the README to check the Docker backend logs and locate the detailed 503 error; it is usually caused by incorrect model configuration.

5. **Why doesn't an API key set in .env take effect?**
- After editing `.env` at runtime, restart the Docker containers to apply the change.
- Settings configured on the web settings page override the values in `.env`; use "Restore default settings" to fall back to the `.env` values.

## 🗺️ Roadmap

| Status | Milestone |
@@ -244,6 +223,21 @@ OPENAI_API_BASE=https://api.openai.com/v1
# VERTEX_PROJECT_ID=your-gcp-project-id
# VERTEX_LOCATION=global
# GOOGLE_APPLICATION_CREDENTIALS=./gcp-service-account.json

# LazyLLM format configuration (used when AI_PROVIDER_FORMAT=lazyllm)
# Choose the providers for text generation and image generation
TEXT_MODEL_SOURCE=deepseek # text generation model provider
IMAGE_MODEL_SOURCE=doubao # image editing model provider
IMAGE_CAPTION_MODEL_SOURCE=qwen # image caption model provider

# Provider API keys (only configure the providers you use)
DOUBAO_API_KEY=your-doubao-api-key # Volcengine / Doubao
DEEPSEEK_API_KEY=your-deepseek-api-key # DeepSeek
QWEN_API_KEY=your-qwen-api-key # Alibaba Cloud / Qwen
GLM_API_KEY=your-glm-api-key # Zhipu GLM
SILICONFLOW_API_KEY=your-siliconflow-api-key # SiliconFlow
SENSENOVA_API_KEY=your-sensenova-api-key # SenseTime SenseNova
MINIMAX_API_KEY=your-minimax-api-key # MiniMax
...
```

@@ -277,24 +271,13 @@

2. **Start the services**

**⚡ Use pre-built images (recommended)**

The project provides pre-built frontend and backend images on Docker Hub (synced with the latest main branch), so you can skip the local build step and deploy quickly:

```bash
# Start from the pre-built images (no local build needed)
docker compose -f docker-compose.prod.yml up -d
```

Image names:
- `anoinex/banana-slides-frontend:latest`
- `anoinex/banana-slides-backend:latest`

**Build the images from scratch**

```bash
docker compose up -d
```
Update: the project also provides pre-built frontend and backend images on Docker Hub (synced with the latest main branch), named:
1. anoinex/banana-slides-frontend
2. anoinex/banana-slides-backend


> [!TIP]
> If you run into network issues, uncomment the registry mirror configuration in the `.env` file, then rerun the start command:
@@ -316,14 +299,14 @@
4. **View logs**

```bash
# Backend logs (last 200 lines)
docker logs --tail 200 banana-slides-backend
# Backend logs (follow, last 50 lines)
sudo docker compose logs -f --tail 50 backend

# Follow backend logs (last 100 lines)
docker logs -f --tail 100 banana-slides-backend
# All service logs (last 200 lines)
sudo docker compose logs -f --tail 200

# Frontend logs (last 100 lines)
docker logs --tail 100 banana-slides-frontend
# Frontend logs
sudo docker compose logs -f --tail 50 frontend
```

5. **Stop the services**
@@ -571,7 +554,6 @@ banana-slides/
└── README.md # this file
```


## Community group
A WeChat group has been set up to make it easier for everyone to communicate and help each other.

@@ -581,6 +563,20 @@






**FAQ**
1. **Are free-tier Gemini API keys supported?**
* The free tier only supports text generation, not image generation.
2. **503 errors or Retry Error during content generation**
* Use the commands in the README to check the Docker logs and locate the detailed 503 error; it is usually caused by incorrect model configuration.
3. **Why doesn't an API key set in .env take effect?**
1. After editing `.env` at runtime, restart the Docker containers to apply the change.
2. Settings configured on the web settings page override the values in `.env`; use "Restore default settings" to fall back to `.env`.
4. **Generated page text is garbled**
* Try a higher output resolution (the openai format may not support raising the resolution)
* Make sure the page description includes the exact text to be rendered


## 🤝 Contribution guide
@@ -672,8 +668,8 @@
<img width="240" alt="image" src="https://github.com/user-attachments/assets/fd7a286d-711b-445e-aecf-43e3fe356473" />

Thanks to the following friends for their voluntary sponsorship of the project:
> @雅俗共赏、@曹峥、@以年观日、@John、@胡yun星Ethan, @azazo1、@刘聪NLP、@🍟、@苍何、@万瑾、@biubiu、@law、@方源、@寒松Falcon
> If you have questions about the sponsor list, you can <a href="mailto:anionex@qq.com">contact the author</a>
> @雅俗共赏、@曹峥、@以年观日、@John、@azazo1、@刘聪NLP、@🍟、@苍何、@biubiu
> If you have questions about the sponsor list (e.g. your name is missing after sponsoring), you can <a href="mailto:anionex@qq.com">contact the author</a>

## 📈 Project statistics

16 changes: 16 additions & 0 deletions README_EN.md
@@ -255,6 +255,22 @@ OPENAI_API_BASE=https://api.openai.com/v1

**Use the new editable export configuration method to achieve better results**: You need to obtain an API KEY from the [Baidu AI Cloud Platform](https://console.bce.baidu.com/iam/#/iam/apikey/list) and fill it in the `BAIDU_OCR_API_KEY` field in the `.env` file (there is sufficient free usage quota). For details, see the instructions in https://github.com/Anionex/banana-slides/issues/121

# LazyLLM format configuration (used when AI_PROVIDER_FORMAT=lazyllm)
TEXT_MODEL_SOURCE=deepseek # text model provider
IMAGE_MODEL_SOURCE=doubao # image-editing model provider
IMAGE_CAPTION_MODEL_SOURCE=qwen # image caption model provider

# API keys for different providers
DOUBAO_API_KEY=your-doubao-api-key # doubao
DEEPSEEK_API_KEY=your-deepseek-api-key # DeepSeek
QWEN_API_KEY=your-qwen-api-key # qwen
GLM_API_KEY=your-glm-api-key # GLM
SILICONFLOW_API_KEY=your-siliconflow-api-key # siliconflow
SENSENOVA_API_KEY=your-sensenova-api-key # sensenova
MINIMAX_API_KEY=your-minimax-api-key # MiniMax
...
```


<details>
<summary>📒 Using Vertex AI (GCP Free Tier)</summary>
27 changes: 27 additions & 0 deletions backend/app.py
@@ -233,6 +233,33 @@ def _load_settings_to_config(app):
app.config['BAIDU_OCR_API_KEY'] = settings.baidu_ocr_api_key
logging.info("Loaded BAIDU_OCR_API_KEY from settings")

# Load LazyLLM source settings
if settings.text_model_source:
app.config['TEXT_MODEL_SOURCE'] = settings.text_model_source
logging.info(f"Loaded TEXT_MODEL_SOURCE from settings: {settings.text_model_source}")
if settings.image_model_source:
app.config['IMAGE_MODEL_SOURCE'] = settings.image_model_source
logging.info(f"Loaded IMAGE_MODEL_SOURCE from settings: {settings.image_model_source}")
if settings.image_caption_model_source:
app.config['IMAGE_CAPTION_MODEL_SOURCE'] = settings.image_caption_model_source
logging.info(f"Loaded IMAGE_CAPTION_MODEL_SOURCE from settings: {settings.image_caption_model_source}")

# Sync LazyLLM vendor API keys to environment variables
# Only allow known vendor names to prevent environment variable injection
ALLOWED_LAZYLLM_VENDORS = {'qwen', 'doubao', 'deepseek', 'glm', 'siliconflow', 'sensenova', 'minimax', 'openai', 'kimi'}
if settings.lazyllm_api_keys:
import json
try:
keys = json.loads(settings.lazyllm_api_keys)
for vendor, key in keys.items():
if key and vendor.lower() in ALLOWED_LAZYLLM_VENDORS:
os.environ[f"{vendor.upper()}_API_KEY"] = key
elif key:
logging.warning(f"Ignoring unknown lazyllm vendor: {vendor}")
logging.info(f"Loaded LazyLLM API keys for vendors: {[v for v, k in keys.items() if k and v.lower() in ALLOWED_LAZYLLM_VENDORS]}")
except (json.JSONDecodeError, TypeError):
logging.warning("Failed to parse lazyllm_api_keys from settings")

except Exception as e:
logging.warning(f"Could not load settings from database: {e}")

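The vendor-key sync added in the `app.py` hunk above guards `os.environ` writes with an allowlist so that stored settings cannot inject arbitrary environment variables. A standalone sketch of the same pattern (function name and return value are illustrative, not the PR's exact code):

```python
import json
import logging
import os

# Only known vendor names may become environment variables, preventing
# arbitrary env-var injection via values stored in the settings table.
ALLOWED_LAZYLLM_VENDORS = {
    'qwen', 'doubao', 'deepseek', 'glm',
    'siliconflow', 'sensenova', 'minimax', 'openai', 'kimi',
}

def sync_lazyllm_keys(raw_json: str) -> list:
    """Parse a JSON map of {vendor: api_key} and export allowed, non-empty
    keys as {VENDOR}_API_KEY environment variables. Returns loaded vendors."""
    try:
        keys = json.loads(raw_json)
    except (json.JSONDecodeError, TypeError):
        logging.warning("Failed to parse lazyllm_api_keys from settings")
        return []
    loaded = []
    for vendor, key in keys.items():
        if key and vendor.lower() in ALLOWED_LAZYLLM_VENDORS:
            os.environ[f"{vendor.upper()}_API_KEY"] = key
            loaded.append(vendor)
        elif key:
            logging.warning(f"Ignoring unknown lazyllm vendor: {vendor}")
    return loaded

# Unknown vendors are ignored; empty keys are skipped entirely.
print(sync_lazyllm_keys('{"qwen": "sk-1", "evil;rm": "x", "glm": ""}'))  # -> ['qwen']
```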
7 changes: 6 additions & 1 deletion backend/config.py
@@ -45,7 +45,7 @@ class Config:
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY', '')
GOOGLE_API_BASE = os.getenv('GOOGLE_API_BASE', '')

# AI provider format: "gemini" (Google GenAI SDK), "openai" (OpenAI SDK), "vertex" (Vertex AI)
# AI provider format: "gemini" (Google GenAI SDK), "openai" (OpenAI SDK), "vertex" (Vertex AI), "lazyllm" (LazyLLM framework)
AI_PROVIDER_FORMAT = os.getenv('AI_PROVIDER_FORMAT', 'gemini')

# Vertex AI-specific configuration (used when AI_PROVIDER_FORMAT=vertex)
@@ -61,6 +61,11 @@ class Config:
OPENAI_API_BASE = os.getenv('OPENAI_API_BASE', 'https://aihubmix.com/v1')
OPENAI_TIMEOUT = float(os.getenv('OPENAI_TIMEOUT', '300.0')) # increased to 5 minutes (generating a clean background image can take a long time)
OPENAI_MAX_RETRIES = int(os.getenv('OPENAI_MAX_RETRIES', '2')) # fewer retries to avoid accumulated timeouts

# LazyLLM-specific configuration (used when AI_PROVIDER_FORMAT=lazyllm)
TEXT_MODEL_SOURCE = os.getenv('TEXT_MODEL_SOURCE', 'deepseek') # text generation model provider
IMAGE_MODEL_SOURCE = os.getenv('IMAGE_MODEL_SOURCE', 'doubao') # image generation model provider
IMAGE_CAPTION_MODEL_SOURCE = os.getenv('IMAGE_CAPTION_MODEL_SOURCE', 'doubao') # image caption model provider

# AI model configuration
TEXT_MODEL = os.getenv('TEXT_MODEL', 'gemini-3-flash-preview')
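The new `*_MODEL_SOURCE` settings in the `config.py` hunk above each fall back to a vendor default when the environment variable is unset. A minimal sketch of that resolution (the helper function is illustrative, not from the PR):

```python
import os

# Defaults mirror the config.py hunk above: deepseek for text,
# doubao for image generation and image captioning.
def model_sources() -> dict:
    return {
        "text": os.getenv("TEXT_MODEL_SOURCE", "deepseek"),
        "image": os.getenv("IMAGE_MODEL_SOURCE", "doubao"),
        "caption": os.getenv("IMAGE_CAPTION_MODEL_SOURCE", "doubao"),
    }

os.environ["TEXT_MODEL_SOURCE"] = "qwen"  # a runtime override wins over the default
print(model_sources()["text"])  # -> qwen
```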
3 changes: 2 additions & 1 deletion backend/controllers/reference_file_controller.py
@@ -65,7 +65,8 @@ def _parse_file_async(file_id: str, file_path: str, filename: str, app):
openai_api_key=current_app.config.get('OPENAI_API_KEY', ''),
openai_api_base=current_app.config.get('OPENAI_API_BASE', ''),
image_caption_model=current_app.config['IMAGE_CAPTION_MODEL'],
provider_format=current_app.config.get('AI_PROVIDER_FORMAT', 'gemini')
provider_format=current_app.config.get('AI_PROVIDER_FORMAT', 'gemini'),
lazyllm_image_caption_source=current_app.config.get('IMAGE_CAPTION_MODEL_SOURCE', 'doubao'),
)

# Parse file