Very high disk read #9825
Replies: 16 comments 3 replies
-
        
📦 Deployment environment
Docker
📦 Deployment mode
Server-side mode (lobe-chat-database image)
📌 Software version
v1.116.0
💻 System environment
Ubuntu
🌐 Browser
Chrome
🐛 Problem description
This has been an issue for a long time. Ever since a certain version update (right around the time MinIO pulled its stunt), starting LobeChat would freeze the server. Monitoring showed disk reads staying very high (around 130 MB/s), which used up the disk's read quota. Because MinIO happened to be misbehaving at the time, I assumed it was a MinIO problem, but downgrading to a MinIO version from before that didn't help. Then I noticed the problem disappeared when I turned off my proxy, so I stopped worrying about it. I suspected this might be a time bomb planted by the MinIO authors: after a certain date your server becomes unusable, and with no proxy there is no outbound communication to set it off. That seemed truly malicious. Today I finally replaced MinIO entirely with Garage, so there is no MinIO at all, yet when I brought lobe-chat-db back up the server froze again with the same 130+ MB/s disk reads, and iotop does not show what is doing the reading. Am I the only one hitting this bug? It feels like a genuinely severe one. Here is my configuration file:
📷 Reproduction steps
No response
🚦 Expected results
No response
📝 Supplementary information
No response
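For anyone trying to narrow down where the reads come from, a minimal host-level diagnostic sketch (it assumes `iotop` and the `sysstat` package are installed; `<PID>` is a placeholder):

```bash
# Only processes currently doing I/O, with accumulated totals per process
sudo iotop -aoP

# Per-process disk read/write rates, sampled every 5 seconds (sysstat)
pidstat -d 5

# Per-device utilisation and read throughput, to confirm which disk is saturated
iostat -dx 5

# Raw read/write byte counters for a suspicious process
sudo cat /proc/<PID>/io
```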
-
👀 @CoREse Thank you for raising this issue. We will look into it and get back to you as soon as possible.
-
So far no other users appear to have reported the kind of extremely high disk reads (130 MB/s) you describe, and there is no official documentation or recent PR covering a similar bug or fix. Based on code analysis, the file service in lobe-chat-database frequently reads file contents or byte streams through the S3-compatible interface (MinIO, Garage, etc.). If there are a large number of files, or some logic repeatedly triggers these operations, disk reads can indeed spike; in particular, when the S3 backend is misconfigured or the network is unreliable, the application may keep retrying or doing full scans of files, further increasing I/O pressure. Similar cases show that when switching storage backends (e.g. from MinIO to Garage or Cloudflare R2), all related environment variables must be updated thoroughly, otherwise uploads/downloads break; this usually does not directly cause high disk reads unless a compatibility or configuration issue makes the application repeatedly hit local or remote storage [related discussion]. There is currently no dedicated performance-diagnostics or verbose-logging switch in the code or docs. A few things you could try:
If you can share further logs or reproduction details, please add them; they will help pinpoint the problem more precisely. To reply, just mention @dosu.
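Following up on the point about updating every storage-related variable when switching backends, one quick check is what the running container actually sees. This is only a sketch: the container name `lobe-chat-database` is a placeholder, and the `S3_*` variable names should be verified against the official LobeChat docs and your own compose file.

```bash
# Show the S3/MinIO-related environment the LobeChat container was started with
docker exec lobe-chat-database env | grep -iE 's3|minio'

# Variables typically involved when migrating from MinIO to Garage or R2
# (confirm the exact names for your version):
#   S3_ENDPOINT, S3_BUCKET, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_PUBLIC_DOMAIN
```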
-
If you think it might be a MinIO problem, you could try switching S3 to a third-party service such as Cloudflare R2. If the problem still occurs, it may be something related to pg. So far you seem to be the only one in the community who has hit this; I don't recall seeing similar reports before. Feedback after you've verified is welcome.
-
        
Update: I later split pgvector and MinIO out separately. pgvector has no effect on the result; MinIO and (lobe + casdoor) each run fine on their own, but running them together triggers the bug. Then I shut down the other services on the server, and running them together worked fine, so I suspect it's a Docker issue? Does Docker cause massive disk reads when memory usage is high (but not full)? It behaves the same whether or not a swapfile is configured. This would also explain why enabling v2ray earlier triggered the bug, since v2ray seems to use quite a lot of memory. After that I replaced MinIO with Garage; since Garage's memory footprint is much smaller than MinIO's, it ran fine, which matched my expectation. But when I browsed images in Lobe, it froze again on the second image, and sure enough it was the same bug. So I can think of two possibilities:
I haven't tried a commercial S3 solution yet.
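One way to test the memory-pressure hypothesis is to watch whether the read storm tracks shrinking page cache, and to cap the heaviest container's memory and see whether the behaviour changes. A rough sketch (`<container-name>` and the 2g limit are placeholders):

```bash
# Blocks read in (bi) alongside free/buff/cache memory, every 5 seconds;
# reads that climb as the cache shrinks point at page-cache thrashing
vmstat 5

# Memory usage and cumulative block I/O per container
docker stats --no-stream

# Temporarily cap a running container's memory
docker update --memory 2g --memory-swap 2g <container-name>
```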
-
        
While browsing images the server died, but not completely. Looking at top, the next-server (v15.3.5) process uses a lot of memory: over 600 MB of resident memory and 22 GB of virtual memory. I'm wondering whether it is the culprit: it drives memory usage up, which then triggers the Docker bug?
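A 22 GB virtual size for a Node.js process is mostly reserved address space, so the ~600 MB resident figure is the one to watch. To see whether that container is also the one generating the reads, and optionally to cap its heap, something like the following may help (a sketch: whether the image honours `NODE_OPTIONS`, and the service name `lobe`, are assumptions to verify):

```bash
# Resident memory and cumulative block I/O per container, sampled once
docker stats --no-stream

# Optionally cap the Node.js heap via the container environment, e.g. in compose:
#   services:
#     lobe:
#       environment:
#         - NODE_OPTIONS=--max-old-space-size=1536
```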
-
The next-server process should be LobeChat's, so that points to the source of the freeze. But the question is what circumstances trigger it. May I ask on which page you performed this operation?
-
        
Just the ordinary chat page. I have now split the Lobe frontend, the database, S3 and so on onto separate machines, with Lobe alone on a machine with plenty of memory, and the problem is gone. Of course that only means it's gone for me; whether the underlying problem still exists, I don't know.
-
Then I'll convert this into a Discussion for now, and we'll see whether other users in the community run into it in the future.
-
Update: it may well be MinIO after all; maybe I didn't wrongly blame it. Today a server of mine that doesn't have LobeChat installed, but does have MinIO, hit this problem again, and after rebooting I found that MinIO had crashed. I have no proof it was the cause, but the correlation is now very strong.
-
Update: confirmed, it is MinIO! My S3 has already been switched to rustfs, and I only wanted to look at how MinIO's policies were written. The moment I started MinIO, the server went down, and the server monitoring showed extremely high disk reads. After a forced reboot, opening MinIO again triggers it within about a minute (after I connect to the console and look at the policies, viewing roughly one policy is enough to bring the server down).
-
I ran into this problem too, and it also looks like a MinIO issue. I'll run some tests soon and report back. BTW, it may be related to the MinIO version I'm using: the last release that still ships the web management console, minio/minio:RELEASE.2025-04-22T22-12-26Z.
-
Progress update: after switching from minio/minio:RELEASE.2025-04-22T22-12-26Z to minio/minio:RELEASE.2025-04-08T15-41-24Z, the whole service has stabilized. I'll keep testing and plan to post another update in about a week. For now it looks like the version is the cause; using a slightly earlier release works.
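For reference, pinning the earlier release mentioned above just means using the explicit image tag. A minimal sketch (ports, credentials and the data path are placeholders for your own deployment):

```bash
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=<user> \
  -e MINIO_ROOT_PASSWORD=<password> \
  -v /srv/minio/data:/data \
  minio/minio:RELEASE.2025-04-08T15-41-24Z server /data --console-address ":9001"
```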