
Plugin daemon fails after ECS task redeployment with connection errors #406

@zukizukizuki

Description


Self Checks

To make sure we get back to you in time, please check the following :)

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • "Please do not modify this template :) and fill in all the required fields."

Versions

  1. dify-plugin-daemon Version: 0.1.3-local
  2. dify-api Version: 1.4.3

Describe the bug

Plugin daemon experiences network connection failures when ECS tasks are redeployed, causing plugins to become inaccessible. Two distinct but related issues occur:

Important Question: Is deploying new versions and restarting ECS tasks a supported use case for dify-plugin-daemon, or are we using the plugin daemon outside its intended scope?

  1. S3 Storage Issue (Previous): When using S3 for plugin storage, plugins fail to load after container restart with error "plugin_unique_identifier is not valid"
  2. Network Connection Issue (Current): After switching to EFS + local storage and forcing new deployment, plugin daemon fails to connect to itself using old IP addresses

To Reproduce

Steps to reproduce the behavior:

Issue 1: S3 Storage Problem

  1. Configure dify-plugin-daemon with PLUGIN_STORAGE_TYPE=aws_s3
  2. Install plugins (OpenAI, Bedrock) via Dify web interface
  3. Restart/redeploy ECS service
  4. Observe error logs: 2025/07/17 04:50:14 watcher.go:73: [ERROR]list installed plugins failed: plugin_unique_identifier is not valid:

Issue 2: Network Connection Problem (After EFS Migration)

  1. Configure dify-plugin-daemon with PLUGIN_STORAGE_TYPE=local and EFS mount
  2. Install plugins via Dify web interface
  3. Force a new deployment via an ECS service update (see the CLI sketch after this list)
  4. Check logs and see connection errors to old task IP addresses
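
For reference, step 3 corresponds to a forced ECS deployment along these lines (cluster and service names are placeholders for our environment):

# Replace the running tasks; in awsvpc mode the new tasks get new ENIs and IP addresses
aws ecs update-service \
  --cluster dify-cluster \
  --service dify-plugin-daemon \
  --force-new-deployment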

Expected behavior

Plugin daemon should:

  1. Successfully reload installed plugins after container restarts/redeployments
  2. Handle IP address changes gracefully without requiring manual intervention
  3. Maintain plugin state consistency across deployments

Screenshots

Error Logs - S3 Storage Issue:

2025/07/17 04:50:14 watcher.go:73: [ERROR]list installed plugins failed: 
plugin_unique_identifier is not valid: 

Error Logs - Network Connection Issue:

2025/07/18 05:14:00 middleware.go:131: [ERROR]redirect request failed: Post "http://169.254.172.2:5002/plugin/a5df51ca-fba9-4170-8369-4ae0eff4f543/dispatch/model/schema": dial tcp 169.254.172.2:5002: connect: cannot assign requested address

2025/07/18 05:14:00 factory.go:28: [ERROR]PluginDaemonInternalServerError: redirect request failed: Post "http://169.254.172.2:5002/plugin/a5df51ca-fba9-4170-8369-4ae0eff4f543/dispatch/model/schema": dial tcp 169.254.172.2:5002: connect: cannot assign requested address

Environment Configuration:

SERVER_HOST=0.0.0.0
SERVER_PORT=5002
PLUGIN_WORKING_PATH=/app/shared_plugins
PLUGIN_INSTALLED_PATH=installed
PLUGIN_PACKAGE_CACHE_PATH=packages
PLUGIN_STORAGE_TYPE=local
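
For local reproduction, the same configuration can be approximated with a plain container run (the image tag matches our build and the mount source is a placeholder for the EFS path):

# Rough local equivalent of the ECS task definition (paths and image tag are ours; adjust as needed)
docker run --rm \
  -p 5002:5002 \
  -e SERVER_HOST=0.0.0.0 \
  -e SERVER_PORT=5002 \
  -e PLUGIN_WORKING_PATH=/app/shared_plugins \
  -e PLUGIN_INSTALLED_PATH=installed \
  -e PLUGIN_PACKAGE_CACHE_PATH=packages \
  -e PLUGIN_STORAGE_TYPE=local \
  -v /mnt/efs/shared_plugins:/app/shared_plugins \
  langgenius/dify-plugin-daemon:0.1.3-local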

Additional context

Infrastructure Details:

  • Platform: AWS ECS Fargate
  • Network: awsvpc mode with dynamic IP assignment
  • Storage:
    • Initial setup: S3 for plugin storage
    • Current setup: EFS (NFSv4.1) mounted at /app/shared_plugins
  • Database: Aurora PostgreSQL 15.10

Analysis:

  1. S3 Issue: Suggests the plugin metadata in the database becomes inconsistent with the actual S3 storage
  2. Network Issue: The plugin daemon appears to cache/store IP addresses internally, causing connection failures when the container IP changes

Potential Root Causes:

  1. Plugin daemon may be storing absolute network references instead of using localhost/relative addressing
  2. Database plugin metadata may contain stale network configuration (a way to inspect this is sketched after this list)
  3. Plugin state management may not be designed for ephemeral container environments
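
To check hypothesis 2, the plugin tables can be inspected directly. The sketch below uses placeholder connection details and only the table names referenced in the workaround; it does not assume any particular columns:

# Inspect plugin metadata for stale state (connection details are placeholders)
psql "host=<aurora-endpoint> dbname=dify user=dify" \
  -c '\d plugin_installations' \
  -c 'SELECT * FROM plugin_installations LIMIT 5;' \
  -c 'SELECT * FROM plugins LIMIT 5;'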

Workaround Applied:

Complete plugin cleanup and reinstallation after each deployment:

# Clear EFS plugin data
rm -rf /app/shared_plugins/langgenius/

# Clear database plugin references
DELETE FROM plugins;
DELETE FROM plugin_installations;
DELETE FROM plugin_declarations;
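
In practice the cleanup was run roughly as follows (cluster, task, container, and connection details are placeholders; ECS Exec must be enabled on the service):

# Remove plugin files inside the running task (requires ECS Exec)
aws ecs execute-command \
  --cluster dify-cluster \
  --task <task-id> \
  --container dify-plugin-daemon \
  --interactive \
  --command "rm -rf /app/shared_plugins/langgenius/"

# Remove database plugin references
psql "host=<aurora-endpoint> dbname=dify user=dify" \
  -c 'DELETE FROM plugins;' \
  -c 'DELETE FROM plugin_installations;' \
  -c 'DELETE FROM plugin_declarations;'

After this, the plugins are reinstalled through the Dify web interface.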

Request:

Is there a configuration option or best practice to make plugin daemon more resilient to container restarts and IP address changes in containerized environments like ECS Fargate?
