
Preview proxy returns 400 "bad request: 404 Not Found" after sandbox idle period #3846

@radisicc


Labels: bug, proxy, preview


Description

The Daytona preview proxy (*.proxy.daytona.works) intermittently returns HTTP 400 errors to browsers after a tab has been idle for 5+ seconds. The error body is:

{"statusCode": 400, "message": "bad request: 404 Not Found", "code": "BAD_REQUEST"}

Reproduction Steps

  1. Create a sandbox with a Vite dev server on port 5173
  2. Get a preview URL (signed or regular) and load it in a browser
  3. Leave the tab idle for 5–30+ seconds
  4. Return to the tab — page resources fail with 400 errors in the browser console:
    GET /@vite/client       400 (Bad Request)
    GET /@react-refresh     400 (Bad Request)
    GET /src/App.tsx        400 (Bad Request)
    
  5. Manual page refresh immediately fixes it

Connection Architecture

Browser ──HTTP/2──> Daytona Cloud Proxy (*.proxy.daytona.works)
                         │
                         │  persistent TCP connection
                         ▼
                    Daytona Daemon inside sandbox (:2280)
                         │
                         │  HTTP/1.1 pooled connection
                         ▼
                    App server (e.g. Vite :5173)

Root Cause

Node.js's default HTTP server keepAliveTimeout is 5 seconds. After 5s of inactivity, Vite (or any Node.js-based dev server) closes the pooled connection on its side.
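The 5-second default can be confirmed directly from Node itself; a freshly created HTTP server reports it without any Vite involvement (values below are Node's documented defaults, not anything Daytona-specific):

```typescript
import http from 'node:http'

// A server created with no options uses Node's defaults: idle keep-alive
// sockets are closed after keepAliveTimeout ms of inactivity.
const server = http.createServer()
console.log(server.keepAliveTimeout)  // 5000 — the 5s window described above
console.log(server.headersTimeout)    // 60000 — request-header deadline, kept > keepAliveTimeout
```

Any Node-based dev server that does not override `keepAliveTimeout` inherits this behavior, which is why the bug is not specific to Vite.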

When the browser returns from idle:

  1. The Daytona daemon still holds a reference to the now-dead connection in its pool
  2. The daemon attempts to reuse the stale connection for the incoming request
  3. The backend (Vite) has already closed it → daemon receives a connection-reset / EOF error
  4. The proxy wraps this as 400 {"message": "bad request: 404 Not Found"}

Evidence

  • Vite response header confirms: keep-alive: timeout=5
  • ss -tnp inside the sandbox shows the daemon (PID 1) maintains an HTTP/1.1 connection to the backend
  • That connection disappears from ss -tnp after exactly 5–8 seconds of no traffic
  • The /health-coder endpoint on the same Vite port returns 200 — the server IS running; the issue is the stale pooled connection, not the server being down
  • The error is intermittent because it is time-dependent: any request within 5s of the last one succeeds
  • Two load-balanced proxy IPs were observed (100.52.152.155, 35.175.80.172), each maintaining its own persistent connection to the sandbox daemon — this explains why the failure is not 100% reproducible across reloads
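The time-dependence in the evidence above can be measured from the client side with a small probe that issues requests separated by increasing idle gaps and records the status of each. This is an illustrative sketch (the URL and gap values are placeholders); against an affected preview URL, gaps beyond ~5s would be expected to surface the 400s:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

// For each idle gap, wait, then issue one request and record the HTTP status
// (or the error string if the request fails outright).
async function probeIdleGaps(
  url: string,
  gapsMs: number[],
): Promise<Record<number, number | string>> {
  const results: Record<number, number | string> = {}
  for (const gap of gapsMs) {
    await sleep(gap)
    try {
      const res = await fetch(url)
      await res.text()          // drain the body so the socket is released
      results[gap] = res.status // 400s expected once the gap exceeds ~5s
    } catch (err) {
      results[gap] = String(err)
    }
  }
  return results
}
```

Example: `probeIdleGaps(previewUrl, [1000, 4000, 6000, 10000])` against an affected sandbox should show the transition from 200 to 400 between the 4s and 6s gaps.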

Workaround (Client-Side)

Extend Node.js HTTP server keepAliveTimeout in the Vite config so the backend never closes the connection before the daemon's pool TTL expires.

vite.config.ts

import { defineConfig, type Plugin } from 'vite'
import react from '@vitejs/plugin-react'

function extendKeepAlivePlugin(): Plugin {
  return {
    name: 'extend-keep-alive',
    configureServer(server) {
      const apply = () => {
        if (server.httpServer) {
          // Default Node.js keepAliveTimeout is 5s. The Daytona daemon pools
          // connections to the backend and may reuse one after Vite has already
          // closed it, resulting in 400 errors. Extending this timeout prevents
          // Vite from closing the connection before the daemon's pool TTL.
          server.httpServer.keepAliveTimeout = 120_000  // 2 minutes
          server.httpServer.headersTimeout   = 121_000  // must be > keepAliveTimeout
        }
      }
      // httpServer may not be bound yet at configureServer time; hook both
      apply()
      server.httpServer?.on('listening', apply)
    }
  }
}

export default defineConfig({
  plugins: [react(), extendKeepAlivePlugin()],
})

Verified: with this fix applied, all requests succeed after 60–120s of idle. ss -tnp confirms the daemon reuses the same pooled connection (same source port) rather than hitting a dead one.


Requested Fix (Proxy / Daemon Side)

The daemon should handle stale connection reuse transparently so users do not need to configure their applications around proxy internals.

  • Option A — Validate before reuse: probe the pooled connection before sending a real request. If it is dead, open a new one and retry transparently.
  • Option B — Shorten pool TTL: evict pooled connections from the daemon pool after < 5s (before the backend closes them), so the daemon always opens a fresh connection.
  • Option C — Retry on connection-reset: if the backend returns a connection-reset / EOF error, automatically retry on a new connection before surfacing an error to the client.
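Option C is the pattern Node's own HTTP client makes easy to implement, since a client request exposes whether it ran on a reused pooled socket. The daemon is not written in TypeScript; this is only a sketch of the control flow, and every name in it is illustrative rather than part of any Daytona API:

```typescript
import http from 'node:http'

const agent = new http.Agent({ keepAlive: true })

// Forward one request to the backend; if a *reused* pooled socket turns out
// to be dead (the backend closed it while idle), retry once on a fresh
// connection instead of surfacing an error to the client.
function proxyRequest(
  options: http.RequestOptions,
  attempt = 0,
): Promise<http.IncomingMessage> {
  return new Promise((resolve, reject) => {
    const req = http.request({ ...options, agent }, resolve)
    req.on('error', (err: NodeJS.ErrnoException) => {
      const stale = err.code === 'ECONNRESET' || err.code === 'EPIPE'
      if (stale && req.reusedSocket && attempt === 0) {
        // The agent has already evicted the dead socket, so the retry
        // necessarily opens a new connection to the backend.
        resolve(proxyRequest(options, attempt + 1))
      } else {
        reject(err)
      }
    })
    req.end()
  })
}
```

The `reusedSocket` check matters: an error on a brand-new connection means the backend is genuinely unreachable and should be reported (as a 502, per the note below), not retried indefinitely.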

Note: Returning 400 to the client for a backend connection-reset is semantically incorrect. A 502 Bad Gateway would at minimum be more accurate; a silent retry is the correct behavior.


Discussion points

  1. What is the pool TTL configured in the daemon for backend connections?
  2. Does the daemon implement stale connection detection (TCP keepalive probes or a health check before reuse)?
  3. Which component generates the 400 {"message": "bad request: 404 Not Found"} response — the cloud proxy or the daemon?
  4. Is the pool TTL configurable per-sandbox or globally?

Environment

  • Daytona daemon: v0.143.0-prod
  • Vite: 7.3.1
  • Proxy: *.proxy.daytona.works (2 load-balanced IPs)
  • Sandbox runtime: Node.js via NVM
