
Preview proxy returns 400 "bad request: 404 Not Found" after sandbox idle period #3846

@radisicc


Labels: bug, proxy, preview


Description

The Daytona preview proxy (*.proxy.daytona.works) intermittently returns HTTP 400 errors to browsers after a tab has been idle for 5+ seconds. The error body is:

{"statusCode": 400, "message": "bad request: 404 Not Found", "code": "BAD_REQUEST"}

Reproduction Steps

  1. Create a sandbox with a Vite dev server on port 5173
  2. Get a preview URL (signed or regular) and load it in a browser
  3. Leave the tab idle for 5–30+ seconds
  4. Return to the tab — page resources fail with 400 errors in the browser console:
    GET /@vite/client       400 (Bad Request)
    GET /@react-refresh     400 (Bad Request)
    GET /src/App.tsx        400 (Bad Request)
    
  5. Manual page refresh immediately fixes it

Connection Architecture

Browser ──HTTP/2──> Daytona Cloud Proxy (*.proxy.daytona.works)
                         │
                         │  persistent TCP connection
                         ▼
                    Daytona Daemon inside sandbox (:2280)
                         │
                         │  HTTP/1.1 pooled connection
                         ▼
                    App server (e.g. Vite :5173)

Root Cause

Node.js's default HTTP server keepAliveTimeout is 5 seconds. After 5s of inactivity, Vite (or any Node.js-based dev server) closes the pooled connection on its side.
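The 5-second default can be confirmed directly from Node itself; a freshly created HTTP server reports it without any Vite involvement (values below are Node's documented defaults, not anything Daytona-specific):

```typescript
import http from 'node:http'

// A server created with no options uses Node's defaults: idle keep-alive
// sockets are closed after keepAliveTimeout ms of inactivity.
const server = http.createServer()
console.log(server.keepAliveTimeout)  // 5000 — the 5s window described above
console.log(server.headersTimeout)    // 60000 — request-header deadline, kept > keepAliveTimeout
```

Any Node-based dev server that does not override `keepAliveTimeout` inherits this behavior, which is why the bug is not specific to Vite.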

When the browser returns from idle:

  1. The Daytona daemon still holds a reference to the now-dead connection in its pool
  2. The daemon attempts to reuse the stale connection for the incoming request
  3. The backend (Vite) has already closed it → daemon receives a connection-reset / EOF error
  4. The proxy wraps this as 400 {"message": "bad request: 404 Not Found"}

Evidence

  • Vite response header confirms: keep-alive: timeout=5
  • ss -tnp inside the sandbox shows the daemon (PID 1) maintains an HTTP/1.1 connection to the backend
  • That connection disappears from ss -tnp after exactly 5–8 seconds of no traffic
  • The /health-coder endpoint on the same Vite port returns 200 — the server IS running; the issue is the stale pooled connection, not the server being down
  • The error is intermittent because it is time-dependent: any request within 5s of the last one succeeds
  • Two load-balanced proxy IPs were observed (100.52.152.155, 35.175.80.172), each maintaining its own persistent connection to the sandbox daemon — this explains why the failure is not 100% reproducible across reloads
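The time-dependence in the evidence above can be measured from the client side with a small probe that issues requests separated by increasing idle gaps and records the status of each. This is an illustrative sketch (the URL and gap values are placeholders); against an affected preview URL, gaps beyond ~5s would be expected to surface the 400s:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms))

// For each idle gap, wait, then issue one request and record the HTTP status
// (or the error string if the request fails outright).
async function probeIdleGaps(
  url: string,
  gapsMs: number[],
): Promise<Record<number, number | string>> {
  const results: Record<number, number | string> = {}
  for (const gap of gapsMs) {
    await sleep(gap)
    try {
      const res = await fetch(url)
      await res.text()          // drain the body so the socket is released
      results[gap] = res.status // 400s expected once the gap exceeds ~5s
    } catch (err) {
      results[gap] = String(err)
    }
  }
  return results
}
```

Example: `probeIdleGaps(previewUrl, [1000, 4000, 6000, 10000])` against an affected sandbox should show the transition from 200 to 400 between the 4s and 6s gaps.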

Workaround (Client-Side)

Extend Node.js HTTP server keepAliveTimeout in the Vite config so the backend never closes the connection before the daemon's pool TTL expires.

vite.config.ts

import { defineConfig, type Plugin } from 'vite'
import react from '@vitejs/plugin-react'

function extendKeepAlivePlugin(): Plugin {
  return {
    name: 'extend-keep-alive',
    configureServer(server) {
      const apply = () => {
        if (server.httpServer) {
          // Default Node.js keepAliveTimeout is 5s. The Daytona daemon pools
          // connections to the backend and may reuse one after Vite has already
          // closed it, resulting in 400 errors. Extending this timeout prevents
          // Vite from closing the connection before the daemon's pool TTL.
          server.httpServer.keepAliveTimeout = 120_000  // 2 minutes
          server.httpServer.headersTimeout   = 121_000  // must be > keepAliveTimeout
        }
      }
      // httpServer may not be bound yet at configureServer time; hook both
      apply()
      server.httpServer?.on('listening', apply)
    }
  }
}

export default defineConfig({
  plugins: [react(), extendKeepAlivePlugin()],
})

Verified: with this fix applied, all requests succeed after 60–120s of idle. ss -tnp confirms the daemon reuses the same pooled connection (same source port) rather than hitting a dead one.


Requested Fix (Proxy / Daemon Side)

The daemon should handle stale connection reuse transparently so users do not need to configure their applications around proxy internals.

  • Option A — Validate before reuse: probe the pooled connection before sending a real request. If it is dead, open a new one and retry transparently.
  • Option B — Shorten pool TTL: evict pooled connections from the daemon pool after < 5s (before the backend closes them), so the daemon always opens a fresh connection.
  • Option C — Retry on connection-reset: if the backend returns a connection-reset / EOF error, automatically retry on a new connection before surfacing an error to the client.
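Option C is the pattern Node's own HTTP client makes easy to implement, since a client request exposes whether it ran on a reused pooled socket. The daemon is not written in TypeScript; this is only a sketch of the control flow, and every name in it is illustrative rather than part of any Daytona API:

```typescript
import http from 'node:http'

const agent = new http.Agent({ keepAlive: true })

// Forward one request to the backend; if a *reused* pooled socket turns out
// to be dead (the backend closed it while idle), retry once on a fresh
// connection instead of surfacing an error to the client.
function proxyRequest(
  options: http.RequestOptions,
  attempt = 0,
): Promise<http.IncomingMessage> {
  return new Promise((resolve, reject) => {
    const req = http.request({ ...options, agent }, resolve)
    req.on('error', (err: NodeJS.ErrnoException) => {
      const stale = err.code === 'ECONNRESET' || err.code === 'EPIPE'
      if (stale && req.reusedSocket && attempt === 0) {
        // The agent has already evicted the dead socket, so the retry
        // necessarily opens a new connection to the backend.
        resolve(proxyRequest(options, attempt + 1))
      } else {
        reject(err)
      }
    })
    req.end()
  })
}
```

The `reusedSocket` check matters: an error on a brand-new connection means the backend is genuinely unreachable and should be reported (as a 502, per the note below), not retried indefinitely.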

Note: Returning 400 to the client for a backend connection-reset is semantically incorrect. A 502 Bad Gateway would at minimum be more accurate; a silent retry is the correct behavior.


Discussion points

  1. What is the pool TTL configured in the daemon for backend connections?
  2. Does the daemon implement stale connection detection (TCP keepalive probes or a health check before reuse)?
  3. Which component generates the 400 {"message": "bad request: 404 Not Found"} response — the cloud proxy or the daemon?
  4. Is the pool TTL configurable per-sandbox or globally?

Environment

  • Daytona daemon: v0.143.0-prod
  • Vite: 7.3.1
  • Proxy: *.proxy.daytona.works (2 load-balanced IPs)
  • Sandbox runtime: Node.js via NVM
