
[Feature]: Implement persistent user sessions to survive browser close, page refresh, and container restarts #26

@MatthewGrimshaw

Description


What problem are you trying to solve?

When running the Azure FinOps agent in a large production tenant, users lose all session context — conversation history, Azure connection state, OAuth tokens, uploaded file references, maturity scores, and tool call results — whenever the browser tab is closed, the page is refreshed, or the container restarts. In tenants with thousands of subscriptions, some queries take minutes to complete. Losing that context forces users to re-authenticate, re-query, and re-analyze from scratch, which is a significant productivity loss and a poor experience.

Root cause analysis based on the current codebase:

  1. Backend session store is in-memory only: Program.cs registers AddDistributedMemoryCache(), which despite its name is a non-distributed, in-process memory store. All session data (OAuth tokens, refresh tokens, user identity, session timestamps) is lost when the container restarts, scales out, or recycles. There is no persistence layer.

  2. Conversation history lives only in Vue reactive state: ChatView.vue stores all messages in const messages = ref([]) (line 1717). This is purely in-memory JavaScript state; a page refresh, browser close, or navigation clears it completely. Nothing is written to localStorage, sessionStorage, or any backend store.

  3. User identity is random and ephemeral: Program.cs (line ~224) assigns each new browser session a crypto-random sessionUserId via RandomNumberGenerator.GetInt32(). This ID is not tied to the user's Entra ID object ID (OID), so after re-authentication the same human gets a different userId, making it impossible to reconnect to previous session state.

  4. Per-user state is keyed by the ephemeral ID: AiTelemetry.cs stores UserSessions, UserTokens, and UserTools in ConcurrentDictionary<long, ...> keyed by the random userId. When the user starts a new session (browser restart), they get a new userId, orphaning all prior in-memory state; the UserStateJanitor then evicts the orphaned entries after 1 hour.

  5. No horizontal scaling support — with DistributedMemoryCache, running multiple container instances (App Service scale-out, Container Apps replicas) means each instance has its own isolated session store. A user routed to instance B after authenticating on instance A will appear unauthenticated.

What users lose on page refresh:

  • All conversation messages and AI responses
  • All rendered charts, maturity scores, and generated scripts
  • Tool call history and timing data
  • Uploaded file references and analysis results

What users lose on container restart (in addition to above):

  • OAuth access tokens and refresh tokens (must re-authenticate via Entra ID)
  • Azure connection state (which APIs are consented)
  • Copilot SDK session (subprocess terminated)

Proposed solution

A three-layer approach: bind identity to Entra ID, persist conversations in the browser, and persist backend sessions in Redis. Each layer is independently valuable and can be shipped incrementally.

Layer 1: Bind user identity to Entra ID OID (deterministic identity)

Files: src/Dashboard/Auth/MicrosoftAuthEndpoints.cs, src/Dashboard/Program.cs

Currently, the user is identified by a random number generated at first request. After the Entra ID OAuth callback completes, the user's id_token contains an oid (object ID) claim — a stable GUID that uniquely identifies the user across all sessions and devices. The userId should be derived deterministically from this OID.

Implementation:

  • After successful OAuth token exchange in MicrosoftAuthEndpoints.cs, extract the oid claim from the validated id_token.
  • Derive a deterministic long userId from the OID: BitConverter.ToInt64(SHA256.HashData(Encoding.UTF8.GetBytes(oid)), 0). This is a stable, non-reversible mapping.
  • Update the session user JSON object to include the Entra oid, name, email, and avatar (from id_token claims) alongside the deterministic userId.
  • Keep the anonymous random-ID assignment in Program.cs as-is for unauthenticated visitors, but overwrite it with the Entra-derived identity upon successful OAuth login.
  • When a user authenticates and their deterministic userId already exists in AiTelemetry.UserTokens / UserSessions / UserTools, reconnect to the existing state instead of creating new entries. This enables session resumption after page refresh (the session cookie is still valid and maps to the same Entra OID).
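The proposed mapping can be sketched as follows. TypeScript is used here purely for illustration (the actual change is C#: BitConverter.ToInt64(SHA256.HashData(Encoding.UTF8.GetBytes(oid)), 0)); the function name deriveUserId is hypothetical:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of the proposed OID -> userId mapping.
// The C# BitConverter.ToInt64 call reads the first 8 bytes of the
// SHA-256 digest as a little-endian signed 64-bit integer (on
// little-endian hardware), so we mirror that here.
function deriveUserId(oid: string): bigint {
  const digest = createHash("sha256").update(oid, "utf8").digest();
  return digest.readBigInt64LE(0); // stable across sessions, one-way
}
```

The same OID always yields the same userId, and recovering the OID from the userId would require inverting SHA-256.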

Security:

  • The OID is never exposed to the client — only the derived hash-based userId.
  • The mapping is one-way (SHA-256); the OID cannot be recovered from the userId.
  • Session binding remains via the encrypted ASP.NET session cookie (HttpOnly, Secure, SameSite=Lax).

Layer 2: Persist conversation history in frontend localStorage

File: src/Dashboard/frontend/src/components/ChatView.vue

Store the conversation message array in localStorage so it survives page refresh and browser close. This is the highest-impact, zero-cost change.

Implementation:

  • Define a localStorage key scoped to the authenticated user: finops-chat-${userOid} where userOid comes from the /api/me response (the Entra OID or a hash of it exposed via a user info endpoint). For unauthenticated sessions, use the session-assigned userId from the existing user prop.
  • After each completed assistant message (when the SSE stream ends in the send() function), serialize the messages array to localStorage. Only store serializable display data: role, content (markdown text), toolCalls (name + duration, not raw results), and charts (ECharts option JSON). Exclude non-serializable or security-sensitive data.
  • On component mount (onMounted), check for a stored conversation and restore it into messages.value. Show a brief toast: "Previous conversation restored" with a "Clear" action.
  • Cap stored history at 50 messages (most recent) and 2 MB total to avoid localStorage quota issues. Trim oldest messages when the cap is exceeded.
  • On explicit clearMessages(), remove the localStorage entry alongside resetting Vue state.
  • Never store OAuth tokens, refresh tokens, subscription IDs, or raw API responses in localStorage — only the rendered conversation display text.
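A minimal sketch of the save/restore helpers described above (names and shapes are illustrative, not existing code in ChatView.vue; the storage object is a parameter so window.localStorage can be passed in production and a stub in tests):

```typescript
// Display-safe message shape per the proposal: role, markdown content,
// tool call summaries, and ECharts option JSON -- never tokens or raw API data.
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
  toolCalls?: { name: string; durationMs: number }[];
  charts?: unknown[];
}

interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const MAX_MESSAGES = 50;
const MAX_SIZE = 2 * 1024 * 1024; // ~2 MB cap (measured in UTF-16 code units)

function saveConversation(storage: StorageLike, key: string, messages: StoredMessage[]): void {
  let kept = messages.slice(-MAX_MESSAGES); // keep only the most recent 50
  let json = JSON.stringify(kept);
  while (json.length > MAX_SIZE && kept.length > 1) {
    kept = kept.slice(1); // trim oldest until under the size cap
    json = JSON.stringify(kept);
  }
  storage.setItem(key, json);
}

function restoreConversation(storage: StorageLike, key: string): StoredMessage[] {
  try {
    return JSON.parse(storage.getItem(key) ?? "[]");
  } catch {
    return []; // corrupted entry: start with an empty conversation
  }
}
```

In send(), after the SSE stream ends, the component would call saveConversation(window.localStorage, `finops-chat-${userOid}`, ...) with the serializable subset of messages.value; onMounted would call restoreConversation and show the toast.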

Security:

  • localStorage is origin-scoped by the browser (same-origin policy). No other domain can read it.
  • Conversation data is keyed per-user — switching Entra accounts shows only that user's history.
  • Sensitive raw API data (token values, full ARM responses) is never persisted — only the LLM-generated summary text and chart configs.
  • The existing CSP headers (default-src 'self') prevent XSS injection that could exfiltrate localStorage.

Layer 3: Replace in-memory session store with Azure Cache for Redis

Files: src/Dashboard/Program.cs, src/Dashboard/Dashboard.csproj, src/Dashboard/appsettings.json

Replace AddDistributedMemoryCache() with AddStackExchangeRedisCache() so backend session data (OAuth tokens, refresh tokens, session metadata) persists across container restarts and is shared across multiple instances.

Implementation:

  • Add NuGet package: Microsoft.Extensions.Caching.StackExchangeRedis.
  • In Program.cs, replace:
    builder.Services.AddDistributedMemoryCache();
    with:
    var redisConnection = builder.Configuration["Redis:ConnectionString"];
    if (!string.IsNullOrEmpty(redisConnection))
    {
        builder.Services.AddStackExchangeRedisCache(options =>
        {
            options.Configuration = redisConnection;
            options.InstanceName = "finops:";
        });
    }
    else
    {
        // Fallback for local dev without Redis
        builder.Services.AddDistributedMemoryCache();
    }
  • Add Redis connection string to appsettings.json (empty default) and document the environment variable Redis__ConnectionString for production.
  • Keep the session idle timeout at 60 minutes and the absolute timeout at 8 hours (unchanged).
  • Ensure the Redis instance uses TLS (Azure Cache for Redis enforces this by default on port 6380).
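The appsettings.json shape could look like this (a sketch; the key name matches the proposal, and the production value would come from the Redis__ConnectionString environment variable or Key Vault, never source control):

```json
{
  "Redis": {
    "ConnectionString": ""
  }
}
```

In production, a typical Azure Cache for Redis value takes the StackExchange.Redis connection string form: <name>.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False.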

Azure resource selection:

  • Azure Cache for Redis Basic C0 ($16/mo, 250 MB, no SLA) — sufficient for dev/test and small deployments. Session data per user is ~2-4 KB (tokens + metadata), so 250 MB supports ~60,000 concurrent user sessions.
  • Azure Cache for Redis Standard C0 ($51/mo, 250 MB, 99.9% SLA, replication) — recommended for production. Automatic failover ensures session continuity.
  • Azure Managed Redis (Balanced B0) — newer SKU, evaluate if available in target region.

For the absolute lowest cost, Layer 2 (frontend localStorage) alone solves the conversation history problem with zero Azure cost. Layer 3 adds backend resilience and horizontal scaling.

Security:

  • Redis connection uses TLS (enforced by Azure Cache for Redis).
  • Access is via connection string with access key (stored in App Service configuration / Key Vault, never in source).
  • Session data in Redis is encrypted by ASP.NET Data Protection (the existing AddDataProtection() configuration persists keys to /home/dataprotection-keys/, which on App Service is durable storage).
  • Redis network access should be restricted to the App Service VNet via Private Endpoint or firewall rules.
  • All existing session security properties are preserved: HttpOnly cookies, Secure flag, SameSite=Lax, 8-hour absolute timeout, CSRF validation.

Files to Modify

  • src/Dashboard/Program.cs: replace AddDistributedMemoryCache() with a conditional Redis/memory cache; keep the anonymous ID assignment but add an OID override path
  • src/Dashboard/Dashboard.csproj: add the Microsoft.Extensions.Caching.StackExchangeRedis package reference
  • src/Dashboard/appsettings.json: add the Redis:ConnectionString configuration key
  • src/Dashboard/Auth/MicrosoftAuthEndpoints.cs: after the OAuth callback, extract the Entra OID from the id_token, derive the deterministic userId, and update the session user object
  • src/Dashboard/AI/ChatEndpoints.cs: when an authenticated user reconnects (same deterministic userId), reattach to existing UserTokens/UserTools if available
  • src/Dashboard/frontend/src/components/ChatView.vue: add localStorage save/restore for the messages array; scope by user identity; cap at 50 messages / 2 MB
  • src/Dashboard/Endpoints/MetaEndpoints.cs: add a GET /api/me endpoint returning user display info and an identity hash (for the frontend localStorage key), if not already present

Acceptance Criteria

  • After Entra ID login, userId is derived deterministically from the user's Entra OID via SHA-256 hash, replacing the random crypto ID
  • The same human logging in from a different browser tab or after a page refresh gets the same userId and reconnects to existing backend state (tokens, Copilot session) if still alive
  • Conversation messages are saved to localStorage after each completed assistant response, scoped by user identity
  • On page load, previous conversation is restored from localStorage with a visual indicator and a "Clear" option
  • localStorage entries are capped at 50 messages and 2 MB; oldest messages are trimmed when exceeded
  • No OAuth tokens, refresh tokens, or raw API response bodies are stored in localStorage — only display-safe conversation text and chart configs
  • AddDistributedMemoryCache() is replaced with AddStackExchangeRedisCache() when a Redis connection string is configured, with fallback to in-memory for local development
  • Session data (tokens, refresh tokens, user metadata) survives container restarts when Redis is configured
  • Multiple container instances share session state via Redis (horizontal scaling works)
  • All existing security properties are preserved: HttpOnly + Secure + SameSite=Lax cookies, 8-hour absolute timeout, CSRF validation, Data Protection encryption


Area

New AI tool (Azure / Graph / Log Analytics)

Alternatives considered

1. Server-side conversation persistence in a database (Cosmos DB / SQL)

Store the full conversation history in Azure Cosmos DB or Azure SQL. Rejected because it introduces significant infrastructure cost ($25+/mo for Cosmos DB serverless, $5+/mo for SQL Basic), requires schema design for conversation documents, adds write latency to every message, and is over-engineered for the primary use case (surviving a page refresh). The LLM context window is the true conversation memory — the frontend display history is a UX convenience, not a source of truth.

2. Server-side conversation persistence in SQLite

Use an embedded SQLite database file for conversation storage. Rejected because SQLite is single-writer and cannot support horizontal scaling (multiple container instances). It also adds file locking complexity in containerized environments and doesn't solve the session token persistence problem (still needs a distributed cache for OAuth tokens).

3. Azure Blob Storage for session state

Use Azure Blob Storage as the distributed cache backend. Rejected because blob storage has higher latency per operation (~50-100ms) compared to Redis (~1-5ms), and ASP.NET session middleware reads the session on every request. At 50-100ms per request, this would add noticeable latency to every page load and API call. Blob storage is optimized for throughput, not the low-latency key-value access pattern that sessions require.

4. Sticky sessions (ARR affinity) instead of distributed cache

Enable Application Request Routing affinity on App Service to pin users to a specific instance. Rejected because it doesn't survive instance restarts or scale-in events, creates uneven load distribution, and is explicitly an anti-pattern for production workloads per Azure's own guidance. It also doesn't solve the container restart problem.

5. Frontend sessionStorage instead of localStorage

Use sessionStorage for conversation history. Rejected because sessionStorage is scoped to the browser tab and cleared when the tab is closed — which is the exact problem we're trying to solve. localStorage persists across tab/browser close and is the correct storage scope for this use case.

6. IndexedDB for frontend persistence

Use IndexedDB for structured conversation storage with larger capacity. Deferred — IndexedDB supports larger storage quotas and structured queries, but adds API complexity (async, transactional) for a simple key-value use case. localStorage's 5-10 MB limit is more than sufficient for 50 messages of conversation text. IndexedDB can be considered later if conversation history needs grow (e.g. storing full tool call results, multi-conversation management).

7. Encrypt localStorage data with a per-session key

Encrypt the conversation history in localStorage using a key derived from the session cookie or user credentials. Deferred — the conversation data stored in localStorage is the same text visible on screen to the user. It contains no tokens or secrets. The origin-scoping of localStorage plus the existing CSP headers provide sufficient protection. Encryption would add complexity (key management, performance overhead) for marginal security benefit. If compliance requirements mandate it, this can be added as a follow-up using the Web Crypto API.

Metadata


Labels

enhancement (New feature or request)
