Harden dev deploy pipeline and add OpenTelemetry instrumentation#84
Conversation
Initialize OTEL in app lifespan with OTLP when configured, extend logging to bridge stdlib and third-party loggers, and add tests. Document OTEL env vars in .env.example. Ignore local Terraform plan outputs in .gitignore. Trim trailing whitespace in platform module README. Made-with: Cursor
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds configurable OpenTelemetry initialization and log–trace correlation, conditional App Hub registration/adoption, Firestore PITR/deletion policy inputs, Cloud Run probe and sidecar adjustments, Terraform workflow/CI inputs and docs updates, and accompanying tests and minor deployment script/workflow tweaks. Changes
Sequence DiagramsequenceDiagram
participant App as FastAPI App
participant OTEL as OTEL SDK
participant Instr as FastAPI/HTTPX Instrumentors
participant Collector as OTLP Collector
participant CloudLog as Cloud Logging
Note over App,OTEL: Startup
App->>OTEL: setup_opentelemetry()
OTEL->>OTEL: init TracerProvider & MeterProvider
OTEL->>Collector: register OTLP span/metric exporters
App->>Instr: instrument_app(app)
Note over App,OTEL: Request flow
App->>OTEL: start span
App->>App: emit log
App->>OTEL: (log processor) get_current_span()
OTEL-->>App: span context
App->>CloudLog: emit JSON with logging.googleapis.com/trace/spanId/trace_sampled
OTEL->>Collector: export spans/metrics
Collector->>CloudLog: telemetry ingestion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Review rate limit: 3/5 reviews remaining, refill in 22 minutes and 1 second. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/terraform.yml (1)
184-190: 🧹 Nitpick | 🔵 TrivialDocument operational implications of
-refresh=falsein the workflow.Using
-refresh=falseavoids the Monitoring API refresh race condition (as documented in lines 86-88), but it means Terraform won't detect out-of-band drift during CI runs. If resources are modified outside Terraform (e.g., via console orgcloud), the plan won't reflect those changes.Consider adding a periodic refresh-enabled plan (manual or scheduled) to detect drift, or document in the README that local runs should use
-refresh=trueto catch drift.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/terraform.yml around lines 184 - 190, The CI terraform plan call that uses -refresh=false (the terraform plan invocation producing tfplan.binary and using variables like project_id, region, terraform_state_bucket_name, infra_deployment_ref and EXTRA) should be accompanied by documentation and/or an additional workflow job to surface drift: update the README or repository docs to explain that -refresh=false avoids the Monitoring API race but prevents detection of out-of-band drift, and add either a scheduled/manual workflow that runs terraform plan without -refresh=false (or a documented local run instruction to use -refresh=true) so drift is periodically detected and operators know how to run a refresh-enabled plan.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@infra/terraform/modules/smart-events-platform/variables.tf`:
- Around line 167-177: Add validation blocks to the two variables to reject
values with leading/trailing whitespace and optionally enforce an
allowed-character pattern; for example in variable "apphub_application_id" and
"apphub_service_id" add a validation { condition = var.apphub_application_id ==
"" || trimspace(var.apphub_application_id) == var.apphub_application_id &&
(can(regex("^[a-zA-Z0-9-_]+$", var.apphub_application_id)) ?
length(regexall("^[a-zA-Z0-9-_]+$", var.apphub_application_id)) == 1 : false)
error_message = "apphub_application_id must not include leading/trailing
whitespace and may only contain letters, digits, hyphens or underscores" } (and
the equivalent using var.apphub_service_id for that variable), using trimspace
and regex to detect/trap invalid input.
In `@infra/terraform/README.md`:
- Line 15: Update the guidance that currently recommends granting project-wide
roles/iam.serviceAccountAdmin; instead instruct maintainers to grant
iam.serviceAccounts.getIamPolicy and iam.serviceAccounts.setIamPolicy only on
the specific runtime/service account resources (using
google_service_account_iam_member or resource-scoped IAM bindings) or create a
minimal custom role with just those permissions, and replace the explicit
project-level roles/iam.serviceAccountAdmin recommendation with this
resource-scoped or custom-role approach while keeping mentions of
iam.serviceAccounts.getIamPolicy / setIamPolicy and
google_service_account_iam_member so readers can locate the relevant Terraform
resources.
In `@README.md`:
- Around line 368-370: Cloud Run section still implies the baked-in collector is
the default while the later Terraform text correctly states the default is
Google’s public otelcol-google image; update the Cloud Run wording so both
places consistently state the default is the public otelcol-google image and
that baking your own image is optional — if you want the repo’s
otelcol-google-config.yaml baked in you must build/push to your GAR and set the
module input otel_collector_image to the full image URI; ensure the Cloud Run
text mirrors the Terraform paragraph (public image default, baking is
recommended only when you need exact config) and mention otel_collector_image
and otelcol-google-config.yaml so readers can find the override instructions.
- Line 335: The heading "Terraform deploy identity (IAM)" is using #### which
skips from the previous ## level and triggers markdownlint MD001; change that
heading to ### (increase level by one) so headings progress ## → ### → ... and
the file's heading hierarchy is consistent; update the "Terraform deploy
identity (IAM)" heading text accordingly to use three hashes and run
markdownlint to verify the MD001 warning is resolved.
In `@src/app/otel.py`:
- Around line 39-49: The _should_init function currently uses multiple ifs and a
final explicit return False; simplify by returning a single boolean expression:
check _is_sdk_disabled() and the three env vars (OTEL_EXPORTER_OTLP_ENDPOINT,
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT)
directly so the function becomes a concise one-liner that negates
_is_sdk_disabled() and ORs the three environment checks (calling .strip() on
each) to determine the result.
In `@tests/test_otel.py`:
- Around line 70-76: The pytest.parametrization currently passes parameter names
as a comma-separated string to `@pytest.mark.parametrize` ("key, value"); change
that to a tuple of names (e.g., ("key", "value")) in the pytest.mark.parametrize
call so the decorator uses a tuple for parameter names; update the decorator
where the three OTEL endpoint pairs are provided to use ("key", "value") instead
of "key, value".
---
Outside diff comments:
In @.github/workflows/terraform.yml:
- Around line 184-190: The CI terraform plan call that uses -refresh=false (the
terraform plan invocation producing tfplan.binary and using variables like
project_id, region, terraform_state_bucket_name, infra_deployment_ref and EXTRA)
should be accompanied by documentation and/or an additional workflow job to
surface drift: update the README or repository docs to explain that
-refresh=false avoids the Monitoring API race but prevents detection of
out-of-band drift, and add either a scheduled/manual workflow that runs
terraform plan without -refresh=false (or a documented local run instruction to
use -refresh=true) so drift is periodically detected and operators know how to
run a refresh-enabled plan.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bf90b31e-7fa9-495c-ab13-5de6bd78189b
📒 Files selected for processing (25)
.env.example.github/workflows/deploy-env.yml.github/workflows/terraform.yml.gitignoreREADME.mdinfra/terraform/README.mdinfra/terraform/environments/dev/main.tfinfra/terraform/environments/dev/terraform.tfvars.exampleinfra/terraform/environments/dev/variables.tfinfra/terraform/fixtures/terraform-ci.tfvarsinfra/terraform/modules/smart-events-platform/README.mdinfra/terraform/modules/smart-events-platform/apphub.tfinfra/terraform/modules/smart-events-platform/cloud_run.tfinfra/terraform/modules/smart-events-platform/firestore.tfinfra/terraform/modules/smart-events-platform/iam.tfinfra/terraform/modules/smart-events-platform/locals.tfinfra/terraform/modules/smart-events-platform/observability.tfinfra/terraform/modules/smart-events-platform/variables.tfpyproject.tomlsrc/app/logging_config.pysrc/app/main.pysrc/app/otel.pytests/test_logging_config.pytests/test_otel.pytools/deploy_cloud_run.sh
- Validate apphub_application_id and apphub_service_id (trim, charset). - Prefer scoped SA IAM over project serviceAccountAdmin in Terraform READMEs; document terraform plan refresh vs drift for CI and operators. - Align README Cloud Run collector wording with public otelcol-google default; fix Terraform IAM heading level (MD001); note CI -refresh=false trade-off. - Collapse _should_init to a single expression; parametrize OTEL tests with a tuple of parameter names. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/otel.py`:
- Around line 66-70: The current assignment of _service uses
(os.environ.get("OTEL_SERVICE_NAME") or "smart-events").strip(), which yields an
empty string if OTEL_SERVICE_NAME is present but only whitespace; update the
logic so you first read and strip the environment value, then if the stripped
value is falsy fallback to "smart-events" before passing to Resource.create
(affecting _service, SERVICE_NAME usage and the Resource.create call); in short:
strip the raw env var first, then apply the default when the stripped result is
empty.
- Around line 39-48: Current _should_init() and setup_opentelemetry() conflate
traces and metrics so an unconfigured signal falls back to default exporters and
instrument_app() uses the generic active() flag; split activation by adding
per-signal checks (e.g., _should_init_traces() and _should_init_metrics() or
have _should_init() return two booleans), change setup_opentelemetry() to only
instantiate OTLPSpanExporter when the traces endpoint env var
(OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) is set and
pass that endpoint to OTLPSpanExporter, and only instantiate OTLPMetricExporter
when the metrics endpoint env var (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT) is set
and pass that endpoint to OTLPMetricExporter; finally update instrument_app() to
gate instrumentation on the tracing-enabled flag (the new
_should_init_traces()/traces boolean) instead of the generic active() so only
tracing instrumentation runs when tracing is explicitly enabled.
- Around line 91-115: The current reset_opentelemetry_state_for_tests() helper
cannot restore globals because opentelemetry.trace.set_tracer_provider and
opentelemetry.metrics.set_meter_provider are one-shot; update the test strategy
instead of trying to call those setters repeatedly: leave
reset_opentelemetry_state_for_tests() alone or remove its calls to
otel_trace.set_tracer_provider and otel_metrics.set_meter_provider, and change
the autouse fixture in tests/test_otel.py to patch (mock) the functions
opentelemetry.trace.set_tracer_provider and
opentelemetry.metrics.set_meter_provider at import time (or run tests in a
subprocess) so each test controls what provider is installed before
setup_opentelemetry() runs; alternative approach is to add a private test-only
hook that setup_opentelemetry() will accept (e.g., a parameter or internal
function name) to inject fresh providers without calling the one-shot setters.
Ensure you reference reset_opentelemetry_state_for_tests(), setup_opentelemetry,
and the mocked symbols otel_trace.set_tracer_provider /
otel_metrics.set_meter_provider when making changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3c33f57-8a31-4797-b0f9-6eb8d0677c4a
📒 Files selected for processing (6)
.github/workflows/terraform.ymlREADME.mdinfra/terraform/README.mdinfra/terraform/modules/smart-events-platform/variables.tfsrc/app/otel.pytests/test_otel.py
- Strip OTEL_SERVICE_NAME before defaulting to smart-events - Initialize traces and metrics only when their OTLP endpoints are set; pass explicit endpoints to gRPC exporters - Gate FastAPI/HTTPX instrumentation on trace OTLP only - Simplify test reset and mock provider setters in test_otel autouse Co-authored-by: Cursor <cursoragent@cursor.com>
* Add optional local LLM proxy dev tooling (#62) * feat(tools): add optional local LLM proxy MVP - Add tools/local-llm-proxy with LiteLLM, compose, env example, scripts - Add CI workflow for shell/yaml/compose validation on path changes - Ignore tools/local-llm-proxy/.env for local secrets Refs #61 Made-with: Cursor * chore: add shellcheck pre-commit hook - Use shellcheck-py v0.11.0.1 for repo-wide shell linting - Fix deploy_cloud_run env-var join without a subshell (SC2030/SC2031) - Log expected Ollama model in test-setup.sh (SC2034) Made-with: Cursor * fix(tools): satisfy shellcheck SC2034 in test-setup.sh Declare expected Ollama model inside main() so CI shellcheck sees usage. Made-with: Cursor * PR comment added * chore(local-llm-proxy): harden proxy tooling and CI lint - Point yamllint at root .yamllint; lint lib.sh in workflow; trigger on .yamllint - Document CURSOR_TO_LITELLM_KEY as dev-only placeholder in .env.example - Pin LiteLLM image to v1.83.7-stable; add compose env defaults - Share err/log via lib.sh; fix shellcheck for script dir and dynamic source - README: require curl for test-setup.sh; test-setup: curl timeouts and model log Made-with: Cursor * pr comments added * fixed pr comments * added ngrok support * feat(local-llm-proxy): add Ngrok tunnel, PostgreSQL DB, and hardware-accelerated Ollama support - Integrated Ngrok for secure public URL exposure of the LiteLLM proxy. - Added a PostgreSQL container for LiteLLM UI data persistence. - Reverted to native Ollama support on Mac to leverage Apple Silicon GPU acceleration. - Created `proxy.sh` wrapper script for unified service lifecycle management (start/stop/restart). - Updated healthchecks to use Python for container compatibility. - Hardened model loading and test scripts for local development workflows. Made-with: Cursor * feat(local-llm-proxy): automate virtual key seeding and harden proxy scripts - Rename `CURSOR_TO_LITELLM_KEY` to `LITELLM_MASTER_KEY` to align with LiteLLM terminology. - Update `proxy.sh` to automatically seed a virtual key (`local-proxy-key`) restricted to configured models upon startup. - Harden `proxy.sh` with robust `jq` parsing and authenticated model fetching to prevent initialization errors. - Fix root `.env` line endings to resolve Pydantic boolean parsing failures. - Update documentation and `.gitignore` for the new virtual key persistence. Made-with: Cursor * Feat/devtooling localllm (#63) * feat(tools): add optional local LLM proxy MVP - Add tools/local-llm-proxy with LiteLLM, compose, env example, scripts - Add CI workflow for shell/yaml/compose validation on path changes - Ignore tools/local-llm-proxy/.env for local secrets Refs #61 Made-with: Cursor * chore: add shellcheck pre-commit hook - Use shellcheck-py v0.11.0.1 for repo-wide shell linting - Fix deploy_cloud_run env-var join without a subshell (SC2030/SC2031) - Log expected Ollama model in test-setup.sh (SC2034) Made-with: Cursor * fix(tools): satisfy shellcheck SC2034 in test-setup.sh Declare expected Ollama model inside main() so CI shellcheck sees usage. Made-with: Cursor * PR comment added * chore(local-llm-proxy): harden proxy tooling and CI lint - Point yamllint at root .yamllint; lint lib.sh in workflow; trigger on .yamllint - Document CURSOR_TO_LITELLM_KEY as dev-only placeholder in .env.example - Pin LiteLLM image to v1.83.7-stable; add compose env defaults - Share err/log via lib.sh; fix shellcheck for script dir and dynamic source - README: require curl for test-setup.sh; test-setup: curl timeouts and model log Made-with: Cursor * pr comments added * fixed pr comments * added ngrok support * feat(local-llm-proxy): add Ngrok tunnel, PostgreSQL DB, and hardware-accelerated Ollama support - Integrated Ngrok for secure public URL exposure of the LiteLLM proxy. - Added a PostgreSQL container for LiteLLM UI data persistence. - Reverted to native Ollama support on Mac to leverage Apple Silicon GPU acceleration. - Created `proxy.sh` wrapper script for unified service lifecycle management (start/stop/restart). - Updated healthchecks to use Python for container compatibility. - Hardened model loading and test scripts for local development workflows. Made-with: Cursor * feat(local-llm-proxy): automate virtual key seeding and harden proxy scripts - Rename `CURSOR_TO_LITELLM_KEY` to `LITELLM_MASTER_KEY` to align with LiteLLM terminology. - Update `proxy.sh` to automatically seed a virtual key (`local-proxy-key`) restricted to configured models upon startup. - Harden `proxy.sh` with robust `jq` parsing and authenticated model fetching to prevent initialization errors. - Fix root `.env` line endings to resolve Pydantic boolean parsing failures. - Update documentation and `.gitignore` for the new virtual key persistence. Made-with: Cursor * moved the local-llm-proxy to a separate repo * Add configurable initial ramp users for Locust load tests (#64) * initial implementation * feat(load-test): add configurable initial ramp user count Expose SMART_EVENTS_LOADTEST_INITIAL_USERS and workflow_dispatch input so the first Locust step can differ from per-step growth; document env vars and ramp math. Made-with: Cursor * fixed pr comments * implemented pr feedback * made load tests more robust * fix(load): ramp docs, duration stop, and signup bootstrap stats - Document SMART_EVENTS_LOADTEST_STEP_TIME and MAX_DURATION; stop the shape when run_time >= max_duration. - Cap default initial users to max_users; only reject INITIAL_USERS > max when set explicitly. - Use catch_response for /api/signup/verify (200/409 success); richer signup failure text; log loaded attendee count. - Shorten unknown-attendee signup error log to satisfy ruff line length. Made-with: Cursor * Create infrastructure deployment using IaC (#70) * added caveman skill and rule * feat: add Terraform deployment scaffolding and strengthen session loading Introduce Terraform modules, environment configs, and CI workflows for GCP infrastructure while updating session routing/loader and Firestore seeding logic to better handle track and timeslot data from local and Firestore-backed catalogs. Made-with: Cursor * fixed an bug around sesison schedule view * fix: CI hygiene, Terraform hardening, and stricter session parsing - Drop duplicate tflint --recursive before pre-commit in build and lint workflows. - Ignore nested Terraform state; mark dev platform output sensitive; align tfvars example. - Add firestore_location_id, pin utils module via git ref, optional uptime alert policy. - Tighten CI deploy SA IAM; idempotent tfstate bootstrap; conditional Cloud Run unauth flag. - Accept only string track IDs in pages and sessions loader; safer timeslot fallback. - Seed Firestore: ship sample override only for Locust sessions dir; fix ruff line length. - Test seed catalog selection by comparing session ids to shipped sample files. Made-with: Cursor * test: do not require Locust sessions dir on disk for seed path test tests/load/data/ is gitignored, so CI checkouts lack that folder. The shipped-demo override is keyed by resolved path equality; assert catalog ids match sample without asserting load_test_dir.is_dir(). Made-with: Cursor * Harden deploy and IaC naming; share JSON list helpers - Normalize CLOUD_RUN_ALLOW_UNAUTH to lowercase before treating as true in deploy workflow. - Pad Terraform service account account_id when under GCP minimum length; clarify environment validation message. - Add app.utils.json_helpers (as_dict_list, as_str_list) and use from pages and sessions_loader. - Normalize track IDs with strip().lower() without redundant str(). Made-with: Cursor * Add linux_amd64 provider checksums to dev Terraform lock file terraform init on GitHub Actions (linux_amd64) was rewriting the lock file during pre-commit terraform_validate, which fails the hook when tracked files change. Align the dev environment lock file with providers lock for darwin_arm64 and linux_amd64 so CI matches local behavior. Made-with: Cursor * Use Firestore ids as canonical; strict session track handling - FirestoreReadOnlyStore: always set document id from snapshot.id. - FirestoreSessionStore: always set row id from doc.id. - Sessions page: match_track and track catalog accept only string ids/tracks. Made-with: Cursor * enable prelease builds from feature branches * fixed linting issues * refactored GH action * added unit tests * added support for firestore enumlator to run integration tests * fixed JRE version for integration tests * Fix Firestore emulator leakage and login verification (#73) * added config to disable firestore emulator * fixed GH action bug * fixed PR comments * fixed pr comments * fixed pr comments * Sessions: multi-select tracks, ordered columns, and deselect all (#75) * initial bug fix * feat(sessions): multi-select tracks, ordered columns, deselect all - Support repeated track= params; preserve selection order for schedule columns - Add track=none for explicit empty selection; query helpers for HTMX chips - Track chips: selected indicator, data-track-key; remove All-tracks Alpine bar - Session schedule layout: column selection from selected track keys - E2E: Playwright mac-x64/arm64 symlink shim; update extended coverage and sessions UI - Ruff: wrap long startup log line in main.py Made-with: Cursor * fixed pr comments * fixed pr comments * Fix empty My Events to Sessions navigation regressions (#78) * bug fixed * pr comments fixed * Implement app monitoring and OTel collector sidecar (#79) * implemented app montoring with app hub setup * ruleset updated * fixed failing GH actions * be more restrictive on the linting check side * implemented PR feedback * Add event program schema validation and session catalog hardening - Validate main event document against event_program.schema.json - Normalize Firestore-style session docs; optional event arg for load_sessions - Per-request session catalog reload middleware for fresh Firestore reads - Sessions page day key fallback when timeslot omits start; schedule grid UTC fix - API catalog-diagnostics endpoint; seed_firestore validates events - README: EVENT_PROGRAM_SCHEMA_PATH; E2E language switch force-click stability Co-authored-by: Cursor <cursoragent@cursor.com> * Add Cursor Cloud backlog workflow rule and prompt (#88) Co-authored-by: Cursor <cursoragent@cursor.com> * Harden dev deploy pipeline and add OpenTelemetry instrumentation (#84) * fixed permission issues. * fixing auth issue * updated * updated * updated * updated * updated * updated * updated * updated * updated * updated * updated * updated * Add OpenTelemetry setup and log correlation Initialize OTEL in app lifespan with OTLP when configured, extend logging to bridge stdlib and third-party loggers, and add tests. Document OTEL env vars in .env.example. Ignore local Terraform plan outputs in .gitignore. Trim trailing whitespace in platform module README. Made-with: Cursor * Harden App Hub tfvars, clarify IAM and OTEL docs, simplify OTEL init - Validate apphub_application_id and apphub_service_id (trim, charset). - Prefer scoped SA IAM over project serviceAccountAdmin in Terraform READMEs; document terraform plan refresh vs drift for CI and operators. - Align README Cloud Run collector wording with public otelcol-google default; fix Terraform IAM heading level (MD001); note CI -refresh=false trade-off. - Collapse _should_init to a single expression; parametrize OTEL tests with a tuple of parameter names. Co-authored-by: Cursor <cursoragent@cursor.com> * Fix OpenTelemetry env handling and per-signal OTLP setup - Strip OTEL_SERVICE_NAME before defaulting to smart-events - Initialize traces and metrics only when their OTLP endpoints are set; pass explicit endpoints to gRPC exporters - Gate FastAPI/HTTPX instrumentation on trace OTLP only - Simplify test reset and mock provider setters in test_otel autouse Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * Make login code requirement configurable (#90) * Implement configurable login code requirement Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Harden login code defaults and errors Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Make login error tests force strict mode Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Stabilize language switch E2E test Co-authored-by: Oliver L <se02035@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Config-driven toggles: Information, Profile, and AI assistant (#91) * Add feature-toggle settings (information/profile/ai_assistant) Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Gate optional features in routes, nav, and assistant shell Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Add feature toggle tests and README deployment-modes section Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Address PR #91 review: include session-ratings in always-available list - README: add /api/session-ratings* to the 'always available' sentence alongside /sessions, /api/sessions, /my-schedule, /api/schedule*. - tests/test_feature_toggles.py: add test_session_ratings_api_reachable_when_all_flags_off (placed next to test_sessions_api_reachable_when_all_flags_off) to lock in that GET /api/session-ratings stays reachable when every feature flag is off. Verified against api.py L958 and L974: neither GET /api/session-ratings nor PUT /api/session-ratings/{session_id} declares require_feature_enabled, so they really are always available. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Address PR #91 review: clarify default-on wording in README The parenthetical 'flip a flag to false (or omit it)' was ambiguous: it read as 'to disable, set false or omit', which is the opposite of actual behavior (defaults are True, so omitting the variable keeps the feature enabled). Spell it out and reaffirm the stable 404 body shape. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(e2e): stabilize language switch + mobile nav locators Disable hx-boost on language links so full reload avoids detached-node clicks. Scope data-app-nav targets by viewport (no or_ strict violations) and open the drawer via Alpine navOpen with menu fallback after animated closes. Co-authored-by: Oliver L <se02035@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Document Cloud Agent VM gotchas in AGENTS.md and cursor rules (#92) * Document Cloud Agent VM gotchas in AGENTS.md and rules - AGENTS.md: refresh-inherited-venv, backlog-rule lives on dev, Terraform pre-commit carve-out, verify-pr.sh wrapper false positive. - python-venv-pip.mdc: 'Refreshing an inherited venv' subsection. - commit-workflow.mdc: env-only hook carve-out (Terraform/tflint), cross-reference to verify-pr.sh false positive. - cloud-backlog-agent-workflow.mdc: warn when started on main, cross-reference env-only hook and verify-pr.sh carve-outs in Phase 2 instructions. These changes are doc-only and pulled from issues hit while implementing #68 in PR #91. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Fix flaky test_language_change after htmx-boost re-render race The language switcher dropdown is rendered by Alpine inside the header. After a language switch, htmx-boost replaces #app-shell-main and the language-switcher component is re-mounted with open=false. The previous test sequence (click button -> click force=True on link) raced against that re-mount: the dropdown had already been closed (or detached) by the time Playwright resolved/clicked the link, producing the 'element was detached from the DOM, retrying' timeout seen on GHA run https://github.com/se02035/[REDACTED]/actions/runs/25235275354. Replace the two ad-hoc click pairs with a small open_switcher_and_pick helper that: - waits for the switcher button to display the *current* locale (so the previous render has settled), - clicks the button and waits up to ~1.5s for the target link to be visible, retrying the click up to 5x if the htmx swap closed the dropdown again, - then clicks the link. No application/template changes; the language switcher behavior itself is unchanged. Verified locally with 5x desktop runs and 3x mobile runs in a row, plus a full tests/e2e/ pass on both viewports (36 + 36). Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(e2e): stabilize language switcher test against htmx swap races Scope locators to #language-switcher-mount and retry open-dropdown + click when Alpine remounts after hx-boost swaps (fixes CI timeout on EN link click). Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(e2e): click language links via DOM when Alpine hides menu Playwright actionability requires visible targets; CI timed out on ?lang=en because x-show kept the link hidden after intermittent reopen. Use evaluate(el.click()) after visible opener retries (matches prior dev helper), with wait_for(attached) on the final path. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(e2e): wait for visible language link on header switch fallback Replace attached-only wait so Alpine x-show dropdown must be open before evaluate(el.click()) on the fallback path. Co-authored-by: Oliver L <se02035@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Split README into docs guides (#96) * docs: split README into guides under docs/ Move long-form content to focused markdown files (local development, testing, configuration, infrastructure, deployment, data seeding). Keep a lean root README with quick start and documentation index. Update AGENTS.md, infra/terraform README, and Cursor rules to link to the new docs. Refs #95 Co-authored-by: Cursor <cursoragent@cursor.com> * docs: target Unix shells; drop Windows setup notes State Unix/macOS/Linux scope in README and dev guides; remove Windows venv and E2E env examples from commit-workflow and python-venv-pip. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * Add QR login scanner for attendee ID (issue 86) (#97) * feat(login): add QR login scanner with tests and PWA cache update Refs #86 Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(login): a11y QR dialog, scanner retry state, SW lazy static JS Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(qr): guard load after dialog close, clear scan status on errors Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(qr): validate decode before close, guard html5.start resolution Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(qr): clear Html5Qrcode after stop promise settles Refs #86 Co-authored-by: Oliver L <se02035@users.noreply.github.com> * test(e2e): stabilize QR scanner dialog visibility on CI The stub used delayMs 20 so the modal could close before Playwright observed showModal on slow runners. Use a longer stub delay and explicit visibility timeouts for the dialog and heading. Refs #86 Co-authored-by: Oliver L <se02035@users.noreply.github.com> * Fix QR login scanner E2E by repairing JS parse errors The file header comment contained `**/` which terminated the block comment early and broke the entire script in the browser. Replace Promise.prototype.finally with then/catch for older Chromium. E2E server env now sets non-placeholder legal emails; add Playwright coverage for deny-permission, lib load failure, and insecure-context paths. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * test: expand QR scanner coverage (static + E2E) Add unit checks for qr-login-scanner.js Playwright-compat patterns and E2E cases for html5 retry after load failure and stub path without attendee-id. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(qr-login-scanner): show scanning status on retry Match openQrLoginScanner by setting data-msg-scanning on the status line before stopScanner and stub or html5 load paths; failure paths still clear it. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(qr-scanner): apply review feedback improvements - Hide retry button when insecure-context error is shown (retrying cannot fix a non-HTTPS context so the button was misleading UX). - Remove redundant double-stop calls in startRealScanner onOk then/catch: after html5.stop() resolves the scanner is already stopped; early-return on stale state instead of calling stop() again. - Fix applyDecoded always returning true: the final fallback branch (no passcode field, no form) now returns false to match the declared contract. - Move QR-trigger click binding from inline onclick to a JS addEventListener in DOMContentLoaded, keyed off [data-qr-login-trigger]. Removes the inline handler from login.html and improves resilience (no uncaught ReferenceError if the script fails to load). - Remove redundant role="dialog" attribute from the <dialog> element; the element's implicit ARIA role is already dialog. - Extend test_qr_login_scanner_js.py with a new guard test that asserts the E2E test-hook localStorage key constants have not been silently renamed. - Update test_login_page_includes_qr_login_scanner_markup to assert the absence of the removed inline onclick and the narrowed role check. Co-authored-by: Oliver L <se02035@users.noreply.github.com> * fix(tests): apply inline code-review findings - conftest.py: seed ATT_ID_SIGNUP_LOGOUT with account_activated=False so test_user_signup_login_and_logout exercises the real first-time signup modal. All other E2E users remain pre-activated for direct passcode login. Consolidate the two allowlist_data imports into one. - test_ui_flows.py: update _open_qr_login_scanner docstring to reflect that the QR trigger is now bound via addEventListener in DOMContentLoaded, not an inline onclick attribute (changed in the previous review commit). - test_api.py: extract the fetch handler body with regex and assert that cache.put appears inside it, not just anywhere in sw.js, so a regression that removes runtime caching would be caught by this test. Co-authored-by: Oliver L <se02035@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Oliver L <se02035@users.noreply.github.com> * feat(infra): destroy dev terraform stack via GitHub Action (#103) * feat(infra): add manual destroy workflow for dev terraform stack Add .github/workflows/terraform-destroy.yml so the dev infra can be torn down from a GitHub Action, mirroring the deploy job's auth/init wiring but running terraform destroy -auto-approve against infra/terraform/environments/dev. Safety: - workflow_dispatch only; required confirm input must equal 'destroy-dev' - shares concurrency group 'terraform-dev' with deploy so destroy and apply cannot race the same state - runs in the dev GitHub Environment, so existing reviewer rules apply Document the new workflow under 'Bootstrap and remote state' in docs/infrastructure.md, including the firestore_deletion_policy caveat and the post-destroy refresh of GitHub Environment vars / GCP_SA_KEY. Refs #83 * docs(infra): correct firestore destroy default to DELETE in dev The dev root (infra/terraform/environments/dev/variables.tf) defaults firestore_deletion_policy to DELETE, not the platform module's ABANDON. Update the destroy workflow note and docs/infrastructure.md to reflect that destroy WILL delete the Firestore database (and its data) in GCP unless the user sets firestore_deletion_policy = "ABANDON" in the dev terraform.tfvars before destroying. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Oliver L <se02035@users.noreply.github.com>
Harden dev deploy pipeline and add OpenTelemetry instrumentation
🎫 GitHub Issue
Type of Task
Description
src/app/otel.py), wire it inmain.py, extend logging configuration for correlation and third-party loggers, and add unit tests. Document OTEL-related environment variables in.env.example..gitignore; small README and script touch-ups aligned with deploy and observability.Refs #53
🌟 Impact
ℹ️ Additional Information
devenvironment.CLOUD_RUN_SET_ENV_VARS, image bases, and project/region) stay aligned withREADME.mdand Terraform outputs.💥 Issues
gcloud run deploy(outside Terraform) can be overwritten when Terraform updates the service template; prefer defining durable env in Terraform or redeploying via the Deploy to environment workflow after infra applies.curland that the deploy service account hasroles/run.invokeron the service (or enable unauthenticated access where policy allows).🏞️ Demo
Not applicable (infrastructure, workflow, and backend instrumentation changes only).