Fix: Add per-deployment Bearer auth to block rogue evaluation requests by shivam-MBZUAI · Pull Request #68 · one-covenant/crusades

shivam-MBZUAI · 2026-03-27T08:50:25Z

Each Basilica deployment now gets a unique auth token (secrets.token_urlsafe) passed via EVAL_AUTH_TOKEN env var. The eval container rejects any /evaluate or /eval-status request without the correct Authorization header (401). This prevents the OOM attack where a rogue miner pumps fake requests to our deployments. Health endpoint stays open for Basilica probes. Backward compatible — no token means open endpoints (local/dev mode).

Summary by CodeRabbit

New Features
- Evaluation submission and status endpoints can be protected by per-run bearer tokens; submissions and polling must include Authorization: Bearer when enabled. Health/readiness stays unauthenticated.
- Per-evaluation tokens are generated and conveyed to evaluation deployments to secure submission and polling.
Stability
- Background evaluation tasks are now tracked for better cleanup and logging.
Documentation
- Added attack description and token-mitigation guidance; updated dashboard connection host and bumped package version.

coderabbitai · 2026-03-27T08:50:40Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Per-evaluation bearer-token auth was added: the client generates a fresh token, injects it into Basilica deployments via EVAL_AUTH_TOKEN, sends it as Authorization headers on POST /evaluate and GET /eval-status/{job_id}, and the FastAPI service validates tokens and tracks background evaluation tasks.

Changes

Cohort / File(s)	Summary
Client-side Auth Token Implementation `arbos/tester.py`, `src/crusades/affinetes/runner.py`	Generate a per-evaluation `auth_token` (`secrets.token_urlsafe(32)`), inject it as `EVAL_AUTH_TOKEN` into the deployment env, and include `Authorization: Bearer {auth_token}` on both POST `/evaluate` and GET `/eval-status/{job_id}` requests via `httpx.AsyncClient` headers.
Server-side Auth Validation & Background Task Tracking `environments/templar/env.py`	Read `EVAL_AUTH_TOKEN` into `_AUTH_TOKEN`, add `_verify_auth` FastAPI dependency applied to `POST /evaluate` and `GET /eval-status/{job_id}` when token is set, change POST handler signature to `body: EvaluateRequest`, and track background evaluation tasks in `_background_tasks`.
Documentation & Notes `exploits/basilica_request_injection.md`, `README.md`	Add exploit writeup documenting request-injection attack and the mitigation (per-deployment bearer tokens); update README TUI host URL.

Sequence Diagram

sequenceDiagram
    participant Runner as Runner/Tester
    participant Deployment as Basilica Deployment
    participant Server as FastAPI Env
    participant Queue as Background Job Queue

    Runner->>Runner: Generate auth_token\n(secrets.token_urlsafe(32))
    Runner->>Deployment: Deploy with env\nEVAL_AUTH_TOKEN=auth_token
    Runner->>Runner: Create httpx.AsyncClient\nwith Authorization header
    Runner->>Server: POST /evaluate\nAuthorization: Bearer {auth_token}
    Server->>Server: _verify_auth(request)\nvalidate token
    alt Auth Valid
        Server->>Queue: Enqueue background job
        Server-->>Runner: 200 {job_id}
    else Auth Invalid
        Server-->>Runner: 401 {detail: "unauthorized"}
    end
    loop Polling
        Runner->>Server: GET /eval-status/{job_id}\nAuthorization: Bearer {auth_token}
        Server->>Server: _verify_auth(request)
        alt Auth Valid
            Server-->>Runner: 200 {status}
        else Auth Invalid
            Server-->>Runner: 401 {detail: "unauthorized"}
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Fix: Bad Gateway Error 502 #61: Introduced the async /evaluate and /eval-status submission/polling flow that this change secures with bearer-token authentication.
Harden eval timeouts, unblock inductor/dynamo config and @staticmethod #54: Modifies the Basilica submission/polling logic (timeouts/exception handling) in the same evaluation path and may overlap with header/client changes.
Fix: Block deferred-computation exploit and harden validator #64: Related changes to Basilica evaluation flow and deployment endpoints that also touch auth/polling behaviors.

Suggested reviewers

joellidin

Poem

🐰 I hide a token, soft and small,
I tuck it in the container wall.
Headers hop with secrets kept,
Servers check before they’ve leapt.
Jobs run safe — the rabbits smile.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main change: adding per-deployment Bearer authentication to block unauthorized evaluation requests, which aligns with the core modifications across all changed files.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/add_auth

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@environments/templar/env.py`:
- Around line 3322-3331: The created background task from asyncio.create_task in
the endpoint is not stored and may be garbage-collected; declare a module-level
set named _background_tasks: set[asyncio.Task] (e.g., near other globals) and
replace the create_task call with task =
asyncio.create_task(_evaluation_background(job_id, body)), add the task to
_background_tasks, and attach a done callback that removes the task from
_background_tasks when finished (so the task is kept alive until completion but
not leaked). Update references around _evaluation_background, _jobs_lock and the
create_task invocation to use this pattern.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 569351c7-569b-48a6-b28d-949a4d0c0bc8

📥 Commits

Reviewing files that changed from the base of the PR and between a2c18e4 and 8285333.

📒 Files selected for processing (3)

arbos/tester.py
environments/templar/env.py
src/crusades/affinetes/runner.py

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

arbos/tester.py (1)

252-286: ⚠️ Potential issue | 🟠 Major

Treat poll-time 401s as terminal failures.

environments/templar/env.py now returns 401 for a bad bearer token, but this loop retries every non-200 except 404. If auth is rejected on GET /eval-status/{job_id}, the client will burn the whole deployment budget instead of failing fast.

🛠️ Suggested fix

                     try:
                         poll_resp = await poll_client.get(poll_url)
                         consecutive_errors = 0

                         if poll_resp.status_code == 404:
                             return EvalResult(
                                 success=False,
                                 error=f"Job {job_id} lost (404 on poll) — "
                                 "container may have restarted",
                             )
+                        if poll_resp.status_code == 401:
+                            return EvalResult(
+                                success=False,
+                                error="Polling unauthorized (401) — eval auth token was rejected",
+                            )

                         if poll_resp.status_code != 200:
                             logger.warning(
                                 f"Poll got {poll_resp.status_code}, retrying "
                                 f"in {poll_interval}s ({elapsed:.0f}s elapsed)"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@arbos/tester.py` around lines 252 - 286, The poll loop currently treats every
non-200 (except 404) as transient; modify the loop so that when
poll_resp.status_code == 401 you return an EvalResult(success=False, error=...)
immediately instead of retrying; locate the async polling block using
poll_client/poll_resp in tester.py (the while True loop that builds EvalResult)
and add a branch for 401 that returns a clear error referencing job_id and
poll_url (e.g., "auth rejected (401) on poll"), keeping existing handling for
404 and other status codes unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@environments/templar/env.py`:
- Around line 3316-3324: The handler evaluate currently accepts body:
EvaluateRequest so FastAPI parses the request body before your in-function auth
guard (_check_auth) runs; move the auth check out of the handler by either (A)
creating a dependency function check_auth (accepting Request) and changing
evaluate signature to use Depends(check_auth) so auth runs before body parsing,
or (B) register a global HTTP middleware that calls _check_auth (or a wrapper)
and returns the appropriate Response on failure; update references to
_check_auth/check_auth and keep EvaluateRequest and Request parameters unchanged
in evaluate after adding the dependency or middleware.

---

Outside diff comments:
In `@arbos/tester.py`:
- Around line 252-286: The poll loop currently treats every non-200 (except 404)
as transient; modify the loop so that when poll_resp.status_code == 401 you
return an EvalResult(success=False, error=...) immediately instead of retrying;
locate the async polling block using poll_client/poll_resp in tester.py (the
while True loop that builds EvalResult) and add a branch for 401 that returns a
clear error referencing job_id and poll_url (e.g., "auth rejected (401) on
poll"), keeping existing handling for 404 and other status codes unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70957c74-cbf9-4eda-8db4-6c923ac41282

📥 Commits

Reviewing files that changed from the base of the PR and between 8285333 and 20cf891.

📒 Files selected for processing (3)

arbos/tester.py
environments/templar/env.py
src/crusades/affinetes/runner.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/crusades/affinetes/runner.py

Each Basilica deployment now gets a unique auth token (secrets.token_urlsafe) passed via EVAL_AUTH_TOKEN env var. The eval container rejects any /evaluate or /eval-status request without the correct Authorization header (401). This prevents the OOM attack where a rogue miner pumps fake requests to our deployments. Health endpoint stays open for Basilica probes. Backward compatible — no token means open endpoints (local/dev mode).

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/crusades/affinetes/runner.py (1)

858-898: ⚠️ Potential issue | 🟠 Major

Fail fast on unauthorized poll responses.

/eval-status is now auth-gated, but this loop still treats 401/403 like transient noise. Because those responses also reset consecutive_errors, a bad token will spin until http_timeout expires instead of surfacing the auth problem immediately.

💡 Suggested fix

             async with httpx.AsyncClient(timeout=poll_timeout, headers=auth_headers) as poll_client:
                 while True:
                     ...
                     try:
                         poll_resp = await poll_client.get(poll_url)
 
-                        consecutive_errors = 0
+                        if poll_resp.status_code in (401, 403):
+                            return EvaluationResult.failure(
+                                f"Basilica /eval-status auth failed: HTTP {poll_resp.status_code}: "
+                                f"{poll_resp.text[:200]}",
+                                task_id=task_id,
+                            )
+
+                        consecutive_errors = 0
 
                         if poll_resp.status_code == 404:
                             ...

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/crusades/affinetes/runner.py` around lines 858 - 898, The polling loop in
async with httpx.AsyncClient (around poll_client.get(poll_url)) incorrectly
treats 401/403 as transient and resets consecutive_errors; update the logic in
that block to detect poll_resp.status_code == 401 or 403 and immediately return
an EvaluationResult.failure with a clear auth error (include job_id, poll_url,
status code and task_id) instead of continuing or resetting consecutive_errors;
keep existing handling for 404 and other non-200 codes unchanged and ensure
consecutive_errors is not reset on auth errors.

arbos/tester.py (1)

252-330: ⚠️ Potential issue | 🟡 Minor

Don't hide auth failures behind the polling timeout.

The tester now relies on the same bearer token for /eval-status, but a permanent 401/403 still gets retried until the overall timeout. That makes auth regressions much harder to diagnose.

💡 Suggested fix

             async with httpx.AsyncClient(timeout=poll_timeout, headers=auth_headers) as poll_client:
                 while True:
                     ...
                     try:
                         poll_resp = await poll_client.get(poll_url)
-                        consecutive_errors = 0
+                        if poll_resp.status_code in (401, 403):
+                            return EvalResult(
+                                success=False,
+                                error=f"/eval-status auth failed: HTTP {poll_resp.status_code}: "
+                                f"{poll_resp.text[:200]}",
+                            )
+                        consecutive_errors = 0
 
                         if poll_resp.status_code == 404:
                             ...

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@arbos/tester.py` around lines 252 - 330, The polling loop is hiding permanent
auth failures (401/403) by retrying until the overall timeout; update the logic
around poll_client.get / poll_resp.status_code to treat 401 and 403 as immediate
hard errors: when poll_resp.status_code is 401 or 403 return an
EvalResult(success=False, error=...) that includes the status code, job_id and
response text/body to aid debugging instead of continuing; keep existing
handling for 404 and other non-200 codes but ensure EvalResult, poll_client.get,
poll_resp.status_code, and job_id are referenced in the new branch so the
failure is surfaced immediately.

♻️ Duplicate comments (1)

environments/templar/env.py (1)

2974-2985: ⚠️ Potential issue | 🟠 Major

Depends still lets unauthenticated /evaluate requests parse the body.

With body: EvaluateRequest on Line 3316, FastAPI still reads/parses the request body before this dependency can reject the call. An unauthenticated caller can therefore keep burning JSON/Pydantic work with large payloads, which misses the goal of blocking rogue requests early. Move this check to HTTP middleware (or a custom APIRoute) so /evaluate is rejected before body handling.

💡 Suggested direction

+@app.middleware("http")
+async def _protect_eval_routes(request: Request, call_next):
+    if _AUTH_TOKEN and (
+        request.url.path == "/evaluate"
+        or request.url.path.startswith("/eval-status/")
+    ):
+        auth = request.headers.get("authorization", "")
+        if auth != f"Bearer {_AUTH_TOKEN}":
+            return JSONResponse(status_code=401, content={"detail": "unauthorized"})
+    return await call_next(request)
+
-@app.post("/evaluate", dependencies=[Depends(_verify_auth)])
+@app.post("/evaluate")
 async def evaluate(body: EvaluateRequest):
     ...
 
-@app.get("/eval-status/{job_id}", dependencies=[Depends(_verify_auth)])
+@app.get("/eval-status/{job_id}")
 async def eval_status(job_id: str):
     ...

In FastAPI, for a route like `@app.post("/evaluate", dependencies=[Depends(auth_dep)]) async def evaluate(body: EvaluateRequest): ...`, does FastAPI read/parse the request body before `auth_dep` runs? Please answer from FastAPI docs or source and include the order of body parsing vs dependency resolution.

Also applies to: 3315-3316

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@environments/templar/env.py` around lines 2974 - 2985, The _verify_auth
dependency runs too late (after FastAPI has already read/parsed the request body
for routes like the /evaluate handler that accepts body: EvaluateRequest), so
move the authentication check out of the dependency into a component that
executes before body parsing — either an HTTP middleware
(app.middleware("http")) that inspects the Authorization header and returns a
401 Response when it doesn't match _AUTH_TOKEN, or a custom APIRoute (override
get_route_handler on your route class) that performs the same header check
before awaiting request.body()/request.json(); replace usages of
Depends(_verify_auth) on the evaluate endpoint with this early-check mechanism
and keep the same rejection semantics (401, "unauthorized") and logging as in
_verify_auth.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@arbos/tester.py`:
- Around line 252-330: The polling loop is hiding permanent auth failures
(401/403) by retrying until the overall timeout; update the logic around
poll_client.get / poll_resp.status_code to treat 401 and 403 as immediate hard
errors: when poll_resp.status_code is 401 or 403 return an
EvalResult(success=False, error=...) that includes the status code, job_id and
response text/body to aid debugging instead of continuing; keep existing
handling for 404 and other non-200 codes but ensure EvalResult, poll_client.get,
poll_resp.status_code, and job_id are referenced in the new branch so the
failure is surfaced immediately.

In `@src/crusades/affinetes/runner.py`:
- Around line 858-898: The polling loop in async with httpx.AsyncClient (around
poll_client.get(poll_url)) incorrectly treats 401/403 as transient and resets
consecutive_errors; update the logic in that block to detect
poll_resp.status_code == 401 or 403 and immediately return an
EvaluationResult.failure with a clear auth error (include job_id, poll_url,
status code and task_id) instead of continuing or resetting consecutive_errors;
keep existing handling for 404 and other non-200 codes unchanged and ensure
consecutive_errors is not reset on auth errors.

---

Duplicate comments:
In `@environments/templar/env.py`:
- Around line 2974-2985: The _verify_auth dependency runs too late (after
FastAPI has already read/parsed the request body for routes like the /evaluate
handler that accepts body: EvaluateRequest), so move the authentication check
out of the dependency into a component that executes before body parsing —
either an HTTP middleware (app.middleware("http")) that inspects the
Authorization header and returns a 401 Response when it doesn't match
_AUTH_TOKEN, or a custom APIRoute (override get_route_handler on your route
class) that performs the same header check before awaiting
request.body()/request.json(); replace usages of Depends(_verify_auth) on the
evaluate endpoint with this early-check mechanism and keep the same rejection
semantics (401, "unauthorized") and logging as in _verify_auth.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 84989510-0604-4bc6-acc9-b4df404da8e8

📥 Commits

Reviewing files that changed from the base of the PR and between 20cf891 and 0337997.

📒 Files selected for processing (3)

arbos/tester.py
environments/templar/env.py
src/crusades/affinetes/runner.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

exploits/basilica_request_injection.md (1)
10-10: Tighten wording to avoid ambiguity in threat description.

“Somewhat predictable” weakens the statement in a security writeup. Prefer a concrete phrasing (e.g., “predictable” or “enumerable”).
✍️ Suggested wording tweak
-2. Attacker finds the deployment URL (Basilica URLs are somewhat predictable)
+2. Attacker discovers the deployment URL (Basilica URLs are predictable)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@exploits/basilica_request_injection.md` at line 10, Replace the ambiguous
phrase "somewhat predictable" in the threat description with a concrete term to
avoid weakening the claim; update the sentence in
exploits/basilica_request_injection.md (the line containing "Basilica URLs are
somewhat predictable") to use either "predictable" or "enumerable" (e.g.,
"Basilica URLs are predictable" or "Basilica URLs are enumerable") so the threat
model is clear and unambiguous.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@exploits/basilica_request_injection.md`:
- Line 34: The sentence "Each deployment gets a unique token that exists only in
memory — never logged, never stored on disk." makes an absolute guarantee;
change it to a scoped or aspirational statement such as "Each deployment gets a
unique token that is kept only in memory in the current implementation; avoid
logging or storing it on disk" or "designed to be kept only in memory — do not
log or persist tokens." Update the line in
exploits/basilica_request_injection.md to remove the absolute "never" wording
and optionally add a short caveat noting this depends on current infra/logging
configuration.
- Around line 17-22: The fenced log snippet in
exploits/basilica_request_injection.md is missing a language tag (triggering
markdownlint MD040); edit the markdown block that contains the POST /evaluate
... torch.OutOfMemoryError lines and add a language identifier (e.g., use
```text) to the opening fence so the block becomes a labeled code block.

---

Nitpick comments:
In `@exploits/basilica_request_injection.md`:
- Line 10: Replace the ambiguous phrase "somewhat predictable" in the threat
description with a concrete term to avoid weakening the claim; update the
sentence in exploits/basilica_request_injection.md (the line containing
"Basilica URLs are somewhat predictable") to use either "predictable" or
"enumerable" (e.g., "Basilica URLs are predictable" or "Basilica URLs are
enumerable") so the threat model is clear and unambiguous.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a76ed09-1e19-479e-8c1e-462ffd53c3cc

📥 Commits

Reviewing files that changed from the base of the PR and between 0337997 and d484402.

📒 Files selected for processing (2)

README.md
exploits/basilica_request_injection.md

✅ Files skipped from review due to trivial changes (1)

README.md

coderabbitai

🧹 Nitpick comments (1)

exploits/basilica_request_injection.md (1)

10-10: Consider tightening wording in the attack description.

“Somewhat predictable” weakens the statement; a more direct phrasing would read cleaner in incident documentation.

✏️ Suggested wording tweak

-2. Attacker finds the deployment URL (Basilica URLs are somewhat predictable)
+2. Attacker discovers the deployment URL (Basilica URLs are predictable enough to enumerate)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@exploits/basilica_request_injection.md` at line 10, The sentence "Attacker
finds the deployment URL (Basilica URLs are somewhat predictable)" is weakly
worded—replace the parenthetical "somewhat predictable" with a more direct
phrase such as "are predictable" (e.g., "Attacker finds the deployment URL
(Basilica URLs are predictable)") to tighten the attack description; locate the
exact sentence containing "somewhat predictable" and update it accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@exploits/basilica_request_injection.md`:
- Line 10: The sentence "Attacker finds the deployment URL (Basilica URLs are
somewhat predictable)" is weakly worded—replace the parenthetical "somewhat
predictable" with a more direct phrase such as "are predictable" (e.g.,
"Attacker finds the deployment URL (Basilica URLs are predictable)") to tighten
the attack description; locate the exact sentence containing "somewhat
predictable" and update it accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b47a4fd6-8824-4e97-ad26-e7ed22790d91

📥 Commits

Reviewing files that changed from the base of the PR and between d484402 and b624a04.

📒 Files selected for processing (4)

arbos/tester.py
exploits/basilica_request_injection.md
src/crusades/__init__.py
src/crusades/affinetes/runner.py

✅ Files skipped from review due to trivial changes (1)

src/crusades/init.py

🚧 Files skipped from review as they are similar to previous changes (1)

arbos/tester.py

Hourly cron-friendly script that snapshots the SQLite DB safely (using Python's sqlite3.backup API) and uploads to Cloudflare R2 with timestamped names. Auto-rotates to keep last 48 backups.