Welcome!
TL;DR — the non-negotiables (read this even if you skip the rest)
- Pick a category first (see "What goes where" below): worker, cronjob, backfiller, model, or API endpoint — each has a home.
- Go/Python for workers / cronjobs / backfillers. Python for APIs and quantitative risk models. Anything else needs prior discussion.
- On-chain data should come from chain RPC or our cached block payload, not from third-party indexers or other derived/intermediate feeds. Off-chain feeds (CoinGecko, DefiLlama, etc.) need a good reason and case-by-case approval from the maintainers.
- Keep data pipelines separate from model pipelines — ingest ≠ compute-meaning. The correct pattern is for data pipelines to write to the data store, and model pipelines to ingest the data needed from that store.
- Every timeseries table must be a hypertable + compressed + S3-tiered, in the same migration that creates it.
- Never modify an applied migration — write a new one.
- PR title: `TICKET-1234: <description>`. GitHub squash-merges; don't squash locally.
This document is aimed at anyone who has never seen the repo before — in particular:
- contributors of quantitative risk models (the `app/risk_engine/` side)
- contributors of data pipelines that ingest on-chain and off-chain data (the workers, cronjobs, backfillers)
It explains the layout, how to run things locally, how data acquisition works, and the conventions you're expected to follow when opening a pull request.
If anything here is wrong or unclear, fix it in the same PR as the work that surfaced the problem. Docs rot fast — keep them honest.
Stuck on the actual work? Jump to §16 Getting help.
| Tool | Why | Install |
|---|---|---|
| Go 1.26+ | Every shipped service is Go (experiments/ is scratch and exempt) | https://go.dev/dl/ |
| Docker | Local infra (Postgres, Redis, Temporal, LocalStack) | Docker Desktop / Colima |
| kind | Runs a Kubernetes cluster inside Docker (mirrors prod) | `brew install kind` |
| kubectl | Talks to the kind cluster | `brew install kubectl` |
| kustomize (optional) | Only needed for manually previewing a deploy (`kustomize build … \| kubectl diff -f -`). `make dev-up` uses kubectl's built-in kustomize. | `brew install kustomize` |
| AWS CLI (optional) | Only needed if you want to fetch real Alchemy keys via `make dev-env` | `brew install awscli` |
| An Alchemy API key | Mainnet access | ask a team member or sign up at alchemy.com |
You do not need AWS credentials to develop locally — the kind cluster runs LocalStack to emulate SNS/SQS/S3.
git clone git@github.com:archon-research/stl.git
cd stl/stl-verify
# One-time: install lint/test tools.
make tools
# Spin up a full local pipeline (kind cluster, Postgres, Redis,
# LocalStack, Temporal, watcher, workers, cronjobs, python-api).
make dev-up

The first run builds every Docker image and takes several minutes. Warm
re-runs skip already-built images; run make dev-up-rebuild (alias for
COLD=1 make dev-up) to force a rebuild.
When it finishes:
kubectl --context=kind-vector get pods -n vector

Everything should be Running. For local-only pause/resume use
make dev-suspend and make dev-resume (do not use these in CI/prod).
Use make dev-down to delete the cluster; to wipe the persistent volumes
as well, use make dev-wipe.
⚠️ You need an Alchemy key for anything to actually work. By default `make dev-up` points the watcher at a mock blockchain server that ships with the repo. The mock is enough to boot the cluster and exercise plumbing, but it is not a fully implemented chain: most RPC methods are stubs, responses are synthetic, and any worker that expects realistic block / receipt / trace data will produce garbage or fall over.

To run against the real chain, put your key in `.env.secrets` at the repo root (the file is created by `make dev-preflight` on first run) and switch the cluster over:

make kind-secrets       # propagate .env.secrets into the cluster
make kind-use-alchemy   # point the watcher at Alchemy instead of the mock

To go back to the mock (e.g. offline dev): `make kind-use-mock`.
To iterate on a single service without rebuilding the whole cluster, use
the run-* targets, which run the binary on your host against the kind
cluster's Postgres/Redis:
make run-watcher # Ethereum live watcher
make run-oracle-price-worker # Oracle-price per-block worker
make run-sparklend-position-tracker # SparkLend per-block worker
# ... grep `^run-` in stl-verify/Makefile for the full list

If a run-* target needs secrets (e.g. ALCHEMY_API_KEY), run
make dev-env first. It pulls secrets from AWS Secrets Manager and
writes per-service .env files.
stl/
├── stl-verify/ # All Go code lives here (single Go module)
│ ├── cmd/ # One subdirectory per binary (see §7)
│ ├── internal/
│ │ ├── domain/ # Business entities — zero dependencies
│ │ ├── ports/ # Interfaces (inbound = use cases, outbound = infra)
│ │ ├── services/ # Use-case implementations
│ │ ├── adapters/
│ │ │ ├── inbound/ # HTTP / gRPC / CLI handlers
│ │ │ └── outbound/ # alchemy, postgres, redis, sns, sqs, s3, temporal…
│ │ └── pkg/ # Cross-cutting helpers (env, telemetry, lifecycle…)
│ ├── db/migrations/ # SQL migrations, applied automatically
│ ├── python/ # Python API (served via k8s)
│ ├── ts/ # TypeScript helpers
│ └── Makefile # The canonical entry point for every workflow
├── k8s/
│ ├── base/<svc>/ # env-agnostic Kustomize manifests per service
│ └── overlays/{staging,prod}/ # image tags + namespace per environment
├── docs/ # Protocol specs, ADRs, entity diagrams
├── experiments/ # Scratch — not shipped
└── CLAUDE.md / AGENTS.md # Agent conventions (also at stl-verify/AGENTS.md — worth reading as a human too)
Infrastructure (Terraform/OpenTofu) lives in a separate private repo for security reasons. If your change needs new AWS resources (a new SQS queue, an SNS subscription, an IAM policy, a secret, etc.), the change to that repo must land before the code here can deploy cleanly.
| If you're adding… | It goes in… |
|---|---|
| A quantitative risk model (Python) | stl-verify/python/app/risk_engine/<model>/, exposed via a service in app/services/ |
| An on-chain per-block data pipeline | cmd/workers/<name>/ (Go) or cli/workers/<name>/ (Python) |
| An off-chain API data pipeline (periodic) | cmd/cronjobs/<name>/ (Go) or cli/cronjobs/<name>/ (Python) |
| A historical on-chain backfill | cmd/backfillers/<name>/ |
| A new HTTP endpoint for serving data out | stl-verify/python/app/api/v1/ (extend the python-api service) |
| A new TimescaleDB schema (table, index, backfill) | A new migration file under stl-verify/db/migrations/ |
| A protocol spec or design doc | docs/ |
Don't touch experiments/ (scratch, not shipped) or the image tags in
k8s/overlays/{staging,prod}/kustomization.yaml (CI owns those).
Pick the language before you start, and when in doubt, ask first.
- APIs are Python. Anything user- or client-facing (endpoints, anything that serves data out to another system) belongs in `stl-verify/python/`. The existing `python-api` service is the reference; extend it rather than starting a new HTTP server.
- Workers, cronjobs, and backfillers should be Go or Python. Most of the pipeline is Go today (the watcher, all SQS workers, all Temporal cronjobs), which means Go is the path of least resistance: shared helpers, the `lifecycle.Run` / `temporal.RunCronjob` harnesses, the hexagonal scaffolding, the Makefile wiring, and CI are all built around it. Python is a fully supported second option.
- Anything outside Go or Python needs prior discussion. Open an issue or start a thread with `@archon-research/vector-engineers` before writing code. PRs introducing a new runtime (Rust, TypeScript, Java, …) without a prior design conversation may be rejected regardless of code quality: every new language adds build infrastructure, observability integration, deployment, dependency-management, and on-call load that the team has to absorb forever. That cost is only worth paying when we've agreed it is.
- The existing `stl-verify/ts/` is for the frontend/UI (currently being built out). That's the established use case; anything else in TypeScript still needs prior discussion.
Two rules that apply everywhere in this repo — they override local convenience:
- Prefer on-chain data whenever we can. If the data lives on the chain, it should be read from the chain (via Alchemy / Erigon and the cached block payload) — it's auditable (we can replay a block), trust-minimized (no third party), and stays consistent with the rest of the pipeline. Off-chain sources (CoinGecko, Anchorage, Etherscan, etc.) are allowed only with a good reason and with explicit approval from the repo maintainers — e.g., the data only exists off-chain. Write the justification in the PR description so that reviewers can see it.
- Data pipelines and model pipelines stay separate. A data pipeline writes "what happened on the chain or at an external API" into Postgres as append-only time-series. A model pipeline reads that data from Postgres and writes "what it means" (risk scores, derived metrics, liquidation forecasts) into its own tables. Don't merge the two into a single worker or cronjob — coupling ingest to model output means every model change forces a re-ingest, and every ingest bug corrupts the model surface. Split them across separate entry points (and usually separate PRs).
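The split between the two pipeline kinds can be sketched as follows. This is a minimal illustration, not code from the repo: it assumes an in-memory SQLite database as a stand-in for Postgres, and the table names (`oracle_prices`, `risk_scores`), the function names, and the toy "drawdown" metric are all hypothetical.

```python
import sqlite3


def ingest_price(db: sqlite3.Connection, block: int, asset: str, price_usd: float) -> None:
    """Data pipeline: record what happened, append-only, no interpretation."""
    db.execute(
        "INSERT INTO oracle_prices (block_number, asset, price_usd) VALUES (?, ?, ?)",
        (block, asset, price_usd),
    )


def score_drawdown(db: sqlite3.Connection, asset: str) -> float:
    """Model pipeline: read stored facts, write derived meaning to its own table."""
    rows = db.execute(
        "SELECT price_usd FROM oracle_prices WHERE asset = ? ORDER BY block_number",
        (asset,),
    ).fetchall()
    prices = [row[0] for row in rows]
    peak, latest = max(prices), prices[-1]
    drawdown = (peak - latest) / peak  # toy metric, for illustration only
    db.execute("INSERT INTO risk_scores (asset, drawdown) VALUES (?, ?)", (asset, drawdown))
    return drawdown


# Hypothetical schema: one table per pipeline kind.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE oracle_prices (block_number INTEGER, asset TEXT, price_usd REAL)")
db.execute("CREATE TABLE risk_scores (asset TEXT, drawdown REAL)")

# Ingest runs on its own schedule and knows nothing about the model.
for block, price in [(100, 2000.0), (101, 2500.0), (102, 2000.0)]:
    ingest_price(db, block, "ETH", price)

# The model only reads from the store; changing it never forces a re-ingest.
result = score_drawdown(db, "ETH")
```

Because the model reads only from the store, it can be re-run or replaced without touching the ingested rows.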
The repo follows a hexagonal (ports and adapters) architecture.
Dependencies point inward: domain ← ports ← services ← adapters, and
cmd/* wires concrete adapters into services.
(Exported from the team Excalidraw board. The ASCII diagram below is the text-searchable canonical — if it drifts from the image, one of them is wrong.)
                              ┌─► Redis (block cache, 2d TTL)
                              │
Alchemy WS ──► watcher ───────┼─► Postgres (block_state)
                              │
                              └─► SNS FIFO (one topic per chain)
                                       │
                                       ▼
                                SQS FIFO queues
                                       │
                      ┌────────────────┼────────────────┐
                      ▼                ▼                ▼
               oracle-price-       sparklend-     morpho-indexer
                  worker            tracker        (and others)
                      │                │                │
                      └────────────────┴────────────────┘
                                       ▼
                            Postgres / TimescaleDB
                          (time-series, append-only)

Temporal schedules ──► cronjob pods ──► external APIs ──► Postgres
                       (anchorage,       (CoinGecko,
                        offchain-prices,  Anchorage,
                        data-validator)   …)
The watcher fans out three independent writes per block: Redis (hot
cache), Postgres (block_state bookkeeping), and SNS (the message that
triggers workers). The cache key convention is:
stl:{chainId}:{blockNumber}:{version}:{dataType}
where version is bumped when a reorg invalidates a block and dataType
is one of block, receipts, traces, blobs. The SNS/SQS message
only carries a block pointer — workers fetch the actual payload from
Redis using this key.
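As a rough illustration of the key convention, the helper below builds the four keys for one block pointer. It is a sketch under assumptions: the `BlockPointer` fields and the helper name are hypothetical, and starting `version` at 0 is a guess rather than something the repo confirms.

```python
from dataclasses import dataclass

# The four data types named in the cache-key convention.
DATA_TYPES = ("block", "receipts", "traces", "blobs")


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical shape of the pointer carried in the SNS/SQS message."""

    chain_id: int
    block_number: int
    version: int  # bumped when a reorg invalidates the block


def cache_key(ptr: BlockPointer, data_type: str) -> str:
    """Build stl:{chainId}:{blockNumber}:{version}:{dataType}."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"stl:{ptr.chain_id}:{ptr.block_number}:{ptr.version}:{data_type}"


ptr = BlockPointer(chain_id=1, block_number=19_000_000, version=0)
keys = [cache_key(ptr, data_type) for data_type in DATA_TYPES]

# After a reorg the same block number resolves to a different key.
reorged = BlockPointer(chain_id=1, block_number=19_000_000, version=1)
reorged_key = cache_key(reorged, "block")
```

Since the version is part of the key, a worker holding a stale pointer cannot silently read the post-reorg payload.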
Each subdirectory under cmd/ is one main.go. They are grouped by
lifecycle:
| Group | Purpose | Example |
|---|---|---|
| cmd/base/ | The chain watcher, the source of block events | watcher |
| cmd/workers/ | Long-running SQS consumers, one message per block | oracle-price-indexer, morpho-indexer, sparklend-indexer, raw-data-backup |
| cmd/cronjobs/ | Long-running Temporal workers triggered on a schedule | offchain-price-indexer, anchorage-indexer, watcher-data-validator |
| cmd/backfillers/ | One-shot jobs that fill historical gaps | oracle-pricing-backfill, sparklend-backfill, raw-block-bulk-downloader |
| cmd/util/ | Dev tooling (migrate, generate-er, cronjob-manifest, stress-test helpers) | n/a |
If you're adding data acquisition, pick the category first — it determines the plumbing, deployment shape, and tests you'll write.
Examples: "Track liquidations on a new lending protocol" → per-block
worker in cmd/workers/. "Snapshot an off-chain API every 15 min" →
cronjob in cmd/cronjobs/. "Re-ingest last month's oracle prices" →
backfiller in cmd/backfillers/.
This is the hot path: extract something from every new Ethereum (or Avalanche / Arbitrum / Base / Optimism / Unichain) block.
- `cmd/base/watcher` subscribes to `newHeads` over an Alchemy WebSocket, fetches the full block (header, receipts, traces, optional blobs) via HTTP, writes the raw block into the Redis cache, and publishes a `BlockEvent` to an SNS FIFO topic. It also handles reorgs by bumping the cache `version` and re-emitting affected blocks.
- SNS → SQS FIFO fan-out (defined in the Infrastructure repo) gives each worker its own queue, partitioned by block number for ordered processing.
- A worker in `cmd/workers/<name>` pulls a message, reads the cached block data from Redis, does its protocol-specific work (e.g. call oracle contracts at that block, diff SparkLend positions, etc.), and writes rows to Postgres/TimescaleDB.
Every worker follows the same skeleton (cmd/workers/oracle-price-indexer/main.go
is the reference):
func run(ctx context.Context, args []string) error {
	cfg, err := parseConfig(args) // flags + env, fail fast on missing config
	...
	// Error checks after each constructor are elided in this skeleton.
	consumer, err := sqsadapter.NewConsumer(awsCfg, sqsadapter.Config{...}, logger)
	pool, err := postgres.OpenPool(ctx, postgres.DefaultDBConfig(cfg.dbURL))
	repo, err := postgres.NewOnchainPriceRepository(pool, logger, buildID, 0)
	service, err := oracle_price_worker.NewService(shared.SQSConsumerConfig{...}, consumer, repo, ...)
	return lifecycle.Run(ctx, logger, service) // runs the consume loop; handles SIGINT/SIGTERM graceful stop
}

- Create `cmd/workers/<my-worker>/main.go`. Copy an existing worker as a template. Keep `main()` small: it parses flags, wires adapters, and calls `lifecycle.Run`.
- Create a service in `internal/services/<my_worker>/`. The service owns the business logic, depends only on ports, and exposes a public API tested in isolation (mock the repo + consumer + any contract caller). Services are expected to have ~100% unit-test coverage.
- Add outbound ports/adapters if needed. A new protocol reader goes in `internal/adapters/outbound/<vendor>/`; expose it behind a small interface in `internal/ports/outbound/`.
- Add migrations (see §11) for any new tables.
- Add k8s manifests under `k8s/base/<my-worker>/`: `deployment.yaml`, `serviceaccount.yaml`, `kustomization.yaml`. Copy `k8s/base/oracle-price-worker/` as the template. Wire the new service into `k8s/overlays/{staging,prod}/kustomization.yaml`.
- Add build/deploy targets to the Makefile (`docker-build-<name>`, `docker-release-<name>`, and register the worker in the `run-*` / `kind-load-workers` / `kind-deploy-workers` groupings). Grep for an existing worker name in the Makefile to see every site you need to touch.
- Coordinate with infra. Open a PR in the Infrastructure repo for the SQS queue, SNS subscription, IAM policy, and any secrets. Your code PR depends on those resources existing.
Tip: It's welcome (often preferred) to split the k8s-manifest and Infrastructure-repo changes into a follow-up PR. The code PR stays narrowly scoped for engineering review, and the deploy/infra PR stays narrowly scoped for the infra owner — each reviewer only sees their domain.
The flow is identical to the Go version — same SNS FIFO fan-out,
same SQS FIFO queue, same Redis cache lookup by
stl:{chainId}:{blockNumber}:{version}:{dataType}, same Postgres
writes. Only the language changes.
No Python SQS worker exists in the repo yet. If you're the first, you'll be introducing the shared plumbing (SQS/Redis adapters, Dockerfile, Makefile targets) on top of the business logic, so budget for it. Put every reusable piece under `app/adapters/…` so the second Python worker is copy-paste.

Tooling: we use `uv` for Python packaging and execution, not pip, pip-tools, or poetry. Add deps with `uv add <pkg>`; run anything under the project env with `uv run <cmd>`; refresh the lockfile with `uv sync`.
Code skeleton — entry point at stl-verify/python/cli/workers/<my_worker>/main.py:
import asyncio
import signal

from app.adapters.postgres import open_pool
from app.adapters.redis import BlockCache  # thin wrapper around redis-py
from app.adapters.sqs import SQSConsumer  # thin wrapper around aioboto3
from app.config import load_config
from app.services.my_worker import MyWorkerService


async def run() -> None:
    cfg = load_config()  # env-driven, fail fast on missing vars
    consumer = SQSConsumer(cfg.queue_url, cfg.aws_region)
    cache = BlockCache(cfg.redis_url)
    pool = await open_pool(cfg.database_url)
    service = MyWorkerService(consumer, cache, pool, chain_id=cfg.chain_id)

    # Translate SIGINT/SIGTERM into an event the consume loop can observe.
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)

    try:
        await service.run_until(stop)  # consume loop + per-message handling
    finally:
        await pool.close()


if __name__ == "__main__":
    asyncio.run(run())

Run it locally with `uv run python -m cli.workers.<my_worker>.main` (from `stl-verify/python/`).
Directory layout — mirrors the Go hexagonal structure (Python uses
cli/ for entry points where Go uses cmd/):
- `stl-verify/python/cli/workers/<my_worker>/main.py`: the entry point shown above. No business logic here.
- `stl-verify/python/app/services/<my_worker>/`: business logic for one block event. Full unit tests; mock the SQS consumer, the Redis reader, the repo, and any contract caller.
- `stl-verify/python/app/domain/entities/`: pure entities.
- `stl-verify/python/app/ports/`: interfaces.
- `stl-verify/python/app/adapters/{sqs,redis,postgres,onchain,…}/`: concrete implementations, one subpackage per vendor.
Operational contract (must match Go workers exactly):
- Long-poll SQS receive; process one message at a time in FIFO order; delete on success; let it redrive on failure.
- Handle `SIGINT`/`SIGTERM`: finish the in-flight message, close the DB pool, exit within ~25s (the Python equivalent of Go's `lifecycle.Run`).
- Read block data from Redis using the exact cache-key convention above; do not refetch from Alchemy unless cache-miss rate indicates a real bug.
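The receive / handle / delete ordering in this contract can be sketched with an in-memory queue. Everything here is hypothetical scaffolding for illustration (`FakeQueue`, `consume`, the message strings); a real worker would long-poll SQS and keep polling until the stop event is set, rather than exiting when the queue is empty or a handler fails.

```python
import asyncio
from collections import deque


class FakeQueue:
    """In-memory stand-in for an SQS FIFO queue (hypothetical, for illustration)."""

    def __init__(self, messages):
        self._messages = deque(messages)
        self.deleted = []

    async def receive(self):
        # A real consumer would long-poll here; the fake returns immediately.
        await asyncio.sleep(0)
        return self._messages[0] if self._messages else None

    async def delete(self, message):
        self._messages.popleft()
        self.deleted.append(message)


async def consume(queue, handle, stop: asyncio.Event) -> None:
    """Process one message at a time; delete only after the handler succeeds."""
    while not stop.is_set():
        message = await queue.receive()
        if message is None:
            break  # demo only: a real worker keeps polling until stop is set
        try:
            await handle(message)
        except Exception:
            # Do not delete: the message stays queued and is redelivered.
            break  # demo only: stop instead of retrying forever
        await queue.delete(message)


processed = []


async def handle(message):
    if message == "block-3":
        raise RuntimeError("simulated failure")
    processed.append(message)


async def main():
    queue = FakeQueue(["block-1", "block-2", "block-3", "block-4"])
    await consume(queue, handle, asyncio.Event())
    return queue


queue = asyncio.run(main())
```

Checking the stop event only between messages is what lets an in-flight message finish before shutdown.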
Adding a new Python per-block worker:
- Create `stl-verify/python/cli/workers/<my_worker>/main.py` from the skeleton above. Keep it small.
- Create the service in `stl-verify/python/app/services/<my_worker>/` with full unit tests (mock every adapter).
- Add ports/adapters if needed. A new protocol reader goes in `app/adapters/<vendor>/`, exposed via a small interface in `app/ports/`. If this is the first Python worker, you'll also create `app/adapters/sqs/` and `app/adapters/redis/`.
- Add dependencies with `uv`, e.g. `uv add aioboto3 redis` from `stl-verify/python/`. This updates both `pyproject.toml` and `uv.lock` atomically.
- Add migrations (see §11) for any new tables.
- Add a Dockerfile. The existing `stl-verify/python/Dockerfile` is FastAPI-specific. The first Python worker introduces either a multi-target Dockerfile (build arg picks `cli/workers/<name>`) or a parallel `Dockerfile.worker`. Subsequent workers reuse it. Use the `uv` base image pattern already in the existing Dockerfile (`COPY --from=ghcr.io/astral-sh/uv …`, `uv sync --frozen --no-dev`).
- Add Makefile targets: `docker-build-<name>`, `docker-release-<name>`, `kind-load-<name>`, `kind-deploy-<name>`, and a `run-<name>` target for local dev (invokes `uv run` under the hood). Follow the pattern of `docker-build-python-api` / `docker-release-python-api` in `stl-verify/Makefile`.
- Add k8s manifests under `k8s/base/<my-worker>/` (`deployment.yaml`, `serviceaccount.yaml`, `kustomization.yaml`). Copy `k8s/base/oracle-price-worker/` as the template; update image name and any probes. Register in `k8s/overlays/{staging,prod}/kustomization.yaml`.
- Coordinate with infra. Same SQS queue / SNS subscription / IAM PR as for a Go worker; the plumbing outside the worker is language-agnostic.
Tip: Same as Go workers — split the k8s-manifest and Infrastructure-repo changes into a follow-up PR.
Use this when the data source is not keyed on block height — typically an external REST API (CoinGecko, Anchorage) or a periodic consistency check of our own data.
We use Temporal for scheduling, not Kubernetes CronJobs. Each cronjob is a plain Deployment running a Temporal worker that registers a schedule and an activity on startup:
- The shared runner is `internal/adapters/outbound/temporal.RunCronjob`.
- Schedules are stored in Temporal, not in k8s. On startup the worker calls `ScheduleClient().Create`; if the schedule already exists, Temporal returns `AlreadyExists` and we carry on.
- Changing the interval env var does not take effect automatically: you must delete the schedule in Temporal (UI or CLI) and restart the worker. This is intentional, because schedules are Temporal state, not k8s state.
A minimal cronjob (cmd/cronjobs/offchain-price-indexer/main.go):
func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	err := temporal.RunCronjob(ctx, temporal.BuildMeta{...}, temporal.CronjobConfig{
		Name:            "offchain-price-indexer", // → task queue, schedule ID
		IntervalEnv:     "PRICE_FETCH_INTERVAL",   // overrideable at runtime
		IntervalDefault: "5m",
		OpenDatabase:    postgres.PoolOpener(postgres.DefaultDBConfig(env.Get("DATABASE_URL", ...))),
		Setup:           setupRunner, // returns temporal.Runner
	})
	if err != nil {
		slog.Error("fatal", "error", err)
		os.Exit(1)
	}
}

func setupRunner(ctx context.Context, deps temporal.Dependencies) (temporal.Runner, error) {
	// Build your service here; return a Runner whose Run(ctx) is the body
	// of one tick. Temporal wraps it in an activity + workflow for you.
}

- Create `cmd/cronjobs/<my-cronjob>/main.go` using the skeleton above. Pick a sensible default interval (err towards longer) and always make it overrideable via an env var.
- Put the business logic in `internal/services/<my_cronjob>/` with full unit tests (mock external HTTP clients).
- Add k8s manifests under `k8s/base/<my-cronjob>/`: `deployment.yaml`, `serviceaccount.yaml`, `kustomization.yaml`. Copy `k8s/base/offchain-price-indexer/` as the template. Cronjob Deployments are small (50m/64Mi requests) because the work happens inside Temporal activities. Register the service in `k8s/overlays/{staging,prod}/kustomization.yaml`. A skeleton generator is available via `go run ./cmd/util/cronjob-manifest` if you'd rather start from generated YAML.
- Add a `docker-build-cronjob-<name>` target to the Makefile (follow the pattern; most of the wiring is automatic thanks to the `CRONJOBS := ...` glob in `stl-verify/Makefile`).
- Infra PR for any new secrets/IAM + a Temporal namespace entry if needed.
Tip: As with workers, splitting the k8s-manifest and Infrastructure-repo changes into a follow-up PR keeps each review narrow and domain-scoped.
Cronjobs are idempotent by design — a tick may be retried by Temporal. Your service must tolerate running twice on the same window without producing duplicates.
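One common way to make a tick safe to retry is to key each row on the window it belongs to and ignore conflicting inserts. The sketch below assumes an in-memory SQLite database as a stand-in for Postgres, and a hypothetical `offchain_prices` table and `tick` function; in Postgres the equivalent guard would be `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE offchain_prices (
        asset TEXT NOT NULL,
        window_start TEXT NOT NULL,
        price_usd REAL NOT NULL,
        PRIMARY KEY (asset, window_start)
    )
    """
)


def tick(db: sqlite3.Connection, window_start: str, quotes: dict) -> None:
    """Body of one tick: safe to run twice for the same window."""
    db.executemany(
        # The primary key makes a repeated insert for the same window a no-op.
        "INSERT OR IGNORE INTO offchain_prices (asset, window_start, price_usd) "
        "VALUES (?, ?, ?)",
        [(asset, window_start, price) for asset, price in quotes.items()],
    )
    db.commit()


quotes = {"ETH": 2000.0, "BTC": 60000.0}
tick(db, "2026-04-20T12:00:00Z", quotes)
tick(db, "2026-04-20T12:00:00Z", quotes)  # simulated Temporal retry
row_count = db.execute("SELECT COUNT(*) FROM offchain_prices").fetchone()[0]
```

The uniqueness constraint lives in the schema, so the guarantee holds even if a future code path forgets to check for duplicates.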
The flow is identical to the Go version — Temporal is still the
scheduler, the worker registers a schedule on startup, and each tick
runs one activity. Only the language changes. Same uv tooling rules
as the worker section above.
No Python Temporal worker exists in the repo yet. If you're the first, factor the boilerplate (client connect, schedule ensure, worker run) into a shared harness at
`app/adapters/temporal/` so the second one is copy-paste.
Code skeleton — entry point at stl-verify/python/cli/cronjobs/<my_cronjob>/main.py.
Uses the temporalio SDK:
import asyncio
import os
from datetime import timedelta

from temporalio.client import (
    Client, Schedule, ScheduleActionStartWorkflow,
    ScheduleAlreadyRunningError, ScheduleIntervalSpec, ScheduleSpec,
)
from temporalio.worker import Worker

from app.config import load_config
from app.services.my_cronjob import MyCronjobService, tick_workflow

NAME = "my-cronjob"
INTERVAL = timedelta(minutes=int(os.getenv("MY_CRONJOB_INTERVAL_MIN", "15")))


async def run() -> None:
    cfg = load_config()
    client = await Client.connect(cfg.temporal_host, namespace=cfg.temporal_namespace)
    service = MyCronjobService(cfg)  # wires DB + HTTP clients

    # Idempotent schedule creation. The Python SDK reports the server's
    # AlreadyExists status as ScheduleAlreadyRunningError, so swallow that
    # on restarts.
    try:
        await client.create_schedule(
            NAME,
            Schedule(
                action=ScheduleActionStartWorkflow(
                    tick_workflow,
                    id=f"scheduled-{NAME}",
                    task_queue=NAME,
                ),
                spec=ScheduleSpec(intervals=[ScheduleIntervalSpec(every=INTERVAL)]),
            ),
        )
    except ScheduleAlreadyRunningError:
        pass  # schedule exists from a previous start; keep it as-is

    worker = Worker(
        client,
        task_queue=NAME,
        workflows=[tick_workflow],
        activities=[service.tick],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(run())

Run it locally with `uv run python -m cli.cronjobs.<my_cronjob>.main` (from `stl-verify/python/`).
Directory layout (mirrors Go; cli/ ≈ cmd/):
- `stl-verify/python/cli/cronjobs/<my_cronjob>/main.py`: entry point (above). No business logic.
- `stl-verify/python/app/services/<my_cronjob>/`: business logic, the body of one tick, plus the `tick_workflow` wrapper. Full unit tests; mock external HTTP clients.
- `stl-verify/python/app/domain/entities/`: pure entities.
- `stl-verify/python/app/ports/`: interfaces.
- `stl-verify/python/app/adapters/{postgres,onchain,temporal,…}/`: concrete implementations, one subpackage per vendor.
Temporal contract (must match Go cronjobs exactly):
- Task queue, schedule ID, and workflow ID all derive from the cronjob name (lowercase, hyphenated).
- Interval read from the `<NAME>_INTERVAL` env var with a sensible default; prefer longer, and always overrideable.
- Create the schedule on startup; swallow `AlreadyExists`; changing the interval still requires deleting the schedule in Temporal and restarting the worker.
- The activity must be idempotent, because Temporal retries. Guard every write against duplicates.
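The naming rule in the first bullet could be captured in one small helper, sketched below. The helper name and the validation regex are hypothetical; the `scheduled-` workflow ID prefix mirrors the Python skeleton above and is assumed, not confirmed, to match what the Go harness produces.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "my-cronjob".
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def temporal_ids(name: str) -> dict:
    """Derive the Temporal identifiers for a cronjob from its name."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"cronjob name must be lowercase and hyphenated: {name!r}")
    return {
        "task_queue": name,
        "schedule_id": name,
        "workflow_id": f"scheduled-{name}",  # assumption: prefix taken from the skeleton
    }


ids = temporal_ids("my-cronjob")
```

Deriving all three identifiers in one place keeps a renamed cronjob from ending up with a schedule that points at the wrong task queue.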
Adding a new Python cronjob:
- Create `stl-verify/python/cli/cronjobs/<my_cronjob>/main.py` from the skeleton above.
- Create the service in `stl-verify/python/app/services/<my_cronjob>/` with unit tests (mock HTTP / repo adapters). Put the `tick_workflow` wrapper here or in `app/adapters/temporal/` if it's generic.
- Add ports/adapters for any new external client.
- Add dependencies with `uv`, e.g. `uv add temporalio` (and `httpx` if you need a new HTTP client, though it's already in). Run from `stl-verify/python/`; this updates `pyproject.toml` and `uv.lock` together.
- Add migrations (see §11) for any new tables.
- Dockerfile & Makefile targets: same situation as Python workers; the first Python cronjob introduces the shared plumbing (use the `uv` base image pattern already in `stl-verify/python/Dockerfile`).
- Add k8s manifests under `k8s/base/<my-cronjob>/`. Copy `k8s/base/offchain-price-indexer/` as the template (small: 50m/64Mi, because the work happens inside the activity). Register in `k8s/overlays/{staging,prod}/kustomization.yaml`.
- Infra PR for any new secrets / IAM + a Temporal namespace entry if needed.
Tip: Same as with workers — split the k8s-manifest and Infrastructure-repo changes into a follow-up PR.
Idempotency and migration rules above still apply.
Risk models are Python and live under
stl-verify/python/app/risk_engine/<model>/. Every model exposes the
same contract: Required Risk Capital (RRC) in USD, computed at the
model's documented default stress and overrideable via a scenario map.
Many models are expected, and more than one may apply to the same
(asset_id, prime_id) — the API returns all applicable results in one
response.
Two families exist today and illustrate the range:
- Asset-level rating (`risk_engine/suraf/`): pre-computed CRR per asset; `RRC = usd_exposure × CRR`. Default `usd_exposure` derived from `prime_id`.
- Position-level (`risk_engine/crypto_lending/`): reads collateral state via ports; RRC computed under a default x% gap between liquidation trigger and execution.
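As a worked instance of the asset-level formula, the sketch below multiplies an exposure by a CRR. The function name, the non-negativity checks, and the numbers are illustrative assumptions, not values or code from `risk_engine/suraf/`.

```python
def rrc_asset_level(usd_exposure: float, crr: float) -> float:
    """RRC = usd_exposure × CRR, in USD."""
    if usd_exposure < 0:
        raise ValueError("usd_exposure must be non-negative")
    if crr < 0:
        raise ValueError("crr must be non-negative")
    return usd_exposure * crr


# Illustrative numbers only: 1,000,000 USD exposure at a CRR of 0.25.
default_rrc = rrc_asset_level(1_000_000.0, 0.25)

# A scenario override is the same computation with a different input.
stressed_rrc = rrc_asset_level(1_000_000.0, 0.5)
```

Keeping the math a pure function of its inputs is what lets the service layer swap defaults for scenario overrides without touching the model.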
app/risk_engine/<model>/ # Pure math. No I/O.
app/ports/ # One `-er` interface per data dependency
app/adapters/{postgres,onchain}/ # One adapter per protocol variant
app/services/<model>_service.py # Wraps math + ports; implements compute(...)
app/api/v1/risk.py # The two routes below. Never import risk_engine/ here.
- `GET /v1/risk/rrc?asset_id=…&prime_id=…`: every applicable model at its defaults.
- `POST /v1/risk/rrc/scenario` with `{ asset_id, prime_id?, overrides: { <model>: {...knobs} } }`: same, with per-model scenario overrides. Unknown models/keys → 422.

Both return `{ asset_id, prime_id?, results: [{ version, risk_model, rrc_usd, details }, ...] }`.
Each `details` is a discriminated union keyed on `risk_model` (the discriminator field is named `risk_model` to avoid collision with Pydantic's `model_*` protected namespace).
`version` is a reference to `block_version` and `processing_version`, which together make each result auditable.
Every service implements
compute(asset_id, prime_id, overrides) -> RrcResult, with defaults
and override schema documented in its docstring. A registry in
app/services/ maps (asset_id, prime_id) to the set of applicable
services — handlers iterate the registry, never branch on model.
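The "iterate the registry, never branch on model" rule might look like the sketch below. All names here are hypothetical (`FixedRatioService`, `compute_all`, the `applies_to` method, integer asset IDs), and the real registry may map `(asset_id, prime_id)` to services differently; the point is only that the caller contains no per-model `if`.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class RrcResult:
    risk_model: str
    rrc_usd: float


class RiskService(Protocol):
    """Contract every model service satisfies (shape assumed for illustration)."""

    name: str

    def applies_to(self, asset_id: int, prime_id: Optional[int]) -> bool: ...

    def compute(self, asset_id: int, prime_id: Optional[int], overrides: dict) -> RrcResult: ...


class FixedRatioService:
    """Toy service: RRC is a fixed fraction of a fixed exposure."""

    def __init__(self, name: str, exposure_usd: float, ratio: float, assets: set):
        self.name = name
        self._exposure_usd = exposure_usd
        self._ratio = ratio
        self._assets = assets

    def applies_to(self, asset_id, prime_id):
        return asset_id in self._assets

    def compute(self, asset_id, prime_id, overrides):
        ratio = overrides.get("ratio", self._ratio)
        return RrcResult(risk_model=self.name, rrc_usd=self._exposure_usd * ratio)


def compute_all(registry, asset_id, prime_id=None, overrides=None):
    """Run every applicable service; the handler never names a model."""
    overrides = overrides or {}
    unknown = set(overrides) - {service.name for service in registry}
    if unknown:
        # The HTTP layer would translate this into a 422.
        raise ValueError(f"unknown models: {sorted(unknown)}")
    return [
        service.compute(asset_id, prime_id, overrides.get(service.name, {}))
        for service in registry
        if service.applies_to(asset_id, prime_id)
    ]


registry = [
    FixedRatioService("model_a", 1000.0, 0.5, {1, 2}),
    FixedRatioService("model_b", 1000.0, 0.25, {2}),
]
defaults = compute_all(registry, asset_id=2)
stressed = compute_all(registry, asset_id=2, overrides={"model_b": {"ratio": 0.5}})
```

Adding a third model then means appending one entry to the registry, with no change to the handler.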
Anything loaded once (rating packages, mappings) goes in
app/main.py::create_app before resources are acquired, so a bad
config fails startup, not a request. Stash on app.state.<name>,
expose via app/api/deps.py. Unit-test services with mocked ports;
integration-test the endpoint against real Postgres (testcontainers) —
mock only third-party APIs. Migrations for any new tables: see §11.
A migration is a single SQL file that makes one versioned change to the database schema (create or alter a table, add an index, backfill a column, etc.). The migrator tracks which files have already run, by checksum, and applies any new ones in order — so "adding a migration" is how you roll a schema change out to every environment reproducibly.
Migrations live in stl-verify/db/migrations/ and are applied
automatically — by the migrator Job on make dev-up, and by the
ArgoCD PreSync hook in staging/prod.
Rules (non-negotiable):
- Filename: `YYYYMMDD_HHMMSS_description.sql` (use `date +"%Y%m%d_%H%M%S"`).
- Plain SQL only, ending with a self-tracking insert:
  INSERT INTO migrations (filename) VALUES ('20260420_120000_my_change.sql') ON CONFLICT (filename) DO NOTHING;
- Never modify an applied migration. The migrator checksums every file; a changed checksum fails the deploy. To fix a mistake, write a new migration.
- Every timeseries table is a hypertable, tiered to S3, and compressed. Without exception. All three are set up in the same migration that creates the table; don't ship a naked table and "add the policies later". Specifically:
  - Hypertable via `SELECT create_hypertable(...)` (or the distributed-hypertable equivalent). Pick a chunk interval that matches the ingest rate: too small and planning cost dominates, too large and compression and S3 tiering can't evict anything.
  - Compression policy via `ALTER TABLE ... SET (timescaledb.compress, ...)` plus `SELECT add_compression_policy(...)`. Choose `segmentby` / `orderby` columns that reflect how the table is queried; getting this wrong costs 10–100× on reads.
  - Tiered-storage (S3) policy via `SELECT add_tiering_policy(...)`. This is what keeps the hot Postgres volume small; skipping it is how we run out of disk in prod.

  Also: primitives must be compatible with distributed hypertables. When in doubt, read `docs/data_entities.md` and ADR-0002, or copy the most recent timeseries migration as a template.
- Use `CREATE INDEX CONCURRENTLY` on big tables. Test on staging first.
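To make the filename rule and the "all three in one migration" rule concrete, here is a small generator sketch. It is not a repo tool: the function names are hypothetical, the `CREATE TABLE` body is left as a comment, and every interval and the `segment_by` choice are placeholders to be tuned per table. The most recent timeseries migration remains the authoritative template.

```python
import re
from datetime import datetime

FILENAME_RE = re.compile(r"\d{8}_\d{6}_[a-z0-9_]+\.sql")


def migration_filename(description: str, now: datetime) -> str:
    """Build YYYYMMDD_HHMMSS_description.sql from a free-text description."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{now:%Y%m%d_%H%M%S}_{slug}.sql"


def timeseries_migration(filename: str, table: str, time_column: str, segment_by: str) -> str:
    """Render a migration skeleton with hypertable, compression and tiering together."""
    return "\n".join(
        [
            f"-- CREATE TABLE {table} (...) goes here",
            # Placeholder chunk interval: pick one that matches the ingest rate.
            f"SELECT create_hypertable('{table}', '{time_column}', "
            "chunk_time_interval => INTERVAL '1 day');",
            f"ALTER TABLE {table} SET (timescaledb.compress, "
            f"timescaledb.compress_segmentby = '{segment_by}', "
            f"timescaledb.compress_orderby = '{time_column} DESC');",
            # Placeholder policy ages below.
            f"SELECT add_compression_policy('{table}', INTERVAL '7 days');",
            f"SELECT add_tiering_policy('{table}', INTERVAL '30 days');",
            f"INSERT INTO migrations (filename) VALUES ('{filename}') "
            "ON CONFLICT (filename) DO NOTHING;",
        ]
    )


example_name = migration_filename("My change", datetime(2026, 4, 20, 12, 0, 0))
example_sql = timeseries_migration(example_name, "my_table", "ts", "asset_id")
```

Generating the self-tracking insert from the same filename variable avoids the easy mistake of renaming the file and forgetting the last line.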
| Command | What it runs | When you need it |
|---|---|---|
| `make test` | Unit tests | Everyday dev |
| `make test-race` | Unit tests with `-race` | What CI runs; run before pushing |
| `make test-integration` | Tests tagged `integration` (requires Docker; uses testcontainers) | Anything touching Postgres, Redis, SQS, S3 |
| `make e2e` | End-to-end watcher tests | Changes to the live/backfill pipeline |
| `make cover` / `make cover-all` | Coverage report | When working on a service with a coverage goal |
Philosophy:
- Services are the unit under test. Test only the public API; mock outbound ports with small handwritten fakes. Use table-driven tests. Services and `main.go` files should strive for 100% coverage on meaningful code paths. The only exceptions are branches that genuinely can't be exercised (generated code, OS-specific shims, truly unreachable error branches); each exception should be justified in the PR.
- Integration tests may only mock things we do not control. Alchemy and other third-party APIs: mock. Our Postgres, our Redis, our SQS: use the real thing (via testcontainers); mocking them has bitten us in production.
- For `main.go`, extract a `run(ctx, args) error` function and call it from `main()`; the integration test then calls `run` directly.

Python code uses its own Makefile and `uv`. We use `uv` as the Python package manager and runner, not pip, pip-tools, or poetry. `uv sync` installs deps from `uv.lock`; `uv add <pkg>` adds a dep; `uv run <cmd>` runs anything under the project env.

Commands for `stl-verify/python/` live in `stl-verify/python/Makefile` (each one wraps `uv run`) and are run from that directory:

| Command | What it runs |
|---|---|
| `make test-unit` | Python unit tests (`uv run pytest tests/unit`) |
| `make test-integration` | Python integration tests (Docker needed) |
| `make lint` / `make lint-fix` | Ruff check / autofix |
| `make run` | Start the FastAPI server locally |

The root `stl-verify/Makefile` targets in the Go table above cover only Go code.

Most of these are also spelled out in `CLAUDE.md` and `stl-verify/AGENTS.md`, which apply to humans too.

- Hexagonal boundaries. Domain has zero imports outside the standard library. `services/` depends on `ports/`, never on `adapters/`. Adapters are dumb: no business logic.
- Interfaces use the `-er` suffix (`Reader`, `Publisher`, `Multicaller`).
- Constructors are `NewFoo(...)`.
- Files are `snake_case.go`.
- Errors: wrap with context, e.g. `fmt.Errorf("fetching block %d: %w", n, err)`. Never ignore an error. Prefer returning the error over continuing.
- Prefer the standard library. Pull in a dependency only when the stdlib equivalent is materially worse. De-duplicate by extracting a helper, not by copy-paste.
- Keep `main()` at the top of the file and small; it should read like prose. Extract helpers aggressively.
- No global state, no singletons. Inject everything through constructors.
- Binaries built with `go build` go into `stl/dist/` (gitignored).

- Branch off `main`. Name the branch after the Linear ticket if there is one (`VEC-123-short-slug`).
- Open a PR early; drafts are fine. The `CODEOWNERS` file auto-requests review from `@archon-research/vector-engineers`.
- Before you push, run:

  ```bash
  cd stl-verify
  make ci                 # lint + static + vuln + unit + race
  make test-integration   # if you touched anything data-adjacent
  ```

  CI runs the same commands. A CI run that modifies files (e.g. a stray `go mod tidy` diff) is a hard fail; commit the result locally.
- PR title format: `TICKET-1234: <good description>`, e.g. `VEC-123: Add SparkLend position tracker`. The ticket prefix is how we link PRs to Linear and find things later; the description should say what the PR does, not what it is (prefer `VEC-123: Backfill oracle prices for new Aave markets` over `VEC-123: Oracle changes`). Commit messages can be whatever you like. Do not bother squashing WIP commits locally: GitHub is configured for squash-and-merge, so the PR title becomes the commit message on `main` and your intermediate commits are discarded automatically.
- Merge to `main`. CI then triggers `.github/workflows/deploy.yaml`, which bumps image tags in `k8s/overlays/staging/kustomization.yaml`, and ArgoCD rolls the change into the `vector` namespace on the staging EKS cluster. Prod is a separate manual promotion (bump the tag in `k8s/overlays/prod/kustomization.yaml`).
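
The PR title format above can be checked mechanically before opening a PR. The following is a minimal sketch, not an existing repo tool: `ValidTitle` and the regular expression are assumptions about how `TICKET-1234: <description>` might be encoded (an uppercase project key, a dash, digits, a colon and a space, then non-empty text). It cannot judge whether the description is a good one.

```go
package main

import (
	"fmt"
	"regexp"
)

// titlePattern is an assumed encoding of "TICKET-1234: <description>".
var titlePattern = regexp.MustCompile(`^[A-Z][A-Z0-9]*-[0-9]+: \S.*$`)

// ValidTitle reports whether a PR title carries a ticket prefix and a description.
func ValidTitle(title string) bool {
	return titlePattern.MatchString(title)
}

func main() {
	for _, title := range []string{
		"VEC-123: Add SparkLend position tracker", // matches the format
		"Oracle changes",                          // no ticket prefix
	} {
		fmt.Printf("%q valid=%t\n", title, ValidTitle(title))
	}
}
```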

- Read the Makefile directly. `stl-verify/Makefile` is the source of truth for every workflow and has many more targets than are documented here (Erigon management, bulk downloads, stress tests, bastion tunnels, …). Grep the top of the file or search for the target name you're after; there is no `make help`.
- Look at existing examples before inventing a pattern. Every category (`workers/`, `cronjobs/`, `backfillers/`) has a reference implementation that wires the adapters in the approved way. Copy it.
- Secrets never go in git. Local secrets live in `.env.secrets` at the repo root (gitignored, created by `make dev-preflight`). Cloud secrets live in AWS Secrets Manager and are pulled by `make dev-env`.
- If you're stuck on infra, contact the vector team.

- Code questions / design review: `@archon-research/vector-engineers` on GitHub, or `#proj-verify-beacon` on Laniakea Slack (review is required anyway, so ask early).
- Protocol specs: see `docs/` (`aave_v3_spec.md`, `morpho_spec.md`, `sparklend_spec.md`, etc.).
- Architecture decisions: `docs/adr/` holds the short record of why we chose kind over Minikube, why every row is versioned, and so on.
- Anything else / general questions: `#proj-verify-beacon` on Laniakea Slack.

Thanks for contributing — ship data-correct code and we'll all sleep better.
