I build production LLM systems and the data infrastructure they sit on top of. Started in warehouse operations, taught myself to code to fix the manual work my team was drowning in. Three years later I'm shipping adversarial agent evals, production data pipelines, and live trading systems.
Live at claygeo.dev
Selected work:
solhunt-duel — Adversarial red/blue agent system for smart-contract auditing. Red writes exploits, Blue writes patches, four server-side Forge-verified gates decide the verdict. The agents cannot see or modify the gates — because if you let agents grade themselves, they will. Multi-provider LLM router across Anthropic, OpenAI, OpenRouter, and Ollama with cost controls and structured fallbacks. Public leaderboard.
solhunt — Autonomous AI agent that finds and exploits smart-contract vulnerabilities. Reads Solidity, writes a Foundry exploit test, executes it on a forked mainnet, iterates against real compiler and execution feedback. Reproduced Beanstalk's $182M flash-loan hack in 1m44s for $0.65. Hit 67.7% on a curated 32-contract DeFiHackLabs subset, beating Anthropic's SCONE-bench at 51.1% — then collapsed to 13% on a random 95-contract draw. I published both numbers. That 54-point honesty gap drove the verifier-gate design in solhunt-duel.
competitive-intel-platform — Production data engineering at scale. 14-state, 31-chain, 65+ store competitive pricing pipeline. Reverse-engineered 3 proprietary retail APIs (SweedPOS, Algolia, Trulieve GraphQL) under Cloudflare/auth with Playwright + session cookie handoff. BullMQ + Redis worker fleet, Postgres normalization, OCR on promotional images via Google Cloud Vision, Telegram alerts. Replaced a 2-week manual analysis cycle with same-day data. Used daily by my 4-person pricing team.
Hyperliquid Copy-Trading Harness — Live mainnet AI/quant system. $1,554 personal capital, 11 server-side risk gates, regime-stratified backtest with null-distribution validation: 50 random wallets sampled, Bonferroni-adjusted at α=0.05, 0 promoted. Stop-loss watchdog with re-arm and emergency market close. The harness, not the trades, is the artifact.
sandwich-rs — Real-time Solana MEV sandwich detector. Rust + Helius enhanced WebSocket + bounded-backpressure parser pool + per-pool ring buffer with ≤3-slot detection window. IDL-correct pool extraction for Raydium AMM v4 + Orca Whirlpool. Profit calc with Jito tip detection across 8 known tip accounts. Idempotent Postgres persistence, SSE feed, slot-resume marker for outage gap recovery.
hyperliquid-trading-sim — Live at tradeterm.app. Full-stack paper trading platform with 70+ perpetual futures on real Hyperliquid + Binance data. Atomic Postgres transactions, JWT auth + RLS, WebSocket streaming, stress-tested to 1,000 TPS.
How I work: I ship end-to-end. Not "I prototyped something" — I ship to production, measure it, iterate. Three of the projects above are live URLs you can hit right now. The cannabis inventory system has been running in warehouse operations for over a year. I write things down honestly — when solhunt's exploit rate dropped from 67.7% to 13% on a random sample, I published both numbers and treated the gap as a design problem.
Stack:
Languages: TypeScript · Python · Rust · Solidity · SQL LLM/AI: Anthropic SDK · OpenAI · OpenRouter · Ollama · multi-provider routing · eval harness design · agent orchestration · tool use · structured fallbacks Smart contracts: Foundry · Anvil · forked mainnet · Etherscan API · Alchemy Data/infra: PostgreSQL · Supabase · BullMQ · Redis · Docker · Playwright · WebSockets Frontend: React 18 · Next.js 14 · Vite · Bun Deploy: Netlify · Vercel · Hetzner · GitHub Actions
Currently looking for AI engineering roles — LLM systems, evals, agents, production AI infrastructure. Particularly interested in teams building agent frameworks, eval platforms, or AI security. Remote-friendly.



