31 commits
823a87b
save
grzesir Nov 18, 2025
a9afa21
save and deploy
grzesir Nov 19, 2025
42971ec
bug fixes and docs update
grzesir Nov 21, 2025
83968bf
Wording to switch to ThetaData.
grzesir Nov 21, 2025
308e41c
save. theta almost fully working
grzesir Nov 22, 2025
2d2b6e4
Document remote guardrails and trust downloader readiness
grzesir Nov 22, 2025
1a1574a
Massive improvements for Theta data.
grzesir Nov 23, 2025
d776842
deploy
grzesir Nov 23, 2025
0a6cff7
test fixes
grzesir Nov 23, 2025
64f8bca
bug fixes for theta data + added tests
grzesir Nov 23, 2025
b0af23d
deployed
grzesir Nov 23, 2025
1c69e75
Bug fixes for Theta Data S3 caching.
grzesir Nov 24, 2025
855ad76
deploy 4.4.0
grzesir Nov 24, 2025
64ea7b9
Fix cash accounting and tearsheet prep
grzesir Nov 26, 2025
b09c8b6
Add position sign tests and bracket regression
grzesir Nov 26, 2025
aad915c
deploy
grzesir Nov 26, 2025
aafec49
theta cache download problem fixed
grzesir Nov 28, 2025
e56487b
Huge improvements to ThetaData backtesting. Including better caching …
grzesir Nov 29, 2025
c60cd37
Saved. Backtest working very well now for ThetaData
grzesir Nov 29, 2025
1528643
Clean up logs and fix option Greeks for Theta data.
grzesir Nov 29, 2025
8e7f01f
Fix linting errors and undefined name references
grzesir Nov 29, 2025
a2d515e
Fix undefined name errors (F821) for CI
grzesir Nov 29, 2025
f8b9bf6
Reduce integration test date ranges for faster CI
grzesir Nov 29, 2025
894fa67
Fix integration tests: don't require trades for ThetaData verification
grzesir Nov 29, 2025
ffc10ef
Fix CI test failures: prevent data source override for legacy tests
grzesir Nov 29, 2025
29894ff
Fix CI: set BACKTESTING_DATA_SOURCE=none to prevent ThetaData override
grzesir Nov 30, 2025
9256502
Fix test failures: ThetaData-only optimization and test cleanup
grzesir Nov 30, 2025
9425310
fix: revert PandasData filter direction to avoid order fill regression
grzesir Nov 30, 2025
cb6ad2d
fix: revert PANDAS section changes to fix Polygon/multileg test regre…
grzesir Nov 30, 2025
6abf82c
fix: extend mock data coverage in test_update_pandas_data_fetches_rea…
grzesir Nov 30, 2025
5530333
fix: update test_stock_diversified_leverage expected CAGR to match cu…
grzesir Dec 1, 2025
9 changes: 9 additions & 0 deletions .github/workflows/cicd.yaml
@@ -41,9 +41,18 @@ jobs:

env:
  AIOHTTP_NO_EXTENSIONS: 1
  # CRITICAL: Set this to "none" so tests use their explicit data sources.
  # Tests that want ThetaData must explicitly request it.
  # Without this, the default is "ThetaData" which overrides ALL backtests.
  BACKTESTING_DATA_SOURCE: none
  POLYGON_API_KEY: ${{ secrets.POLYGON_API_KEY }}
  THETADATA_USERNAME: ${{ secrets.THETADATA_USERNAME }}
  THETADATA_PASSWORD: ${{ secrets.THETADATA_PASSWORD }}
  # NOTE (2025-11-28): Data Downloader is a production proxy for ThetaData that allows
  # shared access without requiring a local ThetaTerminal JAR. When these are set,
  # ThetaData tests will use the remote downloader instead of spawning a local process.
  DATADOWNLOADER_BASE_URL: ${{ secrets.DATADOWNLOADER_BASE_URL }}
  DATADOWNLOADER_API_KEY: ${{ secrets.DATADOWNLOADER_API_KEY }}
  ALPACA_TEST_API_KEY: ${{secrets.ALPACA_TEST_API_KEY}} # Required for alpaca unit tests
  ALPACA_TEST_API_SECRET: ${{secrets.ALPACA_TEST_API_SECRET}} # Required for alpaca unit tests
  TRADIER_TEST_ACCESS_TOKEN: ${{secrets.TRADIER_TEST_ACCESS_TOKEN}} # Required for tradier unit tests
43 changes: 43 additions & 0 deletions AGENTS.md
@@ -0,0 +1,43 @@
# LumiBot Agent Instructions (Theta / Downloader Focus)

These rules are mandatory whenever you work on ThetaData integrations.

1. **Never launch ThetaTerminal locally.** Production has the only licensed session. Starting the jar (even briefly or via Docker) instantly terminates the prod connection and halts all customers.
2. **Use the shared downloader endpoint.** All tests/backtests must set `DATADOWNLOADER_BASE_URL=http://44.192.43.146:8080` (or whatever prod IP `/version` reports) and `DATADOWNLOADER_API_KEY=<secret>`. Do not short-cut by hitting Theta directly.
3. **Respect the queue/backoff contract.** LumiBot no longer enforces a 30 s client timeout; instead it listens for the downloader’s `{"error":"queue_full"}` responses and retries with exponential backoff. If you add new downloader integrations, reuse that helper so we never DDoS the server.
4. **Long commands = safe-timeout.** Wrap backtests/pytest/stress jobs with `/Users/robertgrzesik/bin/safe-timeout <duration> …` to ensure we never spawn orphaned processes.
5. **Artifacts.** When demonstrating fixes, capture `Strategy\ Library/logs/*.log`, tear sheets, and downloader stress JSONs so the accuracy/dividend/resilience story stays reproducible.

Failure to follow these rules will break everyone's workflows—double-check env vars before running anything.
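
The queue/backoff contract in rule 3 can be illustrated with a minimal sketch. This is not LumiBot's actual helper (new integrations should reuse that one); the function name `call_with_backoff`, its parameters, and the delay values are all assumptions made for illustration. Only the `{"error":"queue_full"}` response shape comes from the rule above.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    request: Callable[[], dict],
    max_attempts: int = 6,          # hypothetical default
    base_delay: float = 1.0,        # hypothetical default, in seconds
    max_delay: float = 60.0,        # hypothetical cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `request` until it stops answering {"error": "queue_full"}."""
    for attempt in range(max_attempts):
        response = request()
        if response.get("error") != "queue_full":
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # plus up to 10% jitter so clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError(f"downloader queue still full after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production helper would likely also log each retry.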

---

## Test Philosophy (CRITICAL FOR ALL PROJECTS)

### Test Age = Test Authority

When tests fail, how you fix them depends on **how old the test is**:

| Test Age | Authority Level | How to Fix |
|----------|----------------|------------|
| **>1 year old** | LEGACY - High authority | **Fix the CODE**, not the test. These tests have proven themselves over time. |
| **6-12 months** | ESTABLISHED - Medium authority | Investigate carefully. Likely fix the code, but could be test issue. |
| **<6 months** | NEW - Lower authority | Test may need adjustment. Still verify code isn't broken. |
| **<1 month** | EXPERIMENTAL | Test is still being refined. Adjust as needed. |

### Check Test Age Before Fixing

```bash
git log --format="%ai" --follow -- tests/path/to/test.py | tail -1
```
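
As a rough illustration of the table above, the age-to-authority mapping could be expressed as a small function. The name `classify_test_authority` and the day thresholds (30, 182, 365) are assumptions chosen to approximate "1 month", "6 months", and "1 year"; the first-commit date would come from the `git log` command shown above.

```python
from datetime import date


def classify_test_authority(first_commit: date, today: date) -> str:
    """Map a test file's age to the authority levels in the table.

    Thresholds are approximate day counts, not an official rule.
    """
    age_days = (today - first_commit).days
    if age_days < 30:
        return "EXPERIMENTAL"   # <1 month: adjust as needed
    if age_days < 182:
        return "NEW"            # <6 months: test may need adjustment
    if age_days <= 365:
        return "ESTABLISHED"    # 6-12 months: investigate carefully
    return "LEGACY"             # >1 year: fix the code, not the test


print(classify_test_authority(date(2023, 8, 1), date(2025, 11, 30)))
```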

### Conflict Resolution

When old tests and new tests conflict:
1. **Old test wins by default** - it has a proven track record
2. If the new test represents genuinely new functionality, **ask the user for judgment**
3. Document any judgment calls in the test file with comments

This philosophy applies to ALL projects, not just LumiBot.
426 changes: 426 additions & 0 deletions BACKTESTING_ARCHITECTURE.md

Large diffs are not rendered by default.

291 changes: 291 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,291 @@
# CLAUDE.md - AI Assistant Instructions for LumiBot

## Quick Start

**First, read these files:**
1. `BACKTESTING_ARCHITECTURE.md` - Understand the backtesting data flow
2. `AGENTS.md` - Critical rules for ThetaData (DO NOT SKIP)

## Project Overview

LumiBot is a trading and backtesting framework supporting multiple data sources (Yahoo, ThetaData, Polygon) and brokers (Alpaca, Interactive Brokers, Tradier, etc.).

## Key Locations

| What | Where |
|------|-------|
| LumiBot library | `/Users/robertgrzesik/Documents/Development/lumivest_bot_server/strategies/lumibot/` |
| Strategy Library | `/Users/robertgrzesik/Documents/Development/Strategy Library/` |
| Demo strategies | `/Users/robertgrzesik/Documents/Development/Strategy Library/Demos/` |
| Environment config | `Demos/.env` for strategies, `lumibot/.env` for library |
| Backtest logs | `/Users/robertgrzesik/Documents/Development/Strategy Library/logs/` |

## Critical Rules

### ThetaData Rules (MUST FOLLOW)

1. **NEVER run ThetaTerminal locally** - It will kill production connections
2. **Only use Data Downloader** at `http://44.192.43.146:8080`
3. **Always compare ThetaData vs Yahoo** - Yahoo is the gold standard for split-adjusted prices
4. See `AGENTS.md` for complete rules

### Data Source Selection

The `BACKTESTING_DATA_SOURCE` env var **OVERRIDES** explicit code:
```bash
# In .env file
BACKTESTING_DATA_SOURCE=thetadata # Uses ThetaData regardless of code
BACKTESTING_DATA_SOURCE=yahoo # Uses Yahoo regardless of code
BACKTESTING_DATA_SOURCE=none # Uses whatever class the code specifies
```
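
The override behavior can be sketched as follows. This is an illustrative model, not LumiBot's real resolution code: the function name `resolve_data_source` is hypothetical, and the fallback to ThetaData when the variable is unset is taken from the comment in `.github/workflows/cicd.yaml` in this PR.

```python
import os
from typing import Mapping, Optional


def resolve_data_source(code_choice: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the data source a backtest would use under the override rule."""
    env = os.environ if env is None else env
    # Assumption from the CI comment: an unset variable behaves like "ThetaData".
    override = env.get("BACKTESTING_DATA_SOURCE", "thetadata").strip().lower()
    if override == "none":
        return code_choice.lower()  # code decides
    return override                 # env var wins regardless of code


print(resolve_data_source("yahoo", {"BACKTESTING_DATA_SOURCE": "none"}))
```

This is why CI sets `BACKTESTING_DATA_SOURCE=none`: tests that name a data source in code would otherwise be silently redirected.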

### Cache Management

If you see wrong or stale data:
1. Bump `LUMIBOT_CACHE_S3_VERSION` (e.g., v5 → v6)
2. Clear local cache: `rm -rf ~/Library/Caches/lumibot/`

## Common Tasks

### Run a Backtest

```bash
cd "/Users/robertgrzesik/Documents/Development/Strategy Library/Demos"
python3 "TQQQ 200-Day MA.py"
```

### Compare Yahoo vs ThetaData

1. Edit `Demos/.env`:
   - Set `BACKTESTING_DATA_SOURCE=yahoo`
2. Run backtest, note results
3. Edit `Demos/.env`:
   - Set `BACKTESTING_DATA_SOURCE=thetadata`
4. Run backtest, compare results
5. Results should match within ~1-2%

### Check Backtest Results

```bash
ls -la "/Users/robertgrzesik/Documents/Development/Strategy Library/logs/" | grep TQQQ | tail -10
```

Look at `*_tearsheet.csv` for CAGR and metrics.

## Known Issues & Fixes

### ✅ ThetaData Split Adjustment (FIXED - Nov 28, 2025)

**Status:** FIXED - Split handling now correct

**Root cause:** The `_apply_corporate_actions_to_frame()` function was being called 26+ times per backtest without any idempotency check, causing split adjustments to be applied multiple times.

**Fix applied:**
1. Added idempotency check at start of `_apply_corporate_actions_to_frame()` - checks for `_split_adjusted` column marker
2. Added marker at end of function after successful adjustment
3. Cache version bumped to v7
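
A minimal sketch of the idempotency pattern described above. The real function operates on a DataFrame and uses a `_split_adjusted` column as the marker; here a plain dict of columns stands in for the frame, and `apply_split_adjustment` and its `factor` argument are hypothetical names used only for illustration.

```python
from typing import Dict

MARKER = "_split_adjusted"


def apply_split_adjustment(frame: Dict[str, object], factor: float) -> Dict[str, object]:
    """Divide close prices by `factor` once; later calls are no-ops."""
    if frame.get(MARKER):
        # Already adjusted: return unchanged instead of dividing again.
        return frame
    adjusted = dict(frame)
    adjusted["close"] = [price / factor for price in frame["close"]]
    adjusted[MARKER] = True  # set the marker only after a successful adjustment
    return adjusted


frame = {"close": [100.0, 50.0]}
once = apply_split_adjustment(frame, 2.0)
twice = apply_split_adjustment(once, 2.0)  # second call changes nothing
print(once["close"], twice["close"])
```

Without the marker check, each repeated call would divide the prices again, which matches the multiple-application bug described in the root cause.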

### ✅ ThetaData Dividend Split Adjustment (FIXED - Nov 28, 2025)

**Status:** FIXED - 17/21 dividends now match Yahoo within 5%

**Root causes found:**
1. `_update_cash_with_dividends()` was called 3 times per day without idempotency
2. ThetaData dividend amounts were UNADJUSTED for splits
3. ThetaData returned duplicate dividends for same ex_date (e.g., 2019-03-20 appeared 4x)
4. ThetaData returned special distributions with `less_amount > 0` (e.g., 2015-07-02)

**Fixes applied:**
1. Added `_dividends_applied_tracker` in `_strategy.py` to prevent multiple applications
2. Added split adjustment to `get_yesterday_dividends()` in `thetadata_backtesting_pandas.py`
3. Added deduplication by ex_date in `_normalize_dividend_events()`
4. Added filter for `less_amount > 0` to exclude special distributions

**Verified split adjustment:**
- ThetaData cumulative factor for 2014 dividends: 48x (2×3×2×2×2)
- After adjustment: $0.01182 raw → $0.000246 adjusted ≈ Yahoo's $0.000250 ✓
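
The three data-side fixes (filter special distributions, deduplicate by ex-date, split-adjust the amount) can be sketched together. This is not the code in `_normalize_dividend_events()`; the function name, the event dict layout, and the sample ex-date and amounts other than `$0.01182` and the 48x factor are placeholders for illustration.

```python
from typing import Dict, List


def normalize_dividends(events: List[dict], split_factor: Dict[str, float]) -> List[dict]:
    """Filter, deduplicate, and split-adjust raw dividend events."""
    seen = set()
    result = []
    for event in events:
        if event.get("less_amount", 0) > 0:
            continue  # special distribution, excluded
        if event["ex_date"] in seen:
            continue  # duplicate ex_date, keep the first occurrence only
        seen.add(event["ex_date"])
        factor = split_factor.get(event["ex_date"], 1.0)
        result.append({"ex_date": event["ex_date"], "amount": event["amount"] / factor})
    return result


raw = [
    {"ex_date": "2014-01-01", "amount": 0.01182},  # placeholder date, not a real ex-date
    {"ex_date": "2019-03-20", "amount": 0.10},     # placeholder amount
    {"ex_date": "2019-03-20", "amount": 0.10},     # duplicate row
    {"ex_date": "2015-07-02", "amount": 1.22, "less_amount": 22.93},
]
factors = {"2014-01-01": 2 * 3 * 2 * 2 * 2}  # 48x cumulative factor
for row in normalize_dividends(raw, factors):
    print(row)
```

The first row reproduces the arithmetic above: 0.01182 / 48 = 0.00024625, which rounds to the $0.000246 quoted.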

**Current results:** ~47% CAGR with ThetaData vs ~42% with Yahoo (gap due to phantom dividends)

### ⚠️ ThetaData Phantom Dividend (KNOWN ISSUE - Reported to ThetaData)

**Status:** KNOWN DATA QUALITY ISSUE - Reported to ThetaData support team

| Date | ThetaData | Yahoo | Status |
|------|-----------|-------|--------|
| 2014-09-18 | $0.41 raw | None | ⚠️ PHANTOM - main cause of CAGR gap |
| 2015-07-02 | $1.22 raw | None | ✅ FILTERED (less_amount=22.93) |
| 2020-12-23 | $0.000283 | None | ⚠️ PHANTOM |
| 2021-12-23 | $0.000119 | None | ⚠️ PHANTOM |

**Root cause:** ThetaData phantom dividends are DATA ERRORS in the SIP feed, not Return of Capital (ROC) distributions. Confirmed via Perplexity research - these amounts don't appear in any other financial database (Yahoo, Bloomberg, SEC filings).

**Workaround options:**
1. Use `BACKTESTING_DATA_SOURCE=yahoo` for dividend-sensitive strategies
2. Wait for ThetaData to fix the data quality issue
3. Accept ~5% CAGR gap as known ThetaData limitation

**Key files:**
- `lumibot/tools/thetadata_helper.py` - `_apply_corporate_actions_to_frame()`, `_normalize_dividend_events()`
- `lumibot/backtesting/thetadata_backtesting_pandas.py` - `get_yesterday_dividends()`
- `lumibot/strategies/_strategy.py` - `_update_cash_with_dividends()`

### ✅ ThetaData Zero-Price Data Filtering (FIXED - Nov 28, 2025)

**Status:** FIXED - Zero-price rows now filtered automatically

**Root cause:** ThetaData sometimes returns rows with all-zero OHLC values (e.g., Saturday 2019-06-08 for MELI), which caused `ZeroDivisionError` when strategies tried to calculate positions.

**Fix applied:**
1. Added zero-price filtering when loading from cache (`thetadata_helper.py` lines ~2501-2513)
2. Added zero-price filtering when receiving new data from ThetaData (`thetadata_helper.py` lines ~2817-2829)
3. Cache is self-healing - bad data is filtered on load

**Unit tests added:**
- `TestZeroPriceFiltering` class with 6 tests covering all edge cases
- Tests verify: zero-row removal, valid-zero-volume preservation, weekend-zero handling, partial-zeros, empty DF, all-zero DF
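
The filtering rule can be sketched on plain row dicts. This is an illustration only, not the pandas code in `thetadata_helper.py`; the function name `drop_zero_price_rows` is hypothetical, and keeping rows with only some zero prices is an assumption, since the text above does not say how partial zeros are handled.

```python
from typing import List

PRICE_KEYS = ("open", "high", "low", "close")


def drop_zero_price_rows(rows: List[dict]) -> List[dict]:
    """Remove rows whose OHLC values are all zero; keep everything else."""
    return [row for row in rows if not all(row[key] == 0 for key in PRICE_KEYS)]


rows = [
    {"date": "2019-06-07", "open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "volume": 0},
    {"date": "2019-06-08", "open": 0, "high": 0, "low": 0, "close": 0, "volume": 0},
]
print([row["date"] for row in drop_zero_price_rows(rows)])
```

A row with valid prices and zero volume is preserved, which corresponds to the "valid-zero-volume preservation" case listed in the tests.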

### Cache Version Mismatch

Always ensure `.env` files have matching cache versions:
- `lumibot/.env`
- `Demos/.env`
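
A quick consistency check could look like the sketch below. It parses `.env` text directly rather than relying on any particular dotenv library; the function names are hypothetical, and `LUMIBOT_CACHE_S3_VERSION` is assumed to be the relevant key because it is the one named under Cache Management.

```python
from typing import Optional


def read_env_value(text: str, key: str) -> Optional[str]:
    """Return the value of `key` from .env-style text, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop a trailing inline comment and surrounding quotes.
            return value.split("#", 1)[0].strip().strip("\"'")
    return None


def cache_versions_match(lib_env: str, demo_env: str,
                         key: str = "LUMIBOT_CACHE_S3_VERSION") -> bool:
    """True only when both files define the key with the same value."""
    lib_version = read_env_value(lib_env, key)
    demo_version = read_env_value(demo_env, key)
    return lib_version is not None and lib_version == demo_version


print(cache_versions_match("LUMIBOT_CACHE_S3_VERSION=v7\n", "LUMIBOT_CACHE_S3_VERSION=v6\n"))
```

In practice the two strings would be read from `lumibot/.env` and `Demos/.env`.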

## Test Philosophy (CRITICAL - READ THIS)

### Test Age = Test Authority

When tests fail, how you fix them depends on **how old the test is**:

| Test Age | Authority Level | How to Fix |
|----------|----------------|------------|
| **>1 year old** | LEGACY - High authority | **Fix the CODE**, not the test. These tests have proven themselves over time. |
| **6-12 months** | ESTABLISHED - Medium authority | Investigate carefully. Likely fix the code, but could be test issue. |
| **<6 months** | NEW - Lower authority | Test may need adjustment. Still verify code isn't broken. |
| **<1 month** | EXPERIMENTAL | Test is still being refined. Adjust as needed. |

### Check Test Age Before Fixing

```bash
# Check when a test file was first created
git log --format="%ai" --follow -- tests/path/to/test.py | tail -1

# Check when a specific test function was added
git log -p --all -S 'def test_function_name' -- tests/
```

### Conflict Resolution

When old tests and new tests conflict:
1. **Old test wins by default** - it has a proven track record
2. If the new test represents genuinely new functionality, ask the user
3. Document any judgment calls in the test file with comments

### Adding Comments to Tests

For tests over 1 year old, add a comment when modifying:
```python
# LEGACY TEST (created Aug 2023) - This test has proven correct behavior
# DO NOT modify expected values without understanding the full impact
# If this test fails, FIX THE CODE, not this test
def test_important_behavior():
    ...
```

## Testing Checklist for Data Source Changes

1. Run TQQQ 200-Day MA with Yahoo (2013-2025) → expect ~30-45% CAGR
2. Run same strategy with ThetaData → should match Yahoo within ~5%
3. Check for anomalous daily returns (>50% gain/loss indicates split issue)
4. Compare specific prices around split dates (esp. Jan 13, 2022 2:1 forward split)
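
Step 3 of the checklist can be automated with a small scan over daily closes. This is a sketch under assumptions: the function name `find_anomalous_returns` and the `(date, close)` input shape are hypothetical, and the sample prices are made up to mimic an unadjusted 2:1 split.

```python
from typing import List, Tuple


def find_anomalous_returns(
    closes: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (date, daily_return) pairs whose absolute return exceeds `threshold`."""
    flagged = []
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        if prev_close == 0:
            continue  # cannot compute a return from a zero price
        daily_return = close / prev_close - 1
        if abs(daily_return) > threshold:
            flagged.append((day, daily_return))
    return flagged


# Illustrative prices only: the last bar roughly halves, as an unadjusted split would.
series = [("day-1", 100.0), ("day-2", 102.0), ("day-3", 50.0)]
print(find_anomalous_returns(series))
```

Any flagged date is a candidate for the price comparison around split dates described in step 4.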

## Architecture Quick Reference

```
Strategy.backtest()
Data Source Selection (env var overrides code)
├── Yahoo: yfinance → split-adjusted prices
├── ThetaData: Data Downloader → split-adjusted prices
└── Polygon: Polygon API → handles splits in cache validation
BacktestingBroker (simulates trades)
Results (tearsheet, trades, logs)
```

See `BACKTESTING_ARCHITECTURE.md` for detailed data flow diagrams.

## CRITICAL: MCP Servers (MUST USE)

**ATTENTION CLAUDE:** These MCP servers are **ESSENTIAL** for working effectively on LumiBot. They have proven to make a **massive difference** in debugging complex data issues, validating financial data accuracy, and navigating this large codebase. **USE THEM PROACTIVELY AND FREQUENTLY.**

**IF THESE MCP SERVERS ARE NOT AVAILABLE:** Inform the user immediately and strongly recommend they install them. The debugging experience without these tools is significantly degraded. Installation instructions can be found in the Claude Code documentation.

### Why These Tools Matter for LumiBot

LumiBot deals with:
- Complex financial data from multiple sources (Yahoo, ThetaData, Polygon)
- Split adjustments, dividend calculations, and corporate actions
- Large codebase with intricate data flow paths
- Data accuracy requirements where small errors compound over time

Without MCP tools, debugging these issues is slow and error-prone. With them, you can validate data against external sources, systematically break down problems, and navigate the codebase efficiently.

---

### Sequential Thinking (`mcp__sequential-thinking__sequentialthinking`) - **USE THIS FIRST**

**STRONGLY RECOMMENDED** for ANY complex debugging task. This tool has been critical for:
- Breaking down backtesting discrepancies into systematic steps
- Analyzing why ThetaData vs Yahoo results differ
- Planning fixes that don't introduce regressions
- Debugging split/dividend calculation issues

**USE IT:** Before diving into complex code changes, use sequential thinking to plan your approach.

### Perplexity (`mcp__perplexity__*`) - **ESSENTIAL FOR DATA VALIDATION**

**CRITICAL** for validating financial data. LumiBot data issues often stem from incorrect source data. Perplexity lets you:
- `perplexity_search` - Verify stock split dates and ratios
- `perplexity_research` - Deep dive into dividend history discrepancies
- `perplexity_ask` - Quick validation of corporate action data

**REAL EXAMPLE:** We discovered ThetaData "phantom dividends" by using Perplexity to cross-reference against Yahoo, Bloomberg, and SEC filings. This identified data quality issues that would have been impossible to find otherwise.

**USE IT:** Whenever you see unexpected financial data, validate it with Perplexity before assuming the code is wrong.

### Memory (`mcp__memory__*`) - **TRACK YOUR FINDINGS**

**HIGHLY RECOMMENDED** for maintaining context across debugging sessions:
- Store known data source discrepancies
- Record phantom dividend dates and amounts
- Track cache version changes and their reasons
- Remember which fixes were applied and why

**USE IT:** When you discover an issue, store it in memory so future sessions don't have to rediscover it.

### X-Ray (`mcp__xray__*`) - **NAVIGATE THE CODEBASE**

**PARTIALLY USEFUL** for understanding the LumiBot codebase:
- `explore_repo` - **WORKS** - Map the directory structure
- `what_breaks` - **WORKS** - Find all usages of a function (text search)
- `find_symbol` - **REQUIRES ast-grep** - If this fails, install: `brew install ast-grep`

**USE IT:** Before modifying any function, use `what_breaks` to understand the impact. If `find_symbol` fails, use Grep tool instead.

### Context7 (`mcp__context7__*`) - **GET CURRENT DOCS**

**USEFUL** for library documentation:
- `resolve-library-id` - Find library IDs
- `get-library-docs` - Get current pandas, yfinance, polygon docs

### Chrome DevTools (`mcp__chrome-devtools__*`)

**USEFUL** for debugging Data Downloader issues:
- Test API endpoints directly
- Inspect network responses from ThetaData
11 changes: 10 additions & 1 deletion README.md
@@ -5,7 +5,16 @@

Lumibot is a backtesting and trading library for stocks, options, crypto, futures and more. It is made so that the same code you use for backtesting can be used for live trading, making it easy to transition from backtesting to live trading. Lumibot is a highly flexible library that allows you to create your own strategies and indicators, and backtest them on historical data. It is also highly optimized for speed, so you can backtest your strategies quickly and efficiently.

-**IMPORTANT: This library requires data for backtesting. The recommended data source is [Polygon.io](https://polygon.io/?utm_source=affiliate&utm_campaign=lumi10) (a free tier is available too). Please click the link to give us credit for the sale, it helps support this project. You can use the coupon code 'LUMI10' for 10% off.**
+**IMPORTANT: This library requires data for backtesting. Our recommended data source is [ThetaData](https://www.thetadata.net/) because they provide the deepest historical coverage we’ve found and directly support BotSpot. Use the promo code `BotSpot10` at checkout for 10% off the first order (the code also tells ThetaData you were referred by us).**

> **Contributor note:** Read `AGENTS.md` before running anything Theta-related. That file spells out the hard rules: never launch ThetaTerminal or the shared downloader locally, always point LumiBot at the AWS-hosted downloader, and wrap all long commands with `/Users/robertgrzesik/bin/safe-timeout`. Breaking these rules kills the only licensed Theta session.

## Architecture Documentation

- `BACKTESTING_ARCHITECTURE.md` - Detailed documentation of the backtesting data flow (Yahoo, ThetaData, Polygon data sources, caching, and data flow diagrams)
- `CLAUDE.md` - AI assistant instructions for working with the codebase
- `AGENTS.md` - Critical rules for ThetaData and production safety

## Documentation - 👇 Start Here 👇
