fix(qlib): prevent MultiIndex duplication from groupby().rolling() pattern #1401

Open: shin4 wants to merge 6 commits into microsoft:main from shin4:fix-multiindex-rolling

shin4 commented Apr 28, 2026

📚 Documentation preview 📚: https://RDAgent--1401.org.readthedocs.build/en/1401/

This PR introduces a **preventive fix** for pandas MultiIndex issues caused by `groupby().rolling()` patterns in LLM-generated factor code.

Fixes microsoft#678

## Problem

When the LLM generates factor code with rolling operations on MultiIndex data (index: `['datetime', 'instrument']`), a common pattern produces a 3-level index instead of the expected 2-level one:

```python
# ❌ WRONG - Creates 3-level index: ['instrument', 'datetime', 'instrument']
ma_20 = volume.groupby(level='instrument').rolling(window=20).mean()
# Referring to the duplicated level by name afterwards raises:
# ValueError: The name instrument occurs multiple times
```

This causes `pd.concat()` to fail with:
```
AssertionError: Length of new_levels (3) must be <= self.nlevels (2)
```

See Issue microsoft#678 for detailed error report.
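
The index duplication described above can be reproduced on a toy Series. This is only an illustrative sketch: the data, the window size of 2, and the variable names `bad` and `good` are assumptions made for the demo, not code from this PR.

```python
import pandas as pd

# Toy data with the same index layout as qlib factor data: (datetime, instrument).
idx = pd.MultiIndex.from_product(
    [pd.date_range("2024-01-01", periods=3), ["A", "B"]],
    names=["datetime", "instrument"],
)
volume = pd.Series(range(6), index=idx, dtype=float)

# groupby().rolling() prepends the group key to the original index,
# so 'instrument' ends up as a level twice.
bad = volume.groupby(level="instrument").rolling(window=2).mean()
print(list(bad.index.names), bad.index.nlevels)

# transform() returns a result aligned to the original index instead.
good = volume.groupby(level="instrument").transform(
    lambda x: x.rolling(window=2).mean()
)
print(list(good.index.names), good.index.nlevels)
```

The `transform` variant keeps the original `['datetime', 'instrument']` index, which is why it can be concatenated with other 2-level factor frames.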

## Solution

### 1. Preventive Code Fix

Auto-detect and fix the problematic pattern in generated factor code **before execution**:

```python
# rdagent/scenarios/qlib/developer/utils.py
def _fix_groupby_rolling_pattern(code: str) -> str:
    """
    Fix pandas groupby().rolling() patterns that cause index duplication.

    Converts: .groupby(level='instrument').rolling(window=N).mean()
    To:       .groupby(level='instrument').transform(lambda x: x.rolling(window=N).mean())
    """
```
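
The body of `_fix_groupby_rolling_pattern()` is not shown in this description. As a rough idea of how such a rewrite could work, here is a minimal regex-based sketch. The function name `fix_groupby_rolling_sketch`, the regular expression, and the list of supported aggregations are all hypothetical, and the sketch deliberately ignores calls whose arguments contain nested parentheses.

```python
import re

# Hypothetical pattern: .groupby(<args>).rolling(<args>).<agg>()
# Arguments containing parentheses are not matched and are left untouched.
_GROUPBY_ROLLING = re.compile(
    r"\.groupby\((?P<group>[^()]*)\)"
    r"\.rolling\((?P<roll>[^()]*)\)"
    r"\.(?P<agg>mean|sum|std|var|min|max|median)\(\)"
)


def fix_groupby_rolling_sketch(code: str) -> str:
    """Rewrite groupby().rolling().agg() into groupby().transform(lambda ...)."""

    def _rewrite(m: re.Match) -> str:
        return (
            f".groupby({m['group']})"
            f".transform(lambda x: x.rolling({m['roll']}).{m['agg']}())"
        )

    return _GROUPBY_ROLLING.sub(_rewrite, code)


before = "ma_20 = volume.groupby(level='instrument').rolling(window=20).mean()"
after = fix_groupby_rolling_sketch(before)
print(after)
```

Because the rewritten code no longer contains `.groupby(...)` immediately followed by `.rolling(...)`, running the sketch a second time leaves the output unchanged.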

### 2. Prompt Enhancement

Add documentation in `prompts.yaml` to guide the LLM to generate correct code from the start:

```yaml
**CRITICAL: Pandas MultiIndex groupby().rolling() Pattern**

❌ WRONG:
  ma_20 = volume.groupby(level='instrument').rolling(window=20).mean()

✅ CORRECT:
  ma_20 = volume.groupby(level='instrument').transform(
    lambda x: x.rolling(window=20).mean()
  )
```

## Changes

| File | Change |
|------|--------|
| `rdagent/scenarios/qlib/developer/utils.py` | Add `_fix_groupby_rolling_pattern()` function |
| `rdagent/scenarios/qlib/experiment/prompts.yaml` | Add MultiIndex rolling pattern documentation |

## Comparison with microsoft#1375

| Aspect | This PR (Preventive) | microsoft#1375 (Remedial) |
|--------|---------------------|------------------|
| Fix timing | Before code execution | Before concat |
| Addresses root cause | ✅ Yes | ⚠️ Partially |
| Data integrity | ✅ Preserved | ⚠️ May drop a level incorrectly |
| Index ordering | ✅ Correct | ⚠️ May need swaplevel |

**Recommendation**: Merge both for defense-in-depth.

## Testing

- All offline tests pass: `pytest -m offline`
- Manual testing with qlib fin_factor scenario
- Verified factor data produces correct 2-level MultiIndex

## Related

- Fixes microsoft#678
- Complements microsoft#1375

shin4 commented Apr 28, 2026

@microsoft-github-policy-service agree

shin4 added 5 commits April 29, 2026 15:52

1. The condition at line 127 was checking whether feature_codes was NOT in plan, but it should check that it IS in plan before adding the user instruction.

   This bug prevented baseline factor information from being communicated to the LLM during hypothesis generation.

2. Expose the base_features_path parameter in fin_factor_cli() so users can specify a custom baseline features directory via the CLI.

   Usage: rdagent fin_factor --base-features-path ./baseline_features

3. Changes:
   - Change '1-5 Factors' to '1-3 Quality Factors'
   - Add requirement for economic intuition justification
   - Add baseline_context section listing existing factors
   - Historical validation: 29 quality factors beat 158 quantity factors

   This helps the LLM generate higher-quality factors and avoid duplicates.

4. Changes:
   - Add baseline_context to context_dict in prepare_context()
   - Replace simple threshold-based RAG with dynamic strategy
   - New _generate_dynamic_rag() function parses exploration history
   - Tracks 9 direction keywords: momentum, volatility, volume, etc.
   - Recommends underexplored directions based on trace history

   This improves factor discovery by making the LLM aware of existing baseline factors and guiding exploration toward new directions.

5. Creates compare_alpha_baselines.py to run backtests for both factor sets and compare key metrics:
   - Annualized Return
   - Maximum Drawdown
   - Information Ratio
   - Mean IC
   - ICIR (IC Information Ratio)

   Usage: python compare_alpha_baselines.py

   Requires Docker with the local_qlib:latest image.

Development

Successfully merging this pull request may close these issues.

Fail to concat factors with different MultiIndex