fix(qlib): prevent MultiIndex duplication from groupby().rolling() pattern by shin4 · Pull Request #1401 · microsoft/RD-Agent

shin4 · 2026-04-28T08:06:40Z

Summary

This PR introduces a preventive fix for pandas MultiIndex issues caused by groupby().rolling() patterns in LLM-generated factor code.

Fixes #678

Problem

When an LLM generates factor code with rolling operations on MultiIndex data (index: ['datetime', 'instrument']), a common pattern produces a 3-level index instead of the expected 2-level index:

# ❌ WRONG - Creates 3-level index: ['instrument', 'datetime', 'instrument']
ma_20 = volume.groupby(level='instrument').rolling(window=20).mean()
# ValueError: The name instrument occurs multiple times

This causes pd.concat() to fail with:

AssertionError: Length of new_levels (3) must be <= self.nlevels (2)

See Issue #678 for detailed error report.
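The index duplication described above can be reproduced on a small toy Series. This is a minimal sketch: the data, the window size of 2, and the variable names `bad` and `good` are illustrative assumptions, not taken from the RD-Agent code base.

```python
import pandas as pd

# Toy data: 3 dates x 2 instruments, indexed like the factor data in this PR.
index = pd.MultiIndex.from_product(
    [pd.date_range("2024-01-01", periods=3), ["A", "B"]],
    names=["datetime", "instrument"],
)
volume = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name="volume")

# Problematic pattern: the group key is prepended to the original index.
bad = volume.groupby(level="instrument").rolling(window=2).mean()
print(bad.index.names)   # three levels, with 'instrument' appearing twice

# Pattern recommended by this PR: transform() keeps the original index.
good = volume.groupby(level="instrument").transform(
    lambda x: x.rolling(window=2).mean()
)
print(good.index.names)  # two levels: datetime, instrument
```

Because `good` shares its index with `volume`, it can be concatenated with other factor columns without further index surgery.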

Solution

1. Preventive Code Fix

Auto-detect and fix the problematic pattern in generated factor code before execution:

# rdagent/scenarios/qlib/developer/utils.py
def _fix_groupby_rolling_pattern(code: str) -> str:
    """
    Fix pandas groupby().rolling() patterns that cause index duplication.
    
    Converts: .groupby(level='instrument').rolling(window=N).mean()
    To:       .groupby(level='instrument').transform(lambda x: x.rolling(window=N).mean())
    """

2. Prompt Enhancement

Add documentation in prompts.yaml to guide the LLM to generate correct code from the start:

**CRITICAL: Pandas MultiIndex groupby().rolling() Pattern**

❌ WRONG:
  ma_20 = volume.groupby(level='instrument').rolling(window=20).mean()

✅ CORRECT:
  ma_20 = volume.groupby(level='instrument').transform(
    lambda x: x.rolling(window=20).mean()
  )

Changes

| File | Change |
|------|--------|
| `rdagent/scenarios/qlib/developer/utils.py` | Add `_fix_groupby_rolling_pattern()` function |
| `rdagent/scenarios/qlib/experiment/prompts.yaml` | Add MultiIndex rolling pattern documentation |

Comparison with #1375

| Aspect | This PR (Preventive) | #1375 (Remedial) |
|--------|----------------------|------------------|
| Fix timing | Before code execution | Before concat |
| Root cause | ✅ Yes | ⚠️ Partially |
| Data integrity | ✅ Preserved | ⚠️ May drop level incorrectly |
| Index ordering | ✅ Correct | ⚠️ May need swaplevel |

Recommendation: Merge both for defense-in-depth.

Testing

- All offline tests pass: `pytest -m offline`
- Manual testing with qlib fin_factor scenario
- Verified factor data produces correct 2-level MultiIndex

Related

- Fixes #678
- Complements #1375

shin4 · 2026-04-28T08:06:51Z

@microsoft-github-policy-service agree

The condition at line 127 was checking if feature_codes was NOT in plan, but it should check if it IS in plan before adding user instruction. This bug prevented baseline factor information from being communicated to the LLM during hypothesis generation.

Expose the base_features_path parameter in fin_factor_cli() so users can specify custom baseline features directory via CLI. Usage: rdagent fin_factor --base-features-path ./baseline_features

Changes:
- Change '1-5 Factors' to '1-3 Quality Factors'
- Add requirement for economic intuition justification
- Add baseline_context section listing existing factors
- Historical validation: 29 quality factors beat 158 quantity factors

This helps the LLM generate higher-quality factors and avoid duplicates.

Changes:
- Add baseline_context to context_dict in prepare_context()
- Replace simple threshold-based RAG with dynamic strategy
- New _generate_dynamic_rag() function parses exploration history
- Tracks 9 direction keywords: momentum, volatility, volume, etc.
- Recommends underexplored directions based on trace history

This improves factor discovery by making the LLM aware of existing baseline factors and guiding exploration toward new directions.
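The keyword-tracking idea in this commit can be sketched as a simple frequency count over past hypotheses. This is an illustrative assumption, not the code of `_generate_dynamic_rag()`: the function name `recommend_underexplored`, its `top_k` parameter, and the sample history are hypothetical, and only the three keywords named in the commit message are used, since the remaining six are not listed there.

```python
from collections import Counter

# Only the three directions named in the commit message are included here.
DIRECTION_KEYWORDS = ("momentum", "volatility", "volume")


def recommend_underexplored(history: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k direction keywords mentioned least often in past hypotheses."""
    counts = Counter({kw: 0 for kw in DIRECTION_KEYWORDS})
    for hypothesis in history:
        text = hypothesis.lower()
        for kw in DIRECTION_KEYWORDS:
            if kw in text:
                counts[kw] += 1
    # sorted() is stable, so ties keep the order of DIRECTION_KEYWORDS.
    ranked = sorted(DIRECTION_KEYWORDS, key=lambda kw: counts[kw])
    return ranked[:top_k]


history = [
    "20-day momentum of close price",
    "Momentum adjusted by realized volatility",
    "Short-term momentum reversal",
]
suggested = recommend_underexplored(history)
print(suggested)
```

Substring matching is crude: it counts a keyword wherever it appears, including in negations, so a real implementation would likely need something stricter.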

Creates compare_alpha_baselines.py to run backtests for both factor sets and compare key metrics:
- Annualized Return
- Maximum Drawdown
- Information Ratio
- Mean IC
- ICIR (IC Information Ratio)

Usage: python compare_alpha_baselines.py

Requires Docker with the local_qlib:latest image.
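For readers unfamiliar with the metrics listed in this commit, the sketch below shows one common way to compute them from a daily return series and a daily IC series. The conventions are assumptions chosen for illustration (252 trading days, arithmetic annualization, sample standard deviation) and may differ from what compare_alpha_baselines.py or qlib actually uses. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def summarize_returns(daily_returns: pd.Series) -> dict:
    """Annualized return, maximum drawdown, and information ratio."""
    cumulative = (1.0 + daily_returns).cumprod()
    # Running peak, floored at the starting capital of 1.0 so that an
    # initial loss is counted as a drawdown.
    peak = cumulative.cummax().clip(lower=1.0)
    drawdown = cumulative / peak - 1.0
    mean, std = daily_returns.mean(), daily_returns.std()
    return {
        "annualized_return": mean * TRADING_DAYS,
        "max_drawdown": drawdown.min(),
        "information_ratio": mean / std * np.sqrt(TRADING_DAYS),
    }


def summarize_ic(daily_ic: pd.Series) -> dict:
    """Mean IC and ICIR (mean IC divided by the standard deviation of IC)."""
    return {"mean_ic": daily_ic.mean(), "icir": daily_ic.mean() / daily_ic.std()}


returns_summary = summarize_returns(pd.Series([0.1, -0.5, 0.2]))
ic_summary = summarize_ic(pd.Series([0.1, 0.2, 0.3]))
```

Here the information ratio is computed on the series passed in, so it is only meaningful if `daily_returns` already holds returns in excess of the benchmark.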

shin4 added 5 commits April 29, 2026 15:52

feat(cli): add base_features_path parameter to fin_factor command

fffcf54

shin4 wants to merge 6 commits into microsoft:main from shin4:fix-multiindex-rolling
