Commit 7a3620e

some refinement
1 parent 04e9d5a commit 7a3620e

4 files changed

Lines changed: 5 additions & 56 deletions

rdagent/components/coder/data_science/ensemble/prompts.yaml

Lines changed: 5 additions & 37 deletions
@@ -2,7 +2,7 @@ ensemble_coder:
  system: |-
  You are a world-class data scientist and machine learning engineer with deep expertise in statistics, mathematics, and computer science.
  Your knowledge spans cutting-edge data analysis techniques, advanced machine learning algorithms, and their practical applications to solve complex real-world problems.
-
+
  ## Task Description
  Currently, you are working on model ensemble implementation. Your task is to write a Python function that combines multiple model predictions and makes final decisions.
@@ -51,31 +51,6 @@ ensemble_coder:
  }
  {% endif %}

- You must carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising model performance.
- ### 1. Core Principle
- Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
-
- ### 2. Training-Time Resource Allocation
- - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
- - Avoid reducing base model quality just to save time. For example:
- - Freezing large parts of the model (e.g., embeddings)
- - Using only embedding-level regression instead of full modeling
- - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
-
- ### 3. Expectation on Ensemble Design
- - Implement an ensemble strategy that improves performance.
- This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
- More advanced methods like stacking or blending are optional and can be used if beneficial.
- Feel free to choose a practical and reliable ensemble approach within the available time and resources.
- - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
-
- ### 4. Final Reminder
- You have full access to the training code, task definition, and previous results.
- You should weigh trade-offs thoughtfully and pick a design that maximizes ensemble performance without shortcuts that hurt model quality or cause timeout.
- - The current time budget is sufficient for thorough training and ensemble.
- - If you believe the existing single-model code is already good, avoid large modifications.
- - Avoid overly strict constraints; focus on effectively using available time to build a robust ensemble.
-
  user: |-
  --------- Code Specification ---------
  {{ code_spec }}
@@ -127,17 +102,10 @@ ensemble_eval:
  ## Evaluation Criteria
  - You will be given the standard output (`stdout`) from the ensemble test and, if applicable, the workflow test.
- - The code should have no try-except blocks to ensure errors are exposed.
- - Verify that the scoring uses the specified metric exactly and correctly.
- - Validate that the prediction shapes and values are consistent and sensible.
- - Confirm that the ensemble completes training and inference within expected time (no timeout or incomplete training).
- - Critically, check that the base models maintain good quality and are **not deliberately degraded to save time**. For example:
- - Avoid freezing large parts of the model that reduce learning capacity.
- - Avoid replacing full models with simplistic embedding regressors.
- - Avoid using tricks that severely impair model expressiveness just to reduce runtime.
- - Reject ensemble implementations that sacrifice model performance for training speed.
- - Provide full error messages and stack traces if any failures occur.
-
+ - Code should have no try-except blocks because they can hide errors.
+ - Check whether the code implement the scoring process using the given metric.
+ - The stdout includes the local variable values from the ensemble code execution. Check whether the validation score is calculated correctly.
+
  Please respond with your feedback in the following JSON format and order
```json
{

rdagent/scenarios/data_science/proposal/exp_gen/ensemble/ensemble.py

Lines changed: 0 additions & 11 deletions
This file was deleted.

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 0 additions & 4 deletions
@@ -448,10 +448,6 @@ task_gen:
  {% endif %}
  {% endif %}

-
-
-
-
  user: |-
  # Competition Scenario Description
  {{ scenario_desc }}

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 0 additions & 4 deletions
@@ -1,6 +1,4 @@
- import asyncio
  import json
- import re
  from enum import Enum
  from typing import Any, Dict, List, Optional, Tuple

@@ -14,11 +12,9 @@
  from rdagent.components.coder.data_science.pipeline.exp import PipelineTask
  from rdagent.components.coder.data_science.raw_data_loader.exp import DataLoaderTask
  from rdagent.components.coder.data_science.workflow.exp import WorkflowTask
- from rdagent.core.conf import RD_AGENT_SETTINGS
  from rdagent.core.proposal import ExpGen
  from rdagent.core.scenario import Scenario
  from rdagent.log import rdagent_logger as logger
- from rdagent.log.timer import RDAgentTimer
  from rdagent.oai.backend.base import RD_Agent_TIMER_wrapper
  from rdagent.oai.llm_utils import APIBackend, md5_hash
  from rdagent.scenarios.data_science.dev.feedback import ExperimentFeedback
