rdagent/components/coder/data_science/ensemble/prompts.yaml
5 additions, 37 deletions
@@ -2,7 +2,7 @@ ensemble_coder:
   system: |-
     You are a world-class data scientist and machine learning engineer with deep expertise in statistics, mathematics, and computer science.
     Your knowledge spans cutting-edge data analysis techniques, advanced machine learning algorithms, and their practical applications to solve complex real-world problems.
-
+
     ## Task Description
     Currently, you are working on model ensemble implementation. Your task is to write a Python function that combines multiple model predictions and makes final decisions.

@@ -51,31 +51,6 @@ ensemble_coder:
     }
     {% endif %}

-    You must carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising model performance.
-    ### 1. Core Principle
-    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
-
-    ### 2. Training-Time Resource Allocation
-    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
-    - Avoid reducing base model quality just to save time. For example:
-      - Freezing large parts of the model (e.g., embeddings)
-      - Using only embedding-level regression instead of full modeling
-      - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
-
-    ### 3. Expectation on Ensemble Design
-    - Implement an ensemble strategy that improves performance.
-      This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
-      More advanced methods like stacking or blending are optional and can be used if beneficial.
-      Feel free to choose a practical and reliable ensemble approach within the available time and resources.
-    - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
-
-    ### 4. Final Reminder
-    You have full access to the training code, task definition, and previous results.
-    You should weigh trade-offs thoughtfully and pick a design that maximizes ensemble performance without shortcuts that hurt model quality or cause timeout.
-    - The current time budget is sufficient for thorough training and ensemble.
-    - If you believe the existing single-model code is already good, avoid large modifications.
-    - Avoid overly strict constraints; focus on effectively using available time to build a robust ensemble.
-
   user: |-
     --------- Code Specification ---------
     {{ code_spec }}
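The removed guidance above suggests that an ensemble can be as simple as training the same model under different random seeds or data splits and averaging the outputs. A self-contained toy sketch of that idea, using bootstrap resamples and a least-squares line as the "base model" (the data and model here are invented purely for illustration):

```python
import numpy as np

rng0 = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 3.0 * x + 1.0 + rng0.normal(scale=0.1, size=x.size)  # noisy linear data

# "Same model, different data splits": fit the model on a different
# bootstrap resample for each seed, then average the member predictions.
preds = []
for seed in range(5):
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, x.size, size=x.size)      # bootstrap resample
    coeffs = np.polyfit(x[idx], y[idx], deg=1)      # base model: least-squares line
    preds.append(np.polyval(coeffs, x))

ensemble_pred = np.mean(preds, axis=0)              # uniform average of members
```

Stacking or blending (training a second-level model on the members' outputs) follows the same pattern but replaces the final `np.mean` with a learned combiner.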
@@ -127,17 +102,10 @@ ensemble_eval:

     ## Evaluation Criteria
     - You will be given the standard output (`stdout`) from the ensemble test and, if applicable, the workflow test.
-    - The code should have no try-except blocks to ensure errors are exposed.
-    - Verify that the scoring uses the specified metric exactly and correctly.
-    - Validate that the prediction shapes and values are consistent and sensible.
-    - Confirm that the ensemble completes training and inference within expected time (no timeout or incomplete training).
-    - Critically, check that the base models maintain good quality and are **not deliberately degraded to save time**. For example:
-      - Avoid freezing large parts of the model that reduce learning capacity.
-      - Avoid replacing full models with simplistic embedding regressors.
-      - Avoid using tricks that severely impair model expressiveness just to reduce runtime.
-    - Reject ensemble implementations that sacrifice model performance for training speed.
-    - Provide full error messages and stack traces if any failures occur.
-
+    - Code should have no try-except blocks because they can hide errors.
+    - Check whether the code implements the scoring process using the given metric.
+    - The stdout includes the local variable values from the ensemble code execution. Check whether the validation score is calculated correctly.
+
     Please respond with your feedback in the following JSON format and order
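The updated evaluation criteria above expect ensemble code that avoids try-except blocks, scores with the given metric, and surfaces the validation score via stdout. A hedged sketch of what a compliant scoring section might look like (the `accuracy` metric, variable names, and printed format are placeholder assumptions; the real task supplies its own metric and spec):

```python
import numpy as np

def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Stand-in for the competition metric; the real task defines its own."""
    return float((y_true == y_pred).mean())

# Deliberately no try/except: failures should surface with full tracebacks
# so the evaluator can see them in stdout.
val_label = np.array([1, 0, 1, 1])
val_pred = np.array([1, 0, 0, 1])

val_score = accuracy(val_label, val_pred)
print(f"val_score: {val_score}")  # the evaluator reads this from stdout
```

Printing the score (and any other key local values) is what lets the evaluator check from stdout alone that the validation score was computed with the specified metric.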