Merged
6 changes: 6 additions & 0 deletions evals/registry/eval_sets/chemistry_enzyme.yaml
@@ -0,0 +1,6 @@
chemistry_enzyme:
  evals:
    - scipaper_enzyme_substrate
    - scipaper_enzyme_activate_compound
    - scipaper_enzyme_inhibitor
    - scipaper_enzyme_localization
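As a rough sketch of how the names in this set are expected to resolve: each eval name points through an `id` alias to a `.val.csv` entry that carries the `class`. The plain dicts below mirror the YAML in this PR; the `resolve` helper is hypothetical and only illustrates the alias lookup, not the framework's actual loader.

```python
# Plain-dict mirror of the YAML in this PR (only the keys needed here).
EVAL_NAMES = [
    "scipaper_enzyme_substrate",
    "scipaper_enzyme_activate_compound",
    "scipaper_enzyme_inhibitor",
    "scipaper_enzyme_localization",
]
eval_sets = {"chemistry_enzyme": {"evals": EVAL_NAMES}}

registry = {}
for name in EVAL_NAMES:
    # Base entry: an alias that points at the concrete ".val.csv" entry.
    registry[name] = {"id": f"{name}.val.csv", "metrics": ["accuracy"]}
    # Concrete entry: names the class that runs the eval.
    registry[f"{name}.val.csv"] = {
        "class": "evals.elsuite.rag_table_extract:TableExtract",
    }


def resolve(name, registry, max_hops=5):
    """Follow `id` aliases until an entry with a `class` key is reached.

    Hypothetical helper for illustration only.
    """
    for _ in range(max_hops):
        spec = registry[name]
        if "class" in spec:
            return name, spec["class"]
        name = spec["id"]
    raise ValueError("alias chain too long")


resolved = [resolve(n, registry) for n in eval_sets["chemistry_enzyme"]["evals"]]
```

A missing `id` target would surface here as a `KeyError`, which is a cheap way to catch a typo in an eval name before running the set.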
18 changes: 18 additions & 0 deletions evals/registry/evals/00_scipaper_enzyme_activate_compound.yaml
@@ -0,0 +1,18 @@
scipaper_enzyme_activate_compound:
  id: scipaper_enzyme_activate_compound.val.csv
  metrics: [accuracy]

scipaper_enzyme_activate_compound.val.csv:
  class: evals.elsuite.rag_table_extract:TableExtract
  args:
    samples_jsonl: 00_scipaper_enzyme_activate_compound/samples.jsonl
    instructions: |
      Please give a complete list of Activating Compound, Comment, and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
      1. Output in CSV format. Write units in the value, not in the header, e.g. "10.5 µM". Quote any value that contains a comma. For example:
      ```csv
      Activating Compound,Comment,Organism
      Cu2+,at 0.001 mM of the activity without activator,Homo sapiens
      p-xylene,"11.4 mM, slight activation",Bos taurus
      NH4+,0.002 mM,Bos taurus
      ```
      2. If there are multiple tables, concatenate them. Do not give references or abbreviate with "..."; give the complete table.
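The quoting rule in the instructions above (quote a value only when it contains a comma) matches what Python's standard `csv` module does with `QUOTE_MINIMAL`. A minimal sketch using the example rows from the prompt; the `to_csv` helper name is an assumption for illustration and is not part of this PR.

```python
import csv
import io

# Rows taken from the example in the instructions above.
header = ["Activating Compound", "Comment", "Organism"]
rows = [
    ["Cu2+", "at 0.001 mM of the activity without activator", "Homo sapiens"],
    ["p-xylene", "11.4 mM, slight activation", "Bos taurus"],
    ["NH4+", "0.002 mM", "Bos taurus"],
]


def to_csv(header, rows):
    """Serialize rows, quoting only the fields that need it (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv(header, rows)

# Reading the text back recovers the comma-containing value as a single field.
parsed = list(csv.reader(io.StringIO(text)))
```

Only the `p-xylene` comment is quoted in `text`; the other fields are written bare, and every parsed row keeps three fields.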
18 changes: 18 additions & 0 deletions evals/registry/evals/00_scipaper_enzyme_inhibitor.yaml
@@ -0,0 +1,18 @@
scipaper_enzyme_inhibitor:
  id: scipaper_enzyme_inhibitor.val.csv
  metrics: [accuracy]

scipaper_enzyme_inhibitor.val.csv:
  class: evals.elsuite.rag_table_extract:TableExtract
  args:
    samples_jsonl: 00_scipaper_enzyme_inhibitor/samples.jsonl
    instructions: |
      Please give a complete list of Inhibitor, Comment, and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
      1. Output in CSV format. Write units in the value, not in the header, e.g. "10.5 µM". Quote any value that contains a comma. For example:
      ```csv
      Inhibitor,Comment,Organism
      ATP,"competitive inhibition of verapamil-dependent ATPase-activity",Homo sapiens
      p-xylene,"11.4 mM, slight inhibition",Bos taurus
      NH4+,0.002 mM,Bos taurus
      ```
      2. If there are multiple tables, concatenate them. Do not give references or abbreviate with "..."; give the complete table.
16 changes: 16 additions & 0 deletions evals/registry/evals/00_scipaper_enzyme_localization.yaml
@@ -0,0 +1,16 @@
scipaper_enzyme_localization:
  id: scipaper_enzyme_localization.val.csv
  metrics: [accuracy]

scipaper_enzyme_localization.val.csv:
  class: evals.elsuite.rag_table_extract:TableExtract
  args:
    samples_jsonl: 00_scipaper_enzyme_localization/samples.jsonl
    instructions: |
      Please give a complete list of Localization, Comment, and Organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
      1. Output in CSV format. Write units in the value, not in the header, e.g. "10.5 µM". Quote any value that contains a comma. For example:
      ```csv
      Localization,Organism
      periplasm,Bos taurus
      ```
      2. If there are multiple tables, concatenate them. Do not give references or abbreviate with "..."; give the complete table.
19 changes: 19 additions & 0 deletions evals/registry/evals/00_scipaper_enzyme_substrate.yaml
@@ -0,0 +1,19 @@
scipaper_enzyme_substrate:
  id: scipaper_enzyme_substrate.val.csv
  metrics: [accuracy]

scipaper_enzyme_substrate.val.csv:
  class: evals.elsuite.rag_table_extract:TableExtract
  args:
    samples_jsonl: 00_scipaper_enzyme_substrate/samples.jsonl
    instructions: |
      Please give a complete list of SMILES structures, Km values, Vmax values, target info (protein or cell line), and organism of all substrates in the paper. Usually the substrates' tags are numbers or IUPAC names.
      1. Output in CSV format. Write units in the value, not in the header, e.g. "10.5 µM". Quote any value that contains a comma. For example:
      ```csv
      Substrate,Inhibitors,Km value,Km max,Comment,organism,Vmax value,SMILES,Target info,Activating Compound
      ATP,Cu2+,0.001 mM,-,-,Homo sapiens,-,-,ATP-linker aldehyde,Carboxybenzaldehyde
      p-xylene,NADH,0.004 mM,-,-,Homo sapiens,-,C1CCCCC1,-,Methylbenzaldehyde
      NADPH,benzaldehyde,0.12 mM,125 mM,enzyme form ATP,Bos taurus,-,-,-,NH4+
      ```
      2. If there are multiple tables, concatenate them. Do not give references or abbreviate with "..."; give the complete table.
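With a ten-column header like the one in the substrate example, a row that is one field short is easy to miss by eye. The sketch below is one way to pull the CSV out of a fenced model reply and flag rows whose width differs from the header. `extract_csv` and `ragged_rows` are hypothetical helper names, and the sample reply is made up for illustration; none of this is taken from `TableExtract`.

```python
import csv
import io
import re

# Matches the body of a ``` or ```csv fence.
FENCE = re.compile(r"```(?:csv)?\s*\n(.*?)```", re.DOTALL)


def extract_csv(reply: str) -> str:
    """Return the body of the first csv fence, or the whole reply if none is found."""
    match = FENCE.search(reply)
    return match.group(1) if match else reply


def ragged_rows(csv_text: str):
    """Return (row_number, field_count) for rows whose width differs from the header.

    Row numbers count non-empty rows, with the header as row 1.
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [r for r in reader if r]  # drop blank lines
    if not rows:
        return []
    width = len(rows[0])
    return [(i, len(r)) for i, r in enumerate(rows[1:], start=2) if len(r) != width]


# Made-up reply in which the last row is missing its Organism field.
reply = "```csv\nLocalization,Organism\nperiplasm,Bos taurus\ncytoplasm\n```"
problems = ragged_rows(extract_csv(reply))
```

Here `problems` holds one entry for the short third row. The same check could be run over the example tables in the prompts themselves before they are committed.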