
Commit b9dc5a4

Merge pull request #848 from KM3NeT/main
KM3NeT data extractors, example of use and detector classes
2 parents fd53394 + a088f9a commit b9dc5a4

File tree

17 files changed (+1182, −4 lines)

4.93 MB
Binary file not shown.

docs/source/installation/install.rst

Lines changed: 47 additions & 1 deletion
@@ -55,6 +55,52 @@ To achieve this, we recommend installing |graphnet|\ GraphNeT into a CVMFS with
Once installed, |graphnet|\ GraphNeT is available whenever you open the CVMFS locally.

Installation with km3io (KM3NeT)
-----------------------------------------------

This installation is only necessary if you want to process KM3NeT/ARCA or KM3NeT/ORCA files. Processing means converting them from the `.root` offline format into a format suitable for training with |graphnet|. If your KM3NeT data is already in `SQLite` or `parquet` format and you only want to train a model or perform inference on it, this specific installation is not needed.

Note that this installation adds `km3io`, ensuring it is built with compatible versions. The steps below assume a conda environment, created in the same way as above on this page, but feel free to choose a different environment setup.

As mentioned, it is highly recommended to perform the installation inside a conda environment so as not to break any dependencies. This can be done with the following commands:

.. code-block:: bash

    # Create an environment with Python 3.10
    conda create -p <full-path-to-env> --no-default-packages python=3.10 -y
    # Activate the environment. If using conda:
    conda activate <full-path-to-env>

GraphNeT is then installed by cloning the repository:

.. code-block:: bash

    git clone https://github.com/graphnet-team/graphnet.git
    cd graphnet

Choose the appropriate requirements file for your system. The example below uses PyTorch 2.5.1; check the matrix above for the full set of installable versions.

For CPU-only environments:

.. code-block:: bash

    pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cpu
    pip3 install -e .[torch-25] -f https://data.pyg.org/whl/torch-2.5.1+cpu.html

For GPU environments with, for instance, CUDA 11.8 drivers:

.. code-block:: bash

    pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
    pip3 install -e .[torch-25] -f https://data.pyg.org/whl/torch-2.5.1+cu118.html

Finally, downgrade setuptools for compatibility between km3io and GraphNeT, and install km3io:

.. code-block:: bash

    pip3 install --force-reinstall setuptools==70.3.0
    pip3 install km3io==1.2.0
.. note::
    We recommend installing |graphnet|\ GraphNeT without GPU in clean metaprojects.
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
"""Code to run the extraction of km3net data."""

import os
import warnings

from graphnet.constants import EXAMPLE_OUTPUT_DIR, TEST_DATA_DIR
from graphnet.data.readers import KM3NeTReader
from graphnet.data.writers import ParquetWriter, SQLiteWriter
from graphnet.data import DataConverter
from graphnet.data.extractors.km3net import (
    KM3NeTTruthExtractor,
    KM3NeTFullPulseExtractor,
    KM3NeTTriggPulseExtractor,
    KM3NeTHNLTruthExtractor,
    KM3NeTRegularRecoExtractor,
    KM3NeTHNLRecoExtractor,
)

from graphnet.utilities.argparse import ArgumentParser


def main(backend: str, triggered: str, HNL: str, OUTPUT_DIR: str) -> None:
    """Convert ROOT files from KM3NeT to `backend` format."""
    warnings.simplefilter(action="ignore", category=FutureWarning)

    input_dir = [f"{TEST_DATA_DIR}/km3net"]
    if OUTPUT_DIR != "None":
        outdir = f"{OUTPUT_DIR}/{backend}"
    else:
        outdir = f"{EXAMPLE_OUTPUT_DIR}/{backend}"
    os.makedirs(outdir, exist_ok=True)
    print(60 * "*")
    print(f"Saving to {outdir}")
    print(60 * "*")
    if backend == "parquet":
        save_method = ParquetWriter(truth_table="truth")
    elif backend == "sqlite":
        save_method = SQLiteWriter()  # type: ignore
    else:
        raise ValueError("Invalid backend choice")

    if HNL == "km3net-vars":
        truth_extractor = KM3NeTTruthExtractor(name="truth")
        reco_extractor = KM3NeTRegularRecoExtractor(name="reco")
    elif HNL == "hnl-vars":
        truth_extractor = KM3NeTHNLTruthExtractor(name="truth")  # type: ignore
        reco_extractor = KM3NeTHNLRecoExtractor(name="reco")  # type: ignore
    else:
        raise ValueError("Invalid HNL choice")

    if triggered == "Triggered":
        pulse_extractor = KM3NeTTriggPulseExtractor(name="pulse_map")
    elif triggered == "Snapshot":
        pulse_extractor = KM3NeTFullPulseExtractor(
            name="pulse_map"
        )  # type: ignore
    else:
        raise ValueError("Invalid triggered choice")

    converter = DataConverter(
        file_reader=KM3NeTReader(),
        save_method=save_method,
        extractors=[truth_extractor, pulse_extractor, reco_extractor],
        outdir=outdir,
        num_workers=1,
    )

    converter(input_dir=input_dir)


if __name__ == "__main__":

    # Parse command-line arguments
    parser = ArgumentParser(
        description="""
        Convert ROOT files from KM3NeT to an SQLite or Parquet format.
        """
    )

    parser.add_argument(
        "backend",
        choices=["sqlite", "parquet"],
        help="Choose the backend format",
    )
    parser.add_argument(
        "triggered",
        choices=["Triggered", "Snapshot"],
        help="Choose between triggered or snapshot pulse maps",
    )
    parser.add_argument(
        "HNL",
        choices=["km3net-vars", "hnl-vars"],
        help="KM3NeT truth or adding Heavy Neutral Lepton info",
    )
    parser.add_argument(
        "OUTPUT_DIR",
        default="None",
        help="Output directory (optional)",
    )

    args, unknown = parser.parse_known_args()

    # Run example script
    main(args.backend, args.triggered, args.HNL, args.OUTPUT_DIR)
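The script parses its arguments with `parse_known_args`, so unrecognized flags are collected rather than aborting the run. A minimal standalone illustration using the standard-library `argparse` (GraphNeT's `ArgumentParser` is assumed here to follow the stock behaviour):

```python
import argparse

parser = argparse.ArgumentParser(description="Demo of parse_known_args")
parser.add_argument("backend", choices=["sqlite", "parquet"])
parser.add_argument("OUTPUT_DIR", nargs="?", default="None")

# Known positionals are parsed; anything unrecognized lands in
# `unknown` instead of triggering an error.
args, unknown = parser.parse_known_args(
    ["sqlite", "/tmp/out", "--not-a-real-flag"]
)
print(args.backend)     # sqlite
print(args.OUTPUT_DIR)  # /tmp/out
print(unknown)          # ['--not-a-real-flag']
```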

examples/07_km3net/README.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# KM3NeT Data Conversion

This folder contains an example script for extracting information from ROOT files of KM3NeT offline data and converting it into intermediate file formats suitable for deep learning training or inference using GraphNeT. Supported output formats include SQLite and Parquet. After this conversion, training and inference on KM3NeT data can be performed efficiently.

## Example Usage

The following example demonstrates how to perform the conversion using a sample KM3NeT-like file containing a few events with random information:

```bash
python 01_convert_km3net.py <output_format> <pulse_option> <variable_set> [OUTPUT_DIR]
```

### Arguments:
- `<output_format>`: Specifies the output format, either `sqlite` or `parquet`.
- `<pulse_option>`: Determines whether to extract all pulses (`Snapshot`) or only the triggered ones (`Triggered`).
- `<variable_set>`: Defines the variables to include, such as `km3net-vars` for standard neutrino-related data or `hnl-vars` for additional quantities related to Heavy Neutral Lepton searches.
- `[OUTPUT_DIR]` (optional): Specifies the output directory. If not provided, the output will be stored in GraphNeT's default example output directory, which can be found using:

```python
from graphnet.constants import EXAMPLE_OUTPUT_DIR
print(EXAMPLE_OUTPUT_DIR)
```

The path to the ROOT file being converted can be found by running:
```python
from graphnet.constants import TEST_DATA_DIR
print(TEST_DATA_DIR)
```

### Output Structure

The generated SQLite or Parquet file contains:
- A **pulse table**, storing hit-by-hit information for each event, with a unique identifier linking pulses to their respective events.
- A **true Monte Carlo event table**, including ground-truth event information. If available and selected, it may also contain reconstructed information from likelihood-based methods.
- Unavailable variables (e.g., true Monte Carlo information in real data files) will be filled with unphysical placeholder values.

### Reading the Output Files

The output files can be read using Python.

- **If you chose to create a Parquet output**:
  You will find several `.parquet` files in the output folder, each corresponding to a different extracted table (e.g., a table with the true event information, a table with pulse information, etc.).
  To read one of these tables:

  ```python
  import pandas as pd

  df = pd.read_parquet("FILE_NAME.parquet")
  print(df.head())
  ```

- **If you chose to create an SQLite output**:
  In this case, you will find a single `.db` file per converted input, which contains all the tables inside.
  To list the table names and preview their contents:

  ```python
  import pandas as pd
  import sqlite3

  # Connect to the SQLite database
  conn = sqlite3.connect("FILE_NAME.db")
  cursor = conn.cursor()

  # Get the table names
  cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
  tables = [t[0] for t in cursor.fetchall()]
  print("The following tables are stored inside the file:", tables)

  # Preview the first 5 rows of each table
  for t in tables:
      print(f"\nTable: {t}")
      df = pd.read_sql_query(f"SELECT * FROM {t} LIMIT 5;", conn)
      print(df)
  ```
## Help

For more information on available options, use the help flag:

```bash
python 01_convert_km3net.py -h
```

or

```bash
python 01_convert_km3net.py --help
```

examples/README.md

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,9 @@ Examples are grouped into five numbered subfolders, roughly in order of how you
 2. **Data.** Reading in data in intermediate formats, plotting feature distributions, and converting data between intermediate file formats. These examples are entirely self-contained and can be run by anyone.
 3. **Weights.** Fitting per-event weights.
 4. **Training.** Training GNN models on various physics tasks.
-5**LiquidO.** Converting h5 files from the LiquidO experiment into intermediate formats suitable for deep learning.
+5. **LiquidO.** Converting h5 files from the LiquidO experiment into intermediate formats suitable for deep learning.
+6. **Prometheus.** Converting parquet files from the Prometheus simulation software into intermediate formats suitable for deep learning.
+7. **KM3NeT.** Converting root files from the KM3NeT experiment into intermediate formats suitable for deep learning.

 Each subfolder contains similarly numbered example scripts.
 Each example script comes with a simple command-line interface and help functionality, e.g.

src/graphnet/data/constants.py

Lines changed: 58 additions & 2 deletions
@@ -52,6 +52,18 @@ class FEATURES:
     ]
     KAGGLE = ["x", "y", "z", "time", "charge", "auxiliary"]
     LIQUIDO = ["sipm_x", "sipm_y", "sipm_z", "t"]
+    KM3NET = [
+        "t",
+        "pos_x",
+        "pos_y",
+        "pos_z",
+        "dir_x",
+        "dir_y",
+        "dir_z",
+        "tot",
+        "trig",
+    ]
+    KM3NET_HNL = KM3NET


 class TRUTH:
@@ -71,8 +83,6 @@ class TRUTH:
         "interaction_type",
         "interaction_time",  # Added for vertex reconstruction
         "inelasticity",
-        "visible_inelasticity",
-        "visible_energy",
         "stopped_muon",
     ]
     DEEPCORE = ICECUBE86
@@ -167,3 +177,49 @@ class TRUTH:
         "energy",
         "pid",
     ]
+    KM3NET = [
+        "true_pdgid",
+        "true_E",
+        "true_pos_x",
+        "true_pos_y",
+        "true_pos_z",
+        "true_dir_x",
+        "true_dir_y",
+        "true_dir_z",
+        "true_zenith",
+        "true_azimuth",
+        "run_id",
+        "evt_id",
+        "frame_index",
+        "trigger_counter",
+        "n_hits",
+        "event_no",
+        "is_cc_flag",
+        "tau_topology",
+    ]
+    KM3NET_HNL = [
+        "true_pdgid",
+        "true_E",
+        "true_pos_x",
+        "true_pos_y",
+        "true_pos_z",
+        "true_dir_x",
+        "true_dir_y",
+        "true_dir_z",
+        "true_zenith",
+        "run_id",
+        "evt_id",
+        "frame_index",
+        "trigger_counter",
+        "n_hits",
+        "event_no",
+        "is_cc_flag",
+        "tau_topology",
+        "zenith_hnl",
+        "azimuth_hnl",
+        "angle_between_showers",
+        "Energy_hnl",
+        "Energy_second_shower",
+        "Energy_imbalance",
+        "distance",
+    ]
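These class-level lists act as canonical column selections: downstream code indexes tables by name in a fixed order. A toy sketch of the pattern on a fake pulse record (stand-in data, not the GraphNeT classes themselves):

```python
class FEATURES:
    """Canonical input-feature names per experiment (KM3NeT excerpt)."""

    KM3NET = [
        "t", "pos_x", "pos_y", "pos_z",
        "dir_x", "dir_y", "dir_z", "tot", "trig",
    ]
    KM3NET_HNL = KM3NET  # HNL studies reuse the same pulse features


# A fake pulse record with an extra bookkeeping column mixed in.
pulse = {
    "t": 101.0, "pos_x": 1.0, "pos_y": 2.0, "pos_z": 3.0,
    "dir_x": 0.0, "dir_y": 0.0, "dir_z": 1.0,
    "tot": 22.0, "trig": 1, "internal_id": 42,
}

# Selecting by the constant list yields features in a fixed order,
# silently dropping anything not in the canonical set.
row = [pulse[name] for name in FEATURES.KM3NET]
print(row)  # [101.0, 1.0, 2.0, 3.0, 0.0, 0.0, 1.0, 22.0, 1]
```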

src/graphnet/data/dataconverter.py

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,7 @@
 from .extractors.liquido import H5Extractor
 from .extractors.internal import ParquetExtractor
 from .extractors.prometheus import PrometheusExtractor
+from .extractors.km3net import KM3NeTExtractor

 from .dataclasses import I3FileSet

@@ -51,6 +52,7 @@ def __init__(
             List[ParquetExtractor],
             List[H5Extractor],
             List[PrometheusExtractor],
+            List[KM3NeTExtractor],
         ],
         index_column: str = "event_no",
         num_workers: int = 1,
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
"""Extractors for extracting pure-python data from KM3NeT-Offline files."""

from .km3netextractor import KM3NeTExtractor
from .km3netpulseextractor import (
    KM3NeTTriggPulseExtractor,
    KM3NeTFullPulseExtractor,
)
from .km3nettruthextractor import (
    KM3NeTTruthExtractor,
    KM3NeTHNLTruthExtractor,
    KM3NeTRegularRecoExtractor,
    KM3NeTHNLRecoExtractor,
)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
"""Base class for all KM3NeT ROOT extractors."""

from typing import Any
from abc import abstractmethod

from graphnet.data.extractors import Extractor

# This class gathers the specific extractors for the different data types
# so that they can all be called from the reader. It is the equivalent of
# the I3Extractor in the IceCube example.


class KM3NeTExtractor(Extractor):
    """Base class for all KM3NeT extractors."""

    def __init__(self, extractor_name: str):
        """Initialize KM3NeTExtractor.

        Args:
            extractor_name: Name of the `KM3NeTExtractor` instance.
                Used to keep track of the provenance of different data,
                and to name tables to which this data is saved.
        """
        super().__init__(extractor_name=extractor_name)

    @abstractmethod
    def __call__(self, file: Any) -> dict:
        """Extract information from file."""
        pass
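A concrete extractor only needs to implement `__call__` and return a dict of named columns. A minimal sketch using a stand-in base class (the real `Extractor` from `graphnet.data.extractors` carries more machinery, and `ToyHitTimeExtractor` is a hypothetical example, not part of the PR):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Extractor(ABC):
    """Stand-in for graphnet.data.extractors.Extractor."""

    def __init__(self, extractor_name: str):
        self._extractor_name = extractor_name  # names the output table


class KM3NeTExtractor(Extractor):
    """Base class for all KM3NeT extractors (mirrors the diff above)."""

    @abstractmethod
    def __call__(self, file: Any) -> dict:
        """Extract information from file."""


class ToyHitTimeExtractor(KM3NeTExtractor):
    """Hypothetical extractor pulling hit times out of a parsed event."""

    def __call__(self, file: Any) -> Dict[str, List[float]]:
        # `file` is assumed to expose a list of hits with a "t" field;
        # here we fake it with plain dicts instead of a ROOT file.
        return {"t": [hit["t"] for hit in file["hits"]]}


fake_event = {"hits": [{"t": 101.0}, {"t": 104.5}]}
extractor = ToyHitTimeExtractor(extractor_name="pulse_map")
print(extractor(fake_event))  # {'t': [101.0, 104.5]}
```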
