7 changes: 0 additions & 7 deletions Ingestion/Ingesting_Data.md

This file was deleted.

38 changes: 38 additions & 0 deletions Ingestor/IngestManual_ESS.md
@@ -0,0 +1,38 @@
# Ingest Instructions ESS
___This page needs to be updated___

![Kafka flow](screenshots/kafka.png)

As shown in the picture above, the detector and the data collection software write into Kafka topics. The topics are then read by a filewriter, which writes the dataset to storage and, once the file is written, sends back an event containing metadata about the file and the experiment. An ingestion program then parses the event from the filewriter and gathers additional information before triggering REST calls into the backend.

### How the ingestion program works

#### 1. Parse Event

The ingestor subscribes to a Kafka topic on which the filewriter publishes an event whenever a file has been written. This event contains the file location, the proposal id and the metadata that exists on the file in the form of NeXus data.
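
A minimal sketch of such a subscription in Python, assuming JSON-encoded events, a topic named `filewriter_events` and a local broker (all placeholders; the actual ESS deployment may use a binary serialization instead):

```
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Topic name and broker address are illustrative placeholders.
consumer = KafkaConsumer(
    "filewriter_events",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # The event is expected to carry the file location, the proposal id
    # and the NeXus metadata from the written file.
    print(event.get("file_path"), event.get("proposal_id"))
```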

#### 2. Login

Log in and get an access token that can be used for interfacing with the backend. We advise creating a dedicated ingestor account for automatic ingestion.
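
A sketch of the login step using the `/api/v3/Users/login` endpoint shown in this directory's README; the base URL and credentials are placeholders:

```
import requests

response = requests.post(
    "http://localhost:3000/api/v3/Users/login",
    json={"username": "ingestor", "password": "<your_password>"},
)
response.raise_for_status()
# Depending on the backend version the token field may be named "id"
# or "access_token"; adjust accordingly.
access_token = response.json()["id"]
```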

#### 3. Gather more metadata

The ingestor contacts the User Office and gets additional information regarding the experiment, the principal investigator and the sample.
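
A purely illustrative sketch; the User Office URL, endpoint path and response fields below are hypothetical, since the actual service and its schema are site specific:

```
import requests

proposal_id = "<proposal id from the filewriter event>"

# Hypothetical endpoint and field names, for illustration only.
proposal = requests.get(
    f"https://useroffice.example.org/api/proposals/{proposal_id}"
).json()
principal_investigator = proposal["principalInvestigator"]
sample_info = proposal.get("sample")
```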

#### 4. Create dataset

![dataset](screenshots/dataset.png)

Using the information from the filewriter event together with the additional information gathered from the User Office, a dataset request can be constructed and sent to the backend.
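
A sketch of the dataset request, using the same `/api/v3/Datasets` endpoint as the manual curl example in the README; the field values are illustrative:

```
import requests

access_token = "<token from step 2>"

dataset = {
    "creationLocation": "/ESS/instrument",         # illustrative
    "sourceFolder": "/data/proposals/123456/raw",  # from the filewriter event
    "type": "raw",
    "ownerGroup": "<proposal id>",
    "principalInvestigator": "<from the User Office>",
}
response = requests.post(
    "http://localhost:3000/api/v3/Datasets",
    params={"access_token": access_token},
    json=dataset,
)
response.raise_for_status()
dataset_pid = response.json()["pid"]
```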

#### 5. Create OrigDatablocks

![datablock](screenshots/datablock.png)

After a dataset has been created, the related files are attached to it. This is done by creating OrigDatablocks that are attached to the dataset.
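
A sketch of attaching the files; the `/api/v3/OrigDatablocks` route and the field names are assumptions that may differ between SciCat backend versions:

```
import requests

access_token = "<token from step 2>"
dataset_pid = "<pid returned in step 4>"

orig_datablock = {
    "datasetId": dataset_pid,
    "size": 42_000_000,  # total size of all files in bytes, illustrative
    "dataFileList": [
        {
            "path": "raw/run_0001.nxs",
            "size": 42_000_000,
            "time": "2023-01-01T00:00:00.000Z",
        }
    ],
}
requests.post(
    "http://localhost:3000/api/v3/OrigDatablocks",
    params={"access_token": access_token},
    json=orig_datablock,
).raise_for_status()
```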

#### 6. Create Sample

![sample](screenshots/sample.png)

If sample information is available, a sample record can be created and attached to the dataset. This is an important step, as it makes the data discoverable by a wider audience.
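
A sketch of creating and linking a sample; the `/api/v3/Samples` route, the `sampleId` field and the PATCH on the dataset are assumptions:

```
from urllib.parse import quote

import requests

access_token = "<token from step 2>"
dataset_pid = "<pid returned in step 4>"

sample = {
    "description": "<sample description from the User Office>",
    "ownerGroup": "<proposal id>",
}
response = requests.post(
    "http://localhost:3000/api/v3/Samples",
    params={"access_token": access_token},
    json=sample,
)
response.raise_for_status()
sample_id = response.json()["sampleId"]

# Link the sample to the dataset; pids contain slashes, so URL-encode.
requests.patch(
    f"http://localhost:3000/api/v3/Datasets/{quote(dataset_pid, safe='')}",
    params={"access_token": access_token},
    json={"sampleId": sample_id},
).raise_for_status()
```
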
45 changes: 45 additions & 0 deletions Ingestor/README.md
@@ -0,0 +1,45 @@
# Ingesting Data into SciCat

## Using pyscicat (recommended)
Pyscicat is a Python client for working with the SciCat API, which provides an easy mechanism to ingest data. See https://www.scicatproject.org/pyscicat/howto/ingest.html to get started.

For an example of the full workflow, please see the `pyscicat.ipynb` Jupyter notebook in SciCat Live: https://github.com/SciCatProject/scicatlive/blob/main/services/jupyter/config/notebooks/pyscicat.ipynb. It covers how to authenticate, create a dataset, add datablocks and upload an attachment.
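
A condensed sketch of the pattern from the linked howto; it assumes pyscicat's `from_credentials` helper and `upload_new_dataset` method, which may change between releases, so treat the howto as authoritative:

```
from datetime import datetime, timezone

from pyscicat.client import from_credentials
from pyscicat.model import RawDataset

client = from_credentials(
    "http://localhost:3000/api/v3", "ingestor", "<your_password>"
)

# Field values mirror the manual metadata.json example below.
dataset = RawDataset(
    owner="ingestor",
    contactEmail="ingestor@example.org",
    creationLocation="/PSI/SLS/TOMCAT",
    sourceFolder="/scratch/devops",
    creationTime=datetime.now(timezone.utc).isoformat(),
    type="raw",
    ownerGroup="p16623",
    principalInvestigator="unknown",
)
dataset_id = client.upload_new_dataset(dataset)
```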

## Manual ingestion

The following steps add a dataset to your system using the API and the command-line tool curl.

1. Log in to the backend. The default password for the `ingestor` user is `aman`. Running the command below in the terminal will yield an access token.

```
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"username":"ingestor", "password":"<your_password>"}' 'http://localhost:3000/api/v3/Users/login'
```

2. Create a JSON file named `metadata.json` with the contents below:
```
{
"creationLocation": "/PSI/SLS/TOMCAT",
"sourceFolder": "/scratch/devops",
"type": "raw",
"ownerGroup":"p16623"
}
```

3. Pipe the contents of `metadata.json` to a curl command. Insert your access token in the command below and run it in the terminal:

```
cat metadata.json | curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- 'http://localhost:3000/api/v3/Datasets?access_token=YOUR_TOKEN_HERE'
```

There should now be a dataset in your MongoDB instance.



## Site Specific Examples

For site-specific examples, see the following links:
* [ESS](IngestManual_ESS.md)
* [PSI](ingestManual.md) (in the future this will move to https://data-catalog-services.pages.psi.ch)