Description
We currently index/upload:
- Validphys reports
- fits
- theories
and implicitly every file contained in a validphys report.
The way it is done has many issues.
- It is insecure because it currently uses an ssh connection with more privileges than those required to upload a file, such as the ability to manipulate every file ever uploaded. People rarely bother with `ssh-agent`, so the ssh keys are typically stored unencrypted.
- It is inefficient because we only have functionality corresponding to `reindex_all`, which scans over all the files in order to build an index, and it is triggered every time a single file is updated. This is not negligible when we reconstruct metadata in complicated ways, such as by parsing a large html file (as done for validphys reports without `meta.yaml`).
- Common parts corresponding to indexing fits, theories and reports could maybe be bundled together.
- It is un(der)documented and basically only I understand how it works.
I think ideally the indexing should have the following properties
- Single source of truth based on the content of the files we index. There should not be a database with potentially conflicting information.
- The basic actions are `index_one` and `reindex_all`, where `index_one` adds one item to the index that the user sees, and `reindex_all` rebuilds the index from scratch.
- The `index_one` action can be performed without full privileges, limiting the damage if the credentials required for uploading are lost.
These can be more or less done with the current serverscripts layout, which is good in that it is dead simple minded but currently bad in all the ways above.
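
To make the two actions concrete, here is a rough sketch of how `index_one` and `reindex_all` could relate to each other. It is not the current serverscripts code. It assumes a layout that may not match the server: one directory per item, each with a flat `key: value` `meta.yaml`, and a JSON index file. The helpers `read_metadata` and `write_index` are hypothetical, and the line-based parsing is only a stand-in for a real YAML parser.

```python
import json
import os
import tempfile
from pathlib import Path


def read_metadata(item_dir: Path) -> dict:
    """Read flat `key: value` pairs from meta.yaml (simplified, not real YAML)."""
    meta = {}
    meta_file = item_dir / "meta.yaml"
    if meta_file.is_file():
        for line in meta_file.read_text().splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip():
                meta[key.strip()] = value.strip()
    return meta


def write_index(index_path: Path, index: dict) -> None:
    """Write to a temporary file, then rename, so readers never see a partial index."""
    fd, tmp = tempfile.mkstemp(dir=str(index_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(index, f, indent=1, sort_keys=True)
    os.replace(tmp, index_path)


def reindex_all(root: Path, index_path: Path) -> dict:
    """Rebuild the whole index from the files on disk (the single source of truth)."""
    index = {d.name: read_metadata(d) for d in sorted(root.iterdir()) if d.is_dir()}
    write_index(index_path, index)
    return index


def index_one(root: Path, index_path: Path, name: str) -> dict:
    """Add or refresh one entry without rescanning the other items."""
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[name] = read_metadata(root / name)
    write_index(index_path, index)
    return index
```

The property worth keeping is that `index_one` followed by nothing else should give the same index as a full `reindex_all`, since both only ever read the files themselves. In this sketch `index_one` needs read access to a single item and write access to the index, which is the kind of narrower privilege the third point asks for.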