diff --git a/README.md b/README.md
index f9cac7fb..074a8862 100644
--- a/README.md
+++ b/README.md
@@ -55,6 +55,8 @@ Finally, there is current work in progress to support a combination of [multiple
   * [Remote HPC Configuration](https://pysqa.readthedocs.io/en/latest/advanced.html#remote-hpc-configuration)
   * [Access to Multiple HPCs](https://pysqa.readthedocs.io/en/latest/advanced.html#access-to-multiple-hpcs)
 * [Debugging](https://pysqa.readthedocs.io/en/latest/debug.html)
+  * [Local Queuing System](https://pysqa.readthedocs.io/en/latest/debug.html#local-queuing-system)
+  * [Remote HPC](https://pysqa.readthedocs.io/en/latest/debug.html#remote-hpc)
 
 # License
 `pysqa` is released under the BSD license https://github.com/pyiron/pysqa/blob/main/LICENSE . It is a spin-off of the `pyiron` project https://github.com/pyiron/pyiron therefore if you use `pysqa` for calculation which result in a scientific publication, please cite:
diff --git a/docs/source/debug.md b/docs/source/debug.md
index 2051ee36..76b9cf99 100644
--- a/docs/source/debug.md
+++ b/docs/source/debug.md
@@ -1,5 +1,16 @@
 # Debugging
-The configuration of a queuing system adapter, in particular in a remote configuration with a local installation of `pysqa` communicating to a remote installation on your HPC can be tricky. To simplify the process `pysqa` provides a series of utility functions:
+The configuration of a queuing system adapter, in particular in a remote configuration with a local installation of `pysqa` communicating with a remote installation on your HPC, can be tricky.
+
+## Local Queuing System
+To simplify the process, `pysqa` provides a series of steps for debugging:
+
+* When `pysqa` submits a calculation to a queuing system it creates a `run_queue.sh` script. You can submit this script using your batch command, e.g. `sbatch` for `SLURM`, and take a look at the error message.
+* The error message the queuing system returns when submitting the job is also stored in the `pysqa.err` file.
+* Finally, if the `run_queue.sh` script does not match the variables you provided, then you can test your template using `jinja2`: `Template(open(os.path.expanduser("~/.queues/queue.sh"), "r").read()).render(**kwargs)`. Here `"~/.queues/queue.sh"` is the path to the queuing system submit script you want to use and `**kwargs` are the arguments you provide to the `submit_job()` function. The path is wrapped in `os.path.expanduser()` because `open()` does not expand `~`.
+
+## Remote HPC
+The failure to submit to a remote HPC cluster can be related to an issue with the local `pysqa` configuration or an issue with the remote `pysqa` configuration. To identify which part is causing the issue, it is recommended to first test the remote `pysqa` installation on the remote HPC cluster:
+
 * Login to the remote HPC cluster and import `pysqa` on a python shell.
 * Validate the queue configuration by importing the queue adapter using `from pysqa import QueueAdapter` then initialize the object from the configuration dictionary `qa = QueueAdapter(directory="~/.queues")`. The current configuration can be printed using `qa.config`.
 * Try to submit a calculation to print the hostname from the python shell on the remote HPC cluster using the `qa.submit_job(command="hostname")`.
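
Review note on the template check added under "Local Queuing System": the one-liner can be expanded into a small standalone script, which may be easier to run when debugging. The sketch below is only an illustration under stated assumptions. The template text and the variable names `job_name`, `cores` and `command` are hypothetical stand-ins for whatever your own `queue.sh` uses, and the helper `render_submit_script` is not part of `pysqa`. It writes the template to a temporary file so the example does not depend on an existing `~/.queues` directory.

```python
import os
import tempfile

from jinja2 import Template

# Hypothetical stand-in for a queue template such as ~/.queues/queue.sh.
TEMPLATE_TEXT = """#!/bin/bash
#SBATCH --job-name={{job_name}}
#SBATCH --ntasks={{cores}}
{{command}}
"""


def render_submit_script(template_path, **kwargs):
    """Render a jinja2 queue template with the given variables."""
    # open() does not expand "~", so expand it explicitly.
    with open(os.path.expanduser(template_path), "r") as f:
        return Template(f.read()).render(**kwargs)


# Write the stand-in template to a temporary directory and render it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "queue.sh")
    with open(path, "w") as f:
        f.write(TEMPLATE_TEXT)
    script = render_submit_script(
        path, job_name="test", cores=1, command="hostname"
    )

print(script)
```

Comparing the printed script with the `run_queue.sh` that `pysqa` generated should show which template variable is missing or misspelled. To check a real template, pass its path (for example `"~/.queues/queue.sh"`) together with the same keyword arguments you give to `submit_job()`.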